This function checks a specified data frame column of year values for missing entries. It also verifies that the values are numeric and fall within a valid range. If transform_numeric is TRUE, the function attempts to convert non-numeric values to numeric.

Usage

check_year_column(
  df,
  col_year,
  year_range = c(1800, 2024),
  transform_numeric = TRUE
)

Arguments

df

A data frame containing the data to be checked.

col_year

The name of the column in the data frame that contains the year values.

year_range

A numeric vector of length 2 specifying the valid range of years (e.g., c(1900, 2024)). Defaults to c(1800, 2024).

transform_numeric

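Logical. If TRUE, attempts to convert non-numeric year values to numeric. Defaults to TRUE. As the output in Examples shows, the function stops with an error when the column cannot be converted.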

Value

A list containing:

  • missing_values: Indices of missing values in the year column.

  • invalid_years: Indices of values that fall outside the valid year range.

  • updated_df: A data frame with the updated year values and a flag indicating whether the values were transformed.
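The first two list elements can be illustrated with a minimal base R sketch. This is not the package's implementation: sketch_year_checks is a hypothetical helper, and it assumes that missing values are located in the raw column and that the range check is inclusive at both ends.

```r
# Hypothetical helper for illustration only; not the code behind check_year_column().
sketch_year_checks <- function(df, col_year, year_range = c(1800, 2024)) {
  raw <- df[[col_year]]
  # Coerce to numeric; entries that cannot be parsed become NA (warning suppressed).
  years <- suppressWarnings(as.numeric(as.character(raw)))
  # Assumption: "missing" means NA in the original column.
  missing_idx <- which(is.na(raw))
  # Assumption: a year is invalid when it lies strictly outside year_range.
  invalid_idx <- which(!is.na(years) &
                         (years < year_range[1] | years > year_range[2]))
  list(missing_values = missing_idx, invalid_years = invalid_idx)
}

df <- data.frame(year = c("2001", "2005", NA, "2018", "2050"))
res <- sketch_year_checks(df, "year")
res$missing_values  # 3
res$invalid_years   # 5
```

Both elements are row indices, so they can be used directly to subset the data frame, for example df[res$invalid_years, , drop = FALSE].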

Examples

df <- data.frame(year = c("2001", "2005", NA, "two thousand and ten", "2018", "2050"))
result <- check_year_column(df, "year", year_range = c(1800, 2024))
#> Error in check_year_column(df, "year", year_range = c(1800, 2024)): The column could not be converted to numeric.
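The error above is raised because one entry ("two thousand and ten") cannot be parsed as a number. One way to locate such entries before calling the function is sketched below in base R. The variable names converted and bad_idx are illustrative, and the sketch assumes the column holds character values.

```r
df <- data.frame(year = c("2001", "2005", NA, "two thousand and ten", "2018", "2050"))

# as.numeric() yields NA for strings that are not numbers; the warning is suppressed.
converted <- suppressWarnings(as.numeric(df$year))

# Entries that were present in the input but did not parse as numbers.
bad_idx <- which(!is.na(df$year) & is.na(converted))
bad_idx            # 4
df$year[bad_idx]   # "two thousand and ten"
```

Once these entries are corrected or set to NA, the column contains only values that as.numeric() can convert.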