This function cleans species occurrence records downloaded with the download_gbif_records() function from this package. It filters records by coordinate precision and coordinate uncertainty, excludes records near country, capital, and zoo/herbaria centroids using methods from the CoordinateCleaner package, and removes records that lack species or year information. It can also remove duplicates, defined as records that share the same longitude, latitude, speciesKey, and datasetKey.

Usage

clean_gbif(
  df,
  coordinatePrecision = NULL,
  coordinateUncertaintyInMeters = NULL,
  buffer_centroid_countries = NULL,
  buffer_centroid_capitals = NULL,
  buffer_centroid_zoo_herbaria = NULL,
  decimalLongitude = "decimalLongitude",
  decimalLatitude = "decimalLatitude",
  speciesKey = "speciesKey",
  datasetKey = "datasetKey",
  year = "year",
  species = "species",
  duplicate = FALSE
)

Arguments

df

A data frame containing species occurrence records, typically downloaded using the download_gbif_records() function from this package.

coordinatePrecision

Numeric. Remove records below this precision.

coordinateUncertaintyInMeters

Numeric. Remove records below this uncertainty, as well as records whose uncertainty equals one of the specific common values 301, 3036, 999, or 9999.

buffer_centroid_countries

Numeric. Buffer size for removing records near country centroids.

buffer_centroid_capitals

Numeric. Buffer size for removing records near capital centroids.

buffer_centroid_zoo_herbaria

Numeric. Buffer size for removing records near zoo or herbaria centroids.

decimalLongitude

String. Column name for longitude. Default is "decimalLongitude".

decimalLatitude

String. Column name for latitude. Default is "decimalLatitude".

speciesKey

String. Column name for species key. Default is "speciesKey".

datasetKey

String. Column name for dataset key. Default is "datasetKey".

year

String. Column name for the year of occurrence. Default is "year".

species

String. Column name for the species. Default is "species".

duplicate

Logical. Whether to remove duplicates that share the same longitude, latitude, speciesKey, and datasetKey. Default is FALSE.
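
As a rough illustration of what "duplicate" means here, the following base R sketch drops rows that repeat the same four key columns. The toy data and the choice to keep the first copy of each duplicate are assumptions made for illustration; the package's internal implementation may differ.

```r
# Hypothetical toy data; column names follow the documented defaults.
occ <- data.frame(
  decimalLongitude = c(2.15, 2.15, 2.15, -3.70),
  decimalLatitude  = c(41.39, 41.39, 41.39, 40.42),
  speciesKey       = c(101L, 101L, 101L, 202L),
  datasetKey       = c("a", "a", "b", "a"),
  stringsAsFactors = FALSE
)

key_cols <- c("decimalLongitude", "decimalLatitude", "speciesKey", "datasetKey")

# duplicated() marks every repeat of an earlier row, so the first copy is kept.
deduplicated <- occ[!duplicated(occ[, key_cols]), ]

# Rows 1 and 2 match on all four keys; row 3 differs only in datasetKey.
nrow(deduplicated)
```

Note that two records with identical coordinates and speciesKey are not treated as duplicates when they come from different datasets.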

Value

A cleaned data frame with species occurrences.

Details

The function uses the following methods from the CoordinateCleaner package:

  • cc_cen(): Removes records near country centroids.

  • cc_cap(): Removes records near capital centroids.

  • cc_inst(): Removes records near zoo or herbaria centroids.
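
For readers who want to see these three steps in isolation, here is a minimal sketch that calls the CoordinateCleaner functions directly on a toy data frame. The data, the buffer values, and the order of the calls are assumptions made for illustration, not necessarily what clean_gbif() does internally. How the buffer is interpreted depends on each function's geod argument, so check the CoordinateCleaner documentation before choosing values.

```r
# Hypothetical toy records; the species name and coordinates are made up.
occ <- data.frame(
  species          = c("Species A", "Species A"),
  decimalLongitude = c(2.15, -3.70),
  decimalLatitude  = c(41.39, 40.42),
  stringsAsFactors = FALSE
)

# Run only when CoordinateCleaner is installed.
if (requireNamespace("CoordinateCleaner", quietly = TRUE)) {
  # Each call returns the input without the flagged records
  # (value = "clean" is the default in CoordinateCleaner).
  occ <- CoordinateCleaner::cc_cen(
    occ, lon = "decimalLongitude", lat = "decimalLatitude", buffer = 1000
  )
  occ <- CoordinateCleaner::cc_cap(
    occ, lon = "decimalLongitude", lat = "decimalLatitude", buffer = 1000
  )
  occ <- CoordinateCleaner::cc_inst(
    occ, lon = "decimalLongitude", lat = "decimalLatitude", buffer = 100
  )
}

nrow(occ)
```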

Examples

# Example usage; replace ... with the arguments for your own download
df <- download_gbif_records(...)
#> Error: '...' used in an incorrect context
cleaned_df <- clean_gbif(df, coordinatePrecision = 0.01,
                         coordinateUncertaintyInMeters = 100,
                         buffer_centroid_countries = 10,
                         buffer_centroid_capitals = 10,
                         buffer_centroid_zoo_herbaria = 10,
                         decimalLongitude = "decimalLongitude",
                         decimalLatitude = "decimalLatitude",
                         speciesKey = "speciesKey",
                         datasetKey = "datasetKey",
                         year = "year",
                         species = "species")
#> Error in clean_gbif(df, coordinatePrecision = 0.01, coordinateUncertaintyInMeters = 100,     buffer_centroid_countries = 10, buffer_centroid_capitals = 10,     buffer_centroid_zoo_herbaria = 10, decimalLongitude = "decimalLongitude",     decimalLatitude = "decimalLatitude", speciesKey = "speciesKey",     datasetKey = "datasetKey", year = "year", species = "species"): Missing required columns: decimalLongitude, decimalLatitude, speciesKey, datasetKey, year, species
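
The error above reports that the input lacks the six columns the function expects. The sketch below builds a hypothetical input with those columns, repeats a comparable column check in base R, and drops records without species or year information. The toy values and the helper logic are assumptions for illustration and are not taken from the package source.

```r
# Column names the function expects by default (from the Usage section).
required <- c("decimalLongitude", "decimalLatitude", "speciesKey",
              "datasetKey", "year", "species")

# Hypothetical toy records; the second one has no year.
occ <- data.frame(
  decimalLongitude = c(2.15, -3.70),
  decimalLatitude  = c(41.39, 40.42),
  speciesKey       = c(101L, 202L),
  datasetKey       = c("a", "b"),
  year             = c(2015L, NA),
  species          = c("Species A", "Species B"),
  stringsAsFactors = FALSE
)

# Stop early with a readable message if any expected column is absent.
missing_cols <- setdiff(required, names(occ))
if (length(missing_cols) > 0) {
  stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
}

# Keep only records that have both species and year information.
occ_complete <- occ[!is.na(occ$species) & !is.na(occ$year), ]
nrow(occ_complete)
```

A data frame shaped like occ can be passed as the df argument once the package is loaded.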