
Clean GBIF Occurrence Records
This function cleans species occurrence records downloaded with the
download_gbif_records() function from this package. It removes records that fall below
the specified coordinate precision or coordinate uncertainty, excludes records within the
specified buffers of country, capital, and zoo/herbarium centroids using methods from the
CoordinateCleaner package, and, when duplicate = TRUE, removes duplicates that share the same
longitude, latitude, speciesKey, and datasetKey. It also removes records that lack species
or year information.
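The last step above, dropping records without species or year information, can be sketched in base R as follows. This is a minimal illustration, not the package's code: `drop_incomplete()` is a hypothetical helper, and it assumes missing values are encoded as NA or as empty strings.

```r
# Hypothetical helper (not part of this package): keep only records that
# have both a species name and a year. Assumes missing values are NA or "".
drop_incomplete <- function(df, species = "species", year = "year") {
  has_species <- !is.na(df[[species]]) & nzchar(as.character(df[[species]]))
  has_year <- !is.na(df[[year]])
  df[has_species & has_year, , drop = FALSE]
}

# Toy data: the second record lacks a year, the third lacks a species name
records <- data.frame(
  species = c("Puma concolor", "Puma concolor", ""),
  year = c(2019L, NA, 2021L),
  stringsAsFactors = FALSE
)
drop_incomplete(records)
```

Only the first toy record survives, because it is the only one with both fields filled in.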
Usage
clean_gbif(
  df,
  coordinatePrecision = NULL,
  coordinateUncertaintyInMeters = NULL,
  buffer_centroid_countries = NULL,
  buffer_centroid_capitals = NULL,
  buffer_centroid_zoo_herbaria = NULL,
  decimalLongitude = "decimalLongitude",
  decimalLatitude = "decimalLatitude",
  speciesKey = "speciesKey",
  datasetKey = "datasetKey",
  year = "year",
  species = "species",
  duplicate = FALSE
)
Arguments
- df
A data frame containing species occurrence records, typically downloaded using the
download_gbif_records() function from this package.
- coordinatePrecision
Numeric. Remove records with a coordinate precision below this value.
- coordinateUncertaintyInMeters
Numeric. Remove records with a coordinate uncertainty below this value, as well as records with the commonly recorded values 301, 3036, 999, and 9999.
- buffer_centroid_countries
Numeric. Buffer size for removing records near country centroids.
- buffer_centroid_capitals
Numeric. Buffer size for removing records near capital centroids.
- buffer_centroid_zoo_herbaria
Numeric. Buffer size for removing records near zoo or herbarium centroids.
- decimalLongitude
String. Column name for longitude. Default is "decimalLongitude".
- decimalLatitude
String. Column name for latitude. Default is "decimalLatitude".
- speciesKey
String. Column name for species key. Default is "speciesKey".
- datasetKey
String. Column name for dataset key. Default is "datasetKey".
- year
String. Column name for the year of occurrence. Default is "year".
- species
String. Column name for the species. Default is "species".
- duplicate
Logical. Whether to remove duplicate records that share the same longitude, latitude, speciesKey, and datasetKey. Default is FALSE.
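As a rough illustration of how the column-name arguments and the duplicate and uncertainty filters fit together, the base R sketch below applies two of the documented filters to a plain data frame. Both `drop_duplicates()` and `drop_flagged_uncertainty()` are hypothetical helpers, not functions from this package; only the fixed-value part of the uncertainty filter is shown, and keeping records with NA uncertainty is an assumption.

```r
# Hypothetical helper (not part of this package): drop records whose
# coordinateUncertaintyInMeters equals one of the commonly recorded values
# listed above. Records with NA uncertainty are kept (an assumption).
drop_flagged_uncertainty <- function(df, col = "coordinateUncertaintyInMeters",
                                     flagged = c(301, 3036, 999, 9999)) {
  df[!(df[[col]] %in% flagged), , drop = FALSE]
}

# Hypothetical helper (not part of this package): drop duplicates sharing
# longitude, latitude, speciesKey and datasetKey, looked up by column name.
drop_duplicates <- function(df,
                            decimalLongitude = "decimalLongitude",
                            decimalLatitude = "decimalLatitude",
                            speciesKey = "speciesKey",
                            datasetKey = "datasetKey") {
  key_cols <- c(decimalLongitude, decimalLatitude, speciesKey, datasetKey)
  df[!duplicated(df[key_cols]), , drop = FALSE]
}

# Toy data with non-default coordinate column names
records <- data.frame(
  lon = c(-70.5, -70.5, -71.0),
  lat = c(-33.4, -33.4, -34.0),
  speciesKey = c(1L, 1L, 1L),
  datasetKey = c("a", "a", "b"),
  coordinateUncertaintyInMeters = c(30, 30, 9999)
)

deduped <- drop_duplicates(records, decimalLongitude = "lon", decimalLatitude = "lat")
drop_flagged_uncertainty(deduped)
```

Passing the column names as strings is what lets the same filters work on data frames whose columns do not use the GBIF defaults.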
Details
The function uses the following methods from the CoordinateCleaner package:
- cc_cen(): removes records near country centroids.
- cc_cap(): removes records near capital centroids.
- cc_inst(): removes records near zoo or herbarium centroids.
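A minimal sketch of chaining these three CoordinateCleaner tests is shown below, assuming the CoordinateCleaner package is installed. `drop_near_centroids()` is a hypothetical helper, and the mapping of the buffer_centroid_* arguments onto each test's `buffer` argument is an assumption about how clean_gbif() uses them. The buffer values are treated as metres, which is the default unit in recent CoordinateCleaner releases; check the documentation of your installed version.

```r
library(CoordinateCleaner)

# Hypothetical helper (not part of this package): apply the three
# CoordinateCleaner tests in sequence, keeping only records that pass
# each one (value = "clean").
drop_near_centroids <- function(x, buffer_cen, buffer_cap, buffer_inst,
                                lon = "decimalLongitude",
                                lat = "decimalLatitude",
                                species = "species") {
  x <- cc_cen(x, lon = lon, lat = lat, species = species,
              buffer = buffer_cen, value = "clean")
  x <- cc_cap(x, lon = lon, lat = lat, species = species,
              buffer = buffer_cap, value = "clean")
  x <- cc_inst(x, lon = lon, lat = lat, species = species,
               buffer = buffer_inst, value = "clean")
  x
}

# Toy record in the open Pacific Ocean, far from any centroid or institution
records <- data.frame(
  species = "Example species",
  decimalLongitude = -140,
  decimalLatitude = 0
)

# Buffers assumed to be in metres
drop_near_centroids(records, buffer_cen = 1000, buffer_cap = 10000, buffer_inst = 100)
```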
Examples
# Download occurrence records first (arguments to download_gbif_records() omitted here)
df <- download_gbif_records(...)
cleaned_df <- clean_gbif(df, coordinatePrecision = 0.01,
                         coordinateUncertaintyInMeters = 100,
                         buffer_centroid_countries = 10,
                         buffer_centroid_capitals = 10,
                         buffer_centroid_zoo_herbaria = 10,
                         decimalLongitude = "decimalLongitude",
                         decimalLatitude = "decimalLatitude",
                         speciesKey = "speciesKey",
                         datasetKey = "datasetKey",
                         year = "year",
                         species = "species")