Hello!
OTN is a great project, thank you all for it.
This issue aims to document a possible error in the "resolved" categorization.
While using the dataset, Thiago @thiago-goncalves-souza and I noticed a possible categorization error on the try dataset (https://opentraits.org/datasets/try).
If we filter OTN to get only rows that are from the try dataset AND the Animalia kingdom (resolveKingdomName == "Animalia"), we get more than 5k rows.
# download data from
# https://github.com/open-traits-network/otn-taxon-trait-summary/blob/main/traits.csv.gz
# (readr can also read the compressed traits.csv.gz directly)
otn_raw <-
  readr::read_csv("traits.csv")

otn_dataset_try <- otn_raw |>
  # filter only the animal kingdom
  dplyr::filter(resolveKingdomName == "Animalia") |>
  # keep only rows that come from the try dataset
  dplyr::filter(datasetId == "https://opentraits.org/datasets/try")

dplyr::glimpse(otn_dataset_try)
# Rows: 5,311
# Columns: 31
# $ taxonIdVerbatim <chr> "1669", "1669", "1669", "1669", "1669", "1…
# $ scientificNameVerbatim <chr> "Agathis philippinensis", "Agathis philipp…
# $ resolvedTaxonId <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
# $ resolvedTaxonName <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
# $ parentTaxonId <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
# $ family <chr> "Araucariaceae", "Araucariaceae", "Araucar…
# $ phylum <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
# $ traitIdVerbatim <dbl> 37, 3400, 759, 98, 3401, 43, 22, 17, 4, 38…
# $ traitNameVerbatim <chr> "Leaf phenology type", "Plant growth form …
# $ bucketId <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
# $ bucketName <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
# $ counts <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
# $ datasetId <chr> "https://opentraits.org/datasets/try", "ht…
# $ numberOfRecords <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 3, …
# $ curator <chr> "https://opentraits.org/members/brian-s-ma…
# $ accessDate <date> 2022-08-19, 2022-08-19, 2022-08-19, 2022-…
# $ comment <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
# $ relationName <chr> "HAS_ACCEPTED_NAME", "HAS_ACCEPTED_NAME", …
# $ resolvedExternalId <chr> "COL:6635V", "COL:6635V", "COL:6635V", "CO…
# $ resolvedName <chr> "Agathis philippinensis", "Agathis philipp…
# $ resolvedRank <chr> "species", "species", "species", "species"…
# $ resolvedCommonNames <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
# $ resolvedPath <chr> "Biota | Animalia | Arthropoda | Insecta |…
# $ resolvedPathIds <chr> "COL:5T6MX | COL:N | COL:RT | COL:H6 | COL…
# $ resolvedPathNames <chr> "unranked | kingdom | phylum | class | ord…
# $ resolvedExternalUrl <chr> "https://www.catalogueoflife.org/data/taxo…
# $ resolveKingdomName <chr> "Animalia", "Animalia", "Animalia", "Anima…
# $ resolvedPhylumName <chr> "Arthropoda", "Arthropoda", "Arthropoda", …
# $ resolvedFamilyName <chr> "Braconidae", "Braconidae", "Braconidae", …
# $ providedTraitName <chr> "Leaf phenology type", "Plant growth form …
# $ resolvedTraitName <chr> "Phenology", "Morphology", "UNCATEGORIZED_…
But some of the traits seem to be plant traits:
otn_dataset_try |>
  dplyr::count(datasetId,
               resolveKingdomName,
               providedTraitName,
               sort = TRUE) |>
  head()
datasetId | resolveKingdomName | providedTraitName | n |
---|---|---|---|
https://opentraits.org/datasets/try | Animalia | Plant growth form | 482 |
https://opentraits.org/datasets/try | Animalia | Leaf type | 257 |
https://opentraits.org/datasets/try | Animalia | Leaf compoundness | 255 |
https://opentraits.org/datasets/try | Animalia | Plant woodiness | 255 |
https://opentraits.org/datasets/try | Animalia | Leaf phenology type | 178 |
https://opentraits.org/datasets/try | Animalia | Leaf area (in case of compound leaves: leaflet | 161 |
Here are some of the most frequent categories that appear in resolvedPhylumName/resolvedName from this query:
otn_dataset_try |>
  dplyr::count(datasetId,
               resolveKingdomName,
               resolvedPhylumName,
               resolvedName,
               sort = TRUE) |>
  head()
datasetId | resolveKingdomName | resolvedPhylumName | resolvedName | n |
---|---|---|---|---|
https://opentraits.org/datasets/try | Animalia | Mollusca | Ficus | 162 |
https://opentraits.org/datasets/try | Animalia | Chordata | Salix | 118 |
https://opentraits.org/datasets/try | Animalia | Arthropoda | Eugenia | 117 |
https://opentraits.org/datasets/try | Animalia | Arthropoda | Inga | 117 |
https://opentraits.org/datasets/try | Animalia | Arthropoda | Viola | 94 |
https://opentraits.org/datasets/try | Animalia | Chordata | Phyllanthus | 88 |
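One way to narrow down the affected rows, as a rough sketch only: in the glimpse output above, the family column holds "Araucariaceae" while resolvedFamilyName holds "Braconidae" for the same record. Assuming family is the family supplied by the source dataset (this is my reading of the column, not something the OTN documentation confirms here), rows where the two disagree are candidates for a wrong name match. The helper name flag_family_mismatch and the toy data frame are hypothetical; the first toy row copies values from the glimpse output and the second is an invented consistent row for contrast.

```r
# Toy rows: the first mirrors values shown in the glimpse() output,
# the second is a made-up consistent row (not real OTN data).
toy <- data.frame(
  scientificNameVerbatim = c("Agathis philippinensis", "Apis mellifera"),
  family                 = c("Araucariaceae", "Apidae"),
  resolveKingdomName     = c("Animalia", "Animalia"),
  resolvedFamilyName     = c("Braconidae", "Apidae"),
  stringsAsFactors       = FALSE
)

# Hypothetical helper: keep rows where both families are known
# and the source family differs from the resolved family.
flag_family_mismatch <- function(df) {
  both_known <- !is.na(df$family) & !is.na(df$resolvedFamilyName)
  df[both_known & df$family != df$resolvedFamilyName, , drop = FALSE]
}

suspect <- flag_family_mismatch(toy)
print(suspect[, c("scientificNameVerbatim", "family", "resolvedFamilyName")])
```

On the real data, the same call would be flag_family_mismatch(otn_dataset_try). A mismatch is only a hint, since families can also differ for legitimate reasons such as synonymy, so the flagged rows would still need a manual check.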