Enable occurrence search by IUCN redlist category #257

Closed
timrobertson100 opened this issue Mar 19, 2020 · 47 comments

@timrobertson100
Member

Use the IUCN Red List API to add an IUCN Red List category field to every record.
Enable this as a search filter and include this in both simple and DwC-A download files.
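A minimal sketch of the proposed field and filter, assuming occurrences are plain dictionaries keyed by `scientificName` and that a name-to-category table has already been retrieved from the Red List. `RED_LIST`, `add_red_list_category` and `filter_by_category` are hypothetical; the field name follows the `iucnRedListCategory` suggestion made later in this thread.

```python
# Hypothetical name -> IUCN category table, standing in for data fetched from
# the Red List (example values taken from this thread).
RED_LIST = {
    "Goniastrea deformis": "VU",
    "Loxodonta africana": "VU",
}


def add_red_list_category(record: dict) -> dict:
    """Return a copy of the record with an iucnRedListCategory field (None if no match)."""
    enriched = dict(record)
    enriched["iucnRedListCategory"] = RED_LIST.get(record.get("scientificName"))
    return enriched


def filter_by_category(records: list, code: str) -> list:
    """Search-filter sketch: keep records whose category equals the requested code."""
    return [r for r in records if r.get("iucnRedListCategory") == code]


occurrences = [
    {"gbifID": 1, "scientificName": "Goniastrea deformis"},
    {"gbifID": 2, "scientificName": "Paragoniastrea deformis"},
]
enriched = [add_red_list_category(r) for r in occurrences]
print([r["gbifID"] for r in filter_by_category(enriched, "VU")])  # [1]
```

Records with no match keep a null category rather than being dropped, so they still appear in unfiltered searches and downloads.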

@fmendezh
Contributor

@timrobertson100 I assume that this is the ws we should use (match by name): https://apiv3.iucnredlist.org/api/v3/docs#species-individual-name
@muttcg maybe this is another candidate for a kv-store cache?
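The kv-store cache idea can be sketched as a memoising wrapper around the slow name lookup: each distinct name reaches the remote service once, and repeat lookups are answered locally. `fetch_category_from_api` is a hypothetical stand-in for the IUCN species-by-name call, and an in-process dict stands in for what would, in practice, be a shared key-value store.

```python
from typing import Callable, Dict, Optional


class CachedCategoryLookup:
    """Memoise name -> category lookups, including misses (None)."""

    def __init__(self, fetch: Callable[[str], Optional[str]]):
        self._fetch = fetch
        self._cache: Dict[str, Optional[str]] = {}
        self.remote_calls = 0

    def get(self, name: str) -> Optional[str]:
        if name not in self._cache:
            self.remote_calls += 1
            self._cache[name] = self._fetch(name)
        return self._cache[name]


def fetch_category_from_api(name: str) -> Optional[str]:
    # Hypothetical stand-in for the remote IUCN web-service call.
    return {"Loxodonta africana": "VU"}.get(name)


lookup = CachedCategoryLookup(fetch_category_from_api)
for name in ["Loxodonta africana", "Loxodonta africana", "Unknown name"]:
    lookup.get(name)
print(lookup.remote_calls)  # 2
```

Caching misses as well as hits matters here: most occurrence names have no Red List entry, and without negative caching each of them would hit the API on every lookup.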

@timrobertson100
Member Author

timrobertson100 commented Mar 19, 2020

@andrewrodrigues will confirm. He is our IUCN liaison and requested this (and the other IUCN ones I just logged while we were on a call)

I believe that is the correct one; the category would be extracted from this example.

@timrobertson100
Member Author

When establishing the IUCN API Token please use [email protected] or similar (@MattBlissett - your preference?) and not a personal email address.

@andrewrodrigues

Yes, that would be the ws. There may be some taxonomic issues as we do not have an up-to-date Red List published, but I am working on that.

@andrewrodrigues

Is there a reason why the threat status is coming from Wikidata but the source link takes you through to the Red List page? Can you confirm that we will be getting the threat statuses through their API? IUCN suggested automating the update of the IUCN checklist from our side through their API (Marie had said this was possible), and at the same time retrieving the threat statuses of the species.

@andrewrodrigues

The scale we use for showing the Red List comes from Wikidata and does not follow the branding that IUCN uses. This will need to be updated.

@MattBlissett
Member

(@andrewrodrigues, please try to keep the discussion on this issue, and open new issues as required; it's very confusing to trace a topic through 3-4 different issues! https://github.com/search?type=Issues&state=open&q=org%3Agbif+IUCN lists all GBIF issues mentioning "IUCN", https://github.com/issues/mentioned lists those where you are mentioned, and https://github.com/issues/assigned lists those assigned to you. The latter two are reachable via "Issues" at the top of the page.)

Yes, that would be the ws. There may be some taxonomic issues as we do not have an up-to-date Red List published, but I am working on that.

I have made an issue for this, gbif/data-mobilization#175.

Is there a reason why the threat status is coming from Wikidata but the source link takes you through to the Red List page?

Their API wasn't fast / reliable enough. See gbif/portal16#1322 (comment)

Can you confirm that we will be getting the threat statuses through their API?

Receiving it as a published dataset is preferable, based on our past experience with their API. Building it from their API, and "publishing" it ourselves, is probably still better than using an unreliable API -- it's fast, and the origin of the data is easier to trace.

@MattBlissett MattBlissett self-assigned this Sep 10, 2020
@MattBlissett
Member

The IUCN Red List data is now available on GBIF: https://doi.org/10.15468/0qnb58

We can now use GBIF APIs to interpret the Red List status of occurrences in GBIF. (However, I think there was another discussion on whether we should enable a filter for (e.g.) all endangered species seen within the last week, and I do not remember the conclusion.)

@MattBlissett MattBlissett removed their assignment Jan 15, 2021
@timrobertson100
Member Author

timrobertson100 commented Feb 8, 2021

With this now in place, I think we can proceed as follows:

  1. Add a service in the checklistbank API that allows retrieval of the IUCN Red List category (note it is a global not a local RL) for a species.
  2. Add the category, which may be null, to all occurrence records (suggested name: occurrence.iucnRedListCategory) by looking it up using the service (presumably a cache will be used)
  3. Add capability to search and download occurrences using the IUCN category, and include the field in the DwC-A downloads

This does have the caveat that this is an assignment of the category based on global and not local red lists i.e. all occurrence records will carry the global red list category regardless of any local red list process.
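For step 3, a simple tab-separated download could carry the new column roughly as follows. `COLUMNS` and `write_simple_download` are hypothetical and do not reproduce GBIF's actual download format; an empty cell stands for a null category, and the value is always the global assessment, per the caveat above.

```python
import csv
import io

# Hypothetical column set for a simple download.
COLUMNS = ["gbifID", "scientificName", "iucnRedListCategory"]


def write_simple_download(records: list) -> str:
    """Write records as tab-separated text, leaving a blank cell when no category applies."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = dict(record)
        # The category is the *global* Red List assessment; None means no match.
        row["iucnRedListCategory"] = row.get("iucnRedListCategory") or ""
        writer.writerow(row)
    return buffer.getvalue()


records = [
    {"gbifID": 1, "scientificName": "Goniastrea deformis", "iucnRedListCategory": "VU"},
    {"gbifID": 2, "scientificName": "Paragoniastrea deformis", "iucnRedListCategory": None},
]
print(write_simple_download(records))
```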

@andrewrodrigues - can you please confirm this is as agreed?

@fmendezh fmendezh self-assigned this Feb 8, 2021
@andrewrodrigues

Yes, this is correct. Would be good to set this up in UAT first before gbif.org to test the functionality and capture any concerns before it is rolled out.

fmendezh added a commit to gbif/checklistbank that referenced this issue Feb 9, 2021
fmendezh added a commit to gbif/checklistbank that referenced this issue Feb 9, 2021
fmendezh added a commit to gbif/checklistbank that referenced this issue Feb 9, 2021
@MattBlissett
Member

@mdoering, @fmendezh, @andrewrodrigues, @ahahn-gbif

I had a quick look at the suggested change to Checklistbank, but I think we need a bit more logic around synonymy -- or at least, to discuss whether we need to handle it, and work out where we explain any differences between the Red List and the GBIF Backbone.

For example, our occurrences of Loxodonta cyclotis (African elephant) should be marked as vulnerable, but the IUCN name is a synonym (without a threat status). It's IUCN Loxodonta africana (African elephant) that is accepted in the IUCN Red List, and has the threat status.

There's also the situation where the occurrence matches a synonym backbone name, but the accepted backbone name has an IUCN name with a Red List status.

@mdoering
Member

Solving the synonym issues properly is difficult, as this is exactly the reason why we need to work with taxon concepts, not just names as we do now. It makes a real difference which threat status a species has if the name was split and there is a broader and a narrower concept, i.e. more or fewer included individuals.

Nevertheless, knowing we don't show the right status in some cases, how about:

  • we always resolve any IUCN synonym to its accepted name in the CLB service and take the threat status from there
  • we pass in the accepted GBIF name if the occurrence is linked to a synonym?
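The two bullets above can be sketched with toy lookup tables, assuming both taxonomies fit in dictionaries. `IUCN`, `BACKBONE`, `iucn_category` and `category_for_occurrence` are hypothetical; the names and statuses reflect the examples discussed in this thread, not a live query.

```python
from typing import Dict, Optional, Tuple

# Hypothetical IUCN taxonomy: name -> (accepted IUCN name, category or None).
IUCN: Dict[str, Tuple[str, Optional[str]]] = {
    "Loxodonta africana": ("Loxodonta africana", "VU"),
    "Loxodonta cyclotis": ("Loxodonta africana", None),  # IUCN synonym, no status of its own
    "Goniastrea deformis": ("Goniastrea deformis", "VU"),
}

# Hypothetical GBIF backbone: matched name -> accepted backbone name.
BACKBONE: Dict[str, str] = {
    "Loxodonta cyclotis": "Loxodonta cyclotis",
    "Goniastrea deformis": "Paragoniastrea deformis",  # backbone synonym
}


def iucn_category(name: str) -> Optional[str]:
    """Bullet 1: resolve an IUCN synonym to its accepted IUCN name and use that status."""
    entry = IUCN.get(name)
    if entry is None:
        return None
    accepted, _ = entry
    return IUCN.get(accepted, (accepted, None))[1]


def category_for_occurrence(matched_name: str) -> Optional[str]:
    """Bullet 2: if the occurrence matched a backbone synonym, pass in the accepted name."""
    return iucn_category(BACKBONE.get(matched_name, matched_name))


print(category_for_occurrence("Loxodonta cyclotis"))   # VU
print(category_for_occurrence("Goniastrea deformis"))  # None
```

With these tables, occurrences matched to Goniastrea deformis lose the category that the IUCN name carries, because the backbone redirects them to Paragoniastrea deformis; this is the kind of case raised further down the thread.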

mdoering added a commit to gbif/checklistbank that referenced this issue Feb 10, 2021
fmendezh added a commit to gbif/key-value-store that referenced this issue Feb 10, 2021
fmendezh added a commit that referenced this issue Feb 10, 2021
marcos-lg pushed a commit to gbif/key-value-store that referenced this issue Feb 11, 2021
* Adding the IUCN RedList Category to the name match cache
gbif/pipelines#257

* Checking possible null values in the NameUsageMatch response and in WS responses
muttcg pushed a commit that referenced this issue Feb 11, 2021
#257 Adding IUCN RedList Category to taxon record, hdfs record, and Elasticsearch record and schema
@timrobertson100
Member Author

Aside:

@andrewrodrigues verified with the IUCN that the name of this field internally, in downloads, in the API, and on the web should be IUCN Red List Category.

@MattBlissett
Member

Here's a real example:

Goniastrea deformis is a synonym in GBIF's backbone: https://www.gbif.org/species/2260144 (and has occurrences)

But in the IUCN Red List, it is an accepted name with Vulnerable status: https://www.gbif.org/species/176597529

@fmendezh
Contributor

I agree with @mdoering that the best approach is to use the nubKey to assess the threat status and not use the accepted name. If we follow this approach the JSON response can stay as it is with only the threat status name and code.
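Looking the category up by the occurrence's matched backbone (nub) key, without following the backbone's accepted name, keeps the response to just the category name and code. A minimal sketch, assuming an in-memory index from backbone key to IUCN code is built when the Red List checklist is matched to the backbone; `NUB_KEY_TO_CODE`, `CODE_TO_NAME` and `red_list_response` are hypothetical, and the keys are the backbone keys cited in this thread.

```python
import json
from typing import Dict, Optional

# Hypothetical index: backbone (nub) key -> IUCN category code. 2260144 is the
# backbone key for Goniastrea deformis cited in this thread; the mapping is illustrative.
NUB_KEY_TO_CODE: Dict[int, str] = {2260144: "VU"}

CODE_TO_NAME = {"CR": "CRITICALLY_ENDANGERED", "EN": "ENDANGERED", "VU": "VULNERABLE"}


def red_list_response(nub_key: int) -> Optional[str]:
    """Look up by the matched backbone key only; no accepted-name resolution."""
    code = NUB_KEY_TO_CODE.get(nub_key)
    if code is None:
        return None
    return json.dumps({"category": CODE_TO_NAME[code], "code": code})


print(red_list_response(2260144))  # {"category": "VULNERABLE", "code": "VU"}
print(red_list_response(7813935))  # None
```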

fmendezh added a commit to gbif/key-value-store that referenced this issue Feb 15, 2021
Using the nubKey to get the IUCN Red List category; the accepted name is not being used, to avoid misleading results after mixing multiple checklists
@mdoering
Member

If we follow this approach the JSON response can stay as it is with only the threat status name and code.

We could return the accepted name according to IUCN - that might be informative to users.

@andrewrodrigues

Thinking through what information you would want to retrieve from a search: if you searched a country for all globally Critically Endangered species, would you

  1. Retrieve all occurrences where the taxon assessed by IUCN matches the accepted name in GBIF?
  2. Retrieve occurrences where the taxon assessed by IUCN is considered a synonym by GBIF? In this case, the assessment associated to the GBIF accepted names should be flagged as assessed as the IUCN taxonomic concept.
  3. Retrieve occurrences where GBIF accepted names are considered synonyms of the taxon assessed by IUCN. In this case, the threat category for GBIF synonyms should be flagged as being assessed as the IUCN taxonomic concept.

Would this be the case with the approach above?

One problem I can see with this approach is where species have been lumped and there have been assessments of all the previous species but not of the new species. In this instance we might have several assessments associated to one accepted taxon in GBIF. Not sure how to get around that.
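The three cases, and the lumping problem, can be made concrete with a function that collects every assessment reachable from one GBIF accepted taxon and records how each was reached. This is a hypothetical sketch using plain dictionaries; `linked_assessments` and the names `Genus alpha`, `Genus beta` and `Genus lumped` are invented for illustration.

```python
from typing import Dict, List


def linked_assessments(gbif_accepted: str,
                       gbif_synonyms: List[str],
                       iucn_assessed: Dict[str, str],
                       iucn_synonym_of: Dict[str, str]) -> dict:
    """Collect IUCN assessments reachable from one GBIF accepted taxon.

    Each hit records how it was reached, so a non-exact hit can be flagged as
    assessed under the IUCN taxonomic concept.
    """
    hits = []
    if gbif_accepted in iucn_assessed:                    # case 1: names agree
        hits.append((gbif_accepted, iucn_assessed[gbif_accepted], "exact"))
    for synonym in gbif_synonyms:                         # case 2: IUCN taxon is a GBIF synonym
        if synonym in iucn_assessed:
            hits.append((synonym, iucn_assessed[synonym], "iucn_concept"))
    iucn_accepted = iucn_synonym_of.get(gbif_accepted)    # case 3: GBIF name is an IUCN synonym
    if iucn_accepted is not None and iucn_accepted in iucn_assessed:
        hits.append((iucn_accepted, iucn_assessed[iucn_accepted], "iucn_concept"))
    # Lumped species: several assessed IUCN taxa map to one GBIF taxon, so there
    # is no single category to assign without taxon-concept information.
    return {"hits": hits, "ambiguous": len({name for name, _, _ in hits}) > 1}


assessed = {"Goniastrea deformis": "VU", "Genus alpha": "EN", "Genus beta": "VU"}
print(linked_assessments("Paragoniastrea deformis", ["Goniastrea deformis"], assessed, {}))
# {'hits': [('Goniastrea deformis', 'VU', 'iucn_concept')], 'ambiguous': False}
print(linked_assessments("Genus lumped", ["Genus alpha", "Genus beta"], assessed, {})["ambiguous"])
# True
```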

@mdoering
Member

mdoering commented Feb 16, 2021

I would ignore the GBIF taxonomy and try to look up the GBIF name in IUCN and evaluate the Red List category according to IUCN's taxonomy, as this is what the assessment is based on. In order to show that the assessed IUCN name was a different one, we should return the accepted IUCN name from the API, as well as the exact scientific name spelling used in IUCN and possibly the taxonomic status & rank, as we are not only dealing with species.

/species/2435349/iucnRedList
{
  "category": "VULNERABLE",
  "code": "VU",
  "scientificName": "Loxodonta cyclotis Matschie, 1900",
  "usageKey": 176661685,
  "taxonomicStatus": "synonym",
  "acceptedName": "Loxodonta africana (Blumenbach, 1797)",
  "acceptedUsageKey": 176661683
}
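A sketch of how the proposed response could be assembled, assuming the imported IUCN checklist is available as a dictionary of usages keyed by usage key. `IUCN_USAGES`, `CATEGORY_NAMES` and `iucn_red_list` are hypothetical; the keys and names are copied from the example response above, and the `accepted` status of Loxodonta africana is assumed.

```python
from typing import Dict, Optional

# Hypothetical slice of the IUCN checklist, keyed by usage key.
IUCN_USAGES: Dict[int, dict] = {
    176661683: {"scientificName": "Loxodonta africana (Blumenbach, 1797)",
                "canonicalName": "Loxodonta africana", "taxonomicStatus": "accepted",
                "acceptedUsageKey": None, "code": "VU"},
    176661685: {"scientificName": "Loxodonta cyclotis Matschie, 1900",
                "canonicalName": "Loxodonta cyclotis", "taxonomicStatus": "synonym",
                "acceptedUsageKey": 176661683, "code": None},
}
CATEGORY_NAMES = {"VU": "VULNERABLE"}


def iucn_red_list(canonical_name: str) -> Optional[dict]:
    """Match a name within the IUCN taxonomy and report the category of the IUCN accepted taxon."""
    match = next((key for key, usage in IUCN_USAGES.items()
                  if usage["canonicalName"] == canonical_name), None)
    if match is None:
        return None
    usage = IUCN_USAGES[match]
    accepted_key = usage["acceptedUsageKey"] or match
    accepted = IUCN_USAGES[accepted_key]
    response = {
        "category": CATEGORY_NAMES.get(accepted["code"]),
        "code": accepted["code"],
        "scientificName": usage["scientificName"],
        "usageKey": match,
        "taxonomicStatus": usage["taxonomicStatus"],
    }
    # Only report the accepted IUCN name when the matched name is not itself accepted.
    if accepted_key != match:
        response["acceptedName"] = accepted["scientificName"]
        response["acceptedUsageKey"] = accepted_key
    return response


print(iucn_red_list("Loxodonta cyclotis")["acceptedUsageKey"])  # 176661683
```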

For anything more accurate we would need a better handle on taxon concepts.

By including the Red List in the backbone sources, we also make sure that all names in IUCN are included in the backbone, so we can do the name matching from occurrences in all cases. Right now there is only 91% overlap, i.e. there are some IUCN names that we miss.
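The overlap figure is the share of IUCN names that also exist in the backbone; adding IUCN as a backbone source brings it to 100% by construction. A minimal sketch with toy name sets; `iucn_overlap` and the entry `Hypothetical name` are invented for illustration.

```python
def iucn_overlap(iucn_names: set, backbone_names: set) -> float:
    """Fraction of IUCN names that also exist in the backbone (and so can be matched)."""
    if not iucn_names:
        return 0.0
    return len(iucn_names & backbone_names) / len(iucn_names)


iucn = {"Goniastrea deformis", "Loxodonta africana", "Loxodonta cyclotis", "Hypothetical name"}
backbone = {"Goniastrea deformis", "Loxodonta africana", "Loxodonta cyclotis"}
print(iucn_overlap(iucn, backbone))         # 0.75
print(sorted(iucn - backbone))              # ['Hypothetical name'], unmatchable today
print(iucn_overlap(iucn, backbone | iucn))  # 1.0 once IUCN is a backbone source
```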

@andrewrodrigues

This sounds like a good approach, and I am assuming that with the Red List in the backbone it would then also be able to pick up examples like this?
Paragoniastrea deformis is an accepted name in GBIF's backbone with no matching name in IUCN: https://www.gbif.org/species/7813935 (and has occurrences). However, there is an assessment for its synonym Goniastrea deformis, assessed as Vulnerable: https://www.gbif.org/species/176597529

@timrobertson100
Member Author

timrobertson100 commented Feb 16, 2021

If we followed this approach, then in this case all 14 records are provided with a scientific name of G. deformis, so yes, they would pick up Vulnerable. If they had been supplied with P. deformis as the scientific name, then no, they would not get a category.

The nice thing about ignoring the GBIF Backbone would be that it is a defensible approach, but may in some cases miss things that arguably could be inferred.

@andrewrodrigues

Thanks Tim for explaining. This definitely seems like the best approach.

djtfmartin pushed a commit that referenced this issue Mar 1, 2021
@andrewrodrigues

Not sure why the filter was taken off the UAT environment. Can this filter be introduced again with a view to making it available outside of the UAT environment?

@MortenHofft
Member

It was removed from the UAT website because UAT is used for many other tests than IUCN. Do you need it to be public or would a private test environment be sufficient?

@andrewrodrigues

It would be good to make the filter public.

@MattBlissett
Member

Note also the related issue here: #495.

There's also a more recent version of the IUCN Red List. I will update our imported version.

@mdoering
Member

Is it possible to automate the redlist dwca generation and build it regularly?

@andrewrodrigues

There are limits on the size of the calls we can make to the API, which may limit the automation of Red List updates; I will leave it to Matt for more details. Regarding the periodicity of updates, there is no fixed schedule: the list is updated on a continual basis. I would suggest updating every 6 months, unless of course there are large updates that we know are coming up, such as for the Congress, where we might want additional updates.

@MattBlissett
Member

We weren't able to use the API, as the bulk download we need to use isn't available that way, and downloading the whole list through the API isn't practical.

I think they mark a release every 6 months or so as a new version. The current version is 2021-2, which is shown in the API: https://apiv3.iucnredlist.org/api/v3/version. I can set up a monitor to detect when that number increases and prompt someone (me) to download the new data.
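A sketch of the proposed monitor, assuming the version endpoint returns a JSON body of the form `{"version": "2021-2"}` and that versions always follow the `YEAR-N` pattern; `parse_version` and `new_release_available` are hypothetical, and the HTTP request and the notification are left out.

```python
import json
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Turn '2021-2' into (2021, 2) so releases compare numerically."""
    year, release = version.split("-")
    return int(year), int(release)


def new_release_available(version_body: str, last_imported: str) -> bool:
    """True if the version reported by the API is newer than the last imported one."""
    current = json.loads(version_body)["version"]
    return parse_version(current) > parse_version(last_imported)


print(new_release_available('{"version": "2021-3"}', "2021-2"))  # True
print(new_release_available('{"version": "2021-2"}', "2021-2"))  # False
```

Comparing parsed tuples rather than raw strings avoids treating a hypothetical release "2021-10" as older than "2021-9".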

@mdoering
Member

How do you download the data, can that be scripted? For ITIS or NCBI we also check for newly available db dumps and then process them into a DwCA. Maybe we can integrate IUCN also into https://github.com/mdoering/checklist_builder?

@MattBlissett
Member

It is already there: https://github.com/mdoering/checklist_builder/blob/master/src/main/java/de/doering/dwca/iucn/ArchiveBuilder.java#L82-L104 (and I just requested new downloads).

Requesting a download from the IUCN site requires several steps in the browser, agreeing to various terms and conditions etc.

@MattBlissett
Member

I've added the 2021-2 version to https://hosted-datasets.gbif.org/datasets/iucn/ and updated the endpoint in the registry.

We'll need to reinterpret all occurrences before we launch the feature, but it probably doesn't make much difference while we have the other issues around synonymy and data security.

@MattBlissett
Member

The other pending issue is #496, the interpretation of aff. species.

@andrewrodrigues

andrewrodrigues commented Dec 10, 2021

The Red List has just been updated and we will have to update the categories via the IUCN API. The IUCN filter should apply an NE (Not Evaluated) category to species that have not been assessed by IUCN. IUCN does not apply this category systematically, so it is not available through their API; we will need to apply it ourselves to all species with no IUCN assessment. @MattBlissett
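Since the API cannot supply NE, it has to be derived: every species without an assessment gets NE. A minimal sketch, assuming a mapping from species key to IUCN code for the assessed species; `with_not_evaluated` and the keys are hypothetical. NE differs from DD (Data Deficient), which is an actual assessment outcome and passes through unchanged.

```python
from typing import Dict, Iterable


def with_not_evaluated(species_keys: Iterable[int], assessed: Dict[int, str]) -> Dict[int, str]:
    """Map each species to its IUCN code, defaulting to NE (Not Evaluated) if unassessed."""
    return {key: assessed.get(key, "NE") for key in species_keys}


print(with_not_evaluated([1, 2, 3], {1: "VU", 3: "DD"}))  # {1: 'VU', 2: 'NE', 3: 'DD'}
```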

@andrewrodrigues

andrewrodrigues commented Dec 10, 2021

The categories within the filter should be ordered and capitalised as per the following diagram https://www.iucnredlist.org/about/faqs#What%20are%20the%20Red%20List%20Categories%20and%20Criteria
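One way to keep the filter in that order is to declare the categories once, in order, and sort by position. A sketch assuming the conventional ordering of the IUCN Red List Categories and Criteria (version 3.1), EX through LC followed by DD and NE; `RedListCategory`, `POSITION` and `sort_codes` are hypothetical names.

```python
from enum import Enum
from typing import List


class RedListCategory(Enum):
    """IUCN Red List categories, declared in display order."""
    EXTINCT = "EX"
    EXTINCT_IN_THE_WILD = "EW"
    CRITICALLY_ENDANGERED = "CR"
    ENDANGERED = "EN"
    VULNERABLE = "VU"
    NEAR_THREATENED = "NT"
    LEAST_CONCERN = "LC"
    DATA_DEFICIENT = "DD"
    NOT_EVALUATED = "NE"


# Enum iteration follows declaration order, so positions give the display order.
POSITION = {category.value: i for i, category in enumerate(RedListCategory)}


def sort_codes(codes: List[str]) -> List[str]:
    """Sort category codes into display order."""
    return sorted(codes, key=POSITION.__getitem__)


print(sort_codes(["LC", "NE", "CR", "VU"]))  # ['CR', 'VU', 'LC', 'NE']
```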

@MattBlissett
Member

I've moved the NE change to a new issue, so we can close this one.

The ordering is already as on that diagram, and the capitalization is consistent with the other filters (GBIF style), but that's up to @kcopas / @dnoesgaard anyway.

@kcopas
Member

kcopas commented Jan 13, 2022

I'm looking for those categories' strings in CrowdIn without success. Can someone direct my attention to their whereabouts?

@MattBlissett
Member

https://crowdin.com/translate/gbif-portal-en/787/endk-en?filter=basic&value=3 I think.

@kcopas
Member

kcopas commented Jan 13, 2022

wow, just couldn't see the tree for the forest—thanks!


7 participants