Track dataset DOIs and their references #100

Open
MathewBiddle opened this issue Jan 10, 2025 · 7 comments

Comments

@MathewBiddle
Contributor

What should we add?

@7yl4r started some great work trying to track DOI citation counts. To get a sense of how IOOS data are used and referenced, it would be interesting to see if this process works for dataset DOIs.

Through OBIS (https://obis.org/institute/23070) we can get dataset DOIs from GBIF in an automated way. That would probably be a good start.
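As a rough sketch of the automated route, assuming the GBIF dataset key is already known (for example from the OBIS institute listing), the DOI can be read from the dataset's GBIF registry record. The endpoint URL and the top-level `doi` field are assumptions about GBIF's public registry API, not something confirmed in this thread, and `doi_from_dataset_record` / `fetch_dataset_doi` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Optional

# Assumed GBIF registry endpoint for a single dataset record.
GBIF_DATASET_URL = "https://api.gbif.org/v1/dataset/{key}"


def doi_from_dataset_record(record: dict) -> Optional[str]:
    """Return the DOI stored in a GBIF dataset record, or None if absent.

    Assumes the record exposes the DOI under a top-level "doi" key.
    """
    return record.get("doi")


def fetch_dataset_doi(dataset_key: str, timeout: float = 30.0) -> Optional[str]:
    """Download one GBIF dataset record and return its DOI (network call)."""
    url = GBIF_DATASET_URL.format(key=dataset_key)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        record = json.load(resp)
    return doi_from_dataset_record(record)
```

For example, `fetch_dataset_doi("f4b56e69-4ff5-4cd2-bc48-ec273232d9e0")` would look up the GBIF dataset whose literature page is linked later in this thread, provided the endpoint behaves as assumed.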

Other dataset DOIs would probably need to come from a manually curated list from NCEI, as I don't think the Regional Associations (RAs) are minting dataset DOIs.
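A hand-curated NCEI list would eventually need to be merged with the automated GBIF list. A minimal sketch, assuming both lists are plain strings that may include resolver URLs or a `doi:` label; `normalize_doi` and `merge_doi_lists` are hypothetical names. Lowercasing is safe for deduplication because DOI names are case-insensitive.

```python
import re
from typing import Iterable, List

# Resolver prefixes that sometimes appear in hand-curated DOI lists.
_RESOLVER = re.compile(r"^(?:https?://)?(?:dx\.)?doi\.org/", re.IGNORECASE)


def normalize_doi(doi: str) -> str:
    """Strip whitespace, a resolver URL and a 'doi:' label, then lowercase."""
    doi = _RESOLVER.sub("", doi.strip())
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    return doi.strip().lower()


def merge_doi_lists(*sources: Iterable[str]) -> List[str]:
    """Merge several DOI lists into one sorted list without duplicates."""
    merged = {normalize_doi(d) for source in sources for d in source if d.strip()}
    return sorted(merged)
```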

Reference

@MathewBiddle
Contributor Author

```python
from ioos_metrics import ioos_metrics

#| code-summary: fetch the latest mbon data
stats_df = ioos_metrics.mbon_stats()

stats_df['doi'].unique().tolist()
```
['10.15468/fxjpbr',
 '10.15468/rvlbkl',
 '10.15468/sqpu7z',
 '10.15468/hczoxz',
 '10.15468/nvvifc',
 '10.15468/yg2y7v',
 '10.15468/mrxge2',
 '10.15468/3ifpo0',
 '10.6073/pasta/d302929b97723a1425364e1a19efbf55',
 '10.15468/buqg4u',
 '10.15468/ykbf7p',
 '10.15468/yvgzjt',
 '10.15468/rdkfyf',
 '10.15468/gaekez',
 '10.15468/nuqkih',
 '10.15468/uzpt9m',
 '10.15468/g8q8ey',
 '10.15468/dwxlan',
 '10.15468/quejlc',
 '10.15468/419say',
 '10.15468/zq1ep2',
 '10.15468/es1iso',
 '10.15468/kfnaep',
 '10.15468/1pyhh5',
 '10.15468/dple14',
 '10.15468/pcikkj',
 '10.15468/rexjmu',
 '10.15468/jlkkrw',
 '10.15468/t0r3vt',
 '10.15468/dfyb57',
 '10.15468/ranfdh',
 '10.15468/6chrsz',
 '10.15468/7dnpl0',
 '10.15468/84ntea',
 '10.15468/oomxex',
 '10.15468/7zofww',
 '10.15468/vnvtmr',
 '10.15468/06aqle',
 '10.15468/adis7b',
 '10.15468/tnn5ra',
 '10.25494/p6js3m']
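Most of the DOIs above share the `10.15468` prefix, which is GBIF's, while a couple come from other registrants (the `10.6073/pasta/...` DOI looks like an EDI/PASTA record). A quick sketch for getting that breakdown, assuming `stats_df` from the snippet above; `doi_prefix_counts` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable


def doi_prefix_counts(dois: Iterable[str]) -> Counter:
    """Count DOIs by registrant prefix (the part before the first '/')."""
    return Counter(doi.split("/", 1)[0] for doi in dois)
```

Usage: `doi_prefix_counts(stats_df['doi'].dropna().unique())`.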

@ocefpaf
Member

ocefpaf commented Jan 10, 2025

> @7yl4r started some great work trying to track DOI citation counts.

Tylar, what are you using to do this? Any existing library or something you created?

@7yl4r
Contributor

7yl4r commented Jan 10, 2025

> > @7yl4r started some great work trying to track DOI citation counts.
>
> Tylar, what are you using to do this? Any existing library or something you created?

OpenCitations is doing all the fancy stuff [stack overflow overview].

All I've done is parse the JSON response [R code here].
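For anyone who would rather stay in Python than R, here is a sketch of the same idea: ask OpenCitations for a DOI's citation count and parse the JSON response. The endpoint URL, the `authorization` header for the access token, and the `[{"count": "..."}]` response shape are assumptions about the current OpenCitations Index API, not taken from the R code linked above; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

# Assumed OpenCitations Index endpoint; check the current API docs before use.
CITATION_COUNT_URL = "https://api.opencitations.net/index/v2/citation-count/doi:{doi}"


def build_request(doi: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build the citation-count request, adding the access token if given."""
    headers = {"Accept": "application/json"}
    if token:
        # Assumed header name for the OpenCitations access token.
        headers["authorization"] = token
    return urllib.request.Request(CITATION_COUNT_URL.format(doi=doi), headers=headers)


def parse_citation_count(payload: str) -> int:
    """Parse a response assumed to look like '[{"count": "12"}]'."""
    records = json.loads(payload)
    if not records:
        return 0
    return int(records[0]["count"])


def citation_count(doi: str, token: Optional[str] = None) -> int:
    """Fetch and parse the citation count for one DOI (network call)."""
    with urllib.request.urlopen(build_request(doi, token), timeout=30) as resp:
        return parse_citation_count(resp.read().decode("utf-8"))
```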

@ocefpaf
Member

ocefpaf commented Jan 24, 2025

I'm not able to create an access token. Looks like they believe I'm a bot :-/
Are you using one, @7yl4r? Did it work for you?

@ocefpaf
Member

ocefpaf commented Jan 24, 2025

Got one; I had to change my ISP. A quick test doesn't show interesting results, though: none of the DOIs in that list worked for me:

https://gist.github.com/ocefpaf/b46f2edf8230cb698652ef9586616d4d
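A test like this boils down to looping over the DOI list and separating DOIs that return citations from those that don't. A minimal sketch, assuming some lookup function that takes a DOI and returns an integer count (for example, a wrapper around the OpenCitations citation-count endpoint); the lookup is passed in so the loop can be tried without network access. `summarize_counts` is a hypothetical name, and this is not necessarily how the gist does it.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def summarize_counts(
    dois: Iterable[str], fetch: Callable[[str], int]
) -> Tuple[Dict[str, int], List[str]]:
    """Look up each DOI with `fetch` and split hits from misses.

    A DOI counts as a miss when the lookup raises or returns zero.
    """
    hits: Dict[str, int] = {}
    misses: List[str] = []
    for doi in dois:
        try:
            count = fetch(doi)
        except Exception:  # network/HTTP/parse errors are treated as misses
            count = 0
        if count > 0:
            hits[doi] = count
        else:
            misses.append(doi)
    return hits, misses
```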

@MathewBiddle
Contributor Author

Interesting. When I go to one of the dataset DOI landing pages (e.g. https://dx.doi.org/10.15468/fxjpbr), you can see there are 42 citations (https://www.gbif.org/resource/search?contentType=literature&gbifDatasetKey=f4b56e69-4ff5-4cd2-bc48-ec273232d9e0). You can download them as a TSV here: https://api.gbif.org/v1/literature/export?format=TSV&gbifDatasetKey=f4b56e69-4ff5-4cd2-bc48-ec273232d9e0
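To compare that number with other sources in a script, the TSV export linked above can be downloaded and its rows counted. A sketch, assuming the export has one header row followed by one row per citing record; `count_tsv_records` and `gbif_literature_count` are hypothetical helpers.

```python
import csv
import io
import urllib.request

# Export URL pattern from the comment above; the key selects the GBIF dataset.
EXPORT_URL = (
    "https://api.gbif.org/v1/literature/export"
    "?format=TSV&gbifDatasetKey={key}"
)


def count_tsv_records(tsv_text: str) -> int:
    """Count data rows in a TSV export, treating the first line as a header.

    csv handles quoted fields; if the export leaves embedded tabs or
    newlines unquoted, the count may be off.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return sum(1 for _ in reader)


def gbif_literature_count(dataset_key: str) -> int:
    """Download the literature export for one dataset and count its rows."""
    url = EXPORT_URL.format(key=dataset_key)
    with urllib.request.urlopen(url, timeout=60) as resp:
        return count_tsv_records(resp.read().decode("utf-8"))
```

The result of `gbif_literature_count("f4b56e69-4ff5-4cd2-bc48-ec273232d9e0")` can then be checked against the 42 citations shown on the GBIF page.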

Looking at the papers citing that dataset, most of them cite an entire GBIF download, not the individual dataset. For example, this paper cites: GBIF. 2022. Global Biodiversity Information occurrence. Download from 1 March 2022. doi: [10.15468/dl.nqbg5v](https://doi.org/10.15468/dl.nqbg5v).
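One way to make that distinction explicit when scanning citing papers is the DOI pattern: the download DOI above uses `10.15468/dl.`, while the dataset DOIs in the MBON list use `10.15468/` without the `dl.` part. A sketch that relies on that naming pattern as an assumption (it is inferred from the DOIs in this thread, not from GBIF documentation); the function names are hypothetical.

```python
from typing import Dict, Iterable, List

GBIF_PREFIX = "10.15468/"
# Assumed pattern for GBIF occurrence-download DOIs, based on 10.15468/dl.nqbg5v.
DOWNLOAD_PREFIX = "10.15468/dl."


def classify_gbif_doi(doi: str) -> str:
    """Label a DOI as a GBIF download, a GBIF dataset, or something else."""
    doi = doi.strip().lower()
    if doi.startswith(DOWNLOAD_PREFIX):
        return "gbif_download"
    if doi.startswith(GBIF_PREFIX):
        return "gbif_dataset"
    return "other"


def group_by_kind(dois: Iterable[str]) -> Dict[str, List[str]]:
    """Group DOIs by the label returned from classify_gbif_doi."""
    groups: Dict[str, List[str]] = {}
    for doi in dois:
        groups.setdefault(classify_gbif_doi(doi), []).append(doi)
    return groups
```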

So, this makes me think that the citations list at GBIF is manually curated. Interesting. I've learned something and, as expected, things are more difficult than anticipated.

@sformel-usgs do you have any insight into how the number of citations is computed at GBIF? It looks like it might be manually curated?

@ocefpaf
Member

ocefpaf commented Jan 24, 2025

Just for fun, I tried Google Scholar via the scholarly library:

https://gist.github.com/ocefpaf/f6a65876874f6ee3065a9f18dda49719

Only two DOIs were found, and both show lower citation counts than their GBIF landing pages.
