This repository tracks efforts to map Neotoma dataset records onto the DarwinCore schema, with the goal of improving the discovery, reuse, and sustainability of records archived within the Neotoma Paleoecological Database. The project is part of the EarthCube Integrative Activities proposal between Neotoma and the Paleobiology Database, and is one step toward uploading Neotoma records to BISON and GBIF.
Initial work on this project was made possible through collaboration at the Cyber4Paleo Community Development Workshop in Boulder, CO, in July 2016. Much of this work is archived as part of the Cyber4Paleo GitHub organization and GitHub pages.
This work is carried out by the Earthlife Consortium, funded by NSF through the EarthCube initiative.
We welcome contributions from any individual, whether code, documentation, or issue tracking. All participants are expected to follow the code of conduct for this project.
- Simon Goring - Assistant Scientist, University of Wisconsin
- Jack Williams - Professor, University of Wisconsin
- Mark Uhen - George Mason University
- Michael McClennan - University of Wisconsin - Madison
- John Wieczorek - Information Architect, University of California, Berkeley
Mapping the Neotoma database structure onto the DarwinCore standard is relatively complex. While some of the data structure maps easily, the content of the database and the conceptual structure of paleoecological records are not consistently equivalent to the semantic structure of the DarwinCore schema. The Rmd describes some simple relationships in the markdown portion of the document, based on a cross-walk started by Michael McClennan and extended by Jack Williams and Mark Uhen at the Cyber4Paleo Community Development Workshop. Simon Goring developed the Rmd and implemented the actual conversion of the database structure to the csv file output.
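As a rough illustration of the kind of cross-walk described above, the sketch below maps a few site and taxon fields onto DarwinCore terms and writes the result to a csv file. It is not the code in the Rmd: the input column names, the sample rows, the `to_darwincore()` helper, the use of a bounding-box midpoint for coordinates, and the choice of `FossilSpecimen` for `basisOfRecord` are all assumptions made for the example.

```r
# Hypothetical sample rows; the column names are illustrative and may not
# match the tables in the Neotoma snapshot.
neotoma_rows <- data.frame(
  SiteName      = c("Example Lake", "Sample Bog"),
  LatitudeNorth = c(45.2, 50.0),
  LatitudeSouth = c(45.0, 50.0),
  LongitudeEast = c(-89.0, -100.5),
  LongitudeWest = c(-89.4, -100.5),
  TaxonName     = c("Picea", "Pinus"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rename fields to DarwinCore terms and collapse the
# site bounding box to a single point (an assumption, not the Rmd's rule).
to_darwincore <- function(x) {
  data.frame(
    locality         = x$SiteName,
    decimalLatitude  = (x$LatitudeNorth + x$LatitudeSouth) / 2,
    decimalLongitude = (x$LongitudeEast + x$LongitudeWest) / 2,
    scientificName   = x$TaxonName,
    basisOfRecord    = "FossilSpecimen",  # assumed value for this sketch
    stringsAsFactors = FALSE
  )
}

dwc <- to_darwincore(neotoma_rows)

# Write the mapped records to a csv file in a temporary directory.
out_file <- file.path(tempdir(), "dwc_example.csv")
write.csv(dwc, out_file, row.names = FALSE)
```

Fields that map one-to-one, such as a site name, are the easy part. The conceptual mismatches mentioned above, such as how to represent a site's spatial extent, are where a choice like the midpoint rule would need to be agreed on.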
The database itself is available as a SQL Server snapshot from the Neotoma Paleoecological Database's website, or on figshare as part of the Neotoma Database Snapshot project.
With the snapshot loaded into your local server, replace the connection string in functionalized_run.R (around line 27) and the code should "just run", provided you have the required packages: RODBC, neotoma, dplyr, and tidyr.
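Before running the script, it may help to confirm that the four packages are available. The following is a minimal sketch using base R only; the variable names are our own, and the connection string shown in the comment is a placeholder rather than the one used in functionalized_run.R.

```r
# Packages named above as requirements for the conversion script.
required <- c("RODBC", "neotoma", "dplyr", "tidyr")

# requireNamespace() returns TRUE/FALSE without attaching the package.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}

# With RODBC installed, a connection is typically opened with something like
# the line below; the connection string itself is a placeholder and must be
# replaced with the details of your local server.
# con <- RODBC::odbcDriverConnect(connection = "<your connection string>")
```

The check only reports what is missing and does not install anything, since some of these packages may need to be installed from a source other than CRAN.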
- If there are missing or poorly coded fields, feel free to provide suggestions.
- If there are efficiencies in coding, feel free to provide them.
- If you feel documentation is incomplete, feel free to suggest improvements.
- Ideally, we would like to improve the Rmd so that it is, in some sense, publishable as a data/methods paper. We welcome contributions that would assist in this effort. If you feel you could contribute significantly enough to be considered an author, please contact us first.
This work is supported by the National Science Foundation's EarthCube Initiative through NSF Award Numbers 1541002 and 1340301.