Commit 4ca8320

Add datapackage.json conversion as new project idea
1 parent 29416ae

2016/ideas-list-ecodata-retriever.md

Lines changed: 61 additions & 1 deletion
`@@ -127,4 +127,64 @@ GitHub.`

* @ethanwhite
* @henrysenyondo

Removed line: `## Appendix`

## Upgrade to the datapackage.json standard

**Please ask questions [here](https://github.com/numfocus/gsoc/issues).**

### Rationale

The [EcoData Retriever](http://ecodataretriever.org) is a Python-based tool for
automatically downloading, cleaning up, and restructuring ecological data. It
does the hard work of data munging so that scientists can focus on doing
science.

One way the Retriever makes it easy to add new datasets is by letting each
dataset be described in a simple plain-text file that is then parsed in Python.
There is an emerging standard for this kind of metadata called
[datapackage.json](http://dataprotocols.org/data-packages/#descriptor-datapackagejson).
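
To make the target format concrete, here is a minimal sketch of a
datapackage.json descriptor for a single CSV table, built and round-tripped with
Python's standard `json` module. The package name, file name, and columns are
hypothetical placeholders. The overall shape (top-level `name` and `resources`,
with each resource's columns listed under `schema.fields`) follows the Data
Package and JSON Table Schema specifications as we understand them; the linked
spec remains the authority on which keys are allowed.

```python
import json

# Hypothetical descriptor for a single-table ecological dataset.
# The names and columns are placeholders; only the shape is meant to be illustrative.
descriptor = {
    "name": "example-bird-counts",
    "title": "Example bird counts",
    "resources": [
        {
            "name": "counts",
            "path": "counts.csv",  # hypothetical data file
            "format": "csv",
            "schema": {
                "fields": [
                    {"name": "site_id", "type": "integer"},
                    {"name": "species", "type": "string"},
                    {"name": "count", "type": "integer"},
                ]
            },
        }
    ],
}

# Serialize to the text that would be stored in datapackage.json ...
text = json.dumps(descriptor, indent=2)

# ... and parse it back, as a JSON-based Retriever parser would.
parsed = json.loads(text)
for resource in parsed["resources"]:
    columns = [field["name"] for field in resource["schema"]["fields"]]
    print(resource["name"], columns)  # prints: counts ['site_id', 'species', 'count']
```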

### Approach

This project would upgrade the EcoData Retriever to use the datapackage.json
standard. Specifically, this would involve:

* Understanding the metadata standard
* Extending the standard to include metadata that the EcoData Retriever
  requires but that is not part of the standard
* Converting the Python parser to take JSON as input instead of the current
  YAML-like scripts
* Converting existing scripts to the JSON standard
* (Stretch goal) Developing a simple web or command-line tool for generating
  JSON for new datasets
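
Converting the parser could be fairly contained. The sketch below assumes the
Retriever only needs each table's name, data location, and column list from the
descriptor; it reads the JSON with `json.load` and returns one record per
resource. `TableSpec` and `load_descriptor` are hypothetical names chosen for
illustration, not existing Retriever code, and the validation is deliberately
minimal.

```python
import io
import json
from dataclasses import dataclass, field
from typing import List, TextIO, Tuple


@dataclass
class TableSpec:
    """Hypothetical internal record for one table to download and load."""
    name: str
    source: str  # URL or local path of the raw data
    columns: List[Tuple[str, str]] = field(default_factory=list)  # (name, type)


def load_descriptor(fp: TextIO) -> List[TableSpec]:
    """Parse a datapackage.json-style descriptor into TableSpec records."""
    descriptor = json.load(fp)
    if "resources" not in descriptor:
        raise ValueError("descriptor has no 'resources' list")
    tables = []
    for resource in descriptor["resources"]:
        # Resources may point at their data via 'url' or 'path';
        # inline 'data' is not handled in this sketch.
        source = resource.get("url") or resource.get("path")
        if source is None:
            raise ValueError(f"resource {resource.get('name')!r} has no url or path")
        fields = resource.get("schema", {}).get("fields", [])
        # Fall back to "string" when a field declares no type.
        columns = [(f["name"], f.get("type", "string")) for f in fields]
        tables.append(TableSpec(resource.get("name", "unnamed"), source, columns))
    return tables


# Usage with an in-memory descriptor (hypothetical URL).
example = io.StringIO(
    '{"name": "demo", "resources": [{"name": "counts",'
    ' "url": "http://example.com/counts.csv",'
    ' "schema": {"fields": [{"name": "count", "type": "integer"}]}}]}'
)
tables = load_descriptor(example)
print(tables[0])
```

Keeping the parser's output as a plain internal record would mean the
downstream download-and-load code need not care whether a dataset was described
in the old script format or in JSON.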

### Challenges

The EcoData Retriever scripts contain information on how to clean and modify
data that is not currently part of the datapackage.json standard. Including
this information in a form that is easy for both humans and computers to read
will require careful design work.
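
One possible design, sketched here purely as an assumption, is to leave the
standard fields untouched and put Retriever-specific cleaning instructions
under a single namespaced key on each resource. The `retriever` key and its
`header_rows` and `missing_values` entries are hypothetical names, not part of
the standard or of the current Retriever; `clean_rows` shows how a loader might
apply them to raw CSV text.

```python
import csv
import io
from typing import Dict, List, Optional

# Hypothetical resource entry: standard keys plus a namespaced "retriever" block
# holding cleaning hints that the datapackage.json standard does not cover.
counts_resource = {
    "name": "counts",
    "path": "counts.csv",
    "schema": {"fields": [{"name": "site_id"}, {"name": "count"}]},
    "retriever": {
        "header_rows": 2,                  # hypothetical: lines to skip before the data
        "missing_values": ["-999", "NA"],  # hypothetical: codes meaning "no data"
    },
}


def clean_rows(raw_csv: str, resource: Dict) -> List[List[Optional[str]]]:
    """Apply the hypothetical Retriever cleaning hints to raw CSV text."""
    hints = resource.get("retriever", {})
    skip = hints.get("header_rows", 0)
    missing = set(hints.get("missing_values", []))
    rows = list(csv.reader(io.StringIO(raw_csv)))[skip:]
    # Replace every missing-value code with None so it can be loaded as NULL.
    return [[None if value in missing else value for value in row] for row in rows]


raw = "Bird survey 2015\nsite_id,count\n1,12\n2,-999\n3,NA\n"
cleaned = clean_rows(raw, counts_resource)
print(cleaned)  # [['1', '12'], ['2', None], ['3', None]]
```

Namespacing the extra keys is intended to let generic datapackage.json tools
skip what they do not recognize; whether that is the right trade-off against
readability is exactly the design question this project would need to settle.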

### Involved toolkits or projects

* The [EcoData Retriever](http://ecodataretriever.org)
* Python
* JSON
* [datapackage.json](http://dataprotocols.org/data-packages/#descriptor-datapackagejson)

### Degree of difficulty and needed skills

* Moderate difficulty
* Knowledge of Python
* Knowledge of JSON

### Involved developer communities

The EcoData Retriever community primarily interacts via issues and pull
requests on GitHub.

### Mentors

* @ethanwhite
* @henrysenyondo
