@@ -127,4 +127,64 @@ GitHub.
* @ethanwhite
* @henrysenyondo

- ## Appendix
+
+ ## Upgrade to datapackage.json standard
+
+ **Please ask questions [here](https://github.com/numfocus/gsoc/issues).**
+
+ ### Rationale
+
+ The [EcoData Retriever](http://ecodataretriever.org) is a Python-based tool for
+ automatically downloading, cleaning up, and restructuring ecological data. It
+ does the hard work of data munging so that scientists can focus on doing
+ science.
+
+ One of the ways the Retriever makes it easy to add new datasets is by allowing
+ datasets to be added using simple plain text descriptions of the data that are
+ then parsed into Python. There is an emerging standard for this kind of metadata
+ called
+ [datapackage.json](http://dataprotocols.org/data-packages/#descriptor-datapackagejson).
+
+ ### Approach
+
+ This project would upgrade the EcoData Retriever to use the datapackage.json
+ standard. Specifically, this would involve:
+
+ * Understanding the metadata standard
+ * Extending the standard to include metadata that the EcoData Retriever
+ requires but that is not part of the standard
+ * Converting the Python parser to take JSON as input instead of the current
+ YAML-like scripts
+ * Converting existing scripts to the JSON standard
+ * (stretch goal) Developing a simple web or command line tool for generating
+ JSON for new datasets
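
The parser-conversion step above can be sketched roughly as follows. This is a minimal illustration, not the Retriever's actual parser: the dataset name, URL, and field names are invented, and the `load_descriptor` and `table_columns` helpers are hypothetical. Only the top-level `name` and `resources` keys and the per-resource `schema.fields` list are taken from the Data Package descriptor layout.

```python
import json

# Hypothetical descriptor text; the dataset, URL, and fields are made up.
DESCRIPTOR_TEXT = """
{
  "name": "example-mammals",
  "title": "Example mammal survey",
  "resources": [
    {
      "name": "surveys",
      "url": "http://example.com/surveys.csv",
      "schema": {
        "fields": [
          {"name": "species", "type": "string"},
          {"name": "mass_g", "type": "number"}
        ]
      }
    }
  ]
}
"""


def load_descriptor(text):
    """Parse descriptor JSON and check the keys this sketch relies on."""
    descriptor = json.loads(text)
    for key in ("name", "resources"):
        if key not in descriptor:
            raise ValueError("descriptor is missing required key: " + key)
    return descriptor


def table_columns(descriptor):
    """Map each resource name to its list of (column name, type) pairs."""
    columns = {}
    for resource in descriptor["resources"]:
        fields = resource.get("schema", {}).get("fields", [])
        columns[resource["name"]] = [
            (field["name"], field.get("type", "string")) for field in fields
        ]
    return columns


descriptor = load_descriptor(DESCRIPTOR_TEXT)
columns = table_columns(descriptor)
print(columns)
```

Because JSON parsing comes from the standard library, most of the remaining work in such a conversion would likely be mapping descriptor keys onto whatever the Retriever's existing script objects expect.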
+
+ ### Challenges
+
+ The EcoData Retriever scripts contain information on how to clean and modify
+ data, and this information is not currently part of the datapackage.json
+ standard. Representing it in a way that is easy for both humans and computers
+ to read will require careful design work.
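
One possible shape for such an extension, sketched under loud assumptions: the cleaning instructions live under a separate `retriever` key inside each resource, so tools that only understand the base standard can ignore them. The `retriever` key, its `nulls` and `replace` entries, and the `clean_row` helper are all hypothetical; none of them is part of datapackage.json or of the existing Retriever code.

```python
# Hypothetical resource entry: the "retriever" key and its contents are an
# invented extension, not part of the datapackage.json standard.
RESOURCE = {
    "name": "surveys",
    "schema": {"fields": [{"name": "species"}, {"name": "mass_g"}]},
    "retriever": {
        # Values that should be treated as missing data.
        "nulls": ["-999", "NA"],
        # Per-column substring replacements, applied before the null check.
        "replace": {"species": {"?": ""}},
    },
}


def clean_row(row, resource):
    """Apply the hypothetical cleaning rules to one row (a dict of strings)."""
    rules = resource.get("retriever", {})
    nulls = set(rules.get("nulls", []))
    replacements = rules.get("replace", {})
    cleaned = {}
    for column, value in row.items():
        for old, new in replacements.get(column, {}).items():
            value = value.replace(old, new)
        cleaned[column] = None if value in nulls else value
    return cleaned


raw = {"species": "Dipodomys merriami?", "mass_g": "-999"}
cleaned = clean_row(raw, RESOURCE)
print(cleaned)
```

Keeping the rules as plain data rather than embedded code is one way to stay readable for humans while remaining trivially machine-parseable, though the real design may need richer operations than this sketch covers.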
+
+ ### Involved toolkits or projects
+
+ * The [EcoData Retriever](http://ecodataretriever.org)
+ * Python
+ * JSON
+ * [datapackage.json](http://dataprotocols.org/data-packages/#descriptor-datapackagejson)
+
+ ### Degree of difficulty and needed skills
+
+ * Moderate Difficulty
+ * Knowledge of Python
+ * Knowledge of JSON
+
+ ### Involved developer communities
+
+ The EcoData Retriever primarily interacts via issues and pull requests on
+ GitHub.
+
+ ### Mentors
+
+ * @ethanwhite
+ * @henrysenyondo