Skip to content

Commit 0bc202c

Browse files
authored
Merge pull request #4 from PecanProject/metadata_entry
Updates to Metadata Entry Section
2 parents f3f038e + 706d445 commit 0bc202c

10 files changed

+285
-32
lines changed

_bookdown.yml

+8-5
Original file line numberDiff line numberDiff line change
@@ -4,11 +4,14 @@ rmd_files: ["index.Rmd",
44
"finding_data.md",
55
"preparing_published_data.md",
66
"overview.md",
7-
"add_citation.md",
8-
"add_site.md",
9-
"add_treatment.md",
10-
"adding_managements.md",
11-
"adding_pfts_species_cultivars.md",
7+
"adding_metadata_web_interface.md",
8+
"add_citation.md",
9+
"add_site.md",
10+
"add_treatment.md",
11+
"adding_managements.md",
12+
"adding_pfts_species_cultivars.md",
13+
"adding_metadata_betydb_yaba.md",
14+
"adding_metadata_sql.md",
1215
"adding_trait_and_yield_data.md",
1316
"adding_traits_and_yields.md",
1417
"bulk_upload.md",

add_citation.md

+1-1
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
## Adding a Citation
1+
### Adding a Citation
22

33
Citation provides information regarding the source of the data.
44
A PDF copy of each paper should be available through Mendeley.

add_site.md

+8-8
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
## Adding a Site
1+
### Adding a Site
22

33
Each experiment is conducted at a unique site. In the context of BETY,
44
the term 'site' refers to a specific location and it is common for many
@@ -59,15 +59,15 @@ Interface for adding a new site:
5959
| Rooting Zone Depth | Measured in Meters (m) |
6060
|Depth of Water Table| Measured in Meters (m) |
6161

62-
### When not to enter a new site
62+
#### When not to enter a new site
6363

6464
Do **not** enter a new site when plants (or seeds) are collected from multiple locations and then grown in the same location; this is called "common garden experiment". In this case, the location of the study is included as site information. Information about the seed source can be entered as a distinct cultivar.
6565

66-
### Site Location
66+
#### Site Location
6767

6868
Points can be added via the web interface; spatial geometries, e.g. a plot, site, or country boundary, must be entered via the PostgreSQL command line.
6969

70-
#### Point Locations
70+
##### Point Locations
7171

7272
If latitude and longitude coordinates are not available, it is often
7373
possible to determine the site location based on the site name, city,
@@ -91,12 +91,12 @@ Table: Level of accuracy to record in lat and lon fields.
9191
| Acre | 0.001 |
9292
| 10 Meters | 0.0001 |
9393

94-
#### Boundaries
94+
##### Boundaries
9595

9696
A vector boundary must be obtained. Here is one way to obtain a site boundary using R:
9797

9898

99-
##### A rectangular plot (with bounding box) {-}
99+
###### A rectangular plot (with bounding box) {-}
100100

101101
Here I set the bounding box for a plot by specifying the plot corners and elevation. Notice that it is necessary to specify the first point twice, once at the beginning and once at the end.
102102

@@ -110,7 +110,7 @@ Here I set the bounding box for a plot by specifying the plot corners and elevat
110110
ID = 1123;
111111

112112

113-
##### A country boundary: {-}
113+
###### A country boundary: {-}
114114

115115

116116
library(prevR)# for `create.boundary` function
@@ -127,7 +127,7 @@ Then import at the command line (can also copy / paste to terminal, but this bou
127127

128128
psql -U bety -d bety < uk.sql
129129

130-
#### References
130+
##### References
131131

132132
* PostGIS `ST_GeomFromText` documentation: [http://www.postgis.org/docs/ST_GeomFromText.html](http://www.postgis.org/docs/ST_GeomFromText.html){target="_blank"}
133133
* gis.stackexchange: [http://gis.stackexchange.com/q/111212/1239](http://gis.stackexchange.com/q/111212/1239){target="_blank"}

add_treatment.md

+2-2
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
1-
## Adding Treatments and Managements
1+
### Adding Treatments and Managements
22

3-
### Adding a Treatment
3+
#### Adding a Treatment
44

55
Treatments provide a description of a study’s
66
treatments. Any specific information such as rate of fertilizer

adding_managements.md

+9-9
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
### Adding Managements
1+
#### Adding Managements
22

33
There are two ways to add management information, through the web interface or from a spreadsheet. These are discussed in turn, below. Recall that managements can be associated with one or more treatments.
44

@@ -24,7 +24,7 @@ For a multi-year experiment, there may be multiple entries for the same type of
2424

2525

2626

27-
#### Types of Managements
27+
##### Types of Managements
2828

2929
The following table shows a list of managements to enter. It is more important to have management records for yields than for traits. For greenhouse experiments, it is not necessary to include informaton on fertilizaton, lighting, or greenhouse temperature.
3030

@@ -49,7 +49,7 @@ Table: Management Types
4949

5050

5151

52-
#### Via Web interface
52+
##### Via Web interface
5353

5454

5555
Managements can be entered via the web interface. First enter the management, and then associate it with one or more treatments. To associate a management with multiple treatments, first create the
@@ -60,14 +60,14 @@ management, then edit the management and add treatment relationships.
6060

6161

6262

63-
#### Preparing a managements spreadsheet for Upload
63+
##### Preparing a managements spreadsheet for Upload
6464

6565
When there is a long list of managements, the `insert_managements` scripts enables users to insert data organized in a text based (csv) file.
6666

6767
Preparing the csv file can be done in any spreadsheet program such as Excel or Google Sheets. The insertion is straightforward, but requires familiarity with the bash shell as well as administrative access to the Postgres database.
6868

6969

70-
##### File format {-}
70+
###### File format {-}
7171

7272
**Required Fields** the spreadsheet or CSV file must contain the following column headings:
7373

@@ -92,14 +92,14 @@ Each optional column heading corresponds to an optional field in the database ma
9292

9393
If the table is prepared in a spreadsheet program, use the "save as → .csv" option to export a single text based .csv file.
9494

95-
#### Inserting Management Insertion Script
95+
##### Inserting Management Insertion Script
9696

9797
The [`insert_managements.rb`](https://github.com/PecanProject/bety/blob/master/script/insert_managements.rb){target="_blank"} script takes a CSV file describing managements to be added to the database as input and outputs a file containing SQL statements to do the required insertions.
9898

9999

100100
The script `insert_managements.rb` is in the directory `RAILS_ROOT/script`. The complete usage instructions (also obtainable by running `./insert_managements --man`) follow. For additional information, see [Github issue #288](https://github.com/PecanProject/bety/issues/288#issuecomment-153440839){target="_blank"}.
101101

102-
##### `insert_managements.rb` {-}
102+
###### `insert_managements.rb` {-}
103103

104104

105105
```
@@ -114,12 +114,12 @@ where [options] are:
114114
-h, --help Show this message
115115
```
116116

117-
##### Database Specification {-}
117+
###### Database Specification {-}
118118

119119
The database used by the script is determined by the environment specified by the '--environment' option (or 'development' if not specified) and the contents of the configuration file 'config/database.yml'.
120120
(Run 'rake dbconf' to view the contents of this file on the command line.)
121121

122-
##### Using the Script to Update the Production Database {-}
122+
###### Using the Script to Update the Production Database {-}
123123

124124
There are three options for using this script to update the production database.
125125

adding_metadata_betydb_yaba.md

+24
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,24 @@
1+
## BETYdb-YABA and Python Client Library
2+
3+
BETYdb-YABA and its python client library provides an automated bulk upload to many BETYdb tables. API endpoints have been implemented to upload data to respective tables. For more information including examples for all endpoints and a link to the Swagger documentation, see the BETYdb-YABA [README](https://github.com/PecanProject/BETYdb-YABA/blob/master/README.md){target="_blank"}.
4+
5+
Here are some examples of how to hit Experiments and Sites endpoints:
6+
7+
* Experiments:
8+
```sh
9+
curl -F "fileName=@input_files/experiments.csv" \
10+
http://localhost:5001/yaba/v1/experiments?username=guestuser
11+
```
12+
* Sites:
13+
```sh
14+
curl -F "fileName=@input_files/sites.csv" \
15+
-F "shp_file=@input_files/S8_two_row_polys.shp" \
16+
-F "dbf_file=@input_files/S8_two_row_polys.dbf" \
17+
-F "prj_file=@input_files/S8_two_row_polys.prj" \
18+
-F "shx_file=@input_files/S8_two_row_polys.shx" \
19+
http://localhost:5001/yaba/v1/sites
20+
```
21+
22+
23+
24+

adding_metadata_sql.md

+204
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,204 @@
1+
## SQL
2+
3+
SQL should be used only when the data manipulation to be done either can't be done using the web interface or where doing so would be exceedingly tedious.
4+
For example, some of the steps below can't be done in either the web interface, and the BETYdb-YABA API is not always available.
5+
6+
SQL should primarily be used in the following cases:
7+
8+
* correcting systemic data errors involving large numbers of rows
9+
* assigning non-point geometries to sites
10+
* associating experiments with sites and treatments
11+
* associating sites with cultivars
12+
13+
14+
### Metadata Entry Workflow for BETYdb
15+
16+
The following steps have been implemented in the BETYdb-YABA API and python client, but are described below for reference
17+
18+
Here we show SQL code used to add metadata required each season by the TERRA REF database, including experiments, sites, treatments, cultivars, and citations that is required prior to uploading the trait or yield data. Most PEcAn users _will not_ need to add experiments and cultivars, or associate these records with sites.
19+
20+
Also note that in the TERRA REF and other agronomic applications, each record in the 'sites' table will correspond to an experimental plot.
21+
22+
### Step 1: Add new experiments
23+
24+
```sql
25+
insert into experiments (name, start_date, end_date, user_id)
26+
values ('MAC Season 6: Sorghum BAP', '2018-04-06', '2018-08-01', 'some text', 'some text', 6000000004);
27+
```
28+
29+
### Step 2: Add new sites
30+
31+
To add sites (or many plots at a single site) using the BETYdb-YABA API, you must provide shapefile for a site to have an associated geometry.
32+
33+
Otherwise, you can add sites with points or polygons to the database as follows.
34+
35+
```sql
36+
insert into sites (city, state, country, sitename)
37+
values ('Maricopa', 'Arizona', 'USA', 'MAC Field Scanner Season 7 Range 9 Column 15');
38+
```
39+
40+
You can add simple locations with a single point thus:
41+
42+
```sql
43+
insert into sites (city, state, country, sitename, geometry)
44+
values ('Urbana', 'Illinois', 'United States', 'My garden plot', ST_SetSRID(ST_makePoint(88, 40, 222), 4326));
45+
For the TERRA REF Project, plot definitions may be copied from previous season if same plots are used.
46+
47+
```sql
48+
with season6 as (
49+
select city, state, replace(sitename, 'Season 4', 'Season 6') as sitename, greenhouse, geometry, time_zone from sites where sitename like '%Season 4%'
50+
)
51+
insert into sites (city, state, sitename, greenhouse, geometry, time_zone) select * from season6;
52+
```
53+
54+
55+
### Step 3: Add new treatments
56+
57+
```sql
58+
insert into treatments (name, definition, control)
59+
values ('MAC Season 6: Sorghum', 'some text', 't');
60+
```
61+
62+
### Step 4: Add new cultivars
63+
64+
Each cultivar must be associated with a species. If there is no entry for the species in the species table, it must be added before adding the new cultivar.
65+
66+
67+
```sql
68+
insert into cultivars (name, specie_id)
69+
values ('RIL-CS27_(TX2910/(Macia/R07007)-CS44)-CSF1-PRF2-CS27', 2588);
70+
```
71+
72+
### Step 5: Add new citations
73+
74+
```sql
75+
insert into citations (author, year, title)
76+
values ('Newcomb, Maria', 2016, 'Maricopa Agricultural Center Field Activities');
77+
```
78+
79+
### Step 6: Associate experiments with sites
80+
81+
As an example, this statement could be used to associate the experiment named "MAC Season 6: Sorghum BAP" with the site named "MAC Field Scanner Season 6 Range 1 Column 1 E".
82+
```sql
83+
insert into experiments_sites (experiment_id, site_id) values
84+
((select id from experiments where name = 'MAC Season 6: Sorghum BAP'),
85+
(select id from sites where sitename = 'MAC Field Scanner Season 6 Range 1 Column 1 E'));
86+
```
87+
88+
When adding a new season for the TERRA REF project, a statement like the following can be used for associating the new season's experiment [or "an experiment for the new season"] with all of the new season's sites. For example, since MAC Field Center sites are consistently named following the format MAC Field Scanner Season x Range a Column b, we could use the following statement to associate the experment named "MAC Season 6: Sorghum BAP" with all Season 6 sites:
89+
90+
```sql
91+
insert into experiments_sites (experiment_id, site_id)
92+
select e.experiment_id, s.site_id
93+
from (select id as experiment_id from experiments where name = 'MAC Season 6: Sorghum BAP') as e
94+
cross join
95+
(select id as site_id from sites where sitename like 'MAC Field Scanner Season 6%') as s;
96+
```
97+
98+
### Step 7: Associate experiments with treatments
99+
100+
```sql
101+
insert into experiments_treatments (experiment_id, treatment_id)
102+
values ((select id from experiments where name = 'MAC Season 6: Sorghum BAP'),
103+
(select id from treatments where name = 'MAC Season 6: Sorghum'));
104+
```
105+
When adding a new season for the TERRA REF project, a statement like the following can be used to associate all season 6 experiments with all season 6 treatments assuming the experiment and treatment names follow the format convention `MAC Season x: subexperiment name`.
106+
107+
```sql
108+
insert into experiments_treatments (experiment_id, treatment_id)
109+
select e.experiment_id, t.treatment_id
110+
from (select id as experiment_id from experiments where name like 'MAC Season 6:%') as e
111+
cross join
112+
(select id as treatment_id from treatments where name like 'MAC Season 6:%') as s;
113+
```
114+
115+
### Step 8: Associate sites with cultivars
116+
117+
As an example, this statement could be used to associate the site name MAC Field Scanner Season 8 Range 1 Column 1 E with the Sorghum bicolor cultivar Tiburon:
118+
119+
```sql
120+
insert into sites_cultivars (site_id, cultivar_id)
121+
values ((select id from sites where sitename = 'MAC Field Scanner Season 8 Range 1 Column 1 E'),
122+
(select id from cultivars where name = 'Tiburon' and specie_id =
123+
(select id from species where scientificname = 'Sorghum bicolor')));
124+
```
125+
126+
### Step 9: Associate sites with citations
127+
128+
```sql
129+
insert into citations_sites (citation_id, site_id)
130+
values ((select id from citations where author = 'Newcomb, Maria' and year = 2016 and title = 'MAC Field Activities'),
131+
(select id from sites where sitename = 'MAC Field Scanner Season 6 Range 1 Column 1 E'));
132+
```
133+
134+
When adding a new season for the TERRA REF project, a statement like the following can be used to associate all citations with author "Newcomb, Maria" with all Season 6 sites since MAC Field Center sites are consistently named following the format `MAC Field Scanner Season x Range a Column b`
135+
136+
137+
```sql
138+
insert into citations_sites (citation_id, site_id)
139+
select c.id, s.id
140+
from (select id from citations where author = 'Newcomb, Maria')
141+
AS c
142+
cross join
143+
(select id from sites where sitename like 'MAC Field Scanner Season 6%') AS s;
144+
```
145+
146+
147+
## Removing duplicate records
148+
It sometimes happens that multiple rows in a BETYdb table are duplicates, e.g., two species or two sites that were independently added but reference the same entity. A function such as the one that follows could be used to delete a duplicate row and change all references to it to point to a row that we are retaining. (Note that there are serious problems associated with the use of this function, as outlined in [this comment in issue 185](https://github.com/PecanProject/bety/issues/185#issuecomment-530554650){target="_blank"}. We present it here only to outline what might be possible along the lines of partially automating the correction of data errors.)
149+
150+
The function takes three arguments: the name of the table we wish to update (as a string), the id number of the row we wish to remove (the "duplicate"), and the id number of the similar row we wish to retain. For example, if we have two citation rows having essentially the same information having id numbers 286 and 289, we could remove the first and update references to it to point to the second with the statement
151+
152+
```sql
153+
SELECT update_refs_from_to('citations', 286, 289);
154+
```
155+
156+
The following function can be used to combine duplicates or replace an old record with a new one. There are many caveats described in [Issue 185](https://github.com/PecanProject/bety/issues/185){target="_blank"}.
157+
158+
But the basic usage, e.g. to convert all records with `citation_id = 286` to `citation_id = 289`:
159+
160+
```sql
161+
SELECT update_refs_from_to('citations', 286, 289);
162+
```
163+
164+
To make the function available, first run this code:
165+
166+
```sql
167+
CREATE OR REPLACE FUNCTION update_refs_from_to(
168+
primary_table_name varchar,
169+
old_id bigint,
170+
new_id bigint
171+
) RETURNS void AS $$
172+
DECLARE
173+
foreign_key_col_name varchar;
174+
referring_table_name varchar;
175+
update_stmt varchar;
176+
delete_stmt varchar;
177+
BEGIN
178+
foreign_key_col_name := regexp_replace(primary_table_name, 's$', '') || '_id';
179+
FOR referring_table_name IN SELECT table_name FROM information_schema.columns WHERE table_schema = 'public' AND "column_name" = foreign_key_col_name AND is_updatable = 'YES' LOOP
180+
181+
BEGIN
182+
update_stmt := 'UPDATE ' || referring_table_name || ' SET ' || foreign_key_col_name || ' = ' || new_id || ' WHERE ' || foreign_key_col_name || ' = ' || old_id;
183+
RAISE NOTICE 'Attempting to run %', update_stmt;
184+
EXECUTE update_stmt;
185+
RAISE NOTICE 'Success!';
186+
EXCEPTION
187+
WHEN unique_violation THEN
188+
RAISE NOTICE 'UPDATE FAILED!!!';
189+
RAISE NOTICE 'Updating table column % in table % would violate uniqueness constraints', foreign_key_col_name, referring_table_name;
190+
END;
191+
END LOOP;
192+
BEGIN
193+
delete_stmt := 'DELETE FROM ' || primary_table_name || ' WHERE id = ' || old_id;
194+
RAISE NOTICE 'Attempting to run %', delete_stmt;
195+
EXECUTE delete_stmt;
196+
RAISE NOTICE 'Success!';
197+
EXCEPTION
198+
WHEN foreign_key_violation THEN
199+
RAISE NOTICE 'DELETION FAILED!!!';
200+
RAISE NOTICE 'Deletion from table % of the row with id % would cause a foreign-key violation', primary_table_name, old_id;
201+
END;
202+
END
203+
$$ LANGUAGE plpgsql;
204+
```

0 commit comments

Comments
 (0)