-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add site cultivars, treatments metadata to plot metadata #573
Comments
@max-zilla I had a few questions about this one. At the point where we want to add the sites_cultivars and sites_treatments, we have a site_id but not an experiment id. If I had the experiment_id, there should be endpoints in brapi that will supply what we need. So I see two ways of handling this. The first would be that in terrautils, the experiments json is cached. I can add a method that takes a site_id and returns and experiment or experiment_id. That would require parsing that json for every site, which is the drawback. The second way would be that when the sites json is created, the method iterates over each experiment, and then builds an object for each site. We could just put the experiment_id in with the sites. It's just adding a single id and not a full object. Let me know if either of those are acceptable. |
Would it help to have a view pre populated in the database? |
The second option would probably be better, each extractor creates the cache on startup if it needs BETY stuff (we don't have a shared cache file like traitvis) so inserting the experiment into the site object and saving potentially many scans of the JSON later sounds more efficient. |
K. I will implement the second then. |
I have now changed the get_sites method in betydb in the following ways: Each entry for a site now contains the corresponding experiment id. The cultivar information is obtained by the following: Using the brapi v1/study/{studyDbId}/layouts endpoint, a dictionary is created that maps site ids to germplasm ids, and gets the corresponding information from the /germplasm endpoint in brapi. I think that this can be further optimized by using the stored BETYDB_CULTIVARS instead of making a call to the brapi /germplasm enpoint. Will check into that next, but this means that the cultivar info can be added in lemnatec.py without making any calls there. |
Both the cultivar and treatments information should now be contained in the sites retrieved from betydb. Some sites do not have cultivar or treatment data. Currently the value for those keys is 'no info available' but if there is a better value for those, I can change that. If someone can take a look, I think this one is pretty close to done. |
@tcnichol I can take a look if you can point to some of the records that are missing information. |
Here is one in layouts, there are entries for these 2 plots MAC Field Scanner Season 4 Range 6 Column 10 W/E However, this plot MAC Field Scanner Season 4 Range 48 Column 11 W doesn't have any corresponding entries in the /observationunits endpoint. Since all the entries in observationunits have a corresponding entry in /layouts, but not all entries in /layouts have a corresponding entry in observationunits, are these control plots? What would be the best value to put under 'treatments' if there is no entry for the plot in observationunits? |
it sounds like there are two things to address:
Other thoughts:
|
I will check into 1 tomorrow. I can create an issue for changing location_abbrevation to observationUnitName in brapi as well if that does not match the spec. |
I synced my local database with bety and did not see a sites_treatments table, but I was able to create something comparable with this sql.
the next step was running this query
Here are the results So it looks like bety does have a treatment with a name/definition associated with this plot. Could this mean that there is something up with the brapi endpoint or the query it uses? I will check it and compare what it does with what I get here. |
i posted in the brapi channel about pagination. |
i think i figured out the issue with pagination, testing a fix in brapi locally with this. |
This pull request fixes the issue in brapi. With pagination working correctly, there should be an entry found for every plot. |
@robkooper thanks for merging that one in. Since @dlebauer pointed out a problem with the results returned by observationunits, I have created this issue. Will fix that endpoint and then fix the branch associated with this in terrautils to use the correct keys rather than 'location_abbrevation.' |
This pull request fixes location_abbrevation, and I have changed terrautils to take this change into account for the branch associated with this issue. |
sample metadata as would be generated by terrautils.lemnatec._get_sites
|
(just a few keys, I didn't want to post something huge.) |
pull request open |
The PR is ready for review but the get_sites process runs extremely slowly. I will review with an eye toward possible caching or other improvements to make this viable in terrautils metadata cleaning pipeline. |
Description
Add site cultivar and site treatments data to sites metadata.
The change would be made here
https://github.com/terraref/terrautils/blob/master/terrautils/lemnatec.py#L118
This may be possible to accomplish using brapi endpoints, or else will require modification of betydb.
this is the branch i am working on for this issue:
https://github.com/terraref/terrautils/tree/add-cultivars-site-metadata
The text was updated successfully, but these errors were encountered: