Metadata in plaintext exports #71

Closed
mihai-sysbio opened this issue Dec 3, 2018 · 15 comments
Comments

@mihai-sysbio
Member

The plaintext model files (e.g. yaml) should include metadata. This should be at least the version number, but it could also be a list of authors, a url, and a short description.

@haowang-bioinfo
Member

@mihai-sysbio please clarify the issue by providing more explanation, and maybe examples.

If this issue is about modifying the spec or content of the YAML format, maybe it should be posted as an issue in RAVEN.

@mihai-sysbio
Member Author

How about something like this:

!!omap
- name: "official name"
- version: "x.x.x"
- authors: "author1, author2"
- description: "one line description"
- source: "github url"

This information doesn't seem to be part of the model files regardless of the format, so even if RAVEN were able to export it as metadata, it wouldn't know where to get it from.

@haowang-bioinfo
Member

haowang-bioinfo commented Dec 4, 2018

Actually, RAVEN does provide fields (description and annotation) for such information, except version.

You can see how this information is organized in the model of Streptomyces coelicolor.

@mihai-sysbio
Member Author

mihai-sysbio commented Dec 4, 2018

Point taken!
Is there any reason why the metadata shouldn't be part of the plaintext model files?

@haowang-bioinfo
Member

It appears this topic is more relevant to RAVEN (or COBRApy), which would be a more appropriate place for such a discussion/question.

@haowang-bioinfo
Member

One thing that should be noted is the version field, which has not been included in either the RAVEN or the COBRA model spec. But in my view it is quite useful for GEMs deposited as GitHub repos.

@haowang-bioinfo
Member

haowang-bioinfo commented Dec 8, 2018

@mihai-sysbio just had a second thought about this: tailored YAML specifications with additional information can definitely be applied, or at least tested, in human-GEM after open discussion, as long as they are beneficial for publishing GEMs as GitHub repos.

@mihai-sysbio
Member Author

@pecholleyc how about we start with a hand-crafted yaml for model metadata and see where that takes us?

@pecholleyc
Contributor

pecholleyc commented Mar 14, 2019

Updated content of the metadata:

- metadata:
    id         : "HumanGEM"
    short_name : "human"
    full_name  : "Human metabolic model v1"
    description: "1-3 lines description"
    version    : "1.0.0"
    author:
      - first_name  : "fn"
        last_name   : "ln"
        email       : "email"
        organization: "org"
      - first_name  : "fn2"
        last_name   : "ln2"
        email       : "email2"
        organization: "org2"
    date       : "YYYY-MM-DD"
    sample     : "tissue, cell line, cell type, organ etc.."
    condition  : "Generic metabolism"
    pmid       :   # optional
      - "PMID1"
      - "PMID2"
    github     : "https://github.com/SysBioChalmers/human-GEM"

This section should be added at the very beginning of the file, above "metabolites".
The format is a bit different from the rest of the existing YAML file. We will have to decide whether the whole file should be standardized or whether you want to keep a format compatible with cobrapy (assuming this add-on does not already break compatibility).
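
As a rough illustration of the placement rule above, here is a minimal Python sketch that checks whether `metadata` is the first top-level section of a model file. It is a line-based check, not a YAML parser, and it assumes top-level sections start at column 0 as `- name:`. The function names and the `example_met` entry are hypothetical and not part of RAVEN or cobrapy.

```python
import re

# Matches a top-level section header such as "- metadata:" or "- metabolites:".
# Assumption: top-level entries start at column 0, as in the proposal above.
SECTION = re.compile(r"^- ([A-Za-z_]\w*)\s*:")


def top_level_sections(yaml_text):
    """Return the names of the top-level sections in file order."""
    names = []
    for line in yaml_text.splitlines():
        match = SECTION.match(line)
        if match:
            names.append(match.group(1))
    return names


def metadata_comes_first(yaml_text):
    """True if the first top-level section is 'metadata'."""
    sections = top_level_sections(yaml_text)
    return bool(sections) and sections[0] == "metadata"


# Hypothetical, heavily shortened model file used only for illustration.
example = """\
!!omap
- metadata:
    id         : "HumanGEM"
    version    : "1.0.0"
- metabolites:
  - !!omap
    - id: "example_met"
"""

print(top_level_sections(example))
print(metadata_comes_first(example))
```

Whether cobrapy tolerates the extra section is a separate question that this check does not answer.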

@haowang-bioinfo
Member

@pecholleyc regarding the proposed spec for metadata, I suggest removing the following fields: short_name, full_name, date, condition, pmid and github, because they are not required and can be skipped for now. In addition, it might be better to rename sample to sampleType.

@mihai-sysbio
Member Author

@Hao-Chalmers all the fields you mentioned are required by Metabolic Atlas.

@pecholleyc
Contributor

pecholleyc commented Mar 19, 2019

Let me clarify why this information should be in the YAML:

  • All of these fields are available in the SBML format of any model. Why? Because what you download and store on your computer is the file itself, not the repo (e.g. when downloading just the zipped release). People might not be able (or want) to fetch this information from the internet. You also have to realize that using GitHub to extract this information can be tedious for many users, so it is better to show as much information as possible on MA directly.

  • Most of them are required on Metabolic Atlas and will not be fetched automatically from the repo:

    • id is not used on Metabolic Atlas but can be useful if one downloads the YAML from the FTP or any place other than this GitHub repository.
    • short_name + version is the value displayed on the header bar of MA, e.g. for yeast we have Yeast 8.3.3. Following the same logic, the value should be something like Human1 v1.0.2 (a bit redundant).
    • full_name is meant to be a bit more descriptive and will be displayed in the model selection list. It is meant to avoid confusion if MA has multiple human models (!= human1 model).
    • I expect description to be even more comprehensive; for instance, you can specify the condition, the cell type, the use of tINIT, etc. The content can probably be extracted from the abstract of the paper.
    • date is also very valuable; all other models in MA are dated with the year of publication (not very accurate, I know). I think people should be able to select the most recent models from MA for their analysis.
    • sample is a mix of data: cell line, cell type, tissue, organ/system. I would prefer to split it into these four fields and make them optional.
    • condition is "the conditions of your experiment or the pathological condition that you want to shed light to by reconstructing this model" (from the SOP), e.g. Cancer, Malnourishment, Starvation, Oxidative Stress, Rich Media.
    • github. Obviously important. Ideally all models integrated in MA would have a GitHub repository, but that is unlikely to be the case, so I guess it should be an optional field. The link may feel redundant if you are already on GitHub, but someone looking at the file locally would otherwise have no clue what the original repo is.
    • pmid. Might not be available, so optional. Some people are not interested in the model details but only in the scientific use of the model. Giving direct access to the paper is always better than asking them to navigate through the repo to get the PMID.

I was also thinking of adding information such as species, taxonomy and organism, fields that are commonly found in the SBML format.

Note: I hope that one day some of this content will be available in the Metabolic Atlas documentation, in a section called something like "Import your own model".

@haowang-bioinfo
Member

@pecholleyc I'm a bit confused: how could something be both required and optional? I feel the inclusion of metadata doesn't have to be done all at once; shall we just start with the fields that are certain?

@JonathanRob
Collaborator

JonathanRob commented Mar 19, 2019

@Hao-Chalmers All of the fields are required, except the ones that @pecholleyc specifically states are optional (github and pmid). @pecholleyc can correct me if I misinterpreted, but @Hao-Chalmers makes a good point - maybe make it a bit clearer which fields are optional and which are required.
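
One way to make the required/optional split explicit is to write it down as data and check a metadata block against it. The Python sketch below follows the reading in this comment (everything required except `github` and `pmid`); that split is only one interpretation of the thread, not an agreed spec. `check_metadata` is a hypothetical helper, and it treats an empty value as missing, which is also an assumption.

```python
# Assumption: field split as read from this thread, not an agreed specification.
REQUIRED_FIELDS = (
    "id", "short_name", "full_name", "description",
    "version", "author", "date", "sample", "condition",
)
OPTIONAL_FIELDS = ("pmid", "github")


def check_metadata(metadata):
    """Report missing required fields and fields outside the proposed spec."""
    known = set(REQUIRED_FIELDS) | set(OPTIONAL_FIELDS)
    # An absent key and an empty value are both reported as missing.
    missing = [name for name in REQUIRED_FIELDS if not metadata.get(name)]
    unknown = sorted(set(metadata) - known)
    return {"missing": missing, "unknown": unknown}


# Placeholder values copied from the proposed metadata block in this thread.
example = {
    "id": "HumanGEM",
    "short_name": "human",
    "full_name": "Human metabolic model v1",
    "description": "1-3 lines description",
    "version": "1.0.0",
    "author": [
        {"first_name": "fn", "last_name": "ln",
         "email": "email", "organization": "org"},
    ],
    "date": "YYYY-MM-DD",
    "sample": "tissue, cell line, cell type, organ etc..",
    "condition": "Generic metabolism",
}

report = check_metadata(example)
print(report)
```

Fields such as species or taxonomy, which were floated earlier but not settled, would show up under `unknown` until they are added to one of the two tuples.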

haowang-bioinfo added a commit that referenced this issue Mar 29, 2019
This update is implemented by running script `miscModelCurationScript_20190323.m`:
- metadata is incorporated both in mat and yaml files according to #71
- reformat EC-number in `eccodes` field, as discussed in #93
- remove `rxnComps` field, according to #184 in RAVEN
- empty `version` field to enable a simple and clear work flow
- initialize `rxnConfidenceScores` field with zero, as discussed in #48
@haowang-bioinfo haowang-bioinfo mentioned this issue Apr 2, 2019
2 tasks
@haowang-bioinfo
Member

This issue was resolved in #95 and is therefore closed.
