Metadata in plaintext exports #71

mihai-sysbio · 2018-12-03T14:18:52Z

The plaintext model files (e.g. YAML) should include metadata: at least the version number, but possibly also a list of authors, a URL, and a short description.


haowang-bioinfo · 2018-12-03T14:42:15Z

@mihai-sysbio please clarify the issue by providing more explanations, and maybe examples.

If this issue is about adjusting/modifying the spec or content of the YAML format, maybe it should be posted as an issue in RAVEN.

mihai-sysbio · 2018-12-03T14:57:34Z

How about something like this:

```yaml
!!omap
- name: "official name"
- version: "x.x.x"
- authors: "author1, author2"
- description: "one line description"
- source: "github url"
```

This information doesn't seem to be part of the model files regardless of the format, so even if RAVEN were able to export it as metadata, it wouldn't know where to get it from.
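As a rough illustration of the flat `!!omap` layout proposed above, the Python sketch below renders a plain dict of metadata into that shape. The function name `to_omap_yaml` is hypothetical and not part of RAVEN or cobrapy; quoting via `json.dumps` relies on JSON strings also being valid YAML double-quoted scalars.

```python
import json


def to_omap_yaml(fields: dict[str, str]) -> str:
    """Render a flat metadata mapping in the `!!omap` style.

    Each key/value pair becomes one `- key: "value"` item, in insertion order.
    json.dumps does the quoting: a JSON string is also a valid YAML
    double-quoted scalar, so embedded quotes and newlines are escaped safely.
    """
    lines = ["!!omap"]
    for key, value in fields.items():
        lines.append(f"- {key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_omap_yaml({
        "name": "official name",
        "version": "x.x.x",
        "authors": "author1, author2",
        "description": "one line description",
        "source": "github url",
    }), end="")
```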

haowang-bioinfo · 2018-12-04T08:01:34Z

Actually, RAVEN does provide fields (description and annotation) for such information, except version.

You can see how this information is organized in the model of Streptomyces coelicolor.

mihai-sysbio · 2018-12-04T08:32:19Z

Point taken!
Is there any reason why the metadata shouldn't be part of the plaintext model files?

haowang-bioinfo · 2018-12-04T09:31:43Z

It appears this topic is more relevant to RAVEN (or COBRApy), whose repositories would be more appropriate places for such a discussion/question.

haowang-bioinfo · 2018-12-05T10:34:55Z

One thing that should be noted is the version field, which has not been included in either the RAVEN or the COBRA model spec. In my view, though, it is quite useful for GEMs deposited as GitHub repos.
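Because the `version` field is meant to track releases of a GEM repository, comparing versions numerically (not as strings) matters as soon as there is a `1.0.10`. A minimal sketch, assuming the three-part `MAJOR.MINOR.PATCH` form hinted at by `"x.x.x"`; `parse_version` is a hypothetical helper, not part of RAVEN or COBRA:

```python
import re

# Three dot-separated runs of ASCII digits, e.g. "1.0.2" (assumed form).
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)", re.ASCII)


def parse_version(text: str) -> tuple[int, int, int]:
    """Parse "MAJOR.MINOR.PATCH" into a tuple of ints.

    Anything else (e.g. "1.0" or "1.0.0-rc1") raises ValueError rather than
    being guessed at.
    """
    match = _VERSION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    major, minor, patch = (int(group) for group in match.groups())
    return major, minor, patch


if __name__ == "__main__":
    # Tuples compare element by element, so 10 > 9 in the patch position,
    # whereas plain string comparison would put "1.0.10" before "1.0.9".
    print(parse_version("1.0.10") > parse_version("1.0.9"))  # True
    print("1.0.10" > "1.0.9")  # False
```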

haowang-bioinfo · 2018-12-08T20:56:34Z

@mihai-sysbio just had a second thought about this: tailored YAML specifications with additional information can definitely be applied, or at least tested, in human-GEM, as long as they prove beneficial for publishing GEMs as GitHub repos after open discussion.

mihai-sysbio · 2019-01-24T10:28:48Z

@pecholleyc how about we start with a hand-crafted yaml for model metadata and see where that takes us?

pecholleyc · 2019-03-14T15:02:24Z

Updated content of the metadata:

```yaml
- metadata:
    id         : "HumanGEM"
    short_name : "human"
    full_name  : "Human metabolic model v1"
    description: "1-3 lines description"
    version    : "1.0.0"
    author:
      - first_name  : "fn"
        last_name   : "ln"
        email       : "email"
        organization: "org"
      - first_name  : "fn2"
        last_name   : "ln2"
        email       : "email2"
        organization: "org2"
    date       : "YYYY-MM-DD"
    sample     : "tissue, cell line, cell type, organ etc.."
    condition  : "Generic metabolism"
    pmid       :  # optional
      - "PMID1"
      - "PMID2"
    github     : "https://github.com/SysBioChalmers/human-GEM"
```

This section should be added at the very beginning of the file, above "metabolites".
The format is a bit different from the rest of the existing YAML file. We will have to decide whether the whole file should be standardized or whether to keep a format compatible with cobrapy (assuming this add-on does not already break compatibility).
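Placing the section at the very beginning of the file is easy to guarantee if the top-level sections are assembled in order before dumping. A minimal sketch, assuming the sections are held in an insertion-ordered Python dict (as a YAML loader would return a mapping); `with_metadata_first` is a hypothetical helper:

```python
def with_metadata_first(model: dict, metadata: dict) -> dict:
    """Return a new top-level mapping whose first key is "metadata".

    Python dicts keep insertion order, so dumping the result in order puts
    the metadata block above "metabolites" and the other sections. Any
    existing "metadata" entry is replaced, not duplicated, and the input
    dict is left unchanged.
    """
    rest = {key: value for key, value in model.items() if key != "metadata"}
    return {"metadata": metadata, **rest}


if __name__ == "__main__":
    model = {"metabolites": [], "reactions": [], "genes": []}
    ordered = with_metadata_first(model, {"id": "HumanGEM", "version": "1.0.0"})
    print(list(ordered))  # ['metadata', 'metabolites', 'reactions', 'genes']
```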

haowang-bioinfo · 2019-03-18T13:03:00Z

@pecholleyc regarding the proposed metadata spec, I suggest removing the following fields: short_name, full_name, date, condition, pmid and github, because they are not required and can be skipped for now. In addition, it might be better to rename sample to sampleType.

mihai-sysbio · 2019-03-18T13:12:49Z

@Hao-Chalmers all the fields you mentioned are required by Metabolic Atlas.

pecholleyc · 2019-03-19T10:10:50Z

Let me clarify why this information should be in the YAML file:

All of these fields are available in the SBML format of any model. Why? Because what you download and store on your computer is the file itself, not the repo (e.g. when downloading just the zipped release). People might not be able (or want) to fetch this information from the internet. Keep in mind, too, that navigating GitHub to extract information can be tedious for many users, so it is better to show as much information as possible on MA directly.
Most of these fields are required by Metabolic Atlas (MA) and will not be fetched automatically from the repo:
- id is not used on Metabolic Atlas but can be useful if one downloads the YAML file from the FTP or anywhere other than this GitHub repository.
- short_name + version is the value displayed in the header bar of MA, e.g. for yeast we have Yeast 8.3.3. Following the same logic, the value should be something like Human1 v1.0.2 (a bit redundant).
- full_name is meant to be a bit more descriptive and will be displayed in the model selection list. It is meant to avoid confusion if MA hosts multiple human models (other than the Human1 model).
- I expect description to be even more comprehensive; for instance, you can specify the condition, the cell type, the use of tINIT, etc. The content can probably be extracted from the abstract of the paper.
- date is also very valuable: all other models in MA are dated with the year of publication (not very accurate, I know). And I think people should be able to select the most recent models from MA for their analysis.
- sample is a mix of data: cell line, cell type, tissue, organ/system. But I would prefer to split it into these 4 fields and make them optional.
- condition is "the conditions of your experiment or the pathological condition that you want to shed light to by reconstructing this model" (from the SOP), e.g. Cancer, Malnourishment, Starvation, Oxidative Stress, Rich Media.
- github: obviously important. Ideally all models integrated on MA would have a GitHub repository, but that is not likely to be the case, so I guess it should be an optional field. Having the link may feel redundant when you are already on GitHub, but someone looking at the file locally would otherwise have no clue where the original repo is.
- pmid: might not be available, so optional. Some people are not interested in model details but only in the scientific use of the model. Giving direct access to the paper is always better than asking them to navigate through the repo to get the PMID.

I was also thinking of adding information such as species, taxonomy and organism, fields that are commonly found in the SBML format.

Note: I hope one day, some of this content will be available on Metabolic Atlas documentation in a section called something like "Import your own model".
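Two of the uses described for these fields, the header-bar label built from `short_name` + `version` and selecting the most recent models by `date`, can be sketched as follows. The helper names are hypothetical and the records are plain dicts with placeholder values shaped like the proposed `metadata` block. The label follows the yeast example (`Yeast 8.3.3`); the human example adds a `v` prefix, so the exact format is still a choice to make:

```python
from datetime import date


def header_label(meta: dict) -> str:
    """Header-bar text: short_name followed by version, e.g. "Yeast 8.3.3"."""
    return f"{meta['short_name']} {meta['version']}"


def newest_first(records: list[dict]) -> list[dict]:
    """Sort metadata records by their "date" field, most recent first.

    date.fromisoformat raises ValueError on malformed dates and returns
    date objects that compare chronologically.
    """
    return sorted(records, key=lambda meta: date.fromisoformat(meta["date"]), reverse=True)


if __name__ == "__main__":
    # Placeholder records for illustration only.
    records = [
        {"short_name": "ModelA", "version": "1.0.0", "date": "2018-06-01"},
        {"short_name": "ModelB", "version": "2.1.0", "date": "2019-03-14"},
    ]
    print([header_label(m) for m in newest_first(records)])  # ['ModelB 2.1.0', 'ModelA 1.0.0']
```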

haowang-bioinfo · 2019-03-19T12:36:02Z

@pecholleyc I'm a bit confused: how can something be both required and optional? I feel like the inclusion of metadata won't be done all at once, so shall we just start with the ones we are certain about?

JonathanRob · 2019-03-19T12:42:37Z

@Hao-Chalmers All of the fields are required, except the ones that @pecholleyc specifically states are optional (github and pmid). @pecholleyc can correct me if I misinterpreted, but @Hao-Chalmers makes a good point: maybe make it a bit clearer which fields are optional and which are required.
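Under that reading (every proposed field required except `github` and `pmid`), a check can be written against the field list from the March 14 proposal. This is a sketch, not Metabolic Atlas's actual import logic; `check_metadata` is hypothetical, and it also reports keys outside the proposal so that typos surface early:

```python
# Field names taken from the proposed metadata block; the required/optional
# split follows the reading that only github and pmid are optional.
REQUIRED = (
    "id", "short_name", "full_name", "description", "version",
    "author", "date", "sample", "condition",
)
OPTIONAL = ("github", "pmid")


def check_metadata(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means the block passes."""
    problems = []
    for field in REQUIRED:
        # Treat absent keys, None, empty strings and empty lists as missing.
        if meta.get(field) in (None, "", []):
            problems.append(f"missing required field: {field}")
    allowed = set(REQUIRED) | set(OPTIONAL)
    for field in meta:
        if field not in allowed:
            problems.append(f"unknown field: {field}")
    return problems


if __name__ == "__main__":
    draft = {"id": "HumanGEM", "version": "1.0.0", "gihub": "https://github.com/SysBioChalmers/human-GEM"}
    for problem in check_metadata(draft):
        print(problem)
```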

This update is implemented by running the script `miscModelCurationScript_20190323.m`:
- incorporate metadata into both the mat and yaml files, according to #71
- reformat EC numbers in the `eccodes` field, as discussed in #93
- remove the `rxnComps` field, according to #184 in RAVEN
- empty the `version` field to enable a simple and clear workflow
- initialize the `rxnConfidenceScores` field with zeros, as discussed in #48
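For readers working outside MATLAB, three of the listed steps can be mirrored on a model held as a Python dict. This is a hypothetical analogue, not a port of `miscModelCurationScript_20190323.m`: it assumes RAVEN-style field names (`rxns`, `rxnComps`, `version`, `rxnConfidenceScores`) as dict keys, and it leaves out the metadata and EC-number steps because their exact target formats are not spelled out here:

```python
def apply_misc_curation(model: dict) -> dict:
    """Mirror three of the listed steps on a RAVEN-like model dict.

    Assumes model["rxns"] is the list of reaction IDs. Returns a shallow
    copy; the input dict is not modified.
    """
    curated = dict(model)
    # Remove the rxnComps field if it is present.
    curated.pop("rxnComps", None)
    # Empty the version field.
    curated["version"] = ""
    # Initialize one zero confidence score per reaction.
    curated["rxnConfidenceScores"] = [0] * len(curated["rxns"])
    return curated


if __name__ == "__main__":
    model = {"rxns": ["R1", "R2", "R3"], "rxnComps": [1, 1, 2], "version": "1.0.0"}
    print(apply_misc_curation(model))
```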

haowang-bioinfo · 2020-04-06T14:02:45Z

This issue has been resolved in #95 and is therefore closed.

haowang-bioinfo added the enhancement and discussion labels on Dec 3, 2018

JonathanRob mentioned this issue on Mar 14, 2019 in "merge YamlExport branch with devel" #94 (Closed)

haowang-bioinfo mentioned this issue on Mar 23, 2019 in "feat: add metadata and miscellaneous adjustments" #95 (Merged)

haowang-bioinfo mentioned this issue on Apr 2, 2019 in "human v1.0.1" #97 (Merged)

haowang-bioinfo mentioned this issue on Sep 24, 2019 in "feat: enable lossless conversion between mat and yml model formats" #132 (Merged)

haowang-bioinfo closed this as completed on Apr 6, 2020

haowang-bioinfo mentioned this issue on May 10, 2020 in "YAML format issues in modelFiles/yml/HumanGEM.yml" #169 (Closed)

haowang-bioinfo mentioned this issue on Jun 23, 2020 in "name of folder with models" MetabolicAtlas/standard-GEM#4 (Closed)

haowang-bioinfo mentioned this issue on Jul 20, 2020 in "feat: yaml worflow" #173 (Merged)

haowang-bioinfo mentioned this issue on Jul 27, 2020 in "feat: addition of metadata section to the yaml file specification in RAVEN" SysBioChalmers/RAVEN#311 (Closed)
