Make ModGP Outputs into RO-Crates #1

Open
ErikKusch opened this issue Jan 24, 2024 · 4 comments

@ErikKusch

Improve FAIRness by packaging ModGP outputs into RO-Crates

@jgrieb

jgrieb commented Jan 24, 2024

Preliminary thoughts

RO-Crates provide a lightweight technology stack for implementing the FAIR Digital Object concept. They build on common web technologies: structured (meta)data using schema.org and extensions such as Bioschemas, and typed relationships via FAIR Signposting.

Data deposition

The generated outputs should be stored as RO-Crates, ideally in a sustainable data repository that ensures the long-term availability of the data, and made available to clients via the web. The data should receive a persistent identifier (PID). Some options could be:

  • github.com (however, it is run by a commercial provider, and long-term storage of large binary files might not be guaranteed)
  • a self-hosted solution (however, setting this up is out of scope for this hackathon)
  • rohub.org (some research is needed regarding the long-term perspective of this repository)

rohub.org seems to be the best option to quickly draft and publish an RO-Crate for one of the outputs generated by ModGP.

@jgrieb

jgrieb commented Jan 24, 2024

E.g. publishing via ROHub could result in the following RO-Crate metadata description:

{
  "@context": [
    "https://w3id.org/ro/crate/1.1/context",
    "https://w3id.org/ro/terms/earth-science#",
    {
      "description": "http://purl.org/dc/terms/description",
      "title": "http://purl.org/dc/terms/title",
      "creation_mode": "http://w3id.org/ro-id/rohub/model#creation_mode"
    }
  ],
  "@graph": [
    {
      "@type": "CreativeWork",
      "@id": "ro-crate-metadata.json",
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.1"
      },
      "about": {
        "@id": "./"
      }
    },
    {
      "@id": "./",
      "identifier": "https://w3id.org/ro-id/db91bd2f-2886-4078-90a3-0e5d21003b7e",
      "hasPart": [
        {
          "@id": "data%2F"
        },
        {
          "@id": "https://w3id.org/ro-id/db91bd2f-2886-4078-90a3-0e5d21003b7e/resources/62f1e81b-cd81-46ef-b092-8240da5d9c09"
        }
      ],
      "@type": [
        "Dataset"
      ],
      "creator": [
        "https://orcid.org/0000-0002-4984-7646"
      ],
      "author": [
        "https://orcid.org/0000-0002-4984-7646"
      ],
      "studySubject": [
        "http://eurovoc.europa.eu/632"
      ],
      "citeAs": "Erik Kusch. \"Lathyrus aphaca distribution.\" ROHub. Jan 24 ,2024. https://w3id.org/ro-id/db91bd2f-2886-4078-90a3-0e5d21003b7e.",
      "datePublished": "2024-01-24 09:35:26.012186+00:00",
      "dateCreated": "2024-01-24 09:35:26.012186+00:00",
      "dateModified": "2024-01-24 10:31:46.204306+00:00",
      "contributors": [],
      "name": "Lathyrus aphaca distribution",
      "contentSize": 0,
      "encodingFormat": "application/ld+json",
      "contentUrl": "https://api.rohub.org/api/ros/db91bd2f-2886-4078-90a3-0e5d21003b7e/crate/download/",
      "mainEntity": "Dataset",
      "keywords": [
        "SDM",
        "ModGP"
      ],
      "description": "ModGP output for Lathyrus aphaca",
      "https://w3id.org/ro/terms/earth-science#template": "https://w3id.org/ro/terms/earth-science#DataCentricResearchObjectTemplate",
      "modifiedTime": "2024-01-24 10:31:46.204306+00:00",
      "http://w3id.org/ro-id/rohub/model#creation_mode": "MANUAL"
    },
    {
      "@id": "biblio%2F",
      "@type": [
        "Dataset",
        "http://purl.org/wf4ever/wf4ever#Folder"
      ],
      "name": "biblio"
    },
    {
      "@id": "data%2F",
      "@type": [
        "Dataset",
        "http://purl.org/wf4ever/wf4ever#Folder"
      ],
      "name": "data"
    },
    {
      "@id": "metadata%2F",
      "@type": [
        "Dataset",
        "http://purl.org/wf4ever/wf4ever#Folder"
      ],
      "name": "metadata"
    },
    {
      "@id": "raw%20data%2F",
      "@type": [
        "Dataset",
        "http://purl.org/wf4ever/wf4ever#Folder"
      ],
      "name": "raw data"
    },
    {
      "@id": "https://w3id.org/ro-id/db91bd2f-2886-4078-90a3-0e5d21003b7e/resources/62f1e81b-cd81-46ef-b092-8240da5d9c09",
      "@type": [
        "File",
        "Dataset"
      ],
      "name": "Lathyrus_aphaca-Outputs.nc",
      "sdDatePublished": "2024-01-24 10:31:43.534408+00:00",
      "dateCreated": "2024-01-24 10:31:43.534408+00:00",
      "dateModified": "2024-01-24 10:31:46.089805+00:00",
      "contentUrl": "https://api.rohub.org/api/resources/62f1e81b-cd81-46ef-b092-8240da5d9c09/download/",
      "@reverse": {
        "hasPart": [
          {
            "@id": "data%2F"
          }
        ]
      },
      "contentSize": 50020602,
      "encodingFormat": "application/x-netcdf"
    },
    {
      "@id": "https://w3id.org/ro-id/users/https%3A//orcid.org/0000-0002-4984-7646",
      "email": "[email protected]",
      "@type": "agent"
    }
  ]
}

@jgrieb

jgrieb commented Jan 24, 2024

An alternative to publishing the RO-Crate directly in a repository like ROHub would be to have the ModGP script create RO-Crates as packaged .zip files. This mode of using RO-Crate is also called "attached mode" (see here). Afterwards, we would evaluate how well these RO-Crate packages can be uploaded, e.g. into ROHub.

The difference between the attached RO-Crate and the uploaded one is that, in the attached ("packaged") mode, the IRIs in the metadata file are relative paths within the local directory structure, whereas after uploading to a repository the IRIs should become web-resolvable URLs.
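
The distinction between relative and web-resolvable IRIs can be illustrated with a minimal Python sketch. It only resolves relative "@id" values against a base URL; this is not how ROHub actually assigns identifiers (the example above shows it minting its own resource UUIDs), and the base URL and the function names are hypothetical.

```python
from urllib.parse import urljoin, urlparse


def is_relative_iri(iri: str) -> bool:
    """Return True when the identifier has no URI scheme (a path inside the crate)."""
    return urlparse(iri).scheme == ""


def absolutize_ids(node, base_url: str):
    """Return a copy of a JSON-LD structure with every relative "@id" resolved against base_url."""
    if isinstance(node, dict):
        resolved = {}
        for key, value in node.items():
            if key == "@id" and isinstance(value, str) and is_relative_iri(value):
                resolved[key] = urljoin(base_url, value)
            else:
                resolved[key] = absolutize_ids(value, base_url)
        return resolved
    if isinstance(node, list):
        return [absolutize_ids(item, base_url) for item in node]
    return node


# Hypothetical base URL, used only for illustration.
BASE = "https://example.org/crates/lathyrus-aphaca/"
attached = [
    {"@id": "./", "hasPart": [{"@id": "Lathyrus_aphaca-Outputs.nc"}]},
    {"@id": "https://orcid.org/0000-0002-4984-7646", "name": "Erik Kusch"},
]
published = absolutize_ids(attached, BASE)
```

Identifiers that already carry a scheme, such as the ORCID, are left untouched, so only the local paths change between the two modes.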

A minimal representation of the ro-crate-metadata.json of the attached RO-Crate could be:

{
  "@context": [
    "https://w3id.org/ro/crate/1.1/context"
  ],
  "@graph": [
    {
      "@type": "CreativeWork",
      "@id": "ro-crate-metadata.json",
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.1"
      },
      "about": {
        "@id": "./"
      }
    },
    {
      "@id": "./",
      "hasPart": [
        {
          "@id": "Lathyrus_aphaca-Outputs.nc"
        }
      ],
      "@type": [
        "Dataset"
      ],
      "creator": [
        "https://orcid.org/0000-0002-4984-7646"
      ],
      "author": [
        "https://orcid.org/0000-0002-4984-7646"
      ],
      "studySubject": [
        "http://eurovoc.europa.eu/632"
      ],
      "datePublished": "2024-01-24 09:35:26.012186+00:00",
      "name": "Lathyrus aphaca distribution",
      "encodingFormat": "application/ld+json",
      "contentUrl": "https://api.rohub.org/api/ros/db91bd2f-2886-4078-90a3-0e5d21003b7e/crate/download/",
      "mainEntity": "Dataset",
      "keywords": [
        "SDM",
        "ModGP"
      ],
      "description": "ModGP output for Lathyrus aphaca"
    },
    {
      "@id": "Lathyrus_aphaca-Outputs.nc",
      "@type": [
        "Dataset",
        "File"
      ],
      ],
      "name": "Lathyrus_aphaca-Outputs.nc",
      "contentSize": 50020602,
      "encodingFormat": "application/x-netcdf"
    },
    {
      "@id": "https://orcid.org/0000-0002-4984-7646",
      "name": "Erik Kusch"
    }
  ]
}
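
Packaging such a directory as a .zip file could look roughly like the following Python sketch. The ModGP script itself is written in R, so this is only a language-agnostic illustration; the function name package_crate is hypothetical. It parses the metadata file first, so that malformed JSON (for example a trailing comma) is caught before the archive is written.

```python
import json
import zipfile
from pathlib import Path


def package_crate(crate_dir: Path, zip_path: Path) -> list[str]:
    """Zip a crate directory so that ro-crate-metadata.json sits at the archive root."""
    metadata_file = crate_dir / "ro-crate-metadata.json"
    if not metadata_file.is_file():
        raise FileNotFoundError(f"{metadata_file} is missing")
    # Fail early if the metadata is not valid JSON.
    json.loads(metadata_file.read_text(encoding="utf-8"))

    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(crate_dir.rglob("*")):
            # Skip directories, and the archive itself if it lives inside the crate.
            if not path.is_file() or path.resolve() == zip_path.resolve():
                continue
            # Store paths relative to the crate root, matching the relative "@id" values.
            archive.write(path, arcname=path.relative_to(crate_dir).as_posix())
        return archive.namelist()
```

Calling package_crate(Path("Lathyrus_aphaca"), Path("Lathyrus_aphaca.zip")) would return the list of archive member names, which can be compared against the hasPart entries.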

@jgrieb

jgrieb commented Jan 25, 2024

Update and provisional hackathon result

We have manually modeled an example of how the ModGP output should be stored as an RO-Crate: by simply adding an ro-crate-metadata.json file to the per-species output directory. The example ro-crate-metadata.json file can be found here: https://github.com/jgrieb/CWR-Hackathon/blob/ro-crate-manual-example/ModGP/example-output/Lathyrus_aphaca/ro-crate-metadata.json

Note that we have also modeled a simplified RO-Crate that represents the ModGP tool itself, so that it can be referenced from the provenance section of the output RO-Crate (the CreateAction section). The tool is modeled as a ComputationalWorkflow in line with the Bioschemas ComputationalWorkflow profile 1.0. This example can be found here: https://github.com/jgrieb/CWR-Hackathon/blob/ro-crate-manual-example/ModGP/tool-ro-crate-metadata.json

Further below, we provide some more documentation on the two example files.

Outlook

In order to publish the ModGP model and output data in a FAIR way, two steps are required:

  1. The R script that generates the output files after the computation for a given species must be modified to dynamically generate the ro-crate-metadata.json file, based on the manually created example. Afterwards, the complete RO-Crate (including the metadata and the data files themselves) should be uploaded and published automatically in the ROHub repository.

  2. When the work on ModGP itself is finished, the tool should be published in a FAIR way. In this case, this would mean uploading the model code as an RO-Crate to WorkflowHub. For this, the second example ro-crate-metadata.json mentioned above must be finalized (some metadata fields are still incomplete).
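
The dynamic generation in step 1 could be sketched as follows. This is an illustration in Python rather than the R code that would actually go into ModGP. The function name, the "Person" type on the creator entity and the file naming pattern (species name with underscores plus "-Outputs.nc", inferred from the single Lathyrus aphaca example) are assumptions. The upload to ROHub is not covered here.

```python
import json
from datetime import datetime, timezone


def build_output_metadata(species: str, creator_orcid: str, creator_name: str,
                          published: datetime) -> dict:
    """Build a minimal ro-crate-metadata.json structure for one species."""
    # Assumed naming pattern, e.g. "Lathyrus aphaca" -> "Lathyrus_aphaca-Outputs.nc".
    file_name = f"{species.replace(' ', '_')}-Outputs.nc"
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                "@id": "./",
                "@type": "Dataset",
                "name": f"{species} distribution",
                "description": f"ModGP output for {species}",
                "datePublished": published.isoformat(),
                "creator": {"@id": creator_orcid},
                "keywords": ["SDM", "ModGP"],
                "hasPart": [{"@id": file_name}],
            },
            {
                "@id": file_name,
                "@type": "File",
                "name": file_name,
                "encodingFormat": "application/x-netcdf",
            },
            # "Person" is an assumption; the example above leaves the type out.
            {"@id": creator_orcid, "@type": "Person", "name": creator_name},
        ],
    }


metadata = build_output_metadata(
    "Lathyrus aphaca",
    "https://orcid.org/0000-0002-4984-7646",
    "Erik Kusch",
    datetime(2024, 1, 24, tzinfo=timezone.utc),
)
metadata_json = json.dumps(metadata, indent=2)
```

Writing metadata_json to ro-crate-metadata.json in the species output directory would then turn that directory into a crate.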

Documentation on the two examples

Output dataset RO-Crate

  • Note that the hasPart section in this example only covers one file (Lathyrus_aphaca-Outputs.nc); in production, all generated output files must be added here
  • The taxonomic information in the dataset is reflected by adding the about statement, which points to a bioschemas:Taxon entity. This is presumably how Bioschemas will recommend linking a dataset to a taxon
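
Both points can be sketched together in Python: listing every output file in hasPart and linking the root dataset to a taxon entity through about. The function names and the taxon IRI passed in by the caller are hypothetical, and whether the bare "Taxon" term resolves correctly depends on the @context in use. The linked example file remains the reference for the actual modeling.

```python
from pathlib import Path
from urllib.parse import quote

# Media types by file extension; only the NetCDF type appears in the examples above.
MEDIA_TYPES = {".nc": "application/x-netcdf"}


def file_entities(crate_dir: Path) -> list[dict]:
    """Describe every file in the crate directory, except the metadata file, as a File entity."""
    entities = []
    for path in sorted(crate_dir.rglob("*")):
        if not path.is_file() or path.name == "ro-crate-metadata.json":
            continue
        relative = path.relative_to(crate_dir).as_posix()
        entity = {
            "@id": quote(relative),  # percent-encode spaces and similar characters
            "@type": "File",
            "name": path.name,
            "contentSize": path.stat().st_size,
        }
        media_type = MEDIA_TYPES.get(path.suffix.lower())
        if media_type:
            entity["encodingFormat"] = media_type
        entities.append(entity)
    return entities


def link_outputs(root: dict, files: list[dict], taxon_iri: str, scientific_name: str) -> list[dict]:
    """Reference all files and the taxon from the root dataset; return the entities to add to @graph."""
    root["hasPart"] = [{"@id": entity["@id"]} for entity in files]
    root["about"] = {"@id": taxon_iri}
    taxon = {"@id": taxon_iri, "@type": "Taxon", "name": scientific_name}
    return files + [taxon]
```

Scanning the directory instead of hard-coding file names keeps hasPart in sync with whatever a given ModGP run produces.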

ModGP ComputationalWorkflow

  • Missing required fields for the ComputationalWorkflow profile: sdPublisher, version
  • It would be advisable to also model the different inputs to the model (as in the example in the RO-Crate specification)
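
A small check for the missing fields could look like the Python sketch below. It only tests for the two properties named above; the full Bioschemas profile lists more requirements, which are not reproduced here. The function name is hypothetical.

```python
# Only the two fields named above; the full profile has further requirements.
REQUIRED_WORKFLOW_FIELDS = ("sdPublisher", "version")


def missing_workflow_fields(metadata: dict,
                            required: tuple = REQUIRED_WORKFLOW_FIELDS) -> dict[str, list[str]]:
    """Map the @id of each ComputationalWorkflow entity to the required fields it lacks."""
    missing = {}
    for entity in metadata.get("@graph", []):
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if "ComputationalWorkflow" not in types:
            continue
        # A field counts as missing when it is absent or empty.
        absent = [field for field in required if not entity.get(field)]
        if absent:
            missing[entity.get("@id", "<no @id>")] = absent
    return missing
```

Run against the tool's ro-crate-metadata.json after loading it with json.load, an empty result would indicate that both fields have been filled in.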
