meta issue for high-level error checking #110

Open
ctb opened this issue Mar 9, 2019 · 2 comments

ctb commented Mar 9, 2019

see #103 (comment) for initial motivation --

I think we need a modular way to do high-level correctness checking.

e.g.,

  • if quantify is run, either some assembly thingy needs to be specified OR a reference transcriptome needs to be provided
  • if gene_trans_map is true, the gene trans map file should exist
  • if the reads aren't gzipped, we should flag that somewhere (ref: "get_data target assumes input data is gzipped", #30)
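A minimal sketch of what modular checks like these could look like, kept outside run_eelpond. Everything here is an assumption for illustration: the config keys (`workflows`, `assembly`, `reference_transcriptome`, `gene_trans_map_file`, `reads`) and the function names are hypothetical, not the project's actual schema. Each check takes the config and returns a list of error messages, so new checks can be added without touching the runner.

```python
import os

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def is_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def check_quantify(config):
    # quantify needs an assembly step OR a reference transcriptome
    if "quantify" not in config.get("workflows", []):
        return []
    if config.get("assembly") or config.get("reference_transcriptome"):
        return []
    return ["quantify needs an assembly step or a reference transcriptome"]


def check_gene_trans_map(config):
    # if gene_trans_map is true, the map file should exist
    if not config.get("gene_trans_map"):
        return []
    path = config.get("gene_trans_map_file", "")
    if os.path.isfile(path):
        return []
    return [f"gene_trans_map is true but file not found: {path!r}"]


def check_reads_gzipped(config):
    # flag read files that exist but are not gzip-compressed
    return [
        f"reads are not gzipped: {path}"
        for path in config.get("reads", [])
        if os.path.isfile(path) and not is_gzipped(path)
    ]


# Hypothetical registry; a rule file could append its own checks here.
CHECKS = [check_quantify, check_gene_trans_map, check_reads_gzipped]


def run_checks(config, checks=CHECKS):
    """Run every check and collect all error messages."""
    errors = []
    for check in checks:
        errors.extend(check(config))
    return errors
```

Calling `run_checks(config)` before launching the workflow would report all problems at once instead of stopping at the first one.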

I don't think run_eelpond should have this error checking in it directly, tho! Maybe we could put in something that when a particular rule file is included, it has some high level checks that it runs, or maybe that should be connected in some way to the higher level workflows mentioned in pipeline_defaults.yaml?

bluegenes commented Mar 14, 2019

I think in the eelpond_params section of the params.yml file, we can add a require parameter that describes the required rules. Include utility rules like get_data, etc. We need an "or" option in place though, for situations where either assemblyinput or assembly is required.
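One possible way to express the "or" option, as a rough sketch: each entry in require is either a single rule name or a list of alternatives, any one of which satisfies the entry. The encoding and the function name `missing_requirements` are assumptions for illustration, not an existing part of eelpond.

```python
def missing_requirements(require, selected_rules):
    """Return the requirement entries that no selected rule satisfies.

    `require` is a list whose entries are either a rule name (str) or a
    list of alternative rule names (an "or" group).
    """
    selected = set(selected_rules)
    missing = []
    for entry in require:
        # Normalize a single name into a one-element "or" group.
        alternatives = [entry] if isinstance(entry, str) else list(entry)
        if not selected.intersection(alternatives):
            missing.append(alternatives)
    return missing


# Hypothetical require list: get_data AND (assemblyinput OR assembly)
require = ["get_data", ["assemblyinput", "assembly"]]
for group in missing_requirements(require, ["get_data"]):
    print("missing one of: " + ", ".join(group))
```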

Not sure yet how to check if something has already been run (e.g. trimmomatic). Maybe don't check, but add a help section to the eelpond_params that has a brief description of the workflow & its required components. Would be helpful to run elvers examples/nema.yml assembly -h to return this help description to stdout.

@bluegenes

The require idea outlined above would involve updating requirements with the exact rules that exist (e.g. right now salmon requires either get_reference or trinity, but in the future, other assemblers may work).

To get around this, maybe we instead create input/output categories that go in each params.yml file. When running a workflow, we check that all inputs are satisfied, and if not, print a list of all rules or utilities that provide that output. For example, if we need "transcriptome", we have two rules that produce it, get_reference and trinity, and we can print a helpful message suggesting the user include either rule.

something like this?

salmon:
  inputs:
    read:
      - raw
      - trimmed
    reference:
      - transcriptome
  outputs:
    read:
      - counts
deseq2:
  inputs:
    read:
      - counts
    reference:
      - transcriptome
  outputs:
    base:
      - diffexp