meta issue for high-level error checking #110

ctb · 2019-03-09T15:27:28Z

see #103 (comment) for initial motivation --

I think we need a modular way to do high-level correctness checking.

e.g.,

if quantify is run, either some assembly thingy needs to be specified OR a reference transcriptome needs to be provided
if gene_trans_map is true, the gene trans map file should exist
if the reads aren't gzipped, we should flag that somewhere (ref: "get_data target assumes input data is gzipped", #30)

I don't think run_eelpond should have this error checking in it directly, tho! Maybe we could put in something that when a particular rule file is included, it has some high level checks that it runs, or maybe that should be connected in some way to the higher level workflows mentioned in pipeline_defaults.yaml?
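A rough sketch of what "modular" could look like, kept outside run_eelpond: each rule file registers small check functions, and a single runner collects their messages. Everything here is hypothetical (the `register_check` decorator, the `run_checks` runner, and the config keys `targets`, `assembly`, `reference`, `gene_trans_map_file`, and `reads` are made up for illustration and are not the actual eelpond config schema).

```python
import os

# Registry of high-level checks. Each rule file could append its own.
CHECKS = []


def register_check(func):
    """Decorator that adds a check function to the registry."""
    CHECKS.append(func)
    return func


@register_check
def quantify_needs_transcriptome(config):
    # Assumed keys: 'targets', 'assembly', 'reference' (illustrative only).
    if "quantify" in config.get("targets", []):
        if not (config.get("assembly") or config.get("reference")):
            return ["quantify needs an assembly step OR a reference transcriptome"]
    return []


@register_check
def gene_trans_map_exists(config):
    # Assumed key 'gene_trans_map_file' holds the path to the map file.
    if config.get("gene_trans_map"):
        path = config.get("gene_trans_map_file", "")
        if not os.path.exists(path):
            return [f"gene_trans_map is true but the file was not found: {path!r}"]
    return []


@register_check
def reads_are_gzipped(config):
    # Only inspects the file extension; it does not open the files.
    return [
        f"read file does not look gzipped: {read}"
        for read in config.get("reads", [])
        if not read.endswith(".gz")
    ]


def run_checks(config):
    """Run every registered check and return all error messages."""
    errors = []
    for check in CHECKS:
        errors.extend(check(config))
    return errors


if __name__ == "__main__":
    example = {"targets": ["quantify"], "reads": ["a.fq.gz", "b.fq"]}
    for message in run_checks(example):
        print("ERROR:", message)
```

Returning messages instead of raising on the first failure means the user sees every problem in one pass, and the caller decides whether to abort.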


bluegenes · 2019-03-14T18:12:57Z

I think in the eelpond_params section of the params.yml file, we can add a require parameter that describes the required rules. Include utility rules like get_data, etc. Need to have an "or" option in place though, for situations where either assemblyinput or assembly is required.
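One possible shape for the "or" option, as a sketch only: each entry in require is either a single rule name or a list of alternatives, and a list is satisfied when at least one of its members is enabled. The function name `missing_requirements` and this list-of-alternatives encoding are assumptions for illustration, not something that exists in the codebase.

```python
def missing_requirements(require, enabled_rules):
    """Return the unmet entries of `require` as readable strings.

    A plain string must be enabled; a list means "any one of these".
    """
    enabled = set(enabled_rules)
    missing = []
    for entry in require:
        alternatives = [entry] if isinstance(entry, str) else list(entry)
        if not enabled.intersection(alternatives):
            missing.append(" OR ".join(alternatives))
    return missing


# Hypothetical 'require' value, as it might look once loaded from params.yml.
require = ["get_data", ["assemblyinput", "assembly"]]

if __name__ == "__main__":
    for item in missing_requirements(require, ["get_data"]):
        print("missing required rule:", item)
```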

Not sure yet how to check if something has already been run (e.g. trimmomatic). Maybe don't check, but add a help section to the eelpond_params that has a brief description of the workflow & its required components. Would be helpful to run elvers examples/nema.yml assembly -h to return this help description to stdout.

bluegenes · 2019-03-17T22:16:34Z

the require idea outlined above would involve updating requirements with exact rules that exist (e.g. right now salmon requires either get_reference or trinity, but in the future, other assemblers may work).

To get around this, maybe we instead create input/output categories that go in each params.yml file. When running a workflow, we check that all inputs are satisfied, and if not, print a list of all rules or utilities that provide that output. For example, if we need "transcriptome", we have two rules that produce that, get_reference and trinity, and we can print a helpful message to suggest the user provide either rule.

something like this?

salmon:
  inputs:
    read:
      - raw
      - trimmed
    reference:
      - transcriptome
  outputs:
    read:
      - counts

deseq2:
  inputs:
    read:
      - counts
    reference:
      - transcriptome
  outputs:
    base:
      - diffexp
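A minimal sketch of how the check against these categories might work. It assumes the per-rule params have already been loaded into a dict, treats the items listed under one input category as alternatives (e.g. raw OR trimmed reads), and adds get_reference and trinity with only the transcriptome output mentioned above. The names `RULES`, `providers`, and `check_workflow` are hypothetical.

```python
# Rule specs as they might look after loading the params.yml files.
# Inputs for get_reference and trinity are left empty for brevity.
RULES = {
    "get_reference": {"inputs": {}, "outputs": {"reference": ["transcriptome"]}},
    "trinity": {"inputs": {}, "outputs": {"reference": ["transcriptome"]}},
    "salmon": {
        "inputs": {"read": ["raw", "trimmed"], "reference": ["transcriptome"]},
        "outputs": {"read": ["counts"]},
    },
    "deseq2": {
        "inputs": {"read": ["counts"], "reference": ["transcriptome"]},
        "outputs": {"base": ["diffexp"]},
    },
}


def providers(rules, category, item):
    """Names of all rules whose outputs include (category, item)."""
    return sorted(
        name
        for name, spec in rules.items()
        if item in spec.get("outputs", {}).get(category, [])
    )


def check_workflow(workflow, rules, provided=()):
    """Return one message per unsatisfied input category in the workflow.

    `provided` lists (category, item) pairs the user supplies directly.
    """
    available = set(provided)
    for name in workflow:
        for category, items in rules[name].get("outputs", {}).items():
            available.update((category, item) for item in items)

    messages = []
    for name in workflow:
        for category, items in rules[name].get("inputs", {}).items():
            # Assumption: any one listed item satisfies the category.
            if any((category, item) in available for item in items):
                continue
            options = sorted(
                {p for item in items for p in providers(rules, category, item)}
            )
            hint = ", ".join(options) if options else "no known rule"
            messages.append(
                f"{name} needs {category} ({' or '.join(items)}); provided by: {hint}"
            )
    return messages


if __name__ == "__main__":
    for message in check_workflow(["salmon"], RULES, provided=[("read", "raw")]):
        print(message)
```

Because the providers are looked up from the outputs at check time, adding another assembler later only means giving it a transcriptome output; no requirement lists need updating.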

bluegenes mentioned this issue Mar 12, 2019

fix assemblyinput for ppl specifying out_path #96

Merged

This was referenced Mar 15, 2019

do not require samples tsv for assembly-based workflows #82

Closed

only read in samples IF we need them #120

Merged

This was referenced Mar 17, 2019

[MRG] add config validation, better input file finding #122

Merged

'quantify' target is currently failing. #28

Closed
