fix assemblyinput for ppl specifying out_path #96

bluegenes · 2019-03-01T05:07:44Z

When out_path is specified and the user starts from an assemblyinput point, we need to create the output directory before attempting to copy the assembly into it.
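A minimal sketch of the idea, not the actual change to rules/utils/assemblyinput.rule: create the output directory first, then copy. The function name `copy_assembly_to_outdir` and its arguments are hypothetical and used only for illustration.

```python
import os
import shutil


def copy_assembly_to_outdir(assemblyfile, out_path):
    """Copy an input assembly into out_path, creating out_path first.

    Hypothetical helper: the real rule may structure this differently.
    """
    # Create the output directory (and any missing parents).
    # exist_ok=True means an already-existing directory is not an error.
    os.makedirs(out_path, exist_ok=True)
    destination = os.path.join(out_path, os.path.basename(assemblyfile))
    # Copy only after the destination directory is known to exist.
    shutil.copy(assemblyfile, destination)
    return destination
```

Without the `os.makedirs` call, `shutil.copy` fails when `out_path` does not exist yet, which is the situation described above.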

charlesreid1

(see comment)

rules/utils/assemblyinput.rule

charlesreid1 · 2019-03-11T22:40:38Z

This PR has been reviewed, but the review needs a review before I can finish my review

bluegenes · 2019-03-14T21:19:02Z

ep_utils/utils.py

+    if assemblyfile:
+        assert os.path.exists(assemblyfile), 'Error: cannot find input assembly at {}\n'.format(assemblyfile)
+        sys.stderr.write('\tFound input assembly at {}\n'.format(assemblyfile))
+        assemblyfile = os.path.realpath(assemblyfile)


@charlesreid1 ah! here we go

hm. may still have an issue finding it to begin with?

but! this is a parameter that the user is specifying - I think we need to trust that they give us the correct path?

What about when you're running tests? e.g., testing the salmon rule requires an assembly file.

I currently do os.chdir into the test directory within the rule folder. I followed the snakemake-wrappers example and put some test data into the rule folder. Thinking now this isn't a great plan, because it will get large. Most programs can use trivial fq files, but salmon doesn't run, so I used these https://bitbucket.org/snakemake/snakemake-wrappers/src/5a5bd45590896a7c7ca00bd6d558cdf40bc78c20/bio/salmon/quant/test/?at=master

trivial trimmomatic test data we could use https://bitbucket.org/snakemake/snakemake-wrappers/src/5a5bd45590896a7c7ca00bd6d558cdf40bc78c20/bio/trimmomatic/pe/test/reads/a.1.fastq?at=master&fileviewer=file-view-default

Not sure if you want users running tests like this, but if I try running a test directly (not through pytest), I run into the problem of not finding assembly data:

$ ./run_eelpond rules/salmon/test/test.yml get_data
-------- checking for required files: --------
/temp/eelpond/ep_utils/utils.py:34: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  yamlD = yaml.load(stream)
Traceback (most recent call last):
  File "./run_eelpond", line 356, in <module>
    sys.exit(main(args))
  File "./run_eelpond", line 169, in main
    configD, assemblyinput_ext = handle_assemblyinput(assembInput, configD)
  File "/temp/eelpond/ep_utils/utils.py", line 80, in handle_assemblyinput
    assert os.path.exists(assemblyfile), 'Error: cannot find input assembly at {}\n'.format(assemblyfile)
AssertionError: Error: cannot find input assembly at assembly/transcriptome.fasta

Note that if #116 (packaging elvers) is merged, the rules and their test data will be included in the manifest, so they'll be installed as part of installing the package. That means you could add some logic around finding the assembly files to either use the relative directory (if it exists) or try and put together an absolute directory (prefixed by the installation location).

(But then again, maybe having the user run tests directly (like the above) is weird - not sure what makes sense here and don't want to make extra work for you.)
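The lookup logic suggested here could be sketched roughly as follows. This is an illustration under assumptions, not code from the PR: `resolve_assembly_path` and the `install_dir` argument are hypothetical, and how the installation location would actually be determined after #116 is not specified in this thread.

```python
import os


def resolve_assembly_path(assemblyfile, install_dir):
    """Return a usable path for assemblyfile, or raise FileNotFoundError.

    Hypothetical helper; install_dir stands in for the package
    installation location.
    """
    # 1. Prefer the path as given (relative to the current directory).
    if os.path.exists(assemblyfile):
        return os.path.realpath(assemblyfile)
    # 2. Fall back to the same relative path under the install location.
    candidate = os.path.join(install_dir, assemblyfile)
    if os.path.exists(candidate):
        return os.path.realpath(candidate)
    raise FileNotFoundError(
        'cannot find input assembly at {} or {}'.format(assemblyfile, candidate)
    )
```

With something like this, running a rule's test.yml from outside the rule's test directory would still find `assembly/transcriptome.fasta`, provided the test data is installed with the package.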

hmm I think adding some logic after #116 that makes this work would be nice, just to facilitate things, but not sure I care to support it just yet? Thoughts on the test data inclusion?

I think the solution might be to use get_data to download any data files that are not trivial? e.g. include trivial files, which should make the short dryrun version of tests work. Then, for programs like salmon, which need longer files to actually run, download the data for the long test (use a different yml file).

something like this: a single test dir (rules/test) with trivial fq.gz and .fa files, which get written into short_test.yml. Then upload larger test data, and provide a long_test.yml with a tsv containing links to the larger data. When testing, include the short or long yml as an --extra_config.
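As a rough sketch of the "long test" side of this proposal, the snippet below reads a TSV of links and reports which files still need downloading. The two-column layout (filename, URL), the `#` comment convention, and the name `missing_test_files` are all assumptions made for illustration; the thread does not define the TSV format, and this is not how get_data is implemented.

```python
import csv
import os


def missing_test_files(tsv_path, data_dir):
    """Return (filename, url) pairs listed in the TSV but absent from data_dir.

    Assumes a hypothetical two-column, tab-separated layout: filename, url.
    """
    missing = []
    with open(tsv_path, newline='') as handle:
        for row in csv.reader(handle, delimiter='\t'):
            # Skip blank lines, comment lines, and rows without a URL column.
            if len(row) < 2 or row[0].startswith('#'):
                continue
            filename, url = row[0], row[1]
            if not os.path.exists(os.path.join(data_dir, filename)):
                missing.append((filename, url))
    return missing
```

The short test would never call this, since its trivial files live in the repository; only the long test would download whatever this returns.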

fix assemblyinput for ppl specifying out_path

2e5ce1d

bluegenes assigned charlesreid1 Mar 8, 2019

bluegenes added the ready for review label Mar 8, 2019

charlesreid1 requested changes Mar 8, 2019

rules/utils/assemblyinput.rule

charlesreid1 self-requested a review March 11, 2019 22:39

bluegenes and others added 4 commits March 12, 2019 11:49

make output dirs when using out_path; use realpaths for assemblyinput

bca3070

propagate gene_trans_map info to deseq2

fd439a3

add thought on gtmap file check

aa07cf6

Merge branch 'master' into fix_ainput_err

ee8c949

bluegenes mentioned this pull request Mar 13, 2019

[MRG] Add rule testing framework #99

Merged

bluegenes commented Mar 14, 2019

charlesreid1 approved these changes Mar 14, 2019

bluegenes merged commit 83abf82 into master Mar 14, 2019

bluegenes deleted the fix_ainput_err branch March 14, 2019 22:21
