-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix assemblyinput for ppl specifying out_path #96
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(see comment)
This PR has been reviewed, but the review needs a review before I can finish my review |
if assemblyfile: | ||
assert os.path.exists(assemblyfile), 'Error: cannot find input assembly at {}\n'.format(assemblyfile) | ||
sys.stderr.write('\tFound input assembly at {}\n'.format(assemblyfile)) | ||
assemblyfile = os.path.realpath(assemblyfile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charlesreid1 ah! here we go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm. may still have an issue finding it to begin with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but! this is a parameter that the user is specifying - I think we need to trust that they give us the correct path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about when you're running tests? e.g., testing the salmon rule requires an assembly file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I currently do os.chdir
into the test directory within the rule folder. I followed the snakemake-wrappers example and put some test data into the rule folder. Thinking now this isn't a great plan, because it will get large. Most programs can use trivial fq files, but salmon doesn't run, so I used these https://bitbucket.org/snakemake/snakemake-wrappers/src/5a5bd45590896a7c7ca00bd6d558cdf40bc78c20/bio/salmon/quant/test/?at=master
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if you want users running tests like this, but if I try running a test directly (not through pytest), I run into the problem of not finding assembly data:
$ ./run_eelpond rules/salmon/test/test.yml get_data
--------
checking for required files:
--------
/temp/eelpond/ep_utils/utils.py:34: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
yamlD = yaml.load(stream)
Traceback (most recent call last):
File "./run_eelpond", line 356, in <module>
sys.exit(main(args))
File "./run_eelpond", line 169, in main
configD, assemblyinput_ext = handle_assemblyinput(assembInput, configD)
File "/temp/eelpond/ep_utils/utils.py", line 80, in handle_assemblyinput
assert os.path.exists(assemblyfile), 'Error: cannot find input assembly at {}\n'.format(assemblyfile)
AssertionError: Error: cannot find input assembly at assembly/transcriptome.fasta
Note that if #116 (packaging elvers) is merged, the rules and their test data will be included in the manifest, so they'll be installed as part of installing the package. That means you could add some logic around finding the assembly files to either use the relative directory (if it exists) or try and put together an absolute directory (prefixed by the installation location).
(But then again, maybe having the user run tests directly (like the above) is weird - not sure what makes sense here and don't want to make extra work for you.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm I think adding some logic after #116 that makes this work would be nice, just to facilitate things, but not sure I care to support it just yet? Thoughts on the test data inclusion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the solution might be to use get_data
to download any data files that are not trivial? e.g. include trivial files, which should make the short
dryrun version of tests work. Then, for programs like salmon
, which need longer files to actually run, download the data for the long test (use a different yml
file).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something like this: single test
dir (rules/test
, with trivial fq.gz
and .fa
files. write these into short_test.yml
. Then upload larger test data, and provide a long_test.yml
with a tsv
containing links to the larger data. When testing, include short or long yml
as an --extra_config
.
With
out_path
, if the user is starting from anassemblyinput
point, we need to build the outdir before attempting to copy to it.