Start project with readme, environment, and initial snakemake workflow to execute sourmash commands #1
and maybe of interest to @mertcelebi...this is a demo of how I'm expecting code pushes and PRs to look for data analysis projects.
## Download sourmash databases & taxonomy files
##########################################################

rule download_genbank_bacteria_zip_k21:
I'm assuming this will change to pulling from S3 once we have the databases bucket figured out
yes! or be an ifelse/parameterized statement to avoid download. Although...download ended up being less horrible than I thought...I was using some weird links originally and it was taking FOREVER to get the data, but then I updated to these and it was 50mb/s and that was perfectly acceptable. See sourmash-bio/sourmash#2179 and sourmash-bio/sourmash#2136
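A minimal sketch of the "ifelse/parameterized" idea, written as a plain Python helper that a Snakefile could call. Everything here is an assumption for illustration: the config keys `s3_bucket`, `db_name`, and `download_url`, the bucket name, and the example.org URL are hypothetical and do not come from this PR.

```python
def database_source(config):
    """Return the location a database should be fetched from.

    All config keys used here are hypothetical placeholders.
    """
    bucket = config.get("s3_bucket")
    if bucket:
        # Prefer the shared bucket when one is configured.
        return f"s3://{bucket}/{config['db_name']}"
    # Otherwise fall back to the public download link.
    return config["download_url"]


# Hypothetical configurations, one with a bucket and one without.
with_bucket = database_source(
    {
        "s3_bucket": "example-bucket",
        "db_name": "db.zip",
        "download_url": "https://example.org/db.zip",
    }
)
without_bucket = database_source(
    {"db_name": "db.zip", "download_url": "https://example.org/db.zip"}
)
```

Keeping the choice in one function would let the download rule stay unchanged until the bucket exists.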
Looks great and the README looks excellent!
Awesome, thanks @elizabethmcd!
import argparse


def main():
    p = argparse.ArgumentParser()
Might wanna check click for creating command-line runnable scripts! It's slightly neater than argparse IMO. Or you can try to use the help option for add_argument in argparse to clean up the comments.
for @taylorreiter - we use click in many places - e.g. genome-grist link. I prefer argparse for complex situations (b/c it's been around longer, has more stackoverflow answers for oddball things) but click is way friendlier!
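For reference, a rough sketch of the `help` suggestion using argparse. The three positional arguments mirror the shell call `python scripts/sig_to_csv.py {wildcards.ksize} {input} {output}`; the argument names, the `int` type for `ksize`, and the description text are assumptions, not the contents of the actual script.

```python
import argparse


def build_parser():
    """Build a parser whose help strings replace explanatory comments."""
    p = argparse.ArgumentParser(
        description="Convert a sourmash sketch into a CSV file."
    )
    # Argument names are illustrative; only their order follows the shell call.
    p.add_argument("ksize", type=int, help="k-mer size of the sketch to export")
    p.add_argument("signature", help="path to the input sourmash signature file")
    p.add_argument("output", help="path of the CSV file to write")
    return p


def main(argv=None):
    """Parse arguments and return them; the conversion itself is omitted."""
    return build_parser().parse_args(argv)
```

With this in place, running the script with `-h` prints the argument descriptions, so they no longer need to live in comments.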
  - defaults
dependencies:
  - sourmash-minimal=4.4.3
  - pandas
we should probably pin versions of all of these (like you did in environment.yml)
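One way to spot unpinned entries, sketched as a small Python check over the dependency list shown in the diff. This is a simplification: it treats any spec without a version operator as unpinned and does not parse the full conda spec syntax. The function name `unpinned` is made up for this example.

```python
def unpinned(dependencies):
    """Return the dependency specs that carry no version constraint."""
    constraint_chars = "=<>!~"
    return [
        spec
        for spec in dependencies
        # Skip non-string entries such as a nested pip section.
        if isinstance(spec, str) and not any(c in spec for c in constraint_chars)
    ]


# The two dependencies visible in the diff above.
deps = ["sourmash-minimal=4.4.3", "pandas"]
missing_pins = unpinned(deps)
```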
    shell:'''
    python scripts/sig_to_csv.py {wildcards.ksize} {input} {output}
    '''

nit: this is more of an aesthetic thing, but I like the consistency of it. Using a linter (or a text editor plugin), we should make sure there is only a single trailing newline at EOF
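As an illustration of what such a linter rule does, a minimal Python sketch that normalizes the end of a file's text to exactly one newline. The function name `normalize_eof` is hypothetical, and real linters or editor plugins may handle more cases (for example, trailing whitespace).

```python
def normalize_eof(text: str) -> str:
    """Return text ending in exactly one newline; empty input stays empty."""
    if not text:
        return text
    # Drop every trailing newline, then add a single one back.
    return text.rstrip("\n") + "\n"
```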
This PR is the first PR for this repository. It does two things:
a. Snakefile: snakemake workflow that coordinates the execution of sourmash commands on metagenome assemblies. I have run this workflow and can confirm it runs correctly :) Eventually, I will add notebooks that will visualize the output of the workflow, but I wanted to have this portion reviewed before dumping a bunch more code.
b. environment.yml: specifies the run environment for the workflow. See README.md for more information.
c. envs/*yml: environments created and managed by the snakefile (see the conda: directive in each rule to know which environment is used by each step of the workflow).
d. scripts/: folder for auxiliary scripts executed by the snakemake workflow. In this case, it only includes sig_to_csv.py, a python script to convert a sourmash sketch into a csv file.
e. inputs/metadata.csv: metadata file encoding sample names. Used by the snakefile to determine file prefixes.
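To illustrate how sample names from a metadata file can drive file prefixes, here is a small Python sketch. The column name `sample` and the example rows are assumptions; the real inputs/metadata.csv may be laid out differently, and the Snakefile may read it another way.

```python
import csv
import io


def read_samples(handle, column="sample"):
    """Return the values of one column of a CSV file, in file order."""
    return [row[column] for row in csv.DictReader(handle)]


# Hypothetical file contents standing in for inputs/metadata.csv.
example = io.StringIO("sample\nmetagenome_A\nmetagenome_B\n")
samples = read_samples(example)
```

A list like `samples` is the kind of value a workflow can expand into per-sample output paths.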