Adding kneaddata pipeline #4
Changes from 5 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
# ----------------------------------------------------------------------------- | ||
# Copyright (c) 2014--, The Qiita Development Team. | ||
# | ||
# Distributed under the terms of the BSD 3-clause License. | ||
# | ||
# The full license is in the file LICENSE, distributed with this software. | ||
# ----------------------------------------------------------------------------- | ||
|
||
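from qiita_client import QiitaPlugin, QiitaCommand | ||
|
||
# assumes this module sits in the same package as kneaddata.py | ||
from .kneaddata import kneaddata | ||
|
||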
# Initialize the plugin | ||
plugin = QiitaPlugin( | ||
'KneadData', '0.5.1', 'KneadData is a tool designed to perform quality ' | ||
'control on metagenomic and metatranscriptomic sequencing data, ' | ||
'especially data from microbiome experiments.') | ||
|
||
# Define the KneadData command | ||
req_params = {'input': ('artifact', ['per_sample_FASTQ'])} | ||
Review comment: How do we specify that it needs forward and/or reverse FASTQs? Reply: The code will always return all the available ones; in your code you need to check whether they exist, for example this. Reply: OK, I get it. Currently this check occurs later on in processing. |
||
opt_params = { | ||
# there are other parameters not included that will be ignored in this | ||
# configuration as we assume that the correct values were set in the env | ||
# by the admin installing the tools: | ||
# trimmomatic | ||
# bowtie2 | ||
|
||
# --input # input FASTQ file (add a second argument instance to run with paired input files) | ||
# --output # directory to write output files | ||
# --output-prefix # prefix for all output files [ DEFAULT : $SAMPLE_kneaddata ] | ||
# --log # filepath for log [ DEFAULT : $OUTPUT_DIR/$SAMPLE_kneaddata.log ] | ||
# --trimmomatic # path to trimmomatic executable | ||
# --bowtie2 # path to bowtie executable | ||
# --bmtagger # path to bmtagger executable | ||
# --trf # path to TRF executable | ||
Review comment: Not sure how these executable paths get specified. I assume that's what lines 19-23 are referring to, and that we just need to have an executable in the $PATH? |
||
'reference-db': ['choice:["human_genome"]', 'human_genome'], # ref db | ||
Review comment: Here we may want to specify additional reference DBs by including multiple iterations of this flag, one per reference (examples might include PhiX + Human + Mouse). How can we make that work? |
||
'bypass-trim': ['bool', 'False'], # bypass the trim step | ||
'threads': ['integer', '1'], # threads to run | ||
'processes': ['integer', '1'], # processes to run | ||
'quality-scores': ['choice:["phred33","phred64"]', 'phred33'], # quality mapping | ||
'run-bmtagger': ['bool', 'False'], # run BMTagger instead of Bowtie2 | ||
'run-trf': ['bool', 'False'], # run TRF repeat finder tool | ||
'run-fastqc-start': ['bool', 'True'], # run FastQC on original data | ||
'run-fastqc-end': ['bool', 'True'], # run FastQC on filtered data | ||
'store-temp-output': ['bool', 'False'], # store temp output files | ||
'log-level': ['choice:["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]', | ||
'DEBUG'], | ||
|
||
# Trimmomatic options | ||
'max-memory': ['integer', '500'], # max memory in mb [ DEFAULT : 500 ] | ||
'trimmomatic-options': ['string', 'ILLUMINACLIP:$trimmomatic/adapters/' | ||
Review comment: This depends on a hardcoded filepath. How can we specify this given this option formatting? |
||
'TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 ' | ||
'SLIDINGWINDOW:4:15 MINLEN:36'], | ||
|
||
# Bowtie2 options | ||
'bowtie2-options': ['string', '--very-sensitive'], | ||
|
||
# BMTagger options | ||
|
||
# TRF options | ||
'match': ['integer', '2'], # matching weight | ||
'mismatch': ['integer', '7'], # mismatching penalty | ||
'delta': ['integer', '7'], # indel penalty | ||
'pm': ['integer', '80'], # match probability | ||
'pi': ['integer', '10'], # indel probability | ||
'minscore': ['integer', '50'], # minimum alignment score to report | ||
'maxperiod': ['integer', '500'] # maximum period size to report | ||
|
||
# FastQC options | ||
} | ||
outputs = {'per_sample_FASTQ': 'per_sample_FASTQ'} | ||
Review comment: KneadData should produce a lot of outputs, and the quantity will depend on the other options specified (e.g. if you provide multiple reference DBs, or skip FastQC). How can we deal with this? |
||
dflt_param_set = { | ||
'Defaults': { | ||
Review comment: This will need to be updated accordingly as the parameters above get removed. Reply: Yup. |
||
'reference-db': 'human_genome', 'bypass-trim': False, 'threads': 1, | ||
'processes': 1, 'quality-scores': 'phred33', 'run-bmtagger': False, | ||
'run-trf': False, 'run-fastqc-start': True, 'run-fastqc-end': True, | ||
'store-temp-output': False, 'log-level': 'DEBUG', 'max-memory': 500, | ||
'trimmomatic-options': 'ILLUMINACLIP:$trimmomatic/adapters/' | ||
'TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 ' | ||
'SLIDINGWINDOW:4:15 MINLEN:36', | ||
'bowtie2-options': '--very-sensitive', 'match': 2, 'mismatch': 7, | ||
'delta': 7, 'pm': 80, 'pi': 10, 'minscore': 50, 'maxperiod': 500} | ||
} | ||
kneaddata_cmd = QiitaCommand( | ||
"KneadData", "Sequence QC", kneaddata, req_params, opt_params, | ||
outputs, dflt_param_set) | ||
plugin.register_command(kneaddata_cmd) |
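
The review note that the default parameter set must be updated whenever `opt_params` changes could be sidestepped by deriving the defaults from the option table instead of repeating them. The sketch below is not part of this PR: `cast_default` and `defaults_from_opt_params` are hypothetical helpers, and it assumes every option is a `[type, default]` pair with a string default, as in the diff above.

```python
def cast_default(type_spec, value):
    """Convert a string default into a Python value based on its declared type."""
    if type_spec == 'bool':
        return value == 'True'
    if type_spec == 'integer':
        return int(value)
    # 'string' and 'choice:[...]' defaults are kept as plain strings
    return value


def defaults_from_opt_params(opt_params):
    """Build a default parameter set from {name: [type, default]} entries."""
    return {name: cast_default(type_spec, default)
            for name, (type_spec, default) in opt_params.items()}


# A shortened copy of the option table, for illustration only
opt_params = {
    'reference-db': ['choice:["human_genome"]', 'human_genome'],
    'bypass-trim': ['bool', 'False'],
    'threads': ['integer', '1'],
    'max-memory': ['integer', '500'],
}
dflt_param_set = {'Defaults': defaults_from_opt_params(opt_params)}
```

With this approach, removing or adding an entry in `opt_params` changes the `Defaults` set automatically, and the typed defaults cannot drift from the declared ones.
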
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
# ----------------------------------------------------------------------------- | ||
# Copyright (c) 2014--, The Qiita Development Team. | ||
# | ||
# Distributed under the terms of the BSD 3-clause License. | ||
# | ||
# The full license is in the file LICENSE, distributed with this software. | ||
# ----------------------------------------------------------------------------- | ||
|
||
from os.path import basename, join | ||
|
||
from future.utils import viewitems | ||
import pandas as pd | ||
|
||
Review comment: Remove extra line. |
||
|
||
|
||
def kneaddata(qclient, job_id, parameters, out_dir): | ||
"""Run kneaddata with the given parameters | ||
|
||
Parameters | ||
---------- | ||
qclient : tgp.qiita_client.QiitaClient | ||
The Qiita server client | ||
job_id : str | ||
The job id | ||
parameters : dict | ||
The parameter values to run kneaddata | ||
out_dir : str | ||
The path to the job's output directory | ||
|
||
Returns | ||
------- | ||
bool, list, str | ||
The results of the job | ||
""" | ||
# Step 1 get the rest of the information needed to run kneaddata | ||
qclient.update_job_step(job_id, "Step 1 of 3: Collecting information") | ||
||
artifact_id = parameters['input'] | ||
|
||
# Get the artifact filepath information | ||
artifact_info = qclient.get("/qiita_db/artifacts/%s/" % artifact_id) | ||
fps = artifact_info['files'] | ||
|
||
# Get the artifact type | ||
artifact_type = artifact_info['type'] | ||
|
||
# Get the artifact metadata | ||
prep_info = qclient.get('/qiita_db/prep_template/%s/' | ||
% artifact_info['prep_information'][0]) | ||
qiime_map = prep_info['qiime-map'] | ||
|
||
# Step 2 generate the kneaddata command: TODO | ||
qclient.update_job_step(job_id, "Step 2 of 3: Generating kneaddata command") | ||
|
||
# Step 3 execute kneaddata: TODO | ||
qclient.update_job_step(job_id, "Step 3 of 3: Executing kneaddata") | ||
|
||
artifacts_info = [] | ||
|
||
return True, artifacts_info, "" |
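
Steps 2 and 3 are still placeholders in this diff. As a rough sketch of how Step 2 might look, the code below checks which read files exist and then builds an argument list. Everything here is an assumption rather than the PR's implementation: `split_read_files` and `generate_kneaddata_command` are hypothetical names, the `raw_forward_seqs` / `raw_reverse_seqs` keys are an assumed layout for `artifact_info['files']`, and it assumes each parameter name maps to a kneaddata flag of the same name, which would need checking against the tool's help output. It also covers the reviewer's idea of repeating a flag once per reference DB by accepting a list value.

```python
def split_read_files(files):
    """Return (forward, reverse) file lists; reverse may be empty.

    `files` is assumed to map a filepath type to a list of paths.
    """
    forward = sorted(files.get('raw_forward_seqs', []))
    reverse = sorted(files.get('raw_reverse_seqs', []))
    if not forward:
        raise ValueError('No forward reads found in the artifact')
    if reverse and len(reverse) != len(forward):
        raise ValueError('Forward and reverse read counts differ')
    return forward, reverse


def generate_kneaddata_command(forward_fp, reverse_fp, out_dir, parameters):
    """Return the kneaddata command line as a list of arguments."""
    cmd = ['kneaddata', '--input', forward_fp]
    if reverse_fp is not None:
        # a second --input instance runs kneaddata on paired files
        cmd.extend(['--input', reverse_fp])
    cmd.extend(['--output', out_dir])

    for name in sorted(parameters):
        value = parameters[name]
        flag = '--%s' % name
        if isinstance(value, bool):
            # boolean options are bare flags, emitted only when enabled
            if value:
                cmd.append(flag)
        elif isinstance(value, (list, tuple)):
            # repeat the flag once per value, e.g. several reference DBs
            for item in value:
                cmd.extend([flag, str(item)])
        else:
            cmd.extend([flag, str(value)])
    return cmd


forward, reverse = split_read_files({'raw_forward_seqs': ['s1.fastq']})
cmd = generate_kneaddata_command(
    forward[0], reverse[0] if reverse else None, 'out',
    {'reference-db': ['human_genome', 'phix'], 'threads': 2,
     'bypass-trim': True, 'run-trf': False})
```

Returning a list rather than a single string keeps multi-word values such as the Trimmomatic options intact as one argument when the list is passed to `subprocess`.
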
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# ----------------------------------------------------------------------------- | ||
# Copyright (c) 2014--, The Qiita Development Team. | ||
# | ||
# Distributed under the terms of the BSD 3-clause License. | ||
# | ||
# The full license is in the file LICENSE, distributed with this software. | ||
# ----------------------------------------------------------------------------- |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
# ----------------------------------------------------------------------------- | ||
# Copyright (c) 2014--, The Qiita Development Team. | ||
# | ||
# Distributed under the terms of the BSD 3-clause License. | ||
# | ||
# The full license is in the file LICENSE, distributed with this software. | ||
# ----------------------------------------------------------------------------- | ||
|
||
from unittest import TestCase, main | ||
from os import close, environ | ||
from tempfile import mkstemp | ||
from json import dumps | ||
|
||
from qiita_client import QiitaClient | ||
|
||
from qp_shotgun.humann2.humann2 import ( | ||
get_sample_names_by_run_prefix) | ||
|
||
from qp_shotgun.kneaddata.kneaddata import kneaddata | ||
|
||
|
||
CLIENT_ID = '19ndkO3oMKsoChjVVWluF7QkxHRfYhTKSFbAVt8IhK7gZgDaO4' | ||
CLIENT_SECRET = ('J7FfQ7CQdOxuKhQAf1eoGgBAE81Ns8Gu3EKaWFm3IO2JKh' | ||
'AmmCWZuabe0O5Mp28s1') | ||
|
||
|
||
class KneaddataTests(TestCase): | ||
@classmethod | ||
def setUpClass(cls): | ||
server_cert = environ.get('QIITA_SERVER_CERT', None) | ||
cls.qclient = QiitaClient("https://localhost:21174", CLIENT_ID, | ||
CLIENT_SECRET, server_cert=server_cert) | ||
cls.params = {} | ||
cls._clean_up_files = [] | ||
|
||
@classmethod | ||
def tearDownClass(cls): | ||
cls.qclient.post('/apitest/reset/') | ||
|
||
|
||
|
||
MAPPING_FILE = ( | ||
"#SampleID\tplatform\tbarcode\texperiment_design_description\t" | ||
"library_construction_protocol\tcenter_name\tprimer\trun_prefix\t" | ||
"instrument_model\tDescription\n" | ||
"SKB7.640196\tILLUMINA\tA\tA\tA\tANL\tA\ts3\tIllumina MiSeq\tdesc1\n" | ||
"SKB8.640193\tILLUMINA\tA\tA\tA\tANL\tA\ts1\tIllumina MiSeq\tdesc2\n" | ||
"SKD8.640184\tILLUMINA\tA\tA\tA\tANL\tA\ts2\tIllumina MiSeq\tdesc3\n" | ||
) | ||
|
||
MAPPING_FILE_2 = ( | ||
"#SampleID\tplatform\tbarcode\texperiment_design_description\t" | ||
"library_construction_protocol\tcenter_name\tprimer\t" | ||
"run_prefix\tinstrument_model\tDescription\n" | ||
"SKB7.640196\tILLUMINA\tA\tA\tA\tANL\tA\ts3\tIllumina MiSeq\tdesc1\n" | ||
"SKB8.640193\tILLUMINA\tA\tA\tA\tANL\tA\ts1\tIllumina MiSeq\tdesc2\n" | ||
"SKD8.640184\tILLUMINA\tA\tA\tA\tANL\tA\ts1\tIllumina MiSeq\tdesc3\n" | ||
) | ||
|
||
if __name__ == '__main__': | ||
main() |
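
The two mapping fixtures differ only in that `MAPPING_FILE_2` repeats the run prefix `s1`, which suggests a test for rejecting ambiguous prefixes. The sketch below shows one way such a check could work. `sample_names_by_run_prefix` is a hypothetical stand-in written for illustration; it is not the `get_sample_names_by_run_prefix` imported from the HUMAnN2 module, whose behaviour is not shown in this diff.

```python
import csv
import io
from collections import Counter


def sample_names_by_run_prefix(mapping_text):
    """Map run_prefix -> sample id; raise ValueError on repeated prefixes."""
    reader = csv.DictReader(io.StringIO(mapping_text), delimiter='\t')
    rows = list(reader)
    counts = Counter(row['run_prefix'] for row in rows)
    repeated = sorted(p for p, n in counts.items() if n > 1)
    if repeated:
        raise ValueError(
            'Repeated run_prefix values: %s' % ', '.join(repeated))
    return {row['run_prefix']: row['#SampleID'] for row in rows}


# Reduced mapping file with only the two columns the check needs
mapping = (
    "#SampleID\trun_prefix\n"
    "SKB7.640196\ts3\n"
    "SKB8.640193\ts1\n"
)
result = sample_names_by_run_prefix(mapping)
```

A test built on the second fixture would then assert that the call raises `ValueError` instead of returning a mapping.
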
Review comment: I'm not sure this is going to work. Each repository should have a single "QiitaPlugin" instance. If we want multiple "QiitaPlugin" instances, we need a configure script and a start_* script for each of them, in which case I don't see the benefit of keeping all these tools in a single repository. I think this can be added as a command to the shotgun plugin.