Adding kneaddata pipeline #4

Closed
wants to merge 69 commits into from
Changes from 5 commits
Commits
2ac9903
Initial creation of blank kneaddata scripts
Sep 28, 2016
20495f5
Merge branch 'adding-humann2-pipeline' into adding-kneaddata-pipeline
Sep 29, 2016
b142aa8
Created empty kneaddata command
Sep 29, 2016
6ff7909
Added kneaddata import to __init__.py
Sep 29, 2016
7ee38b6
Added kneaddata default params to __init__.py
Sep 29, 2016
c12f0d0
Merging Antonio's newest pull request
Sep 29, 2016
370611b
adding kneaddata functionality
Sep 29, 2016
6715d47
Merge branch 'master' of https://github.com/qiita-spots/qp-shotgun in…
Sep 29, 2016
8b5abb3
Added method for matching reads and run prefixes
Sep 29, 2016
978f613
Added tests for make_read_pairs_per_sample
Sep 29, 2016
e65adba
Added generate_kneaddata_commands
Sep 30, 2016
d306b90
Added tests for generate_kneaddata_commands
Sep 30, 2016
4f1cbf4
commented out unused parameter for running TRF
Sep 30, 2016
85a15ee
Fixed flake8 errors
Sep 30, 2016
3c2bbda
Added kneaddata to setup
Sep 30, 2016
caf10fd
Added kneaddata plugin
Sep 30, 2016
38a3724
Typos and flake8
Sep 30, 2016
8aa7664
Fixed plugin name collision
Sep 30, 2016
0415af8
Fixed error in generate_kneaddata_commands
Sep 30, 2016
e7f42ef
Fixed flake8 errors
Sep 30, 2016
47fb1cc
Fixing unit tests
Sep 30, 2016
55c7cb2
Commented out unused commands for flake8
Sep 30, 2016
e6a41a5
Fixing flake8 errors and command tests
Sep 30, 2016
7ba8411
Fixing flake8 errors and command tests
Sep 30, 2016
2a718f5
Moved param string formatting to function
Sep 30, 2016
c24774d
Fixed param string formatting
Sep 30, 2016
c6e7632
Fixed typo
Sep 30, 2016
3817629
Added test for format_kneaddata_params
Sep 30, 2016
1c00529
Fixing flake8 errors
Sep 30, 2016
726c580
Fixing typos
Sep 30, 2016
5c9dade
Fixing typos
Sep 30, 2016
7536a5f
Fixed flake8 errors
Sep 30, 2016
b6452a3
Fixed flake8 errors
Sep 30, 2016
6bacd88
Fixed merge of antonios humann2 pull request
Oct 3, 2016
2e953f1
Added install info for FastQC and KneadData
Oct 3, 2016
25f47a5
Fixed dependency link?
Oct 3, 2016
0d94279
Merge branch 'humann2-full-pipeline' into adding-kneaddata-pipeline
Oct 3, 2016
2b3abf7
Fixed typo
Oct 3, 2016
a86586d
Upgrade pip
Oct 3, 2016
7b18a9f
Trying to process dependency links
Oct 3, 2016
4d6498d
Declared egg ver in dep link
Oct 3, 2016
87f1caf
Added trimmomatic and bowtie2 install code
Oct 11, 2016
416f407
Adding Trimmomatic install
Oct 16, 2016
13fdbb3
Merged with Antonio's most recent PR
Oct 16, 2016
31bbe78
changed kneaddata dependency install to .travis
Oct 16, 2016
440bac9
Commented out Metaphlan install for travis testing
Oct 16, 2016
4f56331
Fixed bowtie2 conda install
Oct 16, 2016
74ad741
Fixed bowtie2 conda install
Oct 16, 2016
5dea337
Adding kneaddata functionality
Oct 17, 2016
482703d
Adding tests for kneaddata functionality
Oct 17, 2016
36a6f31
Updating Trimmomatic install info
Oct 17, 2016
6b5bb74
Fixed KneadData execution
Oct 17, 2016
9c6635a
Fixed KneadData execution
Oct 17, 2016
6d03d18
Added some more fastq test files
Oct 17, 2016
af9278a
Fixed local db reference
Oct 17, 2016
8b5fc09
Flake8
Oct 17, 2016
a22922d
hiding humann2 tests from nose
Oct 17, 2016
cd5f23f
Fixed to handle single read case in artifact generation
Oct 18, 2016
8457f93
Fixed merge conflicts with master
Oct 18, 2016
6f0f4a6
Updated unit tests
Oct 18, 2016
830bd81
Addressing @wasade's comments
Oct 18, 2016
6faf771
Please work jarvis
Oct 19, 2016
04533eb
Please work jarvis
Oct 19, 2016
d0ec922
flake8
Oct 19, 2016
1448263
Fixed merge conflicts with upstream master
Oct 19, 2016
40e9ed7
Adding some comments about read pairing
Oct 20, 2016
5167ad4
Removing humann2 test masking from .travis.yml
Oct 20, 2016
d76b614
flake8
Oct 20, 2016
ea6dfff
Merging Antonios humann2 code
Oct 25, 2016
3 changes: 2 additions & 1 deletion qp_shotgun/__init__.py
@@ -9,8 +9,9 @@
from qiita_client import QiitaPlugin, QiitaCommand

from .humann2.humann2 import humann2
+from .kneaddata.kneaddata import kneaddata

-__all__ = ['humann2']
+__all__ = ['humann2', 'kneaddata']


# Initialize the plugin
84 changes: 84 additions & 0 deletions qp_shotgun/kneaddata/__init__.py
@@ -0,0 +1,84 @@
# -----------------------------------------------------------------------------
# Copyright (c) 2014--, The Qiita Development Team.
#
# Distributed under the terms of the BSD 3-clause License.
#
# The full license is in the file LICENSE, distributed with this software.
# -----------------------------------------------------------------------------

from qiita_client import QiitaPlugin, QiitaCommand

from .kneaddata import kneaddata


# Initialize the plugin
plugin = QiitaPlugin(
Contributor

I'm not sure if this is going to work. Each repository should have a single "QiitaPlugin" instance. In case that we want to have multiple "QiitaPlugin" instances we need a configure and a start_* script for each of them, in which case I don't see the benefit of having all these tools in a single repository. I think this can be added as a command to the shotgun plugin.

    'KneadData', '0.5.1', 'KneadData is a tool designed to perform quality '
    'control on metagenomic and metatranscriptomic sequencing data, '
    'especially data from microbiome experiments.')

# Define the KneadData command
req_params = {'input': ('artifact', ['per_sample_FASTQ'])}
Contributor Author

How do we specify that it needs forward and/or reverse FASTQs?

Member

The code will always return all the available ones; in your code you need to check whether they exist, for example this.

Contributor Author

Ok I get it... Currently this check occurs later on in processing.
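
The check discussed in this thread could look roughly like the sketch below. It assumes the artifact's `files` entry is a dict mapping a filepath type to a list of paths, and that the read types are called `raw_forward_seqs` and `raw_reverse_seqs`. Both the shape and the type names are assumptions, not something this diff confirms, and the helper name `split_read_files` is hypothetical.

```python
def split_read_files(files):
    """Return (forward, reverse) path lists from an artifact 'files' dict.

    Assumption: `files` maps a filepath type to a list of paths and the
    read types are named 'raw_forward_seqs' and 'raw_reverse_seqs'.
    """
    forward = sorted(files.get('raw_forward_seqs', []))
    reverse = sorted(files.get('raw_reverse_seqs', []))

    if not forward:
        raise ValueError('No forward reads found in the artifact')
    # Reverse reads are optional, but when present they must pair up 1:1
    if reverse and len(reverse) != len(forward):
        raise ValueError('Found %d forward but %d reverse read files'
                         % (len(forward), len(reverse)))
    return forward, reverse


# Single-end case: only forward reads are available
fwd, rev = split_read_files({'raw_forward_seqs': ['s1_R1.fastq.gz']})
print(fwd, rev)
```

Returning an empty reverse list, rather than raising, keeps the single-end case a normal code path that later steps can branch on.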

opt_params = {
    # there are other parameters not included that will be ignored in this
    # configuration as we assume that the correct values were set in the env
    # by the admin installing the tools:
    # trimmomatic
    # bowtie2

    # --input # input FASTQ file (add a second argument instance to run with paired input files)
    # --output # directory to write output files
    # --output-prefix # prefix for all output files [ DEFAULT : $SAMPLE_kneaddata ]
    # --log # filepath for log [ DEFAULT : $OUTPUT_DIR/$SAMPLE_kneaddata.log ]
    # --trimmomatic # path to trimmomatic executable
    # --bowtie2 # path to bowtie2 executable
    # --bmtagger # path to bmtagger executable
    # --trf # path to TRF executable
Contributor Author

Not sure how these executable paths get specified. I assume that's what line 19-23 are referring to, and that we just need to have an executable in the $PATH?
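
If the answer is that the tools only need to be on `$PATH`, a plugin could verify that up front instead of failing midway through a job. The sketch below uses the standard library's `shutil.which`. The helper name is hypothetical, and the executable names are assumptions: Trimmomatic in particular is often shipped as a jar, so a `trimmomatic` wrapper on `$PATH` may not exist on every install.

```python
from shutil import which


def find_missing_executables(names):
    """Return the names that cannot be resolved on the current $PATH."""
    return [name for name in names if which(name) is None]


# Assumed executable names; adjust to whatever the admin actually installed
missing = find_missing_executables(['kneaddata', 'bowtie2', 'trimmomatic'])
if missing:
    print('Not found on $PATH: %s' % ', '.join(missing))
```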

    'reference-db': ['choice:["human_genome"]', 'human_genome'], # ref db
Contributor Author

Here we may want to specify additional reference DBs by including multiple iterations of this flag, one per ref (examples might include PhiX + Human + Mouse). How can we make that work?
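
One way to support this, sketched under assumptions: store the databases in a single comma-separated parameter value and expand it into one `--reference-db` flag per entry, as the comment describes. The comma-separated encoding and the helper name `format_reference_dbs` are hypothetical; this diff does not define either.

```python
def format_reference_dbs(value):
    """Expand 'db1,db2' into ['--reference-db', 'db1', '--reference-db', 'db2'].

    Assumption: several databases are encoded as one comma-separated string.
    """
    args = []
    for db in value.split(','):
        db = db.strip()
        if db:  # skip empty entries such as a trailing comma
            args.extend(['--reference-db', db])
    return args


print(format_reference_dbs('human_genome, phix'))
```

Returning a list of arguments rather than a joined string avoids shell quoting issues if the command is later run through `subprocess` without a shell.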

    'bypass-trim': ['bool', 'False'], # bypass the trim step
    'threads': ['integer', '1'], # threads to run
    'processes': ['integer', '1'], # processes to run
    'quality-scores': ['choice:["phred33","phred64"]', 'phred33'], # quality mapping
    'run-bmtagger': ['bool', 'False'], # run BMTagger instead of Bowtie2
    'run-trf': ['bool', 'False'], # run TRF repeat finder tool
    'run-fastqc-start': ['bool', 'True'], # run FastQC on original data
    'run-fastqc-end': ['bool', 'True'], # run FastQC on filtered data
    'store-temp-output': ['bool', 'False'], # store temp output files
    'log-level': ['choice:["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]',
                  'DEBUG'],

    # Trimmomatic options
    'max-memory': ['integer', '500'], # max memory in mb [ DEFAULT : 500 ]
    'trimmomatic-options': ['string', 'ILLUMINACLIP:$trimmomatic/adapters/'
Contributor Author

This depends on a hardcoded filepath. How can we specify this given this option formatting?

                            'TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 '
                            'SLIDINGWINDOW:4:15 MINLEN:36'],

    # Bowtie2 options
    'bowtie2-options': ['string', '--very-sensitive'],

    # BMTagger options

    # TRF options
    'match': ['integer', '2'], # matching weight
    'mismatch': ['integer', '7'], # mismatching penalty
    'delta': ['integer', '7'], # indel penalty
    'pm': ['integer', '80'], # match probability
    'pi': ['integer', '10'], # indel probability
    'minscore': ['integer', '50'], # minimum alignment score to report
    'maxperiod': ['integer', '500'] # maximum period size to report

    # FastQC options
}
outputs = {'per_sample_FASTQ': 'per_sample_FASTQ'}
outputs = {'per_sample_FASTQ': 'per_sample_FASTQ'}
Contributor Author

KneadData should produce a lot of outputs, and the quantity will depend on the other options specified (e.g. if you provide multiple reference DBs, or skip FastQC). How can we deal with this?
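
One possible approach, sketched under assumptions, is to discover the outputs after the run instead of predicting them from the options: glob the job's output directory for FASTQ files that share the sample's prefix. The helper name is hypothetical, and the file names in the demo are placeholders, not real KneadData output names.

```python
import tempfile
from glob import glob
from os.path import basename, join


def collect_fastq_outputs(out_dir, prefix):
    """Return the sorted FASTQ files in `out_dir` whose name starts with `prefix`.

    Note that a prefix of 's1' would also match 's10...'; pass a prefix
    that ends in a separator when sample names can share a leading part.
    """
    return sorted(glob(join(out_dir, '%s*.fastq' % prefix)))


# Demo with placeholder file names in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ('s1_a.fastq', 's1_b.fastq', 's1.log', 's2_a.fastq'):
        open(join(tmp, name), 'w').close()
    found = [basename(fp) for fp in collect_fastq_outputs(tmp, 's1_')]

print(found)
```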

dflt_param_set = {
    'Defaults': {
Contributor

This will need to be updated accordingly as the parameters above get removed

Member

Yup.
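
To keep the two dictionaries from drifting apart as parameters are removed, the default set could be derived from `opt_params` rather than maintained by hand. The sketch below assumes only the `[type, default]` layout and the `'bool'` / `'integer'` type names visible in this diff; the helper names are hypothetical.

```python
def default_from_spec(spec):
    """Convert a [type, default-as-string] pair into a typed default value."""
    param_type, default = spec
    if param_type == 'bool':
        return default == 'True'
    if param_type == 'integer':
        return int(default)
    # 'string' and 'choice:[...]' defaults are used as-is
    return default


def build_default_set(opt_params):
    """Build the 'Defaults' parameter set from the optional parameter specs."""
    return {name: default_from_spec(spec)
            for name, spec in opt_params.items()}


# A small subset of the options above, for illustration
example_opt_params = {
    'bypass-trim': ['bool', 'False'],
    'threads': ['integer', '1'],
    'quality-scores': ['choice:["phred33","phred64"]', 'phred33'],
}
example_dflt_param_set = {'Defaults': build_default_set(example_opt_params)}
print(example_dflt_param_set)
```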

        'reference-db': 'human_genome', 'bypass-trim': False, 'threads': 1,
        'processes': 1, 'quality-scores': 'phred33', 'run-bmtagger': False,
        'run-trf': False, 'run-fastqc-start': True, 'run-fastqc-end': True,
        'store-temp-output': False, 'log-level': 'DEBUG', 'max-memory': 500,
        'trimmomatic-options': 'ILLUMINACLIP:$trimmomatic/adapters/'
                               'TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 '
                               'SLIDINGWINDOW:4:15 MINLEN:36',
        'bowtie2-options': '--very-sensitive', 'match': 2, 'mismatch': 7,
        'delta': 7, 'pm': 80, 'pi': 10, 'minscore': 50, 'maxperiod': 500}
}
kneaddata_cmd = QiitaCommand(
    "KneadData", "Sequence QC", kneaddata, req_params, opt_params,
    outputs, dflt_param_set)
plugin.register_command(kneaddata_cmd)
59 changes: 59 additions & 0 deletions qp_shotgun/kneaddata/kneaddata.py
@@ -0,0 +1,59 @@
# -----------------------------------------------------------------------------
# Copyright (c) 2014--, The Qiita Development Team.
#
# Distributed under the terms of the BSD 3-clause License.
#
# The full license is in the file LICENSE, distributed with this software.
# -----------------------------------------------------------------------------

from os.path import basename, join

from future.utils import viewitems
import pandas as pd

Contributor

remove extra line



def kneaddata(qclient, job_id, parameters, out_dir):
    """Run kneaddata with the given parameters

    Parameters
    ----------
    qclient : qiita_client.QiitaClient
        The Qiita server client
    job_id : str
        The job id
    parameters : dict
        The parameter values to run kneaddata
    out_dir : str
        The path to the job's output directory

    Returns
    -------
    bool, list, str
        The results of the job
    """
    # Step 1 get the rest of the information needed to run kneaddata
    qclient.update_job_step(job_id, "Step 1 of 3: Collecting information")
Contributor

Step 1 of 5

Member

K

    # The key matches the 'input' entry declared in req_params
    artifact_id = parameters['input']

    # Get the artifact filepath information
    artifact_info = qclient.get("/qiita_db/artifacts/%s/" % artifact_id)
    fps = artifact_info['files']

    # Get the artifact type
    artifact_type = artifact_info['type']

    # Get the artifact metadata
    prep_info = qclient.get('/qiita_db/prep_template/%s/'
                            % artifact_info['prep_information'][0])
    qiime_map = prep_info['qiime-map']

    # Step 2 generating command kneaddata
    qclient.update_job_step(job_id, "Step 2 of 3: Generating kneaddata command")

    # Step 3 execute kneaddata: TODO
    qclient.update_job_step(job_id, "Step 3 of 3: Executing kneaddata")

    artifacts_info = []

    return True, artifacts_info, ""
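
Step 2 is still a stub in this revision. As a rough illustration of what generating the command might involve, the sketch below turns a parameter dict into flags and builds one argument list per sample. The commit list mentions `format_kneaddata_params` and `generate_kneaddata_commands`, but their bodies are not part of this diff, so the functions here are hypothetical stand-ins and not the PR's implementation. Only the `--input` and `--output` flags are taken from the option comments above; mapping every other parameter name directly to a `--name` flag is an assumption.

```python
def format_params(parameters):
    """Turn a {name: value} dict into a flat list of CLI arguments.

    Assumption: boolean parameters are bare flags emitted only when True,
    and every other parameter is passed as '--name value'.
    """
    args = []
    for name in sorted(parameters):
        value = parameters[name]
        # bool must be tested before anything numeric: bool subclasses int
        if isinstance(value, bool):
            if value:
                args.append('--%s' % name)
        else:
            args.extend(['--%s' % name, str(value)])
    return args


def build_command(forward, reverse, out_dir, parameters):
    """Build the argument list for one sample; `reverse` may be None."""
    cmd = ['kneaddata', '--input', forward]
    if reverse is not None:
        # A second --input instance selects paired-end mode
        cmd.extend(['--input', reverse])
    cmd.extend(['--output', out_dir])
    cmd.extend(format_params(parameters))
    return cmd


print(build_command('s1_R1.fastq', 's1_R2.fastq', 'out',
                    {'threads': 2, 'run-trf': False, 'bypass-trim': True}))
```

Sorting the parameter names makes the generated command deterministic, which keeps unit tests that compare against an expected command stable.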
7 changes: 7 additions & 0 deletions qp_shotgun/kneaddata/tests/__init__.py
@@ -0,0 +1,7 @@
# -----------------------------------------------------------------------------
# Copyright (c) 2014--, The Qiita Development Team.
#
# Distributed under the terms of the BSD 3-clause License.
#
# The full license is in the file LICENSE, distributed with this software.
# -----------------------------------------------------------------------------
62 changes: 62 additions & 0 deletions qp_shotgun/kneaddata/tests/test_kneaddata.py
@@ -0,0 +1,62 @@
# -----------------------------------------------------------------------------
# Copyright (c) 2014--, The Qiita Development Team.
#
# Distributed under the terms of the BSD 3-clause License.
#
# The full license is in the file LICENSE, distributed with this software.
# -----------------------------------------------------------------------------

from unittest import TestCase, main
from os import close, environ
from tempfile import mkstemp
from json import dumps

from qiita_client import QiitaClient

from qp_shotgun.humann2.humann2 import (
    get_sample_names_by_run_prefix)

from qp_shotgun.kneaddata.kneaddata import (
    kneaddata)


CLIENT_ID = '19ndkO3oMKsoChjVVWluF7QkxHRfYhTKSFbAVt8IhK7gZgDaO4'
CLIENT_SECRET = ('J7FfQ7CQdOxuKhQAf1eoGgBAE81Ns8Gu3EKaWFm3IO2JKh'
'AmmCWZuabe0O5Mp28s1')


class KneaddataTests(TestCase):
    @classmethod
    def setUpClass(cls):
        server_cert = environ.get('QIITA_SERVER_CERT', None)
        cls.qclient = QiitaClient("https://localhost:21174", CLIENT_ID,
                                  CLIENT_SECRET, server_cert=server_cert)
        cls.params = {}
        cls._clean_up_files = []

    @classmethod
    def tearDownClass(cls):
        cls.qclient.post('/apitest/reset/')



MAPPING_FILE = (
    "#SampleID\tplatform\tbarcode\texperiment_design_description\t"
    "library_construction_protocol\tcenter_name\tprimer\trun_prefix\t"
    "instrument_model\tDescription\n"
    "SKB7.640196\tILLUMINA\tA\tA\tA\tANL\tA\ts3\tIllumina MiSeq\tdesc1\n"
    "SKB8.640193\tILLUMINA\tA\tA\tA\tANL\tA\ts1\tIllumina MiSeq\tdesc2\n"
    "SKD8.640184\tILLUMINA\tA\tA\tA\tANL\tA\ts2\tIllumina MiSeq\tdesc3\n"
)

MAPPING_FILE_2 = (
    "#SampleID\tplatform\tbarcode\texperiment_design_description\t"
    "library_construction_protocol\tcenter_name\tprimer\t"
    "run_prefix\tinstrument_model\tDescription\n"
    "SKB7.640196\tILLUMINA\tA\tA\tA\tANL\tA\ts3\tIllumina MiSeq\tdesc1\n"
    "SKB8.640193\tILLUMINA\tA\tA\tA\tANL\tA\ts1\tIllumina MiSeq\tdesc2\n"
    "SKD8.640184\tILLUMINA\tA\tA\tA\tANL\tA\ts1\tIllumina MiSeq\tdesc3\n"
)

if __name__ == '__main__':
main()
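
The two mapping fixtures in this test file differ only in their `run_prefix` column: in `MAPPING_FILE_2` two samples share the prefix `s1`, presumably to exercise the case where a run prefix does not identify a single sample. No test uses them yet in this revision. A check for that condition could be sketched as follows; the helper name is hypothetical and the example table is trimmed to the two relevant columns.

```python
import csv
import io
from collections import Counter


def duplicated_run_prefixes(mapping_text):
    """Return the sorted run_prefix values that occur on more than one row."""
    reader = csv.DictReader(io.StringIO(mapping_text), delimiter='\t')
    counts = Counter(row['run_prefix'] for row in reader)
    return sorted(prefix for prefix, n in counts.items() if n > 1)


# Trimmed-down version of the second fixture
example = (
    "#SampleID\trun_prefix\n"
    "SKB7.640196\ts3\n"
    "SKB8.640193\ts1\n"
    "SKD8.640184\ts1\n"
)
print(duplicated_run_prefixes(example))  # ['s1']
```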