Commit 363d1eb

Various cleanups
1 parent 36ae863 commit 363d1eb

28 files changed

+492
-44
lines changed

assets/blastdb.fasta.gz

-6.18 KB
Binary file not shown.

assets/bloomfilter/phix.bf

6.78 KB
Binary file not shown.

assets/bloomfilter/phix.fasta

+79
@@ -0,0 +1,79 @@
1+
>NC_001422.1
2+
GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACTTTCGGATATTTCTGATGAGTCGAAAAATTATCTT
3+
GATAAAGCAGGAATTACTACTGCTTGTTTACGAATTAAATCGAAGTGGACTGCTGGCGGAAAATGAGAAA
4+
ATTCGACCTATCCTTGCGCAGCTCGAGAAGCTCTTACTTTGCGACCTTTCGCCATCAACTAACGATTCTG
5+
TCAAAAACTGACGCGTTGGATGAGGAGAAGTGGCTTAATATGCTTGGCACGTTCGTCAAGGACTGGTTTA
6+
GATATGAGTCACATTTTGTTCATGGTAGAGATTCTCTTGTTGACATTTTAAAAGAGCGTGGATTACTATC
7+
TGAGTCCGATGCTGTTCAACCACTAATAGGTAAGAAATCATGAGTCAAGTTACTGAACAATCCGTACGTT
8+
TCCAGACCGCTTTGGCCTCTATTAAGCTCATTCAGGCTTCTGCCGTTTTGGATTTAACCGAAGATGATTT
9+
CGATTTTCTGACGAGTAACAAAGTTTGGATTGCTACTGACCGCTCTCGTGCTCGTCGCTGCGTTGAGGCT
10+
TGCGTTTATGGTACGCTGGACTTTGTGGGATACCCTCGCTTTCCTGCTCCTGTTGAGTTTATTGCTGCCG
11+
TCATTGCTTATTATGTTCATCCCGTCAACATTCAAACGGCCTGTCTCATCATGGAAGGCGCTGAATTTAC
12+
GGAAAACATTATTAATGGCGTCGAGCGTCCGGTTAAAGCCGCTGAATTGTTCGCGTTTACCTTGCGTGTA
13+
CGCGCAGGAAACACTGACGTTCTTACTGACGCAGAAGAAAACGTGCGTCAAAAATTACGTGCGGAAGGAG
14+
TGATGTAATGTCTAAAGGTAAAAAACGTTCTGGCGCTCGCCCTGGTCGTCCGCAGCCGTTGCGAGGTACT
15+
AAAGGCAAGCGTAAAGGCGCTCGTCTTTGGTATGTAGGTGGTCAACAATTTTAATTGCAGGGGCTTCGGC
16+
CCCTTACTTGAGGATAAATTATGTCTAATATTCAAACTGGCGCCGAGCGTATGCCGCATGACCTTTCCCA
17+
TCTTGGCTTCCTTGCTGGTCAGATTGGTCGTCTTATTACCATTTCAACTACTCCGGTTATCGCTGGCGAC
18+
TCCTTCGAGATGGACGCCGTTGGCGCTCTCCGTCTTTCTCCATTGCGTCGTGGCCTTGCTATTGACTCTA
19+
CTGTAGACATTTTTACTTTTTATGTCCCTCATCGTCACGTTTATGGTGAACAGTGGATTAAGTTCATGAA
20+
GGATGGTGTTAATGCCACTCCTCTCCCGACTGTTAACACTACTGGTTATATTGACCATGCCGCTTTTCTT
21+
GGCACGATTAACCCTGATACCAATAAAATCCCTAAGCATTTGTTTCAGGGTTATTTGAATATCTATAACA
22+
ACTATTTTAAAGCGCCGTGGATGCCTGACCGTACCGAGGCTAACCCTAATGAGCTTAATCAAGATGATGC
23+
TCGTTATGGTTTCCGTTGCTGCCATCTCAAAAACATTTGGACTGCTCCGCTTCCTCCTGAGACTGAGCTT
24+
TCTCGCCAAATGACGACTTCTACCACATCTATTGACATTATGGGTCTGCAAGCTGCTTATGCTAATTTGC
25+
ATACTGACCAAGAACGTGATTACTTCATGCAGCGTTACCATGATGTTATTTCTTCATTTGGAGGTAAAAC
26+
CTCTTATGACGCTGACAACCGTCCTTTACTTGTCATGCGCTCTAATCTCTGGGCATCTGGCTATGATGTT
27+
GATGGAACTGACCAAACGTCGTTAGGCCAGTTTTCTGGTCGTGTTCAACAGACCTATAAACATTCTGTGC
28+
CGCGTTTCTTTGTTCCTGAGCATGGCACTATGTTTACTCTTGCGCTTGTTCGTTTTCCGCCTACTGCGAC
29+
TAAAGAGATTCAGTACCTTAACGCTAAAGGTGCTTTGACTTATACCGATATTGCTGGCGACCCTGTTTTG
30+
TATGGCAACTTGCCGCCGCGTGAAATTTCTATGAAGGATGTTTTCCGTTCTGGTGATTCGTCTAAGAAGT
31+
TTAAGATTGCTGAGGGTCAGTGGTATCGTTATGCGCCTTCGTATGTTTCTCCTGCTTATCACCTTCTTGA
32+
AGGCTTCCCATTCATTCAGGAACCGCCTTCTGGTGATTTGCAAGAACGCGTACTTATTCGCCACCATGAT
33+
TATGACCAGTGTTTCCAGTCCGTTCAGTTGTTGCAGTGGAATAGTCAGGTTAAATTTAATGTGACCGTTT
34+
ATCGCAATCTGCCGACCACTCGCGATTCAATCATGACTTCGTGATAAAAGATTGAGTGTGAGGTTATAAC
35+
GCCGAAGCGGTAAAAATTTTAATTTTTGCCGCTGAGGGGTTGACCAAGCGAAGCGCGGTAGGTTTTCTGC
36+
TTAGGAGTTTAATCATGTTTCAGACTTTTATTTCTCGCCATAATTCAAACTTTTTTTCTGATAAGCTGGT
37+
TCTCACTTCTGTTACTCCAGCTTCTTCGGCACCTGTTTTACAGACACCTAAAGCTACATCGTCAACGTTA
38+
TATTTTGATAGTTTGACGGTTAATGCTGGTAATGGTGGTTTTCTTCATTGCATTCAGATGGATACATCTG
39+
TCAACGCCGCTAATCAGGTTGTTTCTGTTGGTGCTGATATTGCTTTTGATGCCGACCCTAAATTTTTTGC
40+
CTGTTTGGTTCGCTTTGAGTCTTCTTCGGTTCCGACTACCCTCCCGACTGCCTATGATGTTTATCCTTTG
41+
AATGGTCGCCATGATGGTGGTTATTATACCGTCAAGGACTGTGTGACTATTGACGTCCTTCCCCGTACGC
42+
CGGGCAATAACGTTTATGTTGGTTTCATGGTTTGGTCTAACTTTACCGCTACTAAATGCCGCGGATTGGT
43+
TTCGCTGAATCAGGTTATTAAAGAGATTATTTGTCTCCAGCCACTTAAGTGAGGTGATTTATGTTTGGTG
44+
CTATTGCTGGCGGTATTGCTTCTGCTCTTGCTGGTGGCGCCATGTCTAAATTGTTTGGAGGCGGTCAAAA
45+
AGCCGCCTCCGGTGGCATTCAAGGTGATGTGCTTGCTACCGATAACAATACTGTAGGCATGGGTGATGCT
46+
GGTATTAAATCTGCCATTCAAGGCTCTAATGTTCCTAACCCTGATGAGGCCGCCCCTAGTTTTGTTTCTG
47+
GTGCTATGGCTAAAGCTGGTAAAGGACTTCTTGAAGGTACGTTGCAGGCTGGCACTTCTGCCGTTTCTGA
48+
TAAGTTGCTTGATTTGGTTGGACTTGGTGGCAAGTCTGCCGCTGATAAAGGAAAGGATACTCGTGATTAT
49+
CTTGCTGCTGCATTTCCTGAGCTTAATGCTTGGGAGCGTGCTGGTGCTGATGCTTCCTCTGCTGGTATGG
50+
TTGACGCCGGATTTGAGAATCAAAAAGAGCTTACTAAAATGCAACTGGACAATCAGAAAGAGATTGCCGA
51+
GATGCAAAATGAGACTCAAAAAGAGATTGCTGGCATTCAGTCGGCGACTTCACGCCAGAATACGAAAGAC
52+
CAGGTATATGCACAAAATGAGATGCTTGCTTATCAACAGAAGGAGTCTACTGCTCGCGTTGCGTCTATTA
53+
TGGAAAACACCAATCTTTCCAAGCAACAGCAGGTTTCCGAGATTATGCGCCAAATGCTTACTCAAGCTCA
54+
AACGGCTGGTCAGTATTTTACCAATGACCAAATCAAAGAAATGACTCGCAAGGTTAGTGCTGAGGTTGAC
55+
TTAGTTCATCAGCAAACGCAGAATCAGCGGTATGGCTCTTCTCATATTGGCGCTACTGCAAAGGATATTT
56+
CTAATGTCGTCACTGATGCTGCTTCTGGTGTGGTTGATATTTTTCATGGTATTGATAAAGCTGTTGCCGA
57+
TACTTGGAACAATTTCTGGAAAGACGGTAAAGCTGATGGTATTGGCTCTAATTTGTCTAGGAAATAACCG
58+
TCAGGATTGACACCCTCCCAATTGTATGTTTTCATGCCTCCAAATCTTGGAGGCTTTTTTATGGTTCGTT
59+
CTTATTACCCTTCTGAATGTCACGCTGATTATTTTGACTTTGAGCGTATCGAGGCTCTTAAACCTGCTAT
60+
TGAGGCTTGTGGCATTTCTACTCTTTCTCAATCCCCAATGCTTGGCTTCCATAAGCAGATGGATAACCGC
61+
ATCAAGCTCTTGGAAGAGATTCTGTCTTTTCGTATGCAGGGCGTTGAGTTCGATAATGGTGATATGTATG
62+
TTGACGGCCATAAGGCTGCTTCTGACGTTCGTGATGAGTTTGTATCTGTTACTGAGAAGTTAATGGATGA
63+
ATTGGCACAATGCTACAATGTGCTCCCCCAACTTGATATTAATAACACTATAGACCACCGCCCCGAAGGG
64+
GACGAAAAATGGTTTTTAGAGAACGAGAAGACGGTTACGCAGTTTTGCCGCAAGCTGGCTGCTGAACGCC
65+
CTCTTAAGGATATTCGCGATGAGTATAATTACCCCAAAAAGAAAGGTATTAAGGATGAGTGTTCAAGATT
66+
GCTGGAGGCCTCCACTATGAAATCGCGTAGAGGCTTTGCTATTCAGCGTTTGATGAATGCAATGCGACAG
67+
GCTCATGCTGATGGTTGGTTTATCGTTTTTGACACTCTCACGTTGGCTGACGACCGATTAGAGGCGTTTT
68+
ATGATAATCCCAATGCTTTGCGTGACTATTTTCGTGATATTGGTCGTATGGTTCTTGCTGCCGAGGGTCG
69+
CAAGGCTAATGATTCACACGCCGACTGCTATCAGTATTTTTGTGTGCCTGAGTATGGTACAGCTAATGGC
70+
CGTCTTCATTTCCATGCGGTGCACTTTATGCGGACACTTCCTACAGGTAGCGTTGACCCTAATTTTGGTC
71+
GTCGGGTACGCAATCGCCGCCAGTTAAATAGCTTGCAAAATACGTGGCCTTATGGTTACAGTATGCCCAT
72+
CGCAGTTCGCTACACGCAGGACGCTTTTTCACGTTCTGGTTGGTTGTGGCCTGTTGATGCTAAAGGTGAG
73+
CCGCTTAAAGCTACCAGTTATATGGCTGTTGGTTTCTATGTGGCTAAATACGTTAACAAAAAGTCAGATA
74+
TGGACCTTGCTGCTAAAGGTCTAGGAGCTAAAGAATGGAACAACTCACTAAAAACCAAGCTGTCGCTACT
75+
TCCCAAGAAGCTGTTCAGAATCAGAATGAGCCGCAACTTCGGGATGAAAATGCTCACAATGACAAATCTG
76+
TCCACGGAGTGCTTAATCCAACTTACCAAGCTGGGTTACGACGCGACGCCGTTCAACCAGATATTGAAGC
77+
AGAACGCAAAAAGAGAGATGAGATTGAGGCTGGGAAAAGTTACTGTAGCCGACGTTTTGGCGGCGCAACC
78+
TGTGACGACAAATCTGCTCAAATTTATGCGCGCTTCGATAAAAATGATTGGCGTATCCAACCTGCA
79+

assets/bloomfilter/phix.txt

+14
@@ -0,0 +1,14 @@
+[user_input_options]
+filter_id=phix
+kmer_size=25
+desired_false_positve_rate=0.0078125
+number_of_hash_functions=7
+expected_num_entries=5362
+sequence_sources=phix.fasta
+
+[runtime_options]
+size=54208
+num_entries=5357
+approximate_false_positive_rate=0.00773732
+redundant_sequences=5
+redundant_fpr=0.0012635
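
The numbers in phix.txt line up with the textbook Bloom filter formulas. The sketch below is only a plausibility check, not BioBloomTools code: the helper names are made up, the 5386 bp genome length is the published size of NC_001422.1, and the rounding of the filter size to a multiple of 64 bits is an assumption inferred from 54208 = 847 x 64.

```ruby
# Textbook Bloom filter estimates. Helper names are illustrative only.

# Number of k-mers in a linear sequence of the given length.
def kmer_count(seq_length, k)
  seq_length - k + 1
end

# Optimal bit count m = -n * ln(p) / (ln 2)^2, rounded up to a
# multiple of `word` bits (the rounding rule is an assumption).
def filter_bits(n, p, word = 64)
  raw = -n * Math.log(p) / (Math.log(2)**2)
  (raw / word).ceil * word
end

# Expected false-positive rate (1 - e^(-k*n/m))^k for k hash functions,
# n inserted entries and m bits.
def false_positive_rate(m, n, k)
  (1.0 - Math.exp(-k.to_f * n / m))**k
end

n_expected = kmer_count(5386, 25)                 # expected_num_entries
bits = filter_bits(n_expected, 0.0078125)         # size
fpr = false_positive_rate(54_208, 5_357, 7)       # approximate_false_positive_rate
puts "k-mers: #{n_expected}, bits: #{bits}, fpr: #{fpr.round(6)}"
```

Under these assumptions the k-mer count and the bit size reproduce `expected_num_entries=5362` and `size=54208`, and the estimated rate lands close to the reported `approximate_false_positive_rate`.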

assets/genomes/tomato/amplicon.txt

+3
@@ -0,0 +1,3 @@
+CGAACCCTAGCAGATCGTCT TCAAAACAACCATTAATCCTTCCCT 162 IL.S.tSIGAD3
+AAGACAATAGCCTCCACAACG AGTCAGTACAAGACATAATAATACAAAGAG 438 N028_SiGAD3_N-term-seq2
+AGGGATATCGAAATGTAATGGAAAATTG CAATTCAATAGAACAAAGGATGATACATTC 510 N029_SiGAD3_N-term-seq1
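
Each line of amplicon.txt appears to hold a forward primer, a reverse primer, a number and a name. A minimal reader could look like the sketch below; the `Amplicon` struct, the `parse_amplicons` helper and the reading of the third column as an amplicon length in bp are assumptions, since the commit does not document the columns.

```ruby
# Hypothetical record type; the column meanings are assumptions.
Amplicon = Struct.new(:forward, :reverse, :length_bp, :name)

# Parse whitespace-separated amplicon lines, skipping empty ones.
def parse_amplicons(text)
  text.each_line.map(&:strip).reject(&:empty?).map do |line|
    fwd, rev, len, name = line.split(/\s+/, 4)
    Amplicon.new(fwd, rev, Integer(len), name)
  end
end

sample = <<~TXT
  CGAACCCTAGCAGATCGTCT TCAAAACAACCATTAATCCTTCCCT 162 IL.S.tSIGAD3
  AAGACAATAGCCTCCACAACG AGTCAGTACAAGACATAATAATACAAAGAG 438 N028_SiGAD3_N-term-seq2
TXT

amplicons = parse_amplicons(sample)
amplicons.each do |a|
  puts "#{a.name}: #{a.length_bp} bp, fwd #{a.forward.length} nt, rev #{a.reverse.length} nt"
end
```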

assets/genomes/tomato/primers.fa

+2 -3
@@ -7,7 +7,6 @@ AGGGATATCGAAATGTAATGGAAAATTG
 >N029_SiGAD3_N-term-seq1_rv
 CAATTCAATAGAACAAAGGATGATACATTC
 >IL.S.t.SIGAD3.5s
-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGcgaaccctagcagatcgtct
+CGAACCCTAGCAGATCGTCT
 >IL.S.t.SIGAD3.4as
-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGtcaaaacaaccattaatccttccct
-
+TCAAAACAACCATTAATCCTTCCCT

assets/genomes/tomato/rules.json

+25
@@ -0,0 +1,25 @@
+{
+  "rules": {
+    "vsearch-blast": {
+      "payload": [
+        {
+          "format": "XML",
+          "target": "SiGAD3|NM_001246898.2",
+          "matcher": "AAAG-TGGA",
+          "yields": "Diese Probe enthält eine GABA Mutation in SIGAD3. Nachweis erbracht über: Amplicon Analyse."
+        }
+      ]
+
+    },
+    "bwa-freebayes": {
+      "payload": [
+        {
+          "format": "VCF",
+          "target": "1:14834",
+          "matcher": "1\t14834\t.\tGTG\tGTTG",
+          "yields": "Diese Probe enthält eine GABA Mutation in SIGAD3. Nachweis erbracht über: Varianten Analyse."
+        }
+      ]
+    }
+  }
+}
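
The `bwa-freebayes` matcher is a tab-joined prefix of a VCF record, so a lookup reduces to a string comparison once the JSON escapes are decoded. The sketch below illustrates this under stated assumptions: `matching_rules` is a hypothetical helper, the VCF line is invented example data, and the `yields` text is a placeholder rather than the message from rules.json.

```ruby
require 'json'

# Quoted heredoc: the "\t" escapes stay literal here and are decoded by JSON.parse.
rules_json = <<~'JSON'
  {
    "rules": {
      "bwa-freebayes": {
        "payload": [
          { "format": "VCF", "target": "1:14834",
            "matcher": "1\t14834\t.\tGTG\tGTTG",
            "yields": "placeholder report sentence" }
        ]
      }
    }
  }
JSON

# Hypothetical helper: rules of one tool whose matcher equals the given key.
def matching_rules(rules, tool, key)
  rules.fetch("rules").fetch(tool).fetch("payload").select { |r| r["matcher"] == key }
end

rules = JSON.parse(rules_json)

# Invented VCF record for illustration only.
vcf_line = "1\t14834\t.\tGTG\tGTTG\t812.5\t.\tDP=100\tGT:AD\t0/1:60,40"
f = vcf_line.split("\t")
key = [f[0], f[1], ".", f[3], f[4]].join("\t") # CHROM, POS, ".", REF, ALT

hits = matching_rules(rules, "bwa-freebayes", key)
puts hits.map { |r| r["yields"] }
```

Because the comparison is exact, a differently normalised representation of the same indel (for example a left-aligned or trimmed REF/ALT pair) would not match; that is a property of this sketch and presumably of any exact string matcher.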

assets/genomes/tomato/targets.bed

+1
@@ -0,0 +1 @@
+1 14784 14946 SIGAD3
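
BED intervals are 0-based and half-open, while VCF positions are 1-based. The sketch below checks two things that follow from the files in this commit: the target spans 162 bases, the same number given for the first entry of amplicon.txt, and the rule position `1:14834` lies inside the target. `BedInterval` and `parse_bed_line` are hypothetical names.

```ruby
# Hypothetical value type for one BED line (0-based start, exclusive stop).
BedInterval = Struct.new(:chrom, :start, :stop, :name) do
  # Number of bases covered by the interval.
  def span
    stop - start
  end

  # VCF positions are 1-based, so shift by one before comparing.
  def covers_vcf_pos?(chrom_name, pos)
    zero_based = pos - 1
    chrom == chrom_name && zero_based >= start && zero_based < stop
  end
end

def parse_bed_line(line)
  chrom, start, stop, name = line.split(/\s+/)
  BedInterval.new(chrom, Integer(start), Integer(stop), name)
end

target = parse_bed_line("1 14784 14946 SIGAD3")
puts target.span                          # 162
puts target.covers_vcf_pos?("1", 14_834)  # true
```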

bin/analyze_blast.rb

+28
@@ -0,0 +1,28 @@
+#!/usr/bin/env ruby
+
+require 'optparse'
+require 'ostruct'
+require 'nokogiri'
+
+### Define modules and classes here
+
+### Get the script arguments and open relevant files
+options = OpenStruct.new()
+opts = OptionParser.new()
+opts.banner = "Reads a BLAST XML report and a JSON rule file"
+opts.separator ""
+opts.on("-b","--blast", "=BLAST","Blast report to read") {|argument| options.blast = argument } # was options.vcf, which left options.blast nil
+opts.on("-j","--json", "=JSON","JSON to read") {|argument| options.json = argument }
+opts.on("-h","--help","Display the usage information") {
+  puts opts
+  exit
+}
+
+opts.parse!
+
+date = Time.now.strftime("%Y-%m-%d")
+
+file = File.open(options.blast)
+
+xml = Nokogiri::XML(file)
+
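
The script stops after loading the XML, so the commit does not show how the `vsearch-blast` rule is meant to be applied. One possible continuation is sketched below, with several assumptions: it uses REXML from the Ruby distribution instead of Nokogiri to stay self-contained, it assumes the rule `target` appears in `Hit_def` and the `matcher` in the aligned query (`Hsp_qseq`), and the XML is a tiny hand-written stand-in for `blastn -outfmt 5` output with invented flanking bases.

```ruby
require 'rexml/document'

# Hand-written stand-in for a BLAST XML report; not real blastn output.
BLAST_XML = <<~XML
  <BlastOutput>
    <BlastOutput_iterations>
      <Iteration>
        <Iteration_hits>
          <Hit>
            <Hit_def>SiGAD3|NM_001246898.2</Hit_def>
            <Hit_hsps>
              <Hsp>
                <Hsp_qseq>CCTAAAG-TGGATT</Hsp_qseq>
                <Hsp_hseq>CCTAAAGCTGGATT</Hsp_hseq>
              </Hsp>
            </Hit_hsps>
          </Hit>
        </Iteration_hits>
      </Iteration>
    </BlastOutput_iterations>
  </BlastOutput>
XML

# Hypothetical helper: true if any HSP of a hit on `target` contains
# `matcher` in its aligned query sequence.
def blast_rule_matches?(xml_text, target, matcher)
  doc = REXML::Document.new(xml_text)
  doc.elements.to_a("//Hit").any? do |hit|
    hit_def = hit.elements["Hit_def"]&.text.to_s
    next false unless hit_def.include?(target)

    hit.elements.to_a("Hit_hsps/Hsp").any? do |hsp|
      hsp.elements["Hsp_qseq"]&.text.to_s.include?(matcher)
    end
  end
end

puts blast_rule_matches?(BLAST_XML, "SiGAD3|NM_001246898.2", "AAAG-TGGA")
```

With Nokogiri the same traversal would use its own XPath calls; the element names (`Hit`, `Hit_def`, `Hit_hsps`, `Hsp`, `Hsp_qseq`, `Hsp_hseq`) are those of the BLAST XML format.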

bin/analyze_vcf.rb

+123
@@ -0,0 +1,123 @@
+#!/usr/bin/env ruby
+
+require 'optparse'
+require 'ostruct'
+require 'json'
+
+### Define modules and classes here
+
+class VCFEntry
+  # One VCF data line, split into its fixed columns plus per-sample fields
+  attr_accessor :seq, :pos, :id, :ref, :alt, :qual, :filter, :info, :format, :samples, :sample_names
+
+  def initialize(string,header)
+    # #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S100
+    elements = string.strip.split("\t")
+    @seq,@pos,@id,@ref,@alt,@qual,@filter,info,format = elements[0..8]
+    @info = {}
+    info.split(";").each do |i|
+      key,value = i.split("=")
+      @info[key] = value
+    end
+    @format = format.split(":")
+    # Build one hash per sample, keyed by the FORMAT fields (GT, AD, ...)
+    @samples = []
+    @sample_names = header[9..-1]
+    elements[9..-1].each do |sample|
+      sample_elements = sample.split(":")
+      sample_data = {}
+      @format.each_with_index do |k,j|
+        val = sample_elements[j]
+        sample_data[k] = val
+      end
+      @samples << sample_data
+    end
+  end
+  # Tab-joined CHROM, POS, ".", REF, ALT; compared against a rule's "matcher"
+  def allele_string
+    return "#{self.seq}\t#{self.pos}\t.\t#{self.ref}\t#{self.alt}"
+  end
+
+end
+
+def parse_vcf(file)
+
+  data = []
+
+  header = []
+
+  vcf = File.open(file)
+
+  while (line = vcf.gets)
+    # Skip meta lines; the #CHROM line carries the sample names
+    next if line.match(/^##.*/)
+
+    if line.match(/^#CHROM.*/)
+      header = line.split("\t").collect{|k| k.strip }
+    else
+      entry = VCFEntry.new(line,header)
+      data << entry
+    end
+  end
+
+  vcf.close
+
+  return data
+end
+
+### Get the script arguments and open relevant files
+options = OpenStruct.new()
+opts = OptionParser.new()
+opts.banner = "Reads a VCF file and reports variants that match the rules in a JSON file"
+opts.separator ""
+opts.on("-v","--vcf", "=VCF","VCF to read") {|argument| options.vcf = argument }
+opts.on("-j","--json", "=JSON","JSON to read") {|argument| options.json = argument }
+opts.on("-h","--help","Display the usage information") {
+  puts opts
+  exit
+}
+
+opts.parse!
+
+date = Time.now.strftime("%Y-%m-%d")
+
+json = JSON.parse(IO.readlines(options.json).join)
+
+rules = json["rules"]["bwa-freebayes"]["payload"]
+
+vcf = parse_vcf(options.vcf)
+
+vcf.each do |entry|
+
+  allele = entry.allele_string
+
+  sample_name = entry.sample_names[0]
+  puts ">>>" + sample_name + "<<<"
+
+  has_matched = false
+
+  rules.each do |rule|
+    string = rule["matcher"]
+    if string == allele
+      has_matched = true
+      # "yields" holds the German report sentence for this rule
+      puts rule["yields"]
+      sample = entry.samples[0]
+      genotype = sample["GT"]
+      if genotype == "0/0"
+        puts "Varianten Frequenz unter Detektierungsschwelle!" # "Variant frequency below detection threshold!"
+      end
+      rcov,acov = sample["AD"].split(",")
+      total = rcov.to_f + acov.to_f # share of alt reads among ref + alt reads, not the alt/ref ratio
+      perc = total > 0 ? (acov.to_f / total)*100.0 : 0.0
+      puts "\tGenotyp: #{sample["GT"]}\tAnteil: #{perc.round(2)}%\tRef: #{rcov}\tAlt: #{acov}\t"
+    end
+  end
+
+  if !has_matched
+    puts "Keine GABA Mutation nachgewiesen!" # "No GABA mutation detected!"
+  end
+
+  puts "==============================================================================="
+
+end
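
The `Anteil` (share) value is derived from the `AD` field, which lists read depths for the reference allele and the alternate allele(s). A share is usually computed as alt / (ref + alt); dividing by the reference depth alone gives a ratio instead (11.11 rather than 10 for `90,10`) and breaks down when no reference reads are present. The helper below is a hypothetical sketch of the share calculation and is not taken from the commit.

```ruby
# Hypothetical helper: percentage of reads supporting the first ALT allele,
# given a VCF AD value such as "90,10" (ref depth, alt depth).
def alt_allele_percent(ad_field)
  ref_cov, alt_cov = ad_field.split(",").first(2).map(&:to_f)
  alt_cov ||= 0.0 # AD may be "." or hold a single value
  total = ref_cov + alt_cov
  return 0.0 if total.zero?

  (alt_cov / total * 100.0).round(2)
end

puts alt_allele_percent("90,10") # 10.0
puts alt_allele_percent("0,25")  # 100.0
puts alt_allele_percent("0,0")   # 0.0
```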

conf/resources.config

+3 -1
@@ -6,9 +6,11 @@ params {
     genomes {
         tomato {
             fasta = "/home/marc/projects/gaba/references/solanum_lycopersicum.fa"
+            fai = "/home/marc/projects/gaba/references/solanum_lycopersicum.fa.fai"
             dict = "/home/marc/projects/gaba/references/solanum_lycopersicum.dict"
-            primers = "${baseDir}/assets/genomes/tomato/primers.fa"
+            amplicon_txt = "${baseDir}/assets/genomes/tomato/amplicon.txt"
             bed = "${baseDir}/assets/genomes/tomato/primers.bed"
+            url = "https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-58/fasta/solanum_lycopersicum/dna/Solanum_lycopersicum.SL3.0.dna.toplevel.fa.gz"
         }
     }
 }

main.nf

+2 -1
@@ -27,7 +27,8 @@ WorkflowMain.initialise(workflow, params, log)
 // TODO: Rename this and the file under lib/ to something matching this pipeline (e.g. WorkflowAmplicons)
 WorkflowPipeline.initialise(params, log)

-include { GMO } from './workflows/gmo'
+include { GMO } from './workflows/gmo'
+//include { BUILD_REFERENCES } from './workflows/build_references'

 multiqc_report = Channel.from([])

+36
@@ -0,0 +1,36 @@
+process BIOBLOOMTOOLS_CATEGORIZER {
+
+    publishDir "${params.outdir}/Bloomfilter", mode: 'copy'
+
+    label 'short_parallel'
+
+    tag "${meta.sample_id}|${meta.library_id}|${meta.readgroup_id}"
+
+    conda 'bioconda::biobloomtools=2.3.5'
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/biobloomtools:2.3.5--h4056dc3_2' :
+        'quay.io/biocontainers/biobloomtools:2.3.5--h4056dc3_2' }"
+
+    input:
+    tuple val(meta), path(r1), path(r2)
+
+    output:
+    tuple val(meta), path(r1_trim), path(r2_trim), emit: reads
+    path('versions.yml'), emit: versions
+    path("*summary.tsv"), emit: results
+
+    script:
+    filtered = meta.sample_id + "_" + meta.library_id + "_" + meta.readgroup_id
+    r1_trim = filtered + "_noMatch_1.fq.gz"
+    r2_trim = filtered + "_noMatch_2.fq.gz"
+
+    """
+    biobloomcategorizer -p $filtered -t ${task.cpus} -n --fq --gz_out -i -e -f "${params.bloomfilter}" $r1 $r2
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        BioBloomtools: \$(biobloomcategorizer -version 2>&1 | head -n1 | sed -e "s/.*) //g")
+    END_VERSIONS
+
+    """
+}

modules/blast/blastn/main.nf

+2 -2
@@ -17,7 +17,7 @@ process BLAST_BLASTN {
     path("versions.yml"), emit: versions

     script:
-    blastout = meta.sample_id + '.blast.txt'
+    blastout = meta.sample_id + '.blast.xml'

     """
     DB=`find -L ./ -name "*.nal" | sed 's/\\.nal\$//'`
@@ -29,7 +29,7 @@ process BLAST_BLASTN {
     blastn -num_threads ${task.cpus} \
     -db \$DB \
     -query $fasta \
-    -outfmt 0 \
+    -outfmt 5 \
     -out $blastout \
     -evalue 0.0001
