Commit a327718

fix: order (#161)
* fix: removed superfluous conditional
* feat: slide set enriched with shell intro
* feat: added background image
* feat: renamed Snakefile 01 - better semantic mapping
* fix: header reflecting the fact that the shell command has been introduced
* fix: name change now in slide
* feat: 03 Snakefile to 03_Snakefile_wildcard to get better semantic overview
* feat: 04_Snakefile to 04_Snakefile_new_rule to get better semantic overview
* fix: wildcard typesetting as verbatim, where a simple string was used
* fix: replaced 'cloze text' with the more familiar 'example'
* fix: comment about the dash in the samtools view command.
* fix: every 'in- and output' to 'input and output' for clarity
* fix: comment on line breaks - more clarity
* fix: clozure File to Cloze Files - clarity
* fix: comment about the dash in the samtools view command. Removed 2nd false explanation
* fix: moved 05_Snakefile to 05_Snakefile_target for semantic clarity
* fix: bcftools seemed missing in environment.yaml
* feat: added seaborn to environment.yaml
* feat: new python rule demonstrating the run directive
* feat: added Snakefile to illustrate the python run directive and a solution
* feat: introducing the Parsons problem
* feat: extended reach of the Python section's scope
* fix: typo
* fix: added pip install of pysam to environment yaml file
* fix: 06_Snakefile_run task setting now syntactically correct
* feat: new slide introducing the Parsons problem from the last commits as a task
* feat: added results positions file to illustrate the solution
* fix: removed outdated solution file 06
* fix: added finalized version for the run directive
* feat: no cloze for 07 script, but a solution
* feat: plot quals to rule creation, only discussing script in the future
* fix: bcftools with minimum version
* fix: removed doubled bcftools entry
* feat: added exercise to move run to script content
* fix: typo
* fix: grammar
* fix: layout glitches
* fix: layout (tabs to spaces)
* fix: a little clearer message
* fix: typo
* fix: added missing semantic suffix to 05 template
* fix: finished open sentence
* feat: update to new versions of the workflow
* fix: layout
* feat: graphical chapter intro to tutorial scenario
* feat: added solution for config - with bcftools-support
* feat: added sample script to plot the deviation positions
* feat: adding black and snakefmt to check formatting issues
* style: blacked previous commit
* feat: new sample file featuring bwa input function
* feat: autoscaling of inline listings to current font size
* feat: re-ordered slide set up to inline functions
* fix: no cut in the bottom of the positions image
* feat: answers on slide for the input function test file
* feat: new Snakefile demonstrating the params directive
* fix: some small fixes in demonstrating the params directive
* feat: re-worked the logging task
* fix: added missing read group to bwa rule
* fix: added missing read group to bwa rule
* feat: little layout improvement
* feat: gnarf - to be tested solution
* fix: syntax in config sample file
* fix: considering multiple plot rules for the decision to make something a temp file
* fix: more coherent intro to temp files
* fix: slide introducing the protected feature improved by mentioning it's just a mention ...
* fix: deleted all Snakefiles without suffix
* fix: removed print statement
* fix: wrong file suffixes in samtools_sort for log files
* fix: using samtools -o option
* fix: using samtools -o option
* fix: using samtools -o option
* fix: corrected paths in config
* fix: using samtools -o option
* fix: using samtools -o option
* fix: using samtools -o option
* fix: using samtools -o option
* fix: using samtools -o option
* fix: typo
* Update setup_creators/solutions/05_Snakefile_target
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Update setup_creators/tutorial/05_Snakefile_target
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
1 parent ca629a1 commit a327718

38 files changed

+1076
-549
lines changed
139 KB

images/misc/pipe.png

9.11 KB

images/results/positions.png

29.6 KB
7.21 KB

images/workflows/dag_lifescience.png

7.49 KB
39 KB
8.01 KB

setup_creators/environment.yaml

+7 -2
@@ -5,14 +5,19 @@ dependencies:
   - snakemake-minimal >=8.4.4
   - snakemake-executor-plugin-slurm
   - snakemake-storage-plugin-fs
+  - black
+  - snakefmt
   - jinja2
   - matplotlib
   - graphviz
-  - bcftools =1.19
+  - bcftools >=1.19
   - samtools =1.19.2
   - bwa =0.7.17
-  # - pysam =0.22
+  - pip:
+    - pysam
   # at the time of writing - 7. Feb 24 - pysam will require
   # a lower python version than snakemake, install pysam
   # using pip
   - pygments
+  - seaborn
+

setup_creators/solutions/04_Snakefile → setup_creators/solutions/04_Snakefile_new_rule

+1 -1
@@ -16,4 +16,4 @@ rule samtools_sort:
         "sorted_reads/{sample}.bam"
     shell:
         "samtools sort -T sorted_reads/{wildcards.sample} "
-        "-O bam {input} > {output}"
+        "-O bam -o {output} {input}"

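The hunk above replaces shell redirection (`> {output}`) with the `-o` option of `samtools sort`. As a rough illustration of how the two adjacent string literals end up as one command line, here is a plain-Python sketch. It only approximates Snakemake's own formatting with `str.format`; the helper `render_sort_command` and the `SimpleNamespace` stand-in for the wildcards object are assumptions made for this example, not part of the commit.

```python
from types import SimpleNamespace

# Python concatenates adjacent string literals before any formatting happens,
# so the trailing space after the -T argument is what separates the two halves.
COMMAND_TEMPLATE = (
    "samtools sort -T sorted_reads/{wildcards.sample} "
    "-O bam -o {output} {input}"
)


def render_sort_command(sample: str) -> str:
    """Fill the template for one sample (hypothetical helper, approximation only)."""
    wildcards = SimpleNamespace(sample=sample)  # stand-in for Snakemake's wildcards
    return COMMAND_TEMPLATE.format(
        wildcards=wildcards,
        input=f"mapped_reads/{sample}.bam",
        output=f"sorted_reads/{sample}.bam",
    )


print(render_sort_command("A"))
```

With `-o`, the output path is passed to `samtools sort` itself rather than being written by the shell through a redirect.
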
setup_creators/solutions/05_Snakefile → setup_creators/solutions/05_Snakefile_target

+1 -1
@@ -22,7 +22,7 @@ rule samtools_sort:
         "sorted_reads/{sample}.bam"
     shell:
         "samtools sort -T sorted_reads/{wildcards.sample} "
-        "-O bam {input} > {output}"
+        "-O bam -o {output} {input}"
 
 
 rule samtools_index:

+83
@@ -0,0 +1,83 @@
+# our samples are pre-configured
+SAMPLES = ["A", "B"]
+
+rule all:
+    input:
+        "calls/all.vcf",
+        "calls/positions.png"
+
+rule bwa_map:
+    input:
+        "data/genome.fa",
+        "data/samples/{sample}.fastq"
+    output:
+        "mapped_reads/{sample}.bam"
+    shell:
+        "bwa mem {input} | samtools view -Sb - > {output}"
+
+
+rule samtools_sort:
+    input:
+        "mapped_reads/{sample}.bam"
+    output:
+        "sorted_reads/{sample}.bam"
+    shell:
+        "samtools sort -T sorted_reads/{wildcards.sample} "
+        "-O bam -o {output} {input}"
+
+
+rule samtools_index:
+    input:
+        "sorted_reads/{sample}.bam"
+    output:
+        "sorted_reads/{sample}.bam.bai"
+    shell:
+        "samtools index {input}"
+
+
+rule bcftools_call:
+    input:
+        fa="data/genome.fa",
+        bam=expand("sorted_reads/{sample}.bam", sample=SAMPLES),
+        bai=expand("sorted_reads/{sample}.bam.bai", sample=SAMPLES)
+    output:
+        "calls/all.vcf"
+    shell:
+        "bcftools mpileup -f {input.fa} {input.bam} | "
+        "bcftools call -mv - > {output}"
+
+rule plot_positions:
+    input:
+        rules.bcftools_call.output
+    output:
+        "calls/positions.png"
+    run:
+        import matplotlib
+        matplotlib.use("Agg") # to suppress interactive plotting
+        import matplotlib.pyplot as plt
+        import numpy as np
+        from pysam import VariantFile
+        import seaborn as sns
+        #TODO: this parameter needs to be configurable
+        # see one of the next exercises
+        window_size = 500
+
+        pos = [record.pos for record in VariantFile(input[0])]
+        # setup windows
+        bins = np.arange(0, max(pos), window_size)
+
+        # use window midpoints as x coordinate
+        x = (bins[1:] + bins[:-1])/2
+
+        # compute variant density in each window
+        h, _ = np.histogram(pos, bins=bins)
+        y = h / window_size
+
+        # plot
+        fig, ax = plt.subplots(figsize=(12, 3))
+        sns.despine(ax=ax, offset=10)
+        ax.plot(x, y)
+        ax.set_xlabel('Chromosome position (bp)')
+        ax.set_ylabel('Variant density (bp$^{-1}$)')
+        plt.savefig(output[0])
+

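The `run:` block in `plot_positions` bins variant positions into fixed windows and divides the counts by the window size. The following standard-library sketch retraces that arithmetic without NumPy, so the binning behaviour is easier to inspect. It is an illustration under assumptions: `window_density` is a hypothetical helper, positions are taken to be non-negative integers, and returning empty lists when fewer than two bin edges exist is a choice made for this sketch only.

```python
def window_density(positions, window_size):
    """Return (window midpoints, variants per bp) for fixed-size windows.

    Follows the same scheme as np.arange(0, max(pos), window_size) combined
    with np.histogram: every bin is half-open except the last one, which also
    includes its right edge, and values beyond the last edge are not counted.
    """
    edges = list(range(0, max(positions), window_size))
    if len(edges) < 2:
        # Not enough edges to form a single window (handling chosen for this sketch).
        return [], []

    counts = [0] * (len(edges) - 1)
    last_edge = edges[-1]
    for p in positions:
        if p < edges[0] or p > last_edge:
            continue  # outside the binned range
        # A position exactly on the last edge falls into the final bin.
        index = min(p // window_size, len(counts) - 1)
        counts[index] += 1

    midpoints = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
    densities = [c / window_size for c in counts]
    return midpoints, densities


mid, dens = window_density([100, 600, 1200], window_size=500)
print(mid, dens)
```

In this example the edges are 0, 500 and 1000, so the position 1200 lies past the last edge and does not contribute to any window. With this edge construction, variants after the last full window boundary are left out of the density.
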
@@ -0,0 +1,93 @@
+# our samples are pre-configured
+SAMPLES = ["A", "B"]
+
+
+rule all:
+    input:
+        "calls/all.vcf",
+        "calls/positions.png",
+        "calls/quals.svg"
+
+rule bwa_map:
+    input:
+        "data/genome.fa",
+        "data/samples/{sample}.fastq"
+    output:
+        "mapped_reads/{sample}.bam"
+    shell:
+        "bwa mem {input} | samtools view -Sb - > {output}"
+
+
+rule samtools_sort:
+    input:
+        "mapped_reads/{sample}.bam"
+    output:
+        "sorted_reads/{sample}.bam"
+    shell:
+        "samtools sort -T sorted_reads/{wildcards.sample} "
+        "-O bam -o {output} {input}"
+
+
+rule samtools_index:
+    input:
+        "sorted_reads/{sample}.bam"
+    output:
+        "sorted_reads/{sample}.bam.bai"
+    shell:
+        "samtools index {input}"
+
+
+rule bcftools_call:
+    input:
+        fa="data/genome.fa",
+        bam=expand("sorted_reads/{sample}.bam", sample=SAMPLES),
+        bai=expand("sorted_reads/{sample}.bam.bai", sample=SAMPLES)
+    output:
+        "calls/all.vcf"
+    shell:
+        "bcftools mpileup -f {input.fa} {input.bam} | "
+        "bcftools call -mv - > {output}"
+
+rule plot_positions:
+    input:
+        rules.bcftools_call.output
+    output:
+        "calls/positions.png"
+    run:
+        import matplotlib
+        matplotlib.use("Agg") # to suppress interactive plotting
+        import matplotlib.pyplot as plt
+        import numpy as np
+        from pysam import VariantFile
+        import seaborn as sns
+        #TODO: this parameter needs to be configurable
+        # see one of the next exercises
+        window_size = 500
+
+        pos = [record.pos for record in VariantFile(input[0])]
+        # setup windows
+        bins = np.arange(0, max(pos), window_size)
+
+        # use window midpoints as x coordinate
+        x = (bins[1:] + bins[:-1])/2
+
+        # compute variant density in each window
+        h, _ = np.histogram(pos, bins=bins)
+        y = h / window_size
+
+        # plot
+        fig, ax = plt.subplots(figsize=(12, 3))
+        sns.despine(ax=ax, offset=10)
+        ax.plot(x, y)
+        ax.set_xlabel('Chromosome position (bp)')
+        ax.set_ylabel('Variant density (bp$^{-1}$)')
+        plt.savefig(output[0])
+
+rule plot_quals:
+    input:
+        "calls/all.vcf"
+    output:
+        "calls/quals.svg"
+    script:
+        "scripts/plot-quals.py"
+
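The `bcftools_call` rule above collects its BAM and index inputs with `expand(...)` over `SAMPLES`. To make the resulting file lists concrete, here is a simplified plain-Python stand-in. `expand_pattern` is a hypothetical helper written for this illustration; Snakemake's real `expand` supports more options than this sketch covers.

```python
from itertools import product


def expand_pattern(pattern, **wildcards):
    """Format `pattern` once per combination of the given wildcard values."""
    keys = list(wildcards)
    combinations = product(*(wildcards[key] for key in keys))
    return [pattern.format(**dict(zip(keys, combo))) for combo in combinations]


SAMPLES = ["A", "B"]
bam_files = expand_pattern("sorted_reads/{sample}.bam", sample=SAMPLES)
bai_files = expand_pattern("sorted_reads/{sample}.bam.bai", sample=SAMPLES)
print(bam_files)
print(bai_files)
```

Listing the `.bai` files as inputs makes the index files a dependency of the variant call, even though the shell command only names the BAM files.
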

setup_creators/solutions/08_Snakefile

-63
This file was deleted.

setup_creators/tutorial/06_Snakefile → setup_creators/solutions/08_Snakefile_script2

+15 -15
@@ -1,19 +1,12 @@
 # our samples are pre-configured
 SAMPLES = ["A", "B"]
 
-# Task: Which is our second target file?
-# Add it to the input of the 'all' rule.
-# Attention: What do you need to do, to
-# keep the syntax right?
-#
-# Subsequently run the workflow - if
-# there is an error (e.g. a typo from
-# copying the slides), run Snakemake
-# with the '--debug' flag.
 
 rule all:
     input:
-        "calls/all.vcf"
+        "calls/all.vcf",
+        "calls/positions.png",
+        "calls/quals.svg"
 
 rule bwa_map:
     input:
@@ -32,7 +25,7 @@ rule samtools_sort:
         "sorted_reads/{sample}.bam"
     shell:
         "samtools sort -T sorted_reads/{wildcards.sample} "
-        "-O bam {input} > {output}"
+        "-O bam -o {output} {input}"
 
 
 rule samtools_index:
@@ -55,12 +48,19 @@ rule bcftools_call:
         "bcftools mpileup -f {input.fa} {input.bam} | "
         "bcftools call -mv - > {output}"
 
+rule plot_positions:
+    input:
+        rules.bcftools_call.output
+    output:
+        "calls/positions.png"
+    script:
+        "scripts/plot-positions.py"
+
 rule plot_quals:
     input:
-        "calls/all.vcf"
+        "calls/all.vcf"
     output:
-        "plots/quals.svg"
+        "calls/quals.svg"
     script:
-        "scripts/plot-quals.py"
-
+        "scripts/plot-quals.py"
 

setup_creators/solutions/06_Snakefile → setup_creators/solutions/09_Snakefile_config

+15 -8
@@ -1,10 +1,10 @@
-# our samples are pre-configured
 configfile: "config/config.yaml"
 
 rule all:
     input:
-        "plots/quals.svg",
-        "calls/all.vcf"
+        "calls/all.vcf",
+        "calls/positions.png",
+        "calls/quals.svg"
 
 rule bwa_map:
     input:
@@ -23,7 +23,7 @@ rule samtools_sort:
         "sorted_reads/{sample}.bam"
     shell:
         "samtools sort -T sorted_reads/{wildcards.sample} "
-        "-O bam {input} > {output}"
+        "-O bam -o {output} {input}"
 
 
 rule samtools_index:
@@ -46,12 +46,19 @@ rule bcftools_call:
         "bcftools mpileup -f {input.fa} {input.bam} | "
         "bcftools call -mv - > {output}"
 
+rule plot_positions:
+    input:
+        rules.bcftools_call.output
+    output:
+        "calls/positions.png"
+    script:
+        "scripts/plot-positions.py"
+
 rule plot_quals:
     input:
-        "calls/all.vcf"
+        "calls/all.vcf"
     output:
-        "plots/quals.svg"
+        "calls/quals.svg"
     script:
-        "scripts/plot-quals.py"
-
+        "scripts/plot-quals.py"
 
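The Snakefile above reads its settings through `configfile: "config/config.yaml"`, but the YAML file itself does not appear in this listing. Purely as an illustration of how values might be looked up once loaded, the sketch below uses a plain dict in place of Snakemake's `config` object. The keys `samples` and `window_size` and the helper `bam_targets` are hypothetical and may differ from the tutorial's actual configuration.

```python
# Plain dict standing in for the `config` object that Snakemake fills from the
# YAML file. Both keys are assumptions made for this example.
config = {
    "samples": ["A", "B"],
    "window_size": 500,
}


def bam_targets(cfg):
    """Build the sorted BAM paths for every configured sample (hypothetical helper)."""
    return [f"sorted_reads/{sample}.bam" for sample in cfg["samples"]]


# Fall back to the value hard-coded in the earlier run block if the key is absent.
window_size = config.get("window_size", 500)

print(bam_targets(config))
print(window_size)
```

Reading such values from a configuration file would let the sample list and the window size change without editing the rules.
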
