docs/count_example1.rst
+28-13
@@ -7,7 +7,7 @@
7
7
Basic Experiment workflow
8
8
=========================
9
9
10
-
This example runs the count workflow on 5'/5' WT MPRA data in the HEPG2 cell line from `Klein J., Agarwal, V., Keith, A., et al. 2019 <https://www.biorxiv.org/content/10.1101/576405v1.full.pdf>`_.
10
+
This example runs the count workflow on 5'/5' WT MPRA data in the HepG2 cell line from `Klein J., Agarwal, V., Keith, A., et al. 2019 <https://www.biorxiv.org/content/10.1101/576405v1.full.pdf>`_.
11
11
12
12
Prerequisites
13
13
======================
@@ -21,8 +21,8 @@ Installing MPRAsnakeflow
21
21
Please install conda and the MPRAsnakeflow environment, and clone the current ``MPRAsnakeflow`` master branch. You will find more help under :ref:`Installation`.
22
22
23
23
Producing an association (.tsv.gz) file
24
-
------------------------------------
25
-
This workflow requires a python dictionary of candidate regulatory sequence (CRS) mapped to their barcodes in a tab separated (.tsv) format. For this example the file can be generated using :ref:`Assignment example` or it can be found in :code:`resources/count_basic` folder in `MPRAsnakelfow <https://github.com/kircherlab/MPRAsnakeflow/>`_.
24
+
----------------------------------------
25
+
This workflow requires a mapping of candidate regulatory sequences (CRS) to their barcodes in a tab-separated (.tsv) format. For this example, the file can be generated using :ref:`Assignment example`, or it can be found in the :code:`resources/count_basic` folder in `MPRAsnakeflow <https://github.com/kircherlab/MPRAsnakeflow/>`_ (file :code:`SRR10800986_barcodes_to_coords.tsv.gz`).
26
26
27
27
Alternatively, if the association file is in pickle (.pickle) format because you used MPRAflow, you can convert it to .tsv.gz format with the built-in function in MPRAsnakeflow, using the following code:
28
28
@@ -98,7 +98,7 @@ The folder should look like this:
@@ -185,7 +190,7 @@ You should see a list of rules that will be executed. This is the summary:
185
190
statistic_counts_frequent_umis 6 1 1
186
191
statistic_counts_stats_merge 2 1 1
187
192
statistic_counts_table 12 1 1
188
-
total 139 1 10
193
+
total 941 1
189
194
190
195
When the dry run does not give any errors, we run the workflow. We use a machine with 30 threads/cores. The MPRAsnakeflow command is:
191
196
@@ -195,20 +200,30 @@ When dry-drun does not give any errors we will run the workflow. We use a machin
195
200
196
201
.. note:: Please modify your code when running in a cluster environment. We have an example SLURM config file here :code:`config/sbatch.yml`.
197
202
198
-
If everything works fine the 25 rules showed above will run:
203
+
If everything works fine, the 29 rules shown above will run. Rules starting with :code:`counts_` belong to the raw count rules, those starting with :code:`assigned_counts_` to counts assigned to the assignment, and those starting with :code:`statistic_` to statistics. Here is a brief description of the rules.
199
204
200
205
all
201
206
The overall all rule. It defines which final output files are expected.
202
207
counts_create_BAM_umi
203
-
TODO
204
-
counts_dna_rna_merge_counts
205
-
TODO
208
+
Create a BAM file from the FASTQ input, merge the FW and REV reads, and save the UMI in the XI tag.
209
+
counts_raw_counts_umi
210
+
Count the BCxUMI combinations from the BAM files.
206
211
counts_filter_counts
207
-
TODO
212
+
Keep only the counts of BCs that have the correct length (defined in the config file).
208
213
counts_final_counts_umi
209
-
TODO
210
-
counts_raw_counts_umi
211
-
TODO
214
+
Discard PCR duplicates (each BCxUMI combination is counted only once). The final counts can be found here: :code:`results/experiments/exampleCount/counts/HepG2_<1,2,3>_<DNA/RNA>_filtered_counts.tsv.gz`.
215
+
counts_dna_rna_merge_counts
216
+
Merge DNA and RNA counts together.
217
+
This is done in two ways. First, without zeros, so a BC must be present in both DNA and RNA (when :code:`min_counts` is not zero for DNA and RNA).
218
+
Second, with zeros, so a BC can be present in only the DNA or only the RNA (when :code:`min_counts` is zero for DNA or RNA).
219
+
assigned_counts_filterAssignment
220
+
Use only unique assignments.
221
+
assigned_counts_assignBarcodes
222
+
Assign RNA and DNA barcodes separately to generate the statistics for assigned counts.
223
+
assigned_counts_dna_rna_merge
224
+
Assign merged RNA/DNA barcodes. Filter BCs depending on the :code:`min_counts` option. The output for each replicate is here: :code:`results/experiments/exampleCount/assigned_counts/fromFile/exampleConfig/HepG2_<1,2,3>_merged_assigned_counts.tsv.gz`.
225
+
assigned_counts_make_master_tables
226
+
Final master table with all replicates combined. The output is here: :code:`results/experiments/exampleCount/assigned_counts/fromFile/exampleConfig/HepG2_allreps_merged.tsv.gz`. The version filtered with the :code:`bc-threshold` is here: :code:`results/experiments/exampleCount/assigned_counts/fromFile/exampleConfig/HepG2_allreps_minThreshold_merged.tsv.gz`.
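The counting rules described above (length filter, PCR duplicate removal, DNA/RNA merge) can be illustrated with a small, self-contained sketch. This is not the MPRAsnakeflow implementation: the function names, the toy reads, and the :code:`allow_zeros` switch are assumptions made for illustration only, and the :code:`min_counts` thresholds are reduced here to a simple "present or absent" check.

```python
from collections import Counter


def filter_by_length(records, bc_length):
    """Keep only (barcode, umi) records whose barcode has the expected length."""
    return [(bc, umi) for bc, umi in records if len(bc) == bc_length]


def count_unique_umis(records):
    """Count each barcode once per distinct UMI, so PCR duplicates collapse."""
    return Counter(bc for bc, _umi in set(records))


def merge_counts(dna, rna, allow_zeros):
    """Merge DNA and RNA counts per barcode into (dna_count, rna_count) pairs.

    allow_zeros=False: a barcode must be observed in both DNA and RNA.
    allow_zeros=True:  a barcode may be observed in only one of them;
                       the missing side is reported as 0.
    """
    if allow_zeros:
        barcodes = set(dna) | set(rna)
    else:
        barcodes = set(dna) & set(rna)
    return {bc: (dna.get(bc, 0), rna.get(bc, 0)) for bc in sorted(barcodes)}


# Hypothetical toy input: (barcode, UMI) pairs as they might come from a BAM file.
dna_reads = [("ACGT", "AA"), ("ACGT", "AA"), ("ACGT", "CC"), ("TTTT", "GG"), ("ACG", "AA")]
rna_reads = [("ACGT", "AA"), ("GGGG", "TT"), ("GGGG", "TT")]

dna_counts = count_unique_umis(filter_by_length(dna_reads, 4))
rna_counts = count_unique_umis(filter_by_length(rna_reads, 4))

strict = merge_counts(dna_counts, rna_counts, allow_zeros=False)
relaxed = merge_counts(dna_counts, rna_counts, allow_zeros=True)

print(strict)
print(relaxed)
```

In this toy data the too-short barcode :code:`ACG` is dropped by the length filter, the repeated :code:`ACGT`/:code:`AA` read counts only once, and only the merge with zeros keeps barcodes seen in just one of DNA or RNA.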