Commit d97a99c

deblur 2021.09 (#3141)
1 parent dcf6cbe commit d97a99c

4 files changed

+80
-5
lines changed

Diff for: CHANGELOG.md

+1
@@ -3,6 +3,7 @@
 Version 2021.09
 ---------------
 
+* Updated the qp-deblur plugin to version 2021.09 addressing a bug in fragment insertion parsing and caching; [more information](https://qiita.ucsd.edu/static/doc/html/processingdata/deblur_2021.09.html).
 * Double the number of possible connections for the Qiita database: 100 -> 200 simultaneous connections.
 * Added a new data type: "Job Output Folder" and artifact type definition: "job-output-folder" to initially support admin-only standalone commands in Qiita.
 * The study listing is now sorted by descending study id and then ascending number of available artifacts.
@@ -0,0 +1,78 @@
deblur version 2021.09
======================

The deblur version 2021.09 fixes a bug in the fragment insertion parsing and
cache that caused some fragments to be ignored when placing them in the tree. In
summary, on some occasions SEPP returns multiple fragments in a single entry. The
qp-deblur plugin parser did not expect this and assumed one fragment per entry, so
the extra fragments were seen as missing by the plugin and recorded in the cache as
such. Because these fragments were recorded as missing in the cache, the effect
propagated to future studies and meta-analyses.
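
The class of parsing error described above can be illustrated with a minimal sketch. It is not the plugin's actual code: it assumes a jplace-style layout in which each placement entry lists its fragment names under an ``nm`` key, and the fragment names and helper functions are hypothetical.

```python
import json

# Hypothetical placement output: the second entry carries two fragments.
JPLACE_TEXT = """{"placements": [
  {"p": [[0.1, 12]], "nm": [["frag_a", 1]]},
  {"p": [[0.2, 34]], "nm": [["frag_b", 1], ["frag_c", 1]]}
]}"""


def placed_first_only(jplace):
    """Buggy reading: assume every entry holds exactly one fragment."""
    return {entry["nm"][0][0] for entry in jplace["placements"]}


def placed_all(jplace):
    """Fixed reading: collect every fragment name listed in each entry."""
    return {name for entry in jplace["placements"] for name, _ in entry["nm"]}


jplace = json.loads(JPLACE_TEXT)
requested = {"frag_a", "frag_b", "frag_c"}

# Fragments the parser would record as missing (and cache as such).
missing_buggy = requested - placed_first_only(jplace)
missing_fixed = requested - placed_all(jplace)

print("missing with single-entry assumption:", sorted(missing_buggy))
print("missing when reading all names:", sorted(missing_fixed))
```

Under these assumptions, the single-entry reading wrongly reports the second fragment of a shared entry as missing, which is the kind of result that would then be stored in a cache and reused.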

This bug was resolved in this `pull request <https://github.com/qiita-spots/qp-deblur/pull/60>`__.

It is important to note that this bug only applies to the fragments inserted into the tree, which is
only part of the `deblur reference hit table`.

How do I know if my study processing had this bug?
----------------------------------------------------

The easiest way is to check the `Number rejected fragments` value reported in the table
summary. If the number differs between qp-deblur v1.1.0 and qp-deblur v2021.09, then your
study was affected by this bug. To see the table summary, navigate to the processing graph,
click on the `deblur reference hit table` artifact and open its table summary.


Sample counts implications
--------------------------

At the time of writing, Qiita had 978,052 16S deblurred private or public samples.
The figure below shows, for different trimming lengths, how many samples we will recover
based on the minimum number of sequences per sample. This is an important consideration,
as we normally need to remove samples below a given threshold for beta diversity
calculations (via rarefaction) or differential abundance testing.

.. figure:: deblur2021.09_private_public.png
   :align: center

A few conclusions from this plot:

- The maximum number of samples that we will recover is 6,771, at `Trimming (length: 150)`
  and min_seqs of 1,500, which represents a 0.7% increase in private and public samples.
- At all trimming lengths the curve tends to go up and then down as min_seqs increases,
  which is a common trend seen in rarefaction plots.
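
The 0.7% figure follows directly from the two totals quoted above. The sketch below reproduces that arithmetic and shows how a ``min_seqs`` threshold determines which samples count as recovered. The per-sample counts and the helper names are hypothetical, used only for illustration.

```python
TOTAL_SAMPLES = 978_052  # deblurred 16S samples, from the text
MAX_RECOVERED = 6_771    # samples recovered at 150 bp, min_seqs = 1,500


def percent_of_total(recovered, total):
    """Recovered samples as a percentage of all samples."""
    return 100.0 * recovered / total


def samples_passing(seq_counts, min_seqs):
    """Number of samples with at least min_seqs sequences."""
    return sum(1 for n in seq_counts.values() if n >= min_seqs)


# Hypothetical per-sample sequence counts before and after the fix.
before = {"s1": 1200, "s2": 1800, "s3": 900, "s4": 5100}
after = {"s1": 1600, "s2": 1900, "s3": 1450, "s4": 5100}

recovered = samples_passing(after, 1500) - samples_passing(before, 1500)

print(f"{percent_of_total(MAX_RECOVERED, TOTAL_SAMPLES):.1f}% of samples")
print(f"recovered in the toy data at min_seqs=1500: {recovered}")
```

In the toy data only ``s1`` crosses the threshold after the fix; ``s3`` gains sequences but stays below 1,500, so it is not counted as recovered.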

How pervasive is this bug?
--------------------------

For a better assessment, we focus only on the 150 bp trimming length. At
the time of writing, Qiita had 1,484 16S preparations with a
150 bp deblur table. Of those:

- 96.6% of preparations had 0-10% of features lost
- 12.6% had 10-20% of features lost
- 9.7% had 20-30%
- 6.9% had 30-40%
- 4.9% had 40-50%
- 3.3% had 50-60%
- 2.0% had 60-70%
- 1.3% had 70-80%
- 0.6% had 80-90%
- 0.2% had 90-100%

Remember that the percentages reported above are cumulative: each level also includes
the preparations at all higher levels. For example, the studies with 40-50% lost are
also counted at the lower levels.
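
Because the values are cumulative, the share of preparations that falls in exactly one range can be obtained by subtracting the next level. The sketch below does this under the assumption that each listed value counts the preparations in that range or any higher one; the function name is hypothetical.

```python
# Cumulative percentages copied from the list above (0-10% ... 90-100%).
cumulative = [96.6, 12.6, 9.7, 6.9, 4.9, 3.3, 2.0, 1.3, 0.6, 0.2]
bins = [f"{low}-{low + 10}%" for low in range(0, 100, 10)]


def exclusive_shares(values):
    """Difference each level against the next one; the last level is kept as is."""
    following = values[1:] + [0.0]
    return [round(cur - nxt, 1) for cur, nxt in zip(values, following)]


shares = exclusive_shares(cumulative)
for label, share in zip(bins, shares):
    print(f"{label} lost: {share}% of preparations")
```

Read this way, the large majority of preparations sit in the lowest range, and the exclusive shares add back up to the first cumulative value.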

Additionally, after a Qiita-wide review, we did not find any strong pattern among the
studies that were most affected, whether by sample type
(according to the EMPO 3 category) or by target 16S variable region (according to the
reported target_subfragment).

Reaching out to affected study owners
-------------------------------------

As shown in the previous section, the effect of the missing fragments depends on the
study, the trimming length and the minimum per-sample sequence count. As a
general rule of thumb, for a first analytical pass in a 16S meta-analysis, we use
5,000 sequences per sample and prefer 150 base pair trimming. Thus, we directly
contacted all study owners that would recover more than 5% of the samples in their study
(24 in total).
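
The selection rule above can be sketched as follows. This is an illustration only, not the procedure actually run on Qiita: the study names and counts are hypothetical, and it assumes the 5% is measured against all samples in the study.

```python
MIN_SEQS = 5_000            # rule-of-thumb depth from the text
RECOVERY_THRESHOLD = 0.05   # contact owners recovering more than 5%


def recovered_fraction(before, after, min_seqs=MIN_SEQS):
    """Fraction of a study's samples that pass min_seqs only after the fix."""
    if not after:
        return 0.0
    gained = sum(
        1
        for sample, count in after.items()
        if count >= min_seqs and before.get(sample, 0) < min_seqs
    )
    return gained / len(after)


# Hypothetical per-sample sequence counts at 150 bp, before and after the fix.
studies = {
    "study_A": (
        {"a1": 4000, "a2": 6000, "a3": 4900, "a4": 7000},
        {"a1": 5200, "a2": 6100, "a3": 4950, "a4": 7000},
    ),
    "study_B": (
        {"b1": 8000, "b2": 3000},
        {"b1": 8000, "b2": 3100},
    ),
}

to_contact = [
    name
    for name, (before, after) in sorted(studies.items())
    if recovered_fraction(before, after) > RECOVERY_THRESHOLD
]
print(to_contact)
```

Here ``study_A`` recovers one of its four samples and is flagged, while ``study_B`` gains sequences without any sample crossing the 5,000 threshold.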

Diff for: qiita_pet/support_files/doc/source/processingdata/index.rst

+1-5
@@ -164,14 +164,10 @@ Deblurring
 
 * **Deblur Final Table** :ref:`[5]<reference5>` : Contains all the sequences.
 
-Deblur Quality Filtering
-~~~~~~~~~~~~~~~~~~~~~~~~
-
-Looking for information about debluring? Please see the document here:
-
 .. toctree::
 
    deblur_quality.rst
+   deblur_2021.09.rst
 
 Closed-Reference OTU Picking
 ----------------------------
