
Commit c1a0dc9

big report improvements
1 parent 53cbdf1 commit c1a0dc9

1 file changed (+57 -31 lines changed)

harpy/reports/align_stats.qmd (+57 -31)
@@ -395,7 +395,6 @@ rm(hs_mult)
 rm(valids)
 rm(invalids)
 rm(non_singletons)
-#knitr::knit_exit()
 ```
 ```{r imports}
 covfile <- params$coverage
@@ -484,7 +483,7 @@ list(
 
 ```{r}
 #| content: valuebox
-#| title: "Mol. Average Depth"
+#| title: "Linked Depth"
 list(
 color = "#d4d4d4",
 value = round(mol_global_avg,2)
@@ -493,7 +492,7 @@ list(
 
 ```{r}
 #| content: valuebox
-#| title: "Mol. Stdev Depth"
+#| title: "Stdev Linked Depth"
 list(
 color = "#d4d4d4",
 value = round(mol_global_sd,2)
@@ -503,7 +502,8 @@ list(
 ## Distdesc header
 ::: {.card title="Alignment Depth Distribution"}
 These are the frequencies of interval coverage across all **`r windowskb` kilobase** intervals for all contigs.
-For visual clarity, the distribution of alignment-depths (not molecule depths) is truncated at the 99% quantile, which is **`r q99`** for the alignment data.
+For visual clarity, the distributions are truncated at the 99% quantile, which is **`r q99`** for the alignments
+and **`r mol_q99`** for the inferred molecules.
 
 ```{r depth_distribution}
 hs <- hist(tb$depth[tb$depth <= q99], breaks = 30, plot = F)
@@ -515,11 +515,11 @@ hs_mol <- data.frame(val = hs_mol$breaks[-1], freq = hs_mol$counts)
 
 highchart() |>
 hc_chart(type = "area", animation = F) |>
-hc_add_series(data = hs, type = "areaspline", color = "#757575", name = "by alignments", hcaes(x = val, y = freq), marker = list(enabled = FALSE)) |>
-hc_add_series(data = hs_mol, type = "areaspline", hcaes(x = val, y = freq), color = "#9c3b94", name = "by inferred molecules", marker = list(enabled = FALSE)) |>
+hc_add_series(data = hs, type = "areaspline", color = "#757575", name = "alignments", hcaes(x = val, y = freq), marker = list(enabled = FALSE)) |>
+hc_add_series(data = hs_mol, type = "areaspline", hcaes(x = val, y = freq), color = "#9c3b94", name = "inferred molecules", marker = list(enabled = FALSE)) |>
 hc_xAxis(max = mol_q99, title = list(text = "depth")) |>
 hc_yAxis(title = list(text = "% intervals")) |>
-hc_caption(text = "inferred molecules calculate \"effective\" or \"molecule\" coverage") |>
+hc_caption(text = "inferred molecules provide \"linked\" coverage") |>
 hc_title(text = "Distribution of Alignment Depths") |>
 hc_exporting(
 enabled = T, filename = paste0(samplename, ".cov"),
@@ -541,7 +541,7 @@ column_description <- c(
 "average alignment depth",
 "standard deviation of alignment depth",
 "average depth including gaps between linked-reads",
-"standard deviation of the inferred molecule alignment depth"
+"standard deviation of the depth including gaps between linked-reads"
 )
 
 headerCallback <- c(
@@ -557,7 +557,7 @@ DT::datatable(
 contig_avg,
 rownames = F,
 extensions = 'Buttons',
-colnames = c('Contig', 'Average Depth', 'Standard Deviation', 'Average Depth (molecules)', 'Standard Deviation (Molecules)'),
+colnames = c('Contig', 'Average Depth', 'Standard Deviation', 'Average Linked Depth', 'Standard Deviation Linked'),
 fillContainer=T,
 options = list(
 dom = 'Brtp',
@@ -573,7 +573,7 @@ DT::datatable(
 ###
 ::: {.card title="Depth Outliers"}
 This table shows the `r windowskb`kbp intervals considered outliers, as determined by
-having depth greater than the 99th percentile (`r q99`) of aligment depths.
+having depth greater than the 99% quantile (`r q99`) of aligment depths.
 
 ```{r depth_plot}
 column_description <- c(
@@ -596,7 +596,7 @@ DT::datatable(
 outliers,
 rownames = F,
 extensions = 'Buttons',
-colnames = c('Contig', 'Interval', 'Depth', 'Depth (molecules)'),
+colnames = c('Contig', 'Interval', 'Depth', 'Linked Depth'),
 fillContainer=T,
 options = list(
 dom = 'Brtp',
@@ -613,22 +613,25 @@ DT::datatable(
 ##
 ::: {.card title="Depth and Coverage, Visualized" fill="false"}
 This is a circular vizualization the depth information across up to 30 of the largest contigs (unless specific contigs were provided).
-For clarity, this visualization truncates coverage at the 99th percentile (`r q99` for alignments and `r mol_q99` for molecule coverage).
+For clarity, this visualization truncates coverage at the 99% quantile (`r q99` for alignments and `r mol_q99` for inferred molecules).
 If you are unfamiliar with this kind of visualization, it's a circular representation of a linear genome.
 Each arc (segment) is a different contig, from position 0 to the end of the contig, and is labelled by the contig name.
-The internal (grey) rings are a barplot where each bar represents the alignment depth at a `r windowskb` kilobase
-genomic interval. The inner ring (grey bars) is the number of reads that had a _proper_ alignment in the `r windowskb` kilobase interval, where
+The internal (grey) rings are a histogram where each bar represents the alignment depth at a `r windowskb` kilobase
+genomic interval. These reads are considered to have _proper_ alignment in the `r windowskb` kilobase interval, where
 "proper" refers to a read not marked as a duplicate or flagged with the SAM `UNMAP`, `SECONDARY`, or `QCFAIL` flags. The outer ring (magenta bars),
-is the number of _molecules_ spanning that interval.
+is the number of _molecules_ spanning that interval. The outer ring (magenta) is linked depth, which is the alignment depth of the molecules
+inferred from linked-read data, included unsequenced segments between reads sharing the same linked-read barcode. It's common for the
+linked depth histogram to gradually increase towards the center of a contig (your plot may resemble petals of a flower) due
+to the likelihood of linked molecules spanning the center of a contig.
 :::
 
 ##
 ### {width=20%}
 ::: {.card title="Navigating the Plot"}
-You may hover your cursor over variants to view their positions, pan by clicking and dragging,
+You may hover your cursor over bars to view their positions and depths, pan by clicking and dragging,
 and zoom using scroll (mouse or touchpad). In case you become unable to scroll up from the plot due to these interactive
-features, place your cursor over this left column and you will
-be able to scroll the report instead of zooming on the plot.
+features, place your cursor over this left column and you will be able to scroll the report instead of zooming on the plot.
+Try refreshing the browser window if no plot is appearing in the pane. Note: The depth values are rounded to 4 decimal places.
 :::
 
 ###
@@ -655,7 +658,7 @@ if (all(plot_contigs == "default")){
 ```
 
 ```{r circleplot, fig.align='center', out.width= "80%", out.height="900px"}
-#| title: Depth and Coverage Across the Genome (refresh browser window if not showing up)
+#| title: Depth and Coverage Across the Genome
 
 genomeChr <- .contigs$size
 names(genomeChr) <- .contigs$contig
@@ -674,19 +677,19 @@ for (i in names(genomeChr)){
 paste0("cov_", i),
 chromosome = i,
 starts = chrcov$position, ends = chrcov$position_end,
-values = pmin(chrcov$depth, q99),
+values = round(pmin(chrcov$depth, q99),3),
 color = "#757575",
-minRadius = inner[1],
-maxRadius = inner[2]
+minRadius = inner[1] + 0.02,
+maxRadius = inner[2] - 0.02
 )
 tracks <- tracks + BioCircosBarTrack(
 paste0("molcov_", i),
 chromosome = i,
 starts = chrcov$position, ends = chrcov$position_end,
-values = pmin(chrcov$mol_depth, mol_q99),
+values = round(pmin(chrcov$mol_depth, mol_q99),3),
 color = "#9c3b94",
-minRadius = outer[1],
-maxRadius = outer[2]
+minRadius = outer[1] + 0.02,
+maxRadius = outer[2] - 0.02
 )
 }
 # Add background
@@ -755,12 +758,35 @@ aligned_bp
 :::
 
 ###
-::: {.card title="Interpreting Barcode Validity"}
-BX barcode validity is classified into one of two categories:
+::: {.card title="Interpreting Linked-Read Terminology"}
+BX barcode validity is classified into one of three categories:
 
-valid BX
-: a complete BX barcode was present in the read (i.e. no 00 for any segments)
+Valid
+: A complete BX barcode was present in the read (i.e. no 00 for any segments)
+
+Invalid
+: A barcode was present in the read, but it contained 00 in at least one of the barcode segments
+
+Missing
+: There is no barcode in the read. For technical reasons this is usually equivalent to `invalid`
+
+Linked-read data is specific for the definition of a "molecule":
+
+Unique/Inferred Molecules
+: Given linked-read barcode information, the original piece of DNA from which the sequenced fragments are considered to originate from.
+
+Inferred Sequence
+: While somewhat similar to "inferred molecule", the inferred sequence describes the original DNA fragment that was put on the sequencer. If the fragment was longer than the sequencer could fully sequence, e.g. 400bp fragment and the sequencer can only sequence 300bp, then the inferred sequence is 400bp long, even though only 300bp are represented in the sequence data. If the entire fragment was sequenced, then the inferred length and sequence lengths should be identical.
+
+There are several kinds of "coverage" when working with linked-read data:
+
+Aligned Depth/Coverage
+: The standard interpretation of depth comparing the number of aligned base-pairs to the genome or contigs
+
+Molecule Coverage
+: The coverage breadth or depth of sequences onto _unique molecules_ (rather than the genome), as inferred from linked-read barcodes
+
+Linked Depth/Coverage
+: The coverage breadth or depth that _includes unsequenced gaps between linked sequences_ that are associated with a single unique molecule. For example, if two 300bp paired-end reads share the same barcode and map 2000bp apart, the calculation *includes* the 1400bp between the sequences as if they were present. This is similar to _inferred sequences_ described above, except spanning across linked sequences rather than within a paired-end sequence.
 
-invalid BX
-: a barcode was present in the read, but it contained 00 in at least one of the barcode segments
 :::
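The circular-plot card in this commit defines a "proper" alignment as a read that is not marked as a duplicate and carries none of the SAM `UNMAP`, `SECONDARY`, or `QCFAIL` flags. As a minimal R sketch of that rule (an illustration only, not Harpy's own implementation, which this diff does not show), the check is a bitmask test on the SAM `FLAG` field using the bit values from the SAM specification: 0x4 (unmapped), 0x100 (secondary), 0x200 (QC fail), and 0x400 (duplicate). The helper name `is_proper` is hypothetical.

```r
# Bit values from the SAM specification's FLAG field
FLAG_UNMAP     <- 0x4    # read unmapped
FLAG_SECONDARY <- 0x100  # secondary alignment
FLAG_QCFAIL    <- 0x200  # not passing quality controls
FLAG_DUP       <- 0x400  # PCR or optical duplicate

# Hypothetical helper: TRUE when a read sets none of the excluded bits,
# i.e. it would count toward the "proper" alignment depth
is_proper <- function(flag) {
  excluded <- bitwOr(bitwOr(FLAG_UNMAP, FLAG_SECONDARY),
                     bitwOr(FLAG_QCFAIL, FLAG_DUP))
  bitwAnd(as.integer(flag), excluded) == 0L
}

# 16 = reverse strand; 99 = paired + proper pair + mate reverse + first in pair.
# Neither sets an excluded bit. 1123 = 99 + 1024 (duplicate).
flags <- c(0L, 4L, 16L, 256L, 512L, 1024L, 99L, 1123L)
print(is_proper(flags))
# [1]  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE
```

Only the four bits named in the card are excluded here, so a read with other flags set (for example reverse strand or properly paired) still counts.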

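The terminology card added in this commit describes barcode validity and linked depth in prose. The R sketch below illustrates both definitions on toy data; it is not how Harpy computes them. It assumes haplotagging-style `BX` barcodes made of four letter-plus-two-digit segments (e.g. `A01C01B01D01`), where a `00` segment makes the barcode invalid as the card states. It also treats every read sharing a barcode as a single molecule spanning from the leftmost read start to the rightmost read end, ignoring contigs and never splitting one barcode into several molecules. All function and variable names are hypothetical. To reproduce the card's worked example, "map 2000bp apart" is read as the two 300bp reads spanning 2000bp end to end, which is what leaves a 1400bp gap.

```r
# Hypothetical helper: classify BX tags into the card's three categories.
# Assumes haplotagging-style barcodes such as "A01C01B01D01".
# The card notes "missing" is usually equivalent to "invalid"; they are kept
# separate here only to mirror the three listed categories.
classify_bx <- function(bx) {
  ifelse(is.na(bx) | bx == "", "missing",
         ifelse(grepl("[A-D]00", bx), "invalid", "valid"))
}

print(classify_bx(c("A01C01B01D01", "A01C00B12D05", NA, "")))

# Toy reads: two 300bp reads sharing one barcode (1-based, inclusive coordinates)
reads <- data.frame(
  barcode = c("A01C01B01D01", "A01C01B01D01"),
  start   = c(1, 1701),
  end     = c(300, 2000)
)

# Collapse reads sharing a barcode into one inferred molecule that spans
# from the leftmost read start to the rightmost read end
infer_molecules <- function(reads) {
  starts <- tapply(reads$start, reads$barcode, min)
  ends   <- tapply(reads$end, reads$barcode, max)
  data.frame(barcode = names(starts),
             start   = as.numeric(starts),
             end     = as.numeric(ends[names(starts)]))
}

# Number of bases of each [s, e] that fall inside the interval [lo, hi]
overlap_bp <- function(s, e, lo, hi) pmax(0, pmin(e, hi) - pmax(s, lo) + 1)

lo <- 1; hi <- 2000                      # one 2000bp genomic interval
interval_len <- hi - lo + 1
mol <- infer_molecules(reads)

aligned_bp <- sum(overlap_bp(reads$start, reads$end, lo, hi))  # 600: sequenced bases only
linked_bp  <- sum(overlap_bp(mol$start, mol$end, lo, hi))      # 2000: includes the gap
gap_bp     <- linked_bp - aligned_bp                           # 1400: unsequenced bases

cat("aligned depth:", aligned_bp / interval_len, "\n")  # 0.3
cat("linked depth:", linked_bp / interval_len, "\n")    # 1
cat("gap counted by linked depth:", gap_bp, "bp\n")     # 1400
```

The only difference between the two depths is whether the unsequenced stretch between reads with the same barcode is counted, which is why linked depth is always at least the aligned depth in this simplified model.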