Ed Harry edited this page Feb 11, 2022 · 12 revisions

About

LinkStats collects, compiles and processes QC statistics on aligned, barcoded (linked-read) genomic reads, and produces CSV reports and graphical plots.

Overview

LinkStats consists of a three-stage pipeline:

  1. Read any input SAM/BAM/CRAM data sources and compile linked-read data.
    • Optionally output summary and/or molecule and/or coverage data as CSV files.
  2. Process data into histograms.
    • Optionally read in additional molecule and/or coverage data from CSV files.
    • Optionally output histogram data as CSV files.
  3. Create plots from histogram data.
    • Optionally input additional histogram data from CSV files.

Usage Example

> LinkStats sam-data sample_reads.cram save-csvs results/csvs/sample save-plots results/plots/sample

This reads alignment data from the file sample_reads.cram, saves summary data as CSV files to the directory results/csvs with the prefix sample_, and saves histogram plots to the directory results/plots with the prefix sample_. See the manual for detailed usage instructions.

Alignment Data Requirements

Alignment data must be coordinate-sorted SAM/BAM/CRAM data with barcodes stored in BX:Z: tags. Optionally, additional barcode grouping can be provided by MI:i: tags. Sample names are taken from SM:Z: (preferably) or RG:Z: tags if present, or can be set with a command-line option.
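To illustrate the tag layout described above, the following Python sketch pulls the BX and MI tags out of a single SAM text record and checks an @HD header line for coordinate sorting. It is not part of LinkStats: the function names (parse_optional_tags, barcode_and_mi, is_coordinate_sorted) are hypothetical, and the sketch handles only uncompressed SAM text, not BAM or CRAM.

```python
from typing import Dict, Optional, Tuple


def parse_optional_tags(sam_line: str) -> Dict[str, str]:
    """Return {TAG: value} for the optional TAG:TYPE:VALUE fields of one SAM record.

    Hypothetical helper; the TYPE letter (Z, i, ...) is discarded.
    """
    fields = sam_line.rstrip("\n").split("\t")
    tags: Dict[str, str] = {}
    # The first 11 tab-separated columns of a SAM record are mandatory;
    # optional tags such as BX:Z:... and MI:i:... follow them.
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def barcode_and_mi(sam_line: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (BX barcode, MI integer), with None for any missing tag."""
    tags = parse_optional_tags(sam_line)
    bx = tags.get("BX")
    mi = int(tags["MI"]) if "MI" in tags else None
    return bx, mi


def is_coordinate_sorted(hd_line: str) -> bool:
    """True if an @HD header line declares SO:coordinate."""
    return hd_line.startswith("@HD") and "SO:coordinate" in hd_line.split("\t")
```

A record without a BX tag yields (None, None) here, which corresponds to the kind of read counted in the "No BX" summary column.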

Example Output

Summary Statistics

| Sample Name | sample_1 | sample_2 |
|---|---|---|
| Genome Length | 475288177 | 2782012602 |
| Total Alignments | 76630734 | 589796532 |
| Duplicates | 0.31437793614243603 | 0.7195109838319632 |
| QCFail | 0.0 | 0.0 |
| Unmapped | 0.020081903430547852 | 0.015188315824142554 |
| Mapped | 0.9799180965694522 | 0.9848116841758574 |
| No BX | 0.02013981753065291 | 0.01340242027737118 |
| No MI | 0.02013981753065291 | 0.01340242027737118 |
| Zero MapQ | 0.18363639841946444 | 0.034898881026312986 |
| N50 Reads Per Molecule | 2 | 8 |
| N90 Reads Per Molecule | 2 | 2 |
| auN Reads Per Molecule | 4.508073122000967 | 14.249821430536441 |
| No. Molecules (No. Reads >= 1) | 14206313 | 29550007 |
| Mean Read MapQ Per Molecule (No. Reads >= 1) | 35.64287354551842 | 45.66036275340808 |
| Mean Molecule Length (No. Reads >= 1) | 3552.3337376840846 | 6104.185723543145 |
| N50 Molecule Length (No. Reads >= 1) | 28158 | 37299 |
| N90 Molecule Length (No. Reads >= 1) | 5566 | 9131 |
| auN Molecule Length (No. Reads >= 1) | 31222.761770932833 | 42537.54489314595 |
| No. Molecules (No. Reads >= 3) | 3026475 | 9470847 |
| Mean Read MapQ Per Molecule (No. Reads >= 3) | 42.63248095789803 | 47.529572485517356 |
| Mean Molecule Length (No. Reads >= 3) | 15475.771771780703 | 18313.300594867596 |
| N50 Molecule Length (No. Reads >= 3) | 29732 | 38452 |
| N90 Molecule Length (No. Reads >= 3) | 8844 | 11597 |
| auN Molecule Length (No. Reads >= 3) | 33071.83794352541 | 43997.00931155544 |
| No. Molecules (No. Reads >= 5) | 1290033 | 5978121 |
| Mean Read MapQ Per Molecule (No. Reads >= 5) | 45.54367534814054 | 49.441155002674705 |
| Mean Molecule Length (No. Reads >= 5) | 23451.85308205294 | 25248.238797943366 |
| N50 Molecule Length (No. Reads >= 5) | 34597 | 41386 |
| N90 Molecule Length (No. Reads >= 5) | 12794 | 14216 |
| auN Molecule Length (No. Reads >= 5) | 38466.78045604522 | 47093.63476301429 |
| No. Molecules (No. Reads >= 10) | 312474 | 3164327 |
| Mean Read MapQ Per Molecule (No. Reads >= 10) | 47.91390010477723 | 50.59626523246061 |
| Mean Molecule Length (No. Reads >= 10) | 36652.9717896529 | 35413.34951065424 |
| N50 Molecule Length (No. Reads >= 10) | 45269 | 47624 |
| N90 Molecule Length (No. Reads >= 10) | 21138 | 19803 |
| auN Molecule Length (No. Reads >= 10) | 49249.24314427641 | 53344.70567932435 |
| Median Insert Size | 207 | 271 |
| Mean Short Read Depth | 23.94266163073524 | 31.482526333286536 |
| Mean Short Read Depth Per Molecule (No. Reads >= 1) | 1.0525249483701045 | 0.9676011811817129 |
| Molecule Read Depth (No. Reads >= 1) | 106.1788771530919 | 64.83749596616673 |
| Mean Short Read Depth Per Molecule (No. Reads >= 3) | 0.27878003867807977 | 0.5293547845761889 |
| Molecule Read Depth (No. Reads >= 3) | 98.54450129315967 | 62.344242392831546 |
| Mean Short Read Depth Per Molecule (No. Reads >= 5) | 0.12086867821999131 | 0.33715994721512627 |
| Molecule Read Depth (No. Reads >= 5) | 63.65330730076208 | 54.254616410612506 |
| Mean Short Read Depth Per Molecule (No. Reads >= 10) | 0.07543374059593067 | 0.19225956277851122 |
| Molecule Read Depth (No. Reads >= 10) | 24.097171487183868 | 40.27998217421447 |
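The N50, N90 and auN statistics in the summary are length-weighted measures. The Python sketch below implements their standard definitions and assumes that the bracketed thresholds (for example No. Reads >= 3) mean molecules are filtered by read count before the statistics are computed. The helper names (nx, aun, molecule_length_summary) are hypothetical, and LinkStats' own implementation may differ in details such as tie handling. Applying the same functions to per-molecule read counts would presumably give the "Reads Per Molecule" variants.

```python
from typing import Dict, Sequence, Tuple


def nx(lengths: Sequence[float], x: float) -> float:
    """Nx: the length L such that items of length >= L hold at least x% of the total."""
    ordered = sorted(lengths, reverse=True)
    target = sum(ordered) * x / 100.0
    running = 0.0
    for length in ordered:
        running += length
        if running >= target:
            return length
    raise ValueError("lengths must be non-empty")


def aun(lengths: Sequence[float]) -> float:
    """auN: sum of squared lengths divided by total length (area under the Nx curve)."""
    total = sum(lengths)
    return sum(length * length for length in lengths) / total


def molecule_length_summary(
    molecules: Sequence[Tuple[int, int]], min_reads: int
) -> Dict[str, float]:
    """molecules: hypothetical (molecule_length, n_reads) pairs.

    Only molecules with n_reads >= min_reads are kept, mirroring the
    assumed meaning of the "(No. Reads >= k)" column suffixes.
    """
    lengths = [length for length, n_reads in molecules if n_reads >= min_reads]
    return {
        "No. Molecules": len(lengths),
        "Mean Molecule Length": sum(lengths) / len(lengths),
        "N50 Molecule Length": nx(lengths, 50),
        "N90 Molecule Length": nx(lengths, 90),
        "auN Molecule Length": aun(lengths),
    }
```

For example, for lengths 2 through 10 (total 54) the N50 is 8, because 10 + 9 + 8 = 27 is the first running total to reach half of 54, and the auN is 384 / 54.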

Molecule Data

| Sample Name | Molecule Length | No. Reads | MI | BX | Reference | Mean Read Depth | Mean MapQ | No. Gaps | Mean Gap Size | Max Gap Size |
|---|---|---|---|---|---|---|---|---|---|---|
| sample_1 | 151 | 1 | 42.0 | AAACTTATAAACAAAT-1 | SUPER_2 | 0.9933774834437086 | 39.0 | 0 | 0.0 | 0 |
| sample_1 | 693 | 2 | 62.0 | AAACTTATAAACAAAT-1 | SUPER_2 | 0.42857142857142855 | 60.0 | 1 | 396.0 | 396 |
| sample_1 | 147 | 2 | 0.0 | AAACTTATAAACAACA-1 | SUPER_2 | 2.020408163265306 | 52.0 | 0 | 0.0 | 0 |
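The columns of a molecule row are related to one another. The sketch below computes length, read count, mean read depth and gap statistics from the reference intervals of reads sharing one barcode (and MI value), under simplifying assumptions not confirmed by LinkStats itself: reads are half-open (start, end) intervals, molecule length is the span from the first read start to the furthest read end, mean read depth is aligned read bases divided by molecule length, and gaps are uncovered stretches between reads. With the hypothetical intervals (0, 150) and (546, 693) it reproduces the second example row (693 bases, 2 reads, depth 297 / 693, one 396-base gap); LinkStats' exact boundary conventions may differ, and Mean MapQ is omitted.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class MoleculeSummary:
    length: int
    n_reads: int
    mean_read_depth: float
    gaps: List[int]

    @property
    def n_gaps(self) -> int:
        return len(self.gaps)

    @property
    def mean_gap_size(self) -> float:
        return sum(self.gaps) / len(self.gaps) if self.gaps else 0.0

    @property
    def max_gap_size(self) -> int:
        return max(self.gaps, default=0)


def summarise_molecule(reads: Sequence[Tuple[int, int]]) -> MoleculeSummary:
    """Hypothetical helper: summarise one molecule from half-open read intervals."""
    if not reads:
        raise ValueError("a molecule needs at least one read")
    ordered = sorted(reads)
    start = ordered[0][0]
    end = max(read_end for _, read_end in ordered)
    length = end - start
    # Depth is taken as total aligned read bases spread over the molecule span.
    aligned_bases = sum(read_end - read_start for read_start, read_end in ordered)
    gaps: List[int] = []
    covered_to = ordered[0][1]
    for read_start, read_end in ordered[1:]:
        if read_start > covered_to:  # an uncovered stretch inside the molecule
            gaps.append(read_start - covered_to)
        covered_to = max(covered_to, read_end)
    return MoleculeSummary(length, len(ordered), aligned_bases / length, gaps)
```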

Coverage Gap Data

| Sample Name | Reference | Gap Length |
|---|---|---|
| sample_1 | SUPER_2 | 492 |
| sample_1 | SUPER_2 | 503 |
| sample_1 | SUPER_2 | 85 |
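Assuming a coverage gap is an uncovered stretch between consecutive reads on the same reference (an assumption; LinkStats may define gaps differently), rows in the (Sample Name, Reference, Gap Length) layout above could be produced as in the following hypothetical Python sketch. Stretches before the first read and after the last read on each reference are not counted, because that would require the reference length.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def coverage_gap_rows(
    sample: str, alignments: Iterable[Tuple[str, int, int]]
) -> List[Tuple[str, str, int]]:
    """alignments: hypothetical (reference, start, end) half-open intervals.

    Returns (sample, reference, gap_length) rows for zero-coverage stretches
    between reads, grouped by reference.
    """
    by_ref: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for ref, start, end in alignments:
        by_ref[ref].append((start, end))
    rows: List[Tuple[str, str, int]] = []
    for ref in sorted(by_ref):
        ordered = sorted(by_ref[ref])
        covered_to = ordered[0][1]
        for start, end in ordered[1:]:
            if start > covered_to:  # no read covers [covered_to, start)
                rows.append((sample, ref, start - covered_to))
            covered_to = max(covered_to, end)  # overlapping reads extend coverage
    return rows
```

Sorting by start and tracking the furthest covered position merges overlapping reads in one pass, so only genuinely uncovered stretches are reported.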

Molecule Length Histogram Data

| PDF | CDF | Molecule Length | Sample Name | Min No. Reads |
|---|---|---|---|---|
| 9.367449744132983e-06 | 1.3014845860860104e-05 | 96.0 | sample_1 | 1 |
| 0.0012569059966509256 | 0.0017593212851612847 | 109.5 | sample_1 | 1 |
| 0.0010904307120362616 | 0.0032743320967205856 | 110.5 | sample_1 | 1 |

Coverage Gap Histogram Data

| PDF | CDF | Coverage Gap Length | Sample Name |
|---|---|---|---|
| 0.014091511936339523 | 0.01579554609844439 | 1.5 | sample_1 |
| 0.011836098945160693 | 0.029062940495769775 | 2.5 | sample_1 |
| 0.011832243538338165 | 0.04232601326625564 | 3.5 | sample_1 |
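Both histogram tables pair a bin centre with a PDF and a CDF value. One common way to derive such rows is sketched below in Python: the PDF is the count in each bin divided by (total count times bin width), so it integrates to 1, and the CDF is the running fraction of values up to the end of each bin. This is an illustrative assumption rather than LinkStats' method, and it is not expected to reproduce the exact values above, which may use different binning or weighting. The function name histogram_pdf_cdf is hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def histogram_pdf_cdf(
    values: Sequence[float], edges: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Return (pdf, cdf, bin_centre) rows for half-open bins [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        i = bisect_right(edges, value) - 1  # index of the bin containing value
        if 0 <= i < len(counts):  # values outside the edges are ignored
            counts[i] += 1
    total = sum(counts)
    if total == 0:
        return []
    rows: List[Tuple[float, float, float]] = []
    cumulative = 0
    for i, count in enumerate(counts):
        width = edges[i + 1] - edges[i]
        cumulative += count
        pdf = count / (total * width)  # density: area over all bins sums to 1
        cdf = cumulative / total  # fraction of values below edges[i + 1]
        rows.append((pdf, cdf, (edges[i] + edges[i + 1]) / 2))
    return rows
```

With integer gap lengths and unit-width bins starting at whole numbers (for example edges 1, 2, 3, ...), the bin centres fall on x.5 values like those in the coverage gap table.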

Molecule Length PDFs

[Figure: example molecule length PDF plots (2 images)]

Molecule Length CDFs

[Figure: example molecule length CDF plots (2 images)]

Coverage Gap PDFs

[Figure: example coverage gap PDF plot]

Coverage Gap CDFs

[Figure: example coverage gap CDF plot]