Home
Ed Harry edited this page Feb 11, 2022 · 12 revisions
LinkStats collects, compiles and processes QC statistics on aligned, barcoded genomic reads. It produces CSV reports and graphical plots.
LinkStats consists of a three-stage pipeline:
- Read any input SAM/BAM/CRAM data sources and compile linked-read data.
  - Optionally output summary, molecule and/or coverage data as CSV files.
- Process data into histograms.
  - Optionally read in additional molecule and/or coverage data from CSV files.
  - Optionally output histogram data as CSV files.
- Create plots from histogram data.
  - Optionally read in additional histogram data from CSV files.
> LinkStats sam-data sample_reads.cram save-csvs results/csvs/sample save-plots results/plots/sample
This reads alignment data from the file `sample_reads.cram`, saves summary data to the directory `results/csvs` with the prefix `sample_`, and saves histogram plots to the directory `results/plots` with the prefix `sample_`. See the manual for detailed usage instructions.
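
The way an output argument such as `results/csvs/sample` maps to a directory plus a file prefix can be sketched in Python. This is only an illustration of the convention described above: the helper names `split_output_prefix` and `build_command` are hypothetical, and the sketch does not reproduce LinkStats' own argument handling or the full output file names, which are not given here.

```python
from pathlib import PurePosixPath
from typing import List, Tuple


def split_output_prefix(prefix_path: str) -> Tuple[str, str]:
    """Split an output argument into (directory, file prefix).

    Hypothetical helper: mirrors the description above, where
    'results/csvs/sample' means directory 'results/csvs' and prefix 'sample_'.
    """
    path = PurePosixPath(prefix_path)
    return str(path.parent), path.name + "_"


def build_command(alignments: str, csv_prefix: str, plot_prefix: str) -> List[str]:
    """Assemble the argument list for the example invocation shown above."""
    return [
        "LinkStats",
        "sam-data", alignments,
        "save-csvs", csv_prefix,
        "save-plots", plot_prefix,
    ]


if __name__ == "__main__":
    print(split_output_prefix("results/csvs/sample"))
    print(" ".join(build_command("sample_reads.cram",
                                 "results/csvs/sample",
                                 "results/plots/sample")))
```

The argument list could be passed to `subprocess.run` to launch the pipeline once per sample.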
Alignment data must be coordinate-sorted SAM data with barcodes stored as `BX:Z:` tags. Optionally, additional barcode grouping can be specified with `MI:i:` tags. Sample names are taken from `SM:Z:` (preferably) or `RG:Z:` tags if present, or can be set with a command-line option.
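
As a rough pre-flight check of these input requirements, the following sketch inspects plain-text SAM lines for a coordinate-sorted header and for `BX` and `MI` tags. It is not part of LinkStats; the function names and the example record are hypothetical, and it only handles text SAM (BAM or CRAM input would first need converting, for example with `samtools view -h`).

```python
def header_is_coordinate_sorted(header_lines):
    """Return True if the @HD header line declares SO:coordinate."""
    for line in header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "@HD":
            return "SO:coordinate" in fields[1:]
    return False


def optional_tags(sam_record):
    """Return {tag: (type, value)} for the optional fields of one SAM record.

    Optional fields start at column 12 and have the form TAG:TYPE:VALUE.
    """
    fields = sam_record.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        tag, type_code, value = field.split(":", 2)
        tags[tag] = (type_code, value)
    return tags


def barcode_of(sam_record):
    """Return the BX:Z barcode of a record, or None if it is missing."""
    tag = optional_tags(sam_record).get("BX")
    if tag is None or tag[0] != "Z":
        return None
    return tag[1]


# Hypothetical header and record, for illustration only.
EXAMPLE_HEADER = ["@HD\tVN:1.6\tSO:coordinate", "@SQ\tSN:SUPER_2\tLN:1000"]
EXAMPLE_RECORD = (
    "read1\t99\tSUPER_2\t100\t60\t150M\t=\t400\t450\t*\t*\t"
    "BX:Z:AAACTTATAAACAAAT-1\tMI:i:42"
)

if __name__ == "__main__":
    print(header_is_coordinate_sorted(EXAMPLE_HEADER))
    print(barcode_of(EXAMPLE_RECORD))
    print(optional_tags(EXAMPLE_RECORD).get("MI"))
```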
Sample Name | Genome Length | Total Alignments | Duplicates | QCFail | Unmapped | Mapped | No BX | No MI | Zero MapQ | N50 Reads Per Molecule | N90 Reads Per Molecule | auN Reads Per Molecule | No. Molecules (No. Reads >= 1) | Mean Read MapQ Per Molecule (No. Reads >= 1) | Mean Molecule Length (No. Reads >= 1) | N50 Molecule Length (No. Reads >= 1) | N90 Molecule Length (No. Reads >= 1) | auN Molecule Length (No. Reads >= 1) | No. Molecules (No. Reads >= 3) | Mean Read MapQ Per Molecule (No. Reads >= 3) | Mean Molecule Length (No. Reads >= 3) | N50 Molecule Length (No. Reads >= 3) | N90 Molecule Length (No. Reads >= 3) | auN Molecule Length (No. Reads >= 3) | No. Molecules (No. Reads >= 5) | Mean Read MapQ Per Molecule (No. Reads >= 5) | Mean Molecule Length (No. Reads >= 5) | N50 Molecule Length (No. Reads >= 5) | N90 Molecule Length (No. Reads >= 5) | auN Molecule Length (No. Reads >= 5) | No. Molecules (No. Reads >= 10) | Mean Read MapQ Per Molecule (No. Reads >= 10) | Mean Molecule Length (No. Reads >= 10) | N50 Molecule Length (No. Reads >= 10) | N90 Molecule Length (No. Reads >= 10) | auN Molecule Length (No. Reads >= 10) | Median Insert Size | Mean Short Read Depth | Mean Short Read Depth Per Molecule (No. Reads >= 1) | Molecule Read Depth (No. Reads >= 1) | Mean Short Read Depth Per Molecule (No. Reads >= 3) | Molecule Read Depth (No. Reads >= 3) | Mean Short Read Depth Per Molecule (No. Reads >= 5) | Molecule Read Depth (No. Reads >= 5) | Mean Short Read Depth Per Molecule (No. Reads >= 10) | Molecule Read Depth (No. Reads >= 10) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
sample_1 | 475288177 | 76630734 | 0.31437793614243603 | 0.0 | 0.020081903430547852 | 0.9799180965694522 | 0.02013981753065291 | 0.02013981753065291 | 0.18363639841946444 | 2 | 2 | 4.508073122000967 | 14206313 | 35.64287354551842 | 3552.3337376840846 | 28158 | 5566 | 31222.761770932833 | 3026475 | 42.63248095789803 | 15475.771771780703 | 29732 | 8844 | 33071.83794352541 | 1290033 | 45.54367534814054 | 23451.85308205294 | 34597 | 12794 | 38466.78045604522 | 312474 | 47.91390010477723 | 36652.9717896529 | 45269 | 21138 | 49249.24314427641 | 207 | 23.94266163073524 | 1.0525249483701045 | 106.1788771530919 | 0.27878003867807977 | 98.54450129315967 | 0.12086867821999131 | 63.65330730076208 | 0.07543374059593067 | 24.097171487183868 |
sample_2 | 2782012602 | 589796532 | 0.7195109838319632 | 0.0 | 0.015188315824142554 | 0.9848116841758574 | 0.01340242027737118 | 0.01340242027737118 | 0.034898881026312986 | 8 | 2 | 14.249821430536441 | 29550007 | 45.66036275340808 | 6104.185723543145 | 37299 | 9131 | 42537.54489314595 | 9470847 | 47.529572485517356 | 18313.300594867596 | 38452 | 11597 | 43997.00931155544 | 5978121 | 49.441155002674705 | 25248.238797943366 | 41386 | 14216 | 47093.63476301429 | 3164327 | 50.59626523246061 | 35413.34951065424 | 47624 | 19803 | 53344.70567932435 | 271 | 31.482526333286536 | 0.9676011811817129 | 64.83749596616673 | 0.5293547845761889 | 62.344242392831546 | 0.33715994721512627 | 54.254616410612506 | 0.19225956277851122 | 40.27998217421447 |
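
The N50, N90 and auN columns in the summary table can be illustrated with a short sketch. It uses the common definitions (Nx is the largest length L such that items of length at least L account for at least x% of the total; auN is the sum of squared lengths divided by the sum of lengths). That LinkStats applies exactly these definitions is an assumption, and the input values below are made up.

```python
def nx(lengths, fraction):
    """Largest length L such that items of length >= L cover `fraction` of the total.

    fraction=0.5 gives N50, fraction=0.9 gives N90.
    Raises ValueError on empty input.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    raise ValueError("lengths must be non-empty")


def aun(lengths):
    """Area under the Nx curve: sum of squared lengths over sum of lengths."""
    return sum(length * length for length in lengths) / sum(lengths)


if __name__ == "__main__":
    molecule_lengths = [2, 3, 4, 5, 6, 10]  # hypothetical values
    print(nx(molecule_lengths, 0.5))   # N50
    print(nx(molecule_lengths, 0.9))   # N90
    print(aun(molecule_lengths))       # auN
```

The same functions would apply to read counts per molecule, which is presumably what the "Reads Per Molecule" columns summarise.
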
Sample Name | Molecule Length | No. Reads | MI | BX | Reference | Mean Read Depth | Mean MapQ | No. Gaps | Mean Gap Size | Max Gap Size |
---|---|---|---|---|---|---|---|---|---|---|
sample_1 | 151 | 1 | 42.0 | AAACTTATAAACAAAT-1 | SUPER_2 | 0.9933774834437086 | 39.0 | 0 | 0.0 | 0 |
sample_1 | 693 | 2 | 62.0 | AAACTTATAAACAAAT-1 | SUPER_2 | 0.42857142857142855 | 60.0 | 1 | 396.0 | 396 |
sample_1 | 147 | 2 | 0.0 | AAACTTATAAACAACA-1 | SUPER_2 | 2.020408163265306 | 52.0 | 0 | 0.0 | 0 |
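
One plausible reading of the per-molecule columns is sketched below: molecule length as the span of its reads, mean read depth as aligned bases divided by molecule length, and gaps as uncovered stretches between consecutive reads. This is an assumption rather than LinkStats' documented method. The read coordinates are hypothetical, chosen so that the result agrees with the second example row (length 693, two reads, one gap of 396).

```python
def molecule_stats(read_intervals):
    """Summarise one molecule from half-open (start, end) read intervals.

    Assumes at least one read; all reads belong to the same molecule.
    """
    reads = sorted(read_intervals)
    start = reads[0][0]
    end = max(read_end for _, read_end in reads)
    length = end - start
    aligned_bases = sum(read_end - read_start for read_start, read_end in reads)

    # A gap is an uncovered stretch between the furthest covered position
    # so far and the start of the next read.
    gaps = []
    covered_to = reads[0][1]
    for read_start, read_end in reads[1:]:
        if read_start > covered_to:
            gaps.append(read_start - covered_to)
        covered_to = max(covered_to, read_end)

    return {
        "Molecule Length": length,
        "No. Reads": len(reads),
        "Mean Read Depth": aligned_bases / length,
        "No. Gaps": len(gaps),
        "Mean Gap Size": sum(gaps) / len(gaps) if gaps else 0.0,
        "Max Gap Size": max(gaps) if gaps else 0,
    }


if __name__ == "__main__":
    # Hypothetical reads: 148 bp and 149 bp, separated by 396 uncovered bases.
    print(molecule_stats([(0, 148), (544, 693)]))
```
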
Sample Name | Reference | Gap Length |
---|---|---|
sample_1 | SUPER_2 | 492 |
sample_1 | SUPER_2 | 503 |
sample_1 | SUPER_2 | 85 |
(unnamed) | CDF | Molecule Length | Sample Name | Min No. Reads |
---|---|---|---|---|
9.367449744132983e-06 | 1.3014845860860104e-05 | 96.0 | sample_1 | 1 |
0.0012569059966509256 | 0.0017593212851612847 | 109.5 | sample_1 | 1 |
0.0010904307120362616 | 0.0032743320967205856 | 110.5 | sample_1 | 1 |
(unnamed) | CDF | Coverage Gap Length | Sample Name |
---|---|---|---|
0.014091511936339523 | 0.01579554609844439 | 1.5 | sample_1 |
0.011836098945160693 | 0.029062940495769775 | 2.5 | sample_1 |
0.011832243538338165 | 0.04232601326625564 | 3.5 | sample_1 |
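
How a histogram with a cumulative distribution (CDF) column can be built from raw values is sketched below. This is a generic illustration, not LinkStats' binning method: the function name, the bin edges and the input values are hypothetical, and treating the leading value of each row as a per-bin fraction is an assumption.

```python
from bisect import bisect_right


def histogram_with_cdf(values, edges):
    """Return rows of (fraction, cumulative fraction, bin centre).

    Bins are half-open: [edges[i], edges[i + 1]). Values outside the
    edges are ignored. At least one value must fall inside a bin.
    """
    counts = [0] * (len(edges) - 1)
    for value in values:
        index = bisect_right(edges, value) - 1
        if 0 <= index < len(counts):
            counts[index] += 1

    total = sum(counts)
    rows = []
    cumulative = 0
    for index, count in enumerate(counts):
        cumulative += count
        centre = (edges[index] + edges[index + 1]) / 2
        rows.append((count / total, cumulative / total, centre))
    return rows


if __name__ == "__main__":
    gap_lengths = [1, 1, 2, 3]   # hypothetical coverage gap lengths
    bin_edges = [1, 2, 3, 4]     # unit-width bins centred at 1.5, 2.5, 3.5
    for row in histogram_with_cdf(gap_lengths, bin_edges):
        print(row)
```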