Ed Harry edited this page Feb 11, 2022 · 14 revisions

Usage

LinkStats [OPTIONS] COMMAND1 [ARGS]... [COMMAND2 [ARGS]...]...

Options:
  -t, --threads INTEGER RANGE    Number of threads to use. Default=4.  [x>=1]
  -m, --min_reads INTEGER RANGE  Minimum reads per molecule for analysis,
                                 multiple values possible. Default=(1, 3, 5,
                                 10).  [x>=1]
  --version                      Show the version and exit.
  --help                         Show help message and exit.

Commands:
  cov-gap-hist-data  Read in coverage gap histogram data from a CSV FILE.
  coverage-data      Read in coverage gap data from a CSV FILE.
  mol-len-hist-data  Read in molecule length histogram data from a CSV FILE.
  molecule-data      Read in molecular data from a CSV FILE.
  sam-data           Read SAM/BAM/CRAM data from PATH.
  save-csvs          Saves summary, molecule, coverage or histogram data to CSV files at PREFIX_.
  save-plots         Generates plots from any histogram data and saves them at PREFIX_.

Main Options

  • -t, --threads Maximum number of computational threads to use (defaults to 4).
  • -m, --min_reads Minimum number of reads for a barcode grouping to be classified as a molecule. Multiple values are allowed, resulting in multiple molecule definitions / classifications, which will be reflected in the downstream statistics (defaults to 1, 3, 5, and 10).

Example

> LinkStats -t 16 -m 5 -m 10 COMMAND1 [ARGS]... [COMMAND2 [ARGS]...] ...

Uses 16 threads and sets two definitions for the minimum reads per molecule: 5 and 10. Statistics are reported separately for each definition.
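As a rough illustration of what several -m values mean, the sketch below counts how many barcode groupings would qualify as molecules under each definition. It is not LinkStats code: the function name and the read counts are hypothetical, and the inclusive comparison (>=) is an assumption based on the word "minimum".

```python
from typing import Dict, Iterable


def molecules_per_definition(
    read_counts: Iterable[int], min_reads: Iterable[int]
) -> Dict[int, int]:
    """Count groupings that qualify as molecules for each min_reads value.

    Assumption: a grouping qualifies when its read count is >= the threshold.
    """
    counts = list(read_counts)
    return {m: sum(1 for c in counts if c >= m) for m in min_reads}


# Hypothetical reads-per-grouping values, evaluated as with `-m 5 -m 10`.
example_counts = [1, 2, 5, 7, 10, 12]
print(molecules_per_definition(example_counts, (5, 10)))
```

Each threshold yields its own molecule set, which is why the downstream statistics contain one entry per -m value.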

Commands

sam-data

LinkStats ... sam-data [OPTIONS] PATH
 
  Read SAM/BAM/CRAM data from PATH.

  Creates summary and molecular data-sets for each sample-name (SM:Z tag or RG:Z SAM tag).

  Alignments must have BX:Z (barcode) SAM tags.

Options:
  -r, --reference PATH     FASTA reference for CRAM decoding.
  -n, --name TEXT          Sample name, overrides name from SM or RG tags.
  --mi / --no-mi           Group by MI:I as well as BX:Z SAM tags.
                           Default=False.
  -t, --threshold INTEGER  Maximum allowed separation between alignments
                           grouped to the same molecule.
  --help                   Show help message and exit.

Command to read a SAM/BAM/CRAM alignment data source. Data must be coordinate-sorted, and the alignments must carry barcodes as BX:Z tags. The command can be issued multiple times to read multiple sources, e.g. LinkStats ... sam-data [OPTIONS] file1.cram ... sam-data [OPTIONS] file2.bam .... Set - as the PATH to read from <stdin>, e.g. samtools view -u file.cram | LinkStats ... sam-data [OPTIONS] - ...

Options

  • -r, --reference Sets or overrides the path to a FASTA reference for CRAM decoding.
  • -n, --name Sets the sample name for all data from this PATH. Overrides any names from SM:Z or RG:Z tags.
  • --mi / --no-mi Group reads into molecules by MI:i as well as BX:Z tags (defaults to false).
  • -t, --threshold Maximum allowed distance (in bases) between read alignments grouped into the same molecule. Groupings with gaps larger than this threshold will be broken into multiple molecules (defaults to 50000 bp).
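The effect of the threshold can be sketched as follows. This is a minimal illustration under stated assumptions, not the LinkStats implementation: alignments are modelled as hypothetical (start, end) pairs that share one barcode on one reference sequence, and the gap is assumed to be measured from the furthest end seen so far to the start of the next alignment.

```python
from typing import List, Tuple

Alignment = Tuple[int, int]  # hypothetical (start, end) coordinates in bases


def split_by_gap(
    alignments: List[Alignment], threshold: int = 50000
) -> List[List[Alignment]]:
    """Split one barcode's alignments wherever the gap exceeds the threshold."""
    molecules: List[List[Alignment]] = []
    current: List[Alignment] = []
    furthest_end = 0
    for start, end in sorted(alignments):
        # A gap larger than the threshold closes the current molecule.
        if current and start - furthest_end > threshold:
            molecules.append(current)
            current = []
        furthest_end = end if not current else max(furthest_end, end)
        current.append((start, end))
    if current:
        molecules.append(current)
    return molecules


reads = [(100, 250), (20_000, 20_150), (90_000, 90_150), (95_000, 95_150)]
print(len(split_by_gap(reads, threshold=30_000)))
print(len(split_by_gap(reads, threshold=100_000)))
```

With these made-up coordinates, the only gap above 30000 bp lies between the second and third alignments, so a lower threshold produces more, shorter molecules.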

Example

> LinkStats ... sam-data -r reference/ref.fa -n sample --mi -t 30000 aln/linked-read.cram ...

Reads aln/linked-read.cram, decoding it with reference/ref.fa; sets the sample name to sample; groups reads by MI:i as well as BX:Z tags; and sets the clustering threshold to 30000 bp.

molecule-data

LinkStats ... molecule-data [OPTIONS] FILE

  Read in molecular data from a CSV FILE.

  Use to re-calculate histogram data.

Options:
  --help  Show help message and exit.

Command to read in molecule data from a CSV file (generated with the save-csvs command). Use to read in previously compiled molecule data (possibly combining with sam-data), to re-generate histograms and/or plots. The command can be issued multiple times to read in multiple files e.g. LinkStats ... molecule-data [OPTIONS] file1 ... molecule-data [OPTIONS] file2 ...

coverage-data

LinkStats ... coverage-data [OPTIONS] FILE

  Read in coverage gap data from a CSV FILE.

  Use to re-calculate histogram data.

Options:
  --help  Show help message and exit.

Command to read in coverage data from a CSV file (generated with the save-csvs command). Use to read in previously compiled coverage data (possibly combining with sam-data), to re-generate histograms and/or plots. The command can be issued multiple times to read in multiple files e.g. LinkStats ... coverage-data [OPTIONS] file1 ... coverage-data [OPTIONS] file2 ...

mol-len-hist-data

LinkStats ... mol-len-hist-data [OPTIONS] FILE

  Read in molecule length histogram data from a CSV FILE.

  Use to re-generate or create combined plots.

Options:
  --help  Show help message and exit.

Command to read in molecular length histogram data from a CSV file (generated with the save-csvs command). Use to read in previously processed molecular length histogram data (possibly combining with sam-data and/or molecule-data), to re-generate plots. The command can be issued multiple times to read in multiple files e.g. LinkStats ... mol-len-hist-data [OPTIONS] file1 ... mol-len-hist-data [OPTIONS] file2 ...

cov-gap-hist-data

LinkStats ... cov-gap-hist-data [OPTIONS] FILE

  Read in coverage gap histogram data from a CSV FILE.

  Use to re-generate or create combined plots.

Options:
  --help  Show help message and exit.

Command to read in coverage histogram data from a CSV file (generated with the save-csvs command). Use to read in previously processed coverage histogram data (possibly combining with sam-data and/or coverage-data), to re-generate plots. The command can be issued multiple times to read in multiple files e.g. LinkStats ... cov-gap-hist-data [OPTIONS] file1 ... cov-gap-hist-data [OPTIONS] file2 ...

save-csvs

LinkStats ... save-csvs [OPTIONS] PREFIX

  Saves summary, molecule or histogram data to CSV files at PREFIX_.

  By default, only summary data is saved.

Options:
  --summ / --no-summ          Save summary data table. Default=True.
  --mol / --no-mol            Save molecule data table. Default=False.
  --cov / --no-cov            Save coverage data table. Default=False.
  --mol-hist / --no-mol-hist  Save molecular-length histogram data table.
                              Default=False.
  --cov-hist / --no-cov-hist  Save coverage-gap histogram data table.
                              Default=False.
  --help                      Show help message and exit.

Command to save data to CSV file(s), prepended with a given PREFIX_ (can include a relative path e.g. ../dir1/dir2/prefix).

Options

  • --summ / --no-summ Save summary QV data (defaults to true). Only possible if at least one sam-data command is given.
  • --mol / --no-mol Save molecule data (defaults to false). Can be re-read by the molecule-data command. Only possible if at least one sam-data or molecule-data command is given.
  • --cov / --no-cov Save coverage gap data (defaults to false). Can be re-read by the coverage-data command. Only possible if at least one sam-data or coverage-data command is given.
  • --mol-hist / --no-mol-hist Save molecular-length histogram data (defaults to false). Can be re-read by the mol-len-hist-data command. Only possible if at least one sam-data, molecule-data or mol-len-hist-data command is given.
  • --cov-hist / --no-cov-hist Save coverage-gap histogram data (defaults to false). Can be re-read by the cov-gap-hist-data command. Only possible if at least one sam-data, coverage-data or cov-gap-hist-data command is given.

Example

> LinkStats ... save-csvs --summ --mol --cov --mol-hist --cov-hist csvs/sample ...

Save all data to the folder csvs with the prefix sample_.
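To show how PREFIX relates to the files written, here is a small sketch that lists the expected output paths for a given set of flags. The helper is hypothetical; the file-name suffixes are the ones that appear in the usage patterns further down this page. The name of the summary table file is not shown anywhere on this page, so it is left out rather than guessed.

```python
from typing import List

# Suffixes as they appear in the usage patterns on this page.
SUFFIXES = {
    "mol": "_molecular_data.csv.bz2",
    "cov": "_coverage_data.csv.bz2",
    "mol_hist": "_molecular_length_histograms.csv.bz2",
    "cov_hist": "_coverage_gap_histograms.csv.bz2",
}


def expected_csv_paths(
    prefix: str,
    mol: bool = False,
    cov: bool = False,
    mol_hist: bool = False,
    cov_hist: bool = False,
) -> List[str]:
    """Return the paths expected for the enabled save-csvs tables."""
    enabled = {"mol": mol, "cov": cov, "mol_hist": mol_hist, "cov_hist": cov_hist}
    return [prefix + SUFFIXES[key] for key, on in enabled.items() if on]


for path in expected_csv_paths(
    "csvs/sample", mol=True, cov=True, mol_hist=True, cov_hist=True
):
    print(path)
```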

save-plots

LinkStats ... save-plots [OPTIONS] PREFIX

  Generates plots from any histogram data and saves them at PREFIX_.

Options:
  --help  Show help message and exit.

Generates and saves histogram plots to PNG files, prepended with a given PREFIX_ (can include a relative path e.g. ../dir1/dir2/prefix).

Typical Usage Patterns

Read Multiple Alignment Files

> LinkStats sam-data sample_1.cram sam-data sample_2.bam sam-data sample_3.sam save-csvs all_samples save-plots all_samples

Merge Molecule / Coverage Data

> LinkStats sam-data sample_1.cram save-csvs --mol --cov sample_1
> LinkStats sam-data sample_2.cram save-csvs --mol --cov sample_2
> LinkStats sam-data sample_3.cram save-csvs --mol --cov sample_3
> LinkStats \
    molecule-data sample_1_molecular_data.csv.bz2 \
    molecule-data sample_2_molecular_data.csv.bz2 \
    molecule-data sample_3_molecular_data.csv.bz2 \
    coverage-data sample_1_coverage_data.csv.bz2 \
    coverage-data sample_2_coverage_data.csv.bz2 \
    coverage-data sample_3_coverage_data.csv.bz2 \
    save-csvs --no-summ --mol-hist --cov-hist all_samples \
    save-plots all_samples
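For more than a handful of samples, the merge command is easier to generate than to type. The sketch below builds the argument list for the merge step from a list of sample prefixes and prints it without running anything. The helper name is hypothetical, and the file names assume the suffixes used in the commands above.

```python
import shlex
from typing import List, Sequence


def build_merge_command(samples: Sequence[str], out_prefix: str) -> List[str]:
    """Build the LinkStats argv that merges saved molecule and coverage data."""
    argv = ["LinkStats"]
    for sample in samples:
        argv += ["molecule-data", f"{sample}_molecular_data.csv.bz2"]
    for sample in samples:
        argv += ["coverage-data", f"{sample}_coverage_data.csv.bz2"]
    argv += ["save-csvs", "--no-summ", "--mol-hist", "--cov-hist", out_prefix]
    argv += ["save-plots", out_prefix]
    return argv


command = build_merge_command(["sample_1", "sample_2", "sample_3"], "all_samples")
# shlex.join (Python 3.8+) renders a shell-safe command line for inspection.
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) once the per-sample CSV files exist.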

Merge Histogram Data

> LinkStats sam-data sample_1.cram save-csvs --mol-hist --cov-hist sample_1
> LinkStats sam-data sample_2.cram save-csvs --mol-hist --cov-hist sample_2
> LinkStats sam-data sample_3.cram save-csvs --mol-hist --cov-hist sample_3
> LinkStats \
    mol-len-hist-data sample_1_molecular_length_histograms.csv.bz2 \
    mol-len-hist-data sample_2_molecular_length_histograms.csv.bz2 \
    mol-len-hist-data sample_3_molecular_length_histograms.csv.bz2 \
    cov-gap-hist-data sample_1_coverage_gap_histograms.csv.bz2 \
    cov-gap-hist-data sample_2_coverage_gap_histograms.csv.bz2 \
    cov-gap-hist-data sample_3_coverage_gap_histograms.csv.bz2 \
    save-plots all_samples