Validation suite for RustQC -- comparing its outputs against the upstream bioinformatics tools it reimplements.
RustQC reimplements common RNA-seq QC tools in Rust. This repository:
- Generates reference outputs from the original upstream tools (RSeQC, dupRadar, featureCounts, Qualimap, preseq, samtools)
- Runs RustQC on the same input data
- Compares outputs between RustQC and upstream tools, with per-tool tolerance rules
- Tracks regressions via nf-test snapshots -- if RustQC output changes, the snapshot test fails
All upstream tools are run via standard nf-core modules, so reference outputs match what users get from nf-core/rnaseq.
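As a rough illustration of the snapshot idea in the list above (not how nf-test implements it internally), the sketch below records a checksum the first time it sees a file and fails once the content changes. The function name `snapshot_check` and the one-line snapshot file format are assumptions made for this example.

```shell
# Hypothetical helper, not part of this repository or of nf-test.
# usage: snapshot_check FILE SNAP_FILE
snapshot_check() {
  # cksum reading from stdin prints "<crc> <size>" without a file name
  sum=$(cksum < "$1")
  if [ ! -e "$2" ]; then
    # First run: no snapshot yet, so record one and pass
    printf '%s\n' "$sum" > "$2"
    return 0
  fi
  # Later runs: pass only if the content still matches the recorded snapshot
  [ "$sum" = "$(cat "$2")" ]
}
```

Deleting the snapshot file and re-running plays the same role here that `--update-snapshot` plays for the real tests.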
| RustQC output | Upstream tool | Comparison |
|---|---|---|
| dupRadar | dupRadar (R/Bioconductor) | TSV match, float tolerance 1e-10 |
| featureCounts | Subread featureCounts | Column subset match (gene + count) |
| bam_stat | RSeQC bam_stat.py | Text match, skip log headers |
| infer_experiment | RSeQC infer_experiment.py | Text match, skip info headers |
| read_duplication | RSeQC read_duplication.py | TSV exact match |
| read_distribution | RSeQC read_distribution.py | TSV match, 0.5 relative tolerance |
| junction_annotation | RSeQC junction_annotation.py | Row-sorted TSV + BED comparison |
| junction_saturation | RSeQC junction_saturation.py | Structural check (stochastic tool) |
| inner_distance | RSeQC inner_distance.py | TSV match, 0.1 relative tolerance |
| qualimap | Qualimap rnaseq | comparison TBD |
| preseq | preseq lc_extrap | comparison TBD |
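To make the "float tolerance" and "relative tolerance" entries in the table concrete, here is a minimal sketch of a cell-by-cell TSV comparison. It is not the CompareUtils implementation: the function name `tsv_match`, the choice to measure relative error against the expected value, and the rule that a numeric cell passes if either tolerance is met are assumptions for illustration.

```shell
# Hypothetical helper, not part of this repository.
# usage: tsv_match ACTUAL_FILE EXPECTED_FILE ABS_TOL REL_TOL
tsv_match() {
  awk -F'\t' -v abs_tol="$3" -v rel_tol="$4" '
    function absval(x) { return x < 0 ? -x : x }
    function is_num(s) {
      return s ~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/
    }
    # First file on the command line is EXPECTED: remember every line
    NR == FNR { expected[FNR] = $0; n = FNR; next }
    # Second file is ACTUAL: compare each cell with the same row of EXPECTED
    {
      rows++
      m = split(expected[FNR], exp_cols, FS)
      if (m != NF) { bad = 1; exit }
      for (i = 1; i <= NF; i++) {
        if (is_num($i) && is_num(exp_cols[i])) {
          d = absval($i - exp_cols[i])
          # Numeric cells fail only if BOTH tolerances are exceeded
          if (d > abs_tol + 0 && d > rel_tol * absval(exp_cols[i] + 0)) { bad = 1; exit }
        } else if (($i "") != (exp_cols[i] "")) {
          # Non-numeric cells must match exactly as strings
          bad = 1; exit
        }
      }
    }
    END { exit ((bad || rows != n) ? 1 : 0) }
  ' "$2" "$1"
}
```

For example, `tsv_match actual.tsv expected.tsv 1e-10 0` would mimic an absolute-tolerance-only check, while a non-zero fourth argument adds the relative check.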
There are two layers of tests, both using nf-test:
Both BAM and GTF are required inputs. The pipeline automatically derives a BED gene model from the GTF annotation using the GTF2BED local module. This BED file is used by the upstream RSeQC Python tools that require a BED gene model (read_distribution, inner_distance, junction_annotation, junction_saturation, infer_experiment, tin). RustQC does not need a BED file — it works directly from the GTF annotation.
Each test runs one nf-core module (e.g. RSEQC_BAMSTAT) against the small test dataset and snapshots the output. This captures what the upstream tool produces so we can detect if upstream changes.
Each test runs the RUSTQC_RNA process and does two things:
- Cross-comparison -- uses CompareUtils to compare RustQC output against the reference files in snapshots/rna/small/, with per-tool tolerance rules
- Regression snapshot -- calls snapshot() on the RustQC output, so any future change to RustQC output is caught
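For the text-based comparisons (bam_stat, infer_experiment), the cross-comparison ignores log or info lines before comparing. The sketch below shows one way to do that in plain shell. The helper names `strip_prefixed` and `text_match` are made up for this example and do not reflect how CompareUtils is written.

```shell
# Hypothetical helpers, not part of this repository.
# usage: strip_prefixed FILE [PREFIX...]
# Prints the lines of FILE that do not start with any PREFIX.
strip_prefixed() {
  sp_file=$1
  shift
  # Pass the prefixes through the environment so awk sees them verbatim
  PREFIXES=$(printf '%s\n' "$@") awk '
    BEGIN { n = split(ENVIRON["PREFIXES"], skip, "\n") }
    {
      for (i = 1; i <= n; i++)
        if (skip[i] != "" && index($0, skip[i]) == 1) next
      print
    }
  ' "$sp_file"
}

# usage: text_match ACTUAL_FILE EXPECTED_FILE [PREFIX...]
# Succeeds when both files are identical after dropping prefixed lines.
# Note: command substitution drops trailing newlines, so trailing blank
# lines are not compared.
text_match() {
  tm_actual=$1
  tm_expected=$2
  shift 2
  [ "$(strip_prefixed "$tm_actual" "$@")" = "$(strip_prefixed "$tm_expected" "$@")" ]
}
```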
RustQC output files are found by suffix pattern (e.g. endsWith('bam_stat.txt')), making the tests resilient to output directory structure changes.
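The same suffix-based lookup can be reproduced on the command line when inspecting an output directory by hand. This is a small sketch only: `find_by_suffix` is a hypothetical helper, and it assumes the suffix contains no glob characters.

```shell
# Hypothetical helper, not part of this repository.
# usage: find_by_suffix DIR SUFFIX
# Prints every regular file under DIR whose name ends with SUFFIX, sorted.
find_by_suffix() {
  find "$1" -type f -name "*$2" | sort
}
```

Because the match is on the file name only, `find_by_suffix results bam_stat.txt` would locate `test.bam_stat.txt` regardless of which subdirectory it was published to.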
test-data/rna/small/                Small test BAM + annotations (~7 MB, committed)
snapshots/rna/small/                Reference outputs (committed, plots gitignored)
  dupradar/                         Upstream dupRadar output
  featurecounts/                    Upstream featureCounts output
  rseqc/                            Upstream RSeQC output, one subdir per tool
    bam_stat/                       bam_stat.txt
    infer_experiment/               infer_experiment.txt
    read_distribution/              read_distribution.txt
    read_duplication/               pos.DupRate.xls, seq.DupRate.xls, ...
    inner_distance/                 inner_distance.txt, inner_distance_freq.txt, ...
    junction_annotation/            junction.bed, junction.xls, ...
    junction_saturation/            junctionSaturation_plot.r
  rustqc/                           RustQC output, same tool subdirectory structure
    dupradar/                       test_dupMatrix.txt, test_intercept_slope.txt, ...
    featurecounts/                  test.featureCounts.tsv, ...
    rseqc/bam_stat/                 test.bam_stat.txt
    rseqc/infer_experiment/         test.infer_experiment.txt
    ...                             (mirrors upstream structure)
tests/
  lib/CompareUtils.groovy           Shared comparison utilities (tsvMatch, textMatch, etc.)
  rna/upstream/                     9 nf-test files, one per upstream tool
  rna/rustqc/                       9 nf-test files, one per RustQC tool output
  rna/pipeline.nf.test              Smoke test for the full workflow
modules/local/rustqc_rna.nf         RustQC Nextflow process definition
modules/local/gtf2bed/              GTF2BED module (converts GTF to BED gene model)
bin/gtf2bed                         GTF2BED conversion script
modules/nf-core/                    14 upstream tool modules (dupradar, qualimap, rseqc/*, subread, samtools)
workflows/rustqc-benchmarks.nf      Main pipeline workflow
conf/
  rna_test.config                   Small dataset parameters
  rna_test_full.config              Large dataset parameters (S3, incomplete)
  modules.config                    Per-module publishDir and ext.args settings
The nf-test.config already sets the test,docker profiles, so no --profile flag is needed.
nf-test test tests/rna/upstream/ tests/rna/rustqc/

# All upstream reference tests
nf-test test --tag upstream
# All RustQC comparison tests
nf-test test --tag rustqc
# A single tool (runs both upstream + rustqc for that tool)
nf-test test --tag bam_stat
# Everything tagged rna (upstream + rustqc + pipeline)
nf-test test --tag rna

nf-test test --tag rna --verbose

Every test has multiple tags so you can slice in different ways:
| Tag | What it selects |
|---|---|
| upstream | All 9 upstream nf-core module tests |
| rustqc | All 9 RustQC comparison tests |
| rna | All RNA tests (upstream + rustqc + pipeline) |
| small | Small dataset tests |
| bam_stat, dupradar, featurecounts, qualimap, ... | Both upstream + rustqc tests for that tool |
| pipeline | Pipeline-level smoke test |
The Nextflow pipeline can also be run standalone (e.g. on Seqera Platform for benchmarking):
# RustQC only (default)
nextflow run main.nf -profile rna_test,docker
# Upstream tools only
nextflow run main.nf -profile rna_test,docker --run_upstream --run_rustqc false
# Both
nextflow run main.nf -profile rna_test,docker --run_upstream

Launch the pipeline from Seqera Platform using the pre-configured test profiles.
Both profiles set strandedness = 'reverse' (matching the library prep of the bundled test data).
Small test (local test data, ~7 MB BAM):
Pipeline: https://github.com/seqeralabs/rustqc-benchmarks
Revision: main
Profile: rna_test,docker
Parameters: --run_upstream true
Large test (GM12878 markdup-sorted BAM from nf-core/rnaseq megatests, ~8 GB):
Pipeline: https://github.com/seqeralabs/rustqc-benchmarks
Revision: main
Profile: rna_test_full,docker
Parameters: --run_upstream true
Strandedness is not auto-detected. This pipeline takes a pre-aligned BAM as input, so there is no Salmon-based strandedness inference as in nf-core/rnaseq. The test profiles default to reverse. When running with your own data, set --strandedness to match your library prep (reverse, forward, or unstranded); this setting affects Qualimap, dupRadar, and RustQC output.
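Because the value is never inferred, a typo in --strandedness is easy to miss until the results look wrong. A small pre-flight check in a wrapper script could catch that before launching. This is only a sketch: `check_strandedness` is a hypothetical helper, and it assumes the three values listed above are the only ones accepted.

```shell
# Hypothetical helper, not part of this repository.
# usage: check_strandedness VALUE
# Succeeds only for the three values named in this README.
check_strandedness() {
  case "$1" in
    reverse|forward|unstranded)
      return 0
      ;;
    *)
      echo "unsupported strandedness: $1 (expected reverse, forward, or unstranded)" >&2
      return 1
      ;;
  esac
}
```

A wrapper might then run `check_strandedness forward && nextflow run main.nf -profile rna_test,docker --strandedness forward`.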
nextflow run main.nf -profile rna_test,docker \
--rustqc_image '' \
--rustqc_binary /path/to/rustqc

| Parameter | Default | Description |
|---|---|---|
| --run_rustqc | true | Run RustQC |
| --run_upstream | false | Run upstream reference tools |
| --rustqc_image | ghcr.io/seqeralabs/rustqc:dev | RustQC Docker image |
| --rustqc_binary | null | Local RustQC binary (overrides Docker) |
| --bam / --bai | (from profile) | Input BAM and index (required) |
| --gtf | (from profile) | GTF annotation file (required) |
| --sample_id | test | Sample identifier (used in output filenames) |
| --paired | true | Paired-end data |
| --strandedness | unstranded | Library strandedness |
| --outdir | results | Output directory |
If RustQC output intentionally changes, the regression snapshots need updating:
# Re-run RustQC tests and update their .snap files
nf-test test --tag rustqc --update-snapshot
# Review the diff
git diff tests/rna/rustqc/*.nf.test.snap
# If the changes look correct, also update the committed RustQC snapshots.
# Find an output dir from the nf-test work directory and copy files
# into the matching subdirectory structure under snapshots/rna/small/rustqc/.
# For example:
SRC=".nf-test/tests/<hash>/work/<hash>/output"
cp "$SRC"/test_dupMatrix.txt snapshots/rna/small/rustqc/dupradar/
cp "$SRC"/test.featureCounts.tsv snapshots/rna/small/rustqc/featurecounts/
cp "$SRC"/test.bam_stat.txt snapshots/rna/small/rustqc/rseqc/bam_stat/
# ... etc for each tool
# Commit
git add tests/rna/rustqc/*.nf.test.snap snapshots/rna/small/rustqc/
git commit -m "Update RustQC snapshots for <reason>"

If nf-core modules are updated and upstream tool output changes:
# Re-run upstream tests and update their .snap files
nf-test test --tag upstream --update-snapshot
# Copy fresh upstream outputs to the reference snapshots directory
# (the rustqc tests read files from snapshots/rna/small/ for comparison)
# dupradar (note: uses test_ prefix in filenames)
cp .nf-test/tests/<hash>/work/<hash>/test_dupMatrix.txt snapshots/rna/small/dupradar/dupMatrix.txt
cp .nf-test/tests/<hash>/work/<hash>/test_intercept_slope.txt snapshots/rna/small/dupradar/intercept_slope.txt
# featurecounts
cp .nf-test/tests/<hash>/work/<hash>/test.featureCounts.tsv snapshots/rna/small/featurecounts/
cp .nf-test/tests/<hash>/work/<hash>/test.featureCounts.tsv.summary snapshots/rna/small/featurecounts/
# rseqc -- each tool has its own subdirectory
cp .nf-test/tests/<hash>/work/<hash>/test.bam_stat.txt snapshots/rna/small/rseqc/bam_stat/bam_stat.txt
cp .nf-test/tests/<hash>/work/<hash>/test.infer_experiment.txt snapshots/rna/small/rseqc/infer_experiment/
cp .nf-test/tests/<hash>/work/<hash>/test.pos.DupRate.xls snapshots/rna/small/rseqc/read_duplication/pos.DupRate.xls
# ... etc for each tool
# Re-run rustqc tests to check if comparisons still hold
nf-test test --tag rustqc
# Commit
git add snapshots/ tests/rna/upstream/*.nf.test.snap
git commit -m "Regenerate upstream reference snapshots"

The shared comparison library (tests/lib/CompareUtils.groovy) provides:
Line-by-line TSV comparison with configurable tolerance.
CompareUtils.tsvMatch(
    path(actualFile).readLines(),
    path(expectedFile).readLines(),
    [
        tolerance: 1e-10,           // absolute numeric tolerance
        relTolerance: 0.02,         // relative numeric tolerance (passes if EITHER is met)
        skipPrefixes: ['#'],        // ignore lines starting with these
        skipColumns: [1,2] as Set,  // ignore specific columns
        delimiter: '\t',            // column delimiter (default: tab)
    ]
)

Exact line-by-line text comparison, filtering lines by prefix.
CompareUtils.textMatch(
    path(actualFile).readLines(),
    path(expectedFile).readLines(),
    ['Load BAM', 'processing']  // ignore lines starting with these
)

Asserts a file exists and meets a minimum size. Useful for plot files.
CompareUtils.fileMinSize(path(plotFile), 1000)

This repo is organized by RustQC subcommand. To add a new suite (e.g. rustqc dna):
- Create modules/local/rustqc_dna.nf
- Install relevant nf-core modules (nf-core modules install ...)
- Add test data to test-data/dna/small/
- Add conf/dna_test.config with input paths
- Write upstream tests in tests/dna/upstream/
- Run upstream tests, copy outputs to snapshots/dna/small/
- Write RustQC comparison tests in tests/dna/rustqc/
- Extend the workflow or create workflows/dna.nf
Nothing in the RNA suite is touched.
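The directory and file names in the steps above follow a fixed pattern, so the empty skeleton for a new suite can be created in one go. The sketch below is a convenience only: `scaffold_suite` is a hypothetical helper, it creates empty placeholder files rather than working modules or configs, and it leaves existing files untouched.

```shell
# Hypothetical helper, not part of this repository.
# usage: scaffold_suite REPO_ROOT SUITE   (e.g. scaffold_suite . dna)
scaffold_suite() {
  root=$1
  suite=$2
  # Directories named in the steps above
  mkdir -p \
    "$root/test-data/$suite/small" \
    "$root/snapshots/$suite/small" \
    "$root/tests/$suite/upstream" \
    "$root/tests/$suite/rustqc" \
    "$root/modules/local" \
    "$root/conf" \
    "$root/workflows"
  # Empty placeholders, created only if missing so nothing is overwritten
  for f in "modules/local/rustqc_$suite.nf" "conf/${suite}_test.config" "workflows/$suite.nf"; do
    [ -e "$root/$f" ] || : > "$root/$f"
  done
}
```

Installing the nf-core modules, adding test data, and writing the tests remain manual steps.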
Built with the nf-core pipeline template and community modules.
The nf-core framework for community-curated bioinformatics pipelines. Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.