
Performance Tuning

Tier: Advanced

Canonical reference: docs/PERFORMANCE.md and the TL;DR docs/PERFORMANCE_TLDR.md. Benchmarks: docs/BENCHMARKS.md and the live dashboard at qsv.dathere.com/benchmarks.

This page is a workflow guide to performance — when to reach for which knob, in what order. Numbers quoted on this page link to BENCHMARKS.md as the source of truth.

The five-minute rule

If you have five minutes to make qsv faster on your workload, do these in order (a combined sketch follows the list):

  1. Index your file. qsv index data.csv — one-time cost (~14s on 15 GB). Speeds up 9 commands.
  2. Pre-populate the stats cache. qsv stats --stats-jsonl data.csv — makes frequency, schema, validate, pragmastat, pivotp, describegpt, sqlp, and scoresql smarter.
  3. Set QSV_AUTOINDEX_SIZE. Auto-creates the index for any file above the threshold; auto-refreshes stale ones.
  4. For Polars commands, generate a Polars schema. qsv schema --polars data.csv — written to data.pschema.json, picked up automatically by sqlp / joinp / pivotp.
  5. For files > RAM, use extsort / extdedup instead of sort / dedup.
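Putting the five steps together (file names and the threshold are illustrative):

qsv index data.csv                     # 1. one-time index
qsv stats --stats-jsonl data.csv       # 2. pre-populate the stats cache
export QSV_AUTOINDEX_SIZE=100000000    # 3. auto-index files above ~100 MB
qsv schema --polars data.csv           # 4. Polars schema → data.pschema.json
# 5. for files > RAM: extsort / extdedup (see Memory management below)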

Most users stop here. The rest of this page is for the remaining 1%.

When to index, and what gets faster

qsv builds a <file>.idx sidecar that lets commands skip parsing rows they don't need.

Command              Without index                  With index
count                scans the file                 instantaneous
sample (reservoir)   scans the file                 random I/O, no full scan
slice                parses rows up to the slice    parses only the sliced rows
stats                streaming, single-threaded     streaming, multithreaded
frequency            single-threaded                multithreaded
split                single-threaded                multithreaded
schema               single-threaded                multithreaded
luau                 sequential mode only           unlocks random-access mode + LASTROW
search               sequential                     parallel (per-chunk)

The index is a small fraction of the source file: a 15 GB CSV indexes in ~14 seconds, producing a ~27 MB index.

qsv index huge.csv
# Or — auto-index everything above 100 MB:
export QSV_AUTOINDEX_SIZE=100000000

See Indexing, Compression & Diff → index.

Stats cache — the secret weapon

Running qsv stats --stats-jsonl <file> writes two sidecar files:

  • <file>.stats.csv — human-readable
  • <file>.stats.csv.data.jsonl — machine-readable (the file other commands actually consume)

Every "smart" command (🪄 in the README legend) automatically picks up the JSONL and uses it:

  • frequency — short-circuits all-unique columns (would otherwise blow memory)
  • schema — skips redundant type inference
  • validate — faster generated-schema runs
  • pragmastat — date-aware mode requires the cache
  • pivotp — smart aggregation auto-selection
  • tojsonl — smart JSON type inference
  • sqlp / scoresql — query plan analysis
  • describegpt — deterministic statistical context for the LLM
  • sample --systematic/--weighted/--cluster — extra checks against cardinality

Set QSV_STATSCACHE_MODE=force to automatically create the cache whenever a smart command runs on a file that doesn't have one yet. See Stats Cache & Caching.
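A typical flow (data.csv is illustrative):

qsv stats --stats-jsonl data.csv    # writes data.stats.csv and data.stats.csv.data.jsonl
qsv frequency data.csv              # reuses the cache; all-unique columns short-circuit

# Or let smart commands build a missing cache on first use:
export QSV_STATSCACHE_MODE=force
qsv schema data.csv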

Multithreading

qsv detects logical CPUs and uses all of them by default. Cap with QSV_MAX_JOBS=N if you want to share CPU with other workloads.
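For example, to leave cores free for other workloads (the cap of 4 is illustrative):

export QSV_MAX_JOBS=4          # use at most 4 parallel jobs
qsv stats --everything data.csv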

Always-multithreaded (🚀): apply, applydp, blake3, datefmt, dedup, diff, excel, extsort, geocode, joinp, jsonl, replace, snappy, sort, sqlp, template, to, tojsonl, validate.

Multithreaded with an index (🏎️): frequency, sample, schema, search, searchset, split, stats.

For the full mapping, see the README legend.

Memory management

Commands marked 🤯 load the entire file into memory:

  • dedup (unless --sorted)
  • pragmastat
  • reverse (unless an index is present — then constant memory)
  • sort — for files > RAM use extsort (see the sketch after this list)
  • stats — for --cardinality / --quartiles / --median modes
  • table
  • transpose (unless --multipass or --long)
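For the sort and dedup entries above, the external variants spill to disk instead of holding everything in RAM. A sketch (the spill path is illustrative):

export QSV_TMPDIR=/mnt/nvme/tmp        # fast spill location, NVMe preferred
qsv extsort huge.csv huge-sorted.csv
qsv extdedup huge.csv huge-deduped.csv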

Commands marked 😣 use memory proportional to column cardinality:

  • frequency
  • schema
  • tojsonl

For huge files where you can't avoid memory mode, set the safety env vars:

export QSV_MEMORY_CHECK=1                  # opt into pre-check
export QSV_FREEMEMORY_HEADROOM_PCT=10      # reserve 10% of free RAM

With QSV_MEMORY_CHECK=1, qsv computes available memory (including swap), applies a platform multiplier (1.3× macOS, 1.15× Linux, 1.0× Windows), subtracts the headroom, and aborts if the file won't fit.
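Worked example, reading that rule literally (Linux; all numbers illustrative):

# 8 GB free RAM + 2 GB swap                          = 10 GB available
# × 1.15 (Linux platform multiplier)                 = 11.5 GB
# − 10% headroom, here taken of the adjusted figure  = ~10.4 GB budget
# A 12 GB file aborts up front; an 8 GB file proceeds.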

Approximate algorithms — bounded memory for huge cardinalities

stats and frequency can switch to Apache DataSketches for constant-memory operation:

# stats: approximate quartiles (t-digest) and cardinality (HyperLogLog)
qsv stats --everything --cardinality-method approx --quantile-method approx data.csv

# frequency: top-K with bounded error
qsv frequency --sketch-method frequent_items --sketch-map-size 16384 data.csv

Both add ~1.5% error in exchange for O(1) memory per column. Use when exact methods OOM.

Build-time optimizations

target-cpu=native

Prebuilt binaries for x86_64 are conservative to avoid SIGILL on older CPUs. If you build from source on your own machine, opt into your CPU's full instruction set:

CARGO_BUILD_RUSTFLAGS='-C target-cpu=native' \
  cargo build --release --locked --bin qsv -F all_features

Apple Silicon, Windows-on-ARM, IBM Power, and IBM Z prebuilds always have CPU optimizations enabled — no need to rebuild on those platforms.

Nightly Release Builds

qsv ships a nightly-toolchain prebuilt with extra optimizations enabled. See docs/PERFORMANCE.md#nightly-release-builds.

Allocator choice

qsv builds with mimalloc by default. For huge-file workloads, jemalloc is sometimes faster. Both have their own environment variables — see the mimalloc options and jemalloc options docs.

To see which allocator your binary uses, run qsv --version; the allocator appears in the version string.
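A build sketch for jemalloc, assuming the allocators are feature-gated as mimalloc (default) and jemallocator; verify the exact feature names in your version's Cargo.toml:

# drop the default mimalloc feature, enable jemalloc instead (assumed feature name)
cargo build --release --locked --bin qsv \
  --no-default-features -F all_features,jemallocator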

Buffer sizes

export QSV_RDR_BUFFER_CAPACITY=131072    # reader buffer, default 128 KB
export QSV_WTR_BUFFER_CAPACITY=524288    # writer buffer, default 512 KB

Bump up on slow disks (network mounts, spinning disks). Bump down for memory-constrained environments.

Polars-specific tuning

Polars-powered commands (🐻‍❄️): color, count, joinp, lens, pivotp, prompt, schema, scoresql, sqlp. They respond to Polars's env vars:

export POLARS_VERBOSE=1                # log plan + execution to stderr
export POLARS_PANIC_ON_ERR=1           # panic instead of returning an error
export POLARS_BACKTRACE_IN_ERR=1       # include backtrace in errors

For the full set, see Polars env vars.

The single biggest Polars optimization is the Polars schema (qsv schema --polars data.csv). It tells Polars the column types without an inference scan.
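A sketch (the column name and query are illustrative; sqlp registers the input under the alias _t_1):

qsv schema --polars data.csv     # writes data.pschema.json
qsv sqlp data.csv 'SELECT col, COUNT(*) AS n FROM _t_1 GROUP BY col'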

Tuning checklist

  • qsv index <file> (or QSV_AUTOINDEX_SIZE)
  • qsv stats --stats-jsonl <file> (or QSV_STATSCACHE_MODE=force)
  • qsv schema --polars <file> (for Polars commands)
  • Replace sort → extsort and dedup → extdedup for files > RAM
  • For huge cardinalities: --cardinality-method approx / --quantile-method approx
  • For huge files: --sketch-method frequent_items in frequency
  • Set QSV_MAX_JOBS to share CPU; QSV_MEMORY_CHECK=1 for safety
  • Build with target-cpu=native if compiling from source
  • Set QSV_TMPDIR to a fast spill location (NVMe preferred)
  • If on x86_64 and getting SIGILL, switch to the portable qsvp variant
