Joel Natividad edited this page May 13, 2026 · 10 revisions

FAQ

Tier: Beginner

The evolving Q&A lives in GitHub Discussions → FAQ category. This page collects the questions we hear most often as a quick reference.

Why qsv vs other tools?

Why qsv instead of pandas?

Pandas is in-memory and Python-native. qsv is streaming (for most commands) and shell-native. They serve different needs:

  • For sub-second profiling of a multi-GB CSV in a shell or CI step, qsv wins by 10-50×.
  • For ad-hoc analytics inside a notebook with charts and ML, pandas wins.
  • The two coexist — see Integrations → Python notebooks.

Why qsv instead of DuckDB?

DuckDB is an embedded SQL database. qsv is a CSV-shaped Swiss-army knife. They complement each other:

  • qsv to parquet followed by DuckDB is the gold-standard CSV-to-analytics path.
  • qsv sqlp runs Polars SQL; qsv scoresql --duckdb lets you analyze a query via DuckDB's planner.
  • qsv describegpt with QSV_DUCKDB_PATH uses DuckDB for SQL-RAG.

See Integrations → DuckDB.

Why qsv instead of csvkit?

csvkit was the first serious Python CSV toolkit and is excellent for what it does. qsv is 10-14× faster on real workloads (compiled Rust + multithreading) and has many more commands (geocoding, validation, fetch, describegpt, …). If you're already happy with csvkit on small files, no rush — they serve overlapping niches. If you're processing > 100 MB at a time, qsv will save you minutes per run.

Why qsv instead of Miller?

Miller (mlr) is broader (handles TSV, JSON, DKVP, PPRINT, …); qsv is more CSV-specialized with deeper stats and validation. There's significant overlap; pick whichever feels right to your shell muscle memory. Many users use both.

Why qsv instead of xsv?

xsv was BurntSushi's original CSV tool. qsv is a maintained, multithreaded, feature-expanded fork. If you like xsv, install qsvlite — it's the same command set with the same flags, just faster and actively developed. See Binary Variants and Comparison.

Compatibility

Does it work on Windows?

Yes. There's an MSI Easy Installer, Scoop packages, and direct prebuilt downloads. Some notes:

  • Unix-style line endings (\n, used on macOS and Linux) and Windows line endings (\r\n) are both handled transparently. Set QSV_OUTPUT_BOM=1 for Excel-friendly outputs.
  • foreach works best inside Git Bash on Windows.
  • lens, clipboard, prompt, and pro lens are part of the UI feature group — full support on Windows.

See Installation → Windows notes.

What's the Rust MSRV?

Rust 1.95 (as of qsv 20.0.0). The MSRV bumps occasionally — check the README badge.

Does it run on ARM (Apple Silicon, Raspberry Pi, AWS Graviton)?

Yes. Prebuilt binaries are available for aarch64-apple-darwin (macOS), linux-aarch64-gnu, and Windows-on-ARM. The ARM builds have target-cpu=native enabled, so they are the fastest builds for those platforms.

Does it run on IBM Power / Z mainframes?

Yes. Prebuilt binaries are available for linux-powerpc64le-gnu and s390x. Both have target-cpu=native enabled.

Concepts

Streaming vs in-memory commands?

Commands marked 🤯 in the README legend load the entire CSV into memory: dedup (unless --sorted), pragmastat, reverse (unless indexed), sort, stats (for advanced metrics), table, transpose.

Everything else streams. For files larger than RAM, use the ext-* companions (extsort, extdedup) and Polars commands (sqlp, joinp, pivotp).
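As a rough illustration of swapping an in-memory command for its ext-* companion, the sketch below sorts a small file with extsort instead of sort. The file names are hypothetical, and the positional input/output form is an assumption about the extsort usage; check qsv extsort --help for the exact options in your version. The script skips the qsv step when qsv is not on PATH.

```shell
#!/bin/sh
# Sketch: external (on-disk) sort instead of the in-memory sort command.
# File names are hypothetical; the input/output positional form is assumed.
set -eu

workdir=$(mktemp -d)
unsorted="$workdir/unsorted.csv"
sorted="$workdir/sorted.csv"

# A tiny stand-in for a file that would normally be larger than RAM.
printf 'name\ncherry\napple\nbanana\n' > "$unsorted"

if command -v qsv >/dev/null 2>&1; then
    qsv extsort "$unsorted" "$sorted"   # sorts without loading the whole file
    cat "$sorted"
else
    echo "qsv not found on PATH; skipping extsort demo" >&2
fi
```

The same pattern should apply to extdedup in place of dedup.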

See Performance Tuning → Memory management.

What's an index?

A <file>.idx sidecar that lets random-access commands skip parsing rows they don't need. Build it once (qsv index file.csv); many commands speed up 2-9×. See Indexing, Compression & Diff.
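A minimal sketch of building and inspecting an index, assuming a hypothetical sample.csv created on the fly. It uses only qsv index as described above, plus qsv count as one example of a command that may benefit from the index. The script skips the qsv steps when qsv is not on PATH.

```shell
#!/bin/sh
# Sketch: build the .idx sidecar once, then run a command that can use it.
# sample.csv is a hypothetical file generated only for this demo.
set -eu

workdir=$(mktemp -d)
csv="$workdir/sample.csv"
printf 'id,name\n1,alpha\n2,beta\n3,gamma\n' > "$csv"

if command -v qsv >/dev/null 2>&1; then
    qsv index "$csv"     # writes the sidecar next to the CSV
    ls -l "$csv.idx"     # the sidecar is named <file>.idx
    qsv count "$csv"     # assumed to read the row count via the index
else
    echo "qsv not found on PATH; skipping index demo" >&2
fi
```

If the CSV changes after indexing, rebuild the index before relying on it.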

What's the stats cache?

A <file>.stats.csv.data.jsonl sidecar that pre-computes statistics other commands can reuse. Many "smart" commands (frequency, schema, validate, pragmastat, pivotp, tojsonl, describegpt, scoresql) skip work when the cache exists. See Stats Cache & Caching.

What's "automagical"? (🪄)

Commands with the 🪄 symbol use the stats / frequency caches to work smarter — auto-pick aggregations, short-circuit known-cardinality columns, infer the right JSON type, etc. See docs/PERFORMANCE.md.

Operations

How do I update qsv?

qsv --update

This works for prebuilt binaries (the self_update feature is built in). For package-manager installs, use the package manager's update command. To disable update checks:

export QSV_NO_UPDATE=1

How do I see what features my qsv has?

qsv --version

The output includes the version, enabled features, and bundled library versions.

How do I see active environment variables?

qsv --envlist

Lists every QSV_* env var plus allocator-specific (MIMALLOC_* / JEMALLOC_* / MALLOC_CONF) and HTTP proxy vars.

Where are logs?

Logs are written to the directory set in QSV_LOG_DIR (default: the directory where qsv was started). Enable logging with QSV_LOG_LEVEL=info. The MCP server has a separate audit log (qsvmcp.log) controlled by QSV_MCP_LOG_LEVEL. See docs/Logging.md.
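The two logging variables can be combined as in the sketch below, which points QSV_LOG_DIR at a temporary directory, enables info-level logging, and runs one command. The directory and the piped sample data are illustrative assumptions; the name of the log file qsv writes is not shown here, so the script simply lists the directory. The qsv step is skipped when qsv is not on PATH.

```shell
#!/bin/sh
# Sketch: send qsv logs to a dedicated directory at info level.
# The temporary directory and sample data are hypothetical.
set -eu

log_dir=$(mktemp -d)
export QSV_LOG_DIR="$log_dir"    # where log files are written
export QSV_LOG_LEVEL=info        # logging is enabled by setting a level

if command -v qsv >/dev/null 2>&1; then
    printf 'a,b\n1,2\n' | qsv count   # any command will do
    ls -l "$QSV_LOG_DIR"              # inspect whatever log files were created
else
    echo "qsv not found on PATH; skipping logging demo" >&2
fi
```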

AI / LLM

Does qsv send my data to OpenAI?

Only if you tell it to. qsv describegpt calls whichever LLM endpoint you point it at via QSV_LLM_BASE_URL — that can be OpenAI, but it can equally be Ollama on localhost, LM Studio, Jan, or any OpenAI-compatible endpoint.

The qsv MCP Server and Claude Cowork Plugin only send statistical summaries (stats + frequency) as LLM context, not the raw rows. See Discussions: Why MCP?.

Is describegpt deterministic?

The statistical context is deterministic (qsv's stats and frequency caches). The LLM narrative is not (LLMs are stochastic). In SQL-RAG sub-mode, the LLM writes SQL that qsv runs, so the result of running a given query is deterministic even though the narrative around it is not.

Reporting bugs

Where do I report bugs?

In your report, include the output of qsv --version and qsv --envlist, plus a minimal reproducer.
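One way to gather that information is a small script that writes both outputs to a single text file you can paste into the report. The file name qsv-bug-report.txt is a hypothetical choice, and the script records a note instead of failing when qsv is not on PATH. Review the file before sharing it, since environment variables can contain secrets.

```shell
#!/bin/sh
# Sketch: collect version and environment details for a bug report.
# The report file name is hypothetical.
set -eu

report="qsv-bug-report.txt"

{
    echo "## qsv --version"
    if command -v qsv >/dev/null 2>&1; then
        qsv --version
        echo
        echo "## qsv --envlist"
        qsv --envlist
    else
        echo "qsv not found on PATH"
    fi
} > "$report" 2>&1

echo "Wrote $report"
```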

Can I contribute?

Yes — the project welcomes PRs. See CONTRIBUTING.md for the development setup, plus Contributing to the Wiki for wiki edits specifically.
