-
Notifications
You must be signed in to change notification settings - Fork 105
Tier: Beginner
The evolving Q&A lives in GitHub Discussions → FAQ category. This page collects the questions we hear most often as a quick reference.
Pandas is in-memory and Python-native. qsv is streaming (for most commands) and shell-native. They serve different needs:
- For sub-second profiling of a multi-GB CSV in a shell or CI step, qsv wins by 10-50×.
- For ad-hoc analytics inside a notebook with charts and ML, pandas wins.
- The two coexist — see Integrations → Python notebooks.
DuckDB is an embedded SQL database. qsv is a CSV-shaped Swiss-army knife. They complement each other:
-
qsv to parquet+ DuckDB = the gold-standard CSV-to-analytics path -
qsv sqlpruns Polars SQL;qsv scoresql --duckdblets you analyze a query via DuckDB's planner. -
qsv describegptwithQSV_DUCKDB_PATHuses DuckDB for SQL-RAG.
csvkit was the first serious Python CSV toolkit and is excellent for what it does. qsv is 10-14× faster on real workloads (compiled Rust + multithreading) and has many more commands (geocoding, validation, fetch, describegpt, …). If you're already happy with csvkit on small files, no rush — they serve overlapping niches. If you're processing > 100 MB at a time, qsv will save you minutes per run.
Miller (mlr) is broader (handles TSV, JSON, DKVP, PPRINT, …); qsv is more CSV-specialized with deeper stats and validation. There's significant overlap; pick whichever feels right to your shell muscle memory. Many users use both.
xsv was BurntSushi's original CSV tool. qsv is a maintained, multithreaded, feature-expanded fork. If you like xsv, install qsvlite — it's the same command set with the same flags, just faster and actively developed. See Binary Variants and Comparison.
Yes. There's an MSI Easy Installer, Scoop packages, and direct prebuilt downloads. Some notes:
- macOS line endings (
\n) vs Windows (\r\n) are handled transparently. SetQSV_OUTPUT_BOM=1for Excel-friendly outputs. -
foreachworks best inside Git Bash on Windows. -
lens,clipboard,prompt, andpro lensare part of the UI feature group — full support on Windows.
See Installation → Windows notes.
Rust 1.95 (as of qsv 20.0.0). The MSRV bumps occasionally — check the README badge.
Yes. Prebuilt binaries for aarch64-apple-darwin (macOS), linux-aarch64-gnu, and Windows-on-ARM. ARM builds have target-cpu=native enabled — fastest on those platforms.
Yes. Prebuilt binaries for linux-powerpc64le-gnu and s390x. Both have target-cpu=native enabled.
Commands marked 🤯 in the README legend load the entire CSV into memory: dedup (unless --sorted), pragmastat, reverse (unless indexed), sort, stats (for advanced metrics), table, transpose.
Everything else streams. For files larger than RAM, use the ext-* companions (extsort, extdedup) and Polars commands (sqlp, joinp, pivotp).
See Performance Tuning → Memory management.
A <file>.idx sidecar that lets random-access commands skip parsing rows they don't need. Build it once (qsv index file.csv); many commands speed up 2-9×. See Indexing, Compression & Diff.
A <file>.stats.csv.data.jsonl sidecar that pre-computes statistics other commands can reuse. Many "smart" commands (frequency, schema, validate, pragmastat, pivotp, tojsonl, describegpt, scoresql) skip work when the cache exists. See Stats Cache & Caching.
Commands with the 🪄 symbol use the stats / frequency caches to work smarter — auto-pick aggregations, short-circuit known-cardinality columns, infer the right JSON type, etc. See docs/PERFORMANCE.md.
qsv --updateThis works for prebuilt binaries (the self_update feature is built in). For package-manager installs, use the package manager's update command. To disable update checks:
export QSV_NO_UPDATE=1qsv --versionThe output includes the version, enabled features, and bundled library versions.
qsv --envlistLists every QSV_* env var plus allocator-specific (MIMALLOC_* / JEMALLOC_* / MALLOC_CONF) and HTTP proxy vars.
QSV_LOG_DIR (default: directory where qsv was started). Enable with QSV_LOG_LEVEL=info. The MCP server has a separate audit log (qsvmcp.log) controlled by QSV_MCP_LOG_LEVEL. See docs/Logging.md.
Only if you tell it to. qsv describegpt calls whichever LLM endpoint you point it at via QSV_LLM_BASE_URL — that can be OpenAI, but it can equally be Ollama on localhost, LM Studio, Jan, or any OpenAI-compatible endpoint.
The qsv MCP Server and Claude Cowork Plugin only send statistical summaries (stats + frequency) as LLM context, not the raw rows. See Discussions: Why MCP?.
The statistical context is deterministic (qsv's stats and frequency caches). The LLM narrative is not (LLMs are stochastic). In SQL-RAG sub-mode, the LLM writes SQL that qsv runs — the SQL output IS deterministic.
- Questions: Discussions Q&A
- FAQ-worthy: Discussions FAQ
- Confirmed bugs: Issues
- Security: SECURITY.md
Include qsv --version, qsv --envlist, and a minimal reproducer.
Yes — the project welcomes PRs. See CONTRIBUTING.md for the development setup, plus Contributing to the Wiki for wiki edits specifically.
- Troubleshooting — specific error → fix
- Comparison vs others
- Discussions → FAQ category — the evolving canonical Q&A
- Discussions → Q&A category
- External Resources
- Contributing to the Wiki
qsv — GitHub · Releases · Discussions · qsv pro · Try it online · Benchmarks · datHere · DeepWiki · Dual-licensed MIT / Unlicense
Edit this page: Contributing to the Wiki
Home · Why qsv? · Tier legend
- All Commands (index)
- Selection & Inspection
- Transform & Reshape
- Aggregation & Statistics
- Joins & Set Ops
- SQL & Polars
- Validation & Schema
- Conversion & I/O
- Geospatial
- HTTP & Web
- Scripting (Luau / Python)
- Indexing, Compression & Diff
- AI & Documentation