GitHub - jsai28/dfkit: Command-line toolkit for interactive SQL and data manipulation on CSV, Parquet, JSON, and Avro files. Powered by Apache Arrow and DataFusion.

dfkit

dfkit is a suite of command-line functions for viewing, querying, and manipulating CSV, Parquet, JSON, and Avro files. It is written in Rust and powered by Apache Arrow and Apache DataFusion. The project is currently a work in progress.

Highlights

Here's a high-level overview of some of the features in dfkit:

Supports viewing, querying, and manipulating files stored locally, on the web, or in cloud storage services such as Amazon S3 and Google Cloud Storage.
Works with CSV, JSON, Parquet, and Avro files.
Ultra-fast performance powered by Apache Arrow and DataFusion.
Transform data with SQL or with several other built-in functions.
Written entirely in Rust!

Commands

dfkit 0.2.0

USAGE:
    dfkit <SUBCOMMAND>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

SUBCOMMANDS:
    cat         Concatenate multiple files or all files in a directory
    convert     Convert file format (CSV, Parquet, JSON)
    count       Count the number of rows in a file
    dedup       Remove duplicate rows
    describe    Show summary statistics for a file
    help        Prints this message or the help of the given subcommand(s)
    query       Run a SQL query on a file
    reverse     Reverse the order of rows
    schema      Show schema of a file
    sort        Sort rows by one or more columns
    split       Split a file into N chunks
    view        View the contents of a file

Installation

dfkit can be installed via cargo (requires a Rust toolchain):

cargo install dfkit

Examples

View takes the filename and an optional limit argument.

dfkit view sample.csv

+-------+-----+
| name  | age |
+-------+-----+
| Joe   | 34  |
| Matt  | 24  |
| Emily | 65  |
+-------+-----+
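
The examples in this section all operate on the same three-row file. The file itself is not shown, so the following is a minimal sketch for recreating it, assuming sample.csv is a plain comma-separated file with a header row (the layout is inferred from the table above, not confirmed):

```shell
# Recreate the sample data used by the examples.
# Assumption: comma-separated values with a header line.
cat > sample.csv <<'EOF'
name,age
Joe,34
Matt,24
Emily,65
EOF

# Confirm the file contents before passing it to dfkit.
cat sample.csv
```

With this file in the current directory, the commands shown in this section can be run as written.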

Query allows you to query the data with SQL. An optional output argument can also be supplied to save the results.

dfkit query sample.csv --sql "SELECT * FROM t WHERE age < 50"

+------+-----+
| name | age |
+------+-----+
| Joe  | 34  |
| Matt | 24  |
+------+-----+

Show the file schema.

dfkit schema sample.csv

+-------------+-----------+-------------+
| column_name | data_type | is_nullable |
+-------------+-----------+-------------+
| name        | Utf8      | YES         |
| age         | Int64     | YES         |
+-------------+-----------+-------------+

Show summary statistics of a file with describe.

dfkit describe sample.csv

+------------+-------+-------------------+
| describe   | name  | age               |
+------------+-------+-------------------+
| count      | 3     | 3.0               |
| null_count | 0     | 0.0               |
| mean       | null  | 41.0              |
| std        | null  | 21.37755832643195 |
| min        | Emily | 24.0              |
| max        | Matt  | 65.0              |
| median     | null  | 34.0              |
+------------+-------+-------------------+
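
The std value in this table matches the sample standard deviation (dividing by n - 1 rather than n). As a rough cross-check that does not use dfkit at all, the mean and sample standard deviation of the age column can be recomputed with awk. The file layout (comma-separated with a header row) is an assumption:

```shell
# Assumed layout of sample.csv: header row, then name,age pairs.
printf 'name,age\nJoe,34\nMatt,24\nEmily,65\n' > sample.csv

# Accumulate the count, sum, and sum of squares of the age column,
# skipping the header, then derive the mean and sample standard deviation.
stats=$(awk -F, 'NR > 1 { n++; sum += $2; sumsq += $2 * $2 }
  END {
    mean = sum / n
    printf "mean=%.1f std=%.4f", mean, sqrt((sumsq - n * mean * mean) / (n - 1))
  }' sample.csv)

echo "$stats"   # prints: mean=41.0 std=21.3776
```

The result agrees with the mean and std rows above, rounded to four decimal places.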

Reverse the order of rows. Save the output with --output.

dfkit reverse sample.csv

+-------+-----+
| name  | age |
+-------+-----+
| Emily | 65  |
| Matt  | 24  |
| Joe   | 34  |
+-------+-----+

Sort rows and optionally save the output with --output. You can specify multiple columns as a comma-separated string.

dfkit sort sample.csv --columns "age"

+-------+-----+
| name  | age |
+-------+-----+
| Matt  | 24  |
| Joe   | 34  |
| Emily | 65  |
+-------+-----+
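
A multi-column sort only differs from a single-column sort when the first column has ties. The sketch below uses a hypothetical file, people.csv, with two rows sharing the same age. It assumes that sorting is ascending and that later columns break ties in earlier ones, which the example above suggests but does not confirm. The coreutils pipeline prints the ordering expected under those assumptions, and dfkit is only invoked if it is installed:

```shell
# Hypothetical data with a tie on age so the second sort key matters.
printf 'name,age\nZoe,24\nMatt,24\nJoe,34\n' > people.csv

# Reference ordering computed with coreutils: age ascending (numeric),
# then name ascending. This is for comparison only, not dfkit output.
expected=$(tail -n +2 people.csv | LC_ALL=C sort -t, -k2,2n -k1,1)
echo "$expected"

# Run the equivalent dfkit command when the binary is available.
if command -v dfkit >/dev/null 2>&1; then
  dfkit sort people.csv --columns "age,name"
fi
```

If dfkit's output lists the rows in a different order than the reference, one of the stated assumptions does not hold for the installed version.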
