The project is organized into three crates:

- tpchgen: The core library that implements the data generation logic for TPCH.
- tpchgen-arrow: Generates the TPCH data directly in the Apache Arrow in-memory format.
- tpchgen-cli: A CLI tool that uses the tpchgen library to generate TPCH data.

The tpchgen crate is designed to be embeddable in as many places as
possible and therefore has no dependencies. For example, it does
not depend on the arrow or parquet crates or on display libraries.
tpchgen-arrow is similarly designed to be embeddable with minimal dependencies
and only depends on the arrow crate.
The tpchgen-cli crate is designed to include many useful features, and thus
has many more dependencies.
Speed is a key goal of this project, and care has been taken to keep the code as fast as possible, using techniques such as:
- Avoiding heap allocations during data generation
- Integer arithmetic and display instead of floating point arithmetic and display
- Using multiple cores and tuned buffer sizes
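
The first two techniques can be illustrated with a minimal sketch. It is not taken from the tpchgen source: the `Cents` type and the `write_row` helper are hypothetical names introduced only for this example. The sketch assumes a monetary value is stored as a whole number of cents, formats it with integer division and remainder instead of floating point, and appends each row to a buffer owned by the caller so that one allocation can be reused across rows.

```rust
use std::fmt::{self, Write};

/// Hypothetical fixed-point money type (not part of tpchgen):
/// the value is stored as a whole number of cents.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Cents(pub i64);

impl fmt::Display for Cents {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Integer-only formatting: no conversion to f64 is needed.
        let sign = if self.0 < 0 { "-" } else { "" };
        let abs = self.0.unsigned_abs();
        // Whole units, then the two-digit fractional part.
        write!(f, "{}{}.{:02}", sign, abs / 100, abs % 100)
    }
}

/// Hypothetical helper: appends one `|`-delimited row to a buffer owned
/// by the caller. Once the buffer has enough capacity, clearing and
/// reusing it avoids a fresh heap allocation for every row.
pub fn write_row(buf: &mut String, key: i64, price: Cents) -> fmt::Result {
    writeln!(buf, "{}|{}|", key, price)
}
```

A caller would typically create the buffer once with `String::with_capacity`, call `write_row`, flush the contents to the output, and then call `buf.clear()` before the next row. `clear` keeps the existing capacity, so steady-state generation does not need to reallocate.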