feat: tpchgen runners as a lib #202
base: main
Thanks @clflushopt! No comments on my side; it seems quite straightforward to use!
alamb left a comment
Thanks @clflushopt -- this is a really neat idea
It looks like some of the CI tests are failing, and I have a few API suggestions (but nothing I think is a deal breaker or that we couldn't do as a follow-on PR).
```rust
// Re-export commonly used types
pub use ::parquet::basic::Compression;

// Internal modules (pub for use by binary, but considered internal API)
```
I don't understand this comment -- if they are `pub` then this is all part of the public API 🤔
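For reference, Rust visibility is decided per crate: any `pub` item inside a `pub mod` is reachable from downstream crates, whatever a comment says. A minimal sketch of the alternative, keeping the module private and re-exporting only the intended type (`runner`, `Runner` and `is_valid_scale_factor` are hypothetical names, not items from this PR). One caveat: `src/main.rs` is compiled as a separate crate, so anything the binary uses still has to be `pub` in the library.

```rust
// Hypothetical private module: its items are not part of the public API,
// even the ones marked `pub`, because the module itself is not exported.
mod runner {
    /// Intended public type, re-exported at the crate root below.
    pub struct Runner {
        scale_factor: f64,
    }

    impl Runner {
        pub fn new(scale_factor: f64) -> Self {
            Runner { scale_factor }
        }

        pub fn scale_factor(&self) -> f64 {
            self.scale_factor
        }
    }

    /// Helper visible only inside this crate (hypothetical validation rule).
    pub(crate) fn is_valid_scale_factor(scale_factor: f64) -> bool {
        scale_factor > 0.0
    }
}

// Only `Runner` is exported; `runner::is_valid_scale_factor` stays internal.
pub use runner::Runner;

fn main() {
    let r = Runner::new(2.5);
    println!("{} {}", r.scale_factor(), runner::is_valid_scale_factor(r.scale_factor()));
}
```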
```rust
/// # Ok(())
/// # }
/// ```
pub async fn generate(self) -> io::Result<()> {
```
If the only function on `TpchGenerator` is `generate()`, we can probably simplify the API by consolidating the builder and the generator.
So instead of

```rust
TpchGenerator::builder()
    .scale_factor(1.0)
    .build()
    .generate()
    .await?;
```

Something like

```rust
TpchGenerator::new()
    .scale_factor(1.0)
    .generate()
    .await?;
```

```rust
/// ```no_run
/// use tpchgen_cli::{GeneratorConfig, OutputFormat};
///
/// // Usually you would use TpchGenerator::builder() instead
```
If we provide a builder API, what is the use case for the config as well?
If we made `GeneratorConfig` non-public, I think the API would become simpler, and we could add new fields in the future without causing breaking changes.
Or, put another way, I feel like we should have either `GeneratorConfig` or `TpchGeneratorBuilder`, but not both 🤔
Thinking along the same lines, I think we can deprecate GeneratorConfig in favor of exposing the builder API.
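The suggestions above (folding the builder into `TpchGenerator` and keeping `GeneratorConfig` private) could look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: `num_threads` is a hypothetical second field, and `generate` is made synchronous and only formats a summary so the example runs without an async runtime.

```rust
use std::io;

// Private configuration: because it is not exported, fields such as the
// hypothetical `num_threads` can be added later without a breaking change.
#[derive(Debug, Clone)]
struct GeneratorConfig {
    scale_factor: f64,
    num_threads: usize,
}

impl Default for GeneratorConfig {
    fn default() -> Self {
        GeneratorConfig { scale_factor: 1.0, num_threads: 1 }
    }
}

/// Single public entry point: configured with chained setters and run
/// directly, with no separate builder type or `build()` step.
#[derive(Debug, Clone, Default)]
pub struct TpchGenerator {
    config: GeneratorConfig,
}

impl TpchGenerator {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn scale_factor(mut self, scale_factor: f64) -> Self {
        self.config.scale_factor = scale_factor;
        self
    }

    pub fn num_threads(mut self, num_threads: usize) -> Self {
        self.config.num_threads = num_threads;
        self
    }

    /// Stand-in for the real async `generate`: validates the configuration
    /// and returns a summary instead of writing any data.
    pub fn generate(self) -> io::Result<String> {
        if self.config.scale_factor <= 0.0 {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "scale factor must be positive",
            ));
        }
        Ok(format!(
            "sf={} threads={}",
            self.config.scale_factor, self.config.num_threads
        ))
    }
}

fn main() -> io::Result<()> {
    let summary = TpchGenerator::new().scale_factor(1.0).generate()?;
    println!("{summary}");
    Ok(())
}
```

Since users only ever see `TpchGenerator` and its methods, the config struct becomes an implementation detail that can evolve freely.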
```rust
}

/// Set the scale factor (e.g., 1.0 for 1GB, 10.0 for 10GB)
pub fn scale_factor(mut self, scale_factor: f64) -> Self {
```
I personally recommend:
- Adding accessors for all these properties (e.g. `scale_factor()` returns `f64`)
- Renaming the builder API setters with a `with_` prefix, e.g. `with_scale_factor`
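A minimal sketch of that naming scheme, assuming a builder that consumes and returns `self`. `TpchGeneratorBuilder` is named in the thread, but its fields here, and the `output_dir` property in particular, are hypothetical.

```rust
/// Hypothetical builder: `with_*` setters consume and return `self`,
/// while plain-named methods read the current values back.
#[derive(Debug, Clone, PartialEq)]
pub struct TpchGeneratorBuilder {
    scale_factor: f64,
    output_dir: String,
}

impl Default for TpchGeneratorBuilder {
    fn default() -> Self {
        Self { scale_factor: 1.0, output_dir: ".".to_string() }
    }
}

impl TpchGeneratorBuilder {
    /// Setter: returns the updated builder for chaining.
    pub fn with_scale_factor(mut self, scale_factor: f64) -> Self {
        self.scale_factor = scale_factor;
        self
    }

    /// Setter for the hypothetical output directory.
    pub fn with_output_dir(mut self, dir: impl Into<String>) -> Self {
        self.output_dir = dir.into();
        self
    }

    /// Accessor: the bare name is free because the setter uses `with_`.
    pub fn scale_factor(&self) -> f64 {
        self.scale_factor
    }

    pub fn output_dir(&self) -> &str {
        &self.output_dir
    }
}

fn main() {
    let builder = TpchGeneratorBuilder::default()
        .with_scale_factor(10.0)
        .with_output_dir("/tmp/tpch");
    println!("{} {}", builder.scale_factor(), builder.output_dir());
}
```

Rust does not allow two inherent methods with the same name on one type, so putting the `with_` prefix on setters is what lets the getter use the plain property name.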
```rust
//!
//! This crate provides a CLI for generating TPCH data and tries to remain close
//! API wise to the original dbgen tool, as in we use the same command line flags
//! and arguments.
```
You might want to move `tpchgen-cli/src/main.rs` into `tpchgen-cli/bin/tpchgen-cli.rs` or something, to make it clear that this is now a driver and the crate can be used as a library too.
Exposes the CLI's core runner capabilities as a library, allowing them to be embedded rather easily.