@clflushopt
Owner

Exposes the CLI's core runner capabilities as a library, allowing it to be extended rather easily.

@clflushopt clflushopt requested a review from alamb November 19, 2025 00:06
@guillesd

Thanks @clflushopt! No comments on my side, seems quite straightforward to use!

Collaborator

@alamb alamb left a comment

Thanks @clflushopt -- this is a really neat idea

It looks like some of the CI tests are failing, and I have a few API suggestions (but nothing I think is a deal breaker or that we couldn't do as a follow-on PR).

// Re-export commonly used types
pub use ::parquet::basic::Compression;

// Internal modules (pub for use by binary, but considered internal API)
Collaborator

I don't understand this comment -- if they are pub then this is all part of the public API 🤔

/// # Ok(())
/// # }
/// ```
pub async fn generate(self) -> io::Result<()> {
Collaborator

If the only function on TpchGenerator is generate(), we can probably simplify the API by consolidating the builder and generator.

So instead of

TpchGenerator::builder()
        .scale_factor(1.0)
        .build()
        .generate()
        .await?;

Something like

TpchGenerator::new()
        .scale_factor(1.0)
        .generate()
        .await?;
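
For illustration, a minimal sketch of what the consolidated type could look like. It is not the PR's implementation: the single `scale_factor` field, the default of 1.0, and the placeholder validation inside `generate()` are all assumptions made for the example.

```rust
use std::io;

/// Hypothetical consolidated type: it holds the settings and runs the
/// generation itself, so there is no separate builder and no `build()` step.
#[derive(Debug, Clone, PartialEq)]
pub struct TpchGenerator {
    scale_factor: f64,
}

impl TpchGenerator {
    /// Start from defaults. A default scale factor of 1.0 is an assumption.
    pub fn new() -> Self {
        Self { scale_factor: 1.0 }
    }

    /// Chainable setter: takes `self` by value and hands it back.
    pub fn scale_factor(mut self, scale_factor: f64) -> Self {
        self.scale_factor = scale_factor;
        self
    }

    /// Consumes the generator. The body is a placeholder that only validates
    /// the scale factor; the real method would write the TPC-H tables.
    pub async fn generate(self) -> io::Result<()> {
        if self.scale_factor.is_nan() || self.scale_factor <= 0.0 {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "scale factor must be positive",
            ));
        }
        Ok(())
    }
}

impl Default for TpchGenerator {
    fn default() -> Self {
        Self::new()
    }
}
```

Because `generate(self)` takes the value by move, a configured generator cannot be reused by accident after it has run, which is the same guarantee the separate `build()` step provided.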

/// ```no_run
/// use tpchgen_cli::{GeneratorConfig, OutputFormat};
///
/// // Usually you would use TpchGenerator::builder() instead
Collaborator

If we provide a Builder API, what is the use case for the config as well?

If we made the GeneratorConfig non-public, the API becomes simpler I think, and we could add new fields without causing breaking changes in the future

Or put another way, I feel like we should have either GeneratorConfig or TpchGeneratorBuilder but not both 🤔

Owner Author

Thinking along the same lines, I think we can deprecate GeneratorConfig in favor of exposing the builder API.
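
One way the deprecation could be staged, as a rough sketch only: mark the config with Rust's `#[deprecated]` attribute and offer a conversion into the builder so existing callers keep compiling in the meantime. The field set, the note text, and the `From` shim are assumptions for illustration, not code from this PR.

```rust
/// Hypothetical stand-in for the PR's `GeneratorConfig`; the single public
/// field is illustrative only.
#[deprecated(note = "use `TpchGeneratorBuilder` instead")]
pub struct GeneratorConfig {
    pub scale_factor: f64,
}

/// Hypothetical stand-in for the builder. Its fields are private, so new
/// options can be added later without breaking callers.
#[derive(Debug, Clone, PartialEq)]
pub struct TpchGeneratorBuilder {
    scale_factor: f64,
}

impl TpchGeneratorBuilder {
    /// Start from defaults. A default scale factor of 1.0 is an assumption.
    pub fn new() -> Self {
        Self { scale_factor: 1.0 }
    }

    /// Chainable setter, named as in the current diff.
    pub fn scale_factor(mut self, scale_factor: f64) -> Self {
        self.scale_factor = scale_factor;
        self
    }
}

/// Migration shim (an assumption): lets code that still builds a
/// `GeneratorConfig` move to the builder in one call.
#[allow(deprecated)]
impl From<GeneratorConfig> for TpchGeneratorBuilder {
    fn from(config: GeneratorConfig) -> Self {
        Self::new().scale_factor(config.scale_factor)
    }
}
```

Callers that still name `GeneratorConfig` would get a compiler warning pointing at the builder, and the type could be removed or made private in a later release.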

}

/// Set the scale factor (e.g., 1.0 for 1GB, 10.0 for 10GB)
pub fn scale_factor(mut self, scale_factor: f64) -> Self {
Collaborator

I personally recommend:

  1. Adding accessors for all these properties (e.g. scale_factor() returns f64)
  2. Changing the setters for the builder api to be named using with_ prefixes -- e.g. with_scale_factor
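
A minimal sketch of the two recommendations applied to a single property. The struct name `TpchGeneratorBuilder`, its single field, and the default value are assumptions for illustration; only `scale_factor` and `with_scale_factor` come from the suggestion above.

```rust
/// Hypothetical builder with one property, used to show the naming scheme.
#[derive(Debug, Clone)]
pub struct TpchGeneratorBuilder {
    scale_factor: f64,
}

impl TpchGeneratorBuilder {
    /// Start from defaults. A default scale factor of 1.0 is an assumption.
    pub fn new() -> Self {
        Self { scale_factor: 1.0 }
    }

    /// Setter with the `with_` prefix: consumes the builder and returns it,
    /// so calls can be chained.
    pub fn with_scale_factor(mut self, scale_factor: f64) -> Self {
        self.scale_factor = scale_factor;
        self
    }

    /// Accessor: with the setter renamed, the bare property name is free to
    /// be used for reading the current value.
    pub fn scale_factor(&self) -> f64 {
        self.scale_factor
    }
}

impl Default for TpchGeneratorBuilder {
    fn default() -> Self {
        Self::new()
    }
}
```

Renaming the setters is what makes the accessors possible: a type cannot have both a `scale_factor(self, f64)` setter and a `scale_factor(&self)` getter, since Rust has no method overloading.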

//!
//! This crate provides a CLI for generating TPCH data and tries to remain close
//! API wise to the original dbgen tool, as in we use the same command line flags
//! and arguments.
Collaborator

You might want to move tpchgen-cli/src/main.rs into tpchgen-cli/bin/tpchgen-cli.rs or something, to make it clear that this is now a driver and that the crate can be used as a library too
