feat(docker-images): Extrapolate image build flow from benchmark runs; Move configuration files into project sources. #17
base: main
Conversation
Walkthrough

Switches docs to a dataset- and Docker-centric workflow, adds Docker image build scripts and helpers, introduces centralized project path constants, registers a docker-images Taskfile, and clears the contents of two existing Dockerfiles.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant Runner as Task Runner
participant Taskfile as taskfiles/docker-images/main.yaml
participant BuildPy as build.py
participant Utils as utils.py
participant Docker as Docker Engine
Runner->>Taskfile: invoke build (parallel)
loop per service in G_BENCHMARK_DOCKER_SERVICES
Taskfile->>BuildPy: run --service-name <service>
BuildPy->>Utils: validate_service_name(service)
alt valid
Utils-->>BuildPy: ok
BuildPy->>Utils: get_image_name(service)
Utils-->>BuildPy: image_tag
BuildPy->>BuildPy: locate Dockerfile at CONFIG_DIR/docker-images/service/Dockerfile
alt Dockerfile found
BuildPy->>Docker: docker build -t image_tag -f <Dockerfile> <PACKAGE_ROOT>
Docker-->>BuildPy: build result
BuildPy-->>Taskfile: exit 0
else missing
BuildPy-->>Taskfile: raise RuntimeError
end
else invalid
Utils-->>BuildPy: raise ValueError
BuildPy-->>Taskfile: error
end
end
Taskfile-->>Runner: complete
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (2 passed)
Actionable comments posted: 3
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (11)
- README.md (1 hunks)
- assets/overhead_test/Dockerfile (0 hunks)
- assets/template/Dockerfile (0 hunks)
- pyproject.toml (1 hunks)
- src/log_archival_bench/scripts/docker_images/__init__.py (1 hunks)
- src/log_archival_bench/scripts/docker_images/build.py (1 hunks)
- src/log_archival_bench/scripts/docker_images/utils.py (1 hunks)
- src/log_archival_bench/utils/__init__.py (1 hunks)
- src/log_archival_bench/utils/path_utils.py (1 hunks)
- taskfile.yaml (1 hunks)
- taskfiles/docker-images/main.yaml (1 hunks)
💤 Files with no reviewable changes (2)
- assets/template/Dockerfile
- assets/overhead_test/Dockerfile
🧰 Additional context used
🧬 Code graph analysis (1)
src/log_archival_bench/scripts/docker_images/build.py (2)
src/log_archival_bench/scripts/docker_images/utils.py (1)
- get_image_name (6-12)

src/log_archival_bench/utils/path_utils.py (3)
- get_config_dir (24-26)
- get_package_root (14-16)
- which (29-41)
🪛 Ruff (0.13.3)
src/log_archival_bench/scripts/docker_images/build.py
53-53: subprocess call: check for execution of untrusted input
(S603)
65-65: subprocess call: check for execution of untrusted input
(S603)
🔇 Additional comments (10)
src/log_archival_bench/scripts/docker_images/__init__.py (1)
1-1: LGTM! Standard package initializer with an appropriate docstring.
pyproject.toml (1)
27-28: LGTM! The mypy configuration correctly supports the new package structure under `src/`. The `explicit_package_bases` and `mypy_path` settings enable proper type checking for the newly introduced modules.

src/log_archival_bench/utils/__init__.py (1)

1-1: LGTM! Standard package initializer with an appropriate docstring.
taskfile.yaml (2)
8-8: LGTM! Appropriately includes the new docker-images taskfile to enable the image build workflow.

12-12: LGTM! The `G_PROJECT_SRC_DIR` variable correctly points to the source directory for the Docker image build context.

README.md (1)

10-48: LGTM! The documentation updates clearly explain the new Docker-based workflow, including dataset download, image building (both concurrent and single-engine), and execution. The shift from virtual environment setup to Docker environments aligns well with the PR objectives.
src/log_archival_bench/scripts/docker_images/utils.py (1)
6-12: LGTM! The `get_image_name` function provides a clean utility for generating consistent Docker image tags. The fallback to `"clp-user"` appropriately handles environments where the `USER` variable might not be set.
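For reference, the helper is essentially a one-liner around the tag format; a sketch consistent with the implementation quoted later in this thread:

```python
import os


def get_image_name(service_name: str) -> str:
    """
    :param service_name:
    :return: The name assigned to the Docker image that contains the service.
    """
    # Fall back to "clp-user" when the USER environment variable is unset.
    user = os.getenv("USER", "clp-user")
    return f"log-archival-bench-{service_name}-ubuntu-jammy:dev-{user}"
```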
src/log_archival_bench/scripts/docker_images/build.py (1)

17-67: LGTM! The build logic is well-structured:

- Properly validates Dockerfile existence before building
- Uses the `which()` helper to safely locate the Docker executable
- Handles optional metadata dumping with appropriate error checking
- The subprocess calls are safe despite Ruff S603 warnings (the script is designed for developer/CI use with controlled inputs)
src/log_archival_bench/utils/path_utils.py (2)
8-26: LGTM! The path derivation logic is correct:

- `_PACKAGE_ROOT` correctly resolves to the package directory via `__file__.parent`
- `_BUILD_DIR` and `_CONFIG_DIR` are appropriately derived relative to the package root
- The getter functions provide clean public interfaces to these paths

29-41: LGTM! The `which()` helper provides robust executable location with clear error handling. Raising `RuntimeError` with a descriptive message when the binary is not found is appropriate for this use case.
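A rough sketch of the module's shape as described above, assuming `which()` wraps `shutil.which`; the exact directory names and the number of `.parent` hops are assumptions:

```python
"""Centralized project path constants and helpers (illustrative sketch)."""

import shutil
from pathlib import Path

# Assumed location: src/log_archival_bench/utils/path_utils.py, so two hops up
# reach the package root.
_PACKAGE_ROOT = Path(__file__).resolve().parent.parent
_BUILD_DIR = _PACKAGE_ROOT / "build"
_CONFIG_DIR = _PACKAGE_ROOT / "config"


def get_package_root() -> Path:
    """:return: The root directory of the package."""
    return _PACKAGE_ROOT


def get_config_dir() -> Path:
    """:return: The directory containing configuration files."""
    return _CONFIG_DIR


def which(binary_name: str) -> Path:
    """
    :param binary_name:
    :return: The absolute path of the binary.
    :raise RuntimeError: If the binary cannot be found on PATH.
    """
    binary_path = shutil.which(binary_name)
    if binary_path is None:
        raise RuntimeError(f"Unable to locate executable: {binary_name}")
    return Path(binary_path)
```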
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
- src/log_archival_bench/scripts/docker_images/build.py (1 hunks)
- src/log_archival_bench/utils/path_utils.py (1 hunks)
- taskfiles/docker-images/main.yaml (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/log_archival_bench/scripts/docker_images/build.py (2)
src/log_archival_bench/scripts/docker_images/utils.py (1)
- get_image_name (6-12)

src/log_archival_bench/utils/path_utils.py (3)
- get_config_dir (24-26)
- get_package_root (14-16)
- which (29-41)
🪛 Ruff (0.13.3)
src/log_archival_bench/scripts/docker_images/build.py
53-53: subprocess call: check for execution of untrusted input
(S603)
65-65: subprocess call: check for execution of untrusted input
(S603)
🔇 Additional comments (4)
src/log_archival_bench/utils/path_utils.py (1)
1-41: LGTM! Clean implementation of path utilities. The module provides well-structured path helpers with appropriate error handling. The use of `Path` objects and validation in `which()` ensures robust path resolution.

taskfiles/docker-images/main.yaml (1)
1-76: LGTM! Task configuration is well-structured. The checksum validation has been correctly moved from `sources` to `deps`, resolving the previous parsing issue. The parallel build orchestration and incremental rebuild detection via checksums align well with the PR objectives.

src/log_archival_bench/scripts/docker_images/build.py (2)
43-53: Docker build implementation is correct. The subprocess call is safe given that `docker_bin` is resolved via `which()` and other arguments are constructed from validated or trusted sources. The static analysis warning (S603) is a false positive in this context, assuming the `engine_name` validation suggested earlier is implemented.

Note: This approval assumes the `engine_name` input validation is added as suggested in the earlier comment.

55-65: LGTM! Config dump implementation follows best practices. The optional JSON metadata dump correctly creates parent directories and uses a context manager for file operations. The subprocess call is safe given validated inputs.
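As an illustration, the dump step amounts to capturing `docker inspect` output and writing it under the metadata path; a sketch using the same file-handling shape quoted later in this thread (the function name and argument types are assumptions):

```python
import json
import subprocess
from pathlib import Path


def dump_image_metadata(docker_bin: Path, image_name: str, metadata_file: str) -> None:
    # Capture `docker inspect` output for the freshly built image.
    inspect_proc = subprocess.run(
        [str(docker_bin), "inspect", image_name],
        check=True,
        capture_output=True,
        text=True,
    )
    inspect_json = json.loads(inspect_proc.stdout)

    # Create parent directories and write the metadata with a context manager.
    metadata_file_path = Path(metadata_file)
    metadata_file_path.parent.mkdir(parents=True, exist_ok=True)
    with metadata_file_path.open("w", encoding="utf-8") as f:
        json.dump(inspect_json, f, indent=4)
```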
Eden-D-Zhang
left a comment
A few things.
Co-authored-by: Eden Zhang <[email protected]>
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
- src/log_archival_bench/scripts/docker_images/build.py (1 hunks)
- src/log_archival_bench/utils/path_utils.py (1 hunks)
- taskfiles/docker-images/main.yaml (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/log_archival_bench/scripts/docker_images/build.py (2)
src/log_archival_bench/scripts/docker_images/utils.py (1)
- get_image_name (6-12)

src/log_archival_bench/utils/path_utils.py (3)
- get_config_dir (24-26)
- get_package_root (14-16)
- which (29-41)
🪛 Ruff (0.14.1)
src/log_archival_bench/scripts/docker_images/build.py
53-53: subprocess call: check for execution of untrusted input
(S603)
65-65: subprocess call: check for execution of untrusted input
(S603)
🔇 Additional comments (12)
src/log_archival_bench/utils/path_utils.py (4)
1-6: LGTM! Module docstring is clear and imports are appropriate.

8-11: LGTM! Package root and directory constants are correctly derived.

14-26: LGTM! The getter functions are straightforward with clear docstrings.

29-41: LGTM! The `which()` helper properly validates binary existence and provides clear error messages.

taskfiles/docker-images/main.yaml (4)
1-20: LGTM! The configuration structure is clear, and the note explaining the commented Presto-related images addresses previous review feedback.

22-35: LGTM! Task dependencies and initialisation logic are correctly structured.

37-45: LGTM! Parallel build orchestration using Task's `for` loop is correctly implemented.

47-77: LGTM! The task structure is correct with proper use of `sources`, `deps`, and `cmds`. The past issue regarding checksum validation placement has been resolved.

src/log_archival_bench/scripts/docker_images/build.py (4)
1-14: LGTM! Shebang, docstring, and imports are appropriate.

36-41: LGTM! Image name resolution and Dockerfile validation are correctly implemented. Once the allowlist validation (flagged in lines 24-34) is added, this serves as a good secondary check for configuration errors.

43-53: LGTM! Docker build command construction and execution are correct. All inputs are validated or derived from controlled sources.

55-71: LGTM! Optional configuration dump logic is correctly implemented with proper file handling and directory creation. The main guard follows Python conventions.
…eprocessing. Internal for actual docker build run cmds
```python
build_cmds = [
    docker_bin,
    "build",
    "--tag", image_name,
    str(get_package_root()),
    "--file", docker_file_path,
]
```
I might be missing some context. Why do we want to separate native/build and build?
build does a bunch of preprocessing and argument validation, while native/build calls the actual docker cmds for building. It may seem redundant now but I think it will be helpful in the future.
if that's the case, how about calling a python function rather than

```python
build_cmd = [
    which("python3"),
    "-m", "log_archival_bench.scripts.docker_images.native.build",
    "--image-name", get_image_name(engine_name),
    "--docker-file-path", str(docker_file_path),
]
```
For example, if the user isn't on Ubuntu/Debian, only `python` (not `python3`) may be available; and even on Ubuntu, if someone uses uv to run the script, this line will call the system python3.

The best solution is to simply call a native Python function rather than proxying the call through `subprocess.run`.
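Concretely, the suggestion amounts to importing the native builder and calling it in-process instead of shelling out to an interpreter; a sketch, where the keyword signature of `native_build.main()` is an assumption:

```python
from log_archival_bench.scripts.docker_images.native import build as native_build
from log_archival_bench.scripts.docker_images.utils import get_image_name


def build_image(engine_name: str, docker_file_path: str) -> None:
    # Call the native builder in-process; no dependency on a system `python3` binary.
    # NOTE: the keyword arguments of native_build.main() are assumed for illustration.
    native_build.main(
        image_name=get_image_name(engine_name),
        docker_file_path=docker_file_path,
    )
```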
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
- src/log_archival_bench/scripts/docker_images/build.py (1 hunks)
- src/log_archival_bench/scripts/docker_images/native/build.py (1 hunks)
- src/log_archival_bench/utils/project_config.py (1 hunks)
- taskfiles/docker-images/main.yaml (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/log_archival_bench/scripts/docker_images/build.py (2)
src/log_archival_bench/scripts/docker_images/utils.py (1)
- get_image_name (6-12)

src/log_archival_bench/scripts/docker_images/native/build.py (1)
- main (12-49)
src/log_archival_bench/scripts/docker_images/native/build.py (1)
src/log_archival_bench/scripts/docker_images/build.py (1)
- main (16-60)
🪛 Ruff (0.14.1)
src/log_archival_bench/scripts/docker_images/build.py
58-58: subprocess call: check for execution of untrusted input
(S603)
src/log_archival_bench/scripts/docker_images/native/build.py
40-40: subprocess call: check for execution of untrusted input
(S603)
47-47: subprocess call: check for execution of untrusted input
(S603)
🔇 Additional comments (7)
src/log_archival_bench/scripts/docker_images/build.py (2)
35-38: LGTM! Engine validation properly implemented. The explicit allowlist validation addresses the path traversal concerns raised in previous reviews. The error message provides clear feedback when an invalid engine is specified.

58-58: Static analysis false positive. The S603 warning about untrusted subprocess input is a false positive. The command invokes an internal Python module (`log_archival_bench.scripts.docker_images.native.build`) with validated arguments: `engine_name` was validated against the allowlist (lines 35-38), and `docker_file_path` was verified to exist (lines 40-43).

src/log_archival_bench/utils/project_config.py (1)
1-11: LGTM! Clean configuration module. The path constants are properly defined using `pathlib.Path`, and the module provides a clear, centralized location for project-level paths.

src/log_archival_bench/scripts/docker_images/native/build.py (2)
31-40: LGTM! Docker build execution is correct. The build command construction and execution are proper. The S603 static analysis warning is a false positive—the subprocess executes `docker build` with validated arguments passed from the wrapper script.

42-47: LGTM! Metadata dump implementation is correct. The optional metadata dump properly creates directories, uses explicit UTF-8 encoding, and safely redirects `docker inspect` output. The S603 static analysis warning is a false positive—`image_name` was validated by the wrapper script.

taskfiles/docker-images/main.yaml (2)
1-46: LGTM! Well-structured task workflow. The taskfile properly orchestrates parallel Docker image builds with clear separation of concerns. The note explaining disabled Presto images (line 16) addresses the previous review feedback.

47-78: LGTM! Checksum-based rebuild detection properly implemented. The task correctly addresses previous review feedback:

- Checksum validation is in `deps` (lines 63-66), not `sources`, ensuring proper Task parsing
- Dependencies include `utils/**/*` (line 59) to detect changes in utility modules
- The workflow validates existing checksums before building and computes new checksums after successful builds
hoophalab
left a comment
one more comment
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/log_archival_bench/scripts/docker_images/build.py (1)
28-31: Add a synchronization reminder comment. The `valid_engines` list must be kept in sync with `G_DOCKER_IMAGE_ENGINES` in `taskfiles/docker-images/main.yaml` (lines 12-20). When Presto images are re-enabled, both locations need updating.

Apply this diff to add a reminder comment:

```diff
+    # NOTE: Keep in sync with G_DOCKER_IMAGE_ENGINES in taskfiles/docker-images/main.yaml
     valid_engines = ["clickhouse", "clp", "elasticsearch", "sparksql", "zstandard"]
     if engine_name not in valid_engines:
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
- src/log_archival_bench/scripts/docker_images/build.py (1 hunks)
- src/log_archival_bench/scripts/docker_images/inspect.py (1 hunks)
- taskfiles/docker-images/main.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-21T17:00:54.873Z
Learnt from: hoophalab
PR: y-scope/log-archival-bench#17
File: taskfiles/docker-images/main.yaml:55-60
Timestamp: 2025-10-21T17:00:54.873Z
Learning: In the log-archival-bench project's taskfiles, when tracking Python source files in the `sources` list for Docker image builds, prefer using a glob pattern `{{.G_PROJECT_SRC_DIR}}/**/*` with exclusion `!{{.G_PROJECT_SRC_DIR}}/config/**` rather than manually listing individual Python files, to avoid having to update the taskfile every time Python imports change.
Applied to files:
taskfiles/docker-images/main.yaml
🧬 Code graph analysis (2)
src/log_archival_bench/scripts/docker_images/inspect.py (2)
src/log_archival_bench/scripts/docker_images/utils.py (1)
- get_image_name (6-12)

src/log_archival_bench/scripts/docker_images/build.py (1)
- main (13-49)
src/log_archival_bench/scripts/docker_images/build.py (1)
src/log_archival_bench/scripts/docker_images/utils.py (1)
- get_image_name (6-12)
🪛 Ruff (0.14.1)
src/log_archival_bench/scripts/docker_images/inspect.py
48-48: subprocess call: check for execution of untrusted input
(S603)
src/log_archival_bench/scripts/docker_images/build.py
47-47: subprocess call: check for execution of untrusted input
(S603)
🔇 Additional comments (5)
src/log_archival_bench/scripts/docker_images/build.py (1)
38-49: LGTM - Static analysis warning is a false positive. The subprocess call constructs the Docker build command from validated and trusted inputs:

- `engine_name` is validated against an allowlist (line 29)
- `docker_file_path` is checked for existence (line 34)
- `CONFIG_DIR` and `PACKAGE_ROOT` are trusted constants
- All other arguments are literal strings

The S603 warning can be safely disregarded.
src/log_archival_bench/scripts/docker_images/inspect.py (1)
47-49: Address static analysis warning after adding validation. Once you add the engine name validation (see previous comment), the subprocess call will be safe because `engine_name` will be validated against an allowlist and `get_image_name` will format it safely. The S603 warning will then be a false positive.

However, without validation, this represents a security concern.
taskfiles/docker-images/main.yaml (3)
12-20: LGTM - Engine list aligns with Python scripts. The `G_DOCKER_IMAGE_ENGINES` list correctly matches the `valid_engines` allowlist in both Python scripts. The comment explaining disabled Presto images is helpful.
61-67: Verify sources pattern matches intended rebuild triggers. The sources list uses `{{.G_PROJECT_SRC_DIR}}/**/*.py` (line 62) to track only Python files. The retrieved learning suggests using `{{.G_PROJECT_SRC_DIR}}/**/*` (all files) with exclusions to avoid manual updates when imports change.

However, tracking only `.py` files may be intentional if non-Python source changes (e.g., `.yaml`, `.txt`, `.sh`) in the source directory should not trigger Docker image rebuilds.

Based on learnings

If all source files should trigger rebuilds, apply this diff:

```diff
     sources:
-      - "{{.G_PROJECT_SRC_DIR}}/**/*.py"
+      - "{{.G_PROJECT_SRC_DIR}}/**/*"
       - "{{.TASKFILE}}"
       - exclude: "{{.G_PROJECT_SRC_DIR}}/build/**/*"
       - exclude: "{{.G_PROJECT_SRC_DIR}}/config/**/*"
```
85-107: LGTM - Checksum workflow is well-structured. The checksum validation and computation logic is correct:

- Line 91: Allows `docker inspect` to fail gracefully if the image doesn't exist yet (first build).
- Line 103: Expects `docker inspect` to succeed after the build completes.
- The workflow ensures checksums are validated before builds and computed after successful builds.
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
- taskfiles/docker-images/main.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-21T17:00:54.873Z
Learnt from: hoophalab
PR: y-scope/log-archival-bench#17
File: taskfiles/docker-images/main.yaml:55-60
Timestamp: 2025-10-21T17:00:54.873Z
Learning: In the log-archival-bench project's taskfiles, when tracking Python source files in the `sources` list for Docker image builds, prefer using a glob pattern `{{.G_PROJECT_SRC_DIR}}/**/*` with exclusion `!{{.G_PROJECT_SRC_DIR}}/config/**` rather than manually listing individual Python files, to avoid having to update the taskfile every time Python imports change.
Applied to files:
taskfiles/docker-images/main.yaml
🔇 Additional comments (4)
taskfiles/docker-images/main.yaml (4)
47-86: Verify the INSPECT_CMD invocation includes the `--trim` flag intentionally. Line 58 includes `--trim` in the inspect command. Please confirm that this flag is supported by inspect.py and that trimming metadata is the intended behaviour for checksum validation. If inspect.py doesn't support this flag, the taskfile will fail at runtime.
71-76: Checksum validation structure is sound. The checksum validation is correctly placed in `deps` (line 72) rather than `sources`, which addresses the previous critical issue. The validation task properly chains INSPECT_CMD, CHECKSUM_FILE, and METADATA_FILE variables through the task invocation. The `|| true` on line 93 gracefully handles the case when an image doesn't yet exist.

Also applies to: 87-98
37-45: Parallel build orchestration is well-structured. The `build-all-parallel` task correctly uses Task's `for` loop with `var: G_DOCKER_IMAGE_ENGINES` to invoke `build-single-image` for each engine with the `ENGINE_NAME` variable set. This enables proper parallelization of independent per-engine builds.
62-69: The Dockerfile path pattern on line 69 is correct and requires no changes. The repository contains all Dockerfiles in a flat structure at `src/log_archival_bench/config/docker-images/{{ENGINE_NAME}}.Dockerfile`, which exactly matches the pattern specified on line 69. All active engines (clickhouse, clp, elasticsearch, sparksql, zstandard) and commented-out engines (presto_clp, presto_parquet) have corresponding Dockerfile files in this location. The original concern about potential subdirectory structure was unfounded.

Likely an incorrect or invalid review comment.
The code looks good, but I have a few high-level concerns:

1. Is checksumming a Docker image necessary, given that images are generally immutable? `docker build` doesn't rebuild an image that is already built. We also don't checksum Docker images in the clp repo.
2. If we really want to checksum images, is docker inspect a reliable source for a checksum? Can we use the image ID?

Additionally, the README isn't the best, but we should improve that after everything is done.
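If the image ID route is taken, it can be read with `docker inspect` and a Go-template format string, e.g. (a minimal sketch; it assumes `docker` is on PATH and the helper name is illustrative):

```python
import subprocess


def get_image_id(image_name: str) -> str:
    """Return the content-addressable image ID (sha256 digest) of a local image."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.Id}}", image_name],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```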
taskfiles/docker-images/main.yaml
Outdated
| - "{{.G_PROJECT_SRC_DIR}}/**/*" | ||
| - "{{.ROOT_DIR}}/Taskfile.yaml" | ||
| - "{{.ROOT_DIR}}/taskfiles/**/*.yaml" | ||
| - "{{.ROOT_DIR}}/tools/yscope-dev-utils/exports/taskfiles/**/*.yaml" |
- "{{.ROOT_DIR}}/tools/yscope-dev-utils/exports/taskfiles/**/*.yaml"
We might need to confirm with Kirk whether this is needed.
```python
metadata_file_path = Path(metadata_file)
metadata_file_path.parent.mkdir(parents=True, exist_ok=True)
with metadata_file_path.open("w", encoding="utf-8") as f:
    json.dump(inspect_json, f, indent=4)
```
If we want to use inspect output as the checksum source, we might need to sort the keys. The consistent order of enumerating dictionary keys is an implementation detail of the interpreter
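For instance, passing `sort_keys=True` to `json.dump` makes the serialized output independent of key insertion order; a minimal sketch reusing the shape of the snippet above (the wrapper function name is illustrative):

```python
import json
from pathlib import Path
from typing import Any


def write_metadata(metadata_file_path: Path, inspect_json: Any) -> None:
    metadata_file_path.parent.mkdir(parents=True, exist_ok=True)
    with metadata_file_path.open("w", encoding="utf-8") as f:
        # sort_keys keeps the file byte-for-byte stable across runs, so it is a
        # reliable input for checksum computation.
        json.dump(inspect_json, f, indent=4, sort_keys=True)
```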
Actionable comments posted: 1
♻️ Duplicate comments (1)
taskfiles/docker-images/main.yaml (1)
16-18: Consider adding a tracking reference for Presto engine fixes. The note explains the Presto engines are pending fixes, but there's no reference to a tracking issue or milestone. Adding a link (e.g., "pending fixes in #123") would help maintainers know where to follow progress and when to re-enable these entries.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
- src/log_archival_bench/scripts/docker_images/inspect.py (1 hunks)
- taskfiles/docker-images/main.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-21T17:00:54.873Z
Learnt from: hoophalab
PR: y-scope/log-archival-bench#17
File: taskfiles/docker-images/main.yaml:55-60
Timestamp: 2025-10-21T17:00:54.873Z
Learning: In the log-archival-bench project's taskfiles, when tracking Python source files in the `sources` list for Docker image builds, prefer using a glob pattern `{{.G_PROJECT_SRC_DIR}}/**/*` with exclusion `!{{.G_PROJECT_SRC_DIR}}/config/**` rather than manually listing individual Python files, to avoid having to update the taskfile every time Python imports change.
Applied to files:
taskfiles/docker-images/main.yaml
🧬 Code graph analysis (1)
src/log_archival_bench/scripts/docker_images/inspect.py (2)
src/log_archival_bench/scripts/docker_images/utils.py (2)
- get_image_name (6-12)
- validate_engine_name (15-24)

src/log_archival_bench/scripts/docker_images/build.py (1)
- main (13-46)
🪛 Ruff (0.14.1)
src/log_archival_bench/scripts/docker_images/inspect.py
52-52: subprocess call: check for execution of untrusted input
(S603)
🔇 Additional comments (4)
src/log_archival_bench/scripts/docker_images/inspect.py (2)
44-52: JSON key ordering concern is mitigated by current usage. The Taskfile invokes this script with `--id-only` (line 67 in `taskfiles/docker-images/main.yaml`), so only the stable image ID digest is written to the metadata file, making JSON key-ordering irrelevant for checksum computation. However, if future callers omit `--id-only`, the full JSON output could have non-deterministic key ordering across Python versions or implementations.

Based on the discussion in past review comments, the current implementation is correct for the intended usage.
52-52: Static analysis warning is a false positive. The S603 warning about untrusted subprocess input is a false positive. The `engine_name` is validated against an allowlist at line 42 via `validate_engine_name`, and `get_image_name` constructs a safe image tag. The command is built as a list (not `shell=True`), preventing shell injection.

taskfiles/docker-images/main.yaml (2)
70-78: Excellent use of glob-based source tracking. The sources list follows the project learning to use `{{.G_PROJECT_SRC_DIR}}/**/*` with strategic exclusions, avoiding tedious manual updates when Python imports change. The explicit re-inclusion of the engine-specific Dockerfile (line 78) ensures per-engine dependencies are properly tracked.

Based on learnings.
74-74: Verify necessity of yscope-dev-utils taskfile dependency. Including `{{.ROOT_DIR}}/tools/yscope-dev-utils/exports/taskfiles/**/*.yaml` in sources means changes to utility taskfiles will trigger image rebuilds. This is reasonable if those taskfiles affect build behaviour, but hoophalab's past comment suggests confirming with Kirk whether this dependency is necessary.

Based on past review comments.
taskfiles/docker-images/main.yaml
Outdated
```yaml
# Task may fail if the image does not exist.
- "{{.INSPECT_CMD}} || rm -f '{{.METADATA_FILE}}'"
```
🧹 Nitpick | 🔵 Trivial
Clarify error-handling scope in validation task.
Line 115 uses `{{.INSPECT_CMD}} || rm -f '{{.METADATA_FILE}}'`, which removes the metadata file on any inspect failure (docker daemon down, permissions, missing image, etc.), not just "image does not exist" as the comment suggests. This broad catch-all might be intentional (any failure should trigger rebuild), but the comment could be more precise.
Consider updating the comment to reflect that any inspect failure—not just a missing image—will trigger metadata removal and subsequent rebuild.
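If the narrower behaviour is preferred, the check can look specifically for a missing image instead of treating every inspect failure the same way; a sketch in Python (the exact error text Docker prints may vary across versions, so the substring match is an assumption):

```python
import subprocess
from pathlib import Path


def remove_metadata_if_image_missing(image_name: str, metadata_file: Path) -> None:
    proc = subprocess.run(
        ["docker", "inspect", image_name],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # Only treat "image not found" as a reason to force a rebuild; surface other
        # failures (daemon down, permissions, ...) instead of silently removing metadata.
        if "no such" in proc.stderr.lower():
            metadata_file.unlink(missing_ok=True)
        else:
            raise RuntimeError(f"docker inspect failed: {proc.stderr.strip()}")
```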
🤖 Prompt for AI Agents
In taskfiles/docker-images/main.yaml around lines 114 to 115, the comment "Task
may fail if the image does not exist." is misleading because the command
`{{.INSPECT_CMD}} || rm -f '{{.METADATA_FILE}}'` will remove the metadata file
on any inspect failure (docker daemon down, permissions, network, missing image,
etc.); update the comment to explicitly state that any inspect failure will
trigger metadata removal and a rebuild, and if you instead want to only remove
metadata for a missing image, replace the command with a more specific check
that inspects exit codes or parses inspect output to detect "no such image"
before removing the metadata.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/log_archival_bench/scripts/docker_images/utils.py (1)
6-12: Consider adding defensive Docker name validation. While the current allowlist ensures all service names are valid Docker image names, adding explicit validation would make this function more robust against future modifications to the service list.

```diff
 def get_image_name(service_name: str) -> str:
     """
     :param service_name:
     :return: The name assigned to the Docker image that contains the service.
     """
+    import re
+    # Docker image names must match [a-z0-9][a-z0-9_.-]*
+    if not re.match(r'^[a-z0-9][a-z0-9_.-]*$', service_name):
+        err_msg = (
+            f"Invalid service name for Docker image: '{service_name}'. "
+            f"Must contain only lowercase letters, digits, hyphens, underscores, and periods."
+        )
+        raise ValueError(err_msg)
     user = os.getenv("USER", "clp-user")
     return f"log-archival-bench-{service_name}-ubuntu-jammy:dev-{user}"
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
- src/log_archival_bench/scripts/docker_images/build.py (1 hunks)
- src/log_archival_bench/scripts/docker_images/utils.py (1 hunks)
- taskfiles/docker-images/main.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/log-archival-bench PR: 17
File: taskfiles/docker-images/main.yaml:55-60
Timestamp: 2025-10-21T17:00:54.873Z
Learning: In the log-archival-bench project's taskfiles, when tracking Python source files in the `sources` list for Docker image builds, prefer using a glob pattern `{{.G_PROJECT_SRC_DIR}}/**/*` with exclusion `!{{.G_PROJECT_SRC_DIR}}/config/**` rather than manually listing individual Python files, to avoid having to update the taskfile every time Python imports change.
📚 Learning: 2025-10-21T17:00:54.873Z
Learnt from: hoophalab
Repo: y-scope/log-archival-bench PR: 17
File: taskfiles/docker-images/main.yaml:55-60
Timestamp: 2025-10-21T17:00:54.873Z
Learning: In the log-archival-bench project's taskfiles, when tracking Python source files in the `sources` list for Docker image builds, prefer using a glob pattern `{{.G_PROJECT_SRC_DIR}}/**/*` with exclusion `!{{.G_PROJECT_SRC_DIR}}/config/**` rather than manually listing individual Python files, to avoid having to update the taskfile every time Python imports change.
Applied to files:
- src/log_archival_bench/scripts/docker_images/build.py
- taskfiles/docker-images/main.yaml
📚 Learning: 2025-09-22T13:50:57.351Z
Learnt from: Bill-hbrhbr
Repo: y-scope/log-archival-bench PR: 12
File: taskfiles/lint/yaml.yaml:15-23
Timestamp: 2025-09-22T13:50:57.351Z
Learning: In projects using Taskfile, submodules are automatically initialized during taskfile compilation, so referenced files in submodules (like tools/yscope-dev-utils/exports/lint-configs/.yamllint.yml) will be available at runtime even if not present in the current repository state.
Applied to files:
taskfiles/docker-images/main.yaml
🧬 Code graph analysis (1)
src/log_archival_bench/scripts/docker_images/build.py (1)
src/log_archival_bench/scripts/docker_images/utils.py (2)
- get_image_name (6-12)
- validate_service_name (15-26)
🪛 Ruff (0.14.2)
src/log_archival_bench/scripts/docker_images/build.py
46-46: subprocess call: check for execution of untrusted input
(S603)
🔇 Additional comments (3)
src/log_archival_bench/scripts/docker_images/build.py (1)
13-48: LGTM! Subprocess call is secure. The implementation correctly validates the `service_name` via the allowlist (line 30) before using it in the Docker build command. The static analysis warning (S603) is a false positive—all subprocess arguments are either string literals, validated inputs, or safe project paths.

The error handling via `check=True` ensures failures propagate appropriately.
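Putting the pieces from this review together, the build flow is roughly as follows; a sketch only, since the module layout and the Dockerfile path convention changed across revisions of this PR, so the two path constants are placeholders:

```python
import subprocess
from pathlib import Path

from log_archival_bench.scripts.docker_images.utils import get_image_name, validate_service_name

# Assumed layout: this sketch lives under scripts/docker_images/, two levels below
# the package root; the Dockerfile location is likewise an assumption.
PACKAGE_ROOT = Path(__file__).resolve().parents[2]
CONFIG_DIR = PACKAGE_ROOT / "config"


def build_image(service_name: str, docker_bin: str = "docker") -> None:
    # Reject anything outside the allowlist before touching the filesystem.
    validate_service_name(service_name)

    docker_file_path = CONFIG_DIR / "docker-images" / service_name / "Dockerfile"
    if not docker_file_path.exists():
        raise RuntimeError(f"Missing Dockerfile: {docker_file_path}")

    build_cmds = [
        docker_bin,
        "build",
        "--tag", get_image_name(service_name),
        str(PACKAGE_ROOT),
        "--file", str(docker_file_path),
    ]
    # check=True propagates a non-zero `docker build` exit status as CalledProcessError.
    subprocess.run(build_cmds, check=True)
```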
taskfiles/docker-images/main.yaml (1)

9-17: Service list is properly synchronized. The `G_DOCKER_IMAGE_SERVICES` list matches the `valid_services` allowlist in `src/log_archival_bench/scripts/docker_images/utils.py` (line 21), ensuring consistent validation across the build pipeline.

src/log_archival_bench/scripts/docker_images/utils.py (1)
15-26: LGTM! Validation is thorough and well-documented. The allowlist validation correctly matches the services defined in `taskfiles/docker-images/main.yaml` (lines 9-17), and the sync reminder comment helps maintain consistency. The error message clearly identifies invalid inputs and lists valid alternatives.
taskfiles/docker-images/main.yaml
Outdated
```yaml
build-single-image:
  internal: true
  label: "{{.TASK}}:{{.SERVICE_NAME}}"
  requires:
    vars: ["SERVICE_NAME"]
  run: "when_changed"
  cmds:
    - "uv run '{{.G_DOCKER_IMAGE_SCRIPT_DIR}}/build.py' --service-name {{.SERVICE_NAME}}"
```
🛠️ Refactor suggestion | 🟠 Major
Consider adding sources for rebuild detection.
The task specifies run: "when_changed" but doesn't define any sources. This means the task will execute on every invocation rather than only when relevant files change.
Based on learnings, consider adding source tracking to enable checksum-based rebuild detection:
```diff
 build-single-image:
   internal: true
   label: "{{.TASK}}:{{.SERVICE_NAME}}"
   requires:
     vars: ["SERVICE_NAME"]
   run: "when_changed"
+  sources:
+    - "{{.G_PROJECT_SRC_DIR}}/**/*"
+    - "!{{.G_PROJECT_SRC_DIR}}/config/**"
+    - "{{.G_PROJECT_SRC_DIR}}/config/docker-images/{{.SERVICE_NAME}}/Dockerfile"
+    - "{{.TASKFILE}}"
   cmds:
     - "uv run '{{.G_DOCKER_IMAGE_SCRIPT_DIR}}/build.py' --service-name {{.SERVICE_NAME}}"
```

This aligns with the PR objective of "checksum-based rebuild detection."
Based on learnings
🤖 Prompt for AI Agents
In taskfiles/docker-images/main.yaml around lines 30 to 37, the task uses run:
"when_changed" but has no sources defined so it will run every time; add a
sources entry listing the files and directories whose checksum should trigger
rebuilds (for example: the service source directory, its Dockerfile, any build
scripts like G_DOCKER_IMAGE_SCRIPT_DIR/build.py, and any shared templates or
config used to build the image) so the task runner can compute checksums and
only run when those sources change; ensure paths use the same templating
variables already present (e.g. {{.SERVICE_NAME}} and
{{.G_DOCKER_IMAGE_SCRIPT_DIR}}) and include both files and relevant directories.
Bill-hbrhbr
left a comment
self-review and comments
README.md
Outdated
```
uv run src/log_archival_bench/scripts/docker_images/build.py --service-name <service_name>
```

Each valid service (e.g. `presto`, `clickhouse`, `elasticsearch`) has its own dedicated Docker
```diff
-Each valid service (e.g. `presto`, `clickhouse`, `elasticsearch`) has its own dedicated Docker
+Each valid service (e.g. `presto_parquet`, `clickhouse`, `elasticsearch`) has its own dedicated Docker
```
README.md
Outdated
```diff
-## Run One Tool
+## Run One Benchmark Engine

-Execute `./assets/{tool name}/main.py {path to <dataset name>.log}` to run ingestion and search on that dataset.
+Execute `./assets/{engine_name}/main.py {path to <dataset_name>.log}` to run ingestion and search on that dataset.
```
maybe no need to touch this block for now.
we can also defer README fixing once we've moved everything into src/log_archival_bench
```toml
explicit_package_bases = true
mypy_path = ["src"]
```
These are needed for mypy linting to work in its current form.
Actionable comments posted: 2
♻️ Duplicate comments (1)
README.md (1)
37-38: Update the example to use a valid service name. The example lists `presto` as a valid service, but according to `src/log_archival_bench/scripts/docker_images/utils.py` (line 21), the valid services are: `clickhouse`, `clp`, `elasticsearch`, `sparksql`, and `zstandard`. The Presto-related services (`presto_clp` and `presto_parquet`) are currently disabled.

Apply this diff to use a valid example:

```diff
-Each valid service (e.g. `presto`, `clickhouse`, `elasticsearch`) has its own dedicated Docker
+Each valid service (e.g. `clickhouse`, `elasticsearch`, `sparksql`) has its own dedicated Docker
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
- README.md (1 hunks)
- src/log_archival_bench/scripts/docker_images/build.py (1 hunks)
- src/log_archival_bench/scripts/docker_images/utils.py (1 hunks)
- taskfiles/docker-images/main.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/log-archival-bench PR: 17
File: taskfiles/docker-images/main.yaml:55-60
Timestamp: 2025-10-21T17:00:54.873Z
Learning: In the log-archival-bench project's taskfiles, when tracking Python source files in the `sources` list for Docker image builds, prefer using a glob pattern `{{.G_PROJECT_SRC_DIR}}/**/*` with exclusion `!{{.G_PROJECT_SRC_DIR}}/config/**` rather than manually listing individual Python files, to avoid having to update the taskfile every time Python imports change.
📚 Learning: 2025-09-30T08:58:51.490Z
Learnt from: Bill-hbrhbr
Repo: y-scope/log-archival-bench PR: 13
File: taskfiles/lint/python.yaml:50-65
Timestamp: 2025-09-30T08:58:51.490Z
Learning: The log-archival-bench repository currently does not have unit tests for the benchmark suite, so the tests directory should not be included in PY_LINT_DIRS in taskfiles/lint/python.yaml.
Applied to files:
README.md
📚 Learning: 2025-10-21T17:00:54.873Z
Learnt from: hoophalab
Repo: y-scope/log-archival-bench PR: 17
File: taskfiles/docker-images/main.yaml:55-60
Timestamp: 2025-10-21T17:00:54.873Z
Learning: In the log-archival-bench project's taskfiles, when tracking Python source files in the `sources` list for Docker image builds, prefer using a glob pattern `{{.G_PROJECT_SRC_DIR}}/**/*` with exclusion `!{{.G_PROJECT_SRC_DIR}}/config/**` rather than manually listing individual Python files, to avoid having to update the taskfile every time Python imports change.
Applied to files:
- taskfiles/docker-images/main.yaml
- src/log_archival_bench/scripts/docker_images/build.py
📚 Learning: 2025-09-22T13:50:57.351Z
Learnt from: Bill-hbrhbr
Repo: y-scope/log-archival-bench PR: 12
File: taskfiles/lint/yaml.yaml:15-23
Timestamp: 2025-09-22T13:50:57.351Z
Learning: In projects using Taskfile, submodules are automatically initialized during taskfile compilation, so referenced files in submodules (like tools/yscope-dev-utils/exports/lint-configs/.yamllint.yml) will be available at runtime even if not present in the current repository state.
Applied to files:
taskfiles/docker-images/main.yaml
🧬 Code graph analysis (1)
src/log_archival_bench/scripts/docker_images/build.py (1)
src/log_archival_bench/scripts/docker_images/utils.py (2)
- get_image_name (6-12)
- validate_service_name (15-26)
🪛 Ruff (0.14.3)
src/log_archival_bench/scripts/docker_images/build.py
46-46: subprocess call: check for execution of untrusted input
(S603)
🔇 Additional comments (2)
src/log_archival_bench/scripts/docker_images/build.py (1)
1-52: LGTM! The implementation is secure and correct. The script properly validates the `service_name` parameter against an allowlist (line 30) before using it to construct paths, checks for Dockerfile existence (lines 32-35), and uses subprocess safely with an array-based command format and validated inputs.

The static analysis warning (S603) about untrusted input is a false positive in this context—all user-controllable values are validated before being passed to subprocess.
src/log_archival_bench/scripts/docker_images/utils.py (1)
1-26: LGTM! Clean and well-structured utility functions. The module provides focused helper functions with proper validation. The allowlist approach in `validate_service_name` prevents path traversal issues, and the sync comment (line 20) helps maintain consistency with the Taskfile configuration.
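For context, given the details in this review (the allowlist, the sync note, and the error message contents), the validator is roughly the following sketch; the constant name and exact wording are assumptions:

```python
# NOTE: Keep in sync with G_DOCKER_IMAGE_SERVICES in taskfiles/docker-images/main.yaml.
_VALID_SERVICES = ["clickhouse", "clp", "elasticsearch", "sparksql", "zstandard"]


def validate_service_name(service_name: str) -> None:
    """
    :param service_name:
    :raise ValueError: If the service name is not in the allowlist.
    """
    if service_name not in _VALID_SERVICES:
        err_msg = (
            f"Invalid service name: '{service_name}'. "
            f"Valid services are: {', '.join(_VALID_SERVICES)}."
        )
        raise ValueError(err_msg)
```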
Description
Previously, each benchmark run tore down and rebuilt the Docker image for its associated service engine, resulting in unnecessary rebuilds of identical images.
Now, Docker image building is a separate preparation step for the benchmark runs, so that future code will directly use the built images instead of repeating the setup -> tear-down process.
The `presto_clp` and `presto_parquet` images are temporarily disabled in this PR because their Dockerfiles are currently incorrect and result in failed builds:

- `presto_clp`: we will likely use the CLP package to spin up benchmark services
- `presto_parquet`: will be fixed in a future PR.

Checklist
breaking change.
Validation performed
- `task docker-images:build` builds the images successfully.

Summary by CodeRabbit
Documentation
New Features
Chores