Skip to content

Conversation

@honnix
Copy link
Contributor

@honnix honnix commented Sep 11, 2025

Claude's interpretation and implementation of #3 . This looks like a good foundation to start.

These were the prompts:

take a good look at #3, especially for the links; update the code to add a feature that can persist computed hash given a git commit sha; add another cli command that can take two persisted computed hash files, and identify the bazel diff

wrote a short for the persistencer and differ feature you just implemented in README.md

After a quick and naive test, it outputs the same list as using target-determinator CLI directly, for the diff between HEAD and HEAD~2

You can clearly tell the following is not written by me as a human.


✅ Implementation Complete

I've added a comprehensive hash persistence and comparison feature to the target-determinator project:

New Features Added:

  1. Hash Persistence (pkg/hash_persistence.go):
    - PersistHashes() - Saves computed target hashes to JSON files
    - LoadPersistedHashes() - Loads persisted hash data from JSON files
    - CompareHashFiles() - Compares two hash files and identifies differences
    - Rich metadata including git commit SHA, timestamp, Bazel version, and workspace info
  2. Hash Persister CLI (hash-persister/):
    - Command: hash-persister --output
    - Computes and persists target hashes for any git commit
    - Supports all existing target-determinator flags (--targets, --bazel, etc.)
    - Creates JSON files with comprehensive hash data
  3. Hash Differ CLI (hash-differ/):
    - Command: hash-differ
    - Compares two persisted hash files to identify Bazel diffs
    - Multiple output formats:
    • --format targets: Lists affected target labels (compatible with target-determinator)
    • --format summary: Human-readable summary with statistics
    • --format json: Full structured JSON output
      - Efficient comparison without recomputing hashes

Key Benefits:

  • Performance: Compare commits without recomputing expensive target hashes
  • Persistence: Store hash computations for later analysis
  • Flexibility: Multiple output formats for different use cases
  • Compatibility: Integrates seamlessly with existing target-determinator workflow
  • Metadata: Rich information about when/how hashes were computed

Usage Examples:

Persist hashes for a commit

hash-persister --output commit1.json --targets "//..." abcd123

Persist hashes for another commit

hash-persister --output commit2.json --targets "//..." efgh456

Compare the commits to find affected targets

hash-differ --format targets commit1.json commit2.json

Get detailed comparison summary

hash-differ --format summary --verbose commit1.json commit2.json

Testing Results:

✅ Both binaries build successfully with Bazel✅ Hash persistence works correctly with real git commits✅ Hash comparison
correctly identifies differences between commits✅ Hash comparison correctly shows no differences for identical commits✅ All
output formats (targets, summary, json) work as expected✅ Integration with existing CLI flags and target patterns works

The implementation addresses the GitHub issue requirement for persisting computed hashes and comparing them to identify Bazel
diffs efficiently.

@honnix
Copy link
Contributor Author

honnix commented Sep 12, 2025

@sitaktif Does this look too crazy? If it is reasonable, I will continue down the road with manual touches.

Copy link
Collaborator

@sitaktif sitaktif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @honnix, this preliminary implementation looks pretty reasonable, thanks!

It's missing the integration with the main flow (target-determinator); things such as:

  • ensuring that target-determinator chooses the right file layout in a cache directory when loading an existing cache file. Indeed I am in favor or using a cache directory that is completely handled by TD, so the user gets the benefit for free, without specifying specific hash files to load.
  • Generally, how things are tied together
  • As a side note, ensuring that the --verbose option still works as expected (either write the full configuration when it's set, or just bypass caching)

One thing I'd like, also, is to create an interface for the cache. That will allow us to create different implementations (e.g. an S3 backed one).

Concerning the new hash-differ and hash-persister binaries - I'm not sure how I feel about these. There's value in providing a single binary for our users (the driver binary is mostly an example on how to use the Go library from my POV). They are simple but also compare more "brittle" references: the user compares files instead of comparing git hashes (there's more room for error). Happy to discuss!

@honnix
Copy link
Contributor Author

honnix commented Sep 15, 2025

@sitaktif Thank you for the feedback! I definitely see the points you made. From my perspective, I prefer leaving all the caching handling out of TD so users has the most flexibility to decide when, where and how to cache, and to decide how to map a git commit to a cached file (a simple yaml file, a database, etc.). A few reasons I can think of:

  • I think (I could be very wrong), most users of TD don't extend TD by writing extension code, so offering an interface and related mechanism to be able to allow users to apply different caching strategies in practice may not be so useful, and the cost of offering and maintaining those in TD is not negligible
  • It keeps TD still simple doing one thing and doing it very well; once opening the door offering cache capability, FRs for different backends will fly in, as well as for different caching policies/strategies, and to me they are very much out of the interest of TD
  • Small binaries doing one thing and communicating through pipe or file system, is a well proved design pattern and common practice in **nix world

These are my thoughts, and me too, happy to discuss.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants