Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-18500 Build PRs at HEAD commit #18449

Merged
merged 26 commits into from
Mar 4, 2025
Merged

Conversation

mumrah
Copy link
Member

@mumrah mumrah commented Jan 8, 2025

The default checkout behavior for GitHub Actions is to use a special
merge ref which is equivalent to the base branch with the PR merged into
it. While this is crucial for checking compilation issues against trunk,
it significantly diminishes our ability to use any build caching.

This patch changes the JUnit test jobs to checkout the HEAD commit of the PR
when building. The "Compile and Check" step still checks out the merge commit
so we can keep that level of validation.

Reviewers: Ismael Juma [email protected], Chia-Ping Tsai [email protected]

@mumrah mumrah added the do-not-merge PRs that are only open temporarily and should not be merged label Jan 8, 2025
@github-actions github-actions bot added small Small PRs build Gradle build or GitHub Actions labels Jan 8, 2025
@mumrah mumrah force-pushed the tmp-check-caching branch from e68973f to 85a5fde Compare January 15, 2025 14:55
@mumrah mumrah force-pushed the tmp-check-caching branch from 2d485d2 to 8539eb0 Compare January 15, 2025 19:45
@mumrah
Copy link
Member Author

mumrah commented Jan 15, 2025

Ok, there is a fundamental problem here. The pull_request target is building the merge commit of this PR against the base rather than just the PR contents. This means, the build will include changes on trunk which have not yet been cached.

When trunk is moving quickly, our PRs will have little hope to benefit from much caching.

For example:

(trunk) HEAD --- A --- B --- C
(PR) HEAD --- X --- Y --- Z --- C

If commit C was the last trunk commit to be built, there will be Gradle cache files for that commit. Commits A and B are still building. If the PR was simply building X, this would be fine and we would expect cache hits for anything not changed by X, Y, Z. However, the pull_request event will result in a build of something totally different:

(merge) HEAD --- A --- B --- C
            `X --- Y --- Z --- C

So when the PR is built, it will be fetching the latest cache (C), but will include file changes from A and B in addition to the PR changes. This greatly increases cache misses.


I think the merge queue might be a solution to this. If we do a full build as part of the merge queue, then no code will land on trunk that has not been built, tested, and cached. The risk with this approach is that flaky builds will prevent things from getting into trunk.

@ijuma @dajac thoughts?

@ijuma
Copy link
Member

ijuma commented Jan 15, 2025

One way to workaround the flaky test issue would be to increase the number of retries for failed tests when running the build via the merge queue.

That said, we'd want to measure how much time we're taking with this extra time versus time saved in PRs themselves.

@mumrah
Copy link
Member Author

mumrah commented Jan 16, 2025

Ok, I was able to modify the build to check out the PR at HEAD instead of the merge commit. This increase our cache hits a lot.

The recent job finished in around 1 hour because it only had to run :core:test and :streams:test (since those had failures in the trunk job, and so were not cached)

@mumrah mumrah changed the title Checking on the gradle cache KAFKA-18500 Build PRs at HEAD commit Jan 16, 2025
@mumrah mumrah removed the do-not-merge PRs that are only open temporarily and should not be merged label Jan 16, 2025
@mumrah mumrah requested review from chia7712 and ijuma January 16, 2025 19:26
@mumrah
Copy link
Member Author

mumrah commented Jan 16, 2025

I validated the Build Scan status checks on my fork
image

That shows all four build scans being reported to the commit (which will appear in PRs)

@github-actions github-actions bot removed the small Small PRs label Jan 17, 2025
@mumrah
Copy link
Member Author

mumrah commented Jan 18, 2025

@ijuma WDYT about this approach?

@ijuma
Copy link
Member

ijuma commented Jan 18, 2025

I haven't reviewed the code changes - the approach looks good.

@chia7712
Copy link
Member

I was able to modify the build to check out the PR at HEAD instead of the merge commit. This increase our cache hits a lot.

If we run tests against HEAD, can CI check for new tests or compilation issues in the PR? Alternatively, to maximize cache usage, could we automatically run the PR against the latest cached commit as a base?

@ijuma
Copy link
Member

ijuma commented Feb 23, 2025

@chia7712 compilation isn't affected by this PR, just the actual test execution.

The "Compile and Check" step still checks out the merge commit so we can keep that level of validation.

@chia7712
Copy link
Member

compilation isn't affected by this PR, just the actual test execution.

thanks - I overlooked the description :(

@mumrah
Copy link
Member Author

mumrah commented Feb 24, 2025

Alternatively, to maximize cache usage, could we automatically run the PR against the latest cached commit as a base?

This would require doing something similar to the merge ref, but for the latest cache SHA (instead of trunk HEAD). I think it's probably achievable, but maybe a bit tricky to do correctly. I imagine there are probably cases where the PR can't be merged into the cache SHA. For example, maybe the PR has already merged in trunk beyond the cached SHA and made some additional commits.

It still might be worth exploring though.

For now, I think this approach is easy to understand ("build the code in the PR branch"), should give us some caching boost, and hopefully causes few surprises.

@github-actions github-actions bot added the small Small PRs label Feb 25, 2025
Copy link
Member

@chia7712 chia7712 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

```commandline
git fetch origin
./committer-tools/update-cache.sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that this script requires us to login Github (or access key) - is it possible to use curl + jq to parse https://api.github.com/repos/apache/kafka/actions/caches?key=gradle-home-v1&&ref=refs/heads/trunk?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, I'm surprised that works! I would have assumed that even public APIs would need some kind of auth token.

I filed https://issues.apache.org/jira/browse/KAFKA-18903 for this

@mumrah mumrah merged commit d518176 into apache:trunk Mar 4, 2025
11 checks passed
azhar2407 pushed a commit to azhar2407/kafka that referenced this pull request Mar 4, 2025
The default checkout behavior for GitHub Actions is to use a special
merge ref which is equivalent to the base branch with the PR merged into
it. While this is crucial for checking compilation issues against trunk,
it significantly diminishes our ability to use any build caching.

This patch changes the JUnit test jobs to checkout the HEAD commit of the PR 
when building. The "Compile and Check" step still checks out the merge commit
so we can keep that level of validation.

Reviewers: Ismael Juma <[email protected]>, Chia-Ping Tsai <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Gradle build or GitHub Actions small Small PRs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants