KAFKA-18500 Build PRs at HEAD commit #18449
Ok, there is a fundamental problem here. When trunk is moving quickly, our PRs have little hope of benefiting from much caching. For example:
If commit C was the last trunk commit to be built, there will be Gradle cache files for that commit, while the newer trunk commits A and B are still building. If the PR were simply building X, this would be fine, and we would expect cache hits for anything not changed by X, Y, Z. However, the merge ref that GitHub Actions checks out is trunk HEAD (including A and B) with the PR merged in.
So when the PR is built, it will fetch the latest cache (C) but will include file changes from A and B in addition to the PR changes. This greatly increases cache misses. I think the merge queue might be a solution to this. If we do a full build as part of the merge queue, then no code will land on trunk that has not been built, tested, and cached. The risk with this approach is that flaky builds will prevent things from getting into trunk.
One way to work around the flaky-test issue would be to increase the number of retries for failed tests when running the build via the merge queue. That said, we'd want to measure the extra time this adds to the merge queue against the time saved in the PR builds themselves.
Ok, I was able to modify the build to check out the PR at HEAD instead of the merge commit. This increased our cache hits a lot. The recent job finished in around 1 hour because it only had to run …
@ijuma WDYT about this approach?
I haven't reviewed the code changes, but the approach looks good.
If we run tests against HEAD, can CI check for new tests or compilation issues in the PR? Alternatively, to maximize cache usage, could we automatically run the PR against the latest cached commit as a base?
@chia7712 compilation isn't affected by this PR, just the actual test execution.
thanks - I overlooked the description :(
This would require doing something similar to the merge ref, but for the latest cache SHA (instead of trunk HEAD). I think it's probably achievable, but maybe a bit tricky to do correctly. There are probably cases where the PR can't be merged into the cache SHA. For example, the PR may have already merged in trunk beyond the cached SHA and made some additional commits. It still might be worth exploring, though. For now, I think this approach is easy to understand ("build the code in the PR branch"), should give us some caching boost, and will hopefully cause few surprises.
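The alternative floated above (building the PR merged onto the latest cached trunk SHA rather than onto trunk HEAD) could be sketched with plain git roughly as follows. This is a hypothetical sketch, not part of this patch: the function names are made up, and it assumes a local clone where the cached SHA, the PR head, and trunk are all available. It detects the failure case mentioned above, a PR that has already merged in trunk commits newer than the cached SHA, by checking whether the PR's merge-base with trunk is an ancestor of the cached SHA.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Kafka's build): try to build a PR on top of
# the latest cached trunk SHA instead of trunk HEAD.

# Succeeds if the PR only builds on trunk history the cache already covers,
# i.e. the PR's merge-base with trunk is an ancestor of (or equal to) the cached SHA.
pr_base_is_cached() {
  local cached_sha=$1 pr_head=$2 trunk_ref=$3
  local base
  base=$(git merge-base "$pr_head" "$trunk_ref") || return 1
  git merge-base --is-ancestor "$base" "$cached_sha"
}

# Check out the cached SHA (detached HEAD) and merge the PR into it.
# Returns non-zero if the PR already contains newer trunk commits, or if the
# merge does not apply cleanly (in which case the merge is aborted).
# Creating the merge commit requires git user.name/user.email to be configured.
build_ref_on_cached_sha() {
  local cached_sha=$1 pr_head=$2 trunk_ref=$3
  if ! pr_base_is_cached "$cached_sha" "$pr_head" "$trunk_ref"; then
    echo "PR already contains trunk commits newer than $cached_sha" >&2
    return 1
  fi
  git checkout -q --detach "$cached_sha" || return 1
  if ! git merge -q --no-edit "$pr_head"; then
    git merge --abort
    echo "PR does not merge cleanly onto $cached_sha" >&2
    return 1
  fi
}
```

Using the merge-base as the test keeps the check cheap, but it only says whether the cache could plausibly help; a clean merge onto an older SHA still does not guarantee the result compiles against current trunk, which is why the merge-ref "Compile and Check" step would remain necessary.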
LGTM
```commandline
git fetch origin
./committer-tools/update-cache.sh
```
I noticed that this script requires us to log in to GitHub (or use an access key). Is it possible to use curl + jq to parse https://api.github.com/repos/apache/kafka/actions/caches?key=gradle-home-v1&&ref=refs/heads/trunk instead?
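A minimal sketch of the curl + jq idea, assuming the caches endpoint is reachable without a token for this public repository (as the reply reports). The function names are hypothetical and this is not Kafka's actual `update-cache.sh`. The format of the cache key is also an assumption: the sketch supposes the key ends in `-<commit sha>`, so check a real key before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the curl + jq approach; not Kafka's update-cache.sh.

# Same query as in the comment above, plus sort/direction so the newest
# cache entry comes first.
CACHES_URL='https://api.github.com/repos/apache/kafka/actions/caches?key=gradle-home-v1&ref=refs/heads/trunk&sort=created_at&direction=desc'

# Print the key of the newest trunk cache whose key starts with "gradle-home-v1".
# Needs curl and jq; no token is sent, so GitHub's unauthenticated rate limit applies.
latest_trunk_cache_key() {
  curl -fsSL -H 'Accept: application/vnd.github+json' "$CACHES_URL" |
    jq -r '.actions_caches[0].key // empty'
}

# ASSUMPTION: the cache key ends in "-<commit sha>"
# (e.g. "gradle-home-v1|Linux-X64|test-<sha>"); adjust if the real format differs.
sha_from_cache_key() {
  printf '%s\n' "${1##*-}"
}
```

For example, `sha=$(sha_from_cache_key "$(latest_trunk_cache_key)")` followed by `git fetch origin` and `git log -1 "$sha"` would show which trunk commit the newest cache was built from. Note that the `key` parameter is a prefix match, so if several cache families share the prefix, this simply picks the newest of them.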
Wow, I'm surprised that works! I would have assumed that even public APIs would need some kind of auth token.
I filed https://issues.apache.org/jira/browse/KAFKA-18903 for this
The default checkout behavior for GitHub Actions is to use a special
merge ref, which is equivalent to the base branch with the PR merged into
it. While this is crucial for checking compilation issues against trunk,
it significantly diminishes our ability to use any build caching.

This patch changes the JUnit test jobs to check out the HEAD commit of the PR
when building. The "Compile and Check" step still checks out the merge commit
so we can keep that level of validation.

Reviewers: Ismael Juma, Chia-Ping Tsai
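For reference, GitHub publishes two refs for each PR: `refs/pull/<number>/merge` (the base branch with the PR merged in, which the default checkout uses) and `refs/pull/<number>/head` (the PR's own HEAD commit). The sketch below shows the two checkout modes with plain git. The function names are hypothetical, and the actual workflow change in this patch (for example, one made through the `ref` input of `actions/checkout`) is not reproduced here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the two checkout modes with plain git.

# Map a PR number and a mode to the ref GitHub publishes for it.
pr_ref() {
  case "$2" in
    head)  printf 'refs/pull/%s/head\n' "$1" ;;   # the PR's own HEAD commit
    merge) printf 'refs/pull/%s/merge\n' "$1" ;;  # base branch with the PR merged in
    *)     echo "unknown mode: $2 (expected head or merge)" >&2; return 1 ;;
  esac
}

# Fetch the chosen ref from "origin" and check it out as a detached HEAD.
checkout_pr() {
  local ref
  ref=$(pr_ref "$1" "$2") || return 1
  git fetch --no-tags origin "$ref" && git checkout -q --detach FETCH_HEAD
}
```

In these terms, the JUnit test jobs after this patch correspond to `checkout_pr 18449 head`, while "Compile and Check" still corresponds to `checkout_pr 18449 merge`.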