ci: add job to verify binary size #475
Conversation
How is this going to be different from #458?
I wasn't aware of that thread. I'll look into getting a baseline from a pipeline and using it for comparison.
I took a look at FluidFramework, which I used to work on and which has a bundle size check as part of its PR workflow. From what I can tell, their way of doing this is to traverse
Oooooh, prior art, nice. I think the commit we want to compare against is whatever the merge is based on. That would allow us to get as good a measurement of "this PR adds X bytes compared to not having it" as possible. If that commit is still running through CI maybe we just wait for it? If it fails though, then walking backwards on main/release does seem like a reasonable fallback strategy.

I think for ours we'd prefer to have a gate rather than just a comment, so long as there's some way for us to then override the block and say "yes this is acceptable". But a gate would prevent anyone from merging before the bot comments, for example. We could then have a dedicated size_override reviewers group that the gate requires sign-off from to override, or something.

Also, I'd like to make sure we're actually storing the whole built file that we're using to compare against, not just a pre-computed summary of it. That frees us up to do more complex and involved analysis in the future.
Tagging #76
Man this is exciting to see, the prior solution has been an annoyance for so long now. |
if total_diff > 100 {
    anyhow::bail!("{} size verification failed: The total difference ({} KiB) is greater than the allowed difference ({} KiB).", self.new.display(), total_diff, 100);
}
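For illustration, the gate above can be factored into a small pure helper (a hypothetical standalone sketch, not the PR's exact code — the PR hardcodes a 100 KiB limit inline):

```rust
/// Returns Err if the size difference exceeds the allowed limit, in KiB.
/// Hypothetical standalone version of the threshold check quoted above.
fn check_size_diff(total_diff_kib: u64, limit_kib: u64) -> Result<(), String> {
    if total_diff_kib > limit_kib {
        Err(format!(
            "size verification failed: the total difference ({total_diff_kib} KiB) \
             is greater than the allowed difference ({limit_kib} KiB)"
        ))
    } else {
        Ok(())
    }
}
```

A pure function like this is trivially unit-testable, which matters once the threshold or the override story (discussed below in this thread) starts evolving.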
We'll need some way to override this check on a PR level, some way to say "Yes this size diff is acceptable". Not sure what github allows us to do here.
GitHub doesn't really have a great way to do this. Ideally we could have this always be required and just hit an "override" button but afaik that's not possible. We're also unable to assign a review team through the action (as I painfully learned from trying to re-enable the unsafe reviewers assignment) because actions are scoped to the repo level and our review teams are scoped at the org level (no access).
My thought here is that we should have this action always succeed as long as it finishes all the way through and have it leave a comment on the PR with a summary of the size diff. The onus would be on the reviewer to look at the comment and make sure that the difference is acceptable.
If that's the best we can do then it's the best we can do I guess. Maybe include some big warning text in the comment if the diff is over a threshold.
We really should figure out some way to get review groups working though. Then we could have the unsafe reviewers group back and create a new binary size reviewer group for large diffs or something.
We'll need a PAT with org-level team read access and then the reviewer assignment would work. My understanding was that we don't want to deal with maintaining the PAT.
Can we have review teams scoped to the repo instead of the org? I'm really not familiar with GitHub, so I'm just spitballing. But yeah, maintaining a PAT has bitten us in the past, and definitely isn't ideal.
Looping back around to this - a GitHub action with a pull_request target is unable to comment on PRs (a similar limitation to the unsafe reviewers check), and as such I think our best bet here is to fail the action if it's over a threshold. In the case where it's over the threshold and we're ok with the size increase, my understanding is that we can force the merge with the failing check.
I think that's OK for v1. But I think the pattern here to follow for v2 would be to create an additional workflow that depends on this one/is triggered by this one but comes from the base branch. That would allow it to safely have access to add comments, etc. I think this means using pull_request_target for that workflow.
Or maybe workflow_run.
My understanding here is that for dependent workflows like that we would need to use workflow_run, but that the token passed when triggered has the same permissions as the one triggering it (as in, it would get the pull_request token that doesn't have comment permissions). I could definitely be wrong here as I didn't try it.
The workflow started by the workflow_run event is able to access secrets and write tokens, even if the previous workflow was not. This is useful in cases where the previous workflow is intentionally not privileged, but you need to take a privileged action in a later workflow.
let gh_token = ctx.get_gh_context_var().global().token();

ctx.req(build_openhcl_igvm_from_recipe::Request {
you're going to want to hit build_openvmm_hcl directly, and sidestep building all this other stuff we don't currently care about.

note that, in doing so, you won't be able to pass a recipe directly and have it "just work". instead - you'll need to call recipe_details on a baseline recipe we want to verify the size of, and manually plumb through the returned openhcl_vmm-specific config data to the build_openvmm_hcl node
Assuming CI passes, I think this should be about ready to go. Outstanding items as far as I'm aware (feel free to correct) are the following:
impl SimpleFlowNode for Node {
    type Request = Request;

    fn imports(_ctx: &mut ImportCtx<'_>) {}
note that this node assumes the existence of git on the machine. this might not be a problem in practice, but for max correctness, you should take a dep on install_git
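As a rough illustration of the implicit assumption being called out here, the node currently relies on a tool being present on PATH. A minimal preflight check might look like the following (a hypothetical helper for illustration only — the idiomatic flowey fix is the install_git dep suggested above):

```rust
use std::process::Command;

/// Returns Ok(()) if `tool` can be spawned from PATH, Err otherwise.
/// Hypothetical helper; in flowey the correct fix is depending on the
/// `install_git` node rather than probing at runtime.
fn ensure_tool(tool: &str) -> Result<(), String> {
    match Command::new(tool).arg("--version").output() {
        // Spawning succeeded, so the tool exists (exit status is irrelevant here).
        Ok(_) => Ok(()),
        Err(e) => Err(format!("`{tool}` not found on PATH: {e}")),
    }
}
```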
this ties into #475 (comment), insofar as you should be modeling that constraint at the pipeline level, vs. trying to infer the pipeline scenario within the job, based on the downstream parameters that are being passed in.
pipeline.new_artifact(format!("{arch_tag}-openhcl-igvm-extras"));
let (pub_openhcl_baseline, _use_openhcl_baseline) =
this should only be published / built in CI configurations. otherwise, you're doing duplicate work with the check size task
flowey/flowey_lib_hvlite/src/_jobs/build_and_publish_openhcl_igvm_from_recipe.rs
@@ -112,6 +123,16 @@ impl SimpleFlowNode for Node {
    }
}));

if let Some(built_openvmm_hcl) = size_check_openvmm_hcl {
the fact that you can accept an artifact_dir: ReadVar<PathBuf>, and then not populate it, is genuinely one of flowey's major warts / footguns. it would be really good to find some time to switch this infra over to instead use artifact_dir: WriteVar<PathBuf>, which would translate to the flowey compiler erroring-out if it detected a pipeline configuration that would result in an empty artifact.
I think you've still got one iteration left, but this is looking pretty close to gtg
Given the appetite to land this infrastructure, I'll go ahead and approve this PR.
As of this iteration, I don't think there are any correctness issues with this code. All my remaining feedback is related to writing idiomatic flowey code, which would be nice to resolve via one more iteration on this PR, but could also be solved in a follow-up cleanup PR (that either I, or Justus can take point on).
flowey_lib_hvlite::_jobs::build_and_publish_openhcl_igvm_from_recipe::Params {
    igvm_files: igvm_recipes
        .clone()
I think this clone can go away now?
@@ -709,6 +723,30 @@ impl IntoPipeline for CheckinGatesCli {
);

all_jobs.push(job.finish());

if arch == CommonArch::X86_64 && matches!(config, PipelineConfig::Pr) {
we're publishing aarch64 baselines, but not actually checking them. is that expected (i.e.: something we'll enable in a follow-up PR), or is the x86 gate leftover from an earlier iteration?
looking at this with fresh eyes, I realize there are still quite a few things here that tie this firmly to microsoft/openvmm (as hosted on GitHub), vs. being a truly generic node (that would live in flowey_lib_common).

namely:

- assuming the origin is called origin
- assuming the existence of pull/ branches on the remote
- using get_gh_context_var directly

rather than trying to make this more generic so it can live in flowey_lib_common, I think we should just go the other direction, lean into the OpenVMM-ness of this logic, move it to flowey_lib_hvlite, and call it gh_merge_commit.
openvmm_hcl_output: v,
});

let file_name = match target.common_arch().unwrap() {
this is the kind of logic you really want to be in the artifact_openvmm_hcl_sizecheck::resolve node. i.e.: you have tight locality between the two bits of code that determine the "shape" of the artifact. in addition, having the resolve node gives you a type-safe handle to the artifact content when you download it, vs. needing to do this sort of ad-hoc "unpacking" inline.

if you don't do that, then when the time comes to add more stuff into that artifact, you'll have zero compiler assistance in tracking down what code needs to be updated to support the new artifact.
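To illustrate the locality point: the arch-to-filename mapping could live beside the artifact definition as a single shared helper, so the publish and download sides can't drift apart. A minimal sketch (the enum shape mirrors CommonArch from the diff above, but the file names here are hypothetical placeholders, not the PR's actual identifiers):

```rust
// Hypothetical sketch: one source of truth for the size-check artifact's
// per-arch file name, living next to the artifact definition (e.g. in a
// resolve node) instead of being matched ad-hoc at each use site.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum CommonArch {
    X86_64,
    Aarch64,
}

/// Maps an architecture to the file name stored in the size-check artifact.
/// File names are illustrative placeholders.
fn sizecheck_file_name(arch: CommonArch) -> &'static str {
    match arch {
        CommonArch::X86_64 => "openvmm_hcl_x64",
        CommonArch::Aarch64 => "openvmm_hcl_aarch64",
    }
}
```

With the mapping centralized, adding a new entry to the artifact forces every consumer through this one function, which is the compiler assistance the comment is asking for.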
@@ -693,6 +706,7 @@ impl IntoPipeline for CheckinGatesCli {
    artifact_dir_openhcl_igvm: ctx.publish_artifact(pub_openhcl_igvm),
    artifact_dir_openhcl_igvm_extras: ctx
        .publish_artifact(pub_openhcl_igvm_extras),
    artifact_openhcl_verify_size_baseline: publish_baseline_artifact,
you shouldn't modify this existing build_and_publish_openhcl_igvm_from_recipe node to also serve "double-duty" to build this verify-size artifact.

simply add a new conditional dep_on(flowey_lib_hvlite::_jobs::build_and_publish_openvmm_hcl_baseline) that hangs off this pipeline.new_job(), and avoid this messy manual plumbing at the Job level.
I can follow up with remaining comments on another PR, since we'll need to do a follow-up to start using the baseline artifact instead of igvm-extras anyways.
…rison (#906)

As part of #475, I added a new baseline artifact for binary comparison but didn't use it yet, as I wanted to wait for there to be builds to diff against, and hadn't followed up. #895 renamed the binary in igvm-extras that we were using to diff, which broke the comparison. This change uses the baseline artifact for comparison and keeps the binary size artifacts limited in scope.
This PR adds a job to diff binary sizes introduced in changes. As implemented, the action runs git merge-base to find a common ancestor with main, fetches a completed build from CI (it will try up to 5 commits back in case CI hasn't completed for the commit returned by git merge-base), and outputs a diff.

GitHub actions with a pull_request trigger are unable to comment on PRs. As such, this implementation fails the check if the size difference is greater than a threshold. In the case where we're ok with the size increase, my understanding is that we can force the merge without the check passing.
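The walk-back described above can be sketched as a small pure function: starting from the merge-base commit, try each ancestor in turn (up to 5) until one has a completed CI build. This is a hypothetical illustration with an injected lookup, not the PR's actual code:

```rust
/// Given the ancestry chain starting at the merge-base commit (newest first),
/// return the first commit that has a completed CI build, trying at most
/// `max_tries` commits. Hypothetical sketch; `has_ci_build` stands in for a
/// real query against the CI system.
fn find_baseline_commit<'a>(
    ancestry: &[&'a str],
    max_tries: usize,
    has_ci_build: impl Fn(&str) -> bool,
) -> Option<&'a str> {
    ancestry
        .iter()
        .take(max_tries)
        .copied()
        .find(|commit| has_ci_build(commit))
}
```

Separating the "which commit do we diff against" decision from the git and CI plumbing keeps the fallback policy (5 commits, newest first) easy to test and change independently.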