ci: add job to verify binary size #475

justus-camp-microsoft · 2024-12-12T23:12:08Z

This PR adds a job that diffs the binary sizes introduced by a change. As implemented, the action runs git merge-base to find a common ancestor with main, fetches a completed build from CI (it will try up to 5 commits back in case CI hasn't completed for the commit returned by git merge-base), and outputs a diff.
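A minimal sketch of the baseline lookup described above. It assumes the "up to 5 commits" cap counts the merge base itself and that walking back follows first parents (git's `<rev>~n` syntax). `has_completed_build` is a hypothetical stand-in for querying CI; the real job's logic may differ.

```rust
/// Assumed cap on how many commits are tried, counting the merge base itself.
const MAX_ATTEMPTS: usize = 5;

/// Returns the first revision, starting at `merge_base` and walking back along
/// first parents (`<rev>~n`), for which `has_completed_build` reports a finished
/// CI build, or `None` if no attempt within the cap qualifies.
fn find_baseline_rev<F>(merge_base: &str, has_completed_build: F) -> Option<String>
where
    F: Fn(&str) -> bool,
{
    (0..MAX_ATTEMPTS)
        .map(|n| {
            if n == 0 {
                merge_base.to_string()
            } else {
                format!("{merge_base}~{n}")
            }
        })
        .find(|rev| has_completed_build(rev.as_str()))
}

fn main() {
    // Pretend CI has only finished a build for the merge base's grandparent.
    let found = find_baseline_rev("abc123", |rev| rev == "abc123~2");
    println!("{found:?}"); // Some("abc123~2")
}
```

Taking the CI query as a closure keeps the walk-back policy separate from however build status is actually fetched.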

GitHub actions with a pull_request trigger are unable to comment on PRs. As such, this implementation fails the check if the size difference is greater than a threshold. In the case where we're ok with the size increase, my understanding is that we can force the merge without the check passing.
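A sketch of the fail-over-threshold check, modeled on the error message in the xtask snippet later in this thread. It assumes the sizes are plain byte counts of the old and new binaries and that the diff is reported in whole KiB. `verify_size` and `MAX_ALLOWED_DIFF_KIB` are illustrative names, not the PR's code.

```rust
/// Threshold in KiB. The xtask snippet in this PR hard-codes 100; the named
/// constant is only for readability.
const MAX_ALLOWED_DIFF_KIB: i64 = 100;

/// Signed size change in whole KiB (positive means the new binary is larger).
/// Integer division truncates toward zero, so partial KiB are dropped.
fn size_diff_kib(old_bytes: u64, new_bytes: u64) -> i64 {
    (new_bytes as i64 - old_bytes as i64) / 1024
}

/// Returns the diff on success, or an error message (which fails the check)
/// when the binary grew by more than the allowed amount. Shrinking always passes.
fn verify_size(old_bytes: u64, new_bytes: u64) -> Result<i64, String> {
    let diff = size_diff_kib(old_bytes, new_bytes);
    if diff > MAX_ALLOWED_DIFF_KIB {
        return Err(format!(
            "size verification failed: the total difference ({diff} KiB) is greater \
             than the allowed difference ({max} KiB)",
            max = MAX_ALLOWED_DIFF_KIB
        ));
    }
    Ok(diff)
}

fn main() {
    match verify_size(4_000_000, 4_000_000 + 150 * 1024) {
        Ok(diff) => println!("ok: {diff:+} KiB"),
        Err(e) => eprintln!("{e}"),
    }
}
```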

jstarks · 2024-12-13T01:06:12Z

How is this going to be different from #458?

justus-camp-microsoft · 2024-12-13T18:15:13Z

I wasn't aware of that thread. I'll look into getting a baseline from a pipeline and using it for comparison.

justus-camp-microsoft · 2024-12-13T19:19:56Z

I took a look at FluidFramework, which I used to work on and which has a bundle size check as part of its PR workflow. From what I can tell, their approach is to walk back through HEAD~n until it finds a completed build and then do a size comparison against that build. Their CI has a bot that leaves a comment with the comparison, but it doesn't look like it blocks merging a PR. What do we think about that approach?

smalis-msft · 2024-12-16T16:22:22Z

Oooooh, prior art, nice.

I think the commit we want to compare against is whatever the merge is based on. That would allow us to get as good a measurement of "this PR adds X bytes compared to not having it" as possible. If that commit is still running through CI maybe we just wait for it? If it fails though then walking backwards on main/release does seem like a reasonable fallback strategy.
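The strategy proposed here (use the merge base, wait if it is still in CI, walk back on main only if it failed) can be sketched as a small decision function. `CiStatus`, `Baseline`, and the newest-first `history` slice are hypothetical; this is an illustration of the policy, not what the job implements.

```rust
/// Hypothetical CI state of a commit on main.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CiStatus {
    Succeeded,
    Running,
    Failed,
}

/// What the size check should do next.
#[derive(Debug, PartialEq, Eq)]
enum Baseline<'a> {
    /// Compare against this commit's build.
    Use(&'a str),
    /// The merge base is still in CI; wait for it rather than guessing.
    WaitFor(&'a str),
    /// Nothing usable was found.
    NotFound,
}

/// `history` is the merge base followed by its ancestors on main, newest first.
fn choose_baseline<'a>(history: &[(&'a str, CiStatus)]) -> Baseline<'a> {
    match history.first() {
        None => Baseline::NotFound,
        Some(&(commit, CiStatus::Succeeded)) => Baseline::Use(commit),
        Some(&(commit, CiStatus::Running)) => Baseline::WaitFor(commit),
        // Merge base failed: fall back to the newest older commit that built.
        Some(&(_, CiStatus::Failed)) => history
            .iter()
            .skip(1)
            .find(|(_, status)| *status == CiStatus::Succeeded)
            .map_or(Baseline::NotFound, |&(commit, _)| Baseline::Use(commit)),
    }
}

fn main() {
    let history = [
        ("m0", CiStatus::Failed),
        ("m1", CiStatus::Running),
        ("m2", CiStatus::Succeeded),
    ];
    println!("{:?}", choose_baseline(&history)); // Use("m2")
}
```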

I think for ours we'd prefer to have a gate rather than just a comment, so long as there's some way for us to then override the block and say "yes this is acceptable". But a gate would prevent anyone from merging before the bot comments, for example. We could then have a dedicated size_override reviewers group that the gate requires sign off from to override or something.

Also, I'd like to make sure we're actually storing the whole built file that we're using to compare against, not just a pre-computed summary of it. That frees us up to do more complex and involved analysis in the future.

smalis-msft · 2024-12-17T16:58:44Z

Tagging #76

smalis-msft · 2025-01-03T16:32:49Z

Man this is exciting to see, the prior solution has been an annoyance for so long now.

smalis-msft · 2025-01-03T16:34:05Z

xtask/src/tasks/verify_size.rs

+        if total_diff > 100 {
+            anyhow::bail!("{} size verification failed: The total difference ({} KiB) is greater than the allowed difference ({} KiB).", self.new.display(), total_diff, 100);
+        }


We'll need some way to override this check on a PR level, some way to say "Yes this size diff is acceptable". Not sure what github allows us to do here.

GitHub doesn't really have a great way to do this. Ideally we could have this always be required and just hit an "override" button but afaik that's not possible. We're also unable to assign a review team through the action (as I painfully learned from trying to re-enable the unsafe reviewers assignment) because actions are scoped to the repo level and our review teams are scoped at the org level (no access).

My thought here is that we should have this action always succeed as long as it finishes all the way through and have it leave a comment on the PR with a summary of the size diff. The onus would be on the reviewer to look at the comment and make sure that the difference is acceptable.

If that's the best we can do then it's the best we can do I guess. Maybe include some big warning text in the comment if the diff is over a threshold.
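A sketch of the comment-with-warning idea, assuming the diff has already been computed in KiB. The threshold value, `size_comment`, and the wording are hypothetical. The `> [!WARNING]` block uses GitHub's Markdown alert syntax. How the body actually gets posted is out of scope here.

```rust
/// Hypothetical threshold; the fail-the-check variant in this PR uses 100 KiB.
const WARN_THRESHOLD_KIB: i64 = 100;

/// Builds a PR comment body summarizing the size change, with a GitHub
/// "[!WARNING]" alert block on top when the growth exceeds the threshold.
fn size_comment(binary: &str, diff_kib: i64) -> String {
    let mut body = String::new();
    if diff_kib > WARN_THRESHOLD_KIB {
        body.push_str(&format!(
            "> [!WARNING]\n> {binary} grew by {diff_kib} KiB, more than the \
             {threshold} KiB threshold. Please confirm this is acceptable.\n\n",
            threshold = WARN_THRESHOLD_KIB
        ));
    }
    // `{:+}` always prints a sign, so growth reads as "+12" and shrinkage as "-3".
    body.push_str(&format!("Size change for {binary}: {diff_kib:+} KiB\n"));
    body
}

fn main() {
    print!("{}", size_comment("openvmm_hcl", 150));
}
```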

We really should figure out some way to get review groups working though. Then we could have the unsafe reviewers group back and create a new binary size reviewer group for large diffs or something.

We'll need a PAT with org-level team read access and then the reviewer assignment would work. My understanding was that we don't want to deal with maintaining the PAT.

Can we have review teams scoped to the repo instead of the org? I'm really not familiar with GitHub, so I'm just spitballing. But yeah, maintaining a PAT has bitten us in the past and definitely isn't ideal.

Looping back around to this: a GitHub action with a pull_request trigger is unable to comment on PRs (a similar limitation to the unsafe reviewers check), so I think our best bet here is to fail the action if the diff is over a threshold. In the case where it's over the threshold and we're ok with the size increase, my understanding is that we can force the merge with the failing check.

I think that's OK for v1. But I think the pattern here to follow for v2 would be to create an additional workflow that depends on this one/is triggered by this one but comes from the base branch. That would allow it to safely have access to add comments, etc. I think this means using pull_request_target for that workflow.

Or maybe workflow_run.

My understanding here is that for dependent workflows like that we would need to use workflow_run, but that the token passed when triggered has the same permissions as the one triggering it (as in, it would get the pull_request token that doesn't have comment permissions). I could definitely be wrong here as I didn't try it.

In https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/events-that-trigger-workflows#workflow_run:

The workflow started by the workflow_run event is able to access secrets and write tokens, even if the previous workflow was not. This is useful in cases where the previous workflow is intentionally not privileged, but you need to take a privileged action in a later workflow.

daprilik · 2025-01-30T22:06:10Z

flowey/flowey_lib_hvlite/src/_jobs/check_openvmm_hcl_size.rs

+
+        let gh_token = ctx.get_gh_context_var().global().token();
+
+        ctx.req(build_openhcl_igvm_from_recipe::Request {


you're going to want to hit build_openvmm_hcl directly, and sidestep building all this other stuff we don't currently care about.

note that, in doing so, you won't be able to pass a recipe directly, and have it "just work". instead - you'll need to call recipe_details on a baseline recipe we want to verify the size of, and manually plumb through the returned openhcl_vmm-specific config data to the build_openvmm_hcl node

flowey/flowey_lib_hvlite/src/_jobs/check_openvmm_hcl_size.rs

log comparison commit

justus-camp-microsoft · 2025-01-31T20:02:34Z

Assuming CI passes, I think this should be about ready to go. Outstanding items as far as I'm aware (feel free to correct) are the following:

Adding an action with a workflow_run trigger that comments the output of the size check
Updating the diff flow to use the baseline artifact that will now be generated by CI on merges to main

daprilik · 2025-01-31T20:11:30Z

flowey/flowey_lib_common/src/git_merge_commit.rs

+impl SimpleFlowNode for Node {
+    type Request = Request;
+
+    fn imports(_ctx: &mut ImportCtx<'_>) {}


note that this node assumes the existence of git on the machine. this might not be a problem in practice, but for max correctness, you should take a dep on install_git

this ties into #475 (comment), insofar as you should be modeling that constraint at the pipeline level, vs. trying to infer the pipeline scenario within the job, based on the downstream parameters that are being passed in.

daprilik · 2025-01-31T20:13:42Z

flowey/flowey_hvlite/src/pipelines/checkin_gates.rs

                pipeline.new_artifact(format!("{arch_tag}-openhcl-igvm-extras"));
+            let (pub_openhcl_baseline, _use_openhcl_baseline) =


this should only be published / built in CI configurations. otherwise, you're doing duplicate work with the check size task

daprilik · 2025-01-31T20:16:58Z

flowey/flowey_lib_hvlite/src/_jobs/build_and_publish_openhcl_igvm_from_recipe.rs

@@ -112,6 +123,16 @@ impl SimpleFlowNode for Node {
            }
        }));

+        if let Some(built_openvmm_hcl) = size_check_openvmm_hcl {


the fact that you can accept an artifact_dir: ReadVar<PathBuf>, and then not populate it, is genuinely one of flowey's major warts / footguns. it would be really good to find some time to switch this infra over to instead use artifact_dir: WriteVar<PathBuf>, which would translate to the flowey compiler erroring out if it detected a pipeline configuration that would result in an empty artifact.

flowey/flowey_lib_hvlite/src/build_openhcl_igvm_from_recipe.rs

daprilik

I think you've still got one iteration left, but this is looking pretty close to gtg

flowey/flowey_lib_hvlite/src/artifact_openvmm_hcl_sizecheck.rs

daprilik

Given the appetite to land this infrastructure, I'll go ahead and approve this PR.

As of this iteration, I don't think there are any correctness issues with this code. All my remaining feedback is related to writing idiomatic flowey code, which would be nice to resolve via one more iteration on this PR, but could also be solved in a follow-up cleanup PR (that either I, or Justus can take point on).

daprilik · 2025-02-03T21:35:29Z

flowey/flowey_hvlite/src/pipelines/checkin_gates.rs

                    flowey_lib_hvlite::_jobs::build_and_publish_openhcl_igvm_from_recipe::Params {
                        igvm_files: igvm_recipes
+                            .clone()


I think this clone can go away now?

daprilik · 2025-02-03T21:36:53Z

flowey/flowey_hvlite/src/pipelines/checkin_gates.rs

@@ -709,6 +723,30 @@ impl IntoPipeline for CheckinGatesCli {
                );

            all_jobs.push(job.finish());
+
+            if arch == CommonArch::X86_64 && matches!(config, PipelineConfig::Pr) {


we're publishing aarch64 baselines, but not actually checking them. is that expected (i.e: something we'll enable in a follow-up PR), or is the x86 gate leftover from an earlier iter?

daprilik · 2025-02-03T21:44:59Z

flowey/flowey_lib_common/src/git_merge_commit.rs

looking at this with fresh eyes, I realize there are still quite a few things here that tie this firmly to microsoft/openvmm (as hosted on GitHub), vs. being a truly generic node (that would live in flowey_lib_common).

namely:

assuming the remote is called origin

assuming the existence of pull/ branches on the remote

using get_gh_context_var directly

rather than trying to make this more generic, so it can live in flowey_lib_common, I think we should just go the other direction, and lean into the OpenVMM-ness of this logic, and move it to flowey_lib_hvlite, and call it gh_merge_commit.

daprilik · 2025-02-03T21:50:49Z

flowey/flowey_lib_hvlite/src/_jobs/check_openvmm_hcl_size.rs

+            openvmm_hcl_output: v,
+        });
+
+        let file_name = match target.common_arch().unwrap() {


this is the kind of logic you really want to be in the artifact_openvmm_hcl_sizecheck::resolve node. i.e: you have tight locality between the two bits of code that determine the "shape" of artifact. in addition, having the resolve node gives you a type-safe handle to the artifact content when you download it, vs. needing to do this sort of ad-hoc "unpacking" inline.

if you don't do that, then when the time comes to add more stuff into that artifact, you'll have zero compiler assistance in tracking down what code needs to be updated to support the new artifact.
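One way to read this suggestion in plain Rust, assuming nothing about flowey's actual resolve-node API: keep the artifact's layout in a single typed constructor so publishing and consuming code share it. `SizeCheckArtifact`, `Arch`, and the file names below are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the pipeline's architecture enum.
#[derive(Clone, Copy, Debug)]
enum Arch {
    X86_64,
    Aarch64,
}

/// Typed view of the size-check artifact. Adding a field here makes every
/// struct literal that builds it (i.e. `resolve`) fail to compile until it is
/// updated, which is the compiler assistance the review comment asks for.
#[derive(Debug)]
struct SizeCheckArtifact {
    openvmm_hcl: PathBuf,
}

impl SizeCheckArtifact {
    /// The one place that knows the artifact's file layout; the publishing
    /// side would call this too. File names are hypothetical.
    fn openvmm_hcl_file_name(arch: Arch) -> &'static str {
        match arch {
            Arch::X86_64 => "x64-openvmm_hcl",
            Arch::Aarch64 => "aarch64-openvmm_hcl",
        }
    }

    /// Turns a downloaded artifact directory into typed paths.
    fn resolve(artifact_dir: &Path, arch: Arch) -> Self {
        SizeCheckArtifact {
            openvmm_hcl: artifact_dir.join(Self::openvmm_hcl_file_name(arch)),
        }
    }
}

fn main() {
    for arch in [Arch::X86_64, Arch::Aarch64] {
        let artifact = SizeCheckArtifact::resolve(Path::new("artifacts/sizecheck"), arch);
        println!("{arch:?}: {}", artifact.openvmm_hcl.display());
    }
}
```

Callers then read `artifact.openvmm_hcl` instead of re-deriving a file name inline, which is the ad-hoc unpacking the comment warns against.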

daprilik · 2025-02-03T21:53:47Z

flowey/flowey_hvlite/src/pipelines/checkin_gates.rs

@@ -693,6 +706,7 @@ impl IntoPipeline for CheckinGatesCli {
                        artifact_dir_openhcl_igvm: ctx.publish_artifact(pub_openhcl_igvm),
                        artifact_dir_openhcl_igvm_extras: ctx
                            .publish_artifact(pub_openhcl_igvm_extras),
+                        artifact_openhcl_verify_size_baseline: publish_baseline_artifact,


you shouldn't modify this existing build_and_publish_openhcl_igvm_from_recipe node to also serve "double-duty" to build this verify-size artifact.

simply add a new conditional dep_on(flowey_lib_hvlite::_jobs::build_and_publish_openvmm_hcl_baseline) that hangs off this pipeline.new_job(), and avoid this messy manual plumbing at the Job level.

justus-camp-microsoft · 2025-02-03T22:01:31Z

I can follow-up with remaining comments on another PR since we'll need to do a follow-up to start using the baseline artifact instead of igvm-extras anyways.

…rison (#906) As part of #475, I added a new baseline artifact for binary comparison but didn't use it yet, since I wanted to wait until there were builds to diff against, and I hadn't followed up since. #895 renamed the binary in igvm-extras that we were using for the diff, which broke the comparison. This change uses the baseline artifact for comparison and keeps the binary size artifacts limited in scope.

first pass at adding job to verify gh binary size

d5e1b23

justus-camp-microsoft added 2 commits December 30, 2024 14:10

Merge branch 'main' into verify_size

3884a0f

comparison of binaries

99e745b

justus-camp-microsoft changed the title ~~ci: add job to verify binary size~~ WIP: ci: add job to verify binary size Jan 2, 2025

justus-camp-microsoft added 2 commits January 2, 2025 13:35

try comparing two of what should be the same binary

207280e

add done handle, run regen

40aa921

justus-camp-microsoft marked this pull request as ready for review January 2, 2025 22:20

justus-camp-microsoft requested review from a team as code owners January 2, 2025 22:20

justus-camp-microsoft added 3 commits January 2, 2025 14:37

fmt so that ci will actually run

f7ffdcb

clippy

9853a16

fix command line argument

e9854fa

smalis-msft reviewed Jan 3, 2025

flowey/flowey_hvlite/src/pipelines/checkin_gates.rs Outdated

smalis-msft reviewed Jan 3, 2025

xtask/src/tasks/verify_size.rs Outdated

smalis-msft reviewed Jan 3, 2025

xtask/src/tasks/verify_size.rs Outdated

smalis-msft reviewed Jan 3, 2025

flowey/flowey_hvlite/src/pipelines/checkin_gates.rs Outdated

justus-camp-microsoft added 6 commits January 3, 2025 10:04

swap old and new position

12c39aa

try to download hardcoded artifact

7fe1ab7

remove wrong parameter, clippy

17ac9d5

get merge head to run size check with

9b47097

output error

e9108a5

get stdout and stderr to debug ci issue

51cb2a3

daprilik reviewed Jan 30, 2025

flowey/flowey_lib_hvlite/src/_jobs/check_openvmm_hcl_size.rs Outdated

Justus Camp and others added 6 commits January 30, 2025 23:22

some of PR feedback

763b746

some more cleanup

d99589b

fix cmd macro

4819b9e

only upload baseline artifact for x64 ship builds

830acb5

log comparison commit

a06a673

Merge pull request #4 from jstarks/verify_size

850a8d3

log comparison commit

jstarks previously approved these changes Jan 31, 2025

daprilik reviewed Jan 31, 2025

flowey/flowey_hvlite/src/pipelines/checkin_gates.rs Outdated

daprilik reviewed Jan 31, 2025

flowey/flowey_lib_hvlite/src/_jobs/build_and_publish_openhcl_igvm_from_recipe.rs Outdated

daprilik reviewed Jan 31, 2025

flowey/flowey_lib_hvlite/src/build_openhcl_igvm_from_recipe.rs Outdated

daprilik suggested changes Jan 31, 2025

daprilik reviewed Jan 31, 2025

flowey/flowey_lib_hvlite/src/artifact_openvmm_hcl_sizecheck.rs Outdated

justus-camp-microsoft added 3 commits January 31, 2025 12:46

some of comments

30576ca

new node for size comparison baseline

b416ed6

Merge remote-tracking branch 'upstream/main' into verify_size

df679b1

justus-camp-microsoft dismissed jstarks’s stale review via df679b1 January 31, 2025 22:50

clippy

4b45e65

daprilik approved these changes Feb 3, 2025

justus-camp-microsoft merged commit b7ec870 into microsoft:main Feb 3, 2025
26 checks passed

justus-camp-microsoft deleted the verify_size branch February 3, 2025 22:01

justus-camp-microsoft mentioned this pull request Feb 25, 2025

flowey: use baseline artifact instead of igvm-extras for binary comparison #906

Merged

justus-camp-microsoft mentioned this pull request Feb 26, 2025

flowey: follow-up to remaining binary size comparison comments #909

Open
