RFC: cargo-sbom #3553
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
- Feature Name: `cargo-sbom` | ||
- Start Date: 2023-11-01 | ||
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) | ||
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) | ||
|
||
# Summary | ||
[summary]: #summary | ||
|
||
This RFC adds an option to Cargo that emits a Software Bill of Materials (SBOM) alongside compiled artifacts. Similar to how Cargo emits split debug info or "dep-info" (.d) files, this change emits an SBOM in a Cargo-specific format alongside outputs in the `target` directory. External tooling or Cargo subcommands can consume this Cargo SBOM file and transform it into other SBOM formats such as SPDX or CycloneDX. | ||
|
||
# Motivation | ||
[motivation]: #motivation | ||
|
||
An SBOM (software bill of materials) is a list of all components and dependencies used to build a piece of software. The two leading SBOM formats being adopted by industry are SPDX and CycloneDX. Both are still evolving and have multiple specification versions and data formats (JSON, XML). | ||
arlosi marked this conversation as resolved.
|
||
|
||
New government initiatives aimed at improving the security of the software supply chain such as the US "Executive Order on Improving the Nation's Cybersecurity" or the EU "Cyber Resilience Act" require a Software Bill of Materials. Generating accurate SBOMs with Cargo is currently difficult because, depending on target selection or activated features, the dependencies may be different. | ||
If we aim to participate in such schemes, does that impose any new burdens on the project? E.g. would inaccurate reports be considered a critical issue because they could possibly let security issues go undetected? If individual EU countries implement directives on a national level that use more expansive wording and impose additional requirements that Cargo does not fulfill, would that become a priority because we committed to providing "useful" SBOMs?

As-is, we aren't providing the final SBOM artifact but information that can feed into it. This intentionally leaves a lot of that information to the caller to get. I expect the mix of end-user and regulatory requirements to contradict each other (they already did in the Pre-RFC thread), which is why I'd want a future possibility of providing a final SBOM stricken from the RFC. We likely can't keep up, we likely can't maintain the compatibility requirements, and we likely can't satisfy them without knobs for everything.
The fun of "fit for use". We'll be providing a report of what information we have. There is more information, like from build scripts linking external libraries, that we can't provide. The usefulness of any of this is dependent on all parties involved cooperating. That said, for what we do provide, if there is a bug, does it need a CVE? Unsure. I'd personally just consult the security folks when it happens. This is less about direct attacks and more about the quality of monitoring. I do wonder if this would be useful as a more general unit-graph report (which it isn't far from). For example, watch tools could use this information to know what changes to watch for in future builds. That might ease some of the pressure on this.

I think it would be good to clarify in advance how much we're promising here. I suspect that down the road some large institutional users will start relying heavily on SBOMs and make noises when it's not working as they want it to.
arlosi marked this conversation as resolved.
|
||
|
||
For workspaces that generate multiple compiled artifacts, each artifact may reference different dependencies. Existing tools (see the prior art section) attempt to approximate the correct dependency set; however, precise dependency information for each compiled artifact is difficult to obtain without built-in Cargo support. Generating the SBOM at the same time as the compiled artifact allows precise dependency information to be emitted for each compiled artifact. | ||
|
||
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
The generation of SBOM information is controlled by Cargo's configuration. To enable SBOM generation, set the following configuration: | ||
|
||
```toml | ||
[build] | ||
sbom = true | ||
``` | ||
|
||
Or use the environment variable `CARGO_BUILD_SBOM=true`. | ||
What should we actually call this? And is this a build param or a profile param?

I thought a
If we use profiles, then it becomes harder for tooling wrapping Cargo to unconditionally enable it for the current run.

Profile does not preclude environment variables because the manifest profile is layered with the config profile.
$ CARGO_PROFILE_DEV_OPT_LEVEL=10 cargo check
Checking utf8parse v0.2.1
Checking anstyle-query v1.0.0
Checking colorchoice v1.0.0
Checking anstyle v1.0.2
Checking strsim v0.10.0
Checking clap_lex v0.6.0 (/home/epage/src/personal/clap/clap_lex)
error: optimization level needs to be between 0-3, s or z (instead was `10`)
error: could not compile `anstyle-query` (lib) due to previous error
warning: build failed, waiting for other jobs to finish...
error: could not compile `clap_lex` (lib) due to previous error
error: could not compile `strsim` (lib) due to previous error
error: could not compile `utf8parse` (lib) due to previous error
error: could not compile `anstyle` (lib) due to previous error
error: could not compile `colorchoice` (lib) due to previous error
But it looks like the layering is all-or-nothing so setting one value might be ignored or cause other values to be ignored. That might be too disruptive.

We should capture this reasoning within the RFC's rationale section

👍 for having this in a profile, because I very much expect people will want to have this in some profiles and not in others.

Profile for now is a set of compiler settings. I am not sure if we want to expand the meaning of it to include SBOM. Also given SBOM is only meaningful to final artifacts, if we put it in profiles we need to document If people want to switch build configurations, should we work on stabilizing
||
|
||
If enabled, an SBOM file will be placed in the `target` directory next to each compiled artifact of the `bin`, `staticlib`, and `cdylib` crate types, with the name `<crate_name>.cargo-sbom.json`. The SBOM will contain information about the dependencies used to build the compiled artifact. | ||
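
A post-processing tool could discover these files by walking the `target` directory. The following is a minimal sketch, not part of Cargo: it assumes only the `<crate_name>.cargo-sbom.json` naming described above, and the helper name `find_sbom_files` is invented for illustration.

```python
from pathlib import Path


def find_sbom_files(target_dir):
    """Map each crate name to the SBOM precursor files found under target_dir."""
    suffix = ".cargo-sbom.json"
    found = {}
    # rglob walks profile subdirectories (e.g. debug/, release/) recursively.
    for path in sorted(Path(target_dir).rglob("*" + suffix)):
        crate_name = path.name[: -len(suffix)]
        found.setdefault(crate_name, []).append(path)
    return found
```

The same crate name may map to several files, for example when both a debug and a release build exist, so the sketch returns a list of paths per crate rather than a single path.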
|
||
# Reference-level explanation | ||
[reference-level-explanation]: #reference-level-explanation | ||
|
||
The SBOM file generated by Cargo is *not* intended as a final SBOM artifact, but rather a precursor. Post-processing tooling can use the information produced here as part of building a final SBOM. | ||
|
||
The SBOM file will be written to disk before `rustc` is executed for each artifact. This enables [`RUSTC_WORKSPACE_WRAPPER`](https://doc.rust-lang.org/cargo/reference/config.html#buildrustc-workspace-wrapper) to point at a program that can utilize the SBOM file to embed information into the binary if desired. | ||
Since the SBOM precursor file will be written first, is there an intention to remove it if the production of the artifact, including perhaps the execution of any

I don't have a strong opinion on this. Do you think it needs to be specified in the RFC?
||
|
||
## Format | ||
The format will use JSON, but the exact format is not specified in this RFC. Additional fields can be added as needed. | ||
arlosi marked this conversation as resolved.
|
||
|
||
|
||
### Resolved Dependency Tree | ||
The SBOM will include the following information (if available) for each crate: | ||
epage marked this conversation as resolved.
|
||
- ID (opaque identifier) | ||
- Name | ||
- Version | ||
How do we uniquely identify one of several crates / build units within a package? The main situations for this
|
||
- Source (registry / git / path etc.) | ||
- Checksum | ||
Do we need the checksum? For third-party SBOM formats, I would instead encourage them to own the checksum generation and worry people will reuse this and put their own expectations on what this means

This text makes me wonder if "Checksum" is trying to capture version information for dependencies taken from a repository. Maybe "Checksum" and "Version" could be merged, so "Version" is the git sha when using git as your source for a crate?

I think the challenge with replacing "version" entirely with "checksum" is that a

For the git case the sha is already part of the
"name": "futures",
"version": "0.4.0-alpha.0",
"id": "futures 0.4.0-alpha.0 (git+https://github.com/rust-lang/futures-rs#f9f8e690504529c2813caadabd85506756f8dc67)",
"source": "git+https://github.com/rust-lang/futures-rs#f9f8e690504529c2813caadabd85506756f8dc67",

Do these details need to be written to the sbom, or could they just be queried from

Actually, there is one detail here that is not available from
||
- Dependencies (list of IDs) | ||
- Type (normal, build) | ||
- Activated features | ||
Should we include env variables (I assume rustc reports to us what it read for us to fingerprint) or file paths (again, I assume rustc tells us what it read to fingerprint)?

rustc mentions all read env vars in depinfo. This does not include env vars read by proc macros through std::env::var.

Note this can also be moved to a future possibility so long as we ensure the format can support this.

The NTIA released Minimum Elements for a SBOM. Currently, mainstream SBOM formats (e.g

RFC 3052 intentionally made the author field optional. Any SBOM initiative that can't cope with anonymously/pseudonymously authored code is overreach imo. The code can be perfectly viable. And it doesn't change the fact that authors can vanish or give false contact information anyway.

As is mentioned below this section, you can query other metadata by looking up the dependency in

I know what you mean. Maybe we're talking about two dimensions, and you're right from a security point of view. Simply from the point of view of this being a tool, providing the author field may be a better way for developers to use this tool to generate SBOM. :)
||
|
||
If a crate is used as both a normal dependency and a build dependency that is separately compiled, then separate entries will exist in the dependency tree with the correct activated features listed for each instance. | ||
|
||
Checksum is an optional field, since only crates from registries have checksums. If a checksum is needed for a crate that comes from a path dependency for example, it will be up to the post-processing tool to produce an appropriate value. | ||
|
||
If further information is needed (such as license), then the post-processing tool can use `cargo metadata` or another mechanism to find it. | ||
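
To make the shape of this data concrete, here is a minimal sketch of how a post-processing tool might model the entries listed above. The RFC leaves the JSON format unspecified, so every field name, the `CrateEntry` class, and both helper functions are assumptions made for illustration rather than a proposed schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CrateEntry:
    # Hypothetical field names; the RFC does not fix the JSON layout.
    id: str        # opaque identifier
    name: str
    version: str
    source: str    # registry / git / path etc.
    kind: str      # "normal" or "build"
    checksum: Optional[str] = None  # only registry crates have one
    dependencies: list = field(default_factory=list)  # list of IDs
    features: list = field(default_factory=list)      # activated features


def transitive_dependencies(entries, root_id):
    """Collect every ID reachable from root_id, excluding root_id itself."""
    by_id = {entry.id: entry for entry in entries}
    seen = set()
    stack = list(by_id[root_id].dependencies)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(by_id[current].dependencies)
    seen.discard(root_id)
    return seen


def missing_checksums(entries):
    """IDs for which the post-processing tool must supply its own checksum."""
    return [entry.id for entry in entries if entry.checksum is None]
```

Because a crate compiled separately as a normal and as a build dependency gets two entries, the two instances would carry distinct IDs and may list different activated features, which is why the sketch keys everything on `id` rather than on name and version.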
|
||
### Resolved build configuration | ||
- Rust toolchain version | ||
- `RUSTFLAGS` | ||
- Current build profile name | ||
- Selected profile values | ||
Is any other

Or should we defer out all config to a future possibility so long as we make sure the format can support it?
||
|
||
|
||
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
It introduces yet another SBOM format. However, the format is specifically designed to be used as an intermediate, to be converted to an industry-standard format by external tooling. | ||
|
||
# Rationale and alternatives | ||
[rationale-and-alternatives]: #rationale-and-alternatives | ||
|
||
Since there is no consensus on a single SBOM format within the software industry, and existing formats are still evolving, Cargo should not pick an existing SBOM format. If Cargo were to use existing SBOM formats, multiple formats (and multiple versions of each format) would need to be supported. The task of generating a specific SBOM format is best left to applications outside Cargo or to Cargo extensions. | ||
We should call out that it's not just SBOM format but also being compliant with internal and external regulations.

Not seeing what text resolved this, so unresolving it.
||
|
||
Unfortunately, it's difficult to extract accurate SBOM information with existing options. Using the `Cargo.lock` file or `cargo metadata` overincludes dependencies. Additionally, Cargo has many different commands that produce compiled artifacts (build, test, bench, etc.), and each of these commands takes arguments that can affect the dependency list, so it's difficult to ensure that the correct dependency list is used. | ||
|
||
Adding an option to `cargo metadata` to support resolver v2 would help with overinclusion of dependencies, but still makes it difficult to ensure the exact set of features, command-line arguments, and other options are taken into account. Additionally, since build scripts (`build.rs`) may impact the output, `cargo metadata` may need to execute them. | ||
epage marked this conversation as resolved.
|
||
|
||
Another alternative is to extract information by setting the `RUSTC_WRAPPER` environment variable, then capture feature flags and dependencies via a wrapper tool. This would require the wrapper tool to parse the rustc command line arguments to capture the set of feature flags and referenced dependencies. This approach would prevent other uses of `RUSTC_WRAPPER`, as well as being potentially fragile. | ||
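
To illustrate why the wrapper approach is fragile, the sketch below shows the kind of argument parsing such a tool would need. It relies on Cargo invoking the wrapper with the path to the real `rustc` as the first argument, and on Cargo passing features as `--cfg 'feature="..."'` and dependencies as `--extern name=path`. The function name `parse_rustc_args` and the returned dictionary layout are hypothetical, and the parsing is deliberately simplified (it ignores, for example, option prefixes inside `--extern` values).

```python
def parse_rustc_args(argv):
    """Extract crate name, enabled features, and --extern deps from rustc args.

    argv is the rustc argument list, i.e. the wrapper's own arguments with
    the leading rustc path already removed.
    """
    info = {"crate_name": None, "features": [], "externs": {}}
    wanted = ("--crate-name", "--cfg", "--extern")
    i = 0
    while i < len(argv):
        arg = argv[i]
        # Flags may be written as "--flag value" or "--flag=value".
        if arg in wanted and i + 1 < len(argv):
            flag, value = arg, argv[i + 1]
            i += 2
        elif arg.startswith(tuple(w + "=" for w in wanted)):
            flag, value = arg.split("=", 1)
            i += 1
        else:
            i += 1
            continue
        if flag == "--crate-name":
            info["crate_name"] = value
        elif flag == "--cfg":
            if value.startswith('feature="') and value.endswith('"'):
                info["features"].append(value[len('feature="'):-1])
        elif flag == "--extern":
            # Value is "name=path", or just "name" for sysroot crates.
            name, _, path = value.partition("=")
            info["externs"][name] = path or None
    return info
```

Even this small parser has to guess at argument spellings and would need updating whenever Cargo changes how it invokes `rustc`, which is the fragility the paragraph above refers to.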
|
||
# Prior art | ||
[prior-art]: #prior-art | ||
|
||
* [RFC2801](https://github.com/rust-lang/rfcs/pull/2801): Proposes embedding dependency information directly into the binary. Implemented as the `cargo auditable` extension. | ||
* [cargo-auditable](https://github.com/rust-secure-code/cargo-auditable): Cargo extension that embeds a subset of the information described in this RFC directly into the binary. The JSON format used by this RFC could be based on the cargo-auditable format. | ||
* [cargo-cyclonedx](https://github.com/CycloneDX/cyclonedx-rust-cargo): Cargo extension to generate a CycloneDX SBOM. | ||
* [cargo-bom](https://github.com/sensorfu/cargo-bom): Cargo extension to generate a BOM in an ASCII format including license information. | ||
* [cargo build-plan (#5579)](https://github.com/rust-lang/cargo/issues/5579): Provides an option to emit a JSON representation of the commands to execute, without actually running them. This option has poor integration with `build.rs` and was [planned for deletion](https://github.com/rust-lang/cargo/issues/7614) in 2018. | ||
* [cargo unit graph (#8002)](https://github.com/rust-lang/cargo/issues/8002): Very similar to what this RFC intends to write to disk. However, since unit-graph runs without a build, it cannot take `build.rs` output into account. | ||
|
||
# Unresolved questions | ||
[unresolved-questions]: #unresolved-questions | ||
|
||
The exact specifics about what will be included in the SBOM and the specific JSON format are subject to change during the implementation of the RFC. | ||
|
||
# Future possibilities | ||
[future-possibilities]: #future-possibilities | ||
|
||
## Industry standard format | ||
If the software industry converges on a single, stable SBOM format, Cargo could directly emit it. The existing SBOM formats are currently changing too much to standardize on a specific format. | ||
epage marked this conversation as resolved.
|
||
|
||
Additional fields can be added to the SBOM without a breaking change. | ||
arlosi marked this conversation as resolved.
|
||
|
||
## Build scripts | ||
Build scripts could communicate back to Cargo to inject additional dependencies into the SBOM. For example, if a crate builds C code and then links with it, it could emit a message that causes Cargo to read in a file describing the C dependency. | ||
``` | ||
cargo::sbom=<PATH> | ||
``` | ||
Cargo would then include the additional dependency information in the SBOM graph. | ||
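
As a rough illustration of the consuming side, the sketch below extracts such directives from a build script's standard output. The `cargo::sbom=<PATH>` directive exists only as the proposal in this section, and the function name `sbom_paths_from_build_output` is invented for this example; nothing here reflects an existing Cargo interface.

```python
def sbom_paths_from_build_output(stdout_text):
    """Return the paths named by `cargo::sbom=<PATH>` lines in build-script stdout."""
    prefix = "cargo::sbom="
    paths = []
    for line in stdout_text.splitlines():
        line = line.strip()
        # Ignore other directives and lines that give no path after the prefix.
        if line.startswith(prefix) and len(line) > len(prefix):
            paths.append(line[len(prefix):])
    return paths
```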
|
||
## Embedding dependency information into binaries | ||
The implementation of [RFC2801](https://github.com/rust-lang/rfcs/pull/2801) could be based on the information provided by this RFC. |