Validate target environments #2806
Checking environments adds an appreciable bump to the time to run.

We could, of course, assume that environments are immutable, and key the cache by package reference instead of digest. But that would certainly be an assumption, and not guaranteed to be true.
This is now sort of a thing and can possibly be looked at. Outstanding questions:
Possibly longer term questions:
If folks want to play with this, add the following to your favourite manifest:

```toml
[application]
targets = ["spin:[email protected]"]
```

and set your wkg config (`~/.config/wasm-pkg/config.toml`) to:

```toml
default_registry = "registrytest-vfztdiyy.fermyon.app"

[registry."registrytest-vfztdiyy.fermyon.app"]
type = "oci"
```

(No, that is not the cat walking across the keyboard... this is my test registry, which backs onto my ghcr.)
I'd suggest "next to the code that implements them"; ideally generated by that code.
crates/build/src/manifest.rs
Outdated
```rust
let dt = deployment_targets_from_manifest(&manifest);
Ok((bc, dt, Ok(manifest)))
}
Err(e) => {
```
It's not obvious to me what's happening here. It reads like we're trying to get build and deployment configs even when we can't parse the manifest? I think some terse one-line comments on each branch of this `match` would go a long way to making this a bit more understandable.
Reading the code more, I'm unsure why we go to such great lengths to read the targets config here when later on we only seem to run the targets check if the manifest was successfully parsed. Won't this information just be thrown away?
@rylev You are absolutely right - this was a holdover from an earlier iteration before I realised I was going to need the full manifest - thanks for catching it. I've pared this back to "has deployment targets", which I think is worth keeping so we can warn if manifest errors have caused us to bypass checking.
crates/build/src/manifest.rs
Outdated
```rust
Ok((bc, dt, Ok(manifest)))
}
Err(e) => {
    let bc = fallback_load_build_configs(&manifest_file).await?;
```
It seems like there's some need for better error messages here, through liberal use of `.context`. As the code stands now, the `component_build_configs` function might return an error saying only "expected table, found some other type", which would be very confusing.
On reflection these should preserve (and immediately return) the original manifest load error rather than blatting it with whatever went awry during the fallback attempt.
crates/build/src/manifest.rs
Outdated
```rust
let table: toml::value::Table = toml::from_str(&manifest_text)?;
let target_environments = table
    .get("application")
    .and_then(|a| a.as_table())
    .and_then(|t| t.get("targets"))
    .and_then(|arr| arr.as_array())
    .map(|v| v.as_slice())
    .unwrap_or_default()
    .iter()
    .filter_map(|t| t.as_str())
    .map(|s| s.to_owned())
    .collect();
```
Would deserializing into a type via `serde::Deserialize` make this easier to read?
crates/build/src/manifest.rs
Outdated
```
@@ -57,6 +75,30 @@ async fn fallback_load_build_configs(
    })
}

async fn fallback_load_deployment_targets(
```
Nit: is "deployment targets" the right nomenclature? That sounds like what you would be targeting for `spin deploy`, and not the environment you're targeting. Perhaps we could play with the wording here?
@rylev I'm very open to naming on "deployment targets." The current name emerged in a woolly manner from a sense of "these are possible targets for deployment" or "the target environments we want to be able to deploy to." I do feel, albeit not strongly, that it's worth specifying 'deployment' (cf. e.g. 'compilation target') - but for sure let's improve this!
"runtime targets"?
```rust
async fn load_component_source(&self, source: &Self::Component) -> anyhow::Result<Vec<u8>>;
async fn load_dependency_source(&self, source: &Self::Dependency) -> anyhow::Result<Vec<u8>>;
```
Are we implementing `ComponentSourceLoader` more than once? I'm a bit lost as to why we need the flexibility of defining the type of dependency and component instead of hard-coding them. Can you explain what this buys us?
The current build system doesn't create a lockfile. The current composition implementation depends on the locked types (because it runs at trigger time). This allows us to implement composition on the raw AppManifest as well as on the LockedApp.
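A minimal sketch of the flexibility being described (simplified to synchronous code, with `String` standing in for the real source types; all names are illustrative, not the actual Spin API): the same generic composition routine can be driven either by a manifest-backed loader or a lockfile-backed one:

```rust
// Illustrative sketch: the same composition routine can run before a
// lockfile exists (over raw AppManifest sources) or at trigger time
// (over LockedApp sources), because the loader abstracts over what a
// "component" and a "dependency" actually are.
trait ComponentSourceLoader {
    type Component;
    type Dependency;
    fn load_component_source(&self, source: &Self::Component) -> Vec<u8>;
    fn load_dependency_source(&self, source: &Self::Dependency) -> Vec<u8>;
}

struct ManifestLoader; // would load from raw manifest entries (paths, URLs)
struct LockedLoader;   // would load from locked entries (content digests)

impl ComponentSourceLoader for ManifestLoader {
    type Component = String; // stand-in for a manifest source
    type Dependency = String;
    fn load_component_source(&self, source: &Self::Component) -> Vec<u8> {
        source.as_bytes().to_vec() // placeholder for reading the file
    }
    fn load_dependency_source(&self, source: &Self::Dependency) -> Vec<u8> {
        source.as_bytes().to_vec()
    }
}

// A composition step written once, generic over the loader.
fn compose<L: ComponentSourceLoader>(loader: &L, component: &L::Component) -> Vec<u8> {
    loader.load_component_source(component)
}
```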
```rust
    "#
);

let doc = wac_parser::Document::parse(&wac_text)
```
We really should implement a programmatic API in `wac` for this, so that we don't need to manipulate a wac script.
Yeah! I tried constructing a wac document AST, which seemed cleaner than parsing a document, but I foundered on - if memory serves - something as trivial as the `miette` spans. I tried it with dummy spans, but something internal seemed to be trying to use the length of the name in the error, or something, and anyway there was anguish. Using the underlying model would have been awesome, but was (for me) hard to understand and to line up inputs for.
crates/environments/src/loader.rs
Outdated
```rust
}

pub async fn load_and_resolve_all<'a>(
    app: &'a spin_manifest::schema::v2::AppManifest,
```
Nit: it seems like we pass `app` and `resolution_context` around a lot. It might be nicer to put those into a `Resolver` struct and implement these functions as methods on that struct.
I had an explore of this, and have pushed something like it as a separate commit for easy reversion. I ended up naming the proposed struct `ApplicationToValidate`, because otherwise I ended up with no "application" visible in a module that was ostensibly all about validating applications. I still have mixed feelings, but there are definitely some nice aspects to encapsulating the component-loading stuff - I was able to tuck some more clutter away and split things up in a way that's hopefully more readable...
@itowlson I think that's an okay assumption to make, personally. Perhaps paired with a way to flush the local cache explicitly?

We could also consider introducing something like
Okay I have pencilled in

Aren't environments also just packages, though? Ones that we apply some additional semantics to, yes, but not ones that are somehow non-standard. Which actually raises the question: could we use the exact packages to generate bindings for Spin and the SDKs as well? (Not necessarily as part of this effort, but as a future thing.)
Implemented caching on the basis of immutability as per Till's comment #2806 (comment) (and double checked on Windows).

I'll have a think about the lockfile idea. A lockfile mapping environment IDs to digests would allow us to reuse the existing Wasm by-digest caching infra instead of having something custom for name-based lookup. If we generated it into

ETA: I implemented Till's lockfile-based cache strategy. It does make some things cleaner for sure. I've done it as a separate commit for now, but will squash once we agree on it.
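A rough sketch of the lockfile idea as discussed (std-only; the names and shapes are illustrative, not the actual implementation): mutable environment IDs resolve through the lockfile to immutable digests, which the existing by-digest cache can then serve without touching the registry:

```rust
use std::collections::HashMap;

// Hypothetical lockfile: maps mutable environment IDs to immutable
// content digests, e.g. "spin:[email protected]" -> "sha256:...".
struct EnvLockfile {
    digests: HashMap<String, String>,
}

// Hypothetical by-digest cache: digest -> package bytes.
struct DigestCache {
    entries: HashMap<String, Vec<u8>>,
}

impl EnvLockfile {
    /// Resolve an environment ID to cached bytes without a registry
    /// round trip; `None` means we must fetch and record the digest.
    fn lookup<'a>(&self, cache: &'a DigestCache, env_id: &str) -> Option<&'a Vec<u8>> {
        let digest = self.digests.get(env_id)?;
        cache.entries.get(digest)
    }
}
```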
Signed-off-by: itowlson <[email protected]>
I did some unscientific performance testing with this. Validation for local or HTTP components is quick once registry targets are cached. If they have local dependencies, then composition before validation is also quick. However, if a component has a registry dependency, then that seems to be not cacheable, and that does introduce a perceptible delay. I recall in the previous work on this I tested whether I could speed it up by checking the registry digest and re-downloading only if changed, but it turned out that even just fetching the digest was slow. So I am not sure of the way round this.

(ETA: for clarity: the initial download of a registry env is a perceptible delay; and we do not currently check for updates (because of the perf hit). You need to delete the lockfile to force re-download. So environments, once published, must be stable.)
Yes and no. I found that the canonical

We may also find it tricky to manage trigger plugins if we try to unify. E.g. would we want the world in the Spin repo to list the cron trigger export? But the target environments thing is going to have to allow for it, I expect.
The environment package I am using for testing: https://github.com/itowlson/env-wits

The

(It is published at
Okay, another nasty with the "environments are just packages" thought process is that environments do a bit more than "world" packages, in that they have to map triggers to worlds. This bit me while trying to do a wasi-http environment. There's nothing in the vanilla wasi:http package that says "this world handles the HTTP trigger." At the moment I use a naming convention for worlds to say "this world is supported by that trigger," but that's not going to work for a world called

A possible approach is to search through all worlds in the package and see if the component matches any of them, but you often find worlds in the deps, and that doesn't mean Spin supports them. For example, the Spin deps tree contains the

On the other hand, it would be really nice to unify, because it would be really nice to identify target environments with packages rather than having to do some evasive importing and dodgy namespacing (cf. https://github.com/itowlson/env-wits/blob/main/wasi-http-0.2/world.wit which is pretty obnoxious). So I am very much open to ideas here - this may be yet another time I have gotten too close to the problem and can't see the obvious solution!

(ETA: although I am not sure wasi:http is a useful world for this, because Rust at least seems unable to elide its wasi:environment import, so 🤷; perhaps it would be better to define the minimal world as
I think that's not going to work, and that we'll instead have to have some kind of mapping. Even if we were okay with requiring trigger and/or world naming schemes that would make such a mapping viable, it'd not account for versions: imagine a situation where people start generating WASIp3 HTTP components, and then try to run them in a version of Spin that doesn't yet support p3. A name-based mapping would make the check pass, whereas in reality we couldn't run the component.

Would it be possible to change the definitions of triggers to include a set of target world names and versions? I can't really think of another viable solution, and this would also address the issue with the fileserver component you mentioned above.
If we allow triggers to say which target worlds they support, then we'd presumably also allow trigger plugins to do so. I had always assumed that we'd support fetching target world definitions from more than a single place, so my expectation would be that the plugin would say which world(s) it supports, and where to find their definitions.
Can you say more about why that's not cacheable? Shouldn't we cache registry dependencies entirely independently from target worlds, because you absolutely don't want to refetch them all the time?
What's different about how we use dependencies from how something like npm or cargo does? For registry dependencies we have a version number, and I think it's more than reasonable not to check whether the registry changed its mind and decided to send something different for the same version.
I'm not sure what you're envisaging. Are you suggesting Spin should have hardwired knowledge of all trigger type names and what worlds they map to?
Sorry, it is cacheable and indeed cached, but it is cached by digest. So until we know the digest...
I dunno about npm but Cargo versions are (as you know) immutable. Once you've got serde 1.2.3, you can rely on it never changing, never needing to be re-downloaded. OCI registries don't offer that guarantee.
This would be a change of behaviour for composition, and given that registry tags are mutable, could lead to undesired outcomes if the registry does change its mind. I guess the user could delete their cache. I can do this if people favour it, but it is not what composition has done in the past.
I feel like we've had a lot of design discussions before and this has not been expressed. I'm sorry I didn't communicate what I'd been building more clearly, but I did think we had been over the "what does an environment look like and why is it not a world" topic, and I thought I'd described how I proposed to tackle it, but maybe that was to someone other than you. We had previously talked about identifying an environment by a string such as
Oh, is the idea that Spin should invoke the trigger binary with a
Not Spin, but a trigger implementation should. And does, in fact, because it contains bindings generated from WIT worlds. I don't want to trivialize the effort required to make this information available in validation, but I do think that this isn't anything fundamentally novel.
Ah, serves me right for replying before reading all comments, sorry. That seems like a viable option, yes. Perhaps done only once when installing a trigger plugin, instead of at validation time.
We absolutely should, and I agree that now seems to be a very good time to do so. To the particular point of target worlds: I don't see how we could centralize them without also centralizing trigger plugins, which doesn't seem desirable. I'm actually pretty sure that I undercommunicated about this, and want to apologize for that.
Centralisation would be undesirable for sure, but I'm not talking about centralisation. E.g. the current implementation allows you to specify a registry-package pair, so anyone can host an environment definition. But I am struggling to understand how we reconcile "an environment is a package" with "fetch definitions from multiple places." Sorry for the lack of clarity.
Does all tooling call home for all artifacts all the time, then? That seems very unfortunate if so. And if not, are there patterns we could follow? But also, doesn't this mean we have to do the digest check when running
I believe we do do the digest check at
In that case, is there any potential for combining these checks? Alternatively we could consider caching the results at least for a few minutes, so we'd effectively combine them in most cases:
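A sketch of the short-lived cache idea (std-only; all names are hypothetical): remember the result of a digest check for a few minutes, so that `spin build` followed shortly by `spin up` performs only one registry round trip:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical TTL cache for digest checks:
// registry reference -> (when we checked, digest we saw).
struct DigestCheckCache {
    ttl: Duration,
    checked: HashMap<String, (Instant, String)>,
}

impl DigestCheckCache {
    /// Return the remembered digest if the last check is still fresh.
    fn get(&self, reference: &str, now: Instant) -> Option<&str> {
        let (when, digest) = self.checked.get(reference)?;
        (now.duration_since(*when) < self.ttl).then(|| digest.as_str())
    }

    /// Record the result of a fresh registry digest check.
    fn record(&mut self, reference: &str, digest: &str, now: Instant) {
        self.checked
            .insert(reference.to_string(), (now, digest.to_string()));
    }
}
```

Within the TTL, a second command reuses the answer; after it lapses, the next lookup misses and forces a real check.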
So this takes us back into "what is

There is certainly scope in the

As a related (but different) example, #3117 asks us to validate component composition as part of

In a sense the trouble is that getting from source to running in Spin has (at least) three phases:
And right now, we strongly couple the load-and-munge step with the execute step, to the point where you cannot do the load-and-munge step without doing the execute step. (Well, you can. And I have no idea what that looks like, yet. But that's where the desire for pre-

Sorry, I know this seems like a bit of a wild ride from a simple "hey let's cache things", but hopefully it gives some sense of what's going through my mind when people talk about doing more in build. And these are early thoughts and very much in flux, so yeah, not sure about anything really.
It occurs to me that the long ramble above also ties in with some of the discussion/concerns around the multiple build profiles SIP.
Okay, on the subject of centralising and decentralising and getting WITs from different sources: My original stab at this (okay, one of my original stabs at this) was that an "environment" was not a WIT package but a document which mapped triggers to possible WIT worlds. This allowed for the "I want to target Spin 3.2" declaration while not requiring worlds to adhere to any convention or be hosted in the same place or whatever. E.g.

```toml
# spin:[email protected] target (example only!), hosted on e.g. spinframework.dev
# you'd reference this as `targets = ["spin:[email protected]"]`
http = ["spin:up/[email protected]", { registry = "fermyon.com", world = "fermyon:spin/[email protected]" }] # any of these is acceptable
redis = ["fermyon:spin/inbound-redis"]
default = { source = "trigger" } # runs e.g. spin trigger whatever --world-me-up for the WIT
```

```toml
# wasi:[email protected] target (example only!), hosted on e.g. wasi.dev
# you'd reference this as `targets = [{ registry = "wasi.dev", name = "wasi:[email protected]" }]`
http = [{ registry = "bytecodealliance.org", package = "wasi:http/[email protected]" }]
```

We moved away from this because of the wish for the environment to be itself a WIT package, allowing it to be used for bindings etc. But that puts us into the sticky situation of needing to infer a relationship between triggers and worlds. It's not enough to say "here is the complete
The current PR (at time of writing) uses a naming convention to express this relationship, but this is problematic (shorthand for "I found it inconvenient" or "Till recoiled in horror", take your pick). We really just want to be able to point to the

There are a couple of places where I don't think we can do the mappings:
Again, sorry for the rambly post - I am just trying to get down the forces at work and the thought processes and options which I've explored.
EXTREMELY WIP. There are plenty of outstanding questions here (e.g. environment definition, Redis trigger, hybrid componentisation), and reintegration work (e.g. composing dependencies before validating) (ETA: component dependencies now work). In its current state, the PR is more for visibility than out of any sense of readiness. Also, it has not had any kind of tidying pass, so a lot of naming is not great. But it does work, sorta kinda, some of the time. So we progress.
cc @tschneidereit