
OLMv1: catalogd metas https endpoint proposal #1749

Open · grokspawn wants to merge 2 commits into master from catalogd-web-api

Conversation

grokspawn

No description provided.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 31, 2025
Contributor

openshift-ci bot commented Jan 31, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Contributor

openshift-ci bot commented Jan 31, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mandre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


## Proposal

This proposal introduces an additional HTTPS endpoint to an existing catalogd API. The existing HTTPS "all" endpoint will remain as a default option; the user will be able to enable this new capability via a feature gate.
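As a rough illustration of the difference between the two endpoints (the URL shapes follow the examples later in this proposal; the host and catalog name are hypothetical):

```python
from urllib.parse import urlencode

# Hypothetical base URL; real catalogd service addresses will differ.
BASE = "https://catalogd.example.svc/catalogs/operatorhubio"

def all_url() -> str:
    # Existing behavior: a single request returns the entire catalog
    # (~21 MB for the operatorhubio catalog).
    return f"{BASE}/api/v1/all"

def metas_url(**params: str) -> str:
    # Proposed behavior: the metas endpoint filters server-side via
    # query parameters such as schema, package, and name.
    return f"{BASE}/api/v1/metas?{urlencode(params)}"

print(all_url())
print(metas_url(package="foo"))
```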
Member:

Should we talk about how we plan to deprecate "all" once the new endpoint is GA?


This option would require clients to query the entirety of the data (~21 MB for the operatorhubio catalog) and parse the response to retrieve the relevant information every time they need the data. Even if clients implement some form of caching, the first query a client makes to the catalogd server is still a dealbreaker. In a highly resource-constrained environment (e.g. clusters on Edge devices), this becomes a chokepoint before clients can even get started.

- A “path hierarchy” based construction of API endpoints to expose filtered FBC metadata
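A hedged sketch of what such a path-hierarchy construction might look like (these paths are illustrative assumptions, not the design this proposal chooses):

```python
# Purely illustrative: in a path-hierarchy layout, filters are expressed as
# URL path segments rather than query parameters. None of these paths are
# part of the actual proposal.
def package_path(catalog: str, package: str) -> str:
    return f"/catalogs/{catalog}/api/v1/packages/{package}"

def channel_path(catalog: str, package: str, channel: str) -> str:
    return f"/catalogs/{catalog}/api/v1/packages/{package}/channels/{channel}"
```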
Member:

If we are worried about the fact that query endpoint responses will almost always be incomplete, a middle ground might be an endpoint that returns all of the FBC metadata for a specific package, but I'm not sure that endpoint would meet the latency requirements we're shooting for.

Author:

I think this comes down to whether we require the new endpoint to always provide valid FBC.
When we start revising FBC schemas I think we're going to have to juggle this.
For example, if we revise olm.package.v2 which uses its package field self-referentially, then we also get package-scoped valid FBC without a change to this endpoint.
But how does a client request this? Does it have to request the v2 schema specifically?

@grokspawn grokspawn changed the title catalogd query https endpoint proposal catalogd metas https endpoint proposal Feb 4, 2025
@grokspawn grokspawn force-pushed the catalogd-web-api branch 3 times, most recently from 1e97427 to 41f501b Compare February 11, 2025 15:05
@grokspawn grokspawn marked this pull request as ready for review February 11, 2025 15:05
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 11, 2025

### Non-Goals

* Redesigning FBC schema to facilitate additional efficiencies.
Member:

I am very curious what @spadgett and @TheRealJon think about whether this enhancement, on its own (i.e. without further FBC schema evolution), will be a significant enough improvement for Console's use cases, that it is reasonable to keep schema evolution out-of-scope.

I ask because we may need to include the schema evolution in scope in order to reference it in the graduation criteria for taking the combined OLM and Console changes for OLMv1 support to GA.

Author:

A fair take. All of this originated from our feature RFC, which was concerned solely with what our team was going to implement in this iteration.
That may not align with the scope of this process, though, and we can adjust as makes sense.

Member:

It's hard to know for sure whether this will make significant improvements to the network latency issues. We still require almost all of the FBC data in order to populate the catalog view as designed. We hope that making more strategic asynchronous requests will help. The data model issues still remain.


@spadgett (Member) left a comment:

Thanks, @grokspawn. Really appreciate all that you guys are doing to help the console team.

Comment on lines 386 to 389
The existing `all` endpoint also incentivizes clients to conserve resources via local cache to avoid making
many (potentially duplicate) requests. However, the OCP console proof of concept
required what was deemed an unsupportable amount of code and complexity to cache, decompose, and render the
complete FBC.
Member:

On duplicate requests: If using cache control headers, this is presumably a quick check if the catalog has been modified and a 304 response. If the catalog has changed, we would want to refresh the cache anyway, so the extra request is actually desirable?

In practice, I think console will either be fetching the entire catalog or just one item. Generally, users work with extensions they've already installed much more often than they'd install a new extension, so we'll usually be getting one item. It's a steep cost to download everything for a single item, even if we only do it once per session. And depending how often the catalog updates, it could be many times per session if we refresh the cache.

I think this is more about performance than code complexity (although complexity is a consideration as well).
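The conditional-request flow described above might look roughly like this sketch (the stub `server` function stands in for catalogd; a real client would use HTTP with `ETag`/`If-None-Match` headers):

```python
# Sketch of the client-side caching the comment above describes: repeat
# requests send If-None-Match, and a 304 response means the cached catalog
# is still fresh, so only the first fetch pays the full download cost.
class CatalogCache:
    def __init__(self):
        self.etag = None
        self.body = None

    def fetch(self, server):
        headers = {}
        if self.etag is not None:
            headers["If-None-Match"] = self.etag
        status, etag, body = server(headers)
        if status == 304:
            return self.body  # catalog unchanged: reuse the cached copy
        self.etag, self.body = etag, body  # catalog changed: refresh cache
        return body

# Hypothetical stub standing in for the catalogd HTTPS server.
def server(headers):
    current_etag = '"v1"'
    if headers.get("If-None-Match") == current_etag:
        return 304, current_etag, None
    return 200, current_etag, b"<full catalog>"

cache = CatalogCache()
first = cache.fetch(server)   # 200: full download
second = cache.fetch(server)  # 304: quick revalidation, cached body reused
```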

Author:

This was something that Joe mentioned in an earlier review and I interpreted as a drawback, but it feels more like expected behavior w.r.t. caching; also, frequently updated catalogs incur more network bandwidth for /either/ endpoint.

> 1. If a query comes in with `/api/v1/metas?package=foo`, should we include the blob with schema: `olm.package` and name: `foo`?

We feel that it is incorrect for the metas service endpoint to mutate the data model (specifically, to create a synthetic package attribute for the `olm.package` schema). To access all the data modeled for an installable package, separate queries need to be made for the package-level metadata (`schema=olm.package&name=foo`) versus the channel/bundle-level metadata (`package=foo`).
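A small sketch of the two queries described above (`metas_query` is a hypothetical helper; only the query strings themselves come from the proposal):

```python
from urllib.parse import urlencode

# Hypothetical helper for building metas query strings.
def metas_query(**params: str) -> str:
    return "/api/v1/metas?" + urlencode(params)

# olm.package blobs identify their package via `name`, not a `package` field,
# so assembling everything for package "foo" takes two queries:
pkg_level = metas_query(schema="olm.package", name="foo")  # the package blob itself
contents = metas_query(package="foo")                      # channels, bundles, etc.
```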
Comment:

This section could probably use a little more explanation as to why this is even a question, i.e., include the context that `olm.package` objects do not contain a `packageName` field.


#### Completeness
The existing `all` endpoint always returns valid FBC. The new service cannot make that promise,
so clients could make incorrect assumptions about the suitability of results. See Open Questions.
Comment:

Does a collection of valid FBC blobs not constitute valid FBC?

Author:

Not necessarily. Valid FBC blobs can include `metas?schema=olm.channel&name=foo`, but this will not include bundles or packages, and would fail an `opm validate` call. I'll add some language contrasting a "valid FBC element collection" with "well-formed FBC for a package and its constituents".
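To illustrate the distinction, here is a simplified stand-in for a subset of what `opm validate` checks (the blobs and the checks are illustrative assumptions, not the real validator):

```python
# Each blob below is an individually valid FBC element, but the collection a
# channel-only query returns is not well-formed FBC for package "foo": the
# olm.package blob and the bundles its channel entries reference are missing.
channel_only = [
    {"schema": "olm.channel", "package": "foo", "name": "stable",
     "entries": [{"name": "foo.v1.0.0"}]},
]

complete = channel_only + [
    {"schema": "olm.package", "name": "foo"},
    {"schema": "olm.bundle", "package": "foo", "name": "foo.v1.0.0"},
]

def is_well_formed(blobs):
    # Simplified stand-in for a subset of what `opm validate` verifies.
    packages = {b["name"] for b in blobs if b["schema"] == "olm.package"}
    bundles = {b["name"] for b in blobs if b["schema"] == "olm.bundle"}
    for b in blobs:
        if b["schema"] == "olm.channel":
            if b["package"] not in packages:
                return False  # channel's olm.package blob is missing
            if any(e["name"] not in bundles for e in b["entries"]):
                return False  # channel entry references a missing bundle
    return True
```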

Contributor

openshift-ci bot commented Feb 24, 2025

@grokspawn: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@grokspawn grokspawn changed the title catalogd metas https endpoint proposal OLMv1: catalogd metas https endpoint proposal Mar 19, 2025
@grokspawn (Author):

/label tide/merge-method-squash

@openshift-ci openshift-ci bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Apr 2, 2025

5 participants