Pagination from cache KEP #5017

Open: wants to merge 3 commits into base: master

serathius
Contributor

Create first draft of #4988 as provisional.

/cc @wojtek-t @deads2k @MadhavJivrajani @jpbetz

@k8s-ci-robot added the cncf-cla: yes, kind/kep, sig/api-machinery, and size/L labels on Dec 30, 2024
@dims
Member

dims commented Dec 30, 2024

cc @mengqiy @chaochn47 @shyamjvs

Since some pagination requests will still be delegated to etcd, we will monitor the
success rate by measuring the ratio of pagination cache hits to misses.

Consideration: Should we start respecting the limit parameter?
Contributor

Not sure I understand - are we not respecting the limit parameter in the current iteration?

Contributor Author

Currently API server doesn't respect limit when serving RV="0". https://github.com/kubernetes/kubernetes/blob/6746df77f2376c6bc1fd0de767d2a94e6bd6cec1/staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher.go#L806-L818

I think we should consider re-enabling limit for consistency; however, we need to better understand the consequences: the impact on client and server when we cannot serve pagination from cache, as with an L7 LB or pagination taking more than 75s.

I'm not worried about clients: a user setting limit should already be prepared to handle pagination, as it is required when not setting RV.

Member

I'm actually also worried about clients - I think I've seen cases of people doing that and relying on the behaviour of lack of pagination for RV=0.
I'm not saying it's a hard-no, but we need to figure out the story here.

That said, I would put it explicitly out-of-scope for this KEP and add that explicitly as future work.
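
To make the client-side concern concrete, here is a minimal sketch of a client loop that copes with either behaviour: a server that ignores limit and returns everything in one response (as described for RV="0"), and a server that paginates. The `page` and `lister` types and both fake servers are hypothetical stand-ins, not client-go APIs, and the sketch assumes limit is greater than zero.

```go
package main

import (
	"fmt"
	"strconv"
)

// page is a hypothetical list response: items plus an opaque continue token.
type page struct {
	items []string
	next  string // empty when there is nothing more to fetch
}

// lister is a hypothetical stand-in for a LIST call taking limit and continue.
type lister func(limit int, cont string) page

// listAll follows the continue token until the server stops returning one.
// It works whether the server honours limit or returns everything at once.
func listAll(list lister, limit int) []string {
	var all []string
	cont := ""
	for {
		p := list(limit, cont)
		all = append(all, p.items...)
		if p.next == "" {
			return all
		}
		cont = p.next
	}
}

var data = []string{"pod-a", "pod-b", "pod-c", "pod-d", "pod-e"}

// ignoresLimit models a server that returns one full response regardless of limit.
func ignoresLimit(limit int, cont string) page {
	return page{items: data}
}

// honoursLimit models a paginating server; the token is simply the next index.
func honoursLimit(limit int, cont string) page {
	start, _ := strconv.Atoi(cont) // an empty token parses as 0
	end := start + limit
	if end >= len(data) {
		return page{items: data[start:]}
	}
	return page{items: data[start:end], next: strconv.Itoa(end)}
}

func main() {
	// prints "5 5": both servers yield the same five items
	fmt.Println(len(listAll(ignoresLimit, 2)), len(listAll(honoursLimit, 2)))
}
```

A client written this way would not notice a change in limit handling. The risk raised above is about clients that read only the first response and assume it is complete.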


For setups with an L4 load balancer, the apiserver can be configured with GOAWAY, which
asks clients to reconnect periodically; however, the per-request probability should
be configured around 0.1%.
Contributor

Is there a reason for 0.1% here?

Contributor Author

0.1% is the recommended configuration for GOAWAY kubernetes/kubernetes#88567
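
For readers unfamiliar with the mechanism, a rough sketch of a per-request probabilistic GOAWAY follows. The filter name and structure are hypothetical and are not the apiserver's code; it relies on my understanding that Go's HTTP/2 server answers a `Connection: close` response header with a graceful GOAWAY.

```go
package main

import (
	"fmt"
	"math/rand"
	"net/http"
)

// shouldGoaway reports whether this request should ask the client to reconnect.
// draw is a uniform sample in [0, 1); chance is the per-request probability.
func shouldGoaway(proto string, chance, draw float64) bool {
	return proto == "HTTP/2.0" && draw < chance
}

// withProbabilisticGoaway is a hypothetical filter wrapping another handler.
func withProbabilisticGoaway(next http.Handler, chance float64) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if shouldGoaway(r.Proto, chance, rand.Float64()) {
			// Assumption: the HTTP/2 server turns this header into a GOAWAY frame.
			w.Header().Set("Connection", "close")
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	const chance = 0.001 // the 0.1% discussed above
	const total = 1_000_000
	hits := 0
	for i := 0; i < total; i++ {
		if shouldGoaway("HTTP/2.0", chance, rand.Float64()) {
			hits++
		}
	}
	fmt.Printf("%d of %d requests asked the client to reconnect (about %.0f expected)\n",
		hits, total, chance*total)
}
```

At 0.1% a connection is recycled roughly once per thousand requests on average, so a multi-page list has a small but non-zero chance of moving to another apiserver between pages.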


For L7 loadbalancer the default algorithm usually is round-robin. For most LBs
Contributor

MadhavJivrajani commented Jan 2, 2025

Trying to better understand this, is the worst case here as follows (assume 3 API Servers A, B, C):

  1. Client hits API Server A: assuming the rv is cached, snapshot is created on receiving a LIST request with a limit parameter set
  2. Client hits API Server B: no snapshot present, we delegate to etcd
  3. Client hits API Server C: no snapshot present, we delegate to etcd

so the performance degenerates to the current situation without cached pagination with slight improvement for (1)? And if (1) also delegates to etcd in case the rv isn't cached, then perf degenerates to the current scenario of no cached pagination?

Contributor

This ^ is assuming all are on a version that has support for pagination from the cache. If there is one server that is on a minor version which does not have support, my understanding is that that would again be delegated to etcd

Contributor Author

Right, there is no regression, assuming that:

  • We will delegate continue requests, that we don't have cached responses for, to etcd.
  • We will not change how API server doesn't respect limit for RV="0".
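
The worst case walked through in this thread can be sketched as a toy model, assuming each API server keeps its own snapshots keyed by resourceVersion and falls back to etcd on a miss. All type and function names here are hypothetical; the hit and miss counters stand in for the cache hit vs miss ratio the KEP proposes to monitor.

```go
package main

import "fmt"

// apiServer is a hypothetical model: snapshots are local to one server
// and keyed by resourceVersion.
type apiServer struct {
	name      string
	snapshots map[uint64][]string
}

// stats counts how continue requests were served.
type stats struct{ hits, misses int }

// serveContinue answers from the local snapshot if present, else from etcd.
func (s *apiServer) serveContinue(rv uint64, st *stats) string {
	if _, ok := s.snapshots[rv]; ok {
		st.hits++
		return "cache"
	}
	st.misses++
	return "etcd"
}

// roundRobin sends n continue requests for rv to the servers in turn.
func roundRobin(servers []*apiServer, rv uint64, n int) stats {
	var st stats
	for i := 0; i < n; i++ {
		servers[i%len(servers)].serveContinue(rv, &st)
	}
	return st
}

// threeServers builds servers A, B, C where only A holds a snapshot at rv 100.
func threeServers() []*apiServer {
	return []*apiServer{
		{name: "A", snapshots: map[uint64][]string{100: {"pod-a", "pod-b"}}},
		{name: "B", snapshots: map[uint64][]string{}},
		{name: "C", snapshots: map[uint64][]string{}},
	}
}

func main() {
	st := roundRobin(threeServers(), 100, 9)
	// prints "hits=3 misses=6": only the requests that land on A are served from cache
	fmt.Printf("hits=%d misses=%d\n", st.hits, st.misses)
}
```

Every miss is served the way it is today, which is why the thread concludes there is no regression, only a lower hit ratio.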

MadhavJivrajani and others added 2 commits January 3, 2025 12:39
kep-4988: flesh out cached pagination procedure
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: serathius
Once this PR has been reviewed and has the lgtm label, please assign apelisse for approval. For more information see the Code Review Process.


- Since resourceVersions provide a global logical clock sequencing all events in the cluster, a snapshot
  of the watchCache for this resourceVersion is retrieved using the resourceVersion as the key.
- The corresponding snapshot may not be present in the following 2 scenarios at an API Server:
  - Snapshot has been cleaned up due to the 75s TTL (see below).
Member

nit: we switched that recently, with 75s still being default, but it now depends on request timeouts
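
A minimal sketch of the lookup and expiry described in the quoted text, with the TTL passed in as a parameter rather than fixed at 75s, since the nit above says it now depends on request timeouts. The store, its methods, and the helper are hypothetical names; the real watchCache differs.

```go
package main

import (
	"fmt"
	"time"
)

// snapshot is a hypothetical cached list state plus its creation time.
type snapshot struct {
	items   []string
	created time.Time
}

// snapshotStore keeps snapshots keyed by resourceVersion for at most ttl.
type snapshotStore struct {
	ttl  time.Duration
	byRV map[uint64]snapshot
}

func newSnapshotStore(ttl time.Duration) *snapshotStore {
	return &snapshotStore{ttl: ttl, byRV: map[uint64]snapshot{}}
}

func (s *snapshotStore) add(rv uint64, items []string, now time.Time) {
	s.byRV[rv] = snapshot{items: items, created: now}
}

// get returns the snapshot for rv, or false if it is absent or has expired.
func (s *snapshotStore) get(rv uint64, now time.Time) ([]string, bool) {
	snap, ok := s.byRV[rv]
	if !ok {
		return nil, false
	}
	if now.Sub(snap.created) > s.ttl {
		delete(s.byRV, rv) // expired: drop it so the caller falls back to etcd
		return nil, false
	}
	return snap.items, true
}

// cachedAfter reports whether a snapshot is still served ageSeconds after creation.
func cachedAfter(ttlSeconds, ageSeconds int) bool {
	start := time.Unix(0, 0)
	s := newSnapshotStore(time.Duration(ttlSeconds) * time.Second)
	s.add(100, []string{"pod-a"}, start)
	_, ok := s.get(100, start.Add(time.Duration(ageSeconds)*time.Second))
	return ok
}

func main() {
	// prints "true false": served at 10s, expired at 76s with a 75s TTL
	fmt.Println(cachedAfter(75, 10), cachedAfter(75, 76))
}
```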


- Support indices when paginating.
- Eliminate all paginated list requests to etcd.

## Proposal
Member

I just very briefly skimmed through it and I don't see alternatives.
The main alternative that I have is that:

  1. we do btree.Clone() on every watch event (and store it until TTL) - i.e. independently if there was a request about it or not
  2. we can now serve every request (e.g. different kube-apiserver risk goes away) up to TTL from cache from any kube-apiserver (we just use largest RV not greater than the provided one)

Unless the overhead of it is substantial (which I don't know) - that would be my preferred option. But I think we need some experiment to estimate the overhead.
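
To illustrate the lookup this alternative implies, here is a small sketch that records a snapshot on every event and serves a request from the largest RV not greater than the requested one. A plain slice copy stands in for `btree.Clone()`, TTL pruning is left out, and all names are hypothetical; it also assumes events arrive in increasing RV order.

```go
package main

import (
	"fmt"
	"sort"
)

// versioned is a hypothetical immutable copy of the cache state at one RV.
type versioned struct {
	rv    uint64
	items []string
}

// history holds one snapshot per watch event, sorted ascending by rv.
type history struct {
	snaps []versioned
}

// record stores a copy of the state after an event. A real implementation
// would use a copy-on-write clone instead of copying the slice.
func (h *history) record(rv uint64, state []string) {
	cp := append([]string(nil), state...)
	h.snaps = append(h.snaps, versioned{rv: rv, items: cp})
}

// floor returns the snapshot with the largest rv not greater than the argument.
func (h *history) floor(rv uint64) (versioned, bool) {
	// Find the first snapshot strictly newer than rv; the one before it is the floor.
	i := sort.Search(len(h.snaps), func(i int) bool { return h.snaps[i].rv > rv })
	if i == 0 {
		return versioned{}, false
	}
	return h.snaps[i-1], true
}

// demoHistory replays three add events at RVs 10, 12 and 17.
func demoHistory() *history {
	h := &history{}
	var state []string
	events := []struct {
		rv   uint64
		name string
	}{{10, "pod-a"}, {12, "pod-b"}, {17, "pod-c"}}
	for _, ev := range events {
		state = append(state, ev.name)
		h.record(ev.rv, state)
	}
	return h
}

func main() {
	s, ok := demoHistory().floor(15)
	fmt.Println(s.rv, len(s.items), ok) // prints "12 2 true"
}
```

Because every server records the same events, any server could answer a continue request for any RV inside the retention window, at the cost of one clone per event. That cost is the overhead the experiment would need to measure.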


#### Memory overhead

No, the B-tree only stores pointers to the actual objects, not the objects themselves.
Member

Well - there is some overhead (as you also write below) - you just claim it's small (can we somehow quantify small?)

For an L7 load balancer the default algorithm is usually round-robin. For most LBs
it should be possible to switch the algorithm to one based on source IP hash.
If that is not possible, stored snapshots will never be used and the user will
not be able to benefit from the feature.
Member

I'm not sure we can expect providers to change their LB configuration...
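
For context on what source IP hashing buys when it is available, a minimal sketch: the same client address always maps to the same backend, so every page of a list reaches the server holding the snapshot. The function name and the FNV hash are illustrative assumptions, not how any particular load balancer implements it.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickBackend maps a source IP to one backend deterministically.
// backends must be non-empty.
func pickBackend(sourceIP string, backends []string) string {
	h := fnv.New32a()
	h.Write([]byte(sourceIP)) // Write on an FNV hash never returns an error
	return backends[int(h.Sum32()%uint32(len(backends)))]
}

func main() {
	backends := []string{"apiserver-a", "apiserver-b", "apiserver-c"}
	// Every page request from the same client lands on the same server.
	for page := 1; page <= 3; page++ {
		fmt.Printf("page %d -> %s\n", page, pickBackend("10.0.0.7", backends))
	}
}
```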

@wojtek-t wojtek-t self-assigned this Jan 10, 2025