Contributor

Alexander-Kita commented Nov 21, 2025

Fixes #7324

Proposed Changes

  • Pause scraping pods when the activator is in the data path (excess burst capacity < 0)
  • Resume scraping when excess burst capacity >= 0
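The pause/resume rule in the two bullets above can be sketched as a small state holder. This is only an illustration under stated assumptions: `scrapeGate` and `update` are hypothetical names that do not come from the Knative codebase, and the excess burst capacity value is taken as a plain input rather than computed.

```go
package main

import "fmt"

// scrapeGate is a hypothetical helper (not part of Knative) that remembers
// whether direct pod scraping is currently paused.
type scrapeGate struct {
	paused bool
}

// update applies the rule from the PR description: pause while excess burst
// capacity is negative (the activator is in the data path) and resume once it
// is >= 0. It returns true when the paused state changed.
func (g *scrapeGate) update(excessBurstCapacity int32) bool {
	shouldPause := excessBurstCapacity < 0
	changed := shouldPause != g.paused
	g.paused = shouldPause
	return changed
}

func main() {
	g := &scrapeGate{}
	for _, ebc := range []int32{10, -1, -5, 0, 3} {
		changed := g.update(ebc)
		fmt.Printf("ebc=%d paused=%t changed=%t\n", ebc, g.paused, changed)
	}
}
```

Returning whether the state changed is a design choice for the sketch: a caller could use it to start or stop a scraper only on transitions instead of on every evaluation.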

Feedback needed on the following items (more emphasis on the first):

  1. In the current implementation, because of #13027 ("Improve SKS handling for unavailable Activator"), metrics might be missed when excess burst capacity < 0 and there are no activator endpoints, since SKS forces "serve" mode in that case. Is there a way to surface the SKS mode ("proxy" or "serve") to the autoscaler, or another way to account for this?

  2. Writing unit tests for this situation
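One possible shape for both feedback items, offered as a sketch and not as the PR's implementation: a pure decision function that also takes the number of activator endpoints, plus a table of cases that a unit test could iterate over. `shouldScrapePods` and its `activatorEndpoints` parameter are hypothetical, and the sketch assumes the autoscaler can learn the endpoint count, which is exactly the open question in item 1.

```go
package main

import "fmt"

// shouldScrapePods is a hypothetical decision function (not Knative API).
// Assumption: with zero activator endpoints SKS forces "serve" mode, so the
// activator cannot report metrics and pods must still be scraped.
func shouldScrapePods(excessBurstCapacity int32, activatorEndpoints int) bool {
	if activatorEndpoints == 0 {
		return true
	}
	// Otherwise follow the PR rule: scrape only when the activator is
	// out of the data path (excess burst capacity >= 0).
	return excessBurstCapacity >= 0
}

func main() {
	// Table of cases in the style of a Go table-driven test.
	cases := []struct {
		name      string
		ebc       int32
		endpoints int
		want      bool
	}{
		{"capacity available", 5, 2, true},
		{"capacity exactly zero", 0, 2, true},
		{"activator in path", -1, 2, false},
		{"negative EBC but no activator endpoints", -1, 0, true},
	}
	for _, tc := range cases {
		got := shouldScrapePods(tc.ebc, tc.endpoints)
		fmt.Printf("%s: got=%t want=%t\n", tc.name, got, tc.want)
	}
}
```

Keeping the decision in a side-effect-free function means the edge case from item 1 can be covered without standing up a collector or fake scraper.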

Release Note

NONE

knative-prow bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Nov 21, 2025

knative-prow bot commented Nov 21, 2025

Hi @Alexander-Kita. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

knative-prow bot added the needs-ok-to-test (indicates a PR that requires an org member to verify it is safe to test) and size/M (denotes a PR that changes 30-99 lines, ignoring generated files) labels Nov 21, 2025

knative-prow bot commented Nov 21, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Alexander-Kita
Once this PR has been reviewed and has the lgtm label, please assign skonto for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


codecov bot commented Nov 21, 2025

Codecov Report

❌ Patch coverage is 34.61538% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.96%. Comparing base (090b6ae) to head (40751ea).
⚠️ Report is 39 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/autoscaler/metrics/collector.go | 22.72% | 16 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16254      +/-   ##
==========================================
- Coverage   80.05%   79.96%   -0.09%     
==========================================
  Files         215      215              
  Lines       13327    13354      +27     
==========================================
+ Hits        10669    10679      +10     
- Misses       2300     2315      +15     
- Partials      358      360       +2     

knative-prow bot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) and removed the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) Dec 4, 2025
Member

dprotaso commented Dec 4, 2025

/ok-to-test

knative-prow bot added the ok-to-test label (indicates a non-member PR verified by an org member that is safe to test) and removed the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) Dec 4, 2025
Member

dprotaso commented Dec 4, 2025

The e2e failures seem legit

Member

dprotaso commented Dec 4, 2025

After a quick look, I generally like the abstractions used, though I'm guessing something more nuanced is causing this change to break the e2e tests.


knative-prow bot commented Dec 4, 2025

@Alexander-Kita: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| istio-latest-no-mesh_serving_main | 40751ea | link | true | /test istio-latest-no-mesh |

Contributor Author

Alexander-Kita commented Dec 9, 2025

I believe I found what is causing these e2e failures. The activator (concurrency_reporter) appears to stop collecting metrics when it sees zero concurrency in the service:

From pkg/activator/handler/concurrency_reporter.go:

		// This is only 0 if we have seen no activity for the entire reporting
		// period at all.
		if report.AverageConcurrency == 0 {
			toDelete = append(toDelete, key)
		}

This check appears to trigger too early, so the activator stops sending metrics (since no concurrency is seen). Because pods are no longer scraped while the activator is in the path, this prevents the service from ever scaling to zero. To test this, I added a buffer (the reporter has to see zero 3 times before it stops), and the e2e test passed when I ran it. This behavior was probably hidden before because we were still scraping pods while the activator was in the path. How do you recommend I approach a solution to this, if one is still wanted? @dprotaso

Labels

do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.
size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Don't scrape pods via HTTP if the activator in path.