Limit table detail requests in controller UI#18615
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #18615 +/- ##
============================================
- Coverage 64.39% 64.36% -0.04%
Complexity 1137 1137
============================================
Files 3336 3336
Lines 206096 206096
Branches 32145 32145
============================================
- Hits 132721 132656 -65
- Misses 62714 62778 +64
- Partials 10661 10662 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
yashmayya
left a comment
There was a problem hiding this comment.
Nice idea throttling these — on big clusters the all-at-once fan-out is a real problem. One concern on the error path though (inline).
| const worker = async () => { | ||
| while (nextIndex < tasks.length) { | ||
| const currentIndex = nextIndex++; | ||
| results[currentIndex] = await tasks[currentIndex](); |
There was a problem hiding this comment.
If a task rejects, this await throws and the worker's while-loop exits — so that worker stops pulling tasks. After a handful of failures the whole pool is dead and the remaining tables never get their requests fired (rows stuck on 'Loading'). The old per-table forEach isolated failures. Maybe wrap the task call in try/catch here so one bad request doesn't starve the rest?
| } | ||
| ); | ||
|
|
||
| await Promise.all([segmentDetails, tableSizes]); |
There was a problem hiding this comment.
getTableSizes/getSegmentCountAndStatus reject when there's no response (e.g. a timeout — result.data is undefined), so this Promise.all rejects and bubbles all the way up to runWithConcurrencyLimit uncaught. That's exactly the large-cluster case this PR is meant to help. Probably want a .catch per request so a failing table just keeps its placeholder instead of taking down the rest.
|
|
||
| const nextTick = () => new Promise((resolve) => setTimeout(resolve, 0)); | ||
|
|
||
| test('runWithConcurrencyLimit caps in-flight tasks', async () => { |
There was a problem hiding this comment.
This only covers the all-resolve path — since a rejection kills the worker, a test where one task rejects is the valuable one to add. Also worth flagging: npm test is still the no-op stub, so this spec doesn't actually run in CI yet.
Summary
Root cause
The table listing loads all tables first, then starts detail requests for every table immediately. On clusters with hundreds of tables, that can overwhelm browsers with many concurrent
/tables/{table}/sizeand segment status requests, matching the behavior reported in #11370.Addresses #11370.
Testing
rm -rf /tmp/pinot-ui-request-limiter-test && ./node_modules/.bin/tsc app/utils/requestLimiter.ts app/utils/requestLimiter.test.ts --outDir /tmp/pinot-ui-request-limiter-test --module commonjs --target es2020 --types node --skipLibCheck --esModuleInterop && node --test /tmp/pinot-ui-request-limiter-test/requestLimiter.test.jsnpm run buildgit diff --checkNotes
./node_modules/.bin/eslint app/utils/requestLimiter.ts app/utils/requestLimiter.test.ts --quietcould not run cleanly because the current ESLint config references missing rules (react/jsx-key,import/no-extraneous-dependencies,react/jsx-filename-extension,import/extensions).