
[GatewayV2] Fix over-spanning issue when partial partition keys are provided.#49688

Open
jeet1995 wants to merge 9 commits into Azure:main from jeet1995:fix/thinclient-multihash-prefix-epk-overspan

Conversation

@jeet1995 jeet1995 commented Jul 1, 2026

Member

Fix thin-client over-span on MULTI_HASH prefix partition-key queries

Problem

Thin-client (GatewayV2, proxy :10250) requests against a hierarchical (MULTI_HASH) container that supplied only a prefix of the key paths (e.g. PK [tenantId, userId] with only tenantId supplied) over-spanned to every co-located document on the physical partition, returning other tenants' documents. Full partition keys and QueryPlan requests were unaffected.

Fix (ThinClientStoreModel.java)

On the prefix case (kind == MULTI_HASH && components < paths), compute the prefix EPK range and set these RNTBD headers on the request:

  • ReadFeedKeyType = EffectivePartitionKeyRange — tells the backend to interpret the EPK tokens as a range filter; without it the pair is ignored and the read falls back to the whole partition.
  • StartEpk / EndEpk — hex EPK bounds of the prefix range; the backend's doc-level filter that trims out-of-prefix documents.
  • StartEpkHash / EndEpkHash — binary hash bounds that steer to the correct physical partition.
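The prefix detection and range construction described above can be sketched roughly as follows. This is a minimal illustration, not the code in ThinClientStoreModel: the class and method names are hypothetical, the prefix EPK is assumed to be already available as an uppercase hex string, and the half-open range [hash(prefix), hash(prefix) + "FF") is taken from the commit message in this PR. The hash headers are left out here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: every name in this class is hypothetical, not SDK API. */
final class PrefixEpkHeaderSketch {

    /** Half-open EPK range [minInclusive, maxExclusive) expressed as hex strings. */
    static final class EpkRange {
        final String minInclusive;
        final String maxExclusive;

        EpkRange(String minInclusive, String maxExclusive) {
            this.minInclusive = minInclusive;
            this.maxExclusive = maxExclusive;
        }
    }

    /** The prefix case from the PR description: MULTI_HASH with fewer components than key paths. */
    static boolean isPrefix(String kind, int suppliedComponents, int definedPaths) {
        return "MULTI_HASH".equals(kind) && suppliedComponents < definedPaths;
    }

    /** Assumption: prefixEpkHex is the hex EPK computed from the supplied prefix components. */
    static EpkRange prefixRange(String prefixEpkHex) {
        return new EpkRange(prefixEpkHex, prefixEpkHex + "FF");
    }

    /** Header names follow the PR text; a plain string map stands in for the RNTBD request. */
    static Map<String, String> epkHeaders(EpkRange range) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("ReadFeedKeyType", "EffectivePartitionKeyRange");
        headers.put("StartEpk", range.minInclusive);
        headers.put("EndEpk", range.maxExclusive);
        return headers;
    }
}
```

With a two-path key such as [/tenantId, /userId], `isPrefix("MULTI_HASH", 1, 2)` is true and `isPrefix("MULTI_HASH", 2, 2)` is false, so only the tenantId-only request would take the range-header path.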

Core HPK tests (ThinClientQueryE2ETest)

Seed = 4 tenants × 6 users on [/tenantId, /userId]; every test is parity-checked against a Direct-mode baseline.

  • testHierarchicalPrefixHalfOpenRange: the query guard. Runs SELECT * FROM c ORDER BY c.idx with a prefix PK (tenantId only) → asserts exactly 6 documents (the tenant's users); with the full key → exactly 1.
  • testHierarchicalReadAllItemsPrefixPartitionKey: readAllItems prefix → exactly the tenant's 6 documents; a per-document assertion fails if any foreign tenant leaks in.
  • testHierarchicalReadManyByPartitionKeysPrefixPartitionKey: readMany prefix → stays scoped to 6.
  • testHierarchicalReadAllItemsFullPartitionKey: full key → 1 document (no regression).

jeet1995 and others added 2 commits July 1, 2026 09:48
For a partial (prefix) hierarchical (MULTI_HASH) partition key, the thin-client
(Gateway V2) store model previously sent only StartEpkHash/EndEpkHash routing
headers and no doc-level EPK filter. The proxy therefore resolved the request to
the owning physical partition and returned every co-located document (an
over-span) instead of only those matching the prefix.

ThinClientStoreModel.wrapInHttpRequest now detects a prefix MULTI_HASH key and
sets READ_FEED_KEY_TYPE=EffectivePartitionKeyRange, START_EPK and END_EPK on the
request headers before RntbdRequest.from(), so RntbdRequestHeaders serializes the
prefix EPK sub-range [hash(prefix), hash(prefix) + "FF") as the backend doc-level
filter (hex string as UTF-8 bytes), mirroring the Direct/RNTBD FeedRangeEpkImpl
path. StartEpkHash/EndEpkHash routing headers are retained. QueryPlan requests
and full (non-prefix) keys are unchanged.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
Ports ThinClientQueryE2ETest and its ThinClientTestBase helper. The suite
self-baselines every query against a Direct (RNTBD) client and the thin-client
(:10250 HTTP/2) path, asserting identical results. This directly guards the
MULTI_HASH prefix partition-key EPK over-span fix: prefix (single-component)
hierarchical-partition-key queries must return only the narrow matching set,
matching Direct, rather than over-spanning to all co-located documents.

Both files are runtime self-enabling (enableThinClientForTest() + builder-driven
HTTP/2 via setEnabled(true)); no module property or TestSuiteBase changes needed.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions Bot added the Cosmos label Jul 1, 2026
jeet1995 and others added 5 commits July 1, 2026 14:34
The earlier fix (499f928) only scoped the prefix MULTI_HASH read when the
request carried a non-null partition key. The parallel prefix-query path
intentionally nulls the partial hierarchical partition key and instead carries
the narrow prefix effective-partition-key range on the request's feedRange, so
that guard never fired there and the thin-client read still over-spanned to the
whole physical partition.

Mark such requests (prefixPartitionKeyQuery) in
ParallelDocumentQueryExecutionContextBase and, in ThinClientStoreModel, reuse
the already-computed FeedRangeEpkImpl range as the doc-level EPK filter
(START_EPK/END_EPK + StartEpkHash/EndEpkHash) instead of recomputing
getEPKRangeForPrefixPartitionKey from a partition key we no longer have. This
mirrors the Direct/GatewayV1 decision points and honors DRY/SRP.

Also relax assertThinClientEndpointUsed so an all-QueryPlan diagnostics context
passes: in thin-client mode QueryPlan calls resolve via the classic gateway and
are the only requests permitted on a non-thin-client endpoint; any non-QueryPlan
(data) request on a non-thin-client endpoint still fails the assertion.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
…ag on clone, add readAllItems/readMany prefix HPK tests

ThinClientStoreModel.wrapInHttpRequest now sets all five EPK-specific RNTBD headers (ReadFeedKeyType, StartEpk/EndEpk, StartEpkHash/EndEpkHash) in a single block directly on the RNTBD request, replacing the split request-header-map + post-construction approach. Wire encoding is unchanged (hex-string UTF-8 bytes for StartEpk/EndEpk, decoded hash bytes for the hash headers).
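The two wire encodings mentioned in this commit message can be illustrated with a small sketch. It assumes the EPK bounds are even-length hex strings; the helper names are hypothetical, and the exact decoding the SDK performs for the hash headers may differ from this straightforward hex-to-bytes conversion.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: hypothetical helpers, not the SDK's RNTBD serializer. */
final class EpkWireEncodingSketch {

    /** StartEpk/EndEpk style: the hex string itself, sent as UTF-8 bytes. */
    static byte[] epkAsUtf8(String hexEpk) {
        return hexEpk.getBytes(StandardCharsets.UTF_8);
    }

    /** StartEpkHash/EndEpkHash style: the hex string decoded into raw bytes. */
    static byte[] epkHashBytes(String hexEpk) {
        if (hexEpk.length() % 2 != 0) {
            throw new IllegalArgumentException("hex EPK must have an even length: " + hexEpk);
        }
        byte[] out = new byte[hexEpk.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hexEpk.charAt(2 * i), 16);
            int lo = Character.digit(hexEpk.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex string: " + hexEpk);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

For the input "0AFF", the first form yields four bytes (the characters '0', 'A', 'F', 'F') while the second yields two bytes (0x0A, 0xFF), which is why the two header families are not interchangeable on the wire.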

RxDocumentServiceRequest.clone() now copies prefixPartitionKeyQuery so hedged/availability-strategy/retry clones keep the prefix signal and do not over-span.
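Why the clone has to carry the flag can be seen in a toy model. The class below is a hypothetical stand-in for a request object, not RxDocumentServiceRequest; it only shows that a copy which forgets the flag would look like a non-prefix request to any later decision point.

```java
/** Illustrative only: a hypothetical request type, not RxDocumentServiceRequest. */
final class RequestCloneSketch {

    static final class Request {
        private final String resourcePath;
        private boolean prefixPartitionKeyQuery;

        Request(String resourcePath) {
            this.resourcePath = resourcePath;
        }

        boolean isPrefixPartitionKeyQuery() {
            return prefixPartitionKeyQuery;
        }

        void setPrefixPartitionKeyQuery(boolean value) {
            this.prefixPartitionKeyQuery = value;
        }

        /** Copy used for retries or hedged sends; the prefix signal must survive it. */
        Request copy() {
            Request clone = new Request(resourcePath);
            clone.prefixPartitionKeyQuery = this.prefixPartitionKeyQuery;
            return clone;
        }
    }
}
```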

ThinClientQueryE2ETest adds prefix-HPK regression coverage for readAllItems (ReadFeed) and readManyByPartitionKeys, asserting Direct-vs-thin-client parity and per-tenant scoping (no over-span).
…onContextBase

Assign isPrefixPartitionKeyQuery directly from isPartialPartitionKeyQuery(...) and guard only the PARTITION_KEY-setting branch with !isPrefixPartitionKeyQuery, removing the if/else. Behavior is unchanged: a partial (prefix) key still leaves partitionKeyInternal null so the prefix FeedRangeEpkImpl survives createDocumentServiceRequestWithFeedRange.
…cutionContextBase

Keep main's single 'if (!isPartialPartitionKeyQuery)' structure; only capture the predicate in isPrefixPartitionKeyQuery and pass it to request.setPrefixPartitionKeyQuery. Removes the if/else.
@jeet1995 jeet1995 marked this pull request as ready for review July 1, 2026 21:49
@jeet1995 jeet1995 requested review from a team and kirankumarkolli as code owners July 1, 2026 21:49
Copilot AI review requested due to automatic review settings July 1, 2026 21:49
@jeet1995 jeet1995 requested a review from a team as a code owner July 1, 2026 21:49

Copilot AI left a comment

Contributor

Pull request overview

Fixes a thin-client (Gateway V2 / :10250 proxy) routing bug where prefix hierarchical (MULTI_HASH) partition keys could over-span to all co-located documents on the physical partition, potentially returning documents from other logical partitions. The change scopes these requests by sending the correct RNTBD EPK-range headers and adds E2E coverage to prevent regression.

Changes:

  • Compute the prefix effective-partition-key (EPK) range for MULTI_HASH prefix keys and apply it via RNTBD headers (ReadFeedKeyType=EffectivePartitionKeyRange, Start/EndEpk, Start/EndEpkHash) in ThinClientStoreModel.
  • Introduce a request flag (prefixPartitionKeyQuery) to preserve prefix-query intent through the parallel query initialization path when the partition key is intentionally nulled.
  • Add thin-client E2E tests validating parity with Direct mode for hierarchical prefix queries, plus supporting test infrastructure and changelog entry.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/ThinClientStoreModel.java | Detect MULTI_HASH prefix PK requests and set EPK-range RNTBD headers to prevent over-span. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxDocumentServiceRequest.java | Add prefixPartitionKeyQuery flag with accessors + clone propagation. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/ParallelDocumentQueryExecutionContextBase.java | Set prefixPartitionKeyQuery when a partial hierarchical key is detected and PK header is suppressed. |
| sdk/cosmos/azure-cosmos/CHANGELOG.md | Document the thin-client prefix HPK over-span fix. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/rx/ThinClientTestBase.java | New shared thin-client test base + endpoint-routing assertions. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/rx/ThinClientQueryE2ETest.java | New Direct-vs-thin-client query E2E suite including hierarchical prefix regression tests. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/rx/TestSuiteBase.java | Update thin-client endpoint assertion behavior (noting QueryPlan can use classic gateway). |

In ThinClientStoreModel.wrapInHttpRequest, check the cached prefix feed-range path (isPrefixPartitionKeyQuery + FeedRangeEpkImpl) first and fall back to recomputing from the partition key only for the PK-bearing prefix case. Behavior is unchanged (the two cases are mutually exclusive).
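The branch order described in this commit can be summarized as a small decision function. This is a sketch under assumptions: the enum, the method, and the boolean inputs are hypothetical simplifications of the checks quoted elsewhere in this review, and the real method works on the request object rather than on flags.

```java
/** Illustrative only: a hypothetical decision helper, not the SDK's actual code. */
final class PrefixEpkSourceSketch {

    enum EpkSource {
        UNCHANGED_QUERY_PLAN,          // QueryPlan requests are left untouched
        CACHED_FEED_RANGE,             // parallel prefix path: reuse the FeedRangeEpkImpl range
        RECOMPUTED_FROM_PARTITION_KEY, // PK-bearing prefix path: recompute from the partial key
        PHYSICAL_PARTITION_RANGE       // everything else: routing by resolved partition range
    }

    static EpkSource choose(boolean isQueryPlan,
                            boolean prefixPartitionKeyQuery,
                            boolean feedRangeIsEpk,
                            boolean hasPrefixPartitionKey) {
        if (isQueryPlan) {
            return EpkSource.UNCHANGED_QUERY_PLAN;
        }
        // Checked first: the partition key was nulled upstream, the range lives on the feed range.
        if (prefixPartitionKeyQuery && feedRangeIsEpk) {
            return EpkSource.CACHED_FEED_RANGE;
        }
        // Fallback: the request still carries a partial hierarchical partition key.
        if (hasPrefixPartitionKey) {
            return EpkSource.RECOMPUTED_FROM_PARTITION_KEY;
        }
        return EpkSource.PHYSICAL_PARTITION_RANGE;
    }
}
```

Note that `choose(false, true, false, false)` lands on PHYSICAL_PARTITION_RANGE, which is the silent fall-through that the review comments in this thread are concerned about.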
@jeet1995 jeet1995 changed the title from "[Cosmos] Fix thin-client MULTI_HASH prefix partition key EPK over-span" to "[GatewayV2] Fix over-spanning issue when partial partition keys are provided." Jul 1, 2026
… assertion

TestSuiteBase.assertThinClientEndpointUsed now validates every request instead of early-returning on the first thin-client match, so a mixed scenario (some data requests via the classic gateway) fails; QueryPlan requests remain exempt and null endpoints are handled. ThinClientTestBase.assertThinClientEndpointUsed delegates to the shared TestSuiteBase implementation, removing duplicated logic that could NPE on a null endpoint and reject valid QueryPlan-via-gateway scenarios.
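The "validate every request, exempt QueryPlan" rule might look roughly like the sketch below. The RequestInfo type and the string-based endpoint check are hypothetical stand-ins for the diagnostics the real test helper inspects, and treating a null endpoint on a data request as a failure is an assumption; the commit message only says null endpoints are handled.

```java
import java.util.List;

/** Illustrative only: hypothetical types, not the TestSuiteBase implementation. */
final class EndpointAssertionSketch {

    static final class RequestInfo {
        final String operationType;
        final String endpoint; // may be null

        RequestInfo(String operationType, String endpoint) {
            this.operationType = operationType;
            this.endpoint = endpoint;
        }
    }

    /** Checks every request; only QueryPlan may use a non-thin-client endpoint. */
    static void assertThinClientEndpointUsed(List<RequestInfo> requests, String thinClientMarker) {
        for (RequestInfo request : requests) {
            if ("QueryPlan".equals(request.operationType)) {
                continue; // resolved via the classic gateway in thin-client mode
            }
            // Assumption: a data request without an endpoint counts as a failure, not an NPE.
            if (request.endpoint == null || !request.endpoint.contains(thinClientMarker)) {
                throw new AssertionError("data request not on thin-client endpoint: "
                    + request.operationType + " -> " + request.endpoint);
            }
        }
    }
}
```

Because there is no early return on the first thin-client match, a mixed context (one data request on :10250, another on the classic gateway) fails, while an all-QueryPlan context passes.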
jeet1995 commented Jul 1, 2026

Member Author

@sdkReviewAgent

@tvaron3 tvaron3 left a comment

Member

LGTM

// as the RNTBD EPK headers in the consolidated block after RntbdRequest.from() below.
Range<String> prefixEpkRange = null;
if (request.getOperationType() != OperationType.QueryPlan) {
    if (request.isPrefixPartitionKeyQuery() && request.getFeedRange() instanceof FeedRangeEpkImpl) {

Member

If getFeedrange is of different type - should we convert it to FeedRangeEpkImpl here?

@FabianMeiswinkel FabianMeiswinkel left a comment

Member

LGTM except for one possible gap: if the conversion to FeedRangeEpk has always already happened before this point, at least add a comment along those lines; if not, I think the conversion has to happen here.

* <li>Result set contents match (document IDs in order for ordered queries, set equality for unordered).</li>
* </ol>
*/
public class ThinClientQueryE2ETest extends TestSuiteBase {

Member

QQ: I think we should already have some HPK-related tests for gateway v1 and direct mode. Why create new test cases rather than also enabling those for gateway v2?

// StartEpkHash/EndEpkHash (the decoded hash bytes) additionally steer the proxy to the owning
// physical partition(s). Without this filter the proxy resolves the request to the owning
// physical partition and returns every co-located document (an over-span).
rntbdRequest.setHeaderValue(RntbdConstants.RntbdRequestHeader.ReadFeedKeyType,

@xinlian12 xinlian12 Jul 2, 2026

Member

QQ: how are these headers populated in gateway v1? At some point upstream we should already populate startEpk and endEpk. Why not decide which headers to populate based on whether those headers exist, similar to how direct mode populates them?

Also, why do we have to calculate whether this is a prefix here, only for thin client?

// request's feedRange (DocumentQueryExecutionContextFactory#resolveFeedRangeBasedOnPrefixContainer).
// Reuse that cached range directly instead of recomputing it from a partition key we no
// longer have.
prefixEpkRange = ((FeedRangeEpkImpl) request.getFeedRange()).getRange();

Member

🔴 Blocking · Correctness · Prefix over-span reintroduced after partition split

prefixEpkRange = ((FeedRangeEpkImpl) request.getFeedRange()).getRange();

The parallel-prefix branch here assumes request.getFeedRange() still carries the narrow prefix EPK range that DocumentQueryExecutionContextFactory.resolveFeedRangeBasedOnPrefixContainer produced. That is true on the initial dispatch, but it breaks on split:

  1. Parent producer has feedRange = FeedRangeEpkImpl(prefixEpkRange) (narrow).
  2. Backend returns GONE; DocumentProducer.produceOnFeedRangeGone calls getReplacementRanges(feedRange.getRange()), which returns the physical partition ranges overlapping the prefix.
  3. When multiple physicals overlap, createReplacingDocumentProducersOnSplit creates each child with new FeedRangeEpkImpl(targetRange.toRange()) — the full physical partition range, not the intersection with the parent prefix (DocumentProducer.java:325).
  4. On the child, createRequestFunc still evaluates isPartialPartitionKeyQuery(...) = true on the same query options, so request.setPrefixPartitionKeyQuery(true).
  5. In this branch we then read feedRange.getRange() and emit StartEpk/EndEpk/StartEpkHash/EndEpkHash = the entire physical partition range as the "prefix" doc filter.

Net result: after a real split (or a stale-routing-map GONE+refresh), the exact over-span this PR is fixing returns — silently, with no exception, no diagnostic log. Because partitionKeyInternal was intentionally nulled upstream, there is no fallback tenant filter either.

Suggested action: Either

  • On the split child, intersect the parent's prefix range with the child physical range before wrapping in FeedRangeEpkImpl (child gets a genuinely narrower EPK range), or
  • In createChildDocumentProducerOnSplit, clear the prefix flag on split-child requests and fall back to the point path (accept a small RU hit but preserve correctness), or
  • Recompute the prefix EPK range from cosmosQueryRequestOptions.getPartitionKey() + partitionKeyDefinition here when isPrefixPartitionKeyQuery() is true, and only use feedRange.getRange() if it is provably ⊆ that recomputed range.

Worth adding a fault-injection test that forces GONE on a prefix query and asserts no foreign-tenant docs come back.
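The first suggested action (intersecting the parent's prefix range with each child's physical range) can be sketched as below. This is a sketch under assumptions: ranges are half-open, bounds are uppercase hex strings compared with ordinary string ordering (with "" as the lowest bound and "FF" as the highest), and the class and method names are hypothetical rather than anything in DocumentProducer.

```java
import java.util.Optional;

/** Illustrative only: hypothetical range type and helper, not DocumentProducer code. */
final class EpkRangeIntersectionSketch {

    /** Half-open range [minInclusive, maxExclusive) over uppercase hex EPK strings. */
    static final class EpkRange {
        final String minInclusive;
        final String maxExclusive;

        EpkRange(String minInclusive, String maxExclusive) {
            this.minInclusive = minInclusive;
            this.maxExclusive = maxExclusive;
        }
    }

    /** Returns the part of the prefix range that falls inside the child's physical range. */
    static Optional<EpkRange> intersect(EpkRange prefix, EpkRange physical) {
        String min = prefix.minInclusive.compareTo(physical.minInclusive) >= 0
            ? prefix.minInclusive : physical.minInclusive;
        String max = prefix.maxExclusive.compareTo(physical.maxExclusive) <= 0
            ? prefix.maxExclusive : physical.maxExclusive;
        // An empty or inverted result means the child holds nothing for this prefix.
        return min.compareTo(max) < 0 ? Optional.of(new EpkRange(min, max)) : Optional.empty();
    }
}
```

For a prefix range ["0A1B", "0A1BFF") and a split at "0A1B80", the two children would get ["0A1B", "0A1B80") and ["0A1B80", "0A1BFF") instead of their full physical ranges, and a child whose physical range starts at "0B" would get no range at all.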

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

// as the RNTBD EPK headers in the consolidated block after RntbdRequest.from() below.
Range<String> prefixEpkRange = null;
if (request.getOperationType() != OperationType.QueryPlan) {
    if (request.isPrefixPartitionKeyQuery() && request.getFeedRange() instanceof FeedRangeEpkImpl) {

Member

🟡 Recommendation · Correctness · Silent fallback when the invariant breaks [Partially covered by @FabianMeiswinkel]

Building on Fabian's question just above ("If getFeedrange is of different type — should we convert it to FeedRangeEpkImpl here?"): converting isn't quite the answer, because the underlying Range<String> might still be the wrong range. The deeper issue is that today, if isPrefixPartitionKeyQuery() == true but the feedRange is not a FeedRangeEpkImpl or does not actually contain the prefix range, we fall through — and on the parallel path partitionKey == null (intentionally nulled upstream), so the PK-bearing branch at line 262 also can't fire. prefixEpkRange stays null and execution lands in the else at line 313, which emits StartEpkHash/EndEpkHash from the resolved physical partition range: the exact over-span this PR is fixing, silently, with no test signal.

Concrete scenarios where this fires today:

  1. After a split: DocumentProducer.createChildDocumentProducerOnSplit (DocumentProducer.java:325) wraps the physical partition range in FeedRangeEpkImpl while isPrefixPartitionKeyQuery stays true. The instanceof check passes, but the range is no longer a prefix. See the separate blocking comment on line 261.
  2. Future refactor: any wrapping/decorating of feed ranges (versioned FeedRange, composed feed range) would silently break this branch.

Suggested change: fail loud if the invariant is violated, rather than silently coercing.

if (request.isPrefixPartitionKeyQuery()) {
    FeedRangeInternal fr = request.getFeedRange();
    if (!(fr instanceof FeedRangeEpkImpl)) {
        throw new IllegalStateException(
            "prefixPartitionKeyQuery=true requires FeedRangeEpkImpl on the request, got: " + fr);
    }
    prefixEpkRange = ((FeedRangeEpkImpl) fr).getRange();
}

Also worth guarding against PartitionKey.NONE on the PK-bearing branch: PartitionKeyInternal.None.getComponents() is null. The caller in ParallelDocumentQueryExecutionContextBase guards != PartitionKey.NONE, but DocumentQueryExecutionContextFactory#resolveFeedRangeBasedOnPrefixContainer does not, so a NONE HPK query would NPE inside getEPKRangeForPrefixPartitionKey. Cheap defensive check while you're here.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

* matching Direct rather than over-spanning the physical partition.
*/
@Test(groups = {"thinclient"}, timeOut = TIMEOUT)
public void testHierarchicalReadManyByPartitionKeysPrefixPartitionKey() {

Member

🟡 Recommendation · Test Coverage · This test does not actually exercise the ThinClientStoreModel prefix-header fix

The test name and its Javadoc ("groupPartitionKeysByPhysicalPartition expands the prefix to its effective-partition-key range; the thin client must apply that as a doc-level filter") suggest this exercises the new StartEpk/EndEpk RNTBD path — but that path is never hit for readManyByPartitionKeys:

  • RxDocumentClientImpl.buildSequentialFluxFromScratch (RxDocumentClientImpl.java:4967) sets batchQueryOptions.setFeedRange(new FeedRangeEpkImpl(bd.partitionScope)) — the physical partition range, not batchFilter (the narrow prefix-derived batch range).
  • It does not set PartitionKey on batchQueryOptions. So downstream ParallelDocumentQueryExecutionContextBase.initialize sees cosmosQueryRequestOptions.getPartitionKey() == null, isPartialPartitionKeyQuery never runs, and request.setPrefixPartitionKeyQuery(false) is what ends up on the wire.
  • In ThinClientStoreModel.wrapInHttpRequest, neither the parallel-prefix branch (line 254, needs isPrefixPartitionKeyQuery()) nor the PK-bearing prefix branch (line 262, needs partitionKey != null) fires. The request falls through to the physical-range else at line 313.

Tenant scoping in readMany is instead enforced by the generated SQL predicate WHERE c.<hpkPath> = @... in ReadManyByPartitionKeyQueryHelper.createReadManyByPkQuerySpec. Revert the whole ThinClientStoreModel change and this test still passes — it's a routing + SQL-rewriting regression test, not a fix-guard.

Suggested action: Either rename/scope the test to be honest about what it actually validates, or add a diagnostics-level assertion that the x-ms-read-key-type=EffectivePartitionKeyRange header / StartEpk+EndEpk RNTBD tokens were emitted for the batch request (something that would flip if ThinClientStoreModel regressed). The testHierarchicalReadAllItemsPrefixPartitionKey / testHierarchicalPrefixHalfOpenRange tests do genuinely exercise the fix — this one gives false confidence.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

String[] categories = {"electronics", "books", "clothing", "toys"};
int idx = 0;
for (int t = 0; t < HIER_TENANTS; t++) {
    String tenantId = "tenant-" + t + "-" + UUID.randomUUID().toString().substring(0, 8);

Member

🟡 Recommendation · Test Coverage · Regression is not deterministically reproducible from this fixture

The three prefix regression tests (testHierarchicalPrefixHalfOpenRange, testHierarchicalReadAllItemsPrefixPartitionKey, testHierarchicalReadManyByPartitionKeysPrefixPartitionKey) all probe only hierarchicalKeys.get(0) — tenant-0 — and the per-doc tenant assertion ("returned a foreign tenant's document") only trips if some foreign tenant is co-located with tenant-0 on the same physical partition.

With this fixture:

  • Tenant IDs are "tenant-" + t + "-" + UUID.randomUUID() — hash placement is effectively random per run.
  • 4 tenants × 6 users = 24 docs on a 12,000 RU/s container (≈ 2 physical partitions).
  • With random assignment of 4 tenants across ~2 physicals, there is a non-trivial probability (roughly (1/2)^3 = 12.5% in the worst case) that tenant-0 is the only tenant on its physical partition. In that case, an over-spanning physical-partition read still returns exactly the 6 tenant-0 docs, and every assertion (count == 6, IDs match Direct, tenant == tenant-0) passes whether or not the fix is present.

That means on some fraction of CI runs the whole regression guard is inert. This weakens the "removing the fix must break CI" contract that these tests are supposed to enforce.
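The 12.5% figure can be reproduced with a short calculation. The sketch assumes each tenant lands on a physical partition independently and uniformly at random, which is a simplification of real hash placement; the class and method names are hypothetical.

```java
/** Illustrative only: a back-of-the-envelope model, assuming uniform independent placement. */
final class CoLocationOddsSketch {

    /**
     * Probability that the probed tenant shares its physical partition with no other tenant:
     * each of the other (tenants - 1) tenants must land on one of the other partitions.
     */
    static double probedTenantAlone(int tenants, int partitions) {
        if (tenants < 1 || partitions < 1) {
            throw new IllegalArgumentException("tenants and partitions must be positive");
        }
        return Math.pow((partitions - 1) / (double) partitions, tenants - 1);
    }

    public static void main(String[] args) {
        // The fixture in this PR: 4 tenants, roughly 2 physical partitions.
        System.out.println(probedTenantAlone(4, 2));
        // A larger fixture shrinks the chance that the guard is inert.
        System.out.println(probedTenantAlone(16, 2));
    }
}
```

Under this model the 4-tenant fixture leaves the probed tenant alone in one run out of eight, while 16 tenants on 2 partitions bring that down to (1/2)^15.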

Suggested action (any of):

  1. Iterate all tenants in the assertion loop (for (int t = 0; t < HIER_TENANTS; ...)): the probability that every probed tenant is alone on its physical partition drops to near zero.
  2. Bump HIER_TENANTS to ~16 so pigeonhole guarantees co-location per physical partition.
  3. After seeding, use Direct diagnostics (distinctPartitionKeyRangesContacted / activity from queryItems("SELECT c.<tenant> FROM c", crossPartition)) to bucket tenants by physical PKRangeId, pick a tenant proven to share a range with another, and run the assertions against that tenant. Fail-fast the test if no co-located pair exists (setup problem, not a product bug).
  4. Add a nonexistent-tenant prefix negative test — readAllItems with a made-up tenantId prefix should return 0 docs; without the fix it would return whatever co-located tenant's docs are in the resolved physical partition.

Option 4 is arguably the cleanest: it makes co-location irrelevant and directly exposes over-span.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

// The prefix's effective-partition-key sub-range [hash(prefix), hash(prefix) + "FF") is applied
// as the RNTBD EPK headers in the consolidated block after RntbdRequest.from() below.
Range<String> prefixEpkRange = null;
if (request.getOperationType() != OperationType.QueryPlan) {

Member

💬 Observation · Cross-SDK · Peer .NET thin-client likely has the same over-span bug

Not actionable for this PR, but worth flagging so the team can decide whether to file a peer issue on Azure/azure-cosmos-dotnet-v3: the .NET thin-client path appears to have the same defect this PR is fixing here.

  • ThinClientStoreClient.PrepareRequestForProxyAsync sets ProxyStartEpk/ProxyEndEpk from request.RequestContext?.ResolvedPartitionKeyRange.MinInclusive/.MaxExclusive — i.e. the resolved physical partition range.
  • ThinClientTransportSerializer.SerializeProxyRequestAsync then unconditionally propagates those values into HttpConstants.HttpHeaders.StartEpk / EndEpk (and WFConstants.BackendHeaders.StartEpkHash/EndEpkHash), with no branch that narrows to a prefix EPK range for partial MULTI_HASH keys. The upstream RequestInvokerHandler.ResolveFeedRangeBasedOnPrefixContainerAsync correctly computes a prefix FeedRangeEpk, but that narrower range is discarded by the time the RNTBD body is serialized.
  • .NET has no CosmosMultiHashTest variant that combines thin-client + prefix HPK (checked ThinClientTransportSerializerTests.cs and CosmosItemThinClientTests.cs).

Python and Rust are architecturally unaffected — neither serializes RNTBD tokens on the client; they send x-ms-start-epk / x-ms-end-epk as plain HTTP headers and the (correctly narrow) prefix range flows through.

If confirmed, this Java PR is a good reference implementation for the .NET-side fix.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

@xinlian12

Member

Review complete (57:48)

Posted 5 inline comment(s).

Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage
