perf(cosmos): strip unused fields from partition key range cache to reduce memory#46297

Draft
tvaron3 wants to merge 2 commits into Azure:main from tvaron3:fix/strip-pk-range-fields

@tvaron3 tvaron3 commented Apr 14, 2026

Summary

This PR makes three optimizations that reduce the CollectionRoutingMap memory footprint when PPCB (Per Partition Circuit Breaker) is enabled. Each CosmosClient maintains its own routing map cache containing all partition key ranges, so for accounts with many partitions and many client instances these caches dominate memory usage.

Changes

1. Strip unused fields → compact PKRange namedtuple

_routing/aio/routing_map_provider.py + _routing/routing_map_provider.py

The service returns 13 fields per partition key range, but CollectionRoutingMap uses only 4 of them (id, minInclusive, maxExclusive, parents). After fetching, we now convert each range to a PKRange namedtuple that supports dict-style [key] access for backward compatibility.

Dropped fields: _rid, _etag, ridPrefix, _self, throughputFraction, status, ownedArchivalPKRangeIds, _ts, lsn
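A minimal sketch of what such a compact record could look like, assuming a namedtuple subclass that also accepts string keys. The helper name `to_pk_range` and the sample values are hypothetical and are not taken from the PR diff:

```python
from collections import namedtuple

# The four fields that CollectionRoutingMap actually reads.
_PKRangeBase = namedtuple(
    "_PKRangeBase", ["id", "minInclusive", "maxExclusive", "parents"]
)


class PKRange(_PKRangeBase):
    """Compact partition key range that still allows pk_range["id"]."""

    __slots__ = ()  # keep instances free of a per-instance __dict__

    def __getitem__(self, key):
        # String keys emulate the dict returned by the service.
        if isinstance(key, str):
            if key in self._fields:
                return getattr(self, key)
            raise KeyError(key)
        # Integer and slice keys keep normal tuple behavior.
        return super().__getitem__(key)


def to_pk_range(raw):
    """Hypothetical helper: keep only the four fields that are used."""
    return PKRange(
        id=raw["id"],
        minInclusive=raw["minInclusive"],
        maxExclusive=raw["maxExclusive"],
        parents=raw.get("parents"),
    )


# Sample values are made up for illustration.
raw_range = {
    "id": "0",
    "minInclusive": "",
    "maxExclusive": "05C1DF",
    "parents": [],
    "_etag": "dropped",
    "_ts": 0,
}
compact = to_pk_range(raw_range)
print(compact["id"], compact.maxExclusive)
```

Code that reads a dropped field such as `compact["_etag"]` gets a `KeyError` in this sketch, which is one way to surface any remaining consumer of the stripped fields.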

2. Add __slots__ to Range class

_routing/routing_range.py

Range objects store 4 instance attributes (min, max, isMinInclusive, isMaxInclusive). Adding __slots__ eliminates the per-instance __dict__, saving ~100 bytes per Range object. With 100 partitions x 150 clients, that applies to 15K Range objects.

3. Skip redundant .upper() on hex strings

_routing/routing_range.py

Range.__init__ calls .upper() unconditionally on min/max strings. The Cosmos service returns uppercase hex (e.g. 10F0F0F0...). We now check first and skip the copy when already uppercase.
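The check can be sketched as follows. The function name `upper_if_needed` is hypothetical, and the PR may implement the test differently:

```python
def upper_if_needed(value):
    """Return value unchanged when it is already uppercase."""
    # str.isupper() is True only if there is at least one cased character
    # and all cased characters are uppercase. Strings with no cased
    # characters (empty or digits only) fall through to .upper(), which
    # is still correct, just not skipped.
    return value if value.isupper() else value.upper()


already_upper = "05C1DF"
mixed_case = "05c1df"

print(upper_if_needed(already_upper) is already_upper)
print(upper_if_needed(mixed_case))
```

When the input is already uppercase, the same string object is returned, so no second copy of the boundary string is retained.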

Memory Profiling Results

Test setup:

  • Account: ~100 physical partitions, 2 regions (East US 2 + West US 3), multi-write
  • VM: Standard_D16s_v5
  • Tool: tracemalloc (retained memory)
  • Operations per client: 1 read_item + 1 upsert_item
  • PPCB: AZURE_COSMOS_ENABLE_CIRCUIT_BREAKER=True

Current Memory (MB)

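| Clients | 4.15.0 Original | Strip Only | All 3 Patches | PPCB=false |
|---------|-----------------|------------|---------------|------------|
| 1       | 14.3            | 14.3       | 14.3          | 14.0       |
| 25      | 23.0            | 20.5       | 20.0          | 17.9       |
| 50      | 31.9            | 27.4       | 25.8          | 21.7       |
| 100     | 44.9            | 39.9       | 36.6          | 29.4       |
| 150     | 63.8            | 52.9       | 43.3          | 36.4       |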

PPCB Overhead Reduction

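| Clients | Original | Strip Only | All 3 Patches | Reduction |
|---------|----------|------------|---------------|-----------|
| 25      | 5.1 MB   | 2.6 MB     | 2.1 MB        | -58%      |
| 50      | 10.3 MB  | 5.7 MB     | 4.1 MB        | -60%      |
| 100     | 15.4 MB  | 10.5 MB    | 7.2 MB        | -53%      |
| 150     | 27.4 MB  | 16.5 MB    | 6.9 MB        | -74%      |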
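The overhead figures appear to be each column of the "Current Memory (MB)" table minus the PPCB=false baseline for the same client count. That is an inference from the numbers, not something the description states. A small sketch that recomputes them from the rounded table values:

```python
# Rows copied from the "Current Memory (MB)" table:
# clients -> (original, strip_only, all_patches, ppcb_off)
current_mb = {
    25: (23.0, 20.5, 20.0, 17.9),
    50: (31.9, 27.4, 25.8, 21.7),
    100: (44.9, 39.9, 36.6, 29.4),
    150: (63.8, 52.9, 43.3, 36.4),
}


def ppcb_overhead(row):
    """Assumed definition: memory with PPCB on minus memory with PPCB off."""
    original, strip_only, all_patches, ppcb_off = row
    return tuple(round(v - ppcb_off, 1) for v in (original, strip_only, all_patches))


def reduction_pct(original_overhead, patched_overhead):
    """Relative drop in overhead from Original to All 3 Patches."""
    return 100.0 * (original_overhead - patched_overhead) / original_overhead


for clients, row in current_mb.items():
    original, strip_only, patched = ppcb_overhead(row)
    pct = reduction_pct(original, patched)
    print(f"{clients} clients: {original} -> {patched} MB ({pct:.1f}% lower)")
```

The 25 and 150 client rows reproduce exactly. The 50 and 100 client rows differ from the published overhead by 0.1 MB, presumably because the published values were computed before rounding.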

Reproduction Script

import asyncio, os, tracemalloc

# Start tracing before importing the SDK so import-time allocations are counted too.
tracemalloc.start()
os.environ["AZURE_COSMOS_ENABLE_CIRCUIT_BREAKER"] = "True"

from azure.cosmos.aio import CosmosClient

N = int(os.environ.get("NUM_CLIENTS", "150"))

async def main():
    clients = []  # keep every client referenced so its caches stay retained
    for _ in range(N):
        c = CosmosClient(os.environ["COSMOS_URI"], os.environ["COSMOS_KEY"],
                         preferred_locations=["East US 2"])
        db = c.get_database_client("mydb")
        cont = db.get_container_client("mycont")
        try:
            await cont.read_item("x", partition_key="x")
        except Exception:
            # Errors such as a missing item are ignored; only retained memory matters here.
            pass
        clients.append(c)
    curr, peak = tracemalloc.get_traced_memory()
    print(f"{N} clients: {curr/1024/1024:.1f} MB current, {peak/1024/1024:.1f} MB peak")
    # Close the clients only after the measurement has been taken.
    for c in clients:
        await c.close()

asyncio.run(main())


@kushagraThapar kushagraThapar left a comment

Thanks @tvaron3
I am curious, are there other places where we build the collection routing map? Shall we fix those as well?

@@ -39,6 +64,8 @@ class PartitionKeyRange(object):
class Range(object):
    """description of class"""

    __slots__ = ('min', 'max', 'isMinInclusive', 'isMaxInclusive')
where is this used?


@tvaron3 tvaron3 Apr 14, 2026

__slots__ tells Python to store instance attributes in a fixed-size array instead of a per-instance __dict__ dictionary. The only thing to watch out for is that we will get an AttributeError if we try to add a new attribute to this object at runtime, but we don't do this.
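A small illustration of both effects, using two hypothetical stand-in classes rather than the SDK's own Range. Exact byte counts depend on the Python version, so none are asserted here:

```python
import sys


class RangeWithDict:
    """Stand-in for a Range-like class without __slots__."""

    def __init__(self, range_min, range_max, is_min_inclusive, is_max_inclusive):
        self.min = range_min
        self.max = range_max
        self.isMinInclusive = is_min_inclusive
        self.isMaxInclusive = is_max_inclusive


class RangeWithSlots:
    """Same attributes, stored in fixed slots instead of a __dict__."""

    __slots__ = ("min", "max", "isMinInclusive", "isMaxInclusive")

    def __init__(self, range_min, range_max, is_min_inclusive, is_max_inclusive):
        self.min = range_min
        self.max = range_max
        self.isMinInclusive = is_min_inclusive
        self.isMaxInclusive = is_max_inclusive


plain = RangeWithDict("", "FF", True, False)
slotted = RangeWithSlots("", "FF", True, False)

# The instance plus its attribute dictionary versus the slotted instance alone.
plain_size = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_size = sys.getsizeof(slotted)
print(f"with __dict__: {plain_size} bytes, with __slots__: {slotted_size} bytes")

# Assigning an attribute that is not listed in __slots__ fails.
try:
    slotted.extra = 1
except AttributeError as exc:
    print("AttributeError:", exc)
```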

can we please add a comment above to explain this, thanks!

@jeet1995 jeet1995 Apr 14, 2026

Can we use the __slots__ approach in _PartitionHealthInfo as well? I do not expect attributes to be added or removed dynamically there.

Member Author

Yeah I will add to PartitionHealthInfo as well and add a comment explaining

@tvaron3 tvaron3 force-pushed the fix/strip-pk-range-fields branch 2 times, most recently from 6b801a2 to 378f07e, April 14, 2026 06:19
for parentId in parents:
    parentIds.add(parentId)
return (
    PKRange(id=r[routing_range.PartitionKeyRange.Id],
It'd be helpful to understand if the PKRange reference can be used as-is in _GlobalPartitionEndpointManagerForCircuitBreaker and _GlobalPartitionEndpointManagerForPerPartitionAutomaticFailoverAsync.

tvaron3 commented Apr 14, 2026

Superseded by shared cache approach.

1. Share CollectionRoutingMap cache across clients per endpoint.
   Eliminates N-1 redundant copies when N clients target the same account.
2. Add __slots__ to Range class (64 bytes vs ~250 bytes per instance).
3. Skip .upper() when string is already uppercase.

PPCB overhead (150 clients, tracemalloc):
  Original: 27.4 MB -> Patched: ~0 MB (-100%)
  At customer scale (200K partitions x 152 clients): ~2.1 GB -> ~14 MB

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
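The idea of sharing one cache per endpoint can be sketched roughly as below. Every name here (`_shared_routing_maps`, `get_shared_cache`, `SketchClient`) and the example URLs are hypothetical; this is not the SDK's implementation, and an asyncio client may need different synchronization than a threading lock:

```python
import threading

# Hypothetical process-wide registry: endpoint -> {collection link: routing map}
_shared_routing_maps = {}
_shared_lock = threading.Lock()


def get_shared_cache(endpoint):
    """Return the single routing map cache shared by all clients of an endpoint."""
    with _shared_lock:
        return _shared_routing_maps.setdefault(endpoint, {})


class SketchClient:
    """Stand-in for a client that borrows the shared cache instead of owning one."""

    def __init__(self, endpoint):
        self.endpoint = endpoint
        self.routing_map_cache = get_shared_cache(endpoint)


clients = [SketchClient("https://account-a.example.invalid") for _ in range(150)]
other = SketchClient("https://account-b.example.invalid")

# One client populates the cache; the other 149 see the same entry.
clients[0].routing_map_cache["dbs/mydb/colls/mycont"] = "routing-map-placeholder"
print(len({id(c.routing_map_cache) for c in clients}))
```

With one copy per endpoint, memory no longer scales with the number of clients, which matches the customer-scale estimate in the commit message: ~2.1 GB spread over 152 clients is roughly 14 MB per copy.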
@tvaron3 tvaron3 force-pushed the fix/strip-pk-range-fields branch from 128d459 to 8b03fa2, April 14, 2026 18:40
…storage

Convert raw service response dicts to PKRange namedtuples in both
full refresh (_build_routing_map_from_ranges) and incremental update
(process_fetched_ranges) paths. PKRange retains only 4 fields (id,
minInclusive, maxExclusive, parents) and supports dict-style access
for backward compatibility.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>