feat(experiments): Filter experiments list by a rollup metric#453

Open
shanaiabuggy wants to merge 3 commits into main from sbuggy/ase-321

Conversation

@shanaiabuggy shanaiabuggy commented Jun 25, 2026

Contributor

What

Adds metric filtering to the experiments list — filter by rollup metrics (cost, latency, evaluator scores, run count), not just entity fields.

How

Reuses the platform's standard filter[field][op] bracket syntax, so it combines naturally with existing entity filters:

?filter[cost_usd.mean][$lte]=0.5&filter[run_count][$gte]=1&filter[experiment_group_id]=
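
As a rough illustration of the bracket syntax above, a client could assemble the query string like this. This is only a sketch: `build_filter_query` is a hypothetical helper (not part of the SDK), and `group-123` is a made-up group id standing in for the value elided in the example.

```python
from urllib.parse import urlencode


def build_filter_query(filters):
    """Flatten {field: {op: value}} / {field: value} into filter[...] pairs.

    Hypothetical helper for illustration; not part of the platform SDK.
    """
    pairs = []
    for field, cond in filters.items():
        if isinstance(cond, dict):
            # Operator form: filter[field][op]=value
            for op, value in cond.items():
                pairs.append((f"filter[{field}][{op}]", str(value)))
        else:
            # Bare equality form: filter[field]=value
            pairs.append((f"filter[{field}]", str(cond)))
    # Keep "[", "]" and "$" unescaped so the result reads like the PR example.
    return urlencode(pairs, safe="[]$")


query = build_filter_query(
    {
        "cost_usd.mean": {"$lte": 0.5},
        "run_count": {"$gte": 1},
        "experiment_group_id": "group-123",  # hypothetical id
    }
)
print(query)
```

Metric and entity filters go through the same helper, which is the point of reusing the existing syntax.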

  • Supported paths mirror the sort grammar: run_count, cost_usd.<stat>, latency_ms.<stat>, evaluators.<name>.<stat> (stat ∈ mean/median/p90/p95/p99/sum/count). Operators: $gte/$lte/$gt/$lt/$eq.
  • Metrics live in ClickHouse, not Postgres, so list_experiments splits the filter tree: entity predicates go to the entity store; metric predicates are applied in-app after rollup hydration (compute-on-read, same plumbing as metric sort). Declared via self-mapping namespaces on ExperimentFilter so paths pass validation untranslated.
  • Added a NumberFilter range type ($gte/$lte/$gt/$lt/$eq) alongside DatetimeFilter/StringFilter.
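
To make the path grammar above concrete, here is a minimal validation sketch. It is not the code in endpoints.py: `parse_metric_path` and its return shape are hypothetical, and it assumes evaluator names contain no dots.

```python
STATS = {"mean", "median", "p90", "p95", "p99", "sum", "count"}
OPERATORS = {"$gte", "$lte", "$gt", "$lt", "$eq"}


def parse_metric_path(path):
    """Return (metric, evaluator_name, stat) or raise ValueError.

    Hypothetical helper; assumes evaluator names have no "." in them.
    """
    parts = path.split(".")
    if parts == ["run_count"]:
        return ("run_count", None, None)
    if len(parts) == 2 and parts[0] in {"cost_usd", "latency_ms"} and parts[1] in STATS:
        return (parts[0], None, parts[1])
    if len(parts) == 3 and parts[0] == "evaluators" and parts[1] and parts[2] in STATS:
        return ("evaluators", parts[1], parts[2])
    raise ValueError(f"unsupported metric path: {path!r}")


def validate_predicate(path, op):
    """Reject unknown operators or paths (the endpoint would answer 400)."""
    if op not in OPERATORS:
        raise ValueError(f"unsupported operator: {op!r}")
    return parse_metric_path(path)


print(validate_predicate("evaluators.accuracy.p95", "$gte"))
```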

Behavior

  • Metric filters must be AND-combined with entity filters (nested ANDs flatten); a metric under OR/NOT → 400 (can't split a boolean tree across two stores).
  • Errors: 400 for an unsupported metric, stat, or operator; 413 when the result set exceeds the in-memory bound; 503 when ClickHouse is unavailable for a metric filter. An experiment with a missing metric never matches.
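
The AND-only rule above can be sketched as a small tree walk. The `Comparison`, `Logical`, and `MetricFilterError` types below are illustrative stand-ins, not the platform's filter classes; the error would map to the 400 response.

```python
from dataclasses import dataclass

METRIC_ROOTS = ("run_count", "cost_usd", "latency_ms", "evaluators")


@dataclass(frozen=True)
class Comparison:  # hypothetical leaf node
    field: str
    op: str
    value: object


@dataclass(frozen=True)
class Logical:  # hypothetical boolean node; op is "and", "or" or "not"
    op: str
    children: tuple


class MetricFilterError(ValueError):
    """Stand-in for the HTTP 400 raised by the endpoint."""


def is_metric(field):
    return field.split(".")[0] in METRIC_ROOTS


def references_metric(node):
    if isinstance(node, Comparison):
        return is_metric(node.field)
    return any(references_metric(child) for child in node.children)


def split_filter(node):
    """Return (entity_nodes, metric_comparisons) for an AND-combined tree."""
    if isinstance(node, Comparison):
        return ([], [node]) if is_metric(node.field) else ([node], [])
    if node.op == "and":
        entity, metric = [], []
        for child in node.children:
            # Recursing into AND children is what flattens nested ANDs.
            child_entity, child_metric = split_filter(child)
            entity += child_entity
            metric += child_metric
        return entity, metric
    # OR / NOT subtrees go to the entity store whole, so they may not
    # contain metric comparisons.
    if references_metric(node):
        raise MetricFilterError("metric filters must be AND-combined")
    return [node], []
```

An entity-only OR or NOT subtree passes through untouched; only a metric comparison inside one is rejected.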

Tests

Unit tests for the split/validate/match helpers + endpoint wiring (validation, 400/503), and an integration test combining entity + metric filters end to end against ClickHouse. OpenAPI specs regenerated.

Summary by CodeRabbit

  • New Features

    • Experiment list filtering now supports numeric comparisons on metric rollups, including run_count, cost_usd, latency_ms, and evaluator metrics.
    • Added support for comparison operators: $eq, $gt, $gte, $lt, $lte (e.g., range queries like cost_usd.mean <= 0.50).
  • Bug Fixes

    • Improved error messaging for unsupported sort or filter fields.
    • Metric-based sort/filter now returns 503 when telemetry data is unavailable.
  • Documentation

    • Updated the OpenAPI contract to describe the new metric rollup filter syntax and updated response text.
  • Tests

    • Added integration and unit test coverage for metric filtering and error cases.

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>
@shanaiabuggy shanaiabuggy requested review from a team as code owners June 25, 2026 00:17
@github-actions github-actions Bot added the feat label Jun 25, 2026
coderabbitai Bot commented Jun 25, 2026



No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6e8d96e4-7659-4ba5-abc5-e8fab6522eaf

📥 Commits

Reviewing files that changed from the base of the PR and between a965d84 and 139fc9e.

⛔ Files ignored due to path filters (10)
  • sdk/python/nemo-platform/.nmpcontext/openapi.yaml is excluded by !sdk/**
  • sdk/python/nemo-platform/.nmpcontext/stainless.yaml is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/resources/experiments/api.md is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/resources/experiments/experiments.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/experiments/__init__.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/experiments/experiment_filter_param.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/experiments/experiment_list_params.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/experiments/number_filter_param.py is excluded by !sdk/**
  • sdk/python/nemo-platform/tests/api_resources/test_experiments.py is excluded by !sdk/**
  • sdk/stainless.yaml is excluded by !sdk/**
📒 Files selected for processing (7)
  • openapi/ga/individual/platform.openapi.yaml
  • openapi/ga/openapi.yaml
  • openapi/openapi.yaml
  • packages/nmp_common/src/nmp/common/entities/values.py
  • services/intake/src/nmp/intake/api/v2/experiments/endpoints.py
  • services/intake/src/nmp/intake/api/v2/experiments/schemas.py
  • services/intake/tests/test_experiment_metric_filter.py
🚧 Files skipped from review as they are similar to previous changes (7)
  • packages/nmp_common/src/nmp/common/entities/values.py
  • services/intake/tests/test_experiment_metric_filter.py
  • services/intake/src/nmp/intake/api/v2/experiments/schemas.py
  • openapi/ga/openapi.yaml
  • services/intake/src/nmp/intake/api/v2/experiments/endpoints.py
  • openapi/ga/individual/platform.openapi.yaml
  • openapi/openapi.yaml

📝 Walkthrough

Adds numeric rollup filtering to the experiments list API. OpenAPI, shared filter types, endpoint handling, and tests now cover run_count, cost_usd, latency_ms, and evaluators comparisons.

Changes

Metric Rollup Filters

| Layer / File(s) | Summary |
| --- | --- |
| Endpoint docs and error text: openapi/openapi.yaml, openapi/ga/openapi.yaml, openapi/ga/individual/platform.openapi.yaml | filter docs add rollup-metric examples; 400 and 503 text now mention unsupported sort/filter fields and metric-based sort/filter. |
| Filter contract and schema shapes: openapi/openapi.yaml, openapi/ga/openapi.yaml, openapi/ga/individual/platform.openapi.yaml, packages/nmp_common/src/nmp/common/entities/values.py, services/intake/src/nmp/intake/api/v2/experiments/schemas.py | ExperimentFilter gains rollup-metric fields, and NumberFilter adds $gte, $lte, $gt, $lt, and $eq operators. |
| Metric filter handling in list_experiments: services/intake/src/nmp/intake/api/v2/experiments/endpoints.py | list_experiments splits entity filters from metric predicates, validates metric paths and numeric operators, hydrates rollups, and applies metric predicates before sorting and pagination. |
| Metric filter tests: services/intake/tests/test_experiment_metric_filter.py, services/intake/tests/integration/spans/test_experiment_metric_sort.py | Unit tests cover filter extraction and validation, and integration tests cover combined metric filtering in the experiments list response. |

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant ExperimentListEndpoint
  participant EntityStore
  participant ClickHouseTelemetryStore
  Client->>ExperimentListEndpoint: GET /experiments with metric filters
  ExperimentListEndpoint->>ExperimentListEndpoint: split entity filters and metric predicates
  ExperimentListEndpoint->>EntityStore: query entity operation
  EntityStore-->>ExperimentListEndpoint: experiments
  ExperimentListEndpoint->>ClickHouseTelemetryStore: hydrate rollups for metric fields
  ClickHouseTelemetryStore-->>ExperimentListEndpoint: rollup values
  ExperimentListEndpoint->>ExperimentListEndpoint: apply metric predicates, sort, paginate
  ExperimentListEndpoint-->>Client: filtered page
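
The last two steps of the diagram (apply metric predicates, then sort and paginate) might look roughly like the sketch below. Plain dicts stand in for the entity store rows and the ClickHouse rollups; `matches` and `filter_sort_paginate` are hypothetical names, not the functions in endpoints.py.

```python
import operator

OPS = {
    "$gte": operator.ge,
    "$lte": operator.le,
    "$gt": operator.gt,
    "$lt": operator.lt,
    "$eq": operator.eq,
}


def matches(rollup, predicates):
    """True only if every (path, op, threshold) predicate holds."""
    for path, op, threshold in predicates:
        value = rollup.get(path)
        # A missing metric never matches, whatever the operator.
        if value is None or not OPS[op](value, threshold):
            return False
    return True


def filter_sort_paginate(experiments, rollups, predicates, sort_key, page_size, page=1):
    """experiments: rows already filtered by the entity store.
    rollups: {experiment_id: {metric_path: value}} from rollup hydration.
    """
    kept = [e for e in experiments if matches(rollups.get(e["id"], {}), predicates)]
    kept.sort(key=sort_key)
    start = (page - 1) * page_size
    return kept[start:start + page_size]


experiments = [{"id": "c"}, {"id": "a"}, {"id": "b"}]
rollups = {
    "a": {"cost_usd.mean": 0.2, "run_count": 3},
    "b": {"cost_usd.mean": 0.9, "run_count": 5},
    # "c" has no rollups at all, so it can never match a metric predicate.
}
predicates = [("cost_usd.mean", "$lte", 0.5), ("run_count", "$gte", 1)]
page = filter_sort_paginate(experiments, rollups, predicates, lambda e: e["id"], page_size=10)
print(page)
```

Because filtering happens after hydration, pagination has to be applied in memory as well, which is presumably why the PR bounds the result set (the 413 case).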

Possibly related PRs

Suggested reviewers

  • BrianNewsom
  • callingmedic911
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 23.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title matches the main change: adding rollup-metric filtering to the experiments list endpoint. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (1)
packages/nmp_common/src/nmp/common/entities/values.py (1)

274-310: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

NumberFilter overlaps FloatFilter.

FloatFilter already provides $gte/$lte; NumberFilter adds $gt/$lt/$eq. Consider folding the extra operators into FloatFilter (or deriving one from the other) to avoid two near-identical numeric filters drifting apart.
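
One possible shape for the "derive one from the other" option, sketched with plain dataclasses as a stand-in since the real models in values.py are not shown here. The field names and the `from_query` helper are assumptions; the sketch also rejects an empty filter, in the spirit of the minProperties: 1 suggestion in the inline comments.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class FloatFilter:
    """Range filter with inclusive bounds only (illustrative stand-in)."""

    gte: Optional[float] = None
    lte: Optional[float] = None

    @classmethod
    def from_query(cls, raw):
        # Map "$gte" -> "gte" etc. for whichever class this is called on,
        # so subclasses pick up their extra operators automatically.
        allowed = {"$" + f.name: f.name for f in fields(cls)}
        if not raw:
            raise ValueError("at least one operator is required")
        unknown = set(raw) - set(allowed)
        if unknown:
            raise ValueError(f"unsupported operators: {sorted(unknown)}")
        return cls(**{allowed[key]: float(value) for key, value in raw.items()})


@dataclass(frozen=True)
class NumberFilter(FloatFilter):
    """FloatFilter plus strict bounds and equality."""

    gt: Optional[float] = None
    lt: Optional[float] = None
    eq: Optional[float] = None


print(NumberFilter.from_query({"$gt": "0.1", "$lte": "0.5"}))
```

With this layout the shared parsing lives in one place, and FloatFilter keeps rejecting `$gt`/`$lt`/`$eq` for callers that rely on the narrower contract.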

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/nmp_common/src/nmp/common/entities/values.py` around lines 274 -
310, `NumberFilter` duplicates most of `FloatFilter` and risks the two numeric
filter models drifting apart. Refactor the filter types so there is a single
source of truth for numeric comparisons, either by moving `$gt`/`$lt`/`$eq` into
`FloatFilter` or by making `NumberFilter` inherit/compose from `FloatFilter`;
update the `NumberFilter` and `FloatFilter` definitions in `values.py` so their
shared behavior lives in one place and their aliases/config stay consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@openapi/ga/individual/platform.openapi.yaml`:
- Around line 3738-3740: The filter examples in the OpenAPI docs use unprefixed
operators that do not match the schema. Update the example text in the
`NumberFilter`/rollup metric descriptions so the keys match the defined query
shape (`$gte`, `$lte`, `$gt`, `$lt`, `$eq`) everywhere this example appears,
including the related `NumberFilter` documentation block. Keep the surrounding
example values the same, but ensure the operator names in the docs are
consistent with the schema.
- Around line 14255-14279: The NumberFilter schema currently allows empty
objects because it lacks a minimum property constraint. Update the NumberFilter
definition in the openapi schema to require at least one predicate by adding
minProperties: 1 alongside the existing properties and additionalProperties:
false, so the schema still accepts $gte, $lte, $gt, $lt, or $eq but rejects {}.

In `@openapi/ga/openapi.yaml`:
- Around line 3734-3740: Update the filter documentation text in the OpenAPI
spec so the numeric range examples use the same $-prefixed operator keys defined
by the schema. In the affected description near the experiments filter section,
change the examples for run_count, cost_usd.mean, latency_ms.p95, and
evaluators.<name>.mean to use $gte/$lte/$gt/$lt/$eq consistently. Apply the same
wording cleanup anywhere the duplicated filter description appears so the
examples match the actual supported operators and do not point clients to
invalid keys.
- Around line 14255-14279: NumberFilter currently allows an empty object, so
update the NumberFilter schema in openapi/ga/openapi.yaml to require at least
one predicate operator. Add minProperties: 1 alongside the existing properties
definition so validation rejects {} while still allowing $gte, $lte, $gt, $lt,
or $eq. Use the NumberFilter schema block to locate the change.

In `@services/intake/src/nmp/intake/api/v2/experiments/endpoints.py`:
- Around line 992-1016: The metric filter validation in the LogicalOperation
handling is rejecting nested AND groups because
`_operation_references_metric(child)` treats a child AND containing metrics as
invalid, even though the parent combinator is already AND. Update the logic
around `LogicalOperation`, `_operation_references_metric`, and
`_validated_metric_predicate` to either flatten nested ANDs before validation or
explicitly recurse through AND children so metric comparisons inside sub-ANDs
are accepted; if nested ANDs remain unsupported, adjust the HTTPException detail
to clearly state that only flat metric comparisons are allowed.

---

Nitpick comments:
In `@packages/nmp_common/src/nmp/common/entities/values.py`:
- Around line 274-310: `NumberFilter` duplicates most of `FloatFilter` and risks
the two numeric filter models drifting apart. Refactor the filter types so there
is a single source of truth for numeric comparisons, either by moving
`$gt`/`$lt`/`$eq` into `FloatFilter` or by making `NumberFilter` inherit/compose
from `FloatFilter`; update the `NumberFilter` and `FloatFilter` definitions in
`values.py` so their shared behavior lives in one place and their aliases/config
stay consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6359daae-f76d-4371-af1a-b831e0bf4a36

📥 Commits

Reviewing files that changed from the base of the PR and between b4473e8 and a965d84.

⛔ Files ignored due to path filters (1)
  • web/packages/sdk/generated/agents/schema/DeploymentLogsResponse.ts is excluded by !**/generated/**
📒 Files selected for processing (8)
  • openapi/ga/individual/platform.openapi.yaml
  • openapi/ga/openapi.yaml
  • openapi/openapi.yaml
  • packages/nmp_common/src/nmp/common/entities/values.py
  • services/intake/src/nmp/intake/api/v2/experiments/endpoints.py
  • services/intake/src/nmp/intake/api/v2/experiments/schemas.py
  • services/intake/tests/integration/spans/test_experiment_metric_sort.py
  • services/intake/tests/test_experiment_metric_filter.py

Comment thread openapi/ga/individual/platform.openapi.yaml Outdated
Comment thread openapi/ga/individual/platform.openapi.yaml
Comment thread openapi/ga/openapi.yaml Outdated
Comment thread openapi/ga/openapi.yaml
Comment thread services/intake/src/nmp/intake/api/v2/experiments/endpoints.py
github-actions Bot commented Jun 25, 2026

Contributor
| Suite | Lines Covered | Line Rate | Branch Rate |
| --- | --- | --- | --- |
| Unit Tests | 20917/27485 | 76.1% | 61.2% |
| Integration Tests | 12123/26254 | 46.2% | 19.6% |

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>
Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>