[feat] Meter sandbox compute and storage usage (no billing) #5039

Draft
junaway wants to merge 11 commits into feat/add-sandbox-metering from feat/metering-track-b

@junaway junaway commented Jul 2, 2026

Contributor

Context

Track B of the metering rework adds the new sandbox metering as pure measurement. It records how much compute and storage each org uses and reports nothing to Stripe. Billing comes later in Track C. The base branch is Track A (feat/add-sandbox-metering).

The records meter is deliberately not here. RECORDS_INGESTED already exists on big-agents, so this branch does not touch it.

Changes

Sandbox usage from E2B and Daytona now flows into meters. A new sandboxes domain (core/sandboxes/ + apis/fastapi/sandboxes/) receives it two ways: an E2B webhook (leader-generated secret, HMAC-verified, self-registered) and a Daytona poll. Both feed one record_usage() sink at org scope.

The raw compute is metered per resource, per second, with an explicit unit token so the key reads unambiguously:

SANDBOX_CPU_CORE_SECONDS   (core-seconds)
SANDBOX_RAM_GIBI_SECONDS   (GiB-seconds)
SANDBOX_SSD_GIBI_SECONDS   (GiB-seconds)
SANDBOX_GPU_CORE_SECONDS   (core-seconds)

The full scheme is SANDBOX_<RESOURCE>_<UNIT>_SECONDS (see docs/designs/sandbox-metering/NAMING.md).
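As a minimal sketch of the scheme above, the four keys can be derived from a resource-to-unit table. The `UNITS` mapping and the `meter_key()` helper are hypothetical names used only for illustration; the actual `Counter` enum in the branch may be written out by hand.

```python
# Resource token -> unit token, as listed in the PR description.
# UNITS and meter_key() are illustrative names, not code from the branch.
UNITS = {"CPU": "CORE", "RAM": "GIBI", "SSD": "GIBI", "GPU": "CORE"}


def meter_key(resource: str) -> str:
    """Build SANDBOX_<RESOURCE>_<UNIT>_SECONDS for a known resource token."""
    token = resource.upper()
    return f"SANDBOX_{token}_{UNITS[token]}_SECONDS"


# All four keys, in the order the PR description lists them.
KEYS = [meter_key(resource) for resource in UNITS]
```

Keeping the unit in the key means a reader can tell core-seconds from GiB-seconds without consulting the naming doc.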

Storage is a gauge, Gauge.STORAGE_BYTES, with per-plan caps. Its reconcile job reads the object store through the existing env.store config (the SeaweedFS ObjectStore the mounts already use), not a new storage config.
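A rough sketch of what the reconcile computes for one org, assuming the key layout `[<namespace>/]mounts/<project_id>/...` that the post-review fix in this PR describes. A plain list of `(key, size)` pairs stands in for the object-store listing, and `org_storage_bytes()` is a hypothetical name; the real adapter goes through `ObjectStore.list_objects_v2`.

```python
from typing import Iterable, Optional, Sequence, Tuple


def project_prefix(project_id: str, namespace: Optional[str] = None) -> str:
    """Scan prefix for one project's mounts, optionally under a namespace."""
    base = f"mounts/{project_id}/"
    return f"{namespace}/{base}" if namespace else base


def org_storage_bytes(
    objects: Iterable[Tuple[str, int]],
    project_ids: Sequence[str],
    namespace: Optional[str] = None,
) -> int:
    """Sum object sizes under every project prefix that belongs to the org."""
    prefixes = tuple(project_prefix(pid, namespace) for pid in project_ids)
    # str.startswith accepts a tuple; an empty tuple matches nothing.
    return sum(size for key, size in objects if key.startswith(prefixes))
```

Because each prefix ends with `/`, project `p1` does not accidentally match keys that belong to `p10`.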

Each meter gets a non-blocking Quota(period=MONTHLY) on every plan. REPORTS is unchanged, so none of this is sent to Stripe yet. Migration ee0000000004 appends the sandbox and storage values to the meters_type enum (down_revision = ee0000000003).
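For illustration only, appending labels to a PostgreSQL enum is usually done with `ALTER TYPE ... ADD VALUE`. The sketch below just builds those statements; the lowercase labels are an assumption (the PR does not list the final enum values), and in an Alembic migration each statement would typically be passed to `op.execute()`.

```python
import re
from typing import List

# Assumed labels: lowercase forms of the meter keys plus the storage gauge.
NEW_LABELS = [
    "sandbox_cpu_core_seconds",
    "sandbox_ram_gibi_seconds",
    "sandbox_ssd_gibi_seconds",
    "sandbox_gpu_core_seconds",
    "storage_bytes",
]


def add_value_statements(enum_name: str, labels: List[str]) -> List[str]:
    """Return one ALTER TYPE ... ADD VALUE statement per new enum label."""
    statements = []
    for label in labels:
        # Labels are interpolated into SQL, so accept only simple identifiers.
        if not re.fullmatch(r"[a-z_]+", label):
            raise ValueError(f"unexpected enum label: {label!r}")
        statements.append(
            f"ALTER TYPE {enum_name} ADD VALUE IF NOT EXISTS '{label}'"
        )
    return statements
```

`IF NOT EXISTS` keeps the upgrade safe to re-run; PostgreSQL has no matching statement to drop a single enum label, so a downgrade for this kind of migration is often a no-op.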

Tests / notes

  • ruff format and ruff check are clean. All new modules import, including the full ee.src.main composition root.
  • No billing wiring by design. Track C (credits + gating) stacks on top of this branch and is the layer that adds REPORTS, pricing, and gating.
  • Base this PR on Track A so the diff shows only Track B.

jp-agenta added 6 commits July 2, 2026 15:29
Add measurement-only sandbox compute meters (vCPU/vmem/disk/GPU-seconds)
fed by E2B webhook + Daytona poll providers, under a new `sandboxes`
domain (core/apis). No Stripe reporting — REPORTS is untouched.

- ee/src/core/sandboxes/: SandboxMeteringService (record_usage via
  non-blocking check_entitlements, E2B webhook secret + HMAC verify +
  registration, Daytona poll+lock), DTOs, domain exceptions.
- ee/src/apis/fastapi/sandboxes/: public E2B webhook receiver + admin
  Daytona-poll-trigger route.
- entitlements/types.py: Counter.SANDBOX_{VCPU_CORE,VMEM_GIBI,
  DISK_GIBI,VGPU_CORE}_SECONDS, non-blocking monthly Quota per plan,
  READ_ONLY constraint membership.
- meters/types.py: mirror the 4 sandbox counters into Meters.
- env.py: E2BConfig, DaytonaConfig gains analytics_url/organization_id
  + enabled properties.
- main.py / entrypoints/routers.py: wire SandboxMeteringService/Router,
  best-effort E2B webhook registration at startup.

…U}_SECONDS

Final naming decision supersedes the previous *_CORE/_GIBI variant:
plain 3-letter resource tokens, no unit token. Applies to the Counter
enum, the Meters mirror, DEFAULT_ENTITLEMENTS quotas, CONSTRAINTS, and
the sandboxes service meter-delta mapping. Also adds the deferred
meters_type enum migration (ee0000000004, down_revision=ee0000000003)
appending the 4 sandbox counters + storage_bytes.

Wire the storage-size gauge (Gauge.STORAGE_BYTES) to the existing
shared object store config (env.store / StoreConfig) instead of a
duplicate config surface:

- storage/adapters.py: get_org_storage_bytes() now sums via the
  existing ObjectStore.list_objects_v2 (miniopy-async, same S3-
  compatible client mounts already use) against env.store, replacing
  the ad-hoc boto3/httpx per-provider implementation.
- storage/reconcile.py: gate on env.store.reconcile_enabled /
  env.store.enabled instead of a separate agenta.storage.* namespace.
- storage/types.py: drop the now-dead StorageProvider enum (provider
  selection lives in ObjectStore.is_seaweedfs via env.store.signing_key).
- env.py: add StoreConfig.reconcile_enabled (AGENTA_STORE_RECONCILE_ENABLED).
  No duplicate StorageConfig class.
- subscriptions interfaces/dao: add list_active() so the reconcile job
  can iterate active orgs.
- billing/router.py: admin endpoints POST /admin/billing/storage/reconcile
  and .../storage/reconcile/unlock, mirroring the existing usage/report
  lock pattern.
…, migration

Records what was consolidated from sandbox-metering-phase-1/-4 into
feat/metering-track-b, the sandbox_metering->sandboxes rename, the
final SANDBOX_{CPU,RAM,SSD,GPU}_SECONDS meter naming, how the storage
gauge wires to env.store instead of a duplicate config, and what was
deliberately left out (records, REPORTS/billing, credits).

SANDBOX_{CPU,GPU}_SECONDS -> _CORE_SECONDS; SANDBOX_{RAM,SSD}_SECONDS -> _GIBI_SECONDS.
Scheme is SANDBOX_<RESOURCE>_<UNIT>_SECONDS.
Copilot AI review requested due to automatic review settings July 2, 2026 14:42
@coderabbitai

coderabbitai Bot commented Jul 2, 2026


Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Copilot AI left a comment

Pull request overview

Adds Track B of the metering rework to the API monorepo, introducing measurement-only meters for sandbox compute usage (E2B webhooks + Daytona poll) and a storage-bytes gauge with an EE reconcile job, without any billing/reporting wiring.

Changes:

  • Introduces the EE sandboxes domain and FastAPI routes for E2B webhook ingestion + admin-triggered Daytona polling, feeding a single usage recording sink.
  • Adds the Gauge.STORAGE_BYTES gauge and an EE reconcile path that reads authoritative sizes from the existing env.store / ObjectStore.
  • Extends entitlements/meters enums and adds an Alembic migration appending the new meter enum labels.

Reviewed changes

Copilot reviewed 21 out of 24 changed files in this pull request and generated 6 comments.

Show a summary per file
| File | Description |
|------|-------------|
| `docs/designs/sandbox-metering/TRACK_B_FINDINGS.md` | Track B implementation notes and findings for the metering work. |
| `docs/designs/sandbox-metering/NAMING.md` | Documents the sandbox meter key naming scheme used by the code. |
| `api/oss/src/utils/env.py` | Adds Daytona analytics fields, introduces E2BConfig, and adds StoreConfig.reconcile_enabled. |
| `api/entrypoints/routers.py` | Registers the E2B webhook at startup (best-effort) in EE mode. |
| `api/ee/src/main.py` | Wires the new sandboxes service/router into the EE FastAPI app. |
| `api/ee/src/dbs/postgres/subscriptions/dao.py` | Adds list_active() to support storage reconcile iteration. |
| `api/ee/src/core/subscriptions/interfaces.py` | Adds the list_active() interface method. |
| `api/ee/src/core/storage/types.py` | Introduces storage-domain exceptions. |
| `api/ee/src/core/storage/service.py` | Adds storage delta recording + per-org reconcile logic for STORAGE_BYTES. |
| `api/ee/src/core/storage/reconcile.py` | Adds the periodic storage reconcile job (gated by EE + env.store.reconcile_enabled). |
| `api/ee/src/core/storage/paths.py` | Adds helpers for computing org/project storage prefixes. |
| `api/ee/src/core/storage/adapters.py` | Implements authoritative per-org byte counting via ObjectStore.list_objects_v2. |
| `api/ee/src/core/storage/__init__.py` | Storage package initializer. |
| `api/ee/src/core/sandboxes/service.py` | Implements usage recording, E2B webhook registration/verification, and Daytona polling. |
| `api/ee/src/core/sandboxes/exceptions.py` | Introduces sandboxes-domain exceptions for webhook signature/registration failures. |
| `api/ee/src/core/sandboxes/dtos.py` | Adds DTOs for sandbox usage ingestion and results. |
| `api/ee/src/core/sandboxes/__init__.py` | Sandboxes package initializer. |
| `api/ee/src/core/meters/types.py` | Adds new sandbox counters + storage gauge to Meters. |
| `api/ee/src/core/access/entitlements/types.py` | Adds new sandbox counters, Gauge.STORAGE_BYTES, plan quotas, and constraints updates. |
| `api/ee/src/apis/fastapi/sandboxes/router.py` | Adds unauthenticated E2B webhook receiver (HMAC) and admin Daytona poll endpoint. |
| `api/ee/src/apis/fastapi/sandboxes/models.py` | Adds Pydantic models for sandboxes endpoints. |
| `api/ee/src/apis/fastapi/sandboxes/__init__.py` | Sandboxes FastAPI package initializer. |
| `api/ee/src/apis/fastapi/billing/router.py` | Adds admin endpoints to trigger/unlock storage reconcile with a distributed lock. |
| `api/ee/databases/postgres/migrations/core_ee/versions/ee0000000004_add_sandbox_and_storage_meters.py` | Appends the new meter enum labels to meters_type. |

Comment on lines +39 to +56
Went through two naming revisions mid-task; the committed state uses the final,
simplest scheme — plain 3-letter resource tokens, no unit token:

| Counter key | value |
|-------------------------|--------------------------|
| `SANDBOX_CPU_SECONDS` | `sandbox_cpu_seconds` |
| `SANDBOX_RAM_SECONDS` | `sandbox_ram_seconds` |
| `SANDBOX_SSD_SECONDS` | `sandbox_ssd_seconds` |
| `SANDBOX_GPU_SECONDS` | `sandbox_gpu_seconds` |

Plus `Gauge.STORAGE_BYTES` (`storage_bytes`) — the storage-size gauge, distinct from
`SANDBOX_SSD_SECONDS` (sandbox disk *compute-time*, not stored bytes). Applied
consistently to: `Counter` enum, `Meters` mirror, `DEFAULT_ENTITLEMENTS` quotas,
`CONSTRAINTS`, the `ee0000000004` migration's enum labels, and the sandboxes service's
meter-delta mapping. A stray `docs/designs/sandbox-metering/NAMING.md` on disk
(untracked, authored elsewhere) documents an earlier, superseded 4-letter-token variant
of this scheme — not updated as part of this task since it wasn't in Track B's file
list; the code is the source of truth.
Comment thread api/ee/src/core/sandboxes/exceptions.py Outdated
Comment on lines +1 to +14
class SandboxMeteringError(Exception):
    pass


class SandboxWebhookSignatureError(SandboxMeteringError):
    def __init__(self, message: str = "Webhook signature verification failed."):
        self.message = message
        super().__init__(message)


class SandboxWebhookRegistrationError(SandboxMeteringError):
    def __init__(self, message: str = "Failed to register sandbox webhook."):
        self.message = message
        super().__init__(message)
Comment thread api/ee/src/core/sandboxes/service.py Outdated
Comment on lines +207 to +212
"events": [
    "sandbox.created",
    "sandbox.paused",
    "sandbox.resumed",
    "sandbox.killed",
],
Comment on lines +55 to +56
org_id = usage.organization_id
scope = MeterScope(organization_id=org_id)
Comment on lines +277 to +286
usage = SandboxUsageDTO(
    organization_id=org_id,
    provider="daytona",
    sandbox_id="__aggregate__",
    vcpu_seconds=vcpu_seconds,
    ram_gib_seconds=ram_gib_seconds,
    disk_gib_seconds=disk_gib_seconds,
    gpu_seconds=gpu_seconds,
)
await self.record_usage(usage)
Comment thread api/ee/src/core/sandboxes/service.py Outdated
Comment on lines +155 to +167
E2B signs: sha256(secret + raw_body) → base64, sent in e2b-signature header.
NOTE: docs/actual header mismatch issue #1103 — log raw header on first failures.
"""
expected = hmac.new(
    secret.encode(),
    raw_body,
    hashlib.sha256,
).hexdigest()
# E2B may send hex or base64; try hex comparison first.
try:
    return hmac.compare_digest(expected, signature_header.strip())
except Exception:
    return False
@junaway junaway changed the title from "[feat] New metering: sandbox compute meters + storage gauge (measurement only)" to "[feat] Meter sandbox compute and storage usage (no billing)" Jul 2, 2026
@junaway junaway mentioned this pull request Jul 2, 2026
12 tasks
jp-agenta added 4 commits July 2, 2026 18:06
get_or_create_e2b_webhook_secret() minted its own secret behind a Redis
SET-NX leader election with a TTL, so on TTL expiry a new secret could be
generated while E2B kept signing with the old one — permanent signature
verification failure until manual intervention. Also deleted the
best-effort auto-registration (ensure_e2b_webhook_registered) that
depended on that secret.

Mirrors the Stripe pattern instead: E2B_WEBHOOK_SECRET is a plain env var
(E2BConfig.webhook_secret), the operator registers the webhook with E2B
out-of-band, and verify_e2b_signature() reads the secret straight from
env.e2b.webhook_secret.

org_prefix(org_id) built an f"{org_id}/" scan prefix, but mount object
keys are mounts/<project_id>/<mount_id>/<path> (MountsService._storage_key)
— org_id is never a key component, so the ListObjectsV2 scan matched
nothing and the storage gauge was always 0.

get_org_storage_bytes() now enumerates the org's projects via the new
fetch_projects_by_organization() accessor (ProjectDB.organization_id) and
sums list_objects_v2() over each project's mounts/<project_id>/ prefix
(project_prefix(), honoring env.store.namespace like _storage_key does).

Router already captured e2b-delivery-id into SandboxUsageDTO.delivery_id,
but record_usage() never used it, so an E2B redelivery of the same event
double-counted the meter deltas.

record_usage() now claims the delivery_id via Redis SET NX (48h TTL,
comfortably beyond E2B's redelivery window) before touching any meter;
a losing claim short-circuits with SandboxUsageResult(deduped=True) and
skips the writes. Missing delivery_id skips dedup entirely (best-effort,
matches the DTO's Optional contract). Uses the existing acquire_lock()
Redis SET NX primitive rather than hand-rolling one.

daytona_poll() assigned Daytona's totalRAMGBSeconds/totalDiskGBSeconds
(decimal GB, 10^9 bytes) straight into the *_GIBI_SECONDS meters, while
the E2B path already converts to binary GiB — so the two providers fed
inconsistent units into the same meter.

Added _gb_to_gib_seconds(), multiplying by the exact Decimal ratio
10**9 / 2**30 (~0.9313225746) and rounding with the same
max(1, ceil(...)) the E2B path uses (_mb_ms_to_gib_seconds in
router.py), so both providers report GiB-seconds consistently. vCPU-
and GPU-seconds are counts, not bytes, and are left unconverted.
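A minimal sketch of the conversion described in the last commit, assuming the helper takes a decimal GB-seconds total and returns whole GiB-seconds. The function name here is illustrative and the branch's `_gb_to_gib_seconds()` may differ in signature; how a zero total is handled is not stated in the PR, so this sketch simply applies `max(1, ceil(...))` as written.

```python
from decimal import Decimal
from math import ceil
from typing import Union

# Exact ratio between a decimal gigabyte (10**9 bytes) and a gibibyte (2**30 bytes).
GB_TO_GIB = Decimal(10**9) / Decimal(2**30)


def gb_to_gib_seconds(gb_seconds: Union[int, float, Decimal]) -> int:
    """Convert decimal GB-seconds to GiB-seconds, rounding up with a floor of 1."""
    # Going through str() avoids carrying binary float noise into the Decimal.
    gib_seconds = Decimal(str(gb_seconds)) * GB_TO_GIB
    return max(1, ceil(gib_seconds))
```

For example, 100 GB-seconds is about 93.13 GiB-seconds, which rounds up to 94. Without the factor the meter would have recorded 100, roughly 7% too high.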
junaway commented Jul 2, 2026

Contributor Author

Post-review fixes applied (from a high-effort code review of this branch):

  • Storage gauge was always 0 — the reconcile scanned {org_id}/, but mount objects are keyed mounts/<project_id>/.... Now enumerates the org's projects and sums [<namespace>/]mounts/<project_id>/ per project (matched byte-for-byte against MountsService._storage_key; added fetch_projects_by_organization).
  • Webhook redelivery double-counted: delivery_id was captured but unused. Added idempotency via the existing acquire_lock (Redis SET NX, 48h) before applying meter deltas; duplicates return deduped=True and skip writes.
  • Daytona GB vs GiB — Daytona reports decimal GB-seconds; these fed a *_GIBI_SECONDS meter with no conversion (~7% off vs the E2B path). Added a GB→GiB factor so both providers feed consistent GiB units.
  • E2B webhook secret — replaced the self-minted, Redis-TTL-cached secret + auto-registration (which silently broke metering when the TTL expired) with a plain E2B_WEBHOOK_SECRET env var, mirroring STRIPE_WEBHOOK_SECRET. Operator sets it and registers with E2B out-of-band.
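The redelivery fix in the list above can be sketched as a claim-then-write step. This is an illustration under stated assumptions: `InMemoryClaims` is a hypothetical stand-in for the Redis `SET NX` claim with a TTL, and this `record_usage()` is a simplified function rather than the service method in the branch.

```python
import time
from typing import Dict, Optional

DEDUP_TTL_SECONDS = 48 * 3600  # 48h, as stated in the PR


class InMemoryClaims:
    """Hypothetical stand-in for a Redis 'SET key value NX EX ttl' claim."""

    def __init__(self) -> None:
        self._expiry: Dict[str, float] = {}

    def claim(self, key: str, ttl: float, now: Optional[float] = None) -> bool:
        """Return True if this caller won the claim, False if it is already held."""
        now = time.monotonic() if now is None else now
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False
        self._expiry[key] = now + ttl
        return True


def record_usage(
    claims: InMemoryClaims,
    totals: Dict[str, int],
    delivery_id: Optional[str],
    deltas: Dict[str, int],
) -> bool:
    """Apply meter deltas once per delivery; return False when deduped."""
    # A missing delivery_id skips dedup entirely (best-effort).
    if delivery_id is not None:
        if not claims.claim(f"e2b:delivery:{delivery_id}", DEDUP_TTL_SECONDS):
            return False
    for meter, amount in deltas.items():
        totals[meter] = totals.get(meter, 0) + amount
    return True
```

Claiming before writing means a crash between the claim and the write drops that delivery instead of double-counting it, a trade-off that fits a measurement-only meter.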

ruff clean; the sandbox/storage modules import. See big-agents-audit/e2b_signature_probe.py for a script to confirm E2B's actual signature format (hex vs base64) before relying on verify_e2b_signature.
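Until the signature format is confirmed, one cautious option is to accept either encoding of the same digest. The sketch below assumes the signature is an HMAC-SHA256 of the raw body keyed with the webhook secret, which this PR does not confirm; the function name is illustrative and is not the branch's `verify_e2b_signature()`.

```python
import base64
import hashlib
import hmac


def verify_signature(secret: str, raw_body: bytes, signature_header: str) -> bool:
    """Check a header against the hex and base64 encodings of one HMAC digest."""
    # Assumption: HMAC-SHA256 over the raw request body.
    digest = hmac.new(secret.encode(), raw_body, hashlib.sha256).digest()
    received = signature_header.strip().encode()
    candidates = (
        digest.hex().encode(),       # hex encoding
        base64.b64encode(digest),    # base64 encoding
    )
    # Compare as bytes: compare_digest rejects non-ASCII str arguments.
    results = [hmac.compare_digest(candidate, received) for candidate in candidates]
    return any(results)
```

Once the probe script settles which encoding E2B actually sends, the other branch can be dropped so the check accepts exactly one format.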

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 26 changed files in this pull request and generated 6 comments.

Comment on lines +202 to +211
usage = SandboxUsageDTO(
    organization_id=org_id,
    provider="daytona",
    sandbox_id="__aggregate__",
    vcpu_seconds=vcpu_seconds,
    ram_gib_seconds=ram_gib_seconds,
    disk_gib_seconds=disk_gib_seconds,
    gpu_seconds=gpu_seconds,
)
await self.record_usage(usage)

    delivery_id,
    signature[:32],
)
raise SandboxWebhookSignatureError()
Comment on lines +23 to +25
log = get_module_logger(__name__)

_GIB = 1024.0 * 1024.0 * 1024.0
Comment on lines +37 to +56
## Meter key naming (final)

Went through two naming revisions mid-task; the committed state uses the final,
simplest scheme — plain 3-letter resource tokens, no unit token:

| Counter key | value |
|-------------------------|--------------------------|
| `SANDBOX_CPU_SECONDS` | `sandbox_cpu_seconds` |
| `SANDBOX_RAM_SECONDS` | `sandbox_ram_seconds` |
| `SANDBOX_SSD_SECONDS` | `sandbox_ssd_seconds` |
| `SANDBOX_GPU_SECONDS` | `sandbox_gpu_seconds` |

Plus `Gauge.BYTES` (`bytes`) — the storage-size gauge, distinct from
`SANDBOX_SSD_SECONDS` (sandbox disk *compute-time*, not stored bytes). Applied
consistently to: `Counter` enum, `Meters` mirror, `DEFAULT_ENTITLEMENTS` quotas,
`CONSTRAINTS`, the `ee0000000004` migration's enum labels, and the sandboxes service's
meter-delta mapping. A stray `docs/designs/sandbox-metering/NAMING.md` on disk
(untracked, authored elsewhere) documents an earlier, superseded 4-letter-token variant
of this scheme — not updated as part of this task since it wasn't in Track B's file
list; the code is the source of truth.

## Entitlements (measurement only)

- `Counter.SANDBOX_{CPU,RAM,SSD,GPU}_SECONDS` and `Gauge.BYTES` added.
Comment on lines +1171 to +1178
@intercept_exceptions()
async def reconcile_storage(
    self,
):
    log.info("[storage] [reconcile] [endpoint] Trigger")

    LOCK_TTL = 3600  # 1 hour
