open-telemetry · lmolkova · Jun 16, 2026 · Jun 16, 2026 · Jun 16, 2026 · Jun 16, 2026
@@ -34,6 +34,8 @@ prefer opt-in or additive. Breaking changes need explicit justification in the P
 - Message content, prompts, and tool call arguments must only be set through the util's content
   capture path — never as unconditional span/log attributes.
 - Adding attributes to invocations produced by the util is fine.
+- Streaming responses must be instrumented by subclassing the util's `SyncStreamWrapper` /
+  `AsyncStreamWrapper` (`opentelemetry.util.genai.stream`). Flag hand-rolled stream wrappers.
 - If a capability is missing in `opentelemetry-util-genai`, land it in the util first.
 
 ## 3. Semantic conventions
@@ -66,9 +68,12 @@ prefer opt-in or additive. Breaking changes need explicit justification in the P
 - Test against oldest and latest supported library versions via `tests/requirements.{oldest,latest}.txt`
   and `{oldest,latest}` `tox.ini` factors.
 - `tests/conftest.py` must consume the shared fixtures from `opentelemetry.test_util_genai`
-  (`from opentelemetry.test_util_genai.fixtures import *` and
-  `from opentelemetry.test_util_genai.vcr import fixture_vcr, scrub_response_headers`). Do not
-  re-implement in-memory provider/exporter setup or the VCR pretty-print serializer locally.
+  by registering them as plugins. Always register the fixtures plugin; register the VCR plugin
+  too when the package's tests use VCR cassettes —
+  `pytest_plugins = ["opentelemetry.test_util_genai.fixtures", "opentelemetry.test_util_genai.vcr"]`
+  (drop the `vcr` entry for packages with no cassette-backed tests), importing scrub helpers from
+  `opentelemetry.test_util_genai.vcr` as needed. Do not re-implement in-memory provider/exporter
+  setup or the VCR pretty-print serializer locally.
 - When recording VCR cassettes, scrub account-identifying values in the conftest's
   `vcr_config` (`filter_headers` for requests, `scrub_response_headers_overwrite` for
   responses) before committing. Examples: `authorization`, `openai-organization`,

@@ -50,22 +50,55 @@ own `pyproject.toml` and `tests/`. The util package follows the equivalent layou
 - Packages use the OpenTelemetry beta versioning format `MAJOR.MINORbN` (e.g. `1.0b0`). `version.py` carries a `.dev`
   suffix during development (`1.0b0.dev`); the release workflow drops it.
 
+## Adding a package to the workspace
+
+A new package under `instrumentation/<pkg>/` (where `<pkg>` is the full
+`opentelemetry-instrumentation-genai-<lib>` directory name) wires in as follows.
+Copy the shape from an existing package — paths in `tox.ini` are repo-root-relative.
+
+- **uv workspace**: auto-included via the `instrumentation/*` glob in root
+  `pyproject.toml [tool.uv.workspace] members` — no edit needed.
+- **`tox.ini`**:
+  - `envlist`: add `py3{…}-test-instrumentation-genai-<lib>-{oldest,latest}`, the
+    `py3{…}-…-<lib>-conformance` entry, and `lint-instrumentation-genai-<lib>`.
+  - `[testenv] deps`: add the factor-conditional test-requirements lines
+    (`<lib>-{oldest,latest,conformance}: -r …/tests/requirements.<factor>.txt` plus
+    `{[testenv]test_deps}` / `{[testenv]pytest_deps}`). Requirements install here — **not**
+    in `commands_pre`.
+  - `[testenv] commands`: add the pytest line (it `--ignore`s `tests/test_conformance.py`),
+    the separate `…-conformance` pytest line, and
+    `lint-…: sh -c "cd instrumentation && ruff check <pkg>"`.
+  - `[testenv:typecheck] deps`: add `{toxinidir}/instrumentation/<pkg>[instruments]`.
+- **`[tool.pyright]`** (in root `pyproject.toml`): `include` is opt-in and added to
+  *progressively* as a package gets fully typed. When a package is in `include`, also add its
+  `<pkg>/tests/**/*.py` and `<pkg>/examples/**/*.py` to `exclude` — tests and examples stay
+  untyped; `src/**` is never excluded.
+
 ## Commands
 
 ```sh
 # Install all packages and dev tools
 uv sync --frozen --all-packages
 
-# Lint (runs ruff via pre-commit)
+# All pre-commit hooks (ruff, ruff-format, uv-lock, rstcheck) — the CI lint gate
+uv run tox -e precommit
+# …or just the ruff hook while iterating
 uv run pre-commit run ruff --all-files
 
-# Test a specific package (append -oldest, -latest for version variants)
-uv run tox -e py312-test-instrumentation-genai-openai-oldest
+# Test one package (append -oldest / -latest for the version-matrix variants)
+uv run tox -e py312-test-instrumentation-genai-openai-latest
+
+# Run a package's conformance scenarios (only *-conformance envs collect test_conformance.py)
+uv run tox -e py312-test-instrumentation-genai-openai-conformance
 
-# Type check
+# Type check (pyright)
 uv run tox -e typecheck
 ```
 
+Before opening a PR, run `uv run tox -e precommit`, `uv run tox -e typecheck`, and the changed package's
+test envs (`-oldest` and `-latest`, plus `-conformance` if it ships scenarios) — these mirror
+the CI gates.
+
 ## Guidelines
 
 - Each package has its own `pyproject.toml` with version, dependencies, and entry points.
@@ -92,6 +125,49 @@ This repo uses [towncrier](https://towncrier.readthedocs.io/) to manage changelo
 
 Apply to packages under `instrumentation/`.
 
+### Telemetry via `opentelemetry-util-genai`
+
+- Spans, logs, metrics, and events should go through `opentelemetry-util-genai`. Do not call OTel
+  `Tracer`/`Meter`/`Logger` directly, and import only its public surface — never an
+  `opentelemetry.util.genai._*` module.
+- Content capture, hooks, and configuration are owned by the util. Don't add instrumentation-local
+  env vars or settings.
+
+#### Streaming responses
+
+A streamed response only finishes once the caller has drained the stream, so the invocation must
+stay open until then. Do **not** call `invocation.stop()` when the SDK returns the stream — the
+span would close before any chunks arrive.
+
+Instrument streams by subclassing `SyncStreamWrapper` / `AsyncStreamWrapper` from
+`opentelemetry.util.genai.stream` (the public, supported helpers). The base class proxies the
+underlying SDK stream, drives iteration, and finalizes telemetry exactly once on success, error,
+or `close()`. Subclasses pass the SDK stream to `super().__init__(stream)` and implement three
+hooks:
+
+- `_process_chunk(chunk)` — accumulate per-chunk state (e.g. response model, finish reasons,
+  token usage, streamed content) onto the invocation.
+- `_on_stream_end()` — finalize on success; set the accumulated response attributes and call
+  `invocation.stop()`.
+- `_on_stream_error(error)` — finalize on failure; call `invocation.fail(error)`.
+
+```python
+class MyStreamWrapper(SyncStreamWrapper[Chunk]):
+    def __init__(self, stream, invocation, capture_content):
+        super().__init__(stream)
+        self._self_invocation = invocation
+        ...
+
+    def _process_chunk(self, chunk): ...      # accumulate state
+    def _on_stream_end(self): self._self_invocation.stop()
+    def _on_stream_error(self, error): self._self_invocation.fail(error)
+```
+
+The hooks are called internally by the wrapper lifecycle.
+Instance state must use the wrapt-proxy `_self_`-prefixed attribute convention (e.g.
+`self._self_invocation`) so it isn't forwarded to the wrapped stream. Don't reimplement iteration,
+finalization, or error handling in instrumentations — extend the wrapper instead, and if a hook
+isn't enough, add the capability here rather than working around it.
 ### Exception handling
 
 - When catching exceptions from the underlying library to record telemetry, always re-raise the
@@ -112,15 +188,29 @@ Apply to packages under `instrumentation/`.
 - Tests must verify exact attribute names **and value types**, checked against the semconv spec.
 - Test against oldest and latest supported library versions via `tests/requirements.{oldest,latest}.txt`
   and `{oldest,latest}` `tox.ini` factors.
-- `tests/conftest.py` must consume the shared fixtures from
-  `opentelemetry.test_util_genai` (`from opentelemetry.test_util_genai.fixtures import *` and
-  `from opentelemetry.test_util_genai.vcr import fixture_vcr, scrub_response_headers`) rather
-  than re-implementing provider/exporter/VCR plumbing.
+- `tests/conftest.py` must consume the shared fixtures from `opentelemetry.test_util_genai`
+  by registering them as plugins. Always register the fixtures plugin; register the VCR plugin
+  too when the package's tests use VCR cassettes —
+  `pytest_plugins = ["opentelemetry.test_util_genai.fixtures", "opentelemetry.test_util_genai.vcr"]`
+  (drop the `vcr` entry for packages with no cassette-backed tests) — rather than
+  re-implementing provider/exporter/VCR plumbing. Import scrub helpers
+  (`scrub_response_headers` / `scrub_response_headers_overwrite`) from
+  `opentelemetry.test_util_genai.vcr` where a `vcr_config` needs them.
+- Drive instrumentation in tests through the shared `instrument` context manager from
+  `opentelemetry.test_util_genai.instrumentor` — `instrument(SomeInstrumentor(),
+  tracer_provider=…, logger_provider=…, meter_provider=…, semconv=…, content_capture=…)`. It sets
+  the content-capture (`OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT`) env var *before* instrumenting and
+  restores them after, so a package's `instrument_*` fixtures don't manage that env themselves
+  (`TelemetryHandler` snapshots content-capture at construction, so the env must be set before it
+  is built).
 - When recording VCR cassettes, scrub account-identifying values in the conftest's
   `vcr_config` (`filter_headers` for requests, `scrub_response_headers_overwrite` for
   responses) before committing. Examples: `authorization`, `openai-organization`,
   `openai-project`, `Set-Cookie`, and any response-body field tied to a real
   account.
+- An AI-synthesized cassette (recorded without provider access) must start with a
+  `# TODO: this is generated by AI, re-record` comment so it gets re-recorded
+  against the real provider later.
 
 ### Conformance tests
 
@@ -132,6 +222,12 @@ via Weaver live-check. Each scenario module defines a subclass of
 `expected_spans`, `expected_metrics`, and implements
 `run(*, tracer_provider, meter_provider, logger_provider, vcr)`.
 
+Ship a scenario for **every** semconv operation the library emits, even an
+operation currently blocked by a util-genai or semconv gap. Skipping the
+scenario hides the gap; writing it records the gap (as a declared violation
+or a skip reason) so it fails loudly once the gap is fixed. **Never** drop a
+scenario file because it would fail today.
+
 Run via `tox -e py312-test-instrumentation-genai-<lib>-conformance`. The
 `*-conformance` tox envs target `tests/test_conformance.py` directly; the
 regular `*-{oldest,latest}` envs `--ignore` it so they don't need the

@@ -92,9 +92,10 @@ package's `tests/cassettes/`.
   Diff each new cassette before committing — leaked API keys, org ids, or
   `Set-Cookie` values block the PR.
 - **AI-generated cassettes**: if you lack provider access, you may
-  synthesize a cassette from the provider's API reference via AI. Make sure
-  to mention it in the PR and open a follow-up issue to re-record it in CI
-  against the real provider.
+  synthesize a cassette from the provider's API reference via AI. Add a
+  `# TODO: this is generated by AI, re-record` comment at the top of the
+  cassette, mention it in the PR, and open a follow-up issue to re-record it
+  in CI against the real provider.
 - **CI**: replay-only; recording in CI is a future improvement.
 
 ### 4. Update the changelog

@@ -50,7 +50,7 @@ The returned object can also be used as a context manager (`with ... as invocati
 The above factories must map 1:1 to distinct semconv operation types (inference, embeddings,
 retrieval, tool execution, agent invocation, workflow invocation). Names must match the operation
 unambiguously — for example, `create_agent` and `invoke_agent` are different operations, so a
-single `agent()` would be ambiguous and is not acceptable. Add a new factory per operation
+single `agent()` would be ambiguous and is not acceptable. Add a new factory per operation type
 instead.
 
 Factory names are Python-style singular verbs (`inference`, `embedding`, `retrieval`, `tool`, `workflow`); the op names
@@ -61,6 +61,42 @@ decisions as parameters, so they are on the span at creation time. Attributes th
 marked required by semconv must be required parameters (no default value). Operation name
 is usually hardcoded in specific invocation and does not need to be passed.
 
+### Streaming responses
+
+A streamed response only finishes once the caller has drained the stream, so the invocation must
+stay open until then. Do **not** call `invocation.stop()` when the SDK returns the stream — the
+span would close before any chunks arrive.
+
+Instrument streams by subclassing `SyncStreamWrapper` / `AsyncStreamWrapper` from
+`opentelemetry.util.genai.stream` (the public, supported helpers). The base class proxies the
+underlying SDK stream, drives iteration, and finalizes telemetry exactly once on success, error,
+or `close()`. Subclasses pass the SDK stream to `super().__init__(stream)` and implement three
+hooks:
+
+- `_process_chunk(chunk)` — accumulate per-chunk state (e.g. response model, finish reasons,
+  token usage, streamed content) onto the invocation.
+- `_on_stream_end()` — finalize on success; set the accumulated response attributes and call
+  `invocation.stop()`.
+- `_on_stream_error(error)` — finalize on failure; call `invocation.fail(error)`.
+
+```python
+class MyStreamWrapper(SyncStreamWrapper[Chunk]):
+    def __init__(self, stream, invocation, capture_content):
+        super().__init__(stream)
+        self._self_invocation = invocation
+        ...
+
+    def _process_chunk(self, chunk): ...      # accumulate state
+    def _on_stream_end(self): self._self_invocation.stop()
+    def _on_stream_error(self, error): self._self_invocation.fail(error)
+```
+
+The hooks are called internally by the wrapper lifecycle.
+Instance state must use the wrapt-proxy `_self_`-prefixed attribute convention (e.g.
+`self._self_invocation`) so it isn't forwarded to the wrapped stream. Don't reimplement iteration,
+finalization, or error handling in instrumentations — extend the wrapper instead, and if a hook
+isn't enough, add the capability here rather than working around it.
+
 ### Anti-patterns
 
 **Never construct invocation types directly** (`InferenceInvocation(...)`, `ToolInvocation(...)`,