feat(config): recursively convert parsed dicts to typed dataclasses in loader #5269
MikeGoldsmith wants to merge 8 commits into
Adds `_dict_to_dataclass` in `_conversion.py`, which walks each field's type annotation and converts:
- nested dicts → typed dataclass instances
- lists of dicts → lists of typed dataclasses
- string/value → Enum members (e.g. log_level: info)
- unknown keys → routed to the @_additional_properties decorator

The loader's `_dict_to_model` now produces a fully-typed OpenTelemetryConfiguration tree end-to-end. Factory functions can rely on typed attribute access (config.tracer_provider.processors[0].batch.exporter.otlp_http.endpoint) instead of failing on raw dicts.

This closes the gap between load_config_file() and the factory functions: YAML/JSON config → SDK objects now works end-to-end.

Closes open-telemetry#5127

Assisted-by: Claude Opus 4.6
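The conversion rules listed in this commit can be illustrated with a small, self-contained sketch. It is not the code from `_conversion.py`: the `convert` function and the `Level`, `Exporter`, and `Provider` types are hypothetical stand-ins, and unknown keys, `Optional` unwrapping, and error handling are deliberately left out.

```python
import dataclasses
from enum import Enum
from typing import Any, get_args, get_origin, get_type_hints


class Level(Enum):  # hypothetical enum, not from models.py
    INFO = "info"
    DEBUG = "debug"


@dataclasses.dataclass
class Exporter:  # hypothetical nested model
    endpoint: str


@dataclasses.dataclass
class Provider:  # hypothetical top-level model
    exporters: list[Exporter]
    main: Exporter
    level: Level


def convert(value: Any, hint: Any) -> Any:
    """Convert ``value`` to match ``hint``; anything unmatched passes through."""
    # dict -> dataclass: recurse into every key using the field's annotation.
    if dataclasses.is_dataclass(hint) and isinstance(value, dict):
        hints = get_type_hints(hint)
        return hint(**{key: convert(item, hints[key]) for key, item in value.items()})
    # list[X] -> convert each element against X.
    if get_origin(hint) is list and isinstance(value, list):
        (item_hint,) = get_args(hint)
        return [convert(item, item_hint) for item in value]
    # Enum subclass -> look the member up by value.
    if isinstance(hint, type) and issubclass(hint, Enum):
        return hint(value)
    return value


raw = {"exporters": [{"endpoint": "a"}], "main": {"endpoint": "b"}, "level": "info"}
provider = convert(raw, Provider)
```

After the call, `provider.main.endpoint` is reachable by attribute access, which is the property the factory functions depend on.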
- Use TypeVar for _dict_to_dataclass return: callers now get the correct type instead of Any
- Use collections.abc.Mapping for input (more permissive than dict)
- Add explicit is_dataclass check at entry: raises TypeError with a descriptive message instead of failing later in dataclasses.fields

Assisted-by: Claude Opus 4.6
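A rough sketch of the three signature changes above, assuming a simplified flat conversion. The `from_mapping` name and the `Resource` class are hypothetical, and the real `_dict_to_dataclass` signature may differ.

```python
import dataclasses
from collections.abc import Mapping
from types import MappingProxyType
from typing import Any, Type, TypeVar

T = TypeVar("T")


@dataclasses.dataclass
class Resource:  # hypothetical stand-in for a generated model
    name: str


def from_mapping(cls: Type[T], data: Mapping[str, Any]) -> T:
    """Build ``cls`` from ``data``; type checkers infer the return type from ``cls``."""
    # Fail early with a descriptive message rather than deep inside dataclasses.fields().
    if not (isinstance(cls, type) and dataclasses.is_dataclass(cls)):
        raise TypeError(f"expected a dataclass type, got {cls!r}")
    return cls(**data)


# Any Mapping is accepted, not only dict.
resource = from_mapping(Resource, MappingProxyType({"name": "svc"}))
```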
Astroid 3.x (used by pylint 3.x) follows typing.get_type_hints into Python 3.14's annotationlib, which contains t-string literals it can't parse and crashes with AttributeError on 'visit_templatestr'. Wrapping the call in a helper that returns dict[str, Any] stops the inference at the declared return type.

Assisted-by: Claude Opus 4.7
Same effect as the prior helper: declaring the local as ``dict[str, Any]`` stops astroid's inference at the annotation rather than tracing into the typing internals.

Assisted-by: Claude Opus 4.7
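The shape of the workaround described in these two commits might look like the following. The `field_type_hints` helper and the `Example` class are hypothetical names, and the effect on astroid is taken from the commit messages, not demonstrated here.

```python
import dataclasses
from typing import Any, get_type_hints


@dataclasses.dataclass
class Example:  # hypothetical model
    name: str
    count: int


def field_type_hints(cls: type) -> dict[str, Any]:
    # The explicit annotation on the local is the point of the workaround:
    # per the commit message, inference stops at the declared type.
    hints: dict[str, Any] = get_type_hints(cls)
    return hints
```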
… codespell

Replace the bespoke _Level enum (which violated pylint's invalid-name on lowercase members) with the real ExemplarFilter enum from models.py. The generated models use lowercase values verbatim from the JSON schema, so using one of them avoids fighting the linter and exercises the same code path with real data shapes.

Add 'astroid' to codespell's ignore-words-list; the prior commit's explanatory comment mentions the library by name and codespell flagged it as a misspelling of 'asteroid'.

Assisted-by: Claude Opus 4.7
Looks good. Should we have a full e2e test like the one you described in your comment? That seems useful.
The conversion module has unit tests that exercise _dict_to_dataclass
in isolation, but nothing verified the full pipeline: load a real
YAML file, get back fully-typed nested dataclasses, and feed the
result into a downstream factory function.

Adds two checks built on a representative nested fixture (tracer
provider with a parent-based / trace-id-ratio sampler and a batch
processor with console exporter):
- nested fields (sampler, processors[*].batch) come back as the
  expected typed dataclasses, not raw dicts
- the typed result is accepted by ``create_tracer_provider`` and
  produces an SDK ``TracerProvider``

This is the integration coverage requested in PR review feedback;
the inline example in the PR description is now an actual regression
test.

Assisted-by: Claude Opus 4.7
    dataclasses. Other values (primitives, enums, ``dict[str, Any]`` aliases)
    pass through unchanged.
    """
    if value is None:
Do we want to raise an error if value is None, but type_hint is not optional?
The JSON schema validator must have been run before handing these to the `@dataclass`, so it would have caught any null values for non-nullable types. This function should only ever be reachable after that, so I don't think we need those checks. I can add them if you think it's worthwhile though.
    origin = get_origin(unwrapped)

    # list[X] — recurse on each element
    if origin is list and isinstance(value, list):
Should we raise an error here if origin is a list, but the value is not a list, or iterable?
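For reference, the stricter behaviour asked about here could look roughly like this. It is only a sketch of the suggestion; `check_list` is a hypothetical helper and is not part of the PR.

```python
from typing import Any, get_origin


def check_list(value: Any, hint: Any) -> None:
    """Raise if ``hint`` is ``list[...]`` but ``value`` is not a list."""
    if get_origin(hint) is list and not isinstance(value, list):
        raise TypeError(
            f"expected a list for {hint!r}, got {type(value).__name__}"
        )
```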
    ):
        return unwrapped(value)

    return value
I'm not sure how robust we want this function to be, but at this point the type of value and origin should be a primitive. Do we want to validate this and raise an exception if the types of value and origin mismatch, and/or perform a coercion?
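One possible shape for the validation option, assuming only `str`, `int`, `float`, and `bool` count as primitives. `check_primitive` is a hypothetical helper, and it validates without coercing.

```python
from typing import Any

_PRIMITIVES = (str, int, float, bool)  # assumed set of primitive hints


def check_primitive(value: Any, hint: Any) -> Any:
    """Return ``value`` unchanged, raising if it mismatches a primitive hint."""
    # Note: bool is a subclass of int, so True passes an ``int`` hint.
    if hint in _PRIMITIVES and not isinstance(value, hint):
        raise TypeError(
            f"expected {hint.__name__}, got {type(value).__name__}: {value!r}"
        )
    return value
```

A check this strict would also reject an integer supplied for a `float` field, so coercion rules would need to be decided alongside it.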
    def test_nested_fields_are_typed_dataclasses(self):
        config = self._load()

        self.assertIsInstance(config.tracer_provider, TracerProviderConfig)
nit: you could construct the TracerProvider and then just do self.assertEqual on the whole object
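A sketch of the whole-object comparison suggested here. It relies on the `__eq__` that `@dataclass` generates, which compares fields recursively; `Sampler` and `TracerConfig` are hypothetical stand-ins for the generated models, and the loaded value is constructed inline instead of coming from the loader.

```python
import dataclasses
import unittest


@dataclasses.dataclass
class Sampler:  # hypothetical nested model
    ratio: float


@dataclasses.dataclass
class TracerConfig:  # hypothetical stand-in for the tracer provider config
    sampler: Sampler


class WholeObjectTest(unittest.TestCase):
    def test_loaded_config_equals_expected(self):
        # Stands in for the result of loading the YAML fixture.
        loaded = TracerConfig(sampler=Sampler(ratio=0.5))
        expected = TracerConfig(sampler=Sampler(ratio=0.5))
        self.assertEqual(loaded, expected)
```

A single comparison like this also fails if a nested field is still a raw dict, because a dict never compares equal to a dataclass instance.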
Description

Closes the gap between `load_config_file()` and the factory functions: YAML/JSON config → SDK objects now works end-to-end through the typed model tree.

Previously, the loader's `_dict_to_model` did `OpenTelemetryConfiguration(**data)`, which only constructed the top-level dataclass; nested fields stayed as raw dicts. This meant factory functions like `create_tracer_provider(config: TracerProviderConfig)` would break trying to access `config.sampler` as an attribute when it was actually a dict.

Approach

Added `_dict_to_dataclass` in a new `_conversion.py` module. It walks each field's type annotation via `typing.get_type_hints` and recursively converts:
- nested dicts → typed dataclasses (`dict` → `TracerProvider` → `SpanProcessor` → `BatchSpanProcessor` → ...)
- lists of dicts → lists of typed dataclasses (`list[SpanProcessor]`)
- string values → Enum members (`log_level: info` → `SeverityNumber.info`)
- unknown keys → routed to the `@_additional_properties` decorator (so user-defined plugin names still flow through)

`Optional[X]` / `X | None` is unwrapped before checking the inner type. `ClassVar` fields are skipped (the `additional_properties` annotation on decorated classes is correctly ignored).

Verified end-to-end

User-defined plugins continue to work: unknown sampler/propagator/exporter names land in `additional_properties` and are loaded via entry points.

Tests

11 new tests in `test_conversion.py` covering: flat dicts, nested dataclasses, lists, optionals, missing fields, unknown keys (additional_properties), enum coercion, primitive pass-through.

Closes #5127
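The `Optional` unwrapping and `ClassVar` skipping described under Approach can be sketched as follows. The helper names `unwrap_optional` and `convertible_hints` and the `Example` class are hypothetical; the actual module may structure this differently.

```python
import dataclasses
import types
from typing import Any, ClassVar, Optional, Union, get_args, get_origin, get_type_hints

# types.UnionType (for ``X | None``) exists on Python 3.10+; fall back to Union otherwise.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def unwrap_optional(hint: Any) -> Any:
    """Return X for Optional[X] / X | None; other hints are returned unchanged."""
    if get_origin(hint) in _UNION_ORIGINS:
        non_none = [arg for arg in get_args(hint) if arg is not type(None)]
        if len(non_none) == 1:
            return non_none[0]
    return hint


def convertible_hints(cls: type) -> dict[str, Any]:
    """Type hints for ``cls`` with ClassVar entries dropped and Optional unwrapped."""
    return {
        name: unwrap_optional(hint)
        for name, hint in get_type_hints(cls).items()
        if not (hint is ClassVar or get_origin(hint) is ClassVar)
    }


@dataclasses.dataclass
class Example:  # hypothetical model with a class-level annotation
    name: Optional[str] = None
    extras: ClassVar[dict] = {}
```

Here `convertible_hints(Example)` keeps `name` with the inner type `str` and omits `extras`, since `get_type_hints` reports `ClassVar` annotations even though `dataclasses.fields` does not.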