feat: add agent-helper CLI commands for schema introspection and type discovery #329
johnnygreco wants to merge 35 commits into main
- add the new agent-context command, controller, and introspection services for config model and method discovery
- register the command in the main CLI and add broad unit test coverage for commands, controllers, and introspection formatting/inspection behavior
- enrich config pydantic models with Field descriptions so introspection output provides clearer, user-facing schema documentation
- add an agent-context CLI review document under docs/reviews
Remove redundant `name` field re-declaration from ExpressionColumnConfig (already inherited from SingleColumnConfig) and fix validator_type description to use actual enum values instead of uppercase member names.
Add required/default/constraints to FieldDetail, PropertyInfo dataclass, classmethod detection, inspect_class_properties, and __init__ docstring fallback. Enum values now use .value instead of .name.
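The switch from `.name` to `.value` described in the two commits above can be illustrated with a small sketch. The `ValidatorType` enum here is a hypothetical stand-in that borrows the three values quoted later in this thread ('code', 'local_callable', 'remote'); the member names and the two helper functions are assumptions, not the project's code.

```python
from enum import Enum


class ValidatorType(str, Enum):
    """Hypothetical stand-in; the real enum's member names are assumed."""

    CODE = "code"
    LOCAL_CALLABLE = "local_callable"
    REMOTE = "remote"


def describe_by_name(enum_cls: type) -> list:
    # Member names are Python identifiers, not what a user writes in a config.
    return [member.name for member in enum_cls]


def describe_by_value(enum_cls: type) -> list:
    # Values are the strings a config actually accepts.
    return [member.value for member in enum_cls]


names = describe_by_name(ValidatorType)
values = describe_by_value(ValidatorType)
print(names)
print(values)
```

Rendering the values rather than the names keeps the introspection output aligned with what a config file would actually contain.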
Add description= to RunConfig fields, class docstrings to constraint and seed source types for richer introspection output.
…at functions Add required/default/constraints to field rendering, schema deduplication via seen_schemas, and new formatters for interface, imports, and namespace tree output.
Add discover_namespace_tree, discover_interface_classes, and discover_importable_names functions. Move config imports to module level.
Rename CLI command from agent-context to introspect, add OutputFormat enum for validated --format options, and add interface, imports, and code-structure subcommands with fuzzy category matching.
Add end-to-end tests for preview, validate, and introspect commands covering non-interactive preview, interactive navigation, error messages, and JSON contract validation.
… groups Split the monolithic `introspect` CLI into two focused command groups:
- `types`: explore configuration types (columns, samplers, validators, etc.)
- `reference`: reference docs (overview, builder, interface, imports, code-structure)
…ucture Update command references from `introspect` to `types`/`reference`, enhance import display to use `dd.` alias pattern with recommended imports section, and fix singular/plural noun in category headers.
Greptile Summary
This PR adds two new CLI command groups (`inspect` and `list`).
The PR also includes a comprehensive test suite across all layers (discovery, inspection, formatting, controllers, commands, and end-to-end scenarios).
| Filename | Overview |
|---|---|
| packages/data-designer/src/data_designer/cli/services/introspection/discovery.py | New discovery service for dynamically finding config types via module inspection. Clean implementation with discriminator-based and fallback name-matching strategies. |
| packages/data-designer/src/data_designer/cli/services/introspection/pydantic_inspector.py | New Pydantic model inspector that formats schemas as YAML-style text. Handles nested models, enums, constraints, and cycle protection well. |
| packages/data-designer/src/data_designer/cli/services/introspection/method_inspector.py | New method inspector for introspecting class signatures and parsing Google-style docstrings. Solid implementation with good edge case handling. |
| packages/data-designer/src/data_designer/cli/services/introspection/formatters.py | New formatters for method info and type list text output. Simple, focused formatting functions. |
| packages/data-designer/src/data_designer/cli/controllers/introspection_controller.py | New controller orchestrating discovery, inspection, and formatting for introspect CLI commands. Uses data-driven spec pattern to avoid repetition. |
| packages/data-designer/src/data_designer/cli/controllers/list_controller.py | New controller for listing valid configuration values. Minor separator width inconsistency in _print_type_table compared to format_type_list_text. |
| packages/data-designer/src/data_designer/cli/commands/agent_helpers/inspect.py | New Typer command definitions for inspect subcommands. Thin command layer delegating to controller. |
| packages/data-designer/src/data_designer/cli/commands/agent_helpers/list.py | New Typer command definitions for list subcommands. Thin command layer delegating to ListController. |
| packages/data-designer/src/data_designer/cli/main.py | Registers new inspect and list agent-helper command groups. Renames help panel labels for consistency. Clean integration. |
| packages/data-designer-config/src/data_designer/config/column_configs.py | Added Field descriptions to all column config fields for self-documenting schema output. Also fixed mutable default for conditional_params. |
| packages/data-designer-config/src/data_designer/config/models.py | Added Field descriptions to model config fields. Descriptions are accurate and helpful for agent consumption. |
| packages/data-designer-config/src/data_designer/config/sampler_params.py | Added Field descriptions to sampler_type discriminator fields. Consistent pattern across all sampler params classes. |
Flowchart
flowchart TD
CLI["CLI Commands<br/>(inspect.py / list.py)"]
IC["IntrospectionController"]
LC["ListController"]
subgraph Services ["Introspection Services"]
DISC["discovery.py<br/>discover_column_configs()<br/>discover_sampler_types()<br/>discover_validator_types()<br/>discover_processor_configs()<br/>discover_constraint_types()"]
PI["pydantic_inspector.py<br/>format_model_text()<br/>format_type()"]
MI["method_inspector.py<br/>inspect_class_methods()"]
FMT["formatters.py<br/>format_method_info_text()<br/>format_type_list_text()"]
end
subgraph ConfigModels ["Config Models (Field descriptions added)"]
CC["column_configs.py"]
MOD["models.py"]
SP["sampler_params.py"]
SC["sampler_constraints.py"]
PROC["processors.py"]
SEED["seed.py / seed_source.py"]
MCP["mcp.py"]
end
subgraph Repos ["Repositories"]
MR["ModelRepository"]
PR["ProviderRepository"]
PER["PersonaRepository"]
end
CLI -->|"inspect column/sampler/..."| IC
CLI -->|"list columns/model-aliases/..."| LC
IC --> DISC
IC --> PI
IC --> MI
IC --> FMT
LC --> DISC
LC --> MR
LC --> PR
LC --> PER
DISC -->|"dir(dd), _LAZY_IMPORTS"| ConfigModels
PI -->|"model_fields, annotations"| ConfigModels
Last reviewed commit: 1f8e3fc
…sed discovery Use _LAZY_IMPORTS as the single source of truth for config exports so discovery functions stay in sync automatically when new types are added. Add _discover_by_modules() helper and make discover_interface_classes() scan interface.__all__ dynamically. Dynamically classify interface classes in the controller using ConfigBase instead of hardcoded lists. Add field descriptions to ModelProvider (newly discovered by dynamic scan).
packages/data-designer/src/data_designer/cli/services/introspection/__init__.py
packages/data-designer/src/data_designer/cli/controllers/introspection_controller.py
packages/data-designer/src/data_designer/cli/services/introspection/discovery.py
- validate negative depth values in code-structure discovery/CLI paths and return actionable errors
- preserve machine-typed field defaults in JSON schema output via default/default_factory handling
- surface namespace import warnings and include enum values in seed JSON output, with coverage updates across introspection tests
packages/data-designer/src/data_designer/cli/controllers/introspection_controller.py
packages/data-designer/src/data_designer/cli/services/introspection/pydantic_inspector.py
- Use enum .value in text path of _show_all_schemas for parity with JSON
- Simplify _default_to_json list/dict branch (remove dead try/except)
- Add test_show_seeds_text_uses_enum_values_not_names for format parity
Log subpackage import exceptions during namespace discovery so skipped modules are traceable during development without changing best-effort traversal behavior.
packages/data-designer/src/data_designer/cli/services/introspection/__init__.py
    type_str = str(annotation)

    # Remove module prefixes
    type_str = re.sub(r"data_designer\.config\.\w+\.", "", type_str)
Module prefix regex only strips one level
The regex r"data_designer\.config\.\w+\." only strips a single submodule segment. For types from deeper nested paths like data_designer.config.utils.code_lang.CodeLang, this produces code_lang.CodeLang instead of CodeLang.
Consider using a greedy pattern that strips the full dotted path:
Suggested change:

    - type_str = re.sub(r"data_designer\.config\.\w+\.", "", type_str)
    + type_str = re.sub(r"data_designer\.config\.(?:\w+\.)+", "", type_str)
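The difference between the two patterns can be checked directly. This is a minimal sketch that applies both regexes from the comment to the nested path it mentions; the `strip_prefix` helper and the `list[...]` wrapper string are illustrative assumptions, not part of the PR.

```python
import re

# Both patterns are copied from the review comment.
SINGLE_LEVEL = r"data_designer\.config\.\w+\."
FULL_PATH = r"data_designer\.config\.(?:\w+\.)+"


def strip_prefix(pattern: str, type_str: str) -> str:
    """Hypothetical helper: remove every match of `pattern` from `type_str`."""
    return re.sub(pattern, "", type_str)


nested = "data_designer.config.utils.code_lang.CodeLang"
wrapped = "list[data_designer.config.utils.code_lang.CodeLang]"

# The original pattern consumes only one submodule segment.
single_nested = strip_prefix(SINGLE_LEVEL, nested)

# The suggested pattern consumes every dotted segment before the class name.
full_nested = strip_prefix(FULL_PATH, nested)
full_wrapped = strip_prefix(FULL_PATH, wrapped)

print(single_nested)
print(full_nested)
print(full_wrapped)
```

The repeated group stops before the class name because the final identifier is not followed by a dot, so the class name itself is never consumed.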
    """Discover params classes keyed by their literal discriminator value.

    Args:
        params_class_suffix: Class-name suffix to select params classes.
        discriminator_field: Field name that stores the literal discriminator.

    Returns:
        Dict mapping discriminator values to params classes.
Missing enum_name in docstring Args
The enum_name parameter is declared in the signature but not documented in the Args section. This function uses enum_name as a critical fallback lookup mechanism (line 112), so documenting it helps future contributors understand the two-phase discovery strategy.
Suggested change:

    - """Discover params classes keyed by their literal discriminator value.
    -
    - Args:
    -     params_class_suffix: Class-name suffix to select params classes.
    -     discriminator_field: Field name that stores the literal discriminator.
    -
    - Returns:
    -     Dict mapping discriminator values to params classes.
    + """Discover params classes keyed by their literal discriminator value.
    +
    + Args:
    +     params_class_suffix: Class-name suffix to select params classes.
    +     discriminator_field: Field name that stores the literal discriminator.
    +     enum_name: Name of the enum class on ``dd`` used as a fallback to
    +         match params classes by normalized name when no literal
    +         discriminator is present.
    +
    + Returns:
    +     Dict mapping discriminator values to params classes.
    + """
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
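The two-phase strategy the comment refers to (literal discriminator first, enum-based name matching as a fallback) might look roughly like the sketch below. It uses only the standard library and plain classes instead of Pydantic models; `SamplerType`, `CategorySamplerParams`, `UniformSamplerParams`, `literal_value`, and `discover_params` are all hypothetical names, and the real `_discover_params_by_discriminator` is not shown in this thread.

```python
from enum import Enum
from typing import Dict, List, Literal, Optional, get_args, get_origin, get_type_hints


class SamplerType(str, Enum):
    """Hypothetical enum used for the fallback phase."""

    CATEGORY = "category"
    UNIFORM = "uniform"


class CategorySamplerParams:
    # Carries a literal discriminator, so phase 1 finds it.
    sampler_type: Literal["category"] = "category"


class UniformSamplerParams:
    # No literal discriminator: only reachable through the enum fallback.
    low: float = 0.0


def literal_value(cls: type, field: str) -> Optional[str]:
    """Return the single string inside a Literal[...] annotation, if any."""
    hint = get_type_hints(cls).get(field)
    if get_origin(hint) is Literal:
        args = get_args(hint)
        if len(args) == 1 and isinstance(args[0], str):
            return args[0]
    return None


def discover_params(classes: List[type], suffix: str, field: str, enum_cls: type) -> Dict[str, type]:
    found: Dict[str, type] = {}
    unmatched: List[type] = []
    # Phase 1: key each class by its literal discriminator value.
    for cls in classes:
        if not cls.__name__.endswith(suffix):
            continue
        value = literal_value(cls, field)
        if value is not None:
            found[value] = cls
        else:
            unmatched.append(cls)
    # Phase 2: match the remaining classes to enum values by normalized name.
    normalized = {m.value.replace("_", "").lower(): m.value for m in enum_cls}
    for cls in unmatched:
        stem = cls.__name__[: -len(suffix)].lower()
        if stem in normalized and normalized[stem] not in found:
            found[normalized[stem]] = cls
    return found


result = discover_params(
    [CategorySamplerParams, UniformSamplerParams], "SamplerParams", "sampler_type", SamplerType
)
print(sorted(result))
```

The fallback only fills keys that phase 1 left empty, so an explicit literal discriminator always wins over name matching in this sketch.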
Gives agents a quick way to check which Nemotron-Persona locales are installed and usable in PersonSamplerParams.
Unreachable guard condition
_default_to_json handles all input types (None, Enum, bool, int, float, str, list, dict, fallback repr()) and never returns _UNDEFINED. This means default_json is not _UNDEFINED is always True after line 242, making the guard on line 243 dead code. It's harmless but worth simplifying for clarity.
Suggested change:

    - if default_json is not _UNDEFINED:
    -     default_display = repr(default_json)
    + default_json = _default_to_json(field_info.default)
    + default_display = repr(default_json)
    extract_reasoning_content: bool = False
    column_type: Literal["llm-text"] = "llm-text"
    prompt: str = Field(
        description="Jinja2 template for the LLM prompt; can reference other columns via {{ column_name }}"
Is it worth warning here or somewhere else about providing f-strings here that could mess up the Jinja template? I've found that Cursor likes to auto-convert this to f"".
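The concern is easy to reproduce in plain Python. In the sketch below, `product_name` is a hypothetical column name; the only behavior relied on is that an f-string treats doubled braces as escaped literal braces.

```python
# Plain string: the double braces reach Jinja2 untouched.
plain_prompt = "Write a review for {{ product_name }}"

# f-string: Python collapses "{{" and "}}" into single literal braces,
# so the template placeholder is damaged before Jinja2 ever sees it.
as_fstring = f"Write a review for {{ product_name }}"

print(plain_prompt)
print(as_fstring)
```

With single braces inside an f-string, Python would instead try to interpolate a local variable of that name, which fails or silently bakes in a value at definition time.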
    validator_type: ValidatorType
    validator_params: ValidatorParamsT
    target_columns: list[str] = Field(description="List of column names to validate")
    validator_type: ValidatorType = Field(description="Validation method: 'code', 'local_callable', or 'remote'")
Is 'code', 'local_callable', or 'remote' necessary since it's already strongly typed? Same comment for other similar changes.
    @@ -0,0 +1,380 @@
    # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Is introspection needed when the agent can crawl the installed Data Designer source code and discover all of this itself?
+1, would like to understand a bit better the motivation. I think the new documentation (field descriptions etc.) is very helpful, but the agent can get it "for free" right, do we need the whole CLI introspection layer?
I don't think reading source code or fetching docs is "free". This is designed to be much more targeted, concise context for the agent right when they need it. This is like going above and beyond for AX – similar to how you can ask what's the point of adding display_sampler_record when the user can just write a quick display function – it's for UX.
Consolidate the types and reference command groups into a single inspect command group under agent_helpers. Remove JSON output format and simplify the introspection controller and service layer. Use singular subcommand names for inspect (column, sampler, validator, processor) to semantically distinguish from the plural list commands. Rename constraints to sampler-constraints for clarity.
Replace the flat list-assets command with a list command group under agent_helpers. Add subcommands for columns, samplers, validators, processors, model-aliases, and persona-datasets. Each subcommand includes tip text pointing to the corresponding inspect subcommand.
Update docstrings and field descriptions in sampler_constraints.py to make explicit that Constraint, ScalarInequalityConstraint, and ColumnInequalityConstraint are scoped to sampler columns.
Remove PropertyInfo, inspect_class_properties, _collect_classmethod_names, is_classmethod field from MethodInfo, format_method_info_json, and _param_to_json. None of these are called in production. Also removes _extract_nested_basemodel and PropertyInfo from __init__.py re-exports.
- Fix get_brief_description to iterate lines instead of taking first (handles whitespace-first-line docstrings)
- Remove dead elif branch in _extract_nested_basemodel union handler
- Remove input mutation in _join_desc_lines
- Move _DEFAULT_INIT_DOCSTRING constant to top of method_inspector.py
- Extract _MIN_CLASS_COL_WIDTH and _NO_DESCRIPTION constants
- Use plain string variable instead of single-element list in formatters
- Document enum_name param in _discover_params_by_discriminator
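The first fix in the commit above (iterating over docstring lines rather than taking the first one) can be sketched as follows. The function body and the placeholder text in `NO_DESCRIPTION` are assumptions; only the function name and the behavior described in the commit message come from the PR.

```python
from typing import Optional

NO_DESCRIPTION = "(no description)"  # placeholder text is an assumption


def get_brief_description(docstring: Optional[str]) -> str:
    """Return the first non-blank line of a docstring, stripped."""
    if not docstring:
        return NO_DESCRIPTION
    for line in docstring.splitlines():
        stripped = line.strip()
        if stripped:
            return stripped
    return NO_DESCRIPTION


# A docstring whose first line is empty, as produced by an opening
# triple quote followed directly by a newline.
doc = "\n    Summary line.\n\n    Details follow here.\n"

naive = doc.split("\n")[0].strip()  # takes the first line only
brief = get_brief_description(doc)
print(repr(naive), repr(brief))
```

Taking only the first line yields an empty string for this docstring, while the loop skips blank lines until it reaches the summary.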
- Remove unreachable type_name=None branch from _show_typed_items (Typer always provides a string); move list-mode handling to _show_typed_command instead
- Replace inline chr(10) docstring splitting with get_brief_description
- Remove template dd.<ClassName> hint from show_sampler_constraints
- Remove unused related_inspect_tip parameter

- Fix max() crash on empty discovery results in all list_* methods
- Extract _print_type_table helper to replace 4 near-identical methods
- Add import hint (# import data_designer.config as dd) to all list output
- Remove inconsistent Nemotron-Personas banner from list_persona_datasets
- Add early return for empty persona dataset list

- Remove trailing periods from ModelProvider field descriptions to match the convention used across all other config models
- Fix ambiguous 'sampler' example in inspect column help text
- Update main CLI help to "Data Designer CLI for humans and agents."

- Add empty discovery tests for all ListController list_* methods
- Add empty persona dataset test
- Add processors specific/all/nonexistent tests for IntrospectionController
- Add mixed-case lookup tests (LLM-TEXT, CATEGORY)
- Add method_inspector edge cases (empty class, signature error, varargs, keyword-only params, _is_dunder/_is_private parametrized)
- Add _extract_literal_discriminator_value direct tests
- Add _default_to_json parametrized tests (9 branches)
- Add format_type regex branch tests
- Add format_model_text empty model test
- Add format_method_info_text edge cases (empty list, no description, no parameters)
- Add processors CLI command tests
- Fix broken persona banner assertion after banner removal

- method_inspector: prevent redundant "*" insertion in _format_signature instead of inserting then stripping it post-hoc
- pydantic_inspector: merge format_model_text/_format_model_text wrapper into a single function with defaulted parameters
- pydantic_inspector: replace string-level bracket-matching hack for Annotated[X, Discriminator] with proper type-level unwrapping
- pydantic_inspector: condense union handling in _extract_nested_basemodel
    typer.echo(_IMPORT_HINT)
    typer.echo("")
    typer.echo(f"{col1:<{max_width}} {col2}")
    typer.echo(f"{'-' * max_width} {'-' * max(len(items[t].__name__) for t in sorted_types)}")
Separator width ignores column header
The separator width for the second column is computed from the max class name length but doesn't account for the col2 header length. If all class names are shorter than the header string (e.g., "config_class" is 12 chars), the separator underline will be shorter than the header text. This is inconsistent with list_model_aliases at lines 76-78 which correctly uses max(len(header), max(len(data))) for all columns.
Suggested change:

    - typer.echo(f"{'-' * max_width} {'-' * max(len(items[t].__name__) for t in sorted_types)}")
    + typer.echo(f"{'-' * max_width} {'-' * max(len(col2), max(len(items[t].__name__) for t in sorted_types))}")
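The width rule the comment asks for (each column at least as wide as its header) can be sketched independently of Typer. `render_table`, the `"type"` header, and the sample row are hypothetical; only the `"config_class"` header text and the header-aware `max(...)` idea come from the comment.

```python
from typing import List, Tuple


def render_table(header1: str, header2: str, rows: List[Tuple[str, str]]) -> List[str]:
    """Hypothetical helper: build header, separator, and data lines."""
    if not rows:
        # Guard against calling max() on an empty sequence.
        return []
    # Each column is at least as wide as its own header.
    width1 = max(len(header1), max(len(name) for name, _ in rows))
    width2 = max(len(header2), max(len(cls) for _, cls in rows))
    lines = [
        f"{header1:<{width1}}  {header2}",
        f"{'-' * width1}  {'-' * width2}",
    ]
    lines += [f"{name:<{width1}}  {cls}" for name, cls in rows]
    return lines


# Class name shorter than the "config_class" header (12 characters).
table = render_table("type", "config_class", [("seed", "Seed")])
for line in table:
    print(line)
```

With a data-only width, the second separator here would be four dashes under a twelve-character header; including the header length in the `max` keeps the underline aligned.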
        self._persona_repository = PersonaRepository()
        self._download_service = DownloadService(config_dir, self._persona_repository)

    def list_model_aliases(self) -> None:
Should there be a list command for model providers? Same question for MCP configs.
We can certainly add that. I was thinking that, to keep it simple, we hold off on MCP and assume the user has set up their model providers, so the model alias is the only thing the agent needs to choose. WDYT?
Summary
Adds two agent-helper CLI command groups, `inspect` and `list`, that expose Data Designer's configuration API as structured, agent-consumable output. These commands let AI agents programmatically discover configuration types, schemas, builder methods, and valid values without reading source files.

`data-designer inspect`: detailed schemas and signatures
- `inspect column <type>`: schema for a column config type
- `inspect sampler <type>`: schema for a sampler params type
- `inspect validator <type>`: schema for a validator params type
- `inspect processor <type>`: schema for a processor config type
- `inspect sampler-constraints`: constraint schemas for sampler columns
- `inspect config-builder`: DataDesignerConfigBuilder method signatures and docstrings

`data-designer list`: available types and values
- `list columns`: column type names and config classes
- `list samplers`: sampler type names and params classes
- `list validators`: validator type names and params classes
- `list processors`: processor type names and config classes
- `list model-aliases`: configured model aliases and backing models
- `list persona-datasets`: Nemotron-Persona datasets and install status

Supporting infrastructure
- Introspection services (`cli/services/introspection/`): discovery, Pydantic model inspection, method signature extraction, and dual-format output (text + JSON)
- `IntrospectionController` and `ListController` orchestrate discovery, inspection, formatting, and output
- Field descriptions on config models (`column_configs.py`, `models.py`, `sampler_params.py`, etc.) for self-documenting schema output

Tests
Comprehensive test suite across discovery, inspection, formatting, controllers, commands, and end-to-end usage scenarios.

Attention areas
- `config/column_configs.py`: Field description additions across all column config types; verify descriptions are accurate
- `cli/services/introspection/discovery.py`: Uses live module inspection to discover types; changes to config module exports could affect discovery
- `cli/controllers/introspection_controller.py`: Shared `_show_typed_items` pattern; new type categories should follow this pattern