feat: add agent-helper CLI commands for schema introspection and type discovery #329

Open
johnnygreco wants to merge 35 commits into main from johnny/feat/agent-context

@johnnygreco johnnygreco commented Feb 16, 2026

Summary

Adds two agent-helper CLI command groups — inspect and list — that expose Data Designer's configuration API as structured, agent-consumable output. These commands let AI agents programmatically discover configuration types, schemas, builder methods, and valid values without reading source files.

data-designer inspect — detailed schemas and signatures

  • inspect column <type> — schema for a column config type
  • inspect sampler <type> — schema for a sampler params type
  • inspect validator <type> — schema for a validator params type
  • inspect processor <type> — schema for a processor config type
  • inspect sampler-constraints — constraint schemas for sampler columns
  • inspect config-builder — DataDesignerConfigBuilder method signatures and docstrings

data-designer list — available types and values

  • list columns — column type names and config classes
  • list samplers — sampler type names and params classes
  • list validators — validator type names and params classes
  • list processors — processor type names and config classes
  • list model-aliases — configured model aliases and backing models
  • list persona-datasets — Nemotron-Persona datasets and install status

Supporting infrastructure

  • Introspection service layer (cli/services/introspection/) — discovery, Pydantic model inspection, method signature extraction, and dual-format output (text + JSON)
  • Controller layer: IntrospectionController and ListController orchestrate discovery, inspection, formatting, and output
  • Field descriptions added to config models (column_configs.py, models.py, sampler_params.py, etc.) for self-documenting schema output

Tests

Comprehensive test suite across discovery, inspection, formatting, controllers, commands, and end-to-end usage scenarios.

Attention areas

  • config/column_configs.py — Field description additions across all column config types; verify descriptions are accurate
  • cli/services/introspection/discovery.py — Uses live module inspection to discover types; changes to config module exports could affect discovery
  • cli/controllers/introspection_controller.py — Shared _show_typed_items pattern; new type categories should follow this pattern

- add the new agent-context command, controller, and introspection services
  for config model and method discovery
- register the command in the main CLI and add broad unit test coverage for
  commands, controllers, and introspection formatting/inspection behavior
- enrich config pydantic models with Field descriptions so introspection output
  provides clearer, user-facing schema documentation
- add an agent-context CLI review document under docs/reviews
Remove redundant `name` field re-declaration from ExpressionColumnConfig
(already inherited from SingleColumnConfig) and fix validator_type
description to use actual enum values instead of uppercase member names.
Add required/default/constraints to FieldDetail, PropertyInfo dataclass,
classmethod detection, inspect_class_properties, and __init__ docstring
fallback. Enum values now use .value instead of .name.
Add description= to RunConfig fields, class docstrings to constraint
and seed source types for richer introspection output.
…at functions

Add required/default/constraints to field rendering, schema deduplication
via seen_schemas, and new formatters for interface, imports, and namespace
tree output.
Add discover_namespace_tree, discover_interface_classes, and
discover_importable_names functions. Move config imports to module level.
Rename CLI command from agent-context to introspect, add OutputFormat enum
for validated --format options, and add interface, imports, and
code-structure subcommands with fuzzy category matching.
Add end-to-end tests for preview, validate, and introspect commands
covering non-interactive preview, interactive navigation, error messages,
and JSON contract validation.
… groups

Split the monolithic `introspect` CLI into two focused command groups:
- `types`: explore configuration types (columns, samplers, validators, etc.)
- `reference`: reference docs (overview, builder, interface, imports, code-structure)
…ucture

Update command references from `introspect` to `types`/`reference`, enhance
import display to use `dd.` alias pattern with recommended imports section,
and fix singular/plural noun in category headers.
@johnnygreco johnnygreco requested a review from a team as a code owner February 16, 2026 03:55
greptile-apps bot commented Feb 16, 2026

Greptile Summary

This PR adds two new CLI command groups (inspect and list) that expose Data Designer's configuration API as structured, agent-consumable output. The implementation follows a clean layered architecture:

  • Service layer (cli/services/introspection/): discovery functions that dynamically find config types via module inspection, Pydantic model formatters, and method signature extractors
  • Controller layer (IntrospectionController, ListController): orchestrates discovery, formatting, and output with a data-driven spec pattern that avoids repetition
  • Command layer (cli/commands/agent_helpers/): thin Typer wrappers delegating to controllers
  • Config model descriptions: Field(description=...) added across all config models for self-documenting schema output

The PR also includes a comprehensive test suite across all layers (discovery, inspection, formatting, controllers, commands, and end-to-end scenarios).

  • Well-structured architecture with clear separation of concerns across services, controllers, and commands
  • Config model field descriptions are accurate and useful for agent consumption
  • Good use of the _TypedCommandSpec pattern in IntrospectionController to reduce duplication
  • The conditional_params default in SamplerColumnConfig was correctly changed from = {} to Field(default_factory=dict)
  • Minor inconsistency in _print_type_table separator width calculation (does not account for column header width)
  • Previous review feedback (enum representation, dead code, bare exceptions, __all__ ordering) appears to have been addressed in follow-up commits

Confidence Score: 4/5

  • This PR is safe to merge — it adds new read-only CLI commands and field descriptions without altering existing runtime behavior.
  • Score of 4 reflects that this is a well-structured, additive feature with comprehensive tests. The config model changes are limited to adding Field descriptions (no behavioral changes). The only functional code changes are new CLI commands and supporting infrastructure. No existing tests are modified. One minor cosmetic issue found (separator width). Previous review feedback has been addressed.
  • Minor cosmetic issue in packages/data-designer/src/data_designer/cli/controllers/list_controller.py (_print_type_table separator width).

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/data-designer/src/data_designer/cli/services/introspection/discovery.py | New discovery service for dynamically finding config types via module inspection. Clean implementation with discriminator-based and fallback name-matching strategies. |
| packages/data-designer/src/data_designer/cli/services/introspection/pydantic_inspector.py | New Pydantic model inspector that formats schemas as YAML-style text. Handles nested models, enums, constraints, and cycle protection well. |
| packages/data-designer/src/data_designer/cli/services/introspection/method_inspector.py | New method inspector for introspecting class signatures and parsing Google-style docstrings. Solid implementation with good edge case handling. |
| packages/data-designer/src/data_designer/cli/services/introspection/formatters.py | New formatters for method info and type list text output. Simple, focused formatting functions. |
| packages/data-designer/src/data_designer/cli/controllers/introspection_controller.py | New controller orchestrating discovery, inspection, and formatting for introspect CLI commands. Uses data-driven spec pattern to avoid repetition. |
| packages/data-designer/src/data_designer/cli/controllers/list_controller.py | New controller for listing valid configuration values. Minor separator width inconsistency in _print_type_table compared to format_type_list_text. |
| packages/data-designer/src/data_designer/cli/commands/agent_helpers/inspect.py | New Typer command definitions for inspect subcommands. Thin command layer delegating to controller. |
| packages/data-designer/src/data_designer/cli/commands/agent_helpers/list.py | New Typer command definitions for list subcommands. Thin command layer delegating to ListController. |
| packages/data-designer/src/data_designer/cli/main.py | Registers new inspect and list agent-helper command groups. Renames help panel labels for consistency. Clean integration. |
| packages/data-designer-config/src/data_designer/config/column_configs.py | Added Field descriptions to all column config fields for self-documenting schema output. Also fixed mutable default for conditional_params. |
| packages/data-designer-config/src/data_designer/config/models.py | Added Field descriptions to model config fields. Descriptions are accurate and helpful for agent consumption. |
| packages/data-designer-config/src/data_designer/config/sampler_params.py | Added Field descriptions to sampler_type discriminator fields. Consistent pattern across all sampler params classes. |

Flowchart

flowchart TD
    CLI["CLI Commands<br/>(inspect.py / list.py)"]
    IC["IntrospectionController"]
    LC["ListController"]
    
    subgraph Services ["Introspection Services"]
        DISC["discovery.py<br/>discover_column_configs()<br/>discover_sampler_types()<br/>discover_validator_types()<br/>discover_processor_configs()<br/>discover_constraint_types()"]
        PI["pydantic_inspector.py<br/>format_model_text()<br/>format_type()"]
        MI["method_inspector.py<br/>inspect_class_methods()"]
        FMT["formatters.py<br/>format_method_info_text()<br/>format_type_list_text()"]
    end
    
    subgraph ConfigModels ["Config Models (Field descriptions added)"]
        CC["column_configs.py"]
        MOD["models.py"]
        SP["sampler_params.py"]
        SC["sampler_constraints.py"]
        PROC["processors.py"]
        SEED["seed.py / seed_source.py"]
        MCP["mcp.py"]
    end
    
    subgraph Repos ["Repositories"]
        MR["ModelRepository"]
        PR["ProviderRepository"]
        PER["PersonaRepository"]
    end
    
    CLI -->|"inspect column/sampler/..."| IC
    CLI -->|"list columns/model-aliases/..."| LC
    IC --> DISC
    IC --> PI
    IC --> MI
    IC --> FMT
    LC --> DISC
    LC --> MR
    LC --> PR
    LC --> PER
    DISC -->|"dir(dd), _LAZY_IMPORTS"| ConfigModels
    PI -->|"model_fields, annotations"| ConfigModels

Last reviewed commit: 1f8e3fc

…sed discovery

Use _LAZY_IMPORTS as the single source of truth for config exports so
discovery functions stay in sync automatically when new types are added.
Add _discover_by_modules() helper and make discover_interface_classes()
scan interface.__all__ dynamically. Dynamically classify interface
classes in the controller using ConfigBase instead of hardcoded lists.
Add field descriptions to ModelProvider (newly discovered by dynamic scan).

@greptile-apps greptile-apps bot left a comment

32 files reviewed, 3 comments

- validate negative depth values in code-structure discovery/CLI paths
  and return actionable errors
- preserve machine-typed field defaults in JSON schema output via
  default/default_factory handling
- surface namespace import warnings and include enum values in seed
  JSON output, with coverage updates across introspection tests

@greptile-apps greptile-apps bot left a comment

32 files reviewed, 2 comments

- Use enum .value in text path of _show_all_schemas for parity with JSON
- Simplify _default_to_json list/dict branch (remove dead try/except)
- Add test_show_seeds_text_uses_enum_values_not_names for format parity
Log subpackage import exceptions during namespace discovery so skipped modules are traceable during development without changing best-effort traversal behavior.

@greptile-apps greptile-apps bot left a comment

32 files reviewed, 1 comment


@greptile-apps greptile-apps bot left a comment

32 files reviewed, 2 comments

type_str = str(annotation)

# Remove module prefixes
type_str = re.sub(r"data_designer\.config\.\w+\.", "", type_str)

Module prefix regex only strips one level
The regex r"data_designer\.config\.\w+\." only strips a single submodule segment. For types from deeper nested paths like data_designer.config.utils.code_lang.CodeLang, this produces code_lang.CodeLang instead of CodeLang.

Consider using a greedy pattern that strips the full dotted path:

Suggested change
-     type_str = re.sub(r"data_designer\.config\.\w+\.", "", type_str)
+     type_str = re.sub(r"data_designer\.config\.(?:\w+\.)+", "", type_str)
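
The difference between the two patterns can be checked in isolation. The following is a minimal sketch, not code from the PR: the helper name `strip_config_prefix` and the sample type strings are assumptions made for illustration, and only the two regexes are taken from the comment above.

```python
import re

# Pattern currently in the PR: strips exactly one submodule segment.
SINGLE_LEVEL = r"data_designer\.config\.\w+\."
# Pattern from the suggestion: strips one or more dotted segments.
MULTI_LEVEL = r"data_designer\.config\.(?:\w+\.)+"


def strip_config_prefix(type_str: str, pattern: str) -> str:
    """Remove data_designer.config module prefixes (hypothetical helper)."""
    return re.sub(pattern, "", type_str)


# Illustrative type strings; the nested path is the one cited in the review.
nested = "data_designer.config.utils.code_lang.CodeLang"
shallow = "list[data_designer.config.column_configs.SamplerColumnConfig]"

print(strip_config_prefix(nested, SINGLE_LEVEL))   # code_lang.CodeLang
print(strip_config_prefix(nested, MULTI_LEVEL))    # CodeLang
print(strip_config_prefix(shallow, SINGLE_LEVEL))  # list[SamplerColumnConfig]
print(strip_config_prefix(shallow, MULTI_LEVEL))   # list[SamplerColumnConfig]
```

Both patterns agree on single-level paths, so the suggested change should only affect types that live in nested subpackages.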

Comment on lines 82 to 89
"""Discover params classes keyed by their literal discriminator value.

Args:
params_class_suffix: Class-name suffix to select params classes.
discriminator_field: Field name that stores the literal discriminator.

Returns:
Dict mapping discriminator values to params classes.

Missing enum_name in docstring Args
The enum_name parameter is declared in the signature but not documented in the Args section. This function uses enum_name as a critical fallback lookup mechanism (line 112), so documenting it helps future contributors understand the two-phase discovery strategy.

Suggested change
-     """Discover params classes keyed by their literal discriminator value.
-
-     Args:
-         params_class_suffix: Class-name suffix to select params classes.
-         discriminator_field: Field name that stores the literal discriminator.
-
-     Returns:
-         Dict mapping discriminator values to params classes.
+     """Discover params classes keyed by their literal discriminator value.
+
+     Args:
+         params_class_suffix: Class-name suffix to select params classes.
+         discriminator_field: Field name that stores the literal discriminator.
+         enum_name: Name of the enum class on ``dd`` used as a fallback to
+             match params classes by normalized name when no literal
+             discriminator is present.
+
+     Returns:
+         Dict mapping discriminator values to params classes.
+     """
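
For readers unfamiliar with the two-phase strategy the comment refers to, here is a rough sketch of how such a lookup could work. It is not the PR's implementation: the classes, the enum, and the function names are all hypothetical, and plain annotated classes stand in for the Pydantic models.

```python
from enum import Enum
from typing import Literal, Optional, get_args, get_origin, get_type_hints


class SamplerType(str, Enum):  # hypothetical enum used for the fallback
    CATEGORY = "category"
    UNIFORM = "uniform"


class CategoryParams:  # declares a literal discriminator
    sampler_type: Literal["category"] = "category"


class UniformParams:  # no discriminator; only the name fallback can find it
    low: float = 0.0


def literal_value(cls: type, field: str) -> Optional[str]:
    """Phase 1: read the single string value of a Literal-typed field."""
    hint = get_type_hints(cls).get(field)
    if get_origin(hint) is Literal:
        args = get_args(hint)
        if len(args) == 1 and isinstance(args[0], str):
            return args[0]
    return None


def discover(classes: list, suffix: str, field: str, enum_cls: type) -> dict:
    """Map discriminator values to classes, falling back to name matching."""
    found = {}
    for cls in classes:
        if not cls.__name__.endswith(suffix):
            continue
        value = literal_value(cls, field)
        if value is None:
            # Phase 2: compare the normalized class-name stem to enum values.
            stem = cls.__name__[: -len(suffix)].lower()
            value = next(
                (m.value for m in enum_cls if m.value.replace("_", "") == stem),
                None,
            )
        if value is not None:
            found[value] = cls
    return found


result = discover([CategoryParams, UniformParams], "Params", "sampler_type", SamplerType)
print(sorted(result))  # ['category', 'uniform']
```

In this sketch the enum is only consulted when the literal is missing, which is why documenting the fallback parameter matters to someone reading the signature.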

Gives agents a quick way to check which Nemotron-Persona locales
are installed and usable in PersonSamplerParams.

@greptile-apps greptile-apps bot left a comment

36 files reviewed, 1 comment

Comment on lines 243 to 244

Unreachable guard condition
_default_to_json handles all input types (None, Enum, bool, int, float, str, list, dict, fallback repr()) and never returns _UNDEFINED. This means default_json is not _UNDEFINED is always True after line 242, making the guard on line 243 dead code. It's harmless but worth simplifying for clarity.

Suggested change
-                     if default_json is not _UNDEFINED:
-                         default_display = repr(default_json)
+                     default_json = _default_to_json(field_info.default)
+                     default_display = repr(default_json)

extract_reasoning_content: bool = False
column_type: Literal["llm-text"] = "llm-text"
prompt: str = Field(
description="Jinja2 template for the LLM prompt; can reference other columns via {{ column_name }}"

Is it worth warning here or somewhere else about providing f-strings here that could mess up the Jinja template? I've found that Cursor likes to auto-convert this to f"".
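
A short illustration of the concern, using only standard Python string behavior. The variable name and prompt text are made up for the example.

```python
column_name = "topic"  # hypothetical variable that happens to be in scope

# Plain string: the double braces reach Jinja2 untouched.
template = "Write a short story about {{ column_name }}"

# f-string: Python collapses each doubled brace into a single brace,
# so Jinja2 would no longer see a placeholder at all.
converted = f"Write a short story about {{ column_name }}"

# f-string with single braces: Python substitutes the value immediately,
# baking one fixed value into what was meant to be a per-row template.
interpolated = f"Write a short story about {column_name}"

print(template)      # Write a short story about {{ column_name }}
print(converted)     # Write a short story about { column_name }
print(interpolated)  # Write a short story about topic
```

Either f-string form silently changes what the template engine receives, which is presumably why a note in the field description could help.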

validator_type: ValidatorType
validator_params: ValidatorParamsT
target_columns: list[str] = Field(description="List of column names to validate")
validator_type: ValidatorType = Field(description="Validation method: 'code', 'local_callable', or 'remote'")

@nabinchha nabinchha Feb 17, 2026

Is 'code', 'local_callable', or 'remote' necessary since it's already strongly typed? Same comment for other similar changes.

@@ -0,0 +1,380 @@
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Is introspection needed when the agent can crawl the installed Data Designer source code and discover all of this itself?

+1, would like to understand a bit better the motivation. I think the new documentation (field descriptions etc.) is very helpful, but the agent can get it "for free" right, do we need the whole CLI introspection layer?

Contributor Author

I don't think reading source code or fetching docs is "free". This is designed to be much more targeted, concise context for the agent right when they need it. This is like going above and beyond for AX – similar to how you can ask what's the point of adding display_sampler_record when the user can just write a quick display function – it's for UX.

Consolidate the types and reference command groups into a single
inspect command group under agent_helpers. Remove JSON output format
and simplify the introspection controller and service layer. Use singular
subcommand names for inspect (column, sampler, validator, processor) to
semantically distinguish from the plural list commands. Rename
constraints to sampler-constraints for clarity.
Replace the flat list-assets command with a list command group under
agent_helpers. Add subcommands for columns, samplers, validators,
processors, model-aliases, and persona-datasets. Each subcommand
includes tip text pointing to the corresponding inspect subcommand.
@johnnygreco johnnygreco changed the title feat: add agent-facing CLI introspection commands (types & reference) feat: add agent-helper CLI commands for schema introspection and type discovery Feb 18, 2026
Remove PropertyInfo, inspect_class_properties, _collect_classmethod_names,
is_classmethod field from MethodInfo, format_method_info_json, and
_param_to_json. None of these are called in production. Also removes
_extract_nested_basemodel and PropertyInfo from __init__.py re-exports.
- Fix get_brief_description to iterate lines instead of taking first
  (handles whitespace-first-line docstrings)
- Remove dead elif branch in _extract_nested_basemodel union handler
- Remove input mutation in _join_desc_lines
- Move _DEFAULT_INIT_DOCSTRING constant to top of method_inspector.py
- Extract _MIN_CLASS_COL_WIDTH and _NO_DESCRIPTION constants
- Use plain string variable instead of single-element list in formatters
- Document enum_name param in _discover_params_by_discriminator
- Remove unreachable type_name=None branch from _show_typed_items
  (Typer always provides a string); move list-mode handling to
  _show_typed_command instead
- Replace inline chr(10) docstring splitting with get_brief_description
- Remove template dd.<ClassName> hint from show_sampler_constraints
- Remove unused related_inspect_tip parameter
- Fix max() crash on empty discovery results in all list_* methods
- Extract _print_type_table helper to replace 4 near-identical methods
- Add import hint (# import data_designer.config as dd) to all list output
- Remove inconsistent Nemotron-Personas banner from list_persona_datasets
- Add early return for empty persona dataset list
- Remove trailing periods from ModelProvider field descriptions to match
  the convention used across all other config models
- Fix ambiguous 'sampler' example in inspect column help text
- Update main CLI help to "Data Designer CLI for humans and agents."
- Add empty discovery tests for all ListController list_* methods
- Add empty persona dataset test
- Add processors specific/all/nonexistent tests for IntrospectionController
- Add mixed-case lookup tests (LLM-TEXT, CATEGORY)
- Add method_inspector edge cases (empty class, signature error, varargs,
  keyword-only params, _is_dunder/_is_private parametrized)
- Add _extract_literal_discriminator_value direct tests
- Add _default_to_json parametrized tests (9 branches)
- Add format_type regex branch tests
- Add format_model_text empty model test
- Add format_method_info_text edge cases (empty list, no description,
  no parameters)
- Add processors CLI command tests
- Fix broken persona banner assertion after banner removal
- method_inspector: prevent redundant "*" insertion in _format_signature
  instead of inserting then stripping it post-hoc
- pydantic_inspector: merge format_model_text/_format_model_text wrapper
  into a single function with defaulted parameters
- pydantic_inspector: replace string-level bracket-matching hack for
  Annotated[X, Discriminator] with proper type-level unwrapping
- pydantic_inspector: condense union handling in _extract_nested_basemodel

@greptile-apps greptile-apps bot left a comment

37 files reviewed, 1 comment

typer.echo(_IMPORT_HINT)
typer.echo("")
typer.echo(f"{col1:<{max_width}} {col2}")
typer.echo(f"{'-' * max_width} {'-' * max(len(items[t].__name__) for t in sorted_types)}")

Separator width ignores column header
The separator width for the second column is computed from the max class name length but doesn't account for the col2 header length. If all class names are shorter than the header string (e.g., "config_class" is 12 chars), the separator underline will be shorter than the header text. This is inconsistent with list_model_aliases at lines 76-78 which correctly uses max(len(header), max(len(data))) for all columns.

Suggested change
-         typer.echo(f"{'-' * max_width}  {'-' * max(len(items[t].__name__) for t in sorted_types)}")
+         typer.echo(f"{'-' * max_width}  {'-' * max(len(col2), max(len(items[t].__name__) for t in sorted_types))}")
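
The width rule described above can be sketched as a standalone function. This is an illustrative rewrite, not the controller's code: `format_type_table` is a hypothetical name, it returns lines instead of calling `typer.echo`, and it takes class names as plain strings.

```python
def format_type_table(items: dict, col1: str, col2: str) -> list:
    """Build a two-column text table whose separators cover headers and data."""
    if not items:
        return []  # avoids calling max() on an empty sequence
    names = sorted(items)
    # Each column is as wide as its header or its longest cell, whichever is larger.
    width1 = max(len(col1), max(len(n) for n in names))
    width2 = max(len(col2), max(len(items[n]) for n in names))
    lines = [
        f"{col1:<{width1}}  {col2}",
        f"{'-' * width1}  {'-' * width2}",
    ]
    lines.extend(f"{n:<{width1}}  {items[n]}" for n in names)
    return lines


# Class names shorter than the header, the case the review comment describes.
rows = format_type_table({"uuid": "Uuid", "bool": "Bool"}, "type", "config_class")
print("\n".join(rows))
```

With the header included in the `max`, the second separator here is 12 dashes wide even though the longest class name has only 4 characters.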

self._persona_repository = PersonaRepository()
self._download_service = DownloadService(config_dir, self._persona_repository)

def list_model_aliases(self) -> None:

Should there be a list for model providers? Same question for MCP configs.

Contributor Author

We can certainly add that. I was thinking that, to keep it simple, we hold off on MCP and assume the user has set up their model providers, so the model alias is the only thing the agent needs to choose. WDYT?

3 participants