Use cmdstan config and decouple stanfit objects from RunSet by amas0 · Pull Request #851 · stan-dev/cmdstanpy

amas0 · 2026-04-12T01:39:18Z

Draft PR for initial progress on config parsing for #785 and implementing ideas discussed in #848.

@WardBrian I'd like you to take a look at this initial draft which implements pydantic-based parsing of the config json files and uses that to refactor CmdStanPathfinder to be built entirely from output files (and decouples it from the runset).

The changes I'd like you to look at are in stanfit/pathfinder.py and stanfit/metadata.py -- the others files are mostly just temporary patches to have the library in a functional state.

The CmdStanPathfinder class has been rewritten as a dataclass that defines a from_files(...) method as a constructor to build it from the output files. I also prefer the clarity that this gives to what attributes the class has.

Some areas where I'd appreciate thoughts:

I don't include any reference to the runset in the object itself. I think there are some arguments for having it as an optional attribute, but I opted for a clean break here.
I dropped from the repr any reference to the method_args, building directly from files made it seem that was no longer desirable.
I include stdout_file as an optional attribute. With no reference to the runset, it still seems like something that might be nice to have? But the stdout info is more like process diagnostic than an essential component of the stanfit?
save_csvfiles probably needs to be called something like save_output_files, but I haven't changed that yet.
In a world where we no longer need to parse any config out of the stan csv, this makes the InferenceMetadata object more or less only scan the header line for stan variables, I think. So it may be worth a think as to what that should look like?

If the shape of looks to be in the right direction, I'll proceed with doing this for the other stanfit objects. Following that, we should be able to really clean up what we have going on in stanfit/__init__.py and should be able to drop all the code that we have for parsing out the headers of the stan csvs.

Bumps [actions/cache](https://github.com/actions/cache) from 4 to 5. - [Release notes](https://github.com/actions/cache/releases) - [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md) - [Commits](actions/cache@v4...v5) --- updated-dependencies: - dependency-name: actions/cache dependency-version: '5' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>

WardBrian

A few comments/questions, but overall I like the look of the code!

Responding to your comments in the PR:

Runset being left out of the result entirely makes sense. We will have to see how this feels when we tackle something with more moving parts like the MCMC class, but for now I agree.
Repr changes make sense.
Stdout file is good. We may want to also do the same thing with diagnostic files and profiling files, which gets us closer to just including the whole Runset again... maybe we have a small dataclass that is like a cut-down runset and stores the relevant files, can be re-used across all of them?
I agree, it makes sense to replace csv with output across the board, both in the save functions and the from_csv function. This would make sense going forward anyway, c.f. stan-dev/design-docs#58
Yep, totally agree that InferenceMetadata might want to take a hike when all is said and done. The header could be stored in each class, or in the cut-down-Runset idea ("FileInfo"?)

WardBrian · 2026-04-13T15:51:42Z

cmdstanpy/stanfit/pathfinder.py

+    model_name: str
+    csv_file: str
+    config: PathfinderConfig
+    config_file: str | None = None


Is this really optional? Doesn't look like it in from_files

It's optional in the sense that if you pass a config in directly, you don't need to reference the file itself. Outside of simplifying some testing, I'm not sure if it will come up much? The intended entrypoint in practice is always going to be the from_files.

WardBrian · 2026-04-13T15:53:28Z

cmdstanpy/stanfit/pathfinder.py

+        metadata = InferenceMetadata.from_csv(csv_file)
+        return cls(
+            metadata=metadata,
+            csv_file=str(csv_file),


The super-correct way to convert a str|os.PathLike to a str is os.fspath()

WardBrian · 2026-04-13T15:55:08Z

cmdstanpy/stanfit/pathfinder.py

+            metadata=metadata,
+            csv_file=str(csv_file),
+            model_name=stan_config.model_name,
+            config=stan_config.method_config,


It's obviously nice that we don't have any downstream isinstance checks, but it is a bit of a bummer that we end up throwing away the extra information that is present in the StanConfig (e.g. model name, etc). Is there some trick to keep it all around, while maintaining that this is a Pathfinder-specific config?

Good point. We could just store references to both stan_config and config? The top level StanConfig would still contain the inference config within it, but it would just be the same object. Storing both would keep the top level info and still let us have config be validated that the method is correct. Keeps the type checker happy and doesn't drop any info. Thoughts?

That seems fine. I was wondering if pydantic had a more elegant way, but it's not like doing that actually costs any extra memory or anything

We could do something fancy like make StanConfig a generic, which pydantic has good support for:

from typing import Generic, TypeVar AnyMethodConfig = Annotated[ SampleConfig | OptimizeConfig | PathfinderConfig | LaplaceConfig | VariationalConfig | GeneratedQuantitiesConfig, Discriminator("method"), ] MethodT = TypeVar("MethodT", bound=BaseModel, default=AnyMethodConfig) class StanConfig(BaseModel, Generic[MethodT]): model_config = ConfigDict(extra="allow") model_name: str stan_major_version: str stan_minor_version: str stan_patch_version: str method_config: MethodT

Then for example we could have:

@dataclass class CmdStanPathfinder: # other fields ... config: StanConfig[PathfinderConfig]

And parsing like StanConfig[PathfinderConfig].model_validate(jsons) would validate it against the Pathfinder instance of the config. This would result in only one reference to the config but the type checker would know that it has the pathfinder configuration fields within method_config.

That seems like a good representation of what's actually happening under the hood

WardBrian · 2026-04-13T16:01:56Z

cmdstanpy/stanfit/__init__.py

+                    f'found {len(csvfiles)}'
+                )
+            csv_file = csvfiles[0]
+            config_file = os.path.splitext(csv_file)[0] + '_config.json'


I assume this is temporary but it's worth putting a comment saying so, since this is kind of a nasty assumption

Yeah definitely temporary. I was planning on doing away with this function entirely once the full changes are in.

amas0 and others added 9 commits January 6, 2026 13:28

Add initial basic ConfigInfo model

d3fcc52

Add flattening and parsing logic for config JSON

81d08ab

Update flatten_value_dict to clarify variable naming

b3b20eb

Add PathfinderConfig

a5f8094

Add LaplaceConfig

d49c62c

Add VariationalConfig

503a970

Add GeneratedQuantitiesConfig

08c6d0e

Add first-pass CmdStanPathfinder rewrite

8f62f2e

WardBrian reviewed Apr 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Use cmdstan config and decouple stanfit objects from RunSet#851

Use cmdstan config and decouple stanfit objects from RunSet#851
amas0 wants to merge 9 commits intostan-dev:developfrom
amas0:use-save-cmdstan-config

amas0 commented Apr 12, 2026 •

edited

Loading

Uh oh!

WardBrian left a comment

Uh oh!

WardBrian Apr 13, 2026

Uh oh!

amas0 Apr 13, 2026

Uh oh!

WardBrian Apr 13, 2026

Uh oh!

WardBrian Apr 13, 2026

Uh oh!

amas0 Apr 13, 2026

Uh oh!

WardBrian Apr 13, 2026

Uh oh!

amas0 Apr 13, 2026

Uh oh!

WardBrian Apr 13, 2026

Uh oh!

WardBrian Apr 13, 2026

Uh oh!

amas0 Apr 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

amas0 commented Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

WardBrian left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

amas0 commented Apr 12, 2026 •

edited

Loading