Skip to content

Conversation

@Dekermanjian
Copy link
Contributor

This is a draft proposal for #598

The idea is to handle each component separately using _set_{component} methods and all information are stored using data classes for easy mapping.

I believe this will simplify our tests of these components and will reduce redundancies where we have the same information spread across multiple sub-components like data_names and data_info.

@jessegrabowski let me know what you think I put a little notebook together to showcase the changes.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@Dekermanjian Dekermanjian marked this pull request as draft November 2, 2025 17:50
Copy link
Member

@jessegrabowski jessegrabowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great first pass, much cleaner than what we have now.

@jessegrabowski
Copy link
Member

We can also keep all of the existing properties like state_names, shock_names, state_dims, etc, but move them to the base class and just extract the requested info from the relevant Info objects.

@jessegrabowski jessegrabowski changed the title proposal for updating propogate_component_properties using data classes Represent statespace metadata with dataclasses Nov 7, 2025
@jessegrabowski
Copy link
Member

Reflecting on it, I am convinced this is the way to go. It's 1000x more ergonomic. I made some changes to your initial code to make the API more "dictionary like", and to reduce code duplication. I moved everything to statespace/core/properties.py, because this is ultimately going to replace what we have in both the core models and the components.

@Dekermanjian
Copy link
Contributor Author

@jessegrabowski, this is looking really cool! What can I do to help push this forward?

@jessegrabowski
Copy link
Member

jessegrabowski commented Nov 7, 2025

Delete the new regression_dataclass.py and simply refactor regression.py to use the new stuff.

We should keep your notebook with the plan to add it as a new example for the docs. Or it can be merged into the custom statespace notebook. So that should also be updated to import from the new properties.py file

@Dekermanjian
Copy link
Contributor Author

Perfect! I'll work on that today!! It is really looking cool!

@Dekermanjian
Copy link
Contributor Author

@jessegrabowski, I agree with all of your comments above. I am going to start making those changes.

…uplicate with warning

2. removed unnecessary imports from __init__ after deleting regression_dataclass
3. updated components and structural classes to only utilize dataclasses and pull other objects from <foo>_info dataclasses
4. updated tests to conform to dataclass api
2. created tests for add and merge methods
3. added utility to convert from snake to pascal and integrated it in error messaging
Copy link
Member

@jessegrabowski jessegrabowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incomplete review, I'll continue tomorrow AM

Comment on lines +47 to +48
# if key in index:
# raise ValueError(f"Duplicate {self.key_field} '{key}' detected.") # This needs to be possible for shared states
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That shouldn't happen here though, it should come up in merge or add right? And we handle it there with the allow_duplicates flag

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what happens is because our data classes are immutable the __post_init__ runs right after our merge/add because we always return new objects of the same dataclass and it see that there are duplicate keys even though the merge/add method had allowed them via allow_duplicates.

raise AttributeError(f"Items missing attribute '{self.key_field}': {missing_attr}")
object.__setattr__(self, "_index", index)

def _key(self, item: T) -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used?

Comment on lines +201 to +203
@property
def observed_states(self) -> tuple[State, ...]: # Is this needed??
return tuple(s for s in self.items if s.observed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to keep it (as an alias for observed_state_names), then pick one to be the "canonical" name and just return that one from the other ones, rather than re-writing the loop in several places

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning more towards just using observed_state_names instead. If you are okay with that I am going to remove observed_states

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok for me

def unobserved_state_names(self) -> tuple[State, ...]:
return tuple(s.name for s in self.items if not s.observed)

def merge(self, other: StateInfo, allow_duplicates: bool = False) -> StateInfo:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't the base class version work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here what is happening for shared states _get_combined_shapes counts the unique names between components. It needs to combine the state names without raising but it needs the unique names and not duplicates. It is subtle but if you look closely the StateInfo merge is an override that allows unique merge even when allow_duplicates is False compared to the base class that will raise an error. Obviously, this may not be the best way to handle this and maybe we need to look at _get_combined_shapes and see if it makes more sense to alter how that works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we adjust the base class to handle the behavior you actually want?

)

self.param_info = ParameterInfo(parameters=[beta_parameter, sigma_parameter])
self.param_names = self.param_info.names
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.param_names = self.param_info.names

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad on the indentation. Will fix in next commit.

self.param_names = self.param_info.names
else:
self.param_info = ParameterInfo(parameters=[beta_parameter])
self.param_names = self.param_info.names
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.param_names = self.param_info.names
self.param_names = self.param_info.names

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad on the indentation. Will fix in next commit.

self.param_info = ParameterInfo(parameters=[beta_parameter])
self.param_names = self.param_info.names

def _set_data(self) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the name _set_data is too close to pm.set_data, which changes the actual model data. _set_data_info here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. I will change it to _set_data_info

Comment on lines +191 to +192
# TODO: discuss if copying is still needed since these are now immutable
self._coord_info = coords_info.copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, it shouldn't be necessary

Comment on lines +259 to +260
if is_dataclass(param_info):
param_info = param_info.add(initial_state_cov_param)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If param_info isn't a dataclass at this point what else can it be? None? Can we just check for that instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are correct it's either going to be a dataclass or None. I will update the logic.

self.state_names = list(state_names) if state_names is not None else []
self.observed_state_names = (
list(observed_state_names) if observed_state_names is not None else []
self.param_info = ParameterInfo(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change the component signature to just take the Info objects directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be nice. Right now, there is this intermediate conversion step that needs to take place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok let's leave it for the next PR then and make the priority here just getting each model to be represented in the new way

Comment on lines +566 to +576
self.state_names = self.state_info.unobserved_state_names
self.observed_state_names = self.state_info.observed_state_names
self.param_names = self.param_info.names
self.data_names = [d.name for d in self.data_info if not d.is_exogenous]
self.exog_names = self.data_info.exogenous_names
self.shock_names = self.shock_info.names

self.param_info = {}
self.data_info = {}
self.coords = self.coord_info.to_dict()
self.param_dims = [p.dims for p in self.param_info]

self.param_counts = {}
self.needs_exog_data = self.data_info.needs_exogenous_data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should all be properties

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I will update the properties to derive from the <foo>_info sources of truth.

Comment on lines +683 to +689
def _set_parameters(self) -> None:
raise NotImplementedError

def _set_shocks(self) -> None:
return ShockInfo(shocks=[Shock(name=f"shock_{n}") for n in range(self.k_posdef or 0)])

def _set_states(self) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These don't return None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, correct. I will fix this error.


if not isinstance(self_prop, list | dict):
if not is_dataclass(self_prop):
# TODO: This works right now because we are only passing <foo>_info info names into _combine_property
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need _combine_property? Can we just call left.merge(right) since everything is an Info?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we still need _combine_property. Everything can be handled as you say with left.merge(right). I think that would also be cleaner and we already handle improper merges with non-identical objects in the base class merge method.

@@ -0,0 +1,5 @@
import re
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a whole new file for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I am sorry about that. I just wasn't sure where I could place it. Should I stick it in /statespace/utils/data_tools or /statespace/utils/coord_tools?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah anywhere is fine. It could also just go in the file where it's used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants