Is there an easier way to pre-process incoming values for specific fields? #797

Open
amogus07 opened this issue Jan 8, 2025 · 3 comments

Comments

@amogus07

amogus07 commented Jan 8, 2025

Question

I want to convert an incoming comma-separated string to a set of enum members.
This is my current solution:

# to avoid extra whitespace
SPLIT_PATTERN: re.Pattern[str] = re.compile(r"\s*,\s*")


class AlbumArtist(TaggedBase, kw_only=True):
    _categories: str = msgspec.field(name="categories")
    _effective_roles: str = msgspec.field(name="effectiveRoles")
    is_support: bool
    name: str
    roles: str
    artist: Optional[Artist] = None
    categories: set[ArtistCategories] = msgspec.field(
        default_factory=set, name="dummy1"
    )
    effective_roles: set[ArtistRoles] = msgspec.field(
        default_factory=set, name="dummy2"
    )

    def __post_init__(self) -> None:
        self.categories = {
            ArtistCategories(c) for c in SPLIT_PATTERN.split(self._categories) if c
        }
        self.effective_roles = {
            ArtistRoles(r) for r in SPLIT_PATTERN.split(self._effective_roles) if r
        }

It's a snippet from https://github.com/prTopi/beets-vocadb/blob/2a2b3cca83449b26717ffff2a7bb085b26381d26/beetsplug/vocadb/requests_handler/models.py

Is there a more efficient way that doesn't involve additional attributes?

@amogus07
Author

OK, I came up with this in the meantime:

E = TypeVar("E", bound=StrEnum)


class AlbumArtist(TaggedBase, dict=True, kw_only=True):
    _categories: str = msgspec.field(name="categories")
    _effective_roles: str = msgspec.field(name="effectiveRoles")
    is_support: bool
    name: str
    roles: str
    artist: Optional[Artist] = None

    _SPLIT_PATTERN: ClassVar[re.Pattern[str]] = re.compile(r"\s*,\s*")

    @cached_property
    def categories(self) -> set[ArtistCategories]:
        return self._parse_enum_set(self._categories, ArtistCategories)

    @cached_property
    def effective_roles(self) -> set[ArtistRoles]:
        return self._parse_enum_set(self._effective_roles, ArtistRoles)

    @classmethod
    def _parse_enum_set(cls, value: str, enum_class: type[E]) -> set[E]:
        """Helper method to parse comma-separated string into set of enum values"""
        return {
            enum_class(item) for item in cls._SPLIT_PATTERN.split(value) if item
        }

Now I have another problem: I use httpx for API requests in my project, and I need to pass specific keys and values to the params parameter of httpx.get. Currently, each params Struct has its own asdict property that collects its attributes into a dict suitable for httpx, which is essentially the reverse of the above: https://github.com/prTopi/beets-vocadb/blob/3e9033c1354a1d003498c6100d39a59a105018ac/beetsplug/vocadb/requests_handler/__init__.py
That doesn't seem like a good solution to me, though. Does anyone know a better way of doing this?
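
One possible direction, as a minimal sketch rather than a confirmed solution: replace the per-Struct asdict properties with a single shared helper. The sketch assumes the Struct has already been turned into plain builtins (for example with msgspec.to_builtins, which turns sets into lists and enum members into their values). The name to_query_params, the sample keys, and the "true"/"false" rendering of booleans are assumptions made for illustration, not something the API is known to require.

```python
from collections.abc import Mapping


def to_query_params(values: Mapping[str, object]) -> dict[str, str]:
    """Flatten a mapping of builtins into string-only query parameters.

    Hypothetical helper: feed it the result of converting a params
    Struct to builtins, then pass the returned dict as ``params=``.
    """
    params: dict[str, str] = {}
    for key, value in values.items():
        if value is None:
            # Unset optional parameters are left out of the query.
            continue
        if isinstance(value, bool):
            # Assumption: the API expects lowercase booleans.
            params[key] = "true" if value else "false"
        elif isinstance(value, (list, tuple, set, frozenset)):
            if value:
                # Collections become one comma-separated string,
                # sorted so the output is deterministic.
                params[key] = ",".join(sorted(str(item) for item in value))
        else:
            params[key] = str(value)
    return params


# Sample input shaped like the builtins form of a params Struct.
example = to_query_params(
    {
        "query": "miku",
        "artistRoles": ["Vocalist", "Composer"],
        "lang": None,
        "preferAccurateMatches": True,
        "maxResults": 10,
    }
)
```

This keeps the conversion logic in one place, so the params Structs themselves only declare fields.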

@uwinx

uwinx commented Jan 17, 2025

@amogus07, yes, there are some ways, but I'm fairly sure you won't avoid implementing custom logic in any of them.
You can explore: https://jcristharif.com/msgspec/extending.html

Here's a way to achieve your goal that differs from your solution:

from typing import Any
import msgspec

# data object

class CsvSet:
    def __init__(self, raw_value: str):
        self._values = set(raw_value.split(','))
    def __eq__[T: set](self, other: T) -> bool:
        return self._values == other
    def __str__(self):
        return ','.join(self._values)

class MyStruct(msgspec.Struct):
    param: CsvSet

# custom hooks

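def enc_hook(obj: Any) -> Any:
    if isinstance(obj, CsvSet):
        return str(obj)
    # signal unsupported types instead of silently returning None
    raise NotImplementedError(f"Objects of type {type(obj)} are not supported")

def dec_hook(tp: type, obj: Any) -> Any:
    if issubclass(tp, CsvSet):
        return CsvSet(obj)
    # signal unsupported types instead of silently returning None
    raise NotImplementedError(f"Objects of type {tp} are not supported")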

# tests

data = msgspec.convert({"param": "foo,bar"}, MyStruct, dec_hook=dec_hook)
assert data.param == {'foo', 'bar'}

serialized = msgspec.to_builtins(data, enc_hook=enc_hook)
assert serialized['param'] in ('foo,bar', 'bar,foo')

In short, this approach consists of:
  • A new type containing your custom logic for handling CSV
  • Custom encoding and decoding hooks

Note that you can't (easily) inherit from set; see "Supported Types" in the msgspec docs.

P.S.: Adding generics support should not be too hard. For now, you can use a GenericAlias as the field type and raise your validation errors from dec_hook.
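
A rough sketch of what the generics idea could look like on the hook side. It assumes that the parametrized alias (here CsvSet[Color]) reaches dec_hook as the target type; that has not been verified here, so the hook is called directly. The Color enum and this generic variant of CsvSet are hypothetical and exist only for illustration.

```python
from enum import Enum
from typing import Generic, Set, TypeVar, get_args, get_origin

E = TypeVar("E", bound=Enum)


class CsvSet(Generic[E]):
    """Hypothetical generic variant: a set of enum members parsed from CSV."""

    def __init__(self, values: Set[E]) -> None:
        self.values = values


class Color(str, Enum):  # hypothetical enum, for illustration only
    RED = "red"
    BLUE = "blue"


def dec_hook(tp: object, obj: object) -> object:
    # get_origin() recovers CsvSet from the alias CsvSet[Color];
    # get_args() recovers the enum class the members should belong to.
    if get_origin(tp) is CsvSet:
        (member_type,) = get_args(tp)
        if not isinstance(obj, str):
            raise TypeError("expected a comma-separated string")
        # An unknown member makes the enum lookup raise ValueError,
        # which serves as the validation error.
        return CsvSet(
            {member_type(part.strip()) for part in obj.split(",") if part.strip()}
        )
    raise NotImplementedError(f"Objects of type {tp} are not supported")


decoded = dec_hook(CsvSet[Color], "red, blue")
```

Because the enum class is read from the alias, one hook can serve every enum-typed CSV field.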

@amogus07
Author

I probably should have mentioned that Python 3.9 needs to be supported, and it doesn't support the type parameter syntax for functions. Also, is it possible to get this to work without using Any?
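
For reference, a minimal sketch of how the example above might be adapted under those two constraints; it is one possible answer, not one given in the thread. It drops the type parameter syntax on __eq__ and annotates with object instead of Any. The whitespace stripping and the sorted output are additions made here so the result is deterministic. The hooks are meant to be passed as dec_hook= and enc_hook= exactly as in the earlier example.

```python
class CsvSet:
    """A set of strings carried as a single comma-separated string."""

    def __init__(self, raw_value: str) -> None:
        # Strip whitespace and drop empty items (an addition in this sketch).
        self._values = {
            item.strip() for item in raw_value.split(",") if item.strip()
        }

    def __eq__(self, other: object) -> bool:
        # Plain `object` annotation: valid on Python 3.9, no type parameters.
        if isinstance(other, CsvSet):
            return self._values == other._values
        if isinstance(other, (set, frozenset)):
            return self._values == other
        return NotImplemented

    def __str__(self) -> str:
        # Sorted so that encoding is deterministic.
        return ",".join(sorted(self._values))


def enc_hook(obj: object) -> object:
    if isinstance(obj, CsvSet):
        return str(obj)
    raise NotImplementedError(f"Objects of type {type(obj)} are not supported")


def dec_hook(tp: type, obj: object) -> object:
    if isinstance(tp, type) and issubclass(tp, CsvSet):
        if not isinstance(obj, str):
            raise TypeError("expected a comma-separated string")
        return tp(obj)
    raise NotImplementedError(f"Objects of type {tp} are not supported")


parsed = dec_hook(CsvSet, "foo, bar")
```

Narrowing with isinstance inside the hooks lets the bodies type-check without Any, at the cost of a few explicit checks.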
