Skip to content

Conversation

llucax
Copy link
Contributor

@llucax llucax commented Jun 26, 2025

This pull request splits the resampling module into many submodules as it was getting very large

If also uses tuples internally to store and calculate samples, which improves performance and simplifies the logic, as Samples can have None values, but the values in the internal buffer can never be None, and we don't need to use a Quantity as we know we are always working with the same units.

It also exposes the ResamplingFunction and SourceProperties types, converts ResamplingFunction to a Protocol, and replaces average as the default resampling function with Python's statistics.fmean, among other small improvements and fixes.

Tests should try to be as close as possible as real code.

Signed-off-by: Leandro Lucarella <[email protected]>
@Copilot Copilot AI review requested due to automatic review settings June 26, 2025 09:55
@llucax llucax requested a review from a team as a code owner June 26, 2025 09:55
@llucax llucax requested review from daniel-zullo-frequenz and removed request for a team June 26, 2025 09:55
@github-actions github-actions bot added part:docs Affects the documentation part:tests Affects the unit, integration and performance (benchmarks) tests part:tooling Affects the development tooling (CI, deployment, dependency management, etc.) part:data-pipeline Affects the data pipeline part:actor Affects an actor ot the actors utilities (decorator, etc.) part:microgrid Affects the interactions with the microgrid labels Jun 26, 2025
@llucax
Copy link
Contributor Author

llucax commented Jun 26, 2025

This is an extract of some assorted improvement that were in #999, plus a few new ones. It is a preparation for the final version of #999.

@llucax llucax added this to the v1.0.0-rc2100 milestone Jun 26, 2025
@llucax llucax added the type:enhancement New feature or enhancement visitble to users label Jun 26, 2025
@llucax llucax requested review from shsms and Marenz June 26, 2025 09:56
@llucax llucax moved this from To do to Review in progress in Python SDK Roadmap Jun 26, 2025
@llucax llucax self-assigned this Jun 26, 2025
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR reorganizes the resampling module into smaller submodules, switches internal sample storage from Sample objects to (timestamp, value) tuples for performance, and updates the default resampling function to use statistics.fmean. It also exposes new types, converts ResamplingFunction to a Protocol, and adjusts imports, tests, docs, and release notes to match the refactoring.

  • Split _resampling into _base_types, _config, _exceptions, and _resampler submodules
  • Changed internal buffer to store (datetime, float) tuples and updated tests accordingly
  • Replaced custom average with statistics.fmean, exposed ResamplingFunction and SourceProperties, and updated documentation

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/timeseries/test_resampling.py Updated imports, added astuple helper, adapted tests to tuple samples
tests/timeseries/test_periodic_feature_extractor.py Changed UNIX_EPOCH import to frequenz.core.datetime
tests/timeseries/test_moving_window.py Adjusted imports for UNIX_EPOCH, MovingWindow, ResamplerConfig
tests/microgrid/test_datapipeline.py Updated ResamplerConfig import path
src/frequenz/sdk/timeseries/_ringbuffer/buffer.py Removed base_types UNIX_EPOCH, added core import
src/frequenz/sdk/timeseries/_resampling/_resampler.py Buffer now holds tuples; methods updated
src/frequenz/sdk/timeseries/_resampling/_exceptions.py New module for resampling errors
src/frequenz/sdk/timeseries/_resampling/_config.py Config moved to its own submodule, uses Protocol
src/frequenz/sdk/timeseries/_resampling/_base_types.py Defined Sink, Source, SourceProperties types
src/frequenz/sdk/timeseries/_resampling/init.py Added package init
src/frequenz/sdk/timeseries/_moving_window.py Updated imports to new submodule structure
src/frequenz/sdk/timeseries/_base_types.py Removed deprecated UNIX_EPOCH constant
src/frequenz/sdk/timeseries/init.py Revised exports and doc string
src/frequenz/sdk/microgrid/_resampling.py Adjusted imports for resampler components
src/frequenz/sdk/actor/init.py Updated ResamplerConfig import path
mkdocs.yml Bumped external doc URLs to latest versions
benchmarks/timeseries/resampling.py Benchmarks updated for tuple-based samples
RELEASE_NOTES.md Added notes on value type change and UNIX_EPOCH removal

llucax added 3 commits June 26, 2025 12:01
Future changes require adding more code to this module, which is already
quite big. This commit makes it a package and split the module into a
few sub-modules to make it more manageable.

Signed-off-by: Leandro Lucarella <[email protected]>
`ResamplingFunction` is used by `ResamplerConfig`, which is public, and
`SourceProperties` is used by `ResamplingFunction`, so both should be
public too.

Signed-off-by: Leandro Lucarella <[email protected]>
Using a type alias forces us to use strings to specify types because
Python can't use forward declarations for statements.

Also using a protocol allows for more control, as even when we don't
use it for now, it will also check function argument names if `/` is not
used.

Signed-off-by: Leandro Lucarella <[email protected]>
@llucax llucax force-pushed the improve-resampler branch from ca7af9d to a1aff5c Compare June 26, 2025 10:01
llucax added 2 commits June 26, 2025 12:38
Also use `disable-next` instead of disabling blocks, as it is less
error prone.

Signed-off-by: Leandro Lucarella <[email protected]>
`Sample` can have `None` as value, but the resampler buffer (and
function) never get any `None` value stored/passed. The reason is saving
a sample with no value provides no extra information and would make the
resampling function implementation more complicated, as it would need
to check values for `None`.

Currently, the resampling function will never receive a `None` value, but
it still needs to check for them for type-checking reasons.

This commit uses a `tuple[datetime, Quantity]` to store samples in the
buffer and pass it to the resampling function. This way it is trivially
clear that values can't be `None` in this context.

Signed-off-by: Leandro Lucarella <[email protected]>
@llucax llucax force-pushed the improve-resampler branch from a1aff5c to be4311e Compare June 26, 2025 10:38
from frequenz.sdk.timeseries._resampling._base_types import SourceProperties
from frequenz.sdk.timeseries._resampling._resampler import _ResamplingHelper


def nop( # pylint: disable=unused-argument
samples: Sequence[Sample[Quantity]],
samples: Sequence[tuple[datetime, Quantity]],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if a NoNoneSample type or so might be a nice abstraction here. I can imagine there are actually many places that would enjoy not having to check for None after they initially received the data. I can certainly use it in FCR as well.

For now its fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, in any case here we want to get rid of even the Quantity, we shouldn't really use Sample either, at least until we can use Sample[float].

If we go with the None/not None sample, maybe it can be OptionalSample (None values allowed) and Sample (can't have a None value) to follow Python's old Optional as an alias of | None. Another option, which I'm not sure if it is doable or not, is Sample[Power] vs Sample[Power | None]. I have the feeling we tried to make that work and failed, but maybe I'm just remembering wrongly.

@@ -55,6 +54,12 @@ async def source_chan() -> AsyncIterator[Broadcast[Sample[Quantity]]]:
await chan.close()


def astuple(sample: Sample[Quantity]) -> tuple[datetime, float]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

astuple splitted into the wrong commit I think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this appears to be an unused implementation. you're using it from dataclasses somewhere else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, damn, I will have a look.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, the commits are correct. I first used the "standard" dataclasses.astuple and then changed the implementation to convert to float too. At first I used a different name for the function, but then thought re-using astuple could make the diff smaller and, at the end, we want to convert to a tuple too, but maybe it is just too confusing, so I can rename the custom astuple to something less confusing.

Marenz
Marenz previously approved these changes Jun 26, 2025
Copy link
Contributor

@Marenz Marenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. astuple is added one commit later than the one that uses it, but not very important imo.

@github-project-automation github-project-automation bot moved this from Review in progress to Review approved in Python SDK Roadmap Jun 26, 2025
@@ -55,6 +54,12 @@ async def source_chan() -> AsyncIterator[Broadcast[Sample[Quantity]]]:
await chan.close()


def astuple(sample: Sample[Quantity]) -> tuple[datetime, float]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this appears to be an unused implementation. you're using it from dataclasses somewhere else.

@@ -134,7 +114,9 @@ class ResamplerConfig:
passed to the resampling function.
"""

resampling_function: ResamplingFunction = average
resampling_function: ResamplingFunction = lambda samples, _, __: statistics.fmean(
[s[1] for s in samples]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to the docs this can be an iterable: https://docs.python.org/3/library/statistics.html#statistics.fmean

Suggested change
[s[1] for s in samples]
s[1] for s in samples

@llucax
Copy link
Contributor Author

llucax commented Jun 26, 2025

Updated with fixup commits.

@llucax
Copy link
Contributor Author

llucax commented Jun 26, 2025

Now that @shsms approved, I will fixup and enable auto-merge, if you want to make any further comments you need to hurry @Marenz :D (you can always disable auto-merge to get some extra time).

@llucax llucax enabled auto-merge June 26, 2025 12:24
llucax added 7 commits June 26, 2025 14:25
When resampling we know we are working always with the same units, so
using a `Quantity` doesn't add any advantages to plain `float`s.
Instead, it makes the resampler slower, as it needs to do an extra
lookup each time a buffer value is needed and potentially avoids the
need to create `Quantity`s from the inputs.

The performance can be improved by about 25% (for resamples=1000
samples=1000, before 3.7s, now 2.8s) according to the resampling
benchmark.

Signed-off-by: Leandro Lucarella <[email protected]>
The `average` function is just a mean function and Python already offers
a function to calculate a mean, so we use that instead.

We also wrap `fmean` to match the `ResamplingFunction` signature inline
so it is clear in the documentation what the default resampling function
does.

Signed-off-by: Leandro Lucarella <[email protected]>
This reduces the benchmark time by about 4x.

Signed-off-by: Leandro Lucarella <[email protected]>
Also add the missing docs cross-reference for the core library.

Signed-off-by: Leandro Lucarella <[email protected]>
We know the latest version will still have all the features we need, so
there is no need to target a particular minor.

Signed-off-by: Leandro Lucarella <[email protected]>
Signed-off-by: Leandro Lucarella <[email protected]>
@llucax llucax force-pushed the improve-resampler branch from 5dc64e8 to da45456 Compare June 26, 2025 12:25
@llucax llucax added this pull request to the merge queue Jun 26, 2025
Merged via the queue into frequenz-floss:v1.x.x with commit c1bc917 Jun 26, 2025
5 checks passed
@llucax llucax deleted the improve-resampler branch June 26, 2025 12:34
@github-project-automation github-project-automation bot moved this from Review approved to Done in Python SDK Roadmap Jun 26, 2025
@llucax llucax modified the milestones: v1.0.0-rc2100, v1.0.0-rc2200 Jun 27, 2025
@@ -134,7 +114,9 @@ class ResamplerConfig:
passed to the resampling function.
"""

resampling_function: ResamplingFunction = average
resampling_function: ResamplingFunction = lambda samples, _, __: statistics.fmean(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure there are no NaNs?

Copy link
Contributor Author

@llucax llucax Jun 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, no NaNs, no Nones, no infs. That's filtered out before they are fed to the resampling function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, inf is still allowed:

async for sample in self._source:
if sample.value is not None and not sample.value.isnan():
self._helper.add_sample((sample.timestamp, sample.value.base_value))

@llucax llucax modified the milestones: v1.0.0-rc2200, v1.0.0-rc2101 Sep 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
part:actor Affects an actor ot the actors utilities (decorator, etc.) part:data-pipeline Affects the data pipeline part:docs Affects the documentation part:microgrid Affects the interactions with the microgrid part:tests Affects the unit, integration and performance (benchmarks) tests part:tooling Affects the development tooling (CI, deployment, dependency management, etc.) type:enhancement New feature or enhancement visitble to users
Projects
Development

Successfully merging this pull request may close these issues.

4 participants