-
Notifications
You must be signed in to change notification settings - Fork 20
Improve resampler structure and performance #1242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tests should try to be as close as possible as real code. Signed-off-by: Leandro Lucarella <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR reorganizes the resampling
module into smaller submodules, switches internal sample storage from Sample
objects to (timestamp, value)
tuples for performance, and updates the default resampling function to use statistics.fmean
. It also exposes new types, converts ResamplingFunction
to a Protocol
, and adjusts imports, tests, docs, and release notes to match the refactoring.
- Split
_resampling
into_base_types
,_config
,_exceptions
, and_resampler
submodules - Changed internal buffer to store
(datetime, float)
tuples and updated tests accordingly - Replaced custom
average
withstatistics.fmean
, exposedResamplingFunction
andSourceProperties
, and updated documentation
Reviewed Changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
tests/timeseries/test_resampling.py | Updated imports, added astuple helper, adapted tests to tuple samples |
tests/timeseries/test_periodic_feature_extractor.py | Changed UNIX_EPOCH import to frequenz.core.datetime |
tests/timeseries/test_moving_window.py | Adjusted imports for UNIX_EPOCH , MovingWindow , ResamplerConfig |
tests/microgrid/test_datapipeline.py | Updated ResamplerConfig import path |
src/frequenz/sdk/timeseries/_ringbuffer/buffer.py | Removed base_types UNIX_EPOCH , added core import |
src/frequenz/sdk/timeseries/_resampling/_resampler.py | Buffer now holds tuples; methods updated |
src/frequenz/sdk/timeseries/_resampling/_exceptions.py | New module for resampling errors |
src/frequenz/sdk/timeseries/_resampling/_config.py | Config moved to its own submodule, uses Protocol |
src/frequenz/sdk/timeseries/_resampling/_base_types.py | Defined Sink , Source , SourceProperties types |
src/frequenz/sdk/timeseries/_resampling/init.py | Added package init |
src/frequenz/sdk/timeseries/_moving_window.py | Updated imports to new submodule structure |
src/frequenz/sdk/timeseries/_base_types.py | Removed deprecated UNIX_EPOCH constant |
src/frequenz/sdk/timeseries/init.py | Revised exports and doc string |
src/frequenz/sdk/microgrid/_resampling.py | Adjusted imports for resampler components |
src/frequenz/sdk/actor/init.py | Updated ResamplerConfig import path |
mkdocs.yml | Bumped external doc URLs to latest versions |
benchmarks/timeseries/resampling.py | Benchmarks updated for tuple-based samples |
RELEASE_NOTES.md | Added notes on value type change and UNIX_EPOCH removal |
Future changes require adding more code to this module, which is already quite big. This commit makes it a package and split the module into a few sub-modules to make it more manageable. Signed-off-by: Leandro Lucarella <[email protected]>
`ResamplingFunction` is used by `ResamplerConfig`, which is public, and `SourceProperties` is used by `ResamplingFunction`, so both should be public too. Signed-off-by: Leandro Lucarella <[email protected]>
Using a type alias forces us to use strings to specify types because Python can't use forward declarations for statements. Also using a protocol allows for more control, as even when we don't use it for now, it will also check function argument names if `/` is not used. Signed-off-by: Leandro Lucarella <[email protected]>
ca7af9d
to
a1aff5c
Compare
Also use `disable-next` instead of disabling blocks, as it is less error prone. Signed-off-by: Leandro Lucarella <[email protected]>
`Sample` can have `None` as value, but the resampler buffer (and function) never get any `None` value stored/passed. The reason is saving a sample with no value provides no extra information and would make the resampling function implementation more complicated, as it would need to check values for `None`. Currently, the resampling function will never receive a `None` value, but it still needs to check for them for type-checking reasons. This commit uses a `tuple[datetime, Quantity]` to store samples in the buffer and pass it to the resampling function. This way it is trivially clear that values can't be `None` in this context. Signed-off-by: Leandro Lucarella <[email protected]>
a1aff5c
to
be4311e
Compare
benchmarks/timeseries/resampling.py
Outdated
from frequenz.sdk.timeseries._resampling._base_types import SourceProperties | ||
from frequenz.sdk.timeseries._resampling._resampler import _ResamplingHelper | ||
|
||
|
||
def nop( # pylint: disable=unused-argument | ||
samples: Sequence[Sample[Quantity]], | ||
samples: Sequence[tuple[datetime, Quantity]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if a NoNoneSample
type or so might be a nice abstraction here. I can imagine there are actually many places that would enjoy not having to check for None
after they initially received the data. I can certainly use it in FCR as well.
For now its fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, in any case here we want to get rid of even the Quantity
, we shouldn't really use Sample
either, at least until we can use Sample[float]
.
If we go with the None
/not None
sample, maybe it can be OptionalSample
(None
values allowed) and Sample
(can't have a None
value) to follow Python's old Optional
as an alias of | None
. Another option, which I'm not sure if it is doable or not, is Sample[Power]
vs Sample[Power | None]
. I have the feeling we tried to make that work and failed, but maybe I'm just remembering wrongly.
tests/timeseries/test_resampling.py
Outdated
@@ -55,6 +54,12 @@ async def source_chan() -> AsyncIterator[Broadcast[Sample[Quantity]]]: | |||
await chan.close() | |||
|
|||
|
|||
def astuple(sample: Sample[Quantity]) -> tuple[datetime, float]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
astuple
splitted into the wrong commit I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this appears to be an unused implementation. you're using it from dataclasses
somewhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, damn, I will have a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah, the commits are correct. I first used the "standard" dataclasses.astuple
and then changed the implementation to convert to float
too. At first I used a different name for the function, but then thought re-using astuple
could make the diff smaller and, at the end, we want to convert to a tuple too, but maybe it is just too confusing, so I can rename the custom astuple
to something less confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. astuple
is added one commit later than the one that uses it, but not very important imo.
tests/timeseries/test_resampling.py
Outdated
@@ -55,6 +54,12 @@ async def source_chan() -> AsyncIterator[Broadcast[Sample[Quantity]]]: | |||
await chan.close() | |||
|
|||
|
|||
def astuple(sample: Sample[Quantity]) -> tuple[datetime, float]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this appears to be an unused implementation. you're using it from dataclasses
somewhere else.
@@ -134,7 +114,9 @@ class ResamplerConfig: | |||
passed to the resampling function. | |||
""" | |||
|
|||
resampling_function: ResamplingFunction = average | |||
resampling_function: ResamplingFunction = lambda samples, _, __: statistics.fmean( | |||
[s[1] for s in samples] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to the docs this can be an iterable: https://docs.python.org/3/library/statistics.html#statistics.fmean
[s[1] for s in samples] | |
s[1] for s in samples |
Updated with fixup commits. |
When resampling we know we are working always with the same units, so using a `Quantity` doesn't add any advantages to plain `float`s. Instead, it makes the resampler slower, as it needs to do an extra lookup each time a buffer value is needed and potentially avoids the need to create `Quantity`s from the inputs. The performance can be improved by about 25% (for resamples=1000 samples=1000, before 3.7s, now 2.8s) according to the resampling benchmark. Signed-off-by: Leandro Lucarella <[email protected]>
The `average` function is just a mean function and Python already offers a function to calculate a mean, so we use that instead. We also wrap `fmean` to match the `ResamplingFunction` signature inline so it is clear in the documentation what the default resampling function does. Signed-off-by: Leandro Lucarella <[email protected]>
This reduces the benchmark time by about 4x. Signed-off-by: Leandro Lucarella <[email protected]>
Also add the missing docs cross-reference for the core library. Signed-off-by: Leandro Lucarella <[email protected]>
We know the latest version will still have all the features we need, so there is no need to target a particular minor. Signed-off-by: Leandro Lucarella <[email protected]>
Signed-off-by: Leandro Lucarella <[email protected]>
Signed-off-by: Leandro Lucarella <[email protected]>
5dc64e8
to
da45456
Compare
@@ -134,7 +114,9 @@ class ResamplerConfig: | |||
passed to the resampling function. | |||
""" | |||
|
|||
resampling_function: ResamplingFunction = average | |||
resampling_function: ResamplingFunction = lambda samples, _, __: statistics.fmean( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure there are no NaNs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, no NaN
s, no None
s, no . That's filtered out before they are fed to the resampling function.inf
s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, inf
is still allowed:
frequenz-sdk-python/src/frequenz/sdk/timeseries/_resampling/_resampler.py
Lines 512 to 514 in c1bc917
async for sample in self._source: | |
if sample.value is not None and not sample.value.isnan(): | |
self._helper.add_sample((sample.timestamp, sample.value.base_value)) |
This pull request splits the
resampling
module into many submodules as it was getting very largeIf also uses tuples internally to store and calculate samples, which improves performance and simplifies the logic, as
Sample
s can haveNone
values, but the values in the internal buffer can never beNone
, and we don't need to use aQuantity
as we know we are always working with the same units.It also exposes the
ResamplingFunction
andSourceProperties
types, convertsResamplingFunction
to aProtocol
, and replacesaverage
as the default resampling function with Python'sstatistics.fmean
, among other small improvements and fixes.