Sketch implementation details of SDK data pipeline #34

matthias-wende-frequenz · 2022-10-06T13:43:07Z

matthias-wende-frequenz
Oct 6, 2022
Maintainer

In the following we sketch an implementation of the future SDK data pipeline.
The design originates from the perspective of a user who might say:
"I need to measure power at a certain point of my microgrid".

For now we only sketch the details for meters. All other component categories
may have analogous, though slightly different, implementations that respect the hardware specifications
of the individual category.

from __future__ import annotations  # keeps sketch-only type hints (Receiver, Power, ...) unevaluated

from abc import ABC, abstractmethod
from dataclasses import dataclass


class MicrogridClient:
    """
    ... this is the usual mg client, with the following new methods
    """

    ...

    def get_meter(self, meter_id: int, sample_frequency: float) -> Meter:
        """
        Returns a Meter; its `power` property is a Receiver[Power],
        i.e. `get_meter(meter_id, sample_frequency).power -> Receiver[Power]`.

        This method looks up whether a stream with the parameters (component_id, sample_frequency)
        already exists in a dictionary; if not, it creates a new channel and stores it in the client's
        internal meter dictionary.

        Usually raw data streams have timestamps that are not aligned to any equidistant time grid.
        Thus the incoming data stream will be resampled to `sample_frequency` in order
        to simplify all following data stream operations. These are the addition of two data streams
        and scalar multiplication.
        """
        ...

    # The following properties are supposed to work analogously to `power`:
    #   get_meter(...).current
    #   get_meter(...).voltage
    #   get_meter(...).frequency

@dataclass
class LogicalMeterFormulas:
    """
    Combines all formulas needed to calculate metrics from several physical meters combined
    into a virtual meter
    """
    current_formula: str
    voltage_formula: str
    frequency_formula: str

class Meter(ABC):
    @property
    @abstractmethod
    def current(self) -> Receiver[Current]:
        pass

    @property
    @abstractmethod
    def voltage(self) -> Receiver[Voltage]:
        pass

    @property
    @abstractmethod
    def power(self) -> Receiver[Power]:
        pass

class LogicalMeter(Meter):
    """
    The LogicalMeter class represents a meter in a microgrid
    that may or may not physically exist.

    To calculate an output data stream, a formula has to be provided.
    # TODO how do we define a formula? We want to be able to get the data sources purely from the formula!

    If the meter physically exists, that meter is used as the single data source;
    otherwise the output is calculated according to a user-provided formula.

    In most cases it is encouraged to use the predefined formulas via the get_grid_formula()
    or get_load_formula() functions. These functions will in most cases auto-determine the working
    formula by utilising the component graph.

    It is recommended to use this meter if the user is interested in the grid consumption or location load
    without knowing in advance whether a corresponding physical meter exists. This makes it possible to write
    reusable code that can run across different locations with different hardware setups.

    It may also be used to calculate the total sum of meters that are connected to individual battery racks,
    to measure the total power of the full battery system.

    Usage example:
    ```
        # example formula string: "(c_id1 + c_id2)"
        log_meter = LogicalMeter(micro_grid_client, get_grid_formulas())
        lm_power_rx = log_meter.get_power()
    ```
    """

    def __init__(self, client: MicrogridClient, formulas: LogicalMeterFormulas):
        """
        Initializes a logical meter.

        If no meter, inverter or ev_charger instance is provided, a new one will be created,
        which can be returned by the `get_meters()`, `get_inverters()` or `get_ev_charger()` methods.

        Note: No data stream is set up after calling this method.
        """
        ...

    def get_power(self) -> Receiver[Power]:
        """
        This method resamples the input power data streams from all meters
        and combines the data streams according to the operations defined in the user-provided formula.
        """
        # Planned steps (not implemented in this sketch):
        #   * create a new (tx, rx) pair
        #   * parse the formula string into a formula tree
        #   * get all (resampled) data streams from the microgrid API
        #     that are represented as tokens in the power formula string,
        #     i.e. call `client.get_meter_power(component_id, self.sample_frequency)`
        #     for all component ids
        #   * create a task that
        #       * combines the channels according to the power_formula
        #         and sends the result into the channel's Sender
        #   * return rx
        ...
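
The steps listed for `get_power()` could be sketched roughly as follows. This is only a minimal illustration, not the SDK's implementation: it assumes that a formula is a plain arithmetic expression over tokens such as `c_id1`, that the input streams are already resampled onto the same time grid (modelled here as equally long lists), and it borrows Python's `ast` module in place of a dedicated formula parser. The helper names `formula_tokens`, `evaluate_formula` and `combine_streams` are hypothetical.

```python
import ast
import operator

# Binary operators a formula may use; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def formula_tokens(formula: str) -> set[str]:
    """Return the component tokens (variable names) used in the formula."""
    tree = ast.parse(formula, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def evaluate_formula(formula: str, sample: dict[str, float]) -> float:
    """Evaluate the formula for one aligned sample (token -> value)."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        if isinstance(node, ast.Name):
            return sample[node.id]
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        raise ValueError(f"unsupported formula element: {ast.dump(node)}")

    return _eval(ast.parse(formula, mode="eval"))


def combine_streams(formula: str, streams: dict[str, list[float]]) -> list[float]:
    """Combine already-aligned streams sample by sample according to the formula."""
    tokens = sorted(formula_tokens(formula))
    columns = [streams[token] for token in tokens]
    return [
        evaluate_formula(formula, dict(zip(tokens, values)))
        for values in zip(*columns)
    ]
```

For example, `combine_streams("(c_id1 + c_id2)", {"c_id1": [1.0, 2.0], "c_id2": [10.0, 20.0]})` adds the two streams element by element. The sketch re-parses the formula for every sample for brevity; a real implementation would parse once and would read from channels rather than lists.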

tiyash-basu-frequenz · 2022-10-06T14:08:27Z

tiyash-basu-frequenz
Oct 6, 2022
Maintainer

Doesn't look like we are using interfaces like we discussed earlier. Also, having one large class called MicrogridClient looks a bit unnecessary, because that indicates coupling between different component categories like inverters and meters.

I'd suggest creating interfaces (like abstract classes). E.g., there could be a meter interface:

class Meter(ABC):
    @property
    @abstractmethod
    def current(self):
        pass

    @property
    @abstractmethod
    def voltage(self):
        pass

    @property
    @abstractmethod
    def power(self):
        pass

Then we can implement it for different units. E.g., we could have a resampler for a meter with component ID 4, and get power from it using this interface:

class Resampler(Meter):
    def __init__(self, component_id: int):
        ...

    # Concrete implementations: no @abstractmethod here,
    # otherwise the class could not be instantiated.
    @property
    def current(self):
        pass

    @property
    def voltage(self):
        pass

    @property
    def power(self):
        pass

We can do something similar with the LogicalMeter definition as well.

This way, the interface stays consistent and lazy, while its implementation varies from one class to another, depending on the functionality.

0 replies

matthias-wende-frequenz · 2022-10-06T18:30:28Z

matthias-wende-frequenz
Oct 6, 2022
Maintainer Author

That was roughly my first design approach, but it didn't work out very nicely.

A meter's data stream from the API is actually a stream of a product data type, where each entry represents a certain metric.

From experience, most of the time a user is interested in power, and thus we don't want to resample all metrics in the stream but only the requested ones. If a user requests a new metric, we don't want to create a new stream from the API but reuse the existing one. Hence we need to store an object somewhere that holds the context of existing data streams from components.

Now coming back to the interface. If we'd do it like you suggested, in order to avoid duplicated streams from the api, we would need another class that holds meters.
Instead of creating meters "on the fly" we'd have to use such a builder class, that holds the meta context of existing streams and can re-use those.
Furthermore that must be a singleton object and also we'd need to copy that design for each category, i.e. inverters, ev_chargers.

The microgrid client is by design a singleton object that is supposed to take care that streams to the api are re-used.
Therefore, instead of creating another singleton object for meters, inverters, ev_chargers, we decided to add this as methods to the micro_grid_client.
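
The stream re-use described in this comment could look roughly like the sketch below. It is a simplified illustration under stated assumptions: `StreamRegistry` and `get_stream` are hypothetical names, a plain list stands in for a channel, and the place where a real client would subscribe to the API and start resampling is only marked by a comment.

```python
from dataclasses import dataclass, field


@dataclass
class StreamRegistry:
    """Hands out one shared stream object per (component_id, sample_frequency)."""

    _streams: dict[tuple[int, float], list[float]] = field(default_factory=dict)
    created: int = 0  # how many streams were actually set up

    def get_stream(self, component_id: int, sample_frequency: float) -> list[float]:
        key = (component_id, sample_frequency)
        if key not in self._streams:
            # Placeholder for "subscribe to the API and start resampling".
            self._streams[key] = []
            self.created += 1
        # Repeated requests for the same key re-use the existing stream.
        return self._streams[key]
```

Requesting the same `(component_id, sample_frequency)` pair twice returns the same object, so the expensive setup happens only once per pair.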

Also from a user perspective we found that easier, because a user requests a channel for a certain metric from the microgrid client and gets back a channel that delivers what was asked. That keeps the sdk simpler to understand because users don't have to learn about several but only one object, where one of the main objectives of that microgrid client is to deliver data streams.

Nevertheless, I haven't ditched the idea of a Meter interface; I just left it out of the sketch. I expect that in most cases a user constructs a virtual meter.
A virtual meter is by design a meter that delivers real-time data. What has been left out of this sketch (and was meant to be done as a homework task by the reader ;)) is how we plan to handle historical (or even fake) data.

class LocalStorageMeter(Meter):
    """
    A Meter that reads data from local storage (e.g. from parquet or csv files)
    resamples the data and sends it into a channel.
    """
    ...

And indeed we have to repeat the same exercise for the ReportingApiMeter (or whatever we want to call that one).

I've added the abstract Meter class.

As a side note. In case there is an urge to use the Meter interface for physical meters we can simply create a simple wrapper class that returns the channels from the microgrid client.
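
Such a wrapper could be as thin as the following sketch. The class name `PhysicalMeter` is used later in this thread, but everything else here is an assumption made for illustration: the `MeterStreamClient` protocol, the client methods `get_meter_current` and `get_meter_voltage` (only `get_meter_power` appears in the sketch above), and the `Any` return types standing in for receivers.

```python
from abc import ABC, abstractmethod
from typing import Any, Protocol


class Meter(ABC):
    """The abstract Meter interface (return types simplified to Any)."""

    @property
    @abstractmethod
    def current(self) -> Any: ...

    @property
    @abstractmethod
    def voltage(self) -> Any: ...

    @property
    @abstractmethod
    def power(self) -> Any: ...


class MeterStreamClient(Protocol):
    """What the wrapper expects from the microgrid client (hypothetical names)."""

    def get_meter_current(self, component_id: int, sample_frequency: float) -> Any: ...
    def get_meter_voltage(self, component_id: int, sample_frequency: float) -> Any: ...
    def get_meter_power(self, component_id: int, sample_frequency: float) -> Any: ...


class PhysicalMeter(Meter):
    """Thin wrapper: each property asks the client for the matching channel."""

    def __init__(
        self, client: MeterStreamClient, component_id: int, sample_frequency: float
    ) -> None:
        self._client = client
        self._component_id = component_id
        self._sample_frequency = sample_frequency

    @property
    def current(self) -> Any:
        return self._client.get_meter_current(self._component_id, self._sample_frequency)

    @property
    def voltage(self) -> Any:
        return self._client.get_meter_voltage(self._component_id, self._sample_frequency)

    @property
    def power(self) -> Any:
        return self._client.get_meter_power(self._component_id, self._sample_frequency)
```

Because the wrapper holds no state of its own, the client remains the single place where streams are created and re-used.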

0 replies

tiyash-basu-frequenz · 2022-10-07T09:00:45Z

tiyash-basu-frequenz
Oct 7, 2022
Maintainer

Now coming back to the interface. If we'd do it like you suggested, in order to avoid duplicated streams from the api, we would need another class that holds meters.
Instead of creating meters "on the fly" we'd have to use such a builder class, that holds the meta context of existing streams and can re-use those.
Furthermore that must be a singleton object and also we'd need to copy that design for each category, i.e. inverters, ev_chargers.

As we discussed in our call, using interfaces is independent of the underlying implementation. Regardless of whether we use interfaces or not, we will need singleton objects that memoize data channels. Using interfaces corresponding to component categories on top of this will provide additional structure and clear expectations of entities, e.g., if something implements the Meter interface, it will always provide the power() method.

0 replies

leandro-lucarella-frequenz · 2022-10-07T09:11:47Z

leandro-lucarella-frequenz
Oct 7, 2022
Maintainer

Doesn't look like we are using interfaces like we discussed earlier. Also, having one large class called MicrogridData looks a bit unnecessary, because that indicates coupling between different component categories like inverters and meters.

I agree, it feels like MicrogridData all over again. I think going back to the idea of one resampling per metric instead of whole components is good, but I would divide different functionality in different layers to avoid the one-class-to-rule-them-all approach we had with MicrogridData.

We don't need to go for the per-component-abstraction, but we can still have different layers providing different functionality, so we can keep the current MicrogridApiClient as it is, and instead of adding more methods there, we can create a ResamplingMicrogridApiClient that just takes care of the resampling, adding the new optional sample_frequency: float argument.

We could even extend this to have a virtual component_id that is directly a formula, so have a LogicalMicrogridApiClient that offers the same methods but where component_id is a Union[int, Formula]. If a Formula is used instead of an int, then you get the result of the "logical meter".

What I like about this approach is that it is a clear pipeline, where we add more operations to the same channel's data.
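
To make the layering idea a bit more tangible, here is a rough sketch. None of these classes exist in the SDK in this form: `BaseClient` merely stands in for the existing client, `LogicalClient` for the proposed logical layer, lists stand in for channels, and the `Formula` representation (component ids plus a combining function) is an assumption, since the thread does not define what a formula looks like.

```python
from dataclasses import dataclass
from typing import Callable, Union

Series = list[float]  # stand-in for a channel of already aligned samples


@dataclass(frozen=True)
class Formula:
    """Hypothetical formula: which components to read and how to combine them."""

    component_ids: tuple[int, ...]
    combine: Callable[..., float]


class BaseClient:
    """Stand-in for the existing client: returns the data of one component."""

    def __init__(self, data: dict[int, Series]) -> None:
        self._data = data

    def power(self, component_id: int) -> Series:
        return self._data[component_id]


class LogicalClient:
    """Layer on top of a client that also accepts a Formula as component_id."""

    def __init__(self, inner: BaseClient) -> None:
        self._inner = inner

    def power(self, component_id: Union[int, Formula]) -> Series:
        if isinstance(component_id, int):
            # Plain ids are passed straight through to the wrapped layer.
            return self._inner.power(component_id)
        inputs = [self._inner.power(cid) for cid in component_id.component_ids]
        return [component_id.combine(*values) for values in zip(*inputs)]
```

With this shape, `LogicalClient(base).power(Formula((1, 2), lambda a, b: a + b))` yields the element-wise sum of components 1 and 2, while `power(1)` behaves exactly like the wrapped layer. A resampling layer could be stacked in between in the same way.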

I think this design is missing a key point though. We are talking about calling methods to get the info. This means data calculation can't be global to the whole system once we start distributing stuff (we are talking about classes, not actors). I guess we could write an actor on top of this and then we'll receive subscriptions to metrics via message-passing, but this adds more complexity (like having to know in advance who the users of this actor are and probably connecting channels at construction time, like we have now in MicrogridData). But I don't know, maybe it is OK to recalculate everything in each process if stuff is lazy and we only calculate what we really need.

In any case my feeling is we now have the requirements well figured out, and we are starting to fight more with different implementation ideas that might work better or worse but it is hard to figure without actually trying to implement it. So my feeling is maybe we should try to implement a minimal but working version of the idea more concretely, and maybe even more than one idea. I know it is time consuming, but this is a very key part of the system, so I think it might be worth it.

0 replies

tiyash-basu-frequenz · 2022-10-07T10:10:56Z

tiyash-basu-frequenz
Oct 7, 2022
Maintainer

We are talking about calling methods to get the info. This means data calculation can't be global to the whole system while we start distributing stuff (we are talking about classes, not actors). I guess we could write an actor on top of this and then we'll receive subscriptions to metrics via message-passing, but this adds more complexity (like having to know in advance which the users of this actors are and probably connect channels at construction-time, like we have now in MicrogridData

We are already discussing methods that return data streams. E.g., the Meter.power() method will return a stream of power values from a given meter. The underlying class (implementing the Meter interface) could very well act as a pub-sub broker that takes care of memoizing calculations.

So my feeling is maybe we should try to implement a minimal but working version of the idea more concretely, and maybe even more than one idea. I know it is time consuming, but this is a very key part of the system, so I think it might be worth it.

Agreed. Design funneling is a good idea in this case.

0 replies

matthias-wende-frequenz · 2022-10-07T10:11:00Z

matthias-wende-frequenz
Oct 7, 2022
Maintainer Author

I changed the design such that the microgrid client returns an instance of "type" Meter for a given component_id and sample_frequency, since it helps to separate concerns between microgrid_client and physical meters. From a user perspective it doesn't change much: the user now has to call mgc.get_meter(id, fz).power instead of mgc.get_meter_power(id, fz).

0 replies

matthias-wende-frequenz · 2022-10-07T10:24:39Z

matthias-wende-frequenz
Oct 7, 2022
Maintainer Author

I think this design is missing a key point though. We are talking about calling methods to get the info. This means data calculation can't be global to the whole system while we start distributing stuff (we are talking about classes, not actors). I guess we could write an actor on top of this and then we'll receive subscriptions to metrics via message-passing, but this adds more complexity (like having to know in advance which the users of this actors are and probably connect channels at construction-time, like we have now in MicrogridData. But I don't know, maybe it is OK to recalculate everything in each process if stuff is lazy and we only calculate what we really need.

As tiyash pointed out we are returning channels. One of the motivations for implementing python channels was distribution, because we can implement channels that communicate over say tcp.

I'd also like to highlight that we wanted to avoid having multiple singleton objects in the code and thus we decided that the microgrid client is the single object that holds all states.

This is of course a design trade-off but since we decided on the high-level to make re-sampling very prominent it makes a lot of sense that by default data streams from components that are returned by the most prominent object are already resampled. (There will be a method to get the raw data stream from the api too).

In a distributed world, each process needs to create its own instance of a microgrid client too, and we don't run into any problems. If computations should be re-used, channels can be used as pointed out above.

0 replies

matthias-wende-frequenz · 2022-10-07T10:28:18Z

matthias-wende-frequenz
Oct 7, 2022
Maintainer Author

Let me also add that this design doesn't cover how things are going to be implemented in the microgrid client. I intend to work that out soon, but it is surely intended to be as modular as possible. So, to give an example, we certainly don't want to end up with resampling code that is tightly coupled to the mg client!

0 replies

leandro-lucarella-frequenz · 2022-10-07T13:59:58Z

leandro-lucarella-frequenz
Oct 7, 2022
Maintainer

As tiyash pointed out we are returning channels. One of the motivations for implementing python channels was distribution, because we can implement channels that communicate over say tcp.

Yeah, but the mapping of channels will only be owned by one process. So if you do MicrogridApi.get_meter() from one process you'll get a channel, but if you do it from another process, you'll get a different channel, i.e. you'll do your calculation once per process. This is what I meant and maybe it is OK, but I think the idea of having a "resampling actor" was also to have calculations only once in the whole system. I'm fine with continuing with this approach, and if performance ends up being a problem, we can build an actor on top later, but it might get nasty.

I'd also like to highlight that we wanted to avoid having multiple singleton objects in the code and thus we decided that the microgrid client is the single object that holds all states.

This is of course a design trade-off but since we decided on the high-level to make re-sampling very prominent it makes a lot of sense that by default data streams from components that are returned by the most prominent object are already resampled. (There will be a method to get the raw data stream from the api too).

I'm not sure if we are using the same nomenclature here, because the current class names are a bit confusing at the moment. There is already only one singleton object (MicrogridApi, which is not MicrogridApiClient, but holds a MicrogridApiClient and a ComponentGraph). Not sure how this proposal of having only one singleton object relates to that, especially when you are talking about trade-offs...

Especially since in your proposal code you seem to be using arbitrary names, it is not clear if they are new classes or if they correspond to existing code. It is a bit hard to have a clear picture of the proposal without something more concrete; this is why I think we should either have another meeting to explain it a bit further, or go for an MVP with real code in the real SDK. For me it is all still too much up in the air...

In a distributed world, each process needs to create it's own instance of a microgrid client too and we don't run in any problems. If computations should be re-used channels can be used as pointed out above.

OK, so you are fine with resampling the same data multiple times (in different processes, then). I'm fine with that, but it diverges from the design in Tom's slides.

0 replies

leandro-lucarella-frequenz · 2022-10-07T14:02:00Z

leandro-lucarella-frequenz
Oct 7, 2022
Maintainer

https://lkml.org/lkml/2000/8/25/132 😆

0 replies

jakub-toptal · 2022-10-07T14:40:12Z

jakub-toptal
Oct 7, 2022

So do we want only one resampling_frequency for resampling raw data or multiple ones per component-basis (e.g. Battery with id=1) or per component+attribute basis (e.g. Power from Battery with id=5)? (this refers to Step 2 from this diagram)

This method looks up if a stream of with the parameters (compenent_id, sample_frequency), is existing in a dictionary and if not creates a new channel and stores it to the clients internal meter dictionary.

This makes some sense, but I cannot see a direct benefit right now. At the moment, we just create these channels in a function that lives outside of the MicrogridClient:
https://github.com/frequenz-floss/frequenz-sdk-python/blob/v0.x.x/src/frequenz/sdk/data_ingestion/gen_component_receivers.py#L247

and return a list of Receivers to the caller:
https://github.com/frequenz-floss/frequenz-sdk-python/blob/v0.x.x/src/frequenz/sdk/data_ingestion/gen_component_receivers.py#L199

that was fine, because we always had a single sink for the component data, e.g.:
MicrogridData -> Receivers -> FormulaCalculator,
but if we had more sinks then, yeah instead of creating separate channels, these could be reused:

                           -> FormulaCalculator1
MicrogridData -> Receivers -> FormulaCalculator2
                           -> FormulaCalculator3

The proposed interface for retrieving data from meters:
def get_meter(meter_id: int, sample_frequency: float).power -> Receiver[Power]:

What's the benefit of doing it per single metric like Power instead of the entire data point from a single meter (MeterData)? As far as I know, the protobuf message already contains everything so we could just wrap that into a more user-friendly dataclass and pass it further and then it will be up to the caller to decide if they want to use MeterData.active_power or MeterData.current_per_phase.

In regards to the resampling, I wouldn't add the sample_frequency here, because we would lose access to raw data. I would instead pass the raw meter data downstream, where a Resampler with a desired resampling_frequency could be plugged in. (or a collection of Resamplers if there was a need to resample with different frequencies, for some reason)

So it would basically use what we already have in the MicrogridClient:
https://github.com/frequenz-floss/frequenz-sdk-python/blob/v0.x.x/src/frequenz/sdk/microgrid/client.py#L70-L86
plus resampling would have to be applied to it.
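
As an illustration of what a pluggable resampling step on top of the raw data might do, here is a minimal sketch. It assumes raw samples are `(timestamp, value)` pairs with timestamps in seconds and simply averages the values falling into each bucket of `period` seconds; the function name `resample_mean` and this averaging strategy are assumptions, and a real Resampler would probably also need to handle gaps, late data and other aggregation methods.

```python
import math
from collections import defaultdict


def resample_mean(
    samples: list[tuple[float, float]], period: float
) -> list[tuple[float, float]]:
    """Average raw (timestamp, value) samples into buckets of `period` seconds.

    Each output timestamp is the start of its bucket; empty buckets are skipped.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    buckets: dict[int, list[float]] = defaultdict(list)
    for timestamp, value in samples:
        # Samples with irregular timestamps are assigned to an equidistant grid.
        buckets[math.floor(timestamp / period)].append(value)
    return [
        (index * period, sum(values) / len(values))
        for index, values in sorted(buckets.items())
    ]
```

Since the function only takes raw samples and a period, several resamplers with different periods can consume the same raw stream, which keeps the unprocessed data available to other consumers.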

If we stick to the MeterData, BatteryData, ... we will have N methods in the MicrogridClient, where N is the number of component categories (actually, even less, because we aren't interested in data from components of certain categories).

If we split these per attribute, we will end up with like 20 methods, for every category & attribute combination.

Formulas and logical meters.

# example formula string: "(c_id1 + c_id2)"
* get all (resampled) data streams from the micro grid api
that are represented as tokens in the power formula string,
i.e. call client.get_meter_power(component_id, self.sample_frequency)
for all component ids

I thought we were supposed to stop using sympy so I guess we will not use string symbols anymore?

I am still confused about how these 3 are supposed to work:

LogicalMetersFormulas,
Meter(ABC),
LogicalMeter(Meter)

Also, this method in the LogicalMeter looks like you are shifting a lot of logic into that class:

    def get_power(self) -> Receiver<Power>:
    """
    This method resamples the input power data streams from all meters
    and combines the data streams according to operations defined in the user provided formula
    """

        * create a new (tx, rx) pair
        * parse the formula string into a formula tree
        * get all (resampled) data streams from the micro grid api
          that are represented as tokens in the power formula string,
          i.e. call `client.get_meter_power(component_id, self.sample_frequency)`
          for all component ids
        * create a task that
            * combines the channels according to the power_formula
              and sends the result in the channels Sender

        return rx

0 replies

matthias-wende-frequenz · 2022-10-07T16:00:36Z

matthias-wende-frequenz
Oct 7, 2022
Maintainer Author

Yeah, but the mapping of channels will only be owned by one process. So if you do MicrogridApi.get_meter() from one process you'll get a channel, but if you do it from another process, you'll get a different channel, i.e. you'll do you calculation once per process. This is what I meant and maybe it is OK, but I think the idea of having a "resampling actor" was also to have calculations only once in the whole system.

One could pipe the output of a meter's output channel into a TCP-based channel and we are done. Alternatively, the microgrid client returns a receiver that is connected over TCP, and then we can pass that one to another process, and the problem would be solved too.

0 replies

jakub-toptal · 2022-10-07T16:09:47Z

jakub-toptal
Oct 7, 2022

In the following we sketch an implementation of the future SDK data pipeline.
The design is originated from a users perspective, who might ask the question:
"I need to measure power on a certain point of my microgrid".

Does an SDK user really want to measure the power of an individual component? Or perhaps did you mean they would want to measure grid_load or client_load, which are based on the power from meters?

Could you please elaborate a bit on what the most typical use case and the pain points for the SDK user currently are?

I thought the most typical SDK user would like to get a stream of some aggregation result, e.g. grid_load so that they can apply some data science / ML logic to it, e.g. for peak shaving purposes.

0 replies

matthias-wende-frequenz · 2022-10-07T19:25:25Z

matthias-wende-frequenz
Oct 7, 2022
Maintainer Author

Does SDK user really want to measure power of an individual component? Or perhaps did you mean they would want to measure grid_load or client_load, which are based on the power from meters?

We don't know what a user wants to measure but we certainly want to offer all possibilities.

I thought the most typical SDK user would like to get a stream of some aggregation result, e.g. grid_load so that they can apply some data science / ML logic to it, e.g. for peak shaving purposes.

Yep, that is often the case, e.g. for grid load, and since it is not determined whether such a meter exists, we expect to work with a logical meter in that case.

But note that under the hood in the new design a logical meter would consume the streams from physical meters.

0 replies

matthias-wende-frequenz · 2022-10-07T19:31:39Z

matthias-wende-frequenz
Oct 7, 2022
Maintainer Author

So do we want only one resampling_frequency for resampling raw data or multiple ones per component-basis (e.g. Battery with id=1) or per component+attribute basis (e.g. Power from Battery with id=5)? (this refers to Step 2 from this diagram)

In this design the sampling frequency is set per stream, so we slightly differ from the high-level diagram, towards more flexibility. The simple reason is that when we started to draft the high-level overview, we didn't fully anticipate how the data streams would look in detail.

0 replies

matthias-wende-frequenz · 2022-10-07T21:04:49Z

matthias-wende-frequenz
Oct 7, 2022
Maintainer Author

I thought we were supposed to stop using sympy so I guess we will not use string symbols anymore?

I am still confused about how these 3 are supposed to work:
LogicalMetersFormulas,
Meter(ABC),
LogicalMeter(Meter)

Meter is the interface for all kinds of meters, e.g. PhysicalMeter, LogicalMeter, LocalStorageMeter and whatever other meter can be thought of.

The LogicalMeter needs formulas for combining data streams from PhysicalMeters and LogicalMeterFormulas is just the struct that holds all formulas necessary for the calculation of the different metrics. Note that each metric may need a different Formula (e.g. voltage and current behave differently and thus have different laws).
The design idea here is, that the LogicalMeter can be composed with a get_grid_formulas() function on construction. That leaves a maximum of flexibility while being simple to use.

Also, this method in the LogicalMeter looks like you are shifting a lot of logic into that class:

Nope. The details are simply not fully worked out. But formula parsing will certainly become a separate module, resampling happens in an instance of PhysicalMeter (a class that isn't defined yet in the above sketch), and that only leaves combining data sources according to the provided formula as the logic that has to be implemented.

0 replies

matthias-wende-frequenz · 2022-10-07T21:15:52Z

matthias-wende-frequenz
Oct 7, 2022
Maintainer Author

What's the benefit of doing it per single metric like Power instead of the entire data point from a single meter (MeterData)? As far as I know, the protobuf message already contains everything so we could just wrap that into a more user-friendly dataclass and pass it further and then it will be up to the caller to decide if they want to use MeterData.active_power or MeterData.current_per_phase.

The benefit is that we don't need to resample metrics that are unused. In many cases users are not interested in all metrics that come from the client, thus we process them lazily.

In regards to the resampling, I wouldn't add the sample_frequency here, because we would lose access to raw data. I would instead pass the raw meter data downstream, where a Resampler with a desired resampling_frequency could be plugged in. (or a collection of Resamplers if there was a need to resample with different frequencies, for some reason)

Not if we add methods like raw() to the PhysicalMeter instance that the microgrid client returns, which returns the unprocessed stream.

So it would basically use what we already have in the MicrogridClient:
https://github.com/frequenz-floss/frequenz-sdk-python/blob/v0.x.x/src/frequenz/sdk/microgrid/client.py#L70-L86
plus resampling would have to be applied to it.

Yes, but I'd suggest moving the implementation to the PhysicalMeter class. I plan to outline this in more detail next week.

If we split these per attribute, we will end up with like 20 methods, for every category & attribute combination.

Nope, the number of messages won't change since we return classes for each category. But I think we'll gain modularity, since we can offload some of the implementation into these new classes. Nevertheless, the microgrid client is supposed to hold all the state, e.g. the already constructed meter channels.

0 replies

thomas-nicolai-frequenz · 2022-10-09T11:53:20Z

thomas-nicolai-frequenz
Oct 9, 2022
Maintainer

We don't need to go for the per-component-abstraction, but we can still have different layers providing different functionality, so we can keep the current MicrogridApiClient as it is, and instead of adding more methods there, we can create a ResamplingMicrogridApiClient that just takes care of the resampling, adding the new optional sample_frequency: float argument.

I absolutely share what @tiyash-basu-frequenz and @leandro-lucarella-frequenz have been pointing out. The proposal feels like MicrogridData all over again. The reason I proposed a general Resampling actor was to keep scopes clear and easy. Now squeezing everything into the MicrogridClient makes no sense after all, because we want the Resampler to work on top of any data source, no matter if it is the API or a parquet database. Suddenly we are locking things up again. I get that we are trying to be as efficient as possible and only resample data streams that someone is really consuming from, but I don't understand where the trouble is. If an actor registers with another actor saying "I'm only interested in these component data streams", why can't we encapsulate this into a Resampling actor, and this actor would only forward (resample) what it has been asked for?

0 replies

thomas-nicolai-frequenz · 2022-10-09T11:58:58Z

thomas-nicolai-frequenz
Oct 9, 2022
Maintainer

This is what I meant and maybe it is OK, but I think the idea of having a "resampling actor" was also to have calculations only once in the whole system.

Yes! That is the idea @leandro-lucarella-frequenz and we should strictly follow that.

In a distributed world, each process needs to create it's own instance of a microgrid client too and we don't run in any problems. If computations should be re-used channels can be used as pointed out above.

mm, what? @matthias-wende-frequenz

0 replies

thomas-nicolai-frequenz · 2022-10-09T12:01:41Z

thomas-nicolai-frequenz
Oct 9, 2022
Maintainer

Does SDK user really want to measure power of an individual component? Or perhaps did you mean they would want to measure grid_load or client_load, which are based on the power from meters? Could you please elaborate a bit on what the most typical use case and the pain points for the SDK user currently are? I thought the most typical SDK user would like to get a stream of some aggregation result, e.g. grid_load so that they can apply some data science / ML logic to it, e.g. for peak shaving purposes.

@jakub-toptal The user should be able to do both. Both cases are needed.

0 replies

thomas-nicolai-frequenz · 2022-10-09T12:14:02Z

thomas-nicolai-frequenz
Oct 9, 2022
Maintainer

Let me be very clear on the requirements:

The user should be able to use the MicrogridApiClient as is, without needing to call anything like _raw or whatever. The intention is clearly that you can develop whatever you want and expect the MicrogridApiClient to work like any normal ApiClient out there.
There should be a dedicated Resampling actor imho that can consume from either the MicrogridApiClient, the LocalStorageApiClient or the ReportingApiClient. This is a MUST!
The user should have the ability to receive either aggregated & resampled data series, non-aggregated & resampled data series, or non-aggregated & non-resampled data streams using the SDK.
We want the Resampler to work lazily ;-)

That means:

We do not touch MicrogridApiClient at all.
Every time series can be clearly identified by component_id + metric_id, so this pair can act as the single identifier for any data channel, and the output into each channel is purely a time series. There is no such thing as a meter, inverter or whatever once the data gets pushed into the channel. It's only a time series.

If the Resampler consumes from the MicrogridApiClient, LocalStorageApiClient or ReportingApiClient only what other actors have registered for via (component_id, metric_id, start_time), it would send that metric for that component as a dedicated time series. It would not forward any kind of object or data class! Why doesn't that simple approach work? It feels like we are getting stuck in stubs and object-type thinking, when all the developer wants downstream is methods to easily consume the time series data, and that's also what the data scientist needs. That also means we would not store any type of objects in the ring buffers, just pure time series values.
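As a rough illustration of the points above, the following sketch keys a stream by (component_id, metric_id), reads from any source that can serve such a key from a start_time, and emits nothing but (timestamp, value) pairs. The names `ChannelKey`, `DataSource`, `InMemorySource` and `resample` are hypothetical and not part of any existing client; the in-memory source merely stands in for the MicrogridApiClient, LocalStorageApiClient or ReportingApiClient, and averaging per window is just one possible resampling function.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Protocol


class ChannelKey(NamedTuple):
    """Identifies one time series; no meter or inverter types involved."""
    component_id: int
    metric_id: str


class Sample(NamedTuple):
    timestamp: float  # seconds
    value: float


class DataSource(Protocol):
    """Hypothetical interface: any source that can serve raw samples."""

    def stream(self, key: ChannelKey, start_time: float) -> Iterable[Sample]: ...


class InMemorySource:
    """Stand-in for a live API or storage client, for illustration only."""

    def __init__(self, data: Dict[ChannelKey, List[Sample]]) -> None:
        self._data = data

    def stream(self, key: ChannelKey, start_time: float) -> Iterable[Sample]:
        return [s for s in self._data.get(key, []) if s.timestamp >= start_time]


def resample(
    source: DataSource, key: ChannelKey, start_time: float, period: float
) -> Iterator[Sample]:
    """Yield the mean of the raw samples in each window [t, t + period).

    Each output is stamped with the end of its window. Assumes the source
    yields samples in timestamp order; windows without data are skipped.
    """
    window_end = start_time + period
    bucket: List[float] = []
    for sample in source.stream(key, start_time):
        # Close every window that ended before this sample arrived.
        while sample.timestamp >= window_end:
            if bucket:
                yield Sample(window_end, sum(bucket) / len(bucket))
                bucket = []
            window_end += period
        bucket.append(sample.value)
    if bucket:
        yield Sample(window_end, sum(bucket) / len(bucket))


key = ChannelKey(component_id=4, metric_id="power")
source = InMemorySource(
    {key: [Sample(0.2, 100.0), Sample(0.8, 200.0), Sample(1.5, 300.0)]}
)
series = list(resample(source, key, start_time=0.0, period=1.0))
```

Because `resample` is a generator, nothing is read or computed until a consumer iterates over it, which is one simple way to get the lazy behaviour asked for above.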

0 replies

sahas-subramanian-frequenz · 2022-10-10T07:58:48Z

sahas-subramanian-frequenz
Oct 10, 2022

I think my original concern was that the resampled output would still be low-level, high-frequency data. It would be much cheaper to calculate just what we need locally than to send Python pickle objects across processes (once we have them), which need a lot more CPU time to encode and decode. Resampling only the necessary metrics would also save CPU until we can go multi-process.

But there might be other (Rust/memcpy) ways to optimize the performance once we get to that point, and in exchange we get modularity and the ability to use the same resampler on historical data as well.

So I agree with making a separate actor as well.

0 replies

leandro-lucarella-frequenz · 2022-10-18T12:43:52Z

leandro-lucarella-frequenz
Oct 18, 2022
Maintainer

OK, I'm closing this as won't fix, as we decided to go with another design.

0 replies
