Sketch implementation details of SDK data pipeline #34
Replies: 23 comments
-
Doesn't look like we are using interfaces like we discussed earlier. Also, having one large class called … I'd suggest creating interfaces (like abstract classes). E.g., there could be a meter interface:

```python
from abc import ABC, abstractmethod


class Meter(ABC):
    @property
    @abstractmethod
    def current(self):
        pass

    @property
    @abstractmethod
    def voltage(self):
        pass

    @property
    @abstractmethod
    def power(self):
        pass
```

Then we can implement it for different units. E.g., we could have a resampler for a meter with component ID 4, and get power from it using this interface:

```python
class Resampler(Meter):
    def __init__(self, component_id: int) -> None:
        self._component_id = component_id

    # Concrete implementations: no @abstractmethod here, otherwise
    # Resampler would itself stay abstract and could not be instantiated.
    @property
    def current(self):
        ...  # resampled current of component `self._component_id`

    @property
    def voltage(self):
        ...  # resampled voltage

    @property
    def power(self):
        ...  # resampled power
```

We can do something similar with the … This way, the interface stays consistent and lazy, and its implementation varies from one class to another, depending on the functionality.
-
That was roughly my first design approach, but it didn't work out very nicely. A meter's data stream from the API is actually a stream of a product data type, where each entry represents a certain metric. From experience, most of the time a user is interested in power, and thus we don't want to resample all metrics in the stream but only the requested ones. If a user requests a new metric, we don't want to create a new stream from the API but reuse the existing one. Hence we need to store an object somewhere that holds the context of existing data streams from components.

Now coming back to the interface. If we did it like you suggested, we would need another class that holds meters in order to avoid duplicated streams from the API. The microgrid client is by design a singleton object that is supposed to ensure that streams to the API are re-used. We also found that easier from a user's perspective: a user requests a channel for a certain metric from the microgrid client and gets back a channel that delivers what was asked. That keeps the SDK simpler to understand, because users don't have to learn about several objects but only one, and one of the main objectives of the microgrid client is to deliver data streams. Nevertheless, I haven't ditched the idea of a … e.g.:

```python
class LocalStorageMeter(Meter):
    """
    A Meter that reads data from local storage (e.g. from parquet or csv files),
    resamples the data and sends it into a channel.
    """

    ...
```

And indeed we have to repeat the same exercise for the … I've added the abstract Meter class. As a side note: in case there is an urge to use the Meter interface for physical meters, we can simply create a wrapper class that returns the channels from the microgrid client.
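A minimal sketch of the wrapper class mentioned in the side note, under stated assumptions: the client method `meter_channel(component_id, metric)` is hypothetical, channels are modelled as `asyncio.Queue`, and `ChannelMeter`, `FakeMicrogridClient`, and `PhysicalMeter` are illustrative names, not existing SDK classes.

```python
import asyncio
from abc import ABC, abstractmethod


class ChannelMeter(ABC):
    """Hypothetical meter interface whose metrics are delivered as channels."""

    @abstractmethod
    def power(self) -> asyncio.Queue:
        """Return a channel that delivers power samples."""

    @abstractmethod
    def current(self) -> asyncio.Queue:
        """Return a channel that delivers current samples."""


class FakeMicrogridClient:
    """Stand-in for the microgrid client (hypothetical `meter_channel` method)."""

    def __init__(self) -> None:
        self.requests: list[tuple[int, str]] = []

    def meter_channel(self, component_id: int, metric: str) -> asyncio.Queue:
        # The real client would look up or create the (resampled) stream here.
        self.requests.append((component_id, metric))
        return asyncio.Queue()


class PhysicalMeter(ChannelMeter):
    """Thin wrapper: every metric request is forwarded to the client."""

    def __init__(self, client: FakeMicrogridClient, component_id: int) -> None:
        self._client = client
        self._component_id = component_id

    def power(self) -> asyncio.Queue:
        return self._client.meter_channel(self._component_id, "power")

    def current(self) -> asyncio.Queue:
        return self._client.meter_channel(self._component_id, "current")
```

The wrapper holds no state of its own, so reusing API streams remains the microgrid client's responsibility, in line with the single-object design argued for in this comment.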
-
As we discussed in our call, using interfaces is independent of the underlying implementation. Regardless of whether we use interfaces or not, we will need singleton objects that memoize data channels. Using interfaces corresponding to component categories on top of this will provide additional structure and clear expectations from entities, e.g., if something implements the …
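To make "singleton objects that memoize data channels" concrete, here is a small sketch, assuming a hypothetical `MemoizingClient` that opens at most one upstream stream per `(component_id, metric)` key and fans each sample out to every subscriber. The `subscribe`/`publish` names and the use of `asyncio.Queue` as a channel are assumptions for illustration only.

```python
import asyncio
from collections import defaultdict


class MemoizingClient:
    """Hypothetical client: one upstream stream per key, many receivers."""

    def __init__(self) -> None:
        self._subscribers: dict[tuple[int, str], list[asyncio.Queue]] = defaultdict(list)
        # Records which upstream streams were opened, in order.
        self.opened_streams: list[tuple[int, str]] = []

    def subscribe(self, component_id: int, metric: str) -> asyncio.Queue:
        key = (component_id, metric)
        if key not in self._subscribers:
            # First request for this key: the real client would open the
            # API stream here; later requests reuse it.
            self.opened_streams.append(key)
        receiver: asyncio.Queue = asyncio.Queue()
        self._subscribers[key].append(receiver)
        return receiver

    def publish(self, component_id: int, metric: str, value: float) -> None:
        """Deliver one upstream sample to every subscriber of that key."""
        for receiver in self._subscribers.get((component_id, metric), []):
            receiver.put_nowait(value)
```

Each subscriber gets its own receiver, so two consumers of the same metric do not compete for samples, while the upstream stream is opened only once.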
-
I agree, it feels like … We don't need to go for the per-component abstraction, but we can still have different layers providing different functionality, so we can keep the current … We could even extend this to have a virtual … What I like about this approach is that it is a clear pipeline, where we add more operations to the same channel's data.

I think this design is missing a key point, though. We are talking about calling methods to get the info. This means data calculation can't be global to the whole system once we start distributing things (we are talking about classes, not actors). I guess we could write an actor on top of this, and then we would receive subscriptions to metrics via message passing, but this adds more complexity (like having to know in advance who the users of this actor are, and probably connecting channels at construction time, like we do now in …).

In any case, my feeling is that we now have the requirements well figured out, and we are starting to fight more over different implementation ideas that might work better or worse, but it is hard to tell without actually trying to implement them. So maybe we should try to implement a minimal but working version of the idea more concretely, and maybe even more than one idea. I know it is time-consuming, but this is a very key part of the system, so I think it might be worth it.
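The "clear pipeline, where we add more operations to the same channel's data" can be sketched with async generators, each layer wrapping the previous one. This is only an illustration: `window_mean` is a crude fixed-window average standing in for a resampler, and `source`, `apply`, and `collect` are hypothetical helpers, not SDK functions.

```python
import asyncio
from typing import AsyncIterator, Callable


async def source(samples: list[float]) -> AsyncIterator[float]:
    """Stand-in for raw component data arriving from the API."""
    for sample in samples:
        yield sample


async def window_mean(stream: AsyncIterator[float], size: int) -> AsyncIterator[float]:
    """Layer 1: emit the mean of every `size` samples (incomplete tail dropped)."""
    window: list[float] = []
    async for sample in stream:
        window.append(sample)
        if len(window) == size:
            yield sum(window) / size
            window.clear()


async def apply(
    stream: AsyncIterator[float], fn: Callable[[float], float]
) -> AsyncIterator[float]:
    """Layer 2: any further per-sample operation on the same data."""
    async for sample in stream:
        yield fn(sample)


async def collect(stream: AsyncIterator[float]) -> list[float]:
    """Drain a stream into a list (for demonstration)."""
    return [sample async for sample in stream]
```

For example, `asyncio.run(collect(apply(window_mean(source([1.0, 3.0, 5.0, 7.0]), 2), lambda x: x * 2)))` averages pairs to `2.0` and `6.0` and then doubles them. Each layer only knows its input stream, which is what makes the layers independently replaceable.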
-
We are already discussing methods that return data streams, e.g., the …
Agreed. Design funneling is a good idea in this case.
-
I changed the design such that the microgrid client returns an instance of "type"
-
As tiyash pointed out, we are returning channels. One of the motivations for implementing Python channels was distribution, because we can implement channels that communicate over, say, TCP. I'd also like to highlight that we wanted to avoid having multiple singleton objects in the code, and thus we decided that the microgrid client is the single object that holds all state. This is of course a design trade-off, but since we decided at the high level to make resampling very prominent, it makes a lot of sense that, by default, data streams from components returned by the most prominent object are already resampled. (There will be a method to get the …) In a distributed world, each process needs to create its own instance of a microgrid client too, and we don't run into any problems. If computations should be re-used, channels can be used as pointed out above.
-
Let me also add that this design doesn't cover how things are going to be implemented in the microgrid client. I intend to work that out soon, but it is certainly intended to be as modular as possible. To give an example, we certainly don't want to end up with resampling code that is tightly coupled to the microgrid client!
-
Yeah, but the mapping of channels will only be owned by one process. So if you do …
I'm not sure if we are using the same nomenclature here, because the current class names are a bit confusing at the moment. There is already only one singleton object (…). Especially since you seem to be using arbitrary names in your proposal code, it is not clear whether they are new classes or correspond to existing code. It is a bit hard to get a clear picture of the proposal without something more concrete, which is why I think we should either have another meeting to explain it a bit further, or go for an MVP with real code in the real SDK. For me, everything is still too much up in the air...
OK, so you are fine with resampling the same data multiple times (in different processes, then). I'm fine with that, but it diverges from the design in Tom's slides.
-
This makes some sense, but I cannot see a direct benefit right now. At the moment, we just create these channels in a function that lives outside of the … and return a list of … That was fine, because we always had a single sink for the component data, e.g.: …
What's the benefit of doing it per single metric, like …? In regards to resampling, I wouldn't add the … So it would basically use what we already have in the … If we stick to the … If we split these per attribute, we will end up with something like 20 methods, one for every category and attribute combination.
I thought we were supposed to stop using … I am still confused about how these 3 are supposed to work: …
Also, this method in the …
-
One could pipe the output channel of one meter into a TCP-based channel, and we are done. Alternatively, the microgrid client returns a receiver that is connected over TCP, which we can pass to another process, and the problem would be solved too.
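Piping one channel into another is just a forwarding loop. The sketch below assumes `asyncio.Queue` for both ends and a hypothetical `stop` sentinel to end the stream; a TCP-backed channel would slot in as `sender` as long as it offers the same `put` coroutine, which is an assumption about such a channel, not an existing API.

```python
import asyncio


async def pipe(receiver: asyncio.Queue, sender: asyncio.Queue, stop: object) -> None:
    """Forward every item from `receiver` into `sender` until `stop` arrives.

    The sentinel itself is forwarded too, so the downstream side can also
    detect the end of the stream.
    """
    while True:
        item = await receiver.get()
        await sender.put(item)
        if item is stop:
            break
```

The forwarding code does not depend on where the sender's data ends up, which is what lets the transport be swapped without touching the producer.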
-
Does an SDK user really want to measure the power of an individual component? Or did you perhaps mean they would want to measure … Could you please elaborate a bit on what the most typical use case and the current pain points for the SDK user are? I thought the most typical SDK user would like to get a stream of some aggregation result, e.g. …
-
We don't know what a user wants to measure, but we certainly want to offer all possibilities.
Yep, that is often the case, e.g. for grid load, and since it is not guaranteed that such a meter exists, we expect to work with a logical meter in that case. But note that, under the hood, in the new design a logical meter would consume the streams from physical meters.
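A logical meter consuming physical meter streams can be sketched as follows, assuming the inputs are already resampled to the same period (so the n-th sample of each input belongs to the same timestamp) and using a plain sum in place of whatever formula the real logical meter would evaluate. `logical_meter` and its `count` parameter are hypothetical.

```python
import asyncio


async def logical_meter(
    inputs: list[asyncio.Queue], output: asyncio.Queue, count: int
) -> None:
    """Hypothetical logical meter: sums time-aligned power samples.

    Reads one sample from every input per step and emits their sum,
    for `count` steps.
    """
    for _ in range(count):
        samples = [await q.get() for q in inputs]
        await output.put(sum(samples))
```

For instance, with one meter reporting `100.0, 110.0` and another `-40.0, -30.0`, the logical meter emits `60.0, 80.0`. The physical meter streams stay untouched and can still be consumed directly by other users.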
-
In this design the sampling frequency is set per stream, so we slightly differ from the high-level diagram, towards more flexibility. The simple reason is that when we started to draft the high-level overview, we didn't fully anticipate what the data streams look like in detail.
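Setting the sampling frequency per stream simply means the resampling period is a parameter of each stream rather than a global setting. The sketch below assumes timestamped `(timestamp, value)` samples and a mean-per-bucket strategy; the real resampler may use a different algorithm, and `resample` is a hypothetical name.

```python
from collections import defaultdict


def resample(
    samples: list[tuple[float, float]], period: float
) -> list[tuple[float, float]]:
    """Average (timestamp, value) samples into buckets of `period` seconds.

    Each bucket is labelled with its start time. Because the period is an
    argument, two streams can be resampled at different rates.
    """
    buckets: dict[float, list[float]] = defaultdict(list)
    for ts, value in samples:
        start = (ts // period) * period
        buckets[start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]
```

The same input can yield `[(0.0, 2.0), (1.0, 6.0)]` at a 1 s period or `[(0.0, 4.0)]` at a 2 s period, depending only on what the requesting stream asked for.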
-
Meter is the interface for all kinds of meters, e.g. … The …
Nope. The details are simply not fully worked out yet. But formula parsing will certainly become a separate module, and resampling happens in an instance of …
-
The benefit is that we don't need to resample metrics that are unused. In many cases users are not interested in all metrics that come from the client, so we process them lazily.
Not if we add methods like …
Yes, but I'd suggest moving the implementation to the …
Nope, the number of messages won't change, since we return classes for each category. But I think we'll gain modularity, since we can offload some of the implementation into these new classes. Nevertheless, the microgrid client is supposed to hold all the state, e.g. the already constructed meter channels.
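Lazy processing of the product-type stream can be sketched like this: each API message carries all metrics, but only the metrics someone has asked for are extracted (and, in the real pipeline, resampled). `MeterData` and `LazyMetricSplitter` are hypothetical names, and `asyncio.Queue` stands in for a channel.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class MeterData:
    """Hypothetical product type: one API message carries all metrics."""

    power: float
    current: float
    voltage: float


class LazyMetricSplitter:
    """Forwards only requested metrics; all others are skipped."""

    def __init__(self) -> None:
        self._channels: dict[str, asyncio.Queue] = {}

    def channel(self, metric: str) -> asyncio.Queue:
        # Memoized: asking twice for the same metric returns the same queue.
        # (In this toy version, consumers sharing one queue compete for items.)
        if metric not in self._channels:
            self._channels[metric] = asyncio.Queue()
        return self._channels[metric]

    def handle(self, message: MeterData) -> None:
        # Only metrics that have a channel are extracted from the message.
        for metric, channel in self._channels.items():
            channel.put_nowait(getattr(message, metric))

    @property
    def requested(self) -> list[str]:
        return sorted(self._channels)
```

If only `power` is requested, current and voltage are never touched, which is the saving described above.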
-
I absolutely share what @tiyash-basu-frequenz and @leandro-lucarella-frequenz have been pointing out. The proposal feels like …
-
Yes! That is the idea, @leandro-lucarella-frequenz, and we should strictly follow that.
mm, what? @matthias-wende-frequenz
-
@jakub-toptal The user should be able to do both. Both cases are needed.
-
Let me be very clear on the requirements:
That means:
If the …
-
I think my original concern was that the resampled output would still be low-level, high-frequency data, and it would be much cheaper to calculate just what we need locally than to send Python pickle objects across processes (once we have them), which need a lot more CPU time to encode and decode. Resampling only the necessary metrics would also save CPU until we can go multi-process. But there might be other (Rust/memcpy) ways to optimize the performance once we get to that point, and in exchange we get modularity and the ability to use the same resampler on historical data as well. So I agree with making a separate actor as well.
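A separate actor that receives metric subscriptions via message passing, as discussed earlier in the thread, could take roughly this shape. Everything here is a hypothetical sketch: a single inbox carries both `Subscribe` requests and `Sample` data, `None` stops the actor, and the resampling step itself is left out so only the subscription and forwarding logic is shown.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Subscribe:
    """Hypothetical request: 'send me this metric on this channel'."""

    metric: str
    reply_to: asyncio.Queue


@dataclass
class Sample:
    """Hypothetical data message for one metric."""

    metric: str
    value: float


async def resampling_actor(inbox: asyncio.Queue) -> None:
    """Processes one message at a time; `None` stops the actor."""
    subscribers: dict[str, list[asyncio.Queue]] = {}
    while (message := await inbox.get()) is not None:
        if isinstance(message, Subscribe):
            subscribers.setdefault(message.metric, []).append(message.reply_to)
        elif isinstance(message, Sample):
            # Unsubscribed metrics are dropped: nothing is computed for them.
            for reply_to in subscribers.get(message.metric, []):
                await reply_to.put(message.value)
```

Because all state lives inside the actor task and is only reached through its inbox, the same actor could later be moved behind a cross-process channel without changing its callers' protocol.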
-
OK, I'm closing this as won't fix, as we decided to go with another design.
-
In the following, we sketch an implementation of the future SDK data pipeline.
The design originates from a user's perspective; a user might ask the question:
"I need to measure power at a certain point of my microgrid".
For now we only sketch the details for Meters. All other component categories
may have analogous, though slightly different, implementations respecting the hardware specifications
of the individual category.