Description
Sender algorithms that compose other child operations are generally implemented by having the parent operation-state contain as data-members the operation-states of the child operations.
The child operation-states are constructed by connecting the sender to a receiver defined by the parent operation where the receiver contains a pointer to the parent operation-state. The child operation-states are generally required to store this receiver so that they can deliver the result of the child operation when it completes.
This means that for every child operation in an expression tree, we end up storing a pointer to the parent operation-state. However, if we think about it, for cases where the child operation-state is a direct member of the parent operation-state, the pointer stored in the receiver is always going to be pointing to some constant offset relative to the child operation-state's address.
We can potentially reduce the size of child operation-states by not storing the receiver and instead constructing the receiver on-demand from the address of the child operation-state whenever the child operation-state needs it.
Doing this would have a number of benefits:
- it reduces the size of an overall operation-state tree by at least one pointer for each operation in the tree (it might save more due to padding for some operation-states)
- when querying the environment of the receiver it can turn a walk of a linked-list of operation-states to get to the operation-state providing some query result into a constant offset from the current operation-state's address.
Doing this would also reduce the overhead inherent in composing higher-level algorithms out of lower-level algorithms as it reduces the overhead of each operation-state introduced in the abstract. This can potentially reduce the motivation for building higher-level algorithms from scratch in order to avoid the overhead.
The suggested change would be to introduce some new concepts and helper utilities that operation-states can use to opt-in to participating in this optimisation.
A new concept:
template<typename Receiver, typename OpState>
concept reconstuctible_receiver_from =
requires(OpState* op) {
{ Receiver::reconstruct_from(op) } noexcept -> std::same_as<Receiver>;
};
A CRTP helper that operation-states can inherit from in order to elide storage of the receiver.
It takes two template parameters:
Derived
is the type of the operation-state that inherits frominlinable_operation_state
.Receiver
is the type of the receiver for this operation.
If theReceiver
is reconstructible from the address of theDerived
operation-state then it doesn't store the receiver and just produces a receiver on-demand. Otherwise, it stores the receiver as a data-member and returns a reference to it.
template<typename Derived, typename Receiver>
class inlinable_operation_state {
protected:
explicit inlinable_operation_state(Receiver rcvr) noexcept
: receiver_(std::move(rcvr))
{}
Receiver& get_receiver() noexcept { return receiver_; }
private:
[[no_unique_address]] Receiver receiver_;
};
template<typename Derived, typename Receiver>
requires reconstructible_receiver_from<Receiver, Derived>
class inlinable_operation_state<Derived, Receiver> {
protected:
explicit inlinable_operation_state(Receiver&&) noexcept {}
Receiver get_receiver() noexcept {
return Receiver::reconstruct_from(static_cast<Derived*>(this));
}
};
The other piece needed is some helpers for parent-operation-states that makes constructing receivers that implement the reconstructible_receiver_from
concept easier.
There are some tricky parts of the language that limit the ability to go from the address of a data-member to the address of the enclosing object. This is generally not something you can do unless the object is a standard-layout type and even then, only from the first data-member.
So you can't generally obtain the pointer of the enclosing class from a data-member sub-object, but you can obtain the pointer of the enclosing class from a base-class sub-object.
So we need to build some helpers that make it easier to hold child-operation states as base-class sub-objects (or as first-data-members of standard-layout-type base class subobjects).
There are a few different use-cases to cover here:
- cases where there is a single child operation-state (e.g. for the
then
algorithm) - cases where there are multiple child operation-states that exist concurrently (e.g. for
when_all()
and most structured-concurrency algorithms) - cases where there are child operation-states that have non-overlapping lifetimes. e.g. in a
variant_sender
operation where only one of the op-states is active, or inlet_value()
where the op-state of the source op is destroyed and then the second operation is constructed in the same location. - cases where the lifetime of the child operation-states is the same as that of the parent operation-state
- cases where the lifetime of the child operation-states is different and so construction/destruction of the child operation states needs to be deferred/performed manually.
- cases where a parent operation has two child sub-objects that have the same sender-type - these need to have distinct base-class types to avoid ambiguous cast from base to derived.
A prototype implementation came up with the following types:
manual_child_operation<ParentOp, Tag, Sender>
auto_child_operation<ParentOp, Tag, Sender>
variant_child_operation<ParentOp, Tag, Senders...>
The manual_child_operation
class stores a child operation-state constructed from Sender
This operation-state has manually controlled lifetime, allowing deferred initialization by calling the construct()
and destruct()
member-functions at appropriate times.
It takes an additional Tag
type that can be used to disambiguate between multiple child-operations with the same Sender
type.
template<typename ParentOp, typename Tag, typename Sender>
class manual_child_operation {
struct child_receiver {
// satisfies reconstructible_receiver_from<op_state_t>
};
protected:
manual_child_operation() noexcept; // noop/trivial
~manual_child_operation() noexcept; // noop/trivial
void start() noexcept;
void construct(Sender&& sender) noexcept(is_nothrow_connectable_v<Sender, child_receiver>);
void destruct() noexcept;
private:
using op_state_t = connect_result_t<Sender, child_receiver>;
alignas(op_state_t) std::byte storage[sizeof(op_state_t)];
};
The auto_child_operation
class is similar to manual_child_operation
but takes the Sender
in the constructor and automatically destroys the operation-state in its destructor rather than having manually called construct
and destruct
operations.
template<typename ParentOp, typename Tag, Sender>
class auto_child_operation {
struct child_receiver {
// satisfies reconstructible_receiver_from<op_state_t>
};
public:
auto_child_operation(Sender&& sender) noexcept(is_nothrow_connectable_v<Sender, child_receiver>);
~auto_child_operation();
void start() noexcept;
private:
using op_state_t = connect_result_t<Sender, child_receiver>;
alignas(op_state_t) std::byte storage[sizeof(op_state_t)];
};
The variant_child_operation
provides storage for one of several possible operation-state types, constructed from one of several possible Sender
types.
The lifetimes of the child operation-states are manually controlled. It is up to the derived class to construct and destruct the right operation-state type at the appropriate time.
This is roughly to a union
of child-operation-states.
template<typename ParentOp, template<std::size_t> class Tag, typename... Senders>
class variant_child_operation {
template<std::size_t Idx>
struct child_receiver {
// satisfies reconstructible_receiver_from<op_state_t<Idx>>;
};
protected:
variant_child_operation() noexcept; // noop/trivial
~variant_child_operation() noexcept; // noop/trivial
template<std::size_t Idx>
void construct(Senders...[Idx]&& sender)
noexcept(is_nothrow_connectable_v<Senders...[Idx], child_receiver<Idx>>);
template<std::size_t Idx>
void destruct() noexcept;
template<std::size_t Idx>
void start() noexcept;
private:
alignas(max_op_state_align) std::byte storage[max_op_state_size];
};
The child_receiver
types defined here forward their operations through to methods on the ParentOp
class.
rcvr.set_value(vs...)
forwards toparent->set_value(Tag{}, vs...)
rcvr.set_error(e)
forwards toparent->set_error(Tag{}, e)
rcvr.set_stopped()
forwards toparent->set_stopped(Tag{})
rcvr.get_env()
forwards toparent->get_env(Tag{})
Where Tag
is the Tag
template parameter for auto_child_operation
and manual_child_operation
and Tag
is Tag<Idx>
for variant_child_operation
- where Idx
is the index of the corresponding type in the Senders...
pack.
Leaf operation-state implementations can just inherit from the inlinable_operation_state
base class to avoid having to store the receiver in cases where it is stored inline in a parent operation.
For example:
template<typename Receiver>
class leaf_op_state : public inlinable_operation_state<leaf_op_state<Receiver>, Receiver> {
using inlinable_base = inlinable_operation_state<leaf_op_state, Receiver>;
public:
using inlinable_base::inlinable_base;
void start() noexcept {
auto rcvr = this->get_receiver();
auto env = rcvr.get_env();
// ...
}
};
Composed operation-states with child operations can also inherit from both inlinable_operation_state
to avoid storing the receiver when it is an inline operation-state, but can also inherit from one of the child_operation
classes to store child-operation-states that can themselves avoid storing the receiver.
For example:
struct source_tag {};
template<typename Source, typename Func, typename Receiver>
struct then_op_state :
public inlinable_operation_state<then_op_state<Source, Func, Receiver>, Receiver>,
public auto_child_operation<then_op_state<Source, Func, Receiver>, source_tag, Source> {
using inlinable_base = inlinable_operation_state<then_op_state, Receiver>;
using child_base = auto_child_operation<then_op_state, source_tag, Source>;
public:
then_op_state(Source&& source, Func func, Receiver rcvr)
: inlinable_base(std::move(rcvr))
, child_base(std::forward<Source>(source))
, func(std::move(func))
{}
void start() noexcept {
child_base::start();
}
template<typename... Vs
void set_value(source_tag, Vs&&... vs) noexcept {
try {
this->get_receiver().set_value(
std::invoke(func, std::forward<Vs>(vs)...));
} catch (...) {
if constexpr (!std::is_nothrow_invocable_v<Func&, Vs...>) {
this->get_receiver().set_error(std::current_exception());
}
}
}
template<typename E>
void set_error(source_tag, E&& e) noexcept {
this->get_receiver().set_error(std::forward<E>(e));
}
void set_stopped(source_tag) noexcept {
this->get_receiver().set_stopped();
}
decltype(auto) get_env(source_tag) noexcept {
return this->get_receiver().get_env();
}
};
See the following proof-of-concept for more details:
https://godbolt.org/z/fj3cExTbf
Support for this needs to be baked in to the initial standard library implementations of algorithms if we are to be able to take advantage of this.
Trying to add support for this in later would either just not be effective (because few operations opt-in to the optimisations) or would be a potential ABI break for the implementation.