Skip to content

Add support for more efficiently storing receivers in child operation-states stored inline in a parent operation-state #224

Open
@lewissbaker

Description

@lewissbaker

Sender algorithms that compose other child operations are generally implemented by having the parent operation-state contain as data-members the operation-states of the child operations.

The child operation-states are constructed by connecting the sender to a receiver defined by the parent operation where the receiver contains a pointer to the parent operation-state. The child operation-states are generally required to store this receiver so that they can deliver the result of the child operation when it completes.

This means that for every child operation in an expression tree, we end up storing a pointer to the parent operation-state. However, if we think about it, for cases where the child operation-state is a direct member of the parent operation-state, the pointer stored in the receiver is always going to be pointing to some constant offset relative to the child operation-state's address.

We can potentially reduce the size of child operation-states by not storing the receiver and instead constructing the receiver on-demand from the address of the child operation-state whenever the child operation-state needs it.

Doing this would have a number of benefits:

  • it reduces the size of an overall operation-state tree by at least one pointer for each operation in the tree (it might save more due to padding for some operation-states)
  • when querying the environment of the receiver it can turn a walk of a linked-list of operation-states to get to the operation-state providing some query result into a constant offset from the current operation-state's address.

Doing this would also reduce the overhead inherent in composing higher-level algorithms out of lower-level algorithms as it reduces the overhead of each operation-state introduced in the abstract. This can potentially reduce the motivation for building higher-level algorithms from scratch in order to avoid the overhead.

The suggested change would be to introduce some new concepts and helper utilities that operation-states can use to opt-in to participating in this optimisation.

A new concept:

template<typename Receiver, typename OpState>
concept reconstuctible_receiver_from =
  requires(OpState* op) {
    { Receiver::reconstruct_from(op) } noexcept -> std::same_as<Receiver>;
  };

A CRTP helper that operation-states can inherit from in order to elide storage of the receiver.
It takes two template parameters:

  • Derived is the type of the operation-state that inherits from inlinable_operation_state.
  • Receiver is the type of the receiver for this operation.
    If the Receiver is reconstructible from the address of the Derived operation-state then it doesn't store the receiver and just produces a receiver on-demand. Otherwise, it stores the receiver as a data-member and returns a reference to it.
template<typename Derived, typename Receiver>
class inlinable_operation_state {
protected:
  explicit inlinable_operation_state(Receiver rcvr) noexcept
  : receiver_(std::move(rcvr))
  {}  

  Receiver& get_receiver() noexcept { return receiver_; }

private:
  [[no_unique_address]] Receiver receiver_;
};

template<typename Derived, typename Receiver>
requires reconstructible_receiver_from<Receiver, Derived>
class inlinable_operation_state<Derived, Receiver> {
protected:
  explicit inlinable_operation_state(Receiver&&) noexcept {}
  
  Receiver get_receiver() noexcept {
    return Receiver::reconstruct_from(static_cast<Derived*>(this));
  }
};

The other piece needed is some helpers for parent-operation-states that makes constructing receivers that implement the reconstructible_receiver_from concept easier.

There are some tricky parts of the language that limit the ability to go from the address of a data-member to the address of the enclosing object. This is generally not something you can do unless the object is a standard-layout type and even then, only from the first data-member.
So you can't generally obtain the pointer of the enclosing class from a data-member sub-object, but you can obtain the pointer of the enclosing class from a base-class sub-object.

So we need to build some helpers that make it easier to hold child-operation states as base-class sub-objects (or as first-data-members of standard-layout-type base class subobjects).

There are a few different use-cases to cover here:

  • cases where there is a single child operation-state (e.g. for the then algorithm)
  • cases where there are multiple child operation-states that exist concurrently (e.g. for when_all() and most structured-concurrency algorithms)
  • cases where there are child operation-states that have non-overlapping lifetimes. e.g. in a variant_sender operation where only one of the op-states is active, or in let_value() where the op-state of the source op is destroyed and then the second operation is constructed in the same location.
  • cases where the lifetime of the child operation-states is the same as that of the parent operation-state
  • cases where the lifetime of the child operation-states is different and so construction/destruction of the child operation states needs to be deferred/performed manually.
  • cases where a parent operation has two child sub-objects that have the same sender-type - these need to have distinct base-class types to avoid ambiguous cast from base to derived.

A prototype implementation came up with the following types:

  • manual_child_operation<ParentOp, Tag, Sender>
  • auto_child_operation<ParentOp, Tag, Sender>
  • variant_child_operation<ParentOp, Tag, Senders...>

The manual_child_operation class stores a child operation-state constructed from Sender
This operation-state has manually controlled lifetime, allowing deferred initialization by calling the construct() and destruct() member-functions at appropriate times.
It takes an additional Tag type that can be used to disambiguate between multiple child-operations with the same Sender type.

template<typename ParentOp, typename Tag, typename Sender>
class manual_child_operation {
  struct child_receiver {
     // satisfies reconstructible_receiver_from<op_state_t>
  };
protected:
  manual_child_operation() noexcept; // noop/trivial
  ~manual_child_operation() noexcept; // noop/trivial
  
  void start() noexcept;
  
  void construct(Sender&& sender) noexcept(is_nothrow_connectable_v<Sender, child_receiver>);
  
  void destruct() noexcept;
  
private:
  using op_state_t = connect_result_t<Sender, child_receiver>;
  alignas(op_state_t) std::byte storage[sizeof(op_state_t)];
};

The auto_child_operation class is similar to manual_child_operation but takes the Sender in the constructor and automatically destroys the operation-state in its destructor rather than having manually called construct and destruct operations.

template<typename ParentOp, typename Tag, Sender>
class auto_child_operation {
  struct child_receiver {
    // satisfies reconstructible_receiver_from<op_state_t>
  };
public:
  auto_child_operation(Sender&& sender) noexcept(is_nothrow_connectable_v<Sender, child_receiver>);
  ~auto_child_operation();
  
  void start() noexcept;
  
private:
  using op_state_t = connect_result_t<Sender, child_receiver>;
  alignas(op_state_t) std::byte storage[sizeof(op_state_t)];
};

The variant_child_operation provides storage for one of several possible operation-state types, constructed from one of several possible Sender types.
The lifetimes of the child operation-states are manually controlled. It is up to the derived class to construct and destruct the right operation-state type at the appropriate time.
This is roughly to a union of child-operation-states.

template<typename ParentOp, template<std::size_t> class Tag, typename... Senders>
class variant_child_operation {
  template<std::size_t Idx>
  struct child_receiver {
    // satisfies reconstructible_receiver_from<op_state_t<Idx>>;
  };
protected:
  variant_child_operation() noexcept; // noop/trivial
  ~variant_child_operation() noexcept; // noop/trivial
  
  template<std::size_t Idx>
  void construct(Senders...[Idx]&& sender)
    noexcept(is_nothrow_connectable_v<Senders...[Idx], child_receiver<Idx>>);
    
  template<std::size_t Idx>
  void destruct() noexcept;
  
  template<std::size_t Idx>
  void start() noexcept;
  
private:
  alignas(max_op_state_align) std::byte storage[max_op_state_size];
};

The child_receiver types defined here forward their operations through to methods on the ParentOp class.

  • rcvr.set_value(vs...) forwards to parent->set_value(Tag{}, vs...)
  • rcvr.set_error(e) forwards to parent->set_error(Tag{}, e)
  • rcvr.set_stopped() forwards to parent->set_stopped(Tag{})
  • rcvr.get_env() forwards to parent->get_env(Tag{})

Where Tag is the Tag template parameter for auto_child_operation and manual_child_operation and Tag is Tag<Idx> for variant_child_operation - where Idx is the index of the corresponding type in the Senders... pack.

Leaf operation-state implementations can just inherit from the inlinable_operation_state base class to avoid having to store the receiver in cases where it is stored inline in a parent operation.

For example:

template<typename Receiver>
class leaf_op_state : public inlinable_operation_state<leaf_op_state<Receiver>, Receiver> {
  using inlinable_base = inlinable_operation_state<leaf_op_state, Receiver>;
public:
  using inlinable_base::inlinable_base;
  
  void start() noexcept {
    auto rcvr = this->get_receiver();
    auto env = rcvr.get_env();
    // ...
  }
};

Composed operation-states with child operations can also inherit from both inlinable_operation_state to avoid storing the receiver when it is an inline operation-state, but can also inherit from one of the child_operation classes to store child-operation-states that can themselves avoid storing the receiver.

For example:

struct source_tag {};

template<typename Source, typename Func, typename Receiver>
struct then_op_state :
    public inlinable_operation_state<then_op_state<Source, Func, Receiver>, Receiver>,
    public auto_child_operation<then_op_state<Source, Func, Receiver>, source_tag, Source>  {
  using inlinable_base = inlinable_operation_state<then_op_state, Receiver>;
  using child_base = auto_child_operation<then_op_state, source_tag, Source>;
  
public:
  then_op_state(Source&& source, Func func, Receiver rcvr)
  : inlinable_base(std::move(rcvr))
  , child_base(std::forward<Source>(source))
  , func(std::move(func))
  {}  
  
  void start() noexcept {
    child_base::start();
  }
  
  template<typename... Vs
  void set_value(source_tag, Vs&&... vs) noexcept {
    try {
      this->get_receiver().set_value(
        std::invoke(func, std::forward<Vs>(vs)...));
    } catch (...) {
      if constexpr (!std::is_nothrow_invocable_v<Func&, Vs...>) {
        this->get_receiver().set_error(std::current_exception());
      }
    }
  }
  
  template<typename E>
  void set_error(source_tag, E&& e) noexcept {
    this->get_receiver().set_error(std::forward<E>(e));
  }
  
  void set_stopped(source_tag) noexcept {
    this->get_receiver().set_stopped();
  }
  
  decltype(auto) get_env(source_tag) noexcept {
    return this->get_receiver().get_env();
  }
};

See the following proof-of-concept for more details:
https://godbolt.org/z/fj3cExTbf

Support for this needs to be baked in to the initial standard library implementations of algorithms if we are to be able to take advantage of this.
Trying to add support for this in later would either just not be effective (because few operations opt-in to the optimisations) or would be a potential ABI break for the implementation.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions