🚀 The feature, motivation and pitch
Currently, ExecuTorch Tensors (henceforth referred to as ETensor) store a pointer to a CPU array containing the Tensor's data. Technically, since ETensor only stores a raw pointer, the pointer could reference any resource, but the implicit contract is that it points to a CPU array.
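For reference, a minimal sketch of wrapping an existing CPU buffer with an ETensor, assuming the portable TensorImpl/Tensor types from runtime/core/portable_type/ and the torch::executor namespace (exact headers, namespaces, and constructor arguments may differ between ExecuTorch versions):

#include <cstdint>
#include <executorch/runtime/core/portable_type/tensor.h>
#include <executorch/runtime/core/portable_type/tensor_impl.h>

int main() {
  using torch::executor::ScalarType;
  using torch::executor::Tensor;
  using torch::executor::TensorImpl;

  float data[6] = {1, 2, 3, 4, 5, 6};  // plain CPU array
  int32_t sizes[2] = {2, 3};           // assumed: portable SizesType is int32_t

  // The raw `data` pointer is stored as-is; nothing records whether it points
  // at a host array or an opaque device resource -- the implicit contract is "CPU".
  TensorImpl impl(ScalarType::Float, /*dim=*/2, sizes, /*data=*/data);
  Tensor tensor(&impl);

  return tensor.numel() == 6 ? 0 : 1;
}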
The consequence of this is that backends such as Vulkan (which performs compute on the GPU) need to copy the contents of input/output ETensors to/from some kind of specialized memory/representation before and after inference. This adds a copy overhead when using certain delegates.
The copy overhead is unavoidable if the overall inference pipeline produces input data on the CPU and the outputs must be consumed on the CPU. However, in some cases it is possible to produce/consume data on the same memory type/device used by a delegate, for example when using the Vulkan delegate for inference on a Vulkan-based rendering platform. In that case the "restriction" that ETensor should only store a CPU buffer adds even more overhead, since inputs/outputs must be copied to the CPU to be wrapped with an ETensor, only to be copied again back to the original memory type/device for inference.
To alleviate the copy overhead in these use cases, it would be great to provide a mechanism to specify what kind of data structure is referenced by the raw pointer stored in an ETensor, and thus allow ETensor to wrap arbitrary opaque data structures that can be interpreted by delegates.
One possible solution comes from @JacobSzwejbka, who suggested adding a Device tag to ETensor that would signal to consumers of the ETensor how the raw pointer should be interpreted.
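As a purely illustrative sketch of what such a tag could enable (the RFC is not public, and DeviceTag, TaggedDataPtr, and the other names below are hypothetical, not part of the ExecuTorch API or the RFC), a delegate could branch on the tag to decide whether the pointer is a host array that needs staging or an existing device resource it can consume directly:

#include <cstdint>

// Hypothetical: a tag describing what the raw data pointer refers to.
enum class DeviceTag : uint8_t {
  Cpu,     // pointer is a host array (today's implicit contract)
  Vulkan,  // pointer is an opaque resource owned by a Vulkan-based producer
};

// Hypothetical: the pointer/tag pair an ETensor might carry.
struct TaggedDataPtr {
  void* ptr = nullptr;
  DeviceTag tag = DeviceTag::Cpu;
};

// Hypothetical handle a Vulkan-based renderer might hand to the delegate
// instead of host data (e.g. wrapping a VkImage/VkDeviceMemory it owns).
struct VulkanImageHandle {};

// What a delegate's input-binding path could look like with such a tag.
void bind_input(const TaggedDataPtr& data) {
  switch (data.tag) {
    case DeviceTag::Cpu:
      // Host memory: stage/copy into the delegate's GPU representation,
      // as the Vulkan delegate must do today.
      // stage_from_host(data.ptr);
      break;
    case DeviceTag::Vulkan:
      // Already a device resource: bind it directly, no CPU round-trip.
      // bind_existing_image(static_cast<VulkanImageHandle*>(data.ptr));
      break;
  }
}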
Alternatives
No response
Additional context
No response
RFC (Optional)
@JacobSzwejbka wrote an internal RFC for adding a Device tag to ETensor. Unfortunately this document is not available externally at the moment.