Can the core array API operations of Cubed be implemented in JAX, so that everything compiles easily to accelerators? Could this solve the common pain point of running out of GPU memory? How would other constraints (such as GPU bandwidth limits) be handled? What is the ideal distributed runtime environment to make the most of this? Could spot GPU instances be used (serverless accelerators)?
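To make the GPU memory question concrete, here is a minimal sketch of the general idea: keep the full array on the host and move one chunk at a time to the accelerator, so that device memory use is bounded by the chunk size rather than the array size. This does not use Cubed's API. The function names `chunk_sum_of_squares` and `blockwise_sum_of_squares` and the `chunk_rows` parameter are hypothetical, chosen only for illustration, and the sketch ignores scheduling, bandwidth, and distribution entirely.

```python
import numpy as np
import jax
import jax.numpy as jnp


@jax.jit
def chunk_sum_of_squares(chunk):
    # Compiled by XLA and run on the default JAX device (CPU, GPU, or TPU).
    return jnp.sum(chunk * chunk)


def blockwise_sum_of_squares(host_array, chunk_rows):
    """Reduce a host-resident array one chunk of rows at a time.

    Hypothetical helper: only one chunk is placed on the device per step,
    so peak device memory depends on chunk_rows, not on the array size.
    """
    total = 0.0
    for start in range(0, host_array.shape[0], chunk_rows):
        # Explicit host-to-device transfer of a single chunk.
        chunk = jax.device_put(host_array[start:start + chunk_rows])
        # Pull the scalar partial result back to the host and accumulate.
        total += float(chunk_sum_of_squares(chunk))
    return total


# Small example: 6 rows processed in chunks of 2 rows.
data = np.arange(12, dtype=np.float32).reshape(6, 2)
result = blockwise_sum_of_squares(data, chunk_rows=2)
```

In this sketch every chunk has the same shape, so the jitted function is compiled once; a ragged final chunk would trigger a second compilation. Each step also pays a host-to-device transfer, which is where the bandwidth question raised above would start to matter.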