MLX is missing a primitive for allocating an uninitialized array, equivalent to numpy.empty / torch.empty / jnp.empty. This is useful when a buffer will be fully overwritten by a subsequent kernel — the implicit zero-fill of mx.zeros is wasted work in that case.
Adding mx.empty(shape, dtype=..., stream=...) would close that gap.
Motivation
The concrete use case we hit: when a TileLang Metal kernel produces an output tensor, the host-side allocation only needs the right shape, dtype, and storage; the kernel will fully overwrite the contents. With only mx.zeros available today, we pay for a memset to zero before the kernel runs, even though the kernel then immediately overwrites every byte. For larger output tensors (e.g. attention outputs in a transformer block), the wasted zero-fill measurably hurts throughput.
The same pattern shows up any time an MLX array is used as a write-only output buffer of an external kernel (a custom Metal op, a DLPack-imported tensor about to be filled in place, etc.).
PyTorch / NumPy / JAX all expose this primitive (torch.empty, numpy.empty, jnp.empty) for the same reason.
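For reference, this is what the primitive looks like in NumPy, whose semantics mx.empty would mirror (a minimal sketch; the MLX version would allocate through MLX's own allocator rather than NumPy's):

```python
import numpy as np

# numpy.empty allocates without initializing: shape and dtype are set,
# but the contents are whatever happened to be in the underlying buffer.
out = np.empty((4, 8), dtype=np.float32)
assert out.shape == (4, 8)
assert out.dtype == np.float32

# The intended pattern: fully overwrite the buffer before any read.
out[:] = 1.0
assert (out == 1.0).all()
```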
Proposed API
mx.empty(shape, dtype=mx.float32, stream=None)
Semantics:
Allocates an array of the given shape and dtype on the active device.
Does not initialize the contents — the caller is expected to write into it before reading.
Reuses MLX's existing allocator and dtype rules, including the existing GPU float64 restriction.
Rejects negative dimensions with the standard MLX shape-validation error.
Optional stream= argument to match the rest of the MLX ops surface.
This is intentionally a thin wrapper around the existing allocation path — no new buffer-management complexity, just skipping the fill.
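To make the proposed semantics concrete, here is a NumPy-backed mock of the signature. This is purely illustrative: the `empty` function below is hypothetical, the real implementation would sit on MLX's allocation path, and the `stream` argument is accepted but unused in the mock.

```python
import numpy as np

def empty(shape, dtype=np.float32, stream=None):
    """NumPy-backed mock of the proposed mx.empty semantics (illustrative only)."""
    shape = tuple(shape)
    # Reject negative dimensions, mirroring MLX's standard shape validation.
    if any(d < 0 for d in shape):
        raise ValueError(f"negative dimensions are not allowed: {shape}")
    # stream= is accepted for API parity with other MLX ops but ignored here.
    return np.empty(shape, dtype=dtype)

out = empty((2, 3))                 # default dtype
assert out.dtype == np.float32
out16 = empty((2, 3), dtype=np.float16)  # explicit dtype
assert out16.dtype == np.float16
try:
    empty((2, -1))                  # negative dimension is rejected
    assert False, "expected ValueError"
except ValueError:
    pass
```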
Prototype
We have a working implementation in our downstream fork:
Diff: 60 LOC across 4 files: mlx/ops.cpp, mlx/ops.h, python/src/ops.cpp, python/tests/test_ops.py.
The prototype exposes the API exactly as proposed above. Tests cover default dtype, explicit dtype, negative-shape rejection, and the GPU float64 rejection path.
What we're offering
If maintainers are interested, we can rebase the prototype on current ml-explore/mlx@main and open a PR. The patch is small and independent of the DLPack work in #3531 — no shared surface, no ordering requirement.
If the team would prefer a slightly different signature (e.g. dtype as the first positional argument, or a different stream= default), we're happy to adjust before opening the PR.
Notes
One open design question: in debug builds, should mx.empty fill with NaN / sentinel values to surface uninitialized-read bugs in user code? PyTorch doesn't do this; NumPy doesn't do this. Our prototype follows the same convention (raw allocation, no debug-fill). Flagging it here in case MLX has a different preference.