`Device.set_current()` is slow

Follow-up to #658.

With `cuda-python` 12.6.0 (an important case, because its cudart is a Cython-based re-implementation):
```python
In [4]: %timeit cudart.cudaSetDevice(0)
355 ns ± 1.69 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```
For comparison, this is CuPy:
```python
In [6]: %timeit cp.cuda.runtime.setDevice(0)
154 ns ± 1.67 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
```
With `cuda-bindings` 12.9.0 (where the re-implementation has been replaced by the statically linked cudart):
```python
In [4]: %timeit runtime.cudaSetDevice(0)
167 ns ± 0.367 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
```
And this is `cuda-core` on the main branch with `cuda-bindings` 12.9.0:
```python
In [5]: %timeit dev.set_current()
1.84 μs ± 16 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```
I think we should find a way to at least revive and reuse the old re-implementation.
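
For anyone who wants to reproduce these numbers outside IPython, here is a minimal sketch using the standard-library `timeit` module. The helper names (`bench_ns`, `slowdown`, `collect_targets`) are hypothetical, and the import paths `cuda.bindings.runtime` and `cuda.core.experimental.Device` are my assumption about the installed packages. Note that it reports the best run rather than the mean ± std. dev. that `%timeit` prints, so the figures will not match exactly.

```python
import timeit


def bench_ns(fn, number=100_000, repeat=7):
    """Best per-call wall time of a zero-argument callable, in nanoseconds."""
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    return min(totals) / number * 1e9


def slowdown(candidate_ns, baseline_ns):
    """How many times slower the candidate is than the baseline."""
    return candidate_ns / baseline_ns


def collect_targets():
    """Gather callables to time; empty if CUDA packages or a GPU are missing."""
    targets = {}
    try:
        # Assumed import paths; adjust to the installed package versions.
        from cuda.bindings import runtime
        from cuda.core.experimental import Device

        dev = Device(0)
        dev.set_current()  # first call initializes; keep it out of the timing
        # The lambda adds a small Python call overhead on top of the binding.
        targets["runtime.cudaSetDevice(0)"] = lambda: runtime.cudaSetDevice(0)
        targets["dev.set_current()"] = dev.set_current
    except Exception as exc:  # ImportError, or a CUDA error without a GPU
        print(f"skipping CUDA targets: {exc}")
    return targets


if __name__ == "__main__":
    for name, fn in collect_targets().items():
        print(f"{name}: {bench_ns(fn):.0f} ns per call")
    # Ratio implied by the timings quoted above (1.84 us vs 167 ns).
    print(f"reported slowdown: {slowdown(1840.0, 167.0):.1f}x")
```

Taking the timings quoted above at face value, `dev.set_current()` is roughly 11x slower than the bare `runtime.cudaSetDevice(0)` call from the same `cuda-bindings` release.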

