@tom-pollak commented Oct 9, 2025

Motivation

Refactor to provide a device-agnostic backend API, with optional device-specific functionality.

Changes

Backend Detection (iris/backend/__init__.py)

  • Auto-detects the available GPU runtime at import time.
  • Provides a unified API across devices; iris.hip is refactored into iris.backend.
  • Detection can be overridden by setting the IRIS_BACKEND environment variable.
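
For reviewers who want a mental model of the selection order, here is a minimal sketch of how an override-then-probe scheme could work. It is not the code in this PR: the function name `detect_backend`, the library names in `_RUNTIME_LIBS`, and the HIP-before-CUDA probe order are all assumptions made for illustration. Only the IRIS_BACKEND variable comes from the description above.

```python
import ctypes.util
import os

# Hypothetical mapping from backend name to the shared library whose
# presence suggests that runtime is installed (names are assumptions).
_RUNTIME_LIBS = {"hip": "amdhip64", "cuda": "cudart"}


def detect_backend(env=None, find_library=ctypes.util.find_library):
    """Return "hip" or "cuda", honouring an IRIS_BACKEND override.

    `env` and `find_library` are injectable so the logic can be
    exercised without a GPU runtime installed.
    """
    env = os.environ if env is None else env

    # 1. An explicit override wins over any probing.
    forced = env.get("IRIS_BACKEND", "").strip().lower()
    if forced:
        if forced not in _RUNTIME_LIBS:
            raise ValueError(f"unknown IRIS_BACKEND: {forced!r}")
        return forced

    # 2. Otherwise probe for a runtime library, in dict order.
    for name, lib in _RUNTIME_LIBS.items():
        if find_library(lib) is not None:
            return name

    raise RuntimeError("no supported GPU runtime found")
```

Injecting the environment and the library lookup keeps the decision logic testable on machines with no GPU, which may be useful for CI.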

API Usage

# Portable code: works on both NVIDIA (CUDA) and AMD (HIP)
import iris.backend as backend
backend.set_device(0)
size = 1 << 20  # example allocation size (assumed to be in bytes)
ptr = backend.malloc(size)

# Platform-specific features
from iris.backend import hip
ptr = hip.malloc_fine_grained(size)  # AMD: cache-coherent memory

from iris.backend import cuda
ptr = cuda.malloc_managed(size)  # NVIDIA: unified memory

Migration

Device Interface

  • Note that the shared interface does not include malloc_fine_grained. get_num_xcc should probably also be HIP-specific, but it is used in the examples, so for now the CUDA backend returns 1.

def set_device(gpu_id: int) -> None
def get_device_id() -> int
def count_devices() -> int
def get_cu_count(device_id: int | None = None) -> int
def get_wall_clock_rate(device_id: int) -> int
def get_arch_string(device_id: int | None = None) -> str
def get_num_xcc(device_id: int | None = None) -> int
def get_runtime_version() -> tuple[int, int]  # (major, minor)

def get_ipc_handle(ptr: int | ctypes.c_void_p, rank: int) -> Any
def open_ipc_handle(ipc_handle_data: np.ndarray, rank: int) -> int

def malloc(size: int) -> ctypes.c_void_p
def free(ptr: int | ctypes.c_void_p) -> None
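
To make the contract above easier to check against, here is a rough sketch that captures a subset of these signatures as a `typing.Protocol`, plus an in-memory stand-in that satisfies it. Both `DeviceBackend` and `FakeCudaBackend` are hypothetical names introduced for illustration. The PR exposes these as module-level functions, so modelling a backend as an object is an assumption made only to keep the example self-contained.

```python
from __future__ import annotations

import ctypes
from typing import Protocol, runtime_checkable


@runtime_checkable
class DeviceBackend(Protocol):
    """Subset of the shared device interface listed above."""

    def set_device(self, gpu_id: int) -> None: ...
    def get_device_id(self) -> int: ...
    def count_devices(self) -> int: ...
    def get_num_xcc(self, device_id: int | None = None) -> int: ...
    def malloc(self, size: int) -> ctypes.c_void_p: ...
    def free(self, ptr: int | ctypes.c_void_p) -> None: ...


class FakeCudaBackend:
    """In-memory stand-in; no GPU runtime is touched."""

    def __init__(self, num_devices: int = 1) -> None:
        self._num_devices = num_devices
        self._current = 0
        self._live: dict[int, int] = {}  # fake address -> size
        self._next_addr = 0x1000

    def set_device(self, gpu_id: int) -> None:
        if not 0 <= gpu_id < self._num_devices:
            raise ValueError(f"invalid device id: {gpu_id}")
        self._current = gpu_id

    def get_device_id(self) -> int:
        return self._current

    def count_devices(self) -> int:
        return self._num_devices

    def get_num_xcc(self, device_id: int | None = None) -> int:
        # Mirrors the note above: the CUDA side reports 1.
        return 1

    def malloc(self, size: int) -> ctypes.c_void_p:
        addr = self._next_addr
        self._next_addr += max(size, 1)
        self._live[addr] = size
        return ctypes.c_void_p(addr)

    def free(self, ptr: int | ctypes.c_void_p) -> None:
        addr = ptr.value if isinstance(ptr, ctypes.c_void_p) else int(ptr)
        self._live.pop(addr)  # raises KeyError on an unknown or freed pointer
```

A fake like this lets portable code paths be unit-tested without hardware; `isinstance(obj, DeviceBackend)` only checks that the methods exist, not their signatures.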

Any comments and feedback would be welcome!
