[DNM] ENH: CuPy multi-device support #293
base: main
Conversation
@tylerjereddy as you have a dual-GPU box, could you help test this?

CI failure is unrelated.

cc @lucyleeow, who has recently been working on multi-device support for scikit-learn.
Pull Request Overview
This PR adds multi-device support across several array libraries (CuPy, Dask, NumPy, and PyTorch) by updating device type annotations from `str` to `Device` and introducing device validation through helper functions. Key changes include:
- Updating documentation and type annotations for device parameters and return types.
- Introducing and using `_helpers._validate_device` for device validation in array creation functions.
- Modifying CuPy's array creation functions and context management for device support.
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
File | Description
---|---
`array_api_compat/torch/_info.py` | Update the device type in `default_device` and add notes on the current device (with a minor spelling issue in the note).
`array_api_compat/numpy/_info.py` | Update device type annotations for `default_device` and `devices`.
`array_api_compat/numpy/_aliases.py` | Replace manual device checking with `_helpers._validate_device` (a hypothetical sketch follows this table).
`array_api_compat/dask/array/_info.py` | Update device type annotations for `default_device` and `devices`.
`array_api_compat/dask/array/_aliases.py` | Use `_helpers._validate_device` in various array functions.
`array_api_compat/cupy/_info.py` | Update device type annotations and add contextual documentation for `default_device`.
`array_api_compat/cupy/_aliases.py` | Adjust device handling in `asarray`, changing the default for the `copy` parameter.
`array_api_compat/common/_helpers.py` | Remove unused `_check_device` and add `_device_ctx` for context management with devices.
`array_api_compat/common/_aliases.py` | Replace `_check_device` calls with `_device_ctx` in array creation functions.
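As referenced in the `numpy/_aliases.py` row above, a minimal sketch of what a validation helper along the lines of `_helpers._validate_device` could look like (illustrative only; the PR's actual implementation may differ):

```python
def _validate_device(xp, device):
    """Raise if `device` is not a device that the namespace `xp` supports.

    Hypothetical sketch, not the PR's actual code.
    """
    if device is None:
        return  # None means "use the namespace's default device"
    valid = xp.__array_namespace_info__().devices()
    if device not in valid:
        raise ValueError(f"Device not understood: {device!r}; expected one of {valid!r}")
```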
It's not clear to me what happens in CuPy non-creation functions, e.g.

```python
with cupy.cuda.Device(1):
    x = cp.asarray(1)
with cupy.cuda.Device(0):
    y = x + 1
    assert y.device == x.device
```

Does the device propagate from the input, like the Array API dictates, or does the interpreter-level default / context prevail? In the latter case this PR is insufficient.
In some multi-GPU setups the following happens:
Damn. That's a showstopper. We can patch functions, but we can't patch array methods.
I'm not sure what should happen here. We are trying to combine arrays on two different devices. From the docs it doesn't sound like you can use the context manager to move an existing array to a different device, so the error in the example above doesn't seem that unreasonable, and I don't think the standard says anything about how arrays on different devices should or shouldn't be combined.
```diff
@@ -54,8 +54,8 @@ def empty_like(
     device: Optional[Device] = None,
     **kwargs,
 ) -> Array:
-    _check_device(xp, device)
-    return xp.empty_like(x, dtype=dtype, **kwargs)
+    with _device_ctx(xp, device, like=x):
```
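For orientation, a minimal sketch of what a `_device_ctx` helper with this signature could look like (an assumption for illustration; the helper actually added in `common/_helpers.py` may differ):

```python
import contextlib

def _device_ctx(xp, device, like=None):
    """Return a context manager under which arrays are created on `device`,
    or on the device of `like` when no explicit device is requested.

    Hypothetical sketch, not the PR's actual code.
    """
    if device is None and like is not None:
        # No explicit request: propagate the input array's device.
        device = getattr(like, "device", None)
    if device is not None and hasattr(device, "__enter__"):
        # e.g. cupy.cuda.Device is itself a context manager.
        return device
    # Plain devices such as NumPy's "cpu" need no context switch.
    return contextlib.nullcontext()
```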
For my education: why can the user pass a `device=` argument to a `*_like` function? Naively I'd have expected that the `_like` implies that the device matches that of `x`. But then you can also pass a `dtype=` which overrides the dtype of `x`, so by symmetry allowing a `device=` makes sense?
Matching the device of the input is what the user wants 99% of the time. Using `empty_like` etc. on a different device can make some sense when preparing an output vessel that is filled from different sources. TBH, though, the main difference between `empty` and `empty_like`, besides convenience, is that the latter can easily duplicate a lazy (unknown) shape. Which frequently prevents masked updates, but that's a separate problem.
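For illustration, the "output vessel" pattern with an explicit device might look like this (assuming the `device=` override that this PR adds to the compat namespace):

```python
import cupy as cp
import array_api_compat.cupy as xp

with cp.cuda.Device(1):
    x = xp.arange(4)  # x lives on Device(1)

# Pre-allocate an output on a different device than x; device= overrides
# the *_like default of matching x's device, just as dtype= overrides
# x's dtype.
out = xp.empty_like(x, device=cp.cuda.Device(0))
```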
No, we're trying to propagate the input's device to the output, which the standard does say is what should happen:
In

```python
with cupy.cuda.Device(1):
    x = cp.asarray(1)
with cupy.cuda.Device(0):
    y = x + 1
    assert y.device == x.device
```

I expect `y` to land on `x`'s device, so the assertion should pass.
`array_api_compat/numpy/_info.py` (outdated)

```diff
@@ -326,7 +326,7 @@ def devices(self):

         Returns
         -------
-        devices : list of str
+        devices : list[Device]
```
In numpy it is a string
Strictly speaking, in numpy it is a `Literal["cpu"]`.

This is a fairly philosophical point that I doubt warrants much discussion. My personal, very debatable opinion is that I'd rather stick to the abstract `Device` object than to `str`. The returned variable type in the two snippets below should be the same to static type checkers:

```python
import array_api_compat.numpy as xp

d = xp.__array_namespace_info__().devices()
```

vs.

```python
import numpy as np
from array_api_compat import array_namespace

xp = array_namespace(np.asarray(0))  # returns an abstract array_api_types.Namespace
d = xp.__array_namespace_info__().devices()
```

(It doesn't today because `array-api-types` doesn't exist yet, but it will eventually.)
NumPy returns a string, not a `Device`, so why change this comment?
Because the user shouldn't think of it as a string, but as an opaque device object.
(what is up with the automatic copilot review 👀)
Your expectation and mine don't agree, which is what makes me think that it isn't clear what should happen. I read "Raise an exception if an operation involves arrays on different devices", combined with "If a library has multiple ways of controlling device placement, the most explicit method should have the highest priority", as the user asking for something that isn't possible: the context requests one device while the input array lives on another.
They're not my expectations, they're the standard's recommendations:

You're quoting point 5, but you skipped over point 2:

Also, the context manager is just an example for the sake of making a reproducer. A more realistic example is:

```python
with cupy.cuda.Device(1):
    x = ...

# ... 1000 lines and 3 modules later ...

y = scipy.logsumexp(x)  # x is on device 1, but the current device is 0
```

Here there is no "most explicit method". What should be more obvious to the user: that they're currently on the default device 0, or that the array is on device 1?
My expectation is that `y` ends up on the same device as `x`.

So you agree that there should be propagation from input to output.
The problem is that the CuPy docs say:

I admit that in this case

```python
with cupy.cuda.Device(1):
    x = cp.asarray(1)
with cupy.cuda.Device(0):
    y = x + 1
    assert y.device == x.device
```

I think the intuitive behaviour would be for the assertion to pass. The bigger problem is whether such behaviour, which is intuitive for me (and seems to align with the standard), would break the model-by-design of CuPy's multi-device support; for example, if you are supposed to treat Python scalars not as scalars but as arrays already placed on the default device, before their input to any function is considered. Again, that seems unintuitive to me, but if that is the model CuPy is committed to, I think it may be worth updating the standard.
It may be worth opening a CuPy issue to ask the devs about this specific example, to find out how much it is by (crucial) design or how much it is accidental.
That feels really weird, considering that there is already explicit ad-hoc code for NEP 50-style type promotion, and it would feel logical to assign the device in the same way:

```python
>>> cupy.asarray(1.0, dtype=cupy.float32) + cupy.asarray(1.0)
array(2.)  # float64
>>> cupy.asarray(1.0, dtype=cupy.float32) + 1.0
array(2., dtype=float32)
```
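To make the analogy concrete, a hypothetical device counterpart of that promotion rule (this is not CuPy's current behaviour, just what "assigning the device in the same way" would mean):

```python
import cupy

with cupy.cuda.Device(1):
    x = cupy.asarray(1.0, dtype=cupy.float32)
with cupy.cuda.Device(0):
    y = x + 1.0
# Under NEP 50-style promotion the scalar adopts x's dtype (float32);
# by the same logic it would adopt x's device, so that
# y.device == cupy.cuda.Device(1) rather than the current Device(0).
```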
> @tylerjereddy as you have a dual-GPU box, could you help test this?
I'm cross-testing this branch against scipy/scipy#22756, with the state of that SciPy branch at hash `3025a6a05`, before some of the more recent (i.e., `xfail`) adjustments over there, to see if this feature branch of `array-api-compat` can rescue the test failure when pulled in to `scipy/_lib/array_api_compat` (using hash `9fae9434` of this branch inside of the other SciPy branch).

At the moment, `SCIPY_DEVICE=cuda python dev.py test -t scipy/special/tests/test_logsumexp.py::TestLogSumExp::test_device -b cupy` continues to fail in the same way as originally described over there. I'll try to provide slightly more information below on the dual-GPU machine, and maybe you can ask me for what you want me to try next.
With this branch pulled over there, I see that the first setting of the non-default device works just fine:

```diff
--- a/scipy/special/tests/test_logsumexp.py
+++ b/scipy/special/tests/test_logsumexp.py
@@ -296,6 +296,7 @@ class TestLogSumExp:
     def test_device(self, x, xp, nondefault_device):
         """Test input device propagation to output."""
         x = xp.asarray(x, device=nondefault_device)
+        print("x device:", xp_device(x))
```

That prints `x device: <CUDA Device 1>` with this branch pulled in, but this also seems to be true when I check out the submodule version SciPy `main` currently uses.
I dissected further in SciPy on the dual-GPU machine with the debug prints below the fold.
```diff
--- a/scipy/_lib/_array_api.py
+++ b/scipy/_lib/_array_api.py
@@ -515,9 +515,11 @@ def xp_ravel(x: Array, /, *, xp: ModuleType | None = None) -> Array:

 # utility to broadcast arrays and promote to common dtype
 def xp_broadcast_promote(*args, ensure_writeable=False, force_floating=False, xp=None):
+    print("xp_broadcast_promote xp_device(args[0]) checkpoint 1:", xp_device(args[0]))
     xp = array_namespace(*args) if xp is None else xp
     args = [(_asarray(arg, subok=True) if arg is not None else arg) for arg in args]
+    print("xp_broadcast_promote xp_device(args[0]) checkpoint 2:", xp_device(args[0]))
     args_not_none = [arg for arg in args if arg is not None]
     # determine minimum dtype
diff --git a/scipy/_lib/array_api_compat b/scipy/_lib/array_api_compat
index 1b0de51538..9fae94343f 160000
--- a/scipy/_lib/array_api_compat
+++ b/scipy/_lib/array_api_compat
@@ -1 +1 @@
-Subproject commit 1b0de51538deb7c21d0c268f36764a8589e40012
+Subproject commit 9fae94343fcb3e88ff88e7c0494267fb0a0dfb13
diff --git a/scipy/special/_logsumexp.py b/scipy/special/_logsumexp.py
index c1664bf935..2e2d041d05 100644
--- a/scipy/special/_logsumexp.py
+++ b/scipy/special/_logsumexp.py
@@ -105,9 +105,12 @@ def logsumexp(a, axis=None, b=None, keepdims=False, return_sign=False):
     """
     xp = array_namespace(a, b)
+    print("xp_device(a) checkpoint -1:", xp_device(a))
     a, b = xp_broadcast_promote(a, b, ensure_writeable=True, force_floating=True, xp=xp)
+    print("xp_device(a) checkpoint 0:", xp_device(a))
     a = xpx.atleast_nd(a, ndim=1, xp=xp)
     b = xpx.atleast_nd(b, ndim=1, xp=xp) if b is not None else b
+    print("xp_device(a) checkpoint 1:", xp_device(a))
     axis = tuple(range(a.ndim)) if axis is None else axis
```
A sample failure then drops the non-default device very early on in `logsumexp()`, apparently because of an `_asarray(arg, subok=True)` call inside of `xp_broadcast_promote()`:
```
scipy/special/tests/test_logsumexp.py:300: in test_device
    assert xp_device(logsumexp(x)) == nondefault_device
E   assert <CUDA Device 0> == <CUDA Device 1>
E    +  where <CUDA Device 0> = xp_device(array(-inf))
E    +    where array(-inf) = logsumexp(array([], dtype=float64))

nondefault_device = <CUDA Device 1>
self = <scipy.special.tests.test_logsumexp.TestLogSumExp object at 0x7482e19845d0>
x = array([], dtype=float64)
xp = <module 'scipy._lib.array_api_compat.cupy' from '/home/treddy/github_projects/scipy/build-install/lib/python3.11/site-packages/scipy/_lib/array_api_compat/cupy/__init__.py'>
----------------------------- Captured stdout call -----------------------------
x device: <CUDA Device 1>
xp_device(a) checkpoint -1: <CUDA Device 1>
xp_broadcast_promote xp_device(args[0]) checkpoint 1: <CUDA Device 1>
xp_broadcast_promote xp_device(args[0]) checkpoint 2: <CUDA Device 0>
xp_device(a) checkpoint 0: <CUDA Device 0>
xp_device(a) checkpoint 1: <CUDA Device 0>
```
@tylerjereddy Thank you, there was indeed a bug in
Yes, except when you add some explicit request not to do that, for example by using an explicit `device=`.

I agree it is a bit murky what "this" is. Is "this" the addition of the Python scalar `1`?
@kmaehashi @leofang @asi1024 hello! We have a question about how to interpret CuPy's device context manager. The following seems clear:

```python
with cp.cuda.Device(1):
    x = cp.asarray(1)  # x should be on 'current device' Device(1)
    y = cp.asarray(1, device=cp.cuda.Device(0))  # y should be on Device(0), as explicitly requested
```

But what about the following example:

```python
with cp.cuda.Device(1):
    y = cp.asarray(1, device=cp.cuda.Device(0))  # y should be on Device(0)
    z = y + 1
```

How 'strong' is the context manager supposed to be here? It seems like there are at least a couple of options:

1. The current device (the context manager) prevails: the operation executes on `Device(1)`, failing if peer access between the devices cannot be established.
2. The device of the input array propagates: `z` is computed and placed on `Device(0)`, regardless of the current device.

#293 (comment) suggests that in practice, the current CuPy implementation leans towards option (1), failing when peer access cannot be established. If so, is this a deliberate/crucial design aspect, or more so accidental?

If (1) is by design, how extreme does it go? For example, should

```python
with cp.cuda.Device(1):
    y = cp.asarray(1, device=cp.cuda.Device(0))  # y should be on Device(0)
    y = y
```

also raise an exception if peer access between the devices cannot be established?

I suppose there is also the more extreme option 3 where

```python
with cp.cuda.Device(1):
    y = cp.asarray(1, device=cp.cuda.Device(0))
```

should itself throw an exception, with the context manager overriding the explicit `device=` request.
This much is clear.

So are you saying that the context should trump the device of the input array to a function?
Just to continue providing the multi-device scenario feedback: we now get farther in the control flow in the SciPy example, until encountering the dreaded error. That is perhaps slightly more surprising since I don't see a context manager directly in the source, but I think confusion is a common theme already anyway, and the context manager may just be abstracted away in the shims here (or is the global default device still overriding even without the context?), so I don't "see it." Dissection at scipy/scipy#22756 (comment), but probably more something to discuss over here for now.
CuPy does not support the
array-api-compat could wrap every single function so that CuPy respects the input argument's device, but cannot do the same for array methods, e.g. `x + 1`.
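A rough sketch of the per-function wrapping idea (the name `_wrap_for_device` is hypothetical, not part of this PR):

```python
import functools
import cupy as cp

def _wrap_for_device(func):
    """Run `func` with the device of its first array argument as the
    current device, so that the output lands on that device.

    Hypothetical sketch of the wrapping approach described above.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for arg in args:
            if isinstance(arg, cp.ndarray):
                with arg.device:  # cupy.cuda.Device is a context manager
                    return func(*args, **kwargs)
        return func(*args, **kwargs)
    return wrapper

tanh = _wrap_for_device(cp.tanh)  # works for namespace functions...
# ...but there is no equivalent hook for methods such as `x + 1`
# (cp.ndarray.__add__), which is exactly the limitation noted above.
```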
@leofang do you have an opinion on the desired behaviour here?

```python
with cp.cuda.Device(1):
    y = cp.asarray(1, device=cp.cuda.Device(0))  # y should be on Device(0)
    z = y + 1  # device of z?
```
Add support for multiple devices in CuPy.

UNTESTED: I don't know how to test this without a dual-GPU box, which I don't have access to. Suggestions welcome. At any rate, any tests for this should be in `array-api-tests`, replicating the same pattern as in "BUG/TST: `special.logsumexp` on non-default device" (scipy/scipy#22756). Tracker: "Test device support" (array-api-tests#302).

- Validate `device` keyword in `array_api_compat.numpy.astype`
- Validate `device` keyword in all Dask functions