
Commit 4908f94

Merge pull request #1 from data-apis/array-api-release-blog
Array API release blog
2 parents e4a0ae8 + 8094071

File tree: 2 files changed (+199, −1)

New file (+198 lines):
+++
date = "2020-11-10T08:00:00+00:00"
author = "Ralf Gommers"
title = "First release of the Array API Standard"
tags = ["APIs", "standard", "consortium", "arrays", "community"]
categories = ["Consortium", "Standardization"]
description = "This first release of the standards document and accompanying test suite marks the start of the community review period."
draft = false
weight = 30
+++
Array and tensor libraries - from NumPy, TensorFlow and PyTorch to Dask, JAX,
MXNet and beyond - could benefit greatly from a uniform API for creating and
working with multi-dimensional arrays (a.k.a. tensors), as we discussed in
[our previous blog post]({{< relref "announcing_the_consortium.md" >}}).
Today we're pleased to announce a first version of our array API standard
([document](https://data-apis.github.io/array-api/latest),
[repo](https://github.com/data-apis/array-api/)) for review by the
wider community. Getting to this point took slightly longer than we had
initially announced because, well, it's 2020 and hence nothing quite goes
according to plan.

The current status of the standard is that it is a coherent story (or at
least, we hope it is) that gives readers enough context about goals and scope
to understand and review the design decisions already taken and the APIs it
contains. However, _it is not yet complete and we can still change direction
and make significant changes based on community feedback_. This is important
--- no one likes a "take it or leave it" approach, and more eyes can make the
final result better. There are still a few TODOs in places, and a couple of key
sections to be finished. The most important of those are the API for device
support, and the Python API for the
[data interchange protocol](https://data-apis.github.io/array-api/latest/design_topics/data_interchange.html)
(proposed to be based on [DLPack](https://github.com/dmlc/dlpack)).

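
To make the data interchange idea a bit more concrete, here is a rough sketch of the kind of zero-copy handover the protocol is meant to enable. The Python API is not settled yet, so the names used below (`__dlpack__`, `from_dlpack`, and the namespace argument `xp`) are assumptions for illustration, not part of the standard.

```python
# Hypothetical sketch only: the Python API for DLPack-based interchange is
# still being drafted, and `__dlpack__` / `from_dlpack` are assumed names.
def convert_between_libraries(x, xp):
    """Hand `x`, created by one array library, over to the namespace `xp`
    of another library without copying the underlying memory."""
    capsule = x.__dlpack__()        # producer exports a DLPack capsule
    return xp.from_dlpack(capsule)  # consumer wraps the same memory buffer
```

The capsule describes the memory (device, data type, shape, strides) in a library-agnostic way, which is what makes a copy-free handover between otherwise unrelated libraries possible.
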

It is worth repeating the main goal of this standard: make it easier to
switch from one array library to another one, or to support multiple array
libraries as compute backends in downstream packages. We'd also like to
emphasize that if some functionality is _not_ present in the API standard,
that does _not_ mean it's unimportant, or that we're asking existing array
libraries to deprecate it. Instead it simply means that that functionality
isn't supported at present - likely because it isn't present in all or most
current array libraries, or isn't used widely enough to have been included
so far. The [use cases section](https://data-apis.github.io/array-api/latest/use_cases.html)
of the standard may provide more insight into important goals.

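
As an illustration of what "supporting multiple array libraries as compute backends" could look like in downstream code, here is a minimal sketch. It assumes the backend namespace is passed in explicitly and that the functions it uses (`exp`, `max`, `sum`) end up in the standardized API; this is one possible pattern, not something the standard prescribes.

```python
import numpy as np

def softmax(x, xp):
    """Numerically stable softmax written only against functions expected
    to be in the array API standard, so any namespace `xp` implementing
    the standard could be passed in unchanged."""
    shifted = x - xp.max(x, axis=-1, keepdims=True)
    e = xp.exp(shifted)
    return e / xp.sum(e, axis=-1, keepdims=True)

# Works with NumPy today; a compliant implementation of the standard
# could be swapped in for `np` without changing the function.
print(softmax(np.array([1.0, 2.0, 3.0]), xp=np))
```
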
## Some key design topics

Two topics stood out so far in terms of complexity and choices that were hard
to make in such a way that they'd work well for all existing libraries:
mutability & copy/view behaviour, and dtype casting rules.

##### The standard will contain common mutable operations such as slice assignment, but will generally avoid in-place mutation in APIs like the `out` keyword

NumPy, PyTorch, CuPy and MXNet provide strided arrays, and rely heavily on
mutating values in existing arrays and on the concept of a "view" for
performance. TensorFlow, JAX and Dask, on the other hand, have no or limited
support for mutation, given that they rely on an execution graph and/or JIT
compiler which constrains how much mutability can be supported. The design
decisions described [here](https://data-apis.github.io/array-api/latest/design_topics/copies_views_and_mutation.html)
will allow the most heavily used types of mutability - in-place operators,
item assignment and slice assignment - to be retained, while avoiding the use
of the `out=` keyword, which is problematic to support for some libraries and
arguably a suboptimal API to begin with.

For libraries like SciPy and scikit-learn, the supported features are essential.
Code like this, from scikit-learn's `ForestClassifier`:

```python
for k in range(self.n_outputs_):
    predictions[k][unsampled_indices, :] += p_estimator[k]
```

or this, from SciPy's `optimize.linprog`:

```python
r = b - A@x
A[r < 0] = -A[r < 0]
b[r < 0] = -b[r < 0]
r[r < 0] *= -1
```

is quite common, and we see it as fundamental to how users work with array libraries.
`out=` is less essential though, and leaving it out is important for JAX,
TensorFlow, Dask, and future libraries designed around immutable data structures.

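
To make the distinction concrete, here is a small sketch, using NumPy spelling purely for illustration: the first three statements use the kinds of mutation the standard keeps, while the last two lines contrast the `out=` pattern with its portable equivalent.

```python
import numpy as np

x = np.arange(5.0)
y = np.ones(5)

x += y            # in-place operator: kept in the standard
x[0] = 10.0       # item assignment: kept
x[1:3] = 0.0      # slice assignment: kept

out = np.empty(5)
np.add(x, y, out=out)  # the `out=` keyword: not part of the standard
z = x + y              # portable code would simply write this instead
```
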

##### Casting rules for mixed type families will not be specified and are implementation specific

Casting rules are relatively straightforward when all involved dtypes are of
the same kind (e.g. all integer), but when mixing for example integers and
floats it quickly becomes clear that array libraries don't agree with each
other: one may get exceptions, or dtypes with different precision. Therefore
we had to make the choice to leave the rules for "mixed kind dtype casting"
undefined - when users want to write portable code, they should avoid this
situation or use explicit casts to obtain the same results from different
array libraries. An example as simple as this one:

```python
x = np.arange(5)  # will be integer
y = np.ones(5, dtype=np.float16)
(x * y).dtype
```

will show the issue. NumPy will produce `float64` here, PyTorch will produce
`float16`, and TensorFlow will raise `InvalidArgumentError` because it does not
support mixing integer and float dtypes.

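
One way to keep such code portable, as the text above suggests, is an explicit cast so that every library performs the same `float16` multiplication. The sketch below uses NumPy's `astype` spelling; other libraries may spell the cast differently.

```python
import numpy as np

x = np.arange(5)
y = np.ones(5, dtype=np.float16)

# Cast the integer array explicitly before mixing dtypes, so the result
# is float16 regardless of which library's promotion rules would apply.
result = x.astype(np.float16) * y
print(result.dtype)  # float16
```
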
See [this section of the standard](https://data-apis.github.io/array-api/latest/API_specification/type_promotion.html)
for more details on casting rules.

## A portable test suite

With the array API standard document we are also working on a
[test suite](https://github.com/data-apis/array-api-tests). This test suite
will be implemented with pytest and Hypothesis, won't rely on any particular
array implementation, and is meant to test compliance with the API standard.

It is still very much a work in progress, but the aim is to complete it by
the time the community review of the API standard wraps up. In the meantime,
the community is encouraged to check out the current work on the test suite on
[GitHub](https://github.com/data-apis/array-api-tests), try it out, and
comment on it. The
[README](https://github.com/data-apis/array-api-tests/blob/master/README.md)
in the test suite repo contains more information on how to run it and
contribute to it.

The test suite will be runnable with any existing library. This can be done
by specifying the array implementation namespace to be tested via an
environment variable:

```bash
$ ARRAY_API_TESTS_MODULE=jax.numpy pytest
```

The test suite will also support vendoring so that array libraries can easily
include it in their own test suites.

The result of running the test suite will be an overview of the level of
compliance with the standard. We expect it will take time for libraries to
get to 100%; anything less shouldn't simply mean "fail" - 98% would already
be a major step towards portable code compared to today.

## People & projects

So who was involved in getting the API standard to this point, and which
libraries do we hope will adopt this standard? The answer to the latter is
"all existing and new array and tensor libraries with a Python API". As for
who was involved, we were lucky to get contributions from creators and senior
maintainers of almost every library of interest - here's a brief description:

- NumPy: Stephan Hoyer and Ralf Gommers are both long-time NumPy maintainers.
  In addition we got to consult regularly with Travis Oliphant, creator of
  NumPy, on the history behind some decisions made early on in NumPy's life.
- TensorFlow: Alexandre Passos was a technical lead on the TensorFlow team,
  and was heavily involved until a few weeks ago. Paige Bailey is the
  product manager for TensorFlow APIs at Google Research. Edward Loper and
  Ashish Agarwal, TensorFlow maintainers, recently replaced Alexandre as
  Consortium members.
- PyTorch: Adam Paszke is one of the co-creators of PyTorch. Ralf Gommers
  leads a team of engineers contributing to PyTorch.
- MXNet: Sheng Zha is a long-time MXNet maintainer. Markus Weimer is an
  Apache PMC member and mentor for the MXNet incubation process into the
  Apache Foundation.
- JAX: Stephan Hoyer and Adam Paszke are two maintainers of JAX.
- Xarray: Stephan Hoyer is one of the co-creators, and still a maintainer, of Xarray.
- Dask: Tom Augspurger is a senior Dask maintainer.
- CuPy: we have no active participant from CuPy. However, we have talked to
  the CuPy team at Preferred Networks, who are supportive of the goals and
  committed to following NumPy's lead on APIs.
- ONNX: Sheng Zha is an ONNX Steering Committee member.

Many other people have made contributions so far, including the Consortium
members listed at https://github.com/data-apis/governance.

## Next steps to a first complete standard

We are now looking for feedback from the wider community, and in particular
maintainers of array libraries. For each of those libraries, a Consortium
member involved in the library will be soliciting feedback from their own
project. We'd like to get to the point where it's clear for each library that
there are no blockers to adoption and that the overall shape of the API
standard is considered valuable enough to support.

In addition, given that this API standard is completely new and drafting
something like it hasn't been attempted before in this community, we'd love
to get meta-feedback - is anything missing or in need of shaping in the
standard document, the goal and scope, ways to participate, or any other such
topic?

To provide feedback on the array API standard, please open issues or pull
requests on https://github.com/data-apis/array-api. For larger discussions
and meta-feedback, please open GitHub Discussion topics at
https://github.com/data-apis/consortium-feedback/discussions.

layouts/index.html (+1, −1):

@@ -9,7 +9,7 @@ <h1>{{.Title}}</h1>
  </div>
  <div id="action-buttons">
  <a class="button primary big" href="https://github.com/data-apis/consortium-feedback">Feedback</a> <a class="button outline big" href="https://github.com/data-apis" >View on Github</a>
- <p>We're just getting started - read <a href="/blog/announcing_the_consortium/">our announcement blog post</a> tell us what you think!</p>
+ <p>Read <a href="/blog/announcing_the_consortium/">our announcement blog post</a> and <a href="/blog/array_api_standard_release/">draft array API standard</a> and tell us what you think!</p>
  </div>

  <div id="kube-features">
