| 1 | ++++
| 2 | +date = "2024-11-11T08:00:00+00:00"
| 3 | +author = "Athan Reines"
| 4 | +title = "CZI EOSS 6 Award to Advance Array Interoperability within the PyData Ecosystem"
| 5 | +tags = ["APIs", "standard", "consortium", "arrays", "community", "funding", "czi", "eoss6"]
| 6 | +categories = ["Consortium", "Standardization"]
| 7 | +description = "The Chan Zuckerberg Initiative (CZI) awarded an EOSS Cycle 6 grant to the Data APIs Consortium to advance array interoperability within the PyData ecosystem."
| 8 | +draft = false
| 9 | +weight = 30
| 10 | ++++
| 11 | +
| 12 | +We are thrilled to announce that the Chan Zuckerberg Initiative (CZI) recently |
| 13 | +awarded the Consortium for Python Data API Standards an Essential Open Source |
| 14 | +Software for Science (EOSS) Cycle 6 grant to support ongoing work within the
| 15 | +Consortium and to accelerate the adoption of the Array API Standard across the |
| 16 | +PyData ecosystem. With this award, we'll drive forward our vision of |
| 17 | +standardizing a universal API for array operations, enhancing library |
| 18 | +interoperability, and increasing accessibility to high-performance |
| 19 | +computational resources across scientific domains. |
| 20 | + |
| 21 | +## The Importance of the EOSS Program |
| 22 | + |
| 23 | +The EOSS program by CZI was launched to support open source software that is |
| 24 | +foundational for scientific research, especially within biology and medicine. |
| 25 | +As software tools underpin modern scientific investigation, ensuring these |
| 26 | +tools receive adequate funding is crucial for sustainable growth and long-term |
| 27 | +impact. Through EOSS, CZI has committed to funding development, usability |
| 28 | +improvements, community engagement, and maintenance efforts for critical open |
| 29 | +source tools. This support enables open source software to be more accessible, |
| 30 | +reliable, and adaptable to researchers' evolving needs. |
| 31 | + |
| 32 | +With the EOSS Cycle 6 award, Quansight, in cooperation with collaborators within |
| 33 | +the Consortium and the broader ecosystem, will focus on advancing |
| 34 | +interoperability, improving ease of Array API adoption, and reducing array |
| 35 | +library fragmentation within the PyData ecosystem. |
| 36 | + |
| 37 | +## Addressing Fragmentation in the PyData Ecosystem |
| 38 | + |
| 39 | +As Python's popularity has grown, so has the number of frameworks and libraries |
| 40 | +for numerical computing, data science, and machine learning. Researchers and |
| 41 | +data science practitioners now have access to a vast suite of tools and |
| 42 | +libraries for computation, but this diversity comes with the challenge of |
| 43 | +fragmented APIs for fundamental data structures such as multidimensional |
| 44 | +arrays. While array libraries largely follow similar paradigms, their API |
| 45 | +differences present a real challenge for users who need to switch between or |
| 46 | +integrate multiple libraries in their workflows. |
| 47 | + |
| 48 | +The Consortium for Python Data API Standards, founded in 2020, addresses this |
| 49 | +issue directly. By standardizing a universal array API, the Consortium seeks to |
| 50 | +simplify the process for users moving between libraries and foster an ecosystem |
| 51 | +where array operations are seamless across libraries such as NumPy, CuPy, |
| 52 | +PyTorch, and JAX. To date, the Array API Standard has seen adoption by major |
| 53 | +libraries, laying the groundwork for an interoperable PyData ecosystem that |
| 54 | +emphasizes compatibility and ease of use. |
| 55 | + |
| 56 | +If you're curious to learn more about the Consortium, its origins, and the |
| 57 | +benefits of standardization, be sure to read our 2023 SciPy Proceedings paper |
| 58 | +["Python Array API Standard: Toward Array Interoperability in the Scientific |
| 59 | +Python Ecosystem"](https://proceedings.scipy.org/articles/gerudo-f2bc6f59-001). |
| 60 | + |
| 61 | +## Scope of Work for the EOSS 6 Award |
| 62 | + |
| 63 | +The EOSS 6 award will help the Consortium focus on key initiatives to expand |
| 64 | +adoption and improve compatibility across the ecosystem. The proposed work |
| 65 | +includes: |
| 66 | + |
| 67 | +### Array API Adoption in Downstream Libraries |
| 68 | + |
| 69 | +One of our primary goals is to further adoption of the Array API Standard in |
| 70 | +downstream libraries, such as SciPy, scikit-learn, and scikit-image. |
| 71 | +Historically, many downstream libraries have depended directly on NumPy, which
| 72 | +limits them to CPU-based execution and prevents them from leveraging the
| 73 | +performance advantages of GPU- or TPU-based computation. By
| 74 | +adopting the Standard, downstream libraries will be able to support array |
| 75 | +libraries such as CuPy and PyTorch, empowering researchers to take advantage of |
| 76 | +the hardware acceleration options suitable to their needs. |
| 77 | + |
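To make the idea of library-agnostic code more concrete, here is a minimal sketch of a function that retrieves its array namespace through the `__array_namespace__` method specified by the Standard instead of importing a particular library. `ToyArray` and `toy_namespace` are hypothetical stand-ins invented for this illustration so that the snippet runs with the Python standard library alone; they are not part of any real array library, and `standardize` is not taken from SciPy, scikit-learn, or scikit-image.

```python
import math
from types import SimpleNamespace


class ToyArray:
    """Hypothetical 1-D array, used only to make this sketch runnable."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def __array_namespace__(self, api_version=None):
        # The Standard specifies this method as the way to obtain the
        # namespace of functions that operate on an array.
        return toy_namespace

    def __sub__(self, scalar):
        return ToyArray(v - scalar for v in self.values)

    def __truediv__(self, scalar):
        return ToyArray(v / scalar for v in self.values)


def _mean(x):
    return sum(x.values) / len(x.values)


def _std(x):
    m = _mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x.values) / len(x.values))


# Hypothetical namespace exposing two function names from the Standard.
# For simplicity these return Python floats rather than zero-dimensional arrays.
toy_namespace = SimpleNamespace(mean=_mean, std=_std)


def standardize(x):
    """Center and scale an array without importing a specific array library."""
    xp = x.__array_namespace__()  # namespace is discovered from the input
    return (x - xp.mean(x)) / xp.std(x)


result = standardize(ToyArray([1.0, 2.0, 3.0, 4.0]))
print(result.values)
```

The body of `standardize` never names a concrete library, so in principle the same function could accept arrays from any library whose arrays expose `__array_namespace__`. For libraries that do not expose the method natively, a compatibility layer such as `array-api-compat` is one option for obtaining a compliant namespace.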
| 78 | +### Infrastructure for Adoption and Compliance Tracking |
| 79 | + |
| 80 | +We're also committed to building infrastructure to monitor compliance and |
| 81 | +adoption of the Array API across the ecosystem. While we have already developed |
| 82 | +a [test suite](https://github.com/data-apis/array-api-tests) to measure |
| 83 | +compliance for array libraries, this tool has been largely developer-facing, |
| 84 | +leaving end users with limited visibility into compatibility across different |
| 85 | +libraries. To address this gap, we will create public mechanisms, such as |
| 86 | +compatibility tables, for tracking which libraries are adopting the Standard |
| 87 | +and helping users make informed decisions about which libraries to use. |
| 88 | + |
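As a rough illustration of what a compatibility table might look like, the sketch below checks whether a namespace exposes a handful of function names from the Standard and renders the result as a Markdown table. Everything here is an assumption made for illustration: `lib_a` and `lib_b` are mock namespaces, `coverage` and `render_table` are hypothetical helpers, and a name-presence check is far cruder than the behavioral checks performed by the test suite. It is not a description of the infrastructure the Consortium will build.

```python
from types import SimpleNamespace

# A small sample of function names specified by the Array API Standard.
REQUIRED = ("asarray", "reshape", "matmul", "sum", "unique_values")


def coverage(namespace, required=REQUIRED):
    """Map each required name to whether the namespace exposes a callable."""
    return {name: callable(getattr(namespace, name, None)) for name in required}


def render_table(libraries, required=REQUIRED):
    """Render a Markdown table with one row per function, one column per library."""
    reports = {lib: coverage(ns, required) for lib, ns in libraries.items()}
    header = "| Function | " + " | ".join(libraries) + " |"
    divider = "| --- |" + " --- |" * len(libraries)
    rows = []
    for name in required:
        cells = " | ".join("yes" if reports[lib][name] else "no" for lib in libraries)
        rows.append(f"| `{name}` | {cells} |")
    return "\n".join([header, divider, *rows])


def _stub(*args, **kwargs):
    """Placeholder implementation; only its presence matters in this sketch."""
    raise NotImplementedError


# Mock namespaces standing in for real array libraries.
lib_a = SimpleNamespace(**{name: _stub for name in REQUIRED})
lib_b = SimpleNamespace(asarray=_stub, reshape=_stub, sum=_stub)

table = render_table({"lib_a": lib_a, "lib_b": lib_b})
print(table)
```

A real tracking mechanism would more plausibly be driven by test suite results per library and version, but the output shape (functions as rows, libraries as columns) conveys the kind of at-a-glance view end users currently lack.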
| 89 | +Additionally, we plan to develop mechanisms for automating compliance tracking |
| 90 | +within array library continuous integration (CI) workflows, allowing real-time |
| 91 | +monitoring of adoption and compatibility regressions. We hope this infrastructure
| 92 | +will instill greater confidence among end users in array library
| 93 | +compatibility and help array library developers maintain interoperability. |
| 94 | + |
| 95 | +### Comprehensive Documentation and Migration Guides |
| 96 | + |
| 97 | +As adoption grows, we recognize the need for high-quality documentation and |
| 98 | +migration guides to help users and developers transition seamlessly to using |
| 99 | +the Array API Standard. Through our collaborations with library maintainers, |
| 100 | +we've gathered insights into best practices for building array library-agnostic |
| 101 | +applications. With EOSS 6 funding, we'll transform these insights into |
| 102 | +tutorials, case studies, and migration guides to facilitate adoption among |
| 103 | +downstream libraries. By offering clear and accessible resources, we aim to |
| 104 | +reduce the learning curve for new users and provide developers with the tools |
| 105 | +they need to confidently build array library-agnostic applications. |
| 106 | + |
| 107 | +## Value to the Scientific Community and End Users |
| 108 | + |
| 109 | +The work funded by this award will provide significant benefits to users within |
| 110 | +the scientific research community. Our hope is that this work will yield three |
| 111 | +primary outcomes: |
| 112 | + |
| 113 | +1. **Interoperability Across Libraries**: Fragmentation within the ecosystem has |
| 114 | +often led to duplication of effort, limited access to hardware acceleration, |
| 115 | +and the need for repeated re-implementation of foundational array structures. |
| 116 | +By fostering interoperability across libraries, we aim to simplify the process |
| 117 | +of moving between technical stacks and unlock new performance gains for array |
| 118 | +library consumers. |
| 119 | + |
| 120 | +2. **Standardization and Reduced Switching Costs**: Users will benefit from |
| 121 | +shorter learning curves and lower costs associated with switching libraries. |
| 122 | +With standardized APIs and robust compliance infrastructure, users will have |
| 123 | +greater confidence that their workflows will be portable across array |
| 124 | +libraries, regardless of the underlying computational backend. |
| 125 | + |
| 126 | +3. **Enhanced Performance for Array-Consuming Libraries**: Array API adoption |
| 127 | +has [already shown](https://proceedings.scipy.org/articles/gerudo-f2bc6f59-001) |
| 128 | +promising performance improvements across several libraries in the ecosystem. |
| 129 | +For example, performance gains of up to 50x in SciPy and 10-40x in scikit-learn |
| 130 | +were observed upon integrating support for alternative array libraries such as |
| 131 | +CuPy and PyTorch. We hope to observe similar acceleration in other downstream |
| 132 | +libraries, which could dramatically reduce analysis time for computationally |
| 133 | +intensive research tasks, ultimately improving efficiency and access for users |
| 134 | +working with high-dimensional data. |
| 135 | + |
| 136 | +## Looking Forward |
| 137 | + |
| 138 | +As we embark on this phase of our work, we're excited to continue pushing |
| 139 | +forward the Array API Standard as a unifying foundation for the PyData |
| 140 | +ecosystem. Support from CZI's EOSS program is instrumental in making this |
| 141 | +vision a reality, and we're committed to expanding the impact of the Array API |
| 142 | +Standard through real-world applications and community engagement. |
| 143 | + |
| 144 | +With this award, we're not only addressing technical fragmentation but also |
| 145 | +advancing a more inclusive, accessible, and robust future for scientific |
| 146 | +computing. We look forward to collaborating with the community to make array |
| 147 | +interoperability a reality across the ecosystem and to empower researchers with |
| 148 | +tools that help them achieve scientific breakthroughs more efficiently and |
| 149 | +effectively. |
| 150 | + |
| 151 | +Stay tuned for updates as we implement these initiatives and continue to |
| 152 | +strengthen the foundations of the PyData ecosystem! |
| 153 | + |
| 154 | +--- |
| 155 | + |
| 156 | +## Funding Acknowledgment |
| 157 | + |
| 158 | +This project has been made possible in part by grant number EOSS6-0000000621 |
| 159 | +from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley |
| 160 | +Community Foundation. Athan Reines is the grant's principal investigator and |
| 161 | +Quansight Labs is the entity receiving and executing on the grant. |