 # PDEP-10: PyArrow as a required dependency for default string inference implementation
 
-- Created: 17 April 2023 (updated May 8, 2024)
-- Status: Rejected
+- Created: 17 April 2023
+- Status: Accepted
 - Discussion: [#52711](https://github.com/pandas-dev/pandas/pull/52711)
 [#52509](https://github.com/pandas-dev/pandas/issues/52509)
 - Author: [Matthew Roeschke](https://github.com/mroeschke)
 [Patrick Hoefler](https://github.com/phofl)
-- Revision: 2
+- Revision: 1
 
 # Note
 
-This PDEP was originally accepted on May 8, 2023. However, after reviewing feedback posted
-on the feedback issue [#54466](https://github.com/pandas-dev/pandas/issues/54466), we, the members of
-the core team, have not decided with moving forward with this PDEP for pandas 3.0.
-
-The primary reasons for rejecting this PDEP are twofold:
-
-1) Requiring pyarrow as a dependency causes installation problems.
- - Pyarrow does not fit or has a hard time fitting in space-constrained environments
-such as AWS Lambda and WASM, due to its large size of around ~40 MB for a compiled wheel
-(which is larger than pandas' own wheel sizes)
- - Installation of pyarrow is not possible on some platforms. We provide support for some
-less widely used platforms such as Alpine Linux (and there is third party support for pandas in
-pyodide, a WASM distribution of pandas), both of which pyarrow does not provide wheels for.
-
- While both of these reasons are mentioned in the drawbacks section of this PDEP, at the time of the writing
-of the PDEP, we underestimated the impact this would have on users, and also downstream developers.
-
-2) Many of the benefits presented in this PDEP can be materialized even with payrrow as an optional dependency.
-
- For example, as detailed in PDEP-14, it is possible to create a new string data type with the same semantics
- as our current default object string data type, but that allows users to experience faster performance and memory savings
- compared to the object strings.
-
-While we've decided to not move forward with requiring pyarrow in pandas 3.0, the rejection of this PDEP
-does not mean that we are abandoning pyarrow support and integration in pandas. We, as the core team, still believe
-that adopting support for pyarrow arrays and data types in more of pandas will lead to greater interoperability with the
-ecosystem and better performance for users. Furthermore, a lot of the drawbacks, such as the large installation size of pyarrow
-and the lack of support for certain platforms, can be solved, and potential solutions have been proposed for them, allowing us
-to potentially revisit this decision in the future.
-
-However, at this point in time, it is clear that we are not ready to require pyarrow
-as a dependency in pandas.
-
+This PDEP is superseded by PDEP-15.
 
 ## Abstract
 
@@ -246,7 +214,6 @@ before releasing a new pandas version.
 
 - 17 April 2023: Initial version
 - 8 May 2023: Changed proposal to make pyarrow required in pandas 3.0 instead of 2.1
-- 8 May 2024: Changed status to rejected
 
 [^1] <https://pandas.pydata.org/docs/development/roadmap.html#apache-arrow-interoperability>
 [^2] <https://arrow.apache.org/powered_by/>