[SPARK-42826][3.4][FOLLOWUP][PS][DOCS] Update migration notes for pandas API on Spark

### What changes were proposed in this pull request?

This is a follow-up to apache#40459 to fix incorrect information and to elaborate on the changes in more detail.
- We do not yet fully support pandas 2.0.0, so the statement "Pandas API on Spark follows for the pandas 2.0" is incorrect.
- We should list all the APIs that no longer support the `inplace` parameter.

### Why are the changes needed?

Correctness for migration notes.

### Does this PR introduce _any_ user-facing change?

No, only updating migration notes.

### How was this patch tested?

The existing CI should pass.

Closes apache#41207 from itholic/migration_guide_followup.

Authored-by: itholic <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
itholic authored and catalinii committed Oct 10, 2023
1 parent 9f559de commit a700cf7
Showing 1 changed file with 1 addition and 1 deletion.
python/docs/source/migration_guide/pyspark_upgrade.rst

@@ -33,8 +33,8 @@ Upgrading from PySpark 3.3 to 3.4
* In Spark 3.4, the ``Series.concat`` sort parameter will be respected to follow pandas 1.4 behaviors.
* In Spark 3.4, the ``DataFrame.__setitem__`` will make a copy and replace pre-existing arrays, which will NOT be over-written to follow pandas 1.4 behaviors.
* In Spark 3.4, the ``SparkSession.sql`` and the Pandas on Spark API ``sql`` have got new parameter ``args`` which provides binding of named parameters to their SQL literals.
- * In Spark 3.4, Pandas API on Spark follows for the pandas 2.0, and some APIs were deprecated or removed in Spark 3.4 according to the changes made in pandas 2.0. Please refer to the [release notes of pandas](https://pandas.pydata.org/docs/dev/whatsnew/) for more details.
* In Spark 3.4, the custom monkey-patch of ``collections.namedtuple`` was removed, and ``cloudpickle`` was used by default. To restore the previous behavior for any relevant pickling issue of ``collections.namedtuple``, set ``PYSPARK_ENABLE_NAMEDTUPLE_PATCH`` environment variable to ``1``.
+ * In Spark 3.4, the ``inplace`` parameter is no longer supported for Pandas API on Spark API ``add_categories``, ``remove_categories``, ``remove_unused_categories``, ``rename_categories``, ``reorder_categories``, ``set_categories`` to follow pandas 2.0.0 behaviors.


Upgrading from PySpark 3.2 to 3.3
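For readers hitting this change, below is a minimal sketch of migrating off the removed `inplace` parameter by reassigning the returned object instead. It assumes a running SparkSession; the data and category names are illustrative only, not taken from the commit.

```python
# Minimal sketch of migrating off the removed ``inplace`` parameter
# of the categorical accessor APIs in Spark 3.4.
import pyspark.pandas as ps

psser = ps.Series(["a", "b", "a"], dtype="category")

# Before Spark 3.4 this could mutate the Series in place:
#   psser.cat.add_categories("c", inplace=True)

# In Spark 3.4, following pandas 2.0.0 behaviors, the accessor
# returns a new Series instead, so reassign the result:
psser = psser.cat.add_categories("c")

print(psser.cat.categories)  # expected: Index(['a', 'b', 'c'], dtype='object')
```

The same reassignment pattern applies to the other methods listed in the note, such as `rename_categories` and `set_categories`.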
