DOC: Clarify broadcasting behavior when using lists in DataFrame arithmetic (GH18857) #61820

Draft: wants to merge 3 commits into main

4 changes: 4 additions & 0 deletions doc/source/user_guide/basics.rst
@@ -209,6 +209,10 @@ either match on the *index* or *columns* via the **axis** keyword:
   df.sub(column, axis="index")
   df.sub(column, axis=0)

Be careful when passing a raw Python list as an operand in binary operations with a DataFrame.
Unlike a NumPy array or a Series, a list carries no shape or index information, and it may not be broadcast across rows or columns as you expect.
pandas attempts to match the entire list against a single axis, which can produce confusing results such as a Series of arrays.
To get predictable broadcasting, pass a NumPy array of the intended shape or a Series with an explicit index.

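
As a minimal sketch of that advice, the list can be converted before the operation so that the matching is explicit. The helper ``as_column_operand`` is hypothetical (it is not part of pandas) and assumes the list holds one value per column.

```python
import numpy as np
import pandas as pd


def as_column_operand(df, values):
    """Hypothetical helper: wrap a list as a Series labelled by df's columns."""
    if len(values) != len(df.columns):
        raise ValueError("expected one value per column")
    return pd.Series(values, index=df.columns)


df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["A", "B", "C"])
values = [1, 2, 3]  # raw list, one value per column

# A 1-D array of length 3 is matched to the three columns by position.
from_array = df + np.asarray(values)

# A Series is matched to the columns by label, so the intent is unambiguous.
from_series = df + as_column_operand(df, values)
```

The Series form is the more defensive of the two, since a reordered or missing column label shows up as ``NaN`` instead of silently shifting values.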
Furthermore you can align a level of a MultiIndexed DataFrame with a Series.

.. ipython:: python
13 changes: 13 additions & 0 deletions doc/source/user_guide/dsintro.rst
@@ -650,6 +650,19 @@ row-wise. For example:

df - df.iloc[0]

When a Python list is used as an operand in arithmetic with a DataFrame, the result may not be the element-wise broadcast you would get from a NumPy array or a Series.
The list can be treated as a single object and the operation applied column-wise, which produces unexpected output (e.g. arrays inside each cell).

.. ipython:: python

   df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["A", "B", "C"])

   df + [1, 2, 3]  # list operand: the case this note warns about

   df + np.array([1, 2, 3])  # broadcast across rows, matched to columns by position

   df + pd.Series([1, 2, 3], index=["A", "B", "C"])  # aligned on column labels

For explicit control over the matching and broadcasting behavior, see the
section on :ref:`flexible binary operations <basics.binop>`.
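
For illustration, a small sketch of what that explicit control looks like with ``DataFrame.add`` and the ``axis`` keyword. The variable names and values are made up for this example; only ``DataFrame.add`` and its ``axis`` argument come from pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["A", "B", "C"])

per_column = pd.Series([1, 2, 3], index=df.columns)  # one value per column
per_row = pd.Series([10, 20], index=df.index)        # one value per row

# Match the Series labels against the column labels (the default for df + Series).
by_columns = df.add(per_column, axis="columns")

# Match the Series labels against the row index instead.
by_rows = df.add(per_row, axis="index")
```

Choosing the axis by name keeps the direction of the broadcast visible at the call site, which a bare ``+`` does not.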
