Skip to content

Conversation

@vyasr
Copy link
Contributor

@vyasr vyasr commented Nov 8, 2025

Description

#19072 made a lot of the necessary fixes, but polars was not actually added to the pre-commit mypy environment so we haven't been checking since then. As a result, some new issues have crept in, and #20272 removed various ignores that are required for polars type safety but mypy didn't know that without polars available.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

vyasr and others added 10 commits November 8, 2025 18:54
Fixed 8 mypy errors where accessing .fields and .inner attributes on polars
DataType subclasses after runtime checks. The issue was that these attributes
return DataTypeClass | DataType, but consuming code expects DataType.

Fixed by casting the result of attribute access to pl.DataType:
- datatype.py: Cast dtype.inner and field.dtype in _dtype_to_header (2 locations)
- datatype.py: Cast field.dtype and .inner in children property (2 locations)
- boolean.py: Cast .inner when unwrapping List column (1 location)
- string.py: Cast .fields when iterating Struct fields (1 location)
- struct.py: Cast .fields when accessing Struct metadata (2 locations)
- to_ast.py: Cast .inner when extracting List inner type (1 location)

Note: isinstance() checks already narrow the parent type to pl.Struct/pl.List,
so we only need to cast the attribute result, not the parent object.

Total mypy errors: 39 → 33

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Cast time_unit parameters to Literal["ns", "us", "ms"] in pl.Datetime and
pl.Duration constructors. These casts are safe because _dtype_to_header() only
writes these three values and _from_polars() proves via exhaustive if/elif
chains that only these values exist.

Reduces mypy errors from 33 to 31.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Add assertion and cast for dtype.precision in _dtype_to_header to work around
incorrect polars type stubs where precision is typed as int | None. At runtime,
Decimal precision is never None.

This workaround is gated by POLARS_VERSION_LT_136 check and can be removed when
upgrading to polars >= 1.36, which will include the upstream fix.

Upstream fix: pola-rs/polars#25227

Reduces mypy errors from 31 to 30.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Changed the polars_dtype parameter type from pl.DataType to PolarsDataType
(DataTypeClass | DataType) to accept both type classes (e.g., pl.Int64) and
instances (e.g., pl.Int64()), which polars allows interchangeably.

Added runtime conversion logic to instantiate type classes when needed, with
a cast to help mypy understand the result is always a DataType instance.

Also updated the test helper _as_instance() signature to accept PolarsDataType.

Fixes Category C errors (test DataType instantiation bugs).

Reduces mypy errors from 30 to 25.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Added casts for pytest-parameterized literal string arguments that need to
match specific literal types:

1. mapping_strategy parameter cast to Literal["group_to_rows", "join", "explode"]
2. closed parameter cast to Literal["left", "right", "both", "none"]

These casts are safe because pytest.mark.parametrize ensures only the specified
literal values are passed to the test functions.

Fixes Category D errors (test parameterized literals).

Reduces mypy errors from 25 to 23.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Cast test parameters to expected Literal types where pytest.mark.parametrize
guarantees only specified values are passed:

- test_rolling.py:391 - Cast fill_null strategy to Literal["forward", "backward"]
- test_config.py:538 - Cast test object() to bool for validation testing
- test_shuffle.py:80 - Cast "cpu" engine to Literal (undocumented polars value)

These are safe casts since:
1. pytest parametrize ensures only specified literal values are used
2. Test code intentionally passes invalid types to verify validators catch them
3. Undocumented polars engine values work at runtime but aren't in stubs

Progress: 42 → 21 errors remaining (50% reduction overall)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Add type: ignore[method-assign, assignment] to 3 lines in plugin.py
that monkey-patch polars.LazyFrame.collect for GPU test infrastructure.

This is a known mypy limitation (python/mypy#2427) where mypy cannot
express reassigning an overloaded method with a partialmethod descriptor.
The runtime behavior is correct and this is standard pytest plugin pattern.

Fixed 6 errors (20 → 14 remaining)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Add type: ignore comments to 5 lines in asserts.py where dict unpacking
with OptimizationArgs causes mypy errors:

- Lines 115, 116: lazydf.collect(**kwargs) unpacking [misc, call-overload]
- Line 139: assert_frame_equal(**tol_kwargs) multiple dict unpacking [arg-type]
- Lines 297, 310: lazydf.collect(**kwargs) in assert_collect_raises [misc, call-overload]

This is a known mypy limitation with dict unpacking when keys are Literal
types. Runtime behavior is correct - the dicts contain valid optimization
flags that polars.LazyFrame.collect() accepts as keyword arguments.

Fixed 9 errors (14 → 5 remaining)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Fixed Category H (Cross-Library Types) errors:
- ir.py:601: Added version-gated type ignore for map[str] -> pl.Series
  (polars stubs incorrectly don't accept iterators, fixed upstream in
  pola-rs/polars#25228)
- sort.py:101-104: Added type ignores for plc.Column -> pl.Series
  (cross-library Arrow C Data Interface protocol boundaries)

All 42 original mypy errors in cudf_polars are now resolved.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@vyasr vyasr self-assigned this Nov 8, 2025
@vyasr vyasr requested review from a team as code owners November 8, 2025 18:57
@vyasr vyasr added the improvement Improvement / enhancement to an existing function label Nov 8, 2025
@vyasr vyasr requested a review from bdice November 8, 2025 18:57
@vyasr vyasr added the non-breaking Non-breaking change label Nov 8, 2025
@vyasr vyasr requested a review from galipremsagar November 8, 2025 18:57
@github-actions github-actions bot added Python Affects Python cuDF API. cudf-polars Issues specific to cudf-polars labels Nov 8, 2025
@vyasr
Copy link
Contributor Author

vyasr commented Nov 8, 2025

FYI @mroeschke @TomAugspurger

@GPUtester GPUtester moved this to In Progress in cuDF Python Nov 8, 2025
# these type ignores are needed because the type checker doesn't
# see that these equality checks passing imply a specific type for each child field.
# Type checker doesn't narrow polars_type through plc_type.id() checks
if self.plc_type.id() == plc.TypeId.STRUCT:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could consider implementing some TypeGuard functions for our DataType classes so that we can get better narrowing if others prefer that approach to casting like this.

@vyasr
Copy link
Contributor Author

vyasr commented Nov 11, 2025

/merge

@rapids-bot rapids-bot bot merged commit 1060d8b into rapidsai:main Nov 12, 2025
138 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in cuDF Python Nov 12, 2025
@vyasr vyasr deleted the fix/typing_polars branch November 12, 2025 17:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cudf-polars Issues specific to cudf-polars improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

6 participants