Add Polars to mypy environment and fix errors #20563
Merged
Fixed 8 mypy errors caused by accessing the .fields and .inner attributes of polars DataType subclasses after runtime checks.

The issue was that these attributes are typed as DataTypeClass | DataType, but the consuming code expects DataType. Fixed by casting the result of the attribute access to pl.DataType:
- datatype.py: Cast dtype.inner and field.dtype in _dtype_to_header (2 locations)
- datatype.py: Cast field.dtype and .inner in the children property (2 locations)
- boolean.py: Cast .inner when unwrapping a List column (1 location)
- string.py: Cast .fields when iterating Struct fields (1 location)
- struct.py: Cast .fields when accessing Struct metadata (2 locations)
- to_ast.py: Cast .inner when extracting the List inner type (1 location)

Note: isinstance() checks already narrow the parent type to pl.Struct/pl.List, so we only need to cast the attribute result, not the parent object.

Total mypy errors: 39 → 33

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
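A minimal sketch of the cast-on-attribute pattern described above, assuming hypothetical stand-in classes (DataType, Int64, List and list_inner are illustrations, not the real polars API). The stand-in declares inner with the wide class-or-instance union but normalizes it to an instance at runtime, which is the situation these casts rely on.

```python
from __future__ import annotations

from typing import cast


# Hypothetical stand-ins for the polars dtype hierarchy; NOT the real polars API.
class DataType:
    pass


class Int64(DataType):
    pass


class List(DataType):
    def __init__(self, inner: type[DataType] | DataType) -> None:
        # Declared with the wide union, but normalized to an instance at runtime.
        self.inner: type[DataType] | DataType = (
            inner() if isinstance(inner, type) else inner
        )


def list_inner(dtype: DataType) -> DataType:
    if isinstance(dtype, List):
        # isinstance() already narrows `dtype` to List; only the attribute
        # value needs a cast from the union down to DataType.
        return cast(DataType, dtype.inner)
    raise TypeError("expected a List dtype")


print(type(list_inner(List(Int64))).__name__)  # Int64
```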
Cast time_unit parameters to Literal["ns", "us", "ms"] in the pl.Datetime and pl.Duration constructors.

These casts are safe because _dtype_to_header() only writes these three values, and the exhaustive if/elif chains in _from_polars() guarantee that no other values occur.

Reduces mypy errors from 33 to 31.
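A sketch of narrowing a header string to the time-unit Literal, assuming a hypothetical TimeUnit alias and time_unit_from_header() helper. Unlike a bare cast, this variant also checks membership at runtime with typing.get_args(), so a value outside the three units fails loudly instead of being silently accepted.

```python
from typing import Literal, cast, get_args

# Hypothetical alias mirroring the three units named in the commit.
TimeUnit = Literal["ns", "us", "ms"]


def time_unit_from_header(value: str) -> TimeUnit:
    # The header writer is assumed to only ever emit "ns", "us" or "ms".
    # The membership check makes that assumption explicit at runtime;
    # the cast tells mypy the str is one of the Literal values.
    if value not in get_args(TimeUnit):
        raise ValueError(f"unexpected time unit: {value!r}")
    return cast(TimeUnit, value)


print(time_unit_from_header("us"))  # us
```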
Add an assertion and a cast for dtype.precision in _dtype_to_header to work around incorrect polars type stubs that type precision as int | None. At runtime, Decimal precision is never None.

This workaround is gated by the POLARS_VERSION_LT_136 check and can be removed when upgrading to polars >= 1.36, which will include the upstream fix.

Upstream fix: pola-rs/polars#25227

Reduces mypy errors from 31 to 30.
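A sketch of gating a stub workaround on the library version, assuming a hypothetical POLARS_VERSION string and decimal_precision() helper (in real code the flag would be derived from the installed polars). The assert checks the runtime invariant under the older version, and the cast keeps mypy satisfied regardless of which branch it considers.

```python
from __future__ import annotations

from typing import cast


def version_tuple(version: str) -> tuple[int, int]:
    # Keep only major.minor, e.g. "1.35.2" -> (1, 35).
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


# Hypothetical: in real code this string would come from the installed polars.
POLARS_VERSION = "1.35.2"
POLARS_VERSION_LT_136 = version_tuple(POLARS_VERSION) < (1, 36)


def decimal_precision(precision: int | None) -> int:
    # `precision` stands in for dtype.precision, which the older stubs type as
    # int | None even though Decimal precision is never None at runtime.
    if POLARS_VERSION_LT_136:
        # Workaround gated on the version flag so it is easy to find and delete.
        assert precision is not None
    return cast(int, precision)


print(POLARS_VERSION_LT_136, decimal_precision(38))  # True 38
```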
Changed the polars_dtype parameter type from pl.DataType to PolarsDataType (DataTypeClass | DataType) so that it accepts both type classes (e.g., pl.Int64) and instances (e.g., pl.Int64()), which polars allows interchangeably. Added runtime conversion logic that instantiates type classes when needed, with a cast to help mypy understand that the result is always a DataType instance. Also updated the signature of the test helper _as_instance() to accept PolarsDataType.

Fixes Category C errors (test DataType instantiation bugs).

Reduces mypy errors from 30 to 25.
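A sketch of accepting either a type class or an instance and normalizing to an instance, assuming hypothetical stand-ins (DataType, Int64, PolarsDataType and to_instance are not the real polars names). Because the stand-in alias uses type[DataType] rather than polars' DataTypeClass metaclass, isinstance(dtype, type) narrows it directly here; the commit notes that a cast was still needed with the real polars types.

```python
from __future__ import annotations

from typing import Union


class DataType:  # hypothetical stand-in, NOT polars.DataType
    pass


class Int64(DataType):
    pass


# Hypothetical alias mirroring PolarsDataType = DataTypeClass | DataType.
PolarsDataType = Union[type[DataType], DataType]


def to_instance(dtype: PolarsDataType) -> DataType:
    # Bare type classes (Int64) are instantiated; instances (Int64()) pass through.
    if isinstance(dtype, type):
        return dtype()
    return dtype


print(type(to_instance(Int64)).__name__, type(to_instance(Int64())).__name__)  # Int64 Int64
```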
Added casts for pytest-parametrized literal string arguments that must match specific literal types:
1. mapping_strategy parameter cast to Literal["group_to_rows", "join", "explode"]
2. closed parameter cast to Literal["left", "right", "both", "none"]

These casts are safe because pytest.mark.parametrize ensures that only the specified literal values are passed to the test functions.

Fixes Category D errors (test parametrized literals).

Reduces mypy errors from 25 to 23.
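A sketch of the Literal cast for parametrized values, assuming a hypothetical window_bounds() API and check_closed() wrapper; a plain loop stands in for pytest.mark.parametrize so the example runs without pytest.

```python
from typing import Literal, cast

# Hypothetical alias mirroring the `closed` values listed in the commit.
ClosedInterval = Literal["left", "right", "both", "none"]


def window_bounds(closed: ClosedInterval) -> tuple[bool, bool]:
    # Hypothetical API that only accepts the Literal values:
    # returns (includes_left_endpoint, includes_right_endpoint).
    return closed in ("left", "both"), closed in ("right", "both")


def check_closed(closed: str) -> tuple[bool, bool]:
    # Parametrized tests receive the value as a plain str; the cast records
    # that parametrization only ever supplies one of the Literal values.
    return window_bounds(cast(ClosedInterval, closed))


# A plain loop standing in for @pytest.mark.parametrize("closed", [...]).
for value in ["left", "right", "both", "none"]:
    print(value, check_closed(value))
```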
Cast test parameters to the expected Literal types where pytest.mark.parametrize guarantees that only the specified values are passed:
- test_rolling.py:391 - Cast fill_null strategy to Literal["forward", "backward"]
- test_config.py:538 - Cast test object() to bool for validation testing
- test_shuffle.py:80 - Cast "cpu" engine to Literal (undocumented polars value)

These casts are safe because:
1. pytest parametrize ensures that only the specified literal values are used
2. The test code intentionally passes invalid types to verify that validators catch them
3. Undocumented polars engine values work at runtime but aren't in the stubs

Progress: 42 → 21 errors remaining (50% reduction overall)
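A sketch of the deliberate-invalid-input case (the test_config.py item above), assuming a hypothetical set_flag() validator. The cast exists only to get an intentionally wrong argument past mypy so that the runtime check can be exercised.

```python
from typing import cast


def set_flag(value: bool) -> bool:
    # Hypothetical validating setter: rejects anything that is not a real bool.
    if not isinstance(value, bool):
        raise TypeError(f"expected bool, got {type(value).__name__}")
    return value


def test_set_flag_rejects_non_bool() -> None:
    # The invalid argument is deliberate; the cast only silences mypy so the
    # test can check that the runtime validator catches it.
    try:
        set_flag(cast(bool, object()))
    except TypeError:
        return
    raise AssertionError("set_flag accepted a non-bool")


test_set_flag_rejects_non_bool()
```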
Add type: ignore[method-assign, assignment] to 3 lines in plugin.py that monkey-patch polars.LazyFrame.collect for the GPU test infrastructure.

This is a known mypy limitation (python/mypy#2427): mypy cannot express reassigning an overloaded method with a partialmethod descriptor. The runtime behavior is correct, and this is a standard pytest plugin pattern.

Fixed 6 errors (20 → 14 remaining)
Add type: ignore comments to 5 lines in asserts.py where dict unpacking with OptimizationArgs causes mypy errors:
- Lines 115, 116: lazydf.collect(**kwargs) unpacking [misc, call-overload]
- Line 139: assert_frame_equal(**tol_kwargs) multiple dict unpacking [arg-type]
- Lines 297, 310: lazydf.collect(**kwargs) in assert_collect_raises [misc, call-overload]

This is a known mypy limitation with dict unpacking when the keys are Literal types. The runtime behavior is correct: the dicts contain valid optimization flags that polars.LazyFrame.collect() accepts as keyword arguments.

Fixed 9 errors (14 → 5 remaining)
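A sketch showing why the unpacking is fine at runtime, assuming a hypothetical two-flag OptimizationArgs alias and a collect() stand-in that simply echoes its flags. Literal keys are ordinary strings once the program runs, so ** unpacking passes them through as keyword arguments; the type: ignore only addresses what mypy can check statically.

```python
from typing import Literal

# Hypothetical subset of optimization flag names, typed as Literal keys.
OptimizationArgs = Literal["predicate_pushdown", "projection_pushdown"]


def collect(
    *, predicate_pushdown: bool = True, projection_pushdown: bool = True
) -> dict[str, bool]:
    # Hypothetical stand-in for LazyFrame.collect that echoes its flags.
    return {
        "predicate_pushdown": predicate_pushdown,
        "projection_pushdown": projection_pushdown,
    }


kwargs: dict[OptimizationArgs, bool] = {"predicate_pushdown": False}

# At runtime the Literal keys are plain strings, so unpacking works. mypy may
# still reject a call like this; the exact error codes depend on the callee.
result = collect(**kwargs)  # type: ignore
print(result)  # {'predicate_pushdown': False, 'projection_pushdown': True}
```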
Fixed Category H (Cross-Library Types) errors:
- ir.py:601: Added a version-gated type ignore for map[str] -> pl.Series (the polars stubs incorrectly don't accept iterators; fixed upstream in pola-rs/polars#25228)
- sort.py:101-104: Added type ignores for plc.Column -> pl.Series (cross-library Arrow C Data Interface protocol boundaries)

All 42 original mypy errors in cudf_polars are now resolved.
galipremsagar approved these changes on Nov 8, 2025
bdice approved these changes on Nov 8, 2025
vyasr commented on Nov 8, 2025
# these type ignores are needed because the type checker doesn't
# see that these equality checks passing imply a specific type for each child field.
# Type checker doesn't narrow polars_type through plc_type.id() checks
if self.plc_type.id() == plc.TypeId.STRUCT:
We could consider implementing TypeGuard functions for our DataType classes to get better narrowing, if others prefer that approach to casting like this.
Matt711 requested changes on Nov 8, 2025
TomAugspurger approved these changes on Nov 10, 2025
Matt711 approved these changes on Nov 10, 2025
/merge
Labels
- cudf-polars: Issues specific to cudf-polars
- improvement: Improvement / enhancement to an existing function
- non-breaking: Non-breaking change
- Python: Affects Python cuDF API.
Description
#19072 made many of the necessary fixes, but polars was never actually added to the pre-commit mypy environment, so we have not been type-checking against it since then. As a result, some new issues have crept in, and #20272 removed various ignores that are required for polars type safety; mypy could not tell they were needed without polars available.
Checklist