visz11 (Collaborator) commented Oct 3, 2025

(Explain how this PR changes mypy.)

Summary by CodeRabbit

  • Bug Fixes

    • Improved accuracy and consistency of type inference in Python 3.10 match statements.
    • Correct handling when matching on function calls, attributes, operations, dict items, and literal values.
    • Fewer false “unreachable” branches and clearer type narrowing in case arms.
    • Resolved a previously failing case involving capture patterns from async functions returning unions.
  • Tests

    • Added comprehensive match-pattern tests covering multiple scenarios to validate narrowing and branch reachability.

Summary of Changes

Hello @visz11, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a bug in mypy's handling of match statements, specifically when the subject of the match is not a simple variable but a more complex expression. The core change involves generating unique, temporary names for these complex subjects, which allows the type checker to perform accurate type narrowing and exhaustivity checks. The inferred type information is then correctly mapped back to the original complex expression, ensuring that mypy provides precise type feedback for advanced pattern matching constructs.

Highlights

  • Improved Type Inference for Match Statements: The pull request enhances mypy's ability to correctly infer types and check exhaustivity for match statements where the subject is a complex expression, such as a function call, attribute access, or an operation.
  • Synthetic Named Expressions: A new mechanism has been introduced to create synthetic named expressions for complex match subjects. This allows mypy to analyze these expressions more effectively for type narrowing and exhaustivity, then propagate the inferred types back to the original expressions.
  • Refactored Match Statement Visitor: The visit_match_stmt method in mypy/checker.py has been refactored to delegate the creation of these synthetic named expressions to a new helper method, _make_named_statement_for_match, improving code organization and clarity.
  • New Test Cases: Several new test cases have been added to test-data/unit/check-python310.test to validate the improved type inference for various complex match subjects, including function calls, attribute access, operations, and dictionary item access.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

coderabbitai bot commented Oct 3, 2025

Walkthrough

Introduces unique dummy-name generation and a helper that consistently names match subjects in TypeChecker, refactors visit_match_stmt to use it, and expands the Python 3.10 pattern-matching tests with additional scenarios and a previously xfailed case that is now enabled.

Changes

Cohort / File(s) | Summary
TypeChecker match subject naming refactor (mypy/checker.py) | Added private counter TypeChecker._unique_id, new method new_unique_dummy_name(namespace: str) -> str, and helper _make_named_statement_for_match(s: MatchStmt) -> Expression. Refactored visit_match_stmt to use the helper for consistent subject naming and inference.
Python 3.10 pattern matching tests (test-data/unit/check-python310.test) | Renamed an xfail test to active and added multiple new test blocks covering function call subjects, attribute subjects, operation results, dict item mapping patterns, and literal value matching, with expected type reveals and unreachable branches.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant TC as TypeChecker
  participant MS as MatchStmt
  participant Expr as Subject Expression
  participant Var as Dummy Var/NameExpr

  TC->>MS: visit_match_stmt(s)
  TC->>TC: _make_named_statement_for_match(s)
  alt Subject already named or dummy provided
    TC-->>Expr: Return existing named subject
  else Need a named subject
    TC->>TC: new_unique_dummy_name("match")
    TC->>Var: Create Var + NameExpr
    TC->>MS: Set s.subject_dummy
    TC-->>Expr: Return dummy NameExpr
  end
  TC->>TC: Perform case inference using named subject
  TC-->>MS: Complete type checking for cases
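
The flow in the diagram can be sketched with toy stand-ins. Everything below (`ToyExpr`, `ToyNameExpr`, `ToyMatchStmt`, `ToyChecker`, `make_named_subject`) is a hypothetical simplification written for illustration; it is not mypy's node hierarchy or the PR's actual helper, and it models only the "already named or dummy provided" branch versus the "create a dummy" branch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyExpr:
    """Stand-in for an arbitrary subject expression such as a call."""

    text: str


@dataclass
class ToyNameExpr(ToyExpr):
    """Stand-in for an expression that is already a plain name."""


@dataclass
class ToyMatchStmt:
    subject: ToyExpr
    subject_dummy: Optional[ToyNameExpr] = None


class ToyChecker:
    def __init__(self) -> None:
        self._unique_id = 0

    def new_unique_dummy_name(self, namespace: str) -> str:
        # Same naming scheme as the snippet quoted in the review.
        name = f"dummy-{namespace}-{self._unique_id}"
        self._unique_id += 1
        return name

    def make_named_subject(self, stmt: ToyMatchStmt) -> ToyExpr:
        # Already named: reuse the subject itself.
        if isinstance(stmt.subject, ToyNameExpr):
            return stmt.subject
        # A dummy was created on an earlier visit: reuse it.
        if stmt.subject_dummy is not None:
            return stmt.subject_dummy
        # Otherwise create a fresh dummy name and remember it on the statement.
        dummy = ToyNameExpr(self.new_unique_dummy_name("match"))
        stmt.subject_dummy = dummy
        return dummy
```

In this sketch, visiting the same statement twice returns the same dummy, so the counter only advances once per statement that needs a synthetic name.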

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I nibble on types with a twinkle bright,
Naming match subjects by moonlit night.
A fresh dummy tag—hop, hop—no fuss!
Tests bloom like clover around the bus.
With every case narrowed, I thump with delight—
Pattern parfait served just right. 🐰✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name | Status | Explanation | Resolution
Title Check | ⚠️ Warning | The title appears to mirror the branch name and includes non-descriptive elements like “bugfix/st,” making it unclear what the pull request actually changes; it does not succinctly summarize the primary change of adding unique dummy name generation and refactoring match statement handling in the type checker. | Please rename the PR to a concise, descriptive sentence that highlights the main change, for example: “Refactor match statement handling to use a helper for generating unique dummy names in TypeChecker.”
Docstring Coverage | ⚠️ Warning | Docstring coverage is 44.44% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch clone-bugfix/st-synthetic-named-expr-in-match


visz11 (Collaborator, Author) commented Oct 3, 2025

/refacto-visz

refacto-visz bot commented Oct 3, 2025

Refacto is reviewing this PR. Please wait for the review comments to be posted.

gemini-code-assist bot left a comment

Code Review

This pull request refactors and extends the logic for handling complex subjects in match statements. By creating a synthetic named expression for subjects that are not simple literals or variables, it enables more accurate type narrowing and exhaustiveness checking. The changes are well-implemented and include a comprehensive set of new tests that cover various new scenarios. My only suggestion is a minor improvement for maintainability.

Comment on lines +5509 to +5520
expressions_to_preserve = (
    # Already named - we should infer type of it as given
    NameExpr,
    AssignmentExpr,
    # Primitive literals - their type is known, no need to name them
    IntExpr,
    StrExpr,
    BytesExpr,
    FloatExpr,
    ComplexExpr,
    EllipsisExpr,
)

medium

For better maintainability and a minor performance improvement, consider defining expressions_to_preserve as a constant at the module or class level, rather than recreating it on each call to _make_named_statement_for_match.

For example, you could define it at the module level:

_MATCH_SUBJECT_EXPRS_TO_PRESERVE: Final = (
    # Already named - we should infer type of it as given
    NameExpr,
    AssignmentExpr,
    # Primitive literals - their type is known, no need to name them
    IntExpr,
    StrExpr,
    BytesExpr,
    FloatExpr,
    ComplexExpr,
    EllipsisExpr,
)

class TypeChecker(...):
    ...
    def _make_named_statement_for_match(self, s: MatchStmt) -> Expression:
        subject = s.subject
        if isinstance(subject, _MATCH_SUBJECT_EXPRS_TO_PRESERVE):
            ...

refacto-visz bot commented Oct 3, 2025

Code Review: Source Location Preservation in Match Statements

👍 Well Done
Unique ID Generation

A per-instance counter generates the unique identifiers, which prevents naming collisions between dummy subjects.

Expression Type Coverage

Comprehensive literal expression handling improves match statement reliability.

📁 Selected files for review (2)
  • mypy/checker.py
  • test-data/unit/check-python310.test
🎯 Custom Instructions
✅ Applied Instructions
Organization Guidelines

Scope: All files

Repository Guidelines
  • Preserve source‑location info when synthesising a dummy subject.
  • When reviewing Python code for this project, prioritize portability over clarity, especially when dealing with cross-Python compatibility. With that priority in mind, still consider improvements to clarity when relevant.
  • When reviewing Python code for this project, follow the code style advocated in CEP-8 as a style convention.

Scope: 1 specific files

Matched files

Paths [mypy/checker.py] matched 1 files:

  • mypy/checker.py
❌ Unapplied Instructions
refacto-api

Reason: Repository 'refacto-api' does not match current PR repository

pr-reviewer

Reason: Repository 'pr-reviewer' does not match current PR repository

bazel

Reason: Repository 'bazel' does not match current PR repository

devd-client

Reason: Repository 'devd-client' does not match current PR repository

📝 Additional Comments
mypy/checker.py (5)
Pattern Map Validation

Type map propagation assumes named_subject key exists without validation. Missing key access could cause KeyError during pattern matching analysis. Adding defensive key existence checks prevents runtime failures in type inference.

Standards:

  • ISO-IEC-25010-Reliability-Fault-Tolerance
  • DbC-Preconditions
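
A defensive version of the propagation step might look like the sketch below. It is a simplification under stated assumptions: the function name `propagate_subject_type` is hypothetical, and the map is keyed by plain strings with string "types" purely for illustration, which is not how mypy represents its type maps.

```python
from typing import Dict, Optional


def propagate_subject_type(
    type_map: Optional[Dict[str, str]], named_subject: str, subject: str
) -> None:
    """Copy the entry recorded for the dummy name onto the original subject."""
    # A missing map means there is nothing to propagate.
    if type_map is None:
        return
    # Guard the lookup so an absent dummy entry is skipped instead of raising KeyError.
    if named_subject in type_map:
        type_map[subject] = type_map[named_subject]
```

With the membership check in place, a map that never recorded the dummy name is left unchanged.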
Pattern Matching Optimization

Tuple creation and isinstance check executed for every match statement creates unnecessary allocation overhead. Pre-computing expressions_to_preserve as class constant eliminates repeated tuple construction. Performance impact scales with match statement frequency in codebase.

Standards:

  • ISO-IEC-25010-Performance-Efficiency-Resource-Utilization
  • Optimization-Pattern-Constant-Hoisting
Method Extraction Opportunity

Type map propagation logic could be extracted into a separate method for better testability and reusability. The pattern of copying type information from dummy subject to original subject may be needed elsewhere in the codebase.

Standards:

  • SOLID-SRP
  • Clean-Code-Functions
  • Refactoring-Extract-Method
Integer Overflow Risk

The counter is incremented without bound. Python integers have arbitrary precision, so the increment cannot overflow at 2^63-1 or at any other value; the remaining concern is only that the counter keeps growing in a long-running process.

Standards:

  • ISO-IEC-25010-Reliability-Maturity
  • DbC-Invariants
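
For context on the overflow concern, Python's built-in `int` is arbitrary precision, so incrementing past the signed 64-bit maximum simply yields a larger integer. The helper name `next_id` below is hypothetical and only mirrors the `+= 1` step of the counter.

```python
def next_id(current: int) -> int:
    """Return the next counter value, mirroring the `+= 1` in the reviewed code."""
    return current + 1


# Largest value a signed 64-bit integer can hold.
SIGNED_64_MAX = 2**63 - 1

# Incrementing past it does not wrap around or raise.
after_max = next_id(SIGNED_64_MAX)
```

Here `after_max` is 9223372036854775808, one more than the 64-bit maximum.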
Unique ID Overflow

The unique ID counter has no upper bound. Because Python integers have arbitrary precision, the counter cannot wrap around, so dummy names stay unique for the lifetime of a TypeChecker instance; bounds checking or UUIDs are not needed to guarantee uniqueness.

Standards:

  • CWE-190
  • NIST-SSDF-PW.1

Comment on lines +5529 to +5534
name = self.new_unique_dummy_name("match")
v = Var(name)
named_subject = NameExpr(name)
named_subject.node = v
s.subject_dummy = named_subject
return named_subject

Source Location Preservation

The synthetic NameExpr creation does not preserve source location information from the original subject expression. This violates the repository guideline requiring preservation of source-location info when synthesising dummy subjects. Missing location data can impact debugging and error reporting accuracy.

            name = self.new_unique_dummy_name("match")
            v = Var(name)
            named_subject = NameExpr(name)
            named_subject.node = v
            # Preserve source location from original subject
            named_subject.line = s.subject.line
            named_subject.column = s.subject.column
            s.subject_dummy = named_subject
            return named_subject
Committable Suggestion
Suggested change
             name = self.new_unique_dummy_name("match")
             v = Var(name)
             named_subject = NameExpr(name)
             named_subject.node = v
+            # Preserve source location from original subject
+            named_subject.line = s.subject.line
+            named_subject.column = s.subject.column
             s.subject_dummy = named_subject
             return named_subject
Standards
  • Repo-Guideline-Preserve source‑location info when synthesising a dummy subject.
  • CWE-209
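
The idea behind the suggestion can be shown in isolation with a toy node type. `ToyNode` and `copy_location` are hypothetical names used only for this sketch; the sketch assumes nothing about mypy's node classes beyond the `line` and `column` attributes that appear in the suggested change.

```python
from dataclasses import dataclass


@dataclass
class ToyNode:
    # -1 stands for "no location recorded" in this sketch.
    line: int = -1
    column: int = -1


def copy_location(target: ToyNode, source: ToyNode) -> None:
    """Give a synthesised node the same position as the node it stands in for."""
    target.line = source.line
    target.column = source.column
```

A message reported against the synthetic node would then point at the original subject's position rather than at an unset location.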

Comment on lines +5509 to +5520
expressions_to_preserve = (
    # Already named - we should infer type of it as given
    NameExpr,
    AssignmentExpr,
    # Primitive literals - their type is known, no need to name them
    IntExpr,
    StrExpr,
    BytesExpr,
    FloatExpr,
    ComplexExpr,
    EllipsisExpr,
)

Enhanced Pattern Coverage

Expression preservation tuple could benefit from additional literal types like ListExpr, DictExpr, SetExpr for comprehensive literal coverage. Current implementation handles basic literals but may create unnecessary dummy names for compound literals with known types.

Standards
  • SOLID-OCP
  • Clean-Code-Functions
  • Design-Pattern-Strategy

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
mypy/checker.py (2)

431-447: Reset the unique-id counter in reset()

For determinism across runs and to avoid unbounded growth when a TypeChecker is reused in fine-grained incremental mode, reset the counter.

Apply this diff within reset():

         self.expr_checker.reset()
+
+        # Reset unique dummy-name counter for determinism across runs
+        self._unique_id = 0
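
The effect of such a reset can be sketched with a small stand-alone generator. `ToyNameGenerator` is a hypothetical class for illustration; only the naming format and the `_unique_id` counter are taken from the reviewed code.

```python
class ToyNameGenerator:
    def __init__(self) -> None:
        self._unique_id = 0

    def new_unique_dummy_name(self, namespace: str) -> str:
        name = f"dummy-{namespace}-{self._unique_id}"
        self._unique_id += 1
        return name

    def reset(self) -> None:
        # Start numbering from zero again so a repeated run produces the same names.
        self._unique_id = 0
```

After `reset()`, the first generated name is the same as on a fresh instance, which is the determinism the comment asks for. Names stay unique only within one run between resets.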

5506-5535: Use identifier‐safe dummy names and preserve source location

  • In _make_named_statement_for_match, copy the subject’s line/column:
@@ def _make_named_statement_for_match(self, s: MatchStmt) -> Expression:
             named_subject.node = v
+            named_subject.line = subject.line
+            named_subject.column = subject.column
  • In new_unique_dummy_name, replace hyphens with underscores:
@@ def new_unique_dummy_name(self, namespace: str) -> str:
-        name = f"dummy-{namespace}-{self._unique_id}"
+        name = f"__mypy_{namespace}_subject_{self._unique_id}"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 380cb8d and ded98cb.

📒 Files selected for processing (2)
  • mypy/checker.py (7 hunks)
  • test-data/unit/check-python310.test (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
mypy/checker.py (1)
mypy/nodes.py (23)
  • ComplexExpr (1783-1797)
  • MatchStmt (1640-1667)
  • Expression (201-207)
  • NameExpr (1875-1895)
  • AssignmentExpr (2074-2087)
  • IntExpr (1696-1710)
  • StrExpr (1719-1733)
  • BytesExpr (1740-1763)
  • FloatExpr (1766-1780)
  • EllipsisExpr (1800-1806)
  • name (234-235)
  • name (353-354)
  • name (532-533)
  • name (571-577)
  • name (809-810)
  • name (907-908)
  • name (1068-1069)
  • name (2287-2288)
  • name (2572-2573)
  • name (3205-3207)
  • name (3752-3753)
  • name (3864-3865)
  • Var (973-1118)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: mypyc runtime tests with py39-macos
  • GitHub Check: Test suite with py39-windows-64
  • GitHub Check: Test suite with py311-ubuntu, mypyc-compiled
  • GitHub Check: Type check our own code (py39-windows-64)
  • GitHub Check: Formatting and code style with Black + ruff
  • GitHub Check: Type check our own code (py39-ubuntu)
  • GitHub Check: Test suite with py312-ubuntu, mypyc-compiled
  • GitHub Check: Test suite with py310-ubuntu
  • GitHub Check: Test suite with py313-ubuntu, mypyc-compiled
  • GitHub Check: Test suite with py39-ubuntu, mypyc-compiled
  • GitHub Check: Run mypy_primer (2)
  • GitHub Check: Run mypy_primer (1)
  • GitHub Check: Run mypy_primer (4)
  • GitHub Check: Run mypy_primer (3)
  • GitHub Check: Run mypy_primer (0)
  • GitHub Check: Test mypyc suite with 32-bit Python
🔇 Additional comments (8)
test-data/unit/check-python310.test (6)

1305-1305: LGTM! Test case enabled.

Removing the -xfail suffix indicates this test now passes, which aligns with the PR's fix for synthetic named expressions in match statements.


2589-2601: LGTM! Comprehensive function call matching test.

This test case correctly verifies type narrowing when matching on function return values and appropriately flags unreachable code.


2602-2615: LGTM! Attribute matching test is well-structured.

This test case appropriately verifies type narrowing when matching on object attributes and correctly expects unreachable code detection.


2616-2644: LGTM! Operations matching test covers key scenarios.

This test case comprehensively verifies type narrowing for match subjects involving unary, binary, and comparison operations, with appropriate unreachable code detection.


2645-2659: LGTM! Dict item matching test is correct.

This test case appropriately verifies type narrowing when matching on dictionary item access and includes proper unreachable code detection.


2661-2671: LGTM! Edge case test for literal matching.

This test case appropriately verifies the pathological scenario of matching a literal value, including the expected type representation and unreachable code detection.

mypy/checker.py (2)

70-70: Import of ComplexExpr is correct

Needed by expressions_to_preserve in _make_named_statement_for_match.


5421-5459: Match subject naming and type-map propagation look solid

Refactoring to a helper and propagating types from the synthetic name back to s.subject avoids losing narrowing for complex subjects. Guard handling is preserved. LGTM.

Comment on lines +7915 to +7920
def new_unique_dummy_name(self, namespace: str) -> str:
    """Generate a name that is guaranteed to be unique for this TypeChecker instance."""
    name = f"dummy-{namespace}-{self._unique_id}"
    self._unique_id += 1
    return name

⚠️ Potential issue | 🟡 Minor

Identifier safety for dummy names

Names of the form "dummy-match-<id>" contain hyphens, so they are not valid Python identifiers. Use an identifier-safe pattern to avoid surprises.

Apply:

-        name = f"dummy-{namespace}-{self._unique_id}"
+        name = f"__mypy_{namespace}_subject_{self._unique_id}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
     def new_unique_dummy_name(self, namespace: str) -> str:
         """Generate a name that is guaranteed to be unique for this TypeChecker instance."""
-        name = f"dummy-{namespace}-{self._unique_id}"
+        name = f"__mypy_{namespace}_subject_{self._unique_id}"
         self._unique_id += 1
         return name
🤖 Prompt for AI Agents
In mypy/checker.py around lines 7915-7920, the generated dummy name uses hyphens
which are not valid Python identifiers; change the format to produce
identifier-safe names (e.g. use underscores or alphanumeric-only chars),
sanitize the namespace by replacing non-identifier characters with underscores,
ensure the name does not start with a digit (prefix with a letter or underscore
if needed), and keep the uniqueness mechanism (self._unique_id) intact so the
function returns something like "dummy_<sanitized_namespace>_<unique_id>".
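
One way to follow the prompt above is sketched here as a free function. The name `identifier_safe_dummy_name` and the ASCII-only sanitising rule are assumptions made for this example; the prompt itself only asks for a result of the form "dummy_<sanitized_namespace>_<unique_id>".

```python
import re


def identifier_safe_dummy_name(namespace: str, unique_id: int) -> str:
    """Build a dummy name that is a valid Python identifier."""
    # Replace anything outside ASCII letters, digits and underscore.
    sanitized = re.sub(r"[^0-9A-Za-z_]", "_", namespace)
    # The fixed "dummy_" prefix guarantees the name never starts with a digit.
    return f"dummy_{sanitized}_{unique_id}"
```

`str.isidentifier()` can be used to compare the two schemes: it returns False for a hyphenated name such as "dummy-match-0" and True for the names this function produces.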

github-actions bot commented Oct 3, 2025

Diff from mypy_primer, showing the effect of this PR on open source code:

discord.py (https://github.com/Rapptz/discord.py)
- discord/ext/commands/hybrid.py:836: error: Overlap between argument names and ** TypedDict items: "name", "with_app_command"  [misc]
+ discord/ext/commands/hybrid.py:836: error: Overlap between argument names and ** TypedDict items: "with_app_command", "name"  [misc]
- discord/ext/commands/hybrid.py:860: error: Overlap between argument names and ** TypedDict items: "name", "with_app_command"  [misc]
+ discord/ext/commands/hybrid.py:860: error: Overlap between argument names and ** TypedDict items: "with_app_command", "name"  [misc]
- discord/ext/commands/hybrid.py:885: error: Overlap between argument names and ** TypedDict items: "name", "with_app_command"  [misc]
+ discord/ext/commands/hybrid.py:885: error: Overlap between argument names and ** TypedDict items: "with_app_command", "name"  [misc]
- discord/ext/commands/hybrid.py:937: error: Overlap between argument names and ** TypedDict items: "name", "with_app_command"  [misc]
+ discord/ext/commands/hybrid.py:937: error: Overlap between argument names and ** TypedDict items: "with_app_command", "name"  [misc]
- discord/ext/commands/bot.py:290: error: Overlap between argument names and ** TypedDict items: "name", "with_app_command"  [misc]
+ discord/ext/commands/bot.py:290: error: Overlap between argument names and ** TypedDict items: "with_app_command", "name"  [misc]
- discord/ext/commands/bot.py:314: error: Overlap between argument names and ** TypedDict items: "name", "with_app_command"  [misc]
+ discord/ext/commands/bot.py:314: error: Overlap between argument names and ** TypedDict items: "with_app_command", "name"  [misc]
