Optimize dependency iteration #13489

Open
wants to merge 2 commits into main
Conversation

killerdevildog

…cies

PROBLEM IDENTIFIED:
Before this optimization, dependencies were being parsed twice during candidate resolution:

  1. During _check_metadata_consistency() for validation (line 233 in original code)
  2. During iter_dependencies() for actual dependency resolution (line 258)

This caused redundant parsing work because (see the sketch after this list):

  • dist.iter_provided_extras() was called multiple times
  • dist.iter_dependencies() was called multiple times
  • Parsing requirements from package metadata is computationally expensive
  • The TODO comment at line 230 specifically noted this performance problem
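
For orientation, a hedged sketch of the duplicated work described above (the bodies are illustrative, not pip's exact code; the "~line" numbers refer to the original candidates.py):

```python
class Candidate:
    def __init__(self, dist):
        self.dist = dist

    def _check_metadata_consistency(self, dist) -> None:
        # ~line 233: the metadata is parsed once purely for validation
        list(dist.iter_provided_extras())
        list(dist.iter_dependencies())

    def iter_dependencies(self, with_requires: bool):
        # ~line 258: the same metadata is parsed again for resolution
        if with_requires:
            yield from self.dist.iter_dependencies()
```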

SOLUTION IMPLEMENTED:
Added a caching mechanism with two new instance variables (sketched after this list):

  • _cached_dependencies: stores list[Requirement] after parsing once
  • _cached_extras: stores list[NormalizedName] after parsing once
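
A minimal sketch of the new state, using a simplified stand-in for the candidate class (the two attribute names are the ones this PR adds; everything else here is abbreviated):

```python
from typing import List, Optional

from pip._vendor.packaging.requirements import Requirement
from pip._vendor.packaging.utils import NormalizedName


class Candidate:
    def __init__(self, dist) -> None:
        self.dist = dist
        # Both caches start as None and are filled the first time the
        # metadata is parsed, then reused on every later access.
        self._cached_dependencies: Optional[List[Requirement]] = None
        self._cached_extras: Optional[List[NormalizedName]] = None
```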

HOW THE CACHING WORKS (sketched after this list):

  1. Cache variables are initialized as None in __init__()
  2. During _prepare() -> _check_metadata_consistency(), dependencies are parsed and cached as part of validation
  3. During iter_dependencies(), the cached results are reused via _get_cached_dependencies()
  4. Cache is populated lazily - only when first accessed
  5. Subsequent calls to iter_dependencies() use the cached data (no re-parsing)
  6. Each candidate instance has its own cache, so no state is shared across candidates
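
Putting those steps together, a hedged sketch of the lazy cache (the helper name _get_cached_dependencies is from this PR; its body here is an illustrative guess, not the exact diff):

```python
from typing import Iterator, List, Optional

from pip._vendor.packaging.requirements import Requirement


class Candidate:
    def __init__(self, dist) -> None:
        self.dist = dist
        self._cached_dependencies: Optional[List[Requirement]] = None  # step 1

    def _get_cached_dependencies(self) -> List[Requirement]:
        # Step 4: populate lazily, on first access only.
        if self._cached_dependencies is None:
            self._cached_dependencies = list(self.dist.iter_dependencies())
        return self._cached_dependencies

    def _check_metadata_consistency(self) -> None:
        # Step 2: validation triggers the parse and fills the cache.
        for _req in self._get_cached_dependencies():
            pass  # actual consistency checks elided

    def iter_dependencies(self, with_requires: bool) -> Iterator[Requirement]:
        # Steps 3 and 5: reuse the cached Requirement objects; no re-parse.
        if with_requires:
            yield from self._get_cached_dependencies()
```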

ADDITIONAL OPTIMIZATIONS:

  • Also optimized ExtrasCandidate.iter_dependencies() to cache iter_provided_extras() results (sketched after this list)
  • Ensures consistency between validation and dependency resolution phases
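
A sketch of the ExtrasCandidate side under the same idea (the attribute and helper names here are hypothetical, not necessarily those in the diff):

```python
from typing import FrozenSet, Optional

from pip._vendor.packaging.utils import NormalizedName


class ExtrasCandidate:
    def __init__(self, base) -> None:
        self.base = base
        self._cached_provided_extras: Optional[FrozenSet[NormalizedName]] = None

    def _provided_extras(self) -> FrozenSet[NormalizedName]:
        # Consume iter_provided_extras() once; the validation and the
        # resolution phases then see the same frozen set.
        if self._cached_provided_extras is None:
            self._cached_provided_extras = frozenset(
                self.base.dist.iter_provided_extras()
            )
        return self._cached_provided_extras
```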

TESTING PERFORMED:

  1. Created comprehensive test script (test_performance_optimization.py)
  2. Used mock objects to verify iter_provided_extras() and iter_dependencies() are called at most once (illustrated after this list)
  3. Verified pip install --dry-run works correctly with caching
  4. Test results showed 0 additional calls to parsing methods during multiple iter_dependencies() invocations
  5. Functional testing confirmed dependency resolution still works correctly
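
An illustrative version of the call-count check from item 2, written against the Candidate sketch above (test_performance_optimization.py itself is not reproduced here):

```python
from unittest import mock

dist = mock.Mock()
dist.iter_dependencies.return_value = []
dist.iter_provided_extras.return_value = []

candidate = Candidate(dist)  # the cached sketch class from above
list(candidate.iter_dependencies(with_requires=True))
list(candidate.iter_dependencies(with_requires=True))

# With the cache in place, the metadata parse runs at most once even
# across repeated iter_dependencies() invocations.
assert dist.iter_dependencies.call_count <= 1
```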

PERFORMANCE IMPACT:

  • Eliminates duplicate parsing during metadata consistency checks
  • Reduces CPU time for packages with complex dependency trees
  • Especially beneficial for packages with many dependencies
  • Memory overhead is minimal (only stores parsed results, not raw metadata)

Resolves the TODO comment about performance in candidates.py (line 230).

Added a news fragment documenting the performance improvement: parsed dependencies and extras are cached to eliminate redundant parsing during candidate evaluation in dependency resolution.
@notatallshaw (Member)

Thanks for your PR to pip. Please be aware that all maintainers are volunteers, so it may take a moment for someone to review.

Let me know if you need any help fixing the linting and pre-commit errors; you should be able to run them locally by following this guide: https://pip.pypa.io/en/stable/development/getting-started/#running-linters

Also, are there real world scenarios that inspired you to fix this? Or is this more of an academic exercise?
