Skip to content

new: decrease number of vectors where possible #937

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: dev
Choose a base branch
from

Conversation

joein
Copy link
Member

@joein joein commented Mar 25, 2025

this pr contains all the places where we can decrease the number of vectors in congruence tests without sacrificing payload filters

most of the tests do not bring any significant changes

the most time consuming tests (locally):

test_query.py 56.33 -> 52.52 (test_sparse_query, test_flat_query_sparse_interface)
test_retrieve.py 9.45 -> 1.43
test_sparse_recommend.py 18.6 -> 8.46
test_sparse_search.py 39.45 -> 37.04

the others reducing time spent even less, added just for coherence and accumulated time savings

also time_migrate.py 36.28 -> 14.79 (since it is used in both regular and backward compat tests, saves up to 30 seconds)

Copy link

netlify bot commented Mar 25, 2025

Deploy Preview for poetic-froyo-8baba7 ready!

Name Link
🔨 Latest commit 3ee6865
🔍 Latest deploy log https://app.netlify.com/sites/poetic-froyo-8baba7/deploys/67e3b3f7aec16e0008617fdd
😎 Deploy Preview https://deploy-preview-937--poetic-froyo-8baba7.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@joein joein marked this pull request as draft March 25, 2025 21:13
Copy link

coderabbitai bot commented Mar 25, 2025

📝 Walkthrough

Walkthrough

The pull request updates multiple test files by explicitly passing a fixed numeric value to fixture generation functions. In many tests, calls to functions such as generate_fixtures, generate_sparse_fixtures, and generate_multivector_fixtures now include an explicit integer (typically 100 or 10) to determine the number of fixtures or vectors generated. The changes replace previous behavior that depended on defaults or constants, standardizing the test data creation. Additionally, one test file replaces a module-level constant with a local variable set to 100. A new test function is introduced to validate recommendation filters in sparse data tests. The integration test script has been adjusted to add the -x option to the pytest commands, ensuring the tests stop after the first failure.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/congruence_tests/test_sparse_recommend.py (1)

260-280: New test case for filter recommendations

This new test function adds important coverage for recommendation filtering functionality, which is a valuable addition to the test suite.

Consider explicitly specifying the number of fixtures in line 261 similar to the other tests, for consistency:

-    fixture_points = generate_sparse_fixtures()
+    fixture_points = generate_sparse_fixtures(100)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 983fce8 and 6107691.

📒 Files selected for processing (13)
  • tests/congruence_tests/test_collections.py (2 hunks)
  • tests/congruence_tests/test_discovery.py (1 hunks)
  • tests/congruence_tests/test_group_search.py (2 hunks)
  • tests/congruence_tests/test_optional_vectors.py (3 hunks)
  • tests/congruence_tests/test_query.py (17 hunks)
  • tests/congruence_tests/test_query_batch.py (3 hunks)
  • tests/congruence_tests/test_recommendation.py (1 hunks)
  • tests/congruence_tests/test_retrieve.py (2 hunks)
  • tests/congruence_tests/test_search.py (2 hunks)
  • tests/congruence_tests/test_sparse_idf_search.py (2 hunks)
  • tests/congruence_tests/test_sparse_recommend.py (3 hunks)
  • tests/congruence_tests/test_sparse_search.py (1 hunks)
  • tests/integration-tests.sh (1 hunks)
🧰 Additional context used
🧬 Code Definitions (7)
tests/congruence_tests/test_recommendation.py (2)
tests/congruence_tests/test_search_distance_matrix.py (1)
  • fixture_points (22-23)
tests/congruence_tests/test_common.py (1)
  • generate_fixtures (117-132)
tests/congruence_tests/test_sparse_recommend.py (2)
tests/congruence_tests/test_common.py (1)
  • generate_sparse_fixtures (135-153)
tests/congruence_tests/test_recommendation.py (1)
  • TestSimpleRecommendation (23-203)
tests/congruence_tests/test_sparse_search.py (2)
tests/congruence_tests/test_sparse_discovery.py (1)
  • fixture_points (27-28)
tests/congruence_tests/test_common.py (1)
  • generate_sparse_fixtures (135-153)
tests/congruence_tests/test_discovery.py (2)
tests/congruence_tests/test_search_distance_matrix.py (1)
  • fixture_points (22-23)
tests/congruence_tests/test_common.py (1)
  • generate_fixtures (117-132)
tests/congruence_tests/test_group_search.py (2)
tests/congruence_tests/test_search_distance_matrix.py (1)
  • fixture_points (22-23)
tests/congruence_tests/test_common.py (1)
  • generate_fixtures (117-132)
tests/congruence_tests/test_query_batch.py (1)
tests/congruence_tests/test_common.py (3)
  • generate_sparse_fixtures (135-153)
  • generate_multivector_fixtures (156-172)
  • generate_fixtures (117-132)
tests/congruence_tests/test_query.py (1)
tests/congruence_tests/test_common.py (3)
  • generate_fixtures (117-132)
  • generate_sparse_fixtures (135-153)
  • generate_multivector_fixtures (156-172)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Redirect rules - poetic-froyo-8baba7
  • GitHub Check: Header rules - poetic-froyo-8baba7
  • GitHub Check: Pages changed - poetic-froyo-8baba7
  • GitHub Check: Python 3.13.x on ubuntu-latest test
  • GitHub Check: Python 3.12.x on ubuntu-latest test
  • GitHub Check: Python 3.11.x on ubuntu-latest test
  • GitHub Check: Python 3.10.x on ubuntu-latest test
  • GitHub Check: Python 3.9.x on ubuntu-latest test
  • GitHub Check: Python 3.13 test
🔇 Additional comments (26)
tests/integration-tests.sh (2)

41-43: Consistent Test Halt Behavior with -x Option

The addition of the -x flag to the pytest command (line 42) ensures that the test suite stops after the first failure. This can speed up the feedback loop during development. Please confirm that this behavior is intended for all environments, as halting on the first failure might mask additional errors during full test runs.


43-45: Uniform Application of the -x Flag Across Conditional Branches

Both branches explicitly pass -x in the pytest commands (lines 42 and 44), standardizing the test run behavior regardless of whether backwards compatibility tests are skipped. Ensure that this change aligns with the overall testing strategy, especially if any downstream CI/CD processes expect the full suite to complete even when failures occur.

tests/congruence_tests/test_sparse_idf_search.py (2)

39-44: Standardizing test vector count to improve efficiency.

Adding an explicit parameter of 100 vectors standardizes the test data size, which aligns with the PR objective of decreasing the number of vectors where possible. This will make the test more efficient while maintaining adequate coverage.


93-98: Consistent vector count applied to persistence test.

Good consistency by applying the same vector count (100) to the persistence test. This ensures uniform test behavior between the simple search and persistence tests.

tests/congruence_tests/test_recommendation.py (1)

277-278: Standardized test vector count for NaN handling test.

Explicitly specifying 100 vectors for this test ensures consistency with other tests and reduces resource usage, aligning with the PR objective of decreasing vector count where possible.

tests/congruence_tests/test_retrieve.py (2)

16-18: Reduced vector count to improve test efficiency.

Changing from a higher number to 100 vectors will make the retrieval test more efficient while still providing adequate test coverage. This change is consistent with other similar modifications across the test suite.


76-78: Applied consistent vector count reduction to sparse retrieve test.

Good consistency by applying the same vector count (100) to the sparse retrieval test. This creates uniformity across different test scenarios while improving test execution speed.

tests/congruence_tests/test_discovery.py (1)

375-376: Standardized vector count for NaN error case testing.

Explicitly specifying 100 vectors for this NaN handling test ensures consistency with other similar tests across the test suite. This maintains proper test coverage while improving efficiency.

From reviewing all changes, this appears to be part of a consistent strategy to standardize test fixture generation across the codebase.

tests/congruence_tests/test_sparse_search.py (1)

291-291: Explicitly setting the number of fixtures to 100 aligns with the PR objective.

The addition of the explicit count parameter to generate_sparse_fixtures function makes the test more consistent and helps reduce the number of vectors used in testing, which was the goal of this PR.

tests/congruence_tests/test_optional_vectors.py (4)

20-21: Great improvement by making the vector count explicit.

Replacing the imported constant with a local variable and explicitly passing it to the generate_fixtures function makes the test more self-contained and readable. This change supports the PR's goal of decreasing vector counts where possible.


29-29: Consistent use of the local variable for vector count.

The list comprehension now uses the local num_vectors variable which ensures consistency throughout the test.


93-94: Consistent approach applied to sparse vectors test.

Using the same local variable pattern for the sparse vectors test maintains consistency across both test functions.


112-112: Consistent use of variable in the second list comprehension.

This change ensures all vector-related ranges in the test use the same count variable.

tests/congruence_tests/test_search.py (2)

323-324: Explicitly using a smaller number of fixtures (10) for this invalid vector type test.

This is a good optimization since testing invalid vector types doesn't require a large number of fixtures. This supports the PR objective of reducing vector counts where appropriate.


340-341: Using a smaller vector count (10) for NaN value test.

Appropriate reduction in vector count for this test case, which only needs to verify behavior with NaN values and doesn't require numerous vectors.

tests/congruence_tests/test_group_search.py (2)

219-220: Explicitly setting fixture count to 100 while maintaining vector size.

This change makes the test more explicit about resource usage while preserving the original vector size parameter. Good implementation of the PR objective.


247-247: Consistent use of explicit fixture count (100).

This second instance of generate_fixtures is also updated to use the same explicit count, ensuring consistency throughout the test file.

tests/congruence_tests/test_sparse_recommend.py (3)

207-207: Standardizing fixture generation count

Explicitly passing 100 as the parameter to generate_sparse_fixtures() improves test consistency by specifying a fixed number of test vectors rather than relying on the default.


209-209: Reducing secondary collection size to optimize test

Good change reducing the secondary collection size from 100 to 50 points, which aligns with the PR objective to decrease the number of vectors used in tests where possible.


296-296: Optimizing test fixture count

Good change reducing the number of fixtures from default to just 10 for the negative test case, which is sufficient for testing error handling with NaN values.

tests/congruence_tests/test_collections.py (2)

21-21: Standardizing fixture generation count

Explicitly setting the number of fixtures to 100 provides consistent test behavior and memory usage across test runs.


109-109: Consistent explicit fixture sizing

Good change adding an explicit count parameter to generate_fixtures() while maintaining the vectors_sizes parameter, ensuring consistent behavior across all test cases.

tests/congruence_tests/test_query_batch.py (3)

138-138: Consistent fixture count specification

Explicitly specifying 100 fixtures rather than relying on the default value aligns with the PR's objective to standardize vector counts across tests.


152-152: Standardized multivector fixture count

Good standardization of the multivector fixture count to match the pattern established in other tests.


167-167: Consistent dense fixture count specification

The explicit count parameter for dense fixtures ensures consistent test environment across the test suite.

tests/congruence_tests/test_query.py (1)

766-766: Systematic standardization of fixture counts across query tests

These changes systematically specify exact vector counts (either 100 or 10 depending on the test requirements) rather than relying on default values. This approach:

  1. Makes the tests more deterministic and explicit
  2. Reduces overall vector count where appropriate
  3. Standardizes the testing approach across the codebase
  4. Aligns with the PR objective to decrease vectors where possible

The consistent application of this pattern across all test functions demonstrates a thoughtful approach to optimizing the test suite.

Also applies to: 830-830, 844-844, 878-878, 890-890, 927-927, 954-954, 966-966, 1027-1027, 1044-1044, 1055-1055, 1068-1068, 1273-1273, 1295-1295, 1338-1338, 1354-1354, 1366-1366

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants