feat: add empty_streams filtering to standard test suites #641

Open: wants to merge 3 commits into base: main

devin-ai-integration[bot]
Contributor

@devin-ai-integration devin-ai-integration bot commented Jul 3, 2025

Add empty_streams filtering to standard test suites

Summary

This PR implements filtering functionality for the Python CDK standard test suites to exclude streams declared as empty_streams in acceptance-test-config.yml files. This prevents premium/unsupported streams from causing test failures in sandbox environments where connectors may not have access to all stream types.

Key Changes:

  • Added EmptyStreamConfig model and empty_streams field to ConnectorTestScenario
  • Extended get_scenarios() to include "basic_read" category for proper test parametrization
  • Implemented catalog filtering in test_basic_read() and test_docker_image_build_and_read() methods
  • Streams listed in empty_streams configuration are now excluded from configured catalogs before test execution
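The changes above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: it uses a plain dataclass and dict-shaped catalogs rather than the CDK's own models, `filter_empty_streams` is a hypothetical helper name, and the stream names are made up.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmptyStreamConfig:
    """One `empty_streams` entry (name + bypass_reason, per the PR description)."""

    name: str
    bypass_reason: Optional[str] = None


def filter_empty_streams(configured_catalog: dict, empty_streams: list) -> dict:
    """Return a copy of the catalog without the streams named in `empty_streams`.

    Hypothetical helper: the catalog is modeled as a plain dict here.
    """
    excluded = {stream.name for stream in empty_streams}
    kept = [
        entry
        for entry in configured_catalog.get("streams", [])
        if entry["stream"]["name"] not in excluded
    ]
    # Build a new dict so the caller's catalog is left untouched.
    return {**configured_catalog, "streams": kept}


# Illustrative stream names, not taken from any real connector.
catalog = {
    "streams": [
        {"stream": {"name": "profiles"}, "sync_mode": "full_refresh"},
        {"stream": {"name": "premium_reports"}, "sync_mode": "full_refresh"},
    ]
}
filtered = filter_empty_streams(
    catalog, [EmptyStreamConfig("premium_reports", "Not available in sandbox")]
)
print([entry["stream"]["name"] for entry in filtered["streams"]])  # prints ['profiles']
```

Returning a new catalog instead of mutating the input keeps the scenario's original catalog available to other tests; whether the PR does the same is not stated here.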

Review & Testing Checklist for Human

  • Test with real connector: Verify functionality works with source-amazon-ads or another connector that has empty_streams configuration in acceptance-test-config.yml
  • Backward compatibility: Ensure connectors without empty_streams config still work normally (no regressions)
  • Config structure validation: Verify EmptyStreamConfig model matches actual structure used in acceptance-test-config.yml files (name + bypass_reason fields)
  • Category extension impact: Check if adding "basic_read" to categories has any unintended side effects on test discovery or execution
  • Edge case handling: Test with malformed or missing empty_streams configs to ensure graceful handling
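For the edge-case item, one possible way to normalize the `empty_streams` value is sketched below. `parse_empty_streams` is a hypothetical name, and the choice to skip malformed entries silently (rather than raise) is an assumption, as is tolerating bare stream names.

```python
from __future__ import annotations

from typing import Any


def parse_empty_streams(raw: Any) -> list:
    """Normalize an `empty_streams` value into a list of {name, bypass_reason} dicts.

    A missing or null value yields an empty list. Entries without a usable
    name are skipped (an assumption: a stricter parser could raise instead).
    """
    if not isinstance(raw, list):
        return []
    parsed = []
    for entry in raw:
        if isinstance(entry, str):
            # Assumption: tolerate a bare stream name with no bypass_reason.
            entry = {"name": entry}
        if not isinstance(entry, dict):
            continue
        name = entry.get("name")
        if not isinstance(name, str) or not name.strip():
            continue
        parsed.append({"name": name.strip(), "bypass_reason": entry.get("bypass_reason")})
    return parsed


print(parse_empty_streams(None))  # prints []
```

A connector with no `empty_streams` section then falls through to an empty exclusion list, which is the backward-compatible behavior the checklist asks for.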

Recommended Test Plan:

  1. Run standard tests on source-amazon-ads with its existing empty_streams configuration
  2. Run tests on a connector without empty_streams config to verify no regressions
  3. Verify filtered streams are actually excluded from read operations during tests
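Step 3 could be checked along these lines. This is a sketch that assumes the connector's output is available as JSON lines in the Airbyte protocol; `streams_seen` is a hypothetical helper and the sample output is invented. Because an empty stream emits no records anyway, the sketch also looks at `STREAM_STATUS` trace messages.

```python
import json


def streams_seen(output_lines):
    """Collect stream names that appear in RECORD or STREAM_STATUS trace messages."""
    seen = set()
    for line in output_lines:
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # Skip any non-JSON log lines.
        if not isinstance(message, dict):
            continue
        if message.get("type") == "RECORD":
            seen.add(message["record"]["stream"])
        elif message.get("type") == "TRACE":
            trace = message.get("trace", {})
            if trace.get("type") == "STREAM_STATUS":
                seen.add(trace["stream_status"]["stream_descriptor"]["name"])
    return seen


# Invented sample output for illustration only.
sample_output = [
    '{"type": "RECORD", "record": {"stream": "profiles", "data": {"id": 1}, "emitted_at": 0}}',
    '{"type": "TRACE", "trace": {"type": "STREAM_STATUS", "emitted_at": 0, '
    '"stream_status": {"stream_descriptor": {"name": "campaigns"}, "status": "COMPLETE"}}}',
    "plain log line",
]
leaked = streams_seen(sample_output) & {"premium_reports"}
assert not leaked, f"Excluded streams were read: {sorted(leaked)}"
```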

Diagram

%%{ init : { "theme" : "default" }}%%
graph TB
    subgraph "Test Framework"
        A["docker_base.py<br/>get_scenarios()"]:::major-edit
        B["docker_base.py<br/>test_docker_image_build_and_read()"]:::major-edit
        C["source_base.py<br/>test_basic_read()"]:::major-edit
        D["scenario.py<br/>ConnectorTestScenario"]:::major-edit
    end
    
    subgraph "Configuration"
        E["acceptance-test-config.yml<br/>empty_streams"]:::context
    end
    
    subgraph "Test Execution"
        F["ConfiguredAirbyteCatalog<br/>filtered streams"]:::context
        G["Connector Read Operations"]:::context
    end
    
    E --> D
    D --> A
    A --> C
    A --> B
    C --> F
    B --> F
    F --> G
    
    subgraph Legend
        L1["Major Edit"]:::major-edit
        L2["Minor Edit"]:::minor-edit  
        L3["Context/No Edit"]:::context
    end
    
    classDef major-edit fill:#90EE90
    classDef minor-edit fill:#87CEEB
    classDef context fill:#FFFFFF

Notes

  • This implementation follows the same filtering pattern used elsewhere in the test framework (e.g., read_from_streams filtering)
  • The EmptyStreamConfig model structure matches the format seen in source-amazon-ads acceptance-test-config.yml
  • Changes are minimal and focused on the specific requirement to prevent sandbox test failures
  • Session requested by: @aaronsteers
  • Devin session: https://app.devin.ai/sessions/435cfd8cf7624b6d93e8a1b3a3ed2794

- Add EmptyStreamConfig model and empty_streams field to ConnectorTestScenario
- Extend get_scenarios() to include basic_read category
- Implement catalog filtering in test_basic_read and test_docker_image_build_and_read methods
- Exclude streams declared as empty_streams from test execution to prevent failures in sandbox environments

Co-Authored-By: AJ Steers <[email protected]>
Contributor Author

Original prompt from AJ Steers:

@Devin - locate the new standard test suites within the Python CDK. We need to add a filter to exclude streams from tests if they are declared as having `empty_streams` in the acceptance-test-config.yml file for the connector.

Basically this involves filtering the discovered catalog to exclude certain streams before passing it back in a configured_catalog to the source for read operations. (Reason is that our sandbox accounts often do not have support for other "premium" streams, and they fail hard in this scenario.)

Example in source-amazon-ads.

Contributor Author

devin-ai-integration bot commented Jul 3, 2025

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions github-actions bot added the enhancement New feature or request label Jul 3, 2025
github-actions bot commented Jul 3, 2025

PyTest Results (Fast)

3 685 tests  ±0   3 674 ✅ ±0   6m 20s ⏱️ +2s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 86fd9a6. ± Comparison against base commit f6a38dd.

♻️ This comment has been updated with latest results.

github-actions bot commented Jul 3, 2025

PyTest Results (Full)

3 688 tests   3 677 ✅  17m 54s ⏱️
    1 suites     11 💤
    1 files        0 ❌

Results for commit 86fd9a6.

♻️ This comment has been updated with latest results.

@aaronsteers aaronsteers closed this Jul 3, 2025
@aaronsteers aaronsteers reopened this Jul 3, 2025
github-actions bot commented Jul 3, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1751411517-filter-empty-streams-from-tests#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1751411517-filter-empty-streams-from-tests

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe <command> - Runs any poe command in the CDK environment

@aaronsteers aaronsteers self-requested a review July 3, 2025 19:13
@aaronsteers
Contributor

aaronsteers commented Jul 3, 2025

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formatting issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.
