Conversation

@maxi297
Contributor

@maxi297 maxi297 commented Nov 10, 2025

What

When multiple threads for the same stream start at the same time, we load the properties from the endpoint multiple times instead of reusing the cache.

Summary by CodeRabbit

  • Refactor
    • Improved thread-safety mechanisms for endpoint property caching to enhance reliability during concurrent operations.

@github-actions github-actions bot added the bug (Something isn't working) and security labels Nov 10, 2025
@github-actions

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/fix_properties_from_endpoint_cache#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/fix_properties_from_endpoint_cache

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment


@maxi297
Contributor Author

maxi297 commented Nov 10, 2025

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formatting issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

🟦 Job completed successfully (no changes).

@coderabbitai
Contributor

coderabbitai bot commented Nov 10, 2025

📝 Walkthrough

Adds thread safety to the PropertiesFromEndpoint class by introducing an RLock that protects initialization of the cached properties. Implements the double-checked locking pattern so that the _cached_properties cache is lazily initialized exactly once, even under concurrent access.
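The double-checked locking pattern described above can be sketched roughly as follows. This is a minimal illustration, not the CDK code: the class `CachedPropertiesLoader`, the method `_load_from_endpoint`, the `load_count` counter, and the returned property names are all hypothetical stand-ins. Only the `_cached_properties` attribute name and the use of an RLock come from the walkthrough.

```python
import threading
import time
from typing import List, Optional


class CachedPropertiesLoader:
    """Hypothetical stand-in for a component that lazily caches properties."""

    def __init__(self) -> None:
        self._cached_properties: Optional[List[str]] = None
        self._lock = threading.RLock()
        self.load_count = 0  # how many times the slow load actually ran

    def _load_from_endpoint(self) -> List[str]:
        # Placeholder for the real endpoint read (an assumption for this sketch).
        time.sleep(0.01)  # widen the race window so threads overlap
        self.load_count += 1
        return ["id", "name", "email"]

    def get_properties(self) -> List[str]:
        # First check: skip the lock entirely once the cache is populated.
        if self._cached_properties is None:
            with self._lock:
                # Second check: another thread may have filled the cache
                # while this one was waiting for the lock.
                if self._cached_properties is None:
                    self._cached_properties = self._load_from_endpoint()
        return self._cached_properties


loader = CachedPropertiesLoader()
threads = [threading.Thread(target=loader.get_properties) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

In this sketch the slow load runs once no matter how many threads call `get_properties` concurrently, because the second check happens while the lock is held.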

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Thread-safety for endpoint property caching: airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py | Introduces RLock initialization in __post_init__ and wraps cache population logic with lock-based double-checked locking pattern for thread-safe lazy initialization |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor T1 as Thread 1
    actor T2 as Thread 2
    participant C as PropertiesFromEndpoint
    
    par Concurrent Access
        T1->>C: Get cached properties (check 1)
        T2->>C: Get cached properties (check 1)
    and
        Note over T1,T2: Both see cache uninitialized
    end
    
    par Lock Acquisition
        T1->>C: Acquire lock
        T2->>C: Wait for lock
    and
        T1->>C: Check cache again (check 2)
        T1->>C: Populate cache
        T1->>C: Release lock
    end
    
    T2->>C: Acquire lock
    T2->>C: Check cache (already populated)
    T2->>C: Return cached value
    T2->>C: Release lock
    
    Note over C: Double-checked locking<br/>prevents race conditions

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

  • Verify that double-checked locking pattern is correctly implemented (first check before lock, second check inside lock block)
  • Ensure RLock initialization in __post_init__ won't cause issues with object instantiation or serialization patterns used in the codebase, wdyt?
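One concrete way the serialization concern in the second point can surface: a `threading.RLock` cannot be deep-copied or pickled, so any object holding one inherits that limitation unless it drops the lock from its state. The sketch below illustrates this with a hypothetical dataclass, `LockedComponent`. It is not the CDK class, and whether the codebase ever copies or pickles these components is not established in this PR.

```python
import copy
import threading
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class LockedComponent:
    """Hypothetical component that creates its lock in __post_init__."""

    name: str

    def __post_init__(self) -> None:
        self._lock = threading.RLock()

    def __getstate__(self) -> Dict[str, Any]:
        # Leave the lock out of the copied/pickled state.
        state = self.__dict__.copy()
        state.pop("_lock", None)
        return state

    def __setstate__(self, state: Dict[str, Any]) -> None:
        # Restore the attributes and give the new object its own lock.
        self.__dict__.update(state)
        self._lock = threading.RLock()


def can_deepcopy(obj: Any) -> bool:
    """Return True if copy.deepcopy succeeds, False if it raises TypeError."""
    try:
        copy.deepcopy(obj)
        return True
    except TypeError:
        return False


raw_lock_copyable = can_deepcopy(threading.RLock())
original = LockedComponent(name="properties")
clone = copy.deepcopy(original)
```

Here a bare RLock fails to deep-copy, while the component copies cleanly because `__getstate__` and `__setstate__` exclude and then recreate the lock. Pickle uses the same two hooks.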

Possibly related PRs

Suggested reviewers

  • brianjlai

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding thread-safety to the properties-from-endpoint cache using locks. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (1)

28-28: Good lock initialization, though Lock might suffice?

The RLock (reentrant lock) correctly enables thread safety. Since there's no recursive acquisition pattern in get_properties_from_endpoint, would a simple threading.Lock() work just as well with slightly less overhead, or do you foresee a need for reentrancy? wdyt?
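The practical difference behind this question is small and can be shown in a few lines: an RLock may be acquired again by the thread that already holds it, while a plain Lock may not. The helper `can_reacquire` below is a hypothetical name used only for this illustration.

```python
import threading
from typing import Any


def can_reacquire(lock: Any) -> bool:
    """Report whether the holding thread can take the same lock a second time."""
    with lock:
        # Non-blocking attempt, so a plain Lock returns False instead of deadlocking.
        acquired = lock.acquire(blocking=False)
        if acquired:
            lock.release()
        return bool(acquired)


rlock_reentrant = can_reacquire(threading.RLock())
lock_reentrant = can_reacquire(threading.Lock())
```

If the guarded code path never calls back into itself while holding the lock, either type gives the same mutual exclusion. The RLock only matters if such a nested call is introduced later.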

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e8ab340 and b0001b8.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (2)
airbyte_cdk/sources/declarative/interpolation/interpolated_string.py (1)
  • InterpolatedString (13-79)
airbyte_cdk/sources/declarative/retrievers/retriever.py (1)
  • Retriever (14-58)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Analyze (python)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (2)

2-10: LGTM! Clean import additions.

The threading import and import reorganization support the thread-safety changes well.


34-44: Code is thread-safe as implemented.

Good news: your double-checked locking pattern already handles the concern. Here's why:

  • _get_property is thread-safe: it only reads from self and performs read-only operations on the parameter.
  • read_records is protected: While Airbyte CDK's read_records isn't guaranteed to be thread-safe, your lock ensures only one thread calls it during initialization. Subsequent calls simply return the cached result without re-acquiring the lock—no synchronization needed there.
  • Immutable initialization parameter: records_schema={} and stream_slice=None are safe read-only inputs.

The pattern correctly follows the "one instance per thread or explicit synchronization" guidance that the CDK recommends. Nice implementation!

@github-actions
Copy link

PyTest Results (Fast)

3 817 tests  ±0   3 806 ✅ +1   7m 24s ⏱️ +52s
    1 suites ±0      11 💤  - 1 
    1 files   ±0       0 ❌ ±0 

Results for commit b0001b8. ± Comparison against base commit e8ab340.

@github-actions
Copy link

PyTest Results (Full)

3 820 tests   3 808 ✅  10m 44s ⏱️
    1 suites     12 💤
    1 files        0 ❌

Results for commit b0001b8.

Labels

bug (Something isn't working), security

2 participants