feat: skip config validation during discovery for declarative sources that don't use DynamicSchemaLoader #464

devin-ai-integration[bot]
Contributor

@devin-ai-integration devin-ai-integration bot commented Apr 8, 2025

Preface from AJ

This came up in the S.H.I.T. list and it's something @bnchrch and I have discussed a bit. It came up again today in Slack, raised as an inquiry from Michel. I decided to have Devin do a spike, and it looks like this could be close to shippable.

What changes:

  1. Declarative sources should be able to run `discover` without a config unless they use DynamicSchemaLoader, in which case they still require a config.
  2. This only affects the discover verb. Other verbs are unchanged.
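
A minimal sketch of the rule described in the two points above, not the CDK's actual code. `SourceInfo` and `config_required` are hypothetical names used only for illustration, and only verbs that normally take a config are considered.

```python
from dataclasses import dataclass


@dataclass
class SourceInfo:
    """Hypothetical stand-in for a declarative source (not a CDK class)."""

    uses_dynamic_schema_loader: bool


def config_required(verb: str, source: SourceInfo) -> bool:
    """Return True when the verb needs a user-supplied config.

    Assumption: `discover` needs a config only when the source relies on
    DynamicSchemaLoader; every other verb keeps requiring one.
    """
    if verb == "discover":
        return source.uses_dynamic_schema_loader
    return True  # other verbs (for example check, read) are unchanged


static_source = SourceInfo(uses_dynamic_schema_loader=False)
dynamic_source = SourceInfo(uses_dynamic_schema_loader=True)
print(config_required("discover", static_source))
print(config_required("discover", dynamic_source))
print(config_required("read", static_source))
```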

Prior art:

Devin-Created Summary: Allow Airbyte sources to run discovery unprivileged with DynamicSchemaLoader

This PR allows Airbyte sources to run discovery unprivileged if the source API doesn't need auth in order to provide the catalog info.

Implementation

  1. Added automatic detection of DynamicSchemaLoader to skip config validation during discovery. When a source uses the dynamic schema feature, we assume that the schema endpoint might not require authentication and automatically skip config validation during discovery.

  2. Modified the AirbyteEntrypoint to make the --config parameter optional for the discover command. This allows running discovery without providing a config file when the source doesn't require config validation.

This enables sources with dynamic schemas to provide catalog information without authentication when the schema endpoint doesn't require it.
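
To illustrate the second implementation point, here is a rough sketch of making `--config` optional for `discover` while keeping it mandatory elsewhere. It uses plain `argparse` and is not the real `AirbyteEntrypoint` parser; `build_parser`, `resolve_config_path`, and the `discover_needs_config` parameter are hypothetical names.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a toy CLI with a `discover` and a `check` verb."""
    parser = argparse.ArgumentParser(prog="source")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # discover: --config may be omitted at parse time.
    discover = subparsers.add_parser("discover")
    discover.add_argument("--config", type=str, required=False, default=None)

    # check: --config stays mandatory, as before.
    check = subparsers.add_parser("check")
    check.add_argument("--config", type=str, required=True)
    return parser


def resolve_config_path(
    args: argparse.Namespace, discover_needs_config: bool
) -> Optional[str]:
    """Return the config path, or None when discover may run without one."""
    if args.command == "discover" and args.config is None:
        if discover_needs_config:
            # Assumption: the source decides whether discovery needs a config.
            raise ValueError("--config is required for discover on this source")
        return None
    return args.config


parsed = build_parser().parse_args(["discover"])
print(resolve_config_path(parsed, discover_needs_config=False))
```

Deferring the "is a config required?" decision until after parsing keeps the parser itself source-agnostic, which is one plausible reason to relax the argument only for `discover`.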

Testing

Added unit tests to verify that:

  • Sources using DynamicSchemaLoader skip config validation during discovery
  • Sources not using DynamicSchemaLoader still require config validation
  • The AirbyteEntrypoint correctly handles discovery without config when appropriate

Link to Devin run: https://app.devin.ai/sessions/e6aa19df336347919b6cabcff7143a1c
Requested by: Aaron ("AJ") Steers ([email protected])

Contributor Author

Original prompt from Aaron:

@Devin - We want to allow Airbyte sources to run discovery unprivileged if the source API doesn't need auth in order to provide the catalog info. Please try to update the declarative manifest implementation in the CDK to succeed in running `discover` even if no auth ("config") is provided by the user. Make sure you test up front that running discovery does indeed fail. Then try to update the connector to skip the check for config. We can use a flag in the source class (or another suitable location) to tell the implementation explicitly to require config or not during discovery.

Contributor Author

devin-ai-integration bot commented Apr 8, 2025

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add "(aside)" to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions github-actions bot added the enhancement New feature or request label Apr 8, 2025
@aaronsteers aaronsteers changed the title feat: skip config validation during discovery for sources with DynamicSchemaLoader feat: skip config validation during discovery for sources with DynamicSchemaLoader (do not merge) Apr 8, 2025
Co-Authored-By: Aaron <AJ> Steers <[email protected]>
@aaronsteers aaronsteers marked this pull request as draft April 8, 2025 22:35
@devin-ai-integration devin-ai-integration bot changed the title feat: skip config validation during discovery for sources with DynamicSchemaLoader (do not merge) feat: skip config validation during discovery for sources with DynamicSchemaLoader Apr 8, 2025
@aaronsteers aaronsteers changed the title feat: skip config validation during discovery for sources with DynamicSchemaLoader feat: skip config validation during discovery for declarative sources that don't use DynamicSchemaLoader Apr 8, 2025
Co-Authored-By: Aaron <AJ> Steers <[email protected]>
Contributor

@aaronsteers aaronsteers left a comment

I've done a few rounds of review and I think this is looking good. I'll open it up now to other reviewers and mark it ready for review.

@@ -141,19 +141,35 @@ def run(self, parsed_args: argparse.Namespace) -> Iterable[str]:
             )
             if cmd == "spec":
                 message = AirbyteMessage(type=Type.SPEC, spec=source_spec)
-                yield from [
+                yield from (
Contributor

@aaronsteers aaronsteers Apr 9, 2025

aside - Note: I corrected Devin's implementation to use a generator expression instead of a list comprehension, and Devin applied the same change in these adjacent locations as well. I think this is a positive change; I'm calling it out to explain why other code paths are touched.

More about generator comprehensions here: https://stackoverflow.com/a/47826
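
As a small illustration of the difference discussed in this comment (a standalone sketch, not code from the PR): `yield from [...]` builds the whole list before the first item is emitted, while `yield from (...)` pulls items one at a time. The function names and the `log` argument are hypothetical.

```python
from typing import Iterator, List


def emit_messages(log: List[str]) -> Iterator[str]:
    """Produce three messages, recording when each one is created."""
    for i in range(3):
        log.append(f"produced {i}")
        yield f"message-{i}"


def run_with_list(log: List[str]) -> Iterator[str]:
    # The list comprehension consumes emit_messages() fully before yielding.
    yield from [m.upper() for m in emit_messages(log)]


def run_with_generator(log: List[str]) -> Iterator[str]:
    # The generator expression yields each item as soon as it is produced.
    yield from (m.upper() for m in emit_messages(log))


eager_log: List[str] = []
lazy_log: List[str] = []
next(run_with_list(eager_log))
next(run_with_generator(lazy_log))
print(len(eager_log), len(lazy_log))  # prints "3 1"
```

For an entrypoint that streams messages, the lazy form means the first message can be written out before later ones have been computed.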

@aaronsteers aaronsteers marked this pull request as ready for review April 9, 2025 00:07
@aaronsteers aaronsteers requested a review from bnchrch April 9, 2025 00:11
Contributor

aaronsteers commented Apr 9, 2025

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formatting issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

@aaronsteers aaronsteers requested a review from lazebnyi April 9, 2025 01:44
Contributor

@aaronsteers aaronsteers left a comment

I am moving this back to draft status and will dive deeper later when time permits.

Directionally, I think this is doable and makes sense, and most of the code here in the PR is correct, I believe. But I want to add some tests and make sure we're getting the failure behaviors we expect.

Will move future work to this PR with me as author:

Feel free to drop comments/suggestions here on this PR and I'll make sure to consider them in future work.

check_config_against_spec: bool = True
"""Configure whether `check_config_against_spec_or_exit()` needs to be called."""

check_config_during_discover: bool = False
Contributor

The value and the default mentioned in the comment don't align.

@@ -225,7 +240,7 @@ def discover(
         self, source_spec: ConnectorSpecification, config: TConfig
     ) -> Iterable[AirbyteMessage]:
         self.set_up_secret_filter(config, source_spec.connectionSpecification)
-        if self.source.check_config_against_spec:
+        if not self.source.check_config_during_discover:
Contributor

why is this changing?

Contributor Author

Closing due to inactivity for more than 7 days.

Contributor Author

Devin is archived and cannot be woken up. Please unarchive Devin if you want to continue using it.
