feat(PoC): adjust file-based and file uploader component to latest protocol changes. #457
Co-authored-by: octavia-squidington-iii <[email protected]>
/autofix
/autofix
I don't think there is anything else to worry about. Previously we had a schema, but we ignored it (it is/was messy) because the only point of interest was moving the file. From now on, we will move data along with the file, so we will need to refresh the schema for these. Old (messy, but we didn't care): New (I need to update pre-dev as I'm adding source-uri, but it's the same idea):
from typing import Optional

from pydantic.v1 import BaseModel


class FileRecordData(BaseModel):
If allowing additional fields is difficult, I'd suggest adding an "additional_properties" field that sources can populate however they want
    bytes: int
    source_uri: str
    id: Optional[str] = None
    updated_at: Optional[str] = None
Should we also have a created_at?
Yes, makes sense, I will add it.
It is there now.
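For readers following this thread, here is a minimal sketch of what the record metadata model might look like once both review suggestions are applied (a `created_at` field and an `additional_properties` bag). It uses a standard-library dataclass as a stand-in for the pydantic model so it runs without extra dependencies. The class name `FileRecordDataSketch`, the `to_record` helper, and the exact shape of `additional_properties` are assumptions for illustration, not the code in this PR.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class FileRecordDataSketch:
    """Hypothetical stand-in for FileRecordData; not the actual CDK model."""

    # Fields visible in the diff under review.
    bytes: int
    source_uri: str
    id: Optional[str] = None
    updated_at: Optional[str] = None
    # Field requested in review ("should we also have a created_at?").
    created_at: Optional[str] = None
    # Assumed shape for the suggested free-form bag that sources can populate.
    additional_properties: Dict[str, Any] = field(default_factory=dict)

    def to_record(self) -> Dict[str, Any]:
        """Return the metadata as a plain dict, dropping unset optional fields."""
        return {key: value for key, value in asdict(self).items() if value is not None}


record = FileRecordDataSketch(
    bytes=1024,
    source_uri="s3://example-bucket/reports/a.csv",
    additional_properties={"owner": "analytics"},
)
print(record.to_record())
```

Keeping source-specific extras in a single dict field avoids loosening the model to accept arbitrary top-level keys, which is the trade-off the reviewer raised.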
Anyway, I will test how this works in a workspace that starts on the previous version of the connectors and later receives the upgrade, and I will update the schema as part of the E2E test we plan for late this week.
Co-authored-by: octavia-squidington-iii <[email protected]> Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
I'm good with this change assuming the CAT changes were out of scope
@@ -0,0 +1,46 @@
# Copyright (c) 2024 Airbyte, Inc., all rights reserved.
'''FAST Airbyte Standard Tests
Why do we have CAT-related test changes here?
I'm a terrible person and I'm taking credit for @aaronsteers work :( Sorry.
JK, it is because it is a stacked PR where both branches run behind main. Let me fix that.
It's fine! Not a blocker for going forward with this.
What
This PR updates the file-based and file uploader components in the Airbyte Python CDK to align with the file transfer record protocol changes introduced in the platform. It introduces schema refinements, file path handling improvements, and new test cases.
Resolves https://github.com/airbytehq/airbyte-internal-issues/issues/12364
How
Review guide
File based changes:
airbyte_cdk/models/airbyte_protocol.py: remove hacked protocol
airbyte_cdk/models/file_transfer_record_message.py: remove hacked protocol
airbyte_cdk/sources/concurrent_source/concurrent_read_processor.py: remove hacked protocol
airbyte_cdk/sources/file_based/file_based_stream_reader.py: change method verb and return type to AirbyteRecordMessageFileReference, also make _get_file_transfer_paths support method return a dict with path fields.
airbyte_cdk/sources/file_based/file_record_data.py: helper model for record (metadata) of files.
airbyte_cdk/sources/file_based/file_types/file_transfer.py: update to return record and file reference data.
airbyte_cdk/sources/file_based/schema_helpers.py: schema of records (metadata) for file-based connectors.
airbyte_cdk/sources/file_based/stream/concurrent/adapters.py: pass file_reference
airbyte_cdk/sources/file_based/stream/default_file_based_stream.py: introduce changes to default file based stream to handle new file reference and records data besides fixed schema.
airbyte_cdk/sources/file_based/stream/permissions_file_based_stream.py: update call to stream_data_to_airbyte_message
airbyte_cdk/sources/types.py: remove old is_file_transfer_message flag
airbyte_cdk/sources/utils/record_helper.py: remove handling of is_file_transfer_message flag
airbyte_cdk/test/mock_http/response_builder.py: add helper method to get binary data from file for testing
File-api changes:
airbyte_cdk/sources/declarative/retrievers/file_uploader.py: update latest protocol fields names.
User Impact
Developers using the file-based CDK and file uploader in declarative functionality will benefit from file_reference protocol support.
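To illustrate the review-guide item about making the _get_file_transfer_paths support method return a dict with path fields, the following is a rough sketch of what such a helper could look like. The function name, the dictionary keys, and the way the local path is derived are all assumptions made for this example; the actual keys and logic live in file_based_stream_reader.py and may differ.

```python
import posixpath
from typing import Dict


def get_file_transfer_paths_sketch(
    source_file_relative_path: str, staging_directory: str
) -> Dict[str, str]:
    """Hypothetical helper: derive the path fields for one file as a dict.

    The key names below are illustrative only, not the CDK's constants.
    """
    file_name = posixpath.basename(source_file_relative_path)
    file_folder = posixpath.dirname(source_file_relative_path)
    # Drop the leading slash so the path can be joined under the staging directory.
    file_relative_path = source_file_relative_path.lstrip("/")
    local_file_path = posixpath.join(staging_directory, file_relative_path)
    return {
        "file_name": file_name,
        "file_folder": file_folder,
        "file_relative_path": file_relative_path,
        "local_file_path": local_file_path,
    }


paths = get_file_transfer_paths_sketch("/reports/2024/a.csv", "/tmp/staging")
print(paths["local_file_path"])
```

Returning the paths as one dict lets the caller build both the record metadata and the file reference from a single call instead of recomputing each path.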
Can this PR be safely reverted and rolled back?