[Integration][Bitbucket] Added support for file kind and file entity processing #1517

Merged
mk-armah merged 27 commits into main from PORT-13527-add-support-for-file-kind-latest on Apr 8, 2025

@oiadebayo oiadebayo commented Mar 26, 2025

User description

Description

What - Added support for ingesting the file kind, and for ingesting file contents as entity properties.

Why - To allow Git files to be mapped into Ocean as entities, and to allow README and other Markdown files to be added as properties.

How - Created a new kind file

  • Added support for entities to be mapped from files matching a path pattern in a defined list of repositories
  • Added FileEntityProcessor for processing properties with the file:// prefix
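
As a rough illustration of the FileEntityProcessor idea, the sketch below resolves a property value that carries a file prefix by fetching the referenced file and parsing it when it is JSON. The prefix value `file://`, the `resolve_file_property` helper, and the `fetch` callable are assumptions made for this example; they are not taken from the PR's code.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Optional

FILE_PROPERTY_PREFIX = "file://"  # assumed prefix value, for illustration only
JSON_SUFFIX = ".json"


async def resolve_file_property(
    value: str, fetch: Callable[[str], Awaitable[str]]
) -> Optional[Any]:
    """Return the content referenced by `value`, or `value` itself if it has no prefix."""
    if not value.startswith(FILE_PROPERTY_PREFIX):
        return value
    file_path = value[len(FILE_PROPERTY_PREFIX):]
    try:
        content = await fetch(file_path)
        # Parse JSON files into Python objects; return other files as raw text.
        return json.loads(content) if file_path.endswith(JSON_SUFFIX) else content
    except Exception:
        # A missing or malformed file resolves to None instead of failing the caller.
        return None


async def fake_fetch(path: str) -> str:
    """Hypothetical stand-in for a repository client call."""
    files = {"README.md": "# Hello", "config.json": '{"team": "platform"}'}
    return files[path]
```

For example, `asyncio.run(resolve_file_property("file://README.md", fake_fetch))` returns the raw Markdown text, while a value without the prefix is passed through unchanged.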

Type of change

Please leave one option from the following and delete the rest:

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] New Integration (non-breaking change which adds a new integration)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Non-breaking change (fix of existing functionality that will not change current behavior)
  • [ ] Documentation (added/updated documentation)

All tests should be run against the Port production environment (using a testing org).

Core testing checklist

  • Integration able to create all default resources from scratch
  • Resync finishes successfully
  • Resync able to create entities
  • Resync able to update entities
  • Resync able to detect and delete entities
  • Scheduled resync able to abort existing resync and start a new one
  • Tested with at least 2 integrations from scratch
  • Tested with Kafka and Polling event listeners
  • Tested deletion of entities that don't pass the selector

Integration testing checklist

  • Integration able to create all default resources from scratch
  • Resync able to create entities
  • Resync able to update entities
  • Resync able to detect and delete entities
  • Resync finishes successfully
  • If new resource kind is added or updated in the integration, add example raw data, mapping and expected result to the examples folder in the integration directory.
  • If resource kind is updated, run the integration with the example data and check if the expected result is achieved
  • If new resource kind is added or updated, validate that live-events for that resource are working as expected
  • Docs PR link here

Preflight checklist

  • Handled rate limiting
  • Handled pagination
  • Implemented the code in async
  • Support Multi account

Screenshots

Include screenshots from your environment showing how the resources of the integration will look.

API Documentation

Provide links to the API documentation used for this integration.


PR Type

Enhancement, Tests, Documentation


Description

  • Added support for file kind and file entity processing.

    • Introduced FileEntityProcessor for handling file-based properties.
    • Enabled JSON and YAML file parsing for repository files.
  • Enhanced Bitbucket integration with file pattern matching.

    • Added BitbucketFilePattern and BitbucketFileSelector for file retrieval.
    • Implemented recursive directory scanning and pattern matching.
  • Introduced new tests for file entity and file kind functionalities.

    • Validated JSON/YAML parsing, error handling, and pattern matching.
  • Updated documentation and versioning for new features.


Changes walkthrough 📝

Relevant files

Enhancement (6 files)
  • client.py: Enhanced API methods for file retrieval and directory contents (+22/-6)
  • file_entity_handler.py: Added FileEntityProcessor for file-based property handling (+63/-0)
  • file_kind.py: Implemented file pattern matching and repository processing (+191/-0)
  • utils.py: Added new object kind for file (+1/-1)
  • integration.py: Integrated file handling into Bitbucket configuration (+37/-2)
  • main.py: Added resync logic for file entities (+29/-1)

Tests (2 files)
  • test_file_entity_handler.py: Added tests for FileEntityProcessor functionality (+119/-0)
  • test_file_kind.py: Added tests for file pattern matching and repository processing (+306/-0)

Documentation (1 file)
  • CHANGELOG.md: Updated changelog with file kind feature details (+8/-0)

Configuration changes (1 file)
  • pyproject.toml: Bumped version to 0.1.4 for new features (+2/-2)


    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Error Handling

    The file parsing function doesn't handle potential parsing errors for JSON and YAML files. If a file has an invalid format, it could cause runtime exceptions.

    def parse_file(file: Dict[str, Any]) -> List[Dict[str, Any]]:
        """
        Parse a file based on its extension.
        """
        file_path = file.get("metadata", {}).get("path", "")
        file_content = file.get("content", "")
        if file_path.endswith(JSON_FILE_SUFFIX):
            loaded_file = json.loads(file_content)
            file["content"] = loaded_file
        elif file_path.endswith(YAML_FILE_SUFFIX):
            loaded_file = yaml.safe_load(file_content)
            file["content"] = loaded_file
        return [file]
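
One possible way to address this observation is sketched below: the parse step is wrapped so that a malformed file is logged and passed through with its raw content instead of raising. The name `parse_file_safely` and the fall-back-to-raw-content behavior are assumptions for illustration, not the change that was merged. Only JSON is handled so that the sketch runs with the standard library alone.

```python
import json
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)

JSON_FILE_SUFFIX = ".json"


def parse_file_safely(file: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Parse JSON content in place; keep the raw string if parsing fails."""
    file_path = file.get("metadata", {}).get("path", "")
    file_content = file.get("content", "")
    if file_path.endswith(JSON_FILE_SUFFIX):
        try:
            file["content"] = json.loads(file_content)
        except json.JSONDecodeError as exc:
            # Leave the raw content untouched so one bad file does not abort the resync.
            logger.warning("Could not parse %s as JSON: %s", file_path, exc)
    # A YAML branch could follow the same shape, catching yaml.YAMLError
    # around yaml.safe_load (requires PyYAML, omitted here).
    return [file]
```

Whether a malformed file should be skipped, passed through raw, or reported as an error is a design choice left to the integration.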

    Missing Implementation

    The get_file_content method is called but not defined in the BitbucketClient class. The PR adds get_repository_files but uses a different method name in the FileEntityProcessor.

    file_content = await client.get_file_content(repo_slug, ref, file_path)

    Potential Performance Issue

    The file matching logic in _match_files_with_pattern might be inefficient for complex patterns with multiple wildcards, especially with large repositories.

    def _match_files_with_pattern(
        files: List[Dict[str, Any]], pattern: str
    ) -> List[Dict[str, Any]]:
        """
        Match files against a glob pattern.
        """
        if not pattern:
            return files
    
        paths = [file.get("path", "") for file in files]
        matched_paths: Set[str] = set()
    
        if pattern.startswith("**/"):
            root_pattern = pattern[3:]  # Match files in root directory
            matched_paths.update(
                path
                for path in paths
                if fnmatch.fnmatch(path, root_pattern.replace("**", "*"))
            )
    
        matched_paths.update(
            path for path in paths if fnmatch.fnmatch(path, pattern.replace("**", "*"))
        )
    
        return [file for file in files if file.get("path", "") in matched_paths]
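
For comparison, here is a minimal single-pass sketch of the same matching rules. It compiles each candidate pattern once with `fnmatch.translate` and filters the file list directly, without building intermediate path lists or a set. The function name is hypothetical, and unlike `fnmatch.fnmatch` it does not apply OS-specific case normalization. Note that `fnmatch` does not treat `/` specially, so after `**` is replaced with `*` the wildcard still spans directory separators.

```python
import fnmatch
import re
from typing import Any, Dict, List


def match_files_single_pass(
    files: List[Dict[str, Any]], pattern: str
) -> List[Dict[str, Any]]:
    """Filter files by glob pattern, compiling each candidate pattern once."""
    if not pattern:
        return files

    # Same candidates as the original logic: the pattern itself and, for a
    # leading "**/", the remainder so files in the root directory also match.
    candidates = [pattern.replace("**", "*")]
    if pattern.startswith("**/"):
        candidates.append(pattern[3:].replace("**", "*"))

    regexes = [re.compile(fnmatch.translate(candidate)) for candidate in candidates]
    return [
        file
        for file in files
        if any(regex.match(file.get("path", "")) for regex in regexes)
    ]
```

With the files `README.md`, `docs/guide.md`, and `src/app.py`, the pattern `**/*.md` selects the two Markdown files, including the one in the repository root.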

    qodo-merge-pro bot commented Mar 26, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category: Possible issue
    Suggestion: Fix incorrect method call
    Suggestion Impact: The commit implemented exactly what was suggested, replacing the non-existent get_file_content method with the correct get_repository_files method.

    code diff:

    -            file_content = await client.get_file_content(repo_slug, ref, file_path)
    +            file_content = await client.get_repository_files(repo_slug, ref, file_path)

    The get_file_content method is called but it doesn't exist in the
    BitbucketClient class. The PR adds get_repository_files method to the client,
    which should be used instead.

    integrations/bitbucket-cloud/bitbucket_cloud/helpers/file_entity_handler.py [14-27]

     async def _get_file_content(
         self, client: BitbucketClient, repo_slug: str, ref: str, file_path: str
     ) -> Optional[Any]:
         """Helper method to fetch and process file content."""
         try:
    -        file_content = await client.get_file_content(repo_slug, ref, file_path)
    +        file_content = await client.get_repository_files(repo_slug, ref, file_path)
             return (
                 json.loads(file_content)
                 if file_path.endswith(JSON_SUFFIX)
                 else file_content
             )
         except Exception as e:
             logger.error(f"Failed to get file content for {file_path}: {e}")
             return None

    [Suggestion has been applied]

    Suggestion importance[1-10]: 10

    Why: The code calls a non-existent method get_file_content on the BitbucketClient, which would cause runtime errors. The suggestion correctly identifies that the newly added get_repository_files method should be used instead, fixing a critical bug.

    Impact: High

    This pull request is automatically being deployed by Amplify Hosting (learn more).

    Access this pull request here: https://pr-1517.d1ftd8v2gowp8w.amplifyapp.com

    @github-actions github-actions bot added size/L and removed size/XL labels Mar 28, 2025
    @mk-armah mk-armah changed the title [Integrastion][Bitbucket] Added support for file kind and file entity processing [Integration][Bitbucket] Added support for file kind and file entity processing Apr 3, 2025

    @mk-armah mk-armah left a comment

    left some comments

    @mk-armah mk-armah left a comment

    LGTM

    @Tankilevitch Tankilevitch left a comment

    add changelog and bump version

    @mk-armah mk-armah left a comment

    Bump the integration version

    @mk-armah mk-armah left a comment

    LGTM

    @mk-armah mk-armah merged commit 2f80c21 into main Apr 8, 2025
    21 checks passed
    @mk-armah mk-armah deleted the PORT-13527-add-support-for-file-kind-latest branch April 8, 2025 23:39
    dev-habib-nuhu pushed a commit that referenced this pull request Apr 9, 2025
    …processing (#1517)
    
    Co-authored-by: Michael Kofi Armah <[email protected]>
    3 participants