
Commit 2f80c21

oiadebayo and mk-armah authored
[Integration][Bitbucket] Added support for file kind and file entity processing (#1517)
### **User description**

# Description

What
- Added support for ingesting a `file` kind, and for ingesting file contents as entity properties.

Why
- To allow Git files to be mapped into Ocean as entities, and to enable adding README and other markdown properties.

How
- Created a new kind, `file`.
- Added support for mapping entities from files that match a path pattern in a defined list of repositories.
- Added `FileEntityProcessor` for processing properties with the `file://` prefix.

## Type of change

Please leave one option from the following and delete the rest:

- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] New Integration (non-breaking change which adds a new integration)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Non-breaking change (fix of existing functionality that will not change current behavior)
- [ ] Documentation (added/updated documentation)

#### All tests should be run against the Port production environment (using a testing org).

### Core testing checklist

- [ ] Integration able to create all default resources from scratch
- [ ] Resync finishes successfully
- [ ] Resync able to create entities
- [ ] Resync able to update entities
- [ ] Resync able to detect and delete entities
- [ ] Scheduled resync able to abort existing resync and start a new one
- [ ] Tested with at least 2 integrations from scratch
- [ ] Tested with Kafka and Polling event listeners
- [ ] Tested deletion of entities that don't pass the selector

### Integration testing checklist

- [ ] Integration able to create all default resources from scratch
- [ ] Resync able to create entities
- [ ] Resync able to update entities
- [ ] Resync able to detect and delete entities
- [ ] Resync finishes successfully
- [ ] If a new resource kind is added or updated in the integration, add example raw data, mapping and expected result to the `examples` folder in the integration directory.
- [ ] If a resource kind is updated, run the integration with the example data and check that the expected result is achieved
- [ ] If a new resource kind is added or updated, validate that live events for that resource work as expected
- [ ] Docs PR link [here](#)

### Preflight checklist

- [ ] Handled rate limiting
- [ ] Handled pagination
- [ ] Implemented the code in async
- [ ] Support multi-account

## Screenshots

Include screenshots from your environment showing how the resources of the integration will look.

## API Documentation

Provide links to the API documentation used for this integration.

___

### **PR Type**

Enhancement, Tests, Documentation

___

### **Description**

- Added support for the file kind and file entity processing.
- Introduced `FileEntityProcessor` for handling file-based properties.
- Enabled JSON and YAML file parsing for repository files.
- Enhanced the Bitbucket integration with file pattern matching.
- Added `BitbucketFilePattern` and `BitbucketFileSelector` for file retrieval.
- Implemented recursive directory scanning and pattern matching.
- Introduced new tests for file entity and file kind functionality.
- Validated JSON/YAML parsing, error handling, and pattern matching.
- Updated documentation and versioning for the new features.
___

### **Changes walkthrough** 📝

**Enhancement (6 files)**

| File | Change | Lines |
|------|--------|-------|
| **client.py** | Enhanced API methods for file retrieval and directory contents | [+22/-6](https://github.com/port-labs/ocean/pull/1517/files#diff-e23b96ba70db3d84e0b42bfd04c319f604b622c2c44d62865312d31a70052f38) |
| **file_entity_handler.py** | Added FileEntityProcessor for file-based property handling | [+63/-0](https://github.com/port-labs/ocean/pull/1517/files#diff-1d50313a40686fbd9171d6679b3685e09b476fe74ea4cbdf8f490f3a577c0e0a) |
| **file_kind.py** | Implemented file pattern matching and repository processing | [+191/-0](https://github.com/port-labs/ocean/pull/1517/files#diff-bb86097e6e12852777c152efb9f0683b7aba9944ddbb1bf9d46773ebc7104e4b) |
| **utils.py** | Added new object kind for file | [+1/-1](https://github.com/port-labs/ocean/pull/1517/files#diff-2ac78ad70938aa8e766a2b0bcaa3ef14870f8bf3df1115bc8d96dcf30c29f09a) |
| **integration.py** | Integrated file handling into Bitbucket configuration | [+37/-2](https://github.com/port-labs/ocean/pull/1517/files#diff-9d4a76c0d3508f2eedda0850c608377ed78adbad73aa42b6c72d3585d6d7e313) |
| **main.py** | Added resync logic for file entities | [+29/-1](https://github.com/port-labs/ocean/pull/1517/files#diff-8acadf1eec56896dbfb07fd369be9fc43376d23a9768b825affd2258eea4913e) |

**Tests (2 files)**

| File | Change | Lines |
|------|--------|-------|
| **test_file_entity_handler.py** | Added tests for FileEntityProcessor functionality | [+119/-0](https://github.com/port-labs/ocean/pull/1517/files#diff-f186d6e02cf77f85e31ea6fd2205dcce0e07f1e4c8629656f568f75446234a0f) |
| **test_file_kind.py** | Added tests for file pattern matching and repository processing | [+306/-0](https://github.com/port-labs/ocean/pull/1517/files#diff-458b21ad2af6d71cbe0a70b393423c1f1e21cf48c6010ff6fcfee750f2d6c60f) |

**Documentation (1 file)**

| File | Change | Lines |
|------|--------|-------|
| **CHANGELOG.md** | Updated changelog with file kind feature details | [+8/-0](https://github.com/port-labs/ocean/pull/1517/files#diff-29be973a2e6d4caf92a6f871135685ef66260b25d342a3f525cbc0c2f9be9da1) |

**Configuration changes (1 file)**

| File | Change | Lines |
|------|--------|-------|
| **pyproject.toml** | Bumped version to 0.1.4 for new features | [+2/-2](https://github.com/port-labs/ocean/pull/1517/files#diff-50807681f9e892caf3856d0a8bb1eb0ec5b4a01dc8042fd503b03b76fca84280) |

> Need help? Type `/help how to ...` in the comments thread for any questions about Qodo Merge usage, or check out the [documentation](https://qodo-merge-docs.qodo.ai/usage-guide/) for more information.

---------

Co-authored-by: Michael Kofi Armah <[email protected]>
1 parent 74a8c05 commit 2f80c21

File tree

13 files changed (+731 / -44 lines changed)

Diff for: integrations/bitbucket-cloud/.port/resources/port-app-config.yml (+1)

@@ -25,5 +25,6 @@ resources:
           properties:
             url: ".links.html.href"
             defaultBranch: .mainbranch.name
+            readme: file://README.md
           relations:
             project: '.project.uuid | gsub("[{-}]"; "")'

Diff for: integrations/bitbucket-cloud/CHANGELOG.md (+8)

@@ -7,6 +7,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 <!-- towncrier release notes start -->
 
+## 0.1.5 (2025-04-08)
+
+
+### Features
+
+- Added support for ingesting file kind and parsing JSON and YAML files
+
+
 ## 0.1.4 (2025-04-07)
 

Diff for: integrations/bitbucket-cloud/bitbucket_cloud/client.py (+46/-11)

@@ -69,23 +69,22 @@ async def _send_api_request(
         params: Optional[dict[str, Any]] = None,
         json_data: Optional[dict[str, Any]] = None,
         method: str = "GET",
+        return_full_response: bool = False,
     ) -> Any:
         """Send request to Bitbucket API with error handling."""
         response = await self.client.request(
             method=method, url=url, params=params, json=json_data
         )
         try:
             response.raise_for_status()
-            return response.json()
+            return response if return_full_response else response.json()
         except HTTPStatusError as e:
-            error_data = e.response.json()
-            error_message = error_data.get("error", {}).get("message", str(e))
             if e.response.status_code == 404:
-                logger.error(
-                    f"Requested resource not found: {url}; message: {error_message}"
+                logger.warning(
+                    f"Requested resource not found: {url}; message: {str(e)}"
                 )
                 return {}
-            logger.error(f"Bitbucket API error: {error_message}")
+            logger.error(f"Bitbucket API error: {str(e)}")
             raise e
         except HTTPError as e:
             logger.error(f"Failed to send {method} request to url {url}: {str(e)}")
@@ -166,13 +165,19 @@ async def get_repositories(
             yield repos
 
     async def get_directory_contents(
-        self, repo_slug: str, branch: str, path: str, max_depth: int = 2
+        self,
+        repo_slug: str,
+        branch: str,
+        path: str,
+        max_depth: int,
+        params: Optional[dict[str, Any]] = None,
     ) -> AsyncGenerator[list[dict[str, Any]], None]:
         """Get contents of a directory."""
-        params = {
-            "max_depth": max_depth,
-            "pagelen": PAGE_SIZE,
-        }
+        if params is None:
+            params = {
+                "max_depth": max_depth,
+                "pagelen": PAGE_SIZE,
+            }
         async for contents in self._fetch_paginated_api_with_rate_limiter(
             f"{self.base_url}/repositories/{self.workspace}/{repo_slug}/src/{branch}/{path}",
             params=params,
@@ -212,3 +217,33 @@ async def get_repository(self, repo_slug: str) -> dict[str, Any]:
         return await self._send_api_request(
             f"{self.base_url}/repositories/{self.workspace}/{repo_slug}"
         )
+
+    async def get_repository_files(self, repo: str, branch: str, path: str) -> Any:
+        """Get the content of a file."""
+        response = await self._send_api_request(
+            f"{self.base_url}/repositories/{self.workspace}/{repo}/src/{branch}/{path}",
+            method="GET",
+            return_full_response=True,
+        )
+        logger.info(f"Retrieved file content for {repo}/{branch}/{path}")
+        return response.text
+
+    async def search_files(
+        self,
+        search_query: str,
+    ) -> AsyncGenerator[list[dict[str, Any]], None]:
+        """Search for files using Bitbucket's search API."""
+        params = {
+            "pagelen": 300,
+            "search_query": search_query,
+            "fields": "+values.file.commit.repository.mainbranch.name",
+        }
+
+        async for results in self._send_paginated_api_request(
+            f"{self.base_url}/workspaces/{self.workspace}/search/code",
+            params=params,
+        ):
+            logger.info(
+                f"Fetched batch of {len(results)} matching files from workspace {self.workspace}"
+            )
+            yield results
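
To see how the two new client methods compose, here is a minimal usage sketch. It assumes `init_client()` returns the configured Bitbucket client (as in the integration's `initialize_client` module used elsewhere in this PR); the query string and the printed output are illustrative only.

```python
from initialize_client import init_client


async def dump_matching_files() -> None:
    client = init_client()
    # Hypothetical query: every file named port.yml with a .yml extension
    # anywhere in the workspace.
    query = '"port.yml" ext:yml'
    async for batch in client.search_files(query):
        for result in batch:
            file_info = result["file"]
            repo = file_info["commit"]["repository"]
            # Fetch the raw text of each match from its repository's main branch.
            content = await client.get_repository_files(
                repo["name"], repo["mainbranch"]["name"], file_info["path"]
            )
            print(file_info["path"], len(content))
```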
Diff for: file_entity_handler.py (new file, +68)

@@ -0,0 +1,68 @@
+from typing import Any, Optional
+from loguru import logger
+from port_ocean.core.handlers import JQEntityProcessor
+from initialize_client import init_client
+
+
+FILE_PROPERTY_PREFIX = "file://"
+
+
+class FileEntityProcessor(JQEntityProcessor):
+    prefix = FILE_PROPERTY_PREFIX
+
+    async def _get_file_content(
+        self, repo_slug: str, ref: str, file_path: str
+    ) -> Optional[Any]:
+        """Helper method to fetch and process file content."""
+        try:
+            bitbucket_client = init_client()
+            return await bitbucket_client.get_repository_files(
+                repo_slug, ref, file_path
+            )
+        except Exception as e:
+            logger.error(
+                f"Failed to get file content for {file_path} in repository {repo_slug} in branch {ref}: {e}"
+            )
+            return None
+
+    async def _search(self, data: dict[str, Any], pattern: str) -> Any:
+        """
+        Search for a file in the repository and return its content.
+
+        Args:
+            data (dict[str, Any]): The data containing the repository information
+            pattern (str): The pattern to search for (e.g. "file://path/to/file.yaml")
+
+        For monorepo, the data should contain a "repo" key and a "folder" key with the repository information.
+        For non-monorepo, the data should contain the repository information directly.
+
+        Returns:
+            Any: The raw or parsed content of the file
+        """
+
+        repo_data = data.get("repo", data)
+        repo_slug = repo_data.get("name", "")
+        default_branch = repo_data.get("mainbranch", {}).get("name", "main")
+
+        if current_directory_path := data.get("folder", {}).get("path", ""):
+            file_path = f"{current_directory_path}/{pattern.replace(self.prefix, '')}"
+            ref = data.get("folder", {}).get("commit", {}).get("hash", default_branch)
+        else:
+            file_path = pattern.replace(self.prefix, "")
+            if not default_branch:
+                logger.info(
+                    f"No default branch found for repository {repo_slug} and file path {file_path}"
+                )
+                return None
+            ref = default_branch
+
+        if not repo_slug:
+            logger.info(
+                f"No repository slug found for branch {ref} and file path {file_path}"
+            )
+            return None
+
+        logger.info(
+            f"Searching for file {file_path} in Repository {repo_slug}, ref {ref}"
+        )
+        return await self._get_file_content(repo_slug, ref, file_path)
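
To make the two payload shapes described in `_search`'s docstring concrete, here is a small sketch of the inputs the processor handles and what each resolves to. The repository names, folder path, and commit hash below are invented for illustration.

```python
# Non-monorepo: the repository payload is passed directly.
plain_repo = {
    "name": "my-service",
    "mainbranch": {"name": "main"},
}
# For the mapping value "file://README.md", _search resolves:
#   repo_slug="my-service", ref="main", file_path="README.md"

# Monorepo: the payload carries the repository under "repo" and the folder
# currently being processed under "folder".
monorepo_folder = {
    "repo": {"name": "platform-monorepo", "mainbranch": {"name": "main"}},
    "folder": {"path": "services/billing", "commit": {"hash": "2f80c21"}},
}
# For the same mapping value, _search resolves:
#   repo_slug="platform-monorepo", ref="2f80c21",
#   file_path="services/billing/README.md"
```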
Diff for: file_kind.py (new file, +153)

@@ -0,0 +1,153 @@
+import fnmatch
+import json
+from typing import Dict, List, Any, AsyncGenerator
+from loguru import logger
+import yaml
+from integration import BitbucketFilePattern
+from port_ocean.utils.async_iterators import stream_async_iterators_tasks
+from initialize_client import init_client
+
+
+JSON_FILE_SUFFIX = ".json"
+YAML_FILE_SUFFIX = (".yaml", ".yml")
+
+
+def build_search_terms(
+    filename: str, repos: List[str] | None, path: str, extension: str
+) -> str:
+    """
+    This function builds search terms for Bitbucket's search API.
+    The entire workspace is searched for the filename if repos is not provided.
+    If repos are provided, only the repos specified are searched.
+    The path and extension are required to tailor the search so results
+    are relevant to the file kind.
+
+    Args:
+        filename (str): The filename to search for.
+        repos (List[str] | None): The repositories to search in.
+        path (str): The path to search in.
+        extension (str): The extension to search for.
+
+    Returns:
+        str: The search terms for Bitbucket's search API.
+    """
+    search_terms = [f'"{filename}"']
+    if repos:
+        repo_filters = " ".join(f"repo:{repo}" for repo in repos)
+        search_terms.append(f"{repo_filters}")
+
+    search_terms.append(f"path:{path}")
+
+    if extension:
+        search_terms.append(f"ext:{extension}")
+
+    return " ".join(search_terms)
+
+
+async def process_file_patterns(
+    file_pattern: BitbucketFilePattern,
+) -> AsyncGenerator[List[Dict[str, Any]], None]:
+    """Process file patterns and retrieve matching files using Bitbucket's search API."""
+    logger.info(
+        f"Searching for files in {len(file_pattern.repos) if file_pattern.repos else 'all'} repositories with pattern: {file_pattern.path}"
+    )
+
+    if not file_pattern.repos:
+        logger.warning("No repositories provided, searching entire workspace")
+    if not file_pattern.path:
+        logger.info("Path is required, skipping file search")
+        return
+    if not file_pattern.filenames:
+        logger.info("No filenames provided, skipping file search")
+        return
+
+    for filename in file_pattern.filenames:
+        search_query = build_search_terms(
+            filename=filename,
+            repos=file_pattern.repos,
+            path=file_pattern.path,
+            extension=filename.split(".")[-1],
+        )
+        logger.debug(f"Constructed search query: {search_query}")
+        bitbucket_client = init_client()
+        async for search_results in bitbucket_client.search_files(search_query):
+            tasks = []
+            for result in search_results:
+                if len(result["path_matches"]) >= 1:
+                    file_info = result["file"]
+                    file_path = file_info["path"]
+
+                    if not validate_file_match(file_path, filename, file_pattern.path):
+                        logger.debug(
+                            f"Skipping file {file_path} as it doesn't match expected patterns"
+                        )
+                        continue
+
+                    tasks.append(retrieve_file_content(file_info))
+
+            async for file_results in stream_async_iterators_tasks(*tasks):
+                if not file_pattern.skip_parsing:
+                    file_results = parse_file(file_results)
+                yield [file_results]
+
+
+async def retrieve_file_content(
+    file_info: Dict[str, Any],
+) -> AsyncGenerator[Dict[str, Any], None]:
+    """
+    Retrieve the content of a single file from Bitbucket.
+
+    Args:
+        file_info (Dict[str, Any]): Information about the file to retrieve
+
+    Yields:
+        Dict[str, Any]: Dictionary containing the file content and metadata
+    """
+    file_path = file_info.get("path", "")
+    repo_info = file_info["commit"]["repository"]
+    repo_slug = repo_info["name"]
+    branch = repo_info["mainbranch"]["name"]
+
+    logger.info(f"Retrieving contents for file: {file_path}")
+    bitbucket_client = init_client()
+    file_content = await bitbucket_client.get_repository_files(
+        repo_slug, branch, file_path
+    )
+
+    yield {
+        "content": file_content,
+        "repo": repo_info,
+        "branch": branch,
+        "metadata": file_info,
+    }
+
+
+def parse_file(file: Dict[str, Any]) -> Dict[str, Any]:
+    """Parse a file based on its extension."""
+    try:
+        file_path = file.get("metadata", {}).get("path", "")
+        file_content = file.get("content", "")
+        if file_path.endswith(JSON_FILE_SUFFIX):
+            loaded_file = json.loads(file_content)
+            file["content"] = loaded_file
+        elif file_path.endswith(YAML_FILE_SUFFIX):
+            loaded_file = yaml.safe_load(file_content)
+            file["content"] = loaded_file
+        return file
+    except Exception as e:
+        logger.error(f"Error parsing file: {e}")
+        return file
+
+
+def validate_file_match(file_path: str, filename: str, expected_path: str) -> bool:
+    """Validate if the file path and filename match the expected patterns."""
+    if not file_path.endswith(filename):
+        return False
+
+    if (not expected_path or expected_path == "/") and file_path == filename:
+        return True
+
+    dir_path = file_path[: -len(filename)]
+    dir_path = dir_path.rstrip("/")
+    expected_path = expected_path.rstrip("/")
+    return fnmatch.fnmatch(dir_path, expected_path)
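
The expected behaviour of `build_search_terms` and `validate_file_match` can be pinned down with a few assertions. This is a sketch: the flat `from file_kind import ...` path is an assumption, so adjust it to wherever `file_kind.py` actually lives in the integration package.

```python
from file_kind import build_search_terms, validate_file_match  # import path is an assumption

# The query quotes the filename and appends repo, path and extension filters.
query = build_search_terms(
    filename="port.yml",
    repos=["repo-a", "repo-b"],
    path="configs",
    extension="yml",
)
assert query == '"port.yml" repo:repo-a repo:repo-b path:configs ext:yml'

# validate_file_match compares the file's directory against the configured
# path, allowing fnmatch-style wildcards.
assert validate_file_match("configs/port.yml", "port.yml", "configs")
assert validate_file_match("port.yml", "port.yml", "/")  # workspace root
assert validate_file_match("services/billing/port.yml", "port.yml", "services/*")
assert not validate_file_match("configs/other.yml", "port.yml", "configs")
```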

Diff for: integrations/bitbucket-cloud/bitbucket_cloud/helpers/utils.py (+1)

@@ -7,6 +7,7 @@ class ObjectKind(StrEnum):
     FOLDER = "folder"
     REPOSITORY = "repository"
     PULL_REQUEST = "pull-request"
+    FILE = "file"
 
 
 @dataclass
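
For context on how the new `FILE` member is typically consumed, below is a rough sketch of a resync registration like the one this PR adds to `main.py`. Treat it as an assumption-laden illustration: the real handler reads its `BitbucketFilePattern` from the resource selector rather than building one inline, the `file_kind` import path is guessed, and the `BitbucketFilePattern` field names are inferred from how `process_file_patterns` uses them.

```python
from bitbucket_cloud.helpers.utils import ObjectKind
from file_kind import process_file_patterns  # adjust to the module's real location
from integration import BitbucketFilePattern
from port_ocean.context.ocean import ocean
from port_ocean.core.ocean_types import ASYNC_GENERATOR_RESYNC_TYPE


@ocean.on_resync(ObjectKind.FILE)
async def resync_files(kind: str) -> ASYNC_GENERATOR_RESYNC_TYPE:
    # Hypothetical pattern: README.md files under docs/ in two named
    # repositories, yielded raw (no JSON/YAML parsing).
    pattern = BitbucketFilePattern(
        path="docs",
        repos=["repo-a", "repo-b"],
        filenames=["README.md"],
        skip_parsing=True,
    )
    async for file_batch in process_file_patterns(pattern):
        yield file_batch
```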
