
[Integration][Github-Cloud]Added port ocean for github integration #1510

Open

wants to merge 18 commits into base: main
Conversation

@kanmitcode kanmitcode commented Mar 23, 2025

User description

Description

What -

Why -

How -

Type of change

Please leave one option from the following and delete the rest:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • New Integration (non-breaking change which adds a new integration)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Non-breaking change (fix of existing functionality that will not change current behavior)
  • Documentation (added/updated documentation)

All tests should be run against the Port production environment (using a testing org).

Core testing checklist

  • Integration able to create all default resources from scratch
  • Resync finishes successfully
  • Resync able to create entities
  • Resync able to update entities
  • Resync able to detect and delete entities
  • Scheduled resync able to abort existing resync and start a new one
  • Tested with at least 2 integrations from scratch
  • Tested with Kafka and Polling event listeners
  • Tested deletion of entities that don't pass the selector

Integration testing checklist

  • Integration able to create all default resources from scratch
  • Resync able to create entities
  • Resync able to update entities
  • Resync able to detect and delete entities
  • Resync finishes successfully
  • If new resource kind is added or updated in the integration, add example raw data, mapping and expected result to the examples folder in the integration directory.
  • If resource kind is updated, run the integration with the example data and check if the expected result is achieved
  • If new resource kind is added or updated, validate that live-events for that resource are working as expected
  • Docs PR link here

Preflight checklist

  • Handled rate limiting
  • Handled pagination
  • Implemented the code in async
  • Support Multi account

Screenshots

Include screenshots from your environment showing how the resources of the integration will look.

API Documentation

Provide links to the API documentation used for this integration.


PR Type

Enhancement, Documentation, Tests


Description

  • Added a new GitHub integration for Port Ocean.

  • Implemented async handlers for GitHub API interactions.

  • Defined blueprints and configurations for GitHub resources.

  • Included documentation, testing, and development setup files.


Changes walkthrough 📝

Relevant files

Enhancement (4 files)
  • client.py: Implemented GitHub API client with async handlers (+85/-0)
  • debug.py: Added debug entry point for GitHub integration (+4/-0)
  • main.py: Defined resync handlers for GitHub resources (+96/-0)
  • blueprints.json: Defined blueprints for GitHub resources (+248/-0)

Tests (1 file)
  • test_sample.py: Added placeholder test for GitHub integration (+2/-0)

Configuration changes (7 files)
  • launch.json: Added debug configuration for GitHub integration (+14/-1)
  • .env.example: Provided example environment variables for GitHub integration (+8/-0)
  • port-app-config.yml: Configured mappings for GitHub resources (+86/-0)
  • Makefile: Added Makefile for GitHub integration infrastructure (+1/-0)
  • poetry.toml: Configured Poetry for GitHub integration (+3/-0)
  • pyproject.toml: Defined dependencies and tools for GitHub integration (+113/-0)
  • sonar-project.properties: Added SonarQube configuration for GitHub integration (+2/-0)

Documentation (4 files)
  • spec.yaml: Added specification for GitHub integration (+24/-0)
  • CHANGELOG.md: Added changelog for GitHub integration (+8/-0)
  • CONTRIBUTING.md: Added contributing guidelines for GitHub integration (+7/-0)
  • README.md: Added README for GitHub integration (+7/-0)

Dependencies (1 file)
  • pyproject.toml: Added dotenv dependency for environment variable management (+1/-0)

Additional files (1 file)
  • __init__.py

Need help?
  • Type /help how to ... in the comments thread for any questions about Qodo Merge usage.
  • Check out the documentation for more information.

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 Security concerns

    Sensitive information exposure:
    The GitHub access token is stored in the integration configuration and used directly in the Authorization header. While this is a common pattern, there's no validation to ensure the token is properly secured or rotated. Additionally, the token is logged in the headers when debugging, which could expose it in log files.

    ⚡ Recommended focus areas for review

    Missing Pagination

    The client methods don't handle pagination for GitHub API responses, which could lead to incomplete data retrieval for repositories, issues, pull requests, etc. when there are more items than fit in a single response.

    async def get_repositories(self):
        await self.check_rate_limit()
        url = f'{self.base_url}/user/repos'
        response = await self.client.get(url, headers=self.headers)
        if response.status_code == 200:
            return response.json()
        else:
            response.raise_for_status()
    
    async def get_issues(self, username: str, repo: str):
        await self.check_rate_limit()
        url = f'{self.base_url}/repos/{username}/{repo}/issues'
        response = await self.client.get(url, headers=self.headers)
        if response.status_code == 200:
            return response.json()
        else:
            response.raise_for_status()
    
    async def get_pull_requests(self, username: str, repo: str):
        await self.check_rate_limit()
        url = f'{self.base_url}/repos/{username}/{repo}/pulls'
        response = await self.client.get(url, headers=self.headers)
        if response.status_code == 200:
            return response.json()
        else:
            response.raise_for_status()
    
    async def get_organizations(self):
        await self.check_rate_limit()
        url = f'{self.base_url}/user/orgs'
        response = await self.client.get(url, headers=self.headers)
        if response.status_code == 200:
            return response.json()
        else:
            response.raise_for_status()
    
    async def get_teams(self, org: str):
        await self.check_rate_limit()
        url = f'{self.base_url}/orgs/{org}/teams'
        response = await self.client.get(url, headers=self.headers)
        if response.status_code == 200:
            return response.json()
        else:
            response.raise_for_status()
    
    async def get_workflows(self, username: str, repo: str):
        await self.check_rate_limit()
        url = f'{self.base_url}/repos/{username}/{repo}/actions/workflows'
        response = await self.client.get(url, headers=self.headers)
        if response.status_code == 200:
            return response.json()
        else:
            response.raise_for_status()
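One common way to address the missing pagination (a sketch, not part of this PR) is to follow the `Link` header GitHub attaches to paginated responses and keep requesting until no `rel="next"` entry remains. A minimal, framework-agnostic parser for that header could look like this; `next_page_url` is a hypothetical helper name, not an existing function in the PR:

```python
import re
from typing import Optional

def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Extract the rel="next" URL from a GitHub Link header, if present.

    GitHub Link headers look like:
    <https://api.github.com/user/repos?page=2>; rel="next",
    <https://api.github.com/user/repos?page=5>; rel="last"
    """
    if not link_header:
        return None
    for part in link_header.split(","):
        match = re.search(r'<([^>]+)>;\s*rel="next"', part)
        if match:
            return match.group(1)
    return None
```

Each client method could then loop, issuing the next request against the returned URL until it is `None`, rather than returning only the first page.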
    Resource Exhaustion

    The resync functions fetch all repositories and then make additional API calls for each repository, which could lead to rate limiting issues or timeouts with large GitHub accounts.

    @ocean.on_resync('issue')
    async def resync_issues(kind: str) -> list[dict[Any, Any]]:
        try:
            handler = GithubHandler()
            repos = await handler.get_repositories()
            all_issues = []
            for repo in repos:
                username = repo["owner"]["login"]
                repo_name = repo['name']
                issues = await handler.get_issues(username, repo_name)
                all_issues.extend(issues)
            logger.info('Issues: %s', all_issues)
            return all_issues
        except Exception as e:
            logger.error('Failed to resync issues: %s', e)
            return []
    
    @ocean.on_resync('pull_request')
    async def resync_pull_requests(kind: str) -> list[dict[Any, Any]]:
        try:
            handler = GithubHandler()
            repos = await handler.get_repositories()
            all_pull_requests = []
            for repo in repos:
                username = repo["owner"]["login"]
                repo_name = repo['name']
                pull_requests = await handler.get_pull_requests(username, repo_name)
                all_pull_requests.extend(pull_requests)
            logger.info('Pull Requests: %s', all_pull_requests)
            return all_pull_requests
        except Exception as e:
            logger.error('Failed to resync pull requests: %s', e)
            return []
    
    @ocean.on_resync('team')
    async def resync_teams(kind: str) -> list[dict[Any, Any]]:
        try:
            handler = GithubHandler()
            organizations = await handler.get_organizations()
            all_teams = []
            for org in organizations:
                teams = await handler.get_teams(org["login"])
                all_teams.extend(teams)
            logger.info('Teams: %s', all_teams)
            return all_teams
        except Exception as e:
            logger.error('Failed to resync teams: %s', e)
            return []
    
    @ocean.on_resync('workflow')
    async def resync_workflows(kind: str) -> list[dict[Any, Any]]:
        try:
            handler = GithubHandler()
            username = ocean.integration_config["github_username"]
            repos = await handler.get_repositories()
            all_workflows = []
            for repo in repos:
                username = repo["owner"]["login"]
                repo_name = repo['name']
                workflows = await handler.get_workflows(username, repo_name)
                all_workflows.extend(workflows)
            logger.info('Workflows: %s', all_workflows)
            return all_workflows
        except Exception as e:
            logger.error('Failed to resync workflows: %s', e)
            return []
    Redundant Rate Limit Checks

    Each API method calls check_rate_limit before making a request, which adds unnecessary API calls since GitHub includes rate limit information in every response header.

    async def check_rate_limit(self):
        url = f'{self.base_url}/rate_limit'
        response = await self.client.get(url, headers=self.headers)
        if response.status_code == 200:
            rate_limit = response.json()
            logger.info('Rate Limit: %s', rate_limit)
            return rate_limit
        else:
            response.raise_for_status()


    qodo-merge-pro bot commented Mar 23, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category: Possible issue
    Fix API response handling
    Suggestion Impact: The suggestion was implemented in a different way. Instead of directly returning response.json()["workflows"], the commit completely refactored the client implementation to use pagination and yield individual workflow items. The new implementation in lines 236-244 iterates through the workflows and yields each workflow individually, which effectively addresses the original issue of accessing the nested workflows data.

    code diff:

    +    async def get_workflows(self, username: str, repo: str) -> AsyncGenerator[dict[str, Any], None]:
    +        """Fetch all workflows for a specific repository."""
    +        url = f"{self.base_url}/repos/{username}/{repo}/actions/workflows"
    +        try:
    +            async for workflows in self._fetch_paginated_api(url):
    +                for workflow in workflows:
    +                    yield workflow
    +        except Exception as e:
    +            logger.error(f"Error fetching workflows for {username}/{repo}: {e}")

    The GitHub API returns workflows in a nested structure under a 'workflows' key,
    but the current implementation returns the entire response. This will cause
    issues when trying to extend the workflows list in the resync_workflows
    function.

    integrations/github-cloud/client.py [78-85]

     async def get_workflows(self, username: str, repo: str):
         await self.check_rate_limit()
         url = f'{self.base_url}/repos/{username}/{repo}/actions/workflows'
         response = await self.client.get(url, headers=self.headers)
         if response.status_code == 200:
    -        return response.json()
    +        return response.json()["workflows"]
         else:
             response.raise_for_status()

    [Suggestion has been applied]

    Suggestion importance[1-10]: 9


    Why: This suggestion addresses a critical issue with the GitHub API response handling. The current implementation doesn't account for the nested structure of the workflows data, which would cause runtime errors when trying to extend the workflows list in the resync_workflows function.

    Impact: High
    Category: General
    Remove unused variable
    Suggestion Impact: The commit removed the unused 'username' variable that was initialized from ocean.integration_config['github_username'] as suggested. The entire function was refactored to use async generators, but the specific issue pointed out in the suggestion was addressed.

    code diff:

    -        username = ocean.integration_config["github_username"]
    -        repos = await handler.get_repositories()

    Remove the unused username variable that's being initialized from the
    integration config but then immediately overwritten in the loop. This could
    cause confusion and is unnecessary.

    integrations/github-cloud/main.py [80-92]

     @ocean.on_resync('workflow')
     async def resync_workflows(kind: str) -> list[dict[Any, Any]]:
         try:
             handler = GithubHandler()
    -        username = ocean.integration_config["github_username"]
             repos = await handler.get_repositories()
             all_workflows = []
             for repo in repos:
                 username = repo["owner"]["login"]
                 repo_name = repo['name']
                 workflows = await handler.get_workflows(username, repo_name)
                 all_workflows.extend(workflows)
             logger.info('Workflows: %s', all_workflows)
             return all_workflows
         except Exception as e:
             logger.error('Failed to resync workflows: %s', e)
             return []

    [To ensure code accuracy, apply this suggestion manually]

    Suggestion importance[1-10]: 7


    Why: The suggestion correctly identifies an unused variable that could cause confusion. The variable 'username' is initialized from the integration config but then immediately overwritten in the loop, making the initial assignment redundant and potentially misleading.

    Impact: Medium


    This pull request is automatically being deployed by Amplify Hosting (learn more).

    Access this pull request here: https://pr-1510.d1ftd8v2gowp8w.amplifyapp.com

    @kanmitcode kanmitcode changed the title Added port ocean for github integration [Integration][Github]Added port ocean for github integration Mar 29, 2025
    @kanmitcode kanmitcode changed the title [Integration][Github]Added port ocean for github integration [Integration][Github Cloud]Added port ocean for github integration Mar 29, 2025
    @kanmitcode kanmitcode changed the title [Integration][Github Cloud]Added port ocean for github integration [Integration][Github-Cloud]Added port ocean for github integration Mar 29, 2025
    @github-actions github-actions bot added size/XL and removed size/XXL labels Apr 2, 2025
    …cessors to use consistent structure and naming, add comprehensive test coverage, remove unused files (base_pipeline.py, events.py, constants.py), update CHANGELOG.md, improve error handling and validation
    @github-actions github-actions bot added size/XXL and removed size/XL labels Apr 3, 2025
    1 participant