
[Integration][Github] Added Github Ocean Integration #1507


Open
wants to merge 15 commits into
base: main
from

@emekanwaoma emekanwaoma commented Mar 23, 2025

User description

Description

What - A new GitHub integration for Port's Ocean framework that syncs GitHub resources to Port.

Why - To allow Port users to import and track their GitHub resources (repositories, pull requests, issues, teams, and workflows) in their developer portal.

How - Using GitHub's REST API v3 with async processing, rate limiting, and webhook support.

Type of change

Please leave one option from the following and delete the rest:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • New Integration (non-breaking change which adds a new integration)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Non-breaking change (fix of existing functionality that will not change current behavior)
  • Documentation (added/updated documentation)

All tests should be run against the Port production environment (using a testing org).

Core testing checklist

  • Integration able to create all default resources from scratch
  • Resync finishes successfully
  • Resync able to create entities
  • Resync able to update entities
  • Resync able to detect and delete entities
  • Scheduled resync able to abort existing resync and start a new one
  • Tested with at least 2 integrations from scratch
  • Tested with Kafka and Polling event listeners
  • Tested deletion of entities that don't pass the selector

Integration testing checklist

  • Integration able to create all default resources from scratch
  • Resync able to create entities
  • Resync able to update entities
  • Resync able to detect and delete entities
  • Resync finishes successfully
  • If new resource kind is added or updated in the integration, add example raw data, mapping and expected result to the examples folder in the integration directory.
  • If resource kind is updated, run the integration with the example data and check if the expected result is achieved
  • If new resource kind is added or updated, validate that live-events for that resource are working as expected
  • Docs PR link here

Preflight checklist

  • Handled rate limiting
  • Handled pagination
  • Implemented the code in async
  • Support Multi account

Screenshots

Include screenshots from your environment showing how the resources of the integration will look.

[Four screenshots, captured 2025-03-27 between 23:45:00 and 23:45:32]

API Documentation

Provide links to the API documentation used for this integration.

Additional Implementation Details:

  1. Rate Limiting:
    • Uses GitHub's rate limit headers (X-RateLimit-*)
    • Semaphore for concurrent request limiting
    • Automatic backoff when limits are reached
    • Logging of rate limit status
  2. Pagination:
    • Implements GitHub's page-based pagination
    • Configurable page size (default 100)
    • Efficient async processing of pages
    • Proper handling of empty results
  3. Webhook Support:
    • Organization-level webhook creation
    • Event-specific processors
    • Secure webhook validation
    • Real-time entity updates
  4. Resource Processing:
    • Efficient batch processing
    • Proper error handling
    • Detailed logging
    • Resource relationship mapping
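
The rate limiting and pagination points above can be illustrated with a minimal sketch. It is not the code from this PR: `seconds_until_reset`, `paginate`, and the `fetch_page` callable are hypothetical names, and the sketch assumes each request returns a `(headers, items)` pair. The header names `X-RateLimit-Remaining` and `X-RateLimit-Reset` are the ones GitHub documents; real HTTP clients usually expose headers case-insensitively, whereas the plain mapping used here is case-sensitive.

```python
import asyncio
import time
from typing import Any, AsyncIterator, Awaitable, Callable, Mapping, Optional

# Hypothetical response shape for this sketch: (response headers, list of items).
Page = tuple[Mapping[str, str], list[dict[str, Any]]]
FetchPage = Callable[[int, int], Awaitable[Page]]


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the next request.

    Returns 0.0 while requests remain in the current rate limit window.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # X-RateLimit-Reset is the window reset time in UTC epoch seconds.
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)


async def paginate(
    fetch_page: FetchPage,
    semaphore: asyncio.Semaphore,
    per_page: int = 100,
) -> AsyncIterator[list[dict[str, Any]]]:
    """Yield one batch per page until an empty or short page is returned."""
    page = 1
    while True:
        async with semaphore:  # cap concurrent requests across all callers
            headers, items = await fetch_page(page, per_page)
        if not items:
            break  # empty result: nothing (more) to yield
        yield items
        if len(items) < per_page:
            break  # a short page means this was the last one
        delay = seconds_until_reset(headers)
        if delay > 0:
            await asyncio.sleep(delay)  # back off until the window resets
        page += 1
```

Sharing one `asyncio.Semaphore` between all callers is what bounds concurrency; creating a new semaphore per call would not limit anything.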

PR Type

Enhancement, Documentation, Tests


Description

  • Added a new GitHub integration to sync repositories, pull requests, issues, teams, and workflows.

  • Implemented a GitHub client with rate-limited API requests.

  • Defined resource blueprints and mappings for GitHub entities in Port.

  • Included example environment configuration and documentation for setup and contribution.


Changes walkthrough 📝

Relevant files
Enhancement
6 files
client.py
Implement GitHub client with API rate-limiting                     
+84/-0   
debug.py
Add debug entry point for GitHub integration                         
+4/-0     
integration.py
Define GitHub integration logic and resource handling       
+97/-0   
main.py
Add main entry point for GitHub integration                           
+86/-0   
blueprints.json
Define resource blueprints for GitHub entities                     
+228/-0 
port-app-config.yml
Add Port app configuration for GitHub integration               
+90/-0   
Tests
1 files
test_sample.py
Add placeholder test for GitHub integration                           
+2/-0     
Configuration changes
4 files
launch.json
Add VSCode debug configuration for GitHub integration       
+14/-1   
poetry.toml
Configure Poetry virtual environment for GitHub integration
+3/-0     
pyproject.toml
Define project dependencies and tools for GitHub integration
+113/-0 
sonar-project.properties
Add SonarQube configuration for GitHub integration             
+2/-0     
Documentation
5 files
.env.example
Provide example environment configuration for GitHub integration
+11/-0   
spec.yaml
Specify GitHub integration features and configurations     
+26/-0   
CHANGELOG.md
Add changelog for GitHub integration                                         
+8/-0     
CONTRIBUTING.md
Add contributing guidelines for GitHub integration             
+7/-0     
README.md
Add README for GitHub integration                                               
+7/-0     
Miscellaneous
1 files
Makefile
Add Makefile for GitHub integration infrastructure             
+1/-0     
Additional files
3 files
.DS_Store [link]   
.DS_Store [link]   
__init__.py [link]   


    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 Security concerns

    Sensitive information exposure:
    The GitHub token is being printed to logs in integration.py line 51: print(f"Initializing GitHub integration for organization: {self.github_org} {self.github_token}"). This exposes sensitive credentials that could be used to access the GitHub account. The token should never be logged or printed.

    ⚡ Recommended focus areas for review

    Error Handling

    The error handling in _make_request method catches all exceptions generically. This could mask specific API errors that should be handled differently (like authentication issues vs rate limiting).

    except Exception as e:
        logger.error(f"GitHub API request failed: {str(e)}")
        raise
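
One way to address this observation is to translate HTTP status codes into specific exception types before the generic handler sees them. The sketch below is an illustration only: the exception classes and `raise_for_github_status` are hypothetical names, not part of this PR or of any library. It relies on GitHub's documented behavior of returning 401 for bad credentials and 403 or 429 when a rate limit is exceeded.

```python
from typing import Mapping, Optional


class GitHubAPIError(Exception):
    """Base error for this sketch (hypothetical name, not from the PR)."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code


class GitHubAuthError(GitHubAPIError):
    """Token missing, expired, or invalid; retrying will not help."""


class GitHubRateLimitError(GitHubAPIError):
    """Rate limit exhausted; the caller may retry after `retry_after` seconds."""

    def __init__(self, status_code: int, message: str, retry_after: Optional[float]) -> None:
        super().__init__(status_code, message)
        self.retry_after = retry_after


def raise_for_github_status(status_code: int, headers: Mapping[str, str]) -> None:
    """Map an HTTP status and response headers to a specific exception."""
    if status_code < 400:
        return
    if status_code == 401:
        raise GitHubAuthError(status_code, "bad credentials")
    # A 403/429 counts as rate limiting only if the headers say so;
    # a plain 403 can also mean insufficient permissions.
    rate_limited = headers.get("X-RateLimit-Remaining") == "0" or "Retry-After" in headers
    if status_code in (403, 429) and rate_limited:
        retry_after = headers.get("Retry-After")
        raise GitHubRateLimitError(
            status_code,
            "rate limit exceeded",
            float(retry_after) if retry_after is not None else None,
        )
    raise GitHubAPIError(status_code, "request failed")
```

With this in place, a caller can catch `GitHubRateLimitError` to sleep and retry, while letting `GitHubAuthError` fail fast.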
    Sensitive Data Exposure

    The integration is printing the GitHub token during initialization, which could expose sensitive credentials in logs.

    print(f"Initializing GitHub integration for organization: {self.github_org} {self.github_token}")
    self.client = GitHubClient(token=self.github_token, org=self.github_org)
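
The primary fix is simply to drop the token from the message. As an additional safeguard, a logging filter can redact known secret values before a record is emitted. This is a sketch under assumptions: `RedactSecretsFilter` is a hypothetical class, it only covers the standard-library `logging` module, and it does nothing for bare `print()` calls like the one quoted above.

```python
import logging
from typing import Iterable


class RedactSecretsFilter(logging.Filter):
    """Replace known secret values in log messages with a placeholder."""

    def __init__(self, secrets: Iterable[str]) -> None:
        super().__init__()
        # Ignore empty values so an unset token does not redact everything.
        self._secrets = [s for s in secrets if s]

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # message with %-style args applied
        for secret in self._secrets:
            message = message.replace(secret, "***")
        record.msg = message
        record.args = ()  # args are already merged into the message
        return True  # never drop the record, only rewrite it
```

Attaching the filter to a handler (rather than a single logger) applies it to every record that handler emits.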
    Duplicate Code

    The resource fetching logic is duplicated between integration.py and main.py files, which could lead to maintenance issues if one is updated but not the other.

    @ocean.on_resync()
    async def on_resync(kind: str) -> List[Dict[Any, Any]]:
        """Handle resync events for different kinds of resources"""
        if not github_client:
            raise RuntimeError("GitHub client not initialized")
    
        if kind == "repository":
            return await github_client.get_repositories()
    
        elif kind == "pull-request":
            all_prs = []
            repos = await github_client.get_repositories()
            for repo in repos:
                prs = await github_client.get_pull_requests(repo["name"])
                all_prs.extend(prs)
            return all_prs
    
        elif kind == "issue":
            all_issues = []
            repos = await github_client.get_repositories()
            for repo in repos:
                issues = await github_client.get_issues(repo["name"])
                all_issues.extend(issues)
            return all_issues
    
        elif kind == "team":
            return await github_client.get_teams()
    
        elif kind == "workflow":
            all_workflows = []
            repos = await github_client.get_repositories()
            for repo in repos:
                workflows = await github_client.get_workflows(repo["name"])
                # Enrich workflow data with repository information
                for workflow in workflows:
                    workflow["repository"] = repo
                    if "latest_run" not in workflow:
                        workflow["latest_run"] = {"status": "unknown"}
                all_workflows.extend(workflows)
            return all_workflows
    
        return []
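
The three per-repository loops in the snippet above share one shape, so a single helper could be imported by both files instead of repeating the loop. The following is a minimal sketch, not the refactor that was eventually applied in this PR; `collect_per_repository` and the `fetch` callable are hypothetical names. It keeps the sequential behavior of the original loops.

```python
import asyncio
from typing import Any, Awaitable, Callable

Resource = dict[str, Any]
FetchForRepo = Callable[[str], Awaitable[list[Resource]]]


async def collect_per_repository(
    repos: list[Resource],
    fetch: FetchForRepo,
) -> list[Resource]:
    """Call fetch(repo_name) for each repository and flatten the results."""
    collected: list[Resource] = []
    for repo in repos:
        # Sequential on purpose, matching the original loops.
        collected.extend(await fetch(repo["name"]))
    return collected
```

Under these assumptions, the `pull-request` branch would reduce to `return await collect_per_repository(repos, github_client.get_pull_requests)`, and the `issue` branch would follow the same pattern with `get_issues`.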


    qodo-merge-pro bot commented Mar 23, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category | Suggestion | Impact
    Security
    Remove sensitive data exposure

    Avoid printing sensitive information like tokens in log messages. The GitHub
    token is being exposed in the log, which is a security risk. Remove the token
    from the log message.

    integrations/github/integration.py [51]

    -print(f"Initializing GitHub integration for organization: {self.github_org} {self.github_token}")
    +print(f"Initializing GitHub integration for organization: {self.github_org}")
    Suggestion importance[1-10]: 10


    Why: Exposing sensitive information like authentication tokens in logs is a critical security vulnerability. This could lead to unauthorized access if logs are exposed or shared.

    High
    Possible issue
    Fix duplicate function definition
    Suggestion Impact: The commit completely refactored the file, including removing the duplicate on_start() function at the bottom of the file. The commit keeps only one on_start() function at the top of the file.

    code diff:

     @ocean.on_start()
     async def on_start() -> None:
    -    """Initialize the GitHub client when the integration starts"""
    -    global github_client
    -    if not github_token or not github_org:
    -        raise ValueError("GITHUB_TOKEN and GITHUB_ORGANIZATION environment variables are required")
    -    
    -    print(f"Starting GitHub integration for organization: {github_org}")
    -    github_client = GitHubClient(token=github_token, org=github_org)
    +    logger.info("Starting Port Ocean GitHub integration")
     
    -@ocean.on_resync()
    -async def on_resync(kind: str) -> List[Dict[Any, Any]]:
    -    """Handle resync events for different kinds of resources"""
    -    if not github_client:
    -        raise RuntimeError("GitHub client not initialized")
    +def init_client() -> GitHubClient:
    +    return GitHubClient(
    +        token=ocean.integration_config.get_secret("github_token"),
    +        org=ocean.integration_config.get("organization")
    +    )
     
    -    if kind == "repository":
    -        return await github_client.get_repositories()
    -    
    -    elif kind == "pull-request":
    -        all_prs = []
    -        repos = await github_client.get_repositories()
    -        for repo in repos:
    -            prs = await github_client.get_pull_requests(repo["name"])
    -            all_prs.extend(prs)
    -        return all_prs
    -    
    -    elif kind == "issue":
    -        all_issues = []
    -        repos = await github_client.get_repositories()
    -        for repo in repos:
    -            issues = await github_client.get_issues(repo["name"])
    -            all_issues.extend(issues)
    -        return all_issues
    -    
    -    elif kind == "team":
    -        return await github_client.get_teams()
    -    
    -    elif kind == "workflow":
    -        all_workflows = []
    -        repos = await github_client.get_repositories()
    -        for repo in repos:
    -            workflows = await github_client.get_workflows(repo["name"])
    -            # Enrich workflow data with repository information
    -            for workflow in workflows:
    -                workflow["repository"] = repo
    -                if "latest_run" not in workflow:
    -                    workflow["latest_run"] = {"status": "unknown"}
    -            all_workflows.extend(workflows)
    -        return all_workflows
    +@ocean.on_resync(ObjectKind.REPOSITORY)
    +async def resync_repositories(kind: str) -> ASYNC_GENERATOR_RESYNC_TYPE:
    +    """Resync all repositories in the organization."""
    +    client = init_client()
    +    async for repositories in client.get_repositories():
    +        yield repositories
     
    -    return []
    +@ocean.on_resync(ObjectKind.PULL_REQUEST)
    +async def resync_pull_requests(kind: str) -> ASYNC_GENERATOR_RESYNC_TYPE:
    +    """Resync all pull requests from all repositories."""
    +    client = init_client()
    +    async for repositories in client.get_repositories():
    +        tasks = [
    +            client.get_pull_requests(repo["name"])
    +            for repo in repositories
    +        ]
    +        async for batch in stream_async_iterators_tasks(*tasks):
    +            yield batch
     
    +@ocean.on_resync(ObjectKind.ISSUE)
    +async def resync_issues(kind: str) -> ASYNC_GENERATOR_RESYNC_TYPE:
    +    """Resync all issues from all repositories."""
    +    client = init_client()
    +    async for repositories in client.get_repositories():
    +        tasks = [
    +            client.get_issues(repo["name"])
    +            for repo in repositories
    +        ]
    +        async for batch in stream_async_iterators_tasks(*tasks):
    +            yield batch
     
    -# The same sync logic can be registered for one of the kinds that are available in the mapping in port.
    -# @ocean.on_resync('project')
    -# async def resync_project(kind: str) -> list[dict[Any, Any]]:
    -#     # 1. Get all projects from the source system
    -#     # 2. Return a list of dictionaries with the raw data of the state
    -#     return [{"some_project_key": "someProjectValue", ...}]
    -#
    -# @ocean.on_resync('issues')
    -# async def resync_issues(kind: str) -> list[dict[Any, Any]]:
    -#     # 1. Get all issues from the source system
    -#     # 2. Return a list of dictionaries with the raw data of the state
    -#     return [{"some_issue_key": "someIssueValue", ...}]
    +@ocean.on_resync(ObjectKind.TEAM)
    +async def resync_teams(kind: str) -> ASYNC_GENERATOR_RESYNC_TYPE:
    +    """Resync all teams in the organization."""
    +    client = init_client()
    +    async for teams in client.get_teams():
    +        yield teams
     
    +@ocean.on_resync(ObjectKind.WORKFLOW)
    +async def resync_workflows(kind: str) -> ASYNC_GENERATOR_RESYNC_TYPE:
    +    """Resync all workflows from all repositories."""
    +    client = init_client()
    +    async for repositories in client.get_repositories():
    +        tasks = []
    +        for repo in repositories:
    +            async for workflows in client.get_workflows(repo["name"]):
    +                # Enrich workflow data with repository information
    +                for workflow in workflows:
    +                    workflow["repository"] = repo
    +                    runs = await client.get_workflow_runs(repo["name"], workflow["id"], per_page=1)
    +                    workflow["latest_run"] = runs[0] if runs else {"status": "unknown"}
    +                tasks.append(workflows)
    +        
    +        async for batch in stream_async_iterators_tasks(*tasks):
    +            yield batch
     
    -# Optional
    -# Listen to the start event of the integration. Called once when the integration starts.
    -@ocean.on_start()
    -async def on_start() -> None:
    -    # Something to do when the integration starts
    -    # For example create a client to query 3rd party services - GitHub, Jira, etc...
    -    print("Starting github integration")

    There are two on_start() functions defined in the file. The second one at the
    bottom of the file will override the first one, causing the GitHub client
    initialization to be skipped. Remove the duplicate function or merge their
    functionality.

    integrations/github/main.py [12-20]

     @ocean.on_start()
     async def on_start() -> None:
         """Initialize the GitHub client when the integration starts"""
         global github_client
         if not github_token or not github_org:
             raise ValueError("GITHUB_TOKEN and GITHUB_ORGANIZATION environment variables are required")
         
         print(f"Starting GitHub integration for organization: {github_org}")
         github_client = GitHubClient(token=github_token, org=github_org)
    +    print("Starting github integration")

    [Suggestion has been applied]

    Suggestion importance[1-10]: 9


    Why: The duplicate on_start() function at the end of the file would override the first implementation, causing the GitHub client initialization to be skipped, which would break the integration's functionality.

    High

    This pull request is automatically being deployed by Amplify Hosting (learn more).

    Access this pull request here: https://pr-1507.d1ftd8v2gowp8w.amplifyapp.com

    @emekanwaoma emekanwaoma changed the title feat: add github integration [Integration][Github] Added New Github Integration Mar 26, 2025
    @emekanwaoma emekanwaoma changed the title [Integration][Github] Added New Github Integration [Integration][Github] Added Github Ocean Integration Mar 26, 2025

    2 participants