feat: Add GitHub integration with agent_prompts and github_components #1637

Merged
julian-risch merged 53 commits into main from move-github-components on May 28, 2025

julian-risch
Member

@julian-risch julian-risch commented Apr 10, 2025

Related Issues

Proposed Changes:

  • Move github_components from experimental to a new integration
  • Move agent_prompts from experimental to a new integration
  • Add tools that wrap the new components

The idea is to enable users to run the example notebook (or a version with updated imports) after having installed this new integration.

How did you test it?

I added new unit tests and ran all usage examples successfully against a test repo.

I have tested it with the updated notebook. I'll update the cookbook PR once the integration is released: deepset-ai/haystack-cookbook#183

Notes for the reviewer

  • I suggest we rename the github_token parameter to api_key for consistency with many other integrations.
  • While we could find a way to set up integration tests, I would rather leave them out of this PR.
  • GithubRepositoryViewer has a branch parameter in the run method, which could also be named ref to make it clearer that it also accepts a tag or commit hash. I prefer keeping the parameter name branch.
  • Some components have github_token: Optional[Secret] = None, because they can work without any token while others use Secret.from_env_var("GITHUB_TOKEN"). I suggest we use Secret.from_env_var("GITHUB_TOKEN", strict=False) where we currently have None as the default.
  • The internal implementation of the components differs in how they use _get_headers or _get_request_headers or define headers inline. We could refactor that.
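To illustrate the last two notes, here is a minimal sketch of a single shared header helper that works with or without a token. It uses only the standard library; `resolve_token` and `get_request_headers` are hypothetical names chosen for this example and are not the integration's actual code. `resolve_token` only approximates the non-strict behavior suggested above: it returns None when the environment variable is missing instead of raising.

```python
import os
from typing import Dict, Optional


def resolve_token(env_var: str = "GITHUB_TOKEN") -> Optional[str]:
    """Return the token from the environment, or None if it is unset or empty.

    Hypothetical helper: approximates a non-strict secret lookup.
    """
    return os.environ.get(env_var) or None


def get_request_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Build headers for the GitHub REST API.

    Hypothetical stand-in for a shared helper that every component could call
    instead of defining headers inline.
    """
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "example-github-integration",  # assumed value for illustration
    }
    if token:
        # Only authenticated requests carry the Authorization header.
        headers["Authorization"] = f"Bearer {token}"
    return headers


# Works both with and without a token in the environment.
headers = get_request_headers(resolve_token())
```

With one helper like this, components that can run unauthenticated and components that require a token would differ only in whether they accept a missing token.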

Checklist

@github-actions github-actions bot added the type:documentation label Apr 10, 2025
@julian-risch julian-risch marked this pull request as ready for review April 25, 2025 10:28
@julian-risch
Member Author

@sjrl I added a test called test_pipeline_serialization, added _get_request_headers to all components, and here is the example notebook with updated code up until the point where the GitHub token is required. I commented out "message": {"source": "documents", "handler": message_handler}, because it didn't work for me, and I need to ask @mathislucka for advice.

https://colab.research.google.com/drive/1ktlwQ-CDLGDs2uZXvzgG8XspfjPidYqZ?usp=sharing

@sjrl If GitHubFileEditorTool looks good to you, I will add tools for all other components and probably update the directory structure a bit.

@sjrl
Contributor

sjrl commented May 6, 2025

> @sjrl I added a test called test_pipeline_serialization, added _get_request_headers to all components and here is the example notebook with updated code up until the GitHub token is required. I commented out "message": {"source": "documents", "handler": message_handler}, because it didn't work for me and need to ask @mathislucka for advice.

@julian-risch This is related to the change we made to tools, which added a new variable called outputs_to_string. So the Google Colab code should be updated to:

    ...
    outputs_to_state={
        #"message": {"source": "documents", "handler": message_handler}, TODO
        "documents": {"source": "documents"},
    },
    outputs_to_string={"source": "documents", "handler": message_handler}
    ...
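For readers unfamiliar with the two settings, the following is a rough, standard-library-only sketch of how such a configuration might be consumed. The `Doc` class, the body of `message_handler`, and `render_tool_result` are all assumptions made for illustration; the actual handler from the notebook and the framework's internal dispatch are not shown in this thread.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a document with content and metadata."""

    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def message_handler(documents: List[Doc], max_length: int = 200) -> str:
    """Render documents as one string, truncating long contents (assumed behavior)."""
    parts = []
    for doc in documents:
        path = doc.meta.get("path", "unknown")
        text = doc.content
        if len(text) > max_length:
            text = text[:max_length] + "...(truncated)"
        parts.append(f"File: {path}\n{text}")
    return "\n\n".join(parts)


tool_config = {
    # Which outputs are written back to the shared state.
    "outputs_to_state": {"documents": {"source": "documents"}},
    # How the tool result is turned into the string the model sees.
    "outputs_to_string": {"source": "documents", "handler": message_handler},
}


def render_tool_result(outputs: Dict[str, Any], config: Dict[str, Any]) -> str:
    """Apply the outputs_to_string handler to the configured source output."""
    spec = config["outputs_to_string"]
    return spec["handler"](outputs[spec["source"]])


docs = [Doc("hello", {"path": "README.md"})]
print(render_tool_result({"documents": docs}, tool_config))
```

The point of the split is that the string shown to the model is configured separately from what is stored in state, which is why the handler moves out of outputs_to_state.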

@julian-risch
Member Author

julian-risch commented May 27, 2025

@sjrl Finally ready for another review! We're using "data" instead of "init_parameters" in serialization now and all newly implemented tools expose outputs_to_string, inputs_from_state, and outputs_to_state as init parameters.
I tested this by running the updated notebook. The Agent forked the repo and committed to a branch.
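As a rough picture of the serialization shape described above, here is a minimal sketch with a hypothetical `ExampleTool` class. It is not the integration's implementation; it only shows a dictionary with a "type" key and a "data" key (rather than "init_parameters") that round-trips the three init parameters. Handler callables are left out, since they would need their own serialization.

```python
from typing import Any, Dict, Optional


class ExampleTool:
    """Hypothetical tool used only to illustrate the serialization layout."""

    def __init__(
        self,
        name: str = "example_tool",
        outputs_to_string: Optional[Dict[str, Any]] = None,
        inputs_from_state: Optional[Dict[str, str]] = None,
        outputs_to_state: Optional[Dict[str, Dict[str, Any]]] = None,
    ):
        self.name = name
        self.outputs_to_string = outputs_to_string
        self.inputs_from_state = inputs_from_state
        self.outputs_to_state = outputs_to_state

    def to_dict(self) -> Dict[str, Any]:
        """Serialize the init parameters under the "data" key."""
        return {
            "type": f"{self.__class__.__module__}.{self.__class__.__name__}",
            "data": {
                "name": self.name,
                "outputs_to_string": self.outputs_to_string,
                "inputs_from_state": self.inputs_from_state,
                "outputs_to_state": self.outputs_to_state,
            },
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ExampleTool":
        """Rebuild the tool from the "data" key."""
        return cls(**data["data"])


tool = ExampleTool(
    outputs_to_string={"source": "documents"},
    outputs_to_state={"documents": {"source": "documents"}},
)
restored = ExampleTool.from_dict(tool.to_dict())
```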

What do you think about the parameter name github_token? In almost every other place, we use api_key, so for consistency with the many other integrations, we could rename github_token to api_key or leave it as is.

@julian-risch julian-risch requested a review from sjrl May 27, 2025 11:46
Comment on lines 18 to 19
:param name: Optional name for the tool.
:param description: Optional description.
Contributor

I do realize this is a bit confusing in our tools, but it seems that if we define an __init__, these docstrings go under the __init__ def. If no __init__ is defined, as in Tool, we put them in the class description.

Member Author

Ah, of course! Do you think we should add a usage example here then (in addition to moving the param docstrings to the init)? I realized that's missing too.

Contributor

Yeah, if it's not too much to ask, a usage example would be great!
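The convention discussed in this thread could look roughly like the sketch below: the usage example lives in the class docstring and the parameter descriptions move into the `__init__` docstring. `ExampleFileTool` and its default values are hypothetical and only illustrate the placement.

```python
from typing import Optional


class ExampleFileTool:
    """
    Hypothetical tool used only to illustrate docstring placement.

    Usage example:

        tool = ExampleFileTool(name="file_editor")
        print(tool.name)
    """

    def __init__(self, name: Optional[str] = None, description: Optional[str] = None):
        """
        Initialize the tool.

        :param name: Optional name for the tool.
        :param description: Optional description.
        """
        # Fall back to assumed defaults when no value is given.
        self.name = name or "example_file_tool"
        self.description = description or "Example tool for illustration."


tool = ExampleFileTool(name="file_editor")
print(tool.name)
```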

Contributor

@sjrl sjrl left a comment

Looks good! Only some minor comments left

@julian-risch julian-risch merged commit 3095079 into main May 28, 2025
11 checks passed
@julian-risch julian-risch deleted the move-github-components branch May 28, 2025 10:12
Labels
integration:github, topic:CI, type:documentation
2 participants