docs: cicd example guide #903
PeriniM wants to merge 14 commits into main from marco/cicd-example
+327
−1
Commits:
- cb5ddb2 docs: cicd example guide
- 60ace92 through 5f250d8 Update src/langsmith/cicd-pipeline-example.mdx (12 commits)
- ac26424 docs: new img, merged deployment sections
---
title: Implement a CI/CD pipeline using LangSmith Deployments and Evaluation
sidebarTitle: Implement a CI/CD pipeline
---

This guide demonstrates how to implement a comprehensive CI/CD pipeline for AI agent applications deployed in LangSmith Deployments. In this example, you'll use the [LangGraph](/oss/langgraph/overview) open source framework to orchestrate and build the agent, and [LangSmith](/langsmith/home) for observability and evaluations. This pipeline is based on the [cicd-pipeline-example repository](https://github.com/langchain-ai/cicd-pipeline-example).

## Overview

The CI/CD pipeline provides:

- <Icon icon="check-circle" /> **Automated testing**: Unit, integration, and end-to-end tests.
- <Icon icon="chart-line" /> **Offline evaluations**: Performance assessment using [AgentEvals](https://github.com/langchain-ai/agentevals), [OpenEvals](https://github.com/langchain-ai/openevals), and [LangSmith](https://docs.langchain.com/langsmith/home).
- <Icon icon="rocket" /> **Preview and production deployments**: Automated staging and quality-gated production releases using the Control Plane API.
- <Icon icon="eye" /> **Monitoring**: Continuous evaluation and alerting.

## Pipeline architecture

The CI/CD pipeline consists of several key components that work together to ensure code quality and reliable deployments:

```mermaid
graph TD
    A1[Code or Graph Change] --> B1[Trigger CI Pipeline]
    A2[Prompt Commit in PromptHub] --> B1
    A3[Online Evaluation Alert] --> B1
    A4[PR Opened] --> B1

    subgraph "Testing"
        B1 --> C1[Run Unit Tests]
        B1 --> C2[Run Integration Tests]
        B1 --> C3[Run End to End Tests]
        B1 --> C4[Run Offline Evaluations]

        C4 --> D1[Evaluate with OpenEvals or AgentEvals]
        C4 --> D2[Assertions: Hard and Soft]

        C1 --> E1[Run LangGraph Dev Server Test]
        C2 --> E1
        C3 --> E1
        D1 --> E1
        D2 --> E1
    end

    E1 --> F1[Push to Staging Deployment - Deploy to LangSmith as Development Type]

    F1 --> G1[Run Online Evaluations on Live Data]
    G1 --> H1[Attach Scores to Traces]

    H1 --> I1[If Quality Below Threshold]
    I1 --> J1[Send to Annotation Queue]
    I1 --> J2[Trigger Alert via Webhook]
    I1 --> J3[Push Trace to Golden Dataset]

    F1 --> K1[Promote to Production if All Pass - Deploy to LangSmith Production]

    J2 --> L1[Slack or PagerDuty Notification]

    subgraph Manual Review
        J1 --> M1[Human Labeling]
        M1 --> J3
    end
```

### Trigger sources

There are multiple ways to trigger this pipeline, either during development or once your application is live. The pipeline can be triggered by:

- <Icon icon="code-branch" /> **Code changes**: Pushes to main/development branches where you can modify the LangGraph architecture, try different models, update agent logic, or make any code improvements.
- <Icon icon="edit" /> **PromptHub updates**: Changes to prompt templates stored in LangSmith PromptHub. Whenever there's a new prompt commit, a webhook triggers the pipeline.
- <Icon icon="exclamation-triangle" /> **Online evaluation alerts**: Performance degradation notifications from live deployments.
- <Icon icon="webhook" /> **LangSmith trace webhooks**: Automated triggers based on trace analysis and performance metrics.
- <Icon icon="play" /> **Manual trigger**: Manual initiation of the pipeline for testing or emergency deployments.

### Testing layers

Compared to traditional software, testing AI agent applications also requires assessing response quality, so it is important to test each part of the workflow. The pipeline implements multiple testing layers:

1. <Icon icon="puzzle-piece" /> **Unit tests**: Individual node and utility function testing.
2. <Icon icon="link" /> **Integration tests**: Component interaction testing.
3. <Icon icon="route" /> **End-to-end tests**: Full graph execution testing.
4. <Icon icon="brain" /> **Offline evaluations**: Performance assessment with real-world scenarios including end-to-end evaluations, single-step evaluations, agent trajectory analysis, and multi-turn simulations.
5. <Icon icon="server" /> **LangGraph dev server tests**: Use the [langgraph-cli](/langsmith/cli) tool to spin up (inside the GitHub Action) a local server that runs the LangGraph agent. The test polls the server's `/ok` endpoint for up to 30 seconds; if the endpoint never becomes available in that window, the test fails with an error.
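
The health check in step 5 can be sketched as a simple polling loop. This is a minimal illustration, not the repository's actual script; the port number and the injectable `probe` parameter are assumptions added here for clarity and testability.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, timeout: float = 30.0, interval: float = 1.0, probe=None) -> float:
    """Poll `url` until it responds, or raise TimeoutError after `timeout` seconds.

    `probe` is injectable for testing; by default it performs a real HTTP GET.
    Returns the number of seconds waited.
    """
    if probe is None:
        def probe(u):
            try:
                with urllib.request.urlopen(u, timeout=5) as resp:
                    return resp.status == 200
            except (urllib.error.URLError, OSError):
                return False

    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if probe(url):
            return time.monotonic() - start
        time.sleep(interval)
    raise TimeoutError(f"Server at {url} did not become ready within {timeout}s")
```

In a CI step you would call something like `wait_for_server("http://localhost:2024/ok")` after launching `langgraph dev` in the background (2024 is the CLI's usual default port; verify against your setup).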

## GitHub Actions workflow

The CI/CD pipeline uses GitHub Actions with the [Control Plane API](/langsmith/api-ref-control-plane) and [LangSmith API](/langsmith/api-reference) to automate deployment. A [helper script](https://github.com/langchain-ai/cicd-pipeline-example/blob/main/.github/scripts/langgraph_api.py) manages API interactions and deployments.

The workflow includes:

- **New agent deployment**: When a new PR is opened and tests pass, a new preview deployment is created in LangSmith Deployments using the [Control Plane API](/langsmith/api-ref-control-plane). This allows you to test the agent in a staging environment before promoting to production.

- **Agent deployment revision**: A revision happens when an existing deployment with the same ID is found, or when the PR is merged into main. In the case of merging to main, the preview deployment is deleted and a production deployment is created. This ensures that any updates to the agent are properly deployed and integrated into the production infrastructure.

 

- **Testing and evaluation workflow**: In addition to the more traditional testing phases (unit tests, integration tests, end-to-end tests, etc.), the pipeline includes [offline evaluations](/langsmith/evaluation-concepts#offline-evaluation) and [LangGraph dev server testing](/langsmith/local-server) to assess the quality of your agent. These evaluations provide a comprehensive assessment of the agent's performance using real-world scenarios and data.

 

<AccordionGroup>
  <Accordion title="Final Response Evaluation" icon="check-circle">
    Evaluates the final output of your agent against expected results. This is the most common type of evaluation; it checks whether the agent's final response meets quality standards and answers the user's question correctly.
  </Accordion>

  <Accordion title="Single Step Evaluation" icon="step-forward">
    Tests individual steps or nodes within your LangGraph workflow. This allows you to validate specific components of your agent's logic in isolation, ensuring each step functions correctly before testing the full pipeline.
  </Accordion>

  <Accordion title="Agent Trajectory Evaluation" icon="route">
    Analyzes the complete path your agent takes through the graph, including all intermediate steps and decision points. This helps identify bottlenecks, unnecessary steps, or suboptimal routing in your agent's workflow. It also evaluates whether your agent invoked the right tools in the right order and at the right time.
  </Accordion>

  <Accordion title="Multi-Turn Evaluation" icon="comments">
    Tests conversational flows where the agent maintains context across multiple interactions. This is crucial for agents that handle follow-up questions, clarifications, or extended dialogues with users.
  </Accordion>
</AccordionGroup>

See the [LangGraph testing documentation](/oss/python/langgraph/test) for specific testing approaches and the [evaluation approaches guide](/langsmith/evaluation-approaches) for a comprehensive overview of offline evaluations.
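
To make the trajectory idea concrete, a basic trajectory check can be written as a plain function that scores how much of an expected tool-call sequence appears, in order, in the agent's actual trajectory. This is a hand-rolled illustration of the concept, not the AgentEvals or OpenEvals API.

```python
def trajectory_subsequence_score(actual: list[str], expected: list[str]) -> float:
    """Fraction of expected tool calls that appear in `actual`, in order.

    Returns a value in [0.0, 1.0]; 1.0 means every expected step was
    found in sequence, even if extra steps occurred in between.
    """
    if not expected:
        return 1.0
    pos = 0      # search cursor into `actual`
    matched = 0  # expected steps found so far, in order
    for step in expected:
        try:
            idx = actual.index(step, pos)
        except ValueError:
            continue  # step missing; later expected steps may still match
        matched += 1
        pos = idx + 1
    return matched / len(expected)
```

For example, an agent that ran `["search", "answer"]` against an expected `["search", "sql", "answer"]` trajectory would score 2/3, which a CI step could compare to a quality threshold.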

### Prerequisites

Before setting up the CI/CD pipeline, ensure you have:

- <Icon icon="robot" /> An AI agent application (in this case built using [LangGraph](/oss/langgraph/overview)).
- <Icon icon="user" /> A [LangSmith account](https://smith.langchain.com/).
- <Icon icon="key" /> A [LangSmith API key](/langsmith/create-account-api-key), needed to deploy agents and retrieve experiment results.
- <Icon icon="cog" /> Project-specific environment variables configured in your repository secrets (e.g., LLM model API keys, vector store credentials, database connections).

<Note>
While this example uses GitHub, the CI/CD pipeline works with other Git hosting platforms, including GitLab and Bitbucket.
</Note>

## Deployment options

LangSmith supports multiple deployment methods, depending on how your [LangSmith instance is hosted](/langsmith/hosting):

- <Icon icon="cloud" /> **Cloud LangSmith**: Direct GitHub integration or Docker image deployment.
- <Icon icon="server" /> **Self-Hosted/Hybrid**: Container registry-based deployments.

The deployment flow starts by modifying your agent implementation. At minimum, your project must contain a [`langgraph.json`](/langsmith/application-structure) file and a dependency file (`requirements.txt` or `pyproject.toml`). Use the `langgraph dev` CLI command to check for errors locally; once the agent runs cleanly, deployment to LangSmith Deployments is likely to succeed.

```mermaid
graph TD
    A[Agent Implementation] --> B[langgraph.json + dependencies]
    B --> C[Test Locally with langgraph dev]
    C --> D{Errors?}
    D -->|Yes| E[Fix Issues]
    E --> C
    D -->|No| F[Choose LangSmith Instance]

    F --> G[Cloud LangSmith]
    F --> H[Self-Hosted/Hybrid LangSmith]

    subgraph "Cloud LangSmith"
        G --> I[Method 1: Connect GitHub Repo in UI]
        G --> J[Method 2: Docker Image]
        I --> K[Deploy via LangSmith UI]
        J --> L[Build Docker Image langgraph build]
        L --> M[Push to Container Registry]
        M --> N[Deploy via Control Plane API]
    end

    subgraph "Self-Hosted/Hybrid LangSmith"
        H --> S[Build Docker Image langgraph build]
        S --> T[Push to Container Registry]
        T --> U{Deploy via?}
        U -->|UI| V[Specify Image URI in UI]
        U -->|API| W[Use Control Plane API]
        V --> X[Deploy via LangSmith UI]
        W --> Y[Deploy via Control Plane API]
    end

    K --> AA[Agent Ready for Use]
    N --> AA
    X --> AA
    Y --> AA

    AA --> BB{Connect via?}
    BB -->|LangGraph SDK| CC[Use LangGraph SDK]
    BB -->|RemoteGraph| DD[Use RemoteGraph]
    BB -->|REST API| EE[Use REST API]
    BB -->|LangGraph Studio UI| FF[Use LangGraph Studio UI]
```

### Prerequisites for manual deployment

Before deploying your agent, ensure you have:

1. <Icon icon="project-diagram" /> **LangGraph graph**: Your agent implementation (e.g., `./agents/simple_text2sql.py:agent`).
2. <Icon icon="box" /> **Dependencies**: Either `requirements.txt` or `pyproject.toml` with all required packages.
3. <Icon icon="cog" /> **Configuration**: A `langgraph.json` file specifying:
   - Path to your agent graph
   - Dependencies location
   - Environment variables
   - Python version

Example `langgraph.json`:

```json
{
  "graphs": {
    "simple_text2sql": "./agents/simple_text2sql.py:agent"
  },
  "env": ".env",
  "python_version": "3.11",
  "dependencies": ["."],
  "image_distro": "wolfi"
}
```

### Local development and testing

 

First, test your agent locally using [Studio](/langsmith/studio):

```bash
# Start local development server with LangGraph Studio
langgraph dev
```

This will:
- Spin up a local server with Studio.
- Allow you to visualize and interact with your graph.
- Validate that your agent works correctly before deployment.

<Note>
If your agent runs locally without errors, deployment to LangSmith will likely succeed. Local testing helps catch configuration issues, dependency problems, and agent logic errors before attempting deployment.
</Note>

See the [LangGraph CLI documentation](/langsmith/cli#dev) for more details.

### Method 1: LangSmith Deployment UI

Deploy your agent using the LangSmith deployment interface:

1. Go to your [LangSmith dashboard](https://smith.langchain.com).
2. Navigate to the **Deployments** section.
3. Click the **+ New Deployment** button in the top right.
4. Select the GitHub repository containing your LangGraph agent from the dropdown menu.

**Supported deployments:**
- <Icon icon="cloud" /> **Cloud LangSmith**: Direct GitHub integration with a dropdown menu.
- <Icon icon="server" /> **Self-Hosted/Hybrid LangSmith**: Specify your image URI in the Image Path field (e.g., `docker.io/username/my-agent:latest`).

<Info>
**Benefits:**
- Simple UI-based deployment
- Direct integration with your GitHub repository (cloud)
- No manual Docker image management required (cloud)
</Info>

### Method 2: Control Plane API

Build a Docker image and deploy using the Control Plane API:

```bash
# Build Docker image
langgraph build -t my-agent:latest

# Push to your container registry
docker push my-agent:latest
```

You can push to any container registry (Docker Hub, AWS ECR, Azure ACR, Google GCR, etc.) that your deployment environment has access to.

**Supported deployments:**
- <Icon icon="cloud" /> **Cloud LangSmith**: Use the Control Plane API to create deployments from your container registry.
- <Icon icon="server" /> **Self-Hosted/Hybrid LangSmith**: Use the Control Plane API to create deployments from your container registry.

See the [LangGraph CLI build documentation](/langsmith/cli#build) for more details.
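
After the image is pushed, a CI script assembles a request for the Control Plane API. The sketch below shows only the shape of such a step; the field names (`source`, `image_uri`, `env_vars`) and the deployment-creation endpoint are placeholders, so consult the [Control Plane API reference](/langsmith/api-ref-control-plane) for the actual schema before using anything like this.

```python
import json


def build_deployment_payload(name: str, image_uri: str, env: dict[str, str]) -> dict:
    """Assemble a deployment-creation payload.

    Field names here are illustrative placeholders; check the Control
    Plane API reference for the real schema.
    """
    return {
        "name": name,
        "source": "docker",  # placeholder: image-based deployment
        "image_uri": image_uri,
        "env_vars": [{"name": k, "value": v} for k, v in env.items()],
    }


payload = build_deployment_payload(
    "my-agent-preview",
    "docker.io/username/my-agent:latest",
    {"OPENAI_API_KEY": "***"},
)
body = json.dumps(payload).encode()
# The request itself (sketch): POST `body` to the Control Plane endpoint for
# your region, authenticated with your LangSmith API key, e.g. via
# urllib.request or your CI's HTTP step.
```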

### Connect to your deployed agent

- <Icon icon="code" /> **[LangGraph SDK](https://langchain-ai.github.io/langgraph/cloud/reference/sdk/python_sdk_ref/#langgraph-sdk-python)**: Use the LangGraph SDK for programmatic integration.
- <Icon icon="project-diagram" /> **[RemoteGraph](/langsmith/use-remote-graph)**: Connect using RemoteGraph to use your deployed graph as a node in other graphs.
- <Icon icon="globe" /> **[REST API](/langsmith/server-api-ref)**: Use HTTP-based interactions with your deployed agent.
- <Icon icon="desktop" /> **[Studio](/langsmith/studio)**: Access the visual interface for testing and debugging.

### Environment configuration

#### Database and cache configuration

By default, LangSmith Deployments creates PostgreSQL and Redis instances for you. To use external services instead, set the following environment variables in your new deployment or revision:

```bash
# Set environment variables for external services
export POSTGRES_URI_CUSTOM="postgresql://user:pass@host:5432/db"
export REDIS_URI_CUSTOM="redis://host:6379/0"
```

See the [environment variables documentation](/langsmith/env-var#postgres-uri-custom) for more details.

## Troubleshooting

### Wrong API endpoints

If you're experiencing connection issues, verify you're using the correct endpoint format for your LangSmith instance. There are two different APIs with different endpoints:

#### LangSmith API (traces, ingestion, etc.)

For LangSmith API operations (traces, evaluations, datasets):

| Region | Endpoint |
|--------|----------|
| US | `https://api.smith.langchain.com` |
| EU | `https://eu.api.smith.langchain.com` |

For self-hosted LangSmith instances, use `http(s)://<langsmith-url>/api`, where `<langsmith-url>` is your self-hosted instance URL.

<Note>
If you're setting the endpoint in the `LANGSMITH_ENDPOINT` environment variable, you need to add `/v1` at the end (e.g., `https://api.smith.langchain.com/v1`).
</Note>

#### LangSmith Deployments API (deployments)

For LangSmith Deployments operations (deployments, revisions):

| Region | Endpoint |
|--------|----------|
| US | `https://api.host.langchain.com` |
| EU | `https://eu.api.host.langchain.com` |

For self-hosted LangSmith instances, use `http(s)://<langsmith-url>/api-host`, where `<langsmith-url>` is your self-hosted instance URL.