
Commit 2b0dac4

feat: Ask Sourcebot (#392)
Co-authored-by: msukkari <[email protected]>
1 parent eb20027 commit 2b0dac4


143 files changed: +16284 / -818 lines


.github/workflows/_gcp-deploy.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -60,6 +60,8 @@ jobs:
6060
NEXT_PUBLIC_SENTRY_ENVIRONMENT=${{ vars.NEXT_PUBLIC_SENTRY_ENVIRONMENT }}
6161
NEXT_PUBLIC_SENTRY_WEBAPP_DSN=${{ vars.NEXT_PUBLIC_SENTRY_WEBAPP_DSN }}
6262
NEXT_PUBLIC_SENTRY_BACKEND_DSN=${{ vars.NEXT_PUBLIC_SENTRY_BACKEND_DSN }}
63+
NEXT_PUBLIC_LANGFUSE_PUBLIC_KEY=${{ vars.NEXT_PUBLIC_LANGFUSE_PUBLIC_KEY }}
64+
NEXT_PUBLIC_LANGFUSE_BASE_URL=${{ vars.NEXT_PUBLIC_LANGFUSE_BASE_URL }}
6365
SENTRY_SMUAT=${{ secrets.SENTRY_SMUAT }}
6466
SENTRY_ORG=${{ vars.SENTRY_ORG }}
6567
SENTRY_WEBAPP_PROJECT=${{ vars.SENTRY_WEBAPP_PROJECT }}

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
77

88
## [Unreleased]
99

10+
### Added
11+
- Introducing Ask Sourcebot: ask questions about your codebase in natural language and get back comprehensive Markdown responses with inline citations to the code. Bring your own LLM API key. [#392](https://github.com/sourcebot-dev/sourcebot/pull/392)
12+
1013
## [4.5.3] - 2025-07-20
1114

1215
### Changed

README.md

Lines changed: 9 additions & 5 deletions
@@ -44,16 +44,20 @@
4444

4545
# About
4646

47-
Sourcebot lets you index all your repos and branches across multiple code hosts (GitHub, GitLab, Bitbucket, Gitea, or Gerrit) and search through them using a blazingly fast interface.
47+
Sourcebot is a self-hosted tool that helps you understand your codebase.
4848

49-
https://github.com/user-attachments/assets/ced355f3-967e-4f37-ae6e-74ab8c06b9ec
49+
- **Ask Sourcebot:** Ask questions about your codebase and have Sourcebot provide detailed answers grounded with inline citations.
50+
- **Code search:** Search and navigate across all your repos and branches, no matter where they’re hosted.
51+
52+
https://github.com/user-attachments/assets/286ad97a-a543-4eef-a2f1-4fa31bea1b32
5053

5154

5255
## Features
5356
- 💻 **One-command deployment**: Get started instantly using Docker on your own machine.
54-
- 🔍 **Multi-repo search**: Index and search through multiple public and private repositories and branches on GitHub, GitLab, Bitbucket, Gitea, or Gerrit.
55-
- ⚡ **Lightning fast performance**: Built on top of the powerful [Zoekt](https://github.com/sourcegraph/zoekt) search engine.
56-
- 🎨 **Modern web app**: Enjoy a sleek interface with features like syntax highlighting, light/dark mode, and vim-style navigation
57+
- 🤖 **Bring your own model**: Connect Sourcebot to any of the reasoning models you're already using.
58+
- 🔍 **Multi-repo support**: Index and search through multiple public and private repositories and branches on GitHub, GitLab, Bitbucket, Gitea, or Gerrit.
59+
- ⚡ **Lightning fast performance**: Built on top of the powerful [Zoekt](https://github.com/sourcegraph/zoekt) search engine.
60+
- 🎨 **Modern web app**: Enjoy a sleek interface with features like syntax highlighting, light/dark mode, and vim-style navigation.
5761
- 📂 **Full file visualization**: Instantly view the entire file when selecting any search result.
5862

5963
You can try out our public hosted demo [here](https://demo.sourcebot.dev)!

docs/docs.json

Lines changed: 13 additions & 3 deletions
@@ -28,13 +28,21 @@
2828
"group": "Features",
2929
"pages": [
3030
{
31-
"group": "Search",
31+
"group": "Code Search",
3232
"pages": [
33+
"docs/features/search/overview",
3334
"docs/features/search/syntax-reference",
3435
"docs/features/search/multi-branch-indexing",
3536
"docs/features/search/search-contexts"
3637
]
3738
},
39+
{
40+
"group": "Ask Sourcebot",
41+
"pages": [
42+
"docs/features/ask/overview",
43+
"docs/features/ask/add-model-providers"
44+
]
45+
},
3846
"docs/features/code-navigation",
3947
"docs/features/analytics",
4048
"docs/features/mcp-server",
@@ -51,6 +59,7 @@
5159
{
5260
"group": "Configuration",
5361
"pages": [
62+
"docs/configuration/config-file",
5463
{
5564
"group": "Indexing your code",
5665
"pages": [
@@ -66,8 +75,7 @@
6675
"docs/connections/request-new"
6776
]
6877
},
69-
"docs/license-key",
70-
"docs/configuration/environment-variables",
78+
"docs/configuration/language-model-providers",
7179
{
7280
"group": "Authentication",
7381
"pages": [
@@ -78,6 +86,8 @@
7886
"docs/configuration/auth/faq"
7987
]
8088
},
89+
"docs/configuration/environment-variables",
90+
"docs/license-key",
8191
"docs/configuration/transactional-emails",
8292
"docs/configuration/structured-logging",
8393
"docs/configuration/audit-logs"
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
1+
---
2+
title: Config File
3+
sidebarTitle: Config file
4+
---
5+
6+
When self-hosting Sourcebot, you **must** provide it with a config file. To do this, place the file in a volume mounted into the Sourcebot container and set the `CONFIG_PATH` environment variable to the file's path
7+
inside the container. For example (`...` stands for any other `docker run` options you need):
8+
9+
```bash icon="terminal" Passing in a CONFIG_PATH to Sourcebot
10+
docker run \
11+
-v "$(pwd)/config.json:/data/config.json" \
12+
-e CONFIG_PATH=/data/config.json \
13+
... \
14+
ghcr.io/sourcebot-dev/sourcebot:latest
15+
```
16+
17+
The config file tells Sourcebot which repos to index, what language models to use, and various other settings as defined in the [schema](#config-file-schema).
18+
19+
# Config File Schema
20+
21+
The config file you provide Sourcebot must follow the [schema](https://github.com/sourcebot-dev/sourcebot/blob/main/schemas/v3/index.json). This schema consists of the following properties:
22+
23+
- [Connections](/docs/connections/overview) (`connections`): Defines a set of connections that tell Sourcebot which repos to index and from where
24+
- [Language Models](/docs/configuration/language-model-providers) (`models`): Defines a set of language model providers for use with [Ask Sourcebot](/docs/features/ask)
25+
- [Settings](#settings) (`settings`): Additional settings to tweak your Sourcebot deployment
26+
- [Search Contexts](/docs/features/search/search-contexts) (`contexts`): Groupings of repos that you can search against
27+
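Putting these together, a config file that sets all four top-level properties might look like the following. This is a sketch rather than a reference: the connection name `github-public`, the context name `sourcebot`, the repo, the include pattern, and the `reindexIntervalMs` value are illustrative assumptions, and the [schema](https://github.com/sourcebot-dev/sourcebot/blob/main/schemas/v3/index.json) remains the authority on the exact shape of each property.

```json wrap icon="code" Sketch of a config file using all four top-level properties
{
  "$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v3/index.json",
  "connections": {
    "github-public": {
      "type": "github",
      "repos": ["sourcebot-dev/sourcebot"]
    }
  },
  "models": [
    {
      "provider": "openai",
      "model": "o3",
      "displayName": "o3",
      "token": {
        "env": "OPENAI_API_KEY"
      }
    }
  ],
  "settings": {
    "reindexIntervalMs": 3600000
  },
  "contexts": {
    "sourcebot": {
      "include": ["github.com/sourcebot-dev/**"],
      "description": "Sourcebot repositories"
    }
  }
}
```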
28+
# Config File Syncing
29+
30+
Sourcebot syncs the config file on startup and automatically re-syncs it whenever it detects a change.
31+
32+
# Settings
33+
34+
The following settings can be provided in your config file to modify Sourcebot's behavior:
35+
36+
| Setting | Type | Default | Minimum | Description / Notes |
37+
|-------------------------------------------|---------|------------|---------|----------------------------------------------------------------------------------------|
38+
| `maxFileSize` | number | 2 MB | 1 | Maximum size (bytes) of a file to index. Files exceeding this are skipped. |
39+
| `maxTrigramCount` | number | 20 000 | 1 | Maximum trigrams per document. Larger files are skipped. |
40+
| `reindexIntervalMs` | number | 1 hour | 1 | Interval at which all repositories are re‑indexed. |
41+
| `resyncConnectionIntervalMs` | number | 24 hours | 1 | Interval for checking connections that need re‑syncing. |
42+
| `resyncConnectionPollingIntervalMs` | number | 1 second | 1 | DB polling rate for connections that need re‑syncing. |
43+
| `reindexRepoPollingIntervalMs` | number | 1 second | 1 | DB polling rate for repos that should be re‑indexed. |
44+
| `maxConnectionSyncJobConcurrency` | number | 8 | 1 | Concurrent connection‑sync jobs. |
45+
| `maxRepoIndexingJobConcurrency` | number | 8 | 1 | Concurrent repo‑indexing jobs. |
46+
| `maxRepoGarbageCollectionJobConcurrency` | number | 8 | 1 | Concurrent repo‑garbage‑collection jobs. |
47+
| `repoGarbageCollectionGracePeriodMs` | number | 10 seconds | 1 | Grace period to avoid deleting shards while loading. |
48+
| `repoIndexTimeoutMs` | number | 2 hours | 1 | Timeout for a single repo‑indexing run. |
49+
| `enablePublicAccess` **(deprecated)** | boolean | false | – | Use the `FORCE_ENABLE_ANONYMOUS_ACCESS` environment variable instead. |
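Numeric settings take raw values in the units their names and descriptions imply: bytes for `maxFileSize` and milliseconds for the `...Ms` settings. As a sketch with arbitrarily chosen values (not recommendations), the following `settings` fragment raises the file-size limit to 4 MiB (4 × 1024 × 1024 = 4194304 bytes), re-indexes every 30 minutes (30 × 60 × 1000 = 1800000 ms), allows 4 hours per indexing run (4 × 3600 × 1000 = 14400000 ms), and lowers indexing concurrency to 4. Only the `settings` object is shown; the rest of your config file stays as it is.

```json wrap icon="code" Sketch of a settings fragment with custom values
{
  "settings": {
    "maxFileSize": 4194304,
    "reindexIntervalMs": 1800000,
    "repoIndexTimeoutMs": 14400000,
    "maxRepoIndexingJobConcurrency": 4
  }
}
```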
Lines changed: 184 additions & 0 deletions
@@ -0,0 +1,184 @@
1+
---
2+
title: Language Model Providers
3+
sidebarTitle: Language model providers
4+
---
5+
6+
To use [Ask Sourcebot](/docs/features/ask), you must define at least one language model provider. Providers are defined in the `models` array of the [config file](/docs/configuration/config-file) you
7+
provide Sourcebot.
8+
9+
10+
```json wrap icon="code" Example config with language model provider
11+
{
12+
"$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v3/index.json",
13+
"models": [
14+
// 1. Google Vertex config for Gemini 2.5 Pro
15+
{
16+
"provider": "google-vertex",
17+
"model": "gemini-2.5-pro",
18+
"displayName": "Gemini 2.5 Pro",
19+
"project": "sourcebot",
20+
"credentials": {
21+
"env": "GOOGLE_APPLICATION_CREDENTIALS"
22+
}
23+
},
24+
// 2. OpenAI config for o3
25+
{
26+
"provider": "openai",
27+
"model": "o3",
28+
"displayName": "o3",
29+
"token": {
30+
"env": "OPENAI_API_KEY"
31+
}
32+
}
33+
]
34+
}
35+
```
36+
37+
# Supported Providers
38+
39+
Sourcebot uses the [Vercel AI SDK](https://ai-sdk.dev/docs/introduction), so it can integrate with any provider the SDK supports. If you don't see your provider below, please submit
40+
a [feature request](https://github.com/sourcebot-dev/sourcebot/discussions/categories/feature-requests).
41+
42+
For a detailed description of all the providers, please refer to the [schema](https://github.com/sourcebot-dev/sourcebot/blob/main/schemas/v3/languageModel.json).
43+
44+
<Note>Any parameter defined using `env` reads its value from the environment variable of that name, which you must provide to Sourcebot.</Note>
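For example, the `token` in the entry below is read from an environment variable named `MY_TEAM_ANTHROPIC_KEY`, a hypothetical name chosen for illustration. Going by the note above, the name is yours to pick as long as the variable is set in Sourcebot's environment (for instance with `-e MY_TEAM_ANTHROPIC_KEY=...` on `docker run`). The fragment is a single entry of the `models` array:

```json wrap icon="code" Sketch of a models entry reading its token from a custom environment variable
{
  "provider": "anthropic",
  "model": "YOUR_MODEL_HERE",
  "displayName": "OPTIONAL_DISPLAY_NAME",
  "token": {
    "env": "MY_TEAM_ANTHROPIC_KEY"
  }
}
```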
45+
46+
### Amazon Bedrock
47+
48+
[Vercel AI SDK Amazon Bedrock Docs](https://ai-sdk.dev/providers/ai-sdk-providers/amazon-bedrock)
49+
50+
```json wrap icon="code" Example config with Amazon Bedrock provider
51+
{
52+
"$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v3/index.json",
53+
"models": [
54+
{
55+
"provider": "amazon-bedrock",
56+
"model": "YOUR_MODEL_HERE",
57+
"displayName": "OPTIONAL_DISPLAY_NAME",
58+
"accessKeyId": {
59+
"env": "AWS_ACCESS_KEY_ID"
60+
},
61+
"accessKeySecret": {
62+
"env": "AWS_SECRET_ACCESS_KEY"
63+
},
64+
"region": "YOUR_REGION_HERE", // defaults to the AWS_REGION env var if not set
65+
"baseUrl": "OPTIONAL_BASE_URL"
66+
}
67+
]
68+
}
69+
```
70+
71+
### Anthropic
72+
73+
[Vercel AI SDK Anthropic Docs](https://ai-sdk.dev/providers/ai-sdk-providers/anthropic)
74+
75+
```json wrap icon="code" Example config with Anthropic provider
76+
{
77+
"$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v3/index.json",
78+
"models": [
79+
{
80+
"provider": "anthropic",
81+
"model": "YOUR_MODEL_HERE",
82+
"displayName": "OPTIONAL_DISPLAY_NAME",
83+
"token": {
84+
"env": "ANTHROPIC_API_KEY"
85+
},
86+
"baseUrl": "OPTIONAL_BASE_URL"
87+
}
88+
]
89+
}
90+
```
91+
92+
### Google Generative AI
93+
94+
[Vercel AI SDK Google Generative AI Docs](https://ai-sdk.dev/providers/ai-sdk-providers/google-generative-ai)
95+
96+
```json wrap icon="code" Example config with Google Generative AI provider
97+
{
98+
"$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v3/index.json",
99+
"models": [
100+
{
101+
"provider": "google-generative-ai",
102+
"model": "YOUR_MODEL_HERE",
103+
"displayName": "OPTIONAL_DISPLAY_NAME",
104+
"token": {
105+
"env": "GOOGLE_GENERATIVE_AI_API_KEY"
106+
},
107+
"baseUrl": "OPTIONAL_BASE_URL"
108+
}
109+
]
110+
}
111+
```
112+
113+
### Google Vertex
114+
115+
<Note>If you're using an Anthropic model on Google Vertex, you must define a [Google Vertex Anthropic](#google-vertex-anthropic) provider instead</Note>
116+
<Note>The `credentials` parameter (here read from the `GOOGLE_APPLICATION_CREDENTIALS` environment variable) expects a **path** to a [credentials](https://console.cloud.google.com/apis/credentials) file. This file **must be in a volume mounted by Sourcebot** for it to be readable.</Note>
117+
118+
[Vercel AI SDK Google Vertex AI Docs](https://ai-sdk.dev/providers/ai-sdk-providers/google-vertex)
119+
120+
```json wrap icon="code" Example config with Google Vertex provider
121+
{
122+
"$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v3/index.json",
123+
"models": [
124+
{
125+
"provider": "google-vertex",
126+
"model": "YOUR_MODEL_HERE", // e.g., "gemini-2.0-flash-exp", "gemini-1.5-pro", "gemini-1.5-flash"
127+
"displayName": "OPTIONAL_DISPLAY_NAME",
128+
"project": "YOUR_PROJECT_ID", // defaults to the GOOGLE_VERTEX_PROJECT env var if not set
129+
"region": "YOUR_REGION_HERE", // defaults to the GOOGLE_VERTEX_REGION env var if not set, e.g., "us-central1", "us-east1", "europe-west1"
130+
"credentials": {
131+
"env": "GOOGLE_APPLICATION_CREDENTIALS"
132+
},
133+
"baseUrl": "OPTIONAL_BASE_URL"
134+
}
135+
]
136+
}
137+
```
138+
139+
### Google Vertex Anthropic
140+
141+
<Note>The `credentials` parameter (here read from the `GOOGLE_APPLICATION_CREDENTIALS` environment variable) expects a **path** to a [credentials](https://console.cloud.google.com/apis/credentials) file. This file **must be in a volume mounted by Sourcebot** for it to be readable.</Note>
142+
143+
144+
[Vercel AI SDK Google Vertex Anthropic Docs](https://ai-sdk.dev/providers/ai-sdk-providers/google-vertex#google-vertex-anthropic-provider-usage)
145+
146+
```json wrap icon="code" Example config with Google Vertex Anthropic provider
147+
{
148+
"$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v3/index.json",
149+
"models": [
150+
{
151+
"provider": "google-vertex-anthropic",
152+
"model": "YOUR_MODEL_HERE", // e.g., "claude-sonnet-4"
153+
"displayName": "OPTIONAL_DISPLAY_NAME",
154+
"project": "YOUR_PROJECT_ID", // defaults to the GOOGLE_VERTEX_PROJECT env var if not set
155+
"region": "YOUR_REGION_HERE", // defaults to the GOOGLE_VERTEX_REGION env var if not set, e.g., "us-central1", "us-east1", "europe-west1"
156+
"credentials": {
157+
"env": "GOOGLE_APPLICATION_CREDENTIALS"
158+
},
159+
"baseUrl": "OPTIONAL_BASE_URL"
160+
}
161+
]
162+
}
163+
```
164+
165+
### OpenAI
166+
167+
[Vercel AI SDK OpenAI Docs](https://ai-sdk.dev/providers/ai-sdk-providers/openai)
168+
169+
```json wrap icon="code" Example config with OpenAI provider
170+
{
171+
"$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/v3/index.json",
172+
"models": [
173+
{
174+
"provider": "openai",
175+
"model": "YOUR_MODEL_HERE", // e.g., "gpt-4.1", "o4-mini", "o3", "o3-deep-research"
176+
"displayName": "OPTIONAL_DISPLAY_NAME",
177+
"token": {
178+
"env": "OPENAI_API_KEY"
179+
},
180+
"baseUrl": "OPTIONAL_BASE_URL"
181+
}
182+
]
183+
}
184+
```

docs/docs/connections/bitbucket-cloud.mdx

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@ import BitbucketSchema from '/snippets/schemas/v3/bitbucket.schema.mdx'
1212
Looking for docs on Bitbucket Data Center? See [this doc](/docs/connections/bitbucket-data-center).
1313
</Note>
1414

15+
If you're not familiar with Sourcebot [connections](/docs/connections/overview), please read that overview first.
16+
1517
## Examples
1618

1719
<AccordionGroup>

docs/docs/connections/bitbucket-data-center.mdx

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@ import BitbucketSchema from '/snippets/schemas/v3/bitbucket.schema.mdx'
1212
Looking for docs on Bitbucket Cloud? See [this doc](/docs/connections/bitbucket-cloud).
1313
</Note>
1414

15+
If you're not familiar with Sourcebot [connections](/docs/connections/overview), please read that overview first.
16+
1517
## Examples
1618

1719
<AccordionGroup>

docs/docs/connections/generic-git-host.mdx

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,8 @@ import GenericGitHost from '/snippets/schemas/v3/genericGitHost.schema.mdx'
77

88
Sourcebot can sync code from any Git host (by clone URL). This is helpful when you want to search code that's not in a [supported code host](/docs/connections/overview#supported-code-hosts).
99

10+
If you're not familiar with Sourcebot [connections](/docs/connections/overview), please read that overview first.
11+
1012
## Getting Started
1113

1214
To connect to a Git host, create a new [connection](/docs/connections/overview) with type `git` and specify the clone URL in the `url` property. For example:

docs/docs/connections/gerrit.mdx

Lines changed: 2 additions & 0 deletions
@@ -10,6 +10,8 @@ import GerritSchema from '/snippets/schemas/v3/gerrit.schema.mdx'
1010

1111
Sourcebot can sync code from self-hosted Gerrit instances.
1212

13+
If you're not familiar with Sourcebot [connections](/docs/connections/overview), please read that overview first.
14+
1315
## Connecting to a Gerrit instance
1416

1517
To connect to a Gerrit instance, provide the `url` property to your config:
