Merge Qdrant master into update.redisearch #20

Open
wants to merge 107 commits into base: update.redisearch
Commits (107)
e28cc02
Updated Weaviate Docker image url (auto PR by bot) (#109)
weaviate-git-bot Apr 9, 2024
f4436e4
pgvector improvements (#98)
ankane Apr 11, 2024
beaddb3
[pre-commit.ci] pre-commit suggestions (#47)
pre-commit-ci[bot] Apr 11, 2024
2ffe5e2
refactoring: Standardize format of search params in engine configs (#…
tellet-q Apr 15, 2024
5f2121e
refactor: Nested search params in ES config (#120)
KShivendu Apr 15, 2024
b7ec57e
refactor: Fix and simplify benchmark processing notebook (#125)
KShivendu Apr 16, 2024
5343849
feat: Add sparse vectors benchmark support for Qdrant (#114)
KShivendu Apr 17, 2024
d3fd49f
fix: Remove mc (#127)
KShivendu Apr 17, 2024
04bbb7c
fix: Manual benchmarks (#128)
KShivendu Apr 17, 2024
a8bcf78
feat: Add mmap support for reading sparse vectors to avoid OOM error …
KShivendu Apr 17, 2024
73bb7d1
fix: Avoid reading all mmaped sparse vectors into memory (#130)
KShivendu Apr 17, 2024
a294cce
fix: Use smaller sparse dataset for faster iteration (#132)
KShivendu Apr 18, 2024
1508944
feat: Force docker image removal (#131)
KShivendu Apr 18, 2024
e6049a4
feat: Use both RssAnon and VmRSS in CI benchmarks (#133)
KShivendu Apr 18, 2024
455b590
Automate running benchmarks for all engines (#134)
tellet-q Apr 18, 2024
a3a9b6b
feat: Add H&M filter dataset to CI benchmarks (#140)
KShivendu May 19, 2024
9598214
feat: Add BQ to CI benchmarks (#148)
KShivendu May 31, 2024
e026701
feat: Add DBpedia OpenAI embedding dataset with 100k vectors (#150)
KShivendu Jun 3, 2024
df382ae
Fix github triggers (#154)
tellet-q Jun 5, 2024
884fda3
Weaviate version 1.25.1 (#143)
filipecosta90 Jun 5, 2024
b66b4c5
feat: Add 15m timeout for CI benchmarks (#157)
KShivendu Jun 11, 2024
8a1c664
fix: Rename nodes in cluster mode (#161)
KShivendu Jun 11, 2024
7afc142
up qdrant version
generall Jun 12, 2024
a7d5d68
more configs
generall Jun 12, 2024
bdedf0c
fix: Typo in func name (#164)
KShivendu Jun 17, 2024
2f4a143
add convertor
generall Jun 21, 2024
2a8a8ed
timeout at 30 mins
generall Jun 21, 2024
fd6bea4
set ports instead of network_mode (#145)
ekorman Jul 8, 2024
dd9a4c0
Use efficient filtering for opensearch (#167)
igniting Jul 29, 2024
9db1d83
Fix opensearch query parser (#172)
igniting Aug 5, 2024
7140d3c
[pre-commit.ci] pre-commit suggestions (#169)
pre-commit-ci[bot] Aug 5, 2024
d82c2dc
Updated milvus from 2.3.1 to 2.4.1 (#144)
filipecosta90 Aug 5, 2024
2397444
feat: Add debug logs (#166)
KShivendu Aug 5, 2024
9ed570c
fix after upgrading milvus (#175)
KShivendu Aug 5, 2024
d0f2b18
fix: Unbound variable error (#178)
KShivendu Aug 6, 2024
940c206
Only copy search and upload in first step (#179)
KShivendu Aug 6, 2024
5cfdb0b
Fail CI if any benches fail (#180)
KShivendu Aug 7, 2024
d14e6e8
fix CI (#181)
KShivendu Aug 7, 2024
a4ffca4
Allow CI to force clear previously running resources if required (#182)
KShivendu Aug 7, 2024
4ffc3ce
Bump qdrant versions to 1.11.0 (#184)
tellet-q Aug 13, 2024
a11ebc0
Add continuous benchmark for tenants (#183)
tellet-q Aug 15, 2024
82ec5a8
Split Ci benchmarks into 2 jobs (#186)
tellet-q Aug 27, 2024
868caed
Increase CONTAINER_MEM_LIMIT to 160mb (#191)
tellet-q Sep 4, 2024
d3113bd
fix: pgvector and_subfilter (#193)
SebanDan Sep 17, 2024
66ef760
docs: update process-benchmarks.ipynb (#185)
eltociear Sep 17, 2024
bad876f
Disable workflow for investigation (#197)
tellet-q Sep 18, 2024
c334516
Enable workflow again
tellet-q Sep 18, 2024
108873e
Merge pull request #198 from qdrant/ci/enable-workflow
timvisee Sep 18, 2024
794cfa0
Disable cron (#201)
tellet-q Sep 19, 2024
6e0f4b3
Enable cron (#203)
tellet-q Sep 20, 2024
6bab477
Improve volumes and logging (#202)
tellet-q Sep 24, 2024
5cea6f1
Add benchmark on collection load time (#204)
tellet-q Sep 27, 2024
793f3d0
Report dataset's info in slack (#211)
tellet-q Oct 9, 2024
ea53db4
Update text (#212)
tellet-q Oct 9, 2024
a0d672c
Add benchmark on parallel upload and search (#215)
tellet-q Dec 2, 2024
a26483b
Allow parallel optimizations in Qdrant after uploading (#208)
timvisee Jan 2, 2025
8c37878
Only run runLoadTimeBenchmark once per day
tellet-q Jan 3, 2025
bb482e1
Add comment
tellet-q Jan 3, 2025
f4cb791
Split into 2 and introduce concurrency groups
tellet-q Jan 3, 2025
6bc2439
Update name
tellet-q Jan 3, 2025
5ec825e
Merge pull request #218 from qdrant/ci/reduce-snapshots-bench-runs
generall Jan 3, 2025
781e2f6
Add volume to persist datasets
tellet-q Jan 9, 2025
f562af5
Add volume to persist datasets
tellet-q Jan 9, 2025
740f178
Debug
tellet-q Jan 9, 2025
b889ca3
Update datasets.json during benches
tellet-q Jan 9, 2025
6ac0f49
Revert debug
tellet-q Jan 9, 2025
53c23d0
Add a workflow to remove datasets volume
tellet-q Jan 9, 2025
ffe5c4b
Revert debug
tellet-q Jan 9, 2025
2b45d53
Rename file
tellet-q Jan 9, 2025
2f27285
Merge pull request #219 from qdrant/feat/keep-datasets
generall Jan 10, 2025
c805233
Add ServerAliveInterval and ServerAliveCountMax to rsync and more (#220)
tellet-q Jan 13, 2025
eba0f51
[pre-commit.ci] pre-commit suggestions (#210)
pre-commit-ci[bot] Jan 14, 2025
c8afe7d
fix: weaviate-client version constraint (#199) (#200)
LukasWestholt Jan 15, 2025
532c948
ci/fix-poetry-install (#221)
tellet-q Jan 15, 2025
d12429f
Fix 403 (#222)
tellet-q Jan 16, 2025
71ba85a
Fix job's name in notification (#224)
tellet-q Jan 17, 2025
d6ba3ab
Update qdrant-single-node-bq-rps.json
generall Jan 17, 2025
099079e
Update qdrant-single-node-bq-rps.json
generall Jan 17, 2025
d406db7
Update qdrant-single-node-bq-rps.json
generall Jan 17, 2025
493fb5b
Add Compare Versions Workflow (#225)
tellet-q Feb 25, 2025
8564a25
Complete compare versions' workflow (#226)
tellet-q Mar 13, 2025
475a39f
Cancel manual workflow early (#227)
tellet-q Mar 13, 2025
6a73ea1
Use query_points (#230)
tellet-q Apr 1, 2025
a7e51fa
Fix sparse vector name in query (#232)
tellet-q Apr 1, 2025
9849594
Run ci with payload (#231)
tellet-q Apr 1, 2025
4c2e17b
Use find instead of ls to search files (#234)
tellet-q Apr 7, 2025
a4eff0e
Run benchmarks with feature flags for dev (#237)
tellet-q Apr 10, 2025
0b5393e
Fix fetching upload results (#239)
tellet-q Apr 10, 2025
61a1bf5
Ensure search in current dir only (#240)
tellet-q Apr 14, 2025
000073b
Introduce workflow input `feature_flags_all` for manual benchmarks (#…
tellet-q Apr 14, 2025
74c756e
Fix descriptions
tellet-q Apr 14, 2025
1fe9d86
Apply suggestions from code review
timvisee Apr 14, 2025
6f10563
Merge pull request #243 from qdrant/fix-descriptions
timvisee Apr 14, 2025
799aa4f
Use boolean input type in workflow input for toggling all feature fla…
timvisee Apr 14, 2025
b0b5e52
Enable indexing while uploading
xzfc Apr 9, 2025
cc4db55
Merge pull request #238 from qdrant/enable-index-while-uploading
xzfc Apr 17, 2025
74b6d89
Reduce ci code duplication (#245)
tellet-q Apr 29, 2025
ce8fe71
Add CPU stats into monitoring (#244)
tellet-q Apr 29, 2025
1ae9ca7
Improve README docs and examples (#217)
KShivendu Apr 30, 2025
9bb99ab
Improve GH outputs (#246)
tellet-q Apr 30, 2025
fec61b5
add changes for running cohere wiki benchmark
generall May 4, 2025
fe35b85
rescore with prefetch option
generall May 4, 2025
4848c22
upd deps
May 4, 2025
534220a
upd params
generall May 5, 2025
f7833d0
upd params
generall May 5, 2025
323a928
upd test config
May 5, 2025
d5729aa
Merge update.redisearch branch into sync-qdrant-master
fcostaoliveira May 6, 2025
38 changes: 38 additions & 0 deletions .github/workflows/actions/run-engine-benchmark/action.yaml
@@ -0,0 +1,38 @@
name: Run Engine Benchmark
description: "Run benchmark with specified params"
inputs:
  engine:
    description: "engine (i.e qdrant-default)"
    required: true
  dataset:
    description: "dataset (i.e random-100)"
    required: true
  compose_file:
    description: "path to docker compose"
    required: true

runs:
  using: "composite"
  steps:
    - name: Install poetry
      shell: bash
      run: pip install poetry
    - uses: actions/setup-python@v5
      with:
        python-version: "3.10"
        cache: "poetry"
    - name: Install deps
      shell: bash
      run: poetry install
    - uses: hoverkraft-tech/[email protected]
      with:
        compose-file: "${{ inputs.compose_file }}"
    - name: Execution
      shell: bash
      run: |
        engine="${{ inputs.engine }}"
        if [[ "$engine" == *"elasticsearch"* || "$engine" == *"opensearch"* ]]; then
          ./tools/wait_for_green_status.sh
        fi
        source $(poetry env info -p)/bin/activate
        poetry run python3 run.py --engines "${{ inputs.engine }}" --datasets "${{ inputs.dataset }}"
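
For orientation, a workflow job would call this composite action by its in-repository path, the same way the continuous-benchmark workflow further down calls send-slack-msg. A minimal sketch, with a hypothetical job name, the example engine/dataset values from the input descriptions above, and a placeholder compose-file path (none of these values come from the repository's configs):

jobs:
  runSingleBenchmark:   # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # Invoke the composite action by its path inside this repository.
      - uses: ./.github/workflows/actions/run-engine-benchmark
        with:
          engine: "qdrant-default"     # example value from the input description
          dataset: "random-100"        # example value from the input description
          compose_file: "path/to/docker-compose.yaml"   # illustrative placeholder path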
72 changes: 72 additions & 0 deletions .github/workflows/actions/send-slack-msg/action.yaml
@@ -0,0 +1,72 @@
name: Send Notification
description: "Send a notification to Slack"
inputs:
  bench_name:
    description: "name of the failed job (i.e runBenchmark)"
    required: true
  job_status:
    description: "status of the job (i.e failed)"
    required: true
  failed_outputs:
    description: "details of the failed job"
    required: false
    default: "{}"
  qdrant_version:
    description: "version of Qdrant used in the benchmark"
    required: false
    default: "unknown"
  engine_name:
    description: "name of the engine used in the benchmark"
    required: false
    default: "unknown"
  dataset:
    description: "name of the dataset used in the benchmark"
    required: false
    default: "unknown"

runs:
  using: "composite"
  steps:
    - uses: slackapi/[email protected]
      with:
        payload: |
          {
            "text": "CI benchmarks (${{ inputs.bench_name }}) run status: ${{ inputs.job_status }}",
            "blocks": [
              {
                "type": "section",
                "text": {
                  "type": "mrkdwn",
                  "text": "CI benchmarks (${{ inputs.bench_name }}) failed because of *${{ inputs.failed_outputs }}*."
                }
              },
              {
                "type": "section",
                "text": {
                  "type": "mrkdwn",
                  "text": "Qdrant version: *${{ inputs.qdrant_version }}*."
                }
              },
              {
                "type": "section",
                "text": {
                  "type": "mrkdwn",
                  "text": "Engine: *${{ inputs.engine_name }}*."
                }
              },
              {
                "type": "section",
                "text": {
                  "type": "mrkdwn",
                  "text": "Dataset: *${{ inputs.dataset }}*."
                }
              },
              {
                "type": "section",
                "text": {
                  "type": "mrkdwn",
                  "text": "View the results <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|here>"
                }
              }
            ]
          }
52 changes: 52 additions & 0 deletions .github/workflows/clean-datasets.yaml
@@ -0,0 +1,52 @@
name: Clean Datasets

on:
  repository_dispatch:
  workflow_dispatch:
  schedule:
    # Run every month on the 1st day at 3 am
    - cron: "0 3 1 * *"

concurrency:
  group: continuous-benchmark

# This removes the ci-datasets volume from the client machine.
# The next run of Continuous Benchmark will create the volume again and download all the datasets.
jobs:
  removeDatasetsVolume:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: webfactory/[email protected]
        with:
          ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}
      - name: Benches
        id: benches
        run: |
          export HCLOUD_TOKEN=${{ secrets.HCLOUD_TOKEN }}

          set +e

          timeout 10m bash -x tools/run_client_remove_volume.sh

          set -e
      - name: Send Notification
        if: failure()
        uses: slackapi/[email protected]
        with:
          payload: |
            {
              "text": "Failed to remove the datasets volume (removeDatasetsVolume), run status: ${{ job.status }}",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "View the results <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|here>"
                  }
                }
              ]
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.CI_ALERTS_CHANNEL_WEBHOOK_URL }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
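
The removal itself is delegated to tools/run_client_remove_volume.sh, which is not part of this diff. As a rough sketch of the intent only (the `ci-datasets` volume name comes from the comment above; the remote-host handling and everything else here are assumptions, not the repository's actual script):

#!/usr/bin/env bash
# Hypothetical sketch: drop the ci-datasets Docker volume on the benchmark client machine,
# so the next Continuous Benchmark run recreates it and re-downloads all the datasets.
set -euo pipefail

CLIENT_HOST="${CLIENT_HOST:?set CLIENT_HOST to the benchmark client host}"   # assumed variable

# `|| true` keeps the sketch from failing if the volume does not exist.
ssh "$CLIENT_HOST" 'docker volume rm ci-datasets || true'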
86 changes: 86 additions & 0 deletions .github/workflows/continuous-benchmark-2.yaml
@@ -0,0 +1,86 @@
name: Continuous Benchmark 2

on:
  repository_dispatch:
  workflow_dispatch:
  schedule:
    # Run every day at midnight
    - cron: "0 0 * * *"

# Restrict to only running this workflow one at a time.
# Any new runs will be queued until the previous run is complete.
# Any existing pending runs will be cancelled and replaced with the current run.
concurrency:
  group: continuous-benchmark

jobs:
  # Schedule this benchmark to run once a day for the sake of saving on S3 costs.
  runLoadTimeBenchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: webfactory/[email protected]
        with:
          ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}
      - name: Benches
        id: benches
        run: |
          export HCLOUD_TOKEN=${{ secrets.HCLOUD_TOKEN }}
          export POSTGRES_PASSWORD=${{ secrets.POSTGRES_PASSWORD }}
          export POSTGRES_HOST=${{ secrets.POSTGRES_HOST }}
          export SERVER_NAME="benchmark-server-3"
          bash -x tools/setup_ci.sh

          set +e

          # Benchmark collection load time
          export BENCHMARK_STRATEGY="collection-reload"

          declare -A DATASET_TO_ENGINE
          declare -A DATASET_TO_URL
          DATASET_TO_ENGINE["all-payloads-default"]="qdrant-continuous-benchmark-snapshot"
          DATASET_TO_ENGINE["all-payloads-on-disk"]="qdrant-continuous-benchmark-snapshot"
          DATASET_TO_ENGINE["all-payloads-default-sparse"]="qdrant-continuous-benchmark-snapshot"
          DATASET_TO_ENGINE["all-payloads-on-disk-sparse"]="qdrant-continuous-benchmark-snapshot"

          export STORAGE_URL="https://storage.googleapis.com/qdrant-benchmark-snapshots/all-payloads"
          DATASET_TO_URL["all-payloads-default"]="${STORAGE_URL}/benchmark-all-payloads-500k-768-default.snapshot"
          DATASET_TO_URL["all-payloads-on-disk"]="${STORAGE_URL}/benchmark-all-payloads-500k-768-on-disk.snapshot"
          DATASET_TO_URL["all-payloads-default-sparse"]="${STORAGE_URL}/benchmark-all-payloads-500k-sparse-default.snapshot"
          DATASET_TO_URL["all-payloads-on-disk-sparse"]="${STORAGE_URL}/benchmark-all-payloads-500k-sparse-on-disk.snapshot"

          set +e

          for dataset in "${!DATASET_TO_ENGINE[@]}"; do
            export ENGINE_NAME=${DATASET_TO_ENGINE[$dataset]}
            export DATASETS=$dataset
            export SNAPSHOT_URL=${DATASET_TO_URL[$dataset]}

            # Benchmark the dev branch:
            export QDRANT_VERSION=ghcr/dev
            export QDRANT__FEATURE_FLAGS__ALL=true
            timeout 30m bash -x tools/run_ci.sh

            # Benchmark the master branch:
            export QDRANT_VERSION=docker/master
            export QDRANT__FEATURE_FLAGS__ALL=false
            timeout 30m bash -x tools/run_ci.sh
          done

          set -e
      - name: Fail job if any of the benches failed
        if: steps.benches.outputs.failed == 'error' || steps.benches.outputs.failed == 'timeout'
        run: exit 1
      - name: Send slack message
        uses: ./.github/workflows/actions/send-slack-msg
        if: failure() || cancelled()
        with:
          bench_name: "runLoadTimeBenchmark"
          job_status: ${{ job.status }}
          failed_outputs: ${{ steps.benches.outputs.failed }}
          qdrant_version: ${{ steps.benches.outputs.qdrant_version }}
          engine_name: ${{ steps.benches.outputs.engine_name }}
          dataset: ${{ steps.benches.outputs.dataset }}
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.CI_ALERTS_CHANNEL_WEBHOOK_URL }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
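
The "Fail job if any of the benches failed" step reads steps.benches.outputs.failed, so tools/run_ci.sh (or the code around it) is expected to write that output. A minimal sketch of the producing side, assuming the timeout case is distinguished from other failures via the exit code of `timeout`; the `error`/`timeout` values and output names come from the workflow above, but the writing logic itself is an assumption, not the repository's script:

# Inside the "Benches" step, after each benchmark invocation:
timeout 30m bash -x tools/run_ci.sh
rc=$?
if [ "$rc" -eq 124 ]; then
  # `timeout` exits with 124 when the 30-minute limit is hit.
  echo "failed=timeout" >> "$GITHUB_OUTPUT"
elif [ "$rc" -ne 0 ]; then
  echo "failed=error" >> "$GITHUB_OUTPUT"
fi
# Context for the Slack notification, read back as steps.benches.outputs.*:
echo "engine_name=$ENGINE_NAME" >> "$GITHUB_OUTPUT"
echo "dataset=$DATASETS" >> "$GITHUB_OUTPUT"
echo "qdrant_version=$QDRANT_VERSION" >> "$GITHUB_OUTPUT"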