Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add script to delete an entire organization including all traces and workspaces (#126)
1 parent
f165826
commit c08a9e1
Showing 3 changed files with 378 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
# Deleting Organizations | ||
|
||
The LangSmith UI does not currently support deleting an individual organization from a self-hosted instance of LangSmith. It can be done manually, however, by removing the organization's traces from all materialized views in ClickHouse (except the runs_history views) and from the runs and feedbacks tables, then deleting the organization's workspaces from the Postgres tenants table and the organization itself from the organizations table. | ||
|
||
The deletion script takes the Organization ID as an argument. | ||
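The script interpolates the organization ID directly into its SQL statements, so it can be worth checking the value before passing it in. The following is a minimal sketch, assuming organization IDs are UUIDs as in the example further down; the helper name `is_uuid` is hypothetical and not part of the script.

```bash
# Hypothetical helper (not part of delete_organization.sh):
# succeeds only if the argument has the 8-4-4-4-12 hexadecimal UUID shape.
is_uuid() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

# Example ID taken from the usage example in this document
organization_id="4ec70ec7-0808-416a-b836-7100aeec934b"

if is_uuid "$organization_id"; then
  echo "Organization ID looks valid: $organization_id"
else
  echo "Not a UUID: $organization_id" >&2
fi
```

A check like this only validates the shape of the ID; it does not confirm that the organization exists.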
|
||
### Prerequisites | ||
|
||
Ensure you have the following tools/items ready. | ||
|
||
1. kubectl | ||
|
||
- https://kubernetes.io/docs/tasks/tools/ | ||
|
||
2. PostgreSQL client | ||
|
||
- https://www.postgresql.org/download/ | ||
|
||
3. PostgreSQL database connection: | ||
|
||
- Host | ||
- Port | ||
- Username | ||
- If using the bundled version, this is `postgres` | ||
- Password | ||
- If using the bundled version, this is `postgres` | ||
- Database name | ||
- If using the bundled version, this is `postgres` | ||
|
||
4. ClickHouse database credentials | ||
|
||
- Host | ||
- Port | ||
- Username | ||
- If using the bundled version, this is `default` | ||
- Password | ||
- If using the bundled version, this is `password` | ||
- Database name | ||
- If using the bundled version, this is `default` | ||
|
||
5. Connectivity to the PostgreSQL database from the machine on which you will run the deletion script. | ||
|
||
- If you are using the bundled version, you may need to port forward the PostgreSQL service to your local machine. | ||
- Run `kubectl port-forward svc/langsmith-postgres 5432:5432` to do so. | ||
|
||
6. Connectivity to the ClickHouse database from the machine on which you will run the deletion script. | ||
- If you are using the bundled version, you may need to port forward the ClickHouse service to your local machine. | ||
- Run `kubectl port-forward svc/langsmith-clickhouse 8123:8123` to do so. | ||
- If you are using ClickHouse Cloud, specify the `--ssl` flag and use port `8443`. | ||
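The connection details gathered above are passed to the script as two URLs. As a minimal sketch, assuming the `scheme://username:password@host:port/database` format shown in the script's usage message, a small helper can assemble them from the bundled defaults. The name `build_url` is hypothetical and not part of the script.

```bash
# Hypothetical helper (not part of delete_organization.sh).
# Arguments: scheme username password host port database
build_url() {
  printf '%s://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Bundled defaults from the prerequisites list, reached through the port-forwards
clickhouse_url=$(build_url clickhouse default password localhost 8123 default)
postgres_url=$(build_url postgres postgres postgres localhost 5432 postgres)

echo "$clickhouse_url"
echo "$postgres_url"
```

For a non-bundled deployment, substitute your own host, port, and credentials.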
|
||
### Running the deletion script for a single organization | ||
|
||
Run the organization removal script with the following command: | ||
|
||
```bash | ||
bash delete_organization.sh <clickhouse_url> <postgres_url> --organization_id <organization_id> | ||
``` | ||
|
||
For example, if you are using the bundled version with port-forwarding, the command would look like: | ||
|
||
```bash | ||
bash delete_organization.sh "clickhouse://default:password@localhost:8123/default" "postgres://postgres:postgres@localhost:5432/postgres" --organization_id 4ec70ec7-0808-416a-b836-7100aeec934b | ||
``` | ||
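The script rejects any URL that does not match `scheme://username:password@host:port/database`, so it helps to see how such a URL breaks down. The script itself uses a bash regular expression for this; the sketch below swaps in POSIX parameter expansion to show the same split. The function `parse_url` and the `url_*` variable names are hypothetical.

```bash
# Hypothetical helper: split scheme://username:password@host:port/database.
# Sketch only: assumes no ':' in the username and no '@' or '/' in the password.
parse_url() {
  rest="${1#*://}"              # username:password@host:port/database
  credentials="${rest%%@*}"     # username:password
  location="${rest#*@}"         # host:port/database
  url_user="${credentials%%:*}"
  url_passwd="${credentials#*:}"
  hostport="${location%%/*}"    # host:port
  url_host="${hostport%%:*}"
  url_port="${hostport#*:}"
  url_database="${location#*/}"
}

parse_url "clickhouse://default:password@localhost:8123/default"
echo "user=$url_user host=$url_host port=$url_port database=$url_database"
```

Passwords containing `@` or `/` would break both this sketch and the script's own pattern, so such credentials may need to be changed before running the deletion.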
|
||
If you visit the LangSmith UI, you should see that the organization is no longer present. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,311 @@ | ||
#!/bin/bash | ||
# Requires bash: the script uses [[ =~ ]], BASH_REMATCH and SECONDS | ||
|
||
## Function Definitions | ||
|
||
# Function to generate a fake UUID | ||
generate_uuid() { | ||
if command -v uuidgen >/dev/null 2>&1; then | ||
uuidgen | ||
else | ||
echo "uuidgen command not found. Exiting..." | ||
exit 1 | ||
fi | ||
} | ||
|
||
# Function to execute a select statement against PostgreSQL | ||
execute_pg_select(){ | ||
local query_string="$1" | ||
local result | ||
# Assign separately: `local result=$(...)` would mask psql's exit status | ||
result=$(psql "$postgres_url" -t -c "$query_string" -A 2>&1) | ||
if [ $? -ne 0 ]; then | ||
echo "Error executing select statement: $result" | ||
return 1 | ||
fi | ||
echo "$result" | ||
} | ||
|
||
# Function for deleting from a ClickHouse table | ||
delete_from_ch(){ | ||
local table="$1" | ||
local run_column="$2" | ||
local workspace_id="$3" | ||
local row_count="" | ||
|
||
local select_command_template="curl \ | ||
--fail \ | ||
-sS \ | ||
--user '$ch_user:$ch_passwd' \ | ||
--data-binary \"select count(1) from $ch_database.$table where tenant_id = '$workspace_id' \" \ | ||
$ch_protocol://$ch_host:$ch_port/?wait_end_of_query=1" | ||
|
||
local delete_command_template="curl \ | ||
-vvv \ | ||
--user '$ch_user:$ch_passwd' \ | ||
--data-binary \"DELETE from $ch_database.$table where tenant_id = '$workspace_id' \" \ | ||
$ch_protocol://$ch_host:$ch_port?wait_end_of_query=0" | ||
|
||
echo "Testing select of runs for Workspace ID $workspace_id from $table..." | ||
|
||
## Get the count of rows to delete | ||
## This both tests the select statement AND tells us whether we need to execute the DELETE FROM command | ||
local row_count=$(sh -c "$select_command_template") | ||
|
||
## If Row Count is empty that means Clickhouse errored out | ||
if [ -z "$row_count" ]; then | ||
echo "Error returned from ClickHouse on select statement. Exiting..." >&2 | ||
exit 1 | ||
## If Row Count is not 0 then we should be good to issue the delete | ||
elif [ "$row_count" -gt 0 ]; then | ||
echo "Success! Found $row_count rows in $table..." | ||
echo "Deleting $row_count rows from Workspace ID $workspace_id in $table..." | ||
SECONDS=0 | ||
sh -c "$delete_command_template" | ||
echo "DELETE FROM query completed in $SECONDS seconds." | ||
|
||
## Otherwise skip... | ||
else | ||
echo "No rows to delete in $table from Workspace ID $workspace_id!" | ||
fi | ||
} | ||
|
||
# Function for deleting from PostgreSQL | ||
delete_from_pg(){ | ||
local query_string="$1" | ||
psql "$postgres_url" -c "$query_string" | ||
} | ||
|
||
# Function to delete workspaces | ||
delete_workspace(){ | ||
local workspace_id="$1" | ||
|
||
echo "Deleting workspace ID $workspace_id..." | ||
|
||
## Find root runs for this workspace ID in the main runs table. | ||
## If the query returns no results, exit unless the `--force` parameter is passed in | ||
table="runs" | ||
run_column="id" | ||
|
||
command_template="curl \ | ||
-s \ | ||
--fail \ | ||
--user '$ch_user:$ch_passwd' \ | ||
--data-binary \"SELECT distinct id from $ch_database.$table where (is_root, tenant_id, session_id, $run_column) IN (select is_root, tenant_id, session_id, id as $run_column from $ch_database.runs where tenant_id = '$workspace_id' and is_root)\" \ | ||
$ch_protocol://$ch_host:$ch_port" | ||
|
||
check_traces=$(sh -c "$command_template") | ||
|
||
if [ -n "$check_traces" ]; then | ||
echo "Found Workspace ID $workspace_id, continuing..." | ||
else | ||
echo "Could not find any traces for Workspace ID $workspace_id." | ||
if [ "$force" != "--force" ]; then | ||
echo "Use --force if you still want to attempt to delete anyway. Exiting..." | ||
exit 1 | ||
else | ||
echo "Respecting the --force flag and continuing..." | ||
|
||
echo "Issuing SQL commands even though the Workspace ID was not found in current runs table..." | ||
|
||
fi | ||
fi | ||
|
||
## Delete from ClickHouse tables | ||
if [ "$sync" = "--sync" ]; then | ||
delete_from_ch runs_token_counts id "$workspace_id" | ||
delete_from_ch runs_tags run_id "$workspace_id" | ||
delete_from_ch runs_run_type id "$workspace_id" | ||
delete_from_ch runs_run_id_v2 id "$workspace_id" | ||
delete_from_ch runs_reference_example_id id "$workspace_id" | ||
delete_from_ch runs_trace_id id "$workspace_id" | ||
delete_from_ch runs_metadata_kv run_id "$workspace_id" | ||
delete_from_ch feedbacks_rmt_id run_id "$workspace_id" | ||
delete_from_ch feedbacks_rmt run_id "$workspace_id" | ||
delete_from_ch feedbacks run_id "$workspace_id" | ||
delete_from_ch runs id "$workspace_id" | ||
else | ||
delete_from_ch runs_token_counts id "$workspace_id" & | ||
delete_from_ch runs_tags run_id "$workspace_id" & | ||
delete_from_ch runs_run_type id "$workspace_id" & | ||
delete_from_ch runs_run_id_v2 id "$workspace_id" & | ||
delete_from_ch runs_reference_example_id id "$workspace_id" & | ||
delete_from_ch runs_trace_id id "$workspace_id" & | ||
delete_from_ch runs_metadata_kv run_id "$workspace_id" & | ||
delete_from_ch feedbacks_rmt_id run_id "$workspace_id" & | ||
delete_from_ch feedbacks_rmt run_id "$workspace_id" & | ||
delete_from_ch feedbacks run_id "$workspace_id" & | ||
delete_from_ch runs id "$workspace_id" & | ||
fi | ||
|
||
## Delete from PostgreSQL tables | ||
pg_delete_tenant="DELETE FROM tenants WHERE id = '$workspace_id'" | ||
if [ "$sync" = "--sync" ]; then | ||
delete_from_pg "$pg_delete_tenant" | ||
else | ||
delete_from_pg "$pg_delete_tenant" & | ||
fi | ||
|
||
## Wait for all background processes to complete unless --sync flag is present | ||
if [ "$sync" != "--sync" ]; then | ||
wait | ||
fi | ||
|
||
echo "Deleted workspace ID $workspace_id." | ||
} | ||
|
||
# Function to delete the organization | ||
delete_organization(){ | ||
local organization_id="$1" | ||
pg_delete_organization="DELETE FROM organizations WHERE id = '$organization_id'" | ||
if [ "$sync" = "--sync" ]; then | ||
delete_from_pg "$pg_delete_organization" | ||
else | ||
delete_from_pg "$pg_delete_organization" & | ||
fi | ||
|
||
## Wait for all background processes to complete unless --sync flag is present | ||
if [ "$sync" != "--sync" ]; then | ||
wait | ||
fi | ||
|
||
echo "Deleted organization ID $organization_id." | ||
} | ||
|
||
## Argument Parsing | ||
clickhouse_url="" | ||
postgres_url="" | ||
organization_id="" | ||
force="" | ||
ssl="" | ||
debug="" | ||
sync="" | ||
|
||
while [ $# -gt 0 ]; do | ||
case "$1" in | ||
--force) | ||
force="--force" | ||
shift | ||
;; | ||
--ssl) | ||
ssl="--ssl" | ||
shift | ||
;; | ||
--debug) | ||
debug="--debug" | ||
shift | ||
;; | ||
--sync) | ||
sync="--sync" | ||
shift | ||
;; | ||
--organization_id) | ||
if [ -n "$2" ]; then | ||
organization_id="$2" | ||
shift 2 | ||
else | ||
echo "Error: --organization_id requires a non-empty argument." | ||
exit 1 | ||
fi | ||
;; | ||
*) | ||
if [ -z "$clickhouse_url" ]; then | ||
clickhouse_url="$1" | ||
elif [ -z "$postgres_url" ]; then | ||
postgres_url="$1" | ||
else | ||
echo "Unknown argument: $1" | ||
echo "Usage: $0 <clickhouse_url> <postgres_url> --organization_id <organization_id> [--force] [--ssl] [--debug] [--sync]" | ||
echo "Example: $0 clickhouse://username:password@host:port/database postgres://username:password@host:port/database --organization_id $(generate_uuid) --force --ssl --debug --sync" | ||
exit 1 | ||
fi | ||
shift | ||
;; | ||
esac | ||
done | ||
|
||
if [ -z "$clickhouse_url" ] || [ -z "$postgres_url" ] || [ -z "$organization_id" ]; then | ||
fake_organization_id=$(generate_uuid) | ||
echo "Incorrect command syntax." | ||
echo "Usage: $0 <clickhouse_url> <postgres_url> --organization_id <organization_id> [--force] [--ssl] [--debug] [--sync]" | ||
echo | ||
echo "Example: $0 clickhouse://username:password@host:port/database postgres://username:password@host:port/database --organization_id $fake_organization_id --force --ssl --debug --sync" | ||
exit 1 | ||
fi | ||
|
||
## Debugging flags | ||
## Enable only if needed to debug this script | ||
if [ "$debug" = "--debug" ]; then | ||
set -x -e | ||
fi | ||
|
||
## Parse the ClickHouse URL | ||
ch_user="" | ||
ch_passwd="" | ||
ch_host="" | ||
ch_port="" | ||
ch_database="" | ||
|
||
if [[ $clickhouse_url =~ ^clickhouse://([^:]+):([^@]+)@([^:]+):([0-9]+)/([^/]+)$ ]]; then | ||
ch_user="${BASH_REMATCH[1]}" | ||
ch_passwd="${BASH_REMATCH[2]}" | ||
ch_host="${BASH_REMATCH[3]}" | ||
ch_port="${BASH_REMATCH[4]}" | ||
ch_database="${BASH_REMATCH[5]}" | ||
else | ||
echo "Invalid ClickHouse URL format. Exiting." | ||
echo "Expected format: clickhouse://username:password@host:port/database" | ||
exit 1 | ||
fi | ||
|
||
## Parse the PostgreSQL URL | ||
pg_user="" | ||
pg_passwd="" | ||
pg_host="" | ||
pg_port="" | ||
pg_database="" | ||
|
||
if [[ $postgres_url =~ ^postgres://([^:]+):([^@]+)@([^:]+):([0-9]+)/([^/]+)$ ]]; then | ||
pg_user="${BASH_REMATCH[1]}" | ||
pg_passwd="${BASH_REMATCH[2]}" | ||
pg_host="${BASH_REMATCH[3]}" | ||
pg_port="${BASH_REMATCH[4]}" | ||
pg_database="${BASH_REMATCH[5]}" | ||
else | ||
echo "Invalid PostgreSQL URL format. Exiting." | ||
echo "Expected format: postgres://username:password@host:port/database" | ||
exit 1 | ||
fi | ||
|
||
# Set ClickHouse protocol based on --ssl flag | ||
if [ "$ssl" = "--ssl" ]; then | ||
ch_protocol="https" | ||
else | ||
ch_protocol="http" | ||
fi | ||
|
||
## Fetch workspace IDs associated with the organization | ||
workspace_ids_query="select id from tenants where organization_id = '$organization_id'" | ||
workspace_ids=$(execute_pg_select "$workspace_ids_query") | ||
|
||
if [ $? -ne 0 ]; then | ||
echo "Error executing PostgreSQL query to fetch workspace IDs. Exiting..." | ||
exit 1 | ||
elif [ -z "$workspace_ids" ]; then | ||
echo "No workspace IDs found for organization ID $organization_id." | ||
if [ "$force" != "--force" ]; then | ||
echo "Use --force if you still want to attempt to delete the organization anyway. Exiting..." | ||
exit 1 | ||
else | ||
echo "Respecting the --force flag and continuing..." | ||
fi | ||
else | ||
echo -e "Found workspace IDs for organization ID $organization_id:\n$workspace_ids" | ||
fi | ||
|
||
## Iterate over workspace IDs and delete them | ||
for workspace_id in $workspace_ids; do | ||
delete_workspace "$workspace_id" | ||
done | ||
|
||
## Delete the organization | ||
delete_organization "$organization_id" | ||
|
||
echo "Done!" |