Add script to delete an entire organization including all traces and workspaces (#126)
barberscott authored Jul 12, 2024
1 parent f165826 commit c08a9e1
Showing 3 changed files with 378 additions and 2 deletions.
65 changes: 65 additions & 0 deletions charts/langsmith/docs/DELETE-ORGANIZATION.md
@@ -0,0 +1,65 @@
# Deleting Organizations

The LangSmith UI does not currently support deleting an individual organization from a self-hosted instance of LangSmith. It can, however, be done by deleting the organization's traces directly from the ClickHouse `runs` and `feedbacks` tables and from all materialized views (except the `runs_history` views), and then removing the organization's workspaces from the Postgres `tenants` table and the organization itself from the `organizations` table.

The script takes the Organization ID as an argument.
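
Because the script interpolates the Organization ID directly into its SQL statements, it can be worth checking the value's shape before running anything. The following is a minimal sketch, assuming IDs are canonical UUIDs like the one used in the example in this document; `is_uuid` is a hypothetical helper and is not part of the script.

```bash
# is_uuid is a hypothetical helper, not part of delete_organization.sh.
# It succeeds only if its argument looks like a canonical UUID.
is_uuid() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

organization_id="4ec70ec7-0808-416a-b836-7100aeec934b"
if is_uuid "$organization_id"; then
  echo "ok: $organization_id"
else
  echo "not a UUID: $organization_id" >&2
fi
```

A value that fails this check is most likely a typo or a pasted name rather than an ID, and is better corrected before it reaches the database.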

### Prerequisites

Ensure you have the following tools/items ready.

1. kubectl

   - https://kubernetes.io/docs/tasks/tools/

2. PostgreSQL client

   - https://www.postgresql.org/download/

3. PostgreSQL database connection:

   - Host
   - Port
   - Username
     - If using the bundled version, this is `postgres`
   - Password
     - If using the bundled version, this is `postgres`
   - Database name
     - If using the bundled version, this is `postgres`

4. ClickHouse database credentials

   - Host
   - Port
   - Username
     - If using the bundled version, this is `default`
   - Password
     - If using the bundled version, this is `password`
   - Database name
     - If using the bundled version, this is `default`

5. Connectivity to the PostgreSQL database from the machine you will be running the deletion script on.

   - If you are using the bundled version, you may need to port forward the postgresql service to your local machine.
   - Run `kubectl port-forward svc/langsmith-postgres 5432:5432` to port forward the postgresql service to your local machine.

6. Connectivity to the ClickHouse database from the machine you will be running the deletion script on.
   - If you are using the bundled version, you may need to port forward the clickhouse service to your local machine.
   - Run `kubectl port-forward svc/langsmith-clickhouse 8123:8123` to port forward the clickhouse service to your local machine.
   - If you are using ClickHouse Cloud, specify the `--ssl` flag and use port `8443`.
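
As a rough illustration of the last point, the `--ssl` flag only switches the protocol the script uses for its ClickHouse HTTP requests. The sketch below mirrors that mapping; `ch_base_url` is a hypothetical helper name and `example-host` is a placeholder, neither of which appears in the script.

```bash
# ch_base_url is a hypothetical helper, not part of delete_organization.sh.
# It prints the base URL for ClickHouse HTTP requests:
# https when the third argument is "--ssl", http otherwise.
ch_base_url() {
  if [ "$3" = "--ssl" ]; then
    echo "https://$1:$2"
  else
    echo "http://$1:$2"
  fi
}

ch_base_url localhost 8123            # bundled ClickHouse via port-forward
ch_base_url example-host 8443 --ssl   # placeholder host for a TLS endpoint
```

Requesting `/ping` on either base URL with `curl` is a quick connectivity check; ClickHouse's HTTP interface answers `Ok.` when the server is reachable.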

### Running the deletion script for a single organization

Run the organization removal script with the following command. The ClickHouse URL comes first, followed by the PostgreSQL URL. The script relies on Bash features, so invoke it with `bash`:

```bash
bash delete_organization.sh <clickhouse_url> <postgres_url> --organization_id <organization_id> [--force] [--ssl] [--debug] [--sync]
```

For example, if you are using the bundled version with port-forwarding, the command would look like:

```bash
bash delete_organization.sh "clickhouse://default:password@localhost:8123/default" "postgres://postgres:postgres@localhost:5432/postgres" --organization_id 4ec70ec7-0808-416a-b836-7100aeec934b
```
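
Both connection strings must follow the `scheme://username:password@host:port/database` shape that the script validates. The sketch below shows how such a URL breaks into its parts. It uses POSIX parameter expansion rather than the script's own regular expression, and `split_url` is a hypothetical helper name.

```bash
# split_url is a hypothetical helper, not part of delete_organization.sh.
# It prints "user password host port database" for a URL shaped like
# scheme://user:password@host:port/database.
# The body runs in a subshell so its variables do not leak to the caller.
split_url() (
  rest="${1#*://}"            # drop the scheme
  creds="${rest%%@*}"         # user:password
  hostpart="${rest#*@}"       # host:port/database
  hostport="${hostpart%%/*}"  # host:port
  echo "${creds%%:*} ${creds#*:} ${hostport%%:*} ${hostport##*:} ${hostpart#*/}"
)

split_url "clickhouse://default:password@localhost:8123/default"
# prints: default password localhost 8123 default
```

As with the script's own pattern, a password containing `@` will not split correctly, so such credentials need a different approach.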

If you visit the LangSmith UI, you should now see that the organization is no longer present.
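
To check beyond the UI, one option is to repeat the row count that the script runs before each delete. The sketch below only builds the query text. `count_query` is a hypothetical helper, the table list is a subset of the tables the script deletes from, and the workspace ID is a placeholder (the script prints the real workspace IDs when it starts).

```bash
# count_query is a hypothetical helper, not part of delete_organization.sh.
# It prints the row-count query for one database, table and workspace ID.
count_query() {
  echo "select count(1) from $1.$2 where tenant_id = '$3'"
}

# A subset of the tables the script deletes from; the workspace ID is a placeholder.
for table in runs feedbacks runs_tags runs_metadata_kv; do
  count_query default "$table" "00000000-0000-0000-0000-000000000000"
done
```

Each query can be sent to ClickHouse the same way the script does, with `curl --user` and `--data-binary` against the HTTP port; a count of `0` means no rows remain for that workspace in that table.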
4 changes: 2 additions & 2 deletions charts/langsmith/docs/DELETE-TRACES.md
@@ -42,7 +42,7 @@ For example, if you are using the bundled version with port-forwarding, the comm
sh delete_trace_by_id.sh "clickhouse://default:password@localhost:8123/default" --trace_id 4ec70ec7-0808-416a-b836-7100aeec934b
```

-If you visit the Langsmith UI, you should now see specified trace ID is deleted.
+If you visit the Langsmith UI, you should now see specified trace ID is no longer present nor reflected in stats.

### Running the deletion script for a multiple traces from a file with one trace ID per line

@@ -58,4 +58,4 @@ For example, if you are using the bundled version with port-forwarding, the comm
sh delete_trace_by_id.sh "clickhouse://default:password@localhost:8123/default" --file path/to/traces.txt
```

-If you visit the Langsmith UI, you should now see all the specified traces have been deleted.
+If you visit the Langsmith UI, you should now see all the specified traces have been removed.
311 changes: 311 additions & 0 deletions charts/langsmith/scripts/delete_organization_sh
@@ -0,0 +1,311 @@
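#!/bin/bash
# Bash is required: the script uses `local`, `[[ ... =~ ... ]]`, BASH_REMATCH and SECONDS.

## Function Definitions

# Function to generate a fake UUID
generate_uuid() {
    if command -v uuidgen >/dev/null 2>&1; then
        uuidgen
    else
        echo "uuidgen command not found. Exiting..."
        exit 1
    fi
}

# Function to execute a select statement against PostgreSQL
execute_pg_select(){
    local query_string="$1"
    # Declare and assign separately: `local result=$(...)` would report the
    # exit status of `local` (always 0) and hide a psql failure.
    local result
    result=$(psql "$postgres_url" -t -c "$query_string" -A 2>&1)
    if [ $? -ne 0 ]; then
        echo "Error executing select statement: $result"
        return 1
    fi
    echo "$result"
}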

# Function for deleting from a ClickHouse table
delete_from_ch(){
    local table="$1"
    local run_column="$2"
    local workspace_id="$3"
    local row_count=""

    # The URLs are single-quoted so the inner shell does not treat `?` as a glob.
    local select_command_template="curl \
        --fail \
        -sS \
        --user '$ch_user:$ch_passwd' \
        --data-binary \"select count(1) from $ch_database.$table where tenant_id = '$workspace_id' \" \
        '$ch_protocol://$ch_host:$ch_port/?wait_end_of_query=1'"

    local delete_command_template="curl \
        -vvv \
        --user '$ch_user:$ch_passwd' \
        --data-binary \"DELETE from $ch_database.$table where tenant_id = '$workspace_id' \" \
        '$ch_protocol://$ch_host:$ch_port/?wait_end_of_query=0'"

    echo "Testing select of runs for Workspace ID $workspace_id from $table..."

    ## Get the count of rows to delete
    ## This both tests the select statement AND tells us whether we need to execute the DELETE FROM command
    row_count=$(sh -c "$select_command_template")

    ## If Row Count is empty that means ClickHouse errored out
    if [ -z "$row_count" ]; then
        echo "Error returned from ClickHouse on select statement. Exiting..." >&2
        exit 1
    ## If Row Count is not 0 then we should be good to issue the delete
    elif [ "$row_count" -gt 0 ]; then
        echo "Success! Found $row_count rows in $table..."
        echo "Deleting $row_count rows from Workspace ID $workspace_id in $table..."
        SECONDS=0
        sh -c "$delete_command_template"
        echo "DELETE FROM query completed in $SECONDS seconds."

    ## Otherwise skip...
    else
        echo "No rows to delete in $table from Workspace ID $workspace_id!"
    fi
}

# Function for deleting from PostgreSQL
delete_from_pg(){
    local query_string="$1"
    psql "$postgres_url" -c "$query_string"
}

# Function to delete workspaces
delete_workspace(){
    local workspace_id="$1"

    echo "Deleting workspace ID $workspace_id..."

    ## Find root runs for this workspace ID in the main runs table.
    ## If query returns no results, exit unless the `--force` parameter is passed in
    table="runs"
    run_column="id"

    command_template="curl \
        -s \
        --fail \
        --user '$ch_user:$ch_passwd' \
        --data-binary \"SELECT distinct id from $ch_database.$table where (is_root, tenant_id, session_id, $run_column) IN (select is_root, tenant_id, session_id, id as $run_column from $ch_database.runs where tenant_id = '$workspace_id' and is_root)\" \
        $ch_protocol://$ch_host:$ch_port"

    check_traces=$(sh -c "$command_template")

    if [ -n "$check_traces" ]; then
        echo "Found Workspace ID $workspace_id, continuing..."
    else
        echo "Could not find any traces for Workspace ID $workspace_id."
        if [ "$force" != "--force" ]; then
            echo "Use --force if you still want to attempt to delete anyway. Exiting..."
            exit 1
        else
            echo "Respecting the --force flag and continuing..."

            echo "Issuing SQL commands even though the Workspace ID was not found in current runs table..."

        fi
    fi

    ## Delete from ClickHouse tables
    if [ "$sync" = "--sync" ]; then
        delete_from_ch runs_token_counts id "$workspace_id"
        delete_from_ch runs_tags run_id "$workspace_id"
        delete_from_ch runs_run_type id "$workspace_id"
        delete_from_ch runs_run_id_v2 id "$workspace_id"
        delete_from_ch runs_reference_example_id id "$workspace_id"
        delete_from_ch runs_trace_id id "$workspace_id"
        delete_from_ch runs_metadata_kv run_id "$workspace_id"
        delete_from_ch feedbacks_rmt_id run_id "$workspace_id"
        delete_from_ch feedbacks_rmt run_id "$workspace_id"
        delete_from_ch feedbacks run_id "$workspace_id"
        delete_from_ch runs id "$workspace_id"
    else
        delete_from_ch runs_token_counts id "$workspace_id" &
        delete_from_ch runs_tags run_id "$workspace_id" &
        delete_from_ch runs_run_type id "$workspace_id" &
        delete_from_ch runs_run_id_v2 id "$workspace_id" &
        delete_from_ch runs_reference_example_id id "$workspace_id" &
        delete_from_ch runs_trace_id id "$workspace_id" &
        delete_from_ch runs_metadata_kv run_id "$workspace_id" &
        delete_from_ch feedbacks_rmt_id run_id "$workspace_id" &
        delete_from_ch feedbacks_rmt run_id "$workspace_id" &
        delete_from_ch feedbacks run_id "$workspace_id" &
        delete_from_ch runs id "$workspace_id" &
    fi

    ## Delete from PostgreSQL tables
    pg_delete_tenant="DELETE FROM tenants WHERE id = '$workspace_id'"
    if [ "$sync" = "--sync" ]; then
        delete_from_pg "$pg_delete_tenant"
    else
        delete_from_pg "$pg_delete_tenant" &
    fi

    ## Wait for all background processes to complete unless --sync flag is present
    if [ "$sync" != "--sync" ]; then
        wait
    fi

    echo "Deleted workspace ID $workspace_id."
}

# Function to delete the organization
delete_organization(){
    local organization_id="$1"
    pg_delete_organization="DELETE FROM organizations WHERE id = '$organization_id'"
    if [ "$sync" = "--sync" ]; then
        delete_from_pg "$pg_delete_organization"
    else
        delete_from_pg "$pg_delete_organization" &
    fi

    ## Wait for all background processes to complete unless --sync flag is present
    if [ "$sync" != "--sync" ]; then
        wait
    fi

    echo "Deleted organization ID $organization_id."
}

## Argument Parsing
clickhouse_url=""
postgres_url=""
organization_id=""
force=""
ssl=""
debug=""
sync=""

while [ $# -gt 0 ]; do
    case "$1" in
        --force)
            force="--force"
            shift
            ;;
        --ssl)
            ssl="--ssl"
            shift
            ;;
        --debug)
            debug="--debug"
            shift
            ;;
        --sync)
            sync="--sync"
            shift
            ;;
        --organization_id)
            if [ -n "$2" ]; then
                organization_id="$2"
                shift 2
            else
                echo "Error: --organization_id requires a non-empty argument."
                exit 1
            fi
            ;;
        *)
            if [ -z "$clickhouse_url" ]; then
                clickhouse_url="$1"
            elif [ -z "$postgres_url" ]; then
                postgres_url="$1"
            else
                echo "Unknown argument: $1"
                echo "Usage: $0 <clickhouse_url> <postgres_url> --organization_id <organization_id> [--force] [--ssl] [--debug] [--sync]"
                echo "Example: $0 clickhouse://username:password@host:port/database postgres://username:password@host:port/database --organization_id $(generate_uuid) --force --ssl --debug --sync"
                exit 1
            fi
            shift
            ;;
    esac
done

if [ -z "$clickhouse_url" ] || [ -z "$postgres_url" ] || [ -z "$organization_id" ]; then
    fake_organization_id=$(generate_uuid)
    echo "Incorrect command syntax."
    echo "Usage: $0 <clickhouse_url> <postgres_url> --organization_id <organization_id> [--force] [--ssl] [--debug] [--sync]"
    echo
    echo "Example: $0 clickhouse://username:password@host:port/database postgres://username:password@host:port/database --organization_id $fake_organization_id --force --ssl --debug --sync"
    exit 1
fi

## Debugging flags
## Enable only if needed to debug this script
if [ "$debug" = "--debug" ]; then
    set -x -e
fi

## Parse the ClickHouse URL
ch_user=""
ch_passwd=""
ch_host=""
ch_port=""
ch_database=""

if [[ $clickhouse_url =~ ^clickhouse://([^:]+):([^@]+)@([^:]+):([0-9]+)/([^/]+)$ ]]; then
    ch_user="${BASH_REMATCH[1]}"
    ch_passwd="${BASH_REMATCH[2]}"
    ch_host="${BASH_REMATCH[3]}"
    ch_port="${BASH_REMATCH[4]}"
    ch_database="${BASH_REMATCH[5]}"
else
    echo "Invalid ClickHouse URL format. Exiting."
    echo "Expected format: clickhouse://username:password@host:port/database"
    exit 1
fi

## Parse the PostgreSQL URL
pg_user=""
pg_passwd=""
pg_host=""
pg_port=""
pg_database=""

if [[ $postgres_url =~ ^postgres://([^:]+):([^@]+)@([^:]+):([0-9]+)/([^/]+)$ ]]; then
    pg_user="${BASH_REMATCH[1]}"
    pg_passwd="${BASH_REMATCH[2]}"
    pg_host="${BASH_REMATCH[3]}"
    pg_port="${BASH_REMATCH[4]}"
    pg_database="${BASH_REMATCH[5]}"
else
    echo "Invalid PostgreSQL URL format. Exiting."
    echo "Expected format: postgres://username:password@host:port/database"
    exit 1
fi

# Set ClickHouse protocol based on --ssl flag
if [ "$ssl" = "--ssl" ]; then
    ch_protocol="https"
else
    ch_protocol="http"
fi

## Fetch workspace IDs associated with the organization
workspace_ids_query="select id from tenants where organization_id = '$organization_id'"
workspace_ids=$(execute_pg_select "$workspace_ids_query")

if [ $? -ne 0 ]; then
    echo "Error executing PostgreSQL query to fetch workspace IDs. Exiting..."
    exit 1
elif [ -z "$workspace_ids" ]; then
    echo "No workspace IDs found for organization ID $organization_id."
    if [ "$force" != "--force" ]; then
        echo "Use --force if you still want to attempt to delete the organization anyway. Exiting..."
        exit 1
    else
        echo "Respecting the --force flag and continuing..."
    fi
else
    echo -e "Found workspace IDs for organization ID $organization_id:\n$workspace_ids"
fi

## Iterate over workspace IDs and delete them
for workspace_id in $workspace_ids; do
    delete_workspace "$workspace_id"
done

## Delete the organization
delete_organization "$organization_id"

echo "Done!"
