Add queries for common checks
barberscott committed Aug 9, 2024
1 parent 482f454 commit 6db82e6
Showing 14 changed files with 192 additions and 4 deletions.
47 changes: 47 additions & 0 deletions charts/langsmith/docs/RUN-SUPPORT-QUERY-CH.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
# Generating ClickHouse Stats
This Helm repository contains queries to produce output that the LangSmith UI does not currently support directly (e.g. obtaining trace counts for multiple workspaces by date in a single query).

This command takes a ClickHouse connection string that contains an embedded username and password (which can be passed in from a call to a secrets manager) and executes a query from an input file. In the example below, we are using the `ch_get_trace_counts_daily.sql` input file in the `support_queries/clickhouse` directory.
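
As a minimal sketch of passing credentials in from a secrets manager, the connection string can be assembled from its parts instead of being typed inline. The helper name `build_clickhouse_url` and the `CH_PASSWORD` variable are hypothetical and not part of this repository; the sketch also assumes none of the values need URL encoding.

```bash
# Hypothetical helper: assemble a ClickHouse connection string from its parts.
# Assumes no value contains characters that need URL encoding (such as @, : or /).
build_clickhouse_url() {
    user="$1"
    password="$2"
    host="$3"
    port="$4"
    database="$5"
    printf 'clickhouse://%s:%s@%s:%s/%s\n' "$user" "$password" "$host" "$port" "$database"
}

# Example: CH_PASSWORD is a hypothetical variable populated by your secrets manager.
#   url="$(build_clickhouse_url default "$CH_PASSWORD" localhost 8123 default)"
#   sh run_support_query_ch.sh "$url" --input support_queries/clickhouse/ch_get_trace_counts_daily.sql
```

Keeping the password in a variable rather than on the command line also keeps it out of your shell history.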

### Prerequisites

Ensure you have the following tools/items ready.

1. kubectl

- https://kubernetes.io/docs/tasks/tools/

2. ClickHouse database credentials

- Host
- Port
- Username
  - If using the bundled version, this is `default`
- Password
  - If using the bundled version, this is `password`
- Database name
  - If using the bundled version, this is `default`

3. Connectivity to the ClickHouse database from the machine you will be running the `run_support_query_ch.sh` script on.

- If you are using the bundled version, you may need to port forward the ClickHouse service to your local machine.
  - Run `kubectl port-forward svc/langsmith-clickhouse 8123:8123` to port forward the ClickHouse service to your local machine.
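
Before running a query, it can help to confirm that the port forward is actually reachable. The sketch below is one way to do that, assuming plain HTTP and the `/ping` endpoint of the ClickHouse HTTP interface; the function name `check_clickhouse_ping` is hypothetical and not part of this repository.

```bash
# Hypothetical helper: returns 0 if the ClickHouse HTTP interface answers on host:port.
# Assumes plain HTTP and the /ping endpoint of the ClickHouse HTTP interface.
check_clickhouse_ping() {
    host="$1"
    port="$2"
    # --silent hides progress output; --max-time avoids hanging on a dead port forward.
    response="$(curl --silent --max-time 5 "http://$host:$port/ping")" || return 1
    # A healthy server replies with "Ok."
    [ "$response" = "Ok." ]
}

# Example (with the port forward running in another terminal):
#   check_clickhouse_ping localhost 8123 && echo "ClickHouse is reachable"
```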

### Running the query script

Run the following command to run the desired query:

```bash
sh run_support_query_ch.sh <clickhouse_url> --input path/to/query.sql
```

For example, if you are using the bundled version with port-forwarding, the command might look like:

```bash
sh run_support_query_ch.sh "clickhouse://default:password@localhost:8123/default" --input support_queries/clickhouse/ch_get_trace_counts_daily.sql
```

This outputs the count of daily traces by workspace ID. To write the results to a file, add the flag `--output path/to/file.csv`.
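
Once the results are in a file, they can be post-processed with standard tools. The sketch below totals traces per workspace; it assumes the three columns produced by `ch_get_trace_counts_daily.sql` (`ts`, `workspace_id`, `trace_count`), a single header row, and no commas inside field values. The function name `sum_traces_by_workspace` is hypothetical.

```bash
# Hypothetical helper: sum trace_count per workspace_id from the CSV written by --output.
# Assumes columns ts, workspace_id, trace_count, one header row, and no embedded commas.
sum_traces_by_workspace() {
    awk -F',' '
        NR == 1 { next }          # skip the header row
        NF < 3  { next }          # skip blank or malformed lines
        {
            gsub(/"/, "", $2)     # drop quotes around the workspace ID, if any
            gsub(/"/, "", $3)     # drop quotes around the count, if any
            totals[$2] += $3
        }
        END { for (ws in totals) print ws "," totals[ws] }
    ' "$1" | sort
}

# Example:
#   sum_traces_by_workspace path/to/file.csv
```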

4 changes: 2 additions & 2 deletions charts/langsmith/docs/RUN-SUPPORT-QUERY-PG.md
@@ -2,7 +2,7 @@

This Helm repository contains queries to produce output that the LangSmith UI does not currently support directly (e.g. obtaining trace counts for multiple organizations in a single query).

-This command takes a postgres connection string that contains an embedded name and password (which can be passed in from a call to a secrets manager) and executes a query from an input file. In the example below, we are using the `pg_get_trace_counts_daily.sql` input file in the `support_queries` directory.
+This command takes a postgres connection string that contains an embedded name and password (which can be passed in from a call to a secrets manager) and executes a query from an input file. In the example below, we are using the `pg_get_trace_counts_daily.sql` input file in the `support_queries/postgres` directory.

### Prerequisites

@@ -44,7 +44,7 @@ sh run_support_query_pg.sh <postgres_url> --input path/to/query.sql
For example, if you are using the bundled version with port-forwarding, the command might look like:

```bash
-sh run_support_query_pg.sh "postgres://postgres:postgres@localhost:5432/postgres" --input support_queries/pg_get_trace_counts_daily.sql
+sh run_support_query_pg.sh "postgres://postgres:postgres@localhost:5432/postgres" --input support_queries/postgres/pg_get_trace_counts_daily.sql
```

This outputs the count of daily traces by workspace ID and organization ID. To write the results to a file, add the flag `--output path/to/file.csv`.
4 changes: 2 additions & 2 deletions charts/langsmith/scripts/run_support_query_ch.sh
@@ -97,14 +97,14 @@ fi

# Execute the query and output to the specified CSV file or stdout
if [ -n "$output_file" ]; then
-curl $curl_opts --user "$ch_user:$ch_passwd" --data-binary "$metrics_query_string" "$ch_protocol://$ch_host:$ch_port/?database=$ch_database" > "$output_file"
+curl $curl_opts --user "$ch_user:$ch_passwd" -H "X-ClickHouse-Format: CSVWithNames" --data-binary "$metrics_query_string" "$ch_protocol://$ch_host:$ch_port/?database=$ch_database" > "$output_file"
if [ $? -ne 0 ]; then
echo "Error: Failed to connect to ClickHouse."
exit 1
fi
echo "Query results have been successfully written to $output_file"
else
-curl $curl_opts --user "$ch_user:$ch_passwd" --data-binary "$metrics_query_string" "$ch_protocol://$ch_host:$ch_port/?database=$ch_database"
+curl $curl_opts --user "$ch_user:$ch_passwd" -H "X-ClickHouse-Format: CSVWithNames" --data-binary "$metrics_query_string" "$ch_protocol://$ch_host:$ch_port/?database=$ch_database"
if [ $? -ne 0 ]; then
echo "Error: Failed to connect to ClickHouse."
exit 1
@@ -0,0 +1,7 @@
select toStartOfInterval(inserted_at, interval 1 day) as ts,
tenant_id as workspace_id,
count(distinct id) as trace_count
from default.runs_history
where is_root = 1
group by ts, tenant_id
order by ts, tenant_id
@@ -0,0 +1,7 @@
select toStartOfInterval(inserted_at, interval 1 day) as ts,
tenant_id as workspace_id,
count(distinct id) as trace_count
from default.runs
where is_root = 1
group by ts, tenant_id
order by ts, tenant_id
@@ -0,0 +1,17 @@
-- This query retrieves a list of users by organization.
-- There will be one row per unique user-organization combination

select distinct
u.email as user_email,
u.full_name as user_name,
o.display_name as organization_name,
o.id as organization_id
from users u

join identities i
on u.id = i.user_id

join organizations o
on i.organization_id = o.id
and not o.is_personal
and i.tenant_id is null
@@ -0,0 +1,21 @@
-- This query retrieves a list of users by workspace and organization.
-- There will be one row per unique user-workspace combination

select
u.email as user_email,
u.full_name as user_name,
o.display_name as organization_name,
o.id as organization_id,
t.display_name as workspace_name,
t.id as workspace_id
from users u

join identities i
on u.id = i.user_id

join tenants t
on i.tenant_id = t.id

join organizations o
on t.organization_id = o.id
and NOT o.is_personal
@@ -0,0 +1,23 @@
-- This query retrieves a list of users and the count of organizations and workspaces they are a member of.
-- There will be one row per unique user

select
u.email as user_email,
u.full_name as user_name,
count(distinct o.id) as org_count,
count(distinct t.id) as workspace_count
from users u

join identities i
on u.id = i.user_id

join tenants t
on i.tenant_id = t.id

join organizations o
on t.organization_id = o.id
and NOT o.is_personal

group by
user_email,
user_name
@@ -0,0 +1,24 @@
-- This query returns the number of datasets per workspace, along with the organization and workspace names

select
organizations.id as org_id,
organizations.display_name as org_name,
tenant_id as workspace_id,
tenants.display_name as workspace_name,
count(distinct dataset.id) as dataset_count
from dataset

join tenants
on dataset.tenant_id = tenants.id

join organizations
on tenants.organization_id = organizations.id

group by
org_id,
org_name,
workspace_id,
workspace_name

order by
dataset_count desc
@@ -0,0 +1,28 @@
-- This query returns the number of prompts and prompt revisions per workspace, along with the organization and workspace names

select
organizations.id as org_id,
organizations.display_name as org_name,
hub_repos.tenant_id as workspace_id,
tenants.display_name as workspace_name,
count(distinct hub_repos.id) as prompt_count,
count(distinct hub_commits.id) as revision_count
from hub_repos

join tenants
on hub_repos.tenant_id = tenants.id

join organizations
on tenants.organization_id = organizations.id

join hub_commits
on hub_repos.id = hub_commits.repo_id

group by
org_id,
org_name,
workspace_id,
workspace_name

order BY
prompt_count desc
@@ -0,0 +1,14 @@
-- This query pulls a list of workspaces by organization
-- Personal orgs if they exist are excluded

select distinct
ws.organization_id as organization_id,
o.display_name as organization_name,
ws.id as workspace_id,
ws.display_name as workspace_name
from tenants ws

join organizations o
on ws.organization_id = o.id

where not o.is_personal
