Add more documentation and notebooks (#106)
sinnec authored Jan 26, 2023
1 parent 43a8b6b commit 298e1d2
Showing 14 changed files with 404 additions and 47 deletions.
3 changes: 1 addition & 2 deletions .gitignore
@@ -11,8 +11,7 @@ check_data/
*.ipynb_checkpoints
*.db
site/
notebooks/
test.ipynb
dist/
build/
whitebox_sdk.egg-info
51 changes: 33 additions & 18 deletions README.md
@@ -103,37 +103,52 @@ mkdocs serve -f docs/mkdocs/mkdocs.yml -a localhost:8001
```

# Deploy Whitebox

## Using docker

Whitebox uses PostgreSQL as its database, and the two services need to run in the same Docker network. An example docker-compose file is located in the `examples` folder. Make sure you replace the `SECRET_KEY` with one of your own; see below for more info.

```bash
docker-compose -f examples/docker-compose/docker-compose.yml up
```

If you just need to run Whitebox, make sure you set the `DATABASE_URL` in the environment.

```bash
docker run -dp 8000:8000 -e DATABASE_URL=postgresql://user:password@host:port/db_name sqdhub/whitebox:main
```

To save the API key encrypted in the database, provide a `SECRET_KEY` variable in the environment consisting of a 16-byte string:

```bash
python -c "from secrets import token_hex; print(token_hex(16))"
```

**_Save this token somewhere safe._**

The API key can be retrieved directly from the Postgres database:

```bash
API_KEY=$(docker exec <postgres_container_id> /bin/sh -c "psql -U postgres -c \"SELECT api_key FROM users WHERE username='admin';\" -tA")

echo $API_KEY
```
If you've set the `SECRET_KEY` in the environment, get the decrypted key using:

```bash
docker exec <whitebox_container_id> /usr/local/bin/python scripts/decrypt_api_key.py $API_KEY
```

## Using Helm

You can also install the Whitebox server and all of its dependencies in your k8s cluster using `helm`:

```bash
helm repo add squaredev https://chartmuseum.squaredev.io/
helm repo update
helm install whitebox squaredev/whitebox
```

# Contributing

30 changes: 30 additions & 0 deletions docs/mkdocs/docs/css/extra.css
@@ -0,0 +1,30 @@
:root {

/* Primary color shades */
--md-primary-fg-color: #21babe;
--md-primary-fg-color--light: #21babe;
--md-primary-fg-color--dark: #86e6e9;
--md-primary-bg-color: hsla(0, 0%, 100%, 1);
--md-primary-bg-color--light: hsla(0, 0%, 100%, 0.7);
--md-typeset-a-color: #21babe;

/* Accent color shades */
--md-accent-fg-color: #006493;
--md-accent-fg-color--transparent: hsla(189, 100%, 37%, 0.1);
--md-accent-bg-color: hsla(0, 0%, 100%, 1);
--md-accent-bg-color--light: hsla(0, 0%, 100%, 0.7);
}

:root > * {

/* Code block color shades */
--md-code-bg-color: hsla(0, 0%, 96%, 1);
--md-code-fg-color: hsla(200, 18%, 26%, 1);

/* Footer */
--md-footer-bg-color: #21babe;
--md-footer-bg-color--dark: hsla(0, 0%, 0%, 0.32);
--md-footer-fg-color: hsla(0, 0%, 100%, 1);
--md-footer-fg-color--light: hsla(0, 0%, 100%, 0.7);
--md-footer-fg-color--lighter: hsla(0, 0%, 100%, 0.3);
}
8 changes: 4 additions & 4 deletions docs/mkdocs/docs/features.md
@@ -10,19 +10,19 @@

## Descriptive Statistics

Whitebox provides a nice [list of descriptive statistics](../metric-definitions/#descriptive-statistics) of the input dataset, making it easy to get an overview of your data.

## Classification Models Metrics

Whitebox includes comprehensive [metrics](../metric-definitions/#evaluation-metrics) tracking for classification models. This allows users to easily evaluate the performance of their classification models and identify areas for improvement. Additionally, users can set custom thresholds for each metric to receive alerts when performance deviates from expected results.

## Data / Concept Drift Monitoring

Whitebox includes monitoring for data and concept drift. This feature tracks changes in the distribution of the data used to train models and alerts users when significant changes occur. Additionally, it detects changes in the performance of deployed models and alerts users when significant drift is detected. This allows users to identify and address data and model drift early, reducing the risk of poor model performance.

## Explainable AI

Whitebox also includes model explanation. Explainability is provided through the explainability report, which lets the user see at any time which feature had the greatest impact on the model's prediction.

## Alerts

File renamed without changes.
4 changes: 2 additions & 2 deletions docs/mkdocs/docs/metric-definitions.md
@@ -207,10 +207,10 @@ where:

### Light Gradient Boosting Machine

LightGBM is an open-source framework for gradient boosted machines. By default LightGBM trains a Gradient Boosted Decision Tree (GBDT), but it also supports random forests, Dropouts meet Multiple Additive Regression Trees (DART), and Gradient-based One-Side Sampling (GOSS). The framework is fast and was designed for distributed training. It supports large-scale datasets and training on the GPU. LightGBM also provides highly optimised, scalable and fast implementations of gradient boosted machines (GBMs). The official documentation of LightGBM is accessible <a href="https://lightgbm.readthedocs.io/en/latest/index.html" class="external-link" target="_blank">here</a>.
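As a quick illustration (not part of Whitebox; assuming `lightgbm` and `scikit-learn` are installed), the boosting variants mentioned above are selected via the `boosting_type` parameter:

```Python
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy data, purely for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# boosting_type selects the variant: "gbdt" (default), "dart",
# "rf" (random forest) or "goss".
model = lgb.LGBMClassifier(boosting_type="gbdt", n_estimators=100)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```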

## Explainable AI models

### Local Interpretable Model-agnostic Explanations

LIME (Local Interpretable Model-agnostic Explanations) is an explainable AI technique that helps illuminate a machine learning model and make the particular implications of each prediction understandable. The technique is suited to local explanations, since it describes the classifier for a particular single instance. LIME perturbs the input data to produce a succession of artificial samples that only partially retain the original features. The original implementation of LIME, along with its documentation, can be found in <a href="https://github.com/marcotcr/lime" class="external-link" target="_blank">this repo</a>.
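Purely as a sketch (assuming the `lime` package and a fitted classifier `model` exposing `predict_proba`, e.g. the LightGBM model from the sketch above), a local explanation of a single instance looks roughly like:

```Python
from lime.lime_tabular import LimeTabularExplainer

# X_train and X_test are assumed to be numpy arrays, e.g. from the sketch above.
explainer = LimeTabularExplainer(
    X_train,
    feature_names=[f"feature_{i}" for i in range(X_train.shape[1])],
    class_names=["negative", "positive"],
    mode="classification",
)

# Explain one test instance: which features pushed the prediction where?
explanation = explainer.explain_instance(X_test[0], model.predict_proba, num_features=5)
print(explanation.as_list())  # (feature condition, weight) pairs
```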
23 changes: 23 additions & 0 deletions docs/mkdocs/docs/sdk-docs.md
@@ -1,5 +1,7 @@
# SDK Documentation

This is the documentation for Whitebox's SDK. For an interactive experience, you can experiment with the SDK's <a href="https://github.com/squaredev-io/whitebox/tree/main/examples/notebooks" class="external-link" target="_blank">Jupyter notebooks</a>.

## Models

**_create_model_**_(name, type, features, prediction, probability, labels, description="")_
@@ -65,3 +67,24 @@ Inserts a set of inference rows into the database.
!!! info

The non-processed and processed dataframes, along with the timestamps and actuals series, must **ALL** have the same length.
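For instance, here is a minimal sketch of aligned inputs (the `wb` instance and model ID are placeholders; the call mirrors the one in the repository's example notebook):

```Python
import pandas as pd

processed = pd.DataFrame({"feature1": [0.1, 0.7, 0.3], "feature2": [1, 0, 1]})
non_processed = processed.copy()  # same length by construction
timestamps = pd.Series(pd.date_range("2023-01-01", periods=len(processed), freq="H"))
actuals = pd.Series([1, 0, 1])

# All four inputs have the same length, as required.
assert len(processed) == len(non_processed) == len(timestamps) == len(actuals)

wb.log_inferences(
    model_id="some_model_id",
    processed=processed,
    non_processed=non_processed,
    timestamps=timestamps,
    actuals=actuals,
)
```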

## Monitors

**_create_model_monitor_**_(model_id, name, status, metric, severity, email, feature=None, lower_threshold=None)_

Creates a monitor for a specific metric.

| Parameter | Type | Description |
| ------------------- | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **model_id** | `str` | The ID of the model. |
| **name** | `str` | The name of the monitor. |
| **status** | `MonitorStatus` | The status of the monitor. Possible values for `MonitorStatus`: `active`, `inactive`. |
| **metric** | `MonitorMetrics` | The metric that will be monitored. Possible values for `MonitorMetrics`: `accuracy`, `precision`, `recall`, `f1`, `data_drift`, `concept_drift`, `missing_values_count`. |
| **severity** | `AlertSeverity` | The severity of the alert the monitor produces. Possible values for `AlertSeverity`: `low`, `mid`, `high`. |
| **email** | `str` | The email to which the alert will be sent. |
| **feature** | `str` | The feature to be monitored. Defaults to `None`. |
| **lower_threshold** | `float` | The threshold below which an alert will be produced. Defaults to `None`. |

!!! note

Some metrics, such as data drift, don't use a threshold, so the feature to be monitored should be provided instead. In any case, `feature` and `lower_threshold` can't both be `None` at the same time.
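For example, a minimal sketch of creating an accuracy monitor (the host, API key, model ID and email are placeholders; the enum import path follows the repository's `whitebox/schemas/modelMonitor.py`):

```Python
from whitebox import Whitebox
from whitebox.schemas.modelMonitor import MonitorStatus, MonitorMetrics, AlertSeverity

wb = Whitebox(host="http://127.0.0.1:8000", api_key="<your_api_key>")

# Email an alert when accuracy drops below the threshold.
model_monitor = wb.create_model_monitor(
    model_id="some_model_id",
    name="accuracy-monitor",
    status=MonitorStatus.active,
    metric=MonitorMetrics.accuracy,
    severity=AlertSeverity.high,
    email="[email protected]",
    lower_threshold=0.7,
)
```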
25 changes: 19 additions & 6 deletions docs/mkdocs/docs/tutorial/installation.md
@@ -4,10 +4,11 @@

Install whitebox server and all of its dependencies using `docker-compose`

Copy the following code in a file named `docker-compose.yml`:

```yaml
version: "3.10"
name: Whitebox
services:
  postgres:
    image: postgres:15
@@ -24,36 +25,48 @@ services:
- "5432:5432"
volumes:
- wb_data:/var/lib/postgresql/data
networks:
- whitebox

whitebox:
profiles: ["whitebox"]
image: sqdhub/whitebox:main
restart: unless-stopped
environment:
- APP_NAME=Whitebox | Docker
- DATABASE_URL=postgresql://postgres:postgres@postgres:5432/postgres
- SECRET_KEY=<add_your_own> # Optional, if not set the API key won't be encrypted
ports:
- "8000:8000"
depends_on:
- postgres
networks:
- whitebox

volumes:
wb_data:

networks:
whitebox:
name: whitebox
```
In your terminal, navigate to the directory containing `docker-compose.yml`, then run the following command:

<div class="termy">

```console
$ docker-compose up
```

</div>

## Kubernetes

You can also install the Whitebox server and all of its dependencies in your k8s cluster using `helm`:
```bash
helm repo add squaredev https://chartmuseum.squaredev.io/
helm repo update
helm install whitebox squaredev/whitebox
```
5 changes: 2 additions & 3 deletions docs/mkdocs/docs/tutorial/monitors_alerts.md
@@ -14,10 +14,9 @@ model_monitor = wb.create_model_monitor(
name="test",
status=MonitorStatus.active,
metric=MonitorMetrics.accuracy,
feature="feature1",
lower_threshold=0.7,
severity=AlertSeverity.high,
email="[email protected]",
email="[email protected]",
lower_threshold=0.7
)
```

8 changes: 5 additions & 3 deletions docs/mkdocs/docs/tutorial/sdk.md
@@ -2,12 +2,14 @@

## Installation

Installing Whitebox is a pretty easy job. Just install it like any other python package.

Install the SDK with `pip`:

<div class="termy">

```console
$ pip install whitebox-sdk
```

</div>
@@ -107,7 +109,7 @@ wb.delete_model("some_model_id")

Once you have created a model you can start loading your data. Let's start with the training dataset!

In our example we will create a `pd.DataFrame` from a `.csv` file. Of course, you can use any method you like to create your `pd.DataFrame`, as long as your non-processed and processed datasets have **the same number of rows** (a.k.a. the same length) and contain **more than one row**!

```Python
import pandas as pd
11 changes: 4 additions & 7 deletions docs/mkdocs/mkdocs.yml
@@ -6,15 +6,11 @@ theme:
  palette:
    - media: "(prefers-color-scheme: light)"
      scheme: default
      primary: cyan
      accent: blue
      toggle:
        icon: material/weather-sunny
        name: Switch to dark mode
    - media: "(prefers-color-scheme: dark)"
      scheme: slate
      primary: cyan
      accent: blue
      toggle:
        icon: material/weather-night
        name: Switch to light mode
@@ -53,10 +49,11 @@ markdown_extensions:
  - toc:
      permalink: true
extra_css:
  - css/termynal.css
  - css/custom.css
  - css/extra.css
extra_javascript:
  - js/termynal.js
  - js/custom.js
  - js/mathjax-config.js
  - https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML
1 change: 1 addition & 0 deletions examples/docker-compose/docker-compose.yml
@@ -21,6 +21,7 @@ services:

  whitebox:
    image: sqdhub/whitebox:main
    platform: linux/amd64
    restart: unless-stopped
    environment:
      - APP_NAME=Whitebox | Docker
278 changes: 278 additions & 0 deletions examples/notebooks/sdk-example.ipynb
@@ -0,0 +1,278 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Usaging Whitebox SDK\n",
"\n",
"First of all we need to import the Whitebox class:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from whitebox import Whitebox"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we create an instance of the Whitebox class adding the host and API key as parameters:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"wb = Whitebox(host=\"http://127.0.0.1:8000\", api_key=\"1073a9a03e5c6bf06b3f7a0c23d44ff923842a63dc75929ff0543705bbd3fa26\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to start adding training datasets and inferences, you first need to create a model.\n",
"\n",
"Let's create a sample model. When the model is created successfully, the `Model` object that was added into the database is returned."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'id': '5dce48be-a16f-45dc-a878-d80b5b02a1f2',\n",
" 'created_at': '2023-01-17T15:02:27.356199',\n",
" 'updated_at': '2023-01-17T15:02:27.356199',\n",
" 'name': 'Model 1',\n",
" 'description': '',\n",
" 'type': 'binary',\n",
" 'features': {'additionalProp1': 'numerical',\n",
" 'additionalProp2': 'numerical',\n",
" 'target': 'numerical'},\n",
" 'labels': {'additionalProp1': 0, 'additionalProp2': 1},\n",
" 'prediction': 'y_prediction_multi',\n",
" 'probability': 'proba'}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wb.create_model(name=\"Model 1\", type=\"binary\", features={'additionalProp1': 'numerical',\n",
" 'additionalProp2': 'numerical',\n",
" 'target': 'numerical'}, labels={'additionalProp1': 0, 'additionalProp2': 1}, prediction=\"y_prediction_multi\", probability=\"proba\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to fetch a model from the database we'll need the `model_id`. If the `model_id` exists, a `Model` object will be returned. Otherwise you'll get nothing!"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'id': '5dce48be-a16f-45dc-a878-d80b5b02a1f2',\n",
" 'created_at': '2023-01-17T15:02:27.356199',\n",
" 'updated_at': '2023-01-17T15:02:27.356199',\n",
" 'name': 'Model 1',\n",
" 'description': '',\n",
" 'type': 'binary',\n",
" 'features': {'additionalProp1': 'numerical',\n",
" 'additionalProp2': 'numerical',\n",
" 'target': 'numerical'},\n",
" 'labels': {'additionalProp1': 0, 'additionalProp2': 1},\n",
" 'prediction': 'y_prediction_multi',\n",
" 'probability': 'proba'}"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wb.get_model(\"5dce48be-a16f-45dc-a878-d80b5b02a1f2\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Once you have created a model you can start loading your data. Let's start with the training dataset!\n",
"\n",
"In our example we will create a pd.DataFrame from a .csv file. Of course you can use any method you like to create your pd.DataFrame as long as your non-processed and processed datasets have the same amount of rows (a.k.a. the same length) and there are more than one rows!\n",
"\n",
"If the training data is successfully saved, `True` will be returned, otherwise `False`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"df = pd.read_csv(\"src/analytics/data/testing/classification_test_data.csv\")\n",
"\n",
"wb.log_training_dataset(model_id=\"5dce48be-a16f-45dc-a878-d80b5b02a1f2\", processed=df, non_processed=df)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"To load your inferences you have to follow the exact same procedure as with the training datasets. The only difference is that you need to provide a `pd.Series` with the timestamps and (optionally) a `pd.Series` with the actuals, whose indices should match the ones in the non-processed and processed `pd.DataFrames`.\n",
"\n",
"If the inferences are successfully saved, `True` will be returned, otherwise `False`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"from whitebox.tests.v1.mock_data import timestamps, mixed_actuals\n",
"df = pd.read_csv(\"src/analytics/data/testing/classification_test_data.csv\")\n",
"\n",
"\n",
"wb.log_inferences(model_id=\"5dce48be-a16f-45dc-a878-d80b5b02a1f2\", processed=df, non_processed=df, timestamps=timestamps, actuals=mixed_actuals)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"You can create a monitor in whitebox so that alert are created automaticaly when some value is out of bounds:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from whitebox import Whitebox\n",
"\n",
"wb = Whitebox(host=\"127.0.0.1:8000\", api_key=\"some_api_key\")\n",
"\n",
"model_monitor = wb.create_model_monitor(\n",
" model_id=\"mock_model_id\",\n",
" name=\"test\",\n",
" status=\"active\",\n",
" metric=\"accuracy\",\n",
" severity=\"high\",\n",
" email=\"jackie.chan@somemail.io\",\n",
" lower_threshold=0.7\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to delete a model you hae to provide the `model_id`.\n",
"\n",
"If the model is successfully deleted, `True` will be returned, otherwise `False`."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wb.delete_model(\"5dce48be-a16f-45dc-a878-d80b5b02a1f2\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:25:29) [Clang 14.0.6 ]"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "32a5a47fe20cdfbd609287887f2e78a4d5f2f7afeda3da775d5794970f9a3f8e"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
4 changes: 2 additions & 2 deletions whitebox/schemas/modelMonitor.py
@@ -30,10 +30,10 @@ class ModelMonitorBase(BaseModel):
    name: str
    status: MonitorStatus
    metric: MonitorMetrics
    severity: AlertSeverity
    email: str
    feature: Optional[str]
    lower_threshold: Optional[float]


class ModelMonitor(ModelMonitorBase, ItemBase):
