
Commit d6cb363

andnig, amotl, and ckurze committed
Time series: Add "Primer: Machine Learning for Time Series Data"
Co-authored-by: Andreas Motl <[email protected]> Co-authored-by: ckurze <[email protected]>
1 parent 3779db3 commit d6cb363

20 files changed

+1304
-1
lines changed

docs/domain/timeseries/index.md

Lines changed: 2 additions & 1 deletion
@@ -93,5 +93,6 @@ and analyzing. Industrial applications.
 Basics <basics>
 Advanced <advanced>
 Connectivity <connect>
-video
+Video Tutorials <video>
+Machine Learning Primer <ml-primer/index>
 :::

docs/domain/timeseries/ml-primer/10-about-intro.md

Lines changed: 474 additions & 0 deletions
Large diffs are not rendered by default.

docs/domain/timeseries/ml-primer/20-mlops-cratedb-mlflow.md

Lines changed: 650 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 167 additions & 0 deletions
@@ -0,0 +1,167 @@
1+
# Experiment Tracking with CrateDB using SQL only
2+
3+
_Introduction to Time Series Modeling with CrateDB (Part 3)._
4+
5+
This is part 3 of our blog series about "Running Time Series Models in Production using CrateDB".
6+
7+
8+
## Introduction
9+
10+
While MLflow is a handy tool, it is also possible to track your experiments exclusively using
11+
CrateDB and SQL, without using any machine learning framework at all.
12+
13+
Because CrateDB can store both nested documents and binary data, you can keep
parameters, metrics, the model configuration, and the model itself directly in CrateDB.
15+
16+
The following sections demonstrate this with two examples.
17+
18+
19+
## Storing Experiment Metadata
20+
21+
CrateDB can store the metadata of your experiments, such as metrics and parameters, in a regular table.
22+
23+
### 1. Deploy database schema
24+
25+
Create a database table in CrateDB to store metrics and parameters.
26+
27+
```sql
CREATE TABLE metrics_params (
    timestamp TIMESTAMP DEFAULT now(),
    run_name TEXT,
    metrics OBJECT(DYNAMIC),
    parameters OBJECT(DYNAMIC)
);
```
35+
36+
Using CrateDB's dynamic `OBJECT` column type, you can store arbitrary key-value pairs in the `metrics`
and `parameters` columns. This lets you change which parameters and metrics you record from one
experiment to the next, and evolve the details as you go.
39+
40+
### 2. Record metrics and parameters
41+
42+
Instead of recording the metrics and parameters to MLflow, as demonstrated in
[MLOps powered by CrateDB and MLflow » Experiment Tracking][ml-timeseries-blog-part-2], you
record them by inserting directly into the database table.
45+
46+
```sql
INSERT INTO
    metrics_params (run_name, metrics, parameters)
VALUES ('random_run_name',
        '{"precision": 0.667, "recall": 0.667}',
        '{"anomaly_threshold": 2.5, "alm_suppress_minutes": 3.5}');
```
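
The same insert can also be issued from Python. The following is a minimal sketch, assuming a DB-API cursor such as the one returned by `conn.cursor()` of the `crate` client used later in this article. The helper name `record_run` is hypothetical and not part of any library. It passes the dictionaries as bound parameters instead of formatting them into the SQL string.

```python
from typing import Any, Mapping

INSERT_RUN_SQL = (
    "INSERT INTO metrics_params (run_name, metrics, parameters) "
    "VALUES (?, ?, ?)"
)


def record_run(
    cursor: Any,
    run_name: str,
    metrics: Mapping[str, float],
    parameters: Mapping[str, Any],
) -> None:
    """Insert one experiment run into the `metrics_params` table.

    `record_run` is a hypothetical helper; `cursor` is assumed to be a
    DB-API cursor that accepts `?` placeholders.
    """
    # The dictionaries are sent as bound parameters and end up in the
    # dynamic OBJECT columns `metrics` and `parameters`.
    cursor.execute(INSERT_RUN_SQL, (run_name, dict(metrics), dict(parameters)))
```

With an open connection `conn`, a call could look like `record_run(conn.cursor(), "random_run_name", {"precision": 0.667}, {"anomaly_threshold": 2.5})`.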
53+
54+
### 3. Read back recordings
55+
56+
To read back your recordings, use standard
[SQL `SELECT` statements].
58+
59+
To retrieve all recorded metrics and parameters after a certain point in time:
60+
```sql
SELECT *
FROM metrics_params
WHERE timestamp > '2021-01-01';
```
65+
66+
To retrieve specific parameters or metrics:
67+
```sql
SELECT metrics['precision'], parameters['anomaly_threshold']
FROM metrics_params
WHERE timestamp > '2021-01-01';
```
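
Once the recordings are back in Python, comparing runs is a matter of plain dictionary access. The sketch below assumes a DB-API cursor from the `crate` client and that `OBJECT` columns arrive as Python dictionaries. The helper names `fetch_runs` and `best_run` are hypothetical.

```python
from typing import Any, Dict, List, Optional


def fetch_runs(cursor: Any, since: str) -> List[Dict[str, Any]]:
    """Return all runs recorded after `since` as a list of dictionaries."""
    cursor.execute(
        "SELECT run_name, metrics, parameters "
        "FROM metrics_params WHERE timestamp > ?",
        (since,),
    )
    # Treat missing (NULL) objects as empty dictionaries.
    return [
        {"run_name": name, "metrics": metrics or {}, "parameters": params or {}}
        for name, metrics, params in cursor.fetchall()
    ]


def best_run(runs: List[Dict[str, Any]], metric: str) -> Optional[Dict[str, Any]]:
    """Pick the run with the highest value for `metric`.

    Runs that did not record the metric are ignored. Returns None when
    no run recorded it.
    """
    candidates = [run for run in runs if metric in run["metrics"]]
    return max(candidates, key=lambda run: run["metrics"][metric], default=None)
```

For example, `best_run(fetch_runs(cursor, "2021-01-01"), "precision")` would return the run with the highest recorded precision.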
72+
73+
74+
## Storing Model Data
75+
76+
CrateDB can also store your model data.
77+
78+
Independently of recording experiment metadata, you may also want to store the model itself in
CrateDB, using its blob tables (see [BLOB data type]).
80+
81+
In order to store models into CrateDB, you will need two database tables:
82+
83+
- A regular database table, storing the model configuration and
  relevant metadata.
- A blob table, storing serialized models in binary format,
  usually in Python's [pickle format].
87+
88+
### 1. Deploy database schema
89+
90+
Create those tables, again utilizing CrateDB's nested object support for flexible
91+
schema evolution:
92+
93+
```sql
CREATE TABLE model_config (
    timestamp TIMESTAMP DEFAULT now(),
    digest TEXT, -- this is the link to the model blob
    run_name TEXT,
    config OBJECT(DYNAMIC)
);

CREATE BLOB TABLE models;
```
103+
104+
### 2. Upload the model
105+
106+
To upload the model, run the following Python program after adjusting the
database connection URL and credentials.
108+
109+
```python
from io import BytesIO
import pickle

from crate import client

# `model` is the trained model object from your experiment.
file = BytesIO()
# Serialize the model object and store it in the in-memory file.
pickle.dump(model, file)
# Rewind the buffer so that the upload reads it from the beginning.
file.seek(0)

conn = client.connect(
    "https://<your-instance>.azure.cratedb.net:4200",
    username="admin",
    password="<your-password>",
    verify_ssl_cert=True,
)

# Upload the serialized model and keep the digest that identifies the blob.
blob_container = conn.get_blob_container("models")
blob_digest = blob_container.put(file)
```
128+
129+
Then record the blob digest and the model configuration in the `model_config` table:
130+
131+
```python
cursor = conn.cursor()
cursor.execute(
    "INSERT INTO model_config (digest, run_name, config) VALUES (?, ?, ?)",
    (blob_digest, "random_run_name", model.config.to_dict()),
)
```
137+
138+
Because `config` is a dynamic `OBJECT` column, CrateDB automatically creates a sub-column for each key of the model configuration.
139+
140+
![crate model config](/_assets/img/ml-timeseries-primer/cratedb-model-configuration.png)
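
The upload and the configuration insert belong together, so it can be convenient to wrap them in a single function. The following is a minimal sketch under stated assumptions: `conn` is a connection from `crate.client.connect()`, the `models` blob table and the `model_config` table exist as created above, and the model can be pickled. The helper name `store_model` is hypothetical.

```python
import pickle
from io import BytesIO
from typing import Any, Mapping


def store_model(
    conn: Any,
    model: Any,
    run_name: str,
    config: Mapping[str, Any],
) -> str:
    """Upload a pickled model and record its digest and configuration.

    `store_model` is a hypothetical helper. Returns the blob digest,
    which links the `model_config` row to the blob.
    """
    # Serialize into an in-memory buffer, positioned at the start.
    payload = BytesIO(pickle.dumps(model))

    # Upload to the `models` blob table; `put` returns the blob digest.
    blob_container = conn.get_blob_container("models")
    digest = blob_container.put(payload)

    # Store the digest next to the run name and the configuration.
    cursor = conn.cursor()
    cursor.execute(
        "INSERT INTO model_config (digest, run_name, config) VALUES (?, ?, ?)",
        (digest, run_name, dict(config)),
    )
    return digest
```

Returning the digest lets the caller log it or use it right away to fetch the blob again.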
141+
142+
### 3. Read back the model
143+
144+
To retrieve a model from the blob store table again, you will need to get the digest value
145+
of the model from the configuration table:
146+
147+
```sql
SELECT digest FROM model_config WHERE run_name = 'random_run_name';
```
150+
151+
Then, use this digest, i.e. the blob identifier, to get the blob payload, and
152+
deserialize it from pickle format:
153+
154+
```python
# `digest` holds the value returned by the query above.
# The blob arrives in chunks, so join them before deserializing.
blob_content = b"".join(blob_container.get(digest))

model = pickle.loads(blob_content)
```
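
Both read-back steps, looking up the digest and fetching the blob, can be combined into one helper. This is a sketch, not a definitive implementation: the name `load_model` is hypothetical, `conn` is assumed to be a `crate` client connection, and picking the most recent row per `run_name` is an assumption for the case where a run name was recorded more than once.

```python
import pickle
from typing import Any


def load_model(conn: Any, run_name: str) -> Any:
    """Fetch and deserialize the model stored for `run_name`.

    `load_model` is a hypothetical helper. Raises LookupError when no
    configuration row exists for the given run name.
    """
    cursor = conn.cursor()
    # Assumption: if several rows share a run name, use the latest one.
    cursor.execute(
        "SELECT digest FROM model_config "
        "WHERE run_name = ? ORDER BY timestamp DESC LIMIT 1",
        (run_name,),
    )
    row = cursor.fetchone()
    if row is None:
        raise LookupError(f"No model recorded for run {run_name!r}")

    # The blob container yields the payload in chunks.
    blob_container = conn.get_blob_container("models")
    payload = b"".join(blob_container.get(row[0]))

    # Unpickling can execute arbitrary code: only load blobs you trust.
    return pickle.loads(payload)
```

Because unpickling can run arbitrary code, this approach is only appropriate when everyone who can write to the `models` blob table is trusted.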
161+
162+
163+
[BLOB data type]: https://crate.io/docs/crate/reference/en/latest/general/blobs.html
[ml-timeseries-blog-part-1]: https://cratedb.com/blog/introduction-to-time-series-modeling-with-cratedb-machine-learning-time-series-data
[ml-timeseries-blog-part-2]: https://cratedb.com/blog/introduction-to-time-series-modeling-with-cratedb-part-2?hs_preview=uXVBkYrk-136061503799
[pickle format]: https://realpython.com/python-pickle-module/
[SQL `SELECT` statements]: https://crate.io/docs/crate/reference/en/latest/sql/statements/select.html
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
(timeseries-ml-primer)=
2+
# Primer: Machine Learning for Time Series Data
3+
4+
Learn how to apply machine learning procedures to time series data.
5+
6+
```{toctree}
:glob:
:maxdepth: 2

*
```
