Open
Description
Requirement
There are some scenarios of time series forecasting in real world, such as forecasting the pv/uv or the traffic load from the history data. The sample data is as follows:
app_id | time | uv | pv |
---|---|---|---|
1 | 1 | 1 | 5 |
1 | 2 | 2 | 6 |
1 | 3 | 3 | 7 |
1 | 4 | 4 | 8 |
2 | 1 | 11 | 12 |
2 | 2 | 13 | 14 |
2 | 3 | 15 | 16 |
2 | 4 | 17 | 18 |
The table above contains the data of two entities/app_id 1
and 2
. Each entity has the uv/pv in 4 time steps 1, 2, 3, 4
.
The common data transform function for time series data is convert the data of sliding windows to rows
. If the window size is 2
, the result table is:
app_id | time | uv | uv_1 | uv_2 | pv | pv_1 | pv_2 |
---|---|---|---|---|---|---|---|
1 | 1 | 1 | NULL | NULL | 5 | NULL | NULL |
1 | 2 | 2 | 1 | NULL | 6 | 5 | NULL |
1 | 3 | 3 | 2 | 1 | 7 | 6 | 5 |
1 | 4 | 4 | 3 | 2 | 8 | 7 | 6 |
2 | 1 | 11 | NULL | NULL | 12 | NULL | NULL |
2 | 2 | 13 | 11 | NULL | 14 | 12 | NULL |
2 | 3 | 15 | 13 | 11 | 16 | 14 | 12 |
2 | 4 | 17 | 15 | 13 | 18 | 16 | 14 |
Points:
- Do the existed SQL syntax (MaxCompute / Hive / MySQL) support this transformation? If yes, how complex is it?
- Do we need extend the SQL syntax in SQLFlow (such as
TO RUN
clause) ?