Commit 9494631

feat(dwh): update
1 parent 53f88c8 commit 9494631

1 file changed: +89 -0 lines changed

pages/data-warehouse/quickstart.mdx

Lines changed: 89 additions & 0 deletions
@@ -88,7 +88,96 @@ You are now connected to your Data Warehouse for ClickHouse® deployment, and ca

## How to import and query an example data set

### Creating a database and ingesting data

<Message type="note">
  This example is based on the **New York Taxi Data** from the [official ClickHouse documentation](https://clickhouse.com/docs/getting-started/example-datasets/nyc-taxi).
</Message>

1. Run the command below to create a new database:
    ```sql
    CREATE DATABASE nyc_taxi;
    ```

2. Create a new table in the database you just created:
    ```sql
    CREATE TABLE nyc_taxi.trips_small (
        trip_id UInt32,
        pickup_datetime DateTime,
        dropoff_datetime DateTime,
        pickup_longitude Nullable(Float64),
        pickup_latitude Nullable(Float64),
        dropoff_longitude Nullable(Float64),
        dropoff_latitude Nullable(Float64),
        passenger_count UInt8,
        trip_distance Float32,
        fare_amount Float32,
        extra Float32,
        tip_amount Float32,
        tolls_amount Float32,
        total_amount Float32,
        payment_type Enum('CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4, 'UNK' = 5),
        pickup_ntaname LowCardinality(String),
        dropoff_ntaname LowCardinality(String)
    )
    ENGINE = MergeTree
    PRIMARY KEY (pickup_datetime, dropoff_datetime);
    ```

3. Insert data from an Amazon S3 bucket:
    ```sql
    INSERT INTO nyc_taxi.trips_small
    SELECT
        trip_id,
        pickup_datetime,
        dropoff_datetime,
        pickup_longitude,
        pickup_latitude,
        dropoff_longitude,
        dropoff_latitude,
        passenger_count,
        trip_distance,
        fare_amount,
        extra,
        tip_amount,
        tolls_amount,
        total_amount,
        payment_type,
        pickup_ntaname,
        dropoff_ntaname
    FROM s3(
        'https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_{0..2}.gz',
        'TabSeparatedWithNames'
    );
    ```

### Querying the database

1. Run the command below to count the rows you inserted in the table:
    ```sql
    SELECT count()
    FROM nyc_taxi.trips_small;
    ```

2. Run the command below to list the first 10 rows of your table:
    ```sql
    SELECT *
    FROM nyc_taxi.trips_small
    LIMIT 10;
    ```

3. Run the command below to display the 10 neighborhoods with the most pickups:
    ```sql
    SELECT
        pickup_ntaname,
        count(*) AS count
    FROM nyc_taxi.trips_small
    WHERE pickup_ntaname != ''
    GROUP BY pickup_ntaname
    ORDER BY count DESC
    LIMIT 10;
    ```

To perform more in-depth tests with larger data sets, refer to our [dedicated documentation](/data-warehouse/reference-content/example-datasets/).

## How to manage your deployment
