
Commit c1d8310

Merge pull request #2162 from segmentio/DOC-361-IF
Added object/array table to Warehouse Schema doc
2 parents 6a98b9d + 457fece commit c1d8310

File tree

2 files changed: +98 −5 lines changed


src/connections/storage/warehouses/index.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ Examples of data warehouses include Amazon Redshift, Google BigQuery, and Postgr
 > info "Looking for the Warehouse Schemas docs?"
 > They've moved! Check them out [here](schema/).

-{% include components/reference-button.html href="https://segment.com/academy/intro/when-to-use-sql-for-analysis/&referrer=docs" icon="media/academy.svg" title="Analytics Academy: When to use SQL for analysis" description="When your existing analytics tools can't answer your questions, it's time to level-up and use SQL for analysis." %}
+{% include components/reference-button.html href="https://segment.com/academy/intro/when-to-use-sql-for-analysis/?referrer=docs" icon="media/academy.svg" title="Analytics Academy: When to use SQL for analysis" description="When your existing analytics tools can't answer your questions, it's time to level-up and use SQL for analysis." %}

 ### More Help

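A note on the one-character fix above: `&` only separates query parameters that already follow a `?`, so in the old URL `&referrer=docs` was folded into the path instead of being parsed as a parameter. A quick, illustrative check in Python:

```python
from urllib.parse import urlparse, parse_qs

# With "&referrer=docs" there is no "?" in the URL, so nothing parses as a query:
bad = urlparse("https://segment.com/academy/intro/when-to-use-sql-for-analysis/&referrer=docs")
print(parse_qs(bad.query))   # {}

# The corrected URL introduces the first parameter with "?":
good = urlparse("https://segment.com/academy/intro/when-to-use-sql-for-analysis/?referrer=docs")
print(parse_qs(good.query))  # {'referrer': ['docs']}
```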

src/connections/storage/warehouses/schema.md

Lines changed: 97 additions & 4 deletions
@@ -229,10 +229,103 @@ AND table_name = '<event>'
 ORDER by column_name
 ```

-> info "Note"
-> If you send us an array, we stringify it in Redshift. That way you don't end up having to pollute your events. It won't work if you have a lot of array elements but should work decently to store and query those. We also flatten nested objects.
+### How event tables handle nested objects and arrays
+
+To preserve the quality of your events data, Segment uses the following methods to store objects and arrays in the event tables:
+
+<table>
+<thead>
+<tr>
+<th> Field </th>
+<th> Code (Example) </th>
+<th> Schema (Example) </th>
+</tr>
+</thead>
+
+<tr>
+<td><b>Object (Context):</b> Flatten </td>
+<td markdown="1">
+
+``` json
+context: {
+  app: {
+    version: "1.0.0"
+  }
+}
+```
+</td>
+<td>
+<b>Column Name:</b><br/>
+context_app_version
+<br/><br/>
+<b>Value:</b><br/>
+"1.0.0"
+</td>
+</tr>
+
+<tr>
+<td> <b>Object (Traits):</b> Flatten </td>
+<td markdown= "1">
+
+```json
+traits: {
+  address: {
+    street: "6th Street"
+  }
+}
+```

+</td>
+<td>
+<b>Column Name:</b><br/>
+address_street<br/>
+<br/>
+<b>Value:</b><br/>
+"6th Street"
+</td>
+</tr>
+
+<tr>
+<td><b>Object (Properties):</b> Stringify</td>
+<td markdown="1">
+
+```json
+properties: {
+  product_id: {
+    sku: "G-32"
+  }
+}
+```
+</td>
+<td>
+<b>Column Name:</b><br/>
+product_id<br/><br/>
+<b>Value:</b><br/>
+"{sku.'G-32'}"
+</td>
+</tr>
+
+<tr>
+<td><b>Array (Any):</b> Stringify</td>
+<td markdown="1">
+
+```json
+products: {
+  product_id: [
+    "507f1", "505bd"
+  ]
+}
+```

+</td>
+<td>
+<b>Column Name:</b> <br/>
+product_id <br/><br/>
+<b>Value:</b>
+"[507f1, 505bd]"
+</td>
+</tr>
+</table>

 ## Tracks vs. Events Tables

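To make the flatten-versus-stringify distinction in the added table concrete, here is a minimal Python sketch of the two behaviors. The helper names are invented for illustration, and `json.dumps` is only a stand-in for the serialization: the table above shows the actual stored strings use a slightly different format (for example `"{sku.'G-32'}"`).

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested objects into underscore-joined column names,
    e.g. {"context": {"app": {"version": "1.0.0"}}} -> context_app_version."""
    columns = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            columns.update(flatten(value, name))
        else:
            columns[name] = value
    return columns

def stringify(properties):
    """Keep top-level property keys as columns, serializing any nested
    object or array into a single string value."""
    return {
        key: json.dumps(value) if isinstance(value, (dict, list)) else value
        for key, value in properties.items()
    }

# Context (and traits) objects are flattened into column names:
print(flatten({"context": {"app": {"version": "1.0.0"}}}))
# {'context_app_version': '1.0.0'}

# Properties objects and arrays are stringified under their own key:
print(stringify({"product_id": {"sku": "G-32"}}))
# {'product_id': '{"sku": "G-32"}'}
print(stringify({"product_id": ["507f1", "505bd"]}))
# {'product_id': '["507f1", "505bd"]'}
```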
@@ -303,7 +396,7 @@ New event properties and traits create columns. Segment processes the incoming d

 When Segment process a new batch and discover a new column to add, we take the most recent occurrence of a column and choose its datatype.

-The datatypes that we support right now are
+The data types that we currently support include

 - `timestamp`
 - `integer`
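The "most recent occurrence of a column" rule above lends itself to a short sketch. Only `timestamp` and `integer` from the type list are visible in this hunk; the other type names used below (boolean, float, varchar) are assumptions, and the helper is hypothetical rather than Segment's loader code.

```python
from datetime import datetime

def infer_column_type(value):
    """Hypothetical sketch: map a JSON value to a warehouse column type."""
    if isinstance(value, bool):   # test bool before int: bools are ints in Python
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value.replace("Z", "+00:00"))
            return "timestamp"
        except ValueError:
            pass
    return "varchar"

# Most-recent-occurrence rule: the last value seen in the batch decides
# the datatype of the newly created column.
batch = [{"plan_id": 42}, {"plan_id": "trial-42"}]
print(infer_column_type(batch[-1]["plan_id"]))  # varchar
```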
@@ -325,7 +418,7 @@ All four timestamps pass through to your Warehouse for every ETL'd event. In mos

 `timestamp` is the UTC-converted timestamp which is set by the Segment library. If you are importing historical events using a server-side library, this is the timestamp you'll want to reference in your queries.

-`original_timestamp` is the original timestamp set by the Segment library at the time the event is created. Keep in mind, this timestamp can be affected by device clock skew. You can override this value by manually passing in a value for `timestamp` which will then be relabed as `original_timestamp`. Generally, this timestamp should be ignored in favor of the `timestamp` column.
+`original_timestamp` is the original timestamp set by the Segment library at the time the event is created. Keep in mind, this timestamp can be affected by device clock skew. You can override this value by manually passing in a value for `timestamp` which will then be relabeled as `original_timestamp`. Generally, this timestamp should be ignored in favor of the `timestamp` column.

 `sent_at` is the UTC timestamp set by library when the Segment API call was sent. This timestamp can also be affected by device clock skew.

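Because `original_timestamp` and `sent_at` both come from the device clock while `received_at` (the fourth timestamp) comes from the server, the skew correction this section alludes to can be sketched as follows. The formula `timestamp = received_at - (sent_at - original_timestamp)` matches the correction Segment's docs describe elsewhere, but treat it here as an assumption; the code is illustrative, not the pipeline's implementation.

```python
from datetime import datetime

def skew_corrected_timestamp(original_timestamp: datetime,
                             sent_at: datetime,
                             received_at: datetime) -> datetime:
    # The gap between creating the event and sending it is measured on the
    # same (possibly wrong) device clock, so it is trustworthy; anchoring
    # that gap to the server-side received_at cancels the absolute skew.
    return received_at - (sent_at - original_timestamp)

# Device clock runs ten minutes fast; the event was sent 5s after creation
# and arrived 1s later (server clock is authoritative):
original = datetime(2021, 6, 1, 12, 10, 0)   # device time, skewed
sent     = datetime(2021, 6, 1, 12, 10, 5)   # device time, skewed
received = datetime(2021, 6, 1, 12, 0, 6)    # server time
print(skew_corrected_timestamp(original, sent, received))
# 2021-06-01 12:00:01
```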