Commit bd7e5bf

refine the 40-load-data/04-transform/ (#2188)
1 parent 6f6726a commit bd7e5bf

9 files changed, with 165 additions and 356 deletions

docs/en/guides/40-load-data/04-transform/00-querying-parquet.md

Lines changed: 15 additions & 0 deletions
@@ -54,4 +54,19 @@ FROM @parquet_query_stage
 FILE_FORMAT => 'parquet_query_format',
 PATTERN => '.*[.]parquet'
 );
+```
+### Query with Metadata
+
+Query Parquet files directly from a stage, including metadata columns like `metadata$filename` and `metadata$file_row_number`:
+
+```sql
+SELECT
+metadata$filename AS file,
+metadata$file_row_number AS row,
+*
+FROM @parquet_query_stage
+(
+FILE_FORMAT => 'parquet_query_format',
+PATTERN => '.*[.]parquet'
+);
 ```

docs/en/guides/40-load-data/04-transform/01-querying-csv.md

Lines changed: 15 additions & 0 deletions
@@ -69,4 +69,19 @@ FROM @csv_query_stage
 FILE_FORMAT => 'csv_query_format',
 PATTERN => '.*[.]csv[.]gz'
 );
+```
+### Query with Metadata
+
+Query CSV files directly from a stage, including metadata columns like `metadata$filename` and `metadata$file_row_number`:
+
+```sql
+SELECT
+metadata$filename AS file,
+metadata$file_row_number AS row,
+$1, $2, $3
+FROM @csv_query_stage
+(
+FILE_FORMAT => 'csv_query_format',
+PATTERN => '.*[.]csv'
+);
 ```
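
The CSV change uses two different `PATTERN` values: the existing query keeps `.*[.]csv[.]gz`, while the added metadata query uses `.*[.]csv`. The following Python sketch shows which file names each expression would select. It assumes the pattern has to match the whole file path, which this diff does not confirm, and the file names are made up for illustration.

```python
import re

# Patterns copied from the CSV hunk; the file names below are hypothetical.
PATTERNS = [r".*[.]csv", r".*[.]csv[.]gz"]
SAMPLE_FILES = ["books.csv", "books.csv.gz", "notes/books.csv", "books.tsv"]


def matching_files(pattern: str, names: list[str]) -> list[str]:
    """Return the names the pattern matches in full (assumed whole-path semantics)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


for pattern in PATTERNS:
    print(pattern, "->", matching_files(pattern, SAMPLE_FILES))
```

Under that assumption, the added metadata query would skip gzip-compressed files that the earlier query in the same guide reads, so it is worth checking that the stage actually holds uncompressed `.csv` files.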

docs/en/guides/40-load-data/04-transform/02-querying-tsv.md

Lines changed: 15 additions & 0 deletions
@@ -68,4 +68,19 @@ FROM @tsv_query_stage
 FILE_FORMAT => 'tsv_query_format',
 PATTERN => '.*[.]tsv[.]gz'
 );
+```
+### Query with Metadata
+
+Query TSV files directly from a stage, including metadata columns like `metadata$filename` and `metadata$file_row_number`:
+
+```sql
+SELECT
+metadata$filename AS file,
+metadata$file_row_number AS row,
+$1, $2, $3
+FROM @tsv_query_stage
+(
+FILE_FORMAT => 'tsv_query_format',
+PATTERN => '.*[.]tsv'
+);
 ```

docs/en/guides/40-load-data/04-transform/03-querying-ndjson.md

Lines changed: 15 additions & 0 deletions
@@ -66,4 +66,19 @@ FROM @ndjson_query_stage
 FILE_FORMAT => 'ndjson_query_format',
 PATTERN => '.*[.]ndjson[.]gz'
 );
+```
+### Query with Metadata
+
+Query NDJSON files directly from a stage, including metadata columns like `metadata$filename` and `metadata$file_row_number`:
+
+```sql
+SELECT
+metadata$filename AS file,
+metadata$file_row_number AS row,
+$1:title, $1:author
+FROM @ndjson_query_stage
+(
+FILE_FORMAT => 'ndjson_query_format',
+PATTERN => '.*[.]ndjson'
+);
 ```
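
The added NDJSON query combines the two metadata columns with the path expressions `$1:title` and `$1:author`. As a rough mental model only, the Python sketch below produces the same shape of result from in-memory data. The file names and records are hypothetical, and both the 0-based row numbering and the use of a bare file name for `metadata$filename` are assumptions rather than behavior confirmed by this diff.

```python
import json

# Hypothetical staged NDJSON content, keyed by file name (not the docs' stage).
STAGED_FILES = {
    "books_a.ndjson": '{"title": "Databend 101", "author": "A. Writer"}\n'
                      '{"title": "Stage Queries", "author": "B. Reader"}\n',
    "books_b.ndjson": '{"title": "Metadata Columns", "author": "C. Editor"}\n',
}


def query_with_metadata(files):
    """Yield (file, row, title, author) tuples, one per NDJSON line."""
    for filename, content in files.items():
        # Row numbers restart for each file; starting at 0 is an assumption.
        for row_number, line in enumerate(content.splitlines()):
            record = json.loads(line)  # plays the role of $1
            yield filename, row_number, record.get("title"), record.get("author")


for row in query_with_metadata(STAGED_FILES):
    print(row)
```

The point of the sketch is that the row number is scoped to each file, so the pair (file, row) identifies a source record even when the pattern matches several files.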

docs/en/guides/40-load-data/04-transform/03-querying-orc.md

Lines changed: 23 additions & 4 deletions
@@ -8,9 +8,9 @@ import StepContent from '@site/src/components/Steps/step-content';
 ## Syntax

 ```sql
-SELECT [<alias>.]<column> [, <column> ...] | [<alias>.]$<col_position> [, $<col_position> ...]
-FROM {@<stage_name>[/<path>] [<table_alias>] | '<uri>' [<table_alias>]}
-[(
+SELECT [<alias>.]<column> [, <column> ...] | [<alias>.]$<col_position> [, $<col_position> ...]
+FROM {@<stage_name>[/<path>] [<table_alias>] | '<uri>' [<table_alias>]}
+[(
 [<connection_parameters>],
 [ PATTERN => '<regex_pattern>'],
 [ FILE_FORMAT => 'ORC | <custom_format_name>'],
@@ -39,7 +39,7 @@ The iris dataset contains 3 classes of 50 instances each, where each class refer
 Create an external stage with your Amazon S3 bucket where your iris dataset file is stored.

 ```sql
-CREATE STAGE orc_query_stage
+CREATE STAGE orc_query_stage
 URL = 's3://databend-doc'
 CONNECTION = (
 AWS_KEY_ID = '<your-key-id>',
@@ -78,5 +78,24 @@ FROM
 'https://github.com/tensorflow/io/raw/master/tests/test_orc/iris.orc' (file_format = > 'orc');
 ```

+</StepContent>
+<StepContent number="4">
+
+### Query with Metadata
+
+Query ORC files directly from a stage, including metadata columns like `metadata$filename` and `metadata$file_row_number`:
+
+```sql
+SELECT
+metadata$filename AS file,
+metadata$file_row_number AS row,
+*
+FROM @orc_query_stage
+(
+FILE_FORMAT => 'orc',
+PATTERN => '.*[.]orc'
+);
+```
+
 </StepContent>
 </StepsWrap>

docs/en/guides/40-load-data/04-transform/04-querying-avro.md

Lines changed: 1 addition & 1 deletion
@@ -84,7 +84,7 @@ FROM @avro_query_stage
 );
 ```

-#### Query with Metadata
+### Query with Metadata

 Query Avro files directly from a stage, including metadata columns like `metadata$filename` and `metadata$file_row_number`:

docs/en/guides/40-load-data/04-transform/04-querying-metadata.md

Lines changed: 9 additions & 71 deletions
@@ -3,13 +3,6 @@ title: Working with File and Column Metadata
 sidebar_label: Metadata
 ---

-This guide explains how to query metadata from staged files. The supported file formats for metadata querying are summarized in the table below:
-
-| Metadata Type | Supported File Formats |
-|---------------------|------------------------------------------------------|
-| File-level metadata | CSV, TSV, Parquet, NDJSON, Avro |
-| Column-level metadata (INFER_SCHEMA) | Parquet |
-
 The following file-level metadata fields are available for the supported file formats:

 | File Metadata | Type | Description |
@@ -22,68 +15,13 @@ These metadata fields are available in:
 - SELECT queries over stages (e.g., `SELECT FROM @stage`)
 - `COPY INTO <table>` statements

-### Examples
-
-1. Querying Metadata Fields
-
-You can directly select metadata fields when reading from a stage:
-
-```sql
-SELECT
-metadata$filename,
-metadata$file_row_number
-FROM @my_internal_stage
-LIMIT 1;
-```
-
-```sql
-│ metadata$filename │ metadata$file_row_number │
-├───────────────────┼───────────────────────────┤
-│ iris.parquet      │                        10 │
-```
-
-2. Using Metadata in COPY INTO
-
-You can pass metadata fields into target table columns using COPY INTO:
-
-```sql
-COPY INTO iris_with_meta
-FROM (SELECT metadata$filename, metadata$file_row_number, $1, $2, $3, $4, $5 FROM @my_internal_stage/iris.parquet)
-FILE_FORMAT=(TYPE=parquet);
-```
-
-## Inferring Column Metadata from Files
-
-Databend allows you to retrieve column-level metadata from your staged files using the [INFER_SCHEMA](/sql/sql-functions/table-functions/infer-schema) function. This is currently supported for **Parquet** files.
-
-| Column Metadata | Type | Description |
-|-----------------|---------|--------------------------------------------------|
-| `column_name` | String | Indicates the name of the column. |
-| `type` | String | Indicates the data type of the column. |
-| `nullable` | Boolean | Indicates whether the column allows null values. |
-| `order_id` | UInt64 | Represents the column's position in the table. |
-
-### Examples
-
-The following example retrieves column metadata from a Parquet file staged in `@my_internal_stage`:
-
-```sql
-SELECT * FROM INFER_SCHEMA(location => '@my_internal_stage/iris.parquet');
-```
-
-```sql
-┌──────────────────────────────────────────────┐
-│ column_name │ type │ nullable │ order_id │
-├──────────────┼─────────┼──────────┼──────────┤
-│ id │ BIGINT │ true │ 0 │
-│ sepal_length │ DOUBLE │ true │ 1 │
-│ sepal_width │ DOUBLE │ true │ 2 │
-│ petal_length │ DOUBLE │ true │ 3 │
-│ petal_width │ DOUBLE │ true │ 4 │
-│ species │ VARCHAR │ true │ 5 │
-└──────────────────────────────────────────────┘
-```
-
-## Tutorials
+## Detailed Guides for Querying Metadata

-- [Querying Metadata](/tutorials/load/query-metadata)
+| File Format | Guide |
+|-------------|---------------------------------------------------------------------------------------------------|
+| Parquet | [Querying Parquet Files with Metadata](/docs/en/guides/40-load-data/04-transform/00-querying-parquet.md#query-with-metadata) |
+| CSV | [Querying CSV Files with Metadata](/docs/en/guides/40-load-data/04-transform/01-querying-csv.md#query-with-metadata) |
+| TSV | [Querying TSV Files with Metadata](/docs/en/guides/40-load-data/04-transform/02-querying-tsv.md#query-with-metadata) |
+| NDJSON | [Querying NDJSON Files with Metadata](/docs/en/guides/40-load-data/04-transform/03-querying-ndjson.md#query-with-metadata) |
+| ORC | [Querying ORC Files with Metadata](/docs/en/guides/40-load-data/04-transform/03-querying-orc.md#query-with-metadata) |
+| Avro | [Querying Avro Files with Metadata](/docs/en/guides/40-load-data/04-transform/04-querying-avro.md#query-with-metadata) |
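
Every link in the new table targets the `#query-with-metadata` anchor, so it depends on each guide having a "Query with Metadata" heading. The Python sketch below approximates how a Markdown heading becomes an anchor. The slug rule (lowercase, drop punctuation, hyphenate whitespace) is a simplified assumption, not the site generator's exact algorithm, and `heading_slug` is a hypothetical helper.

```python
import re


def heading_slug(heading_line: str) -> str:
    """Approximate a Markdown heading anchor: drop '#', lowercase, hyphenate."""
    text = heading_line.lstrip("#").strip().lower()
    text = re.sub(r"[^\w\s-]", "", text)  # drop punctuation
    return re.sub(r"\s+", "-", text)


# The Avro guide's heading before and after this commit.
print(heading_slug("#### Query with Metadata"))
print(heading_slug("### Query with Metadata"))
```

Under this rule the heading level does not affect the anchor, so changing the Avro heading from `####` to `###` should leave its link target unchanged.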
