
Commit e601aca

Data Lake docs improvements
1 parent a27b345 commit e601aca

10 files changed: +122 −67 lines changed

docs/deploy/blob-storage.md renamed to docs/quix-cloud/managed-services/blob-storage.md

Lines changed: 7 additions & 7 deletions
@@ -5,9 +5,9 @@ description: Connect your cluster to external object storage (S3, GCS, Azure Blo
 
 # Blob storage connections
 
-Connect your cluster to a bucket/container so Quix can enable **Quix Lake**-the platform’s open storage layer for Kafka topic data.
+Connect your cluster to a bucket/container so Quix can enable **Quix Lake** or any other managed service that requires a Blob storage connection.
 
-![Connections list](../images/blob-storage/connections-list-running.png)
+![Connections list](../../images/blob-storage/connections-list-running.png)
 
 !!! important "One connection per cluster"
     Each **cluster** supports **one** blob storage connection.
@@ -37,7 +37,7 @@ Connect your cluster to a bucket/container so Quix can enable **Quix Lake**-the
 
 ## Test before saving
 
-![Testing connection](../images/blob-storage/test-connecting.png)
+![Testing connection](../../images/blob-storage/test-connecting.png)
 
 When you click **Test connection**, Quix runs a short round-trip check to make sure your details are correct and that the platform can both see and use your storage.
 
@@ -55,7 +55,7 @@ Each step is shown in the dialog. Successful steps are marked with a ✓, and yo
 **Failure**
 If a step fails, you’ll see ✗ next to it along with the reason (for example, “Access denied” or “Wrong region”). This makes it easy to fix permissions or update your settings.
 
-![Access denied example](../images/blob-storage/test-error.png)
+![Access denied example](../../images/blob-storage/test-error.png)
 
 ## Providers
 
@@ -132,6 +132,6 @@ If a step fails, you’ll see ✗ next to it along with the reason (for example,
 
 * [What is Quix Lake](../quix-cloud/quixlake/overview.md) - what it is and why it exists
 * [Open format](../quix-cloud/quixlake/open-format.md) - layout and schemas (Avro, Parquet)
-* [Quix Lake Catalog](../quix-cloud/quixlake/catalog.md) - browse, search, and manage datasets
-* [Quix Lake Sink](../quix-cloud/managed-services/sink.md) - persist topics to your bucket/container
-* [Quix Lake Replay (managed)](../quix-cloud/managed-services/replay.md) - re-run datasets back to Kafka
+* [Quix Lake - API](../quix-cloud/quixlake/api.md) - browse, search, and manage datasets
+* [Quix Lake - Sink](../quix-cloud/managed-services/sink.md) - persist topics to your bucket/container
+* [Quix Lake - Replay](../quix-cloud/managed-services/replay.md) - re-run datasets back to Kafka
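
For anyone who wants to sanity-check an S3-compatible bucket before wiring it up, the round trip that **Test connection** performs can be approximated by hand. A minimal sketch with boto3, where the bucket name, region, and object key are placeholders and the exact checks Quix runs may differ:

```python
# Rough pre-flight check for an S3-compatible bucket (placeholder names).
# Mirrors the list / write / read / delete steps a connection test needs;
# the exact checks Quix runs may differ.
import boto3

s3 = boto3.client("s3", region_name="eu-west-1")                  # assumed region
bucket = "my-quix-lake-bucket"
key = "quix-connection-test/probe.txt"

s3.list_objects_v2(Bucket=bucket, MaxKeys=1)                      # list permission
s3.put_object(Bucket=bucket, Key=key, Body=b"probe")              # write permission
body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()       # read permission
assert body == b"probe"
s3.delete_object(Bucket=bucket, Key=key)                          # delete permission
print("list/write/read/delete OK")
```

The same list, read, write, and delete permissions are what the sink later relies on when it persists topics to the bucket.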

docs/quix-cloud/managed-services/dynamic-configuration.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 The **Dynamic Configuration Manager** is a managed service for handling
 **large, versioned configuration files** related to devices, sensors, or
 physical assets.
+
 These configurations often change in real time (e.g., updates to
 equipment parameters, IoT sensor mappings, or lab/test system setups),
 but are **too large to send through Kafka directly**.
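
For context, one common way to handle payloads that are too large for Kafka messages is to keep the document in blob storage and send only a small reference through Kafka. The sketch below illustrates that general pattern; it is not the Dynamic Configuration Manager's own API, and the broker, bucket, topic, and key names are placeholders:

```python
# Illustrative "claim check": store a large config in blob storage and publish
# only a small reference to Kafka. General pattern only, not the managed
# service's implementation.
import json
import boto3
from confluent_kafka import Producer

s3 = boto3.client("s3")
bucket = "configs-bucket"                                   # placeholder bucket
key = "machines/press-07/config-v42.json"                   # placeholder object key

config = {"sensors": {"s1": {"min": 0, "max": 100}}}        # stand-in for a large document
s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(config).encode())

producer = Producer({"bootstrap.servers": "localhost:9092"})  # placeholder broker
producer.produce(
    "config-updates",                                         # placeholder topic
    key=b"press-07",
    value=json.dumps({"bucket": bucket, "key": key, "version": 42}).encode(),
)
producer.flush()
```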

docs/quix-cloud/managed-services/overview.md

Lines changed: 2 additions & 2 deletions
@@ -15,8 +15,8 @@ Note
 ## Available managed services
 
 - Dynamic Configuration Manager (`DynamicConfiguration`)
-- Quix Lake Sink (`quixlake.Sink`)
-- Quix Lake Replay (`quixlake.Replay`)
+- Quix Lake - Sink (`DataLake.Sink`)
+- Quix Lake - Replay (`DataLake.Replay`)
 
 ## Quick example
 

docs/quix-cloud/managed-services/replay.md

Lines changed: 7 additions & 7 deletions
@@ -1,10 +1,10 @@
 ---
 
-title: Quix Lake Replay (managed)
+title: Quix Lake - Replay
 description: Managed service that replays persisted datasets from Quix Lake back into Kafka with full fidelity.
 ---
 
-# Quix Lake Replay
+# Quix Lake - Replay
 
 Quix Lake Replay is a managed service that streams persisted datasets from **Quix Lake** back into **Kafka**, preserving timestamps, partitions, offsets, headers, and gaps for high-fidelity re-runs and simulations.
 
@@ -19,17 +19,17 @@ You can launch a replay from multiple places in the Portal:
 Use the pipeline canvas to add a Replay tile and start it in context of your flow. The screenshot shows the entry point on the canvas.
 ![Start from pipeline](./images/replay/start-from-pipeline.png)
 
-### From the Data Catalog
+### From the Data Lake UI
 
-Open **Quix Lake → Catalog**, select the dataset (topic + time window/keys/partitions), and start a replay directly from the catalog. The screenshot highlights the replay action on a selected dataset.
-![Start from Data Catalog](./images/replay/start-from-catalog.png)
+Open **Data Lake**, select the dataset (topic + time window/keys/partitions), and start a replay directly from the catalog. The screenshot highlights the replay action on a selected dataset.
+![Start from Data Lake UI](./images/replay/start-from-catalog.png)
 
 ## Example YAML (basic)
 
 ```yaml
 deployments:
-  - name: Quix Lake Replay
-    application: quixlake.Replay
+  - name: Quix Lake - Replay
+    application: DataLake.Replay
     version: latest
     deploymentType: Managed
     resources:
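
Because a replay preserves timestamps, partitions, and headers, a quick way to verify a run is to consume a few records from the target topic and inspect them. A minimal sketch with confluent-kafka; the broker address, group id, and topic name are placeholders, and whether you see original event timestamps depends on the replay settings:

```python
# Eyeball a replayed topic: print timestamp, partition, offset and headers
# for the first few records. Broker, group id and topic are placeholders.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "replay-check",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["replayed-topic"])

for _ in range(10):
    msg = consumer.poll(5.0)
    if msg is None or msg.error():
        continue
    ts_type, ts_ms = msg.timestamp()   # event time, if preserved by the replay
    print(ts_ms, msg.partition(), msg.offset(), msg.headers())

consumer.close()
```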

docs/quix-cloud/managed-services/sink.md

Lines changed: 14 additions & 14 deletions
@@ -1,9 +1,9 @@
 ---
-title: Quix Lake Sink
+title: Quix Lake - Sink
 description: Connector that persists Kafka data into Quix Lake.
 ---
 
-# Quix Lake Sink
+# Quix Lake - Sink
 
 The Quix Lake Sink writes Kafka topic data to your blob storage in **Avro** (raw messages) and **Parquet** (index and optional custom metadata), enabling fast discovery and high-fidelity **Replay**.
 
@@ -44,7 +44,7 @@ Metadata:
 ## How to run (UI)
 
 1. Create or log in to your Quix account.
-2. Go to **Connectors → Add connector → Quix quixlake Sink**.
+2. Go to **Connectors → Add connector → Quix Lake - Sink**.
 3. Click **Set up connector**, fill the parameters below, then **Test connection & deploy**.
 
 !!! info "Managed service"
@@ -78,7 +78,7 @@ You can configure the sink in **YAML** or via the **Quix Cloud UI**.
 * `consumerGroup` - Kafka consumer group ID (default: `quixstreams-default`)
 * `autoOffsetReset` - `latest` or `earliest` (default: `latest`)
 
-#### Quix Lake settings
+#### Sink settings
 
 * `avroCompression` - `snappy` or `gzip` (default: `snappy`)
 * `maxWorkers` - threads for uploading (default: `5`)
@@ -90,8 +90,8 @@ You can configure the sink in **YAML** or via the **Quix Cloud UI**.
 
 ```yaml
 deployments:
-  - name: Quix Lake Sink
-    application: quixlake.Sink
+  - name: Quix Lake - Sink
+    application: DataLake.Sink
     version: latest
     deploymentType: Managed
     resources:
@@ -132,7 +132,7 @@ deployments:
 * **Parquet (Custom metadata, optional)**
   Your key–value annotations (`Topic`, `Key`, `MetadataKey`, `MetadataValue`, `UpdatedUtc`) used for search and grouping in the Catalog.
 
-See **Open format** for full schemas and layout.
+See [Open format](../quixlake/open-format.md) for full schemas and layout.
 
 ## Operational behavior
 
@@ -151,20 +151,20 @@ See **Open format** for full schemas and layout.
 
 * **Logs**: per-segment lifecycle (rolling, upload, index write), retries, and timings
 * **Metrics**: records persisted, bytes uploaded, active uploads, average upload speed
-* **Catalog**: new datasets appear as index files land; use **Refresh** if you need to surface them sooner
+* **Data Lake UI**: new datasets appear as index files land; use **Refresh** if you need to surface them sooner
 
 ## Security
 
 * Uses the **cluster’s** blob storage connection (scoped credentials; one bucket/container per connection)
 * Honor your cloud controls: IAM roles, key rotation, server-side encryption, access logs, retention
-* The sink does not delete raw data; deletion flows through **Catalog** with soft-delete and trash retention
+* The sink does not delete raw data; deletion flows through the **Data Lake API** with soft-delete and trash retention
 
 ## Troubleshooting
 
 * **Access denied**
   Verify the blob connection’s permissions: list, read, write, and delete on the bucket/container.
-* **Nothing appears in Catalog**
-  Check sink logs for successful index writes; click **Refresh** in Catalog; ensure time filters include the new data.
+* **Nothing appears in the Data Lake UI**
+  Check sink logs for successful index writes; click **Refresh** in the UI; ensure time filters include the new data.
 * **Small-file explosion**
  Increase `rollBytes` and/or `rollSeconds`, or add a replica to smooth throughput.
 * **Slow uploads**
@@ -173,6 +173,6 @@ See **Open format** for full schemas and layout.
 ## See also
 
 * [Open format](../quixlake/open-format.md)
-* [Quix Lake User Interface](../quixlake/user-interface.md)
-* [Quix Lake Replay (managed)](./replay.md)
-* [Blob storage connections](../../deploy/blob-storage.md)
+* [Quix Lake - User Interface](../quixlake/user-interface.md)
+* [Quix Lake - Replay](./replay.md)
+* [Blob storage connections](./blob-storage.md)
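
On the small-file point: a segment closes when it reaches either `rollBytes` or `rollSeconds`, whichever comes first, so the expected number of Avro files per partition is easy to estimate. A back-of-envelope sketch, assuming steady per-partition throughput and hypothetical roll values (the sink's actual rolling logic may have additional triggers):

```python
# Estimate how many Avro segments per partition would roll per hour, assuming
# a segment closes at whichever of rollBytes / rollSeconds is hit first.
def segments_per_hour(throughput_bytes_per_sec: float,
                      roll_bytes: int,
                      roll_seconds: int) -> float:
    seconds_per_segment = min(roll_seconds, roll_bytes / throughput_bytes_per_sec)
    return 3600 / seconds_per_segment

# e.g. 50 KB/s per partition with hypothetical 64 MiB / 300 s roll settings:
# the time limit wins, so roughly 12 files per hour per partition.
print(segments_per_hour(50_000, 64 * 1024 * 1024, 300))
```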

docs/quix-cloud/quixlake/api.md

Lines changed: 60 additions & 9 deletions
@@ -1,7 +1,7 @@
 ---
 
-title: Quix Lake API
-description: Programmatic access to Quix Lake for search, metadata, file discovery, and lifecycle operations. Backend for the Quix Lake UI; Metadata endpoints are the primary integration surface for your applications.
+title: Quix Lake - API
+description: Programmatic access to Quix Lake for search, metadata, file discovery, and lifecycle operations. Backend for the Quix Lake UI; Metadata endpoints are the primary integration surface for your external re-indexing applications.
 ---
 
 # Overview
@@ -17,9 +17,19 @@ See the UI page: [Quix Lake User Interface](./user-interface.md).
 
 ![Open API](./images/user-interface-open-swagger.png)
 
-## Catalog
 
-These routes back the catalog’s topic/key lists, facets, and search results.
+## Catalog endpoints
+
+The **Catalog** endpoints power discovery features across workspaces and topics. These endpoints let you search indexed stream metadata, explore available topics and keys, and manage cache refreshes to ensure results are up to date.
+
+Use these routes to:
+
+* Perform flexible searches (text, fuzzy, prefix/suffix, and time-bound queries).
+* Enumerate available topics, keys, and metadata facets for filtering.
+* Keep catalog results fresh with cache refresh operations at the workspace or topic level.
+* Retrieve workspace and global metadata keys to support consistent UI-driven filtering.
+
+Together, these endpoints back the catalog’s search grid, topic/key lists, and filtering facets.
 
 ### Search stream metadata
 
@@ -74,9 +74,22 @@ Returns workspace identifiers that have discoverable data for the caller.
 !!! note
     The search response includes a total-count header so clients can page results consistently with the UI.
 
-## Data
 
-Programmatic visibility into raw objects and time bounds—useful for exports, verification, and operational tooling.
+## Data endpoints
+
+The **Data** endpoints provide direct visibility into raw storage objects and their temporal ranges. These endpoints are designed for operational use cases such as exports, audits, verification, and impact analysis.
+
+Use these routes to:
+
+* Enumerate the exact Avro segment files that make up a selection.
+* Identify all files associated with a key to preview or audit deletions.
+* Compute temporal bounds (min/max timestamps and partitions) for sets of keys.
+
+Together, these endpoints give precise, programmatic insight into how cataloged data is physically stored and bounded in time.
 
 ### Get timestamped file descriptors
 
@@ -96,9 +96,18 @@ Reports minimum/maximum timestamps and observed partitions for a set of keys. Us
 !!! tip
     A common flow is to use **search** to find candidate keys, then use **files** to enumerate exact object paths.
 
+
+
 ## Data Deletion
 
-Safe lifecycle operations. Defaults are **soft delete** to protect data; hard delete removes both metadata and files.
+The **Data Deletion API** supports safe lifecycle management of cataloged data. By default, deletions are **soft**, preserving underlying files and enabling recovery. When required, **hard deletes** can permanently remove both metadata and storage objects.
+
+Use these routes to:
+
+* Delete metadata and files for a single key or multiple keys (soft or hard).
+* Restore streams that were previously soft-deleted, individually or in batches.
+
+These operations ensure controlled data cleanup while supporting compliance and recovery workflows.
 
 ### Delete metadata/files for a single key
 
@@ -123,11 +155,28 @@ Clears soft-delete markers for multiple keys.
 !!! warning
     Use hard delete only when retention and compliance requirements allow it.
 
+
+
 ## Metadata
 
-Attach custom **key/value properties** to datasets and query by those properties. This is intended for your applications to enrich datasets created by the [Quix Lake Sink (managed)](../managed-services/sink.md), so they’re easy to group, filter, and audit across API and UI.
+The **Metadata API** lets you enrich datasets with custom **key/value properties** and query them later for grouping, filtering, lineage, and auditing. This is especially useful when working with datasets produced by the [Quix Lake Sink (managed)](../managed-services/sink.md), enabling your applications to attach meaningful business or operational context.
+
+Typical metadata examples include:
+
+* Machine or device identifiers
+* Sensor calibration ranges
+* Driver, batch, or run identifiers
+* Experiment tags or quality tiers
+* Flattened JSON values
+
+Use these routes to:
+
+* Add or update metadata properties for a key.
+* Retrieve all metadata associated with a key.
+* Soft-delete all metadata for a key, or selectively remove specific properties.
+
+These operations provide a lightweight but powerful mechanism for managing dataset context across both API and UI.
 
-**Good examples of properties:** machine id, sensor range, driver, concrete batch, simple JSON flattened to strings, experiment or run identifiers, quality tiers.
 
 ### Upsert metadata entries
 
@@ -152,6 +201,8 @@ Removes only the listed property names.
 !!! tip
     When searching, you can request the full tag set per result to drive rules without extra reads.
 
+
+
 ## Security
 
 * Authenticate with **Bearer** JWT.
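
A typical integration stitches these endpoint groups together: search the catalog, enumerate the files behind a key, then upsert custom metadata. The sketch below shows that flow with `requests`; the base URL, routes, and payload fields are placeholders, so take the real ones from the Swagger UI referenced above. Bearer-token authentication is as described in the Security section.

```python
# Illustrative client flow: catalog search -> file enumeration -> metadata upsert.
# Base URL, routes and payload fields are placeholders; the real routes are in
# the Swagger UI exposed by the Quix Lake API.
import requests

BASE = "https://portal-api.example.quix.io/quix-lake"   # placeholder base URL
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

# 1. Search stream metadata (hypothetical route and body)
search = requests.post(f"{BASE}/catalog/search", headers=HEADERS, json={
    "topic": "sensor-data",
    "from": "2024-01-01T00:00:00Z",
    "to": "2024-01-02T00:00:00Z",
    "text": "press-07",
}).json()

# 2. Enumerate the Avro segment files for the first matching key (hypothetical route)
key = search[0]["key"]
files = requests.get(f"{BASE}/data/files", headers=HEADERS,
                     params={"topic": "sensor-data", "key": key}).json()
print(len(files), "objects back the selection")

# 3. Upsert custom key/value metadata on that key (hypothetical route)
requests.put(f"{BASE}/metadata/sensor-data/{key}", headers=HEADERS, json={
    "machineId": "press-07",
    "qualityTier": "gold",
})
```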

docs/quix-cloud/quixlake/open-format.md

Lines changed: 11 additions & 9 deletions
@@ -1,10 +1,10 @@
 ---
 
-title: Open format
+title: Quix Lake - Open format
 description: How Quix Lake structures data in Avro and Parquet for portability and performance.
 ---
 
-# Open format
+# Quix Lake - Open format
 
 Quix Lake stores Kafka messages and metadata as **open files** in your blob storage (S3, GCS, Azure Blob, MinIO). The layout favors portability, fast discovery, and full-fidelity replay.
 
@@ -92,10 +92,10 @@ Quix Lake stores Kafka messages and metadata as **open files** in your blob stor
 
 ## How it flows
 
-* Ingest: write Avro into partitioned folders
-* Index: write Parquet descriptors alongside
-* Discover: Catalog and APIs read Parquet to list and filter quickly
-* Use: Replay to Kafka, or query with your engines of choice
+* Ingest: write Avro into partitioned folders ([Quix Lake - Sink](../managed-services/sink.md))
+* Index: write Parquet descriptors alongside ([Quix Lake - Sink](../managed-services/sink.md))
+* Discover: UI and APIs read Parquet to list and filter your datasets ([Quix Lake - API](./api.md))
+* Use: Replay to Kafka, or query with your engines of choice ([Quix Lake - Replay](../managed-services/replay.md))
 
 ## Guarantees
 
@@ -106,6 +106,8 @@ Quix Lake stores Kafka messages and metadata as **open files** in your blob stor
 
 ## See also
 
-* [Quix Lake User Interface](./user-interface.md) - discover datasets programmatically
-* [Quix Lake Replay (managed)](../managed-services/replay.md) - send datasets back to Kafka
-* [Blob storage connections](../../deploy/blob-storage.md) - wire up your bucket or container
+* [Quix Lake - Sink](../managed-services/sink.md) - persist data from Kafka to your Blob Storage
+* [Quix Lake - API](./api.md) - discover datasets programmatically
+* [Quix Lake - User Interface](./user-interface.md) - discover datasets using the Quix Cloud user interface
+* [Quix Lake - Replay](../managed-services/replay.md) - send datasets back to Kafka
+* [Blob storage connections](../managed-services/blob-storage.md) - wire up your bucket or container
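
Because the index is plain Parquet in your bucket, the "engines of your choice" step can be as small as a local DuckDB session. A minimal sketch, assuming S3 credentials are available from the environment; the object path and partition layout shown are illustrative, not the documented layout:

```python
# Query the Parquet index files straight from the bucket with DuckDB.
# The s3:// path and partition layout are illustrative placeholders.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")
con.execute("SET s3_region='eu-west-1'")   # credentials picked up from env/IAM

rows = con.execute("""
    SELECT *
    FROM read_parquet('s3://my-quix-lake-bucket/index/**/*.parquet',
                      hive_partitioning = true)
    LIMIT 10
""").fetchall()
print(rows)
```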

docs/quix-cloud/quixlake/overview.md

Lines changed: 9 additions & 8 deletions
@@ -5,7 +5,7 @@ description: The central storage layer of Quix Cloud for capturing and managing
 
 # What is Quix Lake
 
-Quix Lake is the central storage layer of Quix Cloud. It captures, organizes, and manages Kafka topic data in an open, file-based format on blob storage systems such as Amazon S3, Azure Blob, Google Cloud Storage, or MinIO.
+**Quix Lake** is the central storage layer of **Quix Cloud**. It captures, organizes, and manages Kafka topic data in an open, file-based format on blob storage systems such as Amazon S3, Azure Blob, Google Cloud Storage, or MinIO.
 
 Instead of relying on proprietary databases, Quix Lake uses open formats and Hive-style partitioning so your data stays:
 
@@ -23,7 +23,7 @@ Earlier persistence options were tied to specific databases and SDKs, which limi
 
 * Kafka messages are persisted exactly as they arrive, including timestamps, headers, partitions, offsets, and idle gaps
 * Metadata is indexed alongside raw data to enable fast discovery without scanning Avro
-* Services like **Catalog**, **Replay**, and sinks operate directly on the open files in your bucket
+* Services like **API**, **Replay**, and **Sink** operate directly on the open files in your bucket
 * You keep full control of storage, security, and lifecycle in your own cloud account
 
 ## Where your data lives
@@ -47,7 +47,7 @@ See **[open format](./open-format.md)** for the full layout and schemas.
 
 ## What you can do
 
-* **Explore datasets** with the **Quix Lake Catalog** UI or API
+* **Explore datasets** with the **Quix Lake** UI or API
 * **Replay** persisted datasets back into Kafka with full fidelity
 * **Search and filter** by time ranges, topics, keys, and custom metadata
 * **Query externally** using DuckDB, Spark, Trino, Athena, or BigQuery over Avro and Parquet
@@ -59,7 +59,7 @@ See **[open format](./open-format.md)** for the full layout and schemas.
 
 1. **Ingest**: a sink writes raw Kafka messages to Avro files in your storage
 2. **Index**: Parquet index files summarize time, partition, offsets, and sizes
-3. **Discover**: the Catalog and APIs read the index to list and filter quickly
+3. **Discover**: the UI and APIs read the index to list and filter quickly
 4. **Replay**: any discovered dataset can be streamed back to Kafka with original order and timing preserved or simulated
 5. **Use**: build pipelines that mix historical data with live streams, and run queries over Parquet
 
@@ -71,7 +71,8 @@ See **[open format](./open-format.md)** for the full layout and schemas.
 ## See also
 
 * [Open format](./open-format.md)
-* [Quix Lake User Interface](./user-interface.md)
-* [Quix Lake Replay (managed)](../managed-services/replay.md)
-* [Quix Lake Sink (managed)](../managed-services/sink.md)
-* [Blob storage connections](../../deploy/blob-storage.md)
+* [Quix Lake - Sink](../managed-services/sink.md)
+* [Quix Lake - User Interface](./user-interface.md)
+* [Quix Lake - API](./api.md)
+* [Quix Lake - Replay](../managed-services/replay.md)
+* [Blob storage connections](../managed-services/blob-storage.md)
