
Commit e601aca

Data Lake docs improvements
1 parent a27b345 commit e601aca

10 files changed: +122 −67 lines changed

docs/deploy/blob-storage.md renamed to docs/quix-cloud/managed-services/blob-storage.md

Lines changed: 7 additions & 7 deletions
@@ -5,9 +5,9 @@ description: Connect your cluster to external object storage (S3, GCS, Azure Blo
 
 # Blob storage connections
 
-Connect your cluster to a bucket/container so Quix can enable **Quix Lake**-the platform’s open storage layer for Kafka topic data.
+Connect your cluster to a bucket/container so Quix can enable **Quix Lake** or any other managed service that requires a Blob storage connection.
 
-![Connections list](../images/blob-storage/connections-list-running.png)
+![Connections list](../../images/blob-storage/connections-list-running.png)
 
 !!! important "One connection per cluster"
     Each **cluster** supports **one** blob storage connection.
@@ -37,7 +37,7 @@ Connect your cluster to a bucket/container so Quix can enable **Quix Lake**-the
 
 ## Test before saving
 
-![Testing connection](../images/blob-storage/test-connecting.png)
+![Testing connection](../../images/blob-storage/test-connecting.png)
 
 When you click **Test connection**, Quix runs a short round-trip check to make sure your details are correct and that the platform can both see and use your storage.
 
@@ -55,7 +55,7 @@ Each step is shown in the dialog. Successful steps are marked with a ✓, and yo
 **Failure**
 If a step fails, you’ll see ✗ next to it along with the reason (for example, “Access denied” or “Wrong region”). This makes it easy to fix permissions or update your settings.
 
-![Access denied example](../images/blob-storage/test-error.png)
+![Access denied example](../../images/blob-storage/test-error.png)
 
 ## Providers
 
@@ -132,6 +132,6 @@ If a step fails, you’ll see ✗ next to it along with the reason (for example,
 
 * [What is Quix Lake](../quix-cloud/quixlake/overview.md) - what it is and why it exists
 * [Open format](../quix-cloud/quixlake/open-format.md) - layout and schemas (Avro, Parquet)
-* [Quix Lake Catalog](../quix-cloud/quixlake/catalog.md) - browse, search, and manage datasets
-* [Quix Lake Sink](../quix-cloud/managed-services/sink.md) - persist topics to your bucket/container
-* [Quix Lake Replay (managed)](../quix-cloud/managed-services/replay.md) - re-run datasets back to Kafka
+* [Quix Lake - API](../quix-cloud/quixlake/api.md) - browse, search, and manage datasets
+* [Quix Lake - Sink](../quix-cloud/managed-services/sink.md) - persist topics to your bucket/container
+* [Quix Lake - Replay](../quix-cloud/managed-services/replay.md) - re-run datasets back to Kafka
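
For anyone who wants to sanity-check an S3-compatible bucket before wiring it up, the round trip that **Test connection** performs can be approximated by hand. A minimal sketch with boto3, where the bucket name, region, and object key are placeholders and the exact checks Quix runs may differ:

```python
# Rough pre-flight check for an S3-compatible bucket (placeholder names).
# Mirrors the list / write / read / delete steps a connection test needs;
# the exact checks Quix runs may differ.
import boto3

s3 = boto3.client("s3", region_name="eu-west-1")                  # assumed region
bucket = "my-quix-lake-bucket"
key = "quix-connection-test/probe.txt"

s3.list_objects_v2(Bucket=bucket, MaxKeys=1)                      # list permission
s3.put_object(Bucket=bucket, Key=key, Body=b"probe")              # write permission
body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()       # read permission
assert body == b"probe"
s3.delete_object(Bucket=bucket, Key=key)                          # delete permission
print("list/write/read/delete OK")
```

The same list, read, write, and delete permissions are what the sink later relies on when it persists topics to the bucket.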

docs/quix-cloud/managed-services/dynamic-configuration.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 The **Dynamic Configuration Manager** is a managed service for handling
 **large, versioned configuration files** related to devices, sensors, or
 physical assets.
+
 These configurations often change in real time (e.g., updates to
 equipment parameters, IoT sensor mappings, or lab/test system setups),
 but are **too large to send through Kafka directly**.
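
For context, one common way to handle payloads that are too large for Kafka messages is to keep the document in blob storage and send only a small reference through Kafka. The sketch below illustrates that general pattern; it is not the Dynamic Configuration Manager's own API, and the broker, bucket, topic, and key names are placeholders:

```python
# Illustrative "claim check": store a large config in blob storage and publish
# only a small reference to Kafka. General pattern only, not the managed
# service's implementation.
import json
import boto3
from confluent_kafka import Producer

s3 = boto3.client("s3")
bucket = "configs-bucket"                                   # placeholder bucket
key = "machines/press-07/config-v42.json"                   # placeholder object key

config = {"sensors": {"s1": {"min": 0, "max": 100}}}        # stand-in for a large document
s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(config).encode())

producer = Producer({"bootstrap.servers": "localhost:9092"})  # placeholder broker
producer.produce(
    "config-updates",                                         # placeholder topic
    key=b"press-07",
    value=json.dumps({"bucket": bucket, "key": key, "version": 42}).encode(),
)
producer.flush()
```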

docs/quix-cloud/managed-services/overview.md

Lines changed: 2 additions & 2 deletions
@@ -15,8 +15,8 @@ Note
 ## Available managed services
 
 - Dynamic Configuration Manager (`DynamicConfiguration`)
-- Quix Lake Sink (`quixlake.Sink`)
-- Quix Lake Replay (`quixlake.Replay`)
+- Quix Lake - Sink (`DataLake.Sink`)
+- Quix Lake - Replay (`DataLake.Replay`)
 
 ## Quick example
 

docs/quix-cloud/managed-services/replay.md

Lines changed: 7 additions & 7 deletions
@@ -1,10 +1,10 @@
 ---
 
-title: Quix Lake Replay (managed)
+title: Quix Lake - Replay
 description: Managed service that replays persisted datasets from Quix Lake back into Kafka with full fidelity.
 ---
 
-# Quix Lake Replay
+# Quix Lake - Replay
 
 Quix Lake Replay is a managed service that streams persisted datasets from **Quix Lake** back into **Kafka**, preserving timestamps, partitions, offsets, headers, and gaps for high-fidelity re-runs and simulations.
 
@@ -19,17 +19,17 @@ You can launch a replay from multiple places in the Portal:
 Use the pipeline canvas to add a Replay tile and start it in context of your flow. The screenshot shows the entry point on the canvas.
 ![Start from pipeline](./images/replay/start-from-pipeline.png)
 
-### From the Data Catalog
+### From the Data Lake UI
 
-Open **Quix Lake → Catalog**, select the dataset (topic + time window/keys/partitions), and start a replay directly from the catalog. The screenshot highlights the replay action on a selected dataset.
-![Start from Data Catalog](./images/replay/start-from-catalog.png)
+Open **Data Lake**, select the dataset (topic + time window/keys/partitions), and start a replay directly from the catalog. The screenshot highlights the replay action on a selected dataset.
+![Start from Data Lake UI](./images/replay/start-from-catalog.png)
 
 ## Example YAML (basic)
 
 ```yaml
 deployments:
-  - name: Quix Lake Replay
-    application: quixlake.Replay
+  - name: Quix Lake - Replay
+    application: DataLake.Replay
     version: latest
     deploymentType: Managed
     resources:
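
Because a replay preserves timestamps, partitions, and headers, a quick way to verify a run is to consume a few records from the target topic and inspect them. A minimal sketch with confluent-kafka; the broker address, group id, and topic name are placeholders, and whether you see original event timestamps depends on the replay settings:

```python
# Eyeball a replayed topic: print timestamp, partition, offset and headers
# for the first few records. Broker, group id and topic are placeholders.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "replay-check",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["replayed-topic"])

for _ in range(10):
    msg = consumer.poll(5.0)
    if msg is None or msg.error():
        continue
    ts_type, ts_ms = msg.timestamp()   # event time, if preserved by the replay
    print(ts_ms, msg.partition(), msg.offset(), msg.headers())

consumer.close()
```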

docs/quix-cloud/managed-services/sink.md

Lines changed: 14 additions & 14 deletions
@@ -1,9 +1,9 @@
 ---
-title: Quix Lake Sink
+title: Quix Lake - Sink
 description: Connector that persists Kafka data into Quix Lake.
 ---
 
-# Quix Lake Sink
+# Quix Lake - Sink
 
 The Quix Lake Sink writes Kafka topic data to your blob storage in **Avro** (raw messages) and **Parquet** (index and optional custom metadata), enabling fast discovery and high-fidelity **Replay**.
 
@@ -44,7 +44,7 @@ Metadata:
 ## How to run (UI)
 
 1. Create or log in to your Quix account.
-2. Go to **Connectors → Add connector → Quix quixlake Sink**.
+2. Go to **Connectors → Add connector → Quix Lake - Sink**.
 3. Click **Set up connector**, fill the parameters below, then **Test connection & deploy**.
 
 !!! info "Managed service"
@@ -78,7 +78,7 @@ You can configure the sink in **YAML** or via the **Quix Cloud UI**.
 * `consumerGroup` - Kafka consumer group ID (default: `quixstreams-default`)
 * `autoOffsetReset` - `latest` or `earliest` (default: `latest`)
 
-#### Quix Lake settings
+#### Sink settings
 
 * `avroCompression` - `snappy` or `gzip` (default: `snappy`)
 * `maxWorkers` - threads for uploading (default: `5`)
@@ -90,8 +90,8 @@ You can configure the sink in **YAML** or via the **Quix Cloud UI**.
 
 ```yaml
 deployments:
-  - name: Quix Lake Sink
-    application: quixlake.Sink
+  - name: Quix Lake - Sink
+    application: DataLake.Sink
     version: latest
     deploymentType: Managed
     resources:
@@ -132,7 +132,7 @@ deployments:
 * **Parquet (Custom metadata, optional)**
   Your key–value annotations (`Topic`, `Key`, `MetadataKey`, `MetadataValue`, `UpdatedUtc`) used for search and grouping in the Catalog.
 
-See **Open format** for full schemas and layout.
+See [Open format](../quixlake/open-format.md) for full schemas and layout.
 
 ## Operational behavior
 
@@ -151,20 +151,20 @@ See **Open format** for full schemas and layout.
 
 * **Logs**: per-segment lifecycle (rolling, upload, index write), retries, and timings
 * **Metrics**: records persisted, bytes uploaded, active uploads, average upload speed
-* **Catalog**: new datasets appear as index files land; use **Refresh** if you need to surface them sooner
+* **Data Lake UI**: new datasets appear as index files land; use **Refresh** if you need to surface them sooner
 
 ## Security
 
 * Uses the **cluster’s** blob storage connection (scoped credentials; one bucket/container per connection)
 * Honor your cloud controls: IAM roles, key rotation, server-side encryption, access logs, retention
-* The sink does not delete raw data; deletion flows through **Catalog** with soft-delete and trash retention
+* The sink does not delete raw data; deletion flows through the **Data Lake API** with soft-delete and trash retention
 
 ## Troubleshooting
 
 * **Access denied**
   Verify the blob connection’s permissions: list, read, write, and delete on the bucket/container.
-* **Nothing appears in Catalog**
-  Check sink logs for successful index writes; click **Refresh** in Catalog; ensure time filters include the new data.
+* **Nothing appears in the Data Lake UI**
+  Check sink logs for successful index writes; click **Refresh** in the UI; ensure time filters include the new data.
 * **Small-file explosion**
  Increase `rollBytes` and/or `rollSeconds`, or add a replica to smooth throughput.
 * **Slow uploads**
@@ -173,6 +173,6 @@ See **Open format** for full schemas and layout.
 ## See also
 
 * [Open format](../quixlake/open-format.md)
-* [Quix Lake User Interface](../quixlake/user-interface.md)
-* [Quix Lake Replay (managed)](./replay.md)
-* [Blob storage connections](../../deploy/blob-storage.md)
+* [Quix Lake - User Interface](../quixlake/user-interface.md)
+* [Quix Lake - Replay](./replay.md)
+* [Blob storage connections](./blob-storage.md)
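
On the small-file point: a segment closes when it reaches either `rollBytes` or `rollSeconds`, whichever comes first, so the expected number of Avro files per partition is easy to estimate. A back-of-envelope sketch, assuming steady per-partition throughput and hypothetical roll values (the sink's actual rolling logic may have additional triggers):

```python
# Estimate how many Avro segments per partition would roll per hour, assuming
# a segment closes at whichever of rollBytes / rollSeconds is hit first.
def segments_per_hour(throughput_bytes_per_sec: float,
                      roll_bytes: int,
                      roll_seconds: int) -> float:
    seconds_per_segment = min(roll_seconds, roll_bytes / throughput_bytes_per_sec)
    return 3600 / seconds_per_segment

# e.g. 50 KB/s per partition with hypothetical 64 MiB / 300 s roll settings:
# the time limit wins, so roughly 12 files per hour per partition.
print(segments_per_hour(50_000, 64 * 1024 * 1024, 300))
```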

docs/quix-cloud/quixlake/api.md

Lines changed: 60 additions & 9 deletions
@@ -1,7 +1,7 @@
 ---
 
-title: Quix Lake API
-description: Programmatic access to Quix Lake for search, metadata, file discovery, and lifecycle operations. Backend for the Quix Lake UI; Metadata endpoints are the primary integration surface for your applications.
+title: Quix Lake - API
+description: Programmatic access to Quix Lake for search, metadata, file discovery, and lifecycle operations. Backend for the Quix Lake UI; Metadata endpoints are the primary integration surface for your external re-indexing applications.
 ---
 
 # Overview
@@ -17,9 +17,19 @@ See the UI page: [Quix Lake User Interface](./user-interface.md).
 
 ![Open API](./images/user-interface-open-swagger.png)
 
-## Catalog
 
-These routes back the catalog’s topic/key lists, facets, and search results.
+## Catalog endpoints
+
+The **Catalog** endpoints power discovery features across workspaces and topics. These endpoints let you search indexed stream metadata, explore available topics and keys, and manage cache refreshes to ensure results are up to date.
+
+Use these routes to:
+
+* Perform flexible searches (text, fuzzy, prefix/suffix, and time-bound queries).
+* Enumerate available topics, keys, and metadata facets for filtering.
+* Keep catalog results fresh with cache refresh operations at the workspace or topic level.
+* Retrieve workspace and global metadata keys to support consistent UI-driven filtering.
+
+Together, these endpoints back the catalog’s search grid, topic/key lists, and filtering facets.
 
 ### Search stream metadata
 
@@ -74,9 +74,22 @@ Returns workspace identifiers that have discoverable data for the caller.
 !!! note
     The search response includes a total-count header so clients can page results consistently with the UI.
 
-## Data
 
-Programmatic visibility into raw objects and time bounds—useful for exports, verification, and operational tooling.
+## Data endpoints
+
+The **Data** endpoints provide direct visibility into raw storage objects and their temporal ranges. These endpoints are designed for operational use cases such as exports, audits, verification, and impact analysis.
+
+Use these routes to:
+
+* Enumerate the exact Avro segment files that make up a selection.
+* Identify all files associated with a key to preview or audit deletions.
+* Compute temporal bounds (min/max timestamps and partitions) for sets of keys.
+
+Together, these endpoints give precise, programmatic insight into how cataloged data is physically stored and bounded in time.
 
 ### Get timestamped file descriptors
 
@@ -96,9 +96,18 @@ Reports minimum/maximum timestamps and observed partitions for a set of keys. Us
 !!! tip
     A common flow is to use **search** to find candidate keys, then use **files** to enumerate exact object paths.
 
+
+
 ## Data Deletion
 
-Safe lifecycle operations. Defaults are **soft delete** to protect data; hard delete removes both metadata and files.
+The **Data Deletion API** supports safe lifecycle management of cataloged data. By default, deletions are **soft**, preserving underlying files and enabling recovery. When required, **hard deletes** can permanently remove both metadata and storage objects.
+
+Use these routes to:
+
+* Delete metadata and files for a single key or multiple keys (soft or hard).
+* Restore streams that were previously soft-deleted, individually or in batches.
+
+These operations ensure controlled data cleanup while supporting compliance and recovery workflows.
 
 ### Delete metadata/files for a single key
 
@@ -123,11 +155,28 @@ Clears soft-delete markers for multiple keys.
 !!! warning
     Use hard delete only when retention and compliance requirements allow it.
 
+
+
 ## Metadata
 
-Attach custom **key/value properties** to datasets and query by those properties. This is intended for your applications to enrich datasets created by the [Quix Lake Sink (managed)](../managed-services/sink.md), so they’re easy to group, filter, and audit across API and UI.
+The **Metadata API** lets you enrich datasets with custom **key/value properties** and query them later for grouping, filtering, lineage, and auditing. This is especially useful when working with datasets produced by the [Quix Lake Sink (managed)](../managed-services/sink.md), enabling your applications to attach meaningful business or operational context.
+
+Typical metadata examples include:
+
+* Machine or device identifiers
+* Sensor calibration ranges
+* Driver, batch, or run identifiers
+* Experiment tags or quality tiers
+* Flattened JSON values
+
+Use these routes to:
+
+* Add or update metadata properties for a key.
+* Retrieve all metadata associated with a key.
+* Soft-delete all metadata for a key, or selectively remove specific properties.
+
+These operations provide a lightweight but powerful mechanism for managing dataset context across both API and UI.
 
-**Good examples of properties:** machine id, sensor range, driver, concrete batch, simple JSON flattened to strings, experiment or run identifiers, quality tiers.
 
 ### Upsert metadata entries
 
@@ -152,6 +201,8 @@ Removes only the listed property names.
 !!! tip
     When searching, you can request the full tag set per result to drive rules without extra reads.
 
+
+
 ## Security
 
 * Authenticate with **Bearer** JWT.
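
A typical integration stitches these endpoint groups together: search the catalog, enumerate the files behind a key, then upsert custom metadata. The sketch below shows that flow with `requests`; the base URL, routes, and payload fields are placeholders, so take the real ones from the Swagger UI referenced above. Bearer-token authentication is as described in the Security section.

```python
# Illustrative client flow: catalog search -> file enumeration -> metadata upsert.
# Base URL, routes and payload fields are placeholders; the real routes are in
# the Swagger UI exposed by the Quix Lake API.
import requests

BASE = "https://portal-api.example.quix.io/quix-lake"   # placeholder base URL
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

# 1. Search stream metadata (hypothetical route and body)
search = requests.post(f"{BASE}/catalog/search", headers=HEADERS, json={
    "topic": "sensor-data",
    "from": "2024-01-01T00:00:00Z",
    "to": "2024-01-02T00:00:00Z",
    "text": "press-07",
}).json()

# 2. Enumerate the Avro segment files for the first matching key (hypothetical route)
key = search[0]["key"]
files = requests.get(f"{BASE}/data/files", headers=HEADERS,
                     params={"topic": "sensor-data", "key": key}).json()
print(len(files), "objects back the selection")

# 3. Upsert custom key/value metadata on that key (hypothetical route)
requests.put(f"{BASE}/metadata/sensor-data/{key}", headers=HEADERS, json={
    "machineId": "press-07",
    "qualityTier": "gold",
})
```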

docs/quix-cloud/quixlake/open-format.md

Lines changed: 11 additions & 9 deletions
@@ -1,10 +1,10 @@
 ---
 
-title: Open format
+title: Quix Lake - Open format
 description: How Quix Lake structures data in Avro and Parquet for portability and performance.
 ---
 
-# Open format
+# Quix Lake - Open format
 
 Quix Lake stores Kafka messages and metadata as **open files** in your blob storage (S3, GCS, Azure Blob, MinIO). The layout favors portability, fast discovery, and full-fidelity replay.
 
@@ -92,10 +92,10 @@ Quix Lake stores Kafka messages and metadata as **open files** in your blob stor
 
 ## How it flows
 
-* Ingest: write Avro into partitioned folders
-* Index: write Parquet descriptors alongside
-* Discover: Catalog and APIs read Parquet to list and filter quickly
-* Use: Replay to Kafka, or query with your engines of choice
+* Ingest: write Avro into partitioned folders ([Quix Lake - Sink](../managed-services/sink.md))
+* Index: write Parquet descriptors alongside ([Quix Lake - Sink](../managed-services/sink.md))
+* Discover: UI and APIs read Parquet to list and filter your datasets ([Quix Lake - API](./api.md))
+* Use: Replay to Kafka, or query with your engines of choice ([Quix Lake - Replay](../managed-services/replay.md))
 
 ## Guarantees
 
@@ -106,6 +106,8 @@ Quix Lake stores Kafka messages and metadata as **open files** in your blob stor
 
 ## See also
 
-* [Quix Lake User Interface](./user-interface.md) - discover datasets programmatically
-* [Quix Lake Replay (managed)](../managed-services/replay.md) - send datasets back to Kafka
-* [Blob storage connections](../../deploy/blob-storage.md) - wire up your bucket or container
+* [Quix Lake - Sink](../managed-services/sink.md) - persist data from Kafka to your Blob Storage
+* [Quix Lake - API](./api.md) - discover datasets programmatically
+* [Quix Lake - User Interface](./user-interface.md) - discover datasets using the Quix Cloud user interface
+* [Quix Lake - Replay](../managed-services/replay.md) - send datasets back to Kafka
+* [Blob storage connections](../managed-services/blob-storage.md) - wire up your bucket or container
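
Because the index is plain Parquet in your bucket, the "engines of your choice" step can be as small as a local DuckDB session. A minimal sketch, assuming S3 credentials are available from the environment; the object path and partition layout shown are illustrative, not the documented layout:

```python
# Query the Parquet index files straight from the bucket with DuckDB.
# The s3:// path and partition layout are illustrative placeholders.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")
con.execute("SET s3_region='eu-west-1'")   # credentials picked up from env/IAM

rows = con.execute("""
    SELECT *
    FROM read_parquet('s3://my-quix-lake-bucket/index/**/*.parquet',
                      hive_partitioning = true)
    LIMIT 10
""").fetchall()
print(rows)
```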

docs/quix-cloud/quixlake/overview.md

Lines changed: 9 additions & 8 deletions
@@ -5,7 +5,7 @@ description: The central storage layer of Quix Cloud for capturing and managing
 
 # What is Quix Lake
 
-Quix Lake is the central storage layer of Quix Cloud. It captures, organizes, and manages Kafka topic data in an open, file-based format on blob storage systems such as Amazon S3, Azure Blob, Google Cloud Storage, or MinIO.
+**Quix Lake** is the central storage layer of **Quix Cloud**. It captures, organizes, and manages Kafka topic data in an open, file-based format on blob storage systems such as Amazon S3, Azure Blob, Google Cloud Storage, or MinIO.
 
 Instead of relying on proprietary databases, Quix Lake uses open formats and Hive-style partitioning so your data stays:
 
@@ -23,7 +23,7 @@ Earlier persistence options were tied to specific databases and SDKs, which limi
 
 * Kafka messages are persisted exactly as they arrive, including timestamps, headers, partitions, offsets, and idle gaps
 * Metadata is indexed alongside raw data to enable fast discovery without scanning Avro
-* Services like **Catalog**, **Replay**, and sinks operate directly on the open files in your bucket
+* Services like **API**, **Replay**, and **Sink** operate directly on the open files in your bucket
 * You keep full control of storage, security, and lifecycle in your own cloud account
 
 ## Where your data lives
@@ -47,7 +47,7 @@ See **[open format](./open-format.md)** for the full layout and schemas.
 
 ## What you can do
 
-* **Explore datasets** with the **Quix Lake Catalog** UI or API
+* **Explore datasets** with the **Quix Lake** UI or API
 * **Replay** persisted datasets back into Kafka with full fidelity
 * **Search and filter** by time ranges, topics, keys, and custom metadata
 * **Query externally** using DuckDB, Spark, Trino, Athena, or BigQuery over Avro and Parquet
@@ -59,7 +59,7 @@ See **[open format](./open-format.md)** for the full layout and schemas.
 
 1. **Ingest**: a sink writes raw Kafka messages to Avro files in your storage
 2. **Index**: Parquet index files summarize time, partition, offsets, and sizes
-3. **Discover**: the Catalog and APIs read the index to list and filter quickly
+3. **Discover**: the UI and APIs read the index to list and filter quickly
 4. **Replay**: any discovered dataset can be streamed back to Kafka with original order and timing preserved or simulated
 5. **Use**: build pipelines that mix historical data with live streams, and run queries over Parquet
 
@@ -71,7 +71,8 @@ See **[open format](./open-format.md)** for the full layout and schemas.
 ## See also
 
 * [Open format](./open-format.md)
-* [Quix Lake User Interface](./user-interface.md)
-* [Quix Lake Replay (managed)](../managed-services/replay.md)
-* [Quix Lake Sink (managed)](../managed-services/sink.md)
-* [Blob storage connections](../../deploy/blob-storage.md)
+* [Quix Lake - Sink](../managed-services/sink.md)
+* [Quix Lake - User Interface](./user-interface.md)
+* [Quix Lake - API](./api.md)
+* [Quix Lake - Replay](../managed-services/replay.md)
+* [Blob storage connections](../managed-services/blob-storage.md)
