Solutions: Refurbish "long-term store" #440
Conversation
Walkthrough

Added a new Long-term store doc and navigation entry, removed the legacy time-series longterm page, updated time-series references to point to the new longterm page, added a Community Wetterdienst card to sample applications, and normalized an Airflow doc heading.

Sequence Diagram(s)

No sequence diagram — changes are documentation-only and do not alter runtime control flow.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 0
🧹 Nitpick comments (2)
docs/solution/longterm/index.md (2)
14-19: Actionable TODO: Elaborate topic domain and ingredients.

The instruction is clear. Expand this section following the pattern of `docs/solution/analytics/index.md` and `docs/solution/industrial/index.md` to summarize benefits, key ingredients, and motivations for keeping raw data long-term without exporting to external systems.

Would you like me to draft an expanded section based on the analytics and industrial examples?
23-40: Actionable TODO: Prometheus adapter benefits already researched.

Excellent groundwork—the Q&A block and references are ready to be incorporated. The excerpt clearly explains CrateDB's role as a long-term storage backend for Prometheus. Integrate this content into the main section text, replacing or contextualizing the TODO block.

Would you like me to refactor this TODO into prose and integrate it into the section?
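For orientation, connecting Prometheus to CrateDB for long-term storage usually goes through the CrateDB Prometheus Adapter, which Prometheus addresses via its remote write/read protocol. A minimal sketch of the Prometheus-side configuration, assuming the adapter runs locally on its commonly documented port 9268 with `/write` and `/read` endpoints (verify against the adapter's README; this snippet is not taken from the PR):

```yaml
# prometheus.yml (excerpt). Hypothetical local setup; the adapter
# host, port, and endpoint paths are assumptions to adjust per deployment.
remote_write:
  - url: "http://localhost:9268/write"
remote_read:
  - url: "http://localhost:9268/read"
```

With this in place, Prometheus streams samples into CrateDB as they are scraped, and can read historic data back for queries that reach beyond its local retention window.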
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- docs/integrate/airflow/data-retention-hot-cold.md (1 hunks)
- docs/solution/index.md (3 hunks)
- docs/solution/longterm/index.md (1 hunks)
- docs/solution/time-series/index.md (1 hunks)
- docs/solution/time-series/longterm.md (0 hunks)
- docs/start/application/index.md (2 hunks)
💤 Files with no reviewable changes (1)
- docs/solution/time-series/longterm.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/solution/longterm/index.md
58-58: Bare URL used
(MD034, no-bare-urls)
docs/start/application/index.md
99-99: Bare URL used
(MD034, no-bare-urls)
🔇 Additional comments (9)
docs/integrate/airflow/data-retention-hot-cold.md (1)
2-2: Concise heading text.

The abbreviation "hot/cold" improves readability without sacrificing clarity.
docs/start/application/index.md (2)
2-2: Style normalization.

Header case change ("Sample Applications" → "Sample applications") improves consistency across documentation.
90-120: Community section structure is well-organized; wrap bare URL.

The Community rubric and grid-item-card follow established patterns effectively. However, the bare URL at line 99 violates markdownlint (MD034). The `:link:` attribute already references the URL, so the bare URL in the document is redundant.

Apply this diff to remove the bare URL:

```diff
 ::::{grid-item-card}
-:link: https://wetterdienst.readthedocs.io/en/latest/usage/python-api.html#export
-:link-type: url
 (weather-data-storage)=
 :::{rubric} Store and analyze massive amounts of synoptic weather data
 :::
```

Move the URL into a proper grid-item-card link attribute (it is already there at line 99–100). Remove lines 99–100 as the link URL is now specified via the `:link:` and `:link-type:` grid-item-card attributes, and the reference anchor and rubric should follow on subsequent lines.

Here's the corrected structure:

```diff
 ::::{grid-item-card}
+(weather-data-storage)=
+:link: https://wetterdienst.readthedocs.io/en/latest/usage/python-api.html#export
+:link-type: url
 :::{rubric} Store and analyze massive amounts of synoptic weather data
 :::
```

The grid-item-card's `:link:` attribute handles the URL, so no bare text is needed in the body.

⛔ Skipped due to learnings
Learnt from: amotl
PR: crate/cratedb-guide#364
File: docs/tutorial/index.md:48-55
Timestamp: 2025-10-07T10:19:41.866Z
Learning: In MyST (Markdown Structured Text) syntax used in the CrateDB Guide docs, bare URLs in directive options like `:link:` are perfectly valid and expected. For example, `:::{card} :link: https://learn.cratedb.com/ :::` is correct MyST syntax. The markdownlint MD034 warning for bare URLs in these directive options is a false positive because markdownlint doesn't understand MyST-specific syntax.

docs/solution/longterm/index.md (2)
1-3: Smart cross-linking anchors.

Multiple reference anchors (longterm-store, timeseries-longterm, timeseries-long-term-storage) support flexible navigation from different sections and maintain backward compatibility.
52-68: No bare URL violation found; cross-references resolve correctly.

The file contains a properly formatted reference-style link (line 54 references line 71's definition), not a bare URL. There is no violation of MD034. Both cross-references, `airflow-data-retention-hot-cold` and `weather-data-storage`, resolve to existing anchors in the documentation.

Likely an incorrect or invalid review comment.
docs/solution/time-series/index.md (1)
75-94: Appropriate cross-linking to new long-term store domain.

The addition of {ref}`longterm-store` in the Domains list correctly references the new dedicated long-term storage section. This refactoring maintains navigation coherence while promoting long-term storage to peer status with analytics and machine-learning domains.

docs/solution/index.md (3)
10-10: Toctree entry enables proper navigation.

Adding `longterm/index` to the toctree ensures the new long-term storage section is accessible from the sidebar and properly integrated into the documentation hierarchy.
19-19: Clear scope update.

The updated tagline now correctly reflects the expanded documentation scope to include long-term storage alongside time series and analytics.
40-52: Well-structured grid card with clear value proposition.

The Long-term store card clearly articulates the core benefit: keeping raw data accessible for analysis without forced retirement or export. The "What's inside" bullets are informative and promise practical guidance (time-based bucketing, Dask import, storage optimization).

However, verify that each bullet corresponds to substantial content in `docs/solution/longterm/index.md`. Currently, the longterm/index.md file contains only placeholder TODOs and cross-references; these bullets should either be delivered in expanded content or adjusted to match current reality.
docs/solution/longterm/index.md (outdated)

```markdown
:::{todo}
**Instructions:**
Elaborate a bit longer about the topic domain and the ingredients of this section
in an abstract way, concisely highlighting and summarizing relevant benefits,
like the `../analytics/index` and `../industrial/index` pages are doing it already.
:::
```
@coderabbitai: Let me repeat my instructions and better reference corresponding pages here, in Markdown format, so you might be able to use them better for coming up with relevant improvements.
Elaborate a bit longer about the topic domain and the ingredients of this section in an abstract way, concisely highlighting and summarizing relevant benefits, like the Real-time raw-data analytics and Industrial data landing pages are doing it already.
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Fetch the Real-time raw-data analytics landing page
echo "=== Real-time raw-data analytics page ==="
cat docs/solution/analytics/index.md
echo ""
echo "=== Industrial data page ==="
cat docs/solution/industrial/index.md
```

Length of output: 4816
@amotl: Based on the structure and style of the example pages, here's an elaborated version for the long-term store section:
```markdown
(longterm-store)=
(timeseries-longterm)=
(timeseries-long-term-storage)=

# Long-term store

:::{div} sd-text-muted
Never retire data just because your other systems can't handle the cardinality.
:::

Many organizations need to retain data for years or decades to meet regulatory
requirements, support historical analysis, or preserve valuable insights for
future use. However, traditional storage systems force you to choose between
accessibility and affordability, often leading to data exports, archival
systems, or downsampling that sacrifice query capabilities.

CrateDB eliminates this trade-off by storing large volumes of data efficiently
while keeping it fully accessible for querying and analysis. Unlike systems
that struggle with high cardinality or require expensive tiered architectures,
CrateDB handles billions of unique records in a single platform, maintaining
fast query performance even on historic datasets spanning years.

By keeping all your data in one place, you avoid the complexity and costs of
exporting to specialized long-term storage systems, data lakes, or cold storage
tiers. Your historical data remains as queryable as your recent data, enabling
seamless analysis across any time range without data movement, ETL pipelines,
or rehydration processes.

With CrateDB, compatible with PostgreSQL, you can do all of that using plain SQL.
Besides integrating well with commodity systems using standard database
access interfaces like ODBC or JDBC, it provides a proprietary HTTP interface
on top.
```

This elaboration:
- Paragraph 1: Establishes the domain and business need for long-term storage
- Paragraph 2: Highlights CrateDB's core capabilities (efficiency, accessibility, cardinality handling)
- Paragraph 3: Emphasizes the key benefit of avoiding exports and maintaining simplicity
- Paragraph 4: Mentions technical capabilities (PostgreSQL compatibility, SQL, standard interfaces)
The content follows the same concise, active style as the example pages while specifically addressing the benefits mentioned in your PR objectives about not exporting data and handling cardinality.
This is your chance to have your say on this section. If you ever thought about dropping a few words about this very topic and why CrateDB does an excellent job here, now it's the right time. 🍀
Please let us know what you think about this explanation, added with 015a182, so you can comment inline on the GitHub PR. In the same fashion, the subsequent commit cc50c75 adds corresponding teaser information about OpenTelemetry, also up for review. 🙏
/cc @karynzv, @hammerhead, @surister, @zolbatar, @kneth
```markdown
## Related sections

{ref}`metrics-store` includes information about how to
store and analyze high volumes of system monitoring information
like metrics and log data with CrateDB.

{ref}`analytics` describes how
CrateDB provides real-time analytics on raw data stored for the long term.
Keep massive amounts of data ready in the hot zone for analytics purposes.

[Optimizing storage efficiency for historic time series data]
illustrates how to reduce table storage size by 80%,
by using arrays for time-based bucketing, a historical table having
a dedicated layout, and querying using the UNNEST table function.

{ref}`Build a hot/cold storage data retention policy <airflow-data-retention-hot-cold>`
describes how to manage aging data by leveraging CrateDB cluster
features to mix nodes with different hardware setups, i.e. hot
nodes using the latest generation of NVMe drives for responding
to analytics queries quickly, and cold nodes that have access to
cheap mass storage for retaining historic data.

{ref}`weather-data-storage` provides information about how to
use CrateDB for mass storage of synoptic weather observations,
allowing you to query them efficiently.

[Optimizing storage efficiency for historic time series data]: https://community.cratedb.com/t/optimizing-storage-for-historic-time-series-data/762
```
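The storage-optimization article referenced above centers on time-based bucketing: collapsing many narrow rows into fewer rows that hold arrays. As a rough, hypothetical illustration of that idea in Python (the article's actual schema and SQL live in the linked community post), this sketch groups per-second readings into hourly array buckets:

```python
from collections import defaultdict
from datetime import datetime, timezone

def bucket_hourly(readings):
    """Group (timestamp, value) readings into hourly buckets,
    storing values as arrays instead of one row per reading."""
    buckets = defaultdict(list)
    for ts, value in readings:
        # Truncate the timestamp to the hour to form the bucket key.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return dict(buckets)

# 7200 per-second readings spanning two hours.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
readings = [
    (start.replace(hour=h, minute=m, second=s), float(h))
    for h in range(2) for m in range(60) for s in range(60)
]

buckets = bucket_hourly(readings)
# Two buckets remain instead of 7200 rows, each holding 3600 values.
print(len(buckets), len(next(iter(buckets.values()))))  # -> 2 3600
```

In the database, the same shape maps to a historical table with array columns, queried back into individual readings via the UNNEST table function, as the article describes.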
That's an example of how to apply another little proposal, to check how it will be received by others. In this case, it's the first page that received a "Related sections" section at the bottom, instead of a "See also" section.

Here, each item can easily include a bit more explanatory prose (teaser text), without overdoing it.

/cc @karynzv, @hammerhead, @surister, @kneth
👍
docs/solution/longterm/index.md (outdated)

```markdown
{ref}`Build a hot/cold storage data retention policy <airflow-data-retention-hot-cold>`
describes how to manage aging data by leveraging CrateDB cluster
features to mix nodes with different hardware setups, i.e. hot
nodes using the latest generation of NVMe drives for responding
to analytics queries quickly, and cold nodes that have access to
cheap mass storage for retaining historic data.
```
While the Airflow-based system was a first mover and acted as a role model, CrateDB Toolkit's retention and expiration subsystem [1] effectively implements the core ideas [2] of the same thing today, so it can be reused in more flexible ways without depending on Airflow.

When it's considered "ready" or "good enough", we may link it into the documentation here.

/cc @hammerhead, @joerg84

Footnotes

[1] What is missing yet for an "as a service" MVP is a miniature scheduler to invoke corresponding retention tasks recurrently, and accompanying the CLI by an HTTP API, so the code can become a service daemon that clients can talk to remotely. After establishing this, the service can be ~~whipped~~ deployed into any sort of container/cloud environment, where it can possibly gain matureness. People who can use cron or any other scheduler system can immediately leverage the CLI interface and the routines behind it.
21fe148 implements the proposal outlined above, bundles two available automatic data retention variants, and tries to find the right jargon to convey what's inside.
- Airflow-based data retention
- CTK-based data retention
```markdown
## Tools

### Automatic retention and expiration

When operating a system storing and processing large amounts of data,
it is crucial to manage data flows and life-cycles well, which includes
handling concerns of data expiry, size reduction, and archival.

Optimally, corresponding tasks are automated rather than manually
performed. CrateDB provides relevant integrations and standalone
applications for automatic data retention purposes.

:::{rubric} Apache Airflow
:::

{ref}`Build a hot/cold storage data retention policy <airflow-data-retention-hot-cold>`
describes how to manage aging data by leveraging CrateDB cluster
features to mix nodes with different hardware setups, i.e. hot
nodes using the latest generation of NVMe drives for responding
to analytics queries quickly, and cold nodes that have access to
cheap mass storage for retaining historic data.

:::{rubric} CrateDB Toolkit
:::

[CrateDB Toolkit Retention and Expiration] is a data retention and
expiration policy management system for CrateDB, providing multiple
retention strategies.

:::{note}
The system derives its concepts from [InfluxDB data retention] ideas and
from the {ref}`Airflow-based data retention tasks for CrateDB <airflow-data-retention-policy>`,
but aims to be usable as a standalone system in different software environments.
Effectively, it is a Python library and CLI around a policy management
table defined per [retention-policy-ddl.sql].
:::
```
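To make the retention-policy concept concrete, here is a small, hypothetical Python sketch of age-based policy evaluation. The names and fields are illustrative only; they do not mirror CrateDB Toolkit's actual API or the schema in retention-policy-ddl.sql:

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Policy:
    """Illustrative stand-in for one row of a policy management table."""
    table: str
    retention_days: int   # partitions older than this are affected
    strategy: str         # e.g. "delete", or "reallocate" for hot -> cold moves

def expired_partitions(policy, partitions, today):
    """Return the partition dates that fall outside the retention window."""
    cutoff = today - timedelta(days=policy.retention_days)
    return [p for p in partitions if p < cutoff]

policy = Policy(table="doc.sensor_readings", retention_days=90, strategy="delete")
partitions = [date(2024, 1, 1), date(2024, 3, 1), date(2024, 6, 1)]

# Partitions older than 90 days relative to 2024-06-15 would be acted upon.
print(expired_partitions(policy, partitions, today=date(2024, 6, 15)))
# -> [datetime.date(2024, 1, 1), datetime.date(2024, 3, 1)]
```

A scheduler (cron, Airflow, or the miniature scheduler mentioned above) would run such an evaluation recurrently and then execute the policy's strategy against the matching partitions.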
Nice content, but it obstructs the conciseness of this page. Please relocate to a separate one and reference it within the "Related sections" section.
About
Refactor sloppy placeholder page into dedicated thing that educates readers well about its ingredients. This time, it is about CrateDB's benefits when using it as a long-term store, without the need to export data to different systems.
The page is still in its infancy, so every kind of support is welcome to shape it into the right direction.
Preview
https://cratedb-guide--440.org.readthedocs.build/solution/longterm/
References
Review
Please come up with any suggestions for improvements you can think about. Adding certain content, cross-linking to other resources, fixing bugs or clarifying ambiguities: Every bit counts. 🙏
Trivia
@coderabbitai: Please consider all `todo` comments as instructions to come up with specific suggestions how to improve the individual spots. Please use concise language and active voice and don't tell users unnecessary or too detailed stuff. Suggestions should well include recommendations to refactor content from other sections, when applicable. Please come up with suggestions using inline comments, so they can be processed efficiently, or respond on the conversations we've already started about them. Thanks!