feat(charts/cozystack): add recommended sysctl and etcd defaults#131

Open
IvanHunters wants to merge 1 commit into main from feat/add-sysctl-etcd-defaults

Conversation


@IvanHunters IvanHunters commented Apr 27, 2026

Summary

Adds recommended sysctl and etcd defaults to Talos machine configuration to improve DRBD connection stability and prevent etcd storage issues with large LINSTOR CRD datasets.

Changes

Machine sysctls

  • TCP orphan handling: Set tcp_orphan_retries=3 and tcp_fin_timeout=30 to prevent orphaned TCP connections from accumulating with DRBD peer connections
  • Network backlog: Increase netdev_max_backlog, netdev_budget, and netdev_budget_usecs to prevent packet loss under high DRBD replication traffic
  • TCP keepalive: Reduce keepalive timers to enable faster detection of dead DRBD peer connections (10min idle + 60s probe window instead of default 2h+75s)
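
As a sketch, these settings render under `machine.sysctls` in the Talos machine config roughly as follows (values taken from the template diff further down; illustrative, not the authoritative template):

```
machine:
  sysctls:
    # TCP orphan handling
    net.ipv4.tcp_orphan_retries: "3"
    net.ipv4.tcp_fin_timeout: "30"
    # Network backlog
    net.core.netdev_max_backlog: "5000"
    net.core.netdev_budget: "600"
    net.core.netdev_budget_usecs: "8000"
    # TCP keepalive (early detection of dead connections)
    net.ipv4.tcp_keepalive_time: "600"
    net.ipv4.tcp_keepalive_intvl: "10"
    net.ipv4.tcp_keepalive_probes: "6"
```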

Cluster etcd

  • quota-backend-bytes=8GiB: Prevent etcd running out of space with large LINSTOR CRD datasets
  • max-request-bytes=10MiB: Allow larger CRD objects to be stored
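
The raw byte strings used in the template can be derived with a quick calculation (assuming binary GiB/MiB units):

```shell
# Derive the byte values passed to etcd's extraArgs (binary units assumed)
quota_backend_bytes=$(( 8 * 1024 * 1024 * 1024 ))   # 8 GiB
max_request_bytes=$(( 10 * 1024 * 1024 ))           # 10 MiB
echo "quota-backend-bytes: ${quota_backend_bytes}"  # prints 8589934592
echo "max-request-bytes: ${max_request_bytes}"      # prints 10485760
```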

Related

Summary by CodeRabbit

  • Chores
    • Enhanced system network and TCP performance through configuration tuning to improve reliability and throughput.
    • Extended etcd cluster backend capacity and increased maximum request payload limits to better handle larger dataset operations.

Add recommended sysctl parameters for TCP orphan handling, network backlog,
and TCP keepalive to improve DRBD connection stability and prevent port exhaustion.

Add etcd quota and request size limits to prevent etcd running out of space
with large LINSTOR CRD datasets and allow larger CRD objects to be stored.

Signed-off-by: ohotnikov.ivan <ohotnikov.ivan@e-queo.net>

coderabbitai Bot commented Apr 27, 2026

📝 Walkthrough

The PR updates the Talos machine configuration Helm template helper to add recommended network tuning sysctls (TCP orphan handling, backlog, and keepalive settings) and etcd cluster extraArgs for quota and request size limits.

Changes

Cohort / File(s): Talos & etcd Configuration (charts/cozystack/templates/_helpers.tpl)
Summary: Adds network/TCP sysctls (tcp_orphan_retries, tcp_fin_timeout, netdev_max_backlog, netdev_budget, tcp_keepalive_*) and etcd extraArgs (quota-backend-bytes, max-request-bytes) to the machine configuration template.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related issues

  • #2332: This PR directly implements all proposed defaults specified in the issue, adding the exact sysctls and etcd configuration parameters outlined in the feature request.

Poem

🐰 A rabbit hops through network tunnels deep,
With sysctls tuned and etcd's coffers to keep,
TCP orphans find rest at last,
While connections hold steady and fast,
Eight gigs of space—no more going kaput! 🥕

🚥 Pre-merge checks: ✅ 5 passed

  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the main change: adding recommended sysctl and etcd defaults to the Cozystack Talos configuration.
  • Linked Issues Check: ✅ Passed. The PR implements all required sysctls (TCP orphan handling, network backlog, TCP keepalive) and etcd extraArgs (quota-backend-bytes and max-request-bytes) as specified in issue #2332.
  • Out of Scope Changes Check: ✅ Passed. All changes are directly related to the linked issue objectives; only the Helm helper template file is modified to add the specified sysctl and etcd configurations.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping docstring coverage check.




@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces several kernel sysctl optimizations for TCP orphan handling, network backlog, and keepalive settings. It also increases etcd's storage quota and maximum request size to accommodate large LINSTOR CRD datasets. A review comment suggests that the Kubernetes API server should also be configured with a matching --max-resource-write-bytes limit to ensure the increased etcd request size is fully effective.

{{- toYaml .Values.advertisedSubnets | nindent 6 }}
extraArgs:
quota-backend-bytes: "8589934592" # 8GiB - prevent etcd running out of space with large LINSTOR CRD datasets
max-request-bytes: "10485760" # 10MiB - allow larger CRD objects to be stored

medium

Increasing max-request-bytes in etcd is a necessary step for handling large objects, but it is often insufficient on its own. To fully support larger CRD objects (such as large LINSTOR datasets), the Kubernetes API server should also be configured with a matching --max-resource-write-bytes limit in its extraArgs. Without this corresponding change, the API server may reject large requests before they reach etcd, rendering the increased etcd limit ineffective for those operations.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
charts/cozystack/templates/_helpers.tpl (1)

32-42: Sysctl defaults look reasonable; consider making them configurable.

The added TCP/network tunings are sensible defaults for clusters running DRBD/LINSTOR replication. A few non-blocking observations:

  • These sysctls apply uniformly to controlplane and workers (intended, since DRBD runs cluster-wide), but they are also imposed on clusters that don't use DRBD. If a deployment uses cozystack without LINSTOR/DRBD, the aggressive tcp_keepalive_time=600 (vs. kernel default 7200) and shortened tcp_fin_timeout=30 will still apply.
  • All values are hardcoded — there's no way to override via values.yaml without forking the chart. This matches the style of the existing sysctls above (lines 29–31), so it is consistent, but you may eventually want to expose at least the DRBD-specific knobs (similar to how nr_hugepages is gated via $.Values.nr_hugepages).
♻️ Optional: gate DRBD-tuning sysctls behind a value (e.g. .Values.drbdTuning)
     net.ipv4.neigh.default.gc_thresh1: "4096"
     net.ipv4.neigh.default.gc_thresh2: "8192"
     net.ipv4.neigh.default.gc_thresh3: "16384"
+    {{- if (default true $.Values.drbdTuning) }}
     # TCP orphan handling
     net.ipv4.tcp_orphan_retries: "3"
     net.ipv4.tcp_fin_timeout: "30"
     # Network backlog
     net.core.netdev_max_backlog: "5000"
     net.core.netdev_budget: "600"
     net.core.netdev_budget_usecs: "8000"
     # TCP keepalive (early detection of dead connections)
     net.ipv4.tcp_keepalive_time: "600"
     net.ipv4.tcp_keepalive_intvl: "10"
     net.ipv4.tcp_keepalive_probes: "6"
+    {{- end }}

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 88104462-eb73-4562-b962-3f8d22321d20

📥 Commits

Reviewing files that changed from the base of the PR and between 00358fd and 523f62c.

📒 Files selected for processing (1)
  • charts/cozystack/templates/_helpers.tpl

Comment on lines +134 to +136
extraArgs:
quota-backend-bytes: "8589934592" # 8GiB - prevent etcd running out of space with large LINSTOR CRD datasets
max-request-bytes: "10485760" # 10MiB - allow larger CRD objects to be stored

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Does kube-apiserver in Kubernetes 1.30+ require any flag (e.g., --max-request-bytes) to be coordinated when etcd's --max-request-bytes is raised to 10MiB, or does the apiserver pass requests through transparently up to etcd's limit?

💡 Result:

No, kube-apiserver in Kubernetes 1.30+ does not require any flag like --max-request-bytes to be coordinated when etcd's --max-request-bytes is raised to 10MiB. The apiserver passes etcd requests transparently up to etcd's limit, but enforces its own independent limit on incoming client request body sizes for write operations.

kube-apiserver has a hardcoded default MaxRequestBodyBytes of 3MiB (3145728 bytes) for decoding write request bodies (e.g., CREATE/UPDATE/PATCH), and fails requests larger than this with "Request entity too large: limit is 3145728" before they reach etcd. This limit accounts for JSON-to-protobuf conversion expansion when storing in etcd (etcd default 1.5MiB; the apiserver allows ~2x). It is not exposed as a command-line flag in standard kube-apiserver; users of the generic apiserver library can configure it programmatically.

etcd's --max-request-bytes independently limits gRPC requests from kube-apiserver (e.g., Put, Txn). Raising it to 10MiB allows larger backend requests without apiserver changes, as the apiserver imposes no additional byte limits on outgoing etcd calls beyond what fits its object model; it transparently forwards transformed storage requests to etcd. Nothing in 1.30+ alters this; official docs list no --max-request-bytes flag for kube-apiserver, and source code confirms the client-side body limit is separate from etcd's server-side limit.

Etcd defaults align with upstream guidance — note the kube-apiserver request-size boundary.

Both values are correctly placed inside the eq .MachineType "controlplane" block, so they apply only to control-plane nodes where etcd runs. The values match etcd's documented recommendations:

  • quota-backend-bytes: 8589934592 (8 GiB) is the upstream-suggested maximum (per the etcd "How to debug large db size issue" blog post).
  • max-request-bytes: 10485760 (10 MiB) sits at etcd's documented soft cap, right at the 10 MB boundary, and is safe from the WAL-decode failure risk (Restriction on MaxRequestSize and maxWALEntrySize, etcd-io/etcd#14114).

Important clarification: kube-apiserver (1.30+) has a hardcoded 3MiB limit on incoming client request bodies for write operations and does not expose this as a configurable flag. This means API clients cannot send requests larger than 3MiB, even though etcd can accept 10MiB. The 10MiB etcd setting still provides value for direct etcd operations or future use cases, but no apiserver flag coordination is needed.

Operational considerations to keep in mind:

  • Memory footprint: etcd (boltDB) mmaps the backend file, so raising quota-backend-bytes from the default 2 GiB to 8 GiB allows the resident-set on each control-plane node to grow up to ~8 GiB just for etcd. Ensure control-plane VMs are sized accordingly.
  • Defragmentation: Larger backends grow fragmentation. Confirm that periodic etcdctl defrag (or equivalent) is in place; otherwise the 8 GiB quota can still be hit by fragmented free pages.
  • Alerting: Recommend adding/keeping alerts on etcd_mvcc_db_total_size_in_bytes and etcd_server_quota_backend_bytes so operators see growth trends well before hitting the new ceiling.
  • LINSTOR CRD objects: The 10MiB etcd setting is sized to handle LINSTOR CRD workloads. Document that this setting exists to support that use case and not as a general invitation to store large blobs in arbitrary CRs (which remain bound by the apiserver's 3MiB client limit).


@lexfrei lexfrei left a comment


Title scope

Recent PRs touching charts/cozystack/templates/_helpers.tpl use a concept-style scope rather than a path: feat(cozystack): enable allocateNodeCIDRs by default (#91), fix(chart): cast metadata.id to string for regexMatch … (#110), feat(helpers): add bond interface discovery helpers (#94). feat(charts/cozystack): … is the first path-style scope I see in this repo. feat(cozystack): would mirror #91, which was a similar defaults-tweak for this same chart.

Merge ordering with #130

This PR edits cluster.etcd.advertisedSubnets immediately adjacent to where #130 wraps the same block in an if/else discovery fallback. There's no semantic conflict, but git merge-tree reports a textual conflict — whichever PR lands second will need a small rebase.

TCP keepalive blast radius

tcp_keepalive_time/intvl/probes = 600/10/6 is roughly 12× more aggressive on idle timeout than the kernel defaults (7200/75/9). The PR body frames this as DRBD-focused, but net.ipv4.tcp_keepalive_* are kernel-wide and apply to every long-lived idle TCP socket on the node — pods holding NFS mounts, idle DB clients, MQ consumers, etc. all inherit the new failure characteristics. Likely still the right default for a Cozystack cluster, but worth calling out explicitly in the PR description so operators understand the scope.
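
A back-of-envelope check of the change in failure characteristics (worst-case time to declare an idle peer dead is keepalive_time + keepalive_intvl × keepalive_probes):

```shell
# Worst-case dead-peer detection time for an idle TCP connection
echo "proposed:       $(( 600 + 10 * 6 ))s"    # prints 660s (~11 min)
echo "kernel default: $(( 7200 + 75 * 9 ))s"   # prints 7875s (~2h11m)
```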

The other sysctls (tcp_orphan_retries, tcp_fin_timeout, netdev_*) are more conservative and look fine.

etcd quota and request size

quota-backend-bytes: 8 GiB is etcd's documented upper recommended ceiling, not a typical default — etcd's own default is 2 GiB. Same direction for max-request-bytes: 10 MiB (vs etcd's 1.5 MiB default). Both are correct numbers for a LINSTOR-heavy control plane, but they're strong opinions to bake in for every Cozystack cluster: a small dev cluster still pays for the larger backend in etcd RSS regardless of whether it ever uses it.

Could these be exposed in values.yaml with the proposed numbers as defaults? For example:

etcd:
  quotaBackendBytes: "8589934592"
  maxRequestBytes:   "10485760"

The recommendation still ships as the default, but operators can tune without forking the chart. #91 used this exact pattern for allocateNodeCIDRs — defaults-tweak plus values exposure in the same change.
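
A sketch of how the template side of that exposure might look, assuming the value keys proposed above (quotaBackendBytes/maxRequestBytes are suggestions from this review, not existing chart values):

```
{{- /* _helpers.tpl fragment (sketch): read the etcd limits from values instead of hardcoding */}}
extraArgs:
  quota-backend-bytes: {{ .Values.etcd.quotaBackendBytes | quote }}
  max-request-bytes: {{ .Values.etcd.maxRequestBytes | quote }}
```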

Tests

pkg/engine/render_test.go already asserts on the rendered sysctls: block (lines 114, 211). It would be cheap to extend those assertions to the new sysctl keys, and to add a check that etcd.extraArgs renders only for MachineType: controlplane — that's already correct in the template (the block sits inside {{- if eq .MachineType "controlplane" }}), but it isn't covered by a test today.

Style nit

Inline YAML comments after string scalars (quota-backend-bytes: "8589934592" # 8GiB - …) parse fine, but the rest of this template uses Go template comments ({{- /* … */ -}}) for explanations — see #130 for examples. Aligning would help readability.



Development

Successfully merging this pull request may close these issues.

feat: add recommended sysctl and etcd defaults to Talos configuration

3 participants