Commit 3e0fa04

query rejection blog post improvement
Signed-off-by: Erlan Zholdubai uulu <[email protected]>
1 parent f5392e6 commit 3e0fa04

1 file changed

+23
-7
lines changed

website/content/en/blog/2025/query-rejection.md

Lines changed: 23 additions & 7 deletions
@@ -12,7 +12,7 @@ author: Erlan Zholdubai uulu ([@erlan-z](https://github.com/erlan-z))
1212

1313
# Introduction
1414

15-
We had events where a set of seemingly **harmless-looking** dashboard queries kept slipping just under our limits yet repeatedly **OOM-killing the querier pods**. Our safeguard mechanisms weren’t enough, and the only hope was that the tenant would either stop those queries or that we’d have to throttle all traffic from that tenant. Usually it wasn’t all traffic causing trouble—it was a small set of queries coming from a specific dashboard or some query with specific characteristics. We wished there was a way to manually specify query characteristics and reject them without throttling everything. **This inspired us to build query rejection**, a last-resort safety net for operators running multi-tenant Cortex clusters.
15+
Although Cortex includes various safeguards to protect against overload, they can’t prevent every failure scenario. In some environments, a small set of harmless-looking dashboard queries has repeatedly slipped just under the limits yet still OOM-killed the querier pods. Built-in protections weren’t enough, and the only available option was to throttle all incoming traffic from the affected tenant. These queries often came from a specific dashboard or followed a predictable pattern, but there was no way to block just those without affecting everything else. This inspired the introduction of query rejection, a last-resort safety net for operators running multi-tenant Cortex clusters.
1616

1717
## Why Limits Aren’t Enough
1818

@@ -30,8 +30,8 @@ Think of query rejection as an “emergency stop” in a factory. It sits in fro
3030

3131
**Key features:**
3232

33-
- **Per-tenant control:** It's defined in the tenant limit configuration, which only targets queries from specific tenant. 
34-
- **Precise matching:** You can specify different query attributes to narrow down to specific queries. All fields within a rule set must match (AND logic). If needed, you can define multiple independent rule sets to target different types of queries.
33+
- **Per-tenant control:** It is defined in the tenant limit configuration, so a rule only targets queries from that specific tenant.
34+
- **Precise matching:** You can specify different query attributes to narrow down to specific queries. All fields within a rejection rule must match (AND logic). If needed, you can define multiple independent rejection rules to target different types of queries.
3535
- **Pre-processing enforcement:** Query rejection is applied before the query is executed, allowing known-bad patterns to be blocked before consuming any resources.
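
The matching semantics described in the list above (AND within a rejection rule, OR across independent rules) can be illustrated with a minimal Python sketch. This is not Cortex's implementation: the `RejectionRule` class, its three attribute names, and `should_reject` are hypothetical names chosen for illustration, and treating an empty rule as matching nothing is an assumption of this sketch.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RejectionRule:
    # Hypothetical attribute names, for illustration only.
    query_regex: Optional[str] = None
    dashboard_uid: Optional[str] = None
    user_agent_regex: Optional[str] = None

    def matches(self, query: str, dashboard_uid: str, user_agent: str) -> bool:
        # Assumption of this sketch: a rule with no attributes set matches nothing.
        if (self.query_regex is None and self.dashboard_uid is None
                and self.user_agent_regex is None):
            return False
        # Every attribute that is set must match (AND logic); unset ones are skipped.
        if self.query_regex is not None and not re.search(self.query_regex, query):
            return False
        if self.dashboard_uid is not None and self.dashboard_uid != dashboard_uid:
            return False
        if (self.user_agent_regex is not None
                and not re.search(self.user_agent_regex, user_agent)):
            return False
        return True


def should_reject(rules: List[RejectionRule], query: str,
                  dashboard_uid: str, user_agent: str) -> bool:
    # Rules are independent: a single fully matching rule is enough to reject.
    return any(rule.matches(query, dashboard_uid, user_agent) for rule in rules)
```

With one rule that sets both a query regex and a dashboard UID, a query from a different dashboard passes even if its text matches the regex, because both attributes of that rule must match.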
3636

3737
## Matching Criteria
@@ -49,7 +49,7 @@ By combining these fields, you can zero in on the exact query patterns causing p
4949

5050
## Configuring Query Rejection
5151

52-
You define query rejection rules per tenant in a runtime config file. Each rule specifies a set of attributes that must all match for the query to be rejected. The configuration supports multiple such rule sets.
52+
You define query rejection rules per tenant in a runtime config file. Each rejection rule specifies a set of attributes that must all match for the query to be rejected. The configuration supports multiple such rules.
5353

5454
Here’s an example configuration:
5555

@@ -103,9 +103,25 @@ Because this request matches all the configured attributes, it will be blocked.
103103

104104
- **Communicate with tenants.** Let affected tenants know if their queries are being blocked, and help them adjust their dashboards accordingly.
105105

106-
## Conclusion
106+
## Ruler Queries
107+
108+
Query rejection applies only to API queries, not to ruler queries. Ruler queries are typically instant and lightweight, so a complex query-rejection mechanism isn’t required for them. If a rule group contains heavy queries and no other mitigation is effective, operators can disable the entire rule group.
109+
110+
Rule group disabling is configured per tenant, similar to query rejection. When you disable a rule group, Cortex stops evaluating the rules within that group, removing the problematic queries altogether. For example:
107111

108-
When traditional safeguards fall short, query rejection gives operators precise control to block only what’s harmful—without slowing down everything else.
112+
```yaml
113+
# runtime_config.yaml
114+
overrides:
115+
  <tenant_id>:
116+
    disabled_rule_groups:
117+
      - namespace: "keep_firing_for_test"
118+
        name: "smallsteps"
119+
```
120+
121+
This makes it easy to mitigate issues from the ruler without introducing query rejection logic for those queries.
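
As a rough illustration of the effect of disabling a rule group, the Python sketch below filters a tenant's rule groups against a list shaped like the `disabled_rule_groups` entries in the example above. It is a sketch under stated assumptions, not the ruler's actual code: `RuleGroup` and `active_rule_groups` are hypothetical names, and matching on the exact `(namespace, name)` pair is assumed.

```python
from typing import Dict, List, NamedTuple


class RuleGroup(NamedTuple):
    # Hypothetical minimal representation of a rule group.
    namespace: str
    name: str


def active_rule_groups(groups: List[RuleGroup],
                       disabled: List[Dict[str, str]]) -> List[RuleGroup]:
    # Build a lookup set of (namespace, name) pairs from the disabled entries.
    disabled_keys = {(d["namespace"], d["name"]) for d in disabled}
    # Keep only groups that are not listed as disabled; these would still be evaluated.
    return [g for g in groups if (g.namespace, g.name) not in disabled_keys]
```

Under these assumptions, a group named `smallsteps` in the `keep_firing_for_test` namespace would be dropped from evaluation, while every other group of the tenant keeps running.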
122+
123+
## Conclusion
109124

110-
If you operate a shared Cortex environment, consider learning how to use query rejection effectively. It might just save you from the next incident—by preventing OOM kills, degraded performance, or disruption to other tenants.
125+
When traditional safeguards fall short, query rejection gives operators precise control to block only what’s harmful, without slowing down everything else.
111126

127+
If you operate a shared Cortex environment, consider learning how to use query rejection effectively. It might just save you from the next incident by preventing OOM kills, degraded performance, or disruption to other tenants.
