Add a hard limit on field list size #5843

Open · wants to merge 1 commit into base: main
20 changes: 20 additions & 0 deletions quickwit/quickwit-search/src/list_fields.rs
@@ -21,6 +21,7 @@ use anyhow::Context;
use futures::future;
use futures::future::try_join_all;
use itertools::Itertools;
use once_cell::sync::OnceCell;
use quickwit_common::shared_consts::SPLIT_FIELDS_FILE_NAME;
use quickwit_common::uri::Uri;
use quickwit_metastore::SplitMetadata;
@@ -37,6 +38,19 @@ use crate::search_job_placer::group_jobs_by_index_id;
use crate::service::SearcherContext;
use crate::{ClusterClient, SearchError, SearchJob, list_relevant_splits, resolve_index_patterns};

/// QW_FIELD_LIST_SIZE_LIMIT defines a hard limit on the number of fields that
/// can be returned (error otherwise).
///
/// An index can accumulate a huge number of fields when a user creates fields
/// dynamically in a JSON type with random field names. This leads to huge
/// memory consumption when building the response. This is a workaround until
/// we find a way to prune the long tail of rare fields.
fn get_field_list_size_limit() -> usize {
static FIELD_LIST_SIZE_LIMIT: OnceCell<usize> = OnceCell::new();
*FIELD_LIST_SIZE_LIMIT
.get_or_init(|| quickwit_common::get_from_env("QW_FIELD_LIST_SIZE_LIMIT", 100_000))
}
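The getter above reads the environment once and caches the result for the lifetime of the process. A minimal, self-contained sketch of the same pattern, using the standard library's `std::sync::OnceLock` (which mirrors `once_cell::sync::OnceCell`) and a hypothetical `read_limit_from_env` stand-in for `quickwit_common::get_from_env`:

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for quickwit_common::get_from_env: parse the env var
// if it is set and valid, otherwise fall back to the default.
fn read_limit_from_env(key: &str, default: usize) -> usize {
    std::env::var(key)
        .ok()
        .and_then(|v| v.parse::<usize>().ok())
        .unwrap_or(default)
}

fn field_list_size_limit() -> usize {
    // The closure runs at most once; later calls return the cached value,
    // so changing the env var after the first call has no effect.
    static LIMIT: OnceLock<usize> = OnceLock::new();
    *LIMIT.get_or_init(|| read_limit_from_env("QW_FIELD_LIST_SIZE_LIMIT", 100_000))
}

fn main() {
    // With QW_FIELD_LIST_SIZE_LIMIT unset, the default applies.
    println!("{}", field_list_size_limit());
}
```

One consequence of the caching: the limit is fixed at first use, so a searcher must be restarted to pick up a new value.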

/// Get the list of fields from the split which we need to scan.
pub async fn get_fields_from_split(
searcher_context: &SearcherContext,
@@ -184,6 +198,12 @@ fn merge_leaf_list_fields(
flush_group(&mut responses, &mut current_group);
}
}
if responses.len() >= get_field_list_size_limit() {
return Err(SearchError::Internal(format!(
"list fields response exceeded {} fields",
get_field_list_size_limit()
)));
}

Collaborator: We could return a truncated list instead.

Author: My concern with a truncated list is that it doesn't clearly indicate that something went wrong and that results are not deterministic. This is really a short-term patch to prevent excessive memory usage, but I think we should work on a proper solution that correctly prunes the long tail of "rare" fields.
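The guard fails the whole request rather than silently truncating, per the discussion above. A hedged, standalone sketch of that behavior (illustrative names, a `String` error in place of `SearchError::Internal`, and a hard-coded limit in place of the env-derived one):

```rust
// Illustrative limit; the real code reads QW_FIELD_LIST_SIZE_LIMIT.
const FIELD_LIST_SIZE_LIMIT: usize = 100_000;

// Hypothetical helper mirroring the check added to merge_leaf_list_fields:
// error out as soon as the merged response would exceed the hard limit.
fn check_field_count(num_fields: usize) -> Result<(), String> {
    if num_fields >= FIELD_LIST_SIZE_LIMIT {
        return Err(format!(
            "list fields response exceeded {} fields",
            FIELD_LIST_SIZE_LIMIT
        ));
    }
    Ok(())
}

fn main() {
    // Below the limit: the merge proceeds.
    assert!(check_field_count(10).is_ok());
    // At or above the limit: the request fails loudly.
    assert!(check_field_count(100_000).is_err());
}
```

Because the check runs inside the merge loop, memory use stays bounded: the function aborts before accumulating a response larger than the limit.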
current_group.push(entry);
}
if !current_group.is_empty() {