Naively increase the meta field char limit 50->500 #131478
Closed
seanstory wants to merge 74 commits into elastic:main from seanstory:seanstory/increase-mapping-field-meta-char-limit
Conversation
…ces (elastic#131251)

* Refactoring inference services to accept context
* fix linting issues
* adding mock cluster service to fix IT test
* refactoring to remove duplication in constructors
* remove unnecessary blank line
* refactor to have uniform constructor call
* refactor to have uniform constructor call for sagemaker
* fix linting issues
* fix failed unit tests

Co-authored-by: Elastic Machine <[email protected]>
Fixed by elastic#131370 or elastic#130963. Closes elastic#130505 Closes elastic#130504 Closes elastic#130501 Closes elastic#131024
…t {p0=vector-tile/20_aggregations/stats agg} elastic#131484
This PR adds the missing `ignore_unavailable`, `allow_no_indices`, and `expand_wildcards` query parameters.
We were missing a check for whether a field exists when populating the dimension attributes. This issue occurs when a field exists in some, but not all, of the target indices.
The corresponding issue elastic#116781 has already been fixed.
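A minimal sketch of the kind of existence check described above, under assumed simplified types (`FieldMapping` and the helper are hypothetical, not the actual Elasticsearch code):

```java
import java.util.List;
import java.util.Map;

public class DimensionCheck {
    record FieldMapping(boolean isDimension) {}

    /** Treat a field as a dimension attribute only if every target index maps it as one. */
    static boolean isDimensionEverywhere(String field, List<Map<String, FieldMapping>> targetIndexMappings) {
        for (Map<String, FieldMapping> mappings : targetIndexMappings) {
            FieldMapping mapping = mappings.get(field);
            if (mapping == null) {
                return false; // field exists in some, but not all, target indices
            }
            if (mapping.isDimension() == false) {
                return false;
            }
        }
        return true;
    }
}
```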
… work (elastic#131505) Disable entitlements for DirectIOIT; the suite requires delegation to work. On main the suite is skipped (direct IO is disabled by default), but this blocks backports.
…t {p0=downsample-with-security/10_basic/Downsample index} elastic#131513
Removes YAML tests for the `/_cluster/allocation/explain` API. The tests passed alternate values to the API, for example "true" for fields expecting a boolean value. While this is explicitly supported by the API, this is not the correct place to test that behaviour, and it resulted in the API specification failing validation. Relates elastic#127028
…ate states (elastic#129633) Continuation of elastic#127148. When datanodes send the STATS intermediate states to the coordinator, it aggregates them. Now, however, the TopN groups sent by a datanode may not be acceptable to the coordinator (because it already has better values), so it will discard such values. However, the engine wasn't handling intermediate groups with nulls (TopNBlockHash uses nulls to discard unused groups). See https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/GroupingAggregator.java#L47 _This code isn't connected with the query yet, so there's no bug in production._
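For illustration, a sketch of what "handling intermediate groups with nulls" means when merging, under an assumed, much-simplified representation (the real code is in `GroupingAggregator` and works on blocks, not maps):

```java
import java.util.HashMap;
import java.util.Map;

public class MergeIntermediateStates {
    /**
     * Merge one datanode's partial STATS state into the coordinator's totals.
     * Groups that TopNBlockHash discarded arrive as null keys and must be
     * skipped rather than aggregated.
     */
    static void merge(Long[] groupKeys, long[] partialCounts, Map<Long, Long> merged) {
        for (int i = 0; i < groupKeys.length; i++) {
            Long key = groupKeys[i];
            if (key == null) {
                continue; // discarded group: the coordinator already had better values
            }
            merged.merge(key, partialCounts[i], Long::sum);
        }
    }

    public static void main(String[] args) {
        Map<Long, Long> merged = new HashMap<>();
        merge(new Long[] { 1L, null, 3L }, new long[] { 10, 99, 30 }, merged);
        System.out.println(merged); // {1=10, 3=30}
    }
}
```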
Add verification that the optimizers do not modify the number of attributes or the attribute data types. We add special handling for LOOKUP JOIN, by checking `EsQueryExec esQueryExec && esQueryExec.indexMode() == LOOKUP`, and more special handling for `ProjectAwayColumns.ALL_FIELDS_PROJECTED`. Closes elastic#125576
This adds support for splitting `Page`s of large values when loading from single-segment, non-descending hits. This is the hottest code path, as it's how we load data for aggregation. So! We had to make very very very sure this doesn't slow down the fast path of loading doc values.

Caveat - this only defends against loading large values via the row-by-row load mechanism that we use for stored fields and _source. That covers the most common kinds of large values - mostly `text` and geo fields. If we need to split further on doc values, we'll have to invent something for them specifically. For now, just row-by-row.

This works by flipping the order in which we load row-by-row and column-at-a-time values. Previously we loaded all column-at-a-time values first because that was simpler. Then we loaded all of the row-by-row values. Now we save the column-at-a-time values for later and instead load row-by-row until the `Page`'s estimated size is larger than a "jumbo" size, which defaults to a megabyte. Once we load enough rows that we estimate the page is "jumbo", we stop loading rows. The `Page` will look like this:

```
| txt1 | int | txt2 | long | double |
|------|-----|------|------|--------|
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        | <-- after loading this row
|      |     |      |      |        |     we crossed to "jumbo" size
|      |     |      |      |        |
|      |     |      |      |        |
|      |     |      |      |        | <-- these rows are entirely empty
|      |     |      |      |        |
|      |     |      |      |        |
```

Then we chop the page to the last row:

```
| txt1 | int | txt2 | long | double |
|------|-----|------|------|--------|
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
```

Then fill in the column-at-a-time columns:

```
| txt1 | int | txt2 | long | double |
|------|-----|------|------|--------|
| XXXX |   1 | XXXX |   11 |    1.0 |
| XXXX |   2 | XXXX |   22 |   -2.0 |
| XXXX |   3 | XXXX |   33 |    1e9 |
| XXXX |   4 | XXXX |   44 |    913 |
| XXXX |   5 | XXXX |   55 | 0.1234 |
| XXXX |   6 | XXXX |   66 | 3.1415 |
```

And then we return *that* `Page`. On the next `Driver` iteration we start from where we left off.
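A minimal sketch of that loading loop, assuming a simplified model (`RowLoader` and `loadRowByRow` are hypothetical; the real implementation lives in the ES|QL compute engine and must also keep the saved column-at-a-time loaders around):

```java
public class JumboSplit {
    /** Hypothetical hook for the row-by-row loaders (stored fields, _source). */
    interface RowLoader {
        long loadRow(int row); // returns the estimated bytes this row added to the page
    }

    /**
     * Load rows until the page's estimated size crosses the "jumbo" threshold
     * (a megabyte by default, per the description above). Returns how many rows
     * were loaded; the page is then chopped to that many rows and the cheap
     * column-at-a-time (doc values) columns are filled in afterwards.
     */
    static int loadRowByRow(RowLoader loader, int totalRows, long jumboBytes) {
        long estimatedBytes = 0;
        int rows = 0;
        while (rows < totalRows && estimatedBytes <= jumboBytes) {
            estimatedBytes += loader.loadRow(rows);
            rows++;
        }
        return rows;
    }
}
```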
The `TS` source is guarded by a feature flag. Fixes elastic#131500
Fixes elastic#129372. Due to how remote ENRICH is [planned](https://github.com/elastic/elasticsearch/blob/32e50d0d94e27ee559d24bf9d5463ba6e64d1788/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java#L93), it interacts in special ways with pipeline breakers, in particular LIMIT and TopN; when these are encountered upstream from a remote ENRICH, these nodes are copied and executed a second time after the remote ENRICH. We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the lookup to be remote as well; this has its own interactions with pipeline breakers: in particular, LIMITs and TopNs cannot just be duplicated after LOOKUP JOIN, as LOOKUP JOIN may add new rows. For now, let's just forbid any usage of remote ENRICH after LOOKUP JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't support that in 9.1/8.19, anyway. There is separate work that enables remote LOOKUP JOINs on remote clusters and adds the correct validations; we can later build support for remote ENRICH + LOOKUP JOIN on top of that. (Cf. my comment [here](elastic#129372 (comment)) and my draft elastic#131286 for enabling this.)
…cellationViaTimeoutWithAllowPartialResultsSetToFalse elastic#131248
* Fix msearch rest-api-spec
* Add YAML tests for added parameters
RLIKE LIST did not manage to make it into 9.1. In this PR, we modify the documentation to make it clear that it will be available in 9.2, not 9.1
…sqlTSAfterDownsampling elastic#131500
…ksIT testRemoveBanParentsOnDisconnect elastic#131562
…#130495) This change removes `RemoteClusterService.getRemoteClusterNames()` since `getRegisteredRemoteClusterNames()` provides the same functionality. The comment in `getRegisteredRemoteClusterNames()` was removed since it is no longer accurate after the change in PR elastic#47891.
…ResultsIT testPartialResults elastic#131481
This PR changes the test to simply wait for the expected master on every node, instead of selectively waiting on one non-master node and one master node. The latter is problematic since it uses an API that is not suitable when the cluster is changing master. Relates: elastic#127213
Relates: ES-12419, ES-12420
…t {p0=search/40_indices_boost/Indices boost with alias} elastic#131598
An exception here should be impossible, but we don't assert that, nor do we emit a log message to prove it didn't happen in a production environment. This commit adds the missing log and assert.
Clarifies in its documentation that `BlobContainer#getRegister` offers only read-after-write semantics rather than full linearizability, and adds comments to its callers justifying why this is still safe.
In order to better understand the infrequent failures in elastic#129445, and in the hope of reproducing the issue, this PR adds logging of which shards and nodes documents end up on in FieldSortIT#testSortMixedFieldTypes, and increases the log level for this test suite to TRACE in the o.e.search packages and to DEBUG in some action.search packages, to better understand where exceptions are thrown and to trace how resources are released after that. Relates to elastic#129445
Spell out that total memory needs to account for multiple nodes, and other processes, and that the OOM killer might react if you ignore this guidance.
…lastic#131419)

* update `kibana_system` to grant it access to the `.chat-*` system index
* fix unit test
`SampleOperator.Status` wasn't declared as a NamedWriteable by the plugin, leading to serialization errors when `SAMPLE` is used with `profile: true`: an `IllegalArgumentException: Unknown NamedWriteable [org.elasticsearch.compute.operator.Operator$Status][sample]`. Profiles will be tested in elastic#131474, which is currently failing because of this bug.
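A sketch of the kind of registration the fix needs. The category class and the "sample" name come from the error message above; exactly where this list is wired into the plugin is an assumption:

```java
import java.util.List;
import org.elasticsearch.common.io.stream.NamedWriteableRegistry;
import org.elasticsearch.compute.operator.Operator;
import org.elasticsearch.compute.operator.SampleOperator;

public class SampleStatusNamedWriteables {
    /** Declare Status under the Operator.Status category so [sample] resolves during deserialization. */
    public static List<NamedWriteableRegistry.Entry> entries() {
        return List.of(
            new NamedWriteableRegistry.Entry(Operator.Status.class, "sample", SampleOperator.Status::new)
        );
    }
}
```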
…thub.com:seanstory/elasticsearch into seanstory/increase-mapping-field-meta-char-limit
❌ Author of the following commits did not sign a Contributor Agreement: please read and sign the above-mentioned agreement if you want to contribute to this project.
WIP. Wanting to validate that naively bumping this limit doesn't cause any significant issues.
At the same time, we'll be evaluating the potential impact this could have on LLM understanding of mappings, if we can add much longer field-level "descriptions".
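For context, each field mapping can carry a `meta` object, and its string values are what this limit caps (currently 50 characters). A hypothetical example, with invented field name and wording, of a description that the old limit rejects but a 500-character cap would allow:

```java
public class LongMetaDescription {
    // Hypothetical mapping fragment: this "description" value is roughly 140
    // characters, over the old 50-character limit but well under 500.
    static final String MAPPING = """
        {
          "properties": {
            "customer_tier": {
              "type": "keyword",
              "meta": {
                "description": "Contract tier assigned at onboarding; drives SLA routing and alert thresholds, and is refreshed nightly from the billing system export."
              }
            }
          }
        }
        """;

    public static void main(String[] args) {
        System.out.println(MAPPING);
    }
}
```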