
Naively increase the meta field char limit 50->500 #131478


Conversation

Member

@seanstory seanstory commented Jul 17, 2025

WIP. I want to validate that naively bumping this limit doesn't cause any significant issues.

At the same time, I'll be evaluating the potential impact this could have on LLM understanding of mappings, if we can add much longer field-level "descriptions".
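
For reference, the change is to a field-level `meta` value-length validation along these lines (an illustrative sketch; the constant name and message text approximate, rather than quote, the real mapper code):

```java
public final class FieldMetaLimitSketch {
    // The limit this PR bumps; it was previously 50.
    private static final int META_VALUE_MAX_LENGTH = 500;

    static void validateMetaValue(String field, String value) {
        if (value.length() > META_VALUE_MAX_LENGTH) {
            throw new IllegalArgumentException(
                "[meta] values can't be longer than [" + META_VALUE_MAX_LENGTH
                    + "] chars, but got [" + value.length() + "] for field [" + field + "]"
            );
        }
    }
}
```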

Contributor

github-actions bot commented Jul 17, 2025

🔍 Preview links for changed docs

@seanstory
Member Author

buildkite test this

Samiul-TheSoccerFan and others added 26 commits July 17, 2025 16:57
…ces (elastic#131251)

* Refactoring inference services to accept context

* fix linting issues

* adding mock cluster service to fix IT test

* refactoring to remove duplication in constructors

* remove unnecessary blank line

* refactor to have uniform constructor call

* refactor to have uniform constructor call for sagemaker

* fix linting issues

* fix failed unit tests

---------

Co-authored-by: Elastic Machine <[email protected]>
This PR adds the missing ignore_unavailable, allow_no_indices and
expand_wildcards query parameters.
We were missing a check for whether a field exists when populating the dimension
attributes. This issue occurs when a field exists in some, but not all,
target indices.
The corresponding issue elastic#116781 has already been fixed.
… work (elastic#131505)

Disable entitlements for DirectIOIT; the suite requires delegation to
work.

On main the suite is skipped (direct IO is disabled by default), but
this blocks backports.
…t {p0=downsample-with-security/10_basic/Downsample index} elastic#131513
Removes YAML tests for the `/_cluster/allocation/explain` API. The tests
 passed alternate values to the API, for example the string "true"
 for fields expecting a boolean value. While the API explicitly
 supports this, the tests were not the correct place to exercise that
 behaviour, and they caused the API specification to fail
 validation.

Relates elastic#127028
…ate states (elastic#129633)

Continuation of elastic#127148

When data nodes send the STATS intermediate states to the coordinator, it aggregates them.
Now, however, the TopN groups sent by a data node may not be acceptable to the coordinator (because it already has better values), so it will discard such values.

However, the engine wasn't handling intermediate groups with nulls (TopNBlockHash uses nulls to discard unused groups).
See https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/GroupingAggregator.java#L47

_This code isn't connected with the query yet, so there's no bug in production_
Add verification that the optimizers do not modify the number of attributes or the attribute data types.
We add special handling for LOOKUP JOIN, by checking `EsQueryExec esQueryExec && esQueryExec.indexMode() == LOOKUP`, and more special handling for `ProjectAwayColumns.ALL_FIELDS_PROJECTED`.

Closes elastic#125576
This adds support for splitting `Page`s of large values when loading
from single-segment, non-descending hits. This is the hottest code path as
it's how we load data for aggregation. So! We had to make very very very
sure this doesn't slow down the fast path of loading doc values.

Caveat - this only defends against loading large values via the
row-by-row load mechanism that we use for stored fields and _source.
That covers the most common kinds of large values - mostly `text` and
geo fields. If we need to split further on doc values, we'll have to
invent something for them specifically. For now, just row-by-row.

This works by flipping the order in which we load row-by-row and
column-at-a-time values. Previously we loaded all column-at-a-time
values first because that was simpler. Then we loaded all of the
row-by-row values. Now we save the column-at-a-time values and instead
load row-by-row until the `Page`'s estimated size is larger than a "jumbo"
size which defaults to a megabyte.

Once we load enough rows that we estimate the page is "jumbo", we then
stop loading rows. The Page will look like this:

```
| txt1 | int | txt2 | long | double |
|------|-----|------|------|--------|
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        | <-- after loading this row
|      |     |      |      |        |     we crossed to "jumbo" size
|      |     |      |      |        |
|      |     |      |      |        |
|      |     |      |      |        | <-- these rows are entirely empty
|      |     |      |      |        |
|      |     |      |      |        |
```

Then we chop the page to the last row:
```
| txt1 | int | txt2 | long | double |
|------|-----|------|------|--------|
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
```

Then fill in the column-at-a-time columns:
```
| txt1 | int | txt2 | long | double |
|------|-----|------|------|--------|
| XXXX |   1 | XXXX |   11 |    1.0 |
| XXXX |   2 | XXXX |   22 |   -2.0 |
| XXXX |   3 | XXXX |   33 |    1e9 |
| XXXX |   4 | XXXX |   44 |    913 |
| XXXX |   5 | XXXX |   55 | 0.1234 |
| XXXX |   6 | XXXX |   66 | 3.1415 |
```

And then we return *that* `Page`. On the next `Driver` iteration we
start from where we left off.
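
A minimal, self-contained sketch of that reordering (all names here are illustrative stand-ins, not the real ESQL compute classes):

```java
import java.util.List;

class JumboSplitSketch {
    interface RowReader {          // stored-field / _source style loading
        long readRow(int doc);     // appends one value, returns estimated bytes added
    }

    interface ColumnReader {       // doc-values style loading
        void readColumn(int[] docs, int count);
    }

    static final long JUMBO_BYTES = 1024 * 1024; // default "jumbo" size: one megabyte

    /** Returns the number of rows loaded; the next Driver iteration resumes there. */
    static int load(int[] docs, List<RowReader> rowByRow, List<ColumnReader> columnAtATime) {
        long estimatedBytes = 0;
        int rows = 0;
        // Row-by-row values first, stopping once the page estimate goes "jumbo".
        while (rows < docs.length && estimatedBytes < JUMBO_BYTES) {
            for (RowReader reader : rowByRow) {
                estimatedBytes += reader.readRow(docs[rows]);
            }
            rows++;
        }
        // The page is now chopped to `rows` rows; fill in the column-at-a-time
        // columns for exactly those rows.
        for (ColumnReader reader : columnAtATime) {
            reader.readColumn(docs, rows);
        }
        return rows;
    }
}
```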
Fix elastic#129372

Due to how remote ENRICH is
[planned](https://github.com/elastic/elasticsearch/blob/32e50d0d94e27ee559d24bf9d5463ba6e64d1788/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java#L93),
it interacts in special ways with pipeline breakers, in particular LIMIT
and TopN; when these are encountered upstream from a remote ENRICH,
these nodes are copied and executed a second time after the remote
ENRICH.

We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the
lookup to be remote as well; this has its own interactions with pipeline
breakers: in particular, LIMITs and TopNs cannot just be duplicated
after LOOKUP JOIN, as LOOKUP JOIN may add new rows.

For now, let's just forbid any usage of remote ENRICH after LOOKUP
JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't
support that in 9.1/8.19, anyway.

There is separate work that enables remote LOOKUP JOINs on remote
clusters and adds the correct validations; we can later build support
for remote ENRICH + LOOKUP JOIN on top of that. (C.f. my comment
[here](elastic#129372 (comment))
and my draft elastic#131286 for
enabling this.)
…cellationViaTimeoutWithAllowPartialResultsSetToFalse elastic#131248
* Fix msearch rest-api-spec

* Add YAML tests for added parameters
RLIKE LIST did not manage to make it into 9.1.
In this PR, we modify the documentation to make it clear that it will be available in 9.2, not 9.1.
elasticsearchmachine and others added 28 commits July 18, 2025 23:44
…#130495)

This change removes RemoteClusterService.getRemoteClusterNames() since
getRegisteredRemoteClusterNames() provides the same functionality.
The comment in getRegisteredRemoteClusterNames() was removed since
it is no longer accurate after the change in PR elastic#47891.
This PR changes the test to simply wait for the expected master on every node
instead of selectively waiting on one non-master node and one master node.
The latter is problematic since it uses an API that is not suitable when the
cluster is changing masters.

Relates: elastic#127213
…t {p0=search/40_indices_boost/Indices boost with alias} elastic#131598
An exception here should be impossible, but we don't assert that, nor do
we emit a log message to prove it didn't happen in a production
environment. This commit adds the missing log and assert.
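
The usual idiom for this looks roughly as follows (a sketch, not the exact code added by the commit):

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

class ImpossibleExceptionSketch {
    private static final Logger logger = LogManager.getLogger(ImpossibleExceptionSketch.class);

    void onSupposedlyImpossibleException(Exception e) {
        // In production, where JVM assertions are disabled, the log line is the
        // only evidence; in tests the assert fails loudly.
        logger.error("unexpected exception", e);
        assert false : e;
    }
}
```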
Clarifies in its documentation that `BlobContainer#getRegister` offers
only read-after-write semantics rather than full linearizability, and
adds comments to its callers justifying why this is still safe.
In order to better understand the infrequent failures in elastic#129445, and
in the hope of reproducing the issue more reliably, this PR adds logging
of which shards and nodes documents end up on in
FieldSortIT#testSortMixedFieldTypes, and increases the log level for
this test suite in the o.e.search packages to TRACE and in some
action.search packages to DEBUG, to better understand where exceptions
are thrown and to better trace how resources are released after that.

Relates to elastic#129445
Spell out that total memory needs to account for multiple nodes and
other processes, and that the OOM killer might react if you ignore this
guidance.
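
An illustrative example (numbers mine, not from the docs): two nodes with 8 GB heaps on a 32 GB host already commit 16 GB to heaps alone, leaving at most 16 GB for off-heap overhead, the filesystem cache, and every other process on the machine; oversubscribe that and the kernel's OOM killer may pick off a node.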
…lastic#131419)

* update `kibana_system` to grant it access to `.chat-*` system index

* fix unit test
`SampleOperator.Status` wasn't declared as a NamedWriteable by the plugin, leading to serialization errors when `SAMPLE` is used with `profile: true`.

It leads to an `IllegalArgumentException: Unknown NamedWriteable [org.elasticsearch.compute.operator.Operator$Status][sample]`

Profiles will be tested in elastic#131474, which is currently failing because of this bug.
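
The fix is the standard registration pattern; a sketch, with the category and name taken from the error message above (the exact plugin wiring may differ):

```java
import java.util.List;

import org.elasticsearch.common.io.stream.NamedWriteableRegistry;
import org.elasticsearch.compute.operator.Operator;
import org.elasticsearch.compute.operator.SampleOperator;

// In the plugin: declare SampleOperator.Status under the Operator.Status
// category with the name "sample", so it can cross the wire in profiles.
public List<NamedWriteableRegistry.Entry> getNamedWriteables() {
    return List.of(
        new NamedWriteableRegistry.Entry(Operator.Status.class, SampleOperator.Status.NAME, SampleOperator.Status::new)
    );
}
```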
…thub.com:seanstory/elasticsearch into seanstory/increase-mapping-field-meta-char-limit

❌ Author of the following commits did not sign a Contributor Agreement:
e0c1a9b, a4f345b

Please read and sign the above-mentioned agreement if you want to contribute to this project

@seanstory seanstory closed this Jul 21, 2025