Fix for dissect processor overeager delimiter consumption #128885
This PR addresses #119264 and the overly greedy consumption of delimiters by the ES implementation of the dissect processor.
This change brings the implementation in line with the spec: repeated delimiters are now consumed greedily only when the skip right padding operator is used.
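To illustrate the intended behaviour, here is a minimal Python sketch. It is not the Elasticsearch Java code: `parse_pattern` and `dissect` are hypothetical names, all other dissect modifiers are ignored, and `->` is assumed to be the skip right padding operator referred to above.

```python
import re


def parse_pattern(pattern):
    """Split a dissect-style pattern into (key, skip_right_padding, delimiter) triples.

    Hypothetical helper for illustration only; it handles just %{key} and %{key->}.
    """
    parts = []
    # Each %{key} or %{key->} is followed by a literal delimiter (empty for the last key).
    for m in re.finditer(r"%\{([^}]*?)(->)?\}([^%]*)", pattern):
        parts.append((m.group(1), m.group(2) is not None, m.group(3)))
    return parts


def dissect(pattern, text):
    """Return a dict of extracted fields, or None if a delimiter is not found."""
    result = {}
    pos = 0
    for key, skip_padding, delim in parse_pattern(pattern):
        if delim == "":
            # Last key: take the rest of the input.
            result[key] = text[pos:]
            pos = len(text)
            continue
        end = text.find(delim, pos)
        if end == -1:
            return None
        result[key] = text[pos:end]
        # Consume exactly one delimiter by default.
        pos = end + len(delim)
        # Only with the skip right padding operator are repeats consumed as well.
        if skip_padding:
            while text.startswith(delim, pos):
                pos += len(delim)
    return result


# Without "->", a repeated delimiter yields an empty field.
print(dissect("%{a},%{b},%{c}", "1,,3"))
# With "->", the repeated delimiters after the key are skipped.
print(dissect("%{a->},%{b}", "1,,,2"))
```

In this sketch the first call gives `b` an empty value, while the second skips the extra commas because `a` carries `->`. An overly greedy matcher would skip the repeats in both cases.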
In the process, two new unit tests were added to highlight the current incorrect behaviour, and one existing test that did not match the specification was modified to reflect the desired behaviour.
To ensure our implementation aligns with the rest of the product stack, logstash-plugins/logstash-filter-dissect#92 was merged to add the new unit tests to Logstash's implementation of dissect, where they passed without the issues raised in #119264, further indicating that our current implementation is faulty.