remove parser/parserMap from DataSchema#19173
Conversation
| @@ -219,37 +216,31 @@ public abstract class SeekableStreamIndexTaskTestBase extends EasyMockSupport | |||
| OBJECT_MAPPER = new TestUtils().getTestObjectMapper(); | |||
| OBJECT_MAPPER.registerSubtypes(new NamedType(JSONParseSpec.class, "json")); | |||
| OLD_DATA_SCHEMA = DataSchema.builder() | |||
There was a problem hiding this comment.
Not so OLD anymore now is it
Does this need to exist? It looks like the tests that use it all have names like "xyzWithLegacyParser". Consider removing it and the tests that use it
|
|
||
| public DataSchema( | ||
| String dataSource, | ||
| @Nullable TimestampSpec timestampSpec, // can be null in old task spec |
There was a problem hiding this comment.
Should timestampSpec and dimensionsSpec be required now that parser is not allowed? They used to be computed on demand from parser if not specified. From reading the code, I think there was this weird thing where if you omitted all three of them, the error you'd get from getTimestampSpec() and getDimensionsSpec() would be an NPE with message "inputRowParser". Now those methods will just return null, which may be worse, since they could cause even stranger failures downstream.
There was a problem hiding this comment.
I originally had it like this, requiring both of these be set to non-null, but a bunch of tests everywhere failed, which caused me to look a bit closer at the logic.
The previous behavior was that if both timestampSpec and dimensionsSpec are null, and parser was null, then it was fine and allowed, and they apparently got filled in later by callers with defaults based on at least some of the passing tests that had neither set. What was not ok was both time and dims being set and also parser being set, or parser being set but having null timestamp and dims. Idk if this is what was intended, but it seems like what it was.
I'm ok changing the logic to require them, but it would be a slight user facing behavior change. Thoughts?
There was a problem hiding this comment.
The previous behavior was that if both timestampSpec and dimensionsSpec are null, and parser was null, then it was fine and allowed
How did that work for timestampSpec? The old code was:
public TimestampSpec getTimestampSpec()
{
if (timestampSpec == null) {
timestampSpec = Preconditions.checkNotNull(getParser(), "inputRowParser").getParseSpec().getTimestampSpec();
}
return timestampSpec;
}
If timestampSpec and parser are both not set, wouldn't calling getTimestampSpec() throw NPE? Maybe the tests weren't calling it, but I think in real production usage, it would always be called.
I suppose in the old code dimensionsSpec could be null though. In that case we would use an exclusions-based dimensionsSpec created by computeDimensionsSpec. We can keep that behavior.
There was a problem hiding this comment.
How did that work for timestampSpec?
not sure, maybe it was only for tests that were not doing anything with it... will have a look when i fixup stuff
| ) | ||
| { | ||
| this(dataSource, timestampSpec, dimensionsSpec, aggregators, granularitySpec, transformSpec, projections); | ||
| InvalidInput.conditionalException(parserMap == null, "parser was removed in Druid 37, define the timestampSpec and dimensionSpec on the schema directly instead of nested inside the parser definition"); |
There was a problem hiding this comment.
Should have both a release note and an item in the documentation that for supervisors, any usages of parser need be migrated to timestampSpec, dimensionsSpec, and (elsewhere in the spec) inputFormat -before- updating. After updating the supervisor will fail to deserialize so updating it will be trickier.
Btw, dimensionsSpec (spelling).
There was a problem hiding this comment.
oops, yea I marked this as release notes, but forgot to write one. It somewhat falls under the umbrella of #19166, but this was not removed in that PR because hadoop was still using it.
Description
Removes
parserfromDataSchemasince parsers are no longer supported by any ingestion task type after #19109 and #19166(same release note as #19166 since they are a bit related)
Release note
Support for the deprecated
parserhas been removed or streaming ingest tasks such as Kafka and Kinesis. Operators must now specifyinputSource/inputFormaton theioConfigof the supervisor spec, and thedataSchemamust not specify aparser. This should be done prior to upgrading to Druid 37 or newer.