Add config option to preserve null values for collections #518
absurdfarce merged 4 commits into 1.x from
I'm seeing some strange results when adding test values manually via cqlsh. I presume this is a Python driver issue, but that isn't especially relevant here.
Note that there's an interesting question here about why this CQL: … results in this return value from dsbulk: … That's pretty clearly wrong, but I don't think it's a dsbulk error. I'm adding these test values via cqlsh and I see the same results when I query the tables via cqlsh, so I'm fairly sure there's a Python driver problem lurking somewhere. Regardless, it clearly isn't a dsbulk issue.
Ping @adutra for review on this one as well.
- schemaSettings.isAllowExtraFields(), schemaSettings.isAllowMissingFields());
+ schemaSettings.isAllowExtraFields(),
+ schemaSettings.isAllowMissingFields(),
+ codecSettings.allowsNullCollections());
There seems to be an inconsistency in naming here for boolean properties: isAllow vs allows.
Good catch, I'll clean this up.
Should be fixed now.
  return builder.build();
}

public boolean allowsNullCollections() {
I wonder: doesn't this property belong in SchemaSettings rather than CodecSettings?
It's funny, because I originally had it in SchemaSettings and subsequently moved it. My thinking was that you're actually modifying the behaviour of the codec, i.e., changing how it interprets null values in its responses, so in the end CodecSettings seemed like a more natural home. After making the move it felt more right to me; the other entries in SchemaSettings didn't really match what this config does.
Happy to discuss if you think SchemaSettings seems more appropriate.
The default behaviour of the Java driver is to convert null or empty bytes for a collection into an empty collection of the corresponding Java type (see here for an example). This behaviour is implemented within the codec layer of the Java driver, meaning that by the time the data reaches dsbulk it has already been converted, so dsbulk has no way to distinguish between an empty collection generated in this way and a genuinely empty collection.
This PR adds a config option which loads a custom codec for collection types. This custom codec simply returns an actual null value if it encounters null (or empty) bytes while decoding. In all other cases the default behaviour of the codec is preserved.
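A minimal, self-contained sketch of the wrapping idea, assuming a decorator-style design. It deliberately avoids the Java driver: `SimpleCodec`, `DefaultIntListCodec`, `NullPreservingCodec` and the simplified wire format are hypothetical stand-ins, not the driver's `TypeCodec` API and not the classes added in this PR.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class NullCollectionCodecSketch {

  // Hypothetical, simplified codec interface (NOT the Java driver's TypeCodec);
  // only the decode side is modelled here.
  interface SimpleCodec<T> {
    T decode(ByteBuffer bytes);
  }

  // Mimics the default behaviour: null or empty bytes decode to an empty list.
  // Assumed wire format (not the real CQL encoding): a 4-byte element count
  // followed by that many 4-byte ints.
  static class DefaultIntListCodec implements SimpleCodec<List<Integer>> {
    @Override
    public List<Integer> decode(ByteBuffer bytes) {
      if (bytes == null || bytes.remaining() == 0) {
        return new ArrayList<>(0); // the null is lost at this point
      }
      ByteBuffer input = bytes.duplicate(); // leave the caller's position untouched
      int size = input.getInt();
      List<Integer> result = new ArrayList<>(size);
      for (int i = 0; i < size; i++) {
        result.add(input.getInt());
      }
      return result;
    }
  }

  // Hypothetical decorator: returns a real null for null or empty bytes and
  // defers to the wrapped codec in every other case.
  static class NullPreservingCodec<T> implements SimpleCodec<T> {
    private final SimpleCodec<T> delegate;

    NullPreservingCodec(SimpleCodec<T> delegate) {
      this.delegate = delegate;
    }

    @Override
    public T decode(ByteBuffer bytes) {
      if (bytes == null || bytes.remaining() == 0) {
        return null;
      }
      return delegate.decode(bytes);
    }
  }

  // Demo helper: encodes a list in the simplified format assumed above.
  static ByteBuffer encode(List<Integer> values) {
    ByteBuffer buf = ByteBuffer.allocate(4 + 4 * values.size());
    buf.putInt(values.size());
    for (int v : values) {
      buf.putInt(v);
    }
    buf.flip();
    return buf;
  }

  public static void main(String[] args) {
    SimpleCodec<List<Integer>> defaultCodec = new DefaultIntListCodec();
    SimpleCodec<List<Integer>> nullAware = new NullPreservingCodec<>(defaultCodec);

    System.out.println(defaultCodec.decode(null)); // []
    System.out.println(nullAware.decode(null)); // null
    System.out.println(nullAware.decode(ByteBuffer.allocate(0))); // null
    System.out.println(nullAware.decode(encode(List.of(1, 2)))); // [1, 2]
    System.out.println(nullAware.decode(encode(List.of()))); // []
  }
}
```

Wrapping the existing codec rather than reimplementing it keeps the normal decoding path untouched: only the null/empty-bytes branch changes, while a serialized empty collection still decodes to an empty list.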
I've included a unit test for this functionality as well, but the following manual test should be enough to demonstrate the issue (and the results of this fix):