feat!(query): RowBinaryWithNamesAndTypes
#221
Added a few comments regarding the intermediate implementation.
RowBinaryWithNamesAndTypes for enhanced type safety
src/rowbinary/de.rs
fn deserialize_bool<V: Visitor<'data>>(self, visitor: V) -> Result<V::Value> {
    if Validator::VALIDATION {
Isn't this check removed anyway when validate() is empty?
Maybe this is indeed not needed, if everything inlines properly.
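For readers following the thread, here is a rough sketch of the pattern being discussed. Only the Validator::VALIDATION constant comes from the diff; the trait shape, the NoValidation and Recording types, and the read_bool function are hypothetical and exist purely for illustration. Because the flag is an associated const, each monomorphized instance sees a compile-time-known condition, so the optimizer can usually drop the branch; an empty, inlined validate() would likely vanish on its own as well, which is the point raised above.

```rust
/// Hypothetical trait shape; only `Validator::VALIDATION` appears in the diff.
trait Validator {
    /// Compile-time switch: `false` means the checks can be compiled out.
    const VALIDATION: bool;
    /// Called with the type the deserializer is about to read.
    fn validate(&mut self, expected: &'static str);
}

/// Hypothetical no-op validator, standing in for "validation disabled".
struct NoValidation;

impl Validator for NoValidation {
    const VALIDATION: bool = false;
    #[inline(always)]
    fn validate(&mut self, _expected: &'static str) {}
}

/// Hypothetical validator that records every requested type.
#[derive(Default)]
struct Recording {
    seen: Vec<&'static str>,
}

impl Validator for Recording {
    const VALIDATION: bool = true;
    fn validate(&mut self, expected: &'static str) {
        self.seen.push(expected);
    }
}

/// Reads one byte as a bool, validating only when the const flag is set.
fn read_bool<V: Validator>(validator: &mut V, input: &mut &[u8]) -> Option<bool> {
    // The condition is a constant per instantiation of `V`.
    if V::VALIDATION {
        validator.validate("Bool");
    }
    let (&first, rest) = input.split_first()?;
    *input = rest;
    Some(first != 0)
}

fn main() {
    let mut data: &[u8] = &[1, 0];

    let mut off = NoValidation;
    assert_eq!(read_bool(&mut off, &mut data), Some(true));

    let mut on = Recording::default();
    assert_eq!(read_bool(&mut on, &mut data), Some(false));
    assert_eq!(on.seen, vec!["Bool"]);
}
```

Whether the explicit check is redundant in the real code depends on how reliably validate() is inlined, which this sketch does not measure.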
fn deserialize_unit<V: Visitor<'data>>(self, visitor: V) -> Result<V::Value> {
    // TODO: revise this.
    // TODO - skip validation?
Hmm, it seems that it can break validation? Shouldn't we return Unsupported for this case?
I have no idea what the use case for this method is, TBH :)
        join_panic_schema_hint(&columns),
    );
}
AccessType::WithSeqAccess // ignored
In the case of (Row, T), it's impossible to get WithMapAccess; this can be considered a bug.
will address in a follow-up
I know this was just merged, but is there an expected date on when this would be released?
Hi: as a user this is very concerning. Is it possible to add deserialization failure as a separate error variant on the
We can do that, but isn't it against the Rust book itself? https://doc.rust-lang.org/book/ch09-03-to-panic-or-not-to-panic.html That's a classic case of data corruption, no?
Summary
Warning
This PR implements RBWNAT for Query only; Insert should be implemented as a follow-up.
First of all, let's abbreviate the RowBinaryWithNamesAndTypes format as RBWNAT, and the regular RowBinary as just RB, for simplicity.
A significant number of issues in the repository concern schema incompatibility or obscure error messages (see the full list below). The reason is that deserialization is effectively implemented in a "data-driven" way, where the user structures dictate how the RB stream should be (de)serialized. It is therefore possible to have a hiccup where two UInt32 values are deserialized as a single UInt64, which in the worst-case scenario may lead to corrupted data. For example:
This test will deserialize a wrong value on the main branch, because DateTime64 is streamed as 8 bytes (Int64), and 2x(U)Int32 are also streamed as 8 bytes in total. With validation mode enabled, this branch now correctly throws an error.
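To make the size collision concrete, here is a minimal, self-contained sketch. It is not the test from this PR and does not use the client at all; the helper names encode_two_u32 and decode_as_i64 are hypothetical. It only shows that two UInt32 values written back to back as little-endian bytes, the layout RowBinary uses for fixed-width integers, can be read as a single Int64 without any error:

```rust
/// Hypothetical helper: lays out two UInt32 values back to back,
/// little-endian, 8 bytes in total.
fn encode_two_u32(a: u32, b: u32) -> [u8; 8] {
    let mut buf = [0u8; 8];
    buf[..4].copy_from_slice(&a.to_le_bytes());
    buf[4..].copy_from_slice(&b.to_le_bytes());
    buf
}

/// Hypothetical helper: interprets the same 8 bytes as one Int64,
/// which is what a schema-unaware reader would do for a DateTime64 field.
fn decode_as_i64(bytes: [u8; 8]) -> i64 {
    i64::from_le_bytes(bytes)
}

fn main() {
    let bytes = encode_two_u32(1, 2);
    let wrong = decode_as_i64(bytes);

    // The read "succeeds", but the value is 1 + 2 * 2^32, not (1, 2).
    assert_eq!(wrong, 8_589_934_593);
    println!("two UInt32 (1, 2) read back as Int64: {wrong}");
}
```

Nothing in the byte stream itself distinguishes the two interpretations, which is why the column names and types sent by RBWNAT are needed to detect the mismatch.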
This PR introduces:
Client::with_disabled_validation if you really, really want it. In that case, the RowBinary format is used as before.
types internal crate that contains utils to deal with parsing RBWNAT and Native data type strings into a proper AST. Rustified from https://github.com/ClickHouse/clickhouse-js/blob/main/packages/client-common/src/parse/column_types.ts, but not entirely. The most important part is the correctness and the tests; the actual implementation details can be adjusted in a follow-up.
[...] HashMap<K, V>, and not only as Vec<(K, V)>, which was confusing.
[...] rbwnat.rs).
deserialize_struct switches from SeqAccess to a custom MapAccess that takes the schema field order into account. Performance loss in that case is only around 10%.
Source files to look at:
[...] insert to be implemented as well, so that's a follow-up.
Current benchmark results
Select numbers
This branch:
Main branch:
Selecting a ton of records from system.numbers is strictly worse. It is still not clear to me why, because validation can simply be disabled, and the code is more or less the same as in main, adding only one boolean check.
However...
NYC taxi data
This branch after ccfac33:
Main branch (using version from #227):
Issues overview
Note
If an issue is checked in the list, that means there is also a test that demonstrates proper error messages in case of schema mismatch.
Resolved issues
Query::fetch_one() and Query::fetch_optional() can still finish successfully even in the case of schema mismatch #187
Custom("premature end of input") on max(DateTime64) query on empty table #218 - clearer error messages
Related issues
Previously closed issues with unclear error messages
InvalidUtf8Encoding(Utf8Error { valid_up_to: 84, error_len: Some(1) }) error #173
u8, i8, and bool #100 - a similar issue to 2xUInt32 decoded into 1xInt64
Follow-up issues
RowBinaryWithNamesAndTypes #10 and Consideration of Type Safety #199 - should be closed after RBWNAT insert implementation
FixedString type #49, Save vector of custom type error into String type in clickhouse #72 - needs working insert with RBWNAT
CANNOT_READ_ALL_DATA Error when serializing Nested types with Map fields #214
Cannot read all data, after calling end() #59