Feat: support bit_count function #1602
Conversation
val table = "bitwise_count_test" | ||
withTable(table) { | ||
sql(s"create table $table(col1 long, col2 int, col3 short, col4 byte) using parquet") | ||
sql(s"insert into $table values(1111, 2222, 17, 7)") |
Do you mind adding random number cases?
Yep. Added tests with random data.
Can we also add a test with a Parquet file not created by Spark (see makeParquetFileAllTypes) and see if this reports correct results with unsigned int columns?
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1602 +/- ##
============================================
+ Coverage 56.12% 58.21% +2.08%
- Complexity 976 1108 +132
============================================
Files 119 130 +11
Lines 11743 12703 +960
Branches 2251 2377 +126
============================================
+ Hits 6591 7395 +804
- Misses 4012 4122 +110
- Partials 1140 1186 +46
☔ View full report in Codecov by Sentry.
Sorry that this one has dragged on for a while, and I hope we can merge soon, but one more question:
fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> { | ||
let arg = self.arg.evaluate(batch)?; | ||
match arg { | ||
ColumnarValue::Array(array) => { |
Any possibility that we get a dictionary?
It seems unlikely to me, but if you have an example of how this can be tested, I'm ready to do it.
You will need many repeated values in order to create dictionary values. If you use something like https://github.com/apache/datafusion-comet/blob/main/spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala#L940 you should be able to test it.
If this is the case, can we delegate to a
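One possible way to handle dictionary-encoded input on the native side, sketched under the assumption that the kernel only operates on primitive arrays (the helper name is hypothetical, not code from this PR):

use std::sync::Arc;
use arrow::array::{Array, ArrayRef, Int32Array};
use arrow::compute::cast;
use arrow::datatypes::DataType;

// Hypothetical helper: flatten a dictionary-encoded array into its value type
// before applying the bit_count kernel, so the kernel only ever sees primitive arrays.
fn unpack_dictionary(array: &ArrayRef) -> ArrayRef {
    match array.data_type() {
        // `cast` expands the dictionary into a plain array of its value type.
        DataType::Dictionary(_, value_type) => {
            cast(array.as_ref(), value_type).expect("cast from dictionary failed")
        }
        _ => Arc::clone(array),
    }
}

fn main() {
    // Plain arrays pass through unchanged.
    let plain: ArrayRef = Arc::new(Int32Array::from(vec![1111, 2222, 17, 7]));
    assert_eq!(unpack_dictionary(&plain).data_type(), &DataType::Int32);
}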
# Conflicts:
#   native/spark-expr/src/comet_scalar_funcs.rs
@mbutrovich I couldn't find a built-in implementation of bit_count in the DataFusion project, but I rewrote it using scalarFunc without adding a new proto expr.
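Not the code in this PR, but a minimal sketch of the kind of kernel such a scalar function needs, assuming an Int32 input column and arrow's unary kernel (function name and types are illustrative):

use std::sync::Arc;
use arrow::array::{Array, ArrayRef, Int32Array};
use arrow::compute::kernels::arity::unary;

// Illustrative kernel: count set bits per value in an Int32 column, preserving nulls.
fn bit_count_i32(input: &Int32Array) -> ArrayRef {
    // `unary` applies the closure to each valid value and carries the null mask over.
    let result: Int32Array = unary(input, |v: i32| v.count_ones() as i32);
    Arc::new(result)
}

fn main() {
    let input = Int32Array::from(vec![Some(0), Some(7), None, Some(2222)]);
    let output = bit_count_i32(&input);
    let output = output.as_any().downcast_ref::<Int32Array>().unwrap();
    assert_eq!(output.value(0), 0);
    assert_eq!(output.value(1), 3);
    assert!(output.is_null(2));
    assert_eq!(output.value(3), 6);
}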
let operand = $OPERAND
    .as_any()
    .downcast_ref::<$DT>()
    .expect("compute_op failed to downcast array");
Perhaps report $DT as well in case of failure?
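A small sketch of how the downcast failure message could include the target type via stringify!, shown as a self-contained macro (names are illustrative, not the PR's actual macro):

use arrow::array::{Array, Int32Array};

// Illustrative macro: report the expected array type when the downcast fails.
macro_rules! downcast_operand {
    ($OPERAND:expr, $DT:ty) => {
        $OPERAND
            .as_any()
            .downcast_ref::<$DT>()
            .unwrap_or_else(|| {
                panic!(
                    "compute_op failed to downcast array to {}",
                    stringify!($DT)
                )
            })
    };
}

fn main() {
    let array: Box<dyn Array> = Box::new(Int32Array::from(vec![1, 2, 3]));
    // Succeeds because the array really is an Int32Array; a mismatch would
    // panic with the type name included in the message.
    let typed = downcast_operand!(array, Int32Array);
    assert_eq!(typed.len(), 3);
}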
val table = "bitwise_count_test" | ||
withTable(table) { | ||
sql(s"create table $table(col1 long, col2 int, col3 short, col4 byte) using parquet") | ||
sql(s"insert into $table values(1111, 2222, 17, 7)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also add a test with a Parquet file not created by Spark (see makeParquetFileAllTypes
) and see if this reports correct results with unsigned int columns?
Which issue does this PR close?
Related to Epic: #240
bit_count: SELECT bit_count(0) => 0
DataFusion Comet's bit_count has the same behavior as Spark's bit_count function.
Spark: https://spark.apache.org/docs/latest/api/sql/index.html#bit_count
Closes #.
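For reference, a minimal sketch of the expected bit-counting behavior using Rust's count_ones (not this PR's implementation), checked against the example above and the values inserted by the test:

fn main() {
    // bit_count(0) => 0, as in the example above.
    assert_eq!(0i32.count_ones(), 0);

    // Values from the test table: 1111, 2222, 17, 7.
    assert_eq!(1111i64.count_ones(), 6);
    assert_eq!(2222i32.count_ones(), 6);
    assert_eq!(17i16.count_ones(), 2);
    assert_eq!(7i8.count_ones(), 3);
}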
Rationale for this change
Defined under Epic: #240
What changes are included in this PR?
bitwise_count.rs: implementation of the bit_count function
planner.rs: maps Spark's bit_count expression to the DataFusion Comet bit_count physical expression
expr.proto: bit_count has been added
QueryPlanSerde.scala: a bit_count pattern-matching case has been added
CometExpressionSuite.scala: a new UT has been added for the bit_count function
How are these changes tested?
A new UT has been added.