Thanks for sharing, this performs significantly better than what I was using! While validating the getFirstPassStat statistics on our data, I discovered a sumLong bug in ColumnStats.scala, Part B.1.1.
ColumnStats.scala - Because the sumLong calculation happens after the reduce, each distinct value reaches it only once, so it returns the sum of the unique values in the column instead of the sum of all the values. The fix is to multiply each unique column value by the number of times it appears in the partition.
Bug: sumLong += colLongValue
Fix: sumLong += (colLongValue * colCount)
The following else if adds support for Double:
else if (colValue.isInstanceOf[Double]) {
  val colDoubleValue = colValue.asInstanceOf[Double]
  if (maxDouble < colDoubleValue) maxDouble = colDoubleValue
  if (minDouble > colDoubleValue) minDouble = colDoubleValue
  sumDouble += (colDoubleValue * colCount)
}
TestTableStatsSinglePathMain.scala - Because all the id values are unique, the existing sumLong assertion isn't catching the bug. Adding the following sumLong test for age:
assertResult(98L)(firstPassStats.columnStatsMap(2).sumLong)
Fails the test, returning 38 instead of the expected 98:
Expected (every row counted): 98 = 20 + 20 + 20 + 20 + 10 + 8
Returned (unique values only): 38 = 20 + 10 + 8
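For anyone who wants to check the arithmetic outside of Spark, here is a minimal sketch using plain Scala collections. It assumes the reduce leaves one (value, count) pair per distinct column value; SumLongSketch, ages, and valueCounts are hypothetical names for illustration and are not taken from ColumnStats.scala.

```scala
object SumLongSketch {
  // The six age values from the test data above.
  val ages: Seq[Long] = Seq(20L, 20L, 20L, 20L, 10L, 8L)

  // Hypothetical stand-in for the post-reduce state: distinct value -> occurrences.
  val valueCounts: Map[Long, Long] =
    ages.groupBy(identity).map { case (value, rows) => value -> rows.size.toLong }

  // Bug: each distinct value is added once, ignoring how often it occurs.
  val buggySum: Long = valueCounts.keysIterator.sum

  // Fix: weight each distinct value by its count before adding.
  val fixedSum: Long = valueCounts.iterator.map { case (value, count) => value * count }.sum

  def main(args: Array[String]): Unit = {
    println(s"buggy sumLong = $buggySum") // 38 = 20 + 10 + 8
    println(s"fixed sumLong = $fixedSum") // 98 = 20 * 4 + 10 + 8
  }
}
```

The two sums only diverge when a value repeats, which is why the assertion on the all-unique id column passes either way.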