forked from cockroachdb/cockroach
Commit
125345: sql: dynamically determine histogram sample size r=Uzair5162 a=Uzair5162

We currently sample a default of 10k rows when collecting stats to construct histograms. We have seen histograms for large tables miss frequent values because of poor samples, and users have frequently reported slow queries as a result.

This commit dynamically picks a sample size based on the estimated table size $n$, using a sample size of $582n^{0.29}$ bounded between 10k and 300k. This formula was derived empirically, using the code in effc506 to measure the performance of different sample sizes. The formula approximates the following table:

| Table size (rows) | Sample size (rows) |
| ----------------- | ------------------ |
| 10,000 | 10,000 |
| 100,000 | 15,000 |
| 1,000,000 | 30,000 |
| 10,000,000 | 60,000 |
| 100,000,000 | 100,000 |
| 1,000,000,000 | 300,000 |

These sample sizes empirically achieved the following coverage:

- 100k rows/15k samples: ~100% coverage of multiplicities down to 100x, ~80% down to 10x
- 1m rows/30k samples: ~100% coverage of multiplicities down to 1000x, ~95% down to 100x
- 10m rows/60k samples: ~100% coverage of multiplicities down to 10000x, ~95% down to 1000x, ~50% down to 100x
- 100m rows/100k samples: ~100% coverage of multiplicities down to 10000x, ~65% down to 1000x, ~10% down to 100x
- 1b rows/300k samples: ~100% coverage of multiplicities down to 100000x, ~95% down to 10000x, ~25% down to 1000x

Fixes: cockroachdb#123972
Fixes: cockroachdb#97701

Release note (sql change): Histograms are no longer constructed using a default sample size of 10k. Samples are now dynamically sized based on table size unless the sample count has been set in the table or cluster settings.

Co-authored-by: Uzair Ahmad <[email protected]>
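The sizing rule described above can be sketched in Go as follows. This is a minimal sketch, not the code from this commit: the function name `sampleSizeForRows` and the constant names are hypothetical, and truncating the result toward zero is an assumption inferred from the expected values in the commit's test (for example, 16,402 samples for 100,000 rows, where the formula gives about 16,402.99).

```go
package main

import "math"

// Hypothetical names for the bounds stated in the commit message:
// the sample size is kept between 10,000 and 300,000 rows.
const (
	minSampleSize = 10000
	maxSampleSize = 300000
)

// sampleSizeForRows is a hypothetical stand-in for the function under test.
// It evaluates 582 * numRows^0.29, clamps the result to
// [minSampleSize, maxSampleSize], and converts to int, which truncates
// toward zero (an assumption inferred from the test's expected values).
func sampleSizeForRows(numRows uint64) int {
	raw := 582.0 * math.Pow(float64(numRows), 0.29)
	clamped := math.Max(minSampleSize, math.Min(raw, maxSampleSize))
	return int(clamped)
}
```

Note that the formula itself yields roughly 121,597 samples at 10^8 rows and 237,095 at 10^9 rows, which are the values the test expects; the table's 100,000 and 300,000 entries are the empirical targets that the formula only approximates.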
Showing 6 changed files with 207 additions and 26 deletions.
@@ -0,0 +1,49 @@

```go
// Copyright 2024 The Cockroach Authors.
//
// Use of this software is governed by the Business Source License
// included in the file licenses/BSL.txt.
//
// As of the Change Date specified in that file, in accordance with
// the Business Source License, use of this software will be governed
// by the Apache License, Version 2.0, included in the file
// licenses/APL.txt.

package sql

import (
	"math"
	"testing"

	"github.com/cockroachdb/cockroach/pkg/util/leaktest"
	"github.com/cockroachdb/cockroach/pkg/util/log"
)

func TestComputeNumberSamples(t *testing.T) {
	defer leaktest.AfterTest(t)()
	defer log.Scope(t).Close(t)

	testData := []struct {
		numRows            int
		expectedNumSamples int
	}{
		{0, 10000},
		{100, 10000},
		{10000, 10000},
		{100000, 16402},
		{1000000, 31983},
		{10000000, 62362},
		{100000000, 121597},
		{1000000000, 237095},
		{10000000000, 300000},
		{math.MaxInt, 300000},
	}

	checkComputeNumberSamples := func(computedNumSamples, expectedNumSamples int) {
		if computedNumSamples != expectedNumSamples {
			t.Fatalf("expected %d samples, got %d", expectedNumSamples, computedNumSamples)
		}
	}
	for _, td := range testData {
		checkComputeNumberSamples(int(computeNumberSamples(uint64(td.numRows))), td.expectedNumSamples)
	}
}
```
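As a worked check, the expected values in the test follow from the commit's formula $582n^{0.29}$ if the result is clamped to [10,000, 300,000] and truncated toward zero. The truncation (written $\lfloor\cdot\rfloor$ below) is an assumption inferred from the test data, such as 16,402 rather than 16,403 for 100,000 rows; the commit message does not state it.

$$
\begin{aligned}
s(n) &= \min\bigl(300000,\; \max\bigl(10000,\; \lfloor 582\, n^{0.29} \rfloor\bigr)\bigr) \\
n = 10^{4}:&\quad 582 \cdot 10^{1.16} \approx 8412.5 \;\Rightarrow\; s = 10000 \\
n = 10^{5}:&\quad 582 \cdot 10^{1.45} \approx 16402.99 \;\Rightarrow\; s = 16402 \\
n = 10^{8}:&\quad 582 \cdot 10^{2.32} \approx 121597.03 \;\Rightarrow\; s = 121597 \\
n = 10^{10}:&\quad 582 \cdot 10^{2.90} \approx 462299 \;\Rightarrow\; s = 300000
\end{aligned}
$$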