-
Notifications
You must be signed in to change notification settings - Fork 493
teach pebble about block level prefix synthesis #3350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
0794b68
to
945a439
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This is so much simpler than the prefix iterator approach (which I'm still trying to get to work in all cases)..
We should add checks in NewRawRangeDelIter
and NewRawRangeKeyIter
and error out since we can't handle the transform there.
Reviewable status: 0 of 10 files reviewed, 5 unresolved discussions (waiting on @msbutler)
-- commits
line 10 at r1:
This slowdown is when prefix synthesis is not enabled, correct?
sstable/block.go
line 774 at r1 (raw file):
if i.transforms.SyntheticPrefix != nil { if !bytes.HasPrefix(key, i.transforms.SyntheticPrefix) { if i.cmp(i.transforms.SyntheticPrefix, key) >= 0 {
This is a problem - cmp
is only defined for full keys. The CRDB comparer looks at the last byte and treats it as a suffix length so things would break.
What the prefix replacing iterator does is to also pass a "test" key in the synthetic prefix range (I use the vtable lower bound) and use that for the comparison.
sstable/reader.go
line 173 at r1 (raw file):
} // SyntheticPrefix represents a byte slice that it implicitly prepended to every
*is
sstable/reader.go
line 175 at r1 (raw file):
// SyntheticPrefix represents a byte slice that it implicitly prepended to every // key in a file being read or accessed by a reader. Specifically, the file is // implied to contain keys that do not explicitly include the prefix, and the
This explanation is a bit confusing.. maybe say that the table is assumed to contain pieces of keys that become full keys when prepended a valid prefix. The table's bloom filters are constructed only on the pieces of keys in the table, but interactions with the file ...
One question I have - are the keys / "key pieces" in the table required to be full keys (i.e. can they be passed to Compare
or Split
?). I believe they will be in practice but we don't depend on that as long as they're valid with the SyntheticPrefix?
sstable/reader_common.go
line 90 at r2 (raw file):
return false } if p.ContentPrefix != nil && p.SyntheticPrefix != nil {
I'd compare lengths just in case. I think it's enough to check ContentPrefix
945a439
to
3a40f7d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added those invariants.
Reviewable status: 0 of 10 files reviewed, 2 unresolved discussions (waiting on @msbutler and @RaduBerinde)
Previously, RaduBerinde wrote…
This slowdown is when prefix synthesis is not enabled, correct?
Indeed. Amended the commit msg to clarify
sstable/block.go
line 774 at r1 (raw file):
Previously, RaduBerinde wrote…
This is a problem -
cmp
is only defined for full keys. The CRDB comparer looks at the last byte and treats it as a suffix length so things would break.
What the prefix replacing iterator does is to also pass a "test" key in the synthetic prefix range (I use the vtable lower bound) and use that for the comparison.
Changed it to use i.FirstUserKey which gets hydrated at init.
I believe we could also use bytes.Compare() since the syntheticPrefix will never contain suffix bytes (which have weird ordering), but I think the FirstUserKey approach is cleaner.
sstable/reader.go
line 175 at r1 (raw file):
Previously, RaduBerinde wrote…
This explanation is a bit confusing.. maybe say that the table is assumed to contain pieces of keys that become full keys when prepended a valid prefix. The table's bloom filters are constructed only on the pieces of keys in the table, but interactions with the file ...
One question I have - are the keys / "key pieces" in the table required to be full keys (i.e. can they be passed to
Compare
orSplit
?). I believe they will be in practice but we don't depend on that as long as they're valid with the SyntheticPrefix?
Done. Since a prefix-less key will always contain a full suffix, I believe all compare operations will be valid on it. Might be worth adding some invariants here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 11 files reviewed, 3 unresolved discussions (waiting on @msbutler)
sstable/block.go
line 774 at r1 (raw file):
Previously, msbutler (Michael Butler) wrote…
Changed it to use i.FirstUserKey which gets hydrated at init.
I believe we could also use bytes.Compare() since the syntheticPrefix will never contain suffix bytes (which have weird ordering), but I think the FirstUserKey approach is cleaner.
This deserves a comment. Something like:
The seek key is before or after the entire range of keys that start with SyntheticPrefix. To determine which, we need to compare against a valid key in this range. We use firstUserKey
which has the synthetic prefix.
sstable/reader_common.go
line 90 at r5 (raw file):
return false } if len(p.ContentPrefix)>0{
[nit] formatting (do you have crlfmt
or gofmt
set up to run automatically?)
testdata/ingest_external
line 525 at r5 (raw file):
---- # Synthesize prefix with one greater than the original one.
These comments don't make sense, we are replacing the empty prefix in both cases.
3a40f7d
to
6932834
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 11 files reviewed, 2 unresolved discussions (waiting on @msbutler and @RaduBerinde)
sstable/reader_common.go
line 90 at r5 (raw file):
Previously, RaduBerinde wrote…
[nit] formatting (do you have
crlfmt
orgofmt
set up to run automatically?)
i've been running it manually with the following cmd:
❯ which mbfmt
mbfmt () {
echo crlfmt
git diff-tree --no-commit-id --name-only -r HEAD | grep --color=auto --exclude-dir={.bzr,CVS,.git,.hg,.svn,.idea,.tox} '.go' | xargs -n1 crlfmt -tab 2 -w
}
it's not great with multi commit prs. I should revisit running it automatically via my IDE.
testdata/ingest_external
line 525 at r5 (raw file):
Previously, RaduBerinde wrote…
These comments don't make sense, we are replacing the empty prefix in both cases.
Oops, bad copy pasta.
@RaduBerinde @sumeerbhola friendly ping to review this! @sumeerbhola let me know if I should wait to merge until you have taken a look at this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
Reviewable status: 0 of 11 files reviewed, 7 unresolved discussions (waiting on @msbutler and @sumeerbhola)
sstable/block.go
line 420 at r7 (raw file):
// be used after ikey is set. key []byte // fullKey is a buffer used for key prefix decompression.
Mention that when a synthetic prefix is used, this always starts with that prefix.
sstable/block.go
line 691 at r7 (raw file):
} if i.transforms.SyntheticPrefix != nil { i.firstUserKey = append(append(i.firstUserKeyWithPrefixBuf[:0], i.transforms.SyntheticPrefix...), i.firstUserKey...)
[nit] this might allocate twice (not sure if Go "merges" the appends into a single operation). More importantly, we never actually set firstUserKeyWithPrefixBuf
to anything, so we won't reuse buffers.
I'd replace with:
i.firstUserKeyWithPrefixBuf = slices.Grow(i.firstUserKeyWithPrefixBuf[:0], len(i.transforms.SyntheticPrefix) + len(i.firstUserKey))
i.firstUserKeyWithPrefixBuf = append(i.firstUserKeyWithPrefixBuf, i.transforms.SyntheticPrefix...)
i.firstUserKeyWithPrefixBuf = append(i.firstUserKeyWithPrefixBuf, i.firstUserKey...)
i.firstUserKey = i.firstUserKeyWithPrefixBuf
sstable/block.go
line 964 at r7 (raw file):
return i.Last() } i.offset = -1
Worth adding a comment like above, mentioning that Next will return the first user key.
sstable/reader_common.go
line 86 at r9 (raw file):
// wrapper should be used. func (p *PrefixReplacement) UsePrefixReplacementIterator() bool { if p == nil {
[nit] return p != nil && len(p.ContentPrefix) > 0
sstable/reader_test.go
line 1381 at r9 (raw file):
} if prefix != nil { keyToWrite = append(keyToWrite, prefix...)
[nit] keyToWrite = append(slices.Clip(prefix), key...)
6932834
to
b4090ae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 11 files reviewed, 2 unresolved discussions (waiting on @RaduBerinde and @sumeerbhola)
sstable/block.go
line 420 at r7 (raw file):
Previously, RaduBerinde wrote…
Mention that when a synthetic prefix is used, this always starts with that prefix.
Done.
sstable/block.go
line 691 at r7 (raw file):
Previously, RaduBerinde wrote…
[nit] this might allocate twice (not sure if Go "merges" the appends into a single operation). More importantly, we never actually set
firstUserKeyWithPrefixBuf
to anything, so we won't reuse buffers.I'd replace with:
i.firstUserKeyWithPrefixBuf = slices.Grow(i.firstUserKeyWithPrefixBuf[:0], len(i.transforms.SyntheticPrefix) + len(i.firstUserKey)) i.firstUserKeyWithPrefixBuf = append(i.firstUserKeyWithPrefixBuf, i.transforms.SyntheticPrefix...) i.firstUserKeyWithPrefixBuf = append(i.firstUserKeyWithPrefixBuf, i.firstUserKey...) i.firstUserKey = i.firstUserKeyWithPrefixBuf
Done.
To clarify, this change will increase the capacity of firstUserKeyWithPrefixBuf
, if necessary. But we do reuse the buf when we reset the iter for reuse, right?
https://github.com/msbutler/pebble/blob/butler-rebased-prefix/sstable/block.go#L541
sstable/block.go
line 964 at r7 (raw file):
Previously, RaduBerinde wrote…
Worth adding a comment like above, mentioning that Next will return the first user key.
Done.
sstable/reader_common.go
line 86 at r9 (raw file):
Previously, RaduBerinde wrote…
[nit] return p != nil && len(p.ContentPrefix) > 0
Done.
sstable/reader_test.go
line 1381 at r9 (raw file):
Previously, RaduBerinde wrote…
[nit]
keyToWrite = append(slices.Clip(prefix), key...)
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 11 files reviewed, all discussions resolved (waiting on @sumeerbhola)
sstable/block.go
line 691 at r7 (raw file):
Previously, msbutler (Michael Butler) wrote…
Done.
To clarify, this change will increase the capacity of
firstUserKeyWithPrefixBuf
, if necessary. But we do reuse the buf when we reset the iter for reuse, right?
https://github.com/msbutler/pebble/blob/butler-rebased-prefix/sstable/block.go#L541
Right. I was saying that the previous version of the change (r7 on reviewable) did not ever set firstUserKeyWithPrefixBuf
to anything, so it would always be nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick pass, and have only a few nits
Reviewed 5 of 10 files at r10, 1 of 1 files at r11, 2 of 9 files at r12, all commit messages.
Reviewable status: 3 of 11 files reviewed, 4 unresolved discussions (waiting on @msbutler)
-- commits
line 72 at r12:
different than ...
isn't this the buggy path (that @RaduBerinde was trying to fix in #3344) that we are abandoning in favor of that PR?
sstable/block.go
line 783 at r10 (raw file):
// with SyntheticPrefix. To determine which, we need to compare against a // valid key in the block. We use firstUserKey which has the synthetic // prefix.
nit: a comment like this belongs before the predicate, since that is the line of code this is talking about.
sstable/block.go
line 965 at r10 (raw file):
// with SyntheticPrefix. To determine which, we need to compare against a // valid key in the block. We use firstUserKey which has the synthetic // prefix.
same nit
sstable/reader.go
line 181 at r10 (raw file):
// keys that did include the prefix. Note that all Compare operations may act on // a prefix-less key as the synthetic prefix will never modify key metadata // stored in the key suffix.
Should this also say that such files must not contain rangedels or range keys, since the transformation is only applied to the point keys.
Previously, sumeerbhola wrote…
Yes, the plan is to remove that once Restore is able to generate "prefix free" ssts. |
This patch teaches block iterator about synthetic prefixes. When the iterator is configured with a synthetic prefix, it will assume that that prefix is implicitly prepended to every key in the underlying sst blocks when interacting with or returning those keys. According to the block level benchmarks, adding prefix synthesis support slows blockiter.Next() and blockiter.Prev() down by ~1% on the main read path-- that is, without synthetic suffix or prefix replacement. ``` ❯ benchstat pre.txt post.txt goos: linux goarch: amd64 pkg: github.com/cockroachdb/pebble/sstable cpu: Intel(R) Xeon(R) CPU @ 2.80GHz │ pre.txt │ post.txt │ │ sec/op │ sec/op vs base │ BlockIterSeekGE/syntheticSuffix=false;restart=16-24 473.1n ± 2% 478.8n ± 4% ~ (p=0.138 n=10) BlockIterSeekGE/syntheticSuffix=true;restart=16-24 637.7n ± 12% 565.7n ± 14% ~ (p=0.565 n=10) BlockIterSeekLT/syntheticSuffix=false;restart=16-24 541.4n ± 3% 540.8n ± 3% ~ (p=1.000 n=10) BlockIterSeekLT/syntheticSuffix=true;restart=16-24 729.0n ± 1% 703.5n ± 2% -3.50% (p=0.000 n=10) BlockIterNext/syntheticSuffix=false;restart=16-24 15.62n ± 0% 15.78n ± 0% +1.02% (p=0.000 n=10) BlockIterNext/syntheticSuffix=true;restart=16-24 22.65n ± 16% 22.86n ± 13% ~ (p=0.515 n=10) BlockIterPrev/syntheticSuffix=false;restart=16-24 30.70n ± 0% 31.24n ± 0% +1.76% (p=0.000 n=10) BlockIterPrev/syntheticSuffix=true;restart=16-24 40.91n ± 2% 42.03n ± 2% +2.74% (p=0.000 n=10) geomean 123.2n 121.9n -1.00% ```
The block level iteration benchmarks suggest that block level prefix synthesis does not significantly slow down iteration, compared to block level iteration on a block with the prefix prewritten to the key. ``` goos: linux goarch: amd64 pkg: github.com/cockroachdb/pebble/sstable cpu: Intel(R) Xeon(R) CPU @ 2.80GHz │ post.txt │ │ sec/op │ BlockIterSeekGE/syntheticPrefix=false;syntheticSuffix=false;restart=16-24 481.4n ± 3% BlockIterSeekGE/syntheticPrefix=false;syntheticSuffix=true;restart=16-24 569.7n ± 14% BlockIterSeekGE/syntheticPrefix=true;syntheticSuffix=false;restart=16-24 497.9n ± 2% BlockIterSeekGE/syntheticPrefix=true;syntheticSuffix=true;restart=16-24 604.8n ± 7% BlockIterSeekLT/syntheticPrefix=false;syntheticSuffix=false;restart=16-24 548.9n ± 3% BlockIterSeekLT/syntheticPrefix=false;syntheticSuffix=true;restart=16-24 720.1n ± 1% BlockIterSeekLT/syntheticPrefix=true;syntheticSuffix=false;restart=16-24 535.1n ± 3% BlockIterSeekLT/syntheticPrefix=true;syntheticSuffix=true;restart=16-24 713.4n ± 2% BlockIterNext/syntheticPrefix=false;syntheticSuffix=false;restart=16-24 15.90n ± 0% BlockIterNext/syntheticPrefix=false;syntheticSuffix=true;restart=16-24 22.92n ± 14% BlockIterNext/syntheticPrefix=true;syntheticSuffix=false;restart=16-24 15.44n ± 0% BlockIterNext/syntheticPrefix=true;syntheticSuffix=true;restart=16-24 22.15n ± 15% BlockIterPrev/syntheticPrefix=false;syntheticSuffix=false;restart=16-24 31.43n ± 0% BlockIterPrev/syntheticPrefix=false;syntheticSuffix=true;restart=16-24 42.23n ± 1% BlockIterPrev/syntheticPrefix=true;syntheticSuffix=false;restart=16-24 31.35n ± 0% BlockIterPrev/syntheticPrefix=true;syntheticSuffix=true;restart=16-24 41.81n ± 2% ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 11 files reviewed, 3 unresolved discussions (waiting on @sumeerbhola)
sstable/block.go
line 965 at r10 (raw file):
Previously, sumeerbhola wrote…
same nit
Done.
sstable/reader.go
line 181 at r10 (raw file):
Previously, sumeerbhola wrote…
Should this also say that such files must not contain rangedels or range keys, since the transformation is only applied to the point keys.
Done.
This patch enables the db user to ingest a virtual sst with a synthetic prefix and for higher level readers to plumb the synthetic prefix to the block iterator which will conduct prefix synthesis. Note that the db user can still ingest a virtual sst with a prefix _replacement_ rule, by passing the virtual sst the original prefix and the replacement prefix. This tranformation is different that block level prefix synthesis as it replaces the original prefix with the new prefix.
fdb9dae
to
a28181a
Compare
See individual commits