Skip to content

teach pebble about block level prefix synthesis #3350

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 1, 2024

Conversation

msbutler
Copy link
Contributor

See individual commits

@msbutler msbutler requested a review from RaduBerinde February 26, 2024 17:48
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@msbutler msbutler self-assigned this Feb 26, 2024
@msbutler msbutler force-pushed the butler-rebased-prefix branch from 0794b68 to 945a439 Compare February 26, 2024 19:47
@msbutler msbutler marked this pull request as ready for review February 26, 2024 19:48
Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This is so much simpler than the prefix iterator approach (which I'm still trying to get to work in all cases)..

We should add checks in NewRawRangeDelIter and NewRawRangeKeyIter and error out since we can't handle the transform there.

Reviewable status: 0 of 10 files reviewed, 5 unresolved discussions (waiting on @msbutler)


-- commits line 10 at r1:
This slowdown is when prefix synthesis is not enabled, correct?


sstable/block.go line 774 at r1 (raw file):

	if i.transforms.SyntheticPrefix != nil {
		if !bytes.HasPrefix(key, i.transforms.SyntheticPrefix) {
			if i.cmp(i.transforms.SyntheticPrefix, key) >= 0 {

This is a problem - cmp is only defined for full keys. The CRDB comparer looks at the last byte and treats it as a suffix length so things would break.
What the prefix replacing iterator does is to also pass a "test" key in the synthetic prefix range (I use the vtable lower bound) and use that for the comparison.


sstable/reader.go line 173 at r1 (raw file):

}

// SyntheticPrefix represents a byte slice that it implicitly prepended to every

*is


sstable/reader.go line 175 at r1 (raw file):

// SyntheticPrefix represents a byte slice that it implicitly prepended to every
// key in a file being read or accessed by a reader. Specifically, the file is
// implied to contain keys that do not explicitly include the prefix, and the

This explanation is a bit confusing.. maybe say that the table is assumed to contain pieces of keys that become full keys when prepended a valid prefix. The table's bloom filters are constructed only on the pieces of keys in the table, but interactions with the file ...

One question I have - are the keys / "key pieces" in the table required to be full keys (i.e. can they be passed to Compare or Split?). I believe they will be in practice but we don't depend on that as long as they're valid with the SyntheticPrefix?


sstable/reader_common.go line 90 at r2 (raw file):

		return false
	}
	if p.ContentPrefix != nil && p.SyntheticPrefix != nil {

I'd compare lengths just in case. I think it's enough to check ContentPrefix

Copy link
Contributor Author

@msbutler msbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added those invariants.

Reviewable status: 0 of 10 files reviewed, 2 unresolved discussions (waiting on @msbutler and @RaduBerinde)


-- commits line 10 at r1:

Previously, RaduBerinde wrote…

This slowdown is when prefix synthesis is not enabled, correct?

Indeed. Amended the commit msg to clarify


sstable/block.go line 774 at r1 (raw file):

Previously, RaduBerinde wrote…

This is a problem - cmp is only defined for full keys. The CRDB comparer looks at the last byte and treats it as a suffix length so things would break.
What the prefix replacing iterator does is to also pass a "test" key in the synthetic prefix range (I use the vtable lower bound) and use that for the comparison.

Changed it to use i.FirstUserKey which gets hydrated at init.

I believe we could also use bytes.Compare() since the syntheticPrefix will never contain suffix bytes (which have weird ordering), but I think the FirstUserKey approach is cleaner.


sstable/reader.go line 175 at r1 (raw file):

Previously, RaduBerinde wrote…

This explanation is a bit confusing.. maybe say that the table is assumed to contain pieces of keys that become full keys when prepended a valid prefix. The table's bloom filters are constructed only on the pieces of keys in the table, but interactions with the file ...

One question I have - are the keys / "key pieces" in the table required to be full keys (i.e. can they be passed to Compare or Split?). I believe they will be in practice but we don't depend on that as long as they're valid with the SyntheticPrefix?

Done. Since a prefix-less key will always contain a full suffix, I believe all compare operations will be valid on it. Might be worth adding some invariants here.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 11 files reviewed, 3 unresolved discussions (waiting on @msbutler)


sstable/block.go line 774 at r1 (raw file):

Previously, msbutler (Michael Butler) wrote…

Changed it to use i.FirstUserKey which gets hydrated at init.

I believe we could also use bytes.Compare() since the syntheticPrefix will never contain suffix bytes (which have weird ordering), but I think the FirstUserKey approach is cleaner.

This deserves a comment. Something like:

The seek key is before or after the entire range of keys that start with SyntheticPrefix. To determine which, we need to compare against a valid key in this range. We use firstUserKey which has the synthetic prefix.


sstable/reader_common.go line 90 at r5 (raw file):

		return false
	}
	if len(p.ContentPrefix)>0{

[nit] formatting (do you have crlfmt or gofmt set up to run automatically?)


testdata/ingest_external line 525 at r5 (raw file):

----

# Synthesize prefix with one greater than the original one.

These comments don't make sense, we are replacing the empty prefix in both cases.

@msbutler msbutler force-pushed the butler-rebased-prefix branch from 3a40f7d to 6932834 Compare February 27, 2024 15:54
Copy link
Contributor Author

@msbutler msbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 11 files reviewed, 2 unresolved discussions (waiting on @msbutler and @RaduBerinde)


sstable/reader_common.go line 90 at r5 (raw file):

Previously, RaduBerinde wrote…

[nit] formatting (do you have crlfmt or gofmt set up to run automatically?)

i've been running it manually with the following cmd:

❯ which mbfmt
mbfmt () {
	echo crlfmt
	git diff-tree --no-commit-id --name-only -r HEAD | grep --color=auto --exclude-dir={.bzr,CVS,.git,.hg,.svn,.idea,.tox} '.go' | xargs -n1 crlfmt -tab 2 -w
}

it's not great with multi commit prs. I should revisit running it automatically via my IDE.


testdata/ingest_external line 525 at r5 (raw file):

Previously, RaduBerinde wrote…

These comments don't make sense, we are replacing the empty prefix in both cases.

Oops, bad copy pasta.

@msbutler msbutler requested a review from sumeerbhola February 27, 2024 15:55
@msbutler
Copy link
Contributor Author

@RaduBerinde @sumeerbhola friendly ping to review this!

@sumeerbhola let me know if I should wait to merge until you have taken a look at this.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Awesome!

Reviewable status: 0 of 11 files reviewed, 7 unresolved discussions (waiting on @msbutler and @sumeerbhola)


sstable/block.go line 420 at r7 (raw file):

	// be used after ikey is set.
	key []byte
	// fullKey is a buffer used for key prefix decompression.

Mention that when a synthetic prefix is used, this always starts with that prefix.


sstable/block.go line 691 at r7 (raw file):

	}
	if i.transforms.SyntheticPrefix != nil {
		i.firstUserKey = append(append(i.firstUserKeyWithPrefixBuf[:0], i.transforms.SyntheticPrefix...), i.firstUserKey...)

[nit] this might allocate twice (not sure if Go "merges" the appends into a single operation). More importantly, we never actually set firstUserKeyWithPrefixBuf to anything, so we won't reuse buffers.

I'd replace with:

i.firstUserKeyWithPrefixBuf = slices.Grow(i.firstUserKeyWithPrefixBuf[:0], len(i.transforms.SyntheticPrefix) + len(i.firstUserKey))
i.firstUserKeyWithPrefixBuf = append(i.firstUserKeyWithPrefixBuf, i.transforms.SyntheticPrefix...)
i.firstUserKeyWithPrefixBuf = append(i.firstUserKeyWithPrefixBuf, i.firstUserKey...)
i.firstUserKey = i.firstUserKeyWithPrefixBuf

sstable/block.go line 964 at r7 (raw file):

				return i.Last()
			}
			i.offset = -1

Worth adding a comment like above, mentioning that Next will return the first user key.


sstable/reader_common.go line 86 at r9 (raw file):

// wrapper should be used.
func (p *PrefixReplacement) UsePrefixReplacementIterator() bool {
	if p == nil {

[nit] return p != nil && len(p.ContentPrefix) > 0


sstable/reader_test.go line 1381 at r9 (raw file):

					}
					if prefix != nil {
						keyToWrite = append(keyToWrite, prefix...)

[nit] keyToWrite = append(slices.Clip(prefix), key...)

@msbutler msbutler force-pushed the butler-rebased-prefix branch from 6932834 to b4090ae Compare March 1, 2024 00:47
Copy link
Contributor Author

@msbutler msbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 11 files reviewed, 2 unresolved discussions (waiting on @RaduBerinde and @sumeerbhola)


sstable/block.go line 420 at r7 (raw file):

Previously, RaduBerinde wrote…

Mention that when a synthetic prefix is used, this always starts with that prefix.

Done.


sstable/block.go line 691 at r7 (raw file):

Previously, RaduBerinde wrote…

[nit] this might allocate twice (not sure if Go "merges" the appends into a single operation). More importantly, we never actually set firstUserKeyWithPrefixBuf to anything, so we won't reuse buffers.

I'd replace with:

i.firstUserKeyWithPrefixBuf = slices.Grow(i.firstUserKeyWithPrefixBuf[:0], len(i.transforms.SyntheticPrefix) + len(i.firstUserKey))
i.firstUserKeyWithPrefixBuf = append(i.firstUserKeyWithPrefixBuf, i.transforms.SyntheticPrefix...)
i.firstUserKeyWithPrefixBuf = append(i.firstUserKeyWithPrefixBuf, i.firstUserKey...)
i.firstUserKey = i.firstUserKeyWithPrefixBuf

Done.

To clarify, this change will increase the capacity of firstUserKeyWithPrefixBuf, if necessary. But we do reuse the buf when we reset the iter for reuse, right?
https://github.com/msbutler/pebble/blob/butler-rebased-prefix/sstable/block.go#L541


sstable/block.go line 964 at r7 (raw file):

Previously, RaduBerinde wrote…

Worth adding a comment like above, mentioning that Next will return the first user key.

Done.


sstable/reader_common.go line 86 at r9 (raw file):

Previously, RaduBerinde wrote…

[nit] return p != nil && len(p.ContentPrefix) > 0

Done.


sstable/reader_test.go line 1381 at r9 (raw file):

Previously, RaduBerinde wrote…

[nit] keyToWrite = append(slices.Clip(prefix), key...)

Done.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 0 of 11 files reviewed, all discussions resolved (waiting on @sumeerbhola)


sstable/block.go line 691 at r7 (raw file):

Previously, msbutler (Michael Butler) wrote…

Done.

To clarify, this change will increase the capacity of firstUserKeyWithPrefixBuf, if necessary. But we do reuse the buf when we reset the iter for reuse, right?
https://github.com/msbutler/pebble/blob/butler-rebased-prefix/sstable/block.go#L541

Right. I was saying that the previous version of the change (r7 on reviewable) did not ever set firstUserKeyWithPrefixBuf to anything, so it would always be nil

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick pass, and have only a few nits
:lgtm:

Reviewed 5 of 10 files at r10, 1 of 1 files at r11, 2 of 9 files at r12, all commit messages.
Reviewable status: 3 of 11 files reviewed, 4 unresolved discussions (waiting on @msbutler)


-- commits line 72 at r12:
different than ...

isn't this the buggy path (that @RaduBerinde was trying to fix in #3344) that we are abandoning in favor of that PR?


sstable/block.go line 783 at r10 (raw file):

				// with SyntheticPrefix. To determine which, we need to compare against a
				// valid key in the block. We use firstUserKey which has the synthetic
				// prefix.

nit: a comment like this belongs before the predicate, since that is the line of code this is talking about.


sstable/block.go line 965 at r10 (raw file):

			// with SyntheticPrefix. To determine which, we need to compare against a
			// valid key in the block. We use firstUserKey which has the synthetic
			// prefix.

same nit


sstable/reader.go line 181 at r10 (raw file):

// keys that did include the prefix. Note that all Compare operations may act on
// a prefix-less key as the synthetic prefix will never modify key metadata
// stored in the key suffix.

Should this also say that such files must not contain rangedels or range keys, since the transformation is only applied to the point keys.

@RaduBerinde
Copy link
Member

-- commits line 72 at r12:

Previously, sumeerbhola wrote…

different than ...

isn't this the buggy path (that @RaduBerinde was trying to fix in #3344) that we are abandoning in favor of that PR?

Yes, the plan is to remove that once Restore is able to generate "prefix free" ssts.

msbutler added 2 commits March 1, 2024 08:36
This patch teaches block iterator about synthetic prefixes. When the iterator
is configured with a synthetic prefix, it will assume that that prefix is
implicitly prepended to every key in the underlying sst blocks when interacting
with or returning those keys.

According to the block level benchmarks, adding prefix synthesis support slows
blockiter.Next() and blockiter.Prev() down by ~1% on the main read path-- that
is, without synthetic suffix or prefix replacement.
```
❯ benchstat pre.txt post.txt
goos: linux
goarch: amd64
pkg: github.com/cockroachdb/pebble/sstable
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
                                                    │   pre.txt    │              post.txt               │
                                                    │    sec/op    │    sec/op     vs base               │
BlockIterSeekGE/syntheticSuffix=false;restart=16-24   473.1n ±  2%   478.8n ±  4%       ~ (p=0.138 n=10)
BlockIterSeekGE/syntheticSuffix=true;restart=16-24    637.7n ± 12%   565.7n ± 14%       ~ (p=0.565 n=10)
BlockIterSeekLT/syntheticSuffix=false;restart=16-24   541.4n ±  3%   540.8n ±  3%       ~ (p=1.000 n=10)
BlockIterSeekLT/syntheticSuffix=true;restart=16-24    729.0n ±  1%   703.5n ±  2%  -3.50% (p=0.000 n=10)
BlockIterNext/syntheticSuffix=false;restart=16-24     15.62n ±  0%   15.78n ±  0%  +1.02% (p=0.000 n=10)
BlockIterNext/syntheticSuffix=true;restart=16-24      22.65n ± 16%   22.86n ± 13%       ~ (p=0.515 n=10)
BlockIterPrev/syntheticSuffix=false;restart=16-24     30.70n ±  0%   31.24n ±  0%  +1.76% (p=0.000 n=10)
BlockIterPrev/syntheticSuffix=true;restart=16-24      40.91n ±  2%   42.03n ±  2%  +2.74% (p=0.000 n=10)
geomean                                               123.2n         121.9n        -1.00%
```
The block level iteration benchmarks suggest that block level prefix synthesis
does not significantly slow down iteration, compared to block level iteration
on a block with the prefix prewritten to the key.

```
goos: linux
goarch: amd64
pkg: github.com/cockroachdb/pebble/sstable
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
                                                                          │   post.txt   │
                                                                          │    sec/op    │
BlockIterSeekGE/syntheticPrefix=false;syntheticSuffix=false;restart=16-24   481.4n ±  3%
BlockIterSeekGE/syntheticPrefix=false;syntheticSuffix=true;restart=16-24    569.7n ± 14%
BlockIterSeekGE/syntheticPrefix=true;syntheticSuffix=false;restart=16-24    497.9n ±  2%
BlockIterSeekGE/syntheticPrefix=true;syntheticSuffix=true;restart=16-24     604.8n ±  7%
BlockIterSeekLT/syntheticPrefix=false;syntheticSuffix=false;restart=16-24   548.9n ±  3%
BlockIterSeekLT/syntheticPrefix=false;syntheticSuffix=true;restart=16-24    720.1n ±  1%
BlockIterSeekLT/syntheticPrefix=true;syntheticSuffix=false;restart=16-24    535.1n ±  3%
BlockIterSeekLT/syntheticPrefix=true;syntheticSuffix=true;restart=16-24     713.4n ±  2%
BlockIterNext/syntheticPrefix=false;syntheticSuffix=false;restart=16-24     15.90n ±  0%
BlockIterNext/syntheticPrefix=false;syntheticSuffix=true;restart=16-24      22.92n ± 14%
BlockIterNext/syntheticPrefix=true;syntheticSuffix=false;restart=16-24      15.44n ±  0%
BlockIterNext/syntheticPrefix=true;syntheticSuffix=true;restart=16-24       22.15n ± 15%
BlockIterPrev/syntheticPrefix=false;syntheticSuffix=false;restart=16-24     31.43n ±  0%
BlockIterPrev/syntheticPrefix=false;syntheticSuffix=true;restart=16-24      42.23n ±  1%
BlockIterPrev/syntheticPrefix=true;syntheticSuffix=false;restart=16-24      31.35n ±  0%
BlockIterPrev/syntheticPrefix=true;syntheticSuffix=true;restart=16-24       41.81n ±  2%
```
Copy link
Contributor Author

@msbutler msbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 11 files reviewed, 3 unresolved discussions (waiting on @sumeerbhola)


sstable/block.go line 965 at r10 (raw file):

Previously, sumeerbhola wrote…

same nit

Done.


sstable/reader.go line 181 at r10 (raw file):

Previously, sumeerbhola wrote…

Should this also say that such files must not contain rangedels or range keys, since the transformation is only applied to the point keys.

Done.

This patch enables the db user to ingest a virtual sst with a synthetic prefix
and for higher level readers to plumb the synthetic prefix to the block
iterator which will conduct prefix synthesis.

Note that the db user can still ingest a virtual sst with a prefix _replacement_
rule, by passing the virtual sst the original prefix and the replacement
prefix. This tranformation is different that block level prefix synthesis as it
replaces the original prefix with the new prefix.
@msbutler msbutler force-pushed the butler-rebased-prefix branch 4 times, most recently from fdb9dae to a28181a Compare March 1, 2024 15:46
@msbutler msbutler merged commit 9fc9047 into cockroachdb:master Mar 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants