@ScottDugas ScottDugas commented Sep 3, 2025

This introduces a new KeySpacePath.exportAllData to export all the data stored in the path. This can eventually be used to import into another cluster, or back into the same cluster, after clearing.

Other than the path information, the data exported is raw bytes, with no transformation or indicated semantics.

Resolves: 3572

@ScottDugas ScottDugas added the enhancement New feature or request label Sep 3, 2025
@ScottDugas ScottDugas changed the title Keyspacepath export Add new KeySpacePath.exportAllData Sep 5, 2025
import javax.annotation.Nonnull;
import java.util.UUID;

/**
Collaborator Author

In case you're reviewing this top-to-bottom, this class was extracted from KeySpaceDirectoryTest, but setupSampleData was added.

@ScottDugas ScottDugas marked this pull request as ready for review September 8, 2025 16:56
@Nullable byte[] continuation,
@Nonnull ScanProperties scanProperties) {
return new LazyCursor<>(toTupleAsync(context)
.thenApply(tuple -> KeyValueCursor.Builder.withSubspace(new Subspace(tuple))
Collaborator

Just to try and head off some potential problems: in #3397, there's a modification being made to the KeyValueCursor continuation to wrap it in a protobuf. This would allow us to avoid returning an empty ByteString as the continuation in case there is a single key range, which may be something that a KeySpacePath could run into. (It would require that the first key in the keyspace path is set, I think.)

There's sort of an open question as to how we migrate all of the uses over from the old format to the new format. It would be nice if we could set the serialization mode on this one to the new format so that we don't have to worry about migrating it later, I think. Though that would introduce a dependency on #3397 being merged first.

Collaborator Author

That's a good point.
I changed exportAllData on the interface to be EXPERIMENTAL, and added a note that we expect the continuation to change without preserving backwards compatibility when that merges. If the rest of the pieces are in place such that this is useful before #3397 gets merged, we can potentially update the javadoc and make this backwards compatible.

Collaborator Author

#3397 merged first, and so this just uses the new continuation type.

this.rawKeyValue = rawKeyValue;

// Convert the raw key to a Tuple and resolve it starting from the provided path
Tuple keyTuple = Tuple.fromBytes(rawKeyValue.getKey());
Collaborator

Do we need to be concerned with keys that are not Tuple parseable? I believe all of the keys that we currently generate are Tuple parseable, though we could adjust that in the future

Collaborator Author

KeySpacePathImpl always produces tuples, and AFAIK every key we have ever used has been a tuple. I feel like committing to that at this point is worthwhile. If we find a strong reason to put non-tuple data in a key, we'll have to cross that bridge when we get to it.

this.resolvedPath = path.toResolvedPathAsync(context).thenCompose(resolvedPath -> {
// Now use the resolved path to find the child for the key
// We need to figure out how much of the key corresponds to the resolved path
Tuple pathTuple = resolvedPath.toTuple();
Collaborator

Should this validate that pathTuple is a prefix of keyTuple? Strictly speaking, I don't think the check is necessary, but it could be a sensible thing to validate

Collaborator Author

Yeah, I think that's a good double check.

Collaborator Author

Note: I took advantage of TupleHelpers.isPrefix, which apparently didn't have any unit tests, but I'm adding them in a followup PR. https://github.com/FoundationDB/fdb-record-layer/pull/3578/files#diff-c7f3ee5c693bba884071e40ab98f13c12a86350a819f71f1c6a60eb7b750623c
I'm not sure if it's worth bringing into this PR.
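
For readers following the prefix validation discussed in this thread, here is a minimal sketch of the idea. It is not the code from this PR and not the implementation of TupleHelpers.isPrefix: plain `List<Object>` values stand in for `Tuple`, and the class and method names (`TuplePrefixSketch`, `remainder`) are hypothetical.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in: Lists play the role of Tuples so the sketch needs no FDB dependency.
final class TuplePrefixSketch {
    /** Returns true when every element of prefix matches the element at the same position in key. */
    static boolean isPrefix(List<Object> prefix, List<Object> key) {
        if (prefix.size() > key.size()) {
            return false;
        }
        for (int i = 0; i < prefix.size(); i++) {
            // deepEquals compares byte[] elements by content rather than by reference
            if (!Objects.deepEquals(prefix.get(i), key.get(i))) {
                return false;
            }
        }
        return true;
    }

    /** Validates the prefix relationship, then returns the part of the key below the path. */
    static List<Object> remainder(List<Object> pathTuple, List<Object> keyTuple) {
        if (!isPrefix(pathTuple, keyTuple)) {
            throw new IllegalArgumentException("key is not under the resolved path");
        }
        return keyTuple.subList(pathTuple.size(), keyTuple.size());
    }

    public static void main(String[] args) {
        List<Object> path = List.of("company", 42L);
        List<Object> key = List.of("company", 42L, "employee", 7L);
        System.out.println(remainder(path, key));
    }
}
```

The point of the check is that slicing the key by the path's length is only meaningful when the key actually starts with the path, so failing loudly is cheaper than returning a misleading remainder.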

Tuple keyTuple = Tuple.fromBytes(rawKeyValue.getKey());

// First resolve the provided path to get its resolved form
this.resolvedPath = path.toResolvedPathAsync(context).thenCompose(resolvedPath -> {
Collaborator

It almost seems like the method provided should be a method on the KeySpacePath. I was sort of surprised that we didn't already have it. I can see how we'd have to be clear in the name that we're starting with the full key rather than just the suffix. I suppose we could structure this here to go the other way: we resolve the KeySpacePath first, use it to get a raw byte prefix, and then resolve the Tuple suffix using findChildForKey

Collaborator Author

Yeah, I was also kind of surprised. I was so fixated on getting this working that I didn't pause to think that I should add it there. I'll move most of this there.

// Verify accessor methods
KeyValue retrievedKeyValue = dataInPath.getRawKeyValue();
assertNotNull(retrievedKeyValue);
assertEquals(originalKeyValue.getKey(), retrievedKeyValue.getKey());
Collaborator

I believe that for arrays, the assertion here ends up asserting that the two arrays are pointer equal, which I guess makes sense given that the original KeyValue is just being wrapped. That may be what you intended to assert here, but I just wanted to make sure that's what was going on.

Collaborator Author

True, updated, but I'm probably going to make more substantial changes to the api here, namely hiding the key entirely.
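
To illustrate the array-equality point raised above: as far as I know, `assertEquals(Object, Object)` falls back to `equals()`, which for arrays is reference identity, while `assertArrayEquals` compares contents. The sketch below shows the same distinction in plain Java without JUnit; the class and method names are hypothetical.

```java
import java.util.Arrays;

final class ByteArrayEqualitySketch {
    /** What equals() does for arrays: true only when both variables point at the same array. */
    static boolean identityEquals(byte[] a, byte[] b) {
        return a.equals(b);
    }

    /** Element-by-element comparison, which is what a content assertion would check. */
    static boolean contentEquals(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        byte[] original = {1, 2, 3};
        byte[] copy = original.clone();
        System.out.println("same reference, identity: " + identityEquals(original, original));
        System.out.println("copy, identity: " + identityEquals(original, copy));
        System.out.println("copy, content: " + contentEquals(original, copy));
    }
}
```

An identity-based assertion passes when the key is merely wrapped, but it would start failing if the wrapper ever copied the bytes, even though the data is unchanged.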

Comment on lines 92 to 93
KeySpacePath companyPath = root.path("company");
DataInKeySpacePath dataInPath = new DataInKeySpacePath(companyPath, keyValue, context);
Collaborator

It might be nice to have an assertion somewhere that states that you can choose any path along the way, and all of those will return the same DataInKeySpacePath

Collaborator Author

Yeah, that was the intent of the parameter. Fixed.

/**
* Class representing a {@link KeyValue} pair within a {@link KeySpacePath}.
*/
public class DataInKeySpacePath {
Collaborator

Should this implement equals and hashCode?

Collaborator Author

Yeah, probably, but I'm not entirely sure how to do that with the CompletableFuture for the resolved path.
It can either fall through and say it's equal if the original path and raw value are equal, or, alternatively, I can change it so that the constructor takes the resolved path.
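
As a rough illustration of the first option (equality over the original path and raw value, ignoring the future), here is a minimal sketch. It is not the class from this PR: `DataWithFutureSketch` is hypothetical, and a `String` stands in for the path types.

```java
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a class that holds a future alongside its identifying data.
final class DataWithFutureSketch {
    private final String path;                            // stand-in for the original path
    private final byte[] rawValue;                        // raw bytes, compared by content
    private final CompletableFuture<String> resolvedPath; // deliberately excluded from equality

    DataWithFutureSketch(String path, byte[] rawValue, CompletableFuture<String> resolvedPath) {
        this.path = path;
        this.rawValue = rawValue;
        this.resolvedPath = resolvedPath;
    }

    CompletableFuture<String> getResolvedPath() {
        return resolvedPath;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof DataWithFutureSketch)) {
            return false;
        }
        DataWithFutureSketch that = (DataWithFutureSketch) o;
        // CompletableFuture does not override equals, so comparing the futures would be identity only
        return path.equals(that.path) && Arrays.equals(rawValue, that.rawValue);
    }

    @Override
    public int hashCode() {
        return 31 * path.hashCode() + Arrays.hashCode(rawValue);
    }
}
```

The trade-off is that two instances compare equal without ever checking that their futures resolve to the same thing, which is only sound if the resolved path is fully determined by the inputs that are compared.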

Collaborator Author

I started down this path, and discovered that ResolvedKeySpacePath.equals is broken, and I've started working on a fix for that. It's a fair amount of new tests, and a couple of other equals/hashCode implementations, so I'm going to pull that out into a separate PR, and come back to this once it is fixed.

Issue: #3594

Collaborator Author

So, I went ahead and added ResolvedKeySpacePath.equals, and updated the tests to take advantage of it, but this class holds onto a future for the resolved path, making equals hard. It's possible that as I go along, I'll see that it makes more sense to have exportAllData resolve these futures so that DataInKeySpacePath has a concrete ResolvedKeySpacePath, but I think in the short term I'd rather leave this as-is, and see how it shakes out as the surrounding code is implemented.

* @param context the context in which to export
* @return a list of the raw {@code KeyValue}s being exported
*/
private static List<KeyValue> exportAllData(final KeySpacePath pathToExport, final FDBRecordContext context) {
Collaborator

Any reason this returns KeyValues instead of DataInKeySpacePaths?

Collaborator Author

Historical. Originally I had exportAllData returning KeyValue, and at the time it seemed easier to just have this return the raw data. But I went back and encapsulated the raw key entirely, so this now returns a DataInKeySpacePath.

try (FDBRecordContext context = database.openContext()) {
KeySpacePath continuationPath = root.path("continuation");

final ScanProperties scanProperties = ScanProperties.FORWARD_SCAN.with(props -> props.setReturnedRowLimit(limit));
Collaborator

Should we also test reverse scans like this?

Collaborator Author

Yes, added.


@ParameterizedTest
@ValueSource(ints = {1, 2, 3, 30})
void exportAllDataWithContinuation(int limit) {
Collaborator

Another case that is relevant to continuations: scanning a single record key space path. Something like:

  1. Resolving a path to a tuple then
  2. Setting that specific key to some value and
  3. Validating that we get that value back and that the continuation that comes back is sensible (not null and ideally not empty)

I'm not actually sure we'd get that key back (that is, whether the scan is over a strict subspace or not). And if it does return a continuation, I actually think it would return an empty continuation for the first element (because of #3206, see #3397), but it would be nice to know.

Collaborator Author

OK, added a test that it does a sane thing if you have only a single element under the keyspace. I tried to parameterize to get a couple other edge cases.
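
For context on what a continuation test like this exercises, here is a minimal sketch of the resume loop, assuming a simplified model: an in-memory list stands in for the key space, an integer offset stands in for the opaque continuation bytes, and all names are hypothetical. Real cursors may report the end of a scan differently (for example, by returning a continuation at the row limit and an empty final page).

```java
import java.util.ArrayList;
import java.util.List;

final class ContinuationPagingSketch {
    /** One page of results plus where to resume; next is null once the scan is exhausted. */
    static final class Page {
        final List<String> items;
        final Integer next;

        Page(List<String> items, Integer next) {
            this.items = items;
            this.next = next;
        }
    }

    /** Reads up to limit items starting at the continuation (null means start from the beginning). */
    static Page scan(List<String> data, Integer continuation, int limit) {
        int start = continuation == null ? 0 : continuation;
        int end = Math.min(data.size(), start + limit);
        Integer next = end < data.size() ? end : null;
        return new Page(new ArrayList<>(data.subList(start, end)), next);
    }

    /** Drives scan() the way a continuation test would: resume until no continuation remains. */
    static List<String> readAll(List<String> data, int limit) {
        List<String> all = new ArrayList<>();
        Integer continuation = null;
        do {
            Page page = scan(data, continuation, limit);
            all.addAll(page.items);
            continuation = page.next;
        } while (continuation != null);
        return all;
    }

    public static void main(String[] args) {
        List<String> data = List.of("a", "b", "c", "d", "e");
        for (int limit : new int[] {1, 2, 3, 30}) {
            System.out.println(limit + " -> " + readAll(data, limit));
        }
    }
}
```

Running the loop with several limits, including one larger than the data set and a data set with a single element, checks that stitching pages together returns every item exactly once regardless of where the page boundaries fall.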

Collaborator Author

@ScottDugas ScottDugas left a comment

I went through a bunch of the comments and responded. I've made a bunch of the changes locally, but when I got to implementing DataInKeySpacePath.equals/hashCode, I discovered that ResolvedKeySpacePath does not implement those correctly. I'm going to pause the work on this PR to resolve that. I'm submitting my commits in the meantime, for visibility.

*/
@API(API.Status.UNSTABLE)
@Nonnull
RecordCursor<DataInKeySpacePath> exportAllData(@Nonnull FDBRecordContext context,
Collaborator Author

I realized that this is an interface, and while I don't think we intended anyone to extend it, I should probably put a default implementation in.

Comment on lines +238 to +240
assertEquals("blob_id", resolved.getDirectoryName());
byte[] resolvedBytes = (byte[]) resolved.getResolvedValue();
assertArrayEquals(blobId, resolvedBytes);
Collaborator Author

Yeah, but I updated assertNameAndValue to check for byte[] and use assertArrayEquals appropriately.

Comment on lines 56 to 57
@ValueSource(ints = {0, 1, 2, 3, 4, 5})
void resolution() {
Collaborator Author

It is not, but it was supposed to be used to resolve from a path deeper than the root. Updated the test to actually use it.

I left a TODO for myself to add unit tests of withRemainder...
I updated the tests to not use KeyValue directly, at which point,
having DataInKeySpacePath expose the value and not the raw KeyValue
seemed to make sense
*/
@API(API.Status.EXPERIMENTAL)
@Nonnull
default CompletableFuture<ResolvedKeySpacePath> toResolvedPathAsync(@Nonnull FDBRecordContext context, byte[] key) {
Contributor

Is this doing work similar to keySpace.resolveFromKey? What is the difference? Is it because of efficiency (not needing to parse the key from the start every time)?

Collaborator Author

It's primarily here because we want to export from a KeySpacePath, and that does not have a reference back to the KeySpace. The majority of the logic is shared.
It probably has some efficiency gains if this path is on one of the last sub-directories, but it is still resolving this KeySpacePath on every call.

*/
@API(API.Status.EXPERIMENTAL)
@Nonnull
default RecordCursor<DataInKeySpacePath> exportAllData(@Nonnull FDBRecordContext context,
Contributor

Why is this added to the API at this level? Would it make sense to add this to the FDBRecordStore?
Update: Discussed with Scott, and the reason is that this would allow exporting multiple stores with a single call.

Collaborator Author

Yes, that is correct. If a client wants to export a bunch of stores at once, that would be challenging if this were on the store itself. And, if a user wants to export a single store that is still possible.
Also, the store takes a subspace (although IMHO it should take a KeySpacePath), so it would be awkward to deal with DirectoryLayerDirectory, as eventually we want to be providing the logical keys, rather than the actual tuples.

* @return a new {@code ResolvedKeySpacePath} that is the same as this, except with a different {@link #getRemainder()}.
*/
@Nonnull
public ResolvedKeySpacePath withRemainder(@Nullable final Tuple newRemainder) {
Contributor

Suggested change
public ResolvedKeySpacePath withRemainder(@Nullable final Tuple newRemainder) {
@VisibleForTesting
public ResolvedKeySpacePath withRemainder(@Nullable final Tuple newRemainder) {

Collaborator Author

I think this could be used by external clients, but I'm on board with giving limited visibility and exposing as needed. I also reduced it to package-scoped.

keySpace.root().getValue(), keySpace.root(), context));
}
}

Contributor

Maybe add a test that fails to resolve a key?
Would it make sense to add test for resolve with NULL value?

Collaborator Author

I added a test that fails to resolve (has a different root)

Regarding NULL, are you suggesting creating a DataInKeySpacePath with KeyValue(key, null), or a path that has a NULL type in it?
For the former, I don't think FDB will ever return such a KeyValue, and if you have one, you would never be able to insert it, because set does not allow a null value. Given that, it probably makes sense to have DataInKeySpacePath error if it gets a null value.

For the latter, I think that should be better tested in KeySpacePath.toResolvedPathAsync.

}
}

}
Contributor

Maybe add a test that fails to export?

Collaborator Author

I don't think there is a way at this level to cause errors that would be relevant from a business-logic perspective. We could have it export from a closed transaction, or something like that, but that doesn't seem particularly valuable here.

.addSubdirectory(new KeySpaceDirectory("bytes", KeyType.BYTES))
.addSubdirectory(new KeySpaceDirectory("uuids", KeyType.UUID))
.addSubdirectory(new KeySpaceDirectory("booleans", KeyType.BOOLEAN)));

Contributor

Add NULL?

Collaborator Author

Actually, it might be worthwhile to have it cover all KeyTypes. I reworked the test a bit to do that. I think it also makes the overall test a little bit cleaner.

Predominantly:
- DataInKeySpacePath now errors if it gets a null value
- Improved javadoc linking KeySpace.resolveFromKeyAsync and
  KeySpacePath.toResolvedPathAsync
- Fix error message in default method
- Reduce visibility of ResolvedKeySpacePath.withRemainder
- new negative tests in DataInKeySpacePathTest
- Better coverage of exporting with mixed-type sub-directories