Skip to content

migrate hdfs integration tests to embedded-tests#19158

Merged
clintropolis merged 12 commits intoapache:masterfrom
clintropolis:migrate-hdfs-integration-tests
Mar 17, 2026
Merged

migrate hdfs integration tests to embedded-tests#19158
clintropolis merged 12 commits intoapache:masterfrom
clintropolis:migrate-hdfs-integration-tests

Conversation

@clintropolis
Copy link
Member

This PR migrates the native batch hdfs integration tests to embedded-tests. They are different enough from the AbstractCloudInputSourceParallelIndexTest tests to be annoying (path instead of uri, etc), so has its own base class, but at least they share a template. It uses MiniDFSCluster for the hdfs server so we don't need a whole docker hadoop cluster like the old ITs.

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpicks.

// Azure resource: configures Azure as deep storage (druid.storage.type=azure, etc.).
// Adding it after the HDFS resource ensures Azure's deep-storage settings are not
// overridden by anything the HDFS resource might set.
// AzureStorageResource.onStarted() registers AzureStorageDruidModule automatically.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line can be omitted.

Comment on lines +66 to +70
// The AzureStorageResource creates the container; deleting it cleans up all segments
// written during the test run.
azureResource.getStorageClient()
.getContainerReference(azureResource.getAzureContainerName())
.deleteIfExists();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AbstractAzureInputSourceParallelIndexTest simply does azureResource.deleteStorageContainer() for this. Would that not suffice?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't have a AzureTestUtil currently like AbstractAzureInputSourceParallelIndexTest does, should it have one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed that. I thought deleteStorageContainer() method was defined on the AzureStorageResource itself.

But yes, looking at the code again, I think it would make sense to reuse AzureTestUtil for this new test.

Comment on lines +66 to +70
try (Storage storage = GoogleStorageTestModule.createStorageForTests(gcsResource.getUrl())) {
// Delete all blobs under the deep-storage prefix for this test run.
storage.list(gcsResource.getBucket()).iterateAll()
.forEach(blob -> blob.delete());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AbstractGcsInputSourceParallelIndexTest uses gcsResource.deletePrefixFolderFromGcs(dataSource); instead of this. I guess that is incorrect/insufficient since it would end up deleting only files for the last test datasource.

Thinking about it though, I think we might even skip this deletion altogether since the cluster itself (along with the GCS testcontainer) will be torn down after this anyway.


// MinIO resource: configures S3/MinIO as deep storage (druid.storage.type=s3, etc.).
// Adding it after the HDFS resource ensures the S3 deep-storage settings win.
// MinIOStorageResource.onStarted() registers S3StorageDruidModule automatically.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can omit this line.

public class HdfsStorageResource implements EmbeddedResource
{
private final boolean configureAsDeepStorage;
private MiniDFSCluster miniDFSCluster;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@clintropolis clintropolis force-pushed the migrate-hdfs-integration-tests branch from 3c2491f to eb532b5 Compare March 16, 2026 17:54
@clintropolis clintropolis force-pushed the migrate-hdfs-integration-tests branch from eb532b5 to 2cc8672 Compare March 16, 2026 17:59
@clintropolis
Copy link
Member Author

this PR seems broken after #19143, some dependency related problem since it works again if i remove the iceberg stuff, will try to sort it out later

@clintropolis clintropolis merged commit 2956e98 into apache:master Mar 17, 2026
37 checks passed
@clintropolis clintropolis deleted the migrate-hdfs-integration-tests branch March 17, 2026 06:28
@github-actions github-actions bot added this to the 37.0.0 milestone Mar 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants