migrate hdfs integration tests to embedded-tests#19158
clintropolis merged 12 commits into apache:master from
Conversation
| // Azure resource: configures Azure as deep storage (druid.storage.type=azure, etc.). | ||
| // Adding it after the HDFS resource ensures Azure's deep-storage settings are not | ||
| // overridden by anything the HDFS resource might set. | ||
| // AzureStorageResource.onStarted() registers AzureStorageDruidModule automatically. |
I think this line can be omitted.
| // The AzureStorageResource creates the container; deleting it cleans up all segments | ||
| // written during the test run. | ||
| azureResource.getStorageClient() | ||
| .getContainerReference(azureResource.getAzureContainerName()) | ||
| .deleteIfExists(); |
AbstractAzureInputSourceParallelIndexTest simply does azureResource.deleteStorageContainer() for this. Would that not suffice?
This test doesn't currently have an AzureTestUtil like AbstractAzureInputSourceParallelIndexTest does; should it have one?
Ah, I missed that. I thought the deleteStorageContainer() method was defined on the AzureStorageResource itself.
But yes, looking at the code again, I think it would make sense to reuse AzureTestUtil for this new test.
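For context, the cleanup being discussed is a create-in-setup / delete-in-teardown pattern where the teardown is idempotent. A minimal self-contained sketch of that idiom (the `StorageContainer` class here is an illustrative stand-in, not the Azure SDK):

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative stand-in for a blob container; not the Azure SDK.
class StorageContainer {
    private final Set<String> blobs = new HashSet<>();
    private boolean exists = true;

    void putBlob(String name) { blobs.add(name); }
    int blobCount() { return exists ? blobs.size() : 0; }

    // Mirrors the deleteIfExists() idiom: an idempotent teardown that
    // removes the container and every segment written during the test.
    boolean deleteIfExists() {
        if (!exists) {
            return false;
        }
        blobs.clear();
        exists = false;
        return true;
    }
}

public class ContainerCleanupDemo {
    public static void main(String[] args) {
        StorageContainer container = new StorageContainer();
        container.putBlob("segments/ds1/0/index.zip");
        container.putBlob("segments/ds1/1/index.zip");
        System.out.println(container.deleteIfExists()); // true: container existed
        System.out.println(container.blobCount());      // 0: all test blobs gone
        System.out.println(container.deleteIfExists()); // false: already deleted
    }
}
```

The point of the idempotent `deleteIfExists` shape is that teardown can run safely even if an earlier cleanup (or a helper like deleteStorageContainer()) already removed the container.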
| try (Storage storage = GoogleStorageTestModule.createStorageForTests(gcsResource.getUrl())) { | ||
| // Delete all blobs under the deep-storage prefix for this test run. | ||
| storage.list(gcsResource.getBucket()).iterateAll() | ||
| .forEach(blob -> blob.delete()); | ||
| } |
AbstractGcsInputSourceParallelIndexTest uses gcsResource.deletePrefixFolderFromGcs(dataSource); instead of this. I guess that is incorrect/insufficient since it would end up deleting only files for the last test datasource.
Thinking about it though, I think we might even skip this deletion altogether since the cluster itself (along with the GCS testcontainer) will be torn down after this anyway.
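To make the concern concrete: a prefix-scoped delete only matches objects under that one datasource's prefix, so blobs written by earlier test datasources in the same run would be left behind. A toy sketch over plain strings (no GCS client involved; names are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;

public class PrefixDeleteDemo {
    // Returns the blob names a prefix-scoped delete would remove.
    static List<String> matching(List<String> blobs, String prefix) {
        return blobs.stream()
                    .filter(name -> name.startsWith(prefix))
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> blobs = List.of(
            "druid/segments/wikipedia_first/0/index.zip",
            "druid/segments/wikipedia_second/0/index.zip");
        // Deleting by only the last datasource's prefix misses the first one.
        System.out.println(matching(blobs, "druid/segments/wikipedia_second"));
        // only the second datasource's blob matches
    }
}
```

Listing the whole bucket and deleting every blob, as the quoted code does, avoids that gap, though as noted it may be moot if the testcontainer is torn down anyway.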
| // MinIO resource: configures S3/MinIO as deep storage (druid.storage.type=s3, etc.). | ||
| // Adding it after the HDFS resource ensures the S3 deep-storage settings win. | ||
| // MinIOStorageResource.onStarted() registers S3StorageDruidModule automatically. |
| public class HdfsStorageResource implements EmbeddedResource | ||
| { | ||
| private final boolean configureAsDeepStorage; | ||
| private MiniDFSCluster miniDFSCluster; |
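For reference, when `configureAsDeepStorage` is true, the resource presumably ends up contributing properties along these lines (the storageDirectory value below is illustrative; the real path would come from the MiniDFSCluster's URI at runtime):

```properties
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://localhost:9000/druid/segments
```

This is the setting that the Azure/MinIO resources added after it then override with their own `druid.storage.type`, which is why the ordering comments above matter.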
force-pushed from 3c2491f to eb532b5
force-pushed from eb532b5 to 2cc8672
This PR seems broken after #19143 (some dependency-related problem, since it works again if I remove the Iceberg stuff); will try to sort it out later.
This PR migrates the native batch HDFS integration tests to embedded-tests. They are different enough from the `AbstractCloudInputSourceParallelIndexTest` tests to be annoying (path instead of uri, etc.), so they get their own base class, but at least they share a template. It uses `MiniDFSCluster` for the HDFS server, so we don't need a whole Docker Hadoop cluster like the old ITs did.
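On the path-versus-uri difference: the HDFS input source is addressed by `paths`, while the cloud input sources are addressed by `uris` (or prefixes/objects), e.g. (field values here are illustrative):

```json
{
  "hdfsExample": { "type": "hdfs", "paths": "hdfs://localhost:9000/test-data/wikipedia.json" },
  "s3Example": { "type": "s3", "uris": ["s3://test-bucket/wikipedia.json"] }
}
```

That shape difference is what keeps these tests out of the cloud input source template's parameterization and motivates the separate base class.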