tigrulya-exe

This PR adds support for HDFS as a StorageBackend implementation. It also provides Kerberos authentication through the use of a provided keytab and supports asynchronous metric collection based on HDFS client file system statistics.

Users can provide HDFS client configuration in two ways: either by using traditional XML files, specifying their location in the hdfs.core-site.path and hdfs.hdfs-site.path options, or by passing the configuration options as regular Kafka options with the hdfs.conf. prefix.
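A minimal sketch of how the second option could work, assuming the prefixed options arrive as a plain config map and the `hdfs.conf.` prefix is stripped before the keys reach the Hadoop client. The class and method names here are hypothetical and are not taken from this PR; `fs.defaultFS` and `dfs.replication` are ordinary Hadoop keys used only as sample values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not the code from this PR.
public class HdfsConfPrefixSketch {
    // Prefix named in the PR description.
    static final String HDFS_CONF_PREFIX = "hdfs.conf.";

    /**
     * Collects every entry whose key starts with "hdfs.conf." and returns it
     * with the prefix stripped, so the remainder can be handed to the HDFS
     * client as a regular Hadoop option. Other keys are ignored.
     */
    public static Map<String, String> extractHadoopOverrides(Map<String, ?> configs) {
        Map<String, String> overrides = new HashMap<>();
        for (Map.Entry<String, ?> entry : configs.entrySet()) {
            String key = entry.getKey();
            boolean hasSuffix = key.startsWith(HDFS_CONF_PREFIX)
                    && key.length() > HDFS_CONF_PREFIX.length();
            if (hasSuffix && entry.getValue() != null) {
                overrides.put(
                        key.substring(HDFS_CONF_PREFIX.length()),
                        String.valueOf(entry.getValue()));
            }
        }
        return overrides;
    }

    public static void main(String[] args) {
        Map<String, Object> configs = new HashMap<>();
        // Option 1: point at an XML file (sample path, assumed).
        configs.put("hdfs.core-site.path", "/etc/hadoop/conf/core-site.xml");
        // Option 2: pass Hadoop options inline with the hdfs.conf. prefix.
        configs.put("hdfs.conf.fs.defaultFS", "hdfs://namenode:8020");
        configs.put("hdfs.conf.dfs.replication", 2);

        // Only the two prefixed entries survive, keyed by their Hadoop names.
        System.out.println(extractHadoopOverrides(configs));
    }
}
```

In a real backend the resulting map would presumably be applied to a Hadoop `Configuration` object after any XML files are loaded; that ordering is an assumption here, not something stated in this PR.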

@tigrulya-exe tigrulya-exe requested a review from a team as a code owner August 27, 2024 14:47
@jeqo
Contributor

jeqo commented Sep 13, 2024

Thanks @tigrulya-exe! This looks like a great addition, with quite complete coverage of the storage back-end. However, I'm hesitant to move forward with the review, as I lack the HDFS experience to be useful on anything apart from the API usage.
I'd like to leave this PR open in the meantime to gather feedback and let others chime in on how to proceed with adding a new back-end.

There is also still some work on the project that we would like to prioritize before onboarding a new back-end: preparing for Tiered Storage becoming prod-ready in 3.9 or later, adding docs and a release process, etc.

One alternative while this is open for discussion is to point to your fork (or a separate repo with just HDFS) from our README, to let users know there's an HDFS implementation.

Let me know wdyt, and thanks again for your contribution!

@tigrulya-exe
Author

@jeqo Hi! Thank you for the feedback! I think it's a nice idea to point to our fork with the HDFS storage implementation in your README while this PR is open for discussion :) I don't think we need to create a separate repository just for HDFS, as it could complicate porting future features from the main repository.

@tigrulya-exe
Author

@jeqo Hi! Eventually, we decided to move our implementation of the storage backend to a separate repository. However, we discovered that there are no publicly available Maven repositories containing your jars. Could you please publish them to a public Maven repository, so that we and other potential developers of custom storage backends can use them without having to build the core project locally?

We would also be grateful if you published the testFixtures jar of the :storage:core module, so we can use BaseStorageTest in our tests.

@jeqo
Contributor

jeqo commented Mar 6, 2025

@tigrulya-exe thanks for the update! Yes, I'm working on this. Could you check whether the snapshot artifacts are available to you? For example, this is the JAR for test-fixtures: https://oss.sonatype.org/service/local/repositories/snapshots/content/io/aiven/tiered-storage-for-apache-kafka-storage-core/0.0.1-SNAPSHOT/tiered-storage-for-apache-kafka-storage-core-0.0.1-20250306.172800-8-test-fixtures.jar

@tigrulya-exe
Author

@jeqo Hi! Thanks for the quick reply! I deleted the locally built core project artifacts from the local Gradle cache and added https://oss.sonatype.org/service/local/repositories/snapshots/content as a Maven repository to our project, and everything seems to work, including the tests inherited from BaseStorageTest. Thank you!
