Add HDFS StorageBackend implementation #583
Thanks @tigrulya-exe! This looks like a great addition and quite complete coverage of the storage back-end. However, I'm hesitant to move forward on the review as I lack the HDFS experience to be useful on anything apart from the API usage. There is still some work on the project we would like to prioritize before onboarding a new back-end as well: preparing for Tiered Storage becoming prod-ready in 3.9 or later, adding docs and a release process, etc. One alternative while this is open for discussion is to point to your fork (or a separate repo with just HDFS) from our README to let users know there's an HDFS implementation. Let me know wdyt, and thanks again for your contribution!
@jeqo Hi! Thank you for the feedback! I think it's a nice idea to point to our fork with the HDFS storage implementation in your README while this PR is open for discussion :) I don't think we need to create a separate repository just for HDFS, as that could complicate porting future features from the main repository.
@jeqo Hi! Eventually, we decided to move our implementation of the storage backend to a separate repository. However, we discovered that there were no publicly available Maven repositories containing your jars. Could you please publish them in one of the Maven artifactories so we and other potential developers of custom storage backends can use them without having to build the core project locally? We will also be grateful if you publish the |
@tigrulya-exe thanks for the update! Yes, I'm working on this. Could you validate whether the snapshot artifacts are available for you? E.g. this is the JAR for test-fixtures: https://oss.sonatype.org/service/local/repositories/snapshots/content/io/aiven/tiered-storage-for-apache-kafka-storage-core/0.0.1-SNAPSHOT/tiered-storage-for-apache-kafka-storage-core-0.0.1-20250306.172800-8-test-fixtures.jar
@jeqo Hi! Thanks for the quick reply! I deleted locally built core project artifacts from the local Gradle cache and added |
This PR adds support for HDFS as a StorageBackend implementation. It also provides Kerberos authentication through the use of a provided keytab and supports asynchronous metric collection based on HDFS client file system statistics.
Users can provide HDFS client configuration in two ways: either by using traditional XML files, specifying their location in the `hdfs.core-site.path` and `hdfs.hdfs-site.path` options, or by passing the configuration options as regular Kafka options with the `hdfs.conf.` prefix.
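
To illustrate the second option, here is a minimal sketch of how `hdfs.conf.`-prefixed options could be separated from the rest of the Kafka options and handed to the HDFS client with the prefix stripped. This is not the code from this PR: the class `HdfsConfSketch`, the method `extractHdfsConf`, the file path, and the NameNode address are all hypothetical and used for illustration only. `fs.defaultFS` and `dfs.replication` are standard Hadoop keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the PR: shows one way the "hdfs.conf." prefix could be handled.
public class HdfsConfSketch {
    // Prefix named in the PR description.
    static final String HDFS_CONF_PREFIX = "hdfs.conf.";

    // Returns only the prefixed entries, with the prefix removed from each key.
    static Map<String, String> extractHdfsConf(Map<String, String> kafkaOptions) {
        Map<String, String> hadoopOptions = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : kafkaOptions.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(HDFS_CONF_PREFIX)) {
                hadoopOptions.put(key.substring(HDFS_CONF_PREFIX.length()), entry.getValue());
            }
        }
        return hadoopOptions;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        // Option 1: point at an XML file (path is an assumed example value).
        options.put("hdfs.core-site.path", "/etc/hadoop/conf/core-site.xml");
        // Option 2: pass Hadoop settings inline (host and port are assumed example values).
        options.put("hdfs.conf.fs.defaultFS", "hdfs://namenode:8020");
        options.put("hdfs.conf.dfs.replication", "3");

        // Only the two inline Hadoop settings remain, keyed without the prefix.
        System.out.println(extractHdfsConf(options));
    }
}
```

In a real backend, the resulting map would presumably be applied to a Hadoop `Configuration` object; how this PR merges it with values loaded from the XML files is not stated here.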