Skip to content

ZOOKEEPER-4849: Option to Provide Custom X509 Implementation of QuorumAuthServer and QuorumAuthLearner #2269

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

purushah
Copy link
Contributor

… of QuorumAuthServer and QuorumAuthLearner

@purushah purushah changed the title ZooKeeperZOOKEEPER-4849: Option to Provide Custom X509 Implementation… ZOOKEEPER-4849: Option to Provide Custom X509 Implementation of QuorumAuthServer and QuorumAuthLearner Jun 17, 2025
@purushah
Copy link
Contributor Author

purushah commented Jun 30, 2025

@eolivelli @kezhuw @anmolnar could you please review? This change implements our paranoid security recommendation: each ZooKeeper quorum member must present a distinct mTLS certificate, validated against its CNAME. The patch adds the ability to configure authentication for both quorum servers and learners—very similar to the work tracked in https://issues.apache.org/jira/browse/ZOOKEEPER-2123

@kezhuw kezhuw self-requested a review July 3, 2025 08:18
Copy link
Member

@kezhuw kezhuw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the implementation of ZOOKEEPER-2123, we now have the option to implement a custom X509 AuthenticationProvider for client-server authentication. We also need a similar option for quorum authentication.

I think X509AuthenticationProvider exists mainly for integration with ACL. Regarding ssl, it has no specials(it do have some customizations, say hostname verification, but that does not matter), it is just a normal cert verifier.

In our cloud environment, we could restrict quorum members to those with specific certificates, allowing only those to join the quorum.

Why are you ever issue certificates for these servers or can you revoke those you want to reject ? I think you could use a dedicated ca for zookeeper quorum, and never issue certificates to non zookeeper servers.

@@ -390,6 +393,10 @@ public void parseProperties(Properties zkProp) throws IOException, ConfigExcepti
multiAddressReachabilityCheckEnabled = parseBoolean(key, value);
} else if (key.equals("oraclePath")) {
oraclePath = value;
} else if (key.equals(QuorumAuth.QUORUM_SSL_AUTHPROVIDER)) {
sslAuthServerProvider = value;
} else if (key.equals(QuorumAuth.QUORUM_SSL_LEARNER_AUTHPROVIDER)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No java system properties ? I found that most other ssl config options supports both, sslQuorumReloadCertFiles is an exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve always wondered why we need to pull the server-side settings from JVM system properties when they’re almost always defined in zoo.cfg. Supporting both adds needless complexity. It makes sense to allow client-side overrides via –D flags, but reading the server properties that way isn’t strictly necessary. In any event, I’ll go ahead and add the system-property support for completeness.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but reading the server properties that way isn’t strictly necessary

I guess ZooKeeper server was designed to own/occupy jvm.

You may have to delete these tow options from QuorumPeerConfig, otherwise it will not propagate them as system properties.

@purushah
Copy link
Contributor Author

@kezhuw thank you for your review. About using a dedicated CA: we actually use a global certificate-management system called Athenz to issue certs for all our services, including multiple ZooKeeper quorums. Spinning up a completely separate CA per quorum would be painful—every time you add or remove a ZooKeeper node you’d have to provision or revoke a cert in that dedicated CA.

Instead, you can continue using Athenz but tighten its issuance policies for your ZK quorums. For example:

  • Scoped Roles or Domains: Define an Athenz domain (e.g. zookeeper.quorum) and only allow services in that domain to get certs with a specific OU or SAN (ou=zookeeper-quorum).

  • Dynamic Membership: When a new ZK server spins up, it simply presents its Athenz role and automatically gets a cert scoped to the quorum domain—no manual CA changes.

This lets you keep a single, centrally managed CA (Athenz) while still ensuring only bona fide quorum members can join.

…mAuthServer and QuorumAuthLearner - review comments
Copy link
Member

@kezhuw kezhuw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config option does not look right to me, I left some inline comments.

I am open to export QuorumAuthServer, but still have some concerns.

every time you add or remove a ZooKeeper node you’d have to provision or revoke a cert in that dedicated CA.

Dynamic Membership: When a new ZK server spins up, it simply presents its Athenz role and automatically gets a cert scoped to the quorum domain—no manual CA changes.

We could divide membership changes.

  1. Add a new member. This requires no change. CA simply signs a new cert for the new member. Existing quorum member could verify the newly signed cert with existing trust store.
  2. Revoke a old member. ZooKeeper currently have ssl.quorum.crl to support cert revocation and sslQuorumReloadCertFiles to detect trust store changes in filesystem.

ssl.crl and ssl.quorum.crl : (Java system properties: zookeeper.ssl.crl and zookeeper.ssl.quorum.crl) New in 3.5.5: Specifies whether Certificate Revocation List is enabled in client and quorum TLS protocols. Default: false

sslQuorumReloadCertFiles : (No Java system property) New in 3.5.5, 3.6.0: Allows Quorum SSL keyStore and trustStore reloading when the certificates on the filesystem change without having to restart the ZK process. Default: false

The questions from my side:

  1. Is ssl.quorum.crl and sslQuorumReloadCertFiles enough for revocation ?
  2. If not, how QuorumAuthLearner would help in your case ?

Scoped Roles or Domains: Define an Athenz domain (e.g. zookeeper.quorum) and only allow services in that domain to get certs with a specific OU or SAN (ou=zookeeper-quorum).

  1. Does ssl.quorum.hostnameVerification already fulfill the requirement ? This require a match between zookeeper node hostname and cert san names. For example, it will reject node "node1.kafka.quorum" from joining quorum with cert san "node1.zookeeper.quorum".
  2. How about a dedicated ca(say, "ca.zookeeper.quorum") for zookeeper domain so you don't have to filter out non zookeeper nodes ? This is what I suggested in previous comment. This way, you don't have to reject node "node1.kafka.quorum" from joining quorum with cert san "node1.kafka.quorum" as cert "node1.kafka.quorum" is not signed by "ca.zookeeper.quorum".
  3. Is it worth for zookeeper to have an option (say, "zookeeper.ssl.quorum.nameFilter") to filter certs on cn/san ? I would prefer dedicated ca.

@eolivelli @anmolnar @cnauroth Any thoughts ?

authLearner = getSslQuorumAuthLearner();
} catch (Exception e) {
LOG.error(e.getMessage(), e);
throw new SaslException(e.getMessage());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be a IOException ? I guess you may have to change some signatures.

@@ -390,6 +393,10 @@ public void parseProperties(Properties zkProp) throws IOException, ConfigExcepti
multiAddressReachabilityCheckEnabled = parseBoolean(key, value);
} else if (key.equals("oraclePath")) {
oraclePath = value;
} else if (key.equals(QuorumAuth.QUORUM_SSL_AUTHPROVIDER)) {
sslAuthServerProvider = value;
} else if (key.equals(QuorumAuth.QUORUM_SSL_LEARNER_AUTHPROVIDER)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but reading the server properties that way isn’t strictly necessary

I guess ZooKeeper server was designed to own/occupy jvm.

You may have to delete these tow options from QuorumPeerConfig, otherwise it will not propagate them as system properties.

// Reflection helpers to access private methods

private QuorumAuthServer invokeGetSslQuorumAuthServer(QuorumPeer peer) throws Exception {
Method m = QuorumPeer.class.getDeclaredMethod("getSslQuorumAuthServer");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you should do here is reading the field authServer for assertion, its getter is getQuorumAuthServer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants