Skip to content

Subscription: add topic owner epoch fencing#17780

Open
VGalaxies wants to merge 1 commit into
masterfrom
subscription-topic-owner-fencing
Open

Subscription: add topic owner epoch fencing#17780
VGalaxies wants to merge 1 commit into
masterfrom
subscription-topic-owner-fencing

Conversation

@VGalaxies
Copy link
Copy Markdown
Contributor

Summary

  • Add topic-level owner epoch metadata for subscription fencing.
  • Propagate owner id and epoch from subscription consumers.
  • Reject stale owners during heartbeat, subscribe, poll, and commit.
  • Add focused tests for serialization and owner transfer fencing.

Tests

  • mvn test -pl iotdb-core/relational-grammar,iotdb-client/service-rpc,iotdb-client/subscription,iotdb-core/node-commons,iotdb-core/datanode -Dtest=SubscriptionReceiverV1Test -DfailIfNoTests=false -Dsurefire.failIfNoSpecifiedTests=false
  • mvn test -pl iotdb-client/service-rpc,iotdb-client/subscription,iotdb-core/node-commons -Dtest=TopicDeSerTest -DfailIfNoTests=false -Dsurefire.failIfNoSpecifiedTests=false

@Caideyipi
Copy link
Copy Markdown
Collaborator

Findings

  • P1: TopicMeta 的新尾部字段破坏旧格式兼容。deserialize(InputStream) 用 inputStream.available() 判断是否还有 owner
    字段,但 TopicMeta 在 snapshot 里是连续写入的;旧版本数据没有 owner flag,多个 topic 时这里会把下一个 topic
    的首字节当成 owner flag 消费,导致后续反序列化错位。ByteBuffer.hasRemaining() 在旧 procedure/plan
    列表里也有同类问题。见 TopicMeta.java#L260

(

if (inputStream.available() > 0 && ReadWriteIOUtils.readBool(inputStream)) {
)
和 TopicMetaKeeper.java#L112

(https://github.com/apache/iotdb/blob/76f2a882411ba790f7db290740f0232a1e5c4022/iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/subscription/meta/topic/TopicMetaKeeper.java#L112)。建议改成版本化或长度前缀格式,或只从
config 属性恢复 owner,不在非自描述对象尾部追加可选字段。

  • P1: owner epoch 单调性没有在真实元数据替换路径上保证。transferOwner() 只在同一个对象上检查递增,但 AlterTopicPlan /
    metadata replace 路径直接 remove + add 新 TopicMeta,新对象从 -1 初始化,所以较小 epoch 也能覆盖当前 epoch,使旧
    owner 重新合法。见 TopicMeta.java#L135

(

if (isOwnerFencingEnabled() && ownerEpoch <= this.ownerEpoch) {
)
和 SubscriptionInfo.java#L340

(https://github.com/apache/iotdb/blob/76f2a882411ba790f7db290740f0232a1e5c4022/iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/persistence/subscription/SubscriptionInfo.java#L340)。建议在
ConfigNode alter/handle meta change 时对比 existing owner epoch 并拒绝回退。

  • P2: 新增的 owner status code 没有接入客户端错误分类。1913-1916 会落到 default,变成 generic critical
    exception;heartbeat 还会把 provider 标成 unavailable,导致 stale owner 场景反复重连/重试且错误信息不稳定。见
    TSStatusCode.java#L319

(https://github.com/apache/iotdb/blob/76f2a882411ba790f7db290740f0232a1e5c4022/iotdb-client/service-rpc/src/main/java/org/apache/iotdb/rpc/TSStatusCode.java#L319)、AbstractSubscriptionProvider.java#L445

(https://github.com/apache/iotdb/blob/76f2a882411ba790f7db290740f0232a1e5c4022/iotdb-client/subscription/src/main/java/org/apache/iotdb/session/subscription/consumer/base/AbstractSubscriptionProvider.java#L445)。建议显式处理
owner fenced/required/lease expired,映射成明确的不可重试业务异常。

@VGalaxies VGalaxies force-pushed the subscription-topic-owner-fencing branch from 76f2a88 to b6086b2 Compare May 28, 2026 10:35
@VGalaxies
Copy link
Copy Markdown
Contributor Author

Thanks @Caideyipi, addressed the three findings in b6086b2.

  • P1 serialization compatibility: removed the optional owner fields from the non-self-describing TopicMeta serialization tail. Owner state is now restored from topic attributes after deserialization, so sequential TopicMeta snapshot/procedure streams are not consumed out of boundary. Added a sequential deserialization regression test.
  • P1 epoch rollback: added TopicMeta.validateOwnerProgression(...) and applied it on both ConfigNode alter-topic metadata replacement and DataNode topic-meta update handling, rejecting owner clears/rollbacks/stale same-epoch owner changes. Added ConfigNode rollback coverage and extended the network-partition old-SN test to verify stale topic meta cannot make the old owner valid again.
  • P2 client classification: mapped 1913-1917 explicitly to SubscriptionOwnerFencedException, a SubscriptionRuntimeNonCriticalException subclass, so stale-owner/business fencing errors no longer fall into the generic critical default path.

Local verification passed:

  • mvn spotless:apply -pl iotdb-client/service-rpc,iotdb-client/subscription,iotdb-core/node-commons,iotdb-core/datanode,iotdb-core/confignode -DskipTests
  • mvn test -pl iotdb-client/service-rpc,iotdb-client/subscription,iotdb-core/node-commons -Dtest=TopicDeSerTest -DfailIfNoTests=false -Dsurefire.failIfNoSpecifiedTests=false
  • mvn test -pl iotdb-client/service-rpc,iotdb-client/subscription,iotdb-core/node-commons,iotdb-core/confignode -Dtest=SubscriptionInfoTest -DfailIfNoTests=false -Dsurefire.failIfNoSpecifiedTests=false
  • mvn test -pl iotdb-core/relational-grammar,iotdb-client/service-rpc,iotdb-client/subscription,iotdb-core/node-commons,iotdb-core/datanode -Dtest=SubscriptionReceiverV1Test -DfailIfNoTests=false -Dsurefire.failIfNoSpecifiedTests=false
  • mvn test -pl iotdb-client/service-rpc,iotdb-client/subscription -Dtest=TSStatusCodeTest -DfailIfNoTests=false -Dsurefire.failIfNoSpecifiedTests=false
  • git diff --check

The Sonar duplication check is queued again on the new commit; I will follow up if the rerun still fails.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 28, 2026

Codecov Report

❌ Patch coverage is 44.23077% with 145 lines in your changes missing coverage. Please review.
✅ Project coverage is 40.93%. Comparing base (0c25e53) to head (25c34c2).
⚠️ Report is 28 commits behind head on master.

Files with missing lines Patch % Lines
.../subscription/receiver/SubscriptionReceiverV1.java 0.00% 40 Missing ⚠️
...tdb/commons/subscription/meta/topic/TopicMeta.java 81.72% 17 Missing ⚠️
...on/consumer/base/AbstractSubscriptionConsumer.java 0.00% 12 Missing ⚠️
...umer/base/AbstractSubscriptionConsumerBuilder.java 0.00% 10 Missing ⚠️
...on/consumer/base/AbstractSubscriptionProvider.java 0.00% 10 Missing ⚠️
...on/exception/SubscriptionOwnerFencedException.java 0.00% 8 Missing ⚠️
...on/consumer/tree/SubscriptionTreePullConsumer.java 0.00% 8 Missing ⚠️
...on/consumer/tree/SubscriptionTreePushConsumer.java 0.00% 8 Missing ⚠️
.../db/subscription/agent/SubscriptionTopicAgent.java 78.12% 7 Missing ⚠️
...er/table/SubscriptionTablePullConsumerBuilder.java 0.00% 6 Missing ⚠️
... and 4 more
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17780      +/-   ##
============================================
+ Coverage     40.58%   40.93%   +0.34%     
- Complexity     2575     2615      +40     
============================================
  Files          5181     5187       +6     
  Lines        350404   351646    +1242     
  Branches      44801    45029     +228     
============================================
+ Hits         142225   143948    +1723     
+ Misses       208179   207698     -481     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@jt2594838 jt2594838 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an IT to show a complete use case.

Comment on lines +202 to +209
if (!topicMetaKeeper.containsTopicMeta(topicName)) {
return RpcUtils.SUCCESS_STATUS;
}

final TopicMeta topicMeta = topicMetaKeeper.getTopicMeta(topicName);
if (!topicMeta.isOwnerFencingEnabled()) {
return RpcUtils.SUCCESS_STATUS;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to merge contains and get?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I merged the contains/get path into a single getTopicMeta(...) call and handle the missing-topic case with a null check before owner-fencing validation.

@VGalaxies VGalaxies force-pushed the subscription-topic-owner-fencing branch from b6086b2 to 25c34c2 Compare June 2, 2026 09:46
@VGalaxies
Copy link
Copy Markdown
Contributor Author

Addressed the latest review comments in 25c34c2.

  • Added IoTDBSubscriptionTopicOwnerIT as a complete local subscription use case: a stale owner (sn1/epoch 5) is fenced after the topic is owned by sn2/epoch 6, while the current owner can subscribe, poll data, commit, and unsubscribe.
  • Kept the new IT aligned with the current subscription IT convention: it is @ignore because SubscriptionConfig#getSubscriptionEnabled() is currently hard-coded false in this codebase, but the class compiles and is discovered under the with-integration-tests profile.
  • Preserved SubscriptionOwnerFencedException through subscribe redirection so the stale-owner path remains a business fencing error.
  • Merged the SubscriptionTopicAgent contains/get check into one getTopicMeta(...) null check.

Local verification:

  • mvn spotless:apply -pl integration-test -P with-integration-tests -DskipTests
  • mvn verify -DskipUTs -Dit.test=IoTDBSubscriptionTopicOwnerIT -DfailIfNoTests=false -Dfailsafe.failIfNoSpecifiedTests=false -pl integration-test -am -P with-integration-tests
  • mvn test -pl iotdb-client/service-rpc,iotdb-client/subscription -Dtest=TSStatusCodeTest -DfailIfNoTests=false -Dsurefire.failIfNoSpecifiedTests=false
  • mvn test -pl iotdb-core/relational-grammar,iotdb-client/service-rpc,iotdb-client/subscription,iotdb-core/node-commons,iotdb-core/datanode -Dtest=SubscriptionReceiverV1Test -DfailIfNoTests=false -Dsurefire.failIfNoSpecifiedTests=false
  • git diff --check

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Jun 2, 2026

Quality Gate Failed Quality Gate failed

Failed conditions
5.2% Duplication on New Code (required ≤ 5%)

See analysis details on SonarQube Cloud

Copy link
Copy Markdown
Contributor

@jt2594838 jt2594838 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need one more test to show how to update the ownership and how to show the ownership.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants