Unit tests of GRV requests #3869

@xumengpanda

When a cluster (in a real environment or in simulation) does not work as expected, it takes time to identify which component is causing the problem.

Since GRV is critical to transaction correctness and performance, we should consider adding unit tests that check its contract.

Correctness

  • The versions returned by GRV requests should increase monotonically, even under the different failure scenarios described below. A test workload can have multiple clients issue GRV requests and check that the returned versions increase monotonically both per client and across clients;
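
A minimal sketch of how such a check could look, in Python. Everything here is hypothetical: `FakeVersionSource` stands in for a real cluster, and `GrvSample`, `collect_samples`, and `find_violations` are illustrative names, not FoundationDB APIs. The sketch also assumes the contract is "a request that starts after another request has completed must not see a smaller version", and it treats equal versions as acceptable; the issue does not spell out either detail.

```python
import threading
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class GrvSample:
    client: int
    start: float   # time the request was issued
    end: float     # time the reply was received
    version: int   # read version returned


class FakeVersionSource:
    """Hypothetical stand-in for a cluster; hands out increasing versions."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0

    def get_read_version(self):
        with self._lock:
            self._version += 1
            return self._version


def collect_samples(source, n_clients, n_requests):
    """Run n_clients threads, each issuing n_requests sequential GRV calls."""
    samples = []
    lock = threading.Lock()

    def run(client_id):
        for _ in range(n_requests):
            start = time.monotonic()
            version = source.get_read_version()
            end = time.monotonic()
            with lock:
                samples.append(GrvSample(client_id, start, end, version))

    threads = [threading.Thread(target=run, args=(i,)) for i in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return samples


def find_violations(samples):
    """Return (earlier, later) pairs where `later` started strictly after
    `earlier` completed but received a smaller version."""
    by_end = sorted(samples, key=lambda s: s.end)
    by_start = sorted(samples, key=lambda s: s.start)
    violations = []
    i = 0
    max_done = None  # highest-version sample among requests already completed
    for later in by_start:
        while i < len(by_end) and by_end[i].end < later.start:
            if max_done is None or by_end[i].version > max_done.version:
                max_done = by_end[i]
            i += 1
        if max_done is not None and later.version < max_done.version:
            violations.append((max_done, later))
    return violations
```

Because each client issues its requests sequentially, the same happens-before rule covers both the per-client and the cross-client case. Requests that overlap in time are treated as concurrent and are not compared.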

Performance

  • GRV latency should be similar across clients and across proxies;
  • GRV throughput should meet the expected level;
  • GRV performance should not degrade much (the acceptable degradation will be quantified) when a partial failure happens.
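
One possible way to make "similar latency" testable is to compare each proxy's median latency against the median across all proxies. The sketch below is an illustration only: the function name `outlier_proxies`, the input shape, and the `tolerance` factor of 1.5 are assumptions, since the issue has not yet quantified what counts as acceptable.

```python
from statistics import median


def outlier_proxies(latencies_by_proxy, tolerance=1.5):
    """Return the proxies whose median GRV latency exceeds `tolerance`
    times the median of all per-proxy medians.

    latencies_by_proxy: dict mapping a proxy name to a list of latency
    samples (any consistent unit). Proxies with no samples are skipped.
    `tolerance` is a placeholder threshold, not a value from the issue.
    """
    medians = {p: median(v) for p, v in latencies_by_proxy.items() if v}
    if not medians:
        return []
    baseline = median(medians.values())
    return sorted(p for p, m in medians.items() if m > tolerance * baseline)
```

The same function can be run on samples taken before and during an injected partial failure to see which proxies fall outside the tolerance.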

Partial failure: Failure that does not trigger master recovery.

  • The network between a proxy and the master or a resolver is slow, so the latency on these links is higher;
  • A proxy has a noisy neighbor and gets less CPU, cache, and memory bandwidth.

If only one proxy has a partial failure, an ideal system should redirect traffic to the other healthy proxies. The GRV latency should not degrade much, and the GRV throughput should decrease only in proportion to the number of degraded proxies.
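
The proportionality expectation can be written as a simple bound. The sketch below assumes that load is spread evenly across proxies and that, in the worst case, a degraded proxy contributes nothing; both are assumptions for illustration, as are the function names and the `slack` parameter.

```python
def throughput_floor(baseline, total_proxies, degraded_proxies, slack=0.1):
    """Lowest acceptable GRV throughput during a partial failure.

    baseline: throughput measured with all proxies healthy.
    slack: extra fractional allowance below the proportional bound
           (a placeholder value, not one taken from the issue).
    """
    if total_proxies <= 0 or not 0 <= degraded_proxies <= total_proxies:
        raise ValueError("invalid proxy counts")
    healthy_fraction = 1 - degraded_proxies / total_proxies
    return baseline * healthy_fraction * (1 - slack)


def meets_floor(observed, baseline, total_proxies, degraded_proxies, slack=0.1):
    """True if the observed throughput is at or above the floor."""
    return observed >= throughput_floor(
        baseline, total_proxies, degraded_proxies, slack
    )
```

For example, with four proxies, one of them degraded, and no slack, the floor is three quarters of the baseline throughput.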

This is orthogonal to the failure monitoring project.
This issue focuses on testing and understanding whether the GRV contract is upheld and how the system's GRV requests react to failures.

cc. @sfc-gh-kmakino @sears @yliucode

Labels: testing (simulation, real cluster, and unit tests)
