Unit tests of GRV requests

When a cluster (in a real environment or in simulation) does not work as expected, it takes time to identify which component is causing the problem.

Since GRV is critical to transaction correctness and performance, we should consider adding unit tests to check its contract.

**Correctness**
- [ ] GRVs should monotonically increase, even under different failure scenarios (described below). A test workload can have multiple clients issue GRV requests and check that the returned versions increase monotonically, both per client and across clients.
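
As a rough illustration of the check such a workload could run, the sketch below scans recorded GRV observations and reports every pair where a request issued after another request's reply nevertheless received a smaller version. The `GrvObservation` record and `find_grv_violations` function are hypothetical names, not part of any existing test framework. The sketch also assumes that equal versions are acceptable (for example, two requests served from the same batch); switch to a strict comparison if the contract requires strictly increasing versions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GrvObservation:
    """One GRV request as seen by a client (hypothetical record layout)."""
    client: str
    issued_at: float   # time the request was sent
    replied_at: float  # time the reply was received
    version: int       # read version returned


def find_grv_violations(
    observations: List[GrvObservation],
) -> List[Tuple[GrvObservation, GrvObservation]]:
    """Return (earlier, later) pairs where `later` was issued strictly after
    `earlier` was answered, yet received a smaller version.

    The check ignores which client issued each request, so it covers both
    the per-client and the cross-client case.
    """
    violations = []
    by_reply = sorted(observations, key=lambda o: o.replied_at)
    for later in observations:
        for earlier in by_reply:
            # Requests that overlap in time are unordered; stop at the first
            # reply that did not arrive before `later` was issued.
            if earlier.replied_at >= later.issued_at:
                break
            if later.version < earlier.version:
                violations.append((earlier, later))
    return violations
```

The scan is quadratic in the number of observations, which should be fine for a unit test; a larger workload could instead track the running maximum version over replies sorted by time.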

**Performance**
- [ ] GRV latency should be similar across clients and across proxies;
- [ ] GRV throughput should meet the expected level;
- [ ] GRV performance should not degrade much (the acceptable degradation is to be quantified) when a partial failure happens.
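
One possible way to make "similar latency" testable is sketched below: compute the median latency per (client, proxy) pair and flag pairs whose median exceeds a multiple of the typical pair median. The function name `latency_outliers` and the `tolerance` default of 1.5 are assumptions chosen for illustration; the issue does not define the threshold.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple


def latency_outliers(
    samples: Iterable[Tuple[str, str, float]],
    tolerance: float = 1.5,  # hypothetical threshold, to be tuned
) -> Dict[Tuple[str, str], float]:
    """Given (client, proxy, latency_seconds) samples, return the pairs whose
    median latency exceeds `tolerance` times the median of all pair medians.
    """
    per_pair = defaultdict(list)
    for client, proxy, latency in samples:
        per_pair[(client, proxy)].append(latency)

    medians = {pair: median(values) for pair, values in per_pair.items()}
    if not medians:
        return {}

    baseline = median(medians.values())
    return {pair: m for pair, m in medians.items() if m > tolerance * baseline}
```

A test would then assert that the returned dictionary is empty in the healthy case. Medians are used here so that a few slow requests do not dominate the comparison.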

**Partial failure**: A failure that does not trigger master recovery.
- The network between a proxy and the master or a resolver is slow, so latency on these links is higher;
- A proxy has a noisy neighbor and gets fewer CPU, cache, and memory-bandwidth resources.

If only one proxy suffers a partial failure, an ideal system should redirect traffic to the other, healthy proxies. GRV latency should not degrade much, and GRV throughput should decrease only in proportion to the number of degraded proxies.
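
The proportionality expectation can be turned into a simple pass/fail bound, as in the sketch below. It assumes, conservatively, that a degraded proxy contributes no throughput at all, so the floor is the healthy fraction of the baseline. The function names and the `slack` allowance are hypothetical.

```python
def min_expected_throughput(
    baseline: float,
    total_proxies: int,
    degraded_proxies: int,
    slack: float = 0.1,  # hypothetical extra allowance for measurement noise
) -> float:
    """Lowest acceptable GRV throughput when `degraded_proxies` out of
    `total_proxies` are partially failed, assuming each degraded proxy
    contributes nothing."""
    if total_proxies <= 0 or not 0 <= degraded_proxies <= total_proxies:
        raise ValueError("invalid proxy counts")
    healthy_fraction = (total_proxies - degraded_proxies) / total_proxies
    return baseline * healthy_fraction * (1 - slack)


def throughput_ok(
    measured: float,
    baseline: float,
    total_proxies: int,
    degraded_proxies: int,
    slack: float = 0.1,
) -> bool:
    """True if the measured throughput stays at or above the floor."""
    floor = min_expected_throughput(baseline, total_proxies, degraded_proxies, slack)
    return measured >= floor
```

For example, with 4 proxies, 1 of them degraded, and a healthy baseline of 1000 requests per second, the floor without slack is 750 requests per second.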

**This is orthogonal to the failure monitoring project**
This issue focuses on testing and understanding whether the GRV contract is upheld and how the system's GRV requests react to failures.

cc. @sfc-gh-kmakino @sears @yliucode 


Unit tests of GRV requests #3869
