refactor(event cache): consolidate logic around retrieving the latest gap #4733

bnjbvr · 2025-02-27T16:35:32Z

This sits on top of #4731, which could be reviewed independently or as part of this PR (since it's only a single commit).

Before this PR, the logic when back-paginating is the following:

(1) try to back-paginate from the store
- if there's a gap in the chunk, indicate so to the caller
- load a chunk, if it's a gap, indicate so to the caller (if it's an events chunk, return it, and we're done)
- if we've reached the end of the timeline, and haven't waited for a pagination token, maybe we should wait for one
(2) then if we had a gap, we do the following:
- look at the in-memory linked chunk and try to find a gap
- depending on whether we found one or not, wait for one from sync

One can see that the gap is looked for in multiple places. Also, each of the major steps involve taking and releasing the state lock, which means that in between, some other things can happen to the room event cache's state. In particular, it can be shrunk after step (1) but before step (2), confusing the code a lot. In particular, the test that got transformed into a regression test by enabling the event cache's storage showed the following:

step (1) happens, there are no events: load_more_events_backwards() returns "Gap", meaning "wait for an initial prev-batch token" in this case.
the linked chunk got shrunk because a limited sync was observed by the event cache,
step (2) observes there's no gap in the in-memory linked chunk but there are events: it concludes we've hit the end of the timeline, while we haven't.

The key here is that step (2) should've called load_more_event_backwards() again, to properly handle the consequences of shrinking (either prepend a new chunk of events, or load a latest gap).

This PR consolidates all this logic, so load_more_event_backwards() outcome gets more information:

a gap now includes the prev-batch token, which is passed to the networking step.
the outcome may also be "wait for an initial prev-batch token, because i've never seen one". Hopefully the logic in conclude_load_more_for_fully_loaded_chunk is sufficiently documented with the code comment.

When the outcome is WaitForInitialPrevToken, we do the race between waiting 3 seconds or getting a prev-batch token from sync, and restart by calling the load_more_event_backwards() again. This properly handles a linked chunk that's shrunk when the state lock was released, fixing the regression test.

As a result, get_or_wait_for_token becomes useless, as it was only used in tests, and its meaning wasn't quite correct in the presence of the event cache, because of lazy-loading. I've removed it; first I started to port the tests, but they didn't make a lot of sense anymore (the waiting doesn't happen in load_more_event_backwards(), so I would've had to call paginate_backwards_impl, turning those "unit" tests into full integration tests — which I didn't want to). All this code is heavily tested in event cache integration tests, or indirectly by the timeline tests as well, so I'm confident it's safe to remove these tests.

Part of #3280.

codecov · 2025-02-27T16:55:38Z

Codecov Report

Attention: Patch coverage is 95.91837% with 2 lines in your changes missing coverage. Please review.

Project coverage is 86.14%. Comparing base (55143e1) to head (32e3975).
Report is 3 commits behind head on main.

Files with missing lines	Patch %	Lines
crates/matrix-sdk/src/event_cache/room/mod.rs	90.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4733      +/-   ##
==========================================
- Coverage   86.15%   86.14%   -0.02%     
==========================================
  Files         291      291              
  Lines       34287    34276      -11     
==========================================
- Hits        29539    29526      -13     
- Misses       4748     4750       +2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Hywan

Okay. So. Those are relatively important changes and I've no comment about it: not because I don't understand it, but because I think it's more robust to put things this way, and the changes make a lot of sense to me. Thanks.

The removal of get_or_wait_for_token is welcomed, embedding the logic inside LoadMoreEventsBackwardsOutcome is a nice idea.

The new loop in pagination that takes a new (write) lock guard for every iteration makes it more robust again state changes during calls to load_more_events_backwards(). Passing the prev_token directly from LoadMoreEventsBackwardsOutcome (so from what we have in the store!) to paginate_backwards_with_network doesn't solve the problem of concurrent paginations, but it does solve a race about the prev_token. All in all, it's easier to think about this code and its flow.

Thank you.

…s gap token

Also the tests, they were not quite useful to port to the new mechanism because they made little sense.

bnjbvr requested a review from Hywan February 27, 2025 16:35

bnjbvr requested a review from a team as a code owner February 27, 2025 16:35

bnjbvr force-pushed the bnjbvr/simplify-pagination-conditions branch from bb3825e to 6f1a9c5 Compare February 27, 2025 16:40

Hywan approved these changes Feb 28, 2025

View reviewed changes

bnjbvr force-pushed the bnjbvr/simplify-pagination-conditions branch from 6f1a9c5 to e0b9627 Compare February 28, 2025 11:07

bnjbvr enabled auto-merge (rebase) February 28, 2025 11:07

bnjbvr added 3 commits February 28, 2025 12:46

test(event cache): turn test into failing regression test

6c28719

refactor(event cache): consolidate logic around returning the previou…

1dcd754

…s gap token

refactor(event cache): get rid of get_or_wait_for_token

32e3975

Also the tests, they were not quite useful to port to the new mechanism because they made little sense.

bnjbvr force-pushed the bnjbvr/simplify-pagination-conditions branch from e0b9627 to 32e3975 Compare February 28, 2025 11:46

bnjbvr merged commit f7297ed into main Feb 28, 2025
42 checks passed

bnjbvr deleted the bnjbvr/simplify-pagination-conditions branch February 28, 2025 12:00

bitfriend mentioned this pull request Feb 28, 2025

Simplified sliding sync acterglobal/a3#2522

Draft

bnjbvr mentioned this pull request Mar 19, 2025

Sometimes Timeline::paginate_backward() returns true when it is not at the start of the timeline #4494

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor(event cache): consolidate logic around retrieving the latest gap #4733

refactor(event cache): consolidate logic around retrieving the latest gap #4733

Uh oh!

bnjbvr commented Feb 27, 2025

Uh oh!

codecov bot commented Feb 27, 2025 •

edited

Loading

Uh oh!

Hywan left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

refactor(event cache): consolidate logic around retrieving the latest gap #4733

refactor(event cache): consolidate logic around retrieving the latest gap #4733

Uh oh!

Conversation

bnjbvr commented Feb 27, 2025

Uh oh!

codecov bot commented Feb 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Hywan left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

codecov bot commented Feb 27, 2025 •

edited

Loading

Hywan left a comment •

edited

Loading