remove `CachingStoreManager` from `factor-key-value` #2995

dicej · 2025-01-27T21:39:47Z

The semantic (non-)guarantees for wasi-keyvalue are still [under discussion](WebAssembly/wasi-keyvalue#56), but meanwhile the behavior of Spin's write-behind cache has caused [some headaches](spinframework#2952), so I'm removing it until we have more clarity on what's allowed and what's disallowed by the proposed standard.

The original motivation behind `CachingStoreManager` was to reflect the anticipated behavior of an eventually-consistent, low-latency, cloud-based distributed store and, per [Hyrum's Law](https://www.hyrumslaw.com/), to help app developers avoid depending on the behavior of a local, centralized store that would not match that of a distributed store. However, the write-behind caching approach interacts poorly with the lazy connection establishment that some `StoreManager` implementations use, so writes appear to succeed even when the connection fails.
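
To make the failure mode concrete, here is a minimal synchronous sketch. It is not Spin's actual code: `Backend`, `LazyBackend`, and `WriteBehind` are hypothetical names invented for this illustration, and the real `StoreManager` implementations are asynchronous.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a key-value backend (not Spin's trait).
trait Backend {
    fn set(&mut self, key: &str, value: &[u8]) -> Result<(), String>;
}

/// Backend that only tries to connect when the first operation arrives.
struct LazyBackend {
    reachable: bool,
}

impl Backend for LazyBackend {
    fn set(&mut self, _key: &str, _value: &[u8]) -> Result<(), String> {
        // The connection attempt happens here, not at construction time.
        if self.reachable {
            Ok(())
        } else {
            Err("connection failed".to_string())
        }
    }
}

/// Write-behind cache: acknowledges the write first, forwards it later.
struct WriteBehind<B: Backend> {
    inner: B,
    cache: HashMap<String, Vec<u8>>,
    pending: Vec<(String, Vec<u8>)>,
}

impl<B: Backend> WriteBehind<B> {
    fn new(inner: B) -> Self {
        WriteBehind { inner, cache: HashMap::new(), pending: Vec::new() }
    }

    /// Always returns Ok: the backend has not been contacted yet.
    fn set(&mut self, key: &str, value: &[u8]) -> Result<(), String> {
        self.cache.insert(key.to_string(), value.to_vec());
        self.pending.push((key.to_string(), value.to_vec()));
        Ok(())
    }

    /// Reads are served from the cache, so the caller sees its own write.
    fn get(&self, key: &str) -> Option<&Vec<u8>> {
        self.cache.get(key)
    }

    /// Runs after the caller was already told "success"; any error
    /// surfaces here, where the original caller can no longer see it.
    fn flush(&mut self) -> Vec<String> {
        let mut errors = Vec::new();
        for (key, value) in self.pending.drain(..) {
            if let Err(e) = self.inner.set(&key, &value) {
                errors.push(e);
            }
        }
        errors
    }
}
```

With `LazyBackend { reachable: false }`, `set` returns `Ok(())` and `get` returns the value, yet `flush` is the first point at which the connection error becomes visible.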

Subsequent discussion regarding the above issue arrived at a consensus that we should not consider a write to have succeeded until and unless we've successfully connected to and received a write confirmation from at least one replica in a distributed system. That is, rather than the replication factor (RF) of 0 that we've effectively been providing up to this point, we should provide RF=1. The latter still provides low-latency performance when the nearest replica is reasonably close, but improves upon RF=0 in that it shifts responsibility for the write from Spin to the backing store before "success" is returned to the application.

Note that RF=1 (and indeed anything less than RF=ALL) cannot guarantee that the write will be seen immediately (or, in the extreme case of an unrecoverable failure, at all) by readers connected to other replicas. Applications requiring a stronger consistency model should use an ACID-style backing store rather than an eventually consistent one.
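
A toy model of that visibility gap, assuming a hypothetical in-memory `Cluster` type where replication is an explicit step (real stores replicate in the background, and nothing here reflects any particular backend):

```rust
use std::collections::HashMap;

/// Hypothetical toy cluster: each replica is just an in-memory map.
struct Cluster {
    replicas: Vec<HashMap<String, String>>,
    /// Writes confirmed by one replica but not yet copied to the others.
    backlog: Vec<(usize, String, String)>,
}

impl Cluster {
    fn new(replica_count: usize) -> Self {
        Cluster {
            replicas: vec![HashMap::new(); replica_count],
            backlog: Vec::new(),
        }
    }

    /// RF=1 write: returns as soon as the chosen replica has stored the value.
    fn set(&mut self, replica: usize, key: &str, value: &str) {
        self.replicas[replica].insert(key.to_string(), value.to_string());
        self.backlog.push((replica, key.to_string(), value.to_string()));
    }

    /// Read from one specific replica.
    fn get(&self, replica: usize, key: &str) -> Option<&String> {
        self.replicas[replica].get(key)
    }

    /// Copies every backlogged write to the replicas that did not originate it.
    fn replicate(&mut self) {
        for (origin, key, value) in self.backlog.drain(..) {
            for (index, replica) in self.replicas.iter_mut().enumerate() {
                if index != origin {
                    replica.insert(key.clone(), value.clone());
                }
            }
        }
    }
}
```

After `set(0, "k", "v")`, a reader on replica 0 sees the value while a reader on replica 1 gets `None` until `replicate` runs. If replica 0 were lost before that step, the write would never become visible elsewhere.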

Signed-off-by: Joel Dice <[email protected]>

lann · 2025-01-27T21:43:06Z

How hard would it be to change the cache from write-behind to write-through (instead of removing it)?

dicej · 2025-01-27T21:50:43Z

> How hard would it be to change the cache from write-behind to write-through?

Probably not hard. The question is: do we want caching at all, and if so, how much?

We can always bring it back if/when there's a clear need, but for now the simplest thing seems to be to just remove it. I also wonder if specific key-value implementations might want to enforce their own caching rules.
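
For reference, a write-through variant could look roughly like the sketch below. This is only an illustration under simplifying assumptions: `Backend`, `MemBackend`, and `WriteThrough` are hypothetical synchronous types, not the existing Spin traits, and cache invalidation for writes made by other clients is not handled.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a key-value backend (not Spin's trait).
trait Backend {
    fn get(&mut self, key: &str) -> Result<Option<Vec<u8>>, String>;
    fn set(&mut self, key: &str, value: &[u8]) -> Result<(), String>;
}

/// In-memory backend whose "connection" can be made to fail.
struct MemBackend {
    reachable: bool,
    data: HashMap<String, Vec<u8>>,
}

impl MemBackend {
    fn new(reachable: bool) -> Self {
        MemBackend { reachable, data: HashMap::new() }
    }

    fn check(&self) -> Result<(), String> {
        if self.reachable { Ok(()) } else { Err("connection failed".to_string()) }
    }
}

impl Backend for MemBackend {
    fn get(&mut self, key: &str) -> Result<Option<Vec<u8>>, String> {
        self.check()?;
        Ok(self.data.get(key).cloned())
    }

    fn set(&mut self, key: &str, value: &[u8]) -> Result<(), String> {
        self.check()?;
        self.data.insert(key.to_string(), value.to_vec());
        Ok(())
    }
}

/// Write-through cache: a write is acknowledged only after the backend confirms it.
struct WriteThrough<B: Backend> {
    inner: B,
    cache: HashMap<String, Vec<u8>>,
}

impl<B: Backend> WriteThrough<B> {
    fn new(inner: B) -> Self {
        WriteThrough { inner, cache: HashMap::new() }
    }

    fn set(&mut self, key: &str, value: &[u8]) -> Result<(), String> {
        // Forward first; a backend error is returned to the caller
        // and the cache is left untouched.
        self.inner.set(key, value)?;
        self.cache.insert(key.to_string(), value.to_vec());
        Ok(())
    }

    fn get(&mut self, key: &str) -> Result<Option<Vec<u8>>, String> {
        if let Some(value) = self.cache.get(key) {
            return Ok(Some(value.clone()));
        }
        // Cache miss: ask the backend and remember the answer.
        let fetched = self.inner.get(key)?;
        if let Some(value) = &fetched {
            self.cache.insert(key.to_string(), value.clone());
        }
        Ok(fetched)
    }
}
```

With an unreachable backend, `set` returns the connection error instead of `Ok(())`. The remaining cost is that cached reads can be stale with respect to other writers, which is part of the "how much caching" question.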

lann

Fair enough. I guess if you're performance-sensitive enough for this to matter then you probably ought to avoid immediately reading your writes anyway 🤷

dicej mentioned this pull request Jan 27, 2025

Key-Value Swallows Write Errors When Backing Impl Fails #2952

Closed

lann approved these changes Jan 27, 2025

itowlson approved these changes Jan 27, 2025

dicej merged commit abba902 into spinframework:main Jan 27, 2025
17 checks passed