remove CachingStoreManager from factor-key-value #2995

Merged Jan 27, 2025 (1 commit)

dicej (Contributor) commented Jan 27, 2025

The semantic (non-)guarantees for wasi-keyvalue are still [under discussion](WebAssembly/wasi-keyvalue#56), but meanwhile the behavior of Spin's write-behind cache has caused [some headaches](spinframework#2952), so I'm removing it until we have more clarity on what's allowed and what's disallowed by the proposed standard.

The original motivation behind `CachingStoreManager` was to reflect the anticipated behavior of an eventually-consistent, low-latency, cloud-based distributed store and, per [Hyrum's Law](https://www.hyrumslaw.com/), to help app developers avoid depending on the behavior of a local, centralized store which would not match that of a distributed store. However, the write-behind caching approach interacts poorly with the lazy connection establishment which some `StoreManager` implementations use, leading writes to apparently succeed even when the connection fails.
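
To make the failure mode concrete, here is a minimal sketch of a write-behind cache in front of a lazily connecting store. This is not Spin's actual code: `LazyStore`, `WriteBehind`, and the `reachable` flag are hypothetical names used only for illustration, and the background flush is modeled as an explicit `flush` call.

```rust
use std::collections::HashMap;

/// Hypothetical backing store that only tries to connect on first real I/O.
struct LazyStore {
    reachable: bool, // stands in for "the connection attempt succeeds"
    data: HashMap<String, Vec<u8>>,
}

impl LazyStore {
    fn new(reachable: bool) -> Self {
        LazyStore { reachable, data: HashMap::new() }
    }

    fn set(&mut self, key: &str, value: &[u8]) -> Result<(), String> {
        if !self.reachable {
            return Err("connection failed".to_string());
        }
        self.data.insert(key.to_string(), value.to_vec());
        Ok(())
    }
}

/// Simplified write-behind cache: `set` only buffers, `flush` does the I/O.
struct WriteBehind {
    store: LazyStore,
    pending: Vec<(String, Vec<u8>)>,
}

impl WriteBehind {
    fn new(store: LazyStore) -> Self {
        WriteBehind { store, pending: Vec::new() }
    }

    /// Returns `Ok` before the store has been contacted at all.
    fn set(&mut self, key: &str, value: &[u8]) -> Result<(), String> {
        self.pending.push((key.to_string(), value.to_vec()));
        Ok(())
    }

    /// In a real cache this would run in the background, so the caller of
    /// `set` would never observe the error returned here.
    fn flush(&mut self) -> Result<(), String> {
        for (key, value) in self.pending.drain(..) {
            self.store.set(&key, &value)?;
        }
        Ok(())
    }
}
```

With an unreachable store, `set` reports success and only the later `flush` fails, which is the "write apparently succeeds" behavior described above.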

Subsequent discussion regarding the above issue arrived at a consensus that we should not consider a write to have succeeded until and unless we've successfully connected to, and received a write confirmation from, at least one replica in a distributed system. I.e., rather than the replication factor (RF) of 0 that we've effectively been providing up to this point, we should provide RF=1. The latter still provides low-latency performance when the nearest replica is reasonably close, but improves upon RF=0 in that it shifts responsibility for the write from Spin to the backing store prior to returning "success" to the application.
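
As a rough sketch of what RF=1 means for the caller, the function below reports success only after one replica has acknowledged the write. The `Replica` type, its `up` flag, and `put_rf1` are hypothetical names for illustration and do not correspond to any Spin or wasi-keyvalue API.

```rust
use std::collections::HashMap;

/// Hypothetical replica handle; `up` stands in for "connection succeeds".
struct Replica {
    up: bool,
    data: HashMap<String, String>,
}

impl Replica {
    fn new(up: bool) -> Self {
        Replica { up, data: HashMap::new() }
    }

    /// Returning `Ok` models the replica's write confirmation.
    fn write(&mut self, key: &str, value: &str) -> Result<(), String> {
        if !self.up {
            return Err("connection failed".to_string());
        }
        self.data.insert(key.to_string(), value.to_string());
        Ok(())
    }
}

/// RF=1: try replicas in order and return the index of the first one that
/// acknowledges the write. If none does, the caller sees an error instead
/// of a false "success".
fn put_rf1(replicas: &mut [Replica], key: &str, value: &str) -> Result<usize, String> {
    for (index, replica) in replicas.iter_mut().enumerate() {
        if replica.write(key, value).is_ok() {
            return Ok(index);
        }
    }
    Err("no replica acknowledged the write".to_string())
}
```

The point of the sketch is only the return contract: by the time `put_rf1` returns `Ok`, responsibility for the data already rests with the backing store.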

Note that RF=1 (and indeed anything less than RF=ALL) cannot guarantee that the write will be seen immediately (or, in the extreme case of an unrecoverable failure, at all) by readers connected to other replicas. Applications requiring a stronger consistency model should use an ACID-style backing store rather than an eventually consistent one.
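
The visibility caveat can be illustrated with a toy model, assuming replication to other replicas happens asynchronously after the acknowledged write. `Cluster`, `Replica`, and `replicate` are hypothetical names; real stores replicate in their own ways.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct Replica {
    data: HashMap<String, String>,
}

/// Toy cluster: writes land on replica 0 and reach the others only when
/// `replicate` runs (a stand-in for asynchronous replication).
struct Cluster {
    replicas: Vec<Replica>,
    backlog: Vec<(String, String)>,
}

impl Cluster {
    fn new(count: usize) -> Self {
        Cluster {
            replicas: (0..count).map(|_| Replica::default()).collect(),
            backlog: Vec::new(),
        }
    }

    /// RF=1 write: acknowledged by replica 0 only.
    fn put(&mut self, key: &str, value: &str) {
        self.replicas[0].data.insert(key.to_string(), value.to_string());
        self.backlog.push((key.to_string(), value.to_string()));
    }

    /// Read from one specific replica.
    fn get(&self, replica: usize, key: &str) -> Option<&String> {
        self.replicas[replica].data.get(key)
    }

    /// Propagate acknowledged writes to the remaining replicas.
    fn replicate(&mut self) {
        for (key, value) in self.backlog.drain(..) {
            for replica in self.replicas.iter_mut().skip(1) {
                replica.data.insert(key.clone(), value.clone());
            }
        }
    }
}
```

A reader attached to replica 1 sees nothing for the key until `replicate` has run; if replica 0 were lost before that point, the write would never become visible there.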

Signed-off-by: Joel Dice <[email protected]>

lann (Collaborator) commented Jan 27, 2025

How hard would it be to change the cache from write-behind to write-through (instead of removing it)?

dicej (Contributor, Author) commented Jan 27, 2025

> How hard would it be to change the cache from write-behind to write-through?

Probably not hard. The question is: do we want caching at all, and if so, how much?

We can always bring it back if/when there's a clear need, but for now the simplest thing seems to be to just remove it. I also wonder if specific key-value implementations might want to enforce their own caching rules.
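
For reference, a write-through variant might look roughly like the sketch below: the store is written first, and the cache is updated only after the store accepts the write. `Store` and `WriteThrough` are hypothetical names and this is not a proposal for the actual implementation.

```rust
use std::collections::HashMap;

/// Hypothetical backing store; `reachable` models connection success.
struct Store {
    reachable: bool,
    data: HashMap<String, String>,
}

impl Store {
    fn new(reachable: bool) -> Self {
        Store { reachable, data: HashMap::new() }
    }

    fn set(&mut self, key: &str, value: &str) -> Result<(), String> {
        if !self.reachable {
            return Err("connection failed".to_string());
        }
        self.data.insert(key.to_string(), value.to_string());
        Ok(())
    }

    fn get(&self, key: &str) -> Result<Option<String>, String> {
        if !self.reachable {
            return Err("connection failed".to_string());
        }
        Ok(self.data.get(key).cloned())
    }
}

/// Write-through cache: writes hit the store before the cache.
struct WriteThrough {
    store: Store,
    cache: HashMap<String, String>,
}

impl WriteThrough {
    fn new(store: Store) -> Self {
        WriteThrough { store, cache: HashMap::new() }
    }

    /// A store failure is returned to the caller and leaves the cache untouched.
    fn set(&mut self, key: &str, value: &str) -> Result<(), String> {
        self.store.set(key, value)?;
        self.cache.insert(key.to_string(), value.to_string());
        Ok(())
    }

    /// Serve from the cache when possible, otherwise fall back to the store.
    fn get(&mut self, key: &str) -> Result<Option<String>, String> {
        if let Some(value) = self.cache.get(key) {
            return Ok(Some(value.clone()));
        }
        let value = self.store.get(key)?;
        if let Some(found) = &value {
            self.cache.insert(key.to_string(), found.clone());
        }
        Ok(value)
    }
}
```

Compared with write-behind, every write pays the round trip to the store, so the cache only helps reads; whether that is worth keeping is the open question here.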

lann (Collaborator) left a comment

Fair enough. I guess if you're performance-sensitive enough for this to matter then you probably ought to avoid immediately reading your writes anyway 🤷

@dicej dicej merged commit abba902 into spinframework:main Jan 27, 2025
17 checks passed