One of the current issues with Hypercore is that a fork in the history is a fatal corruption of the data. This means that people's datasets can be destroyed by a botched "key move" between computers.
Another issue is that, because history cannot be rewritten, it's not currently possible to upgrade a data structure on Hypercore (such as Hyperdb or Hyperdrive). If a breaking change has to be made to the data structure, then the old hypercore has to be replaced with an entirely new hypercore.
To address these issues, @mafintosh and I have been talking about a meta "pointer structure" which provides a level of indirection between the "public URL" and the "internal identifier" of the hypercore. This would make it possible to replace a Dat dataset's internal data structures without changing the publicly-facing URL/key.
Such a data structure might look something like this:
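A minimal sketch, in TypeScript for illustration: the `key` and `seq` fields come from the description below, while the `signature` field is an assumption based on the signing step described there.

```ts
// Sketch of the pointer record (not a real wire format).
interface HypercorePointer {
  key: Uint8Array;       // public key (ID) of the hypercore currently pointed to
  seq: number;           // monotonically increasing pointer version
  signature: Uint8Array; // signature by the keypair behind the public-facing URL (assumed field)
}
```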
The `key` would provide the ID of a hypercore, while `seq` would be a monotonically increasing value. To update the pointer, the owner of the public-facing URL would publish a new signed `HypercorePointer` with a `seq` equal to the previous pointer's `seq` plus one.
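A hedged sketch of that update step, building on the interface above (the payload encoding and the `sign` callback are placeholders, not the actual encoding or signing primitive):

```ts
// Encode key + seq into a single buffer for signing (illustrative layout only).
function encodePointerPayload(key: Uint8Array, seq: number): Uint8Array {
  const out = new Uint8Array(key.length + 8);
  out.set(key, 0);
  new DataView(out.buffer).setBigUint64(key.length, BigInt(seq));
  return out;
}

// Hypothetical helper: produce the next pointer, re-targeting a new hypercore.
function updatePointer(
  prev: HypercorePointer,
  newKey: Uint8Array,
  sign: (msg: Uint8Array) => Uint8Array // stands in for the owner's signing function
): HypercorePointer {
  const seq = prev.seq + 1; // monotonically increasing, as described above
  return { key: newKey, seq, signature: sign(encodePointerPayload(newKey, seq)) };
}
```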
During the exchange for a hypercore, peers will share their latest `HypercorePointer` and resolve to sync the pointer with the highest `seq` number. (They could continue to sync previous feeds.) The hypercore pointed to would then be synced within the existing swarm & connection.
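A sketch of that "highest `seq` wins" resolution rule (signature verification is assumed to happen before this comparison):

```ts
// Given our stored pointer (if any) and one received from a peer,
// keep whichever has the higher seq.
function resolvePointer(
  ours: HypercorePointer | null,
  theirs: HypercorePointer
): HypercorePointer {
  if (ours === null || theirs.seq > ours.seq) return theirs;
  return ours;
}
```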
Implications for apps & consuming clients
The `HypercorePointer` makes it possible to change the internal dataset without changing the URL.
When this occurs, the hypercore's data would essentially be reset, and all history could be altered. This is not a trivial event; from the perspective of any consuming application, the hypercore's previous state has been completely invalidated.
If the pointer is updated to fix a fork corruption, it's likely that the application doing the fix would then try to recreate the last state on the new log. However, a pointer update will have to be viewed by applications as a total reset, since the destination state can differ arbitrarily from the previous one.
To manage this, we would most likely need to surface the `HypercorePointer` to the APIs and UIs in some way. @mafintosh explored the idea of calling the `seq` of the pointer a "major version" while the `seq` of an individual log is the "revision" or perhaps "minor version." This would mean that hypercore-based data structures are addressed by a major/minor version, such as `5.3`.
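As an illustration of that addressing scheme (the helper below is hypothetical):

```ts
// A dataset version under this scheme: the pointer's seq is the "major"
// version and the pointed-to log's seq is the "minor" version / revision.
function datasetVersion(pointer: HypercorePointer, logSeq: number): string {
  return `${pointer.seq}.${logSeq}`; // e.g. "5.3"
}
```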
The semantics of a major-version change, under this scheme, would be "this is basically a whole new dat, so clear any current indexes on it and reindex from scratch."
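In consumer code, that might look something like this (a sketch; the index API shown is an assumption, not an existing interface):

```ts
// Hypothetical consumer reaction to a pointer update: treat a major-version
// bump as a full reset and rebuild any derived indexes from scratch.
async function onPointerUpdate(
  prevMajor: number,
  next: HypercorePointer,
  index: { clear(): Promise<void>; rebuildFrom(key: Uint8Array): Promise<void> }
) {
  if (next.seq > prevMajor) {
    await index.clear();               // previous state is fully invalidated
    await index.rebuildFrom(next.key); // reindex the new log from scratch
  }
}
```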
Thoughts and discussion open!
I'm currently reviewing the Multi-writer DEP and it's a good example of a data structure that would be affected by this proposal. The vector clocks currently encode the revisions, and the `InflatedEntry` encodes the keys of the hypercores. If this proposal were put in place, I'd suggest that the `InflatedEntry` be updated to include both the key and major version of the included logs. This way, the vector clocks would remain a set of revision scalars, one per feed. Nodes would write a new `InflatedEntry` with an updated major version to react to a major-version change.
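To make the suggested shape concrete, a rough sketch (this is not the actual `InflatedEntry` encoding from the DEP, just an illustration of attaching a major version to each feed key):

```ts
// Illustrative only: each referenced feed carries its key plus the major
// version (pointer seq) it was written against, while the vector clock
// stays a flat array of revision scalars, one per feed.
interface FeedRef {
  key: Uint8Array; // hypercore key of the feed
  major: number;   // pointer seq ("major version") this entry was written against
}

interface InflatedEntrySketch {
  feeds: FeedRef[]; // one entry per known writer feed
  clock: number[];  // vector clock of revisions, one scalar per feed
}
```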
My initial reaction to this is against it; it's a bunch of implementation complexity and semantic burden, and doesn't feel like a satisfying solution to the motivating problems. For example, what if somebody copies their key and then creates two conflicting `HypercorePointer` messages (same `seq`, different keys)?
I'll need more time to think it through and write up a good reply.