Proposal & Discussion: Hypercore "major version" pointer #31

pfrazee · 2018-06-20T21:07:56Z

One of the current issues with Hypercore is that a fork in the history is a fatal corruption of the data. This means that people's datasets can be destroyed by a botched "key move" between computers.

Another issue is that, because history cannot be rewritten, it's not currently possible to upgrade a data structure built on Hypercore (such as Hyperdb or Hyperdrive). If a breaking change has to be made to the data structure, the old hypercore has to be replaced with an entirely new hypercore.

To address these issues, @mafintosh and I have been talking about a meta "pointer structure" which provides a level of indirection between the "public URL" and the "internal identifier" of the hypercore. This would make it possible to replace a Dat dataset's internal data structures without changing the publicly-facing URL/key.

Such a data structure might look something like this:

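```protobuf
syntax = "proto2";

message HypercorePointer {
  required bytes key = 1;   // ID (public key) of the hypercore currently pointed to
  required uint64 seq = 2;  // monotonically increasing; protobuf has no uint16 type
}
```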

The key would provide the ID of a hypercore, while seq would be a monotonically-increasing value. To update the pointer, the owner of the public-facing URL would publish a new signed HypercorePointer with a seq equal to the previous pointer's seq plus one.
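
A minimal Python sketch of the publisher side of this update rule, assuming the caller supplies the signing function (the proposal does not pin down a signature scheme or signed encoding here). `SignedPointer`, `encode_for_signing`, and `publish_update` are hypothetical names, and the byte layout is an illustration only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class HypercorePointer:
    key: bytes  # ID (public key) of the hypercore currently pointed to
    seq: int    # monotonically increasing pointer version


@dataclass(frozen=True)
class SignedPointer:
    pointer: HypercorePointer
    signature: bytes


def encode_for_signing(pointer: HypercorePointer) -> bytes:
    # Hypothetical byte layout used only in this sketch; a real implementation
    # would sign the protobuf encoding of HypercorePointer.
    return pointer.seq.to_bytes(8, "big") + pointer.key


def publish_update(current: HypercorePointer, new_key: bytes,
                   sign: Callable[[bytes], bytes]) -> SignedPointer:
    """Repoint the public URL at new_key, bumping seq by exactly one."""
    updated = HypercorePointer(key=new_key, seq=current.seq + 1)
    return SignedPointer(pointer=updated, signature=sign(encode_for_signing(updated)))


# Usage with a placeholder signer (a real owner would sign with their secret key).
old = HypercorePointer(key=bytes(32), seq=4)
new = publish_update(old, new_key=b"\x01" * 32, sign=lambda msg: b"fake-signature")
print(new.pointer.seq)  # 5
```

The pointer is immutable, so an update always produces a new signed message rather than mutating the old one, which mirrors publishing a fresh pointer.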

When exchanging a hypercore, peers would share their latest HypercorePointer and sync the hypercore referenced by the pointer with the highest seq. (They could continue to sync previous feeds as well.) That hypercore would then be synced within the existing swarm & connection.
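
The "highest seq wins" resolution can be sketched in a few lines of Python. This is a hypothetical helper, assuming every advertised pointer has already had its signature verified; note that it cannot meaningfully break a tie between two pointers with the same seq and different keys.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class HypercorePointer:
    key: bytes  # ID of the hypercore being pointed to
    seq: int    # pointer version; higher wins


def resolve_latest(advertised: Iterable[HypercorePointer]) -> Optional[HypercorePointer]:
    """Pick the pointer with the highest seq among those shared by peers.

    Assumes signatures were verified before this step. On a tie
    (same seq, different key) the first pointer seen is kept.
    """
    latest: Optional[HypercorePointer] = None
    for pointer in advertised:
        if latest is None or pointer.seq > latest.seq:
            latest = pointer
    return latest


peers = [HypercorePointer(b"A", 2), HypercorePointer(b"C", 3), HypercorePointer(b"B", 1)]
print(resolve_latest(peers))  # HypercorePointer(key=b'C', seq=3)
```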

Implications for apps & consuming clients

The HypercorePointer makes it possible to change the internal dataset without changing the URL.

When this occurs, the hypercore's data would essentially be reset, and all history could be altered. This is not a trivial event; from the perspective of any consuming application, the hypercore's previous state has been completely invalidated.

If the pointer is updated to fix a fork corruption, the application doing the fix would likely try to recreate the last state on the new log. However, applications will have to treat a pointer update as a total reset, since the state on the new log may differ from the previous one.

To manage this, we would most likely need to surface the HypercorePointer to the APIs and UIs in some way. @mafintosh explored the idea of calling the seq of the pointer a "major version" while the seq of an individual log is the "revision" or perhaps "minor version." This would mean that hypercore-based data structures are addressed by a major/minor version, such as 5.3.

The semantics of a major-version change, under this scheme, would be "this is basically a whole new dat, so clear any current indexes on it and reindex from scratch."
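
A hypothetical Python sketch of how an application might track a major.minor version and apply these semantics. `Version` and `AppIndex` are illustrative names, not part of any Hypercore API. Versions compare as integer pairs, so 5.10 sorts after 5.3.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True, order=True)
class Version:
    major: int  # seq of the HypercorePointer
    minor: int  # revision within the log the pointer refers to

    @classmethod
    def parse(cls, text: str) -> "Version":
        major, minor = text.split(".")
        return cls(int(major), int(minor))

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}"


@dataclass
class AppIndex:
    """Hypothetical application-side index built from a hypercore-based dataset."""
    version: Version = field(default_factory=lambda: Version(0, 0))
    entries: Dict[str, str] = field(default_factory=dict)

    def on_update(self, new: Version) -> bool:
        """Record the new version; return True if a full reindex is required."""
        reset = new.major != self.version.major
        if reset:
            # Major change: "basically a whole new dat", so drop all derived state.
            self.entries.clear()
        self.version = new
        return reset


index = AppIndex(version=Version.parse("5.3"), entries={"/hello.txt": "v1"})
print(index.on_update(Version.parse("5.4")))  # False: same major, keep the index
print(index.on_update(Version.parse("6.0")))  # True: major changed, reindex
print(index.entries)  # {}
```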

Thoughts and discussion open!

pfrazee · 2018-06-21T17:36:15Z

I'm currently reviewing the Multi-writer DEP and it's a good example of a data structure that would be affected by this proposal. The vector clocks currently encode the revisions, and the InflatedEntry encodes the keys of the hypercores. If this proposal were put in place, I'd suggest that the InflatedEntry be updated to include both the key and major version of the included logs. This way, the vector clocks would remain a set of revision scalars, one per feed. Nodes would write a new InflatedEntry with an updated major version to react to a major-version change.
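
A simplified, hypothetical Python sketch of that suggestion: each feed reference in the InflatedEntry carries a key plus a major version, while the vector clock stays a tuple of revision scalars. The field names are illustrative and do not reproduce the Multi-writer DEP's actual schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FeedRef:
    key: bytes  # public key of a writer's hypercore
    major: int  # major version (pointer seq) of that writer's log


@dataclass(frozen=True)
class InflatedEntry:
    # Ordered writer feeds; the vector clock is indexed in this order.
    feeds: Tuple[FeedRef, ...]


@dataclass(frozen=True)
class Node:
    inflated: InflatedEntry
    clock: Tuple[int, ...]  # unchanged shape: one revision scalar per feed


def bump_major(entry: InflatedEntry, key: bytes, new_major: int) -> InflatedEntry:
    """Return a new InflatedEntry recording new_major for the feed with this key."""
    return InflatedEntry(feeds=tuple(
        FeedRef(f.key, new_major) if f.key == key else f for f in entry.feeds
    ))


before = InflatedEntry(feeds=(FeedRef(b"alice", 0), FeedRef(b"bob", 2)))
after = bump_major(before, b"alice", 1)
node = Node(inflated=after, clock=(3, 17))  # revisions, same order as feeds
```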

bnewbold · 2018-06-27T19:24:01Z

My initial reaction to this is negative; it's a bunch of implementation complexity and semantic burden, and it doesn't feel like a satisfying solution to the motivating problems. For example, what if somebody copies their key and then creates two conflicting HypercorePointer messages (same seq, different keys)?
I'll need more time to think it through and write up a good reply.

pfrazee · 2018-06-27T19:28:38Z

> For example, what if somebody copies their key and then creates two conflicting HypercorePointer messages (same seq, different keys)?

You'd publish another message with a higher seq. But yes, then you basically have multiple authoritative major versions of a feed. Not ideal.
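
To make the failure mode concrete, a hypothetical Python helper that flags seqs published with more than one key. Publishing seq 4 lets resolution by highest seq converge again, but it does not erase the conflict at seq 3: peers that synced either log at major version 3 still hold divergent histories.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class HypercorePointer:
    key: bytes
    seq: int


def find_conflicts(pointers: Iterable[HypercorePointer]) -> Dict[int, Set[bytes]]:
    """Return {seq: keys} for every seq that was published with more than one key."""
    keys_by_seq: Dict[int, Set[bytes]] = defaultdict(set)
    for p in pointers:
        keys_by_seq[p.seq].add(p.key)
    return {seq: keys for seq, keys in keys_by_seq.items() if len(keys) > 1}


# Two copies of the secret key each published seq 3; the owner then publishes seq 4.
history = [
    HypercorePointer(b"A", 2),
    HypercorePointer(b"B", 3),
    HypercorePointer(b"C", 3),
    HypercorePointer(b"D", 4),
]
conflicts = find_conflicts(history)  # maps 3 to {b"B", b"C"}
```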
