Proposal & Discussion: Hypercore "major version" pointer #31

pfrazee opened this issue Jun 20, 2018 · 3 comments

@pfrazee

pfrazee commented Jun 20, 2018

One of the current issues with Hypercore is that a fork in the history is a fatal corruption of the data. This means that people's datasets can be destroyed by a botched "key move" between computers.

Another issue is that, because history cannot be rewritten, it's not currently possible to upgrade a data structure built on Hypercore (such as Hyperdb or Hyperdrive). If a breaking change has to be made to the data structure, then the old hypercore has to be replaced with an entirely new hypercore.

To counteract these issues, @mafintosh and I have been talking about a meta "pointer structure" which provides a level of indirection between the "public URL" and the "internal identifier" of the hypercore. This would make it possible to replace a Dat dataset's internal data structures without changing the publicly-facing URL/key.

Such a data structure might look something like this:

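message HypercorePointer {
  // ID of the hypercore currently pointed to
  required bytes key = 1;
  // monotonically increasing pointer version (protobuf has no uint16 type)
  required uint32 seq = 2;
}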

The key would provide the ID of a hypercore, while seq would be a monotonically-increasing value. To update the pointer, the owner of the public-facing URL would publish a new signed HypercorePointer with a seq equal to the previous pointer's seq plus one.
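As a minimal sketch of the update rule described above, the check a peer might run before accepting a new pointer could look like the following. The `is_valid_update` helper and the handling of the very first pointer are assumptions for illustration, and signature verification against the public-facing key is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HypercorePointer:
    key: bytes  # ID of the hypercore being pointed to
    seq: int    # monotonically increasing pointer sequence number


def is_valid_update(current: Optional[HypercorePointer],
                    candidate: HypercorePointer) -> bool:
    """Accept the candidate only if it directly follows the current pointer.

    Assumption: the candidate's signature has already been verified
    elsewhere; this function only checks the seq rule.
    """
    if current is None:
        # Assumption: the first pointer ever seen may carry any seq >= 0.
        return candidate.seq >= 0
    return candidate.seq == current.seq + 1
```

Under this rule a pointer that repeats or skips a seq is rejected, which keeps the pointer history gap-free.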

During the exchange for a hypercore, peers will share their latest HypercorePointer and resolve to sync the pointer with the highest seq number. (They could continue to sync previous feeds.) The hypercore pointed to would then be synced within the existing swarm & connection.
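The resolution step could be sketched roughly as below: each peer contributes its latest pointer and both sides settle on the one with the highest seq. The `pick_latest` name and the tie-breaking behaviour (first pointer seen wins) are assumptions, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class HypercorePointer:
    key: bytes  # ID of the hypercore being pointed to
    seq: int    # pointer sequence number


def pick_latest(pointers: Iterable[HypercorePointer]) -> Optional[HypercorePointer]:
    """Return the pointer with the highest seq, or None if none were shared.

    On equal seq values the first pointer encountered is kept; how real
    peers should break such ties is left open here.
    """
    return max(pointers, key=lambda p: p.seq, default=None)
```

The hypercore named by the winning pointer's `key` is the one that would then be synced over the existing connection.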

Implications for apps & consuming clients

The HypercorePointer makes it possible to change the internal dataset without changing the URL.

When this occurs, the hypercore's data would essentially be reset, and all history could be altered. This is not a trivial event; from the perspective of any consuming application, the hypercore's previous state has been completely invalidated.

If the pointer is updated to fix a fork-corruption, it's likely that the application doing the fix would then try to recreate the last state on the new log. However, a pointer update will have to be viewed by applications as a total reset, since the destination state can change.

To manage this, we would most likely need to surface the HypercorePointer to the APIs and UIs in some way. @mafintosh explored the idea of calling the seq of the pointer a "major version" while the seq of an individual log is the "revision" or perhaps "minor version." This would mean that hypercore-based data structures are addressed by a major/minor version, such as 5.3.

The semantics of a major-version change, under this scheme, would be "this is basically a whole new dat, so clear any current indexes on it and reindex from scratch."
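To make the "clear and reindex" semantics concrete, here is a rough sketch of a consumer-side index keyed by a major/minor version. `FeedIndex` and its `apply` method are hypothetical names, and ignoring entries from an older major version is an assumption rather than something the proposal specifies.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class FeedIndex:
    major: int = 0  # seq of the HypercorePointer ("major version")
    minor: int = 0  # seq within the current log ("revision")
    entries: Dict[str, Any] = field(default_factory=dict)

    def apply(self, major: int, minor: int, key: str, value: Any) -> bool:
        """Index one entry; wipe all state first if the major version advanced."""
        if major < self.major:
            return False  # assumption: data from a superseded log is ignored
        if major > self.major:
            # Treat the feed as a whole new dat: drop everything and start over.
            self.entries.clear()
            self.major = major
            self.minor = 0
        self.entries[key] = value
        self.minor = max(self.minor, minor)
        return True

    @property
    def version(self) -> str:
        return f"{self.major}.{self.minor}"
```

For example, after indexing revisions under major version 4, a single entry arriving with major version 5 leaves only that entry in the index.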


Thoughts and discussion open!

@pfrazee

pfrazee commented Jun 21, 2018

I'm currently reviewing the Multi-writer DEP and it's a good example of a data structure that would be affected by this proposal. The vector clocks currently encode the revisions, and the InflatedEntry encodes the keys of the hypercores. If this proposal were put in place, I'd suggest that the InflatedEntry be updated to include both the key and major version of the included logs. This way, the vector clocks would remain a set of revision scalars, one per feed. Nodes would write a new InflatedEntry with an updated major version to react to a major-version change.
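A loose sketch of that suggestion follows. The `FeedRef` and `InflatedEntrySketch` types are hypothetical stand-ins and do not reproduce the actual field layout of the Multi-writer DEP; the point is only that each feed reference carries a key plus a major version, and that reacting to a major-version change means writing a new entry with one reference updated.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class FeedRef:
    key: bytes  # key of the included hypercore
    major: int  # major version (pointer seq) of that hypercore


@dataclass(frozen=True)
class InflatedEntrySketch:
    feeds: Tuple[FeedRef, ...]  # one reference per feed, order preserved


def with_major_bump(entry: InflatedEntrySketch, key: bytes,
                    new_major: int) -> InflatedEntrySketch:
    """Build the entry a node would write after one feed's major version changes."""
    feeds = tuple(
        replace(ref, major=new_major) if ref.key == key else ref
        for ref in entry.feeds
    )
    return InflatedEntrySketch(feeds=feeds)
```

Because the number and order of feeds stay the same, a vector clock with one revision scalar per feed keeps its shape across the change.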

@bnewbold

My initial reaction to this is against; it's a bunch of implementation complexity and semantic burden, and doesn't feel like a satisfying solution to the motivating problems. For example, what if somebody copies their key and then creates two conflicting HypercorePointer messages (same seq, different keys)?
I'll need more time to think it through and write up a good reply.

@pfrazee

pfrazee commented Jun 27, 2018

> For example, what if somebody copies their key and then creates two conflicting HypercorePointer messages (same seq, different keys)?

You'd publish another message with a higher seq. But yes, then you basically have multiple authoritative major versions of a feed. Not ideal.
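For illustration, detecting that situation and choosing the seq for a superseding message might be sketched as follows. `find_conflicts` and `superseding_seq` are hypothetical helpers, and nothing here resolves which of the conflicting pointers was "right".

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class HypercorePointer:
    key: bytes  # ID of the hypercore being pointed to
    seq: int    # pointer sequence number


def find_conflicts(pointers: Iterable[HypercorePointer]) -> Dict[int, Set[bytes]]:
    """Map each seq that was published with more than one key to those keys."""
    keys_by_seq: Dict[int, Set[bytes]] = defaultdict(set)
    for pointer in pointers:
        keys_by_seq[pointer.seq].add(pointer.key)
    return {seq: keys for seq, keys in keys_by_seq.items() if len(keys) > 1}


def superseding_seq(pointers: List[HypercorePointer]) -> int:
    """Return a seq strictly higher than every pointer seen so far."""
    return max(p.seq for p in pointers) + 1
```

A pointer published with the superseding seq wins under the highest-seq rule, but the conflicting pair at the lower seq remains visible to anyone who recorded it.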
