
Hypercore Split Resolution DEP #43

Open · wants to merge 2 commits into master
Conversation

@pfrazee (Contributor) commented Oct 2, 2018

I had a new idea for solving Hypercore forks that I wanted to propose. From the summary:

The Hypercore data-structure is an append-only log which depends on maintaining a linear and unbranching history in order to encode the state-changes of its data content. A "split" event occurs when the log branches, losing its linear history. This spec provides a process for resolving "splits" in a Hypercore log.

It's a fairly simple premise:

Hypercore's APIs will provide a method for registering a "split handler." The split handler will be implemented by the application using the Hypercore. It may have a standard definition provided by a higher-level data structure such as HyperDB or Hyperdrive.

In practice, it would look something like:

var log = hypercore(..)

// Register an application-defined split handler. It receives the two
// conflicting blocks at sequence number `seq` and must decide deterministically:
// return -1 to accept blockA, 1 to accept blockB, 0 to reject the new block.
log.setSplitHandler((seq, blockA, blockB) => {
  var parsedA = parse(blockA) // parse() is whatever decoding the app uses
  var parsedB = parse(blockB)
  // Prefer the block with the later modification time...
  if (parsedA.mtime < parsedB.mtime) return 1
  if (parsedB.mtime < parsedA.mtime) return -1
  // ...and fall back to a deterministic byte comparison to break ties.
  return Buffer.compare(blockA, blockB)
})

Basically, I think we can trade immutable history and the risk of total dat corruption for (in error conditions) mutable history and the risk of some lost writes.

@EGreg commented Oct 7, 2018

I have a lot of experience with designing such architectures, because preventing forks in a Byzantine fault tolerant way is essentially the kind of BFT consensus used to eliminate the double-spend problem for crypto tokens.

If there is more than one valid history, one way to reach consensus is to vote on which valid history to take.

Another approach is to try to see if there were any conflicting writes in the current block, and if so, reject BOTH transactions. Then you have to agree on when a block is truly ended. See for example this debate:

https://forum.intercoin.org/t/intercoin-technology-consensus/80/8

The problem with consensus in general is that Sybil attacks can manipulate the vote: a lot of computers can join the swarm for a file they are interested in corrupting, or stall its progress by going silent, so the remaining nodes never know whether a majority of nodes is faking being offline, or whether there is a true netsplit and a larger branch elsewhere might have a more valid resolution.

(In Bitcoin and similar systems this is resolved by electing a temporary dictator through costly Proof of Work, but that leads to an arms race of electricity waste, something Dat would hardly want to be associated with. And the winning branch may STILL be overwritten at any time if another branch with even more proof of work appears later.)

The usual way to prevent Sybil attacks is to have federation from the original nodes and slow issuance of new accounts, so at no point can the network be overwhelmed by a large number of new nodes coming in. This is the approach MaidSAFE takes, where new nodes furthermore have to contribute a lot of useful activity (such as Proof of Storage of data over a long period of time) before they are allowed to participate in voting about data and other automatic governance decisions. All of this, of course, should be sharded by key, but the reputation of a node is itself a piece of data that is stored only by older (and more privileged) nodes.

https://medium.com/safenetwork/proof-of-resource-on-the-safe-network-192d5c951ebc

Also read this:

https://hackernoon.com/decentralized-objective-consensus-without-proof-of-work-a983a0489f0a

And the followup (maybe less relevant):

https://hackernoon.com/proof-of-membership-for-blockchains-1534a3f9faba

However, there is a third option, which is like a simple version of the above one. Namely, there is a central authority that makes a decision about who gets to watch which data. This is the approach taken by Ripple. They publish and sign their Unique Node List (UNL) which is the set of all nodes participating in the network, to eliminate Sybil attacks in one fell swoop. The downside is that many in the crypto community hate Ripple and call them “centralized”, but it’s hard to argue with the results: it freed them up to work on other things, and now their network is very robust, and their coin is traded on tons of exchanges.

In Ripple’s case, the UNL contains the nodes that watch ALL tokens. In Dat’s case, each Dat network would need to have a signed policy for choosing nodes that participate in a consensus, whether it is about all Dats or just a subset (such as one Dat).

At the end of the day, this is not an easy problem, and if 51% or more of the nodes are not responding, the system cannot make any further progress without risking all of that progress being rolled back in the future. This means, e.g., that I can't accept payment for an expensive item, because at some point in the future the token you paid may be yanked from me and returned to you, or to someone else.

The way this is usually solved is to make it more expensive to take over 51% of the network than there is value represented by the network.

@pfrazee (Contributor Author) commented Oct 7, 2018 via email

"Split resolution" will occur during replication when a "split" is detected due to a received block. Split resolution should not occur during local writes. (Conflicting local writes should be rejected outright).

The split handler must be constructed in such a way that all peers will come to the same decisions independently. Peers should not make a decision based on information such as "time that the block was received," since it is not global knowledge. By only using global information, a split can be reliably resolved across the network.
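To make the "global information only" requirement concrete, here is a minimal sketch (reusing the setSplitHandler API from the opening example, with `log` as the hypercore from that example) of a handler that compares only the block contents themselves, so every peer reaches the same verdict regardless of when or from whom each block arrived:

var crypto = require('crypto')

log.setSplitHandler((seq, blockA, blockB) => {
  // Compare only the block bytes (global knowledge every peer has),
  // never local facts like Date.now() or the order the blocks arrived in.
  var hashA = crypto.createHash('sha256').update(blockA).digest()
  var hashB = crypto.createHash('sha256').update(blockB).digest()
  // Lower hash wins: -1 accepts blockA, 1 accepts blockB, 0 means the
  // blocks are identical and there is no split to resolve.
  return Buffer.compare(hashA, hashB)
})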

Contributor

Could the code example from the opening post be mentioned here? It helped me build a better model of what is being described, and I imagine it would do the same for people reading this in the future.


In cases with a strict security requirement this might be useful, but the append-only invariant can be difficult to maintain for users who migrate their dats between devices or restore from backups. For most users, it would be more desirable to risk losing some writes than to risk losing an entire dat due to a split.

This DEP specifies a process for recovering from splits so that users can safely backup or transfer their dats.
Contributor

Is "safely" a guarantee here? The paragraph preceding it seems to imply like there's some risks involved (e.g. data loss) which do not guarantee safety.

Contributor Author

It's a trade-off of costs. Currently, a split event results in total hypercore corruption: no more forward progress can be made in the hypercore, and it must be replaced with a new keypair and history. This proposal provides a solution for resolving splits and therefore re-enables forward progress on the hypercore. However, it does not create a process for restoring any data in the "discarded branch", and so it leaves the potential for data loss.

The reason that this proposal doesn't include some form of data restoration from the dead-branch is that hypercore's semantics are too generic to create a universal solution. A restoration process is possible, but it would depend on the data-structure encoded on the hypercore. For instance, a hyperdrive might handle restoration by showing the user a list of the orphaned files, and prompting them to either "restore" or "discard" each file. This would cause each file to be rewritten onto the newly live branch.
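A rough sketch of that flow (the helpers and the promise-style writeFile are invented for illustration):

// After a split is resolved, gather files that only existed on the dead branch
// and let the user decide what happens to each one.
async function restoreOrphanedFiles (drive, orphanedEntries) {
  for (var entry of orphanedEntries) {
    var choice = await promptUser(entry.path) // hypothetical: 'restore' or 'discard'
    if (choice === 'restore') {
      // Rewriting the file appends it onto the newly live branch.
      await drive.writeFile(entry.path, entry.content)
    }
  }
}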



# Unresolved questions
[unresolved]: #unresolved-questions
Contributor

On line 34 the following statement is made:

It may have a standard definition provided by a higher-level data structure such as HyperDB or Hyperdrive.

Perhaps this would be worth mentioning as part of the unresolved questions?

Contributor Author

It's not really an unresolved question; it's more a prompt for followup DEPs providing the standard definitions for the various data structures.

Contributor

Gotcha.


Hypercore's APIs will provide a method for registering a "split handler." The split handler will be implemented by the application using the Hypercore. It may have a standard definition provided by a higher-level data structure such as HyperDB or Hyperdrive.

The "split handler" function will receive the `seq` number, `blockA`, and `blockB`. It will be expected to return `-1` to accept `blockA` or `1` to accept `blockB`. If `0` is returned, no resolution occurs and the new block is rejected (the current behavior). When a split is resolved, any subsequent messages after the rejected block will also be rejected.
Contributor

Using numbers might make a lot of sense for the JS implementation, but less so in Rust (an Enum seems like a better fit). Could the language be adjusted to reflect that better?

Contributor Author

Sure


"Split resolution" will cause data loss as some part of the history must be discarded. If the managing software is not careful, this can result in massive data loss (e.g. if the split occurs at the first message during recovery). To limit this potential, the managing software can query the network for the latest history in cases where a split is likely (such as a backup recovery process).

"Split resolution" will cause the append-only invariant of Hypercore logs to be optional. This means that file history and versions will not be immutable.
Contributor

What are the implications of changing the file history and versions to not be immutable? Which assumptions will be changed by this? It feels like this would be a pretty big change to the core premise of Hypercore.

Contributor Author

The largest implication is that specifying a revision-number no longer provides a strong guarantee of content. Explaining in more detail:

With immutable history, it's enough to provide the pair (pubkey, revision) to specify a set of content with a strong guarantee of the content. That is, for any (pubkey, revision) pair, there is only one dataset which may exist in the world. (That's assuming that the history's immutability is actually maintained, which depends on the network gossiping effectively about any known splits.)

With mutable history, to provide a strong guarantee of content, you must include a content hash. Therefore you must provide a triple (pubkey, revision, hash) to specify a strong guarantee.
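Roughly, and with made-up field names and helpers purely for illustration:

var pubkey = log.key // the hypercore's public key

// Weak reference: only pins the content while history stays immutable.
var weakLink = { pubkey: pubkey, revision: 503 }

// Strong link: the content hash pins the exact dataset even if history mutates.
var strongLink = { pubkey: pubkey, revision: 503, hash: contentHashAt(log, 503) } // hypothetical helper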

We had already planned to create the triple-form as something called "strong links" for two reasons:

  1. When DNS is involved, you're actually specifying the pair (domain-name, revision) which means that the immutable history guarantee is not upheld. DNS is common enough that this is a concern.
  2. It is physically possible to maintain a split in the network, at least for some time, which makes the immutable history guarantee somewhat weak.

Contributor Author

Another implication has to do with processing guarantees. For instance, if we assert immutable history, then a process which ingests hypercores and produces computed views can assume that previously-processed revisions will never change. This is a nice optimization which might even enable the process to discard revisions after processing them.

If the immutable history guarantee is lost, that optimization is no longer possible, and the processor has to watch for splits and recompute its views accordingly.
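A minimal sketch of that watching (assuming a promise-returning get, with hash, applyToView, and freshState as hypothetical helpers):

// Incremental view builder that tolerates mutable history. It remembers the
// hash of the last block it processed; if that block changed (a split was
// resolved against the previously seen branch), it recomputes from scratch.
async function updateView (log, state) {
  if (state.length > 0) {
    var lastProcessed = await log.get(state.length - 1)
    if (hash(lastProcessed) !== state.lastHash) state = freshState()
  }
  while (state.length < log.length) {
    var block = await log.get(state.length)
    applyToView(state, block)
    state.lastHash = hash(block)
    state.length++
  }
  return state
}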

Contributor Author

In my opinion, it should be optional for a data structure built on hypercore to include a split-resolution algorithm. If it does not, then it can assume immutable history.

@pfrazee (Contributor Author) commented Oct 25, 2018

@mafintosh and I discussed this in a WG meeting. We came to two conclusions:

  1. This spec is missing an important facet -- it needs to ensure that the hypercore owner is made aware of the full split prior to attempting resolution. When a split occurs, it's because the owner isn't aware of some previously-published history. It will need to query peers in order to discover the starting-point of the split. And, if it wants to restore any data from the split, it's going to need to download as much history as possible from the discarded branch. Currently this spec doesn't cover how that can happen.
  2. We have a pretty full docket with the hyperswarm and hyperdrive v2 updates, so this spec probably needs to wait till those are done.
