documentation polished, technical documentation added
tomaskulich committed May 20, 2015
1 parent 9972c6e commit 760fe17
Showing 4 changed files with 164 additions and 4 deletions.
11 changes: 7 additions & 4 deletions README.md
@@ -2,10 +2,15 @@

[![Build Status](https://drone.io/github.com/vacuumlabs/persistent/status.png)](https://drone.io/github.com/vacuumlabs/persistent/latest)

check out [changes in 2.0 version!] (https://github.com/vacuumlabs/persistent/wiki/20changes).
Check out the [changes in version 2.0!](changes_2_0.md)

The project is forked from
Learn how you can use [transients](transients.md)

Want to understand the code? Want to contribute? See the [technical overview](technical.md)

<!-- The project is forked from
[polux/persistent](https://github.com/polux/persistent).
-->

## What are persistent data structures
*Persistent* data structure is an immutable structure; the main difference with standard data structures is how you 'write' to them: instead of mutating
@@ -69,5 +74,3 @@ or Dart2JS on Node (the numbers are quite independent of the structure size):

Although the factors are quite big, the whole operation is still very fast and it probably won't be THE bottleneck which would slow down your app.

Some [advanced topics](https://github.com/vacuumlabs/persistent/wiki/Advanced-topics).

9 changes: 9 additions & 0 deletions changes_2_0.md
@@ -0,0 +1,9 @@
- memory footprint reduced by a factor of 15 (wait, what? Was the old implementation that
inefficient, or is the new one that good? The truth is: both. Check out the benchmarks)

- API changes, most notably PersistentMap -> PMap, PersistentVector -> PVec

- more efficient == and != on PMap

- several classes deleted; the whole class/interface hierarchy is now much simpler (although a little bit dirtier; some performance-motivated compromises were introduced)

64 changes: 64 additions & 0 deletions technical.md
@@ -0,0 +1,64 @@
# Technical overview

The implementation of Persistent Vector is very similar to the one found in
Facebook's [immutable.js](https://github.com/facebook/immutable-js); there is almost nothing novel
here. The rest of the document describes our design of Persistent Map, which is more unusual and needs
more explanation.


## PMap technical overview

The implementation is a variant of HAMT (hash array mapped trie); be sure you understand the basic concepts before reading
further. Good places to start are [Wikipedia](http://en.wikipedia.org/wiki/Hash_array_mapped_trie)
or this [blog post](http://blog.higher-order.net/2009/09/08/understanding-clojures-persistenthashmap-deftwice.html).
The following text explains issues that are specific to our implementation.

The whole HAMT consists of two types of nodes: Node and Leaf (they are called _Node and _Leaf in the
code).

Node is a typical HAMT inner node. Its branching factor is currently set to 16 (this may change), which
gets us the best results in the benchmarks. Note that Node implements the PMap interface.

Leaf can hold several key-value pairs. These are stored in a simple List such as:
[hash1, key1, value1, hash2, key2, value2, ...]
If the Leaf grows big (currently, more than 48 such h,k,v triplets), it is split into several Nodes. Similarly,
if a Node stores only a few k,v pairs (in all its descendants), it is compacted into a single Leaf (the threshold
for this is currently set to fewer than 32 triplets).
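
The flat triplet layout can be sketched as follows. This is a minimal, self-contained illustration and not the library's code: `lookupInLeaf` is a hypothetical helper, and it assumes the triplets are kept sorted by hash (as the notes further down state), so a binary search can find the run of entries sharing a hash.

```dart
/// Hypothetical helper, not part of the library: looks up [key] in a flat
/// leaf list laid out as [hash1, key1, value1, hash2, key2, value2, ...],
/// assuming the triplets are sorted by hash.
Object? lookupInLeaf(List<Object?> kv, int hash, Object key) {
  final count = kv.length ~/ 3; // number of h,k,v triplets
  var lo = 0;
  var hi = count;
  // Binary search for the first triplet whose hash is >= the target hash.
  while (lo < hi) {
    final mid = (lo + hi) ~/ 2;
    if ((kv[mid * 3] as int) < hash) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  // Several keys may share one hash, so scan the run of equal hashes.
  for (var i = lo; i < count && kv[i * 3] == hash; i++) {
    if (kv[i * 3 + 1] == key) return kv[i * 3 + 2];
  }
  return null; // not found (this sketch cannot tell that from a stored null)
}

void main() {
  final leaf = <Object?>[10, 'a', 1, 20, 'b', 2, 20, 'c', 3, 35, 'd', 4];
  print(lookupInLeaf(leaf, 20, 'c')); // 3
  print(lookupInLeaf(leaf, 35, 'x')); // null
}
```

Keeping hash, key and value in one list avoids allocating a separate object per entry, which fits the memory-footprint goal described in this document.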

A few things to note here:

- In the tree, h,k,v triplets are stored so as to guarantee the following property: if you iterate
through one Node in order (i.e. you recursively visit its children from the 0th to the
15th), you enumerate the h,k,v triplets sorted by hash value. This may look unimportant at first
glance, but it simplifies several things; for example, comparing a Leaf with a Node for equality, or intersecting
a Leaf with a Node, gets easier. For this purpose, we do the following:

  - In a single Leaf, h,k,v triplets are sorted by hash. This allows us to binary-search for the
  correct value when doing a lookup.

  - In the put / lookup process, we consume the key hash starting from the first digits (not from the last,
  as is usual). Note that hashes of small objects (especially small ints) tend to have just zeros
  in the leading places. To overcome this problem, we work with a mangled hash, which has enough
  entropy in the first digits as well (check out the _mangeHash function).

- In the Node implementation, we do not compact the array of children. Typically, to save memory, HAMT
implementations store only the non-null children. Such implementations then use a bitmask to
determine what the proper indexes of the individual (non-null) children would be if the nulls were
there. This trick is neat, but it costs time, and moreover, we don't need it. Why? Because we
store up to 48 values in a single Leaf. This means that when a Leaf gets expanded into a proper Node,
most of its children will be non-null. (Exercise: you randomly pick 48 numbers from the
interval 0 to 15 inclusive. What is the expected count of numbers that are never picked?)

- Node is a strange class. It serves two purposes (which is probably not the cleanest design):
it implements all PMap methods (in fact, when you construct a new PMap, what you get is a Node) and it
implements the low-level methods for HAMT manipulation. Moreover, PMap methods (such as assoc) can be
called only on the root Node; on any other Node, such a call leads to an inconsistent result.
Why such a bad design?

  - The main purpose is to save time and memory by not creating an additional object that would encapsulate the
  root Node (yes, it matters).

  - All the "bad things" happen only internally, and there is no way for the end user to get the
  structure into an inconsistent state. So it's not such a bad design after all.
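
Two of the notes above can be made concrete with a short sketch. It is an illustration under stated assumptions, not the library's code: `childIndex` and `expectedEmptySlots` are hypothetical names, hashes are assumed to be 32-bit values, and the mangling step itself is not reproduced. The first function reads 4-bit digits from the most significant end of the hash; the second computes the expected number of empty slots for the exercise, using the standard formula slots * (1 - 1/slots)^items for uniformly random picks.

```dart
import 'dart:math' as math;

const int branching = 16; // branching factor stated in the text
const int bitsPerLevel = 4; // log2(16)
const int hashBits = 32; // assumption: hashes are treated as 32-bit values

/// Child slot for [hash] at [depth], reading 4-bit digits from the most
/// significant end first (depth 0 = top 4 bits). Valid for depth 0..7.
int childIndex(int hash, int depth) {
  final shift = hashBits - bitsPerLevel * (depth + 1);
  return (hash >> shift) & (branching - 1);
}

/// Expected number of slots (out of [slots]) that stay empty after
/// [items] uniformly random picks: slots * (1 - 1/slots)^items.
double expectedEmptySlots(int slots, int items) =>
    slots * math.pow(1 - 1 / slots, items).toDouble();

void main() {
  print(childIndex(0xA3000000, 0)); // 10
  print(childIndex(0xA3000000, 1)); // 3
  // A small, unmangled int has only zeros in its leading digits:
  print(childIndex(42, 0)); // 0
  print(expectedEmptySlots(16, 48).toStringAsFixed(2)); // 0.72
}
```

With 48 entries spread over 16 slots, fewer than one slot is expected to stay empty, which is why skipping the bitmask compaction costs little memory here. The `childIndex(42, 0)` line shows why unmangled hashes of small ints would all collide in slot 0.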

84 changes: 84 additions & 0 deletions transients.md
@@ -0,0 +1,84 @@
## Working with Transients

A persistent data structure can be 'unfrozen' into a *Transient* structure (TMap, TVec, ...), which is mutable. The purpose is to gain some speedup while still working with Persistents. The typical workflow is as follows:

1. TMap trans = pers.asTransient();
2. do a lot of mutations on trans
3. Persistent result = trans.asPersistent();

There are several notable things here:
- When working with transients, methods like `assoc` or `delete` are no longer available. Use `doAssoc`, `doDelete`, etc. instead. These methods return void and mutate the structure in place.
- Once `.asPersistent` is called, you cannot modify the transient anymore. If you try, you'll get an exception.
- Conversion Persistent -> Transient and vice versa is O(1), which means fast; in this case, really fast.
- Everything is safe, i.e. the basic contract 'there is no way to modify a Persistent data structure' still holds.
- Since steps 1-3 form a repeating pattern, the `PersistentStructure.withTransient(modifier)` helper method exists.
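
The contract described above can be modeled with a toy sketch. `ToyPersistent` and `ToyTransient` are hypothetical stand-ins, not the library's classes: they copy the underlying list on conversion (so, unlike the real structures, conversion here is O(n), not O(1)) and exist only to show the rules: `do*` methods return void and mutate in place, and a transient rejects changes after `asPersistent()`.

```dart
/// Toy immutable wrapper (hypothetical, for illustration only).
class ToyPersistent {
  final List<String> _items;
  ToyPersistent(List<String> items) : _items = List.unmodifiable(items);

  /// Step 1: obtain a mutable working copy.
  ToyTransient asTransient() => ToyTransient._(List.of(_items));

  /// Wraps steps 1-3 around [modifier].
  ToyPersistent withTransient(void Function(ToyTransient) modifier) {
    final t = asTransient();
    modifier(t);
    return t.asPersistent();
  }

  @override
  String toString() => _items.toString();
}

/// Toy mutable counterpart (hypothetical, for illustration only).
class ToyTransient {
  final List<String> _items;
  bool _frozen = false;
  ToyTransient._(this._items);

  /// Step 2: mutates in place and returns nothing.
  void doPush(String value) {
    if (_frozen) throw StateError('transient was already made persistent');
    _items.add(value);
  }

  /// Step 3: freeze; any later doPush throws.
  ToyPersistent asPersistent() {
    _frozen = true;
    return ToyPersistent(_items);
  }
}

void main() {
  final p1 = ToyPersistent(['x', 'y']);
  final p2 = p1.withTransient((t) {
    t.doPush('z');
    t.doPush('q');
  });
  print(p1); // [x, y]
  print(p2); // [x, y, z, q]
}
```

The original value `p1` is untouched, which is the safety property the list above refers to.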

### Equality and hash

Two persistent structures are equal if they carry equal data.
This allows them to be used as map keys: the key is the data itself,
not the object holding it.

Two transient structures are equal in the standard sense: only if they are the same object.

The hash code is consistent with the equality operator.

## Example

    import 'package:persistent/persistent.dart';

    main() {
      // Persistency:
      PMap map1 = new PMap.from({"a":1, "b":2});
      PMap map2 = new PMap.from({"b":3, "c":4});
      print(map1["a"]); // 1
      print(map1.lookup("b")); // 2
      print(map1.lookup("c", orElse: ()=>":(")); // :(
      print(map1.insert("c", 3)); // {a: 1, b: 2, c: 3}
      print(map1.insert("d", 4)); // {a: 1, b: 2, d: 4}
      final map3 = map2.insert("c", 3, (x,y) => x+y);
      print(map3.delete("b")); // {c: 7}
      print(map3.delete("a", safe: true)); // {b: 3, c: 7}
      print(map1); // {a: 1, b: 2}
      print(map2); // {b: 3, c: 4}
      print(map3); // {b: 3, c: 7}

      // Transiency:
      final vector1 = new PersistentVector.from(["x", "y"]);
      print(vector1.push("z")); // (x, y, z)
      print(vector1.push("q")); // (x, y, q)
      var temp = vector1.asTransient();
      temp.doPush("z");
      temp.doPush("q");
      temp[1] = "Y";
      final vector2 = temp.asPersistent();
      final vector3 = vector2.withTransient((TransientVector v){
        v.doSet(2, "Z");
        v.doPop();
        v[0] = "X";
      });
      print(vector1); // (x, y)
      print(vector2); // (x, Y, z, q)
      print(vector3); // (X, Y, Z)

      // Features
      print(map1.toList()); // [Pair(a, 1), Pair(b, 2)]
      final set1 = new PersistentSet.from(["a", "b"]);
      final set2 = new PersistentSet.from([1, 2, 3]);
      print((set1 * set2).toList());
      // [Pair(a, 2), Pair(a, 1), Pair(b, 3), Pair(b, 2), Pair(b, 1), Pair(a, 3)]
    }
