Commit
documentation polished, technical documentation added
1 parent 9972c6e
commit 760fe17
Showing 4 changed files with 164 additions and 4 deletions.
@@ -0,0 +1,9 @@
- memory footprint reduced by a factor of 15 (wait, what? Was the old implementation so
  inefficient? Or is the new one so cool? The truth is: both. Check out the benchmarks)

- changes in API, most notably PersistentMap -> PMap, PersistentVector -> PVec

- more efficient == and != on PMap

- deleted several classes; the whole class/interface hierarchy becomes much simpler (although a little bit dirtier; some performance-motivated compromises were introduced)

@@ -0,0 +1,64 @@
# Technical overview

The implementation of Persistent Vector is very similar to the one found in
Facebook's [immutable.js](https://github.com/facebook/immutable-js). We show almost no invention
here. The rest of the document describes our design of Persistent Map, which is more unusual and needs
more explanation.

## PMap technical overview

The implementation is a version of HAMT; be sure you understand the basic concepts before reading
further. Good places to start are [Wikipedia](http://en.wikipedia.org/wiki/Hash_array_mapped_trie)
or this [blog post](http://blog.higher-order.net/2009/09/08/understanding-clojures-persistenthashmap-deftwice.html).
The following text explains issues that are specific to our implementation.

The whole HAMT consists of two types of nodes: Node and Leaf (they are called _Node and _Leaf in the
code).

Node is a typical HAMT inner node. Its branching factor is currently set to 16 (this may change),
which gives us the best results in the benchmarks. Note that Node implements the PMap interface.

Leaf can hold several key-value pairs. These are stored in a simple List such as:
[hash1, key1, value1, hash2, key2, value2, etc..]
If the Leaf grows big (currently, > 48 such h,k,v triplets), it is split up into several Nodes. Similarly,
if a Node stores only a few k,v pairs (counted across its whole subtree), it is compacted into one single
Leaf (the threshold for this is currently set to < 32 triplets).

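The flat triplet layout can be sketched in a few lines of Dart. This is only a minimal illustration, assuming the triplets are kept sorted by hash (as described in the notes that follow); the function names `lowerBound`, `leafLookup` and `leafInsert` are hypothetical and are not taken from the library's code.

```dart
/// Hypothetical sketch of a leaf stored as a flat list of
/// [hash, key, value] triplets sorted by hash. Illustrative names only.

/// Index (counted in triplets, not list slots) of the first triplet
/// whose hash is >= [hash]. Plain binary search over the hash slots.
int lowerBound(List<Object?> triplets, int hash) {
  var lo = 0;
  var hi = triplets.length ~/ 3;
  while (lo < hi) {
    final mid = (lo + hi) >> 1;
    if ((triplets[3 * mid] as int) < hash) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

/// Returns the value stored for [key], or null when the key is absent.
/// Several keys may share one hash, so the run of equal hashes is scanned.
Object? leafLookup(List<Object?> triplets, int hash, Object key) {
  final count = triplets.length ~/ 3;
  for (var i = lowerBound(triplets, hash);
      i < count && triplets[3 * i] == hash;
      i++) {
    if (triplets[3 * i + 1] == key) return triplets[3 * i + 2];
  }
  return null;
}

/// Returns a new list with the triplet inserted (or its value replaced).
/// The input list is left untouched, as a persistent structure requires.
List<Object?> leafInsert(
    List<Object?> triplets, int hash, Object key, Object? value) {
  final count = triplets.length ~/ 3;
  var i = lowerBound(triplets, hash);
  for (; i < count && triplets[3 * i] == hash; i++) {
    if (triplets[3 * i + 1] == key) {
      return List<Object?>.of(triplets)..[3 * i + 2] = value;
    }
  }
  return List<Object?>.of(triplets)..insertAll(3 * i, [hash, key, value]);
}

void main() {
  var leaf = <Object?>[];
  leaf = leafInsert(leaf, 20, 'b', 2);
  leaf = leafInsert(leaf, 10, 'a', 1);
  leaf = leafInsert(leaf, 20, 'c', 3); // same hash as 'b': a collision
  print(leaf); // [10, a, 1, 20, b, 2, 20, c, 3]
  print(leafLookup(leaf, 20, 'c')); // 3
  print(leafLookup(leaf, 30, 'z')); // null
}
```

Copying the whole list on every insert is acceptable here only because a leaf is bounded in size (the split threshold above); a real implementation would also have to trigger the split and compaction steps, which this sketch leaves out.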
A few things to note here:

- In the tree, h,k,v triplets are stored in a way that guarantees the following property: if you iterate
  through one Node in order (i.e. you recursively visit its children from the 0th to the
  15th), you enumerate h,k,v triplets sorted by hash value. This may look unimportant at first
  glance, but it simplifies several things; for example, comparing a Leaf with a Node for equality, or
  intersecting a Leaf with a Node, gets easier. For this purpose, we do the following:

  - In a single Leaf, h,k,v triplets are sorted by hash. This allows us to binary-search for the
    correct value when doing a lookup.

  - In the put / lookup process, we consume the key hash from the first digits (not from the last,
    as usual). Note that hashes of small objects (especially small ints) tend to have just zeros
    in the leading places. To overcome this problem, we work with a mangled hash, which has enough
    entropy in the first digits as well (check out the _mangeHash function).

- In the Node implementation we're not compacting the array of children. Typically, to save memory, a HAMT
  implementation stores only non-null children. Such implementations then use a bitmask to
  determine what the proper indexes of the individual (non-null) children would be (if the nulls were
  there). This trick is neat, but it costs time, and moreover, we don't need it. Why? Because we
  store up to 48 values in a single Leaf. This means that when the Leaf gets expanded to a proper Node,
  most of its children will be non-null. (Exercise: you randomly pick 48 numbers from the
  interval 0..15 inclusive. What is the expected count of numbers not picked at least once?)

- Node is a strange class. It serves two purposes (which is probably not the cleanest design):
  it implements all PMap methods (in fact, when you construct a new PMap, what you get is a Node) and it
  implements low-level methods for HAMT manipulation. Moreover, PMap methods (such as assoc) can be
  called only on the root Node; on every other Node, such a call will lead to an inconsistent result.
  Why such a bad design?

  - The main purpose is to save time and memory by not creating an additional object that would
    encapsulate the root Node (yes, it matters).

  - All "bad things" happen only internally and there is no possibility for the end user to get the
    structure into an inconsistent state. So, it's not such a bad design after all.

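Two of the points above can be made concrete with a short Dart sketch. It shows how a child index could be taken from the leading 4 bits of a 32-bit hash at each level, why small ints need mangling first, and what the answer to the exercise is. Everything here is an assumption made for illustration: `mangle` is a generic multiplicative mixer and not the library's `_mangeHash`, the 32-bit hash width is assumed, and the arithmetic relies on native 64-bit ints (Dart VM rather than JavaScript numbers).

```dart
import 'dart:math' as math;

const bitsPerLevel = 4; // branching factor 16
const hashBits = 32; // assumed hash width for this sketch

/// Hypothetical mixer (NOT the library's _mangeHash): a multiplicative
/// hash that spreads low-order input bits into the high-order bits.
/// Assumes native 64-bit ints; the result is truncated to 32 bits.
int mangle(int hash) => (hash * 2654435761) & 0xFFFFFFFF;

/// Child index used at [depth], consuming the hash from its leading bits.
/// Because the most significant digit is used first, visiting children
/// 0..15 in order enumerates keys in ascending order of the mangled hash.
int digitAt(int mangled, int depth) =>
    (mangled >> (hashBits - bitsPerLevel * (depth + 1))) & 0xF;

/// Expected number of empty slots after distributing [items] keys
/// uniformly at random over [slots] children:
/// slots * (1 - 1/slots)^items.
double expectedEmptySlots(int slots, int items) =>
    slots * math.pow((slots - 1) / slots, items).toDouble();

void main() {
  for (final h in [1, 2, 3]) {
    // Unmangled small ints all land in child 0 at the top level.
    print('hash $h: raw digit ${digitAt(h, 0)}, '
        'mangled digit ${digitAt(mangle(h), 0)}');
  }
  print(expectedEmptySlots(16, 48).toStringAsFixed(2)); // 0.72
}
```

So when a full Leaf is expanded, fewer than one of the 16 child slots is expected to stay empty, which is why a bitmask-compacted child array would save very little memory here.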
@@ -0,0 +1,84 @@
## Working with Transients

A Persistent data structure can 'unfreeze' into a *Transient* structure (TMap, TVec, ..), which is mutable. The purpose of this is to gain some speedup while still working with Persistents. The typical workflow is as follows:

1. TMap trans = pers.asTransient();
2. do a lot of mutations on trans
3. Persistent result = trans.asPersistent();

There are several notable things here:
- When working with transients, methods like `assoc` or `delete` are no longer there. Instead of these, use `doAssoc`, `doDelete`, etc. These methods return void and mutate the structure in place.
- Once `.asPersistent` is called, you cannot modify the transient anymore. If you try to, you'll get an exception.
- Conversion Persistent -> Transient and vice versa is O(1), i.e. it takes constant time regardless of the size of the structure.
- Everything is safe, i.e. the basic contract 'there is no way to modify a Persistent data structure' still holds.
- Since steps 1-3 form a repeating pattern, the `PersistentStructure.withTransient(modifier)` helper method exists.

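The contract described in this list can be illustrated with a toy model in Dart. The classes `ToyVec` and `ToyTransient` are hypothetical and do not reflect how the library is implemented; in particular, this toy copies the data when unfreezing, so its conversion is O(n), not the O(1) that the real structures provide. What it does show is the shape of the workflow: in-place `do*` mutators, an exception after freezing, and a `withTransient` helper wrapping steps 1-3.

```dart
/// Toy illustration of the persistent/transient contract.
/// Hypothetical classes; NOT the library's implementation.
class ToyVec {
  final List<String> _items;
  ToyVec(Iterable<String> items) : _items = List.unmodifiable(items);

  /// Copies the data, so this toy conversion is O(n), unlike the library.
  ToyTransient asTransient() => ToyTransient._(List.of(_items));

  /// Wraps steps 1-3 of the workflow: unfreeze, mutate, freeze.
  ToyVec withTransient(void Function(ToyTransient t) modifier) {
    final t = asTransient();
    modifier(t);
    return t.asPersistent();
  }

  @override
  String toString() => _items.toString();
}

class ToyTransient {
  final List<String> _items;
  bool _frozen = false;
  ToyTransient._(this._items);

  void _checkOpen() {
    if (_frozen) throw StateError('transient was already made persistent');
  }

  /// Mutates in place and returns nothing, like the do* methods.
  void doPush(String value) {
    _checkOpen();
    _items.add(value);
  }

  /// Freezes this transient; any later mutation throws.
  ToyVec asPersistent() {
    _checkOpen();
    _frozen = true;
    return ToyVec(_items);
  }
}

void main() {
  final v1 = ToyVec(['x', 'y']);
  final v2 = v1.withTransient((t) {
    t.doPush('z');
    t.doPush('q');
  });
  print(v1); // [x, y]
  print(v2); // [x, y, z, q]

  final t = v1.asTransient();
  t.asPersistent();
  try {
    t.doPush('late');
  } on StateError catch (e) {
    print('rejected: ${e.message}');
  }
}
```

The original `v1` is unchanged after the transient round trip, which is the safety guarantee the list above refers to.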
### Equality and hash

Two persistent structures are equal if they carry equal data.
This allows them to be used as map keys: what acts as the key is the data,
not the object itself.

Two transient structures are equal in the standard sense of the word: only if they are the same object.

The hash code is consistent with the equality operator.

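The difference between the two notions of equality can be shown with a small self-contained Dart sketch. `ValueList` is a hypothetical stand-in for a persistent structure and is not one of the library's classes; plain Dart lists play the role of a mutable structure that compares by identity.

```dart
/// Hypothetical immutable wrapper with value equality.
/// Illustrative only; not one of the library's classes.
class ValueList {
  final List<int> items;
  ValueList(Iterable<int> items) : items = List.unmodifiable(items);

  @override
  bool operator ==(Object other) {
    if (identical(this, other)) return true;
    if (other is! ValueList || other.items.length != items.length) {
      return false;
    }
    for (var i = 0; i < items.length; i++) {
      if (other.items[i] != items[i]) return false;
    }
    return true;
  }

  /// Derived from the same data as ==, so equal objects hash equally.
  @override
  int get hashCode => Object.hashAll(items);
}

void main() {
  final a = ValueList([1, 2, 3]);
  final b = ValueList([1, 2, 3]);
  print(identical(a, b)); // false: two distinct objects
  print(a == b); // true: same data
  final byKey = {a: 'found'};
  print(byKey[b]); // found

  // Plain mutable lists compare by identity.
  final m1 = [1, 2, 3];
  final m2 = [1, 2, 3];
  print(m1 == m2); // false
}
```

Value equality is only safe as a map key because the data cannot change after construction; a mutable structure with value-based hashing could silently move to a different bucket after being modified.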
## Example

    import 'package:persistent/persistent.dart';

    main() {

      // Persistency:
      PMap map1 = new PMap.from({"a":1, "b":2});
      PMap map2 = new PMap.from({"b":3, "c":4});
      print(map1["a"]); // 1
      print(map1.lookup("b")); // 2
      print(map1.lookup("c", orElse: ()=>":(")); // :(
      print(map1.insert("c", 3)); // {a: 1, b: 2, c: 3}
      print(map1.insert("d", 4)); // {a: 1, b: 2, d: 4}
      final map3 = map2.insert("c", 3, (x,y) => x+y);
      print(map3.delete("b")); // {c: 7}
      print(map3.delete("a", safe: true)); // {b: 3, c: 7}
      print(map1); // {a: 1, b: 2}
      print(map2); // {b: 3, c: 4}
      print(map3); // {b: 3, c: 7}

      // Transiency:
      final vector1 = new PersistentVector.from(["x", "y"]);
      print(vector1.push("z")); // (x, y, z)
      print(vector1.push("q")); // (x, y, q)
      var temp = vector1.asTransient();
      temp.doPush("z");
      temp.doPush("q");
      temp[1] = "Y";
      final vector2 = temp.asPersistent();
      final vector3 = vector2.withTransient((TransientVector v){
        v.doSet(2, "Z");
        v.doPop();
        v[0] = "X";
      });
      print(vector1); // (x, y)
      print(vector2); // (x, Y, z, q)
      print(vector3); // (X, Y, Z)

      // Features
      print(map1.toList()); // [Pair(a, 1), Pair(b, 2)]
      final set1 = new PersistentSet.from(["a", "b"]);
      final set2 = new PersistentSet.from([1, 2, 3]);
      print((set1 * set2).toList());
      // [Pair(a, 2), Pair(a, 1), Pair(b, 3), Pair(b, 2), Pair(b, 1), Pair(a, 3)]
    }