Project Journal

July 3 2015

Transform is working (O_o). There aren't any more bugs that I know about (I have about 30 test cases), although I'm sure the randomizer will find some new edge cases I didn't think of. Still not implemented in transform: embedded types (easy), set-null (hard).
Apply is working, although that was really easy.
The only hard-ish function left to do is compose, although compose will probably be a lot simpler than transform. The trick will be factoring both of them to reuse as much code as possible.

Transform is by far the hardest code in the type. I'd say its about 65% done in total.

I had a big design review with Nate last week while I was briefly in SF, and we talked through some things:

Invertibility:

So, I think the only thing non-invertible in operations is the removed data. Instead of just specifying that data is being removed, I'm going to make operations optionally invertible. I don't know how this will play out exactly in livedb.

It might also be worth adding a standard getInvert(doc, op) function for ops which returns an op's invert. Formally, given op' = invert(doc, op) then doc * op * op` == doc. This would be really easy to write for all the other OT types too and it would make OT-level undo easy to implement.

Initializing data (aka set null):

The ability to use OT to initialize is actually super useful / important. Its used to express 'initialize doc to {count:0} and then increment count'. Without explicit support, its impossible to express that operation in a way that transforms correctly.

The JSON-patch specification contains arbitrary conditionals (instead of just if-null-then-insert), but unfortunately I can't think of a way to capture the semantics of that in a general way through transform.

Format of operations:

We spent a lot of time talking about the awkwardness of the current operation format. If you want to set doc.name.first = "Nate" then currently you have to say: {o:{name:{o:{first:{di:"Nate"}}}}} Nate (fairly) complained that adding the o's everywhere is awkward, and making it work properly for lists requires coercing strings into numbers in javascript and sorting.

He suggested instead we use lists-of-lists:

['name','first',{i:'Nate'}]

... Which is a compacted version of this:

['name',[['first',[{i:'Nate'}]]]]

If we wanted to set first and last names:

['name',[['first',{i:'Nate'}], ['last',{i:'Smith'}]]]

I'm not sure what to think about this. The fact that there's multiple ways to write the same operation is going to either make the OT code more complicated, or require translating between the expanded and compact representations at the start and end of all the OT functions. So ... ugh.

I translated all the examples from the draft spec to nate's proposed system: https://gist.github.com/josephg/bd05e9dd240dc0ac7888 In terms of bytes-on-wire its about the same for the operations I looked at, though if you have super deep objects and simple operations (super common) then nate's proposal will start looking a lot better.

I'd love some input - otherwise I'm going to sit on it and think for a week or two. There might be a compromise solution somewhere, but in any case I can't make much more progress until we figure this out.

July 26

Its been awhile since I've done any work on this. Not since I was in SF for CFAR. Today I'm planning on messing around with what Nate said - play around with other formats for the operations.

In short, the idea is that operations are translatable between {o:{x:{o:{y:{}} and 'x',[['y',...]]. The second form can be condensed simply into ['x','y',...]. I don't like there being two formats for the operation (the condensed version and the expanded version) but the condensed version is a much easier way to talk about operations.

I updated the gist with the sizes of the operations each way. They're all smaller with nate's format. As expected, the biggest compaction comes simple edits deep in an object:

Set user.details.name.first = 'Seph':

old: {o:{user:{o:{details:{o:{name:{o:{first:{i:"Seph"}}}}}}}}}
size: 76

new: ['user', 'details', 'name', 'first', {i:"Seph"}]
size: 46

As well as being more compact, the second is also way more readable.

Rules:

Operations are either null (a no-op) or a list.
Lists are descents. They have the following format: [scalar, mix of scalars & operations, descents]
All lists start with a scalar, except the root list which doesn't have to.
Operations are objects saying what to do at this node. For example: {i:"hi"} would insert 'hi' at the current node. Operations can have multiple components.
The descent list always ends in either another descent, or an operation.
The descents at the end descend from the final node reached in this descent. For example, [a, b, c, [d, e, f, {e:{}}]] would perform the edit at a.b.c.d.e.f.
There can be multiple descents. When there are, they must be in order sorted by their first scalar value (which is their first item). All numbers come before all strings.

I got type.apply working & passing the tests. It looks quite a bit longer in code than the old version (90 lines vs 60) but the uglified output is actually smaller:

The object version:

sephsmac:json1 josephg$ pbpaste | coffee -ps | uglifyjs -cm | wc
 845

The list version:

However, most of that is because I have nice big string names in the object version. Without those that version drops to 687 lines. But thats pretty close!

Ok, I've got checkOp and apply working. I'd like to make this version a bit better than the last. To do that I might do the work of adding edits as I go.

July 30

I have that failing test case in apply() I want to fix. Then I want to figure out the spec for embedded ops and add them to apply before moving on.

I'd like to also add the code & spec for setNull, but that sounds hard.

I got the test case passing in apply and translated the remaining test cases to the new format.

I also added yet more test cases & fixed some code for checkOp

July 31

So, I'm thinking about transform. The transform logic isn't actually that bad; the hard thing is really figuring out the resulting indexes. Also with the new op format actually writing the ops back out in a sensible way is a pain.

I implemented a write cursor for creating ops in a nice way, and I'm pretty happy with it. Now that I've got that written the next tool I want is a higher order function for walking the children of a pair of operations. The function will need to be able to take in transform functions for numerical children, but with that sorted transform and compose shouldn't be so different from one another.

I'm kind of surprised - the act of rewriting the OT code for the new format (although onerous) is actually giving me an opportunity to rethink how the code works from a top-down perspective. I wanted to do that anyway at some point, although I was mired in details and I'm not convinced I would have gotten around to it. Clearing out the simple junk logic also makes it way easier to reason about the actual semantics of transform, which is important if I actually want to sneak setNull in there.

My diagramming is fun, although I'm tempted to delete the chart where I tick things off. It short circuits my motivation. Progress looks like working on the thing I know I need to work on next, not working on the easiest thing to tick off on a chart.

I deleted the chart.

August 13

I haven't touched this code in awhile - a couple weeks by the looks of things.

I can't quite remember what I was working on (damn!) so I'm going to read through the code and reload it all back into brain context.

I was working on the cursor code. I kept working on the readCursor code. I added some tests to clone ops using writeCursor, and its nearly working.

14 August

Today I want to start by getting the readCursor code correct then maybe merge in the writeCursor stuff so we only have one kind of cursor.

I'm noting a need for some random op code - I might write a simple generateRandomOp function soon.

Ok, the cursor code is all cleaned up and working. To warm up I might try rewriting apply to use the cursors. I think I'm basically at the point I was at before pre-funky op format now. I've replaced a moderately easy to consume cursor format with a hard to consume cursor format + the perfect API for consuming the ops. I guess thats better? Its very much a better-is-better solution.

17 August

I just got half the transform tests working using the new cursor code. Its actually surprisingly easy compared to the previous code - when the abstractions line up, the cursors are actually quite nice.

19 August

I said what about cursors? I have to keep adding more gunky methods to the things. Work through an abstraction and the abstraction becomes your king. That said, the new drop code in transform looks gorgeous. The previous version of the code dealt with 3 things:

transforming the list indexes correctly
the logic of stepping through op and otherOp together
the semantics of merging drops themselves

The new version has all three of those parts broken apart, and the code is immensely readable. (Albeit at a cost of length - the new transform function is short but the supporting code is much longer than it was before.)

Anyway, I'm going to try and get the last transform tests passing today.

22 August

Actually its gone midnight, so its kinda the 23rd now...

I rewrote the code a few more times. The previous time (yesterday, the 21st) I got most things working but landed in a game of whack-a-mole with seemingly random test cases breaking. I started playing guess and check a little, and I knew I was lost.

I spent last night and some of today writing out a big list of all the actual rules that transform should follow (below). Today I translated that into crappy pseudocode and then coded it up. Almost immediately when I got the code running most of the tests were passing. Writing the rules out also gave me some insights to other tests I should be writing, and a bunch more edge cases I simply hadn't considered.

Given how long its taking me to get this working I'm nervous about setNull and secondary storage - it might be easy, or it might be another demon infested hellhole. Either way I'm planning on waiting until this code is fuzzer correct before adding the setNull logic in. (Is that wise? I'm not sure - it might be worth at least going over my rules and seeing what needs to change).

I'm also still missing the list code - mostly because I was lazy and its late and I'm tired. I stubbed it out at the 11th hour to get the code running at all. I think I have all my ducks lined up to get it working; but I still need to actually do that if I want it to run.

I'm also quite curious about the fuzzer now - with the rules written out I'm feeling a lot more cocky. There were even a couple of tests which failed correctly because the rules-as-written said the output I'd written down for them was wrong. Its always satisfying when that happens.

Oh well, 1am. Time to craaaash.

transformPicks: ->

Essentially I need to do a 4x traversal over:

op1's pick tree
op2's pick tree
op2's drop tree
The write cursor (transformed to op2 drop / op1 pick space)

I need op1 to read from. I need op2's pick and drop trees to figure out how to transform list indices. And obviously I need the write cursor to write the damn thing out.

eachChildOf only really keeps track of two variables - and one of them has to be op1's pick tree. The question is whether the other one is op2's pick tree or op2's drop tree. I can still track the other - but we're going to skip things.

Implementation Option 1:

I traverse op1's pick tree and op2's pick tree. Whenever I get to things which were picked up in op2 I make a new write cursor and write out whatever we need to write out.

To make this work, I'll need to go hold all of op2's drop location read cursors so we can do the list transforms correctly.

In another pass through the tree, I traverse through op2's drop tree with the write cursor and drop in all the write bombs.

Implementation option 2:

Keep going with the current model. So, we descend over op1's pick tree and op2's drop tree, mapping op1 through op2 in the process. When we find a drop in op2, teleport the op1 cursor to the held read and descend like that.

We'll also need to traverse over op2's read tree for list mappings using the advancer

Implementation option 3:

I make a special kind of write cursor which does all the list transforms by keeping its own references to its location in op2's picks and drops. It deals with the list map and the teleporting cursors.

I just descend down in the tree.

Unfortunately, it might be a one trick pony. Doing the same thing for the drop step might be too crazy complicated (or too awesome - who can say?). It might also be possible to reuse / extend this teleporting cursor implementation to work for the drop step as well, and compose... But I also suspect it might get in the way of implementing some of the more fiddly logic.

Gonna go with option 1 for now. I think option 2 has less code, but option 1 is more straightforward, and I could use all the straightforward I can get right now.

rm

me or parent removed -> noop me or parent moved (or pick & drop) -> pick (rm) at dest

pick

me or parent removed -> noop parent moved -> pick at dest if I was moved -> LEFT: pick at dest -> RIGHT: noop

insert / drop

pick cancelled -> noop parent removed -> rm at pick parent moved -> i/d at dest op2 d/i here -> cancelled op2 drop: (assert r:true set already), write -> LEFT: r:true and write -> RIGHT: rm at pick

Edit:

me or parent removed -> noop me or parent moved -> edit at dest edit -> transform edits

op2

op1 remove subtree containing op2 pick -> remove at op2 drop site. -> set op2 move to cancelled. Cancelled drops never win ties.

This works regardless of list transform nonsense - it goes through it
For purposes of rules above, consider the op2 movement cancelled. Eg,

op1 removes A, insert B op2 moves A.x -> B

expect remove A, remove B, insert B

6 September

Last weekend I hooked up the randomiser instead of continuing to work on compose. I was worried that the longer I waited between working on transform and finding all the bugs the worse. I've spent days struggling on a tiny set of bugs - fixing maybe about 10 bugs so far.

The list code is now a giant mess - I had to rip it apart. The way I set up the advancer code was super ugly, and I'm not entirely sure what the right way to structure that code is. Also to calculate outPickOff I'm sort of repeating the logic of the pick phase code. It probably makes make more sense to iterate over what we wrote in the output instead of trying to recreate it in a few if statements. But I'm not sure what the descent looks like - from what perspective do we look at that output?

Debugging with a fuzzer like this has a tendancy to make me weirdly myopic about the code. When I'm fixing an issue I'm motivated to see how much further the fuzzer gets after crashing so I just want to get it working. And when its working, I'm afraid to change anything incase the (increasing) pile of logic collapses in some strange way. And the longer between refactoring the worse that gets - as evidenced by the current list code.

But thats all solvable. Until now, when I have the most awful of unspeakable edge cases: a->b.x vs b->a.x. So, two children move into each other.

I've been thinking about it and I've managed

17 September

I've been stuck on x->y.x, y->x.y for the last few weeks. Its just too hard. Coming back to it now - I want to start by cleaning up / refactoring some of the code.

27 September

Still haven't refactored the code. The changes I want to make are hard and consciousness-requiring. I've been stalling because I'm not entirely sure that I'm doing the right thing. My plan has been to replace x->y.x vs y->x.y by making one replacement win and the other lose. I think thats the wrong approach - I think actually the behaviour should be (at least optionally) application-dependant.

To make that happen I'm going to have to add conflict markers and an optional automatic conflict resolution step.

So, the plan would be for the server to be able to reject client operations due to conflict, and punt the conflict resolution back to the client to resolve.

Given a document at version 10 and concurrent operations A and B. A hits the server first and the doc goes to version 11. op B conflicts with op A, and op B can be rejected back to op B's client. Op B's client will get the rejection after getting operation A, so the client's UI can do whatever it wants to resolve the situation. The nice thing about this is that there's no diamond property requirement - op A was accepted and op B' will be based after op A anyway. Hm, we might have to have a transform function which transforms B to B' with the conflicting parts removed.

For each conflict we can supply an automatic resolution, although the automatic resolutions probably won't be very good. They'll need to be known by users anyway; they're kind of generally obnoxious issues depending on how you make your data structure. But leaving them up to the client means the client can resolve them differently if they want.

Conflicts:

Overlapping inserts (bad autoresolve deletes op2's insert) Related: all other cases of drop vs drop
Move the same item to the same place (easy autoresolve -> noop)
Move the same item to different places (autoresolve picks a winner)
Insert into adjacent / ambiguous list positions. This will need some finessing - if the insert positions are unambiguously ordered we shouldn't conflict. I'm imagining supporting manually maintaining a sorted list.
Remove vs remove (easy autoresolve -> noop)
Pick vs remove parent

Basically what I need to do is split out all the transform behaviour into two parts:

Things which transform nicely
Things which transform ambiguously

I make a transform-with-conflicts function which outputs the two parts. I have a traditional transform function which returns the automerged version, although I'm not sure if anyone should use that.

Hm.

Option 1:

We expose a traditional transform function and have an alternate transform-with-conflicts API for fancier stuff. Transform just calls transform-with-conflicts then automerges the conflicts. We'll have to make sure the automerge results obey the diamond property, although thats sort of possible. The rule

Option 2:

Transform throws if there are any conflicts

Option 3:

The conflicts form part of the document or something, though this is super gross

They are all kind of awful. :/

Maybe I just go with option 1 and abandon the classic

May 8 2017

I picked this up again after a few people pinged me on github about it. I think its been long enough.

Basically, the aim at the moment is to do a broad covering of the problems listed above by covering all the use cases. So I want to solve the problems by:

Adding a lost and found path

I'm not sure if this should be specified in the operation or as an option passed in to transform, but the idea is that if operations happen to blackhole some objects, instead of removing them entirely I'll send them to the lost and found bin.

Eg:

[['a', {p:0}], ['b', 'x', {d:0}]]
XF
['b', {r:true}]

should result in:

[['a', {p:0}], ['lostandfound', 0, {d:0}]
and
[['b', {r:true}, 'x', {p:0}], ['lostandfound', 0, {d:0}]

The fuzzer should also always pass when both operations are performed with the same lnf bin set. (So it should be symmetric.)

This will also happen for:

Mutually blackholed moves
Move-into-deleted item (like above)
Drop into the same location

I want to let the client specify a transform function for the inserted item, so they can add metadata. That way instead of just ending up with {x:5} in the lnf bin you can end up with {from:"fred", date:12345, item:{x:5}} or whatever.

So, I want to pass that transform function through using an extra options argument to transform. Its a bit icky, but whatever. Its useful. The question is, do I want to think of the lnf path as part of the operation or another option:

Most apps will just set it once and use the same path for every document. Putting it in every operation will add a whole bunch of junk to every single operation that comes through the system.
As an argument I'm not sure if there's a correct way to move the lnf bin itself, because what value do you pass in while its in flux? I mean, I could let you specify 2 bins (one for op1 content and 1 for op2 content), but thats super complicated.

Probably passing it as an argument is easier. I think I'll just say that I don't really support moving the bin itself (well, no concurrent editing while the bin is being moved.)

Also:

Add a test-for-conflicts function

Add a function that does a pretend transform, but instead of returning the transform results it returns a list of conflicts. the list of conflicts would include everything that would result in a lost-and-found move, as well as more stuff. It should support everything above:

Mutually blackholed moves (a->b.x vs b->a.x)
Move-into-deleted item (a->b.x vs delete b)
Drop into the same location (a->x vs b->x)

And some more potential semantic conflicts:

Moving the same item to different places (a->x vs a->y)
A drop inside a destination that has moved (a->b.x vs b->c)
Attempt to move an item that was deleted by op2 (a->x vs delete a)

... And maybe others?

To implement that I'll probably just make an internal, parameterised transform function that describes and returns any potential conflicts instead of generating data itself.

May 10

I spent most of yesterday tracking down more offset woes. I keep wondering if the way I'm writing this code is wrong - like, I'm keeping track of a lot of state around what we do with removes and moves between the pick traversal and drop traversal. I wonder if it would be better to just read our writes back out of that traversal itself.

Although, in some ways that would be harder - because we need to interpret some edits differently based on semantics.

Anyway, fuzzer is up to 1045

May 11

Spent all day trying to fix the code for op2 drops having the wrong list indicies. The solution I spent all day on didn't work. I think I'll need to recreate all the index tracking code from op1 drop, which is a huge pain.

May 13

Yesterday I pulled out some lego and figured out that the logic I thought I needed to write for op2 drops was also wrong. It turns out the problem is much simpler. You should think about the operations as:

op1 pick -> op1 drop op2 pick -> op2 drop / op1' pick -> op1' drop

So, op1 pick and op2 pick have the same context (the original document). op2 drop and the resulting op1 pick have the same context (the document as it is after op2 has been applied). There's three operational contexts we need to traverse through:

Op1
Op2
The resulting document
op1 picks and op2 picks need to be transformed through op2.
op1 drops need to be reverse transformed through op1, then transformed via op2 and the resulting operation
op2 picks need to be transformed via op2 - which is to say, we throw out anything underneath an op2 remove and anything under an op2 pick gets moved to the op2 drop location.
op2 drops need to be transformed via the resulting operation.

I still think there's a nice, small, clean implementation of this code somewhere out there in the solution space, but I'm not clever enough to find it.

Today's bug is much more understandable, which is the question of how we handle nulls and undefined at the root of the document. I talked to Jeremy about it and I've decided to make the OT system allow the root document to be undefined (as distinct from null). This means you can use JSON OT to create and destroy documents in a larger OT system - that is, the whole thing can be used recursively. But this spec change means I need a bunch of changes in the randomizer so it still generates valid operations.

May 14

The last couple days I've been burning through a bunch of bugs, and I've gotten the fuzzer up to iteration 13152 before it crashes. Bugs have hit at ~8000, then ~10250 then 13152. I think I've got a much better conceptual framework for understanding some of the bugs now - I'm finally understanding how transform json2 works, although I'm totally shattered by the experience. I might need to take most of this coming week off (or on the beach).

I know I was trying to avoid it for ages but I finally wrote code that needed a deep clone of the resulting (output) operation as an output stage. I'm not sure if its actually correct though.

And the fuzzer still hasn't mutually blackholed some objects. So there's at least 1 more bug that I know about, which means there's probably plenty more that the fuzzer just needs more time to find. (Wait no - the iteration 13152 bug is a mutual blackhole operation. So... maybe I'm near the middle of the bug pool or something. Who can say?)

Also I remembered that the operation generator is currently only generating very simple ops, because writing complex operations using the generator is hard. That means I'm generating relatively nice, simple test cases at the moment but I think my fuzzer pass count is going to take a plunge after I write compose.

I'm also starting to seriously run out stamina for this. I need a break.

Ok, did a sort of janky implementation of blackholing before bed. Up to 34329 now.

Oh, that was just a misplaced assert(). I think the fuzzer can find more bugs if I leave it long enough, but its up to 1 000 000 so far without finding anything.

Ok, it finally crashed on iteration 3 361 496. I might add the compose function before hunting that down - its probably just something involving multiple operation components that is really hard to generate.

16 May

I took yesterday off completely after my crazy Sunday, although I did do a few tiny cleanups (like brushing up the spec).

Today I just wrote the compose function, start to finish in almost one sitting and I added a bunch of unit tests for it. Test count is up around 160 now, which is respectable. The whole thing was honestly pretty easy after writing transform.

So much for iteration 3.3M. The fuzzer now crashes on iteration ... 8. Spectacular.

Oct 5 12017

I'm playing around with making a json editor. I added a normalize function (so I could merge [[a,b,pick], [a,b,drop]] without worrying). Almost immediately I ran into a bug in compose, and its sort of annoying because I'd apparently previously figured out that the behaviour was probably wrong, then written a fix and left the fix commented out. It took me ages to find.

I think I might actually need to finish this damned thing eventually.

Oct 7

Oh god, I'm getting pulled into the black hole of this project again.

It occurs to me that there's another way to structure this code. Instead of thinking of it as one big transform function, I could write a context transformer object which takes in a single operation, scans it and produces something you can walk from the perspective of its input that in turn tells you where you are from the perspective of its output.

Oct 15

I isolated and ran the compose tests the other day out of curiosity (it'd be nice if it was done and working). The tests spat out a bug from the setnull behaviour:

What should this do?

compose(
	[{i:[]}, [0, i:'a'], [1, i:'b']],
	[[0, p:0], [1, d:0]]
)

The first op is using the setnull behaviour to initalize a list, then inserting into it. This way if a concurrent operation tries to create + insert into the same list we'll keep all items.

We want to preserve that through the compose, but doing so means selectively modifying the operation. I think the code should sort of tag any parents of inserts as setnulls and freeze them, while allowing children to be modified?

I mean, in this case it should swap the inserts:

[{i:[]}, [0, i:'b'], [1, i:'a']]

But there's also much more complicated variants - like, if the second op picks up data inserted by the first op and moves it in / out of the setnull region.

In short, its an awful edge case. I might add some tests for these cases so when I get back to compose I have something to work on