Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source format "diff-ability" should be an explicit goal #114

Open
alerque opened this issue Jul 6, 2020 · 33 comments
Open

Source format "diff-ability" should be an explicit goal #114

alerque opened this issue Jul 6, 2020 · 33 comments
Labels
considering Specification change under consideration. infrastructure Administrative and website issues. proposal Proposed specification change.

Comments

@alerque
Copy link

alerque commented Jul 6, 2020

One of the best parts about UFO as a format is it it much closer than any other format at facilitating lossless collaborate on a font design. That being said it's not quite there. Being able to collaborate means being able to use different tools to actually edit the font and VCS to track the differences. Frankly the state of VCS usage across the font industry right now is pretty much a disaster, with the vast majority of projects just using it as to store & publish snapshots—something that could just as easily be done with Dropbox or BTRFS. Most of the VCS tooling out there is somewhere between frustrating and useless to use and as a result designers are getting little millage out of it. Realistically while some nerdy programmer types can figure it out most designers can't keep design ideas in branches branches, they can't merge contributions from other people, and so forth. Almost inevitably –and due just to the nature of the data formats– trying to use VCS tooling to merge, rebase, branch, cherry-pick, etc. leads to merge conflicts that have to be dealt with by hand even when the nature of the changes doesn't conflict.

In the case of UFO the nature of those conflicts is much less egregious than in other leading formats I'm familiar with (Glyphs, SFD). It's close to being usable but there are still pain points.

The current 3 major points of the design philosophy indirectly imply most of what's needed.

  • The data must be human readable and human editable.

    Human readable implies plain text, not some binary format. In this case all the competing formats are text based anyway which is a good starting point for VCS, but it isn't enough. Postscript is a text based format too, have you ever tried to merge two post script data sources by hand?

    Human readable/editable is helps a lot, but have you ever tried to merge changes from two different editors to the same Markdown file? I do this at a book publishing company regularly and the tooling just doesn't support it well. It is 1000× easier if you mandate additional restrictions above and beyond "valid Markdown". For example one line per sentence plays much nicer with version control during the editing process later. Forcing a fixed width wrap during the initial writing process and then making edits never change the line wrapping is another way to skin the cat. My point is Markdown is great at being human readable and human editable, but it is terrible at being diff-able. Most Git tooling can't handle it. Those that can do word or character level diffs are a lot nicer to work with, but almost zero tooling is available for merging at that level.

  • The data should be application independent.

    This helps too, but still leaves room for obnoxious things like datestamps (this is meta data that the file system and/or VCS can better provide or "generated by" values that add negative value in the context of VCS. If I share a font project with Bob and Alice (did I just give away my background in network security?) and ask Bob to cleanup the vowels and Nancy to cleanup the consonants and they both do exactly as told and don't touch any other letters, there shouldn't be any reason I can't merge their work in asynchronously. If both file versions coming back have differing date stamp and generator fields suddenly this becomes a manual merge operation, not an automatic one.

    Another example would be key/value order. Right now there are 2 dedicated UFO normalizers plus other ways to normalize (e.g. round-tripping through Python w/ defcon or a UFO editor). Using one of these methods consistently helps, but there is still room in the spec for odds and ends than can and do get implemented differently and thus cause merge errors.

  • Data duplication should be avoided unless absolutely necessary.

    This helps, but going further and making a place for unnecessary data that doesn't cause essential operations to choke up the machinery would be better

This issue isn't to hash out all the specifics of what should happen, only to suggest that the spec project add an explicit design goal of making collaboration easier using VCS tooling (whether that's Git or DARCS or Mercurial or Bazaar or whatever, the issues generalize to "how well does thing A difff with thing B).

@benkiel
Copy link
Contributor

benkiel commented Jul 7, 2020

@alerque thank you for bringing this up: do you have any concrete improvements that could be discussed?

@alerque
Copy link
Author

alerque commented Jul 7, 2020

@benkiel I remember a couple months ago I was looking for moldy vegetables to throw at alien ships after concluding the spec just wasn't designed with any thought to diff-ability, but but I don't remember the specifics. I'll dig out that bit of project and try to make a list.

The original point of this issue though it less that X, Y, or Z is a show-stopper — this is more an instance of death by a thousand paper cuts: no one issue is a make-or-break problem that needs to be fixed, but because this isn't an explicit goal and people have not been weighing other spec decisions with their effect on diff-ability in mind and a lot of small factors have compounded. Making it an explicit design goal would both keep it from getting worse as new features are added and provide a reason to rehash problematic bits that might otherwise be too small to merit changing.

@benkiel
Copy link
Contributor

benkiel commented Jul 7, 2020

@alerque Apologies, I didn't phrase the above very well; I agree with the idea, but there are a bunch of things above to unpack. Some the UFO spec doesn't have control over (tooling for version control, workflow setups) and some that it does. A couple of concrete examples of UFO specific sticking points would help all understand what the impact to the specfication might be.

One that comes to mind for me is the contents.plist: it's a pain point for merge conflicts, and could possibly be made optional. Having a recommendation for normalization would be another —enforcing a normalization in the specification would be a big issue for tool makers, as they may be using libraries that don't allow for certain formatting, but having a standard for normalization would be possible. I'm sure there are others.

@belluzj
Copy link

belluzj commented Jul 16, 2020

It's a problem with the UFO format that we know very well at Dalton Maag, as internally all of our font work is stored on Git as designspace + UFOs, and designers collaborate through the Git process of branches and merge requests to work in parallel on the same project, for example one person refining the numerals and another adding accented glyphs to the same UFO.

I would love to see a 4th point being added to UFO philosophy that prescribes that the format should be VCS-friendly.

Here are our most often encountered problems:

Whitespace diffs

It's not a big issue from a functionality point of view, but it contributes to the "thousand paper cuts". Our current solution is to run ufonormalizer everytime we write UFO files, or to make our tools that write UFO files conform the output of ufonormalizer.

Ideally it would be great if all the tools that work with UFOs could output the same whitespace. I wouldn't mind the spec being prescriptive on this point, in the same way that for example the Python language has a style guide that recommends 4 spaces.

Fontinfo fields that can either be not specified or specified as empty

In the same vein of normalization issues, some fields in fontinfo.plist can either be not specified in the file, or can be specified as empty arrays or other default values; both inputs having the same semantic meaning and leading to the same compiled fonts. That ambiguity leads various tools to either remove the fields from the file when they're specified as empty, and then other tools to put them back in as empty when they're not in the file.

This one is not a big deal either but it generates meaningless diffs when switching between tools with different conventions. We don't really have a workaround. The most zealous designers with take care to not commit those diffs to Git when they're indeed meaningless, but that's wasting their time on annoying details. Maybe the spec could enforce that fields all need to be specified all the time? I don't know if it's worth it of if other people have opinions about this.

contents.plist creating merge conflicts

If two designers work in parallel on the same file and each add new glyphs, when merging their two branches, Git doesn't "know" the semantics of the contents.plist file and records a conflict, as you can see in the screenshot below (I created a branch where I added a capital B and another where I added a capital C). Instead, if Git knew about contents.plist, it would understand that both additions needs to be here as the result of the merge, in alphabetical order ideally...

image

Our current workaround is to tell designers to always add new empty glyphs before branching out to work separately, or to assist them with the conflict resolutions after the fact.

contents.plist corruption while using Git to commit changes selectively

On top the problem above, designers don't always understand how the UFO format works, and while using Git it happens that they commit only certain .glif files or remove some .glif files without updating the matching contents.plist.

This problem happens because one of the nice features of Git is that while working, you can make wide-ranging changes to your source files (for example run some corner rounding script on all glyphs) and then commit to Git only some of the changes (for example you only wanted to keep the rounded numerals). Designers have gotten used to doing that, and when the changes involve adding or removing glyphs, they don't always remember to commit the changes to contents.plist along with their selection of .glif files.

Our current attempt to catch these problems is to run ufolint automatically every time designers push new things to Git. A better solution (for our use-cases at least) would be to get rid of the contents.plist file altogether, and always consider that whatever glyphs are in the folder are part of the layer.

Note: that would go against another proposal to remove the name from .glif files, if I understand correctly.

Anchors and guidelines creating merge conflicts

Anchors and guidelines are both dumped at the bottom of the .glif file at the same level, meaning that Git gets confused when one branch adds an anchor and the other adds guidelines. If Git knew about fonts, it would pick both as a solution to the merge. It would also order the merged lines according to the ufonormalizer standard for example.

image

We don't really have a workaround for that, when it happens people are expected to fix the conflict by hand, by selecting "Accept Both Changes". What could be done to prevent this is: wrap guidelines in a <guidelines> element, and anchors in <anchors> and require those two wrappers to be always present. That would probably give Git enough clues that the two changes are additive and not in conflict.

Mark colors creating merge conflicts

This problem is more a problem of designer behaviour than of Git not understanding fonts. What often happens is that while working, designers will clear all mark colors from the font, and assign their own set of favourite colors to help them with their current task. If two designers working on the same font are not careful, they will end up assigning each their own mark color to the same glyph. When merging their work later, Git will report a conflict, and rightly so.

As a general workaround against conflicts, we tell designers to have a clear distribution of work in terms of glyphs between them before they go off to work each in their branch. That approach works very well for all aspects of the design process (because no-one want to design more outlines or position more anchors than what they're supposed to work on), except for mark colors, because it's always far too easy and convenient to select all and reset colors.

Conflicts on mark colors are particularly hard to deal with because they span lots of different .glif files, and particularly annoying because they feel like such a useless tiny detail to spend time on fixing. What has been suggested so far is to allow mark colors to be stored in a separate file, instead of being spread in each .glif. That would even allow each designer who works on the project to have their own "color file" that they can mix in the font before working, to remind them of their own progress through the work.

[Not a UFO problem, just saying] Glyphs.app timestamps, ufonormalizer lastModTimes

Some tools want to write "last modification times" into their save files, like Glyphs.app (through glyphsLib which by default translates those timestamps into the UFO) and ufonormalizer that does it as part of a performance improvement. These features are counter-productive in a Git workflow, and internally we instruct glyphsLib to drop all the timestamps and we always run ufonormalizer without modification times.

@typesupply
Copy link
Contributor

@belluzj This is fantastically helpful. Thank you so much for all the detail and examples!

There are some great points raised in all of these comments. With regards to whitespace, attribute ordering, etc: The XML formatting has been very hard to specify because the various XML writers I've used and studied offer inconsistent or no controls over the formatting of their output. For example, Python's ElementTree didn't even do new lines or indentation and ordered attributes alphabetically. Because of this, I didn't feel comfortable specifying "\t for indentation, attributes are stored in the order they are specified, maximum line character count count 123 unless..." so I wrote ufoNormalizer. (I had to write my own friggin XML writer. 🤬)

I like the idea of having a specification level recommendation about formatting. That makes compliance optional and it becomes a conversation for a tool developer and their customers instead of a hard to implement requirement. We could do this by documenting what ufoNormalizer does (which, even though I tried to be logical, I'll admit that I made up as I went along) and use that as a starting point.

That said… Does anyone know of a XML based specification that has specific formatting rules? I'd be curious to see how they phrase it.

A better solution (for our use-cases at least) would be to get rid of the contents.plist file altogether, and always consider that whatever glyphs are in the folder are part of the layer.

I got about halfway through your first example and started wondering about this, too. contents.plist was/is an optimization. The reason for its existence is that it's very expensive to load <name> from every GLIF when all you need is a list of available glyph names. There's a proposal (#77) that I think would generate the same problems as your B + C example. We're stuck between performance and diff-ability on this. One one hand, I'd like to ditch these kinds of files, but on the other I don't want to impact performance.

Anchors and guidelines creating merge conflicts

I don't know what we can do about this. (see idea below)

Mark colors creating merge conflicts
What has been suggested so far is to allow mark colors to be stored in a separate file, instead of being spread in each .glif.

I have an idea on this:

  1. Each designer gets their own unique identifier.
  2. Before save, the mark colors are stored in glyph.lib["com.daltonmaag.markColor.<unique identifier>" the glyph.markColor is cleared, the save happens, the glyph.markColor is restored.
  3. When a UFO is opened, the glyph.markColor is set from glyph.lib["com.daltonmaag.markColor.<unique identifier>".

This would make the mark colors only visible to the designer with <unique identifier>.

[Not a UFO problem, just saying] Glyphs.app timestamps, ufonormalizer lastModTimes

Yeah. That's why it's optional in ufoNormalizer. 😄


Since so much of this is out of our hands due to XML writer limitations, Git not knowing about fonts, etc. I wonder if a big part of the solution to this would be to build a UFO specific front end for or a prep-step for Git. I have my own problems with using Git with UFOs and have been working on a solution for myself. As part of that, I did a lot of thinking about diffing UFOs and built a tool to do it with knowledge of what the contents of each file mean rather than just being raw text. A comprehensive merging, resolution, etc. tool would be a big project but if there is enough perhaps resources could be pooled to build it.

@benkiel
Copy link
Contributor

benkiel commented Jul 16, 2020

One thought would be to make contents.plist optional, not required.

Agree about making a recommendation for formatting, but not an enforced standard: the enforced standard is a burden to developers who might not be able to control formatting. At least a recommendation would give normalizers a standard to follow (the ufoNormalizer does some things different form the SIL normalizer, this would solve that)

@belluzj
Copy link

belluzj commented Jul 16, 2020

About Git not knowing about fonts, yes that's really the crux of the problem. I see basically 3 axes to look at this:

  • bend the UFO format to be Git-friendly
    • pros: being Git-friendly probably also means Mercurial-friendly etc. because all VCS have the same notions about text diffing I believe
    • cons: puts the burden of being diffable on UFO, which is in the end "shouldn't really be its concern" and might force UFO into compromises against other concerns
  • teach Git about UFOs so it can behave smartly
    • pros: UFO free to serve other concerns than diffability
    • cons: only solves the problem for Git users, and on the command line. User interfaces like Fork and online tools like GitHub and GitLab will still behave badly on UFOs, unless they too implement special user interfaces for looking at diffs between font files, and actually run our custom code for merging UFOs instead of the standard Git merge strategies (which sounds like it would probably never happen? the merge feature in these tools is a very "core" feature and I don't really see them accepting that each repository can override it with their own code (but I never looked into that properly so just guessing now))
  • write our own version control tools for fonts:
    • pros: total control
    • cons: massive effort before reaping the same level of benefits: Git and other such tools are feats of engineering and all the ecosystems around them add to their value immensely (at least for use-cases that we have at Dalton Maag)

@justvanrossum
Copy link
Contributor

Another thought (re contents.plist) would be to come up with a glyph name to file name scheme that is reversible: that you can get the glyph name from the file name, at least in most common cases. When that's truly not possible, eg. for very long glyph names, parsing the glif xml may still be necessary.

@justvanrossum
Copy link
Contributor

Regarding mark colors: its really the only tool users have to group glyphs. The real solution will be to devise a better and higher level way to do that.

(I cringe every time I see a proud screenshot with happy crayon colored glyph boxes... We should do better.)

@typesupply
Copy link
Contributor

Another thought (re contents.plist) would be to come up with a glyph name to file name scheme that is reversible

It's potentially more than contents.plist though. Any file that stores data about the glyphs outside of GLIF would suffer the same problem, including the cmap file that we want to add. I wonder if Ben's idea of making these files optional could work. If the file is there, use it. If not, quickly skim the XML. We'd have the data duplication problems we're frustrated with now with <name> and <unicode> but the spec could say that those values are important and they must be in sync with the font level files if they are present.

@belluzj
Copy link

belluzj commented Jul 16, 2020

@justvanrossum I agree very much with the idea of a reversible file name mapping! It would also solve the problem that file names for .glifs can clash and that generating file names must know the list of already generated file names to make new ones etc. Given that there cannot be two glyphs with the name in a font, and a bijective mapping to file names, the process of matching glyphs to their files becomes straightforward. That would also solve the other problems that @typesupply mentions about storing data outside GLIF files. Any code that needs to find the filename for a glyph would run that mapping function and would get back the one and only possible file name.

If this is not an agenda item already, it should become one I think.

@typesupply
Copy link
Contributor

That would also solve the other problems that @typesupply mentions about storing data outside GLIF files.

How would it solve the cmap issue?

@belluzj
Copy link

belluzj commented Jul 16, 2020

If I understand correctly the CMAP issue, it's a file outside of GLIF that has a list of mappings "unicode -> glyph name" and the problem is: how do you know which GLIF file corresponds to each unicode? If I understood correctly, you would call that reversible mapping function on the glyph name, and you would get the matching GLIF file name. Is that right?

@justvanrossum
Copy link
Contributor

justvanrossum commented Jul 16, 2020

I think the problem Tal means is that: if conflicts in contents.plist are frequent, then similar conflicts will happen in the separate cmap file as well. Which is a good point to consider, so we don't fix a problem in one place and create one in another.

@belluzj
Copy link

belluzj commented Jul 16, 2020

Oh ok, yes indeed. Thanks for explaining! cmap proposal here

@justvanrossum
Copy link
Contributor

Is the cmap file a performance-motivated proposal?

Yes and no, respectively:

  • I don't want to have to parse 65000 glif files before I can set a bit of text
  • It would be good if the format didn't allow multiple glyphs to have the same unicode

@typesupply
Copy link
Contributor

typesupply commented Jul 16, 2020

Plus:

• It's weird that a glyph named A on layer "foo" can have a different unicode than a glyph with the same name on layer "bar".

@justvanrossum
Copy link
Contributor

Note that my first argument is a bit of a slippery slope: to set text properly, you need features; to build features, you need anchor data. FontGoggles has some fast-path glif parsing code that only extracts unicode and anchor data for this purpose.

@typesupply
Copy link
Contributor

Note that my first argument is a bit of a slippery slope

Yeah. So, doing my spec-writer-devil's-advocate job, I'll ask: Are these optimizations being added to solve tool limitations (albeit limitations that many tools face)? We try to avoid that.

In this case we're presented with two conflicting tool needs:

  1. Tools rendering text from UFOs need to be able to quickly get glyph name -> file name mapping, unicode -> glyph name mapping.
  2. VCS need to have data decentralized to make diffing and merging more accurate.

This, to me, is the crux of the problem. Both are valid. I don't want a script editing a UFO to have to skim 65000 glyphs. I also feel the pain in the stories about contents.plist getting smashed by Git. So, I don't know what to do other than making these files optional. Hm.

@alerque
Copy link
Author

alerque commented Jul 16, 2020

UFO is a source format, not an optimized output format. I think the needs of easily working with the source should be prioritized. Of course some tooling that works with the source will also want fast optimized access to things, but those mappings (such as contents.plist that in many cases duplicates data otherwise derivable from the source files) can and should be either not ever a part of or optional to the source format. Tooling that wants to use them should know how to derive them. If they are commonly used optimizations I think it's okay to make them optional parts of the spec to ease interactions between programs that might be touching the same file sets.

In the specific case of interacting with VCS, files such as lookup maps that are derivable and only needed for output optimizations should not even be tracked in VCS. Since there could be a split between essential source data that must be tracked and shared and "intermediate" data formats that ease access for tooling, the latter could be regenerated when needed without making life harder on the source control end of things.

The opposite should be true for generated binary output formats. These should go out of their way to be consumed by shapers in the most efficient way possible and have most optimizations up-front as it were. These generated output formats of course should not be tracked in VCS.

@benkiel
Copy link
Contributor

benkiel commented Jul 16, 2020

Making contents.plist optional isn't a bad idea, but for the CMAP proposal (which also allows for storing UVS), this is going to be a sticking point. Not sure what to do there; it's not like contents.plist where that data can be regenerated. If we choose not to do a CMAP like thing, then figuring out where to store UVS is important.

@alerque
Copy link
Author

alerque commented Jul 16, 2020

I'm not up on the CMAP proposal so can't speak to it specifically, but specific to this proposal: if making diff-ability was an explicit priority, one design decision that would dictate is a separation between true original source material and derived data structures. Any content that can be regenerated on demand (even if it's inefficient to do so) should be stored in different files than data that can regenerated (even if it should stay in sync to be useful). It doesn't mean you can't have data that can be regenerated kicking around on disk, but you need to be able to cleanly separate it from other data.

@madig
Copy link
Contributor

madig commented Jul 17, 2020

In the same vein of normalization issues, some fields in fontinfo.plist can either be not specified in the file, or can be specified as empty arrays or other default values; both inputs having the same semantic meaning and leading to the same compiled fonts. That ambiguity leads various tools to either remove the fields from the file when they're specified as empty, and then other tools to put them back in as empty when they're not in the file.

I'd like to add two more points about this:

  1. This fontinfo value quantum state has led to bug discoveries like MathInfo: subtracting list and None attributes gives invalid results robotools/fontMath#175 -- to mitigate surprises, we internally run a check on Designspaces to check that all participating UFOs have the same interpolating fontinfo fields explicitly defined.

  2. While implementing https://github.com/linebender/norad/ (UFO reader, writer and "object hierarchy" written in Rust for brutal performance. Think ufoLib2 but faster), the optionality of a lot of data across a UFO leads to the question when a None and an empty value should be considered semantically equal (Glyph: Treatment of empty values vs. None linebender/norad#27). Consider the glif file

    <?xml version='1.0' encoding='UTF-8'?>
    <glyph name="be-cy" format="2">
    <advance width="557"/>
    <unicode hex="0431"/>
    <outline>
    </outline>
    </glyph>

    The current Norad code deserializes this to Glyph.outline = Some(empty outline struct). Ufonormalizer will remove empty outline elements, which Norad would then deserialize to Glyph.outline = None. Are these to be considered equal? This is important for making Norad output the same files as ufonormalizer, so that one does not need to run ufonormalizer (which would destroy the performance gain of Rust again).

    Ufonormalizer thus implicitly does more than formatting, so maybe its semantic changes should be part of ufoLib instead.

that would go against another proposal to remove the name from .glif files, if I understand correctly.

Could this be avoided by making the current file name generation algorithm responsible for mapping back and forth?

One one hand, I'd like to ditch these kinds of files, but on the other I don't want to impact performance.

I suppose we can run a small performance test of reading the contents of a directory of 10000 glyphs vs. loading the contents.plist file. The list can then be cached in UFOReader.

I don't want a script editing a UFO to have to skim 65000 glyphs.

I'd like to point out Norad again. The original author tried loading a 60k+ glyph CJK UFO on his relatively recent MacBook after he implemented parallel loading and said loading that takes ~2-3.5 seconds (linebender/norad#50). This is also why Norad currently does not implement any lazy loading, it doesn't pay off. I want to look into making a ufoLib-compatible frontend for Norad as part of work, after I'm done with other more urgent matters.

@typesupply
Copy link
Contributor

FWIW, I strongly disagree with the idea of making all fontinfo.plist fields mandatory. It would break many, many things. For example, if I'm making a font that will have CFF outlines, there is no need for me to set the TTF specific values. That's unnecessary overhead. Another example, in many cases the naming data and other identifying fields can be automatically built from the familyName and styleName values. The other naming fields are there for the cases when automation is not possible. In this case, the data would not only be unnecessary. It would also be a potential vector for errors since not having the value often mens "let the compiler handle this" and what the compilers do is not always documented.

The current Norad code deserializes this to Glyph.outline = Some(empty outline struct). Ufonormalizer will remove empty outline elements, which Norad would then deserialize to Glyph.outline = None. Are these to be considered equal?

Yes. No <outline> is the same thing as an empty <outline>. <outline> is just a container.

Ufonormalizer thus implicitly does more than formatting, so maybe its semantic changes should be part of ufoLib instead.

I agree that what ufoNormalizer does needs to be documented. I think it's fine for the behavior to be defined there instead of ufoLib since there are more UFO writers than ufoLib and the point of the normalizer is to abstract away the differences between writers.

@madig
Copy link
Contributor

madig commented Jul 17, 2020

FWIW, I strongly disagree with the idea of making all fontinfo.plist fields mandatory.

(Just in case this is directed at "participating UFOs have the same interpolating fontinfo fields explicitly defined": to be clear, the check will only complain if e.g. one UFO has italicAngle set while another doesn't. Not all interpolating fields must be present, but if they are present in one, they must be present in all participating UFOs to avoid undefined behavior.)

@justvanrossum
Copy link
Contributor

Possibly naive question: does "diff-ability" mean something different from "patch-ability"?

For example, I feel that the diff-ability of contents.plist is fine, but since it's prone to merge conflicts, its patch-ability is rather poor. Or is the term diff-ability meant to cover that, too?

@alerque
Copy link
Author

alerque commented Jul 21, 2020

@justvanrossum In the sense I intended to use the broad sense covering all the related issues of diffing, patching, merging, etc. Perhaps this could be re-worded "play nice with‌ VCS". An actual literal diff is probably the easy part, even binary formats can be diffed. Patching isn't really any different, if you can generate a diff that is a patch. The hard part is patch conflict resolution. For that to work effectively you need to keep you diffs as noise-free as possible. Hence it all starts with changes being represented in consistent, clear, and minimal diffs.

@justvanrossum
Copy link
Contributor

Got it, thanks for the elaboration. I'm happy to use "diff-ability" with your intended meaning.

@moyogo
Copy link
Collaborator

moyogo commented Jul 21, 2020

Besides making contents.plist optional, sorting the key-value pairs in contents.plist, or other glyph-keyed lists, by the public.glyphOrder instead of the alphabetical order would be another improvement.

The lib public.glyphOrder tends to have less conflicts when several designers are working at the same time than alphabetical glyph-keyed list like contents.plist, public.postscriptName, etc. Of course that is only true if the glyphOrder defined and not organized alphabetically.

@justvanrossum
Copy link
Contributor

The glyph order is such a trivial detail of the final font (that is often used as a bit of UI state, too) that I would be quite uncomfortable to use it in places like that at all.

It also implies that changes in glyph order will be reflected in more than one place.

@alerque
Copy link
Author

alerque commented Jul 21, 2020

It also implies that changes in glyph order will be reflected in more than one place.

Having data structures that change in multiple places for a singe discreet change is exactly the kind of thing we need to avoid in order to play nice with‌ VCS. A source format with good diff-ability will have zero such cases. You can certainly make use of them in apps in the form of cached values, but there should be a clear dividing line between canonical source data (stored in VCS) and derived data structures (whether in memory or on disk) that store the same data in a different form.

Obviously there are limits to this: if you have data that references other objects you have to have some kind of key obviously. If whatever that key is changes, then both the source and references will need to be updated to match across the whole source. But (for example) a key could be a value in a file or it could be in the filename, but it should never be both at once. If a file has key data in the file name, that same data shouldn't appear in the file. If the data is in the file, the file name should be generic and not need to change when the data does.

That's an abstract example, but the principle seems to be eschewed in several places in UFO right now, contents.plist being the most obvious.

@moyogo
Copy link
Collaborator

moyogo commented Jul 21, 2020

@justvanrossum
Good point. The issue with alphabetical order is that is is prone to VCS conflicts. I guess that’s one the things that can’t be avoided without adapting ones workflow.

@typesupply
Copy link
Contributor

To keep things on track towards making this discussion actionable, is there consensus on these?

  1. Define the formatting and changes that ufoNormalizer applies and make then a "recommendation" in the spec.
  2. Make contents.plist optional. Define behavior for when it isn't present and requirements for when it is.
  3. Add a note to the spec that any new top level files should be optional like contents.plist when possible.

Item 2 is one that we would need to carefully deploy since it would be a breaking change. I took a quick look at ufoLib.glifLib as a test and it operates under the principle that contents.plist exists. So, any tool using any version of that before it is updated to handle missing contents.plist will have unexpected behavior.

@typesupply typesupply added considering Specification change under consideration. infrastructure Administrative and website issues. proposal Proposed specification change. labels Aug 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
considering Specification change under consideration. infrastructure Administrative and website issues. proposal Proposed specification change.
Projects
None yet
Development

No branches or pull requests

7 participants