First draft of static probability tables format #8
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
# About this document | ||
|
||
This is a Work in Progress. | ||
|
||
This document describes a format for storing Static Prediction Tables. | ||
|
||
It is possible that some or all of this format will be usable as part of | ||
individual compressed source files; this is still to be determined. | ||
|
||
# About Static Prediction Tables (SPT) | ||
|
||
A Static Prediction Table is a form of external dictionary, shipped either | ||
with the JavaScript VM, or separately, and which may be referenced by any | ||
number of compressed Binary AST Source Files. | ||
|
||
Shipping Static Prediction Tables makes it possible to considerably reduce | ||
the size of individual compressed files. | ||
|
||
This document does not attempt to document how and when a SPT is loaded, | ||
or how and when an individual compressed file references a SPT. | ||
|
||
# Design guidelines | ||
|
||
- A SPT must be usable by many compressed source files. | ||
- A VM must be able to manage several SPTs simultaneously. | ||
- As JavaScript is a changing language, a SPT that is complete at a given point in time cannot be expected to remain | ||
complete forever. | ||
- As Binary AST never reuses the same interface name for distinct purposes, a Path in the AST that is valid at a point | ||
in time will remain valid forever. | ||
- For upgrade purposes, a SPT may be defined as an amendment to another SPT. | ||
- A SPT may define additional strings of various natures. | ||
|
||
# Format | ||
|
||
## Header | ||
|
||
The header: | ||
- specifies the kind of file; | ||
- references the grammar version; | ||
- optionally, references a SPT file it amends. | ||
|
||
TBD | ||
|
||
## Tables of Strings | ||
|
||
These tables add new strings that may be referenced both in the tables of probabilities | ||
Review comment: So far, the probability tables for strings just predict indexes into a move-to-front cache. We do not actually need to assign general probabilities to the string table itself - they will be predicted well after they are first referenced (and encoded using some varuint encoding), and subsequently added to the move-to-front string cache.
Reply: That's true for string literals, identifier names and property keys. On the other hand, it's not true for interface names and string enums. I'll amend the text to clarify. |
||
and in the compressed files. | ||
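The review discussion on this section refers to a move-to-front cache for string literals, identifier names and property keys. As a minimal sketch of that general technique (not the format's actual design, which is still TBD), the hypothetical `MoveToFrontCache` class below returns the position a string held before being referenced, then moves it to the front. Recently used strings therefore get small indexes, which are the values a probability table would predict.

```python
class MoveToFrontCache:
    """Illustrative move-to-front cache; the class and method names are assumptions."""

    def __init__(self):
        self._items = []  # most recently referenced string first

    def reference(self, s):
        """Return the index `s` held before this call (None on a miss),
        then move `s` to the front of the cache."""
        try:
            index = self._items.index(s)
        except ValueError:
            index = None  # first reference: the string itself must be encoded
        else:
            del self._items[index]
        self._items.insert(0, s)
        return index
```

On a miss, an encoder would have to emit the string itself (for instance as a varuint reference into a string table); on a hit it only needs to emit the small index.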
|
||
### Table of property keys | ||
TBD | ||
|
||
### Table of string enum constants | ||
|
||
This table adds string enum constants. It is used when updating the JavaScript grammar. | ||
|
||
TBD | ||
|
||
### Table of interface names | ||
|
||
This table adds interface names. It is used when updating the JavaScript grammar. | ||
|
||
TBD | ||
|
||
### Table of literal strings | ||
TBD | ||
|
||
### Table of identifier names | ||
TBD | ||
|
||
|
||
## Tables of Probabilities | ||
Review comment: We can simplify the specification of probability tables by specifying that independently. We know that each probability table will specify the probabilities for a finite and relatively "small" set of symbols. For context-prediction of tree types, it's the set of schema-bounded types at that location. For string predictions, it's the set [...] Each table can be encoded simply as a series of 32-bit integers, where the sum of all entries is guaranteed to be less than [...]
Reply: I don't get where anything is simplified. |
||
|
||
Each entry in these tables either increases the number of instances of (value) at (path) or resets it to 0. | ||
|
||
Entries with depth N look like: | ||
- list of | ||
  - the Path itself, as a list of exactly N entries of | ||
    - InterfaceName (as a number, exact format to be determined) | ||
    - Field index | ||
  - the distribution at this Path, as a list of | ||
    - Field index | ||
    - Value (format to be determined) | ||
    - Number of instances, where | ||
      - 0 means that we remove this (Path, Field, Value) from the probability table | ||
      - otherwise, if (Path, Field, Value) was in the probability table, we increase its previous number of instances | ||
      - otherwise, we add (Path, Field, Value) to the probability table with `Number of instances` instances. | ||
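The update rule above can be sketched as follows. This is only an illustration under stated assumptions: the in-memory shapes (a dict keyed by `(path, field, value)` tuples, paths as tuples of `(interface_name, field_index)` pairs) and the name `apply_entry` are hypothetical, since the draft leaves the actual encodings to be determined.

```python
def apply_entry(table, path, distribution):
    """Apply one entry to `table`, a dict mapping (path, field, value) to a count.

    `path` is a tuple of (interface_name, field_index) pairs (assumed shape).
    `distribution` is a list of (field_index, value, instances) triples.
    """
    for field, value, instances in distribution:
        key = (path, field, value)
        if instances == 0:
            # 0 removes (Path, Field, Value) from the probability table.
            table.pop(key, None)
        else:
            # Increase the previous count, or add the triple if it was absent.
            table[key] = table.get(key, 0) + instances
    return table
```

For example, applying `[(0, "A", 3)]` and then `[(0, "A", 2), (0, "B", 1)]` at the same path leaves `"A"` with 5 instances and `"B"` with 1; a later `[(0, "B", 0)]` removes `"B"` again. The values `"A"` and `"B"` are placeholders.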
|
||
The table of probabilities contains: | ||
- One entry of depth 0, with a single field: the root. | ||
- Entries of depth 1, for possible children of the root. | ||
- Entries of depth 2, for possible grandchildren of the root. | ||
- ... | ||
- Entries of depth D, for all other nodes. | ||
|
||
|
||
TBD |
Review comment: At this early stage, "delta" SPTs do not seem valuable to spend effort in speccing. They are off the fastpath anyway, and I can see them adding a lot of complexity to the spec. Let's leave the deltas until we actually feel we need them.
Reply: It's not the highest priority, but let's keep an eye on the road :)