-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
spec: add a formal definition of the source language #100
Conversation
Summary ======= Replace the informal definition of the source language's syntax and semantics with a formal one. This provides the basis for deriving proofs of correctness in the future, as well as for proving that the language is *type safe* (among other properties). Details ======= The formal definition attempts to capture the full current semantics of the source language. To make the static and dynamic semantics simpler to express, a *language core* is defined, which the source language *desugars* to. Same as before, the operational semantics of floating-point arithmetic is not defined at the moment. In future work, it should be defined according to the IEEE 754.2008 standard. Where the formal definition (deliberately) differs from both the informal one and currently implemented behaviour is integer overflow: the new definition explicitly states that a panic is raised (i.e., the program terminates), whereas the previous definition ignored the possibility. The old specification is kept around for now, as not all ideas in it are formalized yet (because no implementation thereof exists at the moment).
Here are some of the resources I used - while not necessarily having read them to completion -, which I think might also be helpful to others: This article provides a good intro to the notations and terminology used for type deduction rules. This lecture provides a good and approachable intro of some type relationship concepts and their formal notation, most prominently subtyping. This lecture gives a good, short, and approachable intro to reduction semantics. This paper presents a language core of JavaScript and the semantics thereof. It's an excellent resource on how to apply reduction semantics to a real-world language with mutable aggregates and locals and complex control-flow similar to that of NimSkull. The Wasm specification is a good resource when wanting to look at an exhaustive application of both typing judgments and reduction semantics to a full (albeit low-level) language. The formal definition of StandardML is another real-world application of static and dynamic semantics, but it's - in my opinion - not as approachable as the Wasm specification. Its type inference and module semantics could be of interest to Phy in the future. The book "Practical Foundations for Programing Languages" (an abbreviated version is available online) provides an in-depth explanation on type systems, typing judgements, and semantics in general, as well as practical application thereof. This recent paper presents what it calls logical type soundness, which - according to the authors - is able to explain when a term is safe to execute at a given type, even when the term uses unsafe language features, as opposed to only describing the syntactic structure of well-formed terms (syntactic type soundness). I haven't read it to the end yet, but I think this will become of relevance to Phy in the future. It also describes how to use reduction semantics in the context of multi-threaded programs. Future WorkThe next step is using a NimSkull macro DSL for providing the formal definition (refer to #69), which will make it easier to read, modify, verify, and process the definition. Processing includes things like rendering it into a properly-typeset text representation and mechanizing the typing judgements and reductions. In the long term, it would be good if the macro produces NimSkull code implementing the typing judgments, steps, and reductions, but in the short term, it would also suffice if the macro produces PLT Redex code, which can then be used to make sure that |
* the `Frame` expression's deduction rule was ill-formed (the conclusion's relation was missing) * some conclusions were ill-formed (`|-` instead of `:`) * the empty module rule had no name
With the previous typing rules, type expressions were allowed where values are expected. Consider: ``` (TupleCons (TupleTy (IntTy))) ``` This was successfully judged to be of type `(TupleTy (type (TupleTy int)))`. Proof: ``` ------------- # S-int-type (IntTy) : int ---------------------------------------- # S-tuple-type (TupleTy (IntTy)) : (type (TupleTy int)) -------------------------------------------------------------- # S-tuple (TupleCons (TupleTy (IntTy))) : (TupleTy (type (TupleTy int))) ``` Type expression now use a separate set of rules, making them work like before.
A regression introduce while translating the informal to the formal definition.
Mutable tuples and field assignments didn't work as they did previously (too complicated to explain how and why). The changes *should* restore the previous behaviour.
The operational semantics didn't cover `Return`-without-operand.
There were a few issues, especially around tuple semantics, which should all be fixed now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partial review
(nothing major thus far)
----------------------------------------- # S-type-ident | ||
C |-_t x : typ | ||
|
||
C |-_t e : typ ... typ != void ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the trailing ...
in these premises
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Within deduction rules, I used them to mean "repeat judgment, side-condition, or whatever else they're trailing for every item the input pattern matched".
Consider the following made-up rule:
C |- e : int ...
---------------------
C |- (Node e+) : node
e
refers to one or more expressions here (because of the e+
in the conclusion's pattern), and the ellipsis following the judgment in the premise is meant to highlight that the judgment is repeated (if e
matched more than one element, that is). In a way, this makes the rule somewhat of a rule template.
Ultimately, using a macro DSL is going to resolve these problems with ad-hoc, but if you have a suggestion for how to better express the above in the meantime, I'd be happy to hear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect I just have to get used to reading it more, I didn't quite connect the e+
in the conclusion informing the e
in the premise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've read through it, I need more practice with reading judgements/conditions/etc but that's a me thing, this is a big step forward and the macro lang will be that much nicer, because we can probably rig some nice type setting if going for macro lang -> document
nice work!
----------------------------------------- # S-type-ident | ||
C |-_t x : typ | ||
|
||
C |-_t e : typ ... typ != void ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect I just have to get used to reading it more, I didn't quite connect the e+
in the conclusion informing the e
in the premise
Co-authored-by: Saem Ghani <[email protected]>
Thank you for the review, @saem, it's much appreciated. |
Summary
Replace the informal definition of the source language's syntax and
semantics with a formal one. This provides the basis for deriving
proofs of correctness in the future, as well as for proving that the
language is type safe (among other properties).
Details
The formal definition attempts to capture the full current semantics of
the source language. To make the static and dynamic semantics simpler
to express, a language core is defined, which the source language
desugars to.
Same as before, the operational semantics of floating-point arithmetic
is not defined at the moment. In future work, it should be defined
according to the IEEE 754.2008 standard.
Where the formal definition (deliberately) differs from both the
informal one and currently implemented behaviour is integer overflow:
the new definition explicitly states that a panic is raised (i.e., the
program terminates), whereas the previous definition ignored the
possibility.
The old specification is kept around for now, as not all ideas in it
are formalized yet (because no implementation thereof exists at the
moment).
Notes For Reviewers
In addition to the benefits stated above, a formal definition also allows for a far more structured approach to language design and development. A formal definition also makes complexity of a language feature somewhat measurable (i.e., through the number of typing judgment and reduction rules involving it) and easier to spot.