spec: add a formal definition of the source language #100

zerbina · 2024-12-27T23:08:58Z

Summary

Replace the informal definition of the source language's syntax and
semantics with a formal one. This provides the basis for deriving
proofs of correctness in the future, as well as for proving that the
language is type safe (among other properties).

Details

The formal definition attempts to capture the full current semantics of
the source language. To make the static and dynamic semantics simpler
to express, a language core is defined, which the source language
desugars to.

As before, the operational semantics of floating-point arithmetic
is not defined at the moment. In future work, it should be defined
according to the IEEE 754-2008 standard.

Where the formal definition (deliberately) differs from both the
informal one and currently implemented behaviour is integer overflow:
the new definition explicitly states that a panic is raised (i.e., the
program terminates), whereas the previous definition ignored the
possibility.
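
As a rough illustration of the overflow behaviour described above, here is a minimal Python sketch of an addition that panics instead of wrapping around. The 64-bit signed range and the names `Panic` and `checked_add` are assumptions made for the example; they are not taken from the specification.

```python
# Assumption: 64-bit signed integers. The actual width is not stated in this PR.
INT_MIN = -(2**63)
INT_MAX = 2**63 - 1


class Panic(Exception):
    """Hypothetical stand-in for the program terminating with a panic."""


def checked_add(a: int, b: int) -> int:
    """Add two in-range integers, panicking instead of wrapping around."""
    result = a + b  # Python integers are unbounded, so this never wraps
    if result < INT_MIN or result > INT_MAX:
        raise Panic(f"integer overflow: {a} + {b}")
    return result
```

Under the previous definition, the out-of-range case was simply not addressed; in this sketch it becomes an explicit, observable outcome.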

The old specification is kept around for now, as not all ideas in it
are formalized yet (because no implementation thereof exists at the
moment).

Notes For Reviewers

This is a partial reboot of "add a meta-language for formal definitions" (#69), leaving out the macro DSL.

In addition to the benefits stated above, a formal definition also allows for a far more structured approach to language design and development. It also makes the complexity of a language feature somewhat measurable (i.e., through the number of typing judgments and reduction rules involving it) and easier to spot.

zerbina · 2024-12-27T23:09:16Z

Here are some of the resources I used (though I haven't necessarily read them to completion), which I think might also be helpful to others:

This article provides a good intro to the notations and terminology used for type deduction rules.

This lecture provides a good and approachable intro to some type relationship concepts and their formal notation, most prominently subtyping.

This lecture gives a good, short, and approachable intro to reduction semantics.

This paper presents a language core of JavaScript and the semantics thereof. It's an excellent resource on how to apply reduction semantics to a real-world language with mutable aggregates and locals and complex control-flow similar to that of NimSkull.

The Wasm specification is a good resource when wanting to look at an exhaustive application of both typing judgments and reduction semantics to a full (albeit low-level) language.

The formal definition of Standard ML is another real-world application of static and dynamic semantics, but it's, in my opinion, not as approachable as the Wasm specification. Its type inference and module semantics could be of interest to Phy in the future.

The book "Practical Foundations for Programming Languages" (an abbreviated version is available online) provides an in-depth explanation of type systems, typing judgements, and semantics in general, as well as practical applications thereof.

This recent paper presents what it calls logical type soundness, which, according to the authors, is able to explain when a term is safe to execute at a given type, even when the term uses unsafe language features, as opposed to only describing the syntactic structure of well-formed terms (syntactic type soundness). I haven't read it to the end yet, but I think this will become relevant to Phy in the future. It also describes how to use reduction semantics in the context of multi-threaded programs.

Future Work

The next step is using a NimSkull macro DSL for providing the formal definition (refer to #69), which will make it easier to read, modify, verify, and process the definition. Processing includes things like rendering it into a properly-typeset text representation and mechanizing the typing judgements and reductions.

In the long term, it would be good if the macro produces NimSkull code implementing the typing judgments, steps, and reductions, but in the short term, it would also suffice if the macro produces PLT Redex code, which can then be used to make sure that phy and the specification agree on the types and values of the programs in the test suite.

With the previous typing rules, type expressions were allowed where values are expected. Consider:

```
(TupleCons (TupleTy (IntTy)))
```

This was successfully judged to be of type `(TupleTy (type (TupleTy int)))`. Proof:

```
------------- # S-int-type
(IntTy) : int
---------------------------------------- # S-tuple-type
(TupleTy (IntTy)) : (type (TupleTy int))
-------------------------------------------------------------- # S-tuple
(TupleCons (TupleTy (IntTy))) : (TupleTy (type (TupleTy int)))
```

Type expressions now use a separate set of rules, making them work like before.
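
To make the separation concrete, here is a small Python sketch with one function per judgment: one for type expressions and one for value expressions. It is only a model of the idea, not the specification's rules. Terms are encoded as nested tuples, and the `IntVal` node, the `IllTyped` exception, and both function names are hypothetical.

```python
class IllTyped(Exception):
    """Raised when no rule applies to a term (hypothetical helper)."""


def type_expr(t):
    """Type-expression judgment: the type that a type expression denotes."""
    if t == ("IntTy",):
        return "int"
    if t[0] == "TupleTy" and len(t) > 1:
        return ("TupleTy",) + tuple(type_expr(e) for e in t[1:])
    raise IllTyped(f"not a type expression: {t!r}")


def value_expr(e):
    """Value judgment: the type of a value expression."""
    if e[0] == "IntVal":  # hypothetical literal node, e.g. ("IntVal", 1)
        return "int"
    if e[0] == "TupleCons" and len(e) > 1:
        # operands are checked with the *value* judgment only, so a
        # type expression in operand position matches no rule
        return ("TupleTy",) + tuple(value_expr(x) for x in e[1:])
    raise IllTyped(f"not a value expression: {e!r}")
```

With this split, `value_expr(("TupleCons", ("TupleTy", ("IntTy",))))` raises `IllTyped`, because the operand is only ever examined by the value judgment.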

zerbina · 2024-12-28T20:49:55Z

There were a few issues, especially around tuple semantics, which should all be fixed now.

saem

partial review

(nothing major thus far)

saem · 2024-12-28T23:12:21Z

spec/specification.md

+----------------------------------------- # S-type-ident
+C |-_t x : typ
+
+C |-_t e : typ ...  typ != void ...


I don't understand the trailing ... in these premises

Within deduction rules, I used them to mean "repeat judgment, side-condition, or whatever else they're trailing for every item the input pattern matched".

Consider the following made-up rule:

C |- e : int ...
---------------------
C |- (Node e+) : node

e refers to one or more expressions here (because of the e+ in the conclusion's pattern), and the ellipsis following the judgment in the premise is meant to highlight that the judgment is repeated (if e matched more than one element, that is). In a way, this makes the rule somewhat of a rule template.

Ultimately, using a macro DSL is going to resolve these problems with ad-hoc notation, but if you have a suggestion for how to better express the above in the meantime, I'd be happy to hear it.
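
One way to read the made-up `Node` rule above is as a check that repeats the premise once per matched operand. The Python sketch below models that reading; the tuple encoding of terms and the names `type_of` and `check_node` are assumptions made for illustration only.

```python
def type_of(context, e):
    """Tiny illustrative judgment: C |- e : typ (returns None if ill-typed)."""
    if isinstance(e, int):
        return "int"
    if isinstance(e, str):  # identifier, looked up in the context C
        return context.get(e)
    if isinstance(e, tuple):
        return check_node(context, e)
    return None


def check_node(context, term):
    """Made-up rule: (Node e+) has type node if every e has type int."""
    if not term:
        return None
    head, *operands = term
    if head != "Node" or not operands:  # e+ means one or more operands
        return None
    # the trailing `...` in the premise: one `C |- e : int` per matched e
    if all(type_of(context, e) == "int" for e in operands):
        return "node"
    return None
```

For example, `check_node({"x": "int"}, ("Node", 1, "x"))` yields `"node"`, while a single operand that is not an `int` makes the whole term ill-typed.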

saem

I've read through it. I need more practice with reading judgements, conditions, etc., but that's a me thing. This is a big step forward, and the macro lang will be that much nicer, because we can probably rig up some nice typesetting if going for macro lang -> document.

nice work!

saem · 2024-12-29T21:38:02Z

I suspect I just have to get used to reading it more; I didn't quite connect the e+ in the conclusion with the e in the premise.

zerbina · 2024-12-30T17:39:03Z

Thank you for the review, @saem, it's much appreciated.

zerbina added 3 commits December 27, 2024 22:26

spec: minor rewording

38a06e8

koch: update the specification's file name

a490313

zerbina added the documentation and enhancement labels Dec 27, 2024

zerbina requested a review from saem December 27, 2024 23:08

zerbina added 8 commits December 27, 2024 23:10

ci: update the reference to the source lang specification

9d99280

spec: fix the parameter declaration grammar

8ca29c0

spec: fix some minor issues

31f2fb7

* the `Frame` expression's deduction rule was ill-formed (the conclusion's relation was missing)
* some conclusions were ill-formed (`|-` instead of `:`)
* the empty module rule had no name

spec: minor formatting fixes

36e6116

spec: don't allow duplicate names for parameters

daf9bb7

A regression introduced while translating the informal to the formal definition.

spec: restore previous tuple behaviour

1c9ed28

Mutable tuples and field assignments didn't work as they did previously (too complicated to explain how and why). The changes *should* restore the previous behaviour.

spec: add missing Return step

d1bbcd9

The operational semantics didn't cover `Return`-without-operand.

saem reviewed Dec 29, 2024

saem approved these changes Dec 30, 2024

zerbina commented Dec 30, 2024

fix a few typos

df09bb0

Co-authored-by: Saem Ghani <[email protected]>

zerbina merged commit d551fe5 into nim-works:main Dec 30, 2024
5 checks passed

zerbina deleted the textual-formal-definition branch December 30, 2024 17:41
