Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spec: add a formal definition of the source language #100

Merged
merged 12 commits into from
Dec 30, 2024

Conversation

zerbina
Copy link
Collaborator

@zerbina zerbina commented Dec 27, 2024

Summary

Replace the informal definition of the source language's syntax and
semantics with a formal one. This provides the basis for deriving
proofs of correctness in the future, as well as for proving that the
language is type safe (among other properties).

Details

The formal definition attempts to capture the full current semantics of
the source language. To make the static and dynamic semantics simpler
to express, a language core is defined, which the source language
desugars to.

Same as before, the operational semantics of floating-point arithmetic
is not defined at the moment. In future work, it should be defined
according to the IEEE 754.2008 standard.

Where the formal definition (deliberately) differs from both the
informal one and currently implemented behaviour is integer overflow:
the new definition explicitly states that a panic is raised (i.e., the
program terminates), whereas the previous definition ignored the
possibility.

The old specification is kept around for now, as not all ideas in it
are formalized yet (because no implementation thereof exists at the
moment).


Notes For Reviewers

In addition to the benefits stated above, a formal definition also allows for a far more structured approach to language design and development. A formal definition also makes complexity of a language feature somewhat measurable (i.e., through the number of typing judgment and reduction rules involving it) and easier to spot.

Summary
=======

Replace the informal definition of the source language's syntax and
semantics with a formal one. This provides the basis for deriving
proofs of correctness in the future, as well as for proving that the
language is *type safe* (among other properties).

Details
=======

The formal definition attempts to capture the full current semantics of
the source language. To make the static and dynamic semantics simpler
to express, a *language core* is defined, which the source language
*desugars* to.

Same as before, the operational semantics of floating-point arithmetic
is not defined at the moment. In future work, it should be defined
according to the IEEE 754.2008 standard.

Where the formal definition (deliberately) differs from both the
informal one and currently implemented behaviour is integer overflow:
the new definition explicitly states that a panic is raised (i.e., the
program terminates), whereas the previous definition ignored the
possibility.

The old specification is kept around for now, as not all ideas in it
are formalized yet (because no implementation thereof exists at the
moment).
@zerbina zerbina added documentation Improvements or additions to documentation enhancement New feature or request labels Dec 27, 2024
@zerbina zerbina requested a review from saem December 27, 2024 23:08
@zerbina
Copy link
Collaborator Author

zerbina commented Dec 27, 2024

Here are some of the resources I used - while not necessarily having read them to completion -, which I think might also be helpful to others:

This article provides a good intro to the notations and terminology used for type deduction rules.

This lecture provides a good and approachable intro of some type relationship concepts and their formal notation, most prominently subtyping.

This lecture gives a good, short, and approachable intro to reduction semantics.

This paper presents a language core of JavaScript and the semantics thereof. It's an excellent resource on how to apply reduction semantics to a real-world language with mutable aggregates and locals and complex control-flow similar to that of NimSkull.

The Wasm specification is a good resource when wanting to look at an exhaustive application of both typing judgments and reduction semantics to a full (albeit low-level) language.

The formal definition of StandardML is another real-world application of static and dynamic semantics, but it's - in my opinion - not as approachable as the Wasm specification. Its type inference and module semantics could be of interest to Phy in the future.

The book "Practical Foundations for Programing Languages" (an abbreviated version is available online) provides an in-depth explanation on type systems, typing judgements, and semantics in general, as well as practical application thereof.

This recent paper presents what it calls logical type soundness, which - according to the authors - is able to explain when a term is safe to execute at a given type, even when the term uses unsafe language features, as opposed to only describing the syntactic structure of well-formed terms (syntactic type soundness). I haven't read it to the end yet, but I think this will become of relevance to Phy in the future. It also describes how to use reduction semantics in the context of multi-threaded programs.

Future Work

The next step is using a NimSkull macro DSL for providing the formal definition (refer to #69), which will make it easier to read, modify, verify, and process the definition. Processing includes things like rendering it into a properly-typeset text representation and mechanizing the typing judgements and reductions.

In the long term, it would be good if the macro produces NimSkull code implementing the typing judgments, steps, and reductions, but in the short term, it would also suffice if the macro produces PLT Redex code, which can then be used to make sure that phy and the specification agree on the types and values of the programs in the test suite.

* the `Frame` expression's deduction rule was ill-formed (the
  conclusion's relation was missing)
* some conclusions were ill-formed (`|-` instead of `:`)
* the empty module rule had no name
With the previous typing rules, type expressions were allowed where
values are expected. Consider:

```
(TupleCons (TupleTy (IntTy)))
```

This was successfully judged to be of type
`(TupleTy (type (TupleTy int)))`. Proof:
```
                    ------------- # S-int-type
                    (IntTy) : int
           ---------------------------------------- # S-tuple-type
           (TupleTy (IntTy)) : (type (TupleTy int))
-------------------------------------------------------------- # S-tuple
(TupleCons (TupleTy (IntTy))) : (TupleTy (type (TupleTy int)))
```

Type expression now use a separate set of rules, making them work like
before.
A regression introduce while translating the informal to the formal
definition.
Mutable tuples and field assignments didn't work as they did previously
(too complicated to explain how and why). The changes *should* restore
the previous behaviour.
The operational semantics didn't cover `Return`-without-operand.
@zerbina
Copy link
Collaborator Author

zerbina commented Dec 28, 2024

There were a few issues, especially around tuple semantics, which should all be fixed now.

Copy link
Contributor

@saem saem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partial review

(nothing major thus far)

----------------------------------------- # S-type-ident
C |-_t x : typ

C |-_t e : typ ... typ != void ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the trailing ... in these premises

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Within deduction rules, I used them to mean "repeat judgment, side-condition, or whatever else they're trailing for every item the input pattern matched".

Consider the following made-up rule:

C |- e : int ...
---------------------
C |- (Node e+) : node

e refers to one or more expressions here (because of the e+ in the conclusion's pattern), and the ellipsis following the judgment in the premise is meant to highlight that the judgment is repeated (if e matched more than one element, that is). In a way, this makes the rule somewhat of a rule template.

Ultimately, using a macro DSL is going to resolve these problems with ad-hoc, but if you have a suggestion for how to better express the above in the meantime, I'd be happy to hear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect I just have to get used to reading it more, I didn't quite connect the e+ in the conclusion informing the e in the premise

Copy link
Contributor

@saem saem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've read through it, I need more practice with reading judgements/conditions/etc but that's a me thing, this is a big step forward and the macro lang will be that much nicer, because we can probably rig some nice type setting if going for macro lang -> document

nice work!

----------------------------------------- # S-type-ident
C |-_t x : typ

C |-_t e : typ ... typ != void ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect I just have to get used to reading it more, I didn't quite connect the e+ in the conclusion informing the e in the premise

Co-authored-by: Saem Ghani <[email protected]>
@zerbina
Copy link
Collaborator Author

zerbina commented Dec 30, 2024

Thank you for the review, @saem, it's much appreciated.

@zerbina zerbina merged commit d551fe5 into nim-works:main Dec 30, 2024
5 checks passed
@zerbina zerbina deleted the textual-formal-definition branch December 30, 2024 17:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants