Initial draft defining syntax, semantics of controlling expressions #65

Open
wants to merge 12 commits into
base: main
from
125 changes: 120 additions & 5 deletions drafts/25-xxx-specifications.txt
@@ -569,11 +569,126 @@ nd20. The result of evaluating a processor-dependent directive is processor-depe
5 Expressions allowed in #if and #elif directives
=================================================

5.1 The 'defined' operator
--------------------------

6 Expression evaluation in #if and #elif directives
===================================================
ex05. Controlling expressions are made up of "primary expressions" and
operators applied to subexpressions.

ex10. When evaluating a #if or #elif directive, preprocessing
evaluates and expands all object-like macros and function-like
macro invocations to create a token-list of the expression to be
evaluated.

ex15. The resulting list of tokens shall be a valid expression
comprised of primary expressions and operators as described
below.

ex17. Preprocessing computes the integer value of conditional expressions
using the greatest integer range available to the processor to
determine the truth or falsity of the controlling expression.

ex20. The processor shall reject a program if evaluation of
the expression generates a computational error (such
as divide by zero).

ex25. When the expression evaluates to zero, the controlling expression
will be considered "false". If the expression evaluates to
any non-zero value, the controlling expression will be considered
"true".


5.1 Primary expressions
-----------------------

Preprocessing recognizes the following primary expressions in
controlling expressions.

Since expression evaluation occurs *after* macro expansion, there will
be no object-like macro or function-like macro invocations left to expand. All
instances of ID or ID (args) will all have been replaced with their
expansions.
Comment on lines +606 to +607
Collaborator

This last sentence falsely implies there will be no instances of ID after expansion. That is misleading: identifiers left over after expansion are actually quite common, with code like:

#if __GNUC__

which is shorthand to test whether __GNUC__ is defined to a non-zero value.

This works because of 6.10.2-13 (emphasis added):

Prior to evaluation, macro invocations in the list of preprocessing tokens that will become the controlling constant expression are replaced (except for those macro names modified by the defined unary operator), just as in normal text. If the token defined is generated as a result of this replacement process or use of the defined unary operator does not match one of the two specified forms prior to macro replacement, the behavior is undefined. After all replacements due to macro expansion and evaluations of defined macro expressions, has_include expressions, has_embed expressions, and has_c_attribute expressions have been performed, all remaining identifiers other than true (including those lexically identical to keywords such as false) are replaced with the pp-number 0, true is replaced with pp-number 1, and then each preprocessing token is converted into a token.

We'll need similar rules (ignoring the C23 features we are not keeping) to explain the replacement of any ID with 0 after expansion.
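
A minimal Python sketch of the ID-replacement step asked for above, assuming the expression has already been macro-expanded, had its defined operators resolved, and been split into a list of string tokens. The helper name replace_remaining_ids and the token representation are hypothetical, not taken from the draft, and no special case for true is modeled.

```python
import re

# Hypothetical token shape: an identifier is a letter or underscore
# followed by letters, digits, or underscores.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")

def replace_remaining_ids(tokens):
    """Replace every identifier left after macro expansion with "0"."""
    return ["0" if IDENT.match(tok) else tok for tok in tokens]

# __GNUC__ is not defined in this sketch, so "#if __GNUC__" tests the value 0.
print(replace_remaining_ids(["__GNUC__"]))        # ['0']
print(replace_remaining_ids(["FOO", "+", "1"]))   # ['0', '+', '1']
```

Only identifiers are touched; numbers, operators, and parentheses pass through unchanged.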


We list them here for completeness in describing the illustrative
syntax of controlling expressions.

| Primary | Evaluation semantics |
|--------------+---------------------------------------------|
| ID | The expansion of the object-like macro ID |
| ID (args) | The expansion of the function-like macro ID |
Comment on lines +614 to +615
Collaborator

I don't understand why ID and ID(args) are listed as primary expressions here. The paragraph immediately above has just explained that macros have already been expanded away during evaluation of conditional expressions, so macro invocations are NOT Primary expressions in the post-expansion expression grammar.

Listing them here "for completeness" is not helpful, it's just plain wrong. No conditional expression evaluation whatsoever is performed until after macros are completely expanded, and the pre-expansion text may look nothing like a valid conditional expression.

Here is a valid input example demonstrating what I mean:

#define LPAREN (
#define RPAREN )
#define ONE_PLUS 1 +

#if ONE_PLUS ZERO * LPAREN ONE_PLUS 4 RPAREN
integer :: tada
#endif

Pre-expansion, the list of tokens in the expression above looks like:

#if ID ID * ID ID WHOLE_NUMBER ID

post-expansion it looks like this:

#if WHOLE_NUMBER + WHOLE_NUMBER * ( WHOLE_NUMBER + WHOLE_NUMBER )

So wildly different that it's not useful to talk about grammar of the conditional expressions prior to expansion (aside from the bare minimum required to delineate arguments in FLM invocations).
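
The example above can be replayed with a small Python sketch of object-like macro expansion. This is an illustration under simplifying assumptions, not the draft's algorithm: tokens are plain strings, only object-like macros are handled, and the names expand and hidden are hypothetical.

```python
def expand(tokens, macros, hidden=frozenset()):
    """Expand object-like macros in a token list, rescanning replacements."""
    out = []
    for tok in tokens:
        if tok in macros and tok not in hidden:
            # Rescan the replacement list, hiding the macro being expanded
            # so a self-referential definition cannot recurse forever.
            out.extend(expand(macros[tok], macros, hidden | {tok}))
        else:
            out.append(tok)
    return out

macros = {"LPAREN": ["("], "RPAREN": [")"], "ONE_PLUS": ["1", "+"]}
before = ["ONE_PLUS", "ZERO", "*", "LPAREN", "ONE_PLUS", "4", "RPAREN"]
after = expand(before, macros)
print(" ".join(after))   # 1 + ZERO * ( 1 + 4 )
```

ZERO is not defined, so it survives expansion as an identifier; it only becomes a number once remaining identifiers are replaced with 0, which yields the all-WHOLE_NUMBER form shown above.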

Member Author

Let's only describe the post-expansion grammar, and not the pre-expansion grammar. There is no such pre-expansion grammar. Dan points out that undefined ID replacement has to be done after processing of ## tokens.

| WHOLE_NUMBER | Decimal value of WHOLE_NUMBER |
| ( expr ) | Parenthesized expressions |
Collaborator

Parenthesized expressions are listed in the operator table below, so listing them here is redundant.

After macro expansion and ID-replacement, I believe the only "primaries" left in valid conditional expressions should be WHOLE_NUMBER, and the operators in the table below combining them (which includes defined as an operator).

In short, I suggest we delete this "primary table" entirely and replace it with a statement to that effect.

|--------------+---------------------------------------------|



5.1 Operators allowed in controlling expressions
------------------------------------------------

To maintain compatibility with the use of C preprocessing directives
in many existing Fortran programs, the operators allowed in
controlling expressions in #if and #elif expressions are a subset of
those defined in the C 2023 standard §6.5 "Expressions" and §6.6
"Constant expressions".

A "precedence level" is assigned to each operator that determines
how the operators combine with sub-expressions containing other operators
at different precedence levels.

An "associativity" is assigned to each operator that determines how
operators at the same precedence level are combined.
- "left" means the operator binds to the left.
- "right" means the operator binds to the right.
- "nonassoc" means that the operator is not associative.

The following table describes the semantics of the allowed operators
in conditional expressions. The table is grouped by precedence level,
from lowest precedence to highest.

We label subexpressions "e1", "e2", and "e3" to aid in describing the
evaluation semantics. Unless otherwise specified, all operators
evaluate with the same semantics as their Fortran counterparts.


| Prec | Op | Syntax | Assoc'y | Evaluation Semantics |
|------+---------+--------------+----------+----------------------------|
| low | ? : | e1 ? e2 : e3 | right | conditional-expr |
Collaborator

I'll note in passing that CPP also allows the comma operator in conditional expressions (at lower priority than conditional-expression), although it's pretty pointless in preprocessor expressions and I'm not aware of any compelling use cases.

For this reason we (implicitly) omitted it from the requirements doc in 25-114r2. I'm only raising it now in case someone has a compelling argument to include it (something other than strict compatibility with CPP), otherwise I'm fine dropping it.

Member Author

@gklimowicz May 12, 2025

I left them out on purpose, but didn't have a strong reason to do so.

I may be missing a subtlety here.

In general, conditional expression evaluation is side-effect free. So, elaborating

#if (my_complicated_expression, my_other_expression)

results in only the value of my_other_expression affecting the #if. my_complicated_expression may be evaluated, but its result is thrown away.

Member Author

There's also the slippery slope in the C grammar, where the CPP conditional expressions start at conditional-expression, which I think can unfold down to primary-expression, which then includes the ( expression ) which brings in the whole barnyard of comma-expressions and assignment-expression.

So I just chopped those rules out of the grammar.

|------+---------+--------------+----------+----------------------------|
| | ¦¦ | e1 || e2 | left | Fortran .OR. |
|------+---------+--------------+----------+----------------------------|
| | && | e1 && e2 | left | Fortran .AND. |
Comment on lines +654 to +656
Collaborator

In the 5/12 call we resolved these should be short-circuit evaluation as in CPP, to allow things like:

#if  x && 1/x
#endif

which means they are NOT simply Fortran .OR. / .AND.
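
To make the difference concrete, here is a hedged Python sketch that contrasts eager evaluation of both operands with short-circuit evaluation. The function names are hypothetical, and the right operand is passed as a zero-argument callable so that it is evaluated only on demand.

```python
def eager_and(e1, e2):
    """Both operands are already evaluated before this is called."""
    return 1 if (e1 != 0 and e2 != 0) else 0

def short_circuit_and(e1, e2_thunk):
    """Evaluate the right operand only when the left one is non-zero."""
    if e1 == 0:
        return 0
    return 1 if e2_thunk() != 0 else 0

def short_circuit_or(e1, e2_thunk):
    """Evaluate the right operand only when the left one is zero."""
    if e1 != 0:
        return 1
    return 1 if e2_thunk() != 0 else 0

x = 0
print(short_circuit_and(x, lambda: 1 // x))   # 0, the division never runs
try:
    eager_and(x, 1 // x)
except ZeroDivisionError:
    print("eager evaluation divides by zero")
```

With eager operands, x && 1/x would hit the divide-by-zero rejection in ex20 whenever x is zero; with short-circuit evaluation the division is simply skipped.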

|------+---------+--------------+----------+----------------------------|
| | ¦ | e1 | e2 | left | Fortran IOR(e1, e2) |
|------+---------+--------------+----------+----------------------------|
| | ^ | e1 ^ e2 | left | Fortran IEOR(e1, e2) |
|------+---------+--------------+----------+----------------------------|
| | & | e1 & e2 | left | Fortran IAND(e1, e2) |
|------+---------+--------------+----------+----------------------------|
| | == | e1 == e2 | left | 1 if e1 == e2, 0 otherwise |
| | != | e1 != e2 | left | 1 if e1 /= e2, 0 otherwise |
|------+---------+--------------+----------+----------------------------|
| | > | e1 > e2 | left | 1 if e1 > e2, 0 otherwise |
| | >= | e1 >= e2 | left | 1 if e1 >= e2, 0 otherwise |
| | < | e1 < e2 | left | 1 if e1 < e2, 0 otherwise |
| | <= | e1 <= e2 | left | 1 if e1 <= e2, 0 otherwise |
|------+---------+--------------+----------+----------------------------|
| | << | e1 << e2 | left | Fortran ISHFT(e1, e2) |
| | >> | e1 >> e2 | left | Fortran ISHFT(e1, -e2) |
|------+---------+--------------+----------+----------------------------|
| | + | e1 + e2 | left | + |
| | - | e1 - e2 | left | - |
|------+---------+--------------+----------+----------------------------|
| | * | e1 * e2 | left | * |
| | / | e1 / e2 | left | / |
| | % | e1 % e2 | left | Fortran MOD(e1, e2) |
|------+---------+--------------+----------+----------------------------|
| | unary + | + e1 | right | unary + |
| | unary - | - e1 | right | unary - |
| | unary ~ | ~ e1 | right | Fortran NOT(e1) |
| | unary ! | ! e1 | right | 1 if e1 == 0, 0 otherwise |
| | defined | defined ID | nonassoc | 1 if the identifier |
| | | | | has a #defined value, |
| | | | | 0 otherwise |
Comment on lines +686 to +688
Collaborator

On further thought, this is just plain wrong.
defined cannot appear in this table, because it needs to be applied AFTER macro expansion and BEFORE ID replacement with zero. Hence the defined operator (as in CPP) must be resolved and replaced before this post-expansion grammar is applied.
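
A rough Python sketch of why the ordering matters, assuming string tokens and a dictionary of currently defined macros. The helper resolve_defined is hypothetical; it only models the two forms defined ID and defined ( ID ).

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")

def resolve_defined(tokens, macros):
    """Replace 'defined ID' and 'defined ( ID )' with "1" or "0"."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] == "defined":
            if tokens[i + 1:i + 2] == ["("]:
                name, close, width = tokens[i + 2:i + 3], tokens[i + 3:i + 4], 4
            else:
                # Unparenthesized form: there is no closing token to check.
                name, close, width = tokens[i + 1:i + 2], [")"], 2
            if not name or not IDENT.match(name[0]) or close != [")"]:
                raise SyntaxError("malformed 'defined' operand")
            out.append("1" if name[0] in macros else "0")
            i += width
        else:
            out.append(tokens[i])
            i += 1
    return out

macros = {"FOO": ["1"]}
expr = ["defined", "FOO", "&&", "!", "defined", "(", "BAR", ")"]
print(resolve_defined(expr, macros))   # ['1', '&&', '!', '0']
```

If remaining identifiers were replaced with 0 first, the operand would already be a number: resolve_defined(["defined", "0"], macros) raises SyntaxError, because the name being tested has been lost.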

|------+---------+--------------+----------+----------------------------|
| high | ( e1 ) | | N/A | e1 |
|------+---------+--------------+----------+----------------------------|
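
As a hedged illustration of the "/" and "%" rows, the following Python sketch models Fortran integer division, which truncates toward zero, and MOD(a, p) = a - INT(a / p) * p. The helper names are hypothetical. Python's own // and % operators round toward negative infinity, so they are not used directly on signed operands.

```python
def trunc_div(a, b):
    """Integer division truncating toward zero, as Fortran integer '/' does."""
    q = abs(a) // abs(b)          # raises ZeroDivisionError when b == 0
    return q if (a < 0) == (b < 0) else -q

def fortran_mod(a, p):
    """MOD(a, p) = a - INT(a / p) * p; the result takes the sign of a."""
    return a - trunc_div(a, p) * p

print(trunc_div(-7, 2))     # -3
print(fortran_mod(-7, 3))   # -1
print(fortran_mod(7, -3))   # 1
```

The division by zero case surfaces as an exception here, which corresponds to the computational error that ex20 requires the processor to reject.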


7 Predefined macros