Initial draft defining syntax, semantics of controlling expressions #65
We describe a subset of the C constant-expression syntax for use in controlling expressions. Expression evaluation itself follows Fortran arithmetic expression semantics. Note that the tables are a bit terse as we try to keep the line length less than the 75-character limit for J3 papers.
This is pretty rough, but I wanted to produce something earlier rather than later. I have to go off for a day or so and work on other assignments for classes.
Co-authored-by: Patrick Fasano <[email protected]>
Thanks @gklimowicz for making a start on this tricky area!
Initial set of feedback:
Since expression evaluation occurs *after* token expansion, there will
be no object-like macros or function-like macros left to evaluate. All
"token expansion" is not a defined concept.
In CPP the correct term is "macro expansion".
Also CPP macros are never "evaluated": they are either "expanded" or "replaced"
Suggested change:
- Since expression evaluation occurs *after* token expansion, there will
- be no object-like macros or function-like macros left to evaluate. All
+ Since expression evaluation occurs *after* macro expansion, there will
+ be no object-like macro or function-like macro invocations left to expand. All
instances of ID or ID (args) will all have been replaced with their
expansions.
This last sentence falsely implies there will be no instances of ID after expansion. This is misleading and actually quite common, with code like:
#if __GNUC__
which is shorthand to test whether __GNUC__ is defined to a non-zero value.
This works because of C23 6.10.2-13 (emphasis added):
Prior to evaluation, macro invocations in the list of preprocessing tokens that will become the controlling constant expression are replaced (except for those macro names modified by the defined unary operator), just as in normal text. If the token defined is generated as a result of this replacement process or use of the defined unary operator does not match one of the two specified forms prior to macro replacement, the behavior is undefined. After all replacements due to macro expansion and evaluations of defined macro expressions, has_include expressions, has_embed expressions, and has_c_attribute expressions have been performed, all remaining identifiers other than true (including those lexically identical to keywords such as false) are replaced with the pp-number 0, true is replaced with pp-number 1, and then each preprocessing token is converted into a token.
We'll need similar rules (ignoring the C23 features we are not keeping) to explain the replacement of any ID with 0 after expansion.
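A minimal sketch of that replacement step, written in Python purely for illustration. The token pattern and the name `replace_remaining_ids` are assumptions made for this example and are not part of the draft; the C23 special case for `true` is deliberately left out, in line with ignoring the C23 features we are not keeping.

```python
import re

# Hypothetical tokenizer: identifiers, whole numbers, or single punctuators.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def replace_remaining_ids(expr: str) -> str:
    """Replace every identifier that survived macro expansion with 0."""
    tokens = TOKEN.findall(expr)
    return " ".join("0" if re.match(r"[A-Za-z_]", t) else t for t in tokens)

# An undefined __GNUC__ becomes 0, so "#if __GNUC__" selects the false branch.
print(replace_remaining_ids("__GNUC__"))
print(replace_remaining_ids("FOO + 1 > 2"))
```

The step only makes sense once macro expansion has finished, because any identifier still present at that point is by construction not a defined macro.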
| ID        | The expansion of the object-like macro ID   |
| ID (args) | The expansion of the function-like macro ID |
I don't understand why ID and ID(args) are listed as primary expressions here. The paragraph immediately above has just explained that macros have already been expanded away before conditional expressions are evaluated, so macro invocations are NOT primary expressions in the post-expansion expression grammar.
Listing them here "for completeness" is not helpful, it's just plain wrong. No conditional expression evaluation whatsoever is performed until after macros are completely expanded, and the pre-expansion text may look nothing like a valid conditional expression.
Here is a valid input example demonstrating what I mean:
#define LPAREN (
#define RPAREN )
#define ONE_PLUS 1 +
#if ONE_PLUS ZERO * LPAREN ONE_PLUS 4 RPAREN
integer :: tada
#endif
Pre-expansion, the list of tokens in the expression above looks like:
#if ID ID * ID ID WHOLE_NUMBER ID
post-expansion it looks like this:
#if WHOLE_NUMBER + WHOLE_NUMBER * ( WHOLE_NUMBER + WHOLE_NUMBER )
So wildly different that it's not useful to talk about grammar of the conditional expressions prior to expansion (aside from the bare minimum required to delineate arguments in FLM invocations).
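The two token-class lists above can be reproduced with a small Python sketch. This is only an illustration under simplifying assumptions: `MACROS`, `classify`, and `expand` are hypothetical names, expansion is a single pass over object-like macros with no rescanning, and the remaining identifier (ZERO, which is never defined) is replaced with 0 before classification.

```python
import re

# Hypothetical tokenizer: identifiers, whole numbers, or single punctuators.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")

# Object-like macros from the example; ZERO is deliberately left undefined.
MACROS = {"LPAREN": "(", "RPAREN": ")", "ONE_PLUS": "1 +"}

def classify(tok: str) -> str:
    """Map a token to its class name; punctuators stand for themselves."""
    if tok[0].isalpha() or tok[0] == "_":
        return "ID"
    if tok.isdigit():
        return "WHOLE_NUMBER"
    return tok

def expand(tokens):
    """Single-pass object-like expansion (no rescanning, no function-like macros)."""
    out = []
    for tok in tokens:
        out.extend(TOKEN.findall(MACROS[tok]) if tok in MACROS else [tok])
    return out

pre = TOKEN.findall("ONE_PLUS ZERO * LPAREN ONE_PLUS 4 RPAREN")
# After expansion, any identifier still present is replaced with 0.
post = ["0" if classify(t) == "ID" else t for t in expand(pre)]

print(" ".join(classify(t) for t in pre))
print(" ".join(classify(t) for t in post))
```

Only the second list has the shape of an expression, which is why the grammar is stated over it.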
Let's only describe the post-expansion grammar, and not the pre-expansion grammar. There is no such pre-expansion grammar. Dan points out that undefined ID replacement has to be done after processing of ## tokens.
| | defined | defined ID | nonassoc | 1 if the identifier   |
| |         |            |          | has a #defined value, |
| |         |            |          | 0 otherwise           |
On further thought, this is just plain wrong. The defined operator cannot appear in this table, because it needs to be applied AFTER macro expansion and BEFORE ID replacement with zero. Hence the defined operator (as in CPP) must be resolved and replaced before this post-expansion grammar is applied.
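A rough Python sketch of why the ordering matters. The helper names `resolve_defined` and `replace_ids` and the macro set are assumptions for illustration only; the sketch covers just the "before ID replacement" half of the ordering and does not model macro expansion itself.

```python
import re

# Matches both forms: "defined ID" and "defined ( ID )".
DEFINED = re.compile(
    r"\bdefined\b\s*(?:\(\s*([A-Za-z_]\w*)\s*\)|([A-Za-z_]\w*))"
)

def resolve_defined(expr: str, macros: set) -> str:
    """Replace each defined-expression with 1 or 0."""
    def repl(m):
        name = m.group(1) or m.group(2)
        return "1" if name in macros else "0"
    return DEFINED.sub(repl, expr)

def replace_ids(expr: str) -> str:
    """Replace every remaining identifier with 0."""
    return re.sub(r"\b[A-Za-z_]\w*\b", "0", expr)

macros = {"FOO"}  # hypothetical set of currently defined macro names

# Correct order: resolve defined first, then zero the leftover identifiers.
good = replace_ids(resolve_defined("defined FOO && !defined(BAR) && BAZ", macros))
# Wrong order: the operator and its operand are both zeroed, leaving nonsense.
bad = replace_ids("defined FOO")

print(good)
print(bad)
```

With the wrong order the text is no longer a valid expression at all, which supports resolving defined before the post-expansion grammar is applied.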
| ID           | The expansion of the object-like macro ID   |
| ID (args)    | The expansion of the function-like macro ID |
| WHOLE_NUMBER | Decimal value of WHOLE_NUMBER               |
| ( expr )     | Parenthesized expressions                   |
Parenthesized expressions are listed in the operator table below, so listing them here is redundant.
After macro expansion and ID-replacement, I believe the only "primaries" left in valid conditional expressions should be WHOLE_NUMBER, and the operators in the table below combining them (which includes defined as an operator).
In short, I suggest we delete this "primary table" entirely and replace it with a statement to that effect.
| ID (args)    | The expansion of the function-like macro ID |
| WHOLE_NUMBER | Decimal value of WHOLE_NUMBER               |
| ( expr )     | Parenthesized expressions                   |
|--------------+---------------------------------------------|
Minor aside: C23 6.10.2-13 perversely also allows single-character character constants as "primaries" in conditional expressions, example from C23:
#if 'z' - 'a' == 25
however their exact CPP evaluation semantics are implementation-defined, which means their use is not guaranteed to be portable. I don't believe I've ever seen this bizarre "feature" used in practice.
I suspect this is some weird legacy holdover in CPP and unless someone provides a strong rationale for their inclusion I think FPP should prohibit character constants in conditional expressions.
Agreed, character constants in conditional expressions should be a (pathological) processor-dependent extension.
| | >  | e1 > e2  | nonassoc | 1 if e1 > e2, 0 otherwise  |
| | >= | e1 >= e2 | nonassoc | 1 if e1 >= e2, 0 otherwise |
| | <  | e1 < e2  | nonassoc | 1 if e1 < e2, 0 otherwise  |
| | <= | e1 <= e2 | nonassoc | 1 if e1 <= e2, 0 otherwise |
In C (and hence CPP) relational-expression operators are left associative.
Example:
#if 0 > -4 > 0
left-assoc
#else
right-assoc
#endif
Expands to "left-assoc" in CPP.
However it appears some FPPs may (unintentionally?) diverge on this detail. Another great example why we need standardization!
Suggested change:
- | | >  | e1 > e2  | nonassoc | 1 if e1 > e2, 0 otherwise  |
- | | >= | e1 >= e2 | nonassoc | 1 if e1 >= e2, 0 otherwise |
- | | <  | e1 < e2  | nonassoc | 1 if e1 < e2, 0 otherwise  |
- | | <= | e1 <= e2 | nonassoc | 1 if e1 <= e2, 0 otherwise |
+ | | >  | e1 > e2  | left     | 1 if e1 > e2, 0 otherwise  |
+ | | >= | e1 >= e2 | left     | 1 if e1 >= e2, 0 otherwise |
+ | | <  | e1 < e2  | left     | 1 if e1 < e2, 0 otherwise  |
+ | | <= | e1 <= e2 | left     | 1 if e1 <= e2, 0 otherwise |
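To make the two groupings of `0 > -4 > 0` concrete, here is a small Python sketch. The helper `gt` is a hypothetical stand-in for the table's "1 if e1 > e2, 0 otherwise" rule; the groupings are spelled out explicitly because Python's own chained comparisons follow a different rule from C.

```python
def gt(a: int, b: int) -> int:
    """C-style relational result: 1 if a > b, 0 otherwise."""
    return 1 if a > b else 0

# Left-associative grouping, as in C and CPP: (0 > -4) > 0
left = gt(gt(0, -4), 0)

# Right-associative grouping, shown only for contrast: 0 > (-4 > 0)
right = gt(0, gt(-4, 0))

print("left-assoc" if left else "right-assoc")
print(left, right)
```

Under a non-associative rule neither grouping would apply, and the chained expression would simply be rejected as a syntax error.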
I put them as non-associative in the Fortran preprocessor, for potential future compatibility if we ever add Fortran operators. The Fortran operators are non-associative, so we might want to disallow expressions like 0 > -4 > 0 now. (And I think chaining relational operators is terrible, but that's just my opinion. If I saw that in C code myself, I would probably replace it with something people could immediately understand.)
| Prec | Op      | Syntax       | Assoc'y  | Evaluation Semantics       |
|------+---------+--------------+----------+----------------------------|
| low  | ? :     | e1 ? e2 : e3 | right    | conditional-expr           |
I'll note in passing that CPP also allows the comma operator in conditional expressions (at lower priority than conditional-expression), although it's pretty pointless in preprocessor expressions and I'm not aware of any compelling use cases.
For this reason we (implicitly) omitted it from the requirements doc in 25-114r2. I'm only raising it now in case someone has a compelling argument to include it (something other than strict compatibility with CPP), otherwise I'm fine dropping it.
I left them out on purpose, but didn't have a strong reason to do so.
I may be missing a subtlety here.
In general, conditional expression evaluation is side-effect free. So, elaborating
#if (my_complicated_expression, my_other_expression)
results in only the value of my_other_expression affecting the #if. my_complicated_expression may be evaluated, but its result is thrown away.
There's also the slippery slope in the C grammar, where the CPP conditional expressions start at conditional-expression, which I think can unfold down to primary-expression, which then includes ( expression ), which brings in the whole barnyard of comma-expressions and assignment-expression.
So I just chopped those rules out of the grammar.
I'd like to take credit but you pinged the wrong person :)
Co-authored-by: Dan Bonachea <[email protected]>
| | ¦¦      | e1 || e2     | left     | Fortran .OR.               |
|------+---------+--------------+----------+----------------------------|
| | &&      | e1 && e2     | left     | Fortran .AND.              |
In the 5/12 call we resolved that these should use short-circuit evaluation as in CPP, to allow things like:
#if x && 1/x
#endif
which means they are NOT simply Fortran .OR. / .AND.
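A short Python sketch of the difference, for illustration only. The names `logical_and` and `eager_and` are hypothetical; the right operand is passed as a zero-argument function so that its evaluation can be skipped, and `//` stands in for integer division.

```python
def logical_and(lhs: int, rhs_thunk) -> int:
    """Short-circuit &&: the right operand is evaluated only if lhs is non-zero."""
    if lhs == 0:
        return 0
    return 1 if rhs_thunk() != 0 else 0

def eager_and(lhs: int, rhs: int) -> int:
    """Both operands are already evaluated before the operator is applied."""
    return 1 if (lhs != 0 and rhs != 0) else 0

x = 0
# Models "#if x && 1/x": the division is never attempted when x is 0.
short_circuit_result = logical_and(x, lambda: 1 // x)
print(short_circuit_result)

# Evaluating both operands first hits the division by zero.
try:
    eager_and(x, 1 // x)
    eager_failed = False
except ZeroDivisionError:
    eager_failed = True
print(eager_failed)
```

The same reasoning applies to ||, where a non-zero left operand would make the right operand unnecessary.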