Initial draft defining syntax, semantics of controlling expressions #65

Open: gklimowicz wants to merge 9 commits into main

gklimowicz
Member

We describe a subset of the C constant-expression syntax for use in controlling expressions. Expression evaluation itself follows Fortran arithmetic expression semantics.

Note that the tables are a bit terse as we try to keep the line length less than the 75-character limit for J3 papers.

@gklimowicz gklimowicz requested review from bonachea, kc9jud and aury6623 May 8, 2025 03:28
@gklimowicz
Member Author

This is pretty rough, but I wanted to produce something earlier rather than later. I have to go off for a day or so and work on other assignments for classes.

gklimowicz and others added 2 commits May 8, 2025 05:56
Co-authored-by: Patrick Fasano <[email protected]>
bonachea (Collaborator) left a comment

Thanks @gklimowicz for making a start on this tricky area!

Initial set of feedback:

Comment on lines +601 to +602
Since expression evaluation occurs *after* token expansion, there will
be no object-like macros or function-like macros left to evaluate. All
Collaborator

"token expansion" is not a defined concept.

In CPP the correct term is "macro expansion".

Also, CPP macros are never "evaluated": they are either "expanded" or "replaced".

Suggested change
- Since expression evaluation occurs *after* token expansion, there will
- be no object-like macros or function-like macros left to evaluate. All
+ Since expression evaluation occurs *after* macro expansion, there will
+ be no object-like macro or function-like macro invocations left to expand. All

Comment on lines +603 to +604
instances of ID or ID (args) will all have been replaced with their
expansions.
Collaborator

This last sentence falsely implies there will be no instances of ID after expansion. That is misleading: identifiers that survive expansion are actually quite common, with code like:

#if __GNUC__

which is shorthand to test whether __GNUC__ is defined to a non-zero value.

This works because of 6.10.2-13 (emphasis added):

Prior to evaluation, macro invocations in the list of preprocessing tokens that will become the controlling constant expression are replaced (except for those macro names modified by the defined unary operator), just as in normal text. If the token defined is generated as a result of this replacement process or use of the defined unary operator does not match one of the two specified forms prior to macro replacement, the behavior is undefined. After all replacements due to macro expansion and evaluations of defined macro expressions, has_include expressions, has_embed expressions, and has_c_attribute expressions have been performed, all remaining identifiers other than true (including those lexically identical to keywords such as false) are replaced with the pp-number 0, true is replaced with pp-number 1, and then each preprocessing token is converted into a token.

We'll need similar rules (ignoring the C23 features we are not keeping) to explain the replacement of any ID with 0 after expansion.
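
As a rough illustration of the rule quoted above, here is a minimal C sketch of how CPP treats an identifier that survives macro expansion. The `HYPOTHETICAL_` macro names and the helper function are assumptions made up for this example; they do not appear in the draft.

```c
#include <assert.h>

/* HYPOTHETICAL_NEVER_DEFINED is intentionally left undefined. After macro
   expansion it is still an identifier, so CPP replaces it with 0. */
#if HYPOTHETICAL_NEVER_DEFINED
#define ID_REPLACED_BRANCH 1
#else
#define ID_REPLACED_BRANCH 0
#endif

/* A macro defined to a non-zero value is expanded before evaluation. */
#define HYPOTHETICAL_NONZERO 3
#if HYPOTHETICAL_NONZERO
#define NONZERO_BRANCH 1
#else
#define NONZERO_BRANCH 0
#endif

/* Runtime check of which branches the preprocessor selected. */
void check_identifier_replacement(void) {
    assert(ID_REPLACED_BRANCH == 0);
    assert(NONZERO_BRANCH == 1);
}
```

An FPP rule modeled on this would presumably drop the C23-only parts (such as the special handling of `true`) and keep only the replacement of leftover identifiers with 0.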

Comment on lines +611 to +612
| ID | The expansion of the object-like macro ID |
| ID (args) | The expansion of the function-like macro ID |
Collaborator

I don't understand why ID and ID(args) are listed as primary expressions here. The paragraph immediately above has just explained that macros have already been expanded away by the time conditional expressions are evaluated, so macro invocations are NOT primary expressions in the post-expansion expression grammar.

Listing them here "for completeness" is not helpful, it's just plain wrong. No conditional expression evaluation whatsoever is performed until after macros are completely expanded, and the pre-expansion text may look nothing like a valid conditional expression.

Here is a valid input example demonstrating what I mean:

#define LPAREN (
#define RPAREN )
#define ONE_PLUS 1 +

#if ONE_PLUS ZERO * LPAREN ONE_PLUS 4 RPAREN
integer :: tada
#endif

Pre-expansion, the list of tokens in the expression above looks like:

#if ID ID * ID ID WHOLE_NUMBER ID

post-expansion it looks like this:

#if WHOLE_NUMBER + WHOLE_NUMBER * ( WHOLE_NUMBER + WHOLE_NUMBER )

So wildly different that it's not useful to talk about grammar of the conditional expressions prior to expansion (aside from the bare minimum required to delineate arguments in FLM invocations).
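
For readers who want to try the example above with a C preprocessor, here is a small sketch that records which branch is taken. It reuses the macros from the comment; `TADA_DECLARED` and the helper function are names invented for this illustration, and the Fortran declaration is replaced by a macro so the file compiles as C.

```c
#include <assert.h>

#define LPAREN (
#define RPAREN )
#define ONE_PLUS 1 +

/* ZERO is not defined, so after expansion it is replaced with 0 and the
   expression becomes 1 + 0 * ( 1 + 4 ), which evaluates to 1. */
#if ONE_PLUS ZERO * LPAREN ONE_PLUS 4 RPAREN
#define TADA_DECLARED 1
#else
#define TADA_DECLARED 0
#endif

/* Runtime check that the #if branch was taken. */
void check_post_expansion_grammar(void) {
    assert(TADA_DECLARED == 1);
}
```

The directive line only becomes a well-formed expression after expansion, which is why a grammar over the pre-expansion tokens has little to say about it.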

Member Author

Let's only describe the post-expansion grammar, and not the pre-expansion grammar. There is no such pre-expansion grammar. Dan points out that undefined ID replacement has to be done after processing of ## tokens.
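
A minimal C sketch of why the ordering relative to `##` matters, assuming CPP-style token pasting. All macro names here are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* HYPOTHETICAL_PASTE and HYPOTHETICAL_HAVE_FOO are illustrative names. */
#define HYPOTHETICAL_PASTE(a, b) a##b
#define HYPOTHETICAL_HAVE_FOO 1

/* Neither HYPOTHETICAL_HAVE_ nor FOO is defined on its own. If undefined
   identifiers were replaced with 0 before pasting, the operands would
   become 0##0. Pasting first yields HYPOTHETICAL_HAVE_FOO, which then
   expands to 1. */
#if HYPOTHETICAL_PASTE(HYPOTHETICAL_HAVE_, FOO)
#define PASTE_DEFINED_BRANCH 1
#else
#define PASTE_DEFINED_BRANCH 0
#endif

/* Here the pasted identifier HYPOTHETICAL_HAVE_BAR is undefined, so it is
   replaced with 0 only after pasting has produced it. */
#if HYPOTHETICAL_PASTE(HYPOTHETICAL_HAVE_, BAR)
#define PASTE_UNDEFINED_BRANCH 1
#else
#define PASTE_UNDEFINED_BRANCH 0
#endif

/* Runtime check of which branches the preprocessor selected. */
void check_paste_before_replacement(void) {
    assert(PASTE_DEFINED_BRANCH == 1);
    assert(PASTE_UNDEFINED_BRANCH == 0);
}
```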

Comment on lines +679 to +681
| | defined | defined ID | nonassoc | 1 if the identifier |
| | | | | has a #defined value, |
| | | | | 0 otherwise |
Collaborator

On further thought, this is just plain wrong.
defined cannot appear in this table, because it needs to be applied AFTER macro expansion and BEFORE ID replacement with zero. Hence the defined operator (as in CPP) must be resolved and replaced before this post-expansion grammar is applied.
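
The following C sketch illustrates the distinction, as CPP behaves today: `defined` consults whether a macro exists, not what it expands to, so it has to be resolved before leftover identifiers are turned into 0. The `HYPOTHETICAL_` names and the helper function are assumptions for this example only.

```c
#include <assert.h>

#define HYPOTHETICAL_ZERO 0

/* defined asks whether the macro exists, regardless of its value. */
#if defined HYPOTHETICAL_ZERO
#define DEFINED_ZERO_BRANCH 1
#else
#define DEFINED_ZERO_BRANCH 0
#endif

/* Without defined, the macro is expanded and its value 0 is tested. */
#if HYPOTHETICAL_ZERO
#define VALUE_ZERO_BRANCH 1
#else
#define VALUE_ZERO_BRANCH 0
#endif

/* HYPOTHETICAL_MISSING is never defined. If it had already been replaced
   with 0, defined would have no identifier left to inspect. */
#if defined(HYPOTHETICAL_MISSING)
#define DEFINED_MISSING_BRANCH 1
#else
#define DEFINED_MISSING_BRANCH 0
#endif

/* Runtime check of which branches the preprocessor selected. */
void check_defined_resolution(void) {
    assert(DEFINED_ZERO_BRANCH == 1);
    assert(VALUE_ZERO_BRANCH == 0);
    assert(DEFINED_MISSING_BRANCH == 0);
}
```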

| ID | The expansion of the object-like macro ID |
| ID (args) | The expansion of the function-like macro ID |
| WHOLE_NUMBER | Decimal value of WHOLE_NUMBER |
| ( expr ) | Parenthesized expressions |
Collaborator

Parenthesized expressions are listed in the operator table below, so listing them here is redundant.

After macro expansion and ID-replacement, I believe the only "primaries" left in valid conditional expressions should be WHOLE_NUMBER, and the operators in the table below combining them (which includes defined as an operator).

In short, I suggest we delete this "primary table" entirely and replace it with a statement to that effect.

| ID (args) | The expansion of the function-like macro ID |
| WHOLE_NUMBER | Decimal value of WHOLE_NUMBER |
| ( expr ) | Parenthesized expressions |
|--------------+---------------------------------------------|
Collaborator

Minor aside: C23 6.10.2-13 perversely also allows single-character character constants as "primaries" in conditional expressions, example from C23:

#if 'z' - 'a' == 25

however their exact CPP evaluation semantics are implementation-defined, which means their use is not guaranteed to be portable. I don't believe I've ever seen this bizarre "feature" used in practice.

I suspect this is some weird legacy holdover in CPP and unless someone provides a strong rationale for their inclusion I think FPP should prohibit character constants in conditional expressions.

Collaborator

Agreed, character constants in conditional expressions should be a (pathological) processor-dependent extension.

Comment on lines +662 to +665
| | > | e1 > e2 | nonassoc | 1 if e1 > e2, 0 otherwise |
| | >= | e1 >= e2 | nonassoc | 1 if e1 >= e2, 0 otherwise |
| | < | e1 < e2 | nonassoc | 1 if e1 < e2, 0 otherwise |
| | <= | e1 <= e2 | nonassoc | 1 if e1 <= e2, 0 otherwise |
Collaborator

In C (and hence CPP) relational-expression operators are left associative.

Example:

#if 0 > -4 > 0
left-assoc
#else
right-assoc
#endif

Expands to "left-assoc" in CPP.

However it appears some FPPs may (unintentionally?) diverge on this detail. Another great example why we need standardization!

Suggested change
- | | > | e1 > e2 | nonassoc | 1 if e1 > e2, 0 otherwise |
- | | >= | e1 >= e2 | nonassoc | 1 if e1 >= e2, 0 otherwise |
- | | < | e1 < e2 | nonassoc | 1 if e1 < e2, 0 otherwise |
- | | <= | e1 <= e2 | nonassoc | 1 if e1 <= e2, 0 otherwise |
+ | | > | e1 > e2 | left | 1 if e1 > e2, 0 otherwise |
+ | | >= | e1 >= e2 | left | 1 if e1 >= e2, 0 otherwise |
+ | | < | e1 < e2 | left | 1 if e1 < e2, 0 otherwise |
+ | | <= | e1 <= e2 | left | 1 if e1 <= e2, 0 otherwise |
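
To make the two readings of `0 > -4 > 0` concrete, here is a short C sketch that evaluates the chained form alongside both explicit groupings. The macro and function names are invented for this illustration; only the chained expression comes from the comment above.

```c
#include <assert.h>

/* Left-associative reading:  (0 > -4) > 0  ->  1 > 0  ->  1
   Right-associative reading: 0 > (-4 > 0)  ->  0 > 0  ->  0 */
#if 0 > -4 > 0
#define RELATIONAL_CHAIN_BRANCH 1
#else
#define RELATIONAL_CHAIN_BRANCH 0
#endif

/* Explicit left grouping, for comparison. */
#if (0 > -4) > 0
#define LEFT_GROUPED_BRANCH 1
#else
#define LEFT_GROUPED_BRANCH 0
#endif

/* Explicit right grouping, for comparison. */
#if 0 > (-4 > 0)
#define RIGHT_GROUPED_BRANCH 1
#else
#define RIGHT_GROUPED_BRANCH 0
#endif

/* Runtime check: the chained form matches the left grouping in CPP. */
void check_relational_associativity(void) {
    assert(RELATIONAL_CHAIN_BRANCH == LEFT_GROUPED_BRANCH);
    assert(LEFT_GROUPED_BRANCH == 1);
    assert(RIGHT_GROUPED_BRANCH == 0);
}
```

Under a non-associative rule, the chained directive would presumably be rejected outright, while the two parenthesized forms would remain valid.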

Member Author

I put them as non-associative in the Fortran preprocessor, for potential future compatibility if we ever add Fortran operators. The Fortran operators are non-associative, so we might want to disallow expressions like 0 > -4 > 0 now. (And I think chaining relational operators is terrible, but that's just my opinion. If I saw that in C code myself, I would probably replace it with something people could immediately understand.)


| Prec | Op | Syntax | Assoc'y | Evaluation Semantics |
|------+---------+--------------+----------+----------------------------|
| low | ? : | e1 ? e2 : e3 | right | conditional-expr |
Collaborator

I'll note in passing that CPP also allows the comma operator in conditional expressions (at lower priority than conditional-expression), although it's pretty pointless in preprocessor expressions and I'm not aware of any compelling use cases.

For this reason we (implicitly) omitted it from the requirements doc in 25-114r2. I'm only raising it now in case someone has a compelling argument to include it (something other than strict compatibility with CPP), otherwise I'm fine dropping it.

Member Author

I left them out on purpose, but didn't have a strong reason to do so.

I may be missing a subtlety here.

In general, conditional expression evaluation is side-effect free. So, elaborating

#if (my_complicated_expression, my_other_expression)

results in only the value of my_other_expression affecting the #if. my_complicated_expression may be evaluated, but its result is thrown away.

Member Author

There's also the slippery slope in the C grammar, where the CPP conditional expressions start at conditional-expression, which I think can unfold down to primary-expression, which then includes the ( expression ) which brings in the whole barnyard of comma-expressions and assignment-expression.

So I just chopped those rules out of the grammar.

gak commented May 12, 2025

Thanks @gak for making a start on this tricky area!

I'd like to take credit but you pinged the wrong person :)

Comment on lines +652 to +654
| | || | e1 || e2 | left | Fortran .OR. |
|------+---------+--------------+----------+----------------------------|
| | && | e1 && e2 | left | Fortran .AND. |
Collaborator

In the 5/12 call we resolved that these should use short-circuit evaluation as in CPP, to allow things like:

#if  x && 1/x
#endif

which means they are NOT simply Fortran .OR. / .AND.
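
A minimal C sketch of the short-circuit behavior in question, as CPP implements it: the right operand is not evaluated when the left operand already decides the result, so the division by zero is never performed. The `HYPOTHETICAL_X_` macro names and the helper function are assumptions for this example.

```c
#include <assert.h>

/* HYPOTHETICAL_X_UNSET is never defined, so it is replaced with 0. The
   expression becomes 0 && 1/0, and the right operand is not evaluated. */
#if HYPOTHETICAL_X_UNSET && 1/HYPOTHETICAL_X_UNSET
#define X_UNSET_BRANCH 1
#else
#define X_UNSET_BRANCH 0
#endif

/* With a non-zero value both operands are evaluated: 1 && 1/1 -> 1. */
#define HYPOTHETICAL_X_ONE 1
#if HYPOTHETICAL_X_ONE && 1/HYPOTHETICAL_X_ONE
#define X_ONE_BRANCH 1
#else
#define X_ONE_BRANCH 0
#endif

/* || short-circuits the same way: the left operand 1 decides the result. */
#if 1 || 1/0
#define OR_SHORT_CIRCUIT_BRANCH 1
#else
#define OR_SHORT_CIRCUIT_BRANCH 0
#endif

/* Runtime check of which branches the preprocessor selected. */
void check_short_circuit(void) {
    assert(X_UNSET_BRANCH == 0);
    assert(X_ONE_BRANCH == 1);
    assert(OR_SHORT_CIRCUIT_BRANCH == 1);
}
```

Fortran's .AND. and .OR. do not guarantee this skipping of the right operand, which is why the table entries cannot simply defer to them.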
