Skip to content

Fix: correct ASI for arrow function followed by parenthesized expressions #354

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jackschu
Copy link
Contributor

@jackschu jackschu commented May 27, 2025

Problem

Observed

The following parses incorrectly

()=>{}
(1)

Without this changeset, the grammar treats this as an arrow function that is being invoked.

(program [0, 0] - [1, 3]
  (expression_statement [0, 0] - [1, 3]
    (call_expression [0, 0] - [1, 3]
      function: (arrow_function [0, 0] - [0, 6]
        parameters: (formal_parameters [0, 0] - [0, 2])
        body: (statement_block [0, 4] - [0, 6]))
      arguments: (arguments [1, 0] - [1, 3]
        (number [1, 1] - [1, 2])))))

We can gut check that this is wrong by running the following in node

() => {console.log('this never logs because the function is never invoked')}
(console.log('this logs'))

Expected

Instead this should parse as an arrow function, an automatically injected semicolon, and a parenthesized number, eg

(program [0, 0] - [1, 3]
  (expression_statement [0, 0] - [0, 6]
    (arrow_function [0, 0] - [0, 6]
      parameters: (formal_parameters [0, 0] - [0, 2])
      body: (statement_block [0, 4] - [0, 6])))
  (expression_statement [1, 0] - [1, 3]
    (parenthesized_expression [1, 0] - [1, 3]
      (number [1, 1] - [1, 2]))))

Solution

Background

Ill be using PascalCase to indicate spec nouns and snake_case to indicate tree-sitter-javascript nouns.

This grammar considers two levels of expressions. expression and primary_expression. These both exist in the ecmascript spec, but are obviously not perfect representations of their highly-granular spec counterparts (this in intentional as the tree-sitter docs point out).

The spec considers a third kind of expression called MemberExpression. This is just one level above PrimaryExpression in the spec and includes things that our primary_expression already includes (property access, subscript expression, meta, super, etc.).

Change: Reorganize Expressions

It turns out that there are several issues in the way of handling this correctly, all of which I tackle in this PR. Our high level goal is to make ()=>{}(1) parse as an Error so ASI is forced to kick in if there were to be a newline in there.

  1. call_expressions currently accept expression as its body, the spec has CallExpression accept basically exclusively MemberExpression as its body. This diff moves call_expression to accept exclusively primary_expression. On its own this doesn't clearly correct any issue directly, but it moves us towards a consistent view a top the spec and it'll be important for use in later steps.
  2. Step 1. forces the issue that new_expression is considered to be considered a primary_expression, but the spec considers it to be a MemberExpression. So productions like new foo()() won't parse but should. So I move new_expression to be considered a primary_expression (after all it seems like in spirit our primary_expression is a MemberExpressoin).
    • At this point, we can tackle the niche precedence rule where new with no args is lower precedence than call expressions but new with args is higher precedence (src).
    • Interestingly this properly places template call expressions properly at around precedence level 16 (src) so we can drop template_call as a prec level (added in Fix: Operator precedence rules for tagged template functions #336 )
    • This also means that new_expression no longer needs to produce itself, since primary_expression is sufficient, so we can drop prec.right
  3. arrow functions are currently considered a primary_expression, they are not a PrimaryExpression in the spec, and are not a MemberExpression either, so I move them to expression.

Change: Make ()=>{}(1) illegal

At this point we've made ()=>{}(1) parse as an arrow function with an invoked object as its result. This is progress over it being a call expression with an arrow function as a body, because we're now in a scenario where this is a clearly illegal production. This is clear because the spec requires that ConciseBody (aka the shorthand return result format) for ArrowFunctions don't start with {.

Here I stand-up an external scanner for '=>' <not followed by {> which im calling $._shorthand_arrow,. And I use this in arrow_function to make ()=>{}(1) take the statement_body path rather than the expression shorthand path. I tried pretty hard to do this via precedences but couldn't get it to work (if someone could explain to me why that's pretty impossible or intuitively hard I'd appreciate that).

Change: Allow ASI before '(' sometimes

At this point ()=>{}(1) does error, but ASI still doesnt trigger for the newline version.
This is because we blanketly consider seeing ( as the next symbol to void ASI. This is useful for f\n(1) but seems misguided.

ASI basically asks "is the next token valid" and only kicks in if its not (src). So I expose ( to the external scanner and void ASI only if ( is a valid. I do the same thing to [ preemptively because I assume it has the same problem.

At this point it all parses correctly :)

@jackschu jackschu marked this pull request as ready for review May 27, 2025 01:55
@clason
Copy link

clason commented May 27, 2025

The test failures are due to the fact that you're generating the parser with ABI 15, which the bindings are not compatible with. Please use tree-sitter g --abi=14 for this PR.

Also, please update the reference queries to the grammar change, if necessary.

@clason
Copy link

clason commented May 27, 2025

Ugh, CI forces generation...

Can you add generate: false after this line?

(This should be a separate ci(test): don't validate generation commit before your -- squashed! -- other commit.)

@clason
Copy link

clason commented May 27, 2025

please use conventional commits, and squash and reorder your commits before this is ready to merge

@jackschu
Copy link
Contributor Author

Oh jinx
@clason it looks like it still fails, complaining that re generating creates a diff because tree-sitter/parser-test-action@v2 defaults to 15 and inputs.generate

It looks like despite what the release notes say, this can't be overridden via tree-sitter.json

        let abi_version =
            self.abi_version
                .as_ref()
                .map_or(DEFAULT_GENERATE_ABI_VERSION, |version| {
                    if version == "latest" {
                        tree_sitter::LANGUAGE_VERSION
                    } else {
                        version.parse().expect("invalid abi version flag")
                    }
                });

@clason
Copy link

clason commented May 27, 2025

Yes, hence my asking you to disable the test (which you did, which is why CI passes now).

But please check that the queries are still compatbile!

@jackschu jackschu force-pushed the fix-new-template-primary branch 2 times, most recently from fc20d18 to 361637e Compare May 28, 2025 03:24
@jackschu
Copy link
Contributor Author

Sure, I updated the commit format.

As for queries being compatible, I think we're good because grammar tests pass and my ~600 AST conversion tests from https://github.com/jackschu/ESTree-sitter tests also pass. But if there are specific queries you think might be negatively impacted let me know.

This does change the classification of arrow_function (was a primary_expression, is now an expression) and new_expression (was an expression, is now a primary_expression), so some queries will get different results, but I think this is correct.

@jackschu jackschu force-pushed the fix-new-template-primary branch from 361637e to e47b6e4 Compare May 28, 2025 03:36
@clason
Copy link

clason commented May 28, 2025

As long as the old queries are still valid, that's ok. We've had PRs that change the grammar in a way that they break without thinking to adapt the reference queries here (because they use their own).

@jackschu
Copy link
Contributor Author

Yeah I dont think this could make any queries invalid but for future PRs where are these reference queries?

@clason
Copy link

clason commented May 29, 2025

https://github.com/tree-sitter/tree-sitter-javascript/tree/master/queries

@jackschu jackschu force-pushed the fix-new-template-primary branch 2 times, most recently from e47d0fe to 454814e Compare June 8, 2025 03:31
…sions. arrow_functions are now expressions rather than primary_expressions and new_expressions are now primary_expressions.
@jackschu jackschu force-pushed the fix-new-template-primary branch from 454814e to 462d182 Compare June 8, 2025 03:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants