Remove the need of NodeJS for generating parsers #465

XVilka · 2019-10-18T16:39:05Z

There are no things that JavaScript can do that can't be done with Rust. It makes possible the next logical step - get rid of NodeJS for generating parsers from the description as well.

There is a number of JavaScript parsers in Rust:

maxbrunsfeld · 2019-10-18T17:11:31Z

It'd be nice to avoid depending on external programs, but in order to evaluate grammar.js files, we need a full JavaScript runtime, not just a parser. Grammars can also use require to load external files, so the runtime needs to be compatible with Node.js in that way.

We could embed a V8 or Spidermonkey interpreter inside of the Tree-sitter CLI, but it seems like overkill to me. Node.js is very easy to install and many users already have it. In fact, I think most people use node.js (and npm) to install tree-sitter.

thestr4ng3r · 2019-10-18T17:24:15Z

Duktape can potentially be used as a lightweight runtime.

turbo · 2019-10-18T17:30:18Z

Throwing this in here, too:

Duktape can potentially be used as a lightweight runtime.

Or bellard's QuickJS, which is almost twice as fast as Duktape even after a recent round of security fixes. Plus the API is nice (subjectively).

thestr4ng3r · 2019-10-18T17:35:03Z

I think the greatest problem with depending on nodejs is that any project that doesn't ship a pre-generated parser in their source, but generates it on the fly as part of their build system also has to depend on nodejs.
In our case, this is not acceptable. We would have to always keep an already generated parser in our source tree as a workaround, which is bad practice.

maxbrunsfeld · 2019-10-18T19:02:31Z

Thanks for the explanation. Personally, I think it's fine to check generated code into a source repository, as it is platform-agnostic, and it makes it easier to use the project. But if you're ok with running tree-sitter generate as part of your build, that seems like a reasonable approach.

Just so I understand your use case - why are you ok with your build depending on the tree-sitter binary but not the node binary? Would you prefer to swap out Node for some other executable JavaScript interpreter like QuickJS, DukTape, V7, Rhino, or SpiderMonkey?

Currently, Tree-sitter just looks for node on your PATH and passes JavaScript to it via stdin. We do rely on process.env and require currently, but we could probably remove the dependence on those particular APIs.

XVilka · 2019-10-19T03:33:39Z

Well, to my taste I don't understand why do you need the full power of JavaScript, why not just use a limited subset of it? So a parser in Rust can be written and no need for JavaScript interpreter whatsoever.

maxbrunsfeld · 2019-10-19T04:28:33Z

Grammars define helper functions, call methods like Array.map, and .filter, merge objects using the spread operator, etc.

JavaScript is the most widely used programming language in the world. There’s just no reason to create some custom subset with its own interpreter.

thestr4ng3r · 2019-10-19T09:54:21Z

Well popularity doesn't mean it's good 😜

Just so I understand your use case - why are you ok with your build depending on the tree-sitter binary but not the node binary? Would you prefer to swap out Node for some other executable JavaScript interpreter like QuickJS, DukTape, V7, Rhino, or SpiderMonkey?

The tree-sitter binary will be required in any case, but depending on nodejs just for reading in the grammar seems way too much to me. If you say you do need full javascript, I think such a small and embedded interpreter, which would then be part of the tree-sitter binary, could be a good compromise.

turbo · 2019-10-19T14:43:40Z

No need to devolve into JS bashing. FWIW I agree that in general, I like programs to be self-contained, but I can also understand that this is the lowest of low priorities.

I guess the better question would be: If someone were to do the work of integrating one of the smaller JS engines into tree-sitter, would that work be accepted?

Arcanemagus · 2019-10-21T17:02:19Z

I guess the better question would be: If someone were to do the work of integrating one of the smaller JS engines into tree-sitter, would that work be accepted?

The issue I see with this is later on when somebody not familiar with this discussion comes in and starts working on a Tree-sitter grammar, sees that they can use JavaScript files and builds it with that... only to use some JavaScript feature that the tiny custom parser doesn't understand. How are they supposed to know that it's a problem in the miniature parser and not with their code? How are they supposed to fix that?

I don't see the benefit of removing the Node.js dependency here, without entirely removing the ability to use JavaScript.

maxbrunsfeld · 2019-10-21T17:05:36Z

Yeah, I think Node.js needs to remain the default. I am open to generalizing the grammar-evaluating code path so that you can use an alternative JavaScript interpreter if you want. I don't think I'm interested in embedding the JavaScript interpreter into the tree-sitter binary. The vast majority of users will want to use Node.js, so it seems like unnecessary bloat.

maxbrunsfeld · 2019-11-05T17:46:39Z

Are folks interested in this option: allowing Tree-sitter to use different (and possibly lighter-weight) executables besides node as the JavaScript engine? If not, I think we should close this out.

turbo · 2019-11-05T18:02:10Z

use an alternative JavaScript interpreter if you want

Just the choice would be great. I for example use deno for JS scripting (https://github.com/denoland/deno) and not node.

thestr4ng3r · 2019-11-05T18:38:30Z

Yes, I think that could be a good option.

maxbrunsfeld · 2019-11-05T18:41:24Z

Ok, sounds good; I will leave this open, and if anyone wants to tweak the javascript evaluation logic to not depend on Node-specific APIs, that'd be great.

jurov · 2019-12-12T20:59:22Z

I don't understand, if you need to generate grammar at runtime, why not directly generate the JSON format instead?

mingodad · 2020-03-21T21:28:49Z

Just did a test with quickjs with small changes to dsl.js/grammar.js and joining then the end result is identical to the output of nodejs.

/* commenting this in dsl.js
global.alias = alias;
global.blank = blank;
global.choice = choice;
global.optional = optional;
global.prec = prec;
global.repeat = repeat;
global.repeat1 = repeat1;
global.seq = seq;
global.sym = sym;
global.token = token;
global.grammar = grammar;
global.field = field;

const result = require(process.env.TREE_SITTER_GRAMMAR_PATH);
console.log(JSON.stringify(result, null, 2));
*/

//module.exports = grammar({...}); //replacing this in grammar.js
console.log(JSON.stringify(grammar({...}));

Himujjal · 2021-01-31T07:15:55Z

What are the limitations of quickjs that prevents it to be default javascript engine for the purpose of Tree-sitter? AFAIK, everything is synchronous. File operations are already available in QuickJS or could be easily put to use using FFI. Performance isnt a concern either in this case.

So, would a QuickJS engine based pull request be accepted?

With QuickJS, tree-sitter binary could easily be published to OS package managers without the NodeJS dependency.

carueda · 2021-02-04T22:05:44Z

Great discussion here. I have found tree-sitter really amazing, however, the dependency on node (besides the tree-sitter binary itself) is practically a showstopper if I wanted to integrate the code generation as part of my build, which is for an embedded system, basically within the context of a typical gcc/makefile workflow where there's no javascript/node involved whatsoever. An alternative strategy -as mentioned above already- is to generate such code in a separate project whose artifacts would then be included basically as external library dependencies; yes, feasible, but very far from ideal imo.

maxbrunsfeld · 2021-02-04T22:14:00Z

I would like to keep Node.js as the default because for most users, it will be by far the most familiar and least confusing. Error messages and built-in APIs will likely be different from engine to engine.

But, expanding on what I said earlier, I'd be open to pull requests adding either of:

A compile time feature that lets you build in a different JavaScript engine
A runtime feature that lets you specify a different binary as the javascript interpreter.

The second option would probably be the most flexible. It may take a little design work to decide exactly how Tree-sitter should invoke the JavaScript interpreter and receive the JSON output.

razzeee · 2021-03-03T17:49:27Z

It's way easier to write these in javascript/typescript. It get's converted into a json right after that.
The json looks like this https://github.com/elm-tooling/tree-sitter-elm/blob/master/src/grammar.json
I guess you could skip the js part in theory if you wanted. Still I would be surprised, if you could even maintain a grammar like that.

gwerbin · 2021-11-26T01:15:55Z

Perhaps JavaScript is problematic because it's not easy to embed, and its runtimes are generally quite "heavy", even though grammars use only a small subset of their features.

Whereas something like Lua is designed to be embedded.

There is a system for embedding NodeJS in C++ applications, but I don't know if that can then be used from other languages like Rust, Go, etc. https://nodejs.org/api/embedding.html

If the grammar ultimately gets emitted as JSON, then maybe Tree Sitter should expose a "grammar builder" API, which could be used from any language with an FFI. Then you can generate grammars in Node, Python, Haskell, whatever you want.

kalkin · 2021-11-26T08:05:25Z

Lua would definitely be a way to go, but as long there is no code the whole discussion is useless

maxbrunsfeld · 2021-11-26T21:24:55Z

👋 Hi everyone, this thread is getting a lot of drive-by comments that aren't very relevant to the specific problem that we're discussing. I'd like to lock this issue in order to reduce unnecessary notifications for myself and other Tree-sitter maintainers.

I'll leave the discussion open for another day or two, as a courtesy to anyone who feels they have important feedback to add. And I'll reiterate few points of my own:

Today, Node.js is not required in any way if you are using a Tree-sitter parser in your application. It is only required if you are re-generating a parser from the grammar. Typically, this is something you do only during development of your grammar.
Even if you do need to regenerate a parser again later, you can still avoid needing Node.js. You can run tree-sitter generate on a JSON file instead of a JavaScript file. The JSON file is much smaller than the parser.c and can be generated from the JavaScript file during development.
If we want to remove the dependency on Node.js during grammar development, we have an easy pathway towards doing that, which won't require much change to the JavaScript-based grammar DSL. I've outlined that above.
There are many JavaScript engines to choose from. Most OS's ship one by default, as part of their web view system.
JavaScript is the world's most commonly-known programming language by far. And grammar development does not need to be embedded in Tree-sitter users' applications. So there's no reason for us to switch the grammar DSL to use a different language.

If (and only if) you have exact, concrete use-cases for the Tree-sitter CLI and Node.js is a problem for you, and you don't think the above proposal can solve your problem, please continue to comment. Thanks!

sogaiu · 2023-01-05T08:23:29Z

Specifically regarding point 3 (runtime configurability of alternate js runtime) in this comment, below is a draft that lead to success locally.

Is this close to what was referred to in point 3?

(Possibly as a side-benefit this kind of thing might address #1686 as well.)

Tried patching mod.rs like this:

diff --git a/cli/src/generate/mod.rs b/cli/src/generate/mod.rs
index 4838828b..053ffbb0 100644
--- a/cli/src/generate/mod.rs
+++ b/cli/src/generate/mod.rs
@@ -25,6 +25,7 @@ use std::fs;
 use std::io::Write;
 use std::path::{Path, PathBuf};
 use std::process::{Command, Stdio};
+use std::env;
 
 lazy_static! {
     static ref JSON_COMMENT_REGEX: Regex = RegexBuilder::new("^\\s*//.*")
@@ -168,12 +169,16 @@ pub fn load_grammar_file(grammar_path: &Path) -> Result<String> {
 
 fn load_js_grammar_file(grammar_path: &Path) -> Result<String> {
     let grammar_path = fs::canonicalize(grammar_path)?;
-    let mut node_process = Command::new("node")
+    let js_runtime_path = match env::var("TREE_SITTER_JS_RUNTIME_PATH") {
+        Ok(s) => s,
+        Err(_) => "node".to_string(),
+    };
+    let mut node_process = Command::new(js_runtime_path)
         .env("TREE_SITTER_GRAMMAR_PATH", grammar_path)
         .stdin(Stdio::piped())
         .stdout(Stdio::piped())
         .spawn()
-        .expect("Failed to run `node`");
+        .expect("Failed to run js runtime");  // XXX: yeah not right
 
     let mut node_stdin = node_process
         .stdin

To test on a *nix machine:

Ensure a local installation of quickjs (see here for availability)
Build patched tree-sitter cli via cargo build --release or other appropriate method (will use later so save path somewhere -- typically it's target/release/tree-sitter under the tree-sitter repository root)
Fetch some tree-sitter grammar locally (e.g. via git clone)
Place make-grammar-json.js and quickjs-wrapper.sh (make executable) in grammar repository root (see bottom of post for .zip files for these files)
Set TREE_SITTER_JS_RUNTIME_PATH environment variable to point at quickjs-wrapper.sh
Move src directory out of the way
Invoke tree-sitter generate (using custom tree-sitter cli binary built earlier)

Success should yield a src directory with appropriate content.

simple .js file - make-grammar-json.zip
wrapper invocation script - quickjs-wrapper.zip

resolves tree-sitter#465 resolves tree-sitter#1686

jakesarjeant · 2023-07-18T11:38:33Z

I've implemented the latter approach from point 3 in a more complete manner in this PR. I'd love to hear whether or not this satisfies the concerns discussed here and/or whether anyone thinks further changes are necessary!

tgross35 · 2024-01-09T08:46:19Z

Can this be closed since #2403 is merged, or are the maintainers interested in an embedded option still?

If anyone is testing around, the best option at this time is probably deno_core https://docs.rs/deno_core/0.244.0/deno_core/. Deno is a pretty well proven alternative to node at this point, Rust runtime around the v8 engine. Kind of perfect for this use case.

The downside is the binary would probably 4x in size, but that may be an acceptable tradeoff.

Thaodan · 2024-07-14T02:47:19Z

How does this goal interact with the memory usage required to generate parser?

I'm packaging the parsers tree-sittter of tree-sitter grammars to be used in text editors for openSUSE. To ensure that the parser is compatible and to not trust the generated I generate the parser on each built. The argument that you just need to generate parsers during development doesn't add up to me as the chain of dependencies on what treesitter was used to generate the parser shouldn't be with the upstream from my point of view.

issue-label-bot bot added the feature_request label Oct 18, 2019

razzeee mentioned this issue Feb 8, 2021

Remove dependency on Node mpizenberg/elm-test-rs#37

Closed

ahlinc mentioned this issue Feb 21, 2021

Tree-sitter 1.0 Checklist #930

Open

29 tasks

XVilka mentioned this issue Mar 3, 2021

Update Vim's Syntax highlinging rust-lang/rust.vim#326

Open

ahlinc added the parser-generation label Aug 22, 2021

sogaiu mentioned this issue Dec 29, 2022

Update tree-sitter-cli to 0.20.6 sogaiu/tree-sitter-clojure#26

Closed

jakesarjeant added a commit to jakesarjeant/tree-sitter that referenced this issue Jul 18, 2023

feat(cli): add option to select JS runtime other than node

b8cf030

resolves tree-sitter#465 resolves tree-sitter#1686

jakesarjeant mentioned this issue Jul 18, 2023

feat(cli): add option to select JS runtime other than node #2403

Merged

tree-sitter deleted a comment from issue-label-bot bot Feb 7, 2024

dundargoc added enhancement Feature request and removed feature_request labels Feb 8, 2024

ObserverOfTime added this to the 1.0 milestone May 7, 2024

amaanq linked a pull request Oct 13, 2024 that will close this issue

feat: remove the need of an external JS runtime for processing grammars #3780

Draft

7 tasks

ObserverOfTime modified the milestones: 1.0, 0.26 Jan 20, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove the need of NodeJS for generating parsers #465

Remove the need of NodeJS for generating parsers #465

XVilka commented Oct 18, 2019 •

edited

Loading

maxbrunsfeld commented Oct 18, 2019

thestr4ng3r commented Oct 18, 2019

turbo commented Oct 18, 2019

thestr4ng3r commented Oct 18, 2019

maxbrunsfeld commented Oct 18, 2019

XVilka commented Oct 19, 2019

maxbrunsfeld commented Oct 19, 2019

thestr4ng3r commented Oct 19, 2019

turbo commented Oct 19, 2019 •

edited

Loading

Arcanemagus commented Oct 21, 2019

maxbrunsfeld commented Oct 21, 2019

maxbrunsfeld commented Nov 5, 2019

turbo commented Nov 5, 2019

thestr4ng3r commented Nov 5, 2019

maxbrunsfeld commented Nov 5, 2019

jurov commented Dec 12, 2019

mingodad commented Mar 21, 2020

Himujjal commented Jan 31, 2021 •

edited

Loading

carueda commented Feb 4, 2021

maxbrunsfeld commented Feb 4, 2021

razzeee commented Mar 3, 2021

gwerbin commented Nov 26, 2021 •

edited

Loading

kalkin commented Nov 26, 2021

maxbrunsfeld commented Nov 26, 2021

sogaiu commented Jan 5, 2023 •

edited

Loading

jakesarjeant commented Jul 18, 2023

tgross35 commented Jan 9, 2024

Thaodan commented Jul 14, 2024

Remove the need of NodeJS for generating parsers #465

Remove the need of NodeJS for generating parsers #465

Comments

XVilka commented Oct 18, 2019 • edited Loading

maxbrunsfeld commented Oct 18, 2019

thestr4ng3r commented Oct 18, 2019

turbo commented Oct 18, 2019

thestr4ng3r commented Oct 18, 2019

maxbrunsfeld commented Oct 18, 2019

XVilka commented Oct 19, 2019

maxbrunsfeld commented Oct 19, 2019

thestr4ng3r commented Oct 19, 2019

turbo commented Oct 19, 2019 • edited Loading

Arcanemagus commented Oct 21, 2019

maxbrunsfeld commented Oct 21, 2019

maxbrunsfeld commented Nov 5, 2019

turbo commented Nov 5, 2019

thestr4ng3r commented Nov 5, 2019

maxbrunsfeld commented Nov 5, 2019

jurov commented Dec 12, 2019

mingodad commented Mar 21, 2020

Himujjal commented Jan 31, 2021 • edited Loading

carueda commented Feb 4, 2021

maxbrunsfeld commented Feb 4, 2021

razzeee commented Mar 3, 2021

gwerbin commented Nov 26, 2021 • edited Loading

kalkin commented Nov 26, 2021

maxbrunsfeld commented Nov 26, 2021

sogaiu commented Jan 5, 2023 • edited Loading

jakesarjeant commented Jul 18, 2023

tgross35 commented Jan 9, 2024

Thaodan commented Jul 14, 2024

XVilka commented Oct 18, 2019 •

edited

Loading

turbo commented Oct 19, 2019 •

edited

Loading

Himujjal commented Jan 31, 2021 •

edited

Loading

gwerbin commented Nov 26, 2021 •

edited

Loading

sogaiu commented Jan 5, 2023 •

edited

Loading