Remove the need of NodeJS for generating parsers #465
It'd be nice to avoid depending on external programs, but in order to evaluate We could embed a V8 or Spidermonkey interpreter inside of the Tree-sitter CLI, but it seems like overkill to me. Node.js is very easy to install and many users already have it. In fact, I think most people use node.js (and |
Duktape can potentially be used as a lightweight runtime. |
Throwing this in here, too:
Or bellard's QuickJS, which is almost twice as fast as Duktape even after a recent round of security fixes. Plus the API is nice (subjectively). |
I think the greatest problem with depending on nodejs is that any project that doesn't ship a pre-generated parser in their source, but generates it on the fly as part of their build system also has to depend on nodejs. |
Thanks for the explanation. Personally, I think it's fine to check generated code into a source repository, as it is platform-agnostic, and it makes it easier to use the project. But if you're ok with running Just so I understand your use case - why are you ok with your build depending on the Currently, Tree-sitter just looks for |
Well, for my taste, I don't understand why you need the full power of JavaScript. Why not just use a limited subset of it? Then a parser for that subset can be written in Rust, with no need for a JavaScript interpreter whatsoever. |
Grammars define helper functions, call methods like Array.map, and .filter, merge objects using the spread operator, etc. JavaScript is the most widely used programming language in the world. There’s just no reason to create some custom subset with its own interpreter. |
Well popularity doesn't mean it's good 😜
The |
No need to devolve into JS bashing. FWIW I agree that in general, I like programs to be self-contained, but I can also understand that this is the lowest of low priorities. I guess the better question would be: If someone were to do the work of integrating one of the smaller JS engines into tree-sitter, would that work be accepted? |
The issue I see with this is later on when somebody not familiar with this discussion comes in and starts working on a Tree-sitter grammar, sees that they can use JavaScript files and builds it with that... only to use some JavaScript feature that the tiny custom parser doesn't understand. How are they supposed to know that it's a problem in the miniature parser and not with their code? How are they supposed to fix that? I don't see the benefit of removing the Node.js dependency here, without entirely removing the ability to use JavaScript. |
Yeah, I think Node.js needs to remain the default. I am open to generalizing the grammar-evaluating code path so that you can use an alternative JavaScript interpreter if you want. I don't think I'm interested in embedding the JavaScript interpreter into the |
Are folks interested in this option: allowing Tree-sitter to use different (and possibly lighter-weight) executables besides |
Just the choice would be great. I for example use deno for JS scripting (https://github.com/denoland/deno) and not node. |
Yes, I think that could be a good option. |
Ok, sounds good; I will leave this open, and if anyone wants to tweak the javascript evaluation logic to not depend on Node-specific APIs, that'd be great. |
I don't understand, if you need to generate grammar at runtime, why not directly generate the JSON format instead? |
Just did a test with QuickJS: with small changes to dsl.js/grammar.js and joining them, the end result is identical to the output of Node.js.
|
What are the limitations of QuickJS that prevent it from being the default JavaScript engine for the purposes of Tree-sitter? AFAIK, everything is synchronous. File operations are already available in QuickJS or could easily be put to use via FFI. Performance isn't a concern in this case either. So, would a QuickJS-based pull request be accepted? With QuickJS, the tree-sitter binary could easily be published to OS package managers without the Node.js dependency. |
Great discussion here. I have found tree-sitter really amazing, however, the dependency on node (besides the tree-sitter binary itself) is practically a showstopper if I wanted to integrate the code generation as part of my build, which is for an embedded system, basically within the context of a typical gcc/makefile workflow where there's no javascript/node involved whatsoever. An alternative strategy -as mentioned above already- is to generate such code in a separate project whose artifacts would then be included basically as external library dependencies; yes, feasible, but very far from ideal imo. |
I would like to keep Node.js as the default because for most users, it will be by far the most familiar and least confusing. Error messages and built-in APIs will likely be different from engine to engine. But, expanding on what I said earlier, I'd be open to pull requests adding either of:
The second option would probably be the most flexible. It may take a little design work to decide exactly how Tree-sitter should invoke the JavaScript interpreter and receive the JSON output. |
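To make the design question above a little more concrete, here is a minimal sketch of one way a CLI could invoke an arbitrary interpreter and receive its output. It assumes the interpreter reads a script from stdin and prints the grammar JSON to stdout; `run_interpreter` and its parameters are hypothetical names for illustration, not part of Tree-sitter.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};

/// Hypothetical helper: runs `program` with `args`, feeds `script` to its
/// stdin, and returns whatever it printed to stdout (assumed to be JSON).
pub fn run_interpreter(program: &str, args: &[&str], script: &str) -> io::Result<String> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?; // fails here if the interpreter cannot be started

    // Write the script, then drop the handle so the child sees EOF.
    // Assumes the interpreter reads all of stdin before producing much output.
    {
        let mut stdin = child.stdin.take().expect("stdin was configured as piped");
        stdin.write_all(script.as_bytes())?;
    }

    let output = child.wait_with_output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("`{}` exited with {}", program, output.status),
        ));
    }
    String::from_utf8(output.stdout)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Returning an `io::Result` instead of panicking lets the caller report which interpreter failed, which matters more once the interpreter is user-configurable.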
It's way easier to write these in JavaScript/TypeScript. It gets converted into JSON right after that. |
Perhaps JavaScript is problematic because it's not easy to embed, and its runtimes are generally quite "heavy", even though grammars use only a small subset of their features. Whereas something like Lua is designed to be embedded. There is a system for embedding NodeJS in C++ applications, but I don't know if that can then be used from other languages like Rust, Go, etc. https://nodejs.org/api/embedding.html If the grammar ultimately gets emitted as JSON, then maybe Tree Sitter should expose a "grammar builder" API, which could be used from any language with an FFI. Then you can generate grammars in Node, Python, Haskell, whatever you want. |
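As a rough illustration of the "grammar builder" idea above, the sketch below builds a small rule tree in Rust and serializes it to JSON by hand. The `Rule` enum, `grammar_json`, and `escape` are hypothetical names invented for this example. The emitted shape (`name`, `rules`, and nodes such as `STRING`, `PATTERN`, `SYMBOL`, `SEQ`, `CHOICE`, `REPEAT`) follows the grammar JSON format, but only this small subset is covered.

```rust
/// Hypothetical rule tree covering a small subset of grammar JSON node types.
pub enum Rule {
    Str(String),       // literal token
    Pattern(String),   // regex token
    Symbol(String),    // reference to another rule
    Seq(Vec<Rule>),    // members in order
    Choice(Vec<Rule>), // one of the members
    Repeat(Box<Rule>), // zero or more repetitions
}

/// Escapes a string for use inside a JSON string literal.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

fn join(members: &[Rule]) -> String {
    members.iter().map(Rule::to_json).collect::<Vec<_>>().join(",")
}

impl Rule {
    pub fn to_json(&self) -> String {
        match self {
            Rule::Str(v) => format!(r#"{{"type":"STRING","value":"{}"}}"#, escape(v)),
            Rule::Pattern(v) => format!(r#"{{"type":"PATTERN","value":"{}"}}"#, escape(v)),
            Rule::Symbol(n) => format!(r#"{{"type":"SYMBOL","name":"{}"}}"#, escape(n)),
            Rule::Seq(m) => format!(r#"{{"type":"SEQ","members":[{}]}}"#, join(m)),
            Rule::Choice(m) => format!(r#"{{"type":"CHOICE","members":[{}]}}"#, join(m)),
            Rule::Repeat(c) => format!(r#"{{"type":"REPEAT","content":{}}}"#, c.to_json()),
        }
    }
}

/// Serializes named rules in the given order (the first rule is the start rule).
pub fn grammar_json(name: &str, rules: &[(&str, Rule)]) -> String {
    let body: Vec<String> = rules
        .iter()
        .map(|(n, r)| format!("\"{}\":{}", escape(n), r.to_json()))
        .collect();
    format!(r#"{{"name":"{}","rules":{{{}}}}}"#, escape(name), body.join(","))
}
```

A builder like this could sit behind an FFI boundary, so any language could produce the JSON without a JavaScript runtime; whether that would be as convenient as the JavaScript DSL is a separate question.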
Lua would definitely be a way to go, but as long as there is no code, the whole discussion is useless |
👋 Hi everyone, this thread is getting a lot of drive-by comments that aren't very relevant to the specific problem that we're discussing. I'd like to lock this issue in order to reduce unnecessary notifications for myself and other Tree-sitter maintainers. I'll leave the discussion open for another day or two, as a courtesy to anyone who feels they have important feedback to add. And I'll reiterate a few points of my own:
If (and only if) you have exact, concrete use-cases for the Tree-sitter CLI and Node.js is a problem for you, and you don't think the above proposal can solve your problem, please continue to comment. Thanks! |
Specifically regarding point 3 (runtime configurability of alternate js runtime) in this comment, below is a draft that led to success locally. Is this close to what was referred to in point 3? (Possibly as a side-benefit this kind of thing might address #1686 as well.) Tried patching
diff --git a/cli/src/generate/mod.rs b/cli/src/generate/mod.rs
index 4838828b..053ffbb0 100644
--- a/cli/src/generate/mod.rs
+++ b/cli/src/generate/mod.rs
@@ -25,6 +25,7 @@ use std::fs;
 use std::io::Write;
 use std::path::{Path, PathBuf};
 use std::process::{Command, Stdio};
+use std::env;

 lazy_static! {
     static ref JSON_COMMENT_REGEX: Regex = RegexBuilder::new("^\\s*//.*")
@@ -168,12 +169,16 @@ pub fn load_grammar_file(grammar_path: &Path) -> Result<String> {
 fn load_js_grammar_file(grammar_path: &Path) -> Result<String> {
     let grammar_path = fs::canonicalize(grammar_path)?;
-    let mut node_process = Command::new("node")
+    let js_runtime_path = match env::var("TREE_SITTER_JS_RUNTIME_PATH") {
+        Ok(s) => s,
+        Err(_) => "node".to_string(),
+    };
+    let mut node_process = Command::new(js_runtime_path)
         .env("TREE_SITTER_GRAMMAR_PATH", grammar_path)
         .stdin(Stdio::piped())
         .stdout(Stdio::piped())
         .spawn()
-        .expect("Failed to run `node`");
+        .expect("Failed to run js runtime"); // XXX: yeah not right
     let mut node_stdin = node_process
         .stdin
To test on a *nix machine:
Success should yield a simple |
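The runtime selection in the draft patch above could be factored into small testable helpers, roughly as sketched here. This is not the code that was eventually merged. `choose_runtime`, `runtime_from_env`, and `spawn_failure_message` are hypothetical names, and treating an empty variable as unset is an assumption of this sketch. Only the `TREE_SITTER_JS_RUNTIME_PATH` name and the `node` fallback come from the draft.

```rust
use std::env;
use std::ffi::OsString;

/// Name of the override variable, taken from the draft patch.
pub const RUNTIME_VAR: &str = "TREE_SITTER_JS_RUNTIME_PATH";

/// Picks the runtime: a non-empty override wins, otherwise fall back to `node`.
/// (Treating an empty value as "unset" is an assumption of this sketch.)
pub fn choose_runtime(override_value: Option<OsString>) -> OsString {
    match override_value {
        Some(v) if !v.is_empty() => v,
        _ => OsString::from("node"),
    }
}

/// Reads the override from the process environment. `var_os` is used so that
/// a path that is not valid UTF-8 is still accepted.
pub fn runtime_from_env() -> OsString {
    choose_runtime(env::var_os(RUNTIME_VAR))
}

/// Builds an error message that names the runtime that failed to start,
/// as one possible replacement for the hard-coded `expect` text.
pub fn spawn_failure_message(runtime: &OsString, err: &std::io::Error) -> String {
    format!("Failed to run `{}`: {}", runtime.to_string_lossy(), err)
}
```

Keeping the selection logic separate from the environment lookup makes it possible to test the fallback behavior without mutating process-wide state.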
I've implemented the latter approach from point 3 in a more complete manner in this PR. I'd love to hear whether or not this satisfies the concerns discussed here and/or whether anyone thinks further changes are necessary! |
Can this be closed since #2403 is merged, or are the maintainers interested in an embedded option still? If anyone is testing around, the best option at this time is probably The downside is the binary would probably 4x in size, but that may be an acceptable tradeoff. |
How does this goal interact with the memory usage required to generate a parser? I'm packaging the parsers of tree-sitter grammars to be used in text editors for openSUSE. To ensure that the parser is compatible and to not trust the generated I generate the parser on each build. The argument that you just need to generate parsers during development doesn't add up to me: from my point of view, the choice of which Tree-sitter version is used to generate the parser shouldn't rest with upstream. |
There are no things that JavaScript can do that can't be done with Rust. It makes possible the next logical step - get rid of NodeJS for generating parsers from the description as well.
There are a number of JavaScript parsers in Rust: