
wolfmanx commented Aug 7, 2025

As reported in issue #4, the tokenizer does not follow the POSIX shell rules for escaping and quoting. A frequent use case for the tokenizer module is reusing commands from shell scripts, so it is unfortunate when the tokenizer behaves differently from the shell.

I have added extensive comments to aid comprehension, but they can of course be removed.

I have also modified the newline test case so that npm run test does not fail.

There is a test case that constructs a string of all characters in the ASCII code range 1-127, each escaped with a backslash, and supplies it to tokenizeArgs as an unquoted, a double-quoted, and a single-quoted argument. The results are checked against the actual output of /bin/sh.

That test fails against the unmodified version of args-tokenizer and passes with the proposed modifications.
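A minimal sketch of how such an input can be built, for illustration only: escapedRange and quotingVariants are hypothetical helpers, not functions from args-tokenizer or from this PR. Every character in a code range gets a backslash in front of it, and the resulting word is wrapped for the three quoting contexts.

```typescript
// Hypothetical helper (illustration only): builds one word in which every
// character with a code in [from, to] is preceded by a backslash.
function escapedRange(from: number, to: number): string {
  let word = "";
  for (let code = from; code <= to; code++) {
    word += "\\" + String.fromCharCode(code);
  }
  return word;
}

type Quoting = "unquoted" | "double" | "single";

// Wraps the same escaped word for each of the three quoting contexts.
function quotingVariants(word: string): Record<Quoting, string> {
  return {
    unquoted: word,
    double: '"' + word + '"',
    single: "'" + word + "'",
  };
}

// Codes 1-127: the range that is also compared against /bin/sh.
const argument = escapedRange(1, 127);
const variants = quotingVariants(argument);
```

Each entry of variants would then be passed to tokenizeArgs and, embedded in a command line, to /bin/sh, so that the two results can be compared.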


wolfmanx commented Aug 9, 2025

I have constructed an extensive test with all characters in the range 0-255:

let argument = `\\\x00\\\x01 ... \\\xFF`;  // a backslash in front of each character 0x00-0xFF

The output is tested against expected results constructed according to the POSIX unescaping rules for

  • an unquoted argument: tokenizeArgs(argument),
  • a double-quoted argument: tokenizeArgs('"' + argument + '"'),
  • a single-quoted argument: tokenizeArgs("'" + argument + "'").
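The POSIX quoting rules behind these expectations (Shell Command Language, section 2.2) are short. Unquoted, a backslash preserves the literal value of the next character and is itself removed, except that backslash-newline is removed entirely. Inside double quotes, the backslash is special only before $, `, ", \, or newline, and is otherwise kept. Inside single quotes every character is literal, including the backslash, so a single quote cannot appear there at all. A sketch of these rules, assuming a hypothetical unescapePosix helper (not the PR's implementation) that is applied to the body of one word:

```typescript
type Quoting = "unquoted" | "double" | "single";

// Hypothetical helper: applies the POSIX sh backslash rules to the body of one
// word (the text between the quotes, if any). Assumes the body contains no
// unescaped quotes, blanks, or $/` expansions, as in the all-escaped test word.
function unescapePosix(body: string, quoting: Quoting): string {
  // Inside '...' nothing is special, not even a backslash.
  if (quoting === "single") return body;

  let out = "";
  for (let i = 0; i < body.length; i++) {
    const ch = body.charAt(i);
    if (ch !== "\\" || i + 1 === body.length) {
      out += ch; // ordinary character (a lone trailing backslash is kept as is)
      continue;
    }
    i++; // consume the escaped character
    const next = body.charAt(i);
    if (next === "\n") {
      continue; // backslash-newline is a line continuation: both are removed
    }
    if (quoting === "unquoted" || "$`\"\\".includes(next)) {
      out += next; // the backslash quotes the next character and is removed
    } else {
      out += "\\" + next; // inside "...", the backslash stays literal
    }
  }
  return out;
}
```

For example, \{ becomes { when unquoted, but stays \{ both inside double quotes and inside single quotes.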

The same test is performed for all characters in the range 1-127 against the actual output of /bin/sh:

const unquoted = await x("/bin/sh", ["-c", "echo " + argument]);
const doubleQuoted = await x("/bin/sh", ["-c", "echo " + '"' + argument + '"']);
const singleQuoted = await x("/bin/sh", ["-c", "echo " + "'" + argument + "'"]);

This should demonstrate the correctness of the modification convincingly.
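For reference, a sketch of the /bin/sh side that needs only Node's built-in execFileSync rather than the x helper used above. shWords is a hypothetical name, and the sketch deliberately swaps echo for printf '%s\0': POSIX leaves echo's treatment of backslashes implementation-defined (dash's echo, for example, interprets sequences such as \n), whereas printf prints %s arguments verbatim. Output is decoded as UTF-8, so the sketch is only meaningful for codes 1-127.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical helper: returns the arguments /bin/sh produces for `words`,
// a fragment of shell syntax. printf prints each argument verbatim ('%s' does
// not interpret backslashes) followed by a NUL byte, so arguments containing
// newlines survive the round trip. `words` must not contain a NUL byte, since
// it becomes part of a command-line argument, which cannot carry one.
function shWords(words: string): string[] {
  const stdout = execFileSync("/bin/sh", ["-c", "printf '%s\\0' " + words], {
    encoding: "utf8",
  });
  const parts = stdout.split("\0");
  parts.pop(); // output ends with NUL, which leaves one empty trailing element
  return parts;
}
```

Each result can then be compared element by element with what tokenizeArgs returns for the same input.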

wolfmanx force-pushed the posix-compliant-escaping branch from d4e1458 to b86e4c4 on August 10, 2025 21:13
wolfmanx commented:

Running the escape tests against the main branch exposes the problems with escaped characters in double- and single-quoted strings:

   ✓ all escaped characters outside quoting context (POSIX)
   × all escaped characters in double quoting context (POSIX)
   × all escaped characters in single quoting context (POSIX)
   ✓ all escaped characters outside quoting context (/bin/sh)
   × all escaped characters in double quoting context (/bin/sh)
   × all escaped characters in single quoting context (/bin/sh)

⎯⎯⎯⎯⎯⎯⎯ Failed Tests 4 ⎯⎯⎯⎯⎯⎯⎯

 FAIL  src/args-tokenizer.test.ts > all escaped characters in double quoting context (POSIX)
 FAIL  src/args-tokenizer.test.ts > all escaped characters in double quoting context (/bin/sh)

  --------------------------------------------------
- \^A 1
+ ^A 1
- \^B 2
+ ^B 2
- \^C 3
+ ^C 3
- \^D 4
+ ^D 4

[...]

- \{ 123
+ { 123
- \| 124
+ | 124
- \} 125
+ } 125
- \~ 126
+ ~ 126
- \\x7F 127
+ \x7F 127

⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯[1/4]⎯

 FAIL  src/args-tokenizer.test.ts > all escaped characters in single quoting context (POSIX)
 FAIL  src/args-tokenizer.test.ts > all escaped characters in single quoting context (/bin/sh)
Error: Unexpected end of string. Closing quote is missing.

⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯[2/4]⎯

 Test Files  1 failed (1)
      Tests  4 failed | 18 passed (22)
