Skip to content

flowdev/semtool

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

semtool

Tool for semantic actions on any language or file format driven by PEG grammars.

TODOs

  1. Implement a simple parser for a PEG-like grammar with comb.
  2. Get search file definition for SemGrep/OpenGrep.
  3. Define a set of keywords/rules.
  4. Implement parser for search files with comb.
  5. Refine own PEG grammar parser.
  6. Implement AST comparison.
  7. Implement repacements with Go text templates.

Own Syntax

  • = is used for rule assignment.
  • / is used for alternatives (FirstSuccessful).
  • Separating multiple rules with space is used for sequences (Sequence).
  • * and + are used for repetitions (Many0 and Many1).
  • ,* and ,+ are used for lists (Separated0 and Separated1) withOUT parsing a separator at the end.
  • ;* and ;+ are used for lists (Separated0 and Separated1) WITH optional parsing of a separator at the end.
  • ? is used for optional parsers (Optional).
  • -> is used for parsing until another parser matches (Until).
  • . is used for any character or byte in case of a binary parser (either AnyChar or AnyByte).
  • ' and " are used for string literals.
  • ( and ) are used for grouping.
  • ! is used for negative lookahead.
  • & is used for positive lookahead.
  • [ and ] are used for character classes.
  • In string literals ANSI escape sequences are supported and so are \377, \xabcdef and \u00abcdef for octal, hex and Unicode.
  • Comments start with # and continue to the end of the line.
  • Whitespace is ignored.

Predifined rules

  • EOF parses the end of the input.
  • EOL parses the end of a line ('\r, '\n' or '\r\n').
  • FLOAT parses a floating point number (without a sign).
  • INTEGER parses an integer number (without a sign).
  • SPACE parses any amount of Unicode whitespace (including none).
  • MUST_SPACE parses Unicode whitespace (at least one character).
  • NAME parses a name (a Unicode letter followed by zero or more Unicode letters, Unicode digits or underscores).

Predefined character classes

  • ALPHA parses a Unicode letter.
  • DIGIT parses a Unicode digit or number.
  • WORD is ALPHA and DIGIT combined into a single class.

Rules a user has to define

  • GRAMMAR is the root rule of a grammar.
  • VARIABLE parses a variable name (e.g. '$' NAME) in code snippets for searching or replacing. A variable can stand for any syntactically valid subtree of the current parse tree (AST).
  • PLACEHOLDER parses a placeholder (e.g. '_' or '$PLACEHOLDER') in code snippets for searching. Like a variable, a placeholder can stand for any syntactically valid subtree of the current parse tree (AST). But a placeholder can't be referenced later. So it can't be used in replacements.
  • BINARY is more a variable than a rule. It can only be set to true or false. false is the default value. So it can be omitted.

About

Tool for semantic actions on any language or file format driven by PEG grammars.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages