ObiectumTokenizer

A small C library for tokenizing strings.

Features

Line and column numbers for every token.
Custom special characters - each one is always a separate token (unless it is inside a string).
Custom comments - both single-line and multi-line.
Strings - a string wrapped in quotes is always a single token.
UTF-8 is the only supported encoding.
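
Column numbers appear to count characters rather than bytes: in the example in this README, `def` is reported at column 22 even though the `№` before it takes three bytes in UTF-8. The sketch below shows one way such a column could be computed. `utf8_column` is a hypothetical helper written for illustration, not part of ObiectumTokenizer, and the library's actual implementation may differ.

```c
#include <stddef.h>

/* Hypothetical helper (NOT part of ObiectumTokenizer): returns the 1-based
 * column of the byte at `offset` in a single-line UTF-8 string, counting
 * code points rather than bytes. Assumes `line` is valid UTF-8. */
static size_t utf8_column(const char *line, size_t offset)
{
    size_t col = 1;
    for (size_t i = 0; i < offset && line[i] != '\0'; i++) {
        /* Continuation bytes have the form 10xxxxxx; any other byte
         * starts a new code point and therefore a new column. */
        if (((unsigned char)line[i] & 0xC0) != 0x80) {
            col++;
        }
    }
    return col;
}
```

For the input `abc /* comment №1 */ def,123`, the `d` sits at byte offset 23, and `utf8_column(input, 23)` returns 22; a plain byte count would give 24.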

Example

obtokenizer_tokenizer_t tokenizer;
if (obtokenizer_init(&tokenizer, "abc /* comment №1 */ def,123 // №2") ||
    obtokenizer_add_spec_char(&tokenizer, ',')                   || // Count commas as separate tokens.
    obscanner_add_comment_mark(&tokenizer.scanner, false, "//")  || // Enable C-style single-line comments.
    obscanner_add_comment_mark(&tokenizer.scanner, true,  "/*")  || // Enable C-style multi-line comments:
    obscanner_add_comment_mark(&tokenizer.scanner, true,  "*/")     // both the opening and the closing mark.
    ) {
    // One of the calls failed; handle the error here.
}

obtokenizer_token_t token;
while (!obtokenizer_get(&tokenizer, &token)) {
    if (token.str[0] == '\0') {
        // No more tokens.
        obtokenizer_free_token(&token);
        break;
    }

    printf("%d:%d: %s\n", token.pos.line, token.pos.col, token.str);

    obtokenizer_free_token(&token); // Must be called before each reuse of a token structure.
}

Output:

$ ./test
1:1: abc
1:22: def
1:25: ,
1:26: 123
