`convert_error` assumes fixed-width characters #1201

SPuntte · 2020-10-05T16:56:28Z

Hi,

a nom and Rust newbie here. Thanks for developing such a great library 🙏

Recently, I have been tinkering with parsing a file format containing tab characters. It just so happens that tabs can make convert_error output really confusing.

A minimal example (using nom 6.0.0-alpha3),

use nom::{
    bytes::complete::tag,
    character::complete::space1,
    error::{convert_error, ParseError, VerboseError},
    sequence::delimited,
    Err, IResult,
};

fn foobar<'a, E: ParseError<&'a str>>(input: &'a str) -> IResult<&'a str, &'a str, E> {
    delimited(tag("foo"), space1, tag("bar"))(input)
}

fn main() {
    let input = "foo\tbaz";

    match foobar::<VerboseError<&str>>(input) {
        Err(Err::Error(e)) => {
            println!("{}", convert_error(input, e));
        }
        _ => {}
    }
}

produces (YMMV, depending on terminal configuration)

0: at line 1, in Tag:
foo     baz
    ^

That is, convert_error assumes every output character (including tabs) to be one column wide.

This particular example isn't too bad yet. Given a more complex sequence of tokens after a tab character, you can probably see how misleading it can get, though.

Sure, I'd like my use case to "just work". However, looking at how the problem is solved in, say rustc¹, it seems somewhat nontrivial. As a newcomer I really can't tell if a typical user needs nom to provide high-quality, human-readable error traces at all. Maybe you can tell me if it's something you want to see nom handling in the future?

IMHO, it would be nice to at least mention this assumption in the documentation and/or examples.

As said, I'm not too confident of a rustacean yet but I would like to get into contributing if this is something I can help with :)

^[1]
See NonNarrowChar, Loc, source_map::SourceMap::lookup_char_pos in rustc_span.
↩

The text was updated successfully, but these errors were encountered:

Geal · 2020-10-24T09:55:09Z

it would be good to have convert_error support this use case, but I'm not sure I have the bandwidth to work on this? Could you send a PR?

SPuntte · 2020-11-02T20:04:57Z

Hey, thanks for the reply. I will try hacking something together. However, being busy with other things™ too, it might take me a while.

Since last time, I did a little bit of research discovering codespan-reporting and annotate-snippets both of which have, unsurprisingly, encountered effectively the exact same issue already. They also seem to solve the problem of error reporting pretty well. Thus, I infer that it makes little sense for nom to delve too deep into that problem space, as someone using nom as a parser in a more complex application will likely set up user-facing error reporting using such library anyway, avoiding convert_error altogether.

For simpler use cases, such as mine, I still see the value of the convert_error trace. With my current knowledge, I would suggest nom to take the rustc route, replacing tabs in the input with a fixed amount of spaces. Alternatively, one could try something more sophisticated, e.g. via unicode-width, but that would add a dependency and yet it wouldn't be a 100% solution.

Any thoughts or comments are warmly welcome.

Geal added this to the 6.0 milestone Oct 24, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`convert_error` assumes fixed-width characters #1201

`convert_error` assumes fixed-width characters #1201

SPuntte commented Oct 5, 2020

Geal commented Oct 24, 2020

SPuntte commented Nov 2, 2020 •

edited

Loading

convert_error assumes fixed-width characters #1201

convert_error assumes fixed-width characters #1201

Comments

SPuntte commented Oct 5, 2020

Geal commented Oct 24, 2020

SPuntte commented Nov 2, 2020 • edited Loading

`convert_error` assumes fixed-width characters #1201

`convert_error` assumes fixed-width characters #1201

SPuntte commented Nov 2, 2020 •

edited

Loading