Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert_error assumes fixed-width characters #1201

Open
SPuntte opened this issue Oct 5, 2020 · 2 comments
Open

convert_error assumes fixed-width characters #1201

SPuntte opened this issue Oct 5, 2020 · 2 comments
Milestone

Comments

@SPuntte
Copy link

SPuntte commented Oct 5, 2020

Hi,

a nom and Rust newbie here. Thanks for developing such a great library 🙏

Recently, I have been tinkering with parsing a file format containing tab characters. It just so happens that tabs can make convert_error output really confusing.

A minimal example (using nom 6.0.0-alpha3),

use nom::{
    bytes::complete::tag,
    character::complete::space1,
    error::{convert_error, ParseError, VerboseError},
    sequence::delimited,
    Err, IResult,
};

fn foobar<'a, E: ParseError<&'a str>>(input: &'a str) -> IResult<&'a str, &'a str, E> {
    delimited(tag("foo"), space1, tag("bar"))(input)
}

fn main() {
    let input = "foo\tbaz";

    match foobar::<VerboseError<&str>>(input) {
        Err(Err::Error(e)) => {
            println!("{}", convert_error(input, e));
        }
        _ => {}
    }
}

produces (YMMV, depending on terminal configuration)

0: at line 1, in Tag:
foo     baz
    ^

That is, convert_error assumes every output character (including tabs) to be one column wide.

This particular example isn't too bad yet. Given a more complex sequence of tokens after a tab character, you can probably see how misleading it can get, though.

Sure, I'd like my use case to "just work". However, looking at how the problem is solved in, say rustc1, it seems somewhat nontrivial. As a newcomer I really can't tell if a typical user needs nom to provide high-quality, human-readable error traces at all. Maybe you can tell me if it's something you want to see nom handling in the future?

IMHO, it would be nice to at least mention this assumption in the documentation and/or examples.

As said, I'm not too confident of a rustacean yet but I would like to get into contributing if this is something I can help with :)


[1]
See NonNarrowChar, Loc, source_map::SourceMap::lookup_char_pos in rustc_span.

@Geal
Copy link
Collaborator

Geal commented Oct 24, 2020

it would be good to have convert_error support this use case, but I'm not sure I have the bandwidth to work on this? Could you send a PR?

@Geal Geal added this to the 6.0 milestone Oct 24, 2020
@SPuntte
Copy link
Author

SPuntte commented Nov 2, 2020

Hey, thanks for the reply. I will try hacking something together. However, being busy with other things™ too, it might take me a while.

Since last time, I did a little bit of research discovering codespan-reporting and annotate-snippets both of which have, unsurprisingly, encountered effectively the exact same issue already. They also seem to solve the problem of error reporting pretty well. Thus, I infer that it makes little sense for nom to delve too deep into that problem space, as someone using nom as a parser in a more complex application will likely set up user-facing error reporting using such library anyway, avoiding convert_error altogether.

For simpler use cases, such as mine, I still see the value of the convert_error trace. With my current knowledge, I would suggest nom to take the rustc route, replacing tabs in the input with a fixed amount of spaces. Alternatively, one could try something more sophisticated, e.g. via unicode-width, but that would add a dependency and yet it wouldn't be a 100% solution.

Any thoughts or comments are warmly welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants