Skip to content

Improve speed of fmt::Debug for str and char #28662

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 3, 2015
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 10 additions & 3 deletions src/libcore/char.rs
Original file line number Diff line number Diff line change
Expand Up @@ -185,9 +185,7 @@ impl CharExt for char {
'\t' => EscapeDefaultState::Backslash('t'),
'\r' => EscapeDefaultState::Backslash('r'),
'\n' => EscapeDefaultState::Backslash('n'),
'\\' => EscapeDefaultState::Backslash('\\'),
'\'' => EscapeDefaultState::Backslash('\''),
'"' => EscapeDefaultState::Backslash('"'),
'\\' | '\'' | '"' => EscapeDefaultState::Backslash(self),
'\x20' ... '\x7e' => EscapeDefaultState::Char(self),
_ => EscapeDefaultState::Unicode(self.escape_unicode())
};
Expand Down Expand Up @@ -380,4 +378,13 @@ impl Iterator for EscapeDefault {
EscapeDefaultState::Unicode(ref mut iter) => iter.next()
}
}

fn size_hint(&self) -> (usize, Option<usize>) {
match self.state {
EscapeDefaultState::Char(_) => (1, Some(1)),
EscapeDefaultState::Backslash(_) => (2, Some(2)),
EscapeDefaultState::Unicode(_) => (0, Some(10)),
_ => (0, Some(0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to have EscapeDefaultState::Done here?
It would make it easier to understand why (0, Some(0)) is the correct return value and in the unlikely case that the enum is extended it would promptly point out the problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, thanks for pointing out.

}
}
}
23 changes: 16 additions & 7 deletions src/libcore/fmt/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1310,11 +1310,21 @@ impl Display for bool {
#[stable(feature = "rust1", since = "1.0.0")]
impl Debug for str {
fn fmt(&self, f: &mut Formatter) -> Result {
try!(write!(f, "\""));
for c in self.chars().flat_map(|c| c.escape_default()) {
try!(f.write_char(c))
try!(f.write_char('"'));
let mut from = 0;
for (i, c) in self.char_indices() {
let esc = c.escape_default();
// If char needs escaping, flush backlog so far and write, else skip
if esc.size_hint().0 != 1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be best to check the upper bound here against Some(1) because that may be a better signal that only this one character will be emitted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super happy about this unofficial communication through the size hint. A method on the iterator itself would be much cleaner, only problem is that it needs to be some degree of public (but unstable).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexcrichton Do you mean esc.size_hint().1 != Some(1) or more explicit esc.size_hint() != (1, Some(1))?

As @bluss I would prefer if we could add method to iterator (well, to EscapeDefault actually). That might communicate intention clearer and be simpler in implementation than size_hint.

Or maybe just revert to 025ca11 to avoid unnecessary construction of iterator?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@semmaz ah right, indeed I do mean that! I also think that a check against (1, Some(1)) would make it more explicit here. @bluss how do you feel about checking the whole size hint return value? That seems relatively more palatable to me at least.

And yeah the point of leveraging size_hint would be to avoid expansion of the API surface area for a function that's unlikely to ever be stable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any way of checking the size_hint is as good as any other IMO. I just prefer an explicit method, maybe a static method on EscapeDefault to make the connection quite clear. I don't see it as impossible to be part of a public api down the line, it might quite useful as we can see here to know if a char is default-escaped or not. But it's not the goal here, it should be unstable, and it's pub just so that we can use it cross modules.

A static method is a good API level since it ties the behavior tightly to the EscapeDefault iterator, without having to create the iterator to ask the escape yes / no question, a question that doesn't make very much sense on the stateful iterator value itself (because the iterator "forgets" the input char, depending on its state).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexcrichton Updated.
I agree with @bluss. I'm leaning more towards using it more directly (CharExt), but can understand concern about unclear ties and API.

I'm fine if it merges as it is right now. Although I really think that having less taxing way to query if char needs escape would be useful. Question is, should those methods be in scope of this PR?

try!(f.write_str(&self[from..i]));
for c in esc {
try!(f.write_char(c));
}
from = i + c.len_utf8();
}
}
write!(f, "\"")
try!(f.write_str(&self[from..]));
f.write_char('"')
}
}

Expand All @@ -1328,12 +1338,11 @@ impl Display for str {
#[stable(feature = "rust1", since = "1.0.0")]
impl Debug for char {
fn fmt(&self, f: &mut Formatter) -> Result {
use char::CharExt;
try!(write!(f, "'"));
try!(f.write_char('\''));
for c in self.escape_default() {
try!(f.write_char(c))
}
write!(f, "'")
f.write_char('\'')
}
}

Expand Down
4 changes: 4 additions & 0 deletions src/test/run-pass/ifmt.rs
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,10 @@ pub fn main() {
t!(format!("{:?}", 10_usize), "10");
t!(format!("{:?}", "true"), "\"true\"");
t!(format!("{:?}", "foo\nbar"), "\"foo\\nbar\"");
t!(format!("{:?}", "foo\n\"bar\"\r\n\'baz\'\t\\qux\\"),
r#""foo\n\"bar\"\r\n\'baz\'\t\\qux\\""#);
t!(format!("{:?}", "foo\0bar\x01baz\u{3b1}q\u{75}x"),
r#""foo\u{0}bar\u{1}baz\u{3b1}qux""#);
t!(format!("{:o}", 10_usize), "12");
t!(format!("{:x}", 10_usize), "a");
t!(format!("{:X}", 10_usize), "A");
Expand Down