Skip to content

Commit 0b9eeb1

Browse files
committed
Micro-optimize code size of /proc/maps parsing code
The current code size is really wastefully large. Originally, it was 1500 lines of assembly in Godbolt, now I reduced it to just under 800. The effect of `.text` size in hello world is from 297028 to 295453 which is small but not completely irrelevant. It's just a small fish in the bigger pond of DWARF parsing, but it's better than nothing. I extracted the parsing of each component into a separate function to allow for better sharing. I replaced the string methods with manual iteration since that results in simpler code because it has to handle fewer cases. I also had to use unsafe because the bounds checks were sadly not optimized out and were really large. I also made the parser less resilient against whitespace, now it no longer handles Unicode whitespace (an obvious simplification) but also no longer handles any whitespace except the normal SP. I think this is fine, it seems highly unlikely that a system would suddenly use another type of whitespace (but I guess not impossible?).
1 parent 311a1d0 commit 0b9eeb1

File tree

1 file changed

+48
-13
lines changed

1 file changed

+48
-13
lines changed

src/symbolize/gimli/parse_running_mmaps_unix.rs

Lines changed: 48 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -52,8 +52,17 @@ pub(super) fn parse_maps() -> Result<Vec<MapsEntry>, &'static str> {
5252
.map_err(|_| "Couldn't read /proc/self/maps")?;
5353

5454
let mut v = Vec::new();
55-
for line in buf.lines() {
55+
let mut buf = buf.as_str();
56+
while let Some(match_idx) = buf.bytes().position(|b| b == b'\n') {
57+
// Unsafe is unfortunately necessary to get the bounds check removed (for code size).
58+
59+
// SAFETY: match_idx is the position of the newline, so it must be valid.
60+
let line = unsafe { buf.get_unchecked(..match_idx) };
61+
5662
v.push(line.parse()?);
63+
64+
// SAFETY: match_idx is the position of the newline, so the byte after it must be valid.
65+
buf = unsafe { buf.get_unchecked((match_idx + 1)..) };
5766
}
5867

5968
Ok(v)
@@ -81,46 +90,73 @@ impl FromStr for MapsEntry {
8190
// e.g.: "ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]"
8291
// e.g.: "7f5985f46000-7f5985f48000 rw-p 00039000 103:06 76021795 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2"
8392
// e.g.: "35b1a21000-35b1a22000 rw-p 00000000 00:00 0"
84-
//
85-
// Note that paths may contain spaces, so we can't use `str::split` for parsing (until
86-
// Split::remainder is stabilized #77998).
8793
fn from_str(s: &str) -> Result<Self, Self::Err> {
88-
fn error(msg: &str) -> &str {
94+
// While there are nicer standard library APIs available for this, we aim for minimal code size.
95+
96+
let mut state = s;
97+
98+
fn parse_start<'a>(state: &mut &'a str) -> &'a str {
99+
// Unsafe is unfortunately necessary to get the bounds check removed (for code size).
100+
101+
let start_idx = state.bytes().position(|b| b != b' ');
102+
if let Some(start_idx) = start_idx {
103+
// SAFETY: It comes from position, so it's in bounds.
104+
// It must be on a UTF-8 boundary as it's the first byte that isn't ' '.
105+
*state = unsafe { state.get_unchecked(start_idx..) };
106+
}
107+
let match_idx = state.bytes().position(|b| b == b' ');
108+
match match_idx {
109+
None => {
110+
let result = *state;
111+
*state = "";
112+
result
113+
}
114+
Some(match_idx) => {
115+
// SAFETY: match_index comes from .bytes().position() of an ASCII character,
116+
// so it's both in bounds and a UTF-8 boundary
117+
let result = unsafe { state.get_unchecked(..match_idx) };
118+
// SAFETY: Since match_idx is the ' ', there must be at least the end after it.
119+
*state = unsafe { state.get_unchecked((match_idx + 1)..) };
120+
result
121+
}
122+
}
123+
}
124+
125+
fn error(msg: &str) -> &str {
89126
if cfg!(debug_assertions) {
90127
msg
91128
} else {
92129
"invalid map entry"
93130
}
94131
}
95132

96-
97-
let (range_str, s) = s.trim_start().split_once(' ').unwrap_or((s, ""));
133+
let range_str = parse_start(&mut state);
98134
if range_str.is_empty() {
99135
return Err(error("Couldn't find address"));
100136
}
101137

102-
let (perms_str, s) = s.trim_start().split_once(' ').unwrap_or((s, ""));
138+
let perms_str = parse_start(&mut state);
103139
if perms_str.is_empty() {
104140
return Err(error("Couldn't find permissions"));
105141
}
106142

107-
let (offset_str, s) = s.trim_start().split_once(' ').unwrap_or((s, ""));
143+
let offset_str = parse_start(&mut state);
108144
if offset_str.is_empty() {
109145
return Err(error("Couldn't find offset"));
110146
}
111147

112-
let (dev_str, s) = s.trim_start().split_once(' ').unwrap_or((s, ""));
148+
let dev_str = parse_start(&mut state);
113149
if dev_str.is_empty() {
114150
return Err(error("Couldn't find dev"));
115151
}
116152

117-
let (inode_str, s) = s.trim_start().split_once(' ').unwrap_or((s, ""));
153+
let inode_str = parse_start(&mut state);
118154
if inode_str.is_empty() {
119155
return Err(error("Couldn't find inode"));
120156
}
121157

122158
// Pathname may be omitted in which case it will be empty
123-
let pathname_str = s.trim_start();
159+
let pathname_str = state.trim_ascii_start();
124160

125161
let hex = |s| usize::from_str_radix(s, 16).map_err(|_| error("Couldn't parse hex number"));
126162
let hex64 = |s| u64::from_str_radix(s, 16).map_err(|_| error("Couldn't parse hex number"));
@@ -130,7 +166,6 @@ impl FromStr for MapsEntry {
130166
} else {
131167
return Err(error("Couldn't parse address range"));
132168
};
133-
134169
let offset = hex64(offset_str)?;
135170
let pathname = pathname_str.into();
136171

0 commit comments

Comments
 (0)