Commit ca96865

Author: bors-servo (committed)
Auto merge of #68 - antrik:optimise-buffers, r=pcwalton
Auto merge of #68 - antrik:optimise-buffers, r=pcwalton

Optimisations (and cleanups) of Linux backend

This is a bunch of optimisations to the Linux platform code, along with various cleanups of the related code which the optimisation patches build upon. Most of the cleanups are on the `send()` side, as the `recv()` side is less affected by the optimisation changes, and thus there has been less reason to refactor that code -- some extra cleanup work would probably be in order there.

The optimisations are mostly about avoiding unnecessary copies by using scatter-gather buffers for send and receive, as well as avoiding unnecessary initialisation of receive buffers.

The results are impressive: gains of at least 5x for large transfers of several MiB (a bit more on a modern system); >5x (on an old system) up to >10x (on a modern one) for small transfers of up to a few KiB; and more than 10x for most of the range in between -- peaking at about 12x - 13x on the old system and 20x - 21x on the modern system for medium-sized transfers of about 64 KiB up to a few hundred KiB.

For another interesting data point, the CPU usage during benchmark runs (with many iterations, to amortise the setup time) was dominated by user time (more than two thirds of total time) with the original variant; whereas the optimised variant not only reduces system time to less than half the original value (presumably because of fewer allocations?), but also almost entirely eliminates the user time, making it insignificant in the total picture now -- as it should be.

On a less scientific note, Servo built with the optimised ipc-channel no longer seems to show undue delays while rendering a language selection menu. (That requires lots of fonts to be loaded, and thus triggers heavy ipc-channel activity.)
2 parents 10bed82 + 7c2466e commit ca96865

File tree

7 files changed (+545, -251 lines)


Cargo.toml

Lines changed: 3 additions & 1 deletion
@@ -8,10 +8,12 @@ path = "lib.rs"
 
 [dependencies]
 bincode = ">=0.4.1, <0.6"
-byteorder = "0.5"
 lazy_static = "0.2"
 libc = "0.2"
 rand = "0.3"
 serde = ">=0.6, <0.8"
 serde_macros = ">=0.6, <0.8"
 uuid = { version = "0.2", features = ["v4"] }
+
+[dev-dependencies]
+crossbeam = "0.2"

benches/bench.rs

Lines changed: 163 additions & 0 deletions
(new file)

#![feature(test)]

extern crate crossbeam;
extern crate ipc_channel;
extern crate test;

use ipc_channel::platform;

use std::sync::{mpsc, Mutex};

/// Allows doing multiple inner iterations per bench.iter() run.
///
/// This is mostly to amortise the overhead of spawning a thread in the benchmark
/// when sending larger messages (that might be fragmented).
///
/// Note that you need to compensate the displayed results
/// for the proportionally longer runs yourself,
/// as the benchmark framework doesn't know about the inner iterations...
const ITERATIONS: usize = 1;

fn bench_size(b: &mut test::Bencher, size: usize) {
    let data: Vec<u8> = (0..size).map(|i| (i % 251) as u8).collect();
    let (tx, rx) = platform::channel().unwrap();

    let (wait_tx, wait_rx) = mpsc::channel();
    let wait_rx = Mutex::new(wait_rx);

    if size > platform::OsIpcSender::get_max_fragment_size() {
        b.iter(|| {
            crossbeam::scope(|scope| {
                scope.spawn(|| {
                    let wait_rx = wait_rx.lock().unwrap();
                    for _ in 0..ITERATIONS {
                        tx.send(&data, vec![], vec![]).unwrap();
                        if ITERATIONS > 1 {
                            // Prevent beginning of the next send
                            // from overlapping with receive of last fragment,
                            // as otherwise results of runs with a large tail fragment
                            // are significantly skewed.
                            wait_rx.recv().unwrap();
                        }
                    }
                });
                for _ in 0..ITERATIONS {
                    rx.recv().unwrap();
                    if ITERATIONS > 1 {
                        wait_tx.send(()).unwrap();
                    }
                }
                // For reasons mysterious to me,
                // not returning a value *from every branch*
                // adds some 100 ns or so of overhead to all results --
                // which is quite significant for very short tests...
                0
            })
        });
    } else {
        b.iter(|| {
            for _ in 0..ITERATIONS {
                tx.send(&data, vec![], vec![]).unwrap();
                rx.recv().unwrap();
            }
            0
        });
    }
}

#[bench]
fn size_00_1(b: &mut test::Bencher) {
    bench_size(b, 1);
}
#[bench]
fn size_01_2(b: &mut test::Bencher) {
    bench_size(b, 2);
}
#[bench]
fn size_02_4(b: &mut test::Bencher) {
    bench_size(b, 4);
}
#[bench]
fn size_03_8(b: &mut test::Bencher) {
    bench_size(b, 8);
}
#[bench]
fn size_04_16(b: &mut test::Bencher) {
    bench_size(b, 16);
}
#[bench]
fn size_05_32(b: &mut test::Bencher) {
    bench_size(b, 32);
}
#[bench]
fn size_06_64(b: &mut test::Bencher) {
    bench_size(b, 64);
}
#[bench]
fn size_07_128(b: &mut test::Bencher) {
    bench_size(b, 128);
}
#[bench]
fn size_08_256(b: &mut test::Bencher) {
    bench_size(b, 256);
}
#[bench]
fn size_09_512(b: &mut test::Bencher) {
    bench_size(b, 512);
}
#[bench]
fn size_10_1k(b: &mut test::Bencher) {
    bench_size(b, 1 * 1024);
}
#[bench]
fn size_11_2k(b: &mut test::Bencher) {
    bench_size(b, 2 * 1024);
}
#[bench]
fn size_12_4k(b: &mut test::Bencher) {
    bench_size(b, 4 * 1024);
}
#[bench]
fn size_13_8k(b: &mut test::Bencher) {
    bench_size(b, 8 * 1024);
}
#[bench]
fn size_14_16k(b: &mut test::Bencher) {
    bench_size(b, 16 * 1024);
}
#[bench]
fn size_15_32k(b: &mut test::Bencher) {
    bench_size(b, 32 * 1024);
}
#[bench]
fn size_16_64k(b: &mut test::Bencher) {
    bench_size(b, 64 * 1024);
}
#[bench]
fn size_17_128k(b: &mut test::Bencher) {
    bench_size(b, 128 * 1024);
}
#[bench]
fn size_18_256k(b: &mut test::Bencher) {
    bench_size(b, 256 * 1024);
}
#[bench]
fn size_19_512k(b: &mut test::Bencher) {
    bench_size(b, 512 * 1024);
}
#[bench]
fn size_20_1m(b: &mut test::Bencher) {
    bench_size(b, 1 * 1024 * 1024);
}
#[bench]
fn size_21_2m(b: &mut test::Bencher) {
    bench_size(b, 2 * 1024 * 1024);
}
#[bench]
fn size_22_4m(b: &mut test::Bencher) {
    bench_size(b, 4 * 1024 * 1024);
}
#[bench]
fn size_23_8m(b: &mut test::Bencher) {
    bench_size(b, 8 * 1024 * 1024);
}

lib.rs

Lines changed: 0 additions & 1 deletion
@@ -16,7 +16,6 @@
 extern crate lazy_static;
 
 extern crate bincode;
-extern crate byteorder;
 extern crate libc;
 extern crate rand;
 extern crate serde;

platform/inprocess/mod.rs

Lines changed: 5 additions & 0 deletions
@@ -18,6 +18,7 @@ use std::fmt::{self, Debug, Formatter};
 use std::cmp::{PartialEq};
 use std::ops::Deref;
 use std::mem;
+use std::usize;
 
 use uuid::Uuid;
 
@@ -148,6 +149,10 @@ impl MpscSender {
         Ok(record.sender)
     }
 
+    pub fn get_max_fragment_size() -> usize {
+        usize::MAX
+    }
+
     pub fn send(&self,
                 data: &[u8],
                 ports: Vec<MpscChannel>,