
Commit 2ac9808

perf(jetsocat): increase the maximum JMUX message size

This has almost no effect on the throughput when there is a significant delay,
but the throughput is improved when the delay is very small to nonexistent.
The main benefit is a reduced CPU usage.

1. Benchmark results before this patch

a. With 50ms delay on loopback

1 connection:

[ 1] 0.0000-600.4197 sec 16.1 GBytes 230 Mbits/sec

2 connections:

[ 1] 0.0000-605.0387 sec 8.19 GBytes 116 Mbits/sec
[ 2] 0.0000-605.1395 sec 8.19 GBytes 116 Mbits/sec
[SUM] 0.0000-605.1395 sec 16.4 GBytes 233 Mbits/sec

10 connections:

[ 3] 0.0000-625.7966 sec 1.69 GBytes 23.2 Mbits/sec
[ 8] 0.0000-625.9956 sec 1.69 GBytes 23.2 Mbits/sec
[ 1] 0.0000-626.0966 sec 1.69 GBytes 23.2 Mbits/sec
[ 5] 0.0000-626.0964 sec 1.69 GBytes 23.2 Mbits/sec
[ 2] 0.0000-626.1983 sec 1.69 GBytes 23.2 Mbits/sec
[ 7] 0.0000-626.1964 sec 1.69 GBytes 23.2 Mbits/sec
[ 6] 0.0000-626.1964 sec 1.69 GBytes 23.2 Mbits/sec
[ 9] 0.0000-626.1981 sec 1.69 GBytes 23.2 Mbits/sec
[ 10] 0.0000-626.2973 sec 1.69 GBytes 23.2 Mbits/sec
[ 4] 0.0000-626.3984 sec 1.69 GBytes 23.2 Mbits/sec
[SUM] 0.0000-626.3986 sec 16.9 GBytes 232 Mbits/sec

b. Without delay

1 connection:

[ 1] 0.0000-600.0518 sec 1.33 TBytes 19.4 Gbits/sec

2 connections:

[ 2] 0.0000-600.0706 sec 681 GBytes 9.75 Gbits/sec
[ 1] 0.0000-600.0705 sec 681 GBytes 9.75 Gbits/sec
[SUM] 0.0000-600.0705 sec 1.33 TBytes 19.5 Gbits/sec

10 connections:

[ 3] 0.0000-600.3608 sec 112 GBytes 1.60 Gbits/sec
[ 5] 0.0000-600.3606 sec 112 GBytes 1.60 Gbits/sec
[ 6] 0.0000-600.3605 sec 112 GBytes 1.60 Gbits/sec
[ 8] 0.0000-600.3598 sec 112 GBytes 1.60 Gbits/sec
[ 7] 0.0000-600.3594 sec 112 GBytes 1.60 Gbits/sec
[ 1] 0.0000-600.3606 sec 112 GBytes 1.60 Gbits/sec
[ 9] 0.0000-600.3597 sec 112 GBytes 1.60 Gbits/sec
[ 10] 0.0000-600.3606 sec 112 GBytes 1.60 Gbits/sec
[ 2] 0.0000-600.3602 sec 112 GBytes 1.60 Gbits/sec
[ 4] 0.0000-600.3719 sec 112 GBytes 1.60 Gbits/sec
[SUM] 0.0000-600.3721 sec 1.09 TBytes 16.0 Gbits/sec

2. Benchmark results after this patch

a. With 50ms delay on loopback

1 connection:

[ 1] 0.0000-600.4552 sec 16.1 GBytes 231 Mbits/sec

2 connections:

[ 1] 0.0000-605.1600 sec 8.16 GBytes 116 Mbits/sec
[ 2] 0.0000-605.1599 sec 8.16 GBytes 116 Mbits/sec
[SUM] 0.0000-605.1599 sec 16.3 GBytes 232 Mbits/sec

10 connections:

[ 8] 0.0000-625.8346 sec 1.69 GBytes 23.2 Mbits/sec
[ 9] 0.0000-626.1828 sec 1.69 GBytes 23.2 Mbits/sec
[ 2] 0.0000-626.1820 sec 1.69 GBytes 23.2 Mbits/sec
[ 5] 0.0000-626.1817 sec 1.69 GBytes 23.2 Mbits/sec
[ 6] 0.0000-626.1815 sec 1.69 GBytes 23.2 Mbits/sec
[ 4] 0.0000-626.1827 sec 1.69 GBytes 23.2 Mbits/sec
[ 3] 0.0000-626.1814 sec 1.69 GBytes 23.2 Mbits/sec
[ 7] 0.0000-626.1821 sec 1.69 GBytes 23.2 Mbits/sec
[ 1] 0.0000-626.2831 sec 1.69 GBytes 23.1 Mbits/sec
[ 10] 0.0000-626.2819 sec 1.69 GBytes 23.1 Mbits/sec
[SUM] 0.0000-626.2832 sec 16.9 GBytes 232 Mbits/sec

b. Without delay

1 connection:

[ 1] 0.0000-600.0402 sec 1.68 TBytes 24.6 Gbits/sec

2 connections:

[ 1] 0.0000-600.0628 sec 752 GBytes 10.8 Gbits/sec
[ 2] 0.0000-601.0794 sec 751 GBytes 10.7 Gbits/sec
[SUM] 0.0000-601.0794 sec 1.47 TBytes 21.5 Gbits/sec

10 connections:

[ 6] 0.0000-600.3015 sec 127 GBytes 1.82 Gbits/sec
[ 3] 0.0000-600.3014 sec 127 GBytes 1.82 Gbits/sec
[ 7] 0.0000-600.3012 sec 127 GBytes 1.82 Gbits/sec
[ 5] 0.0000-600.2992 sec 127 GBytes 1.82 Gbits/sec
[ 9] 0.0000-600.3014 sec 127 GBytes 1.82 Gbits/sec
[ 1] 0.0000-600.3006 sec 127 GBytes 1.82 Gbits/sec
[ 2] 0.0000-600.3601 sec 127 GBytes 1.82 Gbits/sec
[ 10] 0.0000-600.3592 sec 127 GBytes 1.82 Gbits/sec
[ 8] 0.0000-600.3604 sec 127 GBytes 1.82 Gbits/sec
[ 4] 0.0000-600.3586 sec 127 GBytes 1.82 Gbits/sec
[SUM] 0.0000-600.3605 sec 1.24 TBytes 18.2 Gbits/sec
1 parent: 73f1716
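As a back-of-envelope check on the CPU claim (my arithmetic from the figures above, not part of the patch): at the ~19.4 Gbits/sec measured for a single connection without delay, doubling the payload carried per JMUX DATA message halves the number of messages, and hence the fixed per-message work, needed to move the same bytes.

```rust
// Rough message-rate estimate at the measured single-connection throughput.
// Assumption: per-message bookkeeping dominates the CPU cost being reduced.
fn main() {
    let bytes_per_sec = 19.4e9 / 8.0; // ~19.4 Gbits/sec, i.e. ~2.4 GB/s
    for payload in [4 * 1024, 8 * 1024] {
        let msgs_per_sec = bytes_per_sec / payload as f64;
        println!("{payload} B payload -> ~{msgs_per_sec:.0} messages/s");
    }
}
```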

File tree

2 files changed: +88 −7 lines


crates/jmux-proxy/src/lib.rs (+7 −2)

```diff
@@ -30,7 +30,8 @@ use tokio::task::JoinHandle;
 use tokio_util::codec::FramedRead;
 use tracing::{Instrument as _, Span};
 
-const MAXIMUM_PACKET_SIZE_IN_BYTES: u16 = 4 * 1024; // 4 kiB
+const DATA_PACKET_OVERHEAD: u16 = 8;
+const MAXIMUM_PACKET_SIZE_IN_BYTES: u16 = 8 * 1024 + DATA_PACKET_OVERHEAD; // 8 kiB + packet overhead
 const WINDOW_ADJUSTMENT_THRESHOLD: u32 = 4 * 1024; // 4 kiB
 
 pub type ApiResponseSender = oneshot::Sender<JmuxApiResponse>;
@@ -777,7 +778,11 @@ impl DataReaderTask {
         } = self;
 
         let codec = tokio_util::codec::BytesCodec::new();
-        let mut bytes_stream = FramedRead::new(reader, codec);
+        let mut bytes_stream = FramedRead::with_capacity(
+            reader,
+            codec,
+            usize::from(MAXIMUM_PACKET_SIZE_IN_BYTES - DATA_PACKET_OVERHEAD),
+        );
         let maximum_packet_size = usize::from(maximum_packet_size);
 
         trace!("Started forwarding");
```

docs/JMUX-proxy-performance.md (+81 −5)

````diff
@@ -70,10 +70,9 @@ iperf -c "$ADDR" -p $PORT -P 10 -t 600
 
 Let’s assume the script is in a file named `run_iperf.sh`.
 
-Running `iperf` for long enough is important to ensure that the buffering happening at the socket level is not influencing the numbers too much.
-When running less a minute, we end up measuring the rate at which `iperf` enqueue bytes into the socket’s buffer.
-Filling the buffer can be done very quickly and can have a significant impact on the measured average speed.
-10 minutes is long enough to obtain convergent results.
+It's important to note that `iperf` should be run for an extended period to account for the initial filling of TCP socket buffers,
+which can artificially inflate the average throughput if tested for less than a minute.
+Running `iperf` for 10 minutes is enough to ensure the results accurately reflect the effective average throughput.
 
 ## Applied optimizations
 
@@ -274,7 +273,7 @@ The flow control algorithm, particularly the window size, is a critical parameter
 Since such delays are common in almost all practical setups, it’s safe to say that this is the most important metric to optimize.
 
 Other optimizations, while beneficial, primarily serve to reduce CPU usage and increase throughput on very high-speed networks.
-A speed of 30 Mbits/s is already considered high, but networks with throughput exceeding 1 Gbits/s also exist.
+A speed of 30 Mbits/s is already considered high, but networks with throughput exceeding 1 Gbits/s also exist (e.g.: ultra-high speed local area networks).
 Enhancing performance for these networks is valuable, particularly in reducing CPU usage as the volume of data processed increases.
 
 Measurements indicate that our JMUX proxy should perform well, even on high-speed networks.
@@ -286,3 +285,80 @@ In real-world wide-area networks, packet loss will inevitably occur.
 
 Nevertheless, these results provide valuable data, confirming that our optimizations are effective with a high degree of confidence.
 While further optimization could be pursued to address more specific scenarios, the current implementation is likely sufficient for most practical purposes.
+
+## 2025.2.0 update
+
+Related patches:
+
+- <https://github.com/Devolutions/devolutions-gateway/pull/974>
+- <https://github.com/Devolutions/devolutions-gateway/pull/979>
+
+### Results
+
+```shell
+./run_iperf.sh 5000
+```
+
+#### With 50ms delay on loopback
+
+1 connection:
+
+```
+[ 1] 0.0000-600.4552 sec 16.1 GBytes 231 Mbits/sec
+```
+
+2 connections:
+
+```
+[ 1] 0.0000-605.1600 sec 8.16 GBytes 116 Mbits/sec
+[ 2] 0.0000-605.1599 sec 8.16 GBytes 116 Mbits/sec
+[SUM] 0.0000-605.1599 sec 16.3 GBytes 232 Mbits/sec
+```
+
+10 connections:
+
+```
+[ 8] 0.0000-625.8346 sec 1.69 GBytes 23.2 Mbits/sec
+[ 9] 0.0000-626.1828 sec 1.69 GBytes 23.2 Mbits/sec
+[ 2] 0.0000-626.1820 sec 1.69 GBytes 23.2 Mbits/sec
+[ 5] 0.0000-626.1817 sec 1.69 GBytes 23.2 Mbits/sec
+[ 6] 0.0000-626.1815 sec 1.69 GBytes 23.2 Mbits/sec
+[ 4] 0.0000-626.1827 sec 1.69 GBytes 23.2 Mbits/sec
+[ 3] 0.0000-626.1814 sec 1.69 GBytes 23.2 Mbits/sec
+[ 7] 0.0000-626.1821 sec 1.69 GBytes 23.2 Mbits/sec
+[ 1] 0.0000-626.2831 sec 1.69 GBytes 23.1 Mbits/sec
+[ 10] 0.0000-626.2819 sec 1.69 GBytes 23.1 Mbits/sec
+[SUM] 0.0000-626.2832 sec 16.9 GBytes 232 Mbits/sec
+```
+
+#### Without delay
+
+1 connection:
+
+```
+[ 1] 0.0000-600.0402 sec 1.68 TBytes 24.6 Gbits/sec
+```
+
+2 connections:
+
+```
+[ 1] 0.0000-600.0628 sec 752 GBytes 10.8 Gbits/sec
+[ 2] 0.0000-601.0794 sec 751 GBytes 10.7 Gbits/sec
+[SUM] 0.0000-601.0794 sec 1.47 TBytes 21.5 Gbits/sec
+```
+
+10 connections:
+
+```
+[ 6] 0.0000-600.3015 sec 127 GBytes 1.82 Gbits/sec
+[ 3] 0.0000-600.3014 sec 127 GBytes 1.82 Gbits/sec
+[ 7] 0.0000-600.3012 sec 127 GBytes 1.82 Gbits/sec
+[ 5] 0.0000-600.2992 sec 127 GBytes 1.82 Gbits/sec
+[ 9] 0.0000-600.3014 sec 127 GBytes 1.82 Gbits/sec
+[ 1] 0.0000-600.3006 sec 127 GBytes 1.82 Gbits/sec
+[ 2] 0.0000-600.3601 sec 127 GBytes 1.82 Gbits/sec
+[ 10] 0.0000-600.3592 sec 127 GBytes 1.82 Gbits/sec
+[ 8] 0.0000-600.3604 sec 127 GBytes 1.82 Gbits/sec
+[ 4] 0.0000-600.3586 sec 127 GBytes 1.82 Gbits/sec
+[SUM] 0.0000-600.3605 sec 1.24 TBytes 18.2 Gbits/sec
+```
````
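The documentation's point that the window size is the critical parameter under delay can be made concrete with the usual flow-control bound: sustained throughput is at most one window of unacknowledged data per round trip. A small sketch using the benchmark's 50 ms delay and purely illustrative window sizes (the proxy's actual window is not stated here):

```rust
// Flow-control ceiling: at most `window` unacknowledged bytes may be in
// flight per round trip, so throughput <= window / RTT.
fn main() {
    let rtt = 0.050; // the 50 ms delay injected on loopback
    for window_kib in [256, 512, 1024, 2048] {
        let window_bytes = (window_kib * 1024) as f64;
        let mbits_per_sec = window_bytes * 8.0 / rtt / 1e6;
        println!("{window_kib} KiB window -> ~{mbits_per_sec:.0} Mbits/sec ceiling");
    }
}
```

This is also why the larger message size barely moves the delayed numbers (~230 Mbits/sec before and after): with 50 ms of delay the transfer is window-limited rather than CPU-limited, so the improvement shows up in the no-delay runs and in reduced CPU usage instead.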
