Commit 621ac95

fix(core): timeout due to node inactivity instead of total load test time (#2530)

Changes:
- The timeout is now smarter. Instead of waiting a fixed amount of time (e.g. 10 minutes) for the whole load test to finish, which is unpredictable, the load test waits at most 1 minute (configurable) without updates from the node. This makes it less machine-dependent and more based on responsiveness.
- load-test-ci.json is fixed to be similar to perf-ci.json, but in prague and with the system smart contracts from l1-dev.json deployed.
- Logs are re-added.
- The README is fixed.
- The flamegraph reporter is re-added to CI so that flamegraphs are generated on every push.

Closes #2522
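The inactivity-based timeout described in the commit message can be illustrated in isolation. The following is a minimal sketch under stated assumptions, not the code from this commit: `InactivityWatch`, `new`, and `observe` are hypothetical names, and the current time is passed in explicitly as a `std::time::Instant` (the commit itself uses `tokio::time::Instant`) so the logic can be exercised without a runtime.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of the commit): remembers the last nonce
/// seen for an account and the moment that nonce last changed.
struct InactivityWatch {
    /// `None` means "wait indefinitely".
    timeout: Option<Duration>,
    last_nonce: u64,
    last_updated: Instant,
}

impl InactivityWatch {
    fn new(timeout: Option<Duration>, now: Instant) -> Self {
        InactivityWatch {
            timeout,
            last_nonce: 0,
            last_updated: now,
        }
    }

    /// Records one nonce observation taken at `now`.
    /// Returns an error only if the nonce has not changed for longer than
    /// the configured timeout; any progress resets the inactivity clock.
    fn observe(&mut self, nonce: u64, now: Instant) -> Result<(), String> {
        let timeout = match self.timeout {
            Some(timeout) => timeout,
            None => return Ok(()), // no timeout configured
        };
        if nonce == self.last_nonce {
            let inactivity_time = now.duration_since(self.last_updated);
            if inactivity_time > timeout {
                return Err(format!(
                    "Node inactive for {} seconds. Timeout reached.",
                    inactivity_time.as_secs()
                ));
            }
        } else {
            // The node made progress: remember the new nonce and reset the clock.
            self.last_nonce = nonce;
            self.last_updated = now;
        }
        Ok(())
    }
}
```

With this shape, a slow machine that keeps including transactions never trips the timeout, while a node that stops making progress fails the test after one timeout interval regardless of how long the test has already been running.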
1 parent ce76f69 commit 621ac95

5 files changed, +1163 -591 lines changed

.github/scripts/flamegraph_watcher.sh

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ cargo build --release --manifest-path ./cmd/load_test/Cargo.toml
 
 echo "Starting load test"
 start_time=$(date +%s)
-RUST_BACKTRACE=1 ./target/release/load_test -k ./test_data/private_keys.txt -t eth-transfers -N 1000 -n http://localhost:1729 -w 5 >/dev/null
+RUST_BACKTRACE=1 ./target/release/load_test -k ./test_data/private_keys.txt -t eth-transfers -N 1000 -n http://localhost:1729 -w 1
 end_time=$(date +%s)
 
 elapsed=$((end_time - start_time))

.github/workflows/main_flamegraph_report.yaml

Lines changed: 5 additions & 5 deletions
@@ -6,8 +6,8 @@ permissions:
   id-token: write
 
 on:
-  # push:
-  # branches: ["main"]
+  push:
+    branches: ["main"]
   workflow_dispatch:
 
 env:
@@ -237,7 +237,7 @@ jobs:
 cargo build --release --bin ethrex --features dev
 CARGO_PROFILE_RELEASE_DEBUG=true cargo flamegraph -c "record -o perf.data -F997 --call-graph dwarf,16384 -g" \
   --bin ethrex --release --features dev -- \
-  --dev --network /home/runner/work/ethrex/ethrex/test_data/genesis-perf-ci.json --http.port 1729 >/dev/null &
+  --dev --network /home/runner/work/ethrex/ethrex/test_data/genesis-load-test.json --http.port 1729 &
 sleep 10
 echo "Executing load test..."
 bash /home/runner/work/ethrex/ethrex/.github/scripts/flamegraph_watcher.sh &&
@@ -352,10 +352,10 @@ jobs:
 rm -rf target/profiling/reth
 cargo build --bin reth --profile profiling
 CARGO_PROFILE_RELEASE_DEBUG=true cargo flamegraph -c "record -o perf.data -F997 --call-graph dwarf,16384 -g" --bin reth --profile profiling -- \
-  node --chain /home/runner/work/ethrex/ethrex/test_data/genesis-perf-ci.json --dev \
+  node --chain /home/runner/work/ethrex/ethrex/test_data/genesis-load-test.json --dev \
   --dev.block-time 1000ms --http.port 1729 --txpool.max-pending-txns 100000000 --txpool.max-new-txns 1000000000 \
   --txpool.pending-max-count 100000000 --txpool.pending-max-size 10000000000 --txpool.basefee-max-count 100000000000 \
-  --txpool.basefee-max-size 1000000000000 --txpool.queued-max-count 1000000000 >/dev/null &
+  --txpool.basefee-max-size 1000000000000 --txpool.queued-max-count 1000000000 &
 sleep 30
 echo "Executing load test..."
 (cd /home/runner/work/ethrex/ethrex; ./.github/scripts/flamegraph_watcher.sh)

cmd/load_test/README.md

Lines changed: 7 additions & 7 deletions
@@ -31,7 +31,7 @@ On some machines, this fixes the `ERROR axum::serve::listener: accept error: Too
 To run a load test, first run the node using a command like the following in the root folder:
 
 ```bash
-cargo run --bin ethrex --release --features dev -- --network test_data/genesis-perf-ci.json --dev
+cargo run --bin ethrex --release --features dev -- --network test_data/genesis-load-test.json --dev
 ```
 
 Genesis-l2-ci has many rich accounts and does not include the prague fork, which is important for dev mode until it's fixed.
@@ -73,39 +73,39 @@ Load tests are usually used to get performance metrics. We usually want to gener
 To produce a flamegraph, run the node in the following way.
 
 ```bash
-cargo flamegraph --root --bin ethrex --release --features dev -- --network test_data/genesis-perf-ci.json --dev
+cargo flamegraph --root --bin ethrex --release --features dev -- --network test_data/genesis-load-test.json --dev
 ```
 
 The "root" command is only needed for mac. It can be removed if running on linux.
 
 For a samply report, run the following:
 
 ```bash
-samply record cargo run --bin ethrex --release --features dev -- --network test_data/genesis-perf-ci.json --dev
+samply record cargo run --bin ethrex --release --features dev -- --network test_data/genesis-load-test.json --dev
 ```
 
 ## Interacting with reth
 
 The same load test can be run, the only difference is how you run the node:
 
 ```bash
-cargo run --release -- node --chain <path_to_ethrex>/test_data/genesis-perf-ci.json --dev --dev.block-time 5000ms --http.port 8545 --txpool.max-pending-txns 100000000 --txpool.max-new-txns 1000000000 --txpool.pending-max-count 100000000 --txpool.pending-max-size 10000000000 --txpool.basefee-max-count 100000000000 --txpool.basefee-max-size 1000000000000 --txpool.queued-max-count 1000000000
+cargo run --release -- node --chain <path_to_ethrex>/test_data/genesis-load-test.json --dev --dev.block-time 5000ms --http.port 8545 --txpool.max-pending-txns 100000000 --txpool.max-new-txns 1000000000 --txpool.pending-max-count 100000000 --txpool.pending-max-size 10000000000 --txpool.basefee-max-count 100000000000 --txpool.basefee-max-size 1000000000000 --txpool.queued-max-count 1000000000
 ```
 
 All of the txpool parameters are to make sure that it doesn't discard transactions sent by the load test. Trhoughput measurements in the logs are typically near 1Gigagas/second. To remove the database before getting measurements again:
 
 ```bash
-cargo run --release -- db --chain <path_to_ethrex>/test_data/genesis-perf-ci.json drop -f
+cargo run --release -- db --chain <path_to_ethrex>/test_data/genesis-load-test.json drop -f
 ```
 
 To get a flamegraph of its execution, run with the same parameters, just replace `cargo run --release` with `cargo flamegraph --bin reth --profiling`:
 
 ```bash
-cargo flamegraph --bin reth --root --profiling -- node --chain ~/workspace/ethrex/test_data/genesis-perf-ci.json --dev --dev.block-time 5000ms --http.port 8545 --txpool.max-pending-txns 100000000 --txpool.max-new-txns 1000000000 --txpool.pending-max-count 100000000 --txpool.pending-max-size 10000000000 --txpool.basefee-max-count 100000000000 --txpool.basefee-max-size 1000000000000 --txpool.queued-max-count 1000000000
+cargo flamegraph --bin reth --root --profiling -- node --chain ~/workspace/ethrex/test_data/genesis-load-test.json --dev --dev.block-time 5000ms --http.port 8545 --txpool.max-pending-txns 100000000 --txpool.max-new-txns 1000000000 --txpool.pending-max-count 100000000 --txpool.pending-max-size 10000000000 --txpool.basefee-max-count 100000000000 --txpool.basefee-max-size 1000000000000 --txpool.queued-max-count 1000000000
 ```
 
 For samply we want to directly execute the binary, so that it records the binary and not cargo itself:
 
 ```bash
-samply record ./target/profiling/reth node --chain ~/workspace/ethrex/test_data/genesis-perf-ci.json --dev --dev.block-time 5000ms --http.port 8545 --txpool.max-pending-txns 100000000 --txpool.max-new-txns 1000000000 --txpool.pending-max-count 100000000 --txpool.pending-max-size 10000000000 --txpool.basefee-max-count 100000000000 --txpool.basefee-max-size 1000000000000 --txpool.queued-max-count 1000000000
+samply record ./target/profiling/reth node --chain ~/workspace/ethrex/test_data/genesis-load-test.json --dev --dev.block-time 5000ms --http.port 8545 --txpool.max-pending-txns 100000000 --txpool.max-new-txns 1000000000 --txpool.pending-max-count 100000000 --txpool.pending-max-size 10000000000 --txpool.basefee-max-count 100000000000 --txpool.basefee-max-size 1000000000000 --txpool.queued-max-count 1000000000
 ```

cmd/load_test/src/main.rs

Lines changed: 24 additions & 17 deletions
@@ -62,7 +62,7 @@ struct Cli {
         long,
         short = 'w',
         default_value_t = 0,
-        help = "Timeout to wait for all transactions to be included. If 0 is specified, wait indefinitely."
+        help = "Timeout in minutes. If the node doesn't provide updates in this time, it's considered stuck and the load test fails. If 0 is specified, the load test will wait indefinitely."
     )]
     wait: u64,
 }
@@ -248,13 +248,11 @@ async fn load_test(
                 let client = client.clone();
                 sleep(Duration::from_micros(800)).await;
                 let _sent = client.send_eip1559_transaction(&tx, &sk).await?;
-                println!(
-                    "Tx number {} sent! From: {}. To: {}",
-                    nonce + i + 1,
-                    encoded_src,
-                    dst.encode_hex::<String>()
-                );
             }
+            println!(
+                "{} transactions have been sent for {}",
+                tx_amount, encoded_src
+            );
             Ok::<(), EthClientError>(())
         });
     }
@@ -268,18 +266,18 @@ async fn load_test(
 // Waits until the nonce of each account has reached the tx_amount.
 async fn wait_until_all_included(
     client: EthClient,
-    wait: Option<Duration>,
+    timeout: Option<Duration>,
     accounts: &[Account],
     tx_amount: u64,
 ) -> Result<(), String> {
-    let start_time = tokio::time::Instant::now();
-
     for (_, sk) in accounts {
         let client = client.clone();
         let src = get_address_from_secret_key(sk).expect("Failed to get address from secret key");
         let encoded_src: String = src.encode_hex();
+        let mut last_updated = tokio::time::Instant::now();
+        let mut last_nonce = 0;
+
         loop {
-            let elapsed = start_time.elapsed();
             let nonce = client.get_nonce(src, BlockByNumber::Latest).await.unwrap();
             if nonce >= tx_amount {
                 println!(
@@ -289,14 +287,23 @@ async fn wait_until_all_included(
                 break;
             } else {
                 println!(
-                    "Waiting for transactions to be included from {}. Nonce: {}. Needs: {}. Percentage: {:2}%. Elapsed time: {}s.",
-                    encoded_src, nonce, tx_amount, (nonce as f64 / tx_amount as f64) * 100.0, elapsed.as_secs()
+                    "Waiting for transactions to be included from {}. Nonce: {}. Needs: {}. Percentage: {:2}%.",
+                    encoded_src, nonce, tx_amount, (nonce as f64 / tx_amount as f64) * 100.0
                 );
             }
 
-            if let Some(wait) = wait {
-                if elapsed > wait {
-                    return Err("Timeout reached for transactions to be included".to_string());
+            if let Some(timeout) = timeout {
+                if last_nonce == nonce {
+                    let inactivity_time = last_updated.elapsed();
+                    if inactivity_time > timeout {
+                        return Err(format!(
+                            "Node inactive for {} seconds. Timeout reached.",
+                            inactivity_time.as_secs()
+                        ));
+                    }
+                } else {
+                    last_nonce = nonce;
+                    last_updated = tokio::time::Instant::now();
                 }
             }
 
@@ -379,7 +386,7 @@ async fn main() {
     };
 
     println!(
-        "Starting load test with {} transactions per account...",
+        "Starting load test with {} transactions per account...",
         cli.tx_amount
     );
     let time_now = tokio::time::Instant::now();
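
The diff shows `wait: u64` on the CLI (documented as minutes, with 0 meaning "wait indefinitely") and `timeout: Option<Duration>` in `wait_until_all_included`, but not the conversion between the two. One plausible way to bridge them is sketched below. `timeout_from_minutes` is a hypothetical name, and this is an assumption about the mapping rather than the code in `main.rs`.

```rust
use std::time::Duration;

/// Hypothetical helper (not shown in the diff): maps the `-w` value, given in
/// minutes, to an optional timeout. 0 means "no timeout, wait indefinitely".
fn timeout_from_minutes(minutes: u64) -> Option<Duration> {
    if minutes == 0 {
        None
    } else {
        // saturating_mul avoids overflow on absurdly large inputs.
        Some(Duration::from_secs(minutes.saturating_mul(60)))
    }
}
```

Under this mapping, the `-w 1` used in `flamegraph_watcher.sh` would correspond to a 60-second inactivity window.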
