Expand storage for Holesky fleet #218

Open · jakubgs opened this issue Jan 3, 2025 · 11 comments

jakubgs commented Jan 3, 2025

We are currently low on storage for EL nodes on the nimbus.holesky fleet. Storage usage on the /docker volume varies from 72% up to 96% on some nodes.

We need to:

  1. Request an extension of existing storage from InnovaHosting.
    • Preferably with the same kind of NVMe, or at least the same size.
  2. Back up existing node data either locally or remotely, or re-sync from scratch.
    • If re-syncing is chosen, the BNs will need an additional EL while the sync happens.
  3. Re-create the RAID0 array using the HP SmartArray CLI tool (see the sketch after this list).
    • We don't care about data redundancy, since this can all be re-synced.
  4. Restore node data backups or re-sync.
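
For step 3, re-creating the array with the SmartArray CLI would look roughly like this; the controller slot, logical drive number, and drive bays below are assumptions and need to be checked per host first:

# Inspect the controller and current logical drive layout.
ssacli ctrl all show config
ssacli ctrl slot=0 ld all show

# Drop the old logical drive and re-create it as RAID0 across the NVMe drives
# (slot=0, ld 1 and the drive bays are hypothetical).
ssacli ctrl slot=0 ld 1 delete forced
ssacli ctrl slot=0 create type=ld drives=1I:1:1,1I:1:2 raid=0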

You can see notes on a previous task like this here:

jakubgs commented Jan 3, 2025

Current state:

| Hostname                            | Volume  | Size | Used | Avail | Use% |
|-------------------------------------|---------|------|------|-------|------|
| erigon-01.ih-eu-mda1.nimbus.holesky | /docker | 1.5T | 994G | 398G  |  72% |
| erigon-02.ih-eu-mda1.nimbus.holesky | /docker | 1.5T | 1.1T | 361G  |  75% |
| erigon-03.ih-eu-mda1.nimbus.holesky | /docker | 1.5T | 1.1T | 333G  |  77% |
| erigon-04.ih-eu-mda1.nimbus.holesky | /docker | 1.5T | 1.1T | 336G  |  76% |
| erigon-05.ih-eu-mda1.nimbus.holesky | /docker | 1.5T | 991G | 401G  |  72% |
| erigon-06.ih-eu-mda1.nimbus.holesky | /docker | 1.5T | 991G | 402G  |  72% |
| erigon-07.ih-eu-mda1.nimbus.holesky | /docker | 1.5T | 994G | 398G  |  72% |
| erigon-08.ih-eu-mda1.nimbus.holesky | /docker | 1.5T | 989G | 403G  |  72% |
| erigon-09.ih-eu-mda1.nimbus.holesky | /docker | 1.5T | 997G | 395G  |  72% |
| erigon-10.ih-eu-mda1.nimbus.holesky | /docker | 1.5T | 948G | 444G  |  69% |
| geth-01.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.1T | 266G  |  81% |
| geth-02.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.1T | 267G  |  81% |
| geth-03.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.1T | 267G  |  81% |
| geth-04.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.1T | 267G  |  81% |
| geth-05.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.1T | 268G  |  81% |
| geth-06.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.1T | 268G  |  81% |
| geth-07.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.1T | 268G  |  81% |
| geth-08.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.1T | 268G  |  81% |
| geth-09.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.1T | 268G  |  81% |
| geth-10.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.1T | 267G  |  81% |
| neth-01.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.3T |  67G  |  96% |
| neth-02.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.3T |  63G  |  96% |
| neth-03.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.3T |  62G  |  96% |
| neth-04.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.3T |  65G  |  96% |
| neth-05.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.3T |  62G  |  96% |
| neth-06.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.3T |  63G  |  96% |
| neth-07.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.3T |  64G  |  96% |
| neth-08.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.3T |  64G  |  96% |
| neth-09.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.3T |  68G  |  96% |
| neth-10.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.3T |  70G  |  96% |
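
For the record, a snapshot like the table above can be collected with a simple loop over the fleet; this assumes the hosts are reachable over SSH under these names:

# Print the /docker usage line for every EL host (bash brace expansion).
for host in {erigon,geth,neth}-{01..10}.ih-eu-mda1.nimbus.holesky; do
  echo -n "${host}: "
  ssh "${host}" df -h /docker | tail -n 1
done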

@yakimant

Re-sync itself can free some space too.
Might not help in the long run, but it can still be beneficial to do from time to time on non-prod nodes.

@markoburcul

> Re-sync itself can free some space too. Might not help in the long run, but it can still be beneficial to do from time to time on non-prod nodes.

Is re-sync what @jakubgs did here #219 (comment)?

yakimant commented Jan 15, 2025

Yes, I think so.
Holesky nodes should sync faster, but perhaps not as much space will be recovered.

You can read about this in the geth docs:

A snap-sync'd Geth node currently requires more than 650 GB of disk space to store the historic blockchain data. With default cache size the database grows by about 14 GB/week. This means that Geth users will rapidly run out of space on 1TB hard drives. To solve this problem without needing to purchase additional hardware, Geth can be pruned. Pruning is the process of erasing older data to save disk space. Since Geth v1.10, users have been able to trigger a snapshot offline prune to bring the total storage back down to the original ~650 GB. The pruning time depends on your hardware but it can take upwards of 12 hours. This has to be done periodically to keep the total disk storage within the bounds of the local hardware (e.g. every month or so for a 1TB disk).

https://geth.ethereum.org/docs/fundamentals/pruning

That said, I've never tried pruning myself, only removing the data and letting the node sync from scratch.
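
For reference, the offline prune described in those docs is run with the node stopped, roughly like this; the datadir path is just an assumption about our container layout:

# Stop the node first, then prune the state snapshot offline (can take many hours).
geth snapshot prune-state --datadir /docker/geth-holesky-01/node/data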

jakubgs commented Jan 16, 2025

That is correct, I simply purged the data folder (excluding any key files) and allowed it to sync from scratch.

And yes, Holesky is way smaller than mainnet.

@markoburcul

For all of the Nethermind hosts I've purged the data and started a sync from scratch for two of the four nodes per host. On all of them /docker is now at around 55-60%.

@markoburcul

I've done the same for the geth Holesky hosts: 2 nodes per host have been re-synced (I've purged the node/data/geth/chaindata dir and restarted the node). This should give us enough time until the SSDs arrive.
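
Roughly, the per-node procedure would have looked like this; the compose file name and data path are assumptions based on our layout and may differ per host:

# Stop the node, drop only the chain data (keys and config stay intact), start again.
docker compose -f /docker/geth-holesky-02/docker-compose.yml stop
rm -rf /docker/geth-holesky-02/node/data/geth/chaindata
docker compose -f /docker/geth-holesky-02/docker-compose.yml start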

@markoburcul

Done with all the Nethermind hosts. The layout looks like this:

[email protected]:~ % lsblk   
NAME   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda      8:0    0 372.5G  0 disk  
├─sda1   8:1    0     1G  0 part  /boot/efi
└─sda2   8:2    0 371.5G  0 part  /docker
                                  /
sdb      8:16   0   1.5T  0 disk  /data
                                  /mnt/sdb
sdc      8:32   0   1.5T  0 disk  
└─md0    9:0    0   2.9T  0 raid0 /mnt/sdc
                                  /docker
sdd      8:48   0   1.5T  0 disk  
└─md0    9:0    0   2.9T  0 raid0 /mnt/sdc
                                  /docker

[email protected]:~ % df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           6.3G  2.5M  6.3G   1% /run
/dev/sda2       365G   41G  306G  12% /
tmpfs            32G  312K   32G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sdb        1.5T  825G  567G  60% /data
/dev/sda1      1022M  6.1M 1016M   1% /boot/efi
tmpfs           6.3G     0  6.3G   0% /run/user/7010
/dev/md0        2.9T  7.5G  2.8T   1% /docker

Devices sdc and sdd are combined into a RAID0 array (md0) and mounted at /docker.
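
For reference, building such an array from scratch would look roughly like this; the device names match the lsblk output above, while the filesystem type and config paths are assumptions:

# Combine the two 1.5T drives into a single RAID0 md array.
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdc /dev/sdd

# Create a filesystem and mount it where the docker data lives.
mkfs.ext4 /dev/md0
mount /dev/md0 /docker

# Persist the array definition and the mount across reboots.
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
echo '/dev/md0 /docker ext4 defaults 0 0' >> /etc/fstab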

@markoburcul

On the geth hosts I've noticed this in the geth-exporter logs:

2025/01/31 11:46:08 failed to get metrics: the method debug_metrics does not exist/is not available
2025/01/31 11:46:38 failed to get metrics: the method debug_metrics does not exist/is not available

which is weird considering that all the metrics it exports state they are "metric exported from geth with debug.metrics":

# HELP geth_sync_txIndexFinishedBlocks_value metric exported from geth with debug.metrics
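
If the exporter calls debug_metrics over the node's RPC endpoint, that error usually means the debug namespace (and metrics collection) isn't enabled there. A rough guess at the relevant geth flags, not verified against our role configuration:

# Assumed flags: enable metrics collection and expose the debug namespace over HTTP.
geth --metrics --http --http.api eth,net,web3,debug ...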

Another thing is this depends_on in its docker compose file:
https://github.com/status-im/infra-role-geth-exporter/blob/0859bd5b4a5010a7377c01ea3b4bb26195a594f4/templates/docker-compose.yml.j2#L16C1-L17C15

which doesn't make sense and results in an error if you try to stop the container with compose down:

[email protected]:~ % docker compose -f /docker/geth-holesky-02/docker-compose.exporter.yml down
WARN[0000] /docker/geth-holesky-02/docker-compose.exporter.yml: `version` is obsolete 
service "metrics" depends on undefined service "geth": invalid compose project

@markoburcul

Done with the geth hosts, the layout is the same as nethermind:

[email protected]:~ % lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda      8:0    0 372.5G  0 disk  
├─sda1   8:1    0     1G  0 part  /boot/efi
└─sda2   8:2    0 371.5G  0 part  /docker
                                  /
sdb      8:16   0   1.5T  0 disk  /data
                                  /mnt/sdb
sdc      8:32   0   1.5T  0 disk  
└─md0    9:0    0   2.9T  0 raid0 /mnt/sdc
                                  /docker
sdd      8:48   0   1.5T  0 disk  
└─md0    9:0    0   2.9T  0 raid0 /mnt/sdc
                                  /docker
[email protected]:~ % df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           6.3G  2.6M  6.3G   1% /run
/dev/sda2       365G   42G  305G  12% /
tmpfs            32G  332K   32G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sdb        1.5T  825G  568G  60% /data
/dev/sda1      1022M  6.1M 1016M   1% /boot/efi
/dev/md0        2.9T  6.3G  2.8T   1% /docker
tmpfs           6.3G     0  6.3G   0% /run/user/7010

@markoburcul

Done with the erigon hosts, the layout is the same as nethermind and geth:

[email protected]:~ % lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda      8:0    0 372.5G  0 disk  
├─sda1   8:1    0     1G  0 part  /boot/efi
└─sda2   8:2    0 371.5G  0 part  /docker
                                  /
sdb      8:16   0   1.5T  0 disk  /data
                                  /mnt/sdb
sdc      8:32   0   1.5T  0 disk  
└─md0    9:0    0   2.9T  0 raid0 /mnt/sdc
                                  /docker
sdd      8:48   0   1.5T  0 disk  
└─md0    9:0    0   2.9T  0 raid0 /mnt/sdc
                                  /docker
[email protected]:~ % df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           6.3G  2.2M  6.3G   1% /run
/dev/sda2       365G   44G  303G  13% /
tmpfs            32G  316K   32G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sdb        1.5T  824G  568G  60% /data
/dev/sda1      1022M  6.1M 1016M   1% /boot/efi
/dev/md0        2.9T   34G  2.7T   2% /docker
tmpfs           6.3G     0  6.3G   0% /run/user/7010

I've re-synced them all; they will be fully synced in a few hours.
