Expand storage for Holesky fleet #218

Open · jakubgs opened this issue Jan 3, 2025 · 11 comments

jakubgs commented Jan 3, 2025

We are currently low on storage for EL nodes on the nimbus.holesky fleet. Storage usage on the /docker volume varies from 72% up to 96% on some nodes.

We need to:

  1. Request an extension of existing storage from InnovaHosting.
    • Preferably with the same kind of NVMe, or at least the same size.
  2. Back up existing node data either locally or remotely, or re-sync from scratch.
    • If re-syncing is chosen, the BNs will need an additional EL while the sync happens.
  3. Re-create the RAID0 array using the HP SmartArray CLI tool (see the sketch after this list).
    • We don't care about data redundancy, since this can all be re-synced.
  4. Restore node data backups or re-sync.
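
For step 3, re-creating the array with the SmartArray CLI would look roughly like this; the controller slot, logical drive number, and drive bays below are assumptions and need to be checked per host first:

# Inspect the controller and current logical drive layout.
ssacli ctrl all show config
ssacli ctrl slot=0 ld all show

# Drop the old logical drive and re-create it as RAID0 across the NVMe drives
# (slot=0, ld 1 and the drive bays are hypothetical).
ssacli ctrl slot=0 ld 1 delete forced
ssacli ctrl slot=0 create type=ld drives=1I:1:1,1I:1:2 raid=0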

You can see notes on a previous task like this here:

jakubgs commented Jan 3, 2025

Current state:

| Hostname                            | Volume  | Size | Used | Avail | Use% |
|-------------------------------------|---------|------|------|-------|------|
| erigon-01.ih-eu-mda1.nimbus.holesky | /docker | 1.5T | 994G | 398G  |  72% |
| erigon-02.ih-eu-mda1.nimbus.holesky | /docker | 1.5T | 1.1T | 361G  |  75% |
| erigon-03.ih-eu-mda1.nimbus.holesky | /docker | 1.5T | 1.1T | 333G  |  77% |
| erigon-04.ih-eu-mda1.nimbus.holesky | /docker | 1.5T | 1.1T | 336G  |  76% |
| erigon-05.ih-eu-mda1.nimbus.holesky | /docker | 1.5T | 991G | 401G  |  72% |
| erigon-06.ih-eu-mda1.nimbus.holesky | /docker | 1.5T | 991G | 402G  |  72% |
| erigon-07.ih-eu-mda1.nimbus.holesky | /docker | 1.5T | 994G | 398G  |  72% |
| erigon-08.ih-eu-mda1.nimbus.holesky | /docker | 1.5T | 989G | 403G  |  72% |
| erigon-09.ih-eu-mda1.nimbus.holesky | /docker | 1.5T | 997G | 395G  |  72% |
| erigon-10.ih-eu-mda1.nimbus.holesky | /docker | 1.5T | 948G | 444G  |  69% |
| geth-01.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.1T | 266G  |  81% |
| geth-02.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.1T | 267G  |  81% |
| geth-03.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.1T | 267G  |  81% |
| geth-04.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.1T | 267G  |  81% |
| geth-05.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.1T | 268G  |  81% |
| geth-06.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.1T | 268G  |  81% |
| geth-07.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.1T | 268G  |  81% |
| geth-08.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.1T | 268G  |  81% |
| geth-09.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.1T | 268G  |  81% |
| geth-10.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.1T | 267G  |  81% |
| neth-01.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.3T |  67G  |  96% |
| neth-02.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.3T |  63G  |  96% |
| neth-03.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.3T |  62G  |  96% |
| neth-04.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.3T |  65G  |  96% |
| neth-05.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.3T |  62G  |  96% |
| neth-06.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.3T |  63G  |  96% |
| neth-07.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.3T |  64G  |  96% |
| neth-08.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.3T |  64G  |  96% |
| neth-09.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.3T |  68G  |  96% |
| neth-10.ih-eu-mda1.nimbus.holesky   | /docker | 1.5T | 1.3T |  70G  |  96% |
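
For the record, a snapshot like the table above can be collected with a simple loop over the fleet; this assumes the hosts are reachable over SSH under these names:

# Print the /docker usage line for every EL host (bash brace expansion).
for host in {erigon,geth,neth}-{01..10}.ih-eu-mda1.nimbus.holesky; do
  echo -n "${host}: "
  ssh "${host}" df -h /docker | tail -n 1
done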

@yakimant

Re-sync itself can free some space too.
Might not help in the long run, but it can still be beneficial to do from time to time on non-prod nodes.

@markoburcul

> Re-sync itself can free some space too. Might not help in the long run, but it can still be beneficial to do from time to time on non-prod nodes.

Is re-sync what @jakubgs did here #219 (comment)?

yakimant commented Jan 15, 2025

Yes, I think so.
Holesky nodes should sync faster, but perhaps not as much space will be recovered.

You can read about this in the geth docs:

A snap-sync'd Geth node currently requires more than 650 GB of disk space to store the historic blockchain data. With default cache size the database grows by about 14 GB/week. This means that Geth users will rapidly run out of space on 1TB hard drives. To solve this problem without needing to purchase additional hardware, Geth can be pruned. Pruning is the process of erasing older data to save disk space. Since Geth v1.10, users have been able to trigger a snapshot offline prune to bring the total storage back down to the original ~650 GB. The pruning time depends on your hardware but it can take upwards of 12 hours. This has to be done periodically to keep the total disk storage within the bounds of the local hardware (e.g. every month or so for a 1TB disk).

https://geth.ethereum.org/docs/fundamentals/pruning

That said, I've never tried pruning myself, only removing the data and letting the node sync from scratch.
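
For reference, the offline prune described in those docs is run with the node stopped, roughly like this; the datadir path is just an assumption about our container layout:

# Stop the node first, then prune the state snapshot offline (can take many hours).
geth snapshot prune-state --datadir /docker/geth-holesky-01/node/data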

jakubgs commented Jan 16, 2025

That is correct, I simply purged the data folder (excluding any key files) and allowed it to sync from scratch.

And yes, Holesky is way smaller than mainnet.

@markoburcul

For all of the Nethermind hosts I've purged the data and started a sync from scratch for two of the four nodes per host. On all of them /docker is now at around 55-60%.

@markoburcul

I've done the same for the geth Holesky hosts: 2 nodes per host have been re-synced (I've purged the node/data/geth/chaindata dir and restarted the node). This should give us enough time until the SSDs arrive.
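
Roughly, the per-node procedure would have looked like this; the compose file name and data path are assumptions based on our layout and may differ per host:

# Stop the node, drop only the chain data (keys and config stay intact), start again.
docker compose -f /docker/geth-holesky-02/docker-compose.yml stop
rm -rf /docker/geth-holesky-02/node/data/geth/chaindata
docker compose -f /docker/geth-holesky-02/docker-compose.yml start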

@markoburcul

Done with all the Nethermind hosts. The layout looks like this:

[email protected]:~ % lsblk   
NAME   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda      8:0    0 372.5G  0 disk  
├─sda1   8:1    0     1G  0 part  /boot/efi
└─sda2   8:2    0 371.5G  0 part  /docker
                                  /
sdb      8:16   0   1.5T  0 disk  /data
                                  /mnt/sdb
sdc      8:32   0   1.5T  0 disk  
└─md0    9:0    0   2.9T  0 raid0 /mnt/sdc
                                  /docker
sdd      8:48   0   1.5T  0 disk  
└─md0    9:0    0   2.9T  0 raid0 /mnt/sdc
                                  /docker

[email protected]:~ % df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           6.3G  2.5M  6.3G   1% /run
/dev/sda2       365G   41G  306G  12% /
tmpfs            32G  312K   32G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sdb        1.5T  825G  567G  60% /data
/dev/sda1      1022M  6.1M 1016M   1% /boot/efi
tmpfs           6.3G     0  6.3G   0% /run/user/7010
/dev/md0        2.9T  7.5G  2.8T   1% /docker

Devices sdc and sdd are combined into a RAID0 array (md0) and mounted at /docker.
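
For reference, building such an array from scratch would look roughly like this; the device names match the lsblk output above, while the filesystem type and config paths are assumptions:

# Combine the two 1.5T drives into a single RAID0 md array.
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdc /dev/sdd

# Create a filesystem and mount it where the docker data lives.
mkfs.ext4 /dev/md0
mount /dev/md0 /docker

# Persist the array definition and the mount across reboots.
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
echo '/dev/md0 /docker ext4 defaults 0 0' >> /etc/fstab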

@markoburcul

On the geth hosts I've noticed this in the geth-exporter logs:

2025/01/31 11:46:08 failed to get metrics: the method debug_metrics does not exist/is not available
2025/01/31 11:46:38 failed to get metrics: the method debug_metrics does not exist/is not available

which is weird considering that all the metrics it exports state they are "metric exported from geth with debug.metrics":

# HELP geth_sync_txIndexFinishedBlocks_value metric exported from geth with debug.metrics
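
If the exporter calls debug_metrics over the node's RPC endpoint, that error usually means the debug namespace (and metrics collection) isn't enabled there. A rough guess at the relevant geth flags, not verified against our role configuration:

# Assumed flags: enable metrics collection and expose the debug namespace over HTTP.
geth --metrics --http --http.api eth,net,web3,debug ...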

Another thing is this depends_on in its docker compose file:
https://github.com/status-im/infra-role-geth-exporter/blob/0859bd5b4a5010a7377c01ea3b4bb26195a594f4/templates/docker-compose.yml.j2#L16C1-L17C15

which doesn't make sense and results in an error if you try to stop the container with compose down:

[email protected]:~ % docker compose -f /docker/geth-holesky-02/docker-compose.exporter.yml down
WARN[0000] /docker/geth-holesky-02/docker-compose.exporter.yml: `version` is obsolete 
service "metrics" depends on undefined service "geth": invalid compose project

@markoburcul

Done with the geth hosts, the layout is the same as nethermind:

[email protected]:~ % lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda      8:0    0 372.5G  0 disk  
├─sda1   8:1    0     1G  0 part  /boot/efi
└─sda2   8:2    0 371.5G  0 part  /docker
                                  /
sdb      8:16   0   1.5T  0 disk  /data
                                  /mnt/sdb
sdc      8:32   0   1.5T  0 disk  
└─md0    9:0    0   2.9T  0 raid0 /mnt/sdc
                                  /docker
sdd      8:48   0   1.5T  0 disk  
└─md0    9:0    0   2.9T  0 raid0 /mnt/sdc
                                  /docker
[email protected]:~ % df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           6.3G  2.6M  6.3G   1% /run
/dev/sda2       365G   42G  305G  12% /
tmpfs            32G  332K   32G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sdb        1.5T  825G  568G  60% /data
/dev/sda1      1022M  6.1M 1016M   1% /boot/efi
/dev/md0        2.9T  6.3G  2.8T   1% /docker
tmpfs           6.3G     0  6.3G   0% /run/user/7010

@markoburcul

Done with the erigon hosts, the layout is the same as nethermind and geth:

[email protected]:~ % lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda      8:0    0 372.5G  0 disk  
├─sda1   8:1    0     1G  0 part  /boot/efi
└─sda2   8:2    0 371.5G  0 part  /docker
                                  /
sdb      8:16   0   1.5T  0 disk  /data
                                  /mnt/sdb
sdc      8:32   0   1.5T  0 disk  
└─md0    9:0    0   2.9T  0 raid0 /mnt/sdc
                                  /docker
sdd      8:48   0   1.5T  0 disk  
└─md0    9:0    0   2.9T  0 raid0 /mnt/sdc
                                  /docker
[email protected]:~ % df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           6.3G  2.2M  6.3G   1% /run
/dev/sda2       365G   44G  303G  13% /
tmpfs            32G  316K   32G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sdb        1.5T  824G  568G  60% /data
/dev/sda1      1022M  6.1M 1016M   1% /boot/efi
/dev/md0        2.9T   34G  2.7T   2% /docker
tmpfs           6.3G     0  6.3G   0% /run/user/7010

I've re-synced them all; they will be fully synced in a few hours.
