
@hossnys
Collaborator

hossnys commented Feb 19, 2025

@hossnys
Collaborator Author

hossnys commented Feb 23, 2025

@PeterNashaat here is the new flist to test with: https://hub.grid.tf/tf-official-apps/casperlabs-latest.flist

@PeterNashaat
Member

  • Tested with multiple SSH keys and it worked fine.
  • But there is an issue initializing the casperlabs process:
root@caspertest:~# zinit
---
sshd: Running
casperlabs: "Error(Exited(Pid(1918), 1))"
sshkey: Success
root@caspertest:~#

which leaves the deployment stuck at this step:
image

  • From the logs I found:
[+] casperlabs: + CASPER_VERSION=1_0_0
[+] casperlabs: + CASPER_NETWORK=casper
[+] casperlabs: + crontab -
[+] casperlabs: + cat /opt/cronjobs
[+] casperlabs: + rm -f /var/www/html/index.html
[+] casperlabs: + mkdir /run/lock
[+] casperlabs: mkdir: cannot create directory '/run/lock': File exists
  • A solution is to guard the mkdir with this if condition:
if [ ! -d "/run/lock" ]; then
    mkdir /run/lock
fi

instead of this line here https://github.com/threefoldtech/tf-images/blob/development_fix_casperlaps/tfgrid3/casper/scripts/start_casper#L9
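
An equivalent guard, as a minimal sketch: `mkdir -p` returns success whether or not the directory already exists, so a second run of the start script would not abort at this step. It assumes the script exits on the first failing command (shown here with `set -e`). The helper name `ensure_dir` and the temporary path are hypothetical; the real start_casper script may be organized differently.

```shell
#!/usr/bin/env bash
set -e  # assumption: abort on the first failing command, like the failing run

# ensure_dir is a hypothetical helper: create the directory only if needed.
# Unlike plain `mkdir`, `mkdir -p` exits 0 when the directory already exists.
ensure_dir() {
    mkdir -p "$1"
}

# Use a throwaway root instead of touching the real /run/lock.
demo_root="$(mktemp -d)"
ensure_dir "$demo_root/run/lock"
ensure_dir "$demo_root/run/lock"   # second call is a no-op, not an error
```

Either form works; `mkdir -p` is just shorter and also creates missing parent directories.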

@hossnys
Collaborator Author

hossnys commented Mar 6, 2025

Thanks for checking it, @PeterNashaat. It's fixed here

@PeterNashaat
Member

  • Thanks @hossnys for adding the fix, but please check these errors about not being able to extract the gzip format:
    image
  • Also, I see from the zinit logs that the casperlabs process is in a restart loop. Please check whether adding oneshot would cause an issue when starting the service:
casperlabs.yaml
exec: bash -c "/start_casper"
  • It's still not syncing:
    image
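
On the oneshot question above, a minimal sketch of what the service file could look like with the flag set. It assumes zinit's `oneshot: true` option, which tells zinit not to restart the service after it exits. The temporary directory stands in for the real zinit config directory, and whether the node process should be oneshot at all is still an open question here.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical location: a temp dir stands in for the real zinit config dir.
conf_dir="$(mktemp -d)"

# Write the service definition; the quoted 'EOF' keeps the text literal.
cat > "$conf_dir/casperlabs.yaml" <<'EOF'
exec: bash -c "/start_casper"
oneshot: true
EOF

# Show the resulting file for review.
cat "$conf_dir/casperlabs.yaml"
```

Note that oneshot would only stop the restart loop; it would not fix whatever makes /start_casper exit with status 1.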

@hossnys
Collaborator Author

hossnys commented Mar 11, 2025

@PeterNashaat
Member

zinit
---
ufw: Success
ufw_init: Success
sshd: Running
casperlabs: "Error(Exited(Pid(205), 1))"
sshkey: Success
image
  • Domain: https://cl604test2casper22.gent03.grid.tf/
image
  • Still the same issue: the node is not syncing.
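
To spot the failing service quickly in output like the above, here is a small sketch that filters status lines for an Error state. The helper name `failed_services` is hypothetical, and it assumes the `name: State` line format shown in this thread; the sample text is copied from the status output above.

```shell
#!/usr/bin/env bash

# failed_services is a hypothetical helper: it reads `name: State` lines on
# stdin and prints the names of services whose state contains "Error(".
failed_services() {
    awk -F': ' '/Error\(/ { print $1 }'
}

# Sample status text copied from the output in this thread.
status='ufw: Success
ufw_init: Success
sshd: Running
casperlabs: "Error(Exited(Pid(205), 1))"
sshkey: Success'

failed="$(printf '%s\n' "$status" | failed_services)"
echo "$failed"
```

On a deployed VM the same function could be fed from the live status command instead of the sample text.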

@hossnys
Collaborator Author

hossnys commented Apr 6, 2025

  • Fixed and verified:

Screenshot from 2025-04-06 18-30-35

Screenshot from 2025-04-06 18-30-11

@PeterNashaat
Member

Flist deployment tested and zinit processes are running as expected.

However, the node is not syncing. Logs show repeated sync leap failures and connection errors:

"error":"TCP connection failed: Connection refused (os error 111)"
"error":"Broken pipe (os error 32)"
"error":"TimedOut { id: SyncLeapIdentifier { block_hash: BlockHash(8322e1ca...) }, peer: NodeId(...) }"
"error":"Absent { id: SyncLeapIdentifier { block_hash: BlockHash(8322e1ca...) }, peer: NodeId(...) }"
"message":"CatchUp: failed leap", "error":"unable to acquire data for: block hash 8322..e31d"
image

The node fails to fetch required blocks and global state from peers.
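
To see which failure dominates, the log can be grouped by error kind. This is a rough sketch: the function name `summarize_errors` and the temporary log file are hypothetical, the sample lines are the excerpts quoted above, and the pattern list only covers the error kinds seen in this thread.

```shell
#!/usr/bin/env bash

# Hypothetical log file holding the excerpts quoted in this comment.
log_file="$(mktemp)"
cat > "$log_file" <<'EOF'
"error":"TCP connection failed: Connection refused (os error 111)"
"error":"Broken pipe (os error 32)"
"error":"TimedOut { id: SyncLeapIdentifier { block_hash: BlockHash(8322e1ca...) }, peer: NodeId(...) }"
"error":"Absent { id: SyncLeapIdentifier { block_hash: BlockHash(8322e1ca...) }, peer: NodeId(...) }"
"message":"CatchUp: failed leap", "error":"unable to acquire data for: block hash 8322..e31d"
EOF

# summarize_errors is a hypothetical helper: extract each known error kind
# from the given file and count how often it occurs.
summarize_errors() {
    grep -o -E 'Connection refused|Broken pipe|TimedOut|Absent|failed leap' "$1" \
        | sort | uniq -c
}

summary="$(summarize_errors "$log_file")"
echo "$summary"
```

A high share of Connection refused lines would point toward unreachable peers, while mostly TimedOut or Absent would suggest peers are reachable but do not serve the requested block; this is only a way to narrow the search, not a diagnosis.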

Development

Successfully merging this pull request may close these issues.

fix casperlaps adding ssh key to authotrized keys
