Is your feature request related to a problem? Please describe.
When the management server is down, `umount ${mnt}` hangs indefinitely. As a result, `systemctl restart beegfs-client` incorrectly reports success.
Here is how to reproduce it:

1. `systemctl start beegfs-mgmtd` (start the remote management server)
2. `systemctl start beegfs-client` (which calls `/opt/beegfs/sbin/beegfs-client start`)
3. `systemctl stop beegfs-mgmtd` (stop the remote management server)
4. `systemctl restart beegfs-client` (which calls `/opt/beegfs/sbin/beegfs-client stop` and then `/opt/beegfs/sbin/beegfs-client start`)

`/opt/beegfs/sbin/beegfs-client stop` gets stuck at the `umount` on line 260 of `beegfs/client_module/scripts/etc/beegfs/lib/init-multi-mode.beegfs-client` (commit a0357ce), and systemd kills it when the stop timeout expires.
Here is the systemd log when restarting the client:

```
Aug 21 10:30:11 pc systemd[1]: Stopping Start BeeGFS Client...
Aug 21 10:30:11 pc beegfs-client[58947]: Shutting down BeeGFS Client:
Aug 21 10:30:11 pc beegfs-client[58947]: - Unmounting directories from /etc/beegfs/beegfs-mounts.conf
Aug 21 10:31:41 pc systemd[1]: beegfs-client.service: Stopping timed out. Terminating.
Aug 21 10:31:41 pc systemd[1]: beegfs-client.service: Control process exited, code=killed, status=15/TERM
Aug 21 10:31:41 pc systemd[1]: beegfs-client.service: Failed with result 'timeout'.
Aug 21 10:31:41 pc systemd[1]: Stopped Start BeeGFS Client.
Aug 21 10:31:41 pc systemd[1]: Starting Start BeeGFS Client...
Aug 21 10:31:41 pc beegfs-client[58986]: Starting BeeGFS Client:
Aug 21 10:31:41 pc beegfs-client[58986]: - Loading BeeGFS modules
Aug 21 10:31:41 pc beegfs-client[58986]: - Mounting directories from /etc/beegfs/beegfs-mounts.conf
Aug 21 10:31:41 pc systemd[1]: Finished Start BeeGFS Client.
```
After that, `systemctl status beegfs-client` reports success, because `stop` never finished unmounting and `start` skips any mount point that is already mounted (lines 190 to 194 of `init-multi-mode.beegfs-client` at commit a0357ce):
```bash
mount -t beegfs | grep "${mnt} " >/dev/null 2>&1
if [ $? -eq 0 ]; then
   # already mounted
   continue
fi
```
Describe the solution you'd like
Changing the `umount` on line 260 of `init-multi-mode.beegfs-client` to `umount -l` (lazy unmount) should work, but I'm not sure whether it would break anything.
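A minimal sketch of how the stop path could apply this more cautiously: bound the normal unmount with a time limit and fall back to `umount -l` only if it fails or times out. This is not the code in `init-multi-mode.beegfs-client`. The function name `unmount_with_fallback` and the 30-second default are hypothetical. The sketch assumes GNU coreutils `timeout` is installed and that a hung `umount` can be interrupted by the `SIGTERM` that `timeout` sends.

```bash
#!/bin/bash
# Hypothetical helper (not part of BeeGFS): try a normal unmount with a
# time limit, then fall back to a lazy unmount if it fails or times out.
# Assumes GNU coreutils timeout(1) is installed.
unmount_with_fallback() {
  local mnt="$1"
  local secs="${2:-30}"  # hypothetical default grace period, in seconds

  # A plain umount can block indefinitely while the management server is
  # down, so bound it. timeout(1) sends SIGTERM on expiry and exits 124.
  if timeout "$secs" umount "$mnt"; then
    return 0
  fi

  echo "umount ${mnt} failed or timed out, falling back to umount -l" >&2
  # Lazy unmount detaches the mount point now; the kernel finishes the
  # cleanup once the file system is no longer busy.
  umount -l "$mnt"
}
```

In the stop loop this would replace the bare `umount`, for example `unmount_with_fallback "${mnt}" 30`. Trying the normal unmount first leaves the clean path unchanged when the management server is reachable. The trade-off of `umount -l` is that processes with open files on the mount keep their references, so the file system lingers in the kernel until they exit.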
Describe alternatives you've considered
- `umount -f` can also force the unmount, but it might crash other processes.
- Use a longer `TimeoutStopSec` for the client service.
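For the `TimeoutStopSec` alternative, a standard systemd drop-in can raise the stop timeout without editing the packaged unit file. A sketch, run as root. The drop-in file name `timeout.conf` is an assumption and 300 seconds is an arbitrary example value. Note that this only delays the timeout if the unmount never completes.

```bash
#!/bin/sh
# Create the drop-in directory for the unit (standard systemd layout).
mkdir -p /etc/systemd/system/beegfs-client.service.d
# Raise the stop timeout; 300 s is an arbitrary example value.
cat > /etc/systemd/system/beegfs-client.service.d/timeout.conf <<'EOF'
[Service]
TimeoutStopSec=300
EOF
# Make systemd pick up the new drop-in.
systemctl daemon-reload
```

`systemctl edit beegfs-client` creates an equivalent override interactively.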