-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Closed
Description
ISSUE TYPE
Bug Report
COMPONENT NAME
Management, Storage, KVM
CLOUDSTACK VERSION
4.17.2.0
CONFIGURATION
1 X Management Server
2 X Primary Storage´s
1 X Secondary Storage
3 X KVM Server.
Advanced Network - Default Security Group.
OS / ENVIRONMENT
Both Management, KVM and Storage´s run Rocklinux 8.7
SUMMARY
Recurring volume snapshots fails and times out after 3600 seconds.
STEPS TO REPRODUCE
Scheduler DATA Volume Snapshot VM (1TB) while VM is online.
LOG HERE
2023-06-05 15:56:57,212 DEBUG [c.c.s.s.SnapshotSchedulerImpl] (SnapshotPollTask:ctx-f75b0142) (logid:1000c112) Got 0 snapshots to be executed at 2023-06-05 18:56:57 GMT
2023-06-05 15:56:59,143 DEBUG [o.a.c.s.d.d.CloudStackPrimaryDataStoreDriverImpl] (Work-Job-Executor-1:ctx-b15acf5a job-15633/job-15636 ctx-a16b6ca1) (logid:8443133e) Failed to take snapshot: 375
at org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.takeSnapshot(SnapshotServiceImpl.java:208)
at org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.takeSnapshot(DefaultSnapshotStrategy.java:439)
at com.cloud.storage.snapshot.SnapshotManagerImpl.takeSnapshot(SnapshotManagerImpl.java:1240)
2023-06-05 15:56:59,147 DEBUG [o.a.c.s.s.SnapshotServiceImpl] (Work-Job-Executor-1:ctx-b15acf5a job-15633/job-15636 ctx-a16b6ca1) (logid:8443133e) create snapshot FileServer_DATA-414_20230605175657 failed: com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:15, com.cloud.exception.OperationTimedoutException: Commands 4851502698584866841 to Host 15 timed out after 3600
2023-06-05 15:56:59,173 DEBUG [o.a.c.s.s.DefaultSnapshotStrategy] (Work-Job-Executor-1:ctx-b15acf5a job-15633/job-15636 ctx-a16b6ca1) (logid:8443133e) Failed to take snapshot: com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:15, com.cloud.exception.OperationTimedoutException: Commands 4851502698584866841 to Host 15 timed out after 3600
2023-06-05 15:56:59,188 DEBUG [c.c.s.s.SnapshotManagerImpl] (Work-Job-Executor-1:ctx-b15acf5a job-15633/job-15636 ctx-a16b6ca1) (logid:8443133e) Failed to create snapshotcom.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:15, com.cloud.exception.OperationTimedoutException: Commands 4851502698584866841 to Host 15 timed out after 3600
2023-06-05 15:56:59,200 DEBUG [c.c.r.ResourceLimitManagerImpl] (Work-Job-Executor-1:ctx-b15acf5a job-15633/job-15636 ctx-a16b6ca1) (logid:8443133e) Updating resource Type = snapshot count for Account = 219 Operation = decreasing Amount = 1
2023-06-05 15:56:59,232 ERROR [o.a.c.s.v.VolumeServiceImpl] (Work-Job-Executor-1:ctx-b15acf5a job-15633/job-15636 ctx-a16b6ca1) (logid:8443133e) Take snapshot: 1191 failed
at org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.takeSnapshot(DefaultSnapshotStrategy.java:442)
at com.cloud.storage.snapshot.SnapshotManagerImpl.takeSnapshot(SnapshotManagerImpl.java:1240)
at org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.takeSnapshot(DefaultSnapshotStrategy.java:442)
at com.cloud.storage.snapshot.SnapshotManagerImpl.takeSnapshot(SnapshotManagerImpl.java:1240)
2023-06-05 15:56:59,303 ERROR [o.a.c.a.c.u.s.CreateSnapshotCmd] (API-Job-Executor-1:ctx-225af267 job-15633 ctx-cb317a27) (logid:8443133e) Failed to create snapshot due to an internal error creating snapshot for volume d2f310f4-af8e-41fa-a94d-a1863d566503
at org.apache.cloudstack.api.command.user.snapshot.CreateSnapshotCmd.execute(CreateSnapshotCmd.java:219)
at org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.takeSnapshot(DefaultSnapshotStrategy.java:442)
at com.cloud.storage.snapshot.SnapshotManagerImpl.takeSnapshot(SnapshotManagerImpl.java:1240)
2023-06-05 15:56:59,322 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-225af267 job-15633) (logid:8443133e) Complete async job-15633, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":"530","errortext":"Failed to create snapshot due to an internal error creating snapshot for volume d2f310f4-af8e-41fa-a94d-a1863d566503"}
2023-06-05 15:56:59,328 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-225af267 job-15633) (logid:8443133e) Done executing org.apache.cloudstack.api.command.user.snapshot.CreateSnapshotCmd for job-15633
EXPECTED RESULTS
Volume snapshot is created and transported to secondary storage.
ACTUAL RESULTS
Volume DATA snapshot process can successfully create VM Snapshot.
Checking the logs, the operation is being timeout after 3600 seconds.
I noticed that even getting the error in the ACS log and in the UI.
The process is interrupted by the ACS in 3600 seconds and does not copy the data to the secondary storage.
Even receiving the error, the snapshot of the volume runs in the background, but is not transferred, giving the impression that the hypervisor continued to execute what was requested in that step, when it received the timeout it does not execute the next steps to complete the request .
SNAPSHOTS
virsh snapshot-list i-219-414-VM
Name Creation Time State
-----------------------------------------------------------------------------------
19290884-7bd7-4b29-863a-e2400f4db35f 2023-06-03 14:23:01 -0300 disk-snapshot
3de38bcb-8e2c-4616-9ade-66524eae41b7 2023-05-27 00:08:01 -0300 disk-snapshot
40a49d8b-294b-4139-97d1-3fb3d5318b5e 2023-05-20 00:08:01 -0300 disk-snapshot
4561cfde-8e21-495c-878a-e3511b3a1b0b 2023-05-27 00:03:01 -0300 disk-snapshot
60f9e8ea-a15c-4c2d-a5f9-320670c44c48 2023-06-05 14:56:59 -0300 disk-snapshot
6cb5b473-9b47-42bb-90ce-99871ed42517 2023-06-03 17:44:44 -0300 disk-snapshot
6d9f0c1a-15ff-4e9c-b739-95406ae86649 2023-05-15 08:15:40 -0300 disk-snapshot
6e4efb0b-f17e-4488-b1a4-3b2cc8053c44 2023-05-06 00:00:38 -0300 disk-snapshot
94d85369-9914-4ed2-9263-04823f992c38 2023-06-05 12:03:54 -0300 disk-snapshot
9e3466e2-5479-4abd-8003-f37336234799 2023-06-03 12:18:01 -0300 disk-snapshot
a722a007-0a8f-447c-96df-14e38360dfc5 2023-05-08 10:20:38 -0300 disk-snapshot
af3236fc-a865-4916-bfcf-3433de227a70 2023-06-03 00:03:01 -0300 disk-snapshot
b54669a5-d7bc-46c0-9fda-f7cf822ad512 2023-05-15 08:20:40 -0300 disk-snapshot
cb237c76-0adb-4288-8a0d-699ff4e63d92 2023-06-03 00:12:11 -0300 disk-snapshot
d61caeaa-5a0c-4b5e-852b-6069a5136bf3 2023-05-20 00:03:01 -0300 disk-snapshot
/mnt/a6c33213-6d0f-365a-845c-021a5b5c2aed/snapshots
drwxr-xr-x. 2 root root 4.0K Jun 5 14:56 .
drwxr-xr-x. 4 root root 4.0K Jun 5 14:56 ..
-rw-r--r--. 1 root root 217G Jun 3 16:26 19290884-7bd7-4b29-863a-e2400f4db35f
-rw-r--r--. 1 root root 243G Jun 5 17:08 60f9e8ea-a15c-4c2d-a5f9-320670c44c48
-rw-r--r--. 1 root root 217G Jun 3 19:46 6cb5b473-9b47-42bb-90ce-99871ed42517
-rw-r--r--. 1 root root 278G Jun 5 14:37 94d85369-9914-4ed2-9263-04823f992c38
-rw-r--r--. 1 root root 217G Jun 3 14:21 9e3466e2-5479-4abd-8003-f37336234799
-rw-r--r--. 1 root root 217G Jun 3 02:14 cb237c76-0adb-4288-8a0d-699ff4e63d92
GLOBAL SETTINGS
job.cancel.threshold.minutes Time (in minutes) for async-jobs to be forcely cancelled if it has been in process for long Advanced default(60) set(1440)
copy.volume.wait In second, timeout for copy volume command Storage default(10800) set(86400)
backup.snapshot.wait In second, timeout for BackupSnapshotCommand Storage default(21600) set(86400)