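Issue and error:
After reverting to a snapshot, the SPICE console shows that the VM has rebooted instead of restoring the memory state saved in the snapshot.
Logs 1 and 2 both contain the following error: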
2018-02-12 19:39:23,830+0800 ERROR (vm/d7be0fde) [virt.vm] (vmId='d7be0fde-f9b9-4447-a250-2453482faef9') The vm start process failed (vm:662)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 607, in _startUnderlyingVm
    self._completeIncomingMigration()
  File "/usr/share/vdsm/virt/vm.py", line 3268, in _completeIncomingMigration
    self.cont()
  File "/usr/share/vdsm/virt/vm.py", line 1128, in cont
    self._underlyingCont()
  File "/usr/share/vdsm/virt/vm.py", line 3368, in _underlyingCont
    self._dom.resume()
  File "/usr/lib/python2.7/dist-packages/vdsm/virt/virdomain.py", line 69, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/vdsm/utils.py", line 926, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1905, in resume
    if ret == -1: raise libvirtError ('virDomainResume() failed', dom=self)
libvirtError: Requested operation is not valid: domain is already running
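Investigation:
The problem is on the vdsm side. In vdsm, creating a snapshot and suspending (hibernating) a VM work on the same principle and share the same start-up flow, yet a VM that is started after being suspended comes up correctly. The execution flows are as follows.
snapshot flow:
hibernate flow:
VM start flow:
API->prepare->_startUnderlyingVm->_completeIncomingMigration->self._dom.resume->…etc->return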
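Creating a snapshot and suspending a VM do not, however, save memory and CPU state in the same way. Before saving the CPU state, the suspend path calls self._dom.suspend to pause the VM, whereas snapshot creation only freezes the file systems. The reason snapshot creation does not pause the VM is given in the vdsm comment in [1]: the community assumes that libvirt will pause the VM, so vdsm performs no pause of its own.
As a result, the call to self._dom.resume in the VM start flow fails with "domain is already running". The cause is that libvirt saves the CPU state first and only then checks it, pausing the VM if it is running. The saved CPU state is therefore still running, so after the snapshot is restored the domain is already running and the subsequent resume call raises the error.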
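The difference between the two save paths can be sketched with a small simulation. This is only an illustrative model, not vdsm or libvirt code: FakeDomain, DomainAlreadyRunning, save_state and restore_and_resume are hypothetical names, and the only behavior borrowed from the log above is that resuming a running domain raises an error.

```python
class DomainAlreadyRunning(Exception):
    """Hypothetical stand-in for the libvirtError seen in the log."""


class FakeDomain:
    """Hypothetical stand-in for a libvirt domain with a CPU run state."""

    def __init__(self, state="running"):
        self.state = state

    def suspend(self):
        self.state = "paused"

    def resume(self):
        # Mirrors the reported failure: resuming a running domain is invalid.
        if self.state == "running":
            raise DomainAlreadyRunning(
                "Requested operation is not valid: domain is already running")
        self.state = "running"


def save_state(dom, pause_first):
    """Return the CPU state that ends up in the saved image."""
    if pause_first:
        dom.suspend()  # suspend path: the VM is paused before saving
    return dom.state   # snapshot path: state is captured while still running


def restore_and_resume(saved_state):
    """Restore a domain from the saved state, then resume it as the start flow does."""
    dom = FakeDomain(state=saved_state)
    dom.resume()
    return dom.state


hibernate_image = save_state(FakeDomain(), pause_first=True)
snapshot_image = save_state(FakeDomain(), pause_first=False)

print("hibernate restore:", restore_and_resume(hibernate_image))
try:
    restore_and_resume(snapshot_image)
except DomainAlreadyRunning as exc:
    print("snapshot restore failed:", exc)
```

In this model the image saved on the suspend path holds a paused state, so the resume in the start flow succeeds, while the image saved on the snapshot path holds a running state and the same resume fails.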
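Actual change:
The change is in [2]. When creating a snapshot, it is enough to pause the VM before the CPU state is saved and to resume it once the remaining operations have finished.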
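The shape of this fix might look roughly like the sketch below. It is not the code from [2]: FakeDomain, its save_memory_snapshot method, paused and snapshot_with_pause are hypothetical names used only for illustration, and wrapping the resume in try/finally (so the guest is not left paused if the snapshot fails) is an assumption rather than something the change is known to do.

```python
from contextlib import contextmanager


class FakeDomain:
    """Hypothetical stand-in for a libvirt domain; records the calls made on it."""

    def __init__(self):
        self.state = "running"
        self.calls = []

    def suspend(self):
        self.calls.append("suspend")
        self.state = "paused"

    def resume(self):
        self.calls.append("resume")
        self.state = "running"

    def save_memory_snapshot(self):
        # Hypothetical method: returns the CPU state captured in the image.
        self.calls.append("snapshot")
        return self.state


@contextmanager
def paused(dom):
    """Pause dom for the duration of the block, then resume it."""
    dom.suspend()
    try:
        yield dom
    finally:
        # Assumption: resume even on failure so the guest is not left paused.
        dom.resume()


def snapshot_with_pause(dom):
    """Take the snapshot while the domain is paused."""
    with paused(dom):
        return dom.save_memory_snapshot()


dom = FakeDomain()
saved_state = snapshot_with_pause(dom)
print(saved_state, dom.state, dom.calls)
```

With the pause in place, the state captured in the image is paused, so the resume issued by the start flow after a revert is a valid operation, and the live VM is running again once snapshot creation returns.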
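Test results:
After the change, several new snapshots were created and five revert operations were performed. The VM status was reported correctly every time, the display was normal after connecting over SPICE, and memory was restored correctly.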