CloudStack: fixing an SSVM and CPVM that cannot be deleted after a host crash


Background:

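The physical host running the SSVM and CPVM crashed. Afterwards both system VMs still showed a state of Running, and their host was still listed as the crashed machine. I brought the physical host back up, logged in, and ran virsh list --all to check whether the SSVM and CPVM were really running. They were not. I then checked every other physical host and still found no SSVM or CPVM domain anywhere. The CloudStack UI nevertheless kept showing both VMs as Running on that host, even though their IP addresses could not be pinged. I tried to delete the SSVM and CPVM, but that failed, and even the stop operation failed. Oddly, creating new instances still worked without problems. This looks like a serious bug!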

Log file: /var/log/cloudstack/management/management-server.log
2013-12-17 21:33:26,525 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-130:job-130) Executing org.apache.cloudstack.api.command.admin.systemvm.DestroySystemVmCmd for job-130
2013-12-17 21:33:26,527 DEBUG [cloud.api.ApiServlet] (catalina-exec-9:null) ===END===  10.200.251.246 -- GET  command=destroySystemVm&id=94576696-a734-459b-b697-9ade8d616e68&response=json&sessionkey=yY8M0StWM6ohsnSO3nhPZGj7xKk%3D&_=1387333995495
2013-12-17 21:33:26,612 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-130:job-130) VM state transitted from :Running to Stopping with event: StopRequestedvm's original host id: 1 new host id: 1 host id before state transition: 1
2013-12-17 21:33:26,618 WARN  [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-130:job-130) Unable to stop vm, agent unavailable: com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Host with specified id is not in the right state: Disconnected
2013-12-17 21:33:26,618 WARN  [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-130:job-130) Unable to stop vm VM[SecondaryStorageVm|s-1-VM]
2013-12-17 21:33:26,628 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-130:job-130) VM state transitted from :Stopping to Running with event: OperationFailedvm's original host id: 1 new host id: 1 host id before state transition: 1
2013-12-17 21:33:26,628 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-130:job-130) Unable to stop the VM so we can't expunge it.
2013-12-17 21:33:26,628 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-130:job-130) Unable to destroy the vm because it is not in the correct state: VM[SecondaryStorageVm|s-1-VM]
2013-12-17 21:33:26,628 INFO  [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-130:job-130) Did not expunge VM[SecondaryStorageVm|s-1-VM]
2013-12-17 21:33:26,640 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-130:job-130) Complete async job-130, jobStatus: 2, resultCode: 530, result: Error Code: 530 Error text: Fail to destroy system vm
2013-12-17 21:33:26,728 DEBUG [agent.transport.Request] (StatsCollector-1:null) Seq 15-1464552034: Received:  { Ans: , MgmtId: 345051385634, via: 15, Ver: v1, Flags: 10, { GetHostStatsAnswer } }
2013-12-17 21:33:27,100 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-13:null) Ping from 8
2013-12-17 21:33:27,235 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-9:null) Ping from 14
2013-12-17 21:33:27,454 DEBUG [agent.transport.Request] (AgentManager-Handler-8:null) Seq 8-1342917711: Processing:  { Ans: , MgmtId: 345051385634, via: 8, Ver: v1, Flags: 10, [{"Answer":{"result":false,"details":"timeout","wait":0}}] }
2013-12-17 21:33:27,455 DEBUG [agent.transport.Request] (AgentManager-Handler-12:null) Seq 8-1342917712: Processing:  { Ans: , MgmtId: 345051385634, via: 8, Ver: v1, Flags: 10, [{"Answer":{"result":false,"details":"timeout","wait":0}}] }
2013-12-17 21:33:27,455 DEBUG [agent.transport.Request] (AgentTaskPool-3:null) Seq 8-1342917711: Received:  { Ans: , MgmtId: 345051385634, via: 8, Ver: v1, Flags: 10, { Answer } }
2013-12-17 21:33:27,455 DEBUG [cloud.ha.AbstractInvestigatorImpl] (AgentTaskPool-3:null) host (10.196.53.73) cannot be pinged, returning null ('I don't know')
2013-12-17 21:33:27,455 DEBUG [cloud.ha.UserVmDomRInvestigator] (AgentTaskPool-3:null) sending ping from (9) to agent's host ip address (10.196.53.73)
2013-12-17 21:33:27,455 DEBUG [agent.transport.Request] (AgentTaskPool-16:null) Seq 8-1342917712: Received:  { Ans: , MgmtId: 345051385634, via: 8, Ver: v1, Flags: 10, { Answer } }
2013-12-17 21:33:27,455 DEBUG [cloud.ha.AbstractInvestigatorImpl] (AgentTaskPool-16:null) host (10.196.53.74) cannot be pinged, returning null ('I don't know')
2013-12-17 21:33:27,455 DEBUG [cloud.ha.UserVmDomRInvestigator] (AgentTaskPool-16:null) sending ping from (9) to agent's host ip address (10.196.53.74)
2013-12-17 21:33:27,460 DEBUG [agent.transport.Request] (AgentTaskPool-3:null) Seq 9-241192500: Sending  { Cmd , MgmtId: 345051385634, via: 9, Ver: v1, Flags: 100011, [{"PingTestCommand":{"_computingHostIp":"10.196.53.73","wait":20}}] }
2013-12-17 21:33:27,461 DEBUG [agent.transport.Request] (AgentTaskPool-16:null) Seq 9-241192501: Sending  { Cmd , MgmtId: 345051385634, via: 9, Ver: v1, Flags: 100011, [{"PingTestCommand":{"_computingHostIp":"10.196.53.74","wait":20}}] }
2013-12-17 21:33:27,585 DEBUG [agent.transport.Request] (StatsCollector-1:null) Seq 16-1532317381: Received:  { Ans: , MgmtId: 345051385634, via: 16, Ver: v1, Flags: 10, { GetHostStatsAnswer } }
2013-12-17 21:33:27,890 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-1:null) Ping from 11

Key message:
Unable to destroy the vm because it is not in the correct state: VM[SecondaryStorageVm|s-1-VM]    
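The relevant lines are easy to miss among the agent pings and stats messages. As a rough sketch (the function name `job_problems` and the keyword filters are my own choices, not part of CloudStack), a few lines of Python can pull out only the warnings and state transitions that belong to one async job, assuming the log keeps the `(Job-Executor-130:job-130)` thread tag format shown above:

```python
import re

# Matches the "(thread-name:job-NNN)" tag seen in management-server.log,
# e.g. "(Job-Executor-130:job-130)" or "(catalina-exec-9:null)".
JOB_TAG = re.compile(r"\((?P<thread>[^:()]+):(?P<job>job-\d+|null)\)")


def job_problems(lines, job_id):
    """Return warnings, failures and VM state transitions logged for one job."""
    hits = []
    for line in lines:
        match = JOB_TAG.search(line)
        if match is None or match.group("job") != job_id:
            continue  # line belongs to another job or has no job tag
        if (" WARN " in line
                or "VM state transitted" in line
                or "Unable to" in line):
            hits.append(line.rstrip("\n"))
    return hits


# Possible usage against the real log file:
#   with open("/var/log/cloudstack/management/management-server.log") as f:
#       for hit in job_problems(f, "job-130"):
#           print(hit)
```

For job-130 above, this would keep the Running to Stopping to Running transitions together with the "agent unavailable" and "Unable to ..." messages, which is the whole story of the failed destroy.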
Database information:
mysql> SELECT * FROM host WHERE  name like '%s-1-VM%'\G  -- the system VM's entry in the host table
*************************** 1. row ***************************
                  id: 21
                name: s-1-VM
                uuid: 986db967-13a9-48ca-815b-c41d6951a3f3
              status: Disconnected
                type: SecondaryStorageVM
  private_ip_address: 10.196.53.74
     private_netmask: 255.255.255.0
 private_mac_address: 06:51:e0:00:00:07
  storage_ip_address: 10.196.53.82
     storage_netmask: 255.255.255.0
 storage_mac_address: 06:51:e0:00:00:07
storage_ip_address_2: NULL
storage_mac_address_2: NULL
   storage_netmask_2: NULL
          cluster_id: NULL
   public_ip_address: 10.196.53.76
      public_netmask: 255.255.255.0
  public_mac_address: 06:e0:2c:00:00:0e
          proxy_port: NULL
      data_center_id: 1
              pod_id: 1
                cpus: NULL
               speed: NULL
                 url: NoIqn
             fs_type: NULL
     hypervisor_type: NULL
  hypervisor_version: NULL
                 ram: 0
            resource: NULL
             version: 4.1.1
              parent: NULL
          total_size: NULL
        capabilities: NULL
                guid: s-1-VM-NfsSecondaryStorageResource
           available: 1
               setup: 0
         dom0_memory: 0
           last_ping: 1354828061
      mgmt_server_id: 345051385634
        disconnected: NULL
             created: 2013-12-18 05:18:54
             removed: NULL
        update_count: 2
      resource_state: Enabled
               owner: NULL
         lastUpdated: NULL
        engine_state: Disabled
1 row in set (0.00 sec)
mysql> SELECT * FROM vm_instance WHERE  name like '%s-1-VM%'\G -- the system VM's entry in the vm_instance table; the CloudStack UI reads the state of instances and system VMs from the state column of this table
*************************** 1. row ***************************
                id: 22
              name: s-1-VM
              uuid: 8bd3ab0c-a431-4dd2-85a7-013921427f6a
     instance_name: s-1-VM
             state: Running
    vm_template_id: 3
       guest_os_id: 15
private_mac_address: 06:51:e0:00:00:07
private_ip_address: 10.196.53.74
            pod_id: 1
    data_center_id: 1
           host_id: 15
      last_host_id: 15
          proxy_id: 55
 proxy_assign_time: 2013-12-18 05:20:52
      vnc_password: VoRRPovUk7w7/+islEFf9Ai0tbTep0WOJJod0PLOJkU=
        ha_enabled: 0
     limit_cpu_use: 0
      update_count: 3
       update_time: 2013-12-18 05:18:59
           created: 2013-12-18 05:17:04
           removed: NULL
              type: SecondaryStorageVm
           vm_type: SecondaryStorageVm
        account_id: 1
         domain_id: 1
service_offering_id: 9
    reservation_id: a2a55809-abfa-4b6e-92f8-105cf8bef2a8
   hypervisor_type: KVM
  disk_offering_id: NULL
               cpu: NULL
               ram: NULL
             owner: NULL
             speed: NULL
         host_name: NULL
      display_name: NULL
     desired_state: NULL
1 row in set (0.01 sec)
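The crux of the problem
lies in two database fields (highlighted in red in the original post): the status column in the host table shows Disconnected, while the state column in the vm_instance table shows Running. The CloudStack UI accordingly shows both system VMs as Running.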
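The same comparison can be written down as a small check. This is only a sketch: the function name `stale_system_vms` is hypothetical, the rows are assumed to be already fetched as dictionaries (for example with a dictionary cursor), and I am assuming that `Up` is the status value a healthy entry in the host table carries.

```python
def stale_system_vms(vm_rows, host_rows):
    """Return names of VMs that vm_instance marks as Running while the
    entry with the same name in the host table is not Up.

    vm_rows:   dicts with the vm_instance columns name, state, removed
    host_rows: dicts with the host columns name, status, removed
    """
    # Index the live host entries by name (s-1-VM, v-2-VM, ...).
    host_status = {
        row["name"]: row["status"]
        for row in host_rows
        if row.get("removed") is None
    }
    stale = []
    for vm in vm_rows:
        if vm.get("removed") is not None:
            continue  # already removed, nothing to fix
        status = host_status.get(vm["name"])
        if status is None:
            continue  # no matching host entry, cannot judge
        if vm["state"] == "Running" and status != "Up":  # "Up" is an assumption
            stale.append(vm["name"])
    return stale
```

With the two rows shown above (state Running in vm_instance, status Disconnected in host), the function reports s-1-VM as stale.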
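Solution:
Anyone familiar with these two system VMs knows how resilient they are: once deleted, they are recreated automatically, and a faulty SSVM or CPVM is usually fixed by deleting it and letting it be rebuilt. Since they cannot be deleted from the UI, change the corresponding field directly in the database and set their state to Destroyed.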
UPDATE vm_instance SET state='Destroyed'  WHERE name='s-1-VM';
UPDATE vm_instance SET state='Destroyed'  WHERE name='v-2-VM';
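If the change is scripted rather than typed into the mysql prompt, a slightly more defensive variant of the two statements above may be worth considering. The sketch below is an illustration under assumptions, not an official procedure: `mark_destroyed` is a hypothetical helper, `conn` is any Python DB-API connection to the CloudStack database, and the `%s` placeholder style matches common MySQL drivers (other drivers may need a different one). It only touches rows that are still marked Running and not yet removed, and it returns how many rows changed so the result can be checked.

```python
def mark_destroyed(conn, names, placeholder="%s"):
    """Set state='Destroyed' for the named VMs, but only for rows that are
    still marked Running and have not been removed. Returns the row count."""
    sql = (
        "UPDATE vm_instance SET state='Destroyed' "
        "WHERE name = {p} AND state = 'Running' AND removed IS NULL"
    ).format(p=placeholder)
    cur = conn.cursor()
    changed = 0
    for name in names:
        cur.execute(sql, (name,))  # name is passed as a bound parameter
        changed += cur.rowcount
    conn.commit()
    return changed


# Possible usage, with conn opened by the MySQL driver of your choice:
#   changed = mark_destroyed(conn, ["s-1-VM", "v-2-VM"])
#   print(changed)  # expected to be 2 if both rows were stuck in Running
```

The extra `state = 'Running'` condition means that running the script a second time changes nothing.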
Then go back to the CloudStack UI and check.
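Once the system detects that both original system VMs are in the Destroyed state, it starts building a new SSVM and CPVM. Wait until both show Running, and the system is back to normal.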

