OpenStack Juno instance migration and resize: passwordless SSH between nova compute nodes

Resize is almost identical to migration, except that it also applies a new flavor; both go through the scheduling process. Only migration is covered below; once migration works, resize works as well.

I. Live migration requires shared storage, such as an NFS storage backend

II. Cold-migrating an instance (Dashboard -> Migrate Instance, # nova migrate instance_id)

== The Dashboard's Migrate Instance action (cold migration; the instance is shut down) simply runs `nova migrate`. It is SSH-based and cannot target a specific host (the destination is chosen by nova-scheduler).

== The Dashboard's Live Migrate Instance action (os-migrateLive) runs `nova live-migration`. It uses qemu+tcp and can target a specific host; --block-migrate defaults to false. Live migration needs shared storage, or block migration can be used when there is none. In addition, releases before Kilo require a nova.conf change to enable live migration.

The migration types are:
  A.Non-live migration (sometimes referred to simply as ‘migration’). 
The instance is shut down for a period of time to be moved to another hypervisor. In this case, the instance recognizes that it was rebooted.
  B.Live migration (or ‘true live migration’). 
Almost no instance downtime. Useful when the instances must be kept running during the migration. The different types of live migration are:
    a.Shared storage-based live migration. 
Both hypervisors have access to shared storage.
    b.Block live migration. 
No shared storage is required. Incompatible with read-only devices such as CD-ROMs and Configuration Drive (config_drive).
    c.Volume-backed live migration. 
Instances are backed by volumes rather than ephemeral disk, no shared storage is required, and migration is supported 
(currently only available for libvirt-based hypervisors).
Enabling true live migration:
  Prior to the Kilo release, the Compute service did not use the libvirt live migration function by default, because of the risk
that the migration would never finish if the guest operating system dirtied disk blocks faster than they could be migrated.
To enable this function, add the following line to the [libvirt] section of the nova.conf file:
live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_TUNNELLED
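As a rehearsal, the flag can be inserted into the `[libvirt]` section non-interactively. The sketch below operates on a throwaway stand-in file (`TMPCONF` and its contents are illustrative, not a real nova.conf); point it at `/etc/nova/nova.conf` (after backing it up) on a real node.

```shell
# Sketch: append live_migration_flag right after the [libvirt] section header.
# TMPCONF stands in for /etc/nova/nova.conf so the edit can be tried safely.
TMPCONF="$(mktemp)"
printf '[DEFAULT]\n\n[libvirt]\nvirt_type = kvm\n' > "$TMPCONF"   # minimal stand-in config
# GNU sed: insert the flag directly after the [libvirt] header line
sed -i '/^\[libvirt\]$/a live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_TUNNELLED' "$TMPCONF"
grep -A1 '^\[libvirt\]' "$TMPCONF"
```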

1. The Dashboard reports an error immediately: nova cannot SSH to the other compute node to create the directory that will hold the migrated instance's files:

Success: Scheduled migration (pending confirmation) of Instance: migration-test1
Error: Failed to launch instance "migration-test1": Please try again later [Error: Unexpected error while running command. 
Command: ssh 192.168.1.123 mkdir -p /var/lib/nova/instances/d8db2011-217b-433d-aa80-06230203a834 
Exit code: 255 Stdout: u'' Stderr: u'Host key verification failed.\r\n']. 

2. The cause is that the nova account used for SSH has a nologin shell by default:

# cat /etc/passwd | grep nova

nova:x:162:162:OpenStack Nova Daemons:/var/lib/nova:/sbin/nologin

3. On every compute node, give the nova user a login shell, generate a key pair, and copy the public key to all the other compute nodes

# usermod -s /bin/bash nova // change the nova user's shell
# passwd nova // set a password for nova
# su - nova // switch to nova
-bash-4.2$ ssh-keygen // generate a key pair

Generating public/private rsa key pair.
Enter file in which to save the key (/var/lib/nova/.ssh/id_rsa): // just press Enter; the default file name id_rsa must be used
Created directory '/var/lib/nova/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /var/lib/nova/.ssh/id_rsa.
Your public key has been saved in /var/lib/nova/.ssh/id_rsa.pub.
The key fingerprint is:
53:ef:8d:99:15:4f:d9:fd:a3:17:c8:e4:f8:89:44:b4 nova@compute1
The key's randomart image is:
+--[ RSA 2048]----+
|           .     |
|          . .   +|
|          .E ...+|
|         ...= .+.|
|        S  o.+.oo|
|         ...o*o o|
|           .=+.. |
|              .  |
|                 |
+-----------------+
-bash-4.2$ ssh-copy-id -i /var/lib/nova/.ssh/id_rsa.pub compute2 // append the public key to /var/lib/nova/.ssh/authorized_keys on the target compute node
nova@compute2's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'compute2'"
and check to make sure that only the key(s) you wanted were added.
If ssh-copy-id is reported as command not found, copy the key over manually instead:
-bash-4.2$ cat ~/.ssh/id_rsa.pub | ssh nova@compute2 'cat >> ~/.ssh/authorized_keys'
which is equivalent to running these two commands:
(1) on the local machine: scp ~/.ssh/id_rsa.pub nova@compute2:~
(2) on the remote machine: cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
Also remember the permissions on the .ssh directory and the authorized_keys file (set the directory first, then the file, so the recursive change does not clobber the 600 mode):
-bash-4.2$ chmod 700 .ssh
-bash-4.2$ chmod 600 .ssh/authorized_keys
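The key-pair creation and permission fixes from step 3 can be rehearsed in an isolated directory. In this sketch, `NOVA_HOME` is a throwaway stand-in for `/var/lib/nova`, and we authorize our own public key purely to show the file format (on a real node, ssh-copy-id fills authorized_keys with the other nodes' keys):

```shell
# Rehearsal of the per-node key setup; nothing on the real system is touched.
NOVA_HOME="$(mktemp -d)"
mkdir -p "$NOVA_HOME/.ssh"
# Generate an unencrypted RSA key pair; the file must keep the default name id_rsa
ssh-keygen -q -t rsa -N '' -f "$NOVA_HOME/.ssh/id_rsa"
# Demonstrate the authorized_keys format by appending a public key line
cat "$NOVA_HOME/.ssh/id_rsa.pub" >> "$NOVA_HOME/.ssh/authorized_keys"
# sshd ignores keys in group/world-accessible files, so tighten permissions
chmod 700 "$NOVA_HOME/.ssh"
chmod 600 "$NOVA_HOME/.ssh/authorized_keys"
ls -l "$NOVA_HOME/.ssh"
```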




Running the migration from the Dashboard again now fails with a different error:

Error: Failed to launch instance "migration-test1": Please try again later [Error: Unexpected error while running command. 
Command: ssh 192.168.1.123 mkdir -p /var/lib/nova/instances/d8db2011-217b-433d-aa80-06230203a834 
Exit code: 255 Stdout: u'' Stderr: u'Permission denied, please try again.\r\nPermission denied, please try again.\r\nR]. 
and ssh compute2 from the current compute node still prompts for a password.

4. On each compute node, add nova's private key to ssh-agent

-bash-4.2$ eval `ssh-agent` // evaluate ssh-agent's output first; otherwise ssh-add fails with "Could not open a connection to your authentication agent"

Agent pid 40697

From the bash man page:
eval [arg ...]
              The args are read and concatenated together into a single command.  This command is then read and executed by the shell, and its exit status is
              returned as the value of eval.  If there are no args, or only null arguments, eval returns 0.
-bash-4.2$ ssh-add .ssh/id_rsa // add the private key to ssh-agent

-bash-4.2$ ssh-add -l // list the keys that have been added

-bash-4.2$ ssh compute2 // test: logging in to compute node 2 now works without a password
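The agent workflow in step 4 can be tried end-to-end with a throwaway key and a throwaway agent, without touching the nova account or any remote node:

```shell
# Try the ssh-agent workflow in isolation.
KEY="$(mktemp -u)"                    # path for a temporary key pair
ssh-keygen -q -t rsa -N '' -f "$KEY"
eval "$(ssh-agent)" > /dev/null       # start an agent; exports SSH_AUTH_SOCK and SSH_AGENT_PID
ssh-add "$KEY" 2> /dev/null           # load the private key into the agent
LOADED="$(ssh-add -l | wc -l)"        # count the identities the agent now holds
ssh-agent -k > /dev/null              # kill the throwaway agent again
echo "keys loaded: $LOADED"
```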

5. (Optional) Configure nova's SSH client not to record or check hosts in known_hosts. This avoids having to delete a stale known_hosts entry when the same IP later belongs to a different system or machine.

-bash-4.2$ vim .ssh/config

Host *
    StrictHostKeyChecking no
    UserKnownHostsFile=/dev/null
a. StrictHostKeyChecking no: do not refuse the connection when a host key has changed, so there is no need to delete the old entry from known_hosts first.

b. UserKnownHostsFile=/dev/null: point known_hosts at an empty file, so host keys are never stored at all.
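One way to confirm the options take effect is `ssh -G` (available in newer OpenSSH), which prints the configuration ssh would use for a host without actually connecting. The host name compute2 and the temporary config path below are illustrative:

```shell
# Write the host-key options to a test config file and inspect the
# effective configuration with `ssh -G` (no connection is made).
CFG="$(mktemp)"
cat > "$CFG" <<'EOF'
Host *
    StrictHostKeyChecking no
    UserKnownHostsFile=/dev/null
EOF
ssh -G -F "$CFG" compute2 | grep -iE '^(stricthostkeychecking|userknownhostsfile)'
```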

6. Run the migration again from the Dashboard (Migrate Instance, # nova migrate instance_id)

The earlier failed migrations may have left the instance in the Error state, so reset it to Active first:

# nova reset-state --active d8db2011-217b-433d-aa80-06230203a834 // reset the instance state, on the controller node

# nova migrate --poll d8db2011-217b-433d-aa80-06230203a834 // run from the command line; the Dashboard shows the same progress

Server migrating... 100% complete
Finished

# tail -f /var/log/nova/nova-compute.log // on the current (source) compute node

2016-02-17 19:09:50.796 23958 INFO nova.compute.manager [req-e22d9e9a-96ce-4aa6-9027-cf29d0997fd4 None] 
  [instance: d8db2011-217b-433d-aa80-06230203a834] During sync_power_state the instance has a pending task (resize_migrating). Skip.
2016-02-17 19:09:51.030 23958 INFO nova.virt.libvirt.driver [req-019123e1-4f1d-4e3e-931a-ae528d193193 None] 
  [instance: d8db2011-217b-433d-aa80-06230203a834] Instance shutdown successfully after 3 seconds.

Note: the migration must be confirmed afterwards, although the instance is already back up and running before confirmation; the Revert option exists so the migration can still be undone. With that, the cold migration is complete.

The cold-migration flow is: shut the instance down, rename its directory to instanceid_resize, copy the instance files to the target node over SSH, update the database, start the instance on the new compute node, and show "Confirm" in the Dashboard; on confirmation the original instance files are deleted, and the qemu XML definition is removed on the old node and recreated on the new one (see the source code for the exact flow).
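The file-level part of that flow can be sketched locally. Paths and directory names below are illustrative stand-ins; in the real flow the copy happens over ssh/scp between nodes:

```shell
# Local simulation of the instance-directory handling during cold migration.
# SRC/DEST stand in for /var/lib/nova/instances on the source and target nodes.
SRC="$(mktemp -d)"; DEST="$(mktemp -d)"
UUID="d8db2011-217b-433d-aa80-06230203a834"
mkdir -p "$SRC/$UUID"
touch "$SRC/$UUID/disk" "$SRC/$UUID/disk.info" "$SRC/$UUID/console.log"
# 1. the source node renames the instance directory to <uuid>_resize
mv "$SRC/$UUID" "$SRC/${UUID}_resize"
# 2. the instance files are copied to the target node (ssh/scp in the real flow)
cp -r "$SRC/${UUID}_resize" "$DEST/$UUID"
# 3. on "Confirm", the source copy is deleted
rm -rf "$SRC/${UUID}_resize"
ls "$DEST/$UUID"
```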


Note 1. Migrations that failed before this setup can leave data out of sync: the instance is gone from the database, but the compute node still detects it. Every compute node synchronizes instance power states every 10 minutes; the old compute node's nova-compute.log shows:

2016-02-18 10:26:39.111 89998 WARNING nova.compute.manager [-] While synchronizing instance power states, 
  found 2 instances in the database and 3 instances on the hypervisor.
The instance's _resize directory is still present on the old node:
drwxr-xr-x 2 nova nova   69 2月  18 10:26 ca23a858-f0fc-406c-a14b-a25a47356361_resize
總用量 106G
-rw-rw---- 1 root root    0 1月  21 19:33 console.log
-rw-r--r-- 1 root root 106G 2月  18 09:49 disk
-rw-r--r-- 1 nova nova   79 11月 27 13:49 disk.info
-rw-r--r-- 1 nova nova 2.6K 1月  18 23:16 libvirt.xml
......
2016-02-18 16:35:34.071 89998 WARNING nova.compute.manager [-] While synchronizing instance power states, 
  found 0 instances in the database and 2 instances on the hypervisor.

# virsh list --all // list all instances known to the hypervisor, including shut-off ones

 Id    Name                           State
----------------------------------------------------
 -     instance-0000008a              shut off
 -     instance-0000008c              shut off

Since the instances have already been moved over, they can simply be undefined on the old node:

# virsh undefine instance-0000008a // delete the domain definition, i.e. remove the instance; after this, the sync warning about extra instances on the hypervisor stops. Done.

Note 2. As for the instance count on the (new node's) hypervisor: while an instance was in the Error state with power state nostate, it had not started automatically on the new node and had no libvirt.xml file, and the new node's nova-compute.log reported one instance missing:

2016-02-18 10:36:47.347 144832 WARNING nova.compute.manager [-] While synchronizing instance power states, 
  found 9 instances in the database and 8 instances on the hypervisor.

After a hard reboot (which recreates the XML file under /etc/libvirt/qemu/), libvirt.xml exists again and the instance counts in the database and on the hypervisor match.

III. Live-migrating an instance (Dashboard -> Live Migrate Instance, # nova live-migration --block-migrate instance_id host)

Building on the cold-migration setup above (nova user login shell, key generation, public-key distribution, private key added to the agent), continue with the following steps.

# vim /etc/nova/nova.conf // uncomment on the compute nodes

live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_TUNNELLED
# systemctl restart openstack-nova-api
# systemctl restart openstack-nova-compute // restart nova-api and nova-compute

# nova live-migration --block-migrate d8db2011-217b-433d-aa80-06230203a834 compute2 // run the migration from the controller node

# tail -f /var/log/nova/nova-compute.log // source compute node log

2016-02-17 11:31:15.179 11959 ERROR nova.virt.libvirt.driver [-] [instance: d8db2011-217b-433d-aa80-06230203a834] 
  Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+tcp://compute2/system: 
  unable to connect to server at 'compute2:16509': Connection refused
# tail -f /var/log/nova/nova-compute.log // target compute node log
2016-02-17 11:31:14.250 46026 WARNING nova.virt.disk.vfs.guestfs [req-fa8c2e70-9679-493d-b0ba-170a9a0343d5 None] 
  Failed to close augeas aug_close: do_aug_close: you must call 'aug-init' first to initialize Augeas
2016-02-17 11:31:15.442 46026 WARNING nova.virt.libvirt.driver [-] [instance: d8db2011-217b-433d-aa80-06230203a834] 
  During wait destroy, instance disappeared.
2016-02-17 11:31:16.056 46026 INFO nova.virt.libvirt.driver [req-fa8c2e70-9679-493d-b0ba-170a9a0343d5 None] 
  [instance: d8db2011-217b-433d-aa80-06230203a834] Deleting instance files /var/lib/nova/instances/d8db2011-217b-433d-aa80-06230203a834_del
2016-02-17 11:31:16.057 46026 INFO nova.virt.libvirt.driver [req-fa8c2e70-9679-493d-b0ba-170a9a0343d5 None] 
  [instance: d8db2011-217b-433d-aa80-06230203a834] Deletion of /var/lib/nova/instances/d8db2011-217b-433d-aa80-06230203a834_del complete

# netstat -an | grep 16509 // on the target compute node this port is not open; it is the port libvirtd uses to listen for TCP connections

# grep listen_ /etc/libvirt/libvirtd.conf // libvirtd config on the target compute node; the commented (#) lines are the defaults, the uncommented counterparts are the manual settings

# This is enabled by default, uncomment this to disable it
#listen_tls = 0
listen_tls = 0
# This is disabled by default, uncomment this to enable it.
#listen_tcp = 1
listen_tcp = 1
#listen_addr = "192.168.0.1"
listen_addr = "0.0.0.0"
......
# Override the port for accepting secure TLS connections
# This can be a port number, or service name
#
#tls_port = "16514"

# Override the port for accepting insecure TCP connections
# This can be a port number, or service name
#
#tcp_port = "16509"
......
#auth_tcp = "sasl"
auth_tcp = "none"
# grep LIBVIRTD_ARGS /etc/sysconfig/libvirtd // enable TCP listening; the commented (#) line is the default
# Listen for TCP/IP connections
# NB. must setup TLS/SSL keys prior to using this
#LIBVIRTD_ARGS="--listen"
LIBVIRTD_ARGS="--listen"

# systemctl restart libvirtd // restart libvirtd

# netstat -an | grep 16509 // confirm the port is now open

tcp        0      0 0.0.0.0:16509           0.0.0.0:*               LISTEN
Run the migration again from the command line:
# tail -f /var/log/nova/nova-compute.log // source compute node log
2016-02-17 15:35:34.402 11959 ERROR nova.virt.libvirt.driver [-] [instance: d8db2011-217b-433d-aa80-06230203a834] 
Live Migration failure: internal error: unable to execute QEMU command 'migrate': this feature or command is not currently supported
# tail -f /var/log/nova/nova-compute.log // target compute node log
2016-02-17 15:35:32.898 46026 WARNING nova.virt.disk.vfs.guestfs [req-d810867d-9880-48cb-8b24-3a9ed357178d None] 
  Failed to close augeas aug_close: do_aug_close: you must call 'aug-init' first to initialize Augeas
2016-02-17 15:35:34.183 46026 INFO nova.compute.manager [-] [instance: d8db2011-217b-433d-aa80-06230203a834] 
  VM Started (Lifecycle Event)
2016-02-17 15:35:34.302 46026 INFO nova.compute.manager [-] [instance: d8db2011-217b-433d-aa80-06230203a834] 
  During the sync_power process the instance has moved from host compute5 to host compute4
2016-02-17 15:35:34.400 46026 INFO nova.compute.manager [-] [instance: d8db2011-217b-433d-aa80-06230203a834] 
  VM Stopped (Lifecycle Event)
2016-02-17 15:35:34.519 46026 INFO nova.compute.manager [-] [instance: d8db2011-217b-433d-aa80-06230203a834] 
  During the sync_power process the instance has moved from host compute5 to host compute4
2016-02-17 15:35:34.658 46026 WARNING nova.virt.libvirt.driver [-] [instance: d8db2011-217b-433d-aa80-06230203a834] 
  During wait destroy, instance disappeared.
2016-02-17 15:35:35.232 46026 INFO nova.virt.libvirt.driver [req-d810867d-9880-48cb-8b24-3a9ed357178d None] 
  [instance: d8db2011-217b-433d-aa80-06230203a834] Deleting instance files /var/lib/nova/instances/d8db2011-217b-433d-aa80-06230203a834_del
2016-02-17 15:35:35.233 46026 INFO nova.virt.libvirt.driver [req-d810867d-9880-48cb-8b24-3a9ed357178d None] 
  [instance: d8db2011-217b-433d-aa80-06230203a834] Deletion of /var/lib/nova/instances/d8db2011-217b-433d-aa80-06230203a834_del complete
Searching suggests that on CentOS 7 this may require qemu-kvm-rhev instead of qemu-kvm; configure the repo and install the package on both the controller node and the compute nodes:

# vim /etc/yum.repos.d/qemu-kvm-rhev.repo

[qemu-kvm-rhev]
name=oVirt rebuilds of qemu-kvm-rhev
baseurl=http://resources.ovirt.org/pub/ovirt-3.5/rpm/el7Server/
mirrorlist=http://resources.ovirt.org/pub/yum-repo/mirrorlist-ovirt-3.5-el7Server
enabled=1
skip_if_unavailable=1
gpgcheck=0
# yum -y install qemu-kvm-rhev
Running the migration from the command line again produces the same error - to be continued......
