Deploying a Hadoop Cluster with OpenStack Savanna

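1. Deploy the OpenStack environment and install the core modules (keystone/glance/nova/neutron/horizon)

Deploying with RDO's packstack is fairly quick. Follow the steps in the link below; if an error occurs partway through, you can simply re-run the command.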



packstack --allinone --os-neutron-install=y
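
Since the command can simply be repeated after a failure, the retry can be scripted. Below is a minimal sketch, not something I used during the deployment: run_with_retries and its runner parameter are hypothetical names introduced here for illustration, and the retry count of 3 is an arbitrary assumption.

```python
import subprocess


def run_with_retries(cmd, attempts=3, runner=subprocess.call):
    """Run cmd until it exits with status 0 or the attempts are used up.

    Returns the 1-based number of the attempt that succeeded, or None
    if every attempt failed. `runner` is injectable so the logic can be
    exercised without actually invoking packstack.
    """
    for attempt in range(1, attempts + 1):
        if runner(cmd) == 0:
            return attempt
    return None


# Intended use on the deployment host (not executed here):
# run_with_retries(["packstack", "--allinone", "--os-neutron-install=y"])
```

The function only repeats the same command; it does not inspect why a run failed, so a persistent error will still need a look at the packstack output.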

2. Once the deployment above succeeds, install Savanna. (When I deployed Savanna, the command line was not yet fully supported, so I used the dashboard.)


To install with RDO:

  1. Start by following the Quickstart to install and set up OpenStack.
  2. Install the savanna-api service with:
$ yum install openstack-savanna
  3. Configure the savanna-api service to your liking. The configuration file is located at /etc/savanna/savanna.conf.
  4. Start the savanna-api service with:
$ service openstack-savanna-api start
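
After starting the service, a quick way to confirm that something is listening is a plain TCP connection test. This is a rough sketch: is_port_open is a helper name made up for this post, and port 8386 is an assumption taken from the SAVANNA_URL example used later in the dashboard configuration, so adjust it if your savanna.conf says otherwise.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Intended use on the Savanna host (not executed here):
# print(is_port_open("localhost", 8386))
```

A successful connection only shows that the port is open; it does not prove that the API behind it is healthy.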

To install into a virtual environment (installing Savanna from a tarball):

  1. First, install python-setuptools, python-virtualenv and the Python headers using your OS package manager. The name of the Python headers package depends on the OS: for Ubuntu it is python-dev, for Red Hat it is python-devel.

For Fedora:

$ sudo yum install gcc python-setuptools python-virtualenv python-devel
  2. Set up a virtual environment for Savanna:
$ virtualenv savanna-venv
This installs a Python virtual environment into the savanna-venv directory in your current working directory. The command does not require superuser privileges and can be run in any directory where the current user has write permission.
  3. Install the latest Savanna release from pypi:
$ savanna-venv/bin/pip install savanna
Or get a Savanna archive from http://tarballs.openstack.org/savanna/ and install it using pip:
$ savanna-venv/bin/pip install 'http://tarballs.openstack.org/savanna/savanna-master.tar.gz'
Note that savanna-master.tar.gz contains the latest changes and might not be stable at the moment. We recommend browsing http://tarballs.openstack.org/savanna/ and selecting the latest stable release.
  4. After installation, create a configuration file. The location of the sample config file depends on your OS: for Ubuntu it is /usr/local/share/savanna/savanna.conf.sample, for Red Hat it is /usr/share/savanna/savanna.conf.sample. Below is an example for Ubuntu:
$ mkdir savanna-venv/etc
$ cp savanna-venv/share/savanna/savanna.conf.sample savanna-venv/etc/savanna.conf
Check each option in savanna-venv/etc/savanna.conf and make the necessary changes.
  5. To start Savanna, run:
$ savanna-venv/bin/python savanna-venv/bin/savanna-api --config-file savanna-venv/etc/savanna.conf
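
The virtualenv steps above all hang off one directory name, so they can be written down as a dry-run plan that is easy to point at a different directory or at a tarball URL. This is only a sketch: savanna_venv_commands is a hypothetical helper, it runs nothing, and it leaves out the package-manager step because that differs per OS.

```python
def savanna_venv_commands(venv="savanna-venv", package="savanna"):
    """Return the virtualenv install steps as argv lists (nothing is run).

    `package` may be the pypi name or a tarball URL accepted by pip.
    """
    pip = f"{venv}/bin/pip"
    conf = f"{venv}/etc/savanna.conf"
    return [
        ["virtualenv", venv],
        [pip, "install", package],
        ["mkdir", f"{venv}/etc"],
        ["cp", f"{venv}/share/savanna/savanna.conf.sample", conf],
        [f"{venv}/bin/python", f"{venv}/bin/savanna-api", "--config-file", conf],
    ]


# Print the plan so it can be reviewed before running anything by hand.
for argv in savanna_venv_commands():
    print(" ".join(argv))
```

The config file still has to be edited between the cp step and the final start command.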

3. Configure Savanna

[root@xianghui workplace]# vi /etc/savanna/savanna.conf


# mkdir /var/log/savanna


# vi /var/log/savanna/savanna.log 
# chown savanna:savanna /var/log/savanna/savanna.log
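
The options I changed in savanna.conf are not reproduced here. To review what a config file actually sets, an INI-style file like this one can be read with Python's configparser. The sketch below works on an inline sample rather than the real file, and the option names port and log_file are assumptions for illustration (the values come from the SAVANNA_URL example and the log path used in this post); check them against your own savanna.conf.sample.

```python
import configparser

# Illustrative sample only; option names are assumed, not copied from
# the real savanna.conf.
SAMPLE_CONF = """\
[DEFAULT]
# Port the savanna-api service listens on (assumed option name).
port = 8386
# Log destination (assumed option name; path from the steps above).
log_file = /var/log/savanna/savanna.log
"""


def load_defaults(text):
    """Return the [DEFAULT] options of an INI-style config as a dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser.defaults())


settings = load_defaults(SAMPLE_CONF)
for name, value in sorted(settings.items()):
    print(f"{name} = {value}")
```

Commented-out lines are skipped by the parser, so only options that are really set show up in the listing.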


4. Configure the Savanna UI

  1. Go to the machine where Dashboard resides and install Savanna UI:

    For RDO:

$ sudo yum install python-django-savanna
$ sudo pip install savanna-dashboard
This will install the latest stable release of Savanna UI. If you want to install the master branch of Savanna UI:
$ sudo pip install 'http://tarballs.openstack.org/savanna-dashboard/savanna-dashboard-master.tar.gz'
  2. Configure OpenStack Dashboard. In settings.py, add savanna to the 'dashboards' tuple:

    'dashboards': ('nova', 'syspanel', 'settings', ..., 'savanna'),
and also add savannadashboard to the installed apps (the INSTALLED_APPS setting in a Django project).
Note: the settings.py file is located in /usr/share/openstack-dashboard/openstack_dashboard/ by default.
  3. You also have to specify SAVANNA_URL in local_settings.py. For example:
SAVANNA_URL = 'http://localhost:8386/v1.1'

          If you are using Neutron instead of Nova Network:

Note: For RDO, the local_settings.py file is located in /etc/openstack-dashboard/, otherwise it is in /usr/share/openstack-dashboard/openstack_dashboard/local/.

$ sudo service httpd reload
You can check that the service has started successfully: go to the Horizon URL and, if the installation is correct, you'll be able to see the Savanna tab.
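
If the Savanna tab does not show up, the three dashboard settings are the first thing to re-check. The sketch below validates them on a plain dict instead of importing the real settings module. check_savanna_settings is a hypothetical helper, and the names HORIZON_CONFIG and INSTALLED_APPS are my assumptions about where the 'dashboards' tuple and the app list live; only SAVANNA_URL and the 'dashboards' line are quoted from the instructions above.

```python
from urllib.parse import urlparse


def check_savanna_settings(settings):
    """Return a list of problems found in a dict of dashboard settings."""
    problems = []
    dashboards = settings.get("HORIZON_CONFIG", {}).get("dashboards", ())
    if "savanna" not in dashboards:
        problems.append("'savanna' missing from HORIZON_CONFIG['dashboards']")
    if "savannadashboard" not in settings.get("INSTALLED_APPS", ()):
        problems.append("'savannadashboard' missing from INSTALLED_APPS")
    url = urlparse(settings.get("SAVANNA_URL", ""))
    if url.scheme not in ("http", "https") or not url.netloc:
        problems.append("SAVANNA_URL is not an http(s) URL")
    return problems


# Example values mirroring the snippets in this section.
example = {
    "HORIZON_CONFIG": {"dashboards": ("nova", "syspanel", "settings", "savanna")},
    "INSTALLED_APPS": ("savannadashboard",),
    "SAVANNA_URL": "http://localhost:8386/v1.1",
}
print(check_savanna_settings(example))
```

An empty list means the three settings are present and well formed; it says nothing about whether the savanna-api service behind the URL is reachable.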
5. Download a pre-built image and upload it to Glance

You can download pre-built images with vanilla Apache Hadoop or build these images yourself:

Download and install the pre-built image with Fedora 19:

$ wget http://savanna-files.mirantis.com/savanna-0.3-vanilla-1.2.1-fedora-19.qcow2
$ glance image-create --name=savanna-0.3-vanilla-1.2.1-fedora-19 \
  --disk-format=qcow2 --container-format=bare < ./savanna-0.3-vanilla-1.2.1-fedora-19.qcow2
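
The image is a large download, so it can be worth checking that the local file and the uploaded image agree. As far as I know, glance reports an MD5 checksum for each image, so a locally computed MD5 can be compared with it. A minimal sketch follows; md5_of_file is a helper name introduced here, and the 1 MiB chunk size is an arbitrary choice.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a multi-gigabyte image never has to
        # fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Intended use (not executed here):
# print(md5_of_file("savanna-0.3-vanilla-1.2.1-fedora-19.qcow2"))
```

Compare the printed value with the checksum field shown for the image in glance.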

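After logging in to the dashboard, you will find a new page named "savanna". The plugins entry shows that two plugin types are currently supported: vanilla and hdp.

After registering the pre-built image with Savanna through the dashboard, the result is as follows: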

Note the tags: tags:='["vanilla", "1.2.1", "fedora"]'
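
My understanding is that the tags are how a registered image gets associated with a plugin and a Hadoop version, so a typo here can leave the image unusable for cluster templates. The sketch below only checks a tag list of the form shown above; image_supports is a hypothetical helper and does not call Savanna.

```python
import json


def image_supports(tags_json, plugin, hadoop_version):
    """Return True if both the plugin name and version appear in the tags."""
    tags = set(json.loads(tags_json))
    return plugin in tags and hadoop_version in tags


# The tag list used when registering the image in this post.
TAGS = '["vanilla", "1.2.1", "fedora"]'
print(image_supports(TAGS, "vanilla", "1.2.1"))
```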

Create the node group templates: data node and name node

Create the cluster template


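6. Run a job

The UI appears to support only Swift, and since I had not installed or configured Swift, I decided to simply run against HDFS.

The list below shows the three virtual machines created for the cluster. After the VMs boot, a cloud-init script automatically applies the Hadoop configuration prepared by the vanilla plugin, so there is no need to configure each VM by hand.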

[root@xianghui ~]# nova list
| ID                                   | Name                                  | Status | Task State | Power State | Networks                        |
| 0019f2e8-9450-45e7-9455-44f51d4029b8 | test-1-DataNodeGroup-001              | ACTIVE | None       | Running     | flat-80=                |
| 1cd12dc5-86f5-4e06-bf1f-fd7635dca032 | test-1-DataNodeGroup-002              | ACTIVE | None       | Running     | flat-80=                |
| e746f0ba-b297-4e07-b430-85840b71fa53 | test-1-NameNodeGroup-001              | ACTIVE | None       | Running     | flat-80=                |
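
To confirm from a script that every node came up, the pipe-delimited table can be parsed into one dict per row. This is a sketch that works on the text shown above (pasted in as NOVA_LIST); parse_table is a helper name of my own, and it assumes every data row has the same number of cells as the header.

```python
NOVA_LIST = """\
| ID                                   | Name                     | Status | Task State | Power State | Networks |
| 0019f2e8-9450-45e7-9455-44f51d4029b8 | test-1-DataNodeGroup-001 | ACTIVE | None       | Running     | flat-80= |
| 1cd12dc5-86f5-4e06-bf1f-fd7635dca032 | test-1-DataNodeGroup-002 | ACTIVE | None       | Running     | flat-80= |
| e746f0ba-b297-4e07-b430-85840b71fa53 | test-1-NameNodeGroup-001 | ACTIVE | None       | Running     | flat-80= |
"""


def parse_table(text):
    """Turn a pipe-delimited table into a list of dicts keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skips border and blank lines
    ]
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


servers = parse_table(NOVA_LIST)
not_active = [s["Name"] for s in servers if s["Status"] != "ACTIVE"]
print(len(servers), "servers,", len(not_active), "not ACTIVE")
```

Filtering on the Name column is also a simple way to pick out the name node before logging in.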

Because an SSH key was configured, no password is needed; log in directly:

[root@xianghui ~]# ssh fedora@<NameNode IP>
Last login: Wed Oct 30 09:48:07 2013 from
[fedora@test-1-namenodegroup-001 ~]$ sudo -i
[root@test-1-namenodegroup-001 ~]# whereis hadoop
hadoop: /usr/bin/hadoop /etc/hadoop /usr/etc/hadoop /usr/include/hadoop /usr/share/hadoop


[root@test-1-namenodegroup-001 ~]# cat input

Run the wordcount job from the examples. The result shows a count of 3, which is correct!

[root@test-1-namenodegroup-001 ~]# hadoop jar /usr/share/hadoop/hadoop-examples-1.2.1.jar wordcount input test_output
13/11/14 06:15:34 INFO mapred.JobClient: Running job: job_201310210945_0015
13/11/14 06:15:35 INFO mapred.JobClient:  map 0% reduce 0%
13/11/14 06:15:50 INFO mapred.JobClient:  map 100% reduce 0%
13/11/14 06:16:01 INFO mapred.JobClient:  map 100% reduce 33%
13/11/14 06:16:03 INFO mapred.JobClient:  map 100% reduce 100%
13/11/14 06:16:05 INFO mapred.JobClient: Job complete: job_201310210945_0015
13/11/14 06:16:05 INFO mapred.JobClient: Counters: 29
13/11/14 06:16:05 INFO mapred.JobClient:   Job Counters
13/11/14 06:16:05 INFO mapred.JobClient:     Launched reduce tasks=1
13/11/14 06:16:05 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=15287
13/11/14 06:16:05 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/11/14 06:16:05 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/11/14 06:16:05 INFO mapred.JobClient:     Launched map tasks=1
13/11/14 06:16:05 INFO mapred.JobClient:     Data-local map tasks=1
13/11/14 06:16:05 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=12207
13/11/14 06:16:05 INFO mapred.JobClient:   File Output Format Counters
13/11/14 06:16:05 INFO mapred.JobClient:     Bytes Written=26
13/11/14 06:16:05 INFO mapred.JobClient:   FileSystemCounters
13/11/14 06:16:05 INFO mapred.JobClient:     FILE_BYTES_READ=44
13/11/14 06:16:05 INFO mapred.JobClient:     HDFS_BYTES_READ=138
13/11/14 06:16:05 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=110578
13/11/14 06:16:05 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=26
13/11/14 06:16:05 INFO mapred.JobClient:   File Input Format Counters
13/11/14 06:16:05 INFO mapred.JobClient:     Bytes Read=21
13/11/14 06:16:05 INFO mapred.JobClient:   Map-Reduce Framework
13/11/14 06:16:05 INFO mapred.JobClient:     Map output materialized bytes=44
13/11/14 06:16:05 INFO mapred.JobClient:     Map input records=4
13/11/14 06:16:05 INFO mapred.JobClient:     Reduce shuffle bytes=44
13/11/14 06:16:05 INFO mapred.JobClient:     Spilled Records=6
13/11/14 06:16:05 INFO mapred.JobClient:     Map output bytes=32
13/11/14 06:16:05 INFO mapred.JobClient:     Total committed heap usage (bytes)=163254272
13/11/14 06:16:05 INFO mapred.JobClient:     CPU time spent (ms)=2580
13/11/14 06:16:05 INFO mapred.JobClient:     Combine input records=3
13/11/14 06:16:05 INFO mapred.JobClient:     SPLIT_RAW_BYTES=117
13/11/14 06:16:05 INFO mapred.JobClient:     Reduce input records=3
13/11/14 06:16:05 INFO mapred.JobClient:     Reduce input groups=3
13/11/14 06:16:05 INFO mapred.JobClient:     Combine output records=3
13/11/14 06:16:05 INFO mapred.JobClient:     Physical memory (bytes) snapshot=238972928
13/11/14 06:16:05 INFO mapred.JobClient:     Reduce output records=3
13/11/14 06:16:05 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1652383744
13/11/14 06:16:05 INFO mapred.JobClient:     Map output records=3
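
The counters above are where the 3 comes from: wordcount writes one output record per distinct word, so "Reduce output records=3" means three distinct words, and "Map output records=3" means three words were read in total. Pulling the counters out of the log makes that check scriptable. The sketch below parses a few lines copied from the output above; parse_counters and the regular expression are my own and assume the Hadoop 1.x JobClient line format shown here.

```python
import re

# Matches "... INFO mapred.JobClient:   <counter name>=<integer>".
COUNTER_LINE = re.compile(r"INFO mapred\.JobClient:\s+(.+?)=(\d+)\s*$")


def parse_counters(log_text):
    """Return a dict mapping counter names to integer values."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_LINE.search(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


LOG = """\
13/11/14 06:16:05 INFO mapred.JobClient: Counters: 29
13/11/14 06:16:05 INFO mapred.JobClient:     Map input records=4
13/11/14 06:16:05 INFO mapred.JobClient:     Map output records=3
13/11/14 06:16:05 INFO mapred.JobClient:     Reduce input groups=3
13/11/14 06:16:05 INFO mapred.JobClient:     Reduce output records=3
"""
counters = parse_counters(LOG)
print(counters["Reduce output records"])
```

Lines without a name=value pair, such as the "Counters: 29" header, are ignored by the pattern.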

