Building a Big Data Platform with CentOS 7.5 and CDH 6.2
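1. Introduction to CDH
Two Hadoop distributions are currently in wide use: the Apache release and the Cloudera release.
- Apache Hadoop: has a large community and a fast release cadence, but it is less stable, tedious to install and configure, and has few users in practice.
- Cloudera Hadoop (CDH): Cloudera's distribution, built on top of Apache Hadoop. It improves component compatibility and interfaces, simplifies installation and configuration, and provides a unified management UI.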
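2. Introduction to Cloudera Manager
Cloudera Manager (CM) is an end-to-end application for installing and managing CDH clusters from one place. Besides CM, CDH can also be installed with yum, tar, or rpm. Cloudera Manager consists of the following parts:
- Server: the core of Cloudera Manager. It hosts the web server and the application logic. It installs software, configures, starts, and stops services, and manages the cluster on which the services run.
- Agent: installed on every host. It starts and stops processes, deploys configuration, triggers installations, and monitors the host.
- Database: stores configuration and monitoring information. Typically, multiple logical databases run across one or more database servers. For example, the Cloudera Manager Server and the monitoring daemons use different logical databases.
- Cloudera Repository: the repository of software that Cloudera Manager distributes.
- Clients: provide an interface for interacting with the Server.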
3. Environment Preparation
3.1. Node preparation (four nodes)
Hostname | IP | CM software |
---|---|---|
nn01 | 192.168.18.110 | Cloudera Manager Server & Agent, MySQL |
dn01 | 192.168.18.111 | Cloudera Manager Agent |
dn02 | 192.168.18.112 | Cloudera Manager Agent |
dn03 | 192.168.18.113 | Cloudera Manager Agent |
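3.2. Configure hostnames and hosts resolution (all nodes)
Edit /etc/hostname to set the hostname, and use the hostname command to make the change take effect immediately. Then edit /etc/hosts and add the following entries.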
192.168.18.110 nn01.yunlu.cn nn01
192.168.18.111 dn01.yunlu.cn dn01
192.168.18.112 dn02.yunlu.cn dn02
192.168.18.113 dn03.yunlu.cn dn03
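The /etc/hosts change can also be scripted so that it is safe to run more than once. The following is only a minimal sketch under that assumption, not part of the original procedure: the names HOSTS_FILE, add_host_entry, and add_cluster_hosts are illustrative, and the functions need to run as root on each node.

```shell
#!/usr/bin/env bash
# Sketch: append the cluster entries to a hosts file only if they are missing.
# HOSTS_FILE, add_host_entry and add_cluster_hosts are illustrative names.
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

add_host_entry() {
    # $1 = IP address, $2 = fully qualified name, $3 = short name
    local ip="$1" fqdn="$2" short="$3"
    # Skip the entry if the fully qualified name is already present.
    if ! grep -qF "$fqdn" "$HOSTS_FILE" 2>/dev/null; then
        printf '%s %s %s\n' "$ip" "$fqdn" "$short" >> "$HOSTS_FILE"
    fi
}

add_cluster_hosts() {
    add_host_entry 192.168.18.110 nn01.yunlu.cn nn01
    add_host_entry 192.168.18.111 dn01.yunlu.cn dn01
    add_host_entry 192.168.18.112 dn02.yunlu.cn dn02
    add_host_entry 192.168.18.113 dn03.yunlu.cn dn03
}

# Usage (as root): add_cluster_hosts
```

Because each entry is checked before it is appended, running add_cluster_hosts a second time leaves the file unchanged.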
3.3. Disable the firewall
# systemctl stop firewalld.service && systemctl disable firewalld.service
3.4. Disable SELinux
# sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
# setenforce 0
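The sed command above only rewrites the line when the current value is enforcing. As a hedged variant, the sketch below also handles a config file that is set to permissive. The helper name disable_selinux_config is illustrative. The change in the config file takes effect after a reboot; setenforce 0 only switches the running system to permissive mode until then.

```shell
#!/usr/bin/env bash
# Sketch: set SELINUX=disabled whether the file currently says
# enforcing or permissive. disable_selinux_config is an illustrative name.
disable_selinux_config() {
    local cfg="${1:-/etc/selinux/config}"
    # Only the SELINUX= line is rewritten; SELINUXTYPE= is left alone.
    sed -i -E 's/^SELINUX=(enforcing|permissive)[[:space:]]*$/SELINUX=disabled/' "$cfg"
    # Print the resulting line so the change can be confirmed.
    grep '^SELINUX=' "$cfg"
}

# Usage (as root): disable_selinux_config
```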
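3.5. Configure time synchronization
chrony can act as both a time server and a client. It performs considerably better than ntp and is simple to configure and manage. For this setup, however, we synchronize against a network time server with a scheduled cron job instead.
- Add the cron job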
# echo "$((RANDOM%60)) $((RANDOM%24)) * * * /usr/sbin/ntpdate time1.aliyun.com" >> /var/spool/cron/root
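The one-liner above picks a random minute and hour, presumably so that the nodes do not all contact the time server at the same moment. A small sketch of the same idea as a function follows. The name make_ntp_cron_line is illustrative, and the sketch assumes the ntpdate package is already installed.

```shell
#!/usr/bin/env bash
# Sketch: build a daily ntpdate cron entry with a random minute and hour.
# make_ntp_cron_line is an illustrative helper name.
make_ntp_cron_line() {
    local server="${1:-time1.aliyun.com}"
    # RANDOM % 60 gives a minute in 0..59, RANDOM % 24 an hour in 0..23.
    printf '%d %d * * * /usr/sbin/ntpdate %s\n' \
        "$((RANDOM % 60))" "$((RANDOM % 24))" "$server"
}

# Usage (as root); the grep guard avoids adding a second entry on re-runs:
#   grep -q ntpdate /var/spool/cron/root 2>/dev/null \
#       || make_ntp_cron_line >> /var/spool/cron/root
```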
3.6. Disable transparent huge pages and their defragmentation (required for CDH)
# echo never > /sys/kernel/mm/transparent_hugepage/defrag
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
- Also add the two commands above to /etc/rc.local so that they run at boot.
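To confirm that the setting is active, the kernel files can be read back: the active value is the one shown in square brackets, for example `always madvise [never]`. The sketch below extracts that value; the function name thp_active_value is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: print the active transparent hugepage value, which the kernel
# marks with square brackets. thp_active_value is an illustrative name.
thp_active_value() {
    local f="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
    # Keep only the word inside the brackets.
    sed -E 's/.*\[([a-z+]+)\].*/\1/' "$f"
}

# Usage:
#   thp_active_value /sys/kernel/mm/transparent_hugepage/enabled
#   thp_active_value /sys/kernel/mm/transparent_hugepage/defrag
```

On CentOS 7, /etc/rc.d/rc.local is not executable by default, so the boot-time commands only run after `chmod +x /etc/rc.d/rc.local`.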
3.7. Tune swap usage
# echo "vm.swappiness = 10" >> /etc/sysctl.conf
# sysctl -p
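Appending with `>>` adds a duplicate line every time the command is repeated. A hedged alternative is sketched below: it replaces an existing key or appends it once. The function name set_sysctl_conf is illustrative, and the key is used as a regular expression, which is acceptable for simple keys such as vm.swappiness.

```shell
#!/usr/bin/env bash
# Sketch: set a key in a sysctl-style file without creating duplicates.
# set_sysctl_conf is an illustrative helper name.
set_sysctl_conf() {
    local key="$1" value="$2" conf="${3:-/etc/sysctl.conf}"
    if grep -qE "^[[:space:]]*${key}[[:space:]]*=" "$conf" 2>/dev/null; then
        # The key exists: rewrite its line in place.
        sed -i -E "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$conf"
    else
        # The key is missing: append it.
        printf '%s = %s\n' "$key" "$value" >> "$conf"
    fi
}

# Usage (as root): set_sysctl_conf vm.swappiness 10 && sysctl -p
```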
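3.8. Configure passwordless SSH login
- On every node, run the following command and press Enter at each prompt to accept the defaults:
# ssh-keygen -t rsa
- Distribute the public key by copying it to every node, including the local one. Run the following on every node:
# for h in nn01 dn01 dn02 dn03; do ssh-copy-id "$h"; done
This makes four copies in total. For each one, type yes when prompted and then enter the password of the target node.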
4. Install CM and CDH
4.1. Configure the Cloudera Manager repository (all nodes)
# wget https://archive.cloudera.com/cm6/6.2.0/redhat7/yum/cloudera-manager.repo -P /etc/yum.repos.d/
# rpm --import https://archive.cloudera.com/cm6/6.2.0/redhat7/yum/RPM-GPG-KEY-cloudera
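Installing online is slow. It is better to download the required rpm packages first and then install them offline or from a private repository. The following three packages are involved: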
cloudera-manager-agent-6.2.0-968826.el7.x86_64.rpm
cloudera-manager-server-6.2.0-968826.el7.x86_64.rpm
cloudera-manager-daemons-6.2.0-968826.el7.x86_64.rpm
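For the private repository option, one possible setup is to place the three rpm files in a directory served over HTTP, index it with createrepo, and point yum at it. The sketch below only writes the .repo file. The repository id, the default file name, and the URL http://nn01/cm6/ are hypothetical, and gpgcheck is switched off purely to keep the example short.

```shell
#!/usr/bin/env bash
# Sketch: write a yum .repo file that points at a private mirror.
# The repo id, default path and example URL are hypothetical.
write_cm_repo_file() {
    local baseurl="$1"
    local out="${2:-/etc/yum.repos.d/cloudera-manager-local.repo}"
    cat > "$out" <<EOF
[cloudera-manager-local]
name=Cloudera Manager 6.2.0 (local mirror)
baseurl=${baseurl}
enabled=1
gpgcheck=0
EOF
}

# Usage (as root, on every node): write_cm_repo_file http://nn01/cm6/
```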
4.2. Install the JDK (all nodes)
# rpm -ivh jdk-8u202-linux-x64.rpm
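4.3. Install CM Server and Agent
Offline installation is recommended: download the rpm packages to one server, copy them to the other nodes, and install them locally. This is much faster.
- nn01: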
# yum localinstall cloudera-manager-daemons-6.2.0-968826.el7.x86_64.rpm -y
# yum localinstall cloudera-manager-agent-6.2.0-968826.el7.x86_64.rpm -y
# yum localinstall cloudera-manager-server-6.2.0-968826.el7.x86_64.rpm -y
- dn[01-03]:
# yum localinstall cloudera-manager-daemons-6.2.0-968826.el7.x86_64.rpm -y
# yum localinstall cloudera-manager-agent-6.2.0-968826.el7.x86_64.rpm -y
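With passwordless SSH in place, copying the packages to the data nodes and installing them can be driven from nn01. The sketch below only prints the commands so that they can be reviewed first. RPM_DIR is an assumed download location and print_agent_install_cmds is an illustrative name.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the commands that copy the rpm files to each
# data node and install them there. RPM_DIR is an assumed location.
RPM_DIR="${RPM_DIR:-/root/cm6}"
VERSION="6.2.0-968826.el7.x86_64"

print_agent_install_cmds() {
    local host
    local daemons="cloudera-manager-daemons-${VERSION}.rpm"
    local agent="cloudera-manager-agent-${VERSION}.rpm"
    for host in "$@"; do
        echo "scp ${RPM_DIR}/${daemons} ${RPM_DIR}/${agent} ${host}:/tmp/"
        echo "ssh ${host} yum localinstall -y /tmp/${daemons} /tmp/${agent}"
    done
}

# Review the output, then pipe it to bash to execute it:
#   print_agent_install_cmds dn01 dn02 dn03 | bash
```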
4.4. Install the MySQL database (on nn01)
MySQL is installed here by following the official Cloudera guide:
https://www.cloudera.com/documentation/enterprise/6/6.0/topics/cm_ig_mysql.html#cmig_topic_5_5
# wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
# rpm -ivh mysql-community-release-el7-5.noarch.rpm
# yum install mysql-server -y
# systemctl start mysqld
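- Check the service status:
# systemctl status mysqld
- Optional step. To apply the officially recommended configuration, edit /etc/my.cnf so that it reads as follows: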
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
transaction-isolation = READ-COMMITTED
# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
symbolic-links = 0
key_buffer_size = 32M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1
max_connections = 550
#expire_logs_days = 10
#max_binlog_size = 100M
#log_bin should be on a disk with enough free space.
#Replace '/var/lib/mysql/mysql_binary_log' with an appropriate path for your
#system and chown the specified folder to the mysql user.
log_bin=/var/lib/mysql/mysql_binary_log
#In later versions of MySQL, if you enable the binary log and do not set
#a server_id, MySQL will not start. The server_id must be unique within
#the replicating group.
server_id=1
binlog_format = mixed
read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M
# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 4G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
sql_mode=STRICT_ALL_TABLES
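For the meaning of these settings, see the document linked at the start of this section.
- Set the MySQL login password by following the prompts: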
# /usr/bin/mysql_secure_installation
- Enable MySQL at boot:
# systemctl enable mysqld
4.5. Install the MySQL JDBC driver (all nodes)
Each node uses the driver to connect to the database.
# wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz
# tar xf mysql-connector-java-5.1.46.tar.gz
# mkdir -p /usr/share/java/
# cd mysql-connector-java-5.1.46
# cp mysql-connector-java-5.1.46-bin.jar /usr/share/java/mysql-connector-java.jar
4.6. Create databases for the Cloudera components (on nn01)
Service | Database | User |
---|---|---|
Cloudera Manager Server | scm | scm |
Activity Monitor | amon | amon |
Reports Manager | rman | rman |
Sentry Server | sentry | sentry |
Cloudera Navigator Audit Server | nav | nav |
Cloudera Navigator Metadata Server | navms | navms |
Hive Metastore Server | hive | hive |
Hue | hue | hue |
Oozie | oozie | oozie |
- Write the following statements into a file named cdh.sql:
CREATE DATABASE scm DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON scm.* TO 'scm'@'%' IDENTIFIED BY 'scm';
CREATE DATABASE amon DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON amon.* TO 'amon'@'%' IDENTIFIED BY 'amon';
CREATE DATABASE rman DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON rman.* TO 'rman'@'%' IDENTIFIED BY 'rman';
CREATE DATABASE hue DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON hue.* TO 'hue'@'%' IDENTIFIED BY 'hue';
CREATE DATABASE hive DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hive';
CREATE DATABASE sentry DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON sentry.* TO 'sentry'@'%' IDENTIFIED BY 'sentry';
CREATE DATABASE nav DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON nav.* TO 'nav'@'%' IDENTIFIED BY 'nav';
CREATE DATABASE navms DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON navms.* TO 'navms'@'%' IDENTIFIED BY 'navms';
CREATE DATABASE oozie DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON oozie.* TO 'oozie'@'%' IDENTIFIED BY 'oozie';
- Run the SQL file:
# mysql -uroot -p<ROOT_PASSWORD> < ./cdh.sql
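Since every database follows the same pattern, cdh.sql can also be generated rather than typed by hand. The sketch below reproduces the statements above from a list of names. The function name gen_cdh_sql is illustrative. As in the original statements, each user name and password equals the database name, which is only suitable for a test environment.

```shell
#!/usr/bin/env bash
# Sketch: generate the CREATE DATABASE / GRANT pairs from a list of names.
# gen_cdh_sql is an illustrative name; user and password equal the db name.
gen_cdh_sql() {
    local db
    for db in scm amon rman hue hive sentry nav navms oozie; do
        printf 'CREATE DATABASE %s DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;\n' "$db"
        # %% prints a literal percent sign, the MySQL wildcard for any host.
        printf "GRANT ALL ON %s.* TO '%s'@'%%' IDENTIFIED BY '%s';\n" "$db" "$db" "$db"
    done
}

# Usage: gen_cdh_sql > cdh.sql
```

The `GRANT ... IDENTIFIED BY` form creates the user implicitly on the MySQL 5.x releases; MySQL 8.0 no longer accepts it and needs a separate CREATE USER statement.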
4.7. Set up the Cloudera Manager database
# /opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm scm
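When prompted, enter the password of the scm database user.
4.8. Install CDH (on nn01)
Once CM is installed, we can use it to install CDH and build the enterprise big data platform. The CDH parcel must first be downloaded to the CM server. As before, to speed up the installation, download the packages in advance or set up a private CDH repository.
- Download the CDH parcel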
# cd /opt/cloudera/parcel-repo
# wget https://archive.cloudera.com/cdh6/6.2.0/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373-el7.parcel
# wget https://archive.cloudera.com/cdh6/6.2.0/parcels/manifest.json
- Generate the sha file
# sha1sum CDH-6.2.0-1.cdh6.2.0.p0.967373-el7.parcel | awk '{ print $1 }' > CDH-6.2.0-1.cdh6.2.0.p0.967373-el7.parcel.sha
- Change the owner and group
# chown -R cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo/*
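A truncated download produces a .sha file that matches the broken parcel, so it can be worth re-checking the file after copying it around. The sketch below wraps the sha1sum step and adds a check. The function names write_parcel_sha and verify_parcel_sha are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: write <parcel>.sha and later check that it still matches.
# write_parcel_sha and verify_parcel_sha are illustrative names.
write_parcel_sha() {
    local parcel="$1"
    # Same pipeline as above: keep only the hash column of sha1sum.
    sha1sum "$parcel" | awk '{ print $1 }' > "${parcel}.sha"
}

verify_parcel_sha() {
    local parcel="$1" expected actual
    expected="$(cat "${parcel}.sha")"
    actual="$(sha1sum "$parcel" | awk '{ print $1 }')"
    # Exit status 0 when the stored and the recomputed hash agree.
    [ "$expected" = "$actual" ]
}

# Usage:
#   write_parcel_sha CDH-6.2.0-1.cdh6.2.0.p0.967373-el7.parcel
#   verify_parcel_sha CDH-6.2.0-1.cdh6.2.0.p0.967373-el7.parcel && echo OK
```

The manifest.json downloaded above should also list a hash for each parcel, which can be compared against the generated .sha file to confirm that the download itself is intact.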
4.9. Start Cloudera Manager Server (on nn01)
# systemctl start cloudera-scm-server
If anything goes wrong during startup, check the log.
# tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log
The last lines of the log show the ports on which the server is listening.
Started ServerConnector@da518cb{SSL,[ssl, http/1.1]}{0.0.0.0:7183}
Started ServerConnector@a77165b{HTTP/1.1,[http/1.1]}{0.0.0.0:7180}
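Instead of watching the log by eye, the listening ports can be pulled out of those lines. The sketch below assumes the log keeps the `Started ServerConnector ... {host:port}` format shown above. The function name cm_listen_ports is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: list the ports from "Started ServerConnector" lines in the log.
# cm_listen_ports is an illustrative helper name.
cm_listen_ports() {
    local log="${1:-/var/log/cloudera-scm-server/cloudera-scm-server.log}"
    # Keep the digits between the last colon and the closing brace.
    grep 'Started ServerConnector' "$log" \
        | sed -E 's/.*:([0-9]+)\}[[:space:]]*$/\1/'
}

# Usage: cm_listen_ports
```

For the two log lines shown above it prints 7183 and 7180, one per line.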
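5. Initialize Cloudera Manager
Wait a moment, then open http://nn01:7180 in a browser. The default username and password are both admin.
- Then continue through the following steps as needed.
5.1. CDH cluster installation
- Follow the prompts; the defaults are usually fine.
5.2. Cluster setup
- Database settings: enter the database names, users, and passwords created in section 4.6.
- For everything else, follow the prompts; the defaults are usually fine.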