Big Data Cluster - this is a long, long blog post


IP assignments:

Six virtual machines run on the server:

hadoop1   : 8 GB RAM, 2 TB disk
hadoop2   : 8 GB RAM, 2 TB disk
hadoop3   : 8 GB RAM, 2 TB disk
zookeeper : 8 GB RAM, 2 TB disk
redis     : 8 GB RAM, 2 TB disk
ethings   : 8 GB RAM, 2 TB disk

192.168.0.10 hadoop1 // hadoop2.7.4 + zookeeper3.4.10 + hbase1.2.6 + hive2.1.1 + mariadb5.5
192.168.0.11 hadoop2 // hadoop2.7.4 + zookeeper3.4.10 + hbase1.2.6
192.168.0.12 hadoop3 // hadoop2.7.4 + zookeeper3.4.10 + hbase1.2.6
192.168.0.13 zookeeper // (spare)
192.168.0.14 redis // redis4.0.1 + mysql(mariadb5.5)
192.168.0.15 ethings // application platform
192.168.0.16 hadoop4 // hadoop2.7.4 + zookeeper3.4.10 + hbase1.2.6
192.168.0.17 hadoop5 // hadoop2.7.4 + zookeeper3.4.10 + hbase1.2.6

This is the environment built on the server that was provided.
Data is currently stored on hadoop2 and hadoop3.


Starting and stopping the storage platform

	Start:

		1. Start ZooKeeper first:
		hadoop1: zkServer.sh start
		hadoop2: zkServer.sh start
		hadoop3: zkServer.sh start
		2. Start HBase
		hadoop1: start-hbase.sh
		3. Start the Hadoop cluster
		hadoop1: start-all.sh

	Stop:

		1. Stop ZooKeeper first:
		hadoop1: zkServer.sh stop
		hadoop2: zkServer.sh stop
		hadoop3: zkServer.sh stop
		2. Stop HBase
		hadoop1: stop-hbase.sh
		3. Stop the Hadoop cluster
		hadoop1: stop-all.sh
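For convenience, the steps above can be wrapped in a small script run from hadoop1. This is only a sketch that mirrors the order listed above; the name cluster.sh is hypothetical, and it assumes passwordless SSH to hadoop2/hadoop3 and that the ZooKeeper, HBase and Hadoop scripts are already on each node's PATH.

#!/bin/bash
# cluster.sh -- start or stop the storage platform from hadoop1 (sketch)
ACTION=$1   # "start" or "stop"

# ZooKeeper runs on every quorum node
for host in hadoop1 hadoop2 hadoop3; do
    ssh "$host" "zkServer.sh $ACTION"
done

if [ "$ACTION" = "start" ]; then
    start-hbase.sh   # HBase master + regionservers
    start-all.sh     # HDFS + YARN
else
    stop-hbase.sh
    stop-all.sh
fi

Usage: ./cluster.sh start or ./cluster.sh stop.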

hadoop

core-site.xml

<configuration>
    <!-- RPC address of the HDFS master (NameNode) -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop1:9000</value>
    </property>
    <!-- Directory where Hadoop stores files generated at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop1:9001</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/hdfs/data</value>
    </property>
    <!-- HDFS replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
	<property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>

On the client, set the environment variable HADOOP_USER_NAME=hadoop (this is needed when debugging from a client-side IDE such as Eclipse or IDEA); with it set, HDFS permission errors no longer occur.
Also note: when creating the virtual machines, create them with hadoop as the default user, so there is no need to create that user and group separately.
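For example, on a Linux client the variable can be exported in the shell (or added to the IDE run configuration); this is just the standard way of setting an environment variable, nothing specific to this cluster:

export HADOOP_USER_NAME=hadoop
# or per invocation:
HADOOP_USER_NAME=hadoop hadoop fs -ls /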

mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop1:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop1:19888</value>
    </property>
</configuration>

slaves

hadoop2
hadoop3
hadoop4
hadoop5

yarn-site.xml

<configuration>
    <!-- Reducers fetch data via mapreduce_shuffle -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoop1:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoop1:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoop1:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>hadoop1:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>hadoop1:8088</value>
    </property>
</configuration>

hbase

hbase-site.xml

<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://hadoop1:9000/hbase</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hadoop1,hadoop2,hadoop3,hadoop4,hadoop5</value>
    </property>
    <property>
        <name>zookeeper.session.timeout</name>
        <value>60000000</value>
    </property>
    <property>
        <name>dfs.support.append</name>
        <value>true</value>
    </property>
</configuration>

hbase-env.sh

export HBASE_OPTS="-XX:+UseConcMarkSweepGC"
# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"

regionservers

hadoop2
hadoop3
hadoop4
hadoop5

VirtualBox shared folder

mount -t vboxsf share /home/hadoop/mount_point/

The VirtualBox Guest Additions (VBoxGuestAdditions.iso) must be installed in the VM first.
Mount the ISO: mount /dev/cdrom /home/hadoop/mount_point/
cd /home/hadoop/mount_point/
sh ./VBoxLinuxAdditions.run
Errors may occur while it runs; fix them according to the error log.
Typically this requires:
sudo yum install gcc kernel kernel-devel
Once the installation succeeds, the share can be mounted:
mount -t vboxsf share /home/hadoop/mount_point/
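If the share should be mounted automatically at boot, an /etc/fstab entry along these lines can be used. This is only a sketch: it assumes the vboxsf module from the Guest Additions loads at boot, and uid/gid mount options may be needed so the hadoop user can write to the mount point.

# /etc/fstab (sketch): mount the VirtualBox share named "share" at boot
share   /home/hadoop/mount_point   vboxsf   defaults   0   0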


ZooKeeper configuration (zoo.cfg)

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper/tmp/data
dataLogDir=/usr/local/zookeeper/tmp/logs
clientPort=2181
server.1=hadoop1:2888:3888
server.2=hadoop2:2888:3888
server.3=hadoop3:2888:3888
#server.4=hadoop4:2888:3888
#server.5=hadoop5:2888:3888
#maxClientCnxns=60
#autopurge.snapRetainCount=3
#autopurge.purgeInterval=1

After configuring this, create a myid file in the dataDir directory on each node and set its contents to 1, 2, 3, 4, 5 respectively.
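For example, with the dataDir above, the myid files can be created like this (run the matching line on each host):

# on hadoop1
echo 1 > /usr/local/zookeeper/tmp/data/myid
# on hadoop2
echo 2 > /usr/local/zookeeper/tmp/data/myid
# on hadoop3
echo 3 > /usr/local/zookeeper/tmp/data/myid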


Hive configuration (hive-site.xml)

<configuration>
	<property>
		<name>javax.jdo.option.ConnectionURL</name>
		<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionDriverName</name>
		<value>com.mysql.jdbc.Driver</value>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionUserName</name>
		<value>root</value>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionPassword</name>
		<value>root</value>
	</property>
	<!-- Location of the Hive warehouse on HDFS -->
	<property>
		<name>hive.metastore.warehouse.dir</name>
		<value>/user/hive/warehouse</value>
	</property>
	  
	<!-- Where temporary resource files are stored -->
	<property>
		<name>hive.downloaded.resources.dir</name>
		<value>/usr/local/hive/tmp/${hive.session.id}_resources</value>
	</property>
	<!-- Before Hive 0.9, hive.exec.dynamic.partition had to be set to true explicitly; since 0.9 it defaults to true -->
	<property>
		<name>hive.exec.dynamic.partition</name>
		<value>true</value>
	</property>
	<property>
		<name>hive.exec.dynamic.partition.mode</name>
		<value>nonstrict</value>
	</property>
	<!-- Log locations -->
	<property>
		<name>hive.exec.local.scratchdir</name>
		<value>/usr/local/hive/tmp/HiveJobsLog</value>
	</property>
	<property>
		<name>hive.downloaded.resources.dir</name>
		<value>/usr/local/hive/tmp/ResourcesLog</value>
	</property>
	<property>
		<name>hive.querylog.location</name>
		<value>/usr/local/hive/tmp/HiveRunLog</value>
	</property>
	<property>
		<name>hive.server2.logging.operation.log.location</name>
		<value>/usr/local/hive/tmp/OperationLogs</value>
	</property>
	<!-- HWI (Hive Web Interface) settings -->
	<property>
		<name>hive.hwi.war.file</name>
		<value>${env:HWI_WAR_FILE}</value>
	</property>
	<property>
		<name>hive.hwi.listen.host</name>
		<value>0.0.0.0</value>
	  </property>
	<property>
		<name>hive.hwi.listen.port</name>
		<value>9999</value>
	</property>
	  
	<!-- HiveServer2 no longer needs the hive.metastore.local option: if hive.metastore.uris is empty the metastore is local, otherwise it is remote. For a remote metastore, just set hive.metastore.uris. -->
	<!-- property>
		<name>hive.metastore.uris</name>
		<value>thrift://m1:9083</value>
		<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
	</property --> 
	<property>
		<name>hive.server2.thrift.bind.host</name>
		<value>hadoop1</value>
	  </property>
	<property>
		<name>hive.server2.thrift.port</name>
		<value>10000</value>
	  </property>
	<property>
		<name>hive.server2.thrift.http.port</name>
		<value>10001</value>
	</property>
	<property>
		<name>hive.server2.thrift.http.path</name>
		<value>cliservice</value>
	</property>
	<!-- HiveServer2的WEB UI -->
	<property>
		<name>hive.server2.webui.host</name>
		<value>0.0.0.0</value>
	</property>
	<property>
		<name>hive.server2.webui.port</name>
		<value>10002</value>
	</property>
	<property>
		<name>hive.scratch.dir.permission</name>
		<value>755</value>
	  </property>
	<!-- For hive.aux.jars.path, a local jar path must be prefixed with file:// or it will not be found and an org.apache.hadoop.hive.contrib.serde2.RegexSerDe error is thrown -->
	  <property>
		<name>hive.aux.jars.path</name>
		<value/>
	  </property>
	<property>
		<name>hive.server2.enable.doAs</name>
		<value>true</value>
	  </property>
	<property>
		<name>hive.auto.convert.join</name>
		<value>true</value>
	  </property>
	<property>
		<name>spark.dynamicAllocation.enabled</name>
		<value>true</value>
		<description>Dynamic resource allocation</description>
	</property>
	<!-- When using Hive on Spark, omitting the following setting leads to an out-of-memory error -->
	<property>
		<name>spark.driver.extraJavaOptions</name>
		<value>-XX:PermSize=128M -XX:MaxPermSize=512M</value>
	</property>
	<property>
		<name>datanucleus.autoCreateSchema</name>
		<value>true</value>
	</property>
	<property>
		<name>datanucleus.autoCreateTables</name>
		<value>true</value>
	</property>
	<property>
		<name>datanucleus.autoCreateColumns</name>
		<value>true</value>
	</property>
</configuration>

Hive's configuration feels like the most complex part of all. The setup above uses MySQL for metadata management; on CentOS 7, the system ships with MariaDB by default, which works the same way as MySQL.
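Before the first start, the metastore schema has to exist in MariaDB. Here is a sketch of the usual steps, assuming the root/root credentials from the config above, that Hive lives under /usr/local/hive (as the tmp paths above suggest), and that the mysql-connector-java jar has been copied into its lib directory:

# create the metastore database (createDatabaseIfNotExist=true in the JDBC URL can also do this on first connect)
mysql -u root -proot -e "CREATE DATABASE IF NOT EXISTS hive DEFAULT CHARACTER SET utf8;"

# initialize the metastore schema; with datanucleus.autoCreateSchema=true above this may happen
# automatically, but schematool is the documented route for Hive 2.x
/usr/local/hive/bin/schematool -dbType mysql -initSchema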


Redis4.0.1

Install:

$ wget http://download.redis.io/releases/redis-4.0.1.tar.gz
$ tar xzf redis-4.0.1.tar.gz
$ cd redis-4.0.1
$ make

After make succeeds, run:

$ src/redis-server

Test:

$ src/redis-cli
redis> set foo bar
OK
redis> get foo
"bar"

If make fails, see README.md and rebuild with

make MALLOC=libc

By default it builds with

make MALLOC=jemalloc
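To run the server in the background and check that it answers, configuration options can also be passed on the command line (standard redis-server behaviour, nothing specific to this setup):

src/redis-server --daemonize yes
src/redis-cli ping     # should print PONG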

Copying a VirtualBox disk

Once one virtual machine is built, copying it yields another one. Sometimes, however, the copy is made by hand (copying the disk files directly) rather than through the VirtualBox UI, and such a machine will not boot. In that case, do the following:

Open a command prompt (cmd) in the VirtualBox installation directory and run

VBoxManage.exe internalcommands sethduuid G:\vbox\xxx.vdi

This changes the VDI's UUID.
On success it prints something like: UUID changed to: 428079cd-830d-49b1-bfde-feac051b4d3e

1. Run VBoxManage internalcommands sethduuid <VDI/VMDK file> twice (the first time is just to conveniently generate a UUID; you could use any other UUID generation method instead).
2. Open the .vbox file in a text editor.
3. Replace the UUID found in <Machine uuid="{...}" with the UUID you got when you ran sethduuid the first time.
4. Replace the UUID found in <HardDisk uuid="{...}" and in <Image uuid="{}" (towards the end) with the UUID you got when you ran sethduuid the second time.

Spark

spark-env.sh

export SCALA_HOME=/usr/local/scala
export JAVA_HOME=/usr/local/jdk
export SPARK_MASTER_IP=hadoop1
export SPARK_WORKER_MEMORY=4G
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop

The virtual machines here have 8 GB of RAM.
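With spark-env.sh in place, a standalone cluster can be brought up from the master with the stock scripts. This is a sketch: /usr/local/spark is assumed as the install path (matching the other /usr/local installs here), and conf/slaves must list the worker hosts.

/usr/local/spark/sbin/start-master.sh    # master listens at spark://hadoop1:7077
/usr/local/spark/sbin/start-slaves.sh    # starts workers listed in conf/slaves
# standalone master web UI: http://hadoop1:8080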


Setting a static IP address

[hadoop@zookeeper network-scripts]$ cat ifcfg-enp0s3
TYPE="Ethernet"
#BOOTPROTO="dhcp"
BOOTPROTO="static"
IPADDR=192.168.0.10
NETMASK=255.255.255.0
DEFROUTE="yes"
PEERDNS="yes"
PEERROUTES="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_PEERDNS="yes"
IPV6_PEERROUTES="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="enp0s3"
UUID="09f62fa6-36bc-4782-95ad-63fda20b194f"
DEVICE="enp0s3"
ONBOOT="yes"
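After editing the file, restart networking and confirm the address took effect (CentOS 7):

sudo systemctl restart network
ip addr show enp0s3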

Disabling the firewall

sudo systemctl stop firewalld.service
sudo systemctl disable firewalld.service
sudo systemctl status firewalld.service

Start a service:                        systemctl start firewalld.service
Stop a service:                         systemctl stop firewalld.service
Restart a service:                      systemctl restart firewalld.service
Show a service's status:                systemctl status firewalld.service
Enable a service at boot:               systemctl enable firewalld.service
Disable a service at boot:              systemctl disable firewalld.service
Check whether a service starts at boot: systemctl is-enabled firewalld.service
List enabled services:                  systemctl list-unit-files | grep enabled

Converting a VirtualBox disk to VMware

vmkload_mod multiextent
vmkfstools -i hadoop3-disk1.vmdk hadoop3-disk2.vmdk -d thin 
vmkfstools -U hadoop3-disk1.vmdk 
vmkfstools -E hadoop3-disk2.vmdk hadoop3-disk1.vmdk 
vmkload_mod -u multiextent

Taking Hadoop out of safe mode

1. Change the safe-mode threshold in the HDFS configuration

Set the safe-mode threshold property in hdfs-site.xml. Its default value is 0.999f; setting it to 0 or less means the NameNode will not wait for any particular percentage of blocks before leaving safe mode.

<property>
  <name>dfs.safemode.threshold.pct</name>
  <value>0.999f</value>
  <description>
	Specifies the percentage of blocks that should satisfy
	the minimal replication requirement defined by dfs.replication.min.
	Values less than or equal to 0 mean not to wait for any particular
	percentage of blocks before exiting safemode.
	Values greater than 1 will make safe mode permanent.
  </description>
</property>

Because this hard-codes the change in a configuration file, it is awkward for administrators to operate and adjust, so this approach is not recommended.

2. Leave safe mode directly from the shell (recommended)

While in safe mode, run:

hadoop dfsadmin -safemode leave

and the cluster leaves safe mode.
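To check the current state first, the same admin command can be used with get; in Hadoop 2.x, hdfs dfsadmin is the non-deprecated form of hadoop dfsadmin:

hdfs dfsadmin -safemode get      # prints "Safe mode is ON" or "Safe mode is OFF"
hdfs dfsadmin -safemode leave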

Saving HDFS files locally

hadoop fs -get [-ignorecrc] [-crc] <src> <localdst> copies files to the local filesystem, e.g.:
hadoop fs -get hdfs://host:port/user/hadoop/file localfile
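With the fs.defaultFS configured above, the host:port form maps to hdfs://hadoop1:9000; for example (mytable is a hypothetical path under the Hive warehouse):

hadoop fs -get hdfs://hadoop1:9000/user/hive/warehouse/mytable /home/hadoop/mytable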

Running the bundled SparkPi example in each mode

2.1 local mode

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local lib/spark-examples-1.0.0-hadoop2.2.0.jar 

2.2 standalone mode

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://192.168.0.10:7077 lib/spark-examples-1.0.0-hadoop2.2.0.jar 

2.3 on-yarn-cluster mode

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster lib/spark-examples-1.0.0-hadoop2.2.0.jar

2.4 on-yarn-client mode

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client lib/spark-examples-1.0.0-hadoop2.2.0.jar

2.5 Reference

http://spark.apache.org/docs/latest/submitting-applications.html
