I. Installing Hadoop
Configuring SSH
Configuring SSH enables passwordless login, which makes it easy to manage Hadoop remotely and to share files across a Hadoop cluster without typing a password each time.
Without this setup, running ssh localhost in a terminal prompts for your login password; once the keys are configured, no password is needed.
Step one: run "ssh-keygen -t rsa -P ''" in a terminal,
then press Enter through the prompts. If you have run this before, you will be asked whether to overwrite the existing key; enter y to do so.
home:~ root$ ssh-keygen -t rsa -P ''
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/root/.ssh/id_rsa):
/Users/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Your identification has been saved in /Users/root/.ssh/id_rsa.
Your public key has been saved in /Users/root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:5FXIbTsTcgSg4xApj2y5sY+n7+qEPKorhZaNhmioGDk [email protected]
The key's randomart image is:
+---[RSA 2048]----+
| .. .o.=o |
| . .. . +.= |
| . =. o . .+ o |
| * .o + . + |
|++++ . S o |
|EB+. |
|B*.o |
|=.o o |
|*o+*o |
+----[SHA256]-----+
Step two: run "cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys" to authorize your public key for passwordless login on this machine.
After this, ssh localhost should log you in without a password.
home:~ root$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Possible error:
home:~ root$ ssh localhost
ssh: connect to host localhost port 22: Connection refused
Solution:
Open System Preferences -> Sharing -> enable Remote Login.
home:~ root$ ssh localhost
Last login: Fri Jan 25 17:31:54 2019 from ::1
Configuring environment variables
vim ~/.bash_profile
Add the following two lines:
export HADOOP_HOME=/Users/root/software/hadoop/hadoop3.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Apply the changes:
source ~/.bash_profile
Modifying the configuration files
1. The hadoop2.9/etc/hadoop/hadoop-env.sh file
Modify or replace the following lines:
export JAVA_HOME=${JAVA_HOME}
export HADOOP_HEAPSIZE=2000
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
2. Configure the NameNode host and port
Edit the hadoop2.9/etc/hadoop/core-site.xml file:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/hadoop-2.9/tmp/hadoop-${user.name}</value><!-- adjust this path to your environment -->
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name><!-- fs.default.name is the deprecated older spelling -->
<value>hdfs://localhost:9000</value>
</property>
</configuration>
3. Configure the default HDFS replication factor
Edit the hadoop2.9/etc/hadoop/hdfs-site.xml file:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
4. Configure the JobTracker host and port
Edit the hadoop2.9/etc/hadoop/mapred-site.xml file. (Note: the JobTracker/TaskTracker properties below date from Hadoop 1.x; on Hadoop 2.x with YARN, it is mapreduce.framework.name=yarn that routes MapReduce jobs to YARN.)
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>2</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>2</value>
</property>
</configuration>
Note: if mapred-site.xml does not exist, create it yourself (copy mapred-site.xml.template and rename it).
5. Edit the hadoop2.9/etc/hadoop/yarn-site.xml file:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
6. Format the file system
From inside the hadoop2.9 directory, format the NameNode with:
bin/hdfs namenode -format
Output like the following indicates success:
2019-01-25 17:58:32,570 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2019-01-25 17:58:32,570 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2019-01-25 17:58:32,570 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2019-01-25 17:58:32,573 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2019-01-25 17:58:32,573 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2019-01-25 17:58:32,575 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2019-01-25 17:58:32,575 INFO util.GSet: VM type = 64-bit
2019-01-25 17:58:32,576 INFO util.GSet: 0.029999999329447746% max memory 3.6 GB = 1.1 MB
2019-01-25 17:58:32,576 INFO util.GSet: capacity = 2^17 = 131072 entries
2019-01-25 17:58:32,613 INFO namenode.FSImage: Allocated new BlockPoolId: BP-137592425-192.168.11.67-1548410312604
2019-01-25 17:58:32,629 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
2019-01-25 17:58:32,642 INFO namenode.FSImageFormatProtobuf: Saving image file /tmp/hadoop-root/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2019-01-25 17:58:32,760 INFO namenode.FSImageFormatProtobuf: Image file /tmp/hadoop-root/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 409 bytes saved in 0 seconds .
2019-01-25 17:58:32,776 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2019-01-25 17:58:32,781 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at xlc-2.local/192.168.11.67
************************************************************/
7. Start the NameNode and DataNode daemons:
sbin/start-dfs.sh
8. Start the ResourceManager and NodeManager daemons:
sbin/start-yarn.sh
9. Or start all services at once:
sbin/start-all.sh
10. Verify Hadoop
If http://localhost:50070 opens the HDFS management page, HDFS started successfully.
If http://localhost:8088 opens the cluster/application management page, YARN started successfully.
Problem:
HDFS UI unreachable at http://localhost:50070/
while the YARN address works: http://localhost:8088
Solution:
Installing Hadoop 2.9 made the page reachable. (The underlying reason: Hadoop 3.x moved the NameNode web UI from port 50070 to 9870, so under Hadoop 3.2 the UI lives at http://localhost:9870 instead.)
II. Installing Hive 3.1
1. Download
MySQL must be installed before Hive; it was already present on this machine, so that step is skipped here.
Download address: https://hive.apache.org/downloads.html
The download used here is apache-hive-3.1.1-bin.tar.gz; after unpacking, rename the directory and move it under the Hadoop installation directory.
2. Configure system environment variables
vim ~/.bash_profile
export HIVE_HOME=/usr/hadoop/hadoop2.9/hive (note: adjust to your own path)
export PATH=$PATH:$HIVE_HOME/bin
3. Modify the Hive configuration files
1) Go to /usr/hadoop/hadoop2.9/hive/conf and create the config files from the shipped templates (Hive 3 uses log4j2, so the template names differ from older 1.x guides):
cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-default.xml
cp hive-log4j2.properties.template hive-log4j2.properties
cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
Then create a new file hive-site.xml (there is no hive-site.xml.template; start from an empty file).
2) Add the following to hive-site.xml:
<configuration>
<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8&amp;autoReconnect=true&amp;useSSL=false</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>111111</value>
</property>
</configuration>
3) Modify hive-env.sh:
HADOOP_HOME=/usr/hadoop/hadoop2.9
export HIVE_CONF_DIR=/usr/hadoop/hadoop2.9/hive/conf
Copy the MySQL JDBC driver jar into the hive/lib directory, e.g.:
mysql-connector-java-5.1.46.jar
4. Start Hive
1) On first start, initialize the metastore schema first:
schematool -dbType mysql -initSchema
XLC-2:bin xianglingchuan$ schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/xianglingchuan/software/hadoop/hadoop2.9/hive3.1/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/xianglingchuan/software/hadoop/hadoop2.9/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&useUnicode=true&characterEncoding=UTF-8&autoReconnect=true&useSSL=false
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
Underlying cause: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException : Could not create connection to database server. Attempted reconnect 3 times. Giving up.
SQL Error code: 0
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
The failure above means the MySQL metastore at master:3306 could not be reached; check that MySQL is running and that the connection URL, user name, and password in hive-site.xml are correct, then rerun schematool. Start Hive:
XLC-2:bin xianglingchuan$ bin/hive
19/01/26 23:37:57 DEBUG util.VersionInfo: version: 2.9.2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/xianglingchuan/software/hadoop/hadoop2.9/hive3.1/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/xianglingchuan/software/hadoop/hadoop2.9/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 96860f82-17bb-48ca-9475-700e2ebffc6f
Logging initialized using configuration in file:/Users/xianglingchuan/software/hadoop/hadoop2.9/hive3.1/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
Problem 1: starting the daemons or running ssh localhost reports "Bad configuration option: usekeychain"
home:sbin root$ ./start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as xianglingchuan in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [localhost]
localhost: /Users/root/.ssh/config: line 8: Bad configuration option: usekeychain
home:sbin root$ ssh localhost
/Users/root/.ssh/config: line 8: Bad configuration option: usekeychain
/Users/root/.ssh/config: line 21: Bad configuration option: usekeychain
/Users/root/.ssh/config: line 30: Bad configuration option: usekeychain
/Users/root/.ssh/config: terminating, 3 bad configuration options
Solution:
Either add the line IgnoreUnknown UseKeychain at the top of ~/.ssh/config (so ssh builds that do not support the option skip it), or simply clear ~/.ssh/config and regenerate the key.
Problem 2: error under Hadoop 2.9:
```
home:sbin root$ sbin/start-dfs.sh
19/01/26 17:36:51 DEBUG util.Shell: setsid is not available on this machine. So not using it.
19/01/26 17:36:51 DEBUG util.Shell: setsid exited with exit code 0
19/01/26 17:36:51 ERROR conf.Configuration: error parsing conf core-site.xml
com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 start byte 0xa0 (at char #766, byte #37)
```
Solution:
Remove the stray whitespace from the newly added property nodes in core-site.xml (0xa0 is a non-breaking space, typically introduced by copy-pasting from a web page) and make sure the file is saved as UTF-8.
# Useful notes
## Turning Hadoop debug output on and off
On: export HADOOP_ROOT_LOGGER=DEBUG,console
Off: export HADOOP_ROOT_LOGGER=INFO,console
## What each configuration file does
core-site.xml        Service URI, Hadoop cluster temp directory, and related settings
hdfs-site.xml        HDFS addresses, ports, replication, and related settings
mapred-site.xml      MapReduce framework name, job history server address, and related settings
yarn-site.xml        ResourceManager and NodeManager settings
fair-scheduler.xml   Configuration for the Hadoop FairScheduler scheduling policy