I. Installing Hadoop
Configuring SSH
Configuring SSH enables passwordless login, which makes it easy to manage Hadoop remotely and to share files across a Hadoop cluster without typing a password each time.
Without this setup, running ssh localhost in a terminal prompts for your login password; once the keys are configured, no password is needed.
Step one: run "ssh-keygen -t rsa -P ''" in a terminal,
then press Enter through the prompts. If you have run this before, you will be asked whether to overwrite the existing key; enter y to do so.
home:~ root$ ssh-keygen -t rsa -P ''
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/root/.ssh/id_rsa):
/Users/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Your identification has been saved in /Users/root/.ssh/id_rsa.
Your public key has been saved in /Users/root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:5FXIbTsTcgSg4xApj2y5sY+n7+qEPKorhZaNhmioGDk [email protected]
The key's randomart image is:
+---[RSA 2048]----+
| .. .o.=o |
| . .. . +.= |
| . =. o . .+ o |
| * .o + . + |
|++++ . S o |
|EB+. |
|B*.o |
|=.o o |
|*o+*o |
+----[SHA256]-----+
Step two: run "cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys" to authorize your public key for passwordless login on this machine.
After this, ssh localhost should log you in without a password.
home:~ root$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Possible error:
home:~ root$ ssh localhost
ssh: connect to host localhost port 22: Connection refused
Solution:
Open System Preferences -> Sharing -> enable Remote Login.
home:~ root$ ssh localhost
Last login: Fri Jan 25 17:31:54 2019 from ::1
Configuring environment variables
vim ~/.bash_profile
Add the following two lines:
export HADOOP_HOME=/Users/root/software/hadoop/hadoop3.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Apply the changes:
source ~/.bash_profile
Modifying the configuration files
1. The hadoop2.9/etc/hadoop/hadoop-env.sh file
Modify or replace the following lines:
export JAVA_HOME=${JAVA_HOME}
export HADOOP_HEAPSIZE=2000
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
2. Configure the NameNode host and port
Edit the hadoop2.9/etc/hadoop/core-site.xml file:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/hadoop-2.9/tmp/hadoop-${user.name}</value><!-- adjust this path to your environment -->
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name><!-- fs.default.name is the deprecated older spelling -->
<value>hdfs://localhost:9000</value>
</property>
</configuration>
3. Configure the default HDFS replication factor
Edit the hadoop2.9/etc/hadoop/hdfs-site.xml file:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
4. Configure the JobTracker host and port
Edit the hadoop2.9/etc/hadoop/mapred-site.xml file. (Note: the JobTracker/TaskTracker properties below date from Hadoop 1.x; on Hadoop 2.x with YARN, it is mapreduce.framework.name=yarn that routes MapReduce jobs to YARN.)
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>2</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>2</value>
</property>
</configuration>
Note: if mapred-site.xml does not exist, create it yourself (copy mapred-site.xml.template and rename it).
5. Edit the hadoop2.9/etc/hadoop/yarn-site.xml file:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
6. Format the file system
From inside the hadoop2.9 directory, format the NameNode with:
bin/hdfs namenode -format
Output like the following indicates success:
2019-01-25 17:58:32,570 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2019-01-25 17:58:32,570 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2019-01-25 17:58:32,570 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2019-01-25 17:58:32,573 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2019-01-25 17:58:32,573 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2019-01-25 17:58:32,575 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2019-01-25 17:58:32,575 INFO util.GSet: VM type = 64-bit
2019-01-25 17:58:32,576 INFO util.GSet: 0.029999999329447746% max memory 3.6 GB = 1.1 MB
2019-01-25 17:58:32,576 INFO util.GSet: capacity = 2^17 = 131072 entries
2019-01-25 17:58:32,613 INFO namenode.FSImage: Allocated new BlockPoolId: BP-137592425-192.168.11.67-1548410312604
2019-01-25 17:58:32,629 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
2019-01-25 17:58:32,642 INFO namenode.FSImageFormatProtobuf: Saving image file /tmp/hadoop-root/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2019-01-25 17:58:32,760 INFO namenode.FSImageFormatProtobuf: Image file /tmp/hadoop-root/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 409 bytes saved in 0 seconds .
2019-01-25 17:58:32,776 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2019-01-25 17:58:32,781 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at xlc-2.local/192.168.11.67
************************************************************/
7. Start the NameNode and DataNode daemons:
sbin/start-dfs.sh
8. Start the ResourceManager and NodeManager daemons:
sbin/start-yarn.sh
9. Or start all services at once:
sbin/start-all.sh
10. Verify Hadoop
If http://localhost:50070 opens the HDFS management page, HDFS started successfully.
If http://localhost:8088 opens the cluster/application management page, YARN started successfully.
Problem:
HDFS UI unreachable at http://localhost:50070/
while the YARN address works: http://localhost:8088
Solution:
Installing Hadoop 2.9 made the page reachable. (The underlying reason: Hadoop 3.x moved the NameNode web UI from port 50070 to 9870, so under Hadoop 3.2 the UI lives at http://localhost:9870 instead.)
II. Installing Hive 3.1
1. Download
MySQL must be installed before Hive; it was already present on this machine, so that step is skipped here.
Download address: https://hive.apache.org/downloads.html
The download used here is apache-hive-3.1.1-bin.tar.gz; after unpacking, rename the directory and move it under the Hadoop installation directory.
2. Configure system environment variables
vim ~/.bash_profile
export HIVE_HOME=/usr/hadoop/hadoop2.9/hive (note: adjust to your own path)
export PATH=$PATH:$HIVE_HOME/bin
3. Modify the Hive configuration files
1) Go to /usr/hadoop/hadoop2.9/hive/conf and create the config files from the shipped templates (Hive 3 uses log4j2, so the template names differ from older 1.x guides):
cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-default.xml
cp hive-log4j2.properties.template hive-log4j2.properties
cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
Then create a new file hive-site.xml (there is no hive-site.xml.template; start from an empty file).
2) Add the following to hive-site.xml:
<configuration>
<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8&amp;autoReconnect=true&amp;useSSL=false</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>111111</value>
</property>
</configuration>
3) Modify hive-env.sh:
HADOOP_HOME=/usr/hadoop/hadoop2.9
export HIVE_CONF_DIR=/usr/hadoop/hadoop2.9/hive/conf
Copy the MySQL JDBC driver jar into the hive/lib directory, e.g.:
mysql-connector-java-5.1.46.jar
4. Start Hive
1) On first start, initialize the metastore schema first:
schematool -dbType mysql -initSchema
XLC-2:bin xianglingchuan$ schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/xianglingchuan/software/hadoop/hadoop2.9/hive3.1/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/xianglingchuan/software/hadoop/hadoop2.9/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&useUnicode=true&characterEncoding=UTF-8&autoReconnect=true&useSSL=false
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
Underlying cause: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException : Could not create connection to database server. Attempted reconnect 3 times. Giving up.
SQL Error code: 0
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
The failure above means the MySQL metastore at master:3306 could not be reached; check that MySQL is running and that the connection URL, user name, and password in hive-site.xml are correct, then rerun schematool. Start Hive:
XLC-2:bin xianglingchuan$ bin/hive
19/01/26 23:37:57 DEBUG util.VersionInfo: version: 2.9.2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/xianglingchuan/software/hadoop/hadoop2.9/hive3.1/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/xianglingchuan/software/hadoop/hadoop2.9/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 96860f82-17bb-48ca-9475-700e2ebffc6f
Logging initialized using configuration in file:/Users/xianglingchuan/software/hadoop/hadoop2.9/hive3.1/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
Problem 1: starting the daemons or running ssh localhost reports "Bad configuration option: usekeychain"
home:sbin root$ ./start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as xianglingchuan in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [localhost]
localhost: /Users/root/.ssh/config: line 8: Bad configuration option: usekeychain
home:sbin root$ ssh localhost
/Users/root/.ssh/config: line 8: Bad configuration option: usekeychain
/Users/root/.ssh/config: line 21: Bad configuration option: usekeychain
/Users/root/.ssh/config: line 30: Bad configuration option: usekeychain
/Users/root/.ssh/config: terminating, 3 bad configuration options
Solution:
Either add the line IgnoreUnknown UseKeychain at the top of ~/.ssh/config (so ssh builds that do not support the option skip it), or simply clear ~/.ssh/config and regenerate the key.
Problem 2: error under Hadoop 2.9:
```
home:sbin root$ sbin/start-dfs.sh
19/01/26 17:36:51 DEBUG util.Shell: setsid is not available on this machine. So not using it.
19/01/26 17:36:51 DEBUG util.Shell: setsid exited with exit code 0
19/01/26 17:36:51 ERROR conf.Configuration: error parsing conf core-site.xml
com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 start byte 0xa0 (at char #766, byte #37)
```
Solution:
Remove the stray whitespace from the newly added property nodes in core-site.xml (0xa0 is a non-breaking space, typically introduced by copy-pasting from a web page) and make sure the file is saved as UTF-8.
# Useful notes
## Turning Hadoop debug output on and off
On: export HADOOP_ROOT_LOGGER=DEBUG,console
Off: export HADOOP_ROOT_LOGGER=INFO,console
## What each configuration file does
core-site.xml        Service URI, Hadoop cluster temp directory, and related settings
hdfs-site.xml        HDFS addresses, ports, replication, and related settings
mapred-site.xml      MapReduce framework name, job history server address, and related settings
yarn-site.xml        ResourceManager and NodeManager settings
fair-scheduler.xml   Configuration for the Hadoop FairScheduler scheduling policy