Installing Hadoop 2.9 on Linux CentOS 7 (Hadoop/Hive/Sqoop)

1. Hadoop Installation and Configuration

Upload and extract Hadoop 2.9
Create a directory named hadoop under /data (command: mkdir hadoop) and upload the downloaded hadoop-2.9.2.tar.gz into it.

mkdir /data/hadoop
mkdir /data/hadoop/tmp
mkdir /data/hadoop/var
mkdir /data/hadoop/dfs
mkdir /data/hadoop/dfs/name
mkdir /data/hadoop/dfs/data
mkdir -p /data/hadoop/tmp/dfs/name
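
With the directories in place, extract the tarball so the result matches the HADOOP_HOME used below. A minimal sketch, assuming the archive was uploaded to /data/hadoop as described above:

cd /data/hadoop
tar -zxvf hadoop-2.9.2.tar.gz -C /data    # yields /data/hadoop-2.9.2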

Configure /etc/profile

export HADOOP_HOME=/data/hadoop-2.9.2
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
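
After editing /etc/profile, reload it and sanity-check that the hadoop command resolves (a quick check, assuming the tarball has been extracted as described above):

source /etc/profile
hadoop version    # should report Hadoop 2.9.2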

Modify the configuration files

core-site.xml

<configuration>
   <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
   </property>
   <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9999</value>
   </property>
</configuration>
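
fs.default.name is the legacy key name; Hadoop 2.x treats it as an alias for fs.defaultFS. One way to confirm the value is being picked up, as a sketch run after the configuration is in place:

hdfs getconf -confKey fs.defaultFS    # should print hdfs://localhost:9999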

hadoop-env.sh

Change JAVA_HOME to point to the JDK installation directory on the system

#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/local/java/jdk1.8.0_144
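
A quick way to confirm the JDK path is valid before starting the daemons (uses the path configured above):

/usr/local/java/jdk1.8.0_144/bin/java -version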

hdfs-site.xml

<configuration>
<!-- Replication factor for HDFS blocks -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<!-- Allow browsing HDFS directories over WebHDFS -->
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.http.address</name>
  <value>0.0.0.0:50070</value>
</property>
</configuration>
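
With dfs.webhdfs.enabled set to true and the HTTP server bound to 0.0.0.0:50070, the WebHDFS REST endpoint can be queried once HDFS is running; a minimal check (sketch):

curl "http://localhost:50070/webhdfs/v1/?op=LISTSTATUS"    # lists the HDFS root directory as JSON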

yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
</configuration>
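
Nothing else is needed here for a single-node setup; once YARN is started, jps should show the ResourceManager and NodeManager alongside the HDFS daemons. A sketch, using the sbin scripts under HADOOP_HOME:

$HADOOP_HOME/sbin/start-yarn.sh
jps    # expect ResourceManager and NodeManager in the list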

mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://localhost:9999</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>

Error messages

NameNode startup error

Error log:

2019-07-29 20:30:27,717 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50070
2019-07-29 20:30:27,817 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2019-07-29 20:30:27,818 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2019-07-29 20:30:27,818 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2019-07-29 20:30:27,820 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.net.BindException: Problem binding to [izwz938o0q4p4r0l3sljxnz:9000] java.net.BindException: Cannot assign requested address; For more details see:  http://wiki.apache.org/hadoop/BindException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:735)
        at org.apache.hadoop.ipc.Server.bind(Server.java:561)
        at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1037)
        at org.apache.hadoop.ipc.Server.<init>(Server.java:2738)
        at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:958)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:420)
        at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:341)
        at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:800)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.<init>(NameNodeRpcServer.java:431)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:803)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:730)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:953)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:932)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1673)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1741)
Caused by: java.net.BindException: Cannot assign requested address
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:433)
        at sun.nio.ch.Net.bind(Net.java:425)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at org.apache.hadoop.ipc.Server.bind(Server.java:544)
        ... 13 more
2019-07-29 20:30:27,823 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.net.BindException: Problem binding to [izwz938o0q4p4r0l3sljxnz:9000] java.net.BindException: Cannot assign requested address; For more details see:  http://wiki.apache.org/hadoop/BindException
2019-07-29 20:30:27,825 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 

Cause: the NameNode could not bind to port 9000 on the configured address. "Cannot assign requested address" usually means the hostname in the configuration resolves to an IP that is not bound on this machine (common on cloud hosts whose /etc/hosts maps the hostname to the public IP); a genuine port conflict would instead report "Address already in use".
Troubleshooting approach:
Port 50070 is the NameNode's default HTTP port, so when http://192.168.10.10:50070 refuses the connection the NameNode most likely did not start successfully; the firewall is another thing to check, though in my case the page was reachable even without shutting the firewall down.
In that case, open a shell and run jps to see whether the NameNode process is running. If it is not, go to the logs directory under the Hadoop installation root and read the NameNode startup log, as sketched below.
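
A few commands covering the checks above, as a minimal sketch (the port and log path follow the configuration earlier in this article):

jps                                        # is a NameNode process listed?
ss -lntp | grep 9000                       # is something else already listening on the RPC port?
cat /etc/hosts                             # does the hostname in the error resolve to a locally bound IP?
tail -n 100 $HADOOP_HOME/logs/hadoop-*-namenode-*.log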

"NameNode is not formatted." error

2019-08-01 16:05:45,793 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/hadoop/tmp/dfs/name/in_use.lock acquired by nodename 21761@localhost
2019-08-01 16:05:45,798 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.IOException: NameNode is not formatted.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:236)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1052)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:666)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:728)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:953)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:932)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1673)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1741)
2019-08-01 16:05:45,806 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50070
2019-08-01 16:05:45,908 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2019-08-01 16:05:45,910 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2019-08-01 16:05:45,910 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2019-08-01 16:05:45,911 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.IOException: NameNode is not formatted.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:236)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1052)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:666)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:728)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:953)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:932)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1673)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1741)
2019-08-01 16:05:45,915 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.io.IOException: NameNode is not formatted.

The failing point is "NameNode is not formatted."
Delete the files under that directory (/data/hadoop/tmp/dfs/name in the log above), then re-format the NameNode.

If this is the first start, run the format command on namenode1:

    bin/hadoop namenode -format

If this is not the first start, run the following command on namenode1 instead:

   bin/hdfs namenode  -initializeSharedEdits
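
After formatting, a quick way to confirm the NameNode really comes up (a sketch, relying on the variables set in /etc/profile earlier):

$HADOOP_HOME/sbin/start-dfs.sh
jps                        # NameNode, DataNode and SecondaryNameNode should appear
hdfs dfsadmin -report      # basic report on the HDFS cluster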

2. Hive 3.1 Configuration

Download the Hive 3.1 tar package and place it under /data/hive3.1.

Modify the configuration files

hive-env.sh

export HADOOP_HOME=/data/hadoop-2.9.2
export HIVE_CONF_DIR=/data/hive3.1/conf

hive-site.xml

<configuration>
    <property><name>hive.metastore.local</name><value>true</value></property> 
    <property><name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8&amp;autoReconnect=true&amp;useSSL=false</value>
    </property>
    <property><name>javax.jdo.option.ConnectionDriverName</name><value>com.mysql.jdbc.Driver</value></property>
    <property><name>javax.jdo.option.ConnectionUserName</name><value>root</value></property>
    <property><name>javax.jdo.option.ConnectionPassword</name><value>111111</value></property>
 <!-- 顯示錶的列名 -->
 <property><name>hive.cli.print.header</name><value>true</value></property>
 <!-- 顯示數據庫名稱 -->
 <property><name>hive.cli.print.current.db</name><value>true</value></property>
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
   <description>
   </description>
</property>
</configuration>

Initialize Hive

cd /data/hive3.1/bin
./schematool -dbType mysql -initSchema
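
If the schema initialization succeeds, the Hive CLI should be able to start and reach the metastore; a minimal smoke test (sketch):

cd /data/hive3.1/bin
./hive -e "show databases;"    # should at least list the default database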

3. Importing MySQL Data into Hive with Sqoop

Download the Sqoop source package and extract it to /data/sqoop.

Error messages

./sqoop import --connect jdbc:mysql://127.0.0.1:3306/mydata?zeroDateTimeBehavior=CONVERT_TO_NULL --username root --P --table app_h5_start --hive-import --hive-table app_h5_start --bindir ./ -m 1

19/08/01 17:53:19 WARN conf.HiveConf: HiveConf of name hive.metastore.local does not exist
19/08/01 17:53:19 ERROR tool.ImportTool: Import failed: java.io.IOException: Cannot run program "hive": error=2, 沒有那個文件或目錄
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
        at java.lang.Runtime.exec(Runtime.java:620)
        at java.lang.Runtime.exec(Runtime.java:528)
        at org.apache.sqoop.util.Executor.exec(Executor.java:76)
        at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:382)
        at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:337)
        at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:244)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:158)
Caused by: java.io.IOException: error=2, 沒有那個文件或目錄
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
        at java.lang.ProcessImpl.start(ProcessImpl.java:134)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 20 more

Set the Hive environment variables

export HIVE_HOME=/data/hive3.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin
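
Reload the profile so that Sqoop can find the hive launcher on PATH (exactly what the error above complains about), then check:

source /etc/profile
which hive    # should print /data/hive3.1/bin/hive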

Complete command for importing data from MySQL

./sqoop import --connect jdbc:mysql://127.0.0.1:3306/mydata?zeroDateTimeBehavior=CONVERT_TO_NULL --username root --P --table app_h5_start --hive-import --hive-table app_h5_start --bindir ./ -m 1
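
Once the import finishes, a simple check that the data landed in Hive (a sketch, reusing the table name from the command above):

hive -e "select count(*) from app_h5_start;"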

4. Hive Data Aggregation and Export

Compute the statistics and insert them into the target table

insert into june_active(SELECT app_key,device_type,COUNT(DISTINCT register_device_id) AS total FROM app_h5_start GROUP BY app_key,device_type)

hive (default)> insert into june_active(SELECT app_key,device_type,COUNT(DISTINCT register_device_id) AS total FROM app_h5_start GROUP BY app_key,device_type);
Query ID = root_20190801185848_ad61d15a-4448-4cac-86ae-2b750cfee34f
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 18
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
.......

Create the Hive target table (it must exist before running the INSERT above)

create table IF NOT EXISTS default.june_active(app_key string COMMENT 'app_key',device_type string,total string);

Export the contents of a Hive table

./hive -e "select * from june_active" >> /home/hadoop/output/june_active.txt
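
A quick look at the exported file (path taken from the command above):

head /home/hadoop/output/june_active.txt
wc -l /home/hadoop/output/june_active.txt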