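Getting Started with Hadoop
What is Hadoop?
Hadoop is a tool for storing and analyzing massive data sets. It is an open-source framework, written in Java, that stores data and runs distributed analysis applications on a cluster of servers. Hadoop is designed for offline, large-scale data analysis and is not suited to online transaction processing workloads that randomly read and write a handful of records.
Hadoop's core (HDFS and MapReduce)
HDFS: provides storage for massive data sets
MapReduce: provides computation over massive data sets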
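To make the split between "map" and "reduce" a little more concrete, here is a rough single-machine analogy built from an ordinary shell pipeline. It is only an illustration of the model, not how Hadoop jobs are actually submitted; the function name wordcount and the sample input are made up for this sketch.

```shell
# Single-machine analogy of a MapReduce word count (illustrative only).
# wordcount is a hypothetical helper name, not part of Hadoop.
wordcount() {
  tr -s '[:space:]' '\n' |              # "map": emit one word (key) per line
    sort |                              # "shuffle": bring identical keys together
    uniq -c |                           # "reduce": count each group of keys
    awk 'NF == 2 { print $2 "\t" $1 }'  # print as word<TAB>count, skip blanks
}

printf 'hadoop hdfs hadoop\nyarn hdfs hadoop\n' | wordcount
```

On a real cluster the same three phases run in parallel across many machines, with HDFS holding the input and the output.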
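What is Hadoop good at?
1. Big data storage (distributed storage)
2. Log processing (it is well suited to log analysis)
3. ETL: extracting data into Oracle, MySQL, MongoDB, and other mainstream databases
4. Machine learning (Apache Mahout)
5. Search engines (built with Hadoop + Lucene)
6. Data mining
A practical application: real-time log processing and analysis with Flume + Logstash + Kafka + Spark Streaming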
A simple Hadoop cluster installation and configuration
Environment:
192.168.100.11 (hostname: openstack1)
192.168.100.101 (hostname: openstack2)
192.168.100.111 (hostname: openstack3)
Edit the hosts file (on all three machines)
vim /etc/hosts
192.168.100.11 openstack1
192.168.100.101 openstack2
192.168.100.111 openstack3
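Since every later step relies on these names, it may be worth checking that all three entries really made it into the file on each machine. The following is a minimal sketch; check_hosts is a hypothetical helper name, and it only looks at the hosts file itself, not at DNS.

```shell
# check_hosts FILE NAME...  (hypothetical helper, not a standard tool)
# Reports every NAME that does not appear as a hostname in FILE.
check_hosts() {
  hosts_file=$1
  shift
  missing=0
  for h in "$@"; do
    if ! awk -v h="$h" '
          /^[ \t]*#/ { next }                       # ignore commented-out lines
          { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
          END { exit !found }
        ' "$hosts_file"; then
      echo "missing: $h"
      missing=1
    fi
  done
  return $missing
}

# Usage on each node:
#   check_hosts /etc/hosts openstack1 openstack2 openstack3
```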
Create the user and grant it sudo privileges (on all three machines)
groupadd hadoop && useradd -g hadoop hduser
passwd hduser
chmod u+w /etc/sudoers
vim /etc/sudoers
# Add below the line: root ALL=(ALL) ALL
hduser ALL=(ALL) ALL
chmod 440 /etc/sudoers
Then reboot the machine: init 6
On openstack1, set up passwordless SSH login and push the key to the other two nodes; this makes the later configuration steps easier
ssh-keygen -t rsa
ssh-copy-id root@openstack2
ssh-copy-id root@openstack3
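Before moving on, it can help to confirm that the key-based login really works without a password prompt. Here is a minimal sketch; check_passwordless and the SSH_CMD variable are hypothetical names, and the root@openstack2 / root@openstack3 targets are an assumption based on the scp commands used later in this post.

```shell
# check_passwordless TARGET...  (hypothetical helper)
# BatchMode=yes makes ssh fail instead of asking for a password,
# so a success here means key-based login is working.
SSH_CMD=${SSH_CMD:-ssh}   # overridable, e.g. for a dry test

check_passwordless() {
  rc=0
  for target in "$@"; do
    if "$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=5 "$target" true; then
      echo "ok: $target"
    else
      echo "FAILED: $target"
      rc=1
    fi
  done
  return $rc
}

# Usage on openstack1 (targets are assumptions, adjust as needed):
#   check_passwordless root@openstack2 root@openstack3
```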
Install and configure the JDK and Hadoop (run on all three machines)
tar zxf jdk-8u91-linux-x64.tar.gz
mv jdk1.8.0_91 /usr/local/jdk1.8
tar zxf hadoop-2.6.1.tar.gz
mv hadoop-2.6.1 /home/hduser/hadoop
vim /etc/profile
export JAVA_HOME=/usr/local/jdk1.8
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}
export HADOOP_HOME=/home/hduser/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
source /etc/profile
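After sourcing the profile, a quick sanity check is to see whether the shell can now find the java and hadoop commands. A small sketch follows; require_cmds is a hypothetical helper name, and it only checks that the commands are on PATH, not that they are the intended versions.

```shell
# require_cmds NAME...  (hypothetical helper)
# Reports whether each command can be found on the current PATH.
require_cmds() {
  rc=0
  for c in "$@"; do
    if command -v "$c" >/dev/null 2>&1; then
      echo "found: $c"
    else
      echo "missing: $c"
      rc=1
    fi
  done
  return $rc
}

# Usage on each node, after `source /etc/profile`:
#   require_cmds java hadoop
#   java -version      # should report the 1.8 JDK installed above
#   hadoop version     # should report Hadoop 2.6.1
```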
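Edit the configuration files (only openstack1 needs to be configured; because passwordless login is already set up, the files can simply be copied to the other nodes with scp)
The configuration files live under etc/hadoop/ in the installation directory (here, /home/hduser/hadoop/etc/hadoop/)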
cd /home/hduser/hadoop/etc/hadoop/
vim hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8
vim yarn-env.sh
export JAVA_HOME=/usr/local/jdk1.8
vim slaves
openstack2
openstack3
vim core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://openstack1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/hadoop/tmp</value>
</property>
</configuration>
vim hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>openstack1:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
vim mapred-site.xml
(After installation only the template file mapred-site.xml.template exists; copy it to mapred-site.xml and then edit the copy)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>openstack1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>openstack1:19888</value>
</property>
</configuration>
vim yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>openstack1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>openstack1:8030</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>openstack1:8035</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>openstack1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>openstack1:8088</value>
</property>
</configuration>
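Property names in these files are easy to mistype, and Hadoop silently ignores names it does not recognise. One way to spot-check a file is to read a property back by its exact name. The sketch below is a rough line-based reader, not a real XML parser; get_prop is a hypothetical helper name, and it assumes the simple layout used in this post with each name and value tag closed on a single line.

```shell
# get_prop FILE NAME  (hypothetical helper, line-based, not a full XML parser)
# Prints the <value> that follows the <name> equal to NAME.
get_prop() {
  awk -v want="$2" '
    /<name>/  { name = $0
                gsub(/^.*<name>|<\/name>.*$/, "", name) }
    /<value>/ { value = $0
                gsub(/^.*<value>|<\/value>.*$/, "", value)
                if (name == want) print value }
  ' "$1"
}

# Usage from /home/hduser/hadoop/etc/hadoop/:
#   get_prop core-site.xml fs.defaultFS
#   get_prop yarn-site.xml yarn.resourcemanager.scheduler.address
# An empty result means the property name is not present in the file.
```

Once Hadoop is on the PATH, hdfs getconf -confKey fs.defaultFS reports the value Hadoop itself resolves, which is the more authoritative check.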
Copy these 7 configuration files to the other two nodes
scp -r /home/hduser/hadoop/etc/hadoop/ root@openstack2:/home/hduser/hadoop/etc/
scp -r /home/hduser/hadoop/etc/hadoop/ root@openstack3:/home/hduser/hadoop/etc/
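With more than a couple of worker nodes, the same copy is easier to write as a loop. Below is a small sketch that prints the commands by default so they can be reviewed first; push_conf and the DRY_RUN variable are hypothetical names, and the paths are the ones used in this post.

```shell
# push_conf NODE...  (hypothetical helper)
# With DRY_RUN=1 (the default here) the scp commands are only printed.
DRY_RUN=${DRY_RUN:-1}

push_conf() {
  for node in "$@"; do
    cmd="scp -r /home/hduser/hadoop/etc/hadoop/ root@$node:/home/hduser/hadoop/etc/"
    if [ "$DRY_RUN" = 1 ]; then
      echo "$cmd"
    else
      $cmd || return 1   # relies on word splitting, fine in sh/bash
    fi
  done
}

push_conf openstack2 openstack3
```

Setting DRY_RUN=0 before calling push_conf performs the actual copy.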
Verify that Hadoop is configured correctly
1. Format the NameNode
/home/hduser/hadoop/bin/hdfs namenode -format
2. Start HDFS
/home/hduser/hadoop/sbin/start-dfs.sh
3. Run jps to check the Java processes
4. Start YARN
/home/hduser/hadoop/sbin/start-yarn.sh
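Note: running /home/hduser/hadoop/sbin/start-all.sh starts HDFS and YARN in one go (the script is marked deprecated in Hadoop 2.x but still works)
5. Check the cluster status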
/home/hduser/hadoop/bin/hdfs dfsadmin -report
[root@openstack1 hadoop]# bin/hdfs dfsadmin -report
Configured Capacity: 101838282752 (94.84 GB)
Present Capacity: 101117652992 (94.17 GB)
DFS Remaining: 101117644800 (94.17 GB)
DFS Used: 8192 (8 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (2):
Name: 192.168.100.111:50010 (openstack3)
Hostname: openstack3
Decommission Status : Normal
Configured Capacity: 50919141376 (47.42 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 360673280 (343.96 MB)
DFS Remaining: 50558464000 (47.09 GB)
DFS Used%: 0.00%
DFS Remaining%: 99.29%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Jun 03 02:12:45 EDT 2020
Name: 192.168.100.101:50010 (openstack2)
Hostname: openstack2
Decommission Status : Normal
Configured Capacity: 50919141376 (47.42 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 359956480 (343.28 MB)
DFS Remaining: 50559180800 (47.09 GB)
DFS Used%: 0.00%
DFS Remaining%: 99.29%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Jun 03 02:12:45 EDT 2020
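The line to look for in this report is "Live datanodes (2):", since this cluster has two worker nodes. If the check needs to be scripted, the count can be pulled out of the report as in the sketch below; live_datanodes is a hypothetical helper name, and it assumes the report format shown above.

```shell
# live_datanodes  (hypothetical helper)
# Reads `hdfs dfsadmin -report` output on stdin and prints the live DataNode count.
live_datanodes() {
  sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*$/\1/p'
}

# Usage on openstack1:
#   /home/hduser/hadoop/bin/hdfs dfsadmin -report | live_datanodes
# For the cluster in this post the expected result is 2.
```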
You can also check the status in a browser: http://192.168.100.11:50070
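For the jps step above, the post does not show the expected output. With the configuration used here, one would expect NameNode, SecondaryNameNode and ResourceManager on openstack1, and DataNode and NodeManager on openstack2 and openstack3; treat that list as an assumption derived from the config files rather than captured output. A small sketch for checking it (check_daemons is a hypothetical helper name):

```shell
# check_daemons NAME...  (hypothetical helper)
# Reads `jps` output ("<pid> <ClassName>" per line) on stdin and reports
# whether each expected daemon name is present.
check_daemons() {
  out=$(cat)
  rc=0
  for d in "$@"; do
    if printf '%s\n' "$out" | awk -v d="$d" '$2 == d { ok = 1 } END { exit !ok }'; then
      echo "running: $d"
    else
      echo "NOT running: $d"
      rc=1
    fi
  done
  return $rc
}

# Usage (expected names are assumptions based on the config in this post):
#   on openstack1:    jps | check_daemons NameNode SecondaryNameNode ResourceManager
#   on openstack2/3:  jps | check_daemons DataNode NodeManager
```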