Installing Hadoop 2.6.4 on CentOS 6.5

1. Download

http://hadoop.apache.org/releases.html

http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html

2. Three virtual machines

192.168.17.178
192.168.17.179
192.168.17.180

3. Remove the JDK bundled with CentOS and install jdk-7u80 (run this step on all three machines)

rpm -qa | grep java
rpm -e --nodeps java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
rpm -ivh jdk-7u80-linux-x64.rpm

After installation, configure the JDK path:

vi /etc/profile
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$JAVA_HOME/bin
Apply the change immediately:
source /etc/profile
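
Before moving on, it can help to confirm that JAVA_HOME really points at a JDK. The following is a minimal sketch, not part of the original steps; the function name check_java_home is made up for illustration, and it only checks that bin/java exists and is executable.

```shell
# check_java_home DIR (hypothetical helper): succeed only if DIR/bin/java
# exists and is executable; otherwise print a reason to stderr.
check_java_home() {
    dir="$1"
    if [ -z "$dir" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$dir/bin/java" ]; then
        echo "no executable java under $dir/bin" >&2
        return 1
    fi
    echo "JAVA_HOME looks usable: $dir"
}
```

A possible use on each machine is `check_java_home "$JAVA_HOME" && java -version`.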

4. Set up passwordless SSH login from 178 to 179 and 180

Run on 178:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
scp ~/.ssh/id_dsa.pub root@192.168.17.179:/root
scp ~/.ssh/id_dsa.pub root@192.168.17.180:/root

Run on 179 and 180:

mkdir .ssh
cat id_dsa.pub >> .ssh/authorized_keys
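
If key login still prompts for a password, file permissions are a common cause: sshd with StrictModes enabled can refuse key files that other users are able to write. The sketch below is an assumption-laden variant of the two commands above, not the original procedure. The helper name install_pubkey is hypothetical; it appends the key only once and tightens permissions on .ssh and authorized_keys.

```shell
# install_pubkey PUBKEY_FILE [HOME_DIR] (hypothetical helper)
# Appends the public key to HOME_DIR/.ssh/authorized_keys if it is not
# already there, then restricts permissions to the owner.
install_pubkey() {
    pubkey="$1"
    home_dir="${2:-$HOME}"
    ssh_dir="$home_dir/.ssh"
    auth="$ssh_dir/authorized_keys"

    [ -r "$pubkey" ] || { echo "cannot read $pubkey" >&2; return 1; }

    mkdir -p "$ssh_dir"
    touch "$auth"
    # Append the key only if that exact line is not already present.
    if ! grep -qxF -f "$pubkey" "$auth"; then
        cat "$pubkey" >> "$auth"
    fi
    chmod 700 "$ssh_dir"
    chmod 600 "$auth"
}
```

On 179 and 180 this could be called as `install_pubkey /root/id_dsa.pub`. Running it twice leaves a single copy of the key in authorized_keys.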

5. Upload hadoop 2.6.4 to machine 178 and extract it to /usr/local/hadoop

Configure /usr/local/hadoop/etc/hadoop/hadoop-env.sh

# set to the root of your Java installation
export JAVA_HOME=/usr/java/latest

# Assuming your installation directory is /usr/local/hadoop
export HADOOP_PREFIX=/usr/local/hadoop

Configure /usr/local/hadoop/etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.17.178:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
    </property>
    <property>
        <name>fs.checkpoint.period</name>
        <value>300</value>
    </property>
    <property>
        <name>fs.checkpoint.dir</name>
        <value>/usr/local/hadoop/dfs/namesecondary</value>
    </property>
</configuration>

Configure /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.http.address</name>
        <value>192.168.17.178:50070</value>
    </property>
    <property>
        <name>dfs.secondary.http.address</name>
        <value>192.168.17.178:50090</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>192.168.17.178</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>192.168.17.178:50090</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

Configure /usr/local/hadoop/etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
    </property>

    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>192.168.17.178:50030</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>192.168.17.178:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>192.168.17.178:19888</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>192.168.17.178:9001</value>
    </property>
</configuration>

Configure /usr/local/hadoop/etc/hadoop/yarn-site.xml

<configuration>
 <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>192.168.17.178</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>192.168.17.178:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>192.168.17.178:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>192.168.17.178:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>192.168.17.178:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>192.168.17.178:8088</value>
    </property>

</configuration>
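
With this many hand-edited XML files, a quick way to read back a single property can catch typos before the cluster is started. The sketch below is illustrative only. The helper name get_property is hypothetical, and it relies on a simplifying assumption: each <name> and <value> element sits on its own line, as in the files in this article. It is not a general XML parser.

```shell
# get_property FILE NAME (hypothetical helper): print the <value> that
# follows <name>NAME</name> in a Hadoop *-site.xml file.
# Assumes <name> and <value> each sit on their own line.
get_property() {
    awk -v want="$2" '
        /<name>/  { n = $0; gsub(/.*<name>|<\/name>.*/, "", n); found = (n == want) }
        /<value>/ && found { v = $0; gsub(/.*<value>|<\/value>.*/, "", v); print v; exit }
    ' "$1"
}
```

For example, `get_property /usr/local/hadoop/etc/hadoop/core-site.xml fs.defaultFS` should print the NameNode URI configured earlier. Once Hadoop is installed, `bin/hdfs getconf -confKey fs.defaultFS` reports the value Hadoop itself resolves.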

/usr/local/hadoop/etc/hadoop/master (note: the master file only specifies the node where the SecondaryNameNode runs; it does not define master/slave roles)

192.168.17.178

/usr/local/hadoop/etc/hadoop/slaves

192.168.17.178
192.168.17.179
192.168.17.180

6. Copy the configured hadoop 2.6.4 to machines 179 and 180

scp -r /usr/local/hadoop 192.168.17.179:/usr/local/hadoop
scp -r /usr/local/hadoop 192.168.17.180:/usr/local/hadoop
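
With more than a couple of nodes, typing one scp per machine gets tedious. As a rough sketch (the helper name print_copy_commands is hypothetical), the commands can be generated from the slaves file, skipping the machine the copy starts from. The function only prints the commands, so they can be reviewed before anything is copied.

```shell
# print_copy_commands SLAVES_FILE LOCAL_IP [HADOOP_DIR] (hypothetical helper)
# Prints one scp command per host listed in SLAVES_FILE, except LOCAL_IP.
print_copy_commands() {
    slaves_file="$1"
    local_ip="$2"
    hadoop_dir="${3:-/usr/local/hadoop}"
    # The "|| [ -n ... ]" part keeps a last line that has no trailing newline.
    while read -r host || [ -n "$host" ]; do
        # Skip blank lines and the machine we are copying from.
        [ -z "$host" ] && continue
        [ "$host" = "$local_ip" ] && continue
        echo "scp -r $hadoop_dir $host:$hadoop_dir"
    done < "$slaves_file"
}
```

On 178, `print_copy_commands /usr/local/hadoop/etc/hadoop/slaves 192.168.17.178` prints the two scp commands used in this step; piping that output to `sh` runs them.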

7. Start and stop hadoop

bin/hdfs namenode -format
sbin/start-dfs.sh
sbin/stop-dfs.sh
sbin/start-yarn.sh
sbin/stop-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver
sbin/mr-jobhistory-daemon.sh stop historyserver
sbin/hadoop-daemon.sh start secondarynamenode
sbin/hadoop-daemon.sh stop secondarynamenode

Check hadoop status at the following three addresses:

http://192.168.17.178:8088
http://192.168.17.178:19888
http://192.168.17.178:50070
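
Besides the web pages, the JDK's jps tool lists the Java processes running on a machine, one "PID Name" pair per line, which gives a quick command-line check. The following is a small sketch built on that; the helper name missing_daemons is hypothetical, and which daemons to expect on each node is an assumption based on the configuration in this article.

```shell
# missing_daemons EXPECTED... (hypothetical helper): read `jps` output on
# stdin and print each expected daemon name that does not appear in it.
missing_daemons() {
    # Second column of jps output is the process name.
    running=$(awk '{print $2}')
    for name in "$@"; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$running" | grep -qx "$name"; then
            echo "$name"
        fi
    done
}
```

On 178, after starting everything, something like `jps | missing_daemons NameNode SecondaryNameNode DataNode ResourceManager NodeManager JobHistoryServer` should print nothing. On 179 and 180 only DataNode and NodeManager would be expected.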

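Setting up a hadoop development environment on Windows

hadoop is extracted to D:\hadoop\hadoop-2.6.4

1. Install Hadoop-Eclipse-Plugin

Download the plugin from https://github.com/winghc/hadoop2x-eclipse-plugin and put hadoop2x-eclipse-plugin-master\release\hadoop-eclipse-plugin-2.6.0.jar into Eclipse's plugins folder (I use Eclipse Luna (4.4.2)).

Open Eclipse and, under Window -> Preferences -> Hadoop Map/Reduce, set the hadoop path to D:\hadoop\hadoop-2.6.4

2. Download winutils and the matching hadoop.dll for hadoop 2.6 and put them in D:\hadoop\hadoop-2.6.4\bin. This resolves the error java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

3. Put hadoop.dll in C:\Windows\System32 or in the Eclipse project. This resolves the error java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

If you put it in C:\Windows\System32, note that hadoop.dll must have the same bitness (32-bit or 64-bit) as both the operating system and the JDK. A machine I used earlier, with 64-bit Windows 7, a 32-bit JDK, and 32-bit Eclipse, kept failing for exactly this reason.

If hadoop.dll is placed in the Eclipse project instead (at the same level as src), it only needs to match the bitness of the JDK.

4. Create a new project via File -> New -> Other -> Map/Reduce Project

Example run: the articles are stored as files under hdfs://192.168.17.178:9000/mongo. The job segments every article into words and counts how often each word occurs, using the ansj library for segmentation.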

import java.io.IOException;
import java.util.List;

import org.ansj.domain.Term;
import org.ansj.splitWord.analysis.ToAnalysis;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewsWordCount {
	public static class TokenizerMapper extends
			Mapper<Object, Text, Text, IntWritable> {

		private final static IntWritable one = new IntWritable(1);
		private Text word = new Text();

		@Override
		public void map(Object key, Text value, Context context)
				throws IOException, InterruptedException {
			System.out.println("map value=" + value.toString());
			// Segment one line of input into terms with ansj.
			List<Term> parse = ToAnalysis.parse(value.toString());
			System.out.println(parse);
			for (Term term : parse) {
				// Emit "<word>_<part-of-speech tag>" with a count of 1.
				String nature = term.getNatureStr();
				String name = term.getName();
				word.set(name + "_" + nature);
				context.write(word, one);
			}
		}
	}

	public static class IntSumReducer extends
			Reducer<Text, IntWritable, Text, IntWritable> {
		private IntWritable result = new IntWritable();

		@Override
		public void reduce(Text key, Iterable<IntWritable> values,
				Context context) throws IOException, InterruptedException {
			// Sum the counts emitted for this key.
			int sum = 0;
			for (IntWritable val : values) {
				sum += val.get();
			}
			result.set(sum);
			System.out.println("reduce text=" + key + "   result=" + result);
			context.write(key, result);
		}
	}

	public static void main(String[] args) throws Exception {
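		// Point Hadoop at the local install so it can find bin\winutils.exe on Windows.
		System.setProperty("hadoop.home.dir", "D:/hadoop/hadoop-2.6.4");
		// Access HDFS as root rather than as the local Windows user.
		System.setProperty("HADOOP_USER_NAME", "root");
		// Input and output paths are hard-coded; the output directory must not exist yet.
		args = new String[] { "hdfs://192.168.17.178:9000/mongo",
				"hdfs://192.168.17.178:9000/mongo_output" };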
		Configuration conf = new Configuration();
		Job job = Job.getInstance(conf, "news word count");
		job.setJarByClass(NewsWordCount.class);
		job.setMapperClass(TokenizerMapper.class);
		job.setCombinerClass(IntSumReducer.class);
		job.setReducerClass(IntSumReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		FileInputFormat.addInputPath(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}

View the results:
bin/hdfs dfs -cat /mongo_output/part-r-00000
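
The output file holds one "term, tab, count" line per key and is ordered by key, not by frequency. To see the most frequent terms first, the output can be piped through sort. This is a minimal sketch; the helper name top_terms is hypothetical, and it assumes the default tab separator between key and value.

```shell
# top_terms [N] (hypothetical helper): read "term<TAB>count" lines on stdin
# and print the N lines with the highest counts (default 10).
top_terms() {
    n="${1:-10}"
    # Sort numerically, descending, on the second tab-separated field.
    sort -t "$(printf '\t')" -k2,2nr | head -n "$n"
}
```

For example: `bin/hdfs dfs -cat /mongo_output/part-r-00000 | top_terms 20`.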


References

http://hadoop.apache.org/docs/r2.6.4/hadoop-project-dist/hadoop-common/SingleCluster.html

http://hadoop.apache.org/docs/r2.6.4/hadoop-project-dist/hadoop-common/ClusterSetup.html

