The previous posts all ran MapReduce jobs inside the Hadoop environment using Hadoop's own tools. This post shows how a Java application can connect to a Hadoop service and run MapReduce computations.
I. Installing and Configuring Hadoop
1. Extract Hadoop
$tar zxvf hadoop-1.2.1-bin.tar.gz -C /usr/local/app/hadoop
2. Configure the Hadoop environment
Add the following to /etc/profile:
export JAVA_HOME=/usr/local/app/jdk1.6.0_45
export JRE_HOME=$JAVA_HOME/jre
export HADOOP_HOME=/usr/local/app/hadoop/hadoop-1.2.1
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Reload the system environment:
$source /etc/profile
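To verify that the environment took effect, the hadoop command should now resolve on the PATH:
$hadoop version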
3. Configure the Hadoop service
Edit $HADOOP_HOME/conf/core-site.xml:
<configuration>
    <property>
        <name>fs.default.name</name>
        <!-- Recommended: use the machine's hostname or IP, not localhost -->
        <value>hdfs://192.168.242.128:9000</value>
        <final>true</final>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/guzicheng/hadoop/tmp</value>
    </property>
</configuration>
Edit $HADOOP_HOME/conf/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/home/guzicheng/hadoop/hdfs/name</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/home/guzicheng/hadoop/hdfs/data</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>
Edit $HADOOP_HOME/conf/mapred-site.xml:
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>192.168.242.128:9001</value>
        <final>true</final>
    </property>
</configuration>
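Note: if this is a fresh installation, format the NameNode once before starting the services:
$hadoop namenode -format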
4. Start the Hadoop service
Change into the $HADOOP_HOME/bin directory:
$./start-all.sh
During startup you will be prompted for the system user's password (start-all.sh connects to each node over SSH); type it and press Enter. Setting up passwordless SSH login avoids these prompts.
After a successful start, five processes should be running: NameNode, SecondaryNameNode, DataNode, JobTracker, and TaskTracker. You can verify with:
$jps
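The output should look roughly like the following (PIDs will differ):
1234 NameNode
2345 SecondaryNameNode
3456 DataNode
4567 JobTracker
5678 TaskTracker
6789 Jps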
II. Writing the Java Client
1. Maven configuration
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.1</version>
</dependency>
2. Java code
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.springframework.stereotype.Service;

@Service
public class WordCountServiceImpl implements WordCountService
{
    // Test entry point
    public static void main(String[] args)
    {
        try {
            new WordCountServiceImpl().wordCount();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public int wordCount() throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://192.168.242.128:9000"); // address of the Hadoop service
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCountServiceImpl.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // HDFS input path, resolves to hdfs://192.168.242.128:9000/user/guzicheng/input
        FileInputFormat.addInputPath(job, new Path("input"));
        // HDFS output path, resolves to hdfs://192.168.242.128:9000/user/guzicheng/output
        FileOutputFormat.setOutputPath(job, new Path("output"));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        private IntWritable result = new IntWritable();

        // Sums the counts emitted for each word
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException
        {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            this.result.set(sum);
            context.write(key, this.result);
        }
    }

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>
    {
        private static final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Splits each input line into tokens and emits (word, 1)
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException
        {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                this.word.set(itr.nextToken());
                context.write(this.word, one);
            }
        }
    }
}
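The job above reads from the relative HDFS path input, which resolves against the client user's home directory on HDFS, so the input files must exist there before the job runs. As a minimal sketch of preparing that data from Java (HdfsUploader and the local path /tmp/words.txt are illustrative placeholders, not part of the original example), a file can be copied in with the FileSystem API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUploader {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://192.168.242.128:9000");
        FileSystem fs = FileSystem.get(conf);
        // Copies a local file into the relative "input" directory
        // (resolved against /user/<current user> on HDFS).
        fs.copyFromLocalFile(new Path("/tmp/words.txt"), new Path("input"));
        fs.close();
    }
}

The same can be done from the server shell with $hadoop fs -put words.txt input.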
III. Troubleshooting
1. Connection refused
Exception:
java.net.ConnectException: Call to 192.168.242.128/192.168.242.128:9000 failed on connection exception: java.net.ConnectException: Connection refused: no further information
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
……
Possible causes:
1) Hadoop service misconfiguration
The service address was configured with a local-only hostname or address (localhost or 127.0.0.1), which remote clients cannot reach, for example:
<property>
    <name>fs.default.name</name>
    <value>hdfs://127.0.0.1:9000</value>
    <final>true</final>
</property>
2) Server firewall
The Hadoop service ports are blocked by the firewall.
Solutions:
1) Fix the Hadoop service configuration so the address is reachable from the client:
<property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.242.128:9000</value>
    <final>true</final>
</property>
2) Open the firewall ports
Open the Hadoop service ports on the Linux server: 9000 (NameNode RPC), 9001 (JobTracker RPC, as configured in mapred-site.xml), 50010 (DataNode data transfer), and 50030 (JobTracker web UI).
Edit /etc/sysconfig/iptables:
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9000 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9001 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50010 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50030 -j ACCEPT
Restart the firewall:
$service iptables restart
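Once the configuration and firewall are fixed, a small smoke test from the client machine confirms the NameNode is reachable (HdfsSmokeTest is a hypothetical helper, pointed at the same address used above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSmokeTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://192.168.242.128:9000");
        // FileSystem.get() opens an RPC connection to the NameNode;
        // a ConnectException here means port 9000 is still unreachable.
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}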
2. File permission problem
Environment: the Java client runs on Windows; the Hadoop service runs on Linux.
Exception:
java.io.IOException: Failed to set permissions of path: \tmp\……
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
……
Cause: Hadoop's permission check fails on Windows file paths; the problem does not occur on Linux.
Solution: comment out the body of FileUtil.checkReturnValue() in the Hadoop source, as follows:
……
private static void checkReturnValue(boolean rv, File p, FsPermission permission)
    throws IOException
{
    /**
    if (!rv)
        throw new IOException(new StringBuilder().append("Failed to set permissions of path: ").append(p).append(" to ")
            .append(String.format("%04o", new Object[] { Short.valueOf(permission.toShort()) })).toString());
    **/
}
……
For recompiling and repackaging hadoop-1.2.1.jar, refer to: