Setting up Cygwin and Hadoop on Windows, with Eclipse integration

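The whole process draws on the following articles:

1. http://cw550284.iteye.com/blog/1064844

2. http://lirenjuan.iteye.com/blog/1280729

As everyone knows, debugging MapReduce programs is hard. Fortunately, Cygwin and the corresponding Eclipse plugin make it much easier. Below is a summary of my installation and configuration process: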

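I. Installing Cygwin

There is not much to say here: download the installer from the official Cygwin site and run the online installation. Download page: http://cygwin.com/install.html

After Cygwin is installed, set the environment variable CYGWIN_HOME and add %CYGWIN_HOME%\bin to PATH.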

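II. Installing and configuring Hadoop

First, set up passwordless SSH access. The first article above explains this step, so here I only describe the problems I ran into while setting up SSH authorization.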
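For orientation, the setup usually comes down to generating a key pair with an empty passphrase and authorizing its public key for the same account. The following is a minimal sketch under stated assumptions, not the exact procedure from the referenced article: the function name `setup_passwordless_ssh` and its optional directory argument are illustrative, and it assumes OpenSSH's `ssh-keygen` is installed and that sshd has already been configured as a service.

```shell
#!/bin/sh
# Sketch: create a passphrase-less RSA key pair and authorize it for
# logins to the same account. The function name and the optional
# directory argument are illustrative, not part of any tool.
setup_passwordless_ssh() {
    ssh_dir="${1:-$HOME/.ssh}"

    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1

    # Generate the key only if none exists yet; -N "" sets an empty passphrase.
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -N "" -f "$ssh_dir/id_rsa" || return 1
    fi

    # Append the public key once, then restrict the file's permissions.
    touch "$ssh_dir/authorized_keys"
    if ! grep -qxF "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys"; then
        cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}
```

Calling `setup_passwordless_ssh` with no argument works on `~/.ssh`; afterwards `ssh localhost` should log in without asking for a password, provided sshd is running.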

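Setting up passwordless access may fail without any indication of where the problem lies. In that case, run ssh in debug mode:
ssh -vv localhost
Problem 1:

The ssh service is not running.
Solution: start the sshd service with cygrunsrv -S sshd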


Problem 2:

ssh: Permission denied
Problem: you cannot log in to your account. You set the password using `passwd`, but ssh still reports this error.
Cause: Cygwin sometimes does not create your local user in the /etc/passwd file.
Solution: regenerate the password file so that it contains your Windows user:
mkpasswd.exe -c > /etc/passwd
Then use the `passwd` command to set a password for the Cygwin user. This is not the same as the Windows user's password.

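With passwordless SSH access in place, install Hadoop next. The steps are the same as on Linux:

1. Edit hadoop-env.sh;

2. Edit core-site.xml;

3. Edit mapred-site.xml;

4. Set the environment variable HADOOP_HOME and add %HADOOP_HOME%\bin to PATH;

5. Start Hadoop and verify that it works.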
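The first three steps can be sketched as a small script for a single-node setup. The property names `fs.default.name` and `mapred.job.tracker` are the ones Hadoop 0.20 uses, but the ports 9000 and 9001, the JAVA_HOME argument, and the function name `write_pseudo_conf` are assumptions for illustration; adjust them to your installation. Note that the sketch overwrites core-site.xml and mapred-site.xml and appends to hadoop-env.sh.

```shell
#!/bin/sh
# Sketch: write a minimal single-node configuration into a Hadoop conf
# directory. write_pseudo_conf, the ports and the JAVA_HOME value are
# illustrative assumptions, not values taken from the article.
write_pseudo_conf() {
    conf_dir="$1"    # e.g. the conf folder of the Hadoop installation
    java_home="$2"   # JDK location as seen from the Cygwin shell
    mkdir -p "$conf_dir" || return 1

    # Step 1: point hadoop-env.sh at the JDK.
    printf 'export JAVA_HOME=%s\n' "$java_home" >> "$conf_dir/hadoop-env.sh"

    # Step 2: default file system (NameNode address).
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

    # Step 3: JobTracker address.
    cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
}
```

For step 5 on Hadoop 0.20, the usual sequence is `hadoop namenode -format` once to initialize HDFS, then `start-all.sh` to start the daemons, then `jps` to check that processes such as NameNode, DataNode, JobTracker and TaskTracker are running.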

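III. Eclipse integration

First, download and install the Eclipse plugin: hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar

If you use plain Eclipse, install it as follows:

Installing hadoop-eclipse-plugin
a. Create a folder named links in the Eclipse installation directory
b. In that folder, create a link file named hadoop.link with the content: path=E:\\eclipsePlugins\\hadoop
c. Create a plugins folder under the path directory and put hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar in it, i.e. E:\eclipsePlugins\hadoop\plugins. For hadoop-0.20.2 you must use this jar; with the 0.20.2 plugin bundled with Hadoop, Run on Hadoop does not appear when debugging in Eclipse
d. Delete the org.eclipse.update folder under E:\Program Files\eclipse\configuration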
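Steps a and b above can be scripted from a Cygwin shell. This is only a sketch: `create_hadoop_link` is an illustrative name, and both arguments are paths you supply for your own machine.

```shell
#!/bin/sh
# Sketch: create the links/ folder and the hadoop.link file described above.
# create_hadoop_link is an illustrative name, not an Eclipse tool.
create_hadoop_link() {
    eclipse_home="$1"   # Eclipse installation directory
    link_target="$2"    # value for path=, written exactly as given

    mkdir -p "$eclipse_home/links" || return 1
    # %s prints the argument verbatim, so backslashes are not interpreted.
    printf 'path=%s\n' "$link_target" > "$eclipse_home/links/hadoop.link"
}
```

With the directories used in this article, the call would look like `create_hadoop_link "/cygdrive/e/Program Files/eclipse" 'E:\\eclipsePlugins\\hadoop'`. The single quotes keep the doubled backslashes exactly as they appear in step b.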

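I use the Eclipse distribution published by SpringSource, where installing hadoop-eclipse-plugin is simpler: put hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar directly into sts-2.3.2.RELEASE\plugins and restart Eclipse.

With everything deployed, try running a MapReduce job. Along the way I ran into the following problems:

Problem 1:

The following appears while running a map task: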
12/03/07 14:56:13 INFO mapred.JobClient: Task Id : attempt_201203071039_0011_m_000001_2, Status : FAILED
java.io.FileNotFoundException: File E:/cygdrive/e/data/tmp/mapred/local/taskTracker/jobcache/job_201203071039_0011/attempt_201203071039_0011_m_000001_2/work/tmp does not exist.
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
	at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
	at org.apache.hadoop.mapred.Child.main(Child.java:155)
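Solution: the map phase needs temporary files to hold its output, and the error occurs because that temporary location cannot be found. Set mapred.child.tmp in mapred-site.xml to an absolute path, as follows: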
<property>
  <name>mapred.child.tmp</name>
  <value>E:\Apache\Hadoop\Run\tmp</value>
  <description> To set the value of tmp directory for map and reduce tasks.
  If the value is an absolute path, it is directly assigned. Otherwise, it is
  prepended with task's working directory. The java tasks are executed with
  option -Djava.io.tmpdir='the absolute path of the tmp dir'. Pipes and
  streaming are set with environment variable,
   TMPDIR='the absolute path of the tmp dir'
  </description>
</property>
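The rule stated in the description can be modeled in a few lines of shell. This is an illustration of the described behavior only, not Hadoop's actual code; `resolve_child_tmp` and its arguments are hypothetical names.

```shell
#!/bin/sh
# Sketch of the rule in the <description> above: an absolute value is used
# as-is, a relative one is placed under the task's working directory.
# resolve_child_tmp is an illustrative helper, not Hadoop code.
resolve_child_tmp() {
    value="$1"      # configured mapred.child.tmp
    work_dir="$2"   # the task's working directory
    case "$value" in
        # POSIX-style or drive-letter absolute path: keep unchanged.
        /*|[A-Za-z]:[\\/]*) printf '%s\n' "$value" ;;
        # Relative path: prepend the working directory.
        *) printf '%s\n' "$work_dir/$value" ;;
    esac
}
```

A relative value such as `./tmp` ends up under the attempt's work directory, which matches the missing `.../work/tmp` path in the error above, while `E:\Apache\Hadoop\Run\tmp` is returned unchanged.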
Problem 2:

The following appears while running a MapReduce job:
java.lang.IllegalArgumentException: Can't read partitions file
       at org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
       at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
       at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
       at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:560)
       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639)
       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
Caused by: java.io.FileNotFoundException: File _partition.lst does not exist.
       at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:383)
       at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
       at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:776)
       at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
       at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
       at org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:296)
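Solution: see http://hbase.apache.org/book/trouble.mapreduce.html. Note that the LocalJobRunner frame in the stack trace shows the job was running in local mode rather than on the cluster.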
