First, prepare n virtual machines; I used 3.
Install Java on each virtual machine.
Edit the /etc/hosts file on every node:
- 192.168.1.111 h1
- 192.168.1.112 h2
- 192.168.1.113 h3
- # Do not remove the following line, or various programs
- # that require network functionality will fail.
- 127.0.0.1 localhost.localdomain localhost
- ::1 localhost6.localdomain6 localhost6
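To verify that the names resolve, a quick check from any node (assuming the entries above):
ping -c 1 h2
ping -c 1 h3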
Add a user account for running Hadoop.
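A minimal sketch for creating the account (the user name hadoop is an assumption; any dedicated account works; run as root or prefix with sudo):
useradd hadoop
passwd hadoop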
Set up SSH keys for passwordless login:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
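The public key must also end up in authorized_keys on every slave so the master can log in without a password. One way to push it, assuming the h2/h3 hostnames above:
ssh-copy-id h2
ssh-copy-id h3
If ssh-copy-id is not available, scp the public key over and append it to ~/.ssh/authorized_keys manually.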
Configure hadoop-env.sh (point JAVA_HOME at your JDK install path):
- # The java implementation to use. Required.
- export JAVA_HOME=/usr/java/jdk1.7.0_15
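If you are unsure where the JDK is installed, resolving the real path of the java binary can help (a quick check, not required):
readlink -f $(which java)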
Next come the three main configuration files.
core-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!-- Put site-specific property overrides in this file. -->
- <configuration>
- <property>
- <name>fs.default.name</name>
- <value>hdfs://h1:9000</value>
- </property>
- <!-- Path for temporary files -->
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/home/root/hadoop/tmp</value>
- </property>
- <!-- Trash interval; if 0, trash is disabled -->
- <property>
- <name>fs.trash.interval</name>
- <value>1400</value>
- <description>Number of minutes between trash checkpoints.
- If zero, the trash feature is disabled.
- </description>
- </property>
- </configuration>
Note: fs.trash.interval is the trash checkpoint interval in minutes; if it is 0, the trash feature is disabled.
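It helps to make sure the hadoop.tmp.dir path exists and is writable on every node before the cluster starts; a minimal sketch, assuming the path configured above:
mkdir -p /home/root/hadoop/tmp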
hdfs-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!-- Put site-specific property overrides in this file. -->
- <configuration>
- <property>
- <name>dfs.data.dir</name>
- <value>${hadoop.tmp.dir}/dfs/data</value>
- <description>Determines where on the local filesystem an DFS data node
- should store its blocks. If this is a comma-delimited
- list of directories, then data will be stored in all named
- directories, typically on different devices.
- Directories that do not exist are ignored.
- </description>
- </property>
- <!-- Replication count: with 3 VMs in total, one master and two slaves, the replication factor is 2 -->
- <property>
- <name>dfs.replication</name>
- <value>2</value>
- <description>Default block replication.
- The actual number of replications can be specified when the file is created.
- The default is used if replication is not specified in create time.
- </description>
- </property>
- <!-- Percentage of blocks that must meet the minimal replication
- requirement (dfs.replication.min) before the NameNode leaves safe mode.
- Values less than or equal to 0 mean do not start in safe mode;
- values greater than 1 make safe mode permanent. -->
- <property>
- <name>dfs.safemode.threshold.pct</name>
- <value>1</value>
- <description>
- Specifies the percentage of blocks that should satisfy
- the minimal replication requirement defined by dfs.replication.min.
- Values less than or equal to 0 mean not to start in safe mode.
- Values greater than 1 will make safe mode permanent.
- </description>
- </property>
- <!-- HDFS permission checking: since development is done from Eclipse on Windows, set this to false to avoid permission problems -->
- <property>
- <name>dfs.permissions</name>
- <value>false</value>
- <description>
- If "true", enable permission checking in HDFS.
- If "false", permission checking is turned off,
- but all other behavior is unchanged.
- Switching from one parameter value to the other does not change the mode,
- owner or group of files or directories.
- </description>
- </property>
- </configuration>
mapred-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!-- Put site-specific property overrides in this file. -->
- <configuration>
- <property>
- <name>mapred.job.tracker</name>
- <value>h1:9001</value>
- </property>
- </configuration>
Configure the masters and slaves files (in the conf directory)
masters
- h1
slaves
- h2
- h3
Copy the configured Hadoop directory from the master to the other nodes (h2 and h3):
scp -r ./hadoop h2:~
scp -r ./hadoop h3:~
Then format the NameNode (on the master, h1, only):
hadoop namenode -format
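Once the format succeeds, the cluster can be started from the master and checked with jps on each node (assuming hadoop/bin is on the PATH; expect NameNode/JobTracker on h1 and DataNode/TaskTracker on h2 and h3):
start-all.sh
jps
The NameNode web UI should then be reachable at http://h1:50070 and the JobTracker UI at http://h1:50030 (Hadoop 1.x defaults).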
Notes:
If hadoop has not been added to the PATH environment variable, commands must be run from the hadoop/bin directory.
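A sketch of the environment setup, assuming Hadoop is unpacked at /home/root/hadoop (adjust to your actual install path); append to ~/.bashrc:
- export HADOOP_HOME=/home/root/hadoop
- export PATH=$PATH:$HADOOP_HOME/bin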
The firewall is best turned off; otherwise the nodes will fail to connect to each other.
iptables firewall commands:
Check firewall status: sudo service iptables status
Temporarily stop the firewall: sudo service iptables stop
Disable the firewall at boot: chkconfig iptables off
Enable the firewall at boot: chkconfig iptables on