Hadoop

Java is a prerequisite. Set the Java environment variables in /etc/profile:
export JAVA_HOME=/usr/local/share/jdk
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
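After editing /etc/profile, reload it and verify the Java installation (a quick sanity check; the exact version string depends on your JDK):

source /etc/profile
echo $JAVA_HOME    # should print /usr/local/share/jdk
java -version      # confirms the JVM is on the PATH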
ssh and rsync must also be installed: the Hadoop control scripts use ssh to manage remote daemons, and rsync is used to distribute files across the cluster.
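Passwordless ssh from the master to every worker is typically required by the control scripts. A minimal sketch, assuming a hadoop user and a worker named slave1 (both hypothetical names):

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa   # generate a key pair with no passphrase
ssh-copy-id hadoop@slave1                  # install the public key on each worker
ssh slave1 hostname                        # verify login works without a password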
A Hadoop installation comprises the following daemons:
HDFS: NameNode, SecondaryNameNode, and DataNode
YARN: ResourceManager, NodeManager, and WebAppProxy
MapReduce: the MapReduce Job History Server
Hadoop's configuration is driven by two types of configuration files:
Read-only default configuration: core-default.xml, hdfs-default.xml, yarn-default.xml, and mapred-default.xml
Site-specific configuration: etc/hadoop/core-site.xml, etc/hadoop/hdfs-site.xml, etc/hadoop/yarn-site.xml, and etc/hadoop/mapred-site.xml
Additionally, you can control the Hadoop scripts found in the bin/ directory of the distribution by setting site-specific values via etc/hadoop/hadoop-env.sh and etc/hadoop/yarn-env.sh.
To configure the Hadoop cluster, you will need to configure both the environment in which the Hadoop daemons execute and the configuration parameters for the daemons themselves.
The HDFS daemons are NameNode, SecondaryNameNode, and DataNode. The YARN daemons are ResourceManager, NodeManager, and WebAppProxy. If MapReduce is used, the MapReduce Job History Server will also be running. In large installations these generally run on separate hosts.
Default daemon addresses and ports (from the default configuration files):
dfs.namenode.http-address 0.0.0.0:50070
dfs.datanode.http.address 0.0.0.0:50075
dfs.namenode.secondary.http-address 0.0.0.0:50090
mapreduce.jobtracker.http.address 0.0.0.0:50030 (legacy MRv1)
mapreduce.tasktracker.http.address 0.0.0.0:50060 (legacy MRv1)
mapreduce.jobhistory.address 0.0.0.0:10020
mapreduce.jobhistory.webapp.address 0.0.0.0:19888
yarn.resourcemanager.address ${yarn.resourcemanager.hostname}:8032
yarn.nodemanager.address ${yarn.nodemanager.hostname}:0
yarn.resourcemanager.scheduler.address ${yarn.resourcemanager.hostname}:8030
yarn.resourcemanager.webapp.address ${yarn.resourcemanager.hostname}:8088
yarn.resourcemanager.resource-tracker.address ${yarn.resourcemanager.hostname}:8031
Web interfaces:
NameNode: http://nn_host:port/ (default HTTP port 50070; dfs.namenode.http-address)
ResourceManager: http://rm_host:port/ (default HTTP port 8088; yarn.resourcemanager.webapp.address)
MapReduce JobHistory Server: http://jhs_host:port/ (default HTTP port 19888; mapreduce.jobhistory.webapp.address)
Key default values from the four configuration files:
1. core-default.xml
fs.defaultFS file:///
hadoop.tmp.dir /tmp/hadoop-${user.name}
io.file.buffer.size 4096
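To override these defaults, put the properties in etc/hadoop/core-site.xml. A minimal sketch, assuming a NameNode host named master and port 9000 (both hypothetical choices):

<configuration>
  <!-- Point clients at the NameNode instead of the local filesystem -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <!-- Larger read/write buffer than the 4096-byte default -->
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>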
2. hdfs-default.xml
dfs.namenode.replication.min 1
dfs.replication.max 512
dfs.replication 3
dfs.blocksize 134217728
dfs.namenode.name.dir file://${hadoop.tmp.dir}/dfs/name
dfs.datanode.data.dir file://${hadoop.tmp.dir}/dfs/data
dfs.namenode.checkpoint.dir file://${hadoop.tmp.dir}/dfs/namesecondary
dfs.webhdfs.enabled true
dfs.namenode.handler.count 10
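Corresponding etc/hadoop/hdfs-site.xml overrides; the directory paths below are examples and in production should point at real disks rather than at hadoop.tmp.dir:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Example paths; use dedicated storage in production -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/dfs/data</value>
  </property>
</configuration>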
3. yarn-default.xml
yarn.resourcemanager.hostname 0.0.0.0
yarn.nodemanager.hostname 0.0.0.0
yarn.resourcemanager.scheduler.address ${yarn.resourcemanager.hostname}:8030
yarn.resourcemanager.resource-tracker.address ${yarn.resourcemanager.hostname}:8031
yarn.resourcemanager.address ${yarn.resourcemanager.hostname}:8032
yarn.resourcemanager.admin.address ${yarn.resourcemanager.hostname}:8033
yarn.resourcemanager.webapp.address ${yarn.resourcemanager.hostname}:8088
yarn.nodemanager.webapp.address ${yarn.nodemanager.hostname}:8042
yarn.nodemanager.address ${yarn.nodemanager.hostname}:0
yarn.resourcemanager.scheduler.class org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
yarn.nodemanager.aux-services.mapreduce_shuffle.class org.apache.hadoop.mapred.ShuffleHandler
yarn.nodemanager.aux-services (a comma-separated list of services; service names may contain only a-zA-Z0-9_ and cannot start with a number; set to mapreduce_shuffle for MapReduce on YARN)
yarn.acl.enable false
yarn.admin.acl *
yarn.log-aggregation-enable false
yarn.nodemanager.resource.memory-mb 8192
yarn.nodemanager.resource.cpu-vcores 8
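An etc/hadoop/yarn-site.xml sketch wiring the NodeManagers to a ResourceManager host named master (hypothetical) and enabling the MapReduce shuffle service:

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <!-- Required auxiliary service for MapReduce on YARN -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Size this to the actual hardware on each node -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
</configuration>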
4. mapred-default.xml
mapreduce.map.memory.mb 1024
mapreduce.map.cpu.vcores 1
mapreduce.reduce.memory.mb 1024
mapreduce.reduce.cpu.vcores 1
mapreduce.jobhistory.address 0.0.0.0:10020
mapreduce.jobhistory.webapp.address 0.0.0.0:19888
mapreduce.framework.name local (the runtime framework for executing MapReduce jobs; one of local, classic, or yarn)
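To run MapReduce on YARN rather than locally, set the framework name in etc/hadoop/mapred-site.xml; the JobHistory address below again assumes a host named master:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
</configuration>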
To start a Hadoop cluster you will need to start both the HDFS and the YARN daemons.
The first time you bring up HDFS, it must be formatted. Format a new distributed filesystem as hdfs:
$HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>
Individual daemons are managed with hadoop-daemon.sh:
Usage: hadoop-daemon.sh [--config <conf-dir>] [--hosts hostlistfile] [--script script] (start|stop) <hadoop-command> <args...>
Start the NameNode on the designated node as hdfs:
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
Run a DataNode on each designated node as hdfs (hadoop-daemons.sh executes the command on every host listed in etc/hadoop/slaves):
$HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
Start the YARN daemons as yarn and the MapReduce JobHistory Server as mapred:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
$HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR start nodemanager
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start proxyserver
$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver
Stop the daemons with the corresponding stop commands:
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode
$HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager
$HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR stop nodemanager
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop proxyserver
$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR stop historyserver
If etc/hadoop/slaves and ssh trusted access are configured (see Single Node Setup), all of the HDFS processes can be started with a utility script. As hdfs:
$HADOOP_PREFIX/sbin/start-dfs.sh
Similarly, all of the YARN processes can be started with a utility script. As yarn:
$HADOOP_PREFIX/sbin/start-yarn.sh
All of the HDFS processes may be stopped the same way. As hdfs:
$HADOOP_PREFIX/sbin/stop-dfs.sh
Likewise, all of the YARN processes can be stopped with a utility script. As yarn:
$HADOOP_PREFIX/sbin/stop-yarn.sh
bin/hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
CLASSNAME run the class named CLASSNAME
or
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar <jar> run a jar file
note: please use "yarn jar" to launch
YARN applications, not this command.
checknative [-a|-h] check native hadoop and compression libraries availability
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the Hadoop jar and the required libraries
credential interact with credential providers
daemonlog get/set the log level for each daemon
trace view and modify Hadoop tracing settings
Most commands print help when invoked w/o parameters.
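For instance, the fs subcommand gives shell-style access to HDFS (the paths here are illustrative):

hadoop fs -mkdir -p /user/hadoop/input            # create a directory in HDFS
hadoop fs -put localfile.txt /user/hadoop/input   # upload a local file
hadoop fs -ls /user/hadoop/input                  # list the directory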
bin/hdfs
Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND
where COMMAND is one of:
dfs run a filesystem command on the file systems supported in Hadoop.
classpath prints the classpath
namenode -format format the DFS filesystem
secondarynamenode run the DFS secondary namenode
namenode run the DFS namenode
journalnode run the DFS journalnode
zkfc run the ZK Failover Controller daemon
datanode run a DFS datanode
dfsadmin run a DFS admin client
haadmin run a DFS HA admin client
fsck run a DFS filesystem checking utility
balancer run a cluster balancing utility
jmxget get JMX exported values from NameNode or DataNode.
mover run a utility to move block replicas across
storage types
oiv apply the offline fsimage viewer to an fsimage
oiv_legacy apply the offline fsimage viewer to a legacy fsimage
oev apply the offline edits viewer to an edits file
fetchdt fetch a delegation token from the NameNode
getconf get config values from configuration
groups get the groups which users belong to
snapshotDiff diff two snapshots of a directory or diff the
current directory contents with a snapshot
lsSnapshottableDir list all snapshottable dirs owned by the current user
Use -help to see options
portmap run a portmap service
nfs3 run an NFS version 3 gateway
cacheadmin configure the HDFS cache
crypto configure HDFS encryption zones
storagepolicies list/get/set block storage policies
version print the version
Most commands print help when invoked w/o parameters.
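Two commonly used subcommands, run as the hdfs user:

hdfs dfsadmin -report        # capacity and liveness of every DataNode
hdfs fsck / -files -blocks   # check filesystem health from the root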
hadoop-env.sh and yarn-env.sh
At a minimum, JAVA_HOME must be set in etc/hadoop/hadoop-env.sh (and etc/hadoop/yarn-env.sh):
# set to the root of your Java installation
export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/opt/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
etc/hadoop/slaves lists the worker hostnames, one per line; the *-daemons.sh and start-*/stop-* scripts run against every host in it.
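A sketch with hypothetical worker hostnames:

slave1
slave2
slave3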
Use jps to check which Hadoop daemons are running on a node.
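With all daemons on a single node, the output looks roughly like the following; the PIDs are illustrative and the class names depend on which daemons are actually running:

4866 NameNode
5002 DataNode
5107 SecondaryNameNode
5271 ResourceManager
5390 NodeManager
5763 Jps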
Running the examples jar with no arguments lists the available example programs (a sample invocation follows the list below):
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that counts the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
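A sample wordcount invocation; the input and output paths are illustrative, and the output directory must not already exist:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/hadoop/input /user/hadoop/output
bin/hadoop fs -cat /user/hadoop/output/part-r-00000   # view the word counts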