hadoop

java
Set the Java environment in /etc/profile:
export JAVA_HOME=/usr/local/share/jdk  
export JRE_HOME=${JAVA_HOME}/jre  
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib  
export PATH=${JAVA_HOME}/bin:$PATH
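After editing /etc/profile, reload it and confirm the JDK is picked up (the JDK path above is just an example install location):

source /etc/profile
echo $JAVA_HOME      # should print the JDK install directory
java -version        # should report the JDK found under JAVA_HOME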


ssh (sshd must be running so the Hadoop scripts can manage remote daemons)
rsync


HDFS daemons: NameNode, SecondaryNameNode, and DataNode
YARN daemons: ResourceManager, NodeManager, and WebAppProxy
MapReduce daemon: the MapReduce Job History Server


Read-only default configuration: core-default.xml, hdfs-default.xml, yarn-default.xml, and mapred-default.xml
Site-specific configuration: etc/hadoop/core-site.xml, etc/hadoop/hdfs-site.xml, etc/hadoop/yarn-site.xml, and etc/hadoop/mapred-site.xml
Additionally, you can control the Hadoop scripts found in the bin/ directory of the distribution by setting site-specific values via etc/hadoop/hadoop-env.sh and etc/hadoop/yarn-env.sh.
To configure the Hadoop cluster you will need to configure the environment in which the Hadoop daemons execute as well as the configuration parameters for the Hadoop daemons.
HDFS daemons are NameNode, SecondaryNameNode, and DataNode. YARN daemons are ResourceManager, NodeManager, and WebAppProxy. If MapReduce is to be used, the MapReduce Job History Server will also be running. For large installations, these generally run on separate hosts.
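As a minimal sketch of a site-specific override (the NameNode hostname and port below are placeholders, not values from this setup), fs.defaultFS can be set in etc/hadoop/core-site.xml:

# write a minimal core-site.xml pointing clients at the NameNode (placeholder URI)
cat > etc/hadoop/core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
</configuration>
EOF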




dfs.namenode.http-address 0.0.0.0:50070
dfs.datanode.http.address 0.0.0.0:50075
dfs.namenode.secondary.http-address 0.0.0.0:50090
mapreduce.jobtracker.http.address  0.0.0.0:50030
mapreduce.tasktracker.http.address 0.0.0.0:50060
mapreduce.jobhistory.address      0.0.0.0:10020
mapreduce.jobhistory.webapp.address 0.0.0.0:19888
yarn.resourcemanager.address            ${yarn.resourcemanager.hostname}:8032
yarn.nodemanager.address                ${yarn.nodemanager.hostname}:0
yarn.resourcemanager.scheduler.address  ${yarn.resourcemanager.hostname}:8030
yarn.resourcemanager.webapp.address     ${yarn.resourcemanager.hostname}:8088
yarn.resourcemanager.resource-tracker.address ${yarn.resourcemanager.hostname}:8031


NameNode: http://nn_host:port/ (default HTTP port 50070; dfs.namenode.http-address)
ResourceManager: http://rm_host:port/ (default HTTP port 8088; yarn.resourcemanager.webapp.address)
MapReduce JobHistory Server: http://jhs_host:port/ (default HTTP port 19888; mapreduce.jobhistory.webapp.address)


1.core-default.xml
fs.defaultFS                  file:///
hadoop.tmp.dir                /tmp/hadoop-${user.name}
io.file.buffer.size           4096
2.hdfs-default.xml
dfs.namenode.replication.min  1
dfs.replication.max           512
dfs.replication               3
dfs.blocksize                 134217728
dfs.namenode.name.dir         file://${hadoop.tmp.dir}/dfs/name
dfs.datanode.data.dir         file://${hadoop.tmp.dir}/dfs/data
dfs.namenode.checkpoint.dir   file://${hadoop.tmp.dir}/dfs/namesecondary
dfs.webhdfs.enabled           true
dfs.namenode.handler.count    10
3.yarn-default.xml
yarn.resourcemanager.hostname   0.0.0.0
yarn.nodemanager.hostname      0.0.0.0
yarn.resourcemanager.scheduler.address  ${yarn.resourcemanager.hostname}:8030
yarn.resourcemanager.resource-tracker.address ${yarn.resourcemanager.hostname}:8031
yarn.resourcemanager.address  ${yarn.resourcemanager.hostname}:8032
yarn.resourcemanager.admin.address  ${yarn.resourcemanager.hostname}:8033
yarn.resourcemanager.webapp.address ${yarn.resourcemanager.hostname}:8088
yarn.nodemanager.webapp.address     ${yarn.nodemanager.hostname}:8042
yarn.nodemanager.address      ${yarn.nodemanager.hostname}:0
yarn.resourcemanager.scheduler.class org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
yarn.nodemanager.aux-services.mapreduce_shuffle.class org.apache.hadoop.mapred.ShuffleHandler
yarn.nodemanager.aux-services {A comma separated list of services where service name should only contain a-zA-Z0-9_ and can not start with numbers}
yarn.acl.enable               false
yarn.admin.acl                *
yarn.log-aggregation-enable   false
yarn.nodemanager.resource.memory-mb     8192
yarn.nodemanager.resource.cpu-vcores    8
4.mapred-default.xml
mapreduce.map.memory.mb
mapreduce.map.cpu.vcores
mapreduce.reduce.memory.mb
mapreduce.reduce.cpu.vcores
mapreduce.jobhistory.address  0.0.0.0:10020
mapreduce.jobhistory.webapp.address 0.0.0.0:19888
mapreduce.framework.name      local {The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.}
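The defaults listed above are overridden in the corresponding *-site.xml files. The usual minimum for running MapReduce on YARN is to set mapreduce.framework.name to yarn and enable the shuffle auxiliary service; a sketch:

# mapred-site.xml: run MapReduce jobs on YARN instead of the local runner
cat > etc/hadoop/mapred-site.xml <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

# yarn-site.xml: enable the MapReduce shuffle service on the NodeManagers
cat > etc/hadoop/yarn-site.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF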


To start a Hadoop cluster you will need to start both the HDFS and YARN cluster.
The first time you bring up HDFS, it must be formatted. Format a new distributed filesystem as hdfs:
Usage: hadoop-daemon.sh [--config <conf-dir>] [--hosts hostlistfile] [--script script] (start|stop) <hadoop-command> <args...>
$HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
$HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
$HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR start nodemanager
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start proxyserver
$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver
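Once the daemons are up, a quick sanity check is to ask the NameNode for a cluster report (run as hdfs):

$HADOOP_PREFIX/bin/hdfs dfsadmin -report    # lists live DataNodes and overall capacity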


$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode
$HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager
$HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR stop nodemanager
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop proxyserver
$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR stop historyserver


If etc/hadoop/slaves and ssh trusted access are configured (see Single Node Setup), all of the HDFS processes can be started with a utility script. As hdfs:
$HADOOP_PREFIX/sbin/start-dfs.sh
Likewise, all of the YARN processes can be started with a utility script. As yarn:
$HADOOP_PREFIX/sbin/start-yarn.sh
All of the HDFS processes can be stopped with a utility script. As hdfs:
$HADOOP_PREFIX/sbin/stop-dfs.sh
All of the YARN processes can be stopped with a utility script. As yarn:
$HADOOP_PREFIX/sbin/stop-yarn.sh


bin/hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings
Most commands print help when invoked w/o parameters.
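A couple of everyday invocations of the commands listed above:

bin/hadoop version      # print the Hadoop version
bin/hadoop fs -ls /     # list the root of the default filesystem
bin/hadoop classpath    # print the classpath used to launch Hadoop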


bin/hdfs
Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND
       where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  classpath            prints the classpath
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  mover                run a utility to move block replicas across
                       storage types
  oiv                  apply the offline fsimage viewer to an fsimage
  oiv_legacy           apply the offline fsimage viewer to a legacy fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the
                       current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
                                                Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway
  cacheadmin           configure the HDFS cache
  crypto               configure HDFS encryption zones
  storagepolicies      list/get/set block storage policies
  version              print the version
Most commands print help when invoked w/o parameters.
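For example, to read an effective configuration value and to check filesystem health:

bin/hdfs getconf -confKey dfs.blocksize    # print the effective block size
bin/hdfs fsck / -files -blocks             # check the health of the filesystem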


hadoop-env.sh, yarn-env.sh
At a minimum, set JAVA_HOME so that it is correctly defined on each node:
# set to the root of your Java installation
  export JAVA_HOME=/usr/java/latest
  export HADOOP_HOME=/opt/hadoop
  export PATH=$HADOOP_HOME/bin:$PATH
  export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
  
etc/hadoop/slaves lists the worker hostnames or IP addresses, one per line.
jps shows which Hadoop daemons are running on a host; see the sketch below.
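A sketch, with hypothetical worker hostnames, of a slaves file and a daemon check:

# etc/hadoop/slaves -- one worker per line (hostnames are placeholders)
cat > etc/hadoop/slaves <<'EOF'
worker1.example.com
worker2.example.com
EOF

# on any host, jps lists the running Java daemons (e.g. NameNode, DataNode, NodeManager)
jps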
  
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar 
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
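For instance, the pi example estimates Pi with 10 map tasks of 100 samples each:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 10 100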
  