Installing Pig on a Hadoop Pseudo-Distributed Node

This walkthrough installs Pig on a single Hadoop pseudo-distributed node.

Pig is a project that Yahoo donated to Apache. It provides an SQL-like, high-level query language built on top of MapReduce: Pig compiles its operations into MapReduce jobs, and users can also define their own functions (UDFs).

Pig is a client-side application: even when you run Pig against a Hadoop cluster, nothing extra needs to be installed on the cluster itself.
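To give a feel for the SQL-like style, here is a minimal Pig Latin sketch (the file path, field names, and aliases are invented for illustration); in MapReduce mode each of these statements is compiled into stages of MapReduce jobs:

```pig
-- Hypothetical input: tab-separated lines of (year, temperature).
records = LOAD 'input/sample.txt' AS (year:chararray, temperature:int);

-- Keep plausible readings only.
good = FILTER records BY temperature IS NOT NULL AND temperature < 9999;

-- Group by year and take the maximum temperature per group.
grouped = GROUP good BY year;
max_temp = FOREACH grouped GENERATE group, MAX(good.temperature);

DUMP max_temp;
```

The same script runs unchanged in local mode and in MapReduce mode; only the execution backend differs.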

 

First, download the Pig tarball from the official site and upload it to the server, then unpack it with:

[hadoop@hadoop1 soft]$ tar -zxvf pig-0.13.0.tar.gz

 

To keep the configuration simple, rename the unpacked directory to something shorter:

[hadoop@hadoop1 ~]$ mv pig-0.13.0 pig2

 

Add the Pig environment variables to the hadoop user's .bash_profile:

[hadoop@hadoop1 ~]$ cat .bash_profile

# .bash_profile

 

# Get the aliases and functions

if [ -f ~/.bashrc ]; then

     . ~/.bashrc

fi

 

# User specific environment and startup programs

 

PATH=$PATH:$HOME/bin

 

export PATH

export JAVA_HOME=/usr/lib/jvm/java-1.7.0/

export HADOOP_HOME=/home/hadoop/hadoop2

export PIG_HOME=/home/hadoop/pig2

export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop/

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$PIG_HOME/bin

 

[hadoop@hadoop1 ~]$ source .bash_profile

 

 

Pig has two execution modes:

The first is local mode. In this mode Pig runs in a single JVM and accesses the local file system, which only suits small data sets; it is mainly used to try Pig out. Local mode does not use Hadoop's local job runner: Pig translates the query into a physical plan and then executes it itself.

Typing the following in a terminal enters local mode:

% pig -x local

The other is Hadoop (MapReduce) mode. Only in this mode does Pig actually translate queries into MapReduce jobs and submit them to a Hadoop cluster, which can be either fully distributed or pseudo-distributed. This is the default mode; it can also be requested explicitly with pig -x mapreduce.

 

 

[hadoop@hadoop1 ~]$ pig

14/09/10 21:04:08 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL

14/09/10 21:04:08 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE

14/09/10 21:04:08 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType

2014-09-10 21:04:09,149 [main] INFO  org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58

2014-09-10 21:04:09,150 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/hadoop/pig2/pig-err.log

2014-09-10 21:04:09,435 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found

2014-09-10 21:04:10,345 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2014-09-10 21:04:10,345 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

2014-09-10 21:04:10,346 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hadoop1:9000

2014-09-10 21:04:10,360 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used

2014-09-10 21:04:12,820 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2014-09-10 21:04:12,821 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hadoop1:9001

2014-09-10 21:04:12,831 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

grunt>

 

grunt> help

Commands:

<pig latin statement>; - See the PigLatin manual for details: http://hadoop.apache.org/pig

File system commands:

    fs <fs arguments> - Equivalent to Hadoop dfs command: http://hadoop.apache.org/common/docs/current/hdfs_shell.html

Diagnostic commands:

    describe <alias>[::<alias>] - Show the schema for the alias. Inner aliases can be described as A::B.

    explain [-script <pigscript>] [-out <path>] [-brief] [-dot|-xml] [-param <param_name>=<param_value>]

        [-param_file <file_name>] [<alias>] - Show the execution plan to compute the alias or for entire script.

        -script - Explain the entire script.

        -out - Store the output into directory rather than print to stdout.

        -brief - Don't expand nested plans (presenting a smaller graph for overview).

        -dot - Generate the output in .dot format. Default is text format.

        -xml - Generate the output in .xml format. Default is text format.

        -param <param_name> - See parameter substitution for details.

        -param_file <file_name> - See parameter substitution for details.

        alias - Alias to explain.

    dump <alias> - Compute the alias and writes the results to stdout.

Utility Commands:

    exec [-param <param_name>=param_value] [-param_file <file_name>] <script> -

        Execute the script with access to grunt environment including aliases.

        -param <param_name> - See parameter substitution for details.

        -param_file <file_name> - See parameter substitution for details.

        script - Script to be executed.

    run [-param <param_name>=param_value] [-param_file <file_name>] <script> -

        Execute the script with access to grunt environment.

        -param <param_name> - See parameter substitution for details.

        -param_file <file_name> - See parameter substitution for details.

        script - Script to be executed.

    sh <shell command> - Invoke a shell command.

    kill <job_id> - Kill the hadoop job specified by the hadoop job id.

    set <key> <value> - Provide execution parameters to Pig. Keys and values are case sensitive.

        The following keys are supported:

        default_parallel - Script-level reduce parallelism. Basic input size heuristics used by default.

        debug - Set debug on or off. Default is off.

        job.name - Single-quoted name for jobs. Default is PigLatin:<scriptname>

        job.priority - Priority for jobs. Values: very_low, low, normal, high, very_high. Default is normal

        stream.skippath - String that contains the path. This is used by streaming.

       any hadoop property.

   help - Display this message.

   history [-n] - Display the list statements in cache.

       -n Hide line numbers.

   quit - Quit the grunt shell.

grunt>

 

