Installing Pig on a Hadoop Pseudo-Distributed Node

This walkthrough installs Pig on a single Hadoop pseudo-distributed node.

Pig is a project that Yahoo donated to Apache. It provides an SQL-like, high-level query language built on top of MapReduce: Pig compiles its operations into MapReduce jobs, and users can also define their own functions (UDFs).

Pig is a client-side application: even when you run Pig against a Hadoop cluster, nothing extra needs to be installed on the cluster itself.
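To give a feel for the SQL-like style, here is a minimal Pig Latin sketch (the file path, field names, and aliases are invented for illustration); in MapReduce mode each of these statements is compiled into stages of MapReduce jobs:

```pig
-- Hypothetical input: tab-separated lines of (year, temperature).
records = LOAD 'input/sample.txt' AS (year:chararray, temperature:int);

-- Keep plausible readings only.
good = FILTER records BY temperature IS NOT NULL AND temperature < 9999;

-- Group by year and take the maximum temperature per group.
grouped = GROUP good BY year;
max_temp = FOREACH grouped GENERATE group, MAX(good.temperature);

DUMP max_temp;
```

The same script runs unchanged in local mode and in MapReduce mode; only the execution backend differs.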

 

First, download the Pig tarball from the official site and upload it to the server, then unpack it with:

[hadoop@hadoop1 soft]$ tar -zxvf pig-0.13.0.tar.gz

 

To keep the configuration simple, rename the unpacked directory to something shorter:

[hadoop@hadoop1 ~]$ mv pig-0.13.0 pig2

 

Add the Pig environment variables to the hadoop user's .bash_profile:

[hadoop@hadoop1 ~]$ cat .bash_profile

# .bash_profile

 

# Get the aliases and functions

if [ -f ~/.bashrc ]; then

     . ~/.bashrc

fi

 

# User specific environment and startup programs

 

PATH=$PATH:$HOME/bin

 

export PATH

export JAVA_HOME=/usr/lib/jvm/java-1.7.0/

export HADOOP_HOME=/home/hadoop/hadoop2

export PIG_HOME=/home/hadoop/pig2

export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop/

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$PIG_HOME/bin

 

[hadoop@hadoop1 ~]$ source .bash_profile

 

 

Pig has two execution modes:

The first is local mode. In this mode Pig runs in a single JVM and accesses the local file system, which only suits small data sets; it is mainly used to try Pig out. Local mode does not use Hadoop's local job runner: Pig translates the query into a physical plan and then executes it itself.

Typing the following in a terminal enters local mode:

% pig -x local

The other is Hadoop (MapReduce) mode. Only in this mode does Pig actually translate queries into MapReduce jobs and submit them to a Hadoop cluster, which can be either fully distributed or pseudo-distributed. This is the default mode; it can also be requested explicitly with pig -x mapreduce.

 

 

[hadoop@hadoop1 ~]$ pig

14/09/10 21:04:08 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL

14/09/10 21:04:08 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE

14/09/10 21:04:08 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType

2014-09-10 21:04:09,149 [main] INFO  org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58

2014-09-10 21:04:09,150 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/hadoop/pig2/pig-err.log

2014-09-10 21:04:09,435 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found

2014-09-10 21:04:10,345 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2014-09-10 21:04:10,345 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

2014-09-10 21:04:10,346 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hadoop1:9000

2014-09-10 21:04:10,360 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used

2014-09-10 21:04:12,820 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2014-09-10 21:04:12,821 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hadoop1:9001

2014-09-10 21:04:12,831 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

grunt>

 

grunt> help

Commands:

<pig latin statement>; - See the PigLatin manual for details: http://hadoop.apache.org/pig

File system commands:

    fs <fs arguments> - Equivalent to Hadoop dfs command: http://hadoop.apache.org/common/docs/current/hdfs_shell.html

Diagnostic commands:

    describe <alias>[::<alias>] - Show the schema for the alias. Inner aliases can be described as A::B.

    explain [-script <pigscript>] [-out <path>] [-brief] [-dot|-xml] [-param <param_name>=<param_value>]

        [-param_file <file_name>] [<alias>] - Show the execution plan to compute the alias or for entire script.

        -script - Explain the entire script.

        -out - Store the output into directory rather than print to stdout.

        -brief - Don't expand nested plans (presenting a smaller graph for overview).

        -dot - Generate the output in .dot format. Default is text format.

        -xml - Generate the output in .xml format. Default is text format.

        -param <param_name> - See parameter substitution for details.

        -param_file <file_name> - See parameter substitution for details.

        alias - Alias to explain.

    dump <alias> - Compute the alias and writes the results to stdout.

Utility Commands:

    exec [-param <param_name>=param_value] [-param_file <file_name>] <script> -

        Execute the script with access to grunt environment including aliases.

        -param <param_name> - See parameter substitution for details.

        -param_file <file_name> - See parameter substitution for details.

        script - Script to be executed.

    run [-param <param_name>=param_value] [-param_file <file_name>] <script> -

        Execute the script with access to grunt environment.

        -param <param_name> - See parameter substitution for details.

        -param_file <file_name> - See parameter substitution for details.

        script - Script to be executed.

    sh <shell command> - Invoke a shell command.

    kill <job_id> - Kill the hadoop job specified by the hadoop job id.

    set <key> <value> - Provide execution parameters to Pig. Keys and values are case sensitive.

        The following keys are supported:

        default_parallel - Script-level reduce parallelism. Basic input size heuristics used by default.

        debug - Set debug on or off. Default is off.

        job.name - Single-quoted name for jobs. Default is PigLatin:<scriptname>

        job.priority - Priority for jobs. Values: very_low, low, normal, high, very_high. Default is normal

        stream.skippath - String that contains the path. This is used by streaming.

       any hadoop property.

   help - Display this message.

   history [-n] - Display the list statements in cache.

       -n Hide line numbers.

   quit - Quit the grunt shell.

grunt>

 

