Installing CDH 5.10 and Kudu 1.2 on CentOS 7.2 (Part 2)

5 Kudu Installation

Starting with CDH 5.10, Kudu 1.2 is packaged and shipped with the distribution, and Cloudera officially supports it. From this release onward, installing Kudu is much simpler than before: the separate Impala_Kudu build is no longer needed, and once Kudu is installed, Impala can operate on Kudu directly.

The following installation steps assume that Cloudera Manager is used to install and deploy Kudu 1.2.

5.1 Install the CSD File

1. Download the CSD file

[root@ip-172-31-2-159 ~]# wget http://archive.cloudera.com/kudu/csd/KUDU-5.10.0.jar

2. Move the downloaded jar file into the /opt/cloudera/csd directory

[root@ip-172-31-2-159 ~]# mv KUDU-5.10.0.jar /opt/cloudera/csd

3. Fix the ownership and permissions

[root@ip-172-31-2-159 ~]# chown cloudera-scm:cloudera-scm /opt/cloudera/csd/KUDU-5.10.0.jar

[root@ip-172-31-2-159 ~]# chmod 644 /opt/cloudera/csd/KUDU-5.10.0.jar

4. Restart the Cloudera Manager service

[root@ip-172-31-2-159 ~]# systemctl restart cloudera-scm-server
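Optionally (our addition, not part of the original steps), one can confirm that Cloudera Manager picked up the new CSD after the restart by watching the server log, which by default lives under /var/log/cloudera-scm-server:

[root@ip-172-31-2-159 ~]# tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log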

5.2 Install the Kudu Service

1. Download the parcel files required by the Kudu service

[root@ip-172-31-2-159 ~]# wget http://archive.cloudera.com/kudu/parcels/5.10/KUDU-1.2.0-1.cdh5.10.1.p0.66-el7.parcel

[root@ip-172-31-2-159 ~]# wget http://archive.cloudera.com/kudu/parcels/5.10/KUDU-1.2.0-1.cdh5.10.1.p0.66-el7.parcel.sha1

[root@ip-172-31-2-159 ~]# wget http://archive.cloudera.com/kudu/parcels/5.10/manifest.json
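Optionally, the parcel download can be verified against the published checksum; the .parcel.sha1 file contains the expected SHA-1 hash (a verification step we have added):

[root@ip-172-31-2-159 ~]# sha1sum KUDU-1.2.0-1.cdh5.10.1.p0.66-el7.parcel

[root@ip-172-31-2-159 ~]# cat KUDU-1.2.0-1.cdh5.10.1.p0.66-el7.parcel.sha1

The two hashes should match.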

2. Publish the Kudu parcel through the HTTP service

[root@ip-172-31-2-159 ~]# mkdir kudu1.2

[root@ip-172-31-2-159 ~]# mv KUDU-1.2.0-1.cdh5.10.1.p0.66-el7.parcel* kudu1.2/

[root@ip-172-31-2-159 ~]# mv manifest.json kudu1.2

[root@ip-172-31-2-159 ~]# mv kudu1.2/ /var/www/html/

[root@ip-172-31-2-159 ~]# systemctl start httpd

3. Check over HTTP that the Kudu parcel directory is served correctly:
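For example, fetching the manifest through httpd should return the parcel metadata (a quick check we have added; the host is the one running httpd):

[root@ip-172-31-2-159 ~]# curl -s http://ip-172-31-2-159/kudu1.2/manifest.json | head -5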

4. In the Cloudera Manager UI, configure the Kudu parcel repository URL, then download, distribute, and activate the parcel.

5. Install Kudu 1.2 through CM

Add the Kudu service

Select the hosts for the Master and Tablet Server roles

Configure the corresponding directories. Note: for both the Master and the Tablet Servers there may well be several data directories (fs_data_dirs), depending on the actual hardware; spreading the data directories across disks improves concurrent reads and writes and therefore Kudu performance.
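For example, a hypothetical layout assuming three data disks mounted under /data might look like this:

fs_data_dirs=/data/disk1/kudu,/data/disk2/kudu,/data/disk3/kudu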

Start the Kudu service

Installation is complete
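As an optional sanity check (our addition), the Kudu Master web UI, which listens on port 8051 by default, should respond once the service is up:

[root@ip-172-31-2-159 ~]# curl -s http://ip-172-31-2-159:8051/ | head -5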

5.3 Configure Impala

In CDH 5.10, once Kudu 1.2 is installed, Impala can by default run SQL against Kudu directly. However, to avoid having to add the kudu.master_addresses property in TBLPROPERTIES every time a table is created, it is recommended to set the Kudu Master address in Impala's advanced configuration snippet: --kudu_master_hosts=ip-172-31-2-159:7051
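Without this flag, every Kudu table DDL must carry the master address itself. A minimal sketch of what that looks like (the table name example_kudu_table is hypothetical; the property key is the one Impala expects for Kudu tables):

CREATE TABLE example_kudu_table
(
  id BIGINT,
  name STRING,
  PRIMARY KEY(id)
)
PARTITION BY HASH PARTITIONS 16
STORED AS KUDU
TBLPROPERTIES ('kudu.master_addresses' = 'ip-172-31-2-159:7051');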

6 Quick Component Service Validation

6.1 HDFS Validation (mkdir + put + cat + get)

[root@ip-172-31-2-159 ~]# hadoop fs -mkdir -p /lilei/test_table

[root@ip-172-31-2-159 ~]# cat > a.txt

1#2

c#d

#^C


[root@ip-172-31-2-159 ~]# hadoop fs -put a.txt /lilei/test_table

[root@ip-172-31-2-159 ~]# hadoop fs -cat /lilei/test_table/a.txt

1#2

c#d

[root@ip-172-31-2-159 ~]# rm -rf a.txt

[root@ip-172-31-2-159 ~]# hadoop fs -get /lilei/test_table/a.txt

[root@ip-172-31-2-159 ~]# cat a.txt

1#2

c#d

6.2 Hive Validation

[root@ip-172-31-2-159 ~]# hive

 

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/jars/hive-common-1.1.0-cdh5.10.0.jar!/hive-log4j.properties

WARNING: Hive CLI is deprecated and migration to  Beeline is recommended.

hive> create external table test_table
    > (
    >   s1 string,
    >   s2 string
    > )
    > row format delimited fields terminated by '#'
    > stored as textfile location '/lilei/test_table';

OK

Time taken: 0.631 seconds

hive> select * from test_table;

OK

1   2

c   d

Time taken: 0.36 seconds, Fetched: 2 row(s)

hive> select count(*) from test_table;

Query ID =  root_20170404013939_69844998-4456-4bc1-9da5-53ea91342e43

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks determined at compile  time: 1

In order to change the average load for a reducer  (in bytes):

  set  hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

  set  hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

  set  mapreduce.job.reduces=<number>

Starting Job = job_1491283979906_0005, Tracking  URL = http://ip-172-31-2-159:8088/proxy/application_1491283979906_0005/

Kill Command =  /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/bin/hadoop  job  -kill job_1491283979906_0005

Hadoop job information for Stage-1: number of mappers:  1; number of reducers: 1

2017-04-04 01:39:25,425 Stage-1 map = 0%,  reduce = 0%

2017-04-04 01:39:31,689 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.02 sec

2017-04-04 01:39:36,851 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.34 sec

MapReduce Total cumulative CPU time: 2 seconds  340 msec

Ended Job = job_1491283979906_0005

MapReduce Jobs Launched: 

Stage-Stage-1: Map: 1  Reduce: 1    Cumulative CPU: 2.34 sec   HDFS  Read: 6501 HDFS Write: 2 SUCCESS

Total MapReduce CPU Time Spent: 2 seconds 340  msec

OK

2

Time taken: 21.56 seconds, Fetched: 1 row(s)

6.3 MapReduce Validation

[root@ip-172-31-2-159 ~]# hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 5 5

Number of Maps   = 5

Samples per Map = 5

Wrote input for Map #0

Wrote input for Map #1

Wrote input for Map #2

Wrote input for Map #3

Wrote input for Map #4

Starting Job

17/04/04 01:38:15 INFO client.RMProxy: Connecting  to ResourceManager at ip-172-31-2-159/172.31.2.159:8032

17/04/04 01:38:15 INFO mapreduce.JobSubmissionFiles:  Permissions on staging directory /user/root/.staging are incorrect:  rwxrwxrwx. Fixing permissions to correct value rwx------

17/04/04 01:38:15 INFO input.FileInputFormat:  Total input paths to process : 5

17/04/04 01:38:15 INFO mapreduce.JobSubmitter:  number of splits:5

17/04/04 01:38:15 INFO mapreduce.JobSubmitter:  Submitting tokens for job: job_1491283979906_0004

17/04/04 01:38:16 INFO impl.YarnClientImpl:  Submitted application application_1491283979906_0004

17/04/04 01:38:16 INFO mapreduce.Job: The url to  track the job:  http://ip-172-31-2-159:8088/proxy/application_1491283979906_0004/

17/04/04 01:38:16 INFO mapreduce.Job: Running  job: job_1491283979906_0004

17/04/04 01:38:21 INFO mapreduce.Job: Job  job_1491283979906_0004 running in uber mode : false

17/04/04 01:38:21 INFO mapreduce.Job:  map 0% reduce 0%

17/04/04 01:38:26 INFO mapreduce.Job:  map 100% reduce 0%

17/04/04 01:38:32 INFO mapreduce.Job:  map 100% reduce 100%

17/04/04 01:38:32 INFO mapreduce.Job: Job  job_1491283979906_0004 completed successfully

17/04/04 01:38:32 INFO mapreduce.Job: Counters:  49

    File System Counters
        FILE: Number of bytes read=64
        FILE: Number of bytes written=749758
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1350
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=23
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=5
        Launched reduce tasks=1
        Data-local map tasks=5
        Total time spent by all maps in occupied slots (ms)=16111
        Total time spent by all reduces in occupied slots (ms)=2872
        Total time spent by all map tasks (ms)=16111
        Total time spent by all reduce tasks (ms)=2872
        Total vcore-seconds taken by all map tasks=16111
        Total vcore-seconds taken by all reduce tasks=2872
        Total megabyte-seconds taken by all map tasks=16497664
        Total megabyte-seconds taken by all reduce tasks=2940928
    Map-Reduce Framework
        Map input records=5
        Map output records=10
        Map output bytes=90
        Map output materialized bytes=167
        Input split bytes=760
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=167
        Reduce input records=10
        Reduce output records=0
        Spilled Records=20
        Shuffled Maps =5
        Failed Shuffles=0
        Merged Map outputs=5
        GC time elapsed (ms)=213
        CPU time spent (ms)=3320
        Physical memory (bytes) snapshot=2817884160
        Virtual memory (bytes) snapshot=9621606400
        Total committed heap usage (bytes)=2991587328
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=590
    File Output Format Counters
        Bytes Written=97

Job Finished in 17.145 seconds

Estimated value of Pi is 3.68000000000000000000

6.4 Impala Validation

[root@ip-172-31-2-159 ~]# impala-shell -i ip-172-31-7-96

Starting Impala Shell without Kerberos  authentication

Connected to ip-172-31-7-96:21000

Server version: impalad version 2.7.0-cdh5.10.0  RELEASE (build 785a073cd07e2540d521ecebb8b38161ccbd2aa2)

***********************************************************************************

Welcome to the Impala shell.

(Impala Shell v2.7.0-cdh5.10.0 (785a073) built on  Fri Jan 20 12:03:56 PST 2017)

 

Run the PROFILE command after a query has  finished to see a comprehensive summary

of all the performance and diagnostic information  that Impala gathered for that

query. Be warned, it can be very long!

***********************************************************************************

[ip-172-31-7-96:21000] > show tables;

Query: show tables

+------------+
| name       |
+------------+
| test_table |
+------------+

Fetched 1 row(s) in 0.20s

[ip-172-31-7-96:21000] > select * from  test_table;

Query: select * from test_table

Query submitted at: 2017-04-04 01:41:56  (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at:  http://ip-172-31-7-96:25000/query_plan?query_id=c4a06bd46f9106b:4a69f04800000000

+----+----+
| s1 | s2 |
+----+----+
| 1  | 2  |
| c  | d  |
+----+----+

Fetched 2 row(s) in 3.73s

[ip-172-31-7-96:21000] > select count(*) from  test_table;

Query: select count(*) from test_table

Query submitted at: 2017-04-04 01:42:06  (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at:  http://ip-172-31-7-96:25000/query_plan?query_id=2a415724696f7414:1f9113ea00000000

+----------+
| count(*) |
+----------+
| 2        |
+----------+

Fetched 1 row(s) in 0.15s
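One caveat worth adding here (our note, not part of the original run): if a table created in Hive does not show up in impala-shell, Impala's catalog may simply be stale; the standard INVALIDATE METADATA statement refreshes it:

[ip-172-31-7-96:21000] > invalidate metadata;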

6.5 Spark Validation

[root@ip-172-31-2-159 ~]# spark-shell

Setting default log level to "WARN".

To adjust logging level use  sc.setLogLevel(newLevel).

Welcome to

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

 

Using Scala version 2.10.5 (Java HotSpot(TM)  64-Bit Server VM, Java 1.7.0_67)

Type in expressions to have them evaluated.

Type :help for more information.

Spark context available as sc (master =  yarn-client, app id = application_1491283979906_0006).

17/04/04 01:43:26 WARN metastore.ObjectStore:  Version information not found in metastore.  hive.metastore.schema.verification is not enabled so recording the schema  version 1.1.0

17/04/04 01:43:27 WARN metastore.ObjectStore:  Failed to get database default, returning NoSuchObjectException

SQL context available as sqlContext.

 

scala> var  textFile=sc.textFile("hdfs://ip-172-31-2-159:8020/lilei/test_table/a.txt")

textFile: org.apache.spark.rdd.RDD[String] =  hdfs://ip-172-31-2-159:8020/lilei/test_table/a.txt MapPartitionsRDD[1] at  textFile at <console>:27

 

scala> 

 

scala> textFile.count()

res0: Long = 2
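As a small extension of this check (our addition, not part of the original run), the '#'-delimited lines can be split into pairs; given the a.txt contents above, this should evaluate to Array((1,2), (c,d)):

scala> textFile.map(_.split("#")).map(a => (a(0), a(1))).collect()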

6.6 Kudu Validation

[root@ip-172-31-2-159 ~]# impala-shell -i ip-172-31-7-96

Starting Impala Shell without Kerberos  authentication

Connected to ip-172-31-7-96:21000

Server version: impalad version 2.7.0-cdh5.10.0  RELEASE (build 785a073cd07e2540d521ecebb8b38161ccbd2aa2)

***********************************************************************************

Welcome to the Impala shell.

(Impala Shell v2.7.0-cdh5.10.0 (785a073) built on  Fri Jan 20 12:03:56 PST 2017)

 

Every command must be terminated by a ';'.

***********************************************************************************

[ip-172-31-7-96:21000] > CREATE TABLE my_first_table
                       > (
                       >   id BIGINT,
                       >   name STRING,
                       >   PRIMARY KEY(id)
                       > )
                       > PARTITION BY HASH PARTITIONS 16
                       > STORED AS KUDU;

Query: create TABLE my_first_table
(
  id BIGINT,
  name STRING,
  PRIMARY KEY(id)
)
PARTITION BY HASH PARTITIONS 16
STORED AS KUDU

 

Fetched 0 row(s) in 1.35s

[ip-172-31-7-96:21000] > INSERT INTO  my_first_table VALUES (99, "sarah");

Query: insert INTO my_first_table VALUES (99,  "sarah")

Query submitted at: 2017-04-04 01:46:08  (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at:  http://ip-172-31-7-96:25000/query_plan?query_id=824ce0b3765c6b91:5ea8dd7c00000000

Modified 1 row(s), 0 row error(s) in 3.37s

[ip-172-31-7-96:21000] > 

[ip-172-31-7-96:21000] > INSERT INTO  my_first_table VALUES (1, "john"), (2, "jane"), (3,  "jim");

Query: insert INTO my_first_table VALUES (1,  "john"), (2, "jane"), (3, "jim")

Query submitted at: 2017-04-04 01:46:13  (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at:  http://ip-172-31-7-96:25000/query_plan?query_id=a645259c3b8ae7cd:e446e15500000000

Modified 3 row(s), 0 row error(s) in 0.11s

[ip-172-31-7-96:21000] > select * from  my_first_table;

Query: select * from my_first_table

Query submitted at: 2017-04-04 01:46:19  (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at:  http://ip-172-31-7-96:25000/query_plan?query_id=f44021589ff0d94d:8d30568200000000

+----+-------+
| id | name  |
+----+-------+
| 2  | jane  |
| 3  | jim   |
| 1  | john  |
| 99 | sarah |
+----+-------+

Fetched 4 row(s) in 0.55s

[ip-172-31-7-96:21000] > delete from  my_first_table where id =99;

Query: delete from my_first_table where id =99

Query submitted at: 2017-04-04 01:46:56  (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at:  http://ip-172-31-7-96:25000/query_plan?query_id=814090b100fdf0b4:1b516fe400000000

Modified 1 row(s), 0 row error(s) in 0.15s

[ip-172-31-7-96:21000] > 

[ip-172-31-7-96:21000] > select * from  my_first_table;

Query: select * from my_first_table

Query submitted at: 2017-04-04 01:46:57  (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at:  http://ip-172-31-7-96:25000/query_plan?query_id=724aa3f84cedb109:a679bf0200000000

+----+------+
| id | name |
+----+------+
| 2  | jane |
| 3  | jim  |
| 1  | john |
+----+------+

Fetched 3 row(s) in 0.15s

[ip-172-31-7-96:21000] > INSERT INTO  my_first_table VALUES (99, "sarah");

Query: insert INTO my_first_table VALUES (99,  "sarah")

Query submitted at: 2017-04-04 01:47:32  (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at:  http://ip-172-31-7-96:25000/query_plan?query_id=6244b3c6d33b443e:f43c857300000000

Modified 1 row(s), 0 row error(s) in 0.11s

[ip-172-31-7-96:21000] > 

[ip-172-31-7-96:21000] > update my_first_table  set name='lilei' where id=99;

Query: update my_first_table set name='lilei'  where id=99

Query submitted at: 2017-04-04 01:47:32 (Coordinator:  http://ip-172-31-7-96:25000)

Query progress can be monitored at:  http://ip-172-31-7-96:25000/query_plan?query_id=8f4ab0dd3c19f9df:b2c7bdfa00000000

Modified 1 row(s), 0 row error(s) in 0.13s

[ip-172-31-7-96:21000] > select * from  my_first_table;

Query: select * from my_first_table

Query submitted at: 2017-04-04 01:47:34  (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at:  http://ip-172-31-7-96:25000/query_plan?query_id=6542579c8bd5b6ad:af68f50800000000

+----+-------+
| id | name  |
+----+-------+
| 2  | jane  |
| 3  | jim   |
| 1  | john  |
| 99 | lilei |
+----+-------+

Fetched 4 row(s) in 0.15s

[ip-172-31-7-96:21000] > upsert  into my_first_table values(1,  "john"), (4, "tom"), (99, "lilei1");

Query: upsert into my_first_table values(1,  "john"), (4, "tom"), (99, "lilei1")

Query submitted at: 2017-04-04 01:48:52  (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at:  http://ip-172-31-7-96:25000/query_plan?query_id=694fc7ac2bc71d21:947f1fa200000000

Modified 3 row(s), 0 row error(s) in 0.11s

[ip-172-31-7-96:21000] > 

[ip-172-31-7-96:21000] > select * from  my_first_table;

Query: select * from my_first_table

Query submitted at: 2017-04-04 01:48:52  (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at:  http://ip-172-31-7-96:25000/query_plan?query_id=a64e0ee707762b6b:69248a6c00000000

+----+--------+
| id | name   |
+----+--------+
| 2  | jane   |
| 3  | jim    |
| 1  | john   |
| 99 | lilei1 |
| 4  | tom    |
+----+--------+

Fetched 5 row(s) in 0.16s
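Finally, to leave the cluster clean after validation (an optional step we have added), the test table can be dropped; since my_first_table was created as an internal Kudu table, this removes the underlying Kudu table as well:

[ip-172-31-7-96:21000] > drop table my_first_table;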

 



