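5 Installing Kudu

Starting with CDH 5.10, Kudu 1.2 is packaged and integrated, and Cloudera officially supports it. From this release on, installing Kudu is much simpler than before: the separate Impala_Kudu build is no longer needed, and once Kudu is installed, Impala can work with Kudu directly.

The steps below assume that you use Cloudera Manager to install and deploy Kudu 1.2.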

5.1 Installing the CSD file

1. Download the CSD file

[root@ip-172-31-2-159 ~]# wget http://archive.cloudera.com/kudu/csd/KUDU-5.10.0.jar

2. Move the downloaded jar file to the /opt/cloudera/csd directory

[root@ip-172-31-2-159 ~]# mv KUDU-5.10.0.jar /opt/cloudera/csd

3. Set the ownership and permissions

[root@ip-172-31-2-159 ~]# chown cloudera-scm:cloudera-scm /opt/cloudera/csd/KUDU-5.10.0.jar

[root@ip-172-31-2-159 ~]# chmod 644 /opt/cloudera/csd/KUDU-5.10.0.jar

4. Restart the Cloudera Manager service

[root@ip-172-31-2-159 ~]# systemctl restart cloudera-scm-server
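The four steps above can be gathered into one small helper. This is a minimal sketch, not a script shipped by Cloudera: the function name install_csd and its optional arguments are illustrative, and the ownership change is skipped when the cloudera-scm account does not exist, so the function can also be tried on a machine without Cloudera Manager.

```shell
# Sketch: place a CSD jar in the CSD directory with the owner and mode used above.
# install_csd is a hypothetical helper name, not part of Cloudera Manager.
install_csd() {
  local jar="$1"
  local csd_dir="${2:-/opt/cloudera/csd}"
  local owner="${3:-cloudera-scm}"
  local target

  [ -f "$jar" ] || { echo "missing jar: $jar" >&2; return 1; }
  mkdir -p "$csd_dir" || return 1
  target="$csd_dir/$(basename "$jar")"
  mv "$jar" "$target" || return 1

  # Change ownership only if the account exists (assumption: it does on a CM host).
  if id -u "$owner" >/dev/null 2>&1; then
    chown "$owner:$owner" "$target" || return 1
  fi
  chmod 644 "$target"
}

# Usage on the Cloudera Manager host, followed by the restart from step 4:
#   install_csd KUDU-5.10.0.jar && systemctl restart cloudera-scm-server
```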

5.2 Installing the Kudu service

1. Download the parcel files required by the Kudu service

[root@ip-172-31-2-159 ~]# wget http://archive.cloudera.com/kudu/parcels/5.10/KUDU-1.2.0-1.cdh5.10.1.p0.66-el7.parcel

[root@ip-172-31-2-159 ~]# wget http://archive.cloudera.com/kudu/parcels/5.10/KUDU-1.2.0-1.cdh5.10.1.p0.66-el7.parcel.sha1

[root@ip-172-31-2-159 ~]# wget http://archive.cloudera.com/kudu/parcels/5.10/manifest.json

2. Publish the Kudu parcel files through the HTTP service

[root@ip-172-31-2-159 ~]# mkdir kudu1.2

[root@ip-172-31-2-159 ~]# mv KUDU-1.2.0-1.cdh5.10.1.p0.66-el7.parcel* kudu1.2/

[root@ip-172-31-2-159 ~]# mv manifest.json kudu1.2

[root@ip-172-31-2-159 ~]# mv kudu1.2/ /var/www/html/

[root@ip-172-31-2-159 ~]# systemctl start httpd

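3. Confirm that the Kudu parcel directory is served correctly over HTTP.

4. In the Cloudera Manager (CM) UI, configure the Kudu parcel repository URL, then download, distribute, and activate the Kudu parcel.

5. Install Kudu 1.2 through CM

Add the Kudu service

Select the hosts for the Master and Tablet Server roles

Configure the directories. Note: for both the Master and the Tablet Servers, consider configuring several data directories (fs_data_dirs), depending on your environment, to increase read/write concurrency and therefore Kudu performance.

Start the Kudu service

Installation is complete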
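Before pointing CM at the local repository in step 4, it can help to confirm that the parcel downloaded in step 1 matches its .sha1 file; a truncated download is easier to catch here than in the CM UI. The sketch below assumes that the .sha1 file holds the hex digest as its first whitespace-separated field, and verify_parcel is a made-up helper name.

```shell
# Sketch: compare a parcel's SHA-1 digest with the one stored next to it.
# verify_parcel is a hypothetical helper, not a Cloudera tool.
verify_parcel() {
  local parcel="$1"
  local expected actual

  # Assumption: the first field of <parcel>.sha1 is the expected hex digest.
  expected=$(awk '{ print $1; exit }' "$parcel.sha1") || return 1
  actual=$(sha1sum "$parcel" | awk '{ print $1 }') || return 1

  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK $parcel"
  else
    echo "MISMATCH $parcel" >&2
    return 1
  fi
}

# Usage in the directory that holds the downloaded files:
#   verify_parcel KUDU-1.2.0-1.cdh5.10.1.p0.66-el7.parcel
```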

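5.3 Configuring Impala

In CDH 5.10, once Kudu 1.2 is installed, Impala can run SQL against Kudu by default. However, to avoid adding the kudu.master_addresses property to TBLPROPERTIES every time you create a table, it is recommended to set the Kudu Master address in Impala's advanced configuration: --kudu_master_hosts=ip-172-31-2-159:7051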
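As a small illustration of the flag format, the sketch below builds the --kudu_master_hosts value from a port and a list of host names. The helper name kudu_master_flag is made up, and the comma-separated form for more than one master is an assumption to verify against the Impala documentation for your version; this walkthrough itself uses a single master.

```shell
# Sketch: build the Impala startup flag from a port and one or more master hosts.
# kudu_master_flag is a hypothetical helper; 7051 is the port used in this article.
kudu_master_flag() {
  local port="$1"
  local list=""
  local host
  shift
  for host in "$@"; do
    # Join host:port pairs with commas, without a leading comma.
    list="${list:+$list,}$host:$port"
  done
  printf -- '--kudu_master_hosts=%s\n' "$list"
}

kudu_master_flag 7051 ip-172-31-2-159
```

With the flag in place, a CREATE TABLE ... STORED AS KUDU statement such as the one in section 6.6 needs no TBLPROPERTIES clause.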

6 Quick verification of component services

6.1 HDFS verification (mkdir + put + cat + get)

[root@ip-172-31-2-159 ~]# hadoop fs -mkdir -p /lilei/test_table

[root@ip-172-31-2-159 ~]# cat > a.txt

1#2

c#d

我#你^C

[root@ip-172-31-2-159 ~]#

[root@ip-172-31-2-159 ~]# hadoop fs -put a.txt /lilei/test_table

[root@ip-172-31-2-159 ~]# hadoop fs -cat /lilei/test_table/a.txt

1#2

c#d

[root@ip-172-31-2-159 ~]# rm -rf a.txt

[root@ip-172-31-2-159 ~]#

[root@ip-172-31-2-159 ~]# hadoop fs -get /lilei/test_table/a.txt

[root@ip-172-31-2-159 ~]#

[root@ip-172-31-2-159 ~]# cat a.txt

1#2

c#d
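In the session above the file was typed into cat and ended with Ctrl-C before the third line was terminated, which is most likely why only two lines reached a.txt. A here-document gives the same two-line file reproducibly. The sketch below also repeats the round trip and compares the downloaded copy with the original; the helper name make_sample and the file name a.downloaded.txt are illustrative, and the HDFS part runs only when a hadoop client is on the PATH.

```shell
# Sketch: create the sample file non-interactively (make_sample is a made-up name).
make_sample() {
  cat > "$1" <<'EOF'
1#2
c#d
EOF
}

make_sample a.txt

# Round trip through HDFS only when a hadoop client is available.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -mkdir -p /lilei/test_table
  hadoop fs -put -f a.txt /lilei/test_table
  rm -f a.downloaded.txt
  hadoop fs -get /lilei/test_table/a.txt a.downloaded.txt
  # cmp exits non-zero if the downloaded copy differs from the original.
  cmp a.txt a.downloaded.txt && echo "HDFS round trip OK"
fi
```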

6.2 Hive verification

[root@ip-172-31-2-159 ~]# hive

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/jars/hive-common-1.1.0-cdh5.10.0.jar!/hive-log4j.properties

WARNING: Hive CLI is deprecated and migration to Beeline is recommended.

hive> create external table test_table

> (

> s1 string,

> s2 string

> )

> row format delimited fields terminated by '#'

> stored as textfile location '/lilei/test_table';

Time taken: 0.631 seconds

hive> select * from test_table;

1 2

c d

Time taken: 0.36 seconds, Fetched: 2 row(s)

hive> select count(*) from test_table;

Query ID = root_20170404013939_69844998-4456-4bc1-9da5-53ea91342e43

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

set mapreduce.job.reduces=<number>

Starting Job = job_1491283979906_0005, Tracking URL = http://ip-172-31-2-159:8088/proxy/application_1491283979906_0005/

Kill Command = /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/bin/hadoop job -kill job_1491283979906_0005

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1

2017-04-04 01:39:25,425 Stage-1 map = 0%, reduce = 0%

2017-04-04 01:39:31,689 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.02 sec

2017-04-04 01:39:36,851 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.34 sec

MapReduce Total cumulative CPU time: 2 seconds 340 msec

Ended Job = job_1491283979906_0005

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.34 sec HDFS Read: 6501 HDFS Write: 2 SUCCESS

Total MapReduce CPU Time Spent: 2 seconds 340 msec

Time taken: 21.56 seconds, Fetched: 1 row(s)
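The clause fields terminated by '#' in the table definition tells Hive to split each line of the files under /lilei/test_table on the '#' character, which is how 1#2 becomes the row (1, 2). The sketch below previews that split locally, with awk standing in for Hive's delimited row format; preview_columns and sample.txt are made-up names.

```shell
# Sketch: show how each line maps to the columns s1 and s2 when split on '#'.
# preview_columns is a hypothetical helper; it does not call Hive.
preview_columns() {
  awk -F'#' '{ printf "s1=%s s2=%s\n", $1, $2 }' "$1"
}

printf '1#2\nc#d\n' > sample.txt
preview_columns sample.txt
```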

6.3 MapReduce verification

[root@ip-172-31-2-159 ~]# hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 5 5

Number of Maps = 5

Samples per Map = 5

Wrote input for Map #0

Wrote input for Map #1

Wrote input for Map #2

Wrote input for Map #3

Wrote input for Map #4

Starting Job

17/04/04 01:38:15 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-2-159/172.31.2.159:8032

17/04/04 01:38:15 INFO mapreduce.JobSubmissionFiles: Permissions on staging directory /user/root/.staging are incorrect: rwxrwxrwx. Fixing permissions to correct value rwx------

17/04/04 01:38:15 INFO input.FileInputFormat: Total input paths to process : 5

17/04/04 01:38:15 INFO mapreduce.JobSubmitter: number of splits:5

17/04/04 01:38:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491283979906_0004

17/04/04 01:38:16 INFO impl.YarnClientImpl: Submitted application application_1491283979906_0004

17/04/04 01:38:16 INFO mapreduce.Job: The url to track the job: http://ip-172-31-2-159:8088/proxy/application_1491283979906_0004/

17/04/04 01:38:16 INFO mapreduce.Job: Running job: job_1491283979906_0004

17/04/04 01:38:21 INFO mapreduce.Job: Job job_1491283979906_0004 running in uber mode : false

17/04/04 01:38:21 INFO mapreduce.Job: map 0% reduce 0%

17/04/04 01:38:26 INFO mapreduce.Job: map 100% reduce 0%

17/04/04 01:38:32 INFO mapreduce.Job: map 100% reduce 100%

17/04/04 01:38:32 INFO mapreduce.Job: Job job_1491283979906_0004 completed successfully

17/04/04 01:38:32 INFO mapreduce.Job: Counters: 49

File System Counters

FILE: Number of bytes read=64

FILE: Number of bytes written=749758

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=1350

HDFS: Number of bytes written=215

HDFS: Number of read operations=23

HDFS: Number of large read operations=0

HDFS: Number of write operations=3

Job Counters

Launched map tasks=5

Launched reduce tasks=1

Data-local map tasks=5

Total time spent by all maps in occupied slots (ms)=16111

Total time spent by all reduces in occupied slots (ms)=2872

Total time spent by all map tasks (ms)=16111

Total time spent by all reduce tasks (ms)=2872

Total vcore-seconds taken by all map tasks=16111

Total vcore-seconds taken by all reduce tasks=2872

Total megabyte-seconds taken by all map tasks=16497664

Total megabyte-seconds taken by all reduce tasks=2940928

Map-Reduce Framework

Map input records=5

Map output records=10

Map output bytes=90

Map output materialized bytes=167

Input split bytes=760

Combine input records=0

Combine output records=0

Reduce input groups=2

Reduce shuffle bytes=167

Reduce input records=10

Reduce output records=0

Spilled Records=20

Shuffled Maps =5

Failed Shuffles=0

Merged Map outputs=5

GC time elapsed (ms)=213

CPU time spent (ms)=3320

Physical memory (bytes) snapshot=2817884160

Virtual memory (bytes) snapshot=9621606400

Total committed heap usage (bytes)=2991587328

Shuffle Errors

BAD_ID=0

CONNECTION=0

IO_ERROR=0

WRONG_LENGTH=0

WRONG_MAP=0

WRONG_REDUCE=0

File Input Format Counters

Bytes Read=590

File Output Format Counters

Bytes Written=97

Job Finished in 17.145 seconds

Estimated value of Pi is 3.68000000000000000000
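The estimate of 3.68 is far from 3.14159 because the job drew only 5 maps x 5 samples = 25 points. The example estimates Pi as 4 x (points inside the circle) / (total points), so 3.68 corresponds to 23 of the 25 points landing inside. The count of 23 is inferred from the printed result, not taken from the job output, and pi_estimate is a made-up helper that only reproduces the arithmetic.

```shell
# Sketch: the ratio behind the estimate, 4 * inside / total.
# pi_estimate is a hypothetical helper; 23 is inferred from 3.68 * 25 / 4.
pi_estimate() {
  awk -v inside="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 4 * inside / total }'
}

pi_estimate 23 25   # prints 3.68
```

Passing larger arguments to the example, for instance pi 10 1000, should give a closer estimate at the cost of a longer run.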

6.4 Impala verification

[root@ip-172-31-2-159 ~]# impala-shell -i ip-172-31-7-96

Starting Impala Shell without Kerberos authentication

Connected to ip-172-31-7-96:21000

Server version: impalad version 2.7.0-cdh5.10.0 RELEASE (build 785a073cd07e2540d521ecebb8b38161ccbd2aa2)

***********************************************************************************

Welcome to the Impala shell.

(Impala Shell v2.7.0-cdh5.10.0 (785a073) built on Fri Jan 20 12:03:56 PST 2017)

Run the PROFILE command after a query has finished to see a comprehensive summary

of all the performance and diagnostic information that Impala gathered for that

query. Be warned, it can be very long!

***********************************************************************************

[ip-172-31-7-96:21000] > show tables;

Query: show tables

+------------+
| name       |
+------------+
| test_table |
+------------+

Fetched 1 row(s) in 0.20s

[ip-172-31-7-96:21000] > select * from test_table;

Query: select * from test_table

Query submitted at: 2017-04-04 01:41:56 (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=c4a06bd46f9106b:4a69f04800000000

+----+----+
| s1 | s2 |
+----+----+
| 1  | 2  |
| c  | d  |
+----+----+

Fetched 2 row(s) in 3.73s

[ip-172-31-7-96:21000] > select count(*) from test_table;

Query: select count(*) from test_table

Query submitted at: 2017-04-04 01:42:06 (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=2a415724696f7414:1f9113ea00000000

+----------+
| count(*) |
+----------+
| 2        |
+----------+

Fetched 1 row(s) in 0.15s

6.5 Spark verification

[root@ip-172-31-2-159 ~]# spark-shell

Setting default log level to "WARN".

To adjust logging level use sc.setLogLevel(newLevel).

Welcome to

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)

Type in expressions to have them evaluated.

Type :help for more information.

Spark context available as sc (master = yarn-client, app id = application_1491283979906_0006).

17/04/04 01:43:26 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.1.0

17/04/04 01:43:27 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException

SQL context available as sqlContext.

scala> var textFile=sc.textFile("hdfs://ip-172-31-2-159:8020/lilei/test_table/a.txt")

textFile: org.apache.spark.rdd.RDD[String] = hdfs://ip-172-31-2-159:8020/lilei/test_table/a.txt MapPartitionsRDD[1] at textFile at <console>:27

scala>

scala> textFile.count()

res0: Long = 2

6.6 Kudu verification

[root@ip-172-31-2-159 ~]# impala-shell -i ip-172-31-7-96

Starting Impala Shell without Kerberos authentication

Connected to ip-172-31-7-96:21000

Server version: impalad version 2.7.0-cdh5.10.0 RELEASE (build 785a073cd07e2540d521ecebb8b38161ccbd2aa2)

***********************************************************************************

Welcome to the Impala shell.

(Impala Shell v2.7.0-cdh5.10.0 (785a073) built on Fri Jan 20 12:03:56 PST 2017)

Every command must be terminated by a ';'.

***********************************************************************************

[ip-172-31-7-96:21000] > CREATE TABLE my_first_table

> (

> id BIGINT,

> name STRING,

> PRIMARY KEY(id)

> )

> PARTITION BY HASH PARTITIONS 16

> STORED AS KUDU;

Query: create TABLE my_first_table

(

id BIGINT,

name STRING,

PRIMARY KEY(id)

)

PARTITION BY HASH PARTITIONS 16

STORED AS KUDU

Fetched 0 row(s) in 1.35s

[ip-172-31-7-96:21000] > INSERT INTO my_first_table VALUES (99, "sarah");

Query: insert INTO my_first_table VALUES (99, "sarah")

Query submitted at: 2017-04-04 01:46:08 (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=824ce0b3765c6b91:5ea8dd7c00000000

Modified 1 row(s), 0 row error(s) in 3.37s

[ip-172-31-7-96:21000] >

[ip-172-31-7-96:21000] > INSERT INTO my_first_table VALUES (1, "john"), (2, "jane"), (3, "jim");

Query: insert INTO my_first_table VALUES (1, "john"), (2, "jane"), (3, "jim")

Query submitted at: 2017-04-04 01:46:13 (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=a645259c3b8ae7cd:e446e15500000000

Modified 3 row(s), 0 row error(s) in 0.11s

[ip-172-31-7-96:21000] > select * from my_first_table;

Query: select * from my_first_table

Query submitted at: 2017-04-04 01:46:19 (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=f44021589ff0d94d:8d30568200000000

+----+-------+
| id | name  |
+----+-------+
| 2  | jane  |
| 3  | jim   |
| 1  | john  |
| 99 | sarah |
+----+-------+

Fetched 4 row(s) in 0.55s

[ip-172-31-7-96:21000] > delete from my_first_table where id =99;

Query: delete from my_first_table where id =99

Query submitted at: 2017-04-04 01:46:56 (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=814090b100fdf0b4:1b516fe400000000

Modified 1 row(s), 0 row error(s) in 0.15s

[ip-172-31-7-96:21000] >

[ip-172-31-7-96:21000] > select * from my_first_table;

Query: select * from my_first_table

Query submitted at: 2017-04-04 01:46:57 (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=724aa3f84cedb109:a679bf0200000000

+----+------+
| id | name |
+----+------+
| 2  | jane |
| 3  | jim  |
| 1  | john |
+----+------+

Fetched 3 row(s) in 0.15s

[ip-172-31-7-96:21000] > INSERT INTO my_first_table VALUES (99, "sarah");

Query: insert INTO my_first_table VALUES (99, "sarah")

Query submitted at: 2017-04-04 01:47:32 (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=6244b3c6d33b443e:f43c857300000000

Modified 1 row(s), 0 row error(s) in 0.11s

[ip-172-31-7-96:21000] >

[ip-172-31-7-96:21000] > update my_first_table set name='lilei' where id=99;

Query: update my_first_table set name='lilei' where id=99

Query submitted at: 2017-04-04 01:47:32 (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=8f4ab0dd3c19f9df:b2c7bdfa00000000

Modified 1 row(s), 0 row error(s) in 0.13s

[ip-172-31-7-96:21000] > select * from my_first_table;

Query: select * from my_first_table

Query submitted at: 2017-04-04 01:47:34 (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=6542579c8bd5b6ad:af68f50800000000

+----+-------+
| id | name  |
+----+-------+
| 2  | jane  |
| 3  | jim   |
| 1  | john  |
| 99 | lilei |
+----+-------+

Fetched 4 row(s) in 0.15s

[ip-172-31-7-96:21000] > upsert into my_first_table values(1, "john"), (4, "tom"), (99, "lilei1");

Query: upsert into my_first_table values(1, "john"), (4, "tom"), (99, "lilei1")

Query submitted at: 2017-04-04 01:48:52 (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=694fc7ac2bc71d21:947f1fa200000000

Modified 3 row(s), 0 row error(s) in 0.11s

[ip-172-31-7-96:21000] >

[ip-172-31-7-96:21000] > select * from my_first_table;

Query: select * from my_first_table

Query submitted at: 2017-04-04 01:48:52 (Coordinator: http://ip-172-31-7-96:25000)

Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=a64e0ee707762b6b:69248a6c00000000

+----+--------+
| id | name   |
+----+--------+
| 2  | jane   |
| 3  | jim    |
| 1  | john   |
| 99 | lilei1 |
| 4  | tom    |
+----+--------+

Fetched 5 row(s) in 0.16s
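UPSERT inserts rows whose primary key is new and overwrites rows whose key already exists, which is why the statement above reports three modified rows even though id 1 keeps the same name. The awk sketch below imitates that rule on plain tab-separated files. It only illustrates the semantics and says nothing about how Kudu implements them; upsert_rows and the file names are made up.

```shell
# Sketch: emulate upsert semantics on "id<TAB>name" files.
# Rows from the second file overwrite rows from the first that share an id.
upsert_rows() {
  awk -F'\t' '{ name[$1] = $2 } END { for (id in name) print id "\t" name[id] }' "$1" "$2" |
    sort -n   # sorted by id for readability only; Impala returned the rows unordered
}

# Table contents before the upsert, and the rows passed to UPSERT above.
printf '1\tjohn\n2\tjane\n3\tjim\n99\tlilei\n' > existing.tsv
printf '1\tjohn\n4\ttom\n99\tlilei1\n' > incoming.tsv
upsert_rows existing.tsv incoming.tsv
```

The result holds the same five rows as the final select: id 4 is added, id 99 is overwritten, and ids 1, 2, and 3 keep their names.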


This article was shared from the WeChat public account Hadoop實操 (gh_c4c535955d0f).

