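5 Kudu Installation
Starting with CDH 5.10, Kudu 1.2 is packaged and integrated with the distribution, and Cloudera officially supports it. From this release on, installing Kudu is much simpler than before: the separate Impala_Kudu build is no longer needed, and once Kudu is installed, Impala can operate on Kudu directly.
The following steps assume that Cloudera Manager is used to install and deploy Kudu 1.2.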
5.1 Installing the CSD File
1. Download the CSD file
[root@ip-172-31-2-159 ~]# wget http://archive.cloudera.com/kudu/csd/KUDU-5.10.0.jar
2. Move the downloaded jar file to the /opt/cloudera/csd directory
[root@ip-172-31-2-159 ~]# mv KUDU-5.10.0.jar /opt/cloudera/csd
3. Change the ownership and permissions
[root@ip-172-31-2-159 ~]# chown cloudera-scm:cloudera-scm /opt/cloudera/csd/KUDU-5.10.0.jar
[root@ip-172-31-2-159 ~]# chmod 644 /opt/cloudera/csd/KUDU-5.10.0.jar
4. Restart the Cloudera Manager service
[root@ip-172-31-2-159 ~]# systemctl restart cloudera-scm-server
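Steps 2 and 3 above can be wrapped in a small helper so they are repeatable. This is only a sketch under stated assumptions: the function name install_csd is hypothetical, it copies the jar with cp instead of moving it with mv, and it skips the chown when it is not run as root or when the cloudera-scm user does not exist.

```shell
#!/bin/sh
# Hypothetical helper: place a CSD jar in the CSD directory with the
# permissions used in the steps above.
# Usage: install_csd <jar> [csd_dir] [owner:group]
install_csd() {
  jar="$1"
  csd_dir="${2:-/opt/cloudera/csd}"
  owner="${3:-cloudera-scm:cloudera-scm}"

  # Refuse to continue if the jar was not downloaded.
  [ -f "$jar" ] || { echo "missing CSD jar: $jar" >&2; return 1; }

  mkdir -p "$csd_dir" || return 1
  dest="$csd_dir/$(basename "$jar")"

  # Copy (not move) so the download stays available for a retry.
  cp "$jar" "$dest" || return 1
  chmod 644 "$dest" || return 1

  # chown needs root and an existing user; otherwise skip it with a note.
  if [ "$(id -u)" -eq 0 ] && id "${owner%%:*}" >/dev/null 2>&1; then
    chown "$owner" "$dest" || return 1
  else
    echo "skipping chown to $owner (not root, or user missing)" >&2
  fi
}
```

On the Cloudera Manager host this would be called as install_csd KUDU-5.10.0.jar, followed by the restart in step 4.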
5.2 Installing the Kudu Service
1. Download the parcel files required by the Kudu service
[root@ip-172-31-2-159 ~]# wget http://archive.cloudera.com/kudu/parcels/5.10/KUDU-1.2.0-1.cdh5.10.1.p0.66-el7.parcel
[root@ip-172-31-2-159 ~]# wget http://archive.cloudera.com/kudu/parcels/5.10/KUDU-1.2.0-1.cdh5.10.1.p0.66-el7.parcel.sha1
[root@ip-172-31-2-159 ~]# wget http://archive.cloudera.com/kudu/parcels/5.10/manifest.json
2. Deploy the Kudu parcel files to the HTTP service
[root@ip-172-31-2-159 ~]# mkdir kudu1.2
[root@ip-172-31-2-159 ~]# mv KUDU-1.2.0-1.cdh5.10.1.p0.66-el7.parcel* kudu1.2/
[root@ip-172-31-2-159 ~]# mv manifest.json kudu1.2
[root@ip-172-31-2-159 ~]# mv kudu1.2/ /var/www/html/
[root@ip-172-31-2-159 ~]# systemctl start httpd
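Since the .sha1 file is downloaded next to the parcel, it may be worth comparing the two before Cloudera Manager distributes the parcel. The sketch below assumes that the .sha1 file carries the hex digest as its first whitespace-separated token; the function name verify_parcel_sha1 is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compare a parcel against the digest stored in <parcel>.sha1.
# Returns 0 on match, 1 on mismatch, 2 if either file is missing.
verify_parcel_sha1() {
  parcel="$1"
  [ -f "$parcel" ] && [ -f "$parcel.sha1" ] || return 2

  # Assumption: the digest is the first token on the first line of the .sha1 file.
  expected="$(awk 'NR==1 {print $1}' "$parcel.sha1")"
  actual="$(sha1sum "$parcel" | awk '{print $1}')"

  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

For example, verify_parcel_sha1 /var/www/html/kudu1.2/KUDU-1.2.0-1.cdh5.10.1.p0.66-el7.parcel && echo "parcel OK".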
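3. Check that the Kudu parcel directory is served correctly over HTTP.
4. In the CM UI, configure the Kudu parcel URL, then download, distribute, and activate Kudu.
5. Install Kudu 1.2 through CM
Add the Kudu service
Select the Master and Tablet Server hosts
Configure the corresponding directories. Note: for both the Master and the Tablet Server, the data directory setting (fs_data_dirs) can list multiple directories, depending on the actual environment. Spreading data across several directories improves concurrent reads and writes, and therefore Kudu performance.
Start the Kudu service
Installation complete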
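To make the note about multiple data directories concrete: Kudu takes one write-ahead-log directory (fs_wal_dir) and a comma-separated list of data directories (fs_data_dirs). The sketch below only formats those two flags from a list of paths. The helper name and the /dataN/kudu paths are hypothetical, and in a CM-managed cluster the same values are entered in the directory fields of the wizard, not passed by hand.

```shell
#!/bin/sh
# Hypothetical helper: print the two directory flags for a Kudu daemon.
# Usage: kudu_dir_flags <wal_dir> <data_dir> [<data_dir> ...]
kudu_dir_flags() {
  wal_dir="$1"
  shift
  # With IFS set to a comma, "$*" joins the remaining arguments with commas.
  old_ifs="$IFS"
  IFS=,
  data_dirs="$*"
  IFS="$old_ifs"
  printf '%s\n' "--fs_wal_dir=$wal_dir" "--fs_data_dirs=$data_dirs"
}

# Example with one directory per physical disk (paths are illustrative only).
kudu_dir_flags /data0/kudu/wal /data1/kudu/data /data2/kudu/data
```

One data directory per physical disk is the usual way to get the concurrency benefit mentioned above.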
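5.3 Configuring Impala
In CDH 5.10, once Kudu 1.2 is installed, Impala can run SQL against Kudu by default. However, to avoid adding the kudu.master_addresses property to TBLPROPERTIES every time a table is created, it is recommended to set the Kudu Master address in Impala's advanced configuration: --kudu_master_hosts=ip-172-31-2-159:7051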
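To show what the setting saves, the sketch below prints the same CREATE TABLE statement with and without the per-table master address. The helper name kudu_create_table_sql is hypothetical, the table definition is the one used in the Kudu verification later in this article, and the property is written as 'kudu.master_addresses', the name Impala uses for Kudu tables.

```shell
#!/bin/sh
# Hypothetical helper: print a CREATE TABLE statement for a Kudu table.
# Usage: kudu_create_table_sql <table> [master_host:port]
kudu_create_table_sql() {
  table="$1"
  masters="${2:-}"
  cat <<EOF
CREATE TABLE $table
(
  id BIGINT,
  name STRING,
  PRIMARY KEY(id)
)
PARTITION BY HASH PARTITIONS 16
STORED AS KUDU
EOF
  # Only needed when --kudu_master_hosts is not set on the Impala daemons.
  if [ -n "$masters" ]; then
    printf "TBLPROPERTIES ('kudu.master_addresses' = '%s')\n" "$masters"
  fi
}

# Without the Impala-side setting, every table has to carry the address:
kudu_create_table_sql my_first_table ip-172-31-2-159:7051
```

The generated statement could be submitted with impala-shell -i ip-172-31-7-96 -q "$(kudu_create_table_sql my_first_table)" once the master address is configured on the Impala side.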
6 Quick Verification of Component Services
6.1 HDFS Verification (mkdir+put+cat+get)
[root@ip-172-31-2-159 ~]# hadoop fs -mkdir -p /lilei/test_table
[root@ip-172-31-2-159 ~]# cat > a.txt
1#2
c#d
我#你^C
[root@ip-172-31-2-159 ~]#
[root@ip-172-31-2-159 ~]#
[root@ip-172-31-2-159 ~]#
[root@ip-172-31-2-159 ~]# hadoop fs -put a.txt /lilei/test_table
[root@ip-172-31-2-159 ~]# hadoop fs -cat /lilei/test_table/a.txt
1#2
c#d
[root@ip-172-31-2-159 ~]# rm -rf a.txt
[root@ip-172-31-2-159 ~]#
[root@ip-172-31-2-159 ~]# hadoop fs -get /lilei/test_table/a.txt
[root@ip-172-31-2-159 ~]#
[root@ip-172-31-2-159 ~]# cat a.txt
1#2
c#d
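The same mkdir, put, and get round trip can be scripted so that the comparison is done by cmp and not by eye. This is a sketch only: the function name hdfs_roundtrip and the HDFS_FS_CMD override variable are hypothetical (the variable exists so the function can be exercised without a cluster), and -put -f is used so that a rerun overwrites the earlier upload.

```shell
#!/bin/sh
# Hypothetical helper: upload a local file to HDFS, download it again,
# and succeed only if the bytes are identical.
# Usage: hdfs_roundtrip <local_file> <hdfs_dir>
hdfs_roundtrip() {
  src="$1"
  hdfs_dir="$2"
  # HDFS_FS_CMD is an assumed override hook; the default is the real CLI.
  fs="${HDFS_FS_CMD:-hadoop fs}"
  name="$(basename "$src")"
  tmp="$(mktemp -d)" || return 1

  # $fs is left unquoted on purpose so "hadoop fs" splits into command + argument.
  $fs -mkdir -p "$hdfs_dir" \
    && $fs -put -f "$src" "$hdfs_dir/$name" \
    && $fs -get "$hdfs_dir/$name" "$tmp/$name" \
    && cmp -s "$src" "$tmp/$name"
  rc=$?

  rm -rf "$tmp"
  return "$rc"
}
```

On a cluster node, hdfs_roundtrip a.txt /lilei/test_table && echo "HDFS OK" would repeat the manual check above.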
6.2 Hive Verification
[root@ip-172-31-2-159 ~]# hive
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/jars/hive-common-1.1.0-cdh5.10.0.jar!/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> create external table test_table
    > (
    > s1 string,
    > s2 string
    > )
    > row format delimited fields terminated by '#'
    > stored as textfile location '/lilei/test_table';
OK
Time taken: 0.631 seconds
hive> select * from test_table;
OK
1 2
c d
Time taken: 0.36 seconds, Fetched: 2 row(s)
hive> select count(*) from test_table;
Query ID = root_20170404013939_69844998-4456-4bc1-9da5-53ea91342e43
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1491283979906_0005, Tracking URL = http://ip-172-31-2-159:8088/proxy/application_1491283979906_0005/
Kill Command = /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/bin/hadoop job -kill job_1491283979906_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-04-04 01:39:25,425 Stage-1 map = 0%, reduce = 0%
2017-04-04 01:39:31,689 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.02 sec
2017-04-04 01:39:36,851 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.34 sec
MapReduce Total cumulative CPU time: 2 seconds 340 msec
Ended Job = job_1491283979906_0005
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.34 sec HDFS Read: 6501 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 340 msec
OK
2
Time taken: 21.56 seconds, Fetched: 1 row(s)
6.3 MapReduce Verification
[root@ip-172-31-2-159 ~]# hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 5 5
Number of Maps = 5
Samples per Map = 5
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Starting Job
17/04/04 01:38:15 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-2-159/172.31.2.159:8032
17/04/04 01:38:15 INFO mapreduce.JobSubmissionFiles: Permissions on staging directory /user/root/.staging are incorrect: rwxrwxrwx. Fixing permissions to correct value rwx------
17/04/04 01:38:15 INFO input.FileInputFormat: Total input paths to process : 5
17/04/04 01:38:15 INFO mapreduce.JobSubmitter: number of splits:5
17/04/04 01:38:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491283979906_0004
17/04/04 01:38:16 INFO impl.YarnClientImpl: Submitted application application_1491283979906_0004
17/04/04 01:38:16 INFO mapreduce.Job: The url to track the job: http://ip-172-31-2-159:8088/proxy/application_1491283979906_0004/
17/04/04 01:38:16 INFO mapreduce.Job: Running job: job_1491283979906_0004
17/04/04 01:38:21 INFO mapreduce.Job: Job job_1491283979906_0004 running in uber mode : false
17/04/04 01:38:21 INFO mapreduce.Job: map 0% reduce 0%
17/04/04 01:38:26 INFO mapreduce.Job: map 100% reduce 0%
17/04/04 01:38:32 INFO mapreduce.Job: map 100% reduce 100%
17/04/04 01:38:32 INFO mapreduce.Job: Job job_1491283979906_0004 completed successfully
17/04/04 01:38:32 INFO mapreduce.Job: Counters: 49
  File System Counters
    FILE: Number of bytes read=64
    FILE: Number of bytes written=749758
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=1350
    HDFS: Number of bytes written=215
    HDFS: Number of read operations=23
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=3
  Job Counters
    Launched map tasks=5
    Launched reduce tasks=1
    Data-local map tasks=5
    Total time spent by all maps in occupied slots (ms)=16111
    Total time spent by all reduces in occupied slots (ms)=2872
    Total time spent by all map tasks (ms)=16111
    Total time spent by all reduce tasks (ms)=2872
    Total vcore-seconds taken by all map tasks=16111
    Total vcore-seconds taken by all reduce tasks=2872
    Total megabyte-seconds taken by all map tasks=16497664
    Total megabyte-seconds taken by all reduce tasks=2940928
  Map-Reduce Framework
    Map input records=5
    Map output records=10
    Map output bytes=90
    Map output materialized bytes=167
    Input split bytes=760
    Combine input records=0
    Combine output records=0
    Reduce input groups=2
    Reduce shuffle bytes=167
    Reduce input records=10
    Reduce output records=0
    Spilled Records=20
    Shuffled Maps =5
    Failed Shuffles=0
    Merged Map outputs=5
    GC time elapsed (ms)=213
    CPU time spent (ms)=3320
    Physical memory (bytes) snapshot=2817884160
    Virtual memory (bytes) snapshot=9621606400
    Total committed heap usage (bytes)=2991587328
  Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
  File Input Format Counters
    Bytes Read=590
  File Output Format Counters
    Bytes Written=97
Job Finished in 17.145 seconds
Estimated value of Pi is 3.68000000000000000000
6.4 Impala Verification
[root@ip-172-31-2-159 ~]# impala-shell -i ip-172-31-7-96
Starting Impala Shell without Kerberos authentication
Connected to ip-172-31-7-96:21000
Server version: impalad version 2.7.0-cdh5.10.0 RELEASE (build 785a073cd07e2540d521ecebb8b38161ccbd2aa2)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.7.0-cdh5.10.0 (785a073) built on Fri Jan 20 12:03:56 PST 2017)
Run the PROFILE command after a query has finished to see a comprehensive summary of all the performance and diagnostic information that Impala gathered for that query. Be warned, it can be very long!
***********************************************************************************
[ip-172-31-7-96:21000] > show tables;
Query: show tables
+------------+
| name       |
+------------+
| test_table |
+------------+
Fetched 1 row(s) in 0.20s
[ip-172-31-7-96:21000] > select * from test_table;
Query: select * from test_table
Query submitted at: 2017-04-04 01:41:56 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=c4a06bd46f9106b:4a69f04800000000
+----+----+
| s1 | s2 |
+----+----+
| 1  | 2  |
| c  | d  |
+----+----+
Fetched 2 row(s) in 3.73s
[ip-172-31-7-96:21000] > select count(*) from test_table;
Query: select count(*) from test_table
Query submitted at: 2017-04-04 01:42:06 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=2a415724696f7414:1f9113ea00000000
+----------+
| count(*) |
+----------+
| 2        |
+----------+
Fetched 1 row(s) in 0.15s
6.5 Spark Verification
[root@ip-172-31-2-159 ~]# spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc (master = yarn-client, app id = application_1491283979906_0006).
17/04/04 01:43:26 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.1.0
17/04/04 01:43:27 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
SQL context available as sqlContext.
scala> var textFile=sc.textFile("hdfs://ip-172-31-2-159:8020/lilei/test_table/a.txt")
textFile: org.apache.spark.rdd.RDD[String] = hdfs://ip-172-31-2-159:8020/lilei/test_table/a.txt MapPartitionsRDD[1] at textFile at <console>:27
scala>
scala> textFile.count()
res0: Long = 2
6.6 Kudu Verification
[root@ip-172-31-2-159 ~]# impala-shell -i ip-172-31-7-96
Starting Impala Shell without Kerberos authentication
Connected to ip-172-31-7-96:21000
Server version: impalad version 2.7.0-cdh5.10.0 RELEASE (build 785a073cd07e2540d521ecebb8b38161ccbd2aa2)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.7.0-cdh5.10.0 (785a073) built on Fri Jan 20 12:03:56 PST 2017)
Every command must be terminated by a ';'.
***********************************************************************************
[ip-172-31-7-96:21000] > CREATE TABLE my_first_table
                       > (
                       > id BIGINT,
                       > name STRING,
                       > PRIMARY KEY(id)
                       > )
                       > PARTITION BY HASH PARTITIONS 16
                       > STORED AS KUDU;
Query: create TABLE my_first_table ( id BIGINT, name STRING, PRIMARY KEY(id) ) PARTITION BY HASH PARTITIONS 16 STORED AS KUDU
Fetched 0 row(s) in 1.35s
[ip-172-31-7-96:21000] > INSERT INTO my_first_table VALUES (99, "sarah");
Query: insert INTO my_first_table VALUES (99, "sarah")
Query submitted at: 2017-04-04 01:46:08 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=824ce0b3765c6b91:5ea8dd7c00000000
Modified 1 row(s), 0 row error(s) in 3.37s
[ip-172-31-7-96:21000] >
[ip-172-31-7-96:21000] > INSERT INTO my_first_table VALUES (1, "john"), (2, "jane"), (3, "jim");
Query: insert INTO my_first_table VALUES (1, "john"), (2, "jane"), (3, "jim")
Query submitted at: 2017-04-04 01:46:13 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=a645259c3b8ae7cd:e446e15500000000
Modified 3 row(s), 0 row error(s) in 0.11s
[ip-172-31-7-96:21000] > select * from my_first_table;
Query: select * from my_first_table
Query submitted at: 2017-04-04 01:46:19 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=f44021589ff0d94d:8d30568200000000
+----+-------+
| id | name  |
+----+-------+
| 2  | jane  |
| 3  | jim   |
| 1  | john  |
| 99 | sarah |
+----+-------+
Fetched 4 row(s) in 0.55s
[ip-172-31-7-96:21000] > delete from my_first_table where id =99;
Query: delete from my_first_table where id =99
Query submitted at: 2017-04-04 01:46:56 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=814090b100fdf0b4:1b516fe400000000
Modified 1 row(s), 0 row error(s) in 0.15s
[ip-172-31-7-96:21000] >
[ip-172-31-7-96:21000] > select * from my_first_table;
Query: select * from my_first_table
Query submitted at: 2017-04-04 01:46:57 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=724aa3f84cedb109:a679bf0200000000
+----+------+
| id | name |
+----+------+
| 2  | jane |
| 3  | jim  |
| 1  | john |
+----+------+
Fetched 3 row(s) in 0.15s
[ip-172-31-7-96:21000] > INSERT INTO my_first_table VALUES (99, "sarah");
Query: insert INTO my_first_table VALUES (99, "sarah")
Query submitted at: 2017-04-04 01:47:32 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=6244b3c6d33b443e:f43c857300000000
Modified 1 row(s), 0 row error(s) in 0.11s
[ip-172-31-7-96:21000] >
[ip-172-31-7-96:21000] > update my_first_table set name='lilei' where id=99;
Query: update my_first_table set name='lilei' where id=99
Query submitted at: 2017-04-04 01:47:32 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=8f4ab0dd3c19f9df:b2c7bdfa00000000
Modified 1 row(s), 0 row error(s) in 0.13s
[ip-172-31-7-96:21000] > select * from my_first_table;
Query: select * from my_first_table
Query submitted at: 2017-04-04 01:47:34 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=6542579c8bd5b6ad:af68f50800000000
+----+-------+
| id | name  |
+----+-------+
| 2  | jane  |
| 3  | jim   |
| 1  | john  |
| 99 | lilei |
+----+-------+
Fetched 4 row(s) in 0.15s
[ip-172-31-7-96:21000] > upsert into my_first_table values(1, "john"), (4, "tom"), (99, "lilei1");
Query: upsert into my_first_table values(1, "john"), (4, "tom"), (99, "lilei1")
Query submitted at: 2017-04-04 01:48:52 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=694fc7ac2bc71d21:947f1fa200000000
Modified 3 row(s), 0 row error(s) in 0.11s
[ip-172-31-7-96:21000] >
[ip-172-31-7-96:21000] > select * from my_first_table;
Query: select * from my_first_table
Query submitted at: 2017-04-04 01:48:52 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=a64e0ee707762b6b:69248a6c00000000
+----+--------+
| id | name   |
+----+--------+
| 2  | jane   |
| 3  | jim    |
| 1  | john   |
| 99 | lilei1 |
| 4  | tom    |
+----+--------+
Fetched 5 row(s) in 0.16s
This article is shared from the WeChat official account Hadoop實操 (gh_c4c535955d0f).