Kylin消費Kafka數據流式構建cube

文章開始之前先說明環境情況,這裏kylin消費的kafka數據是從Oracle 數據庫用Ogg For Bigdata以json格式將數據投遞到kafka topic的,投遞的時候,關於insert和update 之前的數據投遞到名爲 ZTVOUCHER_INS 的topic,而delete和update之後的數據投遞到名爲 ZTVOUCHER_DEL 的topic中,前面說的這種投遞方式本人在文章OGG For Bigdata 12按操作類型同步Oracle數據到kafka不同topic中已經詳細說明了,這裏主要介紹kylin如何消費數據創建流式cube。

一、源端做DML操作

1.源端表ztvoucher目前沒有數據,現在做insert,並查詢:


insert into ztvoucher (MANDT, GJAHR, BUKRS, BELNR, BUZEI, MONAT, BUDAT, HKONT, DMBTR, ZZ0014)
values ('666', '2222', '3432', '2200001414', '001', '01', '20190101', '9101000000', 100.00, '101');

1 row created.

SQL> commit;

Commit complete.

SQL> alter system switch logfile;

System altered.
SQL>  select * from ztvoucher;
MANDT	GJAHR	BUKRS	BELNR	BUZEI	MONAT	BUDAT	HKONT	DMBTR	ZZ0014
666	2222	3432	2200001414	001	01	20190101	9101000000	100.00	101

2.去kafka查看:

[root@hadoop kafka]# ./console.sh 
input topic:ZTVOUCHER_INS
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper]
.{"table":"SCOTT.ZTVOUCHER_INS","op_type":"I","op_ts":"2019-05-22 16:18:58.353767","current_ts":"2019-05-22T16:19:11.352000","pos":"00000000080000012086","tokens":{"TKN-OP-TYPE":"INSERT"},"a
fter":{"MANDT":"666","GJAHR":"2222","BUKRS":"3432","BELNR":"2200001414","BUZEI":"001","MONAT":"01","BUDAT":"20190101","HKONT":"9101000000","DMBTR":100.00,"ZZ0014":"101"}}

發現源端做的insert已經在 topic:ZTVOUCHER_INS有了。
3.源端做update操作:

update ztvoucher set dmbtr=50 where mandt='666';
commit;
alter system switch logfile;

4.去kafka查看:
先看ZTOVOUCHER_INS 內容:

[root@hadoop kafka]# ./console.sh 
input topic:ZTVOUCHER_INS
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper]
.{"table":"SCOTT.ZTVOUCHER_INS","op_type":"I","op_ts":"2019-05-22 16:18:58.353767","current_ts":"2019-05-22T16:19:11.352000","pos":"00000000080000012086","tokens":{"TKN-OP-TYPE":"INSERT"},"a
fter":{"MANDT":"666","GJAHR":"2222","BUKRS":"3432","BELNR":"2200001414","BUZEI":"001","MONAT":"01","BUDAT":"20190101","HKONT":"9101000000","DMBTR":100.00,"ZZ0014":"101"}}{"table":"SCOTT.ZTVOUCHER_INS","op_type":"I","op_ts":"2019-05-22 16:22:48.354189","current_ts":"2019-05-22T16:23:33.799000","pos":"00000000080000012613","tokens":{"TKN-OP-TYPE":"SQL COMPUPD
ATE"},"after":{"MANDT":"666","GJAHR":"2222","BUKRS":"3432","BELNR":"2200001414","BUZEI":"001","MONAT":"01","BUDAT":"20190101","HKONT":"9101000000","DMBTR":50.00,"ZZ0014":"101"}}

發現除了之前的insert操作,現在update之後的數據也進來了。
再看ZTVOUCHER_DEL:

[root@hadoop kafka]# ./console.sh 
input topic:ZTVOUCHER_DEL
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper]
.{"table":"SCOTT.ZTVOUCHER_DEL","op_type":"I","op_ts":"2019-05-22 16:22:48.354189","current_ts":"2019-05-22T16:23:23.781000","pos":"00000000080000012345","tokens":{"TKN-OP-TYPE":"SQL COMPUPD
ATE"},"after":{"MANDT":"666","GJAHR":"2222","BUKRS":"3432","BELNR":"2200001414","BUZEI":"001","MONAT":"01","BUDAT":"20190101","HKONT":"9101000000","DMBTR":100.00,"ZZ0014":"101"}}

發現DEL的topic中也存入了update之前的數據。
5.源端做delete操作:

delete from ztvoucher where mandt='666';
commit;
alter system switch logfile;

6.去kafka查看:

[root@hadoop kafka]# ./console.sh 
input topic:ZTVOUCHER_DEL
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper]
.{"table":"SCOTT.ZTVOUCHER_DEL","op_type":"I","op_ts":"2019-05-22 16:22:48.354189","current_ts":"2019-05-22T16:23:23.781000","pos":"00000000080000012345","tokens":{"TKN-OP-TYPE":"SQL COMPUPD
ATE"},"after":{"MANDT":"666","GJAHR":"2222","BUKRS":"3432","BELNR":"2200001414","BUZEI":"001","MONAT":"01","BUDAT":"20190101","HKONT":"9101000000","DMBTR":100.00,"ZZ0014":"101"}}{"table":"SCOTT.ZTVOUCHER_DEL","op_type":"I","op_ts":"2019-05-22 16:26:26.353705","current_ts":"2019-05-22T16:27:15.049000","pos":"00000000080000012857","tokens":{"TKN-OP-TYPE":"DELETE"},"a
fter":{"MANDT":"666","GJAHR":"2222","BUKRS":"3432","BELNR":"2200001414","BUZEI":"001","MONAT":"01","BUDAT":"20190101","HKONT":"9101000000","DMBTR":50.00,"ZZ0014":"101"}}

發現除了上面update之前的數據以外,還寫入了剛做的delete操作的數據。
好了,現在數據都組織好了,現在去流式創建cube。

二、流式構建cube

流式構建cube官方連接(本人用的2.4版本):
http://kylin.apache.org/cn/docs24/tutorial/cube_streaming.html
流式構建cube需要一個類型爲timestamp的時間列字段用來標識消息的時間,從前面兩個topic中的json數據可以看到,op_ts字段滿足這個要求。
1、用j’son數據定義一張表
先來構建

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章