Before we begin, a note on the environment: the Kafka data that Kylin consumes here is delivered from an Oracle database to Kafka topics in JSON format by OGG for Bigdata. Insert records and post-update images are delivered to the topic ZTVOUCHER_INS, while delete records and pre-update images go to the topic ZTVOUCHER_DEL. I have described this routing setup in detail in the earlier article "OGG For Bigdata 12: Syncing Oracle Data to Different Kafka Topics by Operation Type"; this article focuses on how Kylin consumes the data to build a streaming cube.
I. Perform DML operations on the source
1. The source table ztvoucher is currently empty; insert a row and query it:
insert into ztvoucher (MANDT, GJAHR, BUKRS, BELNR, BUZEI, MONAT, BUDAT, HKONT, DMBTR, ZZ0014)
values ('666', '2222', '3432', '2200001414', '001', '01', '20190101', '9101000000', 100.00, '101');
1 row created.
SQL> commit;
Commit complete.
SQL> alter system switch logfile;
System altered.
SQL> select * from ztvoucher;
MANDT GJAHR BUKRS BELNR BUZEI MONAT BUDAT HKONT DMBTR ZZ0014
666 2222 3432 2200001414 001 01 20190101 9101000000 100.00 101
2. Check Kafka:
[root@hadoop kafka]# ./console.sh
input topic:ZTVOUCHER_INS
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper]
{"table":"SCOTT.ZTVOUCHER_INS","op_type":"I","op_ts":"2019-05-22 16:18:58.353767","current_ts":"2019-05-22T16:19:11.352000","pos":"00000000080000012086","tokens":{"TKN-OP-TYPE":"INSERT"},"after":{"MANDT":"666","GJAHR":"2222","BUKRS":"3432","BELNR":"2200001414","BUZEI":"001","MONAT":"01","BUDAT":"20190101","HKONT":"9101000000","DMBTR":100.00,"ZZ0014":"101"}}
The insert performed on the source has arrived in topic ZTVOUCHER_INS.
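Each OGG record is plain JSON, so it can be unpacked with standard tooling. A minimal Python sketch (field names taken from the message above, payload shortened for illustration) that extracts the operation type, event time, and after-image:

```python
import json

# One OGG for Bigdata record as seen in ZTVOUCHER_INS (shortened here)
record = json.loads(
    '{"table":"SCOTT.ZTVOUCHER_INS","op_type":"I",'
    '"op_ts":"2019-05-22 16:18:58.353767",'
    '"tokens":{"TKN-OP-TYPE":"INSERT"},'
    '"after":{"MANDT":"666","DMBTR":100.00}}'
)

op_type = record["op_type"]   # "I" for every row landing in this topic
op_ts = record["op_ts"]       # event time recorded by OGG
after = record["after"]       # column values after the operation

print(op_type, op_ts, after["MANDT"], after["DMBTR"])
```

Note that op_type is "I" even for update images; the original operation type is carried in the TKN-OP-TYPE token instead.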
3. Perform an update on the source:
update ztvoucher set dmbtr=50 where mandt='666';
commit;
alter system switch logfile;
4. Check Kafka:
First look at the contents of ZTVOUCHER_INS:
[root@hadoop kafka]# ./console.sh
input topic:ZTVOUCHER_INS
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper]
{"table":"SCOTT.ZTVOUCHER_INS","op_type":"I","op_ts":"2019-05-22 16:18:58.353767","current_ts":"2019-05-22T16:19:11.352000","pos":"00000000080000012086","tokens":{"TKN-OP-TYPE":"INSERT"},"after":{"MANDT":"666","GJAHR":"2222","BUKRS":"3432","BELNR":"2200001414","BUZEI":"001","MONAT":"01","BUDAT":"20190101","HKONT":"9101000000","DMBTR":100.00,"ZZ0014":"101"}}
{"table":"SCOTT.ZTVOUCHER_INS","op_type":"I","op_ts":"2019-05-22 16:22:48.354189","current_ts":"2019-05-22T16:23:33.799000","pos":"00000000080000012613","tokens":{"TKN-OP-TYPE":"SQL COMPUPDATE"},"after":{"MANDT":"666","GJAHR":"2222","BUKRS":"3432","BELNR":"2200001414","BUZEI":"001","MONAT":"01","BUDAT":"20190101","HKONT":"9101000000","DMBTR":50.00,"ZZ0014":"101"}}
Besides the earlier insert, the post-update image has now arrived as well.
Now look at ZTVOUCHER_DEL:
[root@hadoop kafka]# ./console.sh
input topic:ZTVOUCHER_DEL
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper]
{"table":"SCOTT.ZTVOUCHER_DEL","op_type":"I","op_ts":"2019-05-22 16:22:48.354189","current_ts":"2019-05-22T16:23:23.781000","pos":"00000000080000012345","tokens":{"TKN-OP-TYPE":"SQL COMPUPDATE"},"after":{"MANDT":"666","GJAHR":"2222","BUKRS":"3432","BELNR":"2200001414","BUZEI":"001","MONAT":"01","BUDAT":"20190101","HKONT":"9101000000","DMBTR":100.00,"ZZ0014":"101"}}
The DEL topic has received the pre-update image of the row.
5. Perform a delete on the source:
delete from ztvoucher where mandt='666';
commit;
alter system switch logfile;
6. Check Kafka:
[root@hadoop kafka]# ./console.sh
input topic:ZTVOUCHER_DEL
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper]
{"table":"SCOTT.ZTVOUCHER_DEL","op_type":"I","op_ts":"2019-05-22 16:22:48.354189","current_ts":"2019-05-22T16:23:23.781000","pos":"00000000080000012345","tokens":{"TKN-OP-TYPE":"SQL COMPUPDATE"},"after":{"MANDT":"666","GJAHR":"2222","BUKRS":"3432","BELNR":"2200001414","BUZEI":"001","MONAT":"01","BUDAT":"20190101","HKONT":"9101000000","DMBTR":100.00,"ZZ0014":"101"}}
{"table":"SCOTT.ZTVOUCHER_DEL","op_type":"I","op_ts":"2019-05-22 16:26:26.353705","current_ts":"2019-05-22T16:27:15.049000","pos":"00000000080000012857","tokens":{"TKN-OP-TYPE":"DELETE"},"after":{"MANDT":"666","GJAHR":"2222","BUKRS":"3432","BELNR":"2200001414","BUZEI":"001","MONAT":"01","BUDAT":"20190101","HKONT":"9101000000","DMBTR":50.00,"ZZ0014":"101"}}
Besides the pre-update image shown earlier, the record from the delete we just performed has been written as well.
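Taken together, the two topics carry enough information to reconstruct the table's current state: ZTVOUCHER_INS records act as upserts, and ZTVOUCHER_DEL records with a DELETE token remove the row. A minimal Python sketch of that replay logic (the key columns and the merged, op_ts-ordered event stream are assumptions for illustration):

```python
# Replay records from both topics to rebuild current row state.
# The primary-key columns below are an assumption for illustration.
KEY = ("MANDT", "GJAHR", "BUKRS", "BELNR", "BUZEI")

def replay(events):
    """events: (topic, record) pairs, ordered by op_ts."""
    state = {}
    for topic, rec in events:
        key = tuple(rec["after"][c] for c in KEY)
        if topic == "ZTVOUCHER_INS":
            # inserts and post-update images are upserts
            state[key] = rec["after"]
        elif rec["tokens"]["TKN-OP-TYPE"] == "DELETE":
            # only true deletes remove the row; pre-update images
            # in ZTVOUCHER_DEL carry no new state and are skipped
            state.pop(key, None)
    return state

row = {"MANDT": "666", "GJAHR": "2222", "BUKRS": "3432",
       "BELNR": "2200001414", "BUZEI": "001", "DMBTR": 100.00}
# The four events produced by the insert/update/delete sequence above
events = [
    ("ZTVOUCHER_INS", {"tokens": {"TKN-OP-TYPE": "INSERT"},
                       "after": dict(row)}),
    ("ZTVOUCHER_DEL", {"tokens": {"TKN-OP-TYPE": "SQL COMPUPDATE"},
                       "after": dict(row)}),
    ("ZTVOUCHER_INS", {"tokens": {"TKN-OP-TYPE": "SQL COMPUPDATE"},
                       "after": dict(row, DMBTR=50.00)}),
    ("ZTVOUCHER_DEL", {"tokens": {"TKN-OP-TYPE": "DELETE"},
                       "after": dict(row, DMBTR=50.00)}),
]
print(replay(events[:3]))  # after the update: DMBTR is 50.0
print(replay(events))      # after the delete: {}
```

This is only a sketch of the semantics; Kylin itself consumes the raw topics and does not apply this merge.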
With the data in place, let's move on to building the streaming cube.
II. Building the streaming cube
Official documentation for building a streaming cube (I am using version 2.4):
http://kylin.apache.org/cn/docs24/tutorial/cube_streaming.html
A streaming cube build requires a timestamp-type time column to mark each message's time; from the JSON in the two topics above, the op_ts field meets this requirement.
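The op_ts values ("2019-05-22 16:18:58.353767") are strings with microsecond precision, so it is worth confirming they parse as proper timestamps. A quick Python check (the epoch-millisecond conversion, which assumes the source clock is UTC, is just for illustration):

```python
from datetime import datetime, timezone

op_ts = "2019-05-22 16:18:58.353767"
# %f consumes the six-digit microsecond part that OGG emits
ts = datetime.strptime(op_ts, "%Y-%m-%d %H:%M:%S.%f")
# epoch-millisecond form, assuming the source clock is UTC
epoch_ms = int(ts.replace(tzinfo=timezone.utc).timestamp() * 1000)
print(ts, epoch_ms)
```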
1. Define a table from the JSON data
First, let's build