@hucong
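Summary of canal issues
Resetting the binlog information in canal
When canal.log shows a pile of errors such as "Connection reset by peer", the cause is most often that the binlog position recorded by canal differs from the binlog position MySQL actually holds.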
2019-01-15 10:52:20.941 [New I/O server worker #1-3] ERROR c.a.otter.canal.server.netty.handler.SessionHandler - something goes wrong with channel:[id: 0x2a1b7b6f, /149.129.68.40:48252 => /172.26.100.222:11111], exception=java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:322)
at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201)
at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
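- Troubleshooting: check whether the binlog values on the two sides match
1. First stop the canal server and note the binlog value recorded on the canal side. It is stored in the meta.dat file of the corresponding instance under canal's conf directory
vim /usr/local/canal/conf/example/meta.dat
Find the binlog information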
"journalName":"mysql-bin.000001","position":43581207,"
2. Note the binlog information of the MySQL node associated with the canal server
Enter the MySQL command line
mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 |  6399145 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.09 sec)
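Neither the file name nor the position matches. There are two ways to fix this:
- Replace: change the binlog information in meta.dat to match MySQL
- Reset: clear the binlog information in MySQL (the position afterwards is not necessarily 0), then change the binlog information in meta.dat to match MySQL
In testing, the reset approach resolved most problems. Note that reset master deletes all existing binlog files, so it is best kept to test environments.
To reset the binlog, run the following at the MySQL prompt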
mysql> reset master;
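The comparison in steps 1 and 2 can also be scripted. The following is a minimal Python sketch, assuming meta.dat is a JSON document in which the stored position appears as an object carrying journalName and position keys, as in the fragment shown earlier. The layout of SAMPLE_META around that pair and the function names are illustrative assumptions, not part of canal.

```python
import json

# Hypothetical sample: only the journalName/position pair comes from the
# meta.dat fragment in this post; the surrounding structure is assumed.
SAMPLE_META = (
    '{"destination": "example", "cursor": '
    '{"journalName": "mysql-bin.000001", "position": 43581207}}'
)


def find_positions(node):
    """Recursively collect every (journalName, position) pair in parsed JSON."""
    found = []
    if isinstance(node, dict):
        if "journalName" in node and "position" in node:
            found.append((node["journalName"], node["position"]))
        for value in node.values():
            found.extend(find_positions(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_positions(item))
    return found


def matches_master(meta_text, master_file, master_position):
    """True when every position stored in meta.dat equals the values
    reported by `show master status` (File and Position columns)."""
    positions = find_positions(json.loads(meta_text))
    return bool(positions) and all(
        pair == (master_file, master_position) for pair in positions
    )


if __name__ == "__main__":
    print(find_positions(json.loads(SAMPLE_META)))
    print(matches_master(SAMPLE_META, "mysql-bin.000002", 6399145))
```

The recursive search is used so that the sketch does not depend on where exactly the position object sits inside the file.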
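Running the canal-receiver client on Linux
1. Install and configure the JDK and Maven; see the post on installing and configuring JDK 8 + Maven on CentOS 7.3
2. Pull the canal-receiver code (if git is missing, install it first with yum install -y git)
# git clone https://gitee.com/xingcyun/canal-receiver.git
3. Build the code by running the following in the canal-receiver directory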
# mvn clean
# mvn install
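If both commands report BUILD SUCCESS, the build succeeded and the program directory ./target has been generated
4. Start the program by running the following in the ./target directory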
# java -jar canal-receiver.jar start 1
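To run the program in the background, see the post on viewing, stopping, and backgrounding jobs on Linux
canal configuration
###canal without MQ
Edit the instance.properties of the corresponding instance (example by default)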
# vim conf/example/instance.properties
Change the following settings
#################################################
## mysql serverId , v1.0.26+ will autoGen
canal.instance.mysql.slaveId=1234
# position info
canal.instance.master.address=127.0.0.1:3306
# username/password
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.connectionCharset = UTF-8
canal.instance.defaultDatabaseName =material_1703
# table regex
canal.instance.filter.regex=material_1703.bi_bill
#################################################
###canal + kafka configuration
- instance.properties
#################################################
## mysql serverId , v1.0.26+ will autoGen
canal.instance.mysql.slaveId=1234
# position info
canal.instance.master.address=127.0.0.1:3306
# username/password
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.connectionCharset = UTF-8
canal.instance.defaultDatabaseName =material_1703
# table regex
canal.instance.filter.regex=material_1703.bi_bill
# mq config
canal.mq.topic=TopicReceiver
#################################################
Note: canal.mq.topic must match the topic that was actually created
- canal.properties
canal.zkServers =39.98.41.26:2181
# tcp, kafka, RocketMQ
canal.serverMode = kafka
canal.destinations = example
canal.mq.servers = 39.98.41.26:9092
canal.mq.retries = 0
canal.mq.batchSize = 16384
canal.mq.maxRequestSize = 1048576
canal.mq.lingerMs = 1
canal.mq.bufferMemory = 33554432
canal.mq.canalBatchSize = 50
canal.mq.canalGetTimeout = 100
canal.mq.flatMessage = true
canal.mq.compressionType = none
canal.mq.acks = all
Note: zookeeper listens on port 2181 and kafka on port 9092
canal troubleshooting
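Before starting canal in kafka mode it can help to confirm that the settings listed above are actually present in both files. The following is a minimal Python sketch under stated assumptions: it treats the files as plain key=value text with # comments, and the helper names parse_properties and check_kafka_mode are hypothetical, not part of canal.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_kafka_mode(canal_props, instance_props):
    """Return a list of problems found for a canal + kafka setup."""
    problems = []
    if canal_props.get("canal.serverMode", "").lower() != "kafka":
        problems.append("canal.serverMode is not kafka")
    if not canal_props.get("canal.mq.servers"):
        problems.append("canal.mq.servers is empty")
    if not instance_props.get("canal.mq.topic"):
        problems.append("canal.mq.topic is empty")
    return problems


# Values copied from the configuration shown in this post.
CANAL_TEXT = """
# tcp, kafka, RocketMQ
canal.serverMode = kafka
canal.destinations = example
canal.mq.servers = 39.98.41.26:9092
"""
INSTANCE_TEXT = """
# mq config
canal.mq.topic=TopicReceiver
"""

if __name__ == "__main__":
    issues = check_kafka_mode(
        parse_properties(CANAL_TEXT), parse_properties(INSTANCE_TEXT)
    )
    print(issues or "configuration looks consistent")
```

The check only looks at the three keys this post relies on; it does not verify that the topic exists on the broker.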
Problem 1:
ERROR c.a.otter.canal.parse.inbound.mysql.MysqlEventParser - dump address /192.168.1.50:3306 has an error, retrying. caused by
com.alibaba.otter.canal.parse.exception.CanalParseException: can't find start position for example
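Cause: the position saved in meta.dat is inconsistent with the database's position, so canal cannot pick up the database's changes;
Solution: delete meta.dat and restart canal;
In cluster mode: in the zookeeper cluster that canal uses, delete the node /otter/canal/destinations/xxxxx/1001/cursor, then restart canal to recover;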
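For the single-node case, deleting meta.dat outright loses the old position for good. The following is a minimal Python sketch that moves the file aside instead, so it can be inspected or restored later. The function name reset_meta and the backup naming scheme are assumptions made for illustration; canal should be stopped before the call and restarted afterwards.

```python
import shutil
import time
from pathlib import Path


def reset_meta(instance_dir):
    """Move meta.dat aside so canal starts without a stored position.

    Returns the backup path, or None when there is no meta.dat to move.
    """
    meta = Path(instance_dir) / "meta.dat"
    if not meta.exists():
        return None
    # Timestamped name so repeated resets do not overwrite older backups.
    backup = meta.with_name(f"meta.dat.{int(time.time())}.bak")
    shutil.move(str(meta), str(backup))
    return backup


if __name__ == "__main__":
    # Path taken from this post; adjust to the actual instance directory.
    print(reset_meta("/usr/local/canal/conf/example"))
```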
Problem 2:
java.lang.OutOfMemoryError: Java heap space
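The canal consumer had been down for too long, so the position stored in the corresponding zk node
/otter/canal/destinations/test_db/1001/cursor
was very old. After the restart canal began consuming from that old position, which exhausted the canal server's memory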
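A stale cursor like this can be spotted before restarting. The following is a minimal Python sketch, assuming the content of the cursor node is JSON and contains a timestamp field in milliseconds; that field name, the sample content, and the function name cursor_age_hours are assumptions to be checked against the actual node content.

```python
import json
import time


def _timestamps(node):
    """Yield every numeric value stored under a "timestamp" key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "timestamp" and isinstance(value, (int, float)):
                yield value
            else:
                yield from _timestamps(value)
    elif isinstance(node, list):
        for item in node:
            yield from _timestamps(item)


def cursor_age_hours(cursor_json, now_ms=None):
    """Return the age in hours of the newest timestamp in the cursor,
    or None when no timestamp is found."""
    stamps = list(_timestamps(json.loads(cursor_json)))
    if not stamps:
        return None
    if now_ms is None:
        now_ms = time.time() * 1000
    return (now_ms - max(stamps)) / 3_600_000


if __name__ == "__main__":
    # Hypothetical cursor content for illustration only.
    sample = '{"position": {"journalName": "mysql-bin.000001", "timestamp": 1547520740000}}'
    print(cursor_age_hours(sample))
```

If the reported age is large, it may be safer to remove the cursor node as described under Problem 1 than to let canal replay from that position.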
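Another symptom: when listening for database changes, the client only receives TransactionBegin/TransactionEnd entries and never an EventType that carries data;
The cause may be canal.instance.filter.black.regex=.*\\..* ; set it to an empty value (canal.instance.filter.black.regex=) and restart to see whether that helps;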
Problem 3:
ERROR com.alibaba.otter.canal.common.alarm.LogAlarmHandler - destination:fdyb_db[com.alibaba.otter.canal.parse.exception.CanalParseException: com.alibaba.otter.canal.parse.exception.CanalParseException: parse row data failed.
Caused by: com.alibaba.otter.canal.parse.exception.CanalParseException: parse row data failed.
Caused by: com.alibaba.otter.canal.parse.exception.CanalParseException: com.google.common.collect.ComputationException: com.alibaba.otter.canal.parse.exception.CanalParseException: fetch failed by table meta:`mysql`.`pds_4490277`
Caused by: com.google.common.collect.ComputationException: com.alibaba.otter.canal.parse.exception.CanalParseException: fetch failed by table meta:`mysql`.`pds_4490277`
Caused by: com.alibaba.otter.canal.parse.exception.CanalParseException: fetch failed by table meta:`mysql`.`pds_4490277`
Caused by: java.io.IOException: ErrorPacket [errorNumber=1142, fieldCount=-1, message=SELECT command denied to user 'cy_canal'@'11.217.0.224' for table 'pds_4490277', sqlState=42000, sqlStateMarker=#]
with command: desc `mysql`.`pds_4490277`
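Analysis: tables in the mysql system schema require higher privileges. The canal user is denied SELECT on the table, so canal fails to process its binlog entries and the position cannot advance
Fix: add every table under the mysql schema to the blacklist: canal.instance.filter.black.regex = mysql\\..* ; after the change the canal cluster recovers without a restart;
Also worth checking: whether the CanalConnector calls subscribe(filter). If it does, the filter must agree with canal.instance.filter.regex in instance.properties, because the filter passed to subscribe overrides the instance configuration. If that filter is .*\\..* , you are in effect consuming every change.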
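The effect of the whitelist and blacklist expressions is easier to see with a few table names. The following is a rough Python approximation, assuming each filter is a comma-separated list of regular expressions matched against the full schema.table name; it uses Python's re module rather than canal's own matcher, so edge cases may differ. The patterns are written as they would look once the doubled backslash of the properties file is resolved to a single one, and the function names are hypothetical.

```python
import re


def matches_filter(filter_value, schema, table):
    """True when any comma-separated regex fully matches "schema.table"."""
    name = f"{schema}.{table}"
    patterns = [p.strip() for p in filter_value.split(",") if p.strip()]
    return any(re.fullmatch(p, name) for p in patterns)


def is_delivered(white, black, schema, table):
    """A table is delivered when it matches the whitelist
    and does not match the blacklist."""
    if black and matches_filter(black, schema, table):
        return False
    return matches_filter(white, schema, table)


if __name__ == "__main__":
    everything = r".*\..*"      # every table in every schema
    mysql_schema = r"mysql\..*"  # every table in the mysql schema
    print(is_delivered(everything, mysql_schema, "material_1703", "bi_bill"))
    print(is_delivered(everything, mysql_schema, "mysql", "pds_4490277"))
```

With the blacklist set to the mysql schema, the system table from the error above is excluded while business tables still pass, which is the behaviour the fix relies on.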
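Problem 4:
Symptom: after the database is modified, the canal application does not see the binlog events, and the data is not consumed and processed as expected;
Diagnosis: 1. Checked the logs of the canal server, the canal application, and the zk server: no anomalies; 2. Checked the mysql and es servers: no anomalies; 3. Checked the configuration of the canal server and the canal application, and found a problem in the canal server's canal.properties;
Cause: canal.properties sets both canal.ip and canal.zkServers. When a canal server in zk cluster mode also has canal.ip configured, connections to the canal server are made by IP in preference, which disables the zk functionality, and the position file is saved locally. Once the local position file goes wrong, nothing logs an error, which makes the problem very hard to trace;
Fix: set canal.ip to an empty value, shut down the canal server and the canal application, delete the node on zk, then restart the canal server and the canal application;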
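Because this misconfiguration produces no error log, a small pre-start check may be worth having. The following is a minimal Python sketch that flags a canal.properties in which canal.ip and canal.zkServers are both non-empty. It reads the file through configparser with a dummy section header, which is an assumption that holds only for simple key=value files; the function names and the sample IP value are hypothetical.

```python
import configparser


def load_properties(text):
    """Load key=value text by wrapping it in a dummy section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. canal.zkServers
    parser.read_string("[root]\n" + text)
    return dict(parser["root"])


def ip_overrides_zk(props):
    """True when both canal.ip and canal.zkServers have non-empty values,
    the combination described above as disabling zk."""
    has_ip = bool(props.get("canal.ip", "").strip())
    has_zk = bool(props.get("canal.zkServers", "").strip())
    return has_ip and has_zk


if __name__ == "__main__":
    # Hypothetical file content for illustration.
    risky = "canal.ip = 172.26.100.222\ncanal.zkServers =39.98.41.26:2181\n"
    fixed = "canal.ip =\ncanal.zkServers =39.98.41.26:2181\n"
    print(ip_overrides_zk(load_properties(risky)))
    print(ip_overrides_zk(load_properties(fixed)))
```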