[Linux+SQL] A summary of issues with Alibaba canal

@hucong

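canal issue summary

Resetting the binlog information in canal

When canal.log shows a pile of errors such as "Connection reset by peer", the usual cause is that the binlog position recorded by canal differs from the binlog position MySQL actually records.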

2019-01-15 10:52:20.941 [New I/O server worker #1-3] ERROR c.a.otter.canal.server.netty.handler.SessionHandler - something goes wrong with channel:[id: 0x2a1b7b6f, /149.129.68.40:48252 => /172.26.100.222:11111], exception=java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
	at sun.nio.ch.IOUtil.read(IOUtil.java:192)
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:322)
	at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281)
	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201)
	at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
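  • Troubleshooting: check whether the binlog values on the two sides match

1. First stop the canal server and note the binlog position recorded on the canal side. It is stored in the meta.dat file of the corresponding instance under canal's conf directory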

vim /usr/local/canal/conf/example/meta.dat

Find the binlog information in it

"journalName":"mysql-bin.000001","position":43581207,"

2. Note the binlog position on the MySQL node that the canal server reads from

Enter the MySQL command line

mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 |  6399145 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.09 sec)
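
The comparison in steps 1 and 2 can also be scripted. The following is a minimal sketch, not part of canal itself: the function names are hypothetical, and it assumes that meta.dat stores the position as the JSON-style "journalName" and "position" fields shown in the excerpt above.

```python
import re


def parse_meta_position(meta_text):
    """Extract (journalName, position) from the text of canal's meta.dat."""
    name = re.search(r'"journalName"\s*:\s*"([^"]+)"', meta_text)
    pos = re.search(r'"position"\s*:\s*(\d+)', meta_text)
    if not (name and pos):
        raise ValueError("no binlog position found in meta.dat text")
    return name.group(1), int(pos.group(1))


def positions_match(meta_text, master_file, master_position):
    """True when canal's recorded position equals the one MySQL reports."""
    return parse_meta_position(meta_text) == (master_file, master_position)


# Fragment copied from the meta.dat excerpt above.
meta_fragment = '"journalName":"mysql-bin.000001","position":43581207,"'

print(parse_meta_position(meta_fragment))
# Values taken from the `show master status` output above.
print(positions_match(meta_fragment, "mysql-bin.000002", 6399145))
```

With the values from this article the second call reports a mismatch, which is the situation handled next.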

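Here neither the file name nor the position matches. There are two ways to fix this:

  • Replace: change the binlog information in meta.dat to match MySQL
  • Reset: clear the binlog on the MySQL side (the position afterwards is not necessarily 0), then change the binlog information in meta.dat to match MySQL

In testing, the reset approach resolved most problems.

To reset the binlog, run the following at the MySQL prompt. Note that reset master deletes all existing binlog files.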

mysql> reset master;
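
Both options end with editing meta.dat so that it matches MySQL. A minimal sketch of that edit follows, assuming the same "journalName" and "position" fields as in the excerpt above; `rewrite_meta_position` is a hypothetical helper, and it only transforms text, so reading and writing the file is left to the caller.

```python
import re


def rewrite_meta_position(meta_text, new_file, new_position):
    """Return meta.dat text with journalName and position replaced."""
    text, n_file = re.subn(
        r'("journalName"\s*:\s*")[^"]*(")',
        lambda m: m.group(1) + new_file + m.group(2),
        meta_text,
    )
    text, n_pos = re.subn(
        r'("position"\s*:\s*)\d+',
        lambda m: m.group(1) + str(new_position),
        text,
    )
    if n_file == 0 or n_pos == 0:
        raise ValueError("journalName/position not found; file left unchanged")
    return text


# Fragment from the meta.dat excerpt, rewritten to the values MySQL reported.
old = '"journalName":"mysql-bin.000001","position":43581207,"'
print(rewrite_meta_position(old, "mysql-bin.000002", 6399145))
```

As in step 1, the canal server should be stopped before the file is changed, and keeping a backup copy of meta.dat is a sensible precaution.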

Running the canal-receiver client on Linux

1. Install and configure the JDK and Maven; see the article "Installing and configuring JDK 8 + Maven on CentOS 7.3"

2. Fetch the canal-receiver code

# git clone https://gitee.com/xingcyun/canal-receiver.git       //yum install -y git

3. Build the code by running the following in the canal-receiver directory

# mvn clean
# mvn install

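If both commands report success, the build succeeded and the program directory ./target has been generated.

4. Start the program by running the following in the ./target directory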

# java -jar canal-receiver.jar start 1

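For how to run the program in the background, see the article "Viewing, stopping, and backgrounding jobs on Linux"

canal configuration

###canal without MQ
Edit the instance.properties of the corresponding instance (example by default)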

# vim conf/example/instance.properties

Change the following settings

#################################################
## mysql serverId , v1.0.26+ will autoGen 
canal.instance.mysql.slaveId=1234

# position info
canal.instance.master.address=127.0.0.1:3306

# username/password
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.connectionCharset = UTF-8
canal.instance.defaultDatabaseName =material_1703

# table regex
canal.instance.filter.regex=material_1703.bi_bill
#################################################
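
One detail of canal.instance.filter.regex is easy to miss: the value is a regular expression, so the unescaped dot in material_1703.bi_bill matches any character. The sketch below illustrates this with Python's `re` module standing in for canal's own regex engine. It assumes comma-separated patterns that must match the whole schema.table name; `table_matches` is a hypothetical helper.

```python
import re


def table_matches(filter_regex, full_table_name):
    """Check a 'schema.table' name against a comma-separated filter."""
    patterns = [p.strip() for p in filter_regex.split(",") if p.strip()]
    return any(re.fullmatch(p, full_table_name) for p in patterns)


# The intended table matches.
print(table_matches("material_1703.bi_bill", "material_1703.bi_bill"))
# So does an unrelated name, because '.' is a wildcard.
print(table_matches("material_1703.bi_bill", "material_1703xbi_bill"))
# With the dot escaped, only the literal name matches.
print(table_matches(r"material_1703\.bi_bill", "material_1703xbi_bill"))
```

In a .properties file the backslash itself has to be escaped, so the stricter form would be written as material_1703\\.bi_bill.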

###canal + Kafka configuration

  • instance.properties
#################################################
## mysql serverId , v1.0.26+ will autoGen 
canal.instance.mysql.slaveId=1234

# position info
canal.instance.master.address=127.0.0.1:3306

# username/password
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.connectionCharset = UTF-8
canal.instance.defaultDatabaseName =material_1703

# table regex
canal.instance.filter.regex=material_1703.bi_bill

# mq config
canal.mq.topic=TopicReceiver
#################################################

Note: canal.mq.topic must match the topic that was actually created in Kafka

  • canal.properties
canal.zkServers =39.98.41.26:2181

# tcp, kafka, RocketMQ
canal.serverMode = kafka

canal.destinations = example

canal.mq.servers = 39.98.41.26:9092
canal.mq.retries = 0
canal.mq.batchSize = 16384
canal.mq.maxRequestSize = 1048576
canal.mq.lingerMs = 1
canal.mq.bufferMemory = 33554432
canal.mq.canalBatchSize = 50
canal.mq.canalGetTimeout = 100
canal.mq.flatMessage = true
canal.mq.compressionType = none
canal.mq.acks = all

Note: ZooKeeper listens on port 2181; Kafka listens on port 9092
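
With canal.mq.flatMessage = true, each change is published to the topic as JSON text. The sketch below shows how a consumer might unpack one such message. It is only an illustration: the sample message is made up, its columns (id, amount) are hypothetical, and the field names (database, table, type, isDdl, data) are assumed from canal's flat message format rather than taken from this article. Fetching the message from Kafka is left out.

```python
import json


def rows_from_flat_message(raw):
    """Return (database, table, event type, row) tuples from one flat message."""
    msg = json.loads(raw)
    if msg.get("isDdl"):
        return []  # DDL statements carry no row data
    return [
        (msg["database"], msg["table"], msg["type"], row)
        for row in (msg.get("data") or [])
    ]


# Made-up sample for illustration only.
sample = json.dumps({
    "database": "material_1703",
    "table": "bi_bill",
    "type": "INSERT",
    "isDdl": False,
    "data": [{"id": "1", "amount": "10.5"}],
    "old": None,
})

for row in rows_from_flat_message(sample):
    print(row)
```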

canal troubleshooting

Problem 1:

ERROR c.a.otter.canal.parse.inbound.mysql.MysqlEventParser - dump address /192.168.1.50:3306 has an error, retrying. caused by
com.alibaba.otter.canal.parse.exception.CanalParseException: can't find start position for example

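Cause: the position saved in meta.dat does not match the position in the database, so canal cannot pick up changes from the database.

Solution: delete meta.dat, then restart canal.

Cluster mode: in the ZooKeeper cluster that canal uses, delete the node /otter/canal/destinations/xxxxx/1001/cursor, then restart canal to recover.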
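
The cluster-mode step can be sketched as follows, assuming the third-party kazoo ZooKeeper client is installed. The helper names are hypothetical, the path layout is taken from the node named above, and treating xxxxx as the destination (instance) name and 1001 as the client id is an assumption.

```python
def cursor_path(destination, client_id=1001):
    """Build the ZooKeeper path of a canal client's cursor node."""
    return f"/otter/canal/destinations/{destination}/{client_id}/cursor"


def delete_cursor(zk_hosts, destination, client_id=1001):
    """Delete the cursor node; returns True if a node was removed."""
    # Imported lazily so cursor_path() stays usable without kazoo installed.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts=zk_hosts)
    zk.start()
    try:
        path = cursor_path(destination, client_id)
        if zk.exists(path):
            zk.delete(path)
            return True
        return False
    finally:
        zk.stop()


print(cursor_path("example"))
```

A call such as `delete_cursor("127.0.0.1:2181", "example")` would be made while canal is stopped, followed by the restart described above.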

Problem 2:

java.lang.OutOfMemoryError: Java heap space

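The canal consumer had been down for too long, so the position stored in the ZooKeeper node /otter/canal/destinations/test_db/1001/cursor was very old. When canal was restarted, it began consuming from that old position and the canal server ran out of heap memory.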
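
Before restarting canal after a long outage, it may help to check how old the stored position is. The sketch below assumes the cursor node's content is JSON text containing a "timestamp" field in milliseconds since the epoch. That field name is an assumption, and the sample string is made up for illustration.

```python
import re
from datetime import datetime, timezone


def cursor_age_hours(cursor_json, now=None):
    """Return the age of the recorded position in hours."""
    m = re.search(r'"timestamp"\s*:\s*(\d+)', cursor_json)
    if m is None:
        raise ValueError("no timestamp field found in cursor content")
    recorded = datetime.fromtimestamp(int(m.group(1)) / 1000, tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (now - recorded).total_seconds() / 3600


# Made-up cursor content; only the timestamp field is used.
sample_cursor = '{"journalName":"mysql-bin.000001","position":43581207,"timestamp":1547520740000}'
print(f"position is {cursor_age_hours(sample_cursor):.1f} hours old")
```

If the age is far larger than expected, resetting the position as in Problem 1 avoids replaying a long backlog.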

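A related symptom: when listening for database changes, only TransactionBegin/TransactionEnd events arrive and no EventType carrying row data is received.

The likely cause is canal.instance.filter.black.regex=.*\\..* being set; change it to an empty value (canal.instance.filter.black.regex=) and restart to check.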

Problem 3:

ERROR com.alibaba.otter.canal.common.alarm.LogAlarmHandler - destination:fdyb_db[com.alibaba.otter.canal.parse.exception.CanalParseException: com.alibaba.otter.canal.parse.exception.CanalParseException: parse row data failed.
Caused by: com.alibaba.otter.canal.parse.exception.CanalParseException: parse row data failed.
Caused by: com.alibaba.otter.canal.parse.exception.CanalParseException: com.google.common.collect.ComputationException: com.alibaba.otter.canal.parse.exception.CanalParseException: fetch failed by table meta:mysql.pds_4490277
Caused by: com.google.common.collect.ComputationException: com.alibaba.otter.canal.parse.exception.CanalParseException: fetch failed by table meta:mysql.pds_4490277
Caused by: com.alibaba.otter.canal.parse.exception.CanalParseException: fetch failed by table meta:mysql.pds_4490277
Caused by: java.io.IOException: ErrorPacket [errorNumber=1142, fieldCount=-1, message=SELECT command denied to user 'cy_canal'@'11.217.0.224' for table 'pds_4490277', sqlState=42000, sqlStateMarker=#]
with command: desc mysql.pds_4490277

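Analysis: tables in the mysql system schema require higher privileges. canal fails to read the binlog for this table, so the position cannot advance.

Solution: add all tables under the mysql schema to the blacklist: canal.instance.filter.black.regex = mysql\\..* . After the change, the canal cluster recovers without a restart.

Also check whether CanalConnector calls subscribe(filter). If it does, the filter must be the same as canal.instance.filter.regex in instance.properties; otherwise the filter passed to subscribe overrides the instance configuration. If the subscribe filter is .*\\..*, you are effectively consuming every change.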
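
How the three filters interact can be modelled roughly as below. This is only a model of the behaviour described above, written with Python's `re` module. The function names are hypothetical, and it assumes that the blacklist is applied regardless of which whitelist is in effect and that patterns must match the whole schema.table name.

```python
import re


def matches_any(filter_regex, name):
    """Check a name against a comma-separated list of regex patterns."""
    patterns = [p.strip() for p in filter_regex.split(",") if p.strip()]
    return any(re.fullmatch(p, name) for p in patterns)


def is_delivered(name, instance_filter, black_filter="", subscribe_filter=None):
    """Decide whether changes to 'schema.table' reach the consumer."""
    # A filter passed to subscribe() replaces the instance-level filter.
    effective = subscribe_filter if subscribe_filter else instance_filter
    if black_filter and matches_any(black_filter, name):
        return False
    return matches_any(effective, name)


# Blacklisting the mysql schema hides the table from the error above.
print(is_delivered("mysql.pds_4490277", r".*\..*", black_filter=r"mysql\..*"))
# A catch-all subscribe filter overrides a narrow instance filter.
print(is_delivered("material_1703.other", r"material_1703\.bi_bill",
                   subscribe_filter=r".*\..*"))
# Without subscribe(), the narrow instance filter applies.
print(is_delivered("material_1703.other", r"material_1703\.bi_bill"))
```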

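Problem 4:

Symptom: after the database is modified, the canal application does not see the binlog events, and the data is not consumed.

Diagnosis: 1. Checked the logs of the canal server, the canal application, and the ZooKeeper server: nothing abnormal. 2. Checked the MySQL and ES servers: nothing abnormal. 3. Checked the configuration of the canal server and the canal application, and found a problem in the canal server's canal.properties.

Cause: canal.properties set both canal.ip and canal.zkServers. When a canal running in ZooKeeper cluster mode also has canal.ip configured, the connection to the canal server is made by IP first, which disables the ZooKeeper functionality, and the position file is saved locally. Once the local position file goes wrong, no component logs an error, which makes the problem hard to track down.

Solution: set canal.ip to an empty value, stop the canal server and the canal application, delete the nodes on ZooKeeper, then restart the canal server and the canal application.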
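
Since this misconfiguration produces no error log, a small check of canal.properties can catch it early. The sketch below uses a deliberately simple key=value parser (it ignores line continuations and escapes), hypothetical function names, and an illustrative sample in which the canal.ip value is made up.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def ip_overrides_zookeeper(text):
    """True when both canal.ip and canal.zkServers have non-empty values."""
    props = parse_properties(text)
    return bool(props.get("canal.ip")) and bool(props.get("canal.zkServers"))


# Illustrative input: the canal.ip value here is made up.
sample = """
canal.ip = 172.26.100.222
canal.zkServers =39.98.41.26:2181
"""

if ip_overrides_zookeeper(sample):
    print("canal.ip is set together with canal.zkServers; clear canal.ip")
```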
