[Linux+SQL] A Summary of Issues with Alibaba's canal

@hucong

Summary of canal issues

Resetting the binlog information in canal

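When canal.log fills up with errors such as "Connection reset by peer", the most likely cause is that the binlog position recorded by canal differs from the binlog position MySQL actually has.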

2019-01-15 10:52:20.941 [New I/O server worker #1-3] ERROR c.a.otter.canal.server.netty.handler.SessionHandler - something goes wrong with channel:[id: 0x2a1b7b6f, /149.129.68.40:48252 => /172.26.100.222:11111], exception=java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
	at sun.nio.ch.IOUtil.read(IOUtil.java:192)
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:322)
	at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281)
	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201)
	at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
  • Troubleshooting: check whether the binlog values on the two sides match

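1. First stop the canal server and note the binlog position recorded on the canal side. It is stored in the meta.dat file of the corresponding instance under canal's conf directory.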

vim /usr/local/canal/conf/example/meta.dat

Find the binlog information in it:

"journalName":"mysql-bin.000001","position":43581207,"

2. Note the binlog status of the MySQL node that the canal server reads from.

Open the MySQL command-line client:

mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 |  6399145 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.09 sec)

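Neither the file name nor the position matches. There are two ways to fix this:

  • Replace: change the binlog information in meta.dat to match MySQL.
  • Reset: clear the binlog information in MySQL (the position afterwards is not necessarily 0), then change the binlog information in meta.dat to match MySQL.

In testing, the reset approach resolved most problems.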
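
The comparison from steps 1 and 2 can be scripted. What follows is a minimal Python sketch, assuming meta.dat contains the "journalName" and "position" fields next to each other as in the excerpt above. The function names and the regular expression are illustrative and not part of canal.

```python
import re

# Matches the fragment shown in meta.dat, e.g.
#   "journalName":"mysql-bin.000001","position":43581207
# Assumption: the two fields appear in this order, as in the excerpt.
_POSITION_RE = re.compile(
    r'"journalName"\s*:\s*"([^"]+)"\s*,\s*"position"\s*:\s*(\d+)'
)


def extract_binlog_position(meta_text):
    """Return (file_name, offset) found in meta.dat text, or None."""
    match = _POSITION_RE.search(meta_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def positions_match(meta_text, master_file, master_position):
    """Compare canal's recorded position with the File and Position
    columns reported by `show master status`."""
    return extract_binlog_position(meta_text) == (master_file, master_position)


if __name__ == "__main__":
    sample = '"journalName":"mysql-bin.000001","position":43581207,"'
    print(extract_binlog_position(sample))
    print(positions_match(sample, "mysql-bin.000002", 6399145))
```

With the values from this article the second call reports a mismatch, which is the situation the two fixes above address.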

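To reset the binlog, run the following in the MySQL client. Note that reset master deletes all existing binlog files and starts a new one, so it is not suitable for a server that has running replicas: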

mysql> reset master;
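
Both approaches end with meta.dat being edited to match MySQL. A minimal sketch of that edit on the file's text follows, again assuming the "journalName"/"position" layout shown earlier. `set_binlog_position` is a hypothetical helper, and canal should be stopped before the file is touched, as in step 1.

```python
import re

# Group 1 and group 2 capture the text around the two values so that
# only the file name and the offset are replaced.
_POSITION_RE = re.compile(
    r'("journalName"\s*:\s*")[^"]+("\s*,\s*"position"\s*:\s*)\d+'
)


def set_binlog_position(meta_text, new_file, new_position):
    """Return meta_text with each journalName/position pair replaced."""
    def replace(match):
        return f"{match.group(1)}{new_file}{match.group(2)}{int(new_position)}"

    return _POSITION_RE.sub(replace, meta_text)


if __name__ == "__main__":
    before = '"journalName":"mysql-bin.000001","position":43581207,"'
    # Values taken from the `show master status` output in this article.
    print(set_binlog_position(before, "mysql-bin.000002", 6399145))
```

Keeping a copy of the original meta.dat before writing the result back makes the change easy to undo.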

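Running the canal-receiver client on Linux

1. Install and configure the JDK and Maven; see "CentOS 7.3: Installing and Configuring JDK 8 + Maven".

2. Clone the canal-receiver code:

# yum install -y git       # only needed if git is not installed
# git clone https://gitee.com/xingcyun/canal-receiver.git

3. Build the code by running the following in the canal-receiver directory: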

# mvn clean
# mvn install

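If both commands report BUILD SUCCESS, the build succeeded and the ./target directory has been generated.

4. Start the program by running the following in the ./target directory: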

# java -jar canal-receiver.jar start 1

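For running the program in the background, see "Linux: Viewing, Stopping, and Backgrounding Jobs".

canal configuration

###canal without MQ
Edit instance.properties for the corresponding instance (example by default):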

# vim conf/example/instance.properties

Change the following settings:

#################################################
## mysql serverId , v1.0.26+ will autoGen 
canal.instance.mysql.slaveId=1234

# position info
canal.instance.master.address=127.0.0.1:3306

# username/password
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.connectionCharset = UTF-8
canal.instance.defaultDatabaseName =material_1703

# table regex
canal.instance.filter.regex=material_1703.bi_bill
#################################################

###canal + Kafka configuration

  • instance.properties
#################################################
## mysql serverId , v1.0.26+ will autoGen 
canal.instance.mysql.slaveId=1234

# position info
canal.instance.master.address=127.0.0.1:3306

# username/password
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.connectionCharset = UTF-8
canal.instance.defaultDatabaseName =material_1703

# table regex
canal.instance.filter.regex=material_1703.bi_bill

# mq config
canal.mq.topic=TopicReceiver           
#################################################

Note: canal.mq.topic must match the topic that was actually created.

  • canal.properties
canal.zkServers =39.98.41.26:2181

# tcp, kafka, RocketMQ
canal.serverMode = kafka

canal.destinations = example

canal.mq.servers = 39.98.41.26:9092
canal.mq.retries = 0
canal.mq.batchSize = 16384
canal.mq.maxRequestSize = 1048576
canal.mq.lingerMs = 1
canal.mq.bufferMemory = 33554432
canal.mq.canalBatchSize = 50
canal.mq.canalGetTimeout = 100
canal.mq.flatMessage = true
canal.mq.compressionType = none
canal.mq.acks = all

Note: ZooKeeper listens on port 2181 and Kafka on port 9092.
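
With canal.mq.flatMessage = true, canal publishes changes to Kafka as JSON documents. Below is a minimal sketch of how a consumer might unpack one such message. It assumes the field names of canal's flat message format (database, table, type, isDdl, data, old). The sample payload and its column names are invented for illustration, and the Kafka client itself is left out.

```python
import json


def iter_row_changes(raw_message):
    """Yield (database, table, change_type, row, old_values) per changed row."""
    message = json.loads(raw_message)
    if message.get("isDdl"):
        return  # DDL statements carry no row data
    rows = message.get("data") or []
    # "old" holds the previous values for updates; assume it is absent otherwise.
    olds = message.get("old") or [None] * len(rows)
    for row, old in zip(rows, olds):
        yield message["database"], message["table"], message["type"], row, old


if __name__ == "__main__":
    # Hypothetical payload: the columns "id" and "amount" are made up.
    sample = json.dumps({
        "database": "material_1703",
        "table": "bi_bill",
        "type": "UPDATE",
        "isDdl": False,
        "data": [{"id": "1", "amount": "20"}],
        "old": [{"amount": "10"}],
    })
    for change in iter_row_changes(sample):
        print(change)
```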

canal troubleshooting

Problem 1:

ERROR c.a.otter.canal.parse.inbound.mysql.MysqlEventParser - dump address /192.168.1.50:3306 has an error, retrying. caused by
com.alibaba.otter.canal.parse.exception.CanalParseException: can't find start position for example

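Cause: the position saved in meta.dat does not match the database's binlog position, so canal cannot capture the database's changes.

Solution: delete meta.dat, then restart canal.

In cluster mode: connect to the ZooKeeper cluster that canal uses, delete the node /otter/canal/destinations/xxxxx/1001/cursor, and restart canal to recover.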

Problem 2:

java.lang.OutOfMemoryError: Java heap space

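The canal consumer had been down for a long time, so the position stored in the ZooKeeper node /otter/canal/destinations/test_db/1001/cursor was very old. When canal restarted it began consuming from that old position, and the canal server ran out of heap memory.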
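
Before restarting canal after a long outage, it can help to check how old the stored cursor is. The sketch below is a minimal one, assuming the cursor node's JSON carries a "timestamp" field in milliseconds since the epoch. That is an assumption about canal's stored format, not something shown in this article. Reading the node from ZooKeeper is left out, and the helper names are illustrative.

```python
import re
import time

# Assumption: the cursor JSON contains "timestamp":<milliseconds since epoch>.
_TIMESTAMP_RE = re.compile(r'"timestamp"\s*:\s*(\d+)')


def cursor_age_seconds(cursor_json, now=None):
    """Return the age of the cursor's timestamp in seconds, or None."""
    match = _TIMESTAMP_RE.search(cursor_json)
    if match is None:
        return None
    now = time.time() if now is None else now
    return now - int(match.group(1)) / 1000.0


def is_stale(cursor_json, max_age_seconds, now=None):
    """True when the cursor is older than max_age_seconds."""
    age = cursor_age_seconds(cursor_json, now)
    return age is not None and age > max_age_seconds


if __name__ == "__main__":
    # Hypothetical cursor content, reduced to the one field used here.
    sample = '{"timestamp":1547520740000}'
    print(is_stale(sample, max_age_seconds=24 * 3600))
```

A stale cursor suggests deleting the node, as in Problem 1, rather than letting canal replay from the old position.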

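A related symptom: when listening for database changes, only TransactionBegin/TransactionEnd events arrive and no row-data EventType is received.

A possible cause is canal.instance.filter.black.regex=.*\\..* , which blacklists every table. Set canal.instance.filter.black.regex= (empty) and restart to see whether that helps.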

Problem 3:

ERROR com.alibaba.otter.canal.common.alarm.LogAlarmHandler - destination:fdyb_db[com.alibaba.otter.canal.parse.exception.CanalParseException: com.alibaba.otter.canal.parse.exception.CanalParseException: parse row data failed.
Caused by: com.alibaba.otter.canal.parse.exception.CanalParseException: parse row data failed.
Caused by: com.alibaba.otter.canal.parse.exception.CanalParseException: com.google.common.collect.ComputationException: com.alibaba.otter.canal.parse.exception.CanalParseException: fetch failed by table meta:mysql.pds_4490277
Caused by: com.google.common.collect.ComputationException: com.alibaba.otter.canal.parse.exception.CanalParseException: fetch failed by table meta:mysql.pds_4490277
Caused by: com.alibaba.otter.canal.parse.exception.CanalParseException: fetch failed by table meta:mysql.pds_4490277
Caused by: java.io.IOException: ErrorPacket [errorNumber=1142, fieldCount=-1, message=SELECT command denied to user 'cy_canal'@'11.217.0.224' for table 'pds_4490277', sqlState=42000, sqlStateMarker=#]
with command: desc mysql.pds_4490277

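Analysis: tables in the mysql system schema require higher privileges than the canal user has. The desc mysql.pds_4490277 that canal issues to fetch the table's metadata is denied, so parsing the binlog events for that table fails and the position cannot advance.

Solution: add every table under the mysql schema to the blacklist setting: canal.instance.filter.black.regex = mysql\\..* . After the change the canal cluster recovers without a restart.

Another point to check: whether the CanalConnector calls subscribe(filter). If it does, the filter must be the same as canal.instance.filter.regex in instance.properties; otherwise the filter passed to subscribe overrides the instance configuration. If the subscribe filter is .*\\..* , you are effectively consuming every change.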
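
To see which tables a given whitelist (canal.instance.filter.regex) and blacklist (canal.instance.filter.black.regex) would let through, the rules can be approximated in a few lines. This is a sketch using Python's `re` module, not canal's own matcher. It assumes each filter is a comma-separated list of regular expressions that must match the whole "schema.table" name. In a properties file a literal dot is written `\\.`, which reaches the regex engine as `\.`, so the Python raw strings below use a single backslash.

```python
import re


def matches_filter(filter_value, full_name):
    """True if full_name ("schema.table") fully matches any of the
    comma-separated regular expressions in filter_value."""
    patterns = [p.strip() for p in filter_value.split(",") if p.strip()]
    return any(re.fullmatch(p, full_name) for p in patterns)


def is_captured(full_name, white, black=""):
    """A table is captured when it is whitelisted and not blacklisted."""
    return matches_filter(white, full_name) and not matches_filter(black, full_name)


if __name__ == "__main__":
    everything = r".*\..*"
    # Blacklisting the mysql schema drops its tables but keeps the rest.
    print(is_captured("mysql.pds_4490277", everything, r"mysql\..*"))
    print(is_captured("material_1703.bi_bill", everything, r"mysql\..*"))
    # Blacklisting everything leaves no row data at all.
    print(is_captured("material_1703.bi_bill", everything, everything))
```

One detail this makes visible: in material_1703.bi_bill the dot is unescaped, so it matches any character, not only the separator between schema and table.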

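Problem 4:

Symptom: after the database is modified, the canal application does not see the binlog changes, and data is not consumed or processed.

Diagnosis: 1. Checked the logs of the canal server, the canal application and the ZooKeeper servers: nothing abnormal. 2. Checked the MySQL and Elasticsearch servers: nothing abnormal. 3. Checked the configuration of the canal server and the canal application, and found a problem in the canal server's canal.properties.

Cause: canal.properties set both canal.ip and canal.zkServers. When a canal server running in ZooKeeper cluster mode also has canal.ip configured, the connection to the canal server is made by IP first, which bypasses ZooKeeper, and the position file is stored locally instead. If that local position file goes wrong, no component logs an error and the problem is very hard to track down.

Solution: set canal.ip to empty, shut down the canal server and the canal application, delete the nodes on ZooKeeper, then restart the canal server and the canal application.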
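
Because this misconfiguration produces no error log, a small pre-flight check on canal.properties can catch it. The following is a minimal sketch with a deliberately simple key=value parser. It does not handle line continuations or escapes, and the helper names are illustrative.

```python
def parse_properties(text):
    """Parse simple key=value lines; blank lines and '#' comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def has_ip_zk_conflict(props):
    """True when canal.ip and canal.zkServers are both non-empty."""
    return bool(props.get("canal.ip")) and bool(props.get("canal.zkServers"))


if __name__ == "__main__":
    # Sample values reuse addresses that appear elsewhere in this article.
    sample = "canal.ip = 172.26.100.222\ncanal.zkServers =39.98.41.26:2181\n"
    print(has_ip_zk_conflict(parse_properties(sample)))
```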
