Building a near-real-time index with canal

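canal is a middleware from Alibaba whose source is MySQL and whose target is some other store. It relies on MySQL's master-slave replication mechanism: canal disguises itself as a MySQL replica, picks up changes in MySQL's binary log (binlog), and emits them as structured data that a target-side consumer converts into its own data model. In this way, data that changes in MySQL can be piped into other storage systems.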
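To make the flow above concrete, here is a minimal in-memory sketch in Java. It assumes a heavily simplified event model: the `BinlogFlowSketch` class and its `RowChange` and `EventType` types are hypothetical and are not canal's client API. They only show the idea of a consumer turning ROW-format change events (a primary key plus the row image after the change) into writes on a target store keyed by that primary key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of canal's data flow (NOT canal's client API):
// ROW-format binlog events become structured row changes, and a consumer
// converts each change into a write on the target store.
class BinlogFlowSketch {
    enum EventType { INSERT, UPDATE, DELETE }

    static final class RowChange {
        final EventType type;
        final Object id;                 // primary key of the changed row
        final Map<String, Object> after; // row image after the change (empty for DELETE)

        RowChange(EventType type, Object id, Map<String, Object> after) {
            this.type = type;
            this.id = id;
            this.after = after;
        }
    }

    // Stand-in for the target store (e.g. a search index), keyed by primary key.
    final Map<Object, Map<String, Object>> target = new LinkedHashMap<>();

    // Consumer side: map one structured change onto the target model.
    void apply(RowChange change) {
        switch (change.type) {
            case INSERT:
            case UPDATE:
                target.put(change.id, new LinkedHashMap<>(change.after));
                break;
            case DELETE:
                target.remove(change.id);
                break;
        }
    }

    static Map<String, Object> row(Object id, String name) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("id", id);
        r.put("name", name);
        return r;
    }

    public static void main(String[] args) {
        BinlogFlowSketch sketch = new BinlogFlowSketch();
        sketch.apply(new RowChange(EventType.INSERT, 1, row(1, "old name")));
        sketch.apply(new RowChange(EventType.UPDATE, 1, row(1, "new name")));
        sketch.apply(new RowChange(EventType.INSERT, 2, row(2, "temp")));
        sketch.apply(new RowChange(EventType.DELETE, 2, new LinkedHashMap<>()));
        System.out.println(sketch.target); // {1={id=1, name=new name}}
    }
}
```

Running `main` prints the final state of the stand-in store: row 1 carries the updated name, and row 2 has been removed again by the DELETE event.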
Download and extract

Download Alibaba's canal components (download link: canal), upload them to the /opt/software directory on the cluster node, and then extract them into /opt/apps:

# For convenience I downloaded all four packages below; if you only need to use canal, adapter and deployer are enough
# Before extracting, create the /opt/apps/adapter-1.1.4, /opt/apps/admin-1.1.4, /opt/apps/deployer-1.1.4 and /opt/apps/example-1.1.4 directories
[yangqi@yankee software]$ mkdir -p ../apps/adapter-1.1.4 ../apps/admin-1.1.4 ../apps/deployer-1.1.4 ../apps/example-1.1.4
[yangqi@yankee software]$ tar -zvxf canal.adapter-1.1.4.tar.gz -C ../apps/adapter-1.1.4
[yangqi@yankee software]$ tar -zvxf canal.admin-1.1.4.tar.gz -C ../apps/admin-1.1.4
[yangqi@yankee software]$ tar -zvxf canal.deployer-1.1.4.tar.gz -C ../apps/deployer-1.1.4
[yangqi@yankee software]$ tar -zvxf canal.example-1.1.4.tar.gz -C ../apps/example-1.1.4
Configure MySQL

Enable master-slave replication in MySQL:

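# MySQL does not enable master-slave replication by default, so first configure this MySQL instance as the master
# If MySQL was installed from rpm packages on Linux, the my.cnf configuration file is usually /etc/my.cnf
[yangqi@yankee software]$ sudo vi /etc/my.cnf
# Add the following at the end (the options must sit in the [mysqld] section):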
=====================================================================
server-id=1
binlog_format=ROW
log_bin=mysql_bin
=====================================================================
# After configuring, restart the mysqld service
[yangqi@yankee software]$ sudo systemctl restart mysqld

# Check whether the configuration took effect
# Connect to mysql
[yangqi@yankee software]$ mysql -u root -pxiaoer
mysql> show variables like 'log_bin';
# Output like the following means binlog (log_bin) is enabled on this MySQL node
=====================================================================
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_bin       | ON    |
+---------------+-------+
1 row in set (0.07 sec)
=====================================================================
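canal consumes row-level change events, so all three my.cnf options above matter: a server-id, an enabled binlog and `binlog_format=ROW`, all inside the `[mysqld]` section. The following Java sketch is a quick offline check of a my.cnf text for those options. `MyCnfCheck` is a hypothetical helper, the sample `datadir` line is made up, and it only understands simple `key=value` lines (it ignores `!include` directives and other my.cnf syntax). Because MySQL treats `-` and `_` in option names as equivalent, the sketch normalizes dashes to underscores.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: does a my.cnf text enable what canal needs in [mysqld]?
class MyCnfCheck {
    // Collect key=value pairs of the [mysqld] section only.
    static Map<String, String> mysqldSection(String cnf) {
        Map<String, String> values = new HashMap<>();
        String section = "";
        for (String raw : cnf.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) continue;
            if (line.startsWith("[") && line.endsWith("]")) {
                section = line.substring(1, line.length() - 1).trim();
                continue;
            }
            if (!section.equals("mysqld")) continue;
            int eq = line.indexOf('=');
            // MySQL accepts both server-id and server_id; normalize to underscores.
            String key = (eq < 0 ? line : line.substring(0, eq)).trim().replace('-', '_');
            String value = eq < 0 ? "" : line.substring(eq + 1).trim();
            values.put(key, value);
        }
        return values;
    }

    // server-id set, binlog enabled, and row-based binlog format.
    static boolean readyForCanal(String cnf) {
        Map<String, String> v = mysqldSection(cnf);
        return v.containsKey("server_id")
            && v.containsKey("log_bin")
            && "ROW".equalsIgnoreCase(v.get("binlog_format"));
    }

    public static void main(String[] args) {
        String cnf = "[mysqld]\n"
            + "datadir=/var/lib/mysql\n"
            + "server-id=1\n"
            + "binlog_format=ROW\n"
            + "log_bin=mysql_bin\n";
        System.out.println(readyForCanal(cnf)); // true
    }
}
```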

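# The root account is normally not handed to replication, so a dedicated account is needed; I had already created one (yangqi), so I skip that step here
# It still has to be granted the SELECT, REPLICATION SLAVE and REPLICATION CLIENT privileges
# (GRANT ... IDENTIFIED BY is deprecated in MySQL 5.7 and removed in MySQL 8.0; on 8.0, create the user with CREATE USER first and omit IDENTIFIED BY from GRANT)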
mysql> grant select,replication slave,replication client on *.* to 'yangqi'@'%' identified by 'xiaoer';
# Output like the following means the grant succeeded
=====================================================================
Query OK, 0 rows affected, 1 warning (0.05 sec)
=====================================================================
# The yangqi account also needs the same privileges when connecting from localhost
mysql> grant select,replication slave,replication client on *.* to 'yangqi'@'localhost' identified by 'xiaoer';
# Output like the following means the grant succeeded
=====================================================================
Query OK, 0 rows affected, 1 warning (0.05 sec)
=====================================================================
# Reload the privilege tables
mysql> flush privileges;
Configure the canal pipeline
[yangqi@yankee apps]$ cd deployer-1.1.4/conf/example
# Edit the instance.properties file
[yangqi@yankee example]$ vi instance.properties
# Change the following entries
=====================================================================
## mysql serverId , v1.0.26+ will autoGen
canal.instance.mysql.slaveId=2

# username/password
canal.instance.dbUsername=yangqi
canal.instance.dbPassword=xiaoer
=====================================================================
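Because canal registers with MySQL as a replica, the `slaveId` set here must not collide with the MySQL server's own `server-id` (1 in the my.cnf above), and the username must be the account that was granted the replication privileges. A minimal Java sketch of such a pre-flight check using `java.util.Properties`; `InstancePropertiesCheck` is a hypothetical helper, and the inline sample text stands in for the real conf/example/instance.properties file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical pre-flight check for canal's instance.properties.
class InstancePropertiesCheck {
    static String check(Properties p, long masterServerId) {
        String slaveId = p.getProperty("canal.instance.mysql.slaveId", "").trim();
        // A replica must not reuse the master's server-id.
        if (!slaveId.isEmpty() && Long.parseLong(slaveId) == masterServerId) {
            return "slaveId must differ from the MySQL server-id";
        }
        if (p.getProperty("canal.instance.dbUsername", "").trim().isEmpty()) {
            return "canal.instance.dbUsername is empty";
        }
        return "ok";
    }

    public static void main(String[] args) throws IOException {
        Properties p = new Properties();
        // In practice, load the real file with a FileReader instead of this sample text.
        p.load(new StringReader(
            "canal.instance.mysql.slaveId=2\n"
            + "canal.instance.dbUsername=yangqi\n"
            + "canal.instance.dbPassword=xiaoer\n"));
        System.out.println(check(p, 1)); // ok
    }
}
```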
# Start the deployer
[yangqi@yankee example]$ cd ../../
[yangqi@yankee deployer-1.1.4]$ bin/startup.sh
# Check whether the deployer is running
[yangqi@yankee deployer-1.1.4]$ ps -ef | grep canal
# Or check whether port 11111 is in use
=====================================================================
[yangqi@yankee deployer-1.1.4]$ netstat -ntulp | grep 11111
# Output like the following means the deployer started successfully
tcp        0      0 0.0.0.0:11111           0.0.0.0:*               LISTEN      45531/java  
=====================================================================
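If `netstat` is not available, the same check can be done from Java with a plain TCP connect. A minimal sketch; `PortProbe` is a hypothetical helper, and host 127.0.0.1 and port 11111 are the values used in this article.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical TCP probe: is anything accepting connections on host:port?
class PortProbe {
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;  // connection accepted
        } catch (IOException e) {
            return false; // refused, unreachable or timed out
        }
    }

    public static void main(String[] args) {
        System.out.println("canal deployer reachable: " + isListening("127.0.0.1", 11111, 1000));
    }
}
```

A successful connect only shows that some process is listening on the port; the deployer's logs remain the place to confirm that the instance itself started correctly.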
Startup errors

Sometimes the deployer fails to start because of memory. Check the log file logs/canal/canal_stdout.log; if the error looks similar to the following:

# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 1073741824 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /opt/apps/deployer-1.1.4/bin/hs_err_pid45386.log
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=96m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: UseCMSCompactAtFullCollection is deprecated and will likely be removed in a future release.
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000700000000, 1073741824, 0) failed; error='Cannot allocate memory' (errno=12)

the failure is most likely caused by insufficient memory, and you can adjust the following parameters in startup.sh:

# Adjust -Xms, -Xmx and -Xmn to fit your machine; keep the young generation (-Xmn, or -XX:NewSize/-XX:MaxNewSize) smaller than the whole heap (-Xmx)
# -XX:PermSize and -XX:MaxPermSize are ignored on Java 8 and later, as the warnings above show
if [ -n "$str" ]; then
        JAVA_OPTS="-server -Xms256m -Xmx256m -Xmn128m -XX:SurvivorRatio=2 -XX:PermSize=96m -XX:MaxPermSize=256m -Xss256k -XX:-UseAdaptiveSizePolicy -XX:MaxTenuringThreshold=15 -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError"
else
        JAVA_OPTS="-server -Xms256m -Xmx256m -XX:NewSize=128m -XX:MaxNewSize=128m -XX:MaxPermSize=128m "
fi
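The heap flags interact: the young generation set by `-Xmn` is carved out of the heap capped by `-Xmx`, so it should stay below `-Xmx`, and `-Xms` should not exceed `-Xmx`. As an aside, the 1073741824 bytes in the mmap error above is exactly 1 GiB (1024 × 1024 × 1024). A small Java sketch that parses these sizes from a JAVA_OPTS string and applies those two rules; `JavaOptsCheck` is hypothetical, and it only recognizes plain byte counts and the `k`/`m`/`g` suffixes.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sanity check for the heap flags in a JAVA_OPTS string.
class JavaOptsCheck {
    // "256m" -> 268435456, "1g" -> 1073741824, "512k" -> 524288
    static long parseSize(String s) {
        String v = s.trim().toLowerCase(Locale.ROOT);
        long unit = 1;
        char last = v.charAt(v.length() - 1);
        if (last == 'k') unit = 1L << 10;
        else if (last == 'm') unit = 1L << 20;
        else if (last == 'g') unit = 1L << 30;
        if (unit != 1) v = v.substring(0, v.length() - 1);
        return Long.parseLong(v) * unit;
    }

    // Value of a flag such as -Xmx in bytes, or -1 if the flag is absent.
    static long option(String opts, String flag) {
        Matcher m = Pattern.compile("(?:^|\\s)" + Pattern.quote(flag) + "(\\d+[kmgKMG]?)").matcher(opts);
        return m.find() ? parseSize(m.group(1)) : -1;
    }

    static boolean looksSane(String opts) {
        long xms = option(opts, "-Xms"), xmx = option(opts, "-Xmx"), xmn = option(opts, "-Xmn");
        if (xmx < 0) return true; // no explicit max heap: nothing to compare against
        return (xms < 0 || xms <= xmx) && (xmn < 0 || xmn < xmx);
    }

    public static void main(String[] args) {
        String opts = "-server -Xms256m -Xmx256m -Xmn128m -XX:SurvivorRatio=2";
        System.out.println(looksSane(opts));   // true
        System.out.println(parseSize("1g"));   // 1073741824
    }
}
```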
Configure canal adapter
Modify the canal-adapter module source
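# The 1.1.4 adapter is not compatible with Elasticsearch 7.3.0, so download the canal source package and rebuild it locally; be sure to download the matching version (1.1.4)
# In the canal-adapter module, change the versions of the four Elasticsearch-related dependencies in pom.xml to 7.3.0
# Open a terminal in the root of the source tree (canal-canal-1.1.4 in my case) and run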
mvn clean package -DskipTests

# The first build may fail with the following error
=====================================================================
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.7.0:compile (default-compile) on project client-adapter.elasticsearch: Compilation failure
[ERROR] /E:/code/JavaEE/github/canal-canal-1.1.4/client-adapter/elasticsearch/src/main/java/com/alibaba/otter/canal/client/adapter/es/ESAdapter.java:[223,56] incompatible types: org.apache.lucene.search.TotalHits cannot be converted to long
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :client-adapter.elasticsearch
=====================================================================
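# Change line 223 of the ESAdapter class (the line reported in the error) to the following; in Elasticsearch 7, getTotalHits() returns a TotalHits object whose value field holds the count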
=====================================================================
long rowCount = response.getHits().getTotalHits().value;
=====================================================================

# The second build may fail with the following error
=====================================================================
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.7.0:compile (default-compile) on project client-adapter.elasticsearch: Compilation failure
[ERROR] /E:/code/JavaEE/github/canal-canal-1.1.4/client-adapter/elasticsearch/src/main/java/com/alibaba/otter/canal/client/adapter/es/support/ESConnection.java:[420,47] method bulk in class org.elasticsearch.client.RestHighLevelClient cannot be applied to given types;
[ERROR]   required: org.elasticsearch.action.bulk.BulkRequest,org.elasticsearch.client.RequestOptions
[ERROR]   found: org.elasticsearch.action.bulk.BulkRequest
[ERROR]   reason: actual and formal argument lists differ in length
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :client-adapter.elasticsearch
=====================================================================
# Change line 420 of the ESConnection class to the following (in Elasticsearch 7, bulk() also requires a RequestOptions argument)
=====================================================================
return restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
=====================================================================

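# After the build finishes, go to the client-adapter/launcher/target/ directory, upload the newly built canal-adapter to the apps directory on the cluster node, and delete the adapter-1.1.4 directory extracted earlier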
Configure adapter-1.1.4
# Rename the canal-adapter directory to adapter-1.1.4
[yangqi@yankee apps]$ mv canal-adapter adapter-1.1.4
# Edit the adapter-1.1.4 configuration
[yangqi@yankee adapter-1.1.4]$ vi ./conf/application.yml
# Change it to the following
=====================================================================
srcDataSources:
    defaultDS:
      url: jdbc:mysql://127.0.0.1:3306/recommendedsystem?useUnicode=true
      username: yangqi
      password: xiaoer
      
- name: es
  # address of your own ES cluster
  hosts: 192.168.21.89:9300
  properties:
    mode: transport
    # security.auth: test:123456 #  only used for rest mode
    # name of your own ES cluster
    cluster.name: Yankee
=====================================================================

# Configure the ES mapping: create shop.yml in the adapter-1.1.4/conf/es directory
[yangqi@yankee adapter-1.1.4]$ vi ./conf/es/shop.yml
# Write the following content
=====================================================================
dataSourceKey: defaultDS
destination: example
groupId: 
esMapping: 
    _index: shop
    _type: _doc
    _id: id
    upsert: true
    sql: "select a.id, a.name, a.tags, concat(a.latitude, ',', a.longitude) as location, a.remark_score, a.price_per_man, a.category_id, b.name as category_name, a.seller_id, c.remark_score as seller_remark_score, c.disabled_flag as seller_disabled_flag from shop a inner join category b on a.category_id = b.id inner join seller c on c.id = a.seller_id"
    commitBatch: 3000
=====================================================================
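The `sql` in this mapping flattens one shop, its category and its seller into a single document: the clashing `name` columns are told apart by aliasing `b.name` to `category_name`, and `concat(a.latitude, ',', a.longitude)` produces the `"lat,lon"` string form that Elasticsearch accepts for a `geo_point` field (latitude comes first in this string form). The Java sketch below mimics that flattening for one row in memory. `ShopDocumentSketch` and all sample values are hypothetical, and it assumes `location` is mapped as `geo_point` in the `shop` index.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory version of the esMapping SQL: one joined row -> one document.
class ShopDocumentSketch {
    static Map<String, Object> toDocument(Map<String, Object> shop,
                                          Map<String, Object> category,
                                          Map<String, Object> seller) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", shop.get("id"));
        doc.put("name", shop.get("name"));
        doc.put("tags", shop.get("tags"));
        // concat(a.latitude, ',', a.longitude) -> "lat,lon"
        doc.put("location", shop.get("latitude") + "," + shop.get("longitude"));
        doc.put("remark_score", shop.get("remark_score"));
        doc.put("price_per_man", shop.get("price_per_man"));
        doc.put("category_id", shop.get("category_id"));
        doc.put("category_name", category.get("name"));          // b.name as category_name
        doc.put("seller_id", shop.get("seller_id"));
        doc.put("seller_remark_score", seller.get("remark_score")); // c.remark_score
        doc.put("seller_disabled_flag", seller.get("disabled_flag"));
        return doc;
    }

    // Build a map from alternating key/value arguments.
    static Map<String, Object> map(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) m.put((String) kv[i], kv[i + 1]);
        return m;
    }

    public static void main(String[] args) {
        Map<String, Object> shop = map("id", 1, "name", "Shop A", "tags", "new",
                "latitude", "31.23", "longitude", "121.47", "remark_score", 4.5,
                "price_per_man", 80, "category_id", 3, "seller_id", 7);
        Map<String, Object> category = map("id", 3, "name", "Food");
        Map<String, Object> seller = map("id", 7, "remark_score", 4.8, "disabled_flag", 0);
        System.out.println(toDocument(shop, category, seller).get("location")); // 31.23,121.47
    }
}
```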
Start the adapter
# Because adapter-1.1.4 was freshly built, grant execute permission on bin/startup.sh and bin/stop.sh
[yangqi@yankee adapter-1.1.4]$ chmod 764 bin/startup.sh
[yangqi@yankee adapter-1.1.4]$ chmod 764 bin/stop.sh
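`764` is an octal mode: each digit is an r/w/x triple (r=4, w=2, x=1) for owner, group and others, so 7 = rwx, 6 = rw- and 4 = r--. Only the owner gets execute permission, which is enough when, as here, the scripts are run by their owner. A small Java sketch of that conversion; `OctalMode` is a hypothetical helper, and `PosixFilePermissions.fromString` is only used to turn the result into a permission set.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical helper: octal chmod mode -> "rwxrw-r--" style string.
class OctalMode {
    static String toRwx(int mode) {
        StringBuilder sb = new StringBuilder(9);
        // Owner, group, others: 3 bits each, highest triple first.
        for (int shift = 6; shift >= 0; shift -= 3) {
            int bits = (mode >> shift) & 7;
            sb.append((bits & 4) != 0 ? 'r' : '-');
            sb.append((bits & 2) != 0 ? 'w' : '-');
            sb.append((bits & 1) != 0 ? 'x' : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String rwx = toRwx(0764); // Java octal literal
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString(rwx);
        System.out.println(rwx + " -> " + perms.size() + " permission bits set"); // rwxrw-r-- -> 6 permission bits set
    }
}
```

On a POSIX file system, passing that set to `Files.setPosixFilePermissions` would apply the same mode as the `chmod` calls above.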

# Start the adapter
[yangqi@yankee adapter-1.1.4]$ bin/startup.sh
Startup errors

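Sometimes the adapter fails to start because of memory. Check the JVM error file bin/hs_err_pid48030.log (the number is the process id, so yours will differ); if it contains something like the following: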

Memory: 4k page, physical 1863104k(71356k free), swap 4001788k(578132k free)

the failure is most likely caused by insufficient memory, and you can adjust the following parameters in the adapter's startup.sh:

# Adjust -Xms, -Xmx and -Xmn to fit your machine; keep the young generation (-Xmn, or -XX:NewSize/-XX:MaxNewSize) smaller than the whole heap (-Xmx)
if [ -n "$str" ]; then
        JAVA_OPTS="-server -Xms256m -Xmx256m -Xmn128m -XX:SurvivorRatio=2 -XX:PermSize=96m -XX:MaxPermSize=256m -Xss256k -XX:-UseAdaptiveSizePolicy -XX:MaxTenuringThreshold=15 -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError"
else
        JAVA_OPTS="-server -Xms256m -Xmx256m -XX:NewSize=128m -XX:MaxNewSize=128m -XX:MaxPermSize=128m "
fi
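Start
# Start the adapter
[yangqi@yankee adapter-1.1.4]$ bin/startup.sh
# Check whether the adapter is running
[yangqi@yankee adapter-1.1.4]$ ps -ef | grep canal
# Or check the connections on port 11111: the adapter connects to the deployer as a client, so it should hold an established connection to that port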
=====================================================================
[yangqi@yankee adapter-1.1.4]$ netstat -en | grep 11111
# Output like the following (an ESTABLISHED connection on port 11111) means the adapter started successfully
tcp        0      0 127.0.0.1:44766         127.0.0.1:11111         ESTABLISHED 1000       5460274 
tcp        0      0 127.0.0.1:11111         127.0.0.1:44766         ESTABLISHED 1000       5459446
=====================================================================
Test canal
# Keep watching adapter-1.1.4/logs/adapter/adapter.log
[yangqi@yankee adapter-1.1.4]$ tail -f logs/adapter/adapter.log

# Modify data in the MySQL database; adapter.log prints the modified content almost immediately

[Screenshot: adapter.log printing the modified rows]

However, adapter.log also shows the following error:

[Screenshot: error in adapter.log]

Edit application.yml and delete the following:

# Delete this line from the es block
mode: transport

Restart the adapter, test again, and watch the contents of adapter.log:

[Screenshot: adapter.log after the restart]

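Build approach
After canal detects that data in MySQL has changed, it updates the index in near real time. During the update, canal determines which id changed and updates only the content for that id. However, the update is not very smart. Suppose we modify the name field of the row with id 1: canal only updates the name value for id 1, but if the document contains two different name fields at the same time (in the shop mapping above, both shop and category have a name column), canal modifies both of them, setting each to the value just changed in the database.
So building the index directly with the adapter clearly cannot handle more complex cases.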