Building a near-real-time index with canal
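canal is a middleware component from Alibaba whose source is MySQL and whose target is some other storage system. It relies on MySQL's master-slave replication mechanism: canal poses as a MySQL replica to detect changes in MySQL's binlog, and emits structured data that the target-side consumer converts into its own data model. In this way, data that changes in MySQL can be piped into other stores.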
Download and extract
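Download Alibaba's canal components (download link: canal), upload them to the /opt/software directory on the cluster node, and then extract them into the /opt/apps directory: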
# For convenience I downloaded all four packages below; if you only need to use canal, the adapter and deployer packages are enough
# Before extracting, create the /opt/apps/adapter-1.1.4, /opt/apps/admin-1.1.4, /opt/apps/deployer-1.1.4 and /opt/apps/example-1.1.4 directories
[yangqi@yankee software]$ mkdir -p ../apps/adapter-1.1.4 ../apps/admin-1.1.4 ../apps/deployer-1.1.4 ../apps/example-1.1.4
[yangqi@yankee software]$ tar -zvxf canal.adapter-1.1.4.tar.gz -C ../apps/adapter-1.1.4
[yangqi@yankee software]$ tar -zvxf canal.admin-1.1.4.tar.gz -C ../apps/admin-1.1.4
[yangqi@yankee software]$ tar -zvxf canal.deployer-1.1.4.tar.gz -C ../apps/deployer-1.1.4
[yangqi@yankee software]$ tar -zvxf canal.example-1.1.4.tar.gz -C ../apps/example-1.1.4
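Configuring MySQL
Configure MySQL to enable master-slave replication:
# MySQL does not enable master-slave replication by default, so first configure this MySQL instance as a master node
# On Linux, if MySQL was installed via rpm, the my.cnf configuration file is usually at /etc/my.cnf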
[yangqi@yankee software]$ sudo vi /etc/my.cnf
# Append the following at the end of the file (the settings must sit in the [mysqld] section):
=====================================================================
server-id=1
binlog_format=ROW
log_bin=mysql_bin
=====================================================================
# After editing, restart the mysqld service
[yangqi@yankee software]$ sudo systemctl restart mysqld
# Check that the configuration took effect
# Connect to MySQL
[yangqi@yankee software]$ mysql -u root -pxiaoer
mysql> show variables like 'log_bin';
# Output like the following means the binlog is now enabled on this node's MySQL
=====================================================================
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_bin | ON |
+---------------+-------+
1 row in set (0.07 sec)
=====================================================================
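# The root account is normally not handed to replication, so a dedicated account is needed; I had already created one, so I do not create it again here
# It still has to be granted the select, replication slave and replication client privileges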
mysql> grant select,replication slave,replication client on *.* to 'yangqi'@'%' identified by 'xiaoer';
# Output like the following means the grant succeeded
=====================================================================
Query OK, 0 rows affected, 1 warning (0.05 sec)
=====================================================================
# The yangqi account also needs the same privileges for connections from localhost
mysql> grant select,replication slave,replication client on *.* to 'yangqi'@'localhost' identified by 'xiaoer';
# Output like the following means the grant succeeded
=====================================================================
Query OK, 0 rows affected, 1 warning (0.05 sec)
=====================================================================
# Reload the privileges
mysql> flush privileges;
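The my.cnf settings from the steps above can be double-checked before mysqld is restarted. The following is only a minimal sketch, not part of canal or MySQL: check_binlog_conf is a hypothetical helper name, and it assumes each setting is written on its own line as key=value. It accepts both the dash and the underscore spelling, since MySQL treats them as equivalent in option names.

```shell
# Sketch: confirm that a my.cnf-style file contains the three replication
# settings used in this walkthrough. check_binlog_conf is a hypothetical name.
check_binlog_conf() {
  local file="$1" missing=0 pattern
  for pattern in \
    '^[[:space:]]*server[-_]id[[:space:]]*=[[:space:]]*[0-9]+' \
    '^[[:space:]]*binlog[-_]format[[:space:]]*=[[:space:]]*ROW' \
    '^[[:space:]]*log[-_]bin[[:space:]]*='; do
    # -E: extended regex, -i: ignore case, -q: only report via exit status
    if ! grep -Eiq "$pattern" "$file"; then
      echo "missing setting matching: $pattern" >&2
      missing=1
    fi
  done
  # 0 when all three settings were found, 1 otherwise
  return "$missing"
}
```

It could be called as check_binlog_conf /etc/my.cnf; a non-zero exit status means at least one of server-id, binlog_format=ROW or log_bin was not found.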
Configuring the canal pipeline
[yangqi@yankee apps]$ cd deployer-1.1.4/conf/example
# Edit the instance.properties file
[yangqi@yankee example]$ vi instance.properties
# Change the following settings
=====================================================================
## mysql serverId , v1.0.26+ will autoGen
canal.instance.mysql.slaveId=2
# username/password
canal.instance.dbUsername=yangqi
canal.instance.dbPassword=xiaoer
=====================================================================
# Start the deployer
[yangqi@yankee example]$ cd ../../
[yangqi@yankee deployer-1.1.4]$ bin/startup.sh
# Check whether the deployer is running
[yangqi@yankee deployer-1.1.4]$ ps -ef | grep canal
# Or check whether port 11111 is in use
=====================================================================
[yangqi@yankee deployer-1.1.4]$ netstat -ntulp | grep 11111
# Output like the following means the deployer started successfully
tcp 0 0 0.0.0.0:11111 0.0.0.0:* LISTEN 45531/java
=====================================================================
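The port check above can be scripted so that it yields an exit status instead of output to read by eye. This is a rough sketch under stated assumptions: is_listening is a hypothetical helper, and it expects netstat -ntulp style lines on standard input, with the local address in the fourth column and the state in the sixth, as in the sample output above.

```shell
# Sketch: read `netstat -ntulp` style output on stdin and succeed only if
# some socket is in LISTEN state on the given port. Hypothetical helper.
is_listening() {
  local port="$1"
  awk -v suffix=":${port}" '
    # $4 is the local address (e.g. 0.0.0.0:11111), $6 is the socket state;
    # compare the tail of $4 with ":<port>" so 1111 does not match 11111
    $6 == "LISTEN" && substr($4, length($4) - length(suffix) + 1) == suffix { found = 1 }
    END { exit !found }
  '
}
```

For example, netstat -ntulp 2>/dev/null | is_listening 11111 && echo "deployer is up". Matching on the LISTEN state means an ESTABLISHED client connection to the same port is not mistaken for the server socket.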
Startup errors
Sometimes startup fails because of insufficient memory. Check the log file logs/canal/canal_stdout.log; if it contains an error like the following:
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 1073741824 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /opt/apps/deployer-1.1.4/bin/hs_err_pid45386.log
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=96m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: UseCMSCompactAtFullCollection is deprecated and will likely be removed in a future release.
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000700000000, 1073741824, 0) failed; error='Cannot allocate memory' (errno=12)
This may be caused by insufficient memory; adjust the following parameters in startup.sh:
# Adjust the -Xms, -Xmx and -Xmn values to suit your machine
if [ -n "$str" ]; then
    JAVA_OPTS="-server -Xms256m -Xmx256m -Xmn256m -XX:SurvivorRatio=2 -XX:PermSize=96m -XX:MaxPermSize=256m -Xss256k -XX:-UseAdaptiveSizePolicy -XX:MaxTenuringThreshold=15 -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError"
else
    JAVA_OPTS="-server -Xms256m -Xmx256m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:MaxPermSize=128m "
fi
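When choosing heap values for startup.sh it can help to compare the requested maximum heap with the memory that is actually free. The helper below is a small sketch, assuming the option is written as -Xmx followed by a number and an m or g suffix, as in the JAVA_OPTS lines above; xmx_mib is a hypothetical name, not something shipped with canal.

```shell
# Sketch: extract the -Xmx value, in MiB, from a JAVA_OPTS-style string.
# xmx_mib is a hypothetical helper; only m/M and g/G suffixes are handled.
xmx_mib() {
  local opts="$1" value unit
  # one option per line, keep "<number> <unit>" of the last -Xmx option
  value=$(printf '%s\n' "$opts" | tr ' ' '\n' \
    | sed -n 's/^-Xmx\([0-9][0-9]*\)\([mMgG]\)$/\1 \2/p' | tail -n 1)
  [ -n "$value" ] || return 1
  unit=${value#* }
  value=${value% *}
  case "$unit" in
    g|G) echo $((value * 1024)) ;;
    *)   echo "$value" ;;
  esac
}
```

On Linux the result can be compared with the available memory, for instance the MemAvailable line of /proc/meminfo (reported in kB), before starting the deployer or the adapter.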
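Configuring the canal adapter
Modify the canal-adapter module source
# The adapter is not compatible with elasticsearch-7.3.0, so download the source package and recompile it locally; be sure to pick the matching version when downloading
# In the canal-adapter module's pom.xml, change the version of the four elasticsearch-related dependencies to 7.3.0
# Open a command line, change to the root directory of the source tree (canal-canal-1.1.4 in my case), and run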
mvn clean package -DskipTests
# The first run may fail with the following error
=====================================================================
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.7.0:compile (default-compile) on project client-adapter.elasticsearch: Compilation failure
[ERROR] /E:/code/JavaEE/github/canal-canal-1.1.4/client-adapter/elasticsearch/src/main/java/com/alibaba/otter/canal/client/adapter/es/ESAdapter.java:[223,56] incompatible types: org.apache.lucene.search.TotalHits cannot be converted to long
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <args> -rf :client-adapter.elasticsearch
=====================================================================
# Change line 223 of the ESAdapter class (the line reported in the error above) to the following
=====================================================================
long rowCount = response.getHits().getTotalHits().value;
=====================================================================
# The second run may fail with the following error
=====================================================================
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.7.0:compile (default-compile) on project client-adapter.elasticsearch: Compilation failure
[ERROR] /E:/code/JavaEE/github/canal-canal-1.1.4/client-adapter/elasticsearch/src/main/java/com/alibaba/otter/canal/client/adapter/es/support/ESConnection.java:[420,47] method bulk in class org.elasticsearch.client.RestHighLevelClient cannot be applied to given types;
[ERROR] required: org.elasticsearch.action.bulk.BulkRequest,org.elasticsearch.client.RequestOptions
[ERROR] found: org.elasticsearch.action.bulk.BulkRequest
[ERROR] reason: actual and formal argument lists differ in length
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <args> -rf :client-adapter.elasticsearch
=====================================================================
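# Change line 420 of the ESConnection class to the following (add import org.elasticsearch.client.RequestOptions; if the class does not already import it)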
=====================================================================
return restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
=====================================================================
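# Once packaging finishes, go to the client-adapter/launcher/target/ directory, delete the previously extracted adapter-1.1.4 from the cluster node's apps directory, and upload the newly built canal-adapter there
Configuring adapter-1.1.4
# Rename the canal-adapter directory to adapter-1.1.4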
[yangqi@yankee apps]$ mv canal-adapter adapter-1.1.4
# Edit the adapter-1.1.4 configuration
[yangqi@yankee adapter-1.1.4]$ vi ./conf/application.yml
# Change it to the following
=====================================================================
srcDataSources:
  defaultDS:
    url: jdbc:mysql://127.0.0.1:3306/recommendedsystem?useUnicode=true
    username: yangqi
    password: xiaoer
# ... (settings in between left unchanged)
- name: es
  # address of your own es cluster
  hosts: 192.168.21.89:9300
  properties:
    mode: transport
    # security.auth: test:123456 # only used for rest mode
    # name of your own es cluster
    cluster.name: Yankee
=====================================================================
=====================================================================
# Configure the es mapping: create shop.yml under the adapter-1.1.4/conf/es directory
[yangqi@yankee adapter-1.1.4]$ vi ./conf/es/shop.yml
# Write the following content
=====================================================================
dataSourceKey: defaultDS
destination: example
groupId:
esMapping:
  _index: shop
  _type: _doc
  _id: id
  upsert: true
  sql: "select a.id, a.name, a.tags, concat(a.latitude, ',', a.longitude) as location, a.remark_score, a.price_per_man, a.category_id, b.name as category_name, a.seller_id, c.remark_score as seller_remark_score, c.disabled_flag as seller_disabled_flag from shop a inner join category b on a.category_id = b.id inner join seller c on c.id = a.seller_id"
  commitBatch: 3000
=====================================================================
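In this walkthrough the mapping file's destination is example, the same name as the deployer instance directory conf/example edited earlier. Assuming the two are meant to match (that is how they line up here), a quick consistency check could look like the sketch below; check_destination is a hypothetical helper and both paths are passed in by the caller.

```shell
# Sketch: read the `destination` key from an adapter mapping file and check
# that the deployer conf directory has an instance of the same name.
# check_destination is a hypothetical helper, not part of canal.
check_destination() {
  local mapping="$1" deployer_conf="$2" dest
  # take the first top-level "destination: <name>" line, ignoring trailing comments
  dest=$(sed -n 's/^destination:[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$mapping" | head -n 1)
  if [ -z "$dest" ]; then
    echo "no destination key in $mapping" >&2
    return 1
  fi
  if [ ! -f "$deployer_conf/$dest/instance.properties" ]; then
    echo "no instance '$dest' under $deployer_conf" >&2
    return 1
  fi
  echo "$dest"
}
```

With the layout used here it would be run as check_destination /opt/apps/adapter-1.1.4/conf/es/shop.yml /opt/apps/deployer-1.1.4/conf, printing the instance name on success.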
Starting the adapter
# Because adapter-1.1.4 was freshly compiled, bin/startup.sh and bin/stop.sh must be made executable
[yangqi@yankee adapter-1.1.4]$ chmod 764 bin/startup.sh
[yangqi@yankee adapter-1.1.4]$ chmod 764 bin/stop.sh
# Start the adapter
[yangqi@yankee adapter-1.1.4]$ bin/startup.sh
Startup errors
Sometimes startup fails because of insufficient memory. Check the log file bin/hs_err_pid48030.log; if it contains an error like the following:
Memory: 4k page, physical 1863104k(71356k free), swap 4001788k(578132k free)
This may be caused by insufficient memory; adjust the following parameters in startup.sh:
# Adjust the -Xms, -Xmx and -Xmn values to suit your machine
if [ -n "$str" ]; then
    JAVA_OPTS="-server -Xms256m -Xmx256m -Xmn256m -XX:SurvivorRatio=2 -XX:PermSize=96m -XX:MaxPermSize=256m -Xss256k -XX:-UseAdaptiveSizePolicy -XX:MaxTenuringThreshold=15 -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError"
else
    JAVA_OPTS="-server -Xms256m -Xmx256m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:MaxPermSize=128m "
fi
Startup
# Start the adapter
[yangqi@yankee adapter-1.1.4]$ bin/startup.sh
# Check whether the adapter is running
[yangqi@yankee adapter-1.1.4]$ ps -ef | grep canal
# Or check whether a connection to port 11111 has been established
=====================================================================
[yangqi@yankee adapter-1.1.4]$ netstat -en | grep 11111
# Output like the following means the adapter started successfully
tcp 0 0 127.0.0.1:44766 127.0.0.1:11111 ESTABLISHED 1000 5460274
tcp 0 0 127.0.0.1:11111 127.0.0.1:44766 ESTABLISHED 1000 5459446
=====================================================================
Testing canal
# Keep watching adapter-1.1.4/logs/adapter/adapter.log
[yangqi@yankee adapter-1.1.4]$ tail -f logs/adapter/adapter.log
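# Modify some data in the MySQL database; adapter.log prints the modified content almost immediately
Checking the adapter.log log shows the following error:
Edit the application.yml file and delete the following content:
# Delete the following line from the es block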
mode: transport
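Restart the adapter
Test again and watch the contents of adapter.log:
Build approach
Once canal detects that data in MySQL has changed, it applies a near-real-time update. During the update canal determines which id changed and updates the content belonging to that id. It is not very smart about this, however. Suppose we modify the name field: canal should only update the name value of the row with id 1. But if two different name fields exist at the same time, canal updates both of them, setting both to the value just written to the database.
So building the index directly with the adapter clearly cannot handle more complex cases.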