The system's queries were slow, so I wanted to use Elasticsearch to speed them up, and also to import PDF and CSV files into Elasticsearch.
The system uses Spring Boot 2.0.6.RELEASE, so the first step is to pin down the matching Elasticsearch version.
........
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.0.6.RELEASE</version>
    <relativePath/> <!-- lookup parent from repository -->
</parent>
<dependencies>
    <dependency>
        <groupId>org.springframework.data</groupId>
        <artifactId>spring-data-elasticsearch</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
......
You can see the resolved version numbers in IntelliJ IDEA's Maven panel.
Spring Boot 2.0.6 corresponds to Elasticsearch 5.6.12, so the stack is:
1. Spring Boot 2.0.6
2. JDK 8
3. Elasticsearch 5.6.12
4. elasticsearch-head 5.6.12 (a visualization plugin; keep its version in line with Elasticsearch to avoid problems)
5. Logstash 5.6.12, to import MySQL (or other databases) and CSV files into Elasticsearch
6. fscrawler-es5-2.6, to import PDF and Word files
References
1. Reference code (slightly modified): https://github.com/fonxian/spring-elasticsearch-example
2. Elasticsearch and Logstash past releases: https://www.elastic.co/cn/downloads/past-releases
3. Elasticsearch download: https://www.elastic.co/cn/downloads/past-releases/elasticsearch-5-6-12
4. Logstash download: https://www.elastic.co/cn/downloads/past-releases/logstash-5-6-12
5. Installing elasticsearch-head (Node.js must be installed first): https://www.cnblogs.com/asker009/p/10045125.html
6. FSCrawler download and installation guide: https://fscrawler.readthedocs.io/en/fscrawler-2.6/
7. A copy of all the installers is on Baidu Netdisk
Link: https://pan.baidu.com/s/1CzHfg0lvPI81kxgWpkLHxg
Extraction code: aake
Installation steps are omitted here.
Start the services
1. Start Elasticsearch
Double-click E:\soft\elasticsearch-5.6.12\bin\elasticsearch.bat, then open http://localhost:9200 in a browser. The response looks like:
{
  "name": "3cgkZDo",
  "cluster_name": "elasticsearch",
  "cluster_uuid": "7Np7tHwwSpy_T-khLB7utA",
  "version": {
    "number": "5.6.12",
    "build_hash": "cfe3d9f",
    "build_date": "2018-09-10T20:12:43.732Z",
    "build_snapshot": false,
    "lucene_version": "6.6.1"
  },
  "tagline": "You Know, for Search"
}
Only once Elasticsearch is running does this setting in the project's application.properties (in IntelliJ IDEA) take effect; otherwise the application fails with a connection error:
spring.data.elasticsearch.cluster-nodes = localhost:9300
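For completeness, a minimal sketch of the relevant application.properties entries for this setup (the cluster-name must match the cluster_name value Elasticsearch reports, which is "elasticsearch" by default):

```properties
# Transport client settings used by Spring Data Elasticsearch in Spring Boot 2.0.x.
# cluster-name must match the cluster_name shown at http://localhost:9200.
spring.data.elasticsearch.cluster-name = elasticsearch
spring.data.elasticsearch.cluster-nodes = localhost:9300
```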
2. Start elasticsearch-head
From a command prompt:
Microsoft Windows [Version 10.0.17763.615]
(c) 2018 Microsoft Corporation. All rights reserved.
C:\Users\lunmei>cd E:\soft\elasticsearch-head-master
C:\Users\lunmei>e:
E:\soft\elasticsearch-head-master>grunt server
Running "connect:server" (connect) task
Waiting forever...
Started connect web server on http://localhost:9100
3. Import CSV and MySQL data with Logstash
Create the file E:\soft\logstash-5.6.12\bin\logstash-csv.conf to import C:\Users\lunmei\Desktop\sys_role.csv into Elasticsearch:
input {
    file {
        path => ["C:\Users\lunmei\Desktop\sys_role.csv"]
        start_position => "beginning"
    }
}
filter {
    csv {
        separator => ","
        columns => ["id","name","value","tips","status","create_time","update_time"]
    }
    mutate {
        convert => {
            "id" => "string"
        }
    }
}
output {
    elasticsearch {
        hosts => ["127.0.0.1:9200"]
        index => "role"
        document_id => "%{id}"
        document_type => "role"
    }
    stdout {
        codec => rubydebug
    }
}
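To make the filter section concrete, here is a small Python sketch (an illustration only, with a made-up row; not part of the setup) of the transformation Logstash performs on each CSV line: split by the separator, zip the fields with the configured column names, and coerce id to a string:

```python
import csv
import io

# Column names configured in logstash-csv.conf
COLUMNS = ["id", "name", "value", "tips", "status", "create_time", "update_time"]

def parse_row(line):
    """Mimic the csv filter: split one CSV line and map fields to column names."""
    fields = next(csv.reader(io.StringIO(line)))
    event = dict(zip(COLUMNS, fields))
    # Mimic mutate/convert: force "id" to a string (it already is a string after
    # csv parsing here, but Logstash would coerce a numeric value at this point)
    event["id"] = str(event["id"])
    return event

# Hypothetical row from sys_role.csv
row = "1,administrator,admin,Super admin role,1,2019-01-01 10:00:00,2019-01-01 10:00:00"
doc = parse_row(row)
print(doc["id"])    # 1
print(doc["name"])  # administrator
```

The resulting dictionary is what the elasticsearch output block then indexes, with document_id taken from the "id" field.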
Create the file E:\soft\logstash-5.6.12\bin\logstash-mysql.conf to import SQL data into Elasticsearch:
input {
    stdin {
    }
    jdbc {
        jdbc_connection_string => "jdbc:mysql://192.168.0.100:3306/zillion-wfm?characterEncoding=UTF-8&useSSL=false&autoReconnect=true"
        jdbc_user => "root"
        jdbc_password => "root"
        jdbc_driver_library => "mysql-connector-java-5.1.47.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_paging_enabled => "true"
        jdbc_page_size => "50000"
        jdbc_default_timezone => "Asia/Shanghai"
        record_last_run => true
        use_column_value => true
        tracking_column => "price"
        last_run_metadata_path => "my_info_last"
        #statement_filepath => "jdbc-sql.sql"
        statement => "SELECT * FROM kn_knowledge"
        schedule => "* * * * *"
        type => "knowledge"
    }
}
filter {
    json {
        source => "message"
        remove_field => ["message"]
    }
}
output {
    elasticsearch {
        hosts => "127.0.0.1:9200"
        index => "knowledge"
        document_id => "%{id}"
    }
    stdout {
        codec => json_lines
    }
}
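One note on the jdbc input above: because use_column_value => true and tracking_column => "price" are set, Logstash stores the last seen value of that column (in my_info_last) and exposes it to the query as :sql_last_value. The full-table SELECT above re-reads everything on every schedule tick; an incremental variant would look like the following sketch (assuming the tracked column really does increase monotonically; substitute your own table's tracking column):

```
statement => "SELECT * FROM kn_knowledge WHERE price > :sql_last_value"
```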
Run the commands:
Microsoft Windows [Version 10.0.17763.615]
(c) 2018 Microsoft Corporation. All rights reserved.
C:\Users\lunmei>e:
E:\>logstash -f logstash-csv.conf
'logstash' is not recognized as an internal or external command,
operable program or batch file.
E:\>cd E:\soft\logstash-5.6.12\bin
E:\soft\logstash-5.6.12\bin>logstash -f logstash-csv.conf
Sending Logstash's logs to E:/soft/logstash-5.6.12/logs which is now configured via log4j2.properties
[2019-07-19T18:02:22,722][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"E:/soft/logstash-5.6.12/modules/fb_apache....
E:\soft\logstash-5.6.12\bin>logstash -f logstash-mysql.conf
Sending Logstash's logs to E:/soft/logstash-5.6.12/logs which is now configured via log4j2.properties
..........
[2019-07-19T18:02:48,618][FATAL][logstash.runner ] SIGINT received. Terminating immediately..
Terminate batch job (Y/N)?
^CThe system cannot open the device or file specified.
E:\soft\logstash-5.6.12\bin>
The CSV file and the MySQL data are now imported successfully.
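With the data imported, you can browse the role index in elasticsearch-head at http://localhost:9100, or query it over HTTP. As a small illustration (not from the original article), this Python sketch builds the JSON body you would POST to http://127.0.0.1:9200/role/_search; "name" is one of the CSV columns imported above and "administrator" is a made-up search term:

```python
import json

# Request body for POST http://127.0.0.1:9200/role/_search
# "name" is a field imported from sys_role.csv; the search term is hypothetical.
query = {
    "query": {
        "match": {"name": "administrator"}
    },
    "size": 10,
}
print(json.dumps(query, indent=2))
```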
4. Import PDF and Word files with FSCrawler
See the FSCrawler documentation site for the details.
The first step is to create a fscrawlerRunner.bat; once created, double-click it.
Start FSCrawler:
E:\soft\fscrawler-es5-2.6\bin>fscrawler test
17:00:23,187 WARN [f.p.e.c.f.c.FsCrawlerCli] job [test] does not exist
17:00:23,189 INFO [f.p.e.c.f.c.FsCrawlerCli] Do you want to create it (Y/N)?
y
17:00:30,241 INFO [f.p.e.c.f.c.FsCrawlerCli] Settings have been created in [C:\Users\lunmei\.fscrawler\test\_settings.json]. Please review and edit before relaunch
Note that test here is a variable: it is the job name. The first time a job is started, FSCrawler creates a _settings.json for it, which configures the files to crawl and the Elasticsearch connection; in this case it is clearly at C:\Users\lunmei\.fscrawler\test\_settings.json.
So whenever the documentation mentions _settings.json, that is where to find it; the file is created automatically. I wasted a lot of time on this at first.
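For orientation, the generated _settings.json roughly follows the shape below (field names per the FSCrawler 2.6 docs; the crawl directory and update rate here are illustrative placeholders, so review the actual generated file rather than copying this):

```json
{
  "name" : "test",
  "fs" : {
    "url" : "E:\\docs\\to-index",
    "update_rate" : "15m"
  },
  "elasticsearch" : {
    "nodes" : [ {
      "host" : "127.0.0.1",
      "port" : 9200,
      "scheme" : "HTTP"
    } ]
  }
}
```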
After editing this file, run "fscrawler test" again to import the files into Elasticsearch.
I have a Word document whose content is "你好啊,你好啊" ("hello there, hello there"), but I would like to import it into Elasticsearch field by field. That still needs more research. Onward!