Solr 5.5.0 + Tomcat 7.0.69 + ZooKeeper 3.4.6 Cloud Deployment

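Introduction to Solr:
Solr is a standalone, enterprise-grade search server built on Lucene. It extends Lucene with a richer query language, is configurable and extensible, has optimized query performance, and ships with a full-featured administration UI, which makes it an excellent full-text search engine. Solr exposes a web-service-style API: clients submit documents in formats such as XML, JSON, or plain text over HTTP to build the index, and issue HTTP GET requests to search, receiving the results in JSON format.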

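Questions to consider before adopting Solr in a project:
1. Update frequency: how much data is added each day, and is it updated continuously or on a schedule?
2. Total data volume: how long must the data be retained?
3. Consistency requirements: how soon should updated data become visible, and what is the maximum acceptable delay?
4. Data characteristics: what are the data sources, and what is the average size of a record?
5. Business characteristics: what sorting requirements and search criteria exist?
6. Resource reuse: what hardware is already available, and is an upgrade planned?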

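SolrCloud: Solr's distributed scaling solution
SolrCloud is a distributed solution built on ZooKeeper and Solr. It adds distributed capabilities to Solr and is used to build Solr server clusters with high availability, high scalability, automatic fault tolerance, distributed indexing, and distributed search.
Its key features are:
1) Centralized configuration
2) Automatic fault tolerance
3) Near-real-time search
4) Automatic load balancing of queries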

Solr 5.5.0 + Tomcat 7.0.69 + ZooKeeper 3.4.6 Cloud Deployment

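(Because not enough machines were available, this article uses a pseudo-distributed deployment on a single host that simulates four real machines. In a real multi-machine deployment, the Tomcat and ZooKeeper ports can simply be left at their defaults.)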

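The four Tomcat instances form the following layout: the collection is split into two shards that store the index, and each shard has two core nodes (one leader and one replica). The final configuration is:

[Figure: SolrCloud layout of one collection with two shards, each shard with one leader and one replica]
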
(1)
apache-tomcat-7.0.69 cluster configuration:
Version: apache-tomcat-7.0.69
Download: http://tomcat.apache.org/download-70.cgi
Location: /var/local/
Instances: 4: /var/local/apache-tomcat-7.0.69-1, /var/local/apache-tomcat-7.0.69-2, /var/local/apache-tomcat-7.0.69-3, /var/local/apache-tomcat-7.0.69-4
Note: the main change is to resolve Tomcat port conflicts on a single host
apache-tomcat-7.0.69-1
sudo vi /var/local/apache-tomcat-7.0.69-1/conf/server.xml
{
<Server port="18005" shutdown="SHUTDOWN">
<Connector port="18080" protocol="HTTP/1.1"  connectionTimeout="20000" redirectPort="8443" />
<Connector port="18009" protocol="AJP/1.3" redirectPort="8443" />
}


apache-tomcat-7.0.69-2
sudo vi /var/local/apache-tomcat-7.0.69-2/conf/server.xml
{
<Server port="28005" shutdown="SHUTDOWN">
<Connector port="28080" protocol="HTTP/1.1"  connectionTimeout="20000" redirectPort="8443" />
<Connector port="28009" protocol="AJP/1.3" redirectPort="8443" />
}


apache-tomcat-7.0.69-3
sudo vi /var/local/apache-tomcat-7.0.69-3/conf/server.xml
{
<Server port="38005" shutdown="SHUTDOWN">
<Connector port="38080" protocol="HTTP/1.1"  connectionTimeout="20000" redirectPort="8443" />
<Connector port="38009" protocol="AJP/1.3" redirectPort="8443" />
}


apache-tomcat-7.0.69-4
sudo vi /var/local/apache-tomcat-7.0.69-4/conf/server.xml
{
<Server port="48005" shutdown="SHUTDOWN">
<Connector port="48080" protocol="HTTP/1.1"  connectionTimeout="20000" redirectPort="8443" />
<Connector port="48009" protocol="AJP/1.3" redirectPort="8443" />
}
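
The four server.xml edits above follow one pattern: instance N uses the stock ports 8005, 8080, and 8009 prefixed with N. A minimal sketch of that pattern with sed is shown below. It works on a scratch copy of the three stock lines rather than on a real server.xml, and `rewrite_ports` is a hypothetical helper name introduced only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: derive the per-instance Tomcat ports by prefixing the stock ports
# (8005, 8080, 8009) with the instance number, as in the listings above.
# rewrite_ports is a hypothetical helper, not part of Tomcat.
rewrite_ports() {
  local n="$1" file="$2"
  # Only the three listener ports change; redirectPort="8443" is untouched.
  sed -e "s/port=\"8005\"/port=\"${n}8005\"/" \
      -e "s/port=\"8080\"/port=\"${n}8080\"/" \
      -e "s/port=\"8009\"/port=\"${n}8009\"/" "$file"
}

# Demo on a throwaway file holding the stock settings.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
<Server port="8005" shutdown="SHUTDOWN">
<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" />
<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />
EOF

for n in 1 2 3 4; do
  echo "# apache-tomcat-7.0.69-$n"
  rewrite_ports "$n" "$sample"
done
```

To use it on a real instance, one option is to write the output to a new file, review it, and then copy it over conf/server.xml.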


(2)
zookeeper-3.4.6 cluster configuration:
Version: zookeeper-3.4.6
Download: https://zookeeper.apache.org/releases.html#download
Location: /var/local/
Instances: 4: /var/local/zookeeper-3.4.6-1, /var/local/zookeeper-3.4.6-2, /var/local/zookeeper-3.4.6-3, /var/local/zookeeper-3.4.6-4
Note: the main change is to resolve ZooKeeper port and directory conflicts on a single host
zookeeper-3.4.6-1
sudo mkdir -p /var/local/zookeeper-3.4.6-1/data
sudo mkdir -p /var/local/zookeeper-3.4.6-1/data/log
echo 1 | sudo tee /var/local/zookeeper-3.4.6-1/data/myid (or: sudo vi /var/local/zookeeper-3.4.6-1/data/myid {1})
cd /var/local/zookeeper-3.4.6-1/conf/ && sudo mv zoo_sample.cfg zoo.cfg && sudo vi zoo.cfg
{
clientPort=2181
dataDir=/var/local/zookeeper-3.4.6-1/data
dataLogDir=/var/local/zookeeper-3.4.6-1/data/log
autopurge.snapRetainCount=3
autopurge.purgeInterval=1

server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890
server.4=127.0.0.1:2891:3891
}


zookeeper-3.4.6-2
sudo cp -r /var/local/zookeeper-3.4.6-1 /var/local/zookeeper-3.4.6-2
sudo vi /var/local/zookeeper-3.4.6-2/data/myid {2}
sudo vi /var/local/zookeeper-3.4.6-2/conf/zoo.cfg
{
clientPort=2182
dataDir=/var/local/zookeeper-3.4.6-2/data
dataLogDir=/var/local/zookeeper-3.4.6-2/data/log
autopurge.snapRetainCount=3
autopurge.purgeInterval=1

server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890
server.4=127.0.0.1:2891:3891
}


zookeeper-3.4.6-3
sudo cp -r /var/local/zookeeper-3.4.6-1 /var/local/zookeeper-3.4.6-3
sudo vi /var/local/zookeeper-3.4.6-3/data/myid {3}
sudo vi /var/local/zookeeper-3.4.6-3/conf/zoo.cfg
{
clientPort=2183
dataDir=/var/local/zookeeper-3.4.6-3/data
dataLogDir=/var/local/zookeeper-3.4.6-3/data/log
autopurge.snapRetainCount=3
autopurge.purgeInterval=1

server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890
server.4=127.0.0.1:2891:3891
}


zookeeper-3.4.6-4
sudo cp -r /var/local/zookeeper-3.4.6-1 /var/local/zookeeper-3.4.6-4
sudo vi /var/local/zookeeper-3.4.6-4/data/myid {4}
sudo vi /var/local/zookeeper-3.4.6-4/conf/zoo.cfg
{
clientPort=2184
dataDir=/var/local/zookeeper-3.4.6-4/data
dataLogDir=/var/local/zookeeper-3.4.6-4/data/log
autopurge.snapRetainCount=3
autopurge.purgeInterval=1

server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890
server.4=127.0.0.1:2891:3891
}
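
The four ZooKeeper instances differ only in myid, clientPort, and the data directories, so the files can also be generated in a loop. The sketch below writes them into a scratch directory by default; `BASE` and `write_zk_config` are names introduced here for illustration. The tickTime, initLimit, and syncLimit values are assumed to be the ones inherited from zoo_sample.cfg, which the steps above rename rather than rewrite.

```shell
#!/usr/bin/env bash
# Sketch: generate zoo.cfg and myid for the four local ZooKeeper instances.
# BASE defaults to a scratch directory; set BASE=/var/local (and run with
# sufficient privileges) to target the layout used in this article.
BASE="${BASE:-$(mktemp -d)}"
NODES=4

write_zk_config() {
  local id="$1" home="$BASE/zookeeper-3.4.6-$1" i
  mkdir -p "$home/conf" "$home/data/log"
  echo "$id" > "$home/data/myid"
  {
    # Assumed to match the values shipped in zoo_sample.cfg.
    echo "tickTime=2000"
    echo "initLimit=10"
    echo "syncLimit=5"
    echo "clientPort=$((2180 + id))"
    echo "dataDir=$home/data"
    echo "dataLogDir=$home/data/log"
    echo "autopurge.snapRetainCount=3"
    echo "autopurge.purgeInterval=1"
    # Same server list on every node: quorum port 2887+i, election port 3887+i.
    for i in $(seq 1 "$NODES"); do
      echo "server.$i=127.0.0.1:$((2887 + i)):$((3887 + i))"
    done
  } > "$home/conf/zoo.cfg"
}

for id in $(seq 1 "$NODES"); do
  write_zk_config "$id"
done
echo "configs written under $BASE"
```

ZooKeeper needs a majority of the ensemble to be up. With four nodes the majority is three, so the ensemble tolerates one failure, the same as a three-node ensemble; this is why odd ensemble sizes are usually preferred.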


(3)
solr-5.5.0 cluster configuration:
Version: solr-5.5.0
Download: https://mirrors.tuna.tsinghua.edu.cn/apache/lucene/solr/5.5.0/
Location: /var/local/apache-tomcat-7.0.69-1~4
Shared configuration files: /var/local/cloud_conf (the configuration directory that is uploaded to ZooKeeper as myconf)
Instances: 4: /var/local/apache-tomcat-7.0.69-1, /var/local/apache-tomcat-7.0.69-2, /var/local/apache-tomcat-7.0.69-3, /var/local/apache-tomcat-7.0.69-4
Deploying the Solr web application:
sudo cp -r ~/solr_cloud/solr-5.5.0/server/solr-webapp/webapp /var/local/apache-tomcat-7.0.69-1/webapps/solr
sudo cp -r ~/solr_cloud/solr-5.5.0/server/lib/ext/* /var/local/apache-tomcat-7.0.69-1/webapps/solr/WEB-INF/lib (copy any other jars you need from ~/solr_cloud/solr-5.5.0/dist/)
sudo cp -r ~/solr_cloud/solr-5.5.0/server/resources/log4j.properties /var/local/apache-tomcat-7.0.69-1/webapps/solr/WEB-INF/classes (create the classes directory manually if it does not exist)

solr-5.5.0-1
sudo mkdir -p /var/local/apache-tomcat-7.0.69-1/solr_home/
sudo cp ~/solr_cloud/solr-5.5.0/example/example-DIH/solr/solr.xml /var/local/apache-tomcat-7.0.69-1/solr_home/
sudo vi /var/local/apache-tomcat-7.0.69-1/webapps/solr/WEB-INF/web.xml
{
    <env-entry>
       <env-entry-name>solr/home</env-entry-name>
       <env-entry-value>/var/local/apache-tomcat-7.0.69-1/solr_home</env-entry-value>
       <env-entry-type>java.lang.String</env-entry-type>
    </env-entry>
}

sudo vi /var/local/apache-tomcat-7.0.69-1/bin/catalina.sh
{
JAVA_OPTS="$JAVA_OPTS -Dbootstrap_confdir=/var/local/cloud_conf -Dcollection.configName=myconf -DzkHost=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184 -DnumShards=2"
}

sudo vi /var/local/apache-tomcat-7.0.69-1/solr_home/solr.xml
{
<solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">18080</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
    <str name="zkHost">127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184</str>
</solrcloud>

  <shardHandlerFactory name="shardHandlerFactory"
    class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
}
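
On node 1, the -Dbootstrap_confdir and -Dcollection.configName options upload /var/local/cloud_conf to ZooKeeper as myconf when Tomcat starts. An alternative is to upload the directory explicitly with the zkcli.sh script that ships in the Solr distribution. The sketch below only builds and prints that command so it can be reviewed first; it assumes the distribution is unpacked at ~/solr_cloud/solr-5.5.0 as in the copy commands above, and `upconfig_cmd` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: build the zkcli.sh command that uploads a config directory to
# ZooKeeper under a given name. SOLR_DIST is an assumed install location.
SOLR_DIST="${SOLR_DIST:-$HOME/solr_cloud/solr-5.5.0}"
ZK_HOSTS="127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184"

upconfig_cmd() {
  # $1 = local config directory, $2 = config name in ZooKeeper
  printf '%s -zkhost %s -cmd upconfig -confdir %s -confname %s\n' \
    "$SOLR_DIST/server/scripts/cloud-scripts/zkcli.sh" "$ZK_HOSTS" "$1" "$2"
}

# Print the command for review; run it once the ZooKeeper ensemble is up.
upconfig_cmd /var/local/cloud_conf myconf
```

If the configuration is uploaded this way, the bootstrap options would presumably no longer be needed in catalina.sh, leaving only -DzkHost as on the other nodes.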


solr-5.5.0-2
sudo mkdir -p /var/local/apache-tomcat-7.0.69-2/solr_home/
sudo cp -r /var/local/apache-tomcat-7.0.69-1/solr_home/* /var/local/apache-tomcat-7.0.69-2/solr_home/
sudo cp -r /var/local/apache-tomcat-7.0.69-1/webapps/solr /var/local/apache-tomcat-7.0.69-2/webapps/
sudo vi /var/local/apache-tomcat-7.0.69-2/webapps/solr/WEB-INF/web.xml
{
    <env-entry>
       <env-entry-name>solr/home</env-entry-name>
       <env-entry-value>/var/local/apache-tomcat-7.0.69-2/solr_home</env-entry-value>
       <env-entry-type>java.lang.String</env-entry-type>
    </env-entry>
}

sudo vi /var/local/apache-tomcat-7.0.69-2/bin/catalina.sh
{
JAVA_OPTS="$JAVA_OPTS -DzkHost=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184"
}

sudo vi /var/local/apache-tomcat-7.0.69-2/solr_home/solr.xml
{
<solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">28080</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
    <str name="zkHost">127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184</str>
</solrcloud>

  <shardHandlerFactory name="shardHandlerFactory"
    class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
}


solr-5.5.0-3
sudo mkdir -p /var/local/apache-tomcat-7.0.69-3/solr_home/
sudo cp -r /var/local/apache-tomcat-7.0.69-1/solr_home/* /var/local/apache-tomcat-7.0.69-3/solr_home/
sudo cp -r /var/local/apache-tomcat-7.0.69-1/webapps/solr /var/local/apache-tomcat-7.0.69-3/webapps/
sudo vi /var/local/apache-tomcat-7.0.69-3/webapps/solr/WEB-INF/web.xml
{
    <env-entry>
       <env-entry-name>solr/home</env-entry-name>
       <env-entry-value>/var/local/apache-tomcat-7.0.69-3/solr_home</env-entry-value>
       <env-entry-type>java.lang.String</env-entry-type>
    </env-entry>
}

sudo vi /var/local/apache-tomcat-7.0.69-3/bin/catalina.sh
{
JAVA_OPTS="$JAVA_OPTS -DzkHost=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184"
}

sudo vi /var/local/apache-tomcat-7.0.69-3/solr_home/solr.xml
{
<solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">38080</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
    <str name="zkHost">127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184</str>
</solrcloud>

  <shardHandlerFactory name="shardHandlerFactory"
    class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
}


solr-5.5.0-4
sudo mkdir -p /var/local/apache-tomcat-7.0.69-4/solr_home/
sudo cp -r /var/local/apache-tomcat-7.0.69-1/solr_home/* /var/local/apache-tomcat-7.0.69-4/solr_home/
sudo cp -r /var/local/apache-tomcat-7.0.69-1/webapps/solr /var/local/apache-tomcat-7.0.69-4/webapps/
sudo vi /var/local/apache-tomcat-7.0.69-4/webapps/solr/WEB-INF/web.xml
{
    <env-entry>
       <env-entry-name>solr/home</env-entry-name>
       <env-entry-value>/var/local/apache-tomcat-7.0.69-4/solr_home</env-entry-value>
       <env-entry-type>java.lang.String</env-entry-type>
    </env-entry>
}

sudo vi /var/local/apache-tomcat-7.0.69-4/bin/catalina.sh
{
JAVA_OPTS="$JAVA_OPTS -DzkHost=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184"
}

sudo vi /var/local/apache-tomcat-7.0.69-4/solr_home/solr.xml
{
<solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">48080</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
    <str name="zkHost">127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184</str>
</solrcloud>

  <shardHandlerFactory name="shardHandlerFactory"
    class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
}
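
Nodes 2 to 4 are copies of node 1 in which only two values change: the Tomcat instance directory in web.xml and the hostPort in solr.xml. A minimal sketch of that rewrite is shown below. It runs on a scratch file that contains just those two lines, and `retarget_node` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: after copying node 1's files, rewrite the two node-specific values:
# the Tomcat instance directory and the Solr hostPort.
# retarget_node is a hypothetical helper, not part of Solr or Tomcat.
retarget_node() {
  local n="$1" file="$2"
  sed -e "s/apache-tomcat-7\.0\.69-1/apache-tomcat-7.0.69-${n}/g" \
      -e "s/\"hostPort\">18080</\"hostPort\">${n}8080</" "$file"
}

# Demo on a scratch file holding the two lines that differ per node.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
<env-entry-value>/var/local/apache-tomcat-7.0.69-1/solr_home</env-entry-value>
<int name="hostPort">18080</int>
EOF

for n in 2 3 4; do
  echo "# node $n"
  retarget_node "$n" "$sample"
done
```

The hostPort must match the HTTP connector port of the Tomcat instance that hosts the node, because that is the address the node registers in ZooKeeper.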


(4)
Startup and notes:
Start the Tomcat cluster
sudo /var/local/apache-tomcat-7.0.69-1/bin/startup.sh && sudo /var/local/apache-tomcat-7.0.69-2/bin/startup.sh && sudo /var/local/apache-tomcat-7.0.69-3/bin/startup.sh && sudo /var/local/apache-tomcat-7.0.69-4/bin/startup.sh
Stop the Tomcat cluster
sudo /var/local/apache-tomcat-7.0.69-1/bin/shutdown.sh && sudo /var/local/apache-tomcat-7.0.69-2/bin/shutdown.sh && sudo /var/local/apache-tomcat-7.0.69-3/bin/shutdown.sh && sudo /var/local/apache-tomcat-7.0.69-4/bin/shutdown.sh


Start the ZooKeeper ensemble (do this before starting Tomcat, because each Solr node connects to ZooKeeper on startup)
sudo /var/local/zookeeper-3.4.6-1/bin/zkServer.sh restart && sudo /var/local/zookeeper-3.4.6-2/bin/zkServer.sh restart && sudo /var/local/zookeeper-3.4.6-3/bin/zkServer.sh restart && sudo /var/local/zookeeper-3.4.6-4/bin/zkServer.sh restart
Stop the ZooKeeper ensemble
sudo /var/local/zookeeper-3.4.6-1/bin/zkServer.sh stop && sudo /var/local/zookeeper-3.4.6-2/bin/zkServer.sh stop && sudo /var/local/zookeeper-3.4.6-3/bin/zkServer.sh stop && sudo /var/local/zookeeper-3.4.6-4/bin/zkServer.sh stop
Check the ZooKeeper ensemble status
sudo /var/local/zookeeper-3.4.6-1/bin/zkServer.sh status && sudo /var/local/zookeeper-3.4.6-2/bin/zkServer.sh status && sudo /var/local/zookeeper-3.4.6-3/bin/zkServer.sh status && sudo /var/local/zookeeper-3.4.6-4/bin/zkServer.sh status


Access the Solr web UI (192.168.5.48 is this machine's IP):
http://192.168.5.48:18080/solr/index.html
http://192.168.5.48:28080/solr/index.html
http://192.168.5.48:38080/solr/index.html
http://192.168.5.48:48080/solr/index.html
Any of the above addresses can be used to access and manage the Solr web UI
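
The article does not show how the collection itself is created. One way, sketched below, is the Collections API: a CREATE call with two shards and a replication factor of two matches the layout described at the start, and CLUSTERSTATUS shows the result. The collection name test1 and the config name myconf are taken from elsewhere in this article; `create_url` and `status_url` are hypothetical helper names, and the script only prints the URLs so that nothing is changed until you run curl yourself.

```shell
#!/usr/bin/env bash
# Sketch: build Collections API URLs for creating and inspecting a collection.
# SOLR_URL points at any one node; all nodes accept these requests.
SOLR_URL="${SOLR_URL:-http://192.168.5.48:18080/solr}"

create_url() {
  # $1 = collection, $2 = numShards, $3 = replicationFactor, $4 = config name
  printf '%s/admin/collections?action=CREATE&name=%s&numShards=%s&replicationFactor=%s&collection.configName=%s\n' \
    "$SOLR_URL" "$1" "$2" "$3" "$4"
}

status_url() {
  # $1 = collection
  printf '%s/admin/collections?action=CLUSTERSTATUS&collection=%s&wt=json\n' \
    "$SOLR_URL" "$1"
}

create_url test1 2 2 myconf
status_url test1

# With the cluster running, the calls would be issued like this:
#   curl "$(create_url test1 2 2 myconf)"
#   curl "$(status_url test1)"
```

Two shards with two replicas each yield four cores, one per Tomcat instance in this setup.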


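Notes:
1) solr-5.5.0 has a bug: when you run a query from the Query screen of a collection, the / in the address bar is incorrectly escaped as %2F, as shown here:

http://192.168.5.48:18080/solr/test1%2Fselect?indent=on&q=*:*&wt=json

2) solr-5.5.0 bundles zookeeper-3.4.6.jar, so ZooKeeper 3.4.6 is the recommended server version

3) In Solr 5 and later, the schema is kept in managed-schema and managed through the API, as can be seen in solrconfig.xml: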

  <!-- To disable dynamic schema REST APIs, use the following for <schemaFactory>:
       <schemaFactory class="ClassicIndexSchemaFactory"/>
       When ManagedIndexSchemaFactory is specified instead, Solr will load the schema from
       the resource named in 'managedSchemaResourceName', rather than from schema.xml.
       Note that the managed schema resource CANNOT be named schema.xml.  If the managed
       schema does not exist, Solr will create it after reading schema.xml, then rename
       'schema.xml' to 'schema.xml.bak'.
       Do NOT hand edit the managed schema - external modifications will be ignored and
       overwritten as a result of schema modification REST API calls.
       When ManagedIndexSchemaFactory is specified with mutable = true, schema
       modification REST API calls will be allowed; otherwise, error responses will be
       sent back for these requests.
  -->
  <schemaFactory class="ManagedIndexSchemaFactory">
    <bool name="mutable">true</bool>
    <str name="managedSchemaResourceName">managed-schema</str>
  </schemaFactory>

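4) Notes on schema attributes:

stored: the original field value is kept in the index so that search components can return it. For performance reasons, long text is not well suited to being stored in the index.
indexed: the field is processed by Solr and added to the index; only indexed fields can be searched.
docValues: whether a docValues structure is added for this field. It benefits faceting, grouping, sorting, and function queries, speeds up loading of index data, is friendly to NRT (near-real-time) search, and saves memory. It has limitations: currently only field types such as StrField, UUIDField, and Trie*Field are supported, so analyzed text fields cannot use it. Multi-valued fields of these types are also supported.
multiValued: whether the field can hold multiple values.
omitNorms: if true, length normalization is omitted for the field and index-time boosts on the field are ignored, which saves memory.
Set it to false only for full-text fields or when you need to set field boosts at index time. For primitive, non-tokenized field types such as IntField, LongField, and StrField the default is true; otherwise the default is false.
omitPositions=true|false: if set, term position information is omitted from the postings.
omitTermFreqAndPositions=true|false: if set, term frequency and position information are omitted from the postings.
termVectors: if true, term vectors are stored for the field. Enable it for fields used by MoreLikeThis, where it improves performance.
termPositions: whether term positions are stored in the term vectors. This increases index size, but term-vector-based highlighting depends on it.
termOffsets: whether term offsets are stored in the term vectors. Term-vector-based highlighting needs this setting, and it can affect the matched result set when SpanQuery is used.
sortMissingLast: when sorting on this field, documents with no value for the field are placed last.
sortMissingFirst: when sorting on this field, documents with no value for the field are placed first.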
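
Because the managed schema should not be edited by hand, fields are added through the Schema API. The sketch below builds an add-field request body that sets several of the attributes described above. The field name category and the field type string are assumptions for illustration (string is assumed to be defined in the config set in use), and `add_field_json` is a hypothetical helper name. The script only prints the payload; the curl call is left as a comment.

```shell
#!/usr/bin/env bash
# Sketch: build a Schema API "add-field" request body.
add_field_json() {
  # $1 = field name, $2 = field type, $3 = stored, $4 = indexed,
  # $5 = docValues, $6 = multiValued (booleans passed as true/false)
  printf '{"add-field":{"name":"%s","type":"%s","stored":%s,"indexed":%s,"docValues":%s,"multiValued":%s}}\n' \
    "$1" "$2" "$3" "$4" "$5" "$6"
}

payload="$(add_field_json category string true true true false)"
echo "$payload"

# With the cluster running and mutable=true in solrconfig.xml:
#   curl -X POST -H 'Content-type:application/json' \
#        --data-binary "$payload" \
#        "http://192.168.5.48:18080/solr/test1/schema"
```

A single-valued string field with docValues enabled, as in this payload, is a common choice for fields used in faceting and sorting.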
