SolrCloud 4.3.1 + Tomcat 7: Installation and Configuration in Practice

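Solr Replication makes Solr servers highly available. If one copy of the index is lost, for example to a disk failure or an accidental deletion, the other replicas can keep serving requests. Replication alone, however, only lets us manage and maintain a single index. Once the index grows large enough that search performance becomes the bottleneck, the only way to scale the query servers is to redesign the index and partition it logically by hand.
SolrCloud was created to solve this problem. Coordinated by a ZooKeeper cluster, SolrCloud splits an index (called a Collection in SolrCloud) into shards, and those shards can be spread across different physical nodes. The shards of a Collection do not overlap, and together they make up the complete Collection. To keep each shard's data available, SolrCloud supports Solr Replication out of the box, so every shard can be replicated and stored redundantly. Below, we install and configure a SolrCloud cluster on Solr 4.3.1, the latest release at the time of writing, and use it to store and search index data in a distributed way.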
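The idea that shards partition a Collection without overlapping can be illustrated with a small sketch. It assumes that each document is routed by hashing its id into a signed 32-bit space, and that this space is cut into equal contiguous ranges, one per shard. That follows the general approach of SolrCloud's default hash-based routing, but the helper names are hypothetical, `zlib.crc32` only stands in for the MurmurHash3 hash Solr actually uses, and the exact range boundaries Solr records may differ.

```python
import zlib

# Full signed 32-bit hash space.
MIN_HASH = -(2 ** 31)
MAX_HASH = 2 ** 31 - 1


def partition_range(num_shards):
    """Split the hash space into num_shards contiguous, non-overlapping ranges."""
    step = (MAX_HASH - MIN_HASH) // num_shards
    ranges = []
    start = MIN_HASH
    for i in range(num_shards):
        # The last shard absorbs the remainder so the whole space is covered.
        end = MAX_HASH if i == num_shards - 1 else start + step
        ranges.append((start, end))
        start = end + 1
    return ranges


def signed32(value):
    """Interpret an unsigned 32-bit integer as a signed one."""
    return value - 2 ** 32 if value >= 2 ** 31 else value


def route(doc_id, ranges):
    """Index of the single shard whose range contains the document's hash.

    zlib.crc32 is a stand-in here; Solr itself uses MurmurHash3.
    """
    h = signed32(zlib.crc32(doc_id.encode("utf-8")))
    for index, (lo, hi) in enumerate(ranges):
        if lo <= h <= hi:
            return index
    raise ValueError("hash not covered by any range")


if __name__ == "__main__":
    shards = partition_range(3)
    for n, (lo, hi) in enumerate(shards, start=1):
        print(f"shard{n}: {lo & 0xFFFFFFFF:08x}-{hi & 0xFFFFFFFF:08x}")
    print("doc 42 -> shard%d" % (route("42", shards) + 1))
```

Because every hash falls into exactly one range, each document lives in exactly one shard, and the union of all shards is the whole Collection.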

Preparation

  • Server information
Three servers:
10.95.3.61          master
10.95.3.62          slave1
10.95.3.65          slave4
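  • ZooKeeper cluster configuration
Set up the ZooKeeper cluster by installing ZooKeeper on each of the three nodes above; the version used is zookeeper-3.4.5.
First, configure zoo.cfg on the master node as follows: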
[hadoop@master ~]$ vi applications/zookeeper/zookeeper-3.4.5/conf/zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/hadoop/applications/zookeeper/zookeeper-3.4.5/data
# the port at which the clients will connect
clientPort=2188

dataLogDir=/home/hadoop/applications/zookeeper/zookeeper-3.4.5/data/logs

server.1=master:4888:5888
server.2=slave1:4888:5888
server.3=slave4:4888:5888
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
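In each server.N=host:port1:port2 line, the first port (4888) is used by followers to talk to the leader and the second (5888) for leader election, while clients connect on clientPort. A minimal sketch, with hypothetical helper names, of how this file yields the client connection string used later for Solr, and how many members must stay up for the ensemble to keep a majority:

```python
def parse_zoo_cfg(text):
    """Parse the key=value lines of a zoo.cfg, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def ensemble(settings):
    """Return {server_id: (host, peer_port, election_port)} from the server.N entries."""
    members = {}
    for key, value in settings.items():
        if key.startswith("server."):
            server_id = int(key[len("server."):])
            host, peer_port, election_port = value.split(":")[:3]
            members[server_id] = (host, int(peer_port), int(election_port))
    return members


def connect_string(settings):
    """Join every ensemble host with clientPort, ordered by server id."""
    port = settings["clientPort"]
    members = ensemble(settings)
    return ",".join(f"{members[i][0]}:{port}" for i in sorted(members))


def quorum_size(settings):
    """Number of members that must be up for the ensemble to keep a majority."""
    return len(ensemble(settings)) // 2 + 1


ZOO_CFG = """\
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/applications/zookeeper/zookeeper-3.4.5/data
clientPort=2188
server.1=master:4888:5888
server.2=slave1:4888:5888
server.3=slave4:4888:5888
"""

if __name__ == "__main__":
    cfg = parse_zoo_cfg(ZOO_CFG)
    print(connect_string(cfg))
    print(quorum_size(cfg))
```

For this configuration the connection string is master:2188,slave1:2188,slave4:2188, the same list later passed to ZkCLI as -zkhost and to Tomcat as -DzkHost, and the quorum is 2, so the three-node ensemble tolerates one node failing.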
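Then create the data directories configured above (dataDir and dataLogDir), and in dataDir create a file named myid containing this node's number from the server.N lines, i.e. 1 on master. The installation can then be copied to the other two nodes as-is; after copying, change myid to 2 on slave1 and to 3 on slave4: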
[hadoop@master ~]$ scp -r applications/zookeeper/zookeeper-3.4.5 hadoop@slave1:~/applications/zookeeper/
[hadoop@master ~]$ scp -r applications/zookeeper/zookeeper-3.4.5 hadoop@slave4:~/applications/zookeeper/
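Each ZooKeeper ensemble member needs a file named myid in its dataDir holding the N of its own server.N line, and because the whole directory is copied with scp, that file has to be corrected on slave1 and slave4. A minimal sketch, assuming the server.N entries from the zoo.cfg above; the function names are hypothetical:

```python
import re
import tempfile
from pathlib import Path

# server.N=host:peerPort:electionPort
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:]+):")


def myid_by_host(zoo_cfg_text):
    """Map each host named in a server.N line to the id N it must put in myid."""
    ids = {}
    for line in zoo_cfg_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match:
            ids[match.group(2)] = int(match.group(1))
    return ids


def write_myid(data_dir, host, zoo_cfg_text):
    """Create <data_dir>/myid containing the id assigned to host; return its path."""
    server_id = myid_by_host(zoo_cfg_text)[host]
    path = Path(data_dir) / "myid"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"{server_id}\n")
    return path


SERVERS = """\
server.1=master:4888:5888
server.2=slave1:4888:5888
server.3=slave4:4888:5888
"""

if __name__ == "__main__":
    print(myid_by_host(SERVERS))
    with tempfile.TemporaryDirectory() as tmp:
        print(write_myid(tmp, "slave1", SERVERS).read_text().strip())
```

On slave1, for example, myid should contain just 2; the ids must match the server.N lines.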
Start the ZooKeeper cluster by starting the ZooKeeper service on each node:
[hadoop@master ~]$ cd applications/zookeeper/zookeeper-3.4.5/
[hadoop@master zookeeper-3.4.5]$ bin/zkServer.sh start

[hadoop@slave1 ~]$ cd applications/zookeeper/zookeeper-3.4.5/
[hadoop@slave1 zookeeper-3.4.5]$ bin/zkServer.sh start

[hadoop@slave4 ~]$ cd applications/zookeeper/zookeeper-3.4.5/
[hadoop@slave4 zookeeper-3.4.5]$ bin/zkServer.sh start
Check the status of the ZooKeeper cluster to make sure it started correctly:
[hadoop@master zookeeper-3.4.5]$ bin/zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/applications/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: follower

[hadoop@slave1 zookeeper-3.4.5]$ bin/zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/applications/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: follower

[hadoop@slave4 zookeeper-3.4.5]$ bin/zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/applications/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: leader
As shown, slave4 is the leader of the ZooKeeper ensemble and the other two nodes are followers.
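The check made by eye above (one leader, the rest followers) can be scripted. A minimal sketch, assuming the status text of each node has already been collected (for example by running bin/zkServer.sh status on each host) and that the ensemble has no observers; the function names are hypothetical:

```python
def parse_mode(status_output):
    """Return the value of the 'Mode:' line printed by zkServer.sh status, or None."""
    for line in status_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def ensemble_healthy(modes_by_host):
    """True if exactly one host is leader and all the others are followers."""
    modes = list(modes_by_host.values())
    return modes.count("leader") == 1 and modes.count("follower") == len(modes) - 1


SLAVE4_STATUS = """\
JMX enabled by default
Using config: /home/hadoop/applications/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: leader
"""

if __name__ == "__main__":
    modes = {"master": "follower", "slave1": "follower", "slave4": parse_mode(SLAVE4_STATUS)}
    print(modes, ensemble_healthy(modes))
```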
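  • SolrCloud directories
We use /home/hadoop/applications/solr/cloud to hold Solr's library and configuration files; it contains two subdirectories, lib and multicore.
The index data is stored in a separate directory, /home/hadoop/applications/storage/cloud/data.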

SolrCloud configuration

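We first configure Solr on one node, master.
1. Basic Solr configuration
Unpack the downloaded Solr archive and extract solr-4.3.1\example\webapps\solr.war into a solr directory. Copy the jar files from solr-4.3.1\example\lib\ext into solr-4.3.1\example\webapps\solr\WEB-INF\lib, alongside the jars already there, then rename the extracted solr directory to solr-cloud and upload it to the webapps directory of Tomcat on the server.
Also copy all jar files from solr-4.3.1\example\webapps\solr\WEB-INF\lib and solr-4.3.1\example\lib\ext into /home/hadoop/applications/solr/cloud/lib/ (this directory is used later as the classpath for the ZkCLI tool):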
[hadoop@master ~]$ ls /home/hadoop/applications/solr/cloud/lib/
commons-cli-1.2.jar           lucene-analyzers-common-4.3.1.jar    lucene-suggest-4.3.1.jar
commons-codec-1.7.jar         lucene-analyzers-kuromoji-4.3.1.jar  noggit-0.5.jar
commons-fileupload-1.2.1.jar  lucene-analyzers-phonetic-4.3.1.jar  org.restlet-2.1.1.jar
commons-io-2.1.jar            lucene-codecs-4.3.1.jar              org.restlet.ext.servlet-2.1.1.jar
commons-lang-2.6.jar          lucene-core-4.3.1.jar                slf4j-api-1.6.6.jar
guava-13.0.1.jar              lucene-grouping-4.3.1.jar            slf4j-log4j12-1.6.6.jar
httpclient-4.2.3.jar          lucene-highlighter-4.3.1.jar         solr-core-4.3.1.jar
httpcore-4.2.2.jar            lucene-memory-4.3.1.jar              solr-solrj-4.3.1.jar
httpmime-4.2.3.jar            lucene-misc-4.3.1.jar                spatial4j-0.3.jar
jcl-over-slf4j-1.6.6.jar      lucene-queries-4.3.1.jar             wstx-asl-3.2.7.jar
jul-to-slf4j-1.6.6.jar        lucene-queryparser-4.3.1.jar         zookeeper-3.4.5.jar
log4j-1.2.16.jar              lucene-spatial-4.3.1.jar
The layout of the /home/hadoop/applications/solr/cloud/multicore directory is shown in the figure below:

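The configuration files are described below. schema.xml, solrconfig.xml and solrcore.properties live in the collection1/conf directory, and solr.xml lives in the multicore directory itself:
  • schema.xml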
<?xml version="1.0" ?>

<schema name="example core two" version="1.1">
     <types>
          <fieldtype name="string" class="solr.StrField" omitNorms="true" />
          <fieldType name="long" class="solr.TrieLongField" />
          <fieldtype name="int" class="solr.IntField" />
          <fieldtype name="float" class="solr.FloatField" />
          <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0" />
     </types>
     <fields>
          <field name="id" type="long" indexed="true" stored="true" multiValued="false" required="true" />
          <field name="area" type="string" indexed="true" stored="false" multiValued="false" />
          <field name="building_type" type="int" indexed="true" stored="false" multiValued="false" />
          <field name="category" type="string" indexed="true" stored="false" multiValued="false" />
          <field name="temperature" type="int" indexed="true" stored="false" multiValued="false" />
          <field name="code" type="int" indexed="true" stored="false" multiValued="false" />
          <field name="latitude" type="float" indexed="true" stored="false" multiValued="false" />
          <field name="longitude" type="float" indexed="true" stored="false" multiValued="false" />
          <field name="when" type="date" indexed="true" stored="false" multiValued="false" />
          <field name="_version_" type="long" indexed="true" stored="true" />
     </fields>
     <uniqueKey>id</uniqueKey>
     <defaultSearchField>area</defaultSearchField>
     <solrQueryParser defaultOperator="OR" />
</schema>
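SolrCloud's update log relies on the _version_ field declared above, and documents are identified by the uniqueKey. A small sanity check, sketched with a hypothetical function name; it only covers the commonly documented expectations that _version_ exists, is stored and is single-valued, and that uniqueKey names a declared field, so it is not an exhaustive validation:

```python
import xml.etree.ElementTree as ET


def check_cloud_schema(schema_xml):
    """Return a list of problems with the fields SolrCloud's update log relies on."""
    root = ET.fromstring(schema_xml)
    fields = {f.get("name"): f for f in root.iter("field")}
    problems = []

    version = fields.get("_version_")
    if version is None:
        problems.append("no _version_ field")
    else:
        # Missing attributes fall back to the usual defaults (stored, single-valued).
        if version.get("stored", "true") == "false":
            problems.append("_version_ should be stored")
        if version.get("multiValued", "false") == "true":
            problems.append("_version_ must be single-valued")

    key = (root.findtext("uniqueKey") or "").strip()
    if key not in fields:
        problems.append(f"uniqueKey {key!r} is not a declared field")
    return problems
```

Calling check_cloud_schema on the contents of the schema.xml above should return an empty list.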
  • solrconfig.xml
<?xml version="1.0" encoding="UTF-8" ?>

<config>
     <luceneMatchVersion>LUCENE_43</luceneMatchVersion>
     <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}" />
     <dataDir>${solr.shard.data.dir:}</dataDir>
     <schemaFactory class="ClassicIndexSchemaFactory" />
    
     <updateHandler class="solr.DirectUpdateHandler2">
          <updateLog>
               <str name="dir">${solr.shard.data.dir:}</str>
          </updateLog>
     </updateHandler>
    
     <!-- realtime get handler, guaranteed to return the latest stored fields of any document, without the need to commit or open a new searcher. The current implementation relies on the updateLog feature being enabled. -->
     <requestHandler name="/get" class="solr.RealTimeGetHandler">
          <lst name="defaults">
               <str name="omitHeader">true</str>
          </lst>
     </requestHandler>
     <requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy" />
     <requestDispatcher handleSelect="true">
          <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" formdataUploadLimitInKB="2048" />
     </requestDispatcher>

     <requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />
     <requestHandler name="/analysis/field" startup="lazy" class="solr.FieldAnalysisRequestHandler" />
     <requestHandler name="/update" class="solr.UpdateRequestHandler" />
     <requestHandler name="/update/csv" class="solr.CSVRequestHandler" startup="lazy">
          <lst name="defaults">
               <str name="separator">,</str>
               <str name="header">true</str>
               <str name="encapsulator">"</str>
          </lst>
          <updateLog>
               <str name="dir">${solr.shard.data.dir:}</str>
          </updateLog>
     </requestHandler>
     <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
     <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
          <lst name="invariants">
               <str name="q">solrpingquery</str>
          </lst>
          <lst name="defaults">
               <str name="echoParams">all</str>
          </lst>
     </requestHandler>
    
     <updateRequestProcessorChain name="sample">
          <processor class="solr.LogUpdateProcessorFactory" />
          <processor class="solr.DistributedUpdateProcessorFactory" />
          <processor class="solr.RunUpdateProcessorFactory" />
     </updateRequestProcessorChain>

     <query>
          <maxBooleanClauses>1024</maxBooleanClauses>
          <filterCache class="solr.FastLRUCache" size="10240" initialSize="512" autowarmCount="0" />
          <queryResultCache class="solr.LRUCache" size="10240" initialSize="512" autowarmCount="0" />
          <documentCache class="solr.LRUCache" size="10240" initialSize="512" autowarmCount="0" />
          <enableLazyFieldLoading>true</enableLazyFieldLoading>
          <queryResultWindowSize>20</queryResultWindowSize>
          <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
          <maxWarmingSearchers>2</maxWarmingSearchers>
     </query>
     <admin>
          <defaultQuery>solr</defaultQuery>
     </admin>
</config>
  • solrcore.properties
solr.shard.data.dir=/home/hadoop/applications/storage/cloud/data
The property solr.shard.data.dir is referenced in solrconfig.xml (by both dataDir and the updateLog dir) and specifies where the index data is stored.
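To see what `${solr.shard.data.dir:}` resolves to, here is a simplified sketch of `${name:default}` substitution: use the property's value if it is defined, otherwise the text after the colon (empty here). It assumes a plain key=value properties file and is not Solr's own implementation, which also consults other sources such as Java system properties; the function names are hypothetical.

```python
import re

# ${name} or ${name:default}; the default may be empty, as in ${solr.shard.data.dir:}
PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def load_properties(text):
    """Parse plain key=value lines (no escapes or line continuations)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith(("#", "!")) and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def substitute(value, props):
    """Replace each ${name:default} with props[name], else with its default."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError(f"{name} is undefined and has no default")
    return PLACEHOLDER.sub(replace, value)


SOLRCORE_PROPERTIES = "solr.shard.data.dir=/home/hadoop/applications/storage/cloud/data\n"

if __name__ == "__main__":
    props = load_properties(SOLRCORE_PROPERTIES)
    print(substitute("${solr.shard.data.dir:}", props))
    print(substitute("${solr.directoryFactory:solr.StandardDirectoryFactory}", props))
```

With the empty default, leaving out solrcore.properties would give an empty dataDir, and Solr would then use its default data directory for the core.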
  • solr.xml
This file specifies the ZooKeeper-related settings as well as the Solr core configuration:
<?xml version="1.0" encoding="UTF-8" ?>

<solr persistent="true">
  <cores defaultCoreName="collection1" host="${host:}" adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}" hostPort="8888" hostContext="${hostContext:solr-cloud}">
  </cores>
</solr>
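Note: we do not define any core elements here. Once everything is installed, we create the Collection and its Shards through Solr's REST API, which updates these configuration files for us (persistent="true" lets Solr write the newly created cores back into solr.xml).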
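When a node starts, it registers under /live_nodes in ZooKeeper with a name built from the host, hostPort and hostContext in this file (the startup log of master shows 10.95.3.61:8888_solr-cloud). The sketch below reproduces that host:port_context pattern as observed in the log; it assumes that an empty host means the node's detected IP address is used, handles only a single-segment context such as solr-cloud, and is not Solr's code. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def resolve(raw, sysprops):
    """Resolve ${name:default} from -D style properties (empty if neither is set)."""
    return PLACEHOLDER.sub(lambda m: sysprops.get(m.group(1), m.group(2) or ""), raw or "")


def live_node_name(solr_xml, host_ip, sysprops=None):
    """Build the host:port_context name expected under /live_nodes for this node."""
    sysprops = sysprops or {}
    cores = ET.fromstring(solr_xml).find("cores")
    host = resolve(cores.get("host"), sysprops) or host_ip  # empty host: use detected IP
    port = resolve(cores.get("hostPort"), sysprops)
    context = resolve(cores.get("hostContext"), sysprops).strip("/")
    return f"{host}:{port}_{context}"


SOLR_XML = """<solr persistent="true">
  <cores defaultCoreName="collection1" host="${host:}" adminPath="/admin/cores"
         zkClientTimeout="${zkClientTimeout:15000}" hostPort="8888"
         hostContext="${hostContext:solr-cloud}">
  </cores>
</solr>"""

if __name__ == "__main__":
    print(live_node_name(SOLR_XML, "10.95.3.61"))
```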
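2. Managing the configuration files with ZooKeeper
SolrCloud relies on the ZooKeeper cluster to propagate configuration changes to every node promptly, so the configuration files must be uploaded to ZooKeeper. upconfig uploads the conf directory as the config set myconf, and linkconfig associates the collection collection1 with that config set: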
[hadoop@master ~]$ java -classpath .:/home/hadoop/applications/solr/cloud/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost master:2188,slave1:2188,slave4:2188 -confdir /home/hadoop/applications/solr/cloud/multicore/collection1/conf -confname myconf

[hadoop@master ~]$ java -classpath .:/home/hadoop/applications/solr/cloud/lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig -collection collection1 -confname myconf -zkhost master:2188,slave1:2188,slave4:2188
Once the upload finishes, check what is stored in ZooKeeper:
[hadoop@master ~]$ cd applications/zookeeper/zookeeper-3.4.5/
[hadoop@master zookeeper-3.4.5]$ bin/zkCli.sh -server master:2188
...
[zk: master:2188(CONNECTED) 0] ls /
[configs, collections, zookeeper]
[zk: master:2188(CONNECTED) 2] ls /configs
[myconf]
[zk: master:2188(CONNECTED) 3] ls /configs/myconf
[solrcore.properties, solrconfig.xml, schema.xml]
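The listing matches what the two ZkCLI commands should produce: each file of the confdir becomes a znode under /configs/myconf, and /collections/collection1 records which config set the collection uses. A minimal sketch of that mapping, assuming a recursive upload and a JSON payload of the form {"configName": ...} for the link (an assumption about the stored format); it only computes paths and does not talk to ZooKeeper, and the function names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def upconfig_znodes(conf_dir, conf_name):
    """Znode paths a recursive upload of conf_dir to /configs/<conf_name> would produce."""
    root = Path(conf_dir)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    return [f"/configs/{conf_name}/{p.relative_to(root).as_posix()}" for p in files]


def linkconfig_znode(collection, conf_name):
    """Znode path and (assumed) JSON payload linking a collection to a config set."""
    return f"/collections/{collection}", json.dumps({"configName": conf_name})


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("schema.xml", "solrconfig.xml", "solrcore.properties"):
            (Path(tmp) / name).write_text("")
        print(upconfig_znodes(tmp, "myconf"))
    print(linkconfig_znode("collection1", "myconf"))
```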

3. Configure and start Tomcat
Add the following setting to Tomcat's startup script, bin/catalina.sh:
JAVA_OPTS="-server -Xmx4096m -Xms1024m -verbose:gc -Xloggc:solr_gc.log -Dsolr.solr.home=/home/hadoop/applications/solr/cloud/multicore -DzkHost=master:2188,slave1:2188,slave4:2188"
Start Tomcat:
[hadoop@master ~]$ cd servers/apache-tomcat-7.0.42
[hadoop@master apache-tomcat-7.0.42]$ bin/catalina.sh start
The log output looks like this:
Aug 01, 2013 3:11:03 PM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: :HADOOP_HOME/lib/native:/dw/snappy/lib:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Aug 01, 2013 3:11:03 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-bio-8888"]
Aug 01, 2013 3:11:03 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["ajp-bio-8009"]
Aug 01, 2013 3:11:03 PM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 1410 ms
Aug 01, 2013 3:11:03 PM org.apache.catalina.core.StandardService startInternal
INFO: Starting service Catalina
Aug 01, 2013 3:11:03 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.42
Aug 01, 2013 3:11:03 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /home/hadoop/servers/apache-tomcat-7.0.42/webapps/ROOT
Aug 01, 2013 3:11:04 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /home/hadoop/servers/apache-tomcat-7.0.42/webapps/host-manager
Aug 01, 2013 3:11:04 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /home/hadoop/servers/apache-tomcat-7.0.42/webapps/manager
Aug 01, 2013 3:11:04 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /home/hadoop/servers/apache-tomcat-7.0.42/webapps/examples
Aug 01, 2013 3:11:04 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /home/hadoop/servers/apache-tomcat-7.0.42/webapps/solr-cloud
2013-08-01 15:11:05.369 [localhost-startStop-1] INFO  org.apache.solr.servlet.SolrDispatchFilter  – SolrDispatchFilter.init()
2013-08-01 15:11:05.392 [localhost-startStop-1] INFO  org.apache.solr.core.SolrResourceLoader  – No /solr/home in JNDI
2013-08-01 15:11:05.393 [localhost-startStop-1] INFO  org.apache.solr.core.SolrResourceLoader  – using system property solr.solr.home: /home/hadoop/applications/solr/cloud/multicore
2013-08-01 15:11:05.402 [localhost-startStop-1] INFO  org.apache.solr.core.CoreContainer  – looking for solr config file: /home/hadoop/applications/solr/cloud/multicore/solr.xml
2013-08-01 15:11:05.403 [localhost-startStop-1] INFO  org.apache.solr.core.CoreContainer  – New CoreContainer 1665441141
2013-08-01 15:11:05.406 [localhost-startStop-1] INFO  org.apache.solr.core.CoreContainer  – Loading CoreContainer using Solr Home: '/home/hadoop/applications/solr/cloud/multicore/'
2013-08-01 15:11:05.406 [localhost-startStop-1] INFO  org.apache.solr.core.SolrResourceLoader  – new SolrResourceLoader for directory: '/home/hadoop/applications/solr/cloud/multicore/'
2013-08-01 15:11:05.616 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/str[@name='adminHandler']
2013-08-01 15:11:05.618 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/int[@name='coreLoadThreads']
2013-08-01 15:11:05.620 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/str[@name='coreRootDirectory']
2013-08-01 15:11:05.621 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/solrcloud/int[@name='distribUpdateConnTimeout']
2013-08-01 15:11:05.622 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/solrcloud/int[@name='distribUpdateSoTimeout']
2013-08-01 15:11:05.624 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/solrcloud/str[@name='host']
2013-08-01 15:11:05.626 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/solrcloud/str[@name='hostContext']
2013-08-01 15:11:05.628 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/solrcloud/int[@name='hostPort']
2013-08-01 15:11:05.630 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/solrcloud/int[@name='leaderVoteWait']
2013-08-01 15:11:05.632 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/str[@name='managementPath']
2013-08-01 15:11:05.633 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/str[@name='sharedLib']
2013-08-01 15:11:05.635 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/str[@name='shareSchema']
2013-08-01 15:11:05.636 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/int[@name='transientCacheSize']
2013-08-01 15:11:05.638 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/solrcloud/int[@name='zkClientTimeout']
2013-08-01 15:11:05.640 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/solrcloud/int[@name='zkHost']
2013-08-01 15:11:05.647 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/logging/str[@name='class']
2013-08-01 15:11:05.648 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/logging/str[@name='enabled']
2013-08-01 15:11:05.649 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/logging/watcher/int[@name='size']
2013-08-01 15:11:05.654 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/logging/watcher/int[@name='threshold']
2013-08-01 15:11:05.657 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/@coreLoadThreads
2013-08-01 15:11:05.658 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/@sharedLib
2013-08-01 15:11:05.659 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/@zkHost
2013-08-01 15:11:05.661 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/logging/@class
2013-08-01 15:11:05.662 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/logging/@enabled
2013-08-01 15:11:05.663 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/logging/watcher/@size
2013-08-01 15:11:05.665 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/logging/watcher/@threshold
2013-08-01 15:11:05.666 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/cores/@adminHandler
2013-08-01 15:11:05.668 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/cores/@distribUpdateConnTimeout
2013-08-01 15:11:05.669 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/cores/@distribUpdateSoTimeout
2013-08-01 15:11:05.672 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null solr/cores/@host=${host:}
2013-08-01 15:11:05.673 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null solr/cores/@hostContext=${hostContext:solr-cloud}
2013-08-01 15:11:05.674 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null solr/cores/@hostPort=8888
2013-08-01 15:11:05.676 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/cores/@leaderVoteWait
2013-08-01 15:11:05.677 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/cores/@managementPath
2013-08-01 15:11:05.679 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/cores/@shareSchema
2013-08-01 15:11:05.680 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/cores/@transientCacheSize
2013-08-01 15:11:05.681 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null solr/cores/@zkClientTimeout=${zkClientTimeout:15000}
2013-08-01 15:11:05.686 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/shardHandlerFactory/@class
2013-08-01 15:11:05.692 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/shardHandlerFactory/@name
2013-08-01 15:11:05.694 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/shardHandlerFactory/int[@connTimeout]
2013-08-01 15:11:05.695 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/shardHandlerFactory/int[@socketTimeout]
2013-08-01 15:11:05.699 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null solr/cores/@defaultCoreName=collection1
2013-08-01 15:11:05.700 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null solr/@persistent=true
2013-08-01 15:11:05.701 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null solr/cores/@adminPath=/admin/cores
2013-08-01 15:11:05.713 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/str[@name='adminHandler']
2013-08-01 15:11:05.714 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/int[@name='coreLoadThreads']
2013-08-01 15:11:05.715 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/str[@name='coreRootDirectory']
2013-08-01 15:11:05.718 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/solrcloud/int[@name='distribUpdateConnTimeout']
2013-08-01 15:11:05.719 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/solrcloud/int[@name='distribUpdateSoTimeout']
2013-08-01 15:11:05.720 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/solrcloud/str[@name='host']
2013-08-01 15:11:05.722 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/solrcloud/str[@name='hostContext']
2013-08-01 15:11:05.723 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/solrcloud/int[@name='hostPort']
2013-08-01 15:11:05.724 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/solrcloud/int[@name='leaderVoteWait']
2013-08-01 15:11:05.727 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/str[@name='managementPath']
2013-08-01 15:11:05.728 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/str[@name='sharedLib']
2013-08-01 15:11:05.729 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/str[@name='shareSchema']
2013-08-01 15:11:05.730 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/int[@name='transientCacheSize']
2013-08-01 15:11:05.735 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/solrcloud/int[@name='zkClientTimeout']
2013-08-01 15:11:05.737 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/solrcloud/int[@name='zkHost']
2013-08-01 15:11:05.740 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/logging/str[@name='class']
2013-08-01 15:11:05.747 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/logging/str[@name='enabled']
2013-08-01 15:11:05.749 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/logging/watcher/int[@name='size']
2013-08-01 15:11:05.752 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/logging/watcher/int[@name='threshold']
2013-08-01 15:11:05.755 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/@coreLoadThreads
2013-08-01 15:11:05.756 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/@sharedLib
2013-08-01 15:11:05.759 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/@zkHost
2013-08-01 15:11:05.760 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/logging/@class
2013-08-01 15:11:05.761 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/logging/@enabled
2013-08-01 15:11:05.763 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/logging/watcher/@size
2013-08-01 15:11:05.764 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/logging/watcher/@threshold
2013-08-01 15:11:05.765 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/cores/@adminHandler
2013-08-01 15:11:05.768 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/cores/@distribUpdateConnTimeout
2013-08-01 15:11:05.769 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/cores/@distribUpdateSoTimeout
2013-08-01 15:11:05.770 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null solr/cores/@host=${host:}
2013-08-01 15:11:05.771 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null solr/cores/@hostContext=${hostContext:solr-cloud}
2013-08-01 15:11:05.772 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null solr/cores/@hostPort=8888
2013-08-01 15:11:05.774 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/cores/@leaderVoteWait
2013-08-01 15:11:05.776 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/cores/@managementPath
2013-08-01 15:11:05.777 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/cores/@shareSchema
2013-08-01 15:11:05.778 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/cores/@transientCacheSize
2013-08-01 15:11:05.779 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null solr/cores/@zkClientTimeout=${zkClientTimeout:15000}
2013-08-01 15:11:05.780 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/shardHandlerFactory/@class
2013-08-01 15:11:05.781 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/shardHandlerFactory/@name
2013-08-01 15:11:05.783 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/shardHandlerFactory/int[@connTimeout]
2013-08-01 15:11:05.785 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/shardHandlerFactory/int[@socketTimeout]
2013-08-01 15:11:05.786 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null solr/cores/@defaultCoreName=collection1
2013-08-01 15:11:05.787 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null solr/@persistent=true
2013-08-01 15:11:05.788 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null solr/cores/@adminPath=/admin/cores
2013-08-01 15:11:05.791 [localhost-startStop-1] DEBUG org.apache.solr.core.Config  – null missing optional solr/cores/shardHandlerFactory
2013-08-01 15:11:05.799 [localhost-startStop-1] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory  – Setting socketTimeout to: 0
2013-08-01 15:11:05.802 [localhost-startStop-1] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory  – Setting urlScheme to: http://
2013-08-01 15:11:05.802 [localhost-startStop-1] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory  – Setting connTimeout to: 0
2013-08-01 15:11:05.803 [localhost-startStop-1] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory  – Setting maxConnectionsPerHost to: 20
2013-08-01 15:11:05.803 [localhost-startStop-1] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory  – Setting corePoolSize to: 0
2013-08-01 15:11:05.804 [localhost-startStop-1] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory  – Setting maximumPoolSize to: 2147483647
2013-08-01 15:11:05.805 [localhost-startStop-1] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory  – Setting maxThreadIdleTime to: 5
2013-08-01 15:11:05.805 [localhost-startStop-1] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory  – Setting sizeOfQueue to: -1
2013-08-01 15:11:05.806 [localhost-startStop-1] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory  – Setting fairnessPolicy to: false
2013-08-01 15:11:05.824 [localhost-startStop-1] INFO  org.apache.solr.client.solrj.impl.HttpClientUtil  – Creating new http client, config:maxConnectionsPerHost=20&maxConnections=10000&socketTimeout=0&connTimeout=0&retry=false
2013-08-01 15:11:06.248 [localhost-startStop-1] INFO  org.apache.solr.core.CoreContainer  – Registering Log Listener
2013-08-01 15:11:06.251 [localhost-startStop-1] INFO  org.apache.solr.core.CoreContainer  – Zookeeper client=master:2188,slave1:2188,slave4:2188
2013-08-01 15:11:06.273 [localhost-startStop-1] INFO  org.apache.solr.client.solrj.impl.HttpClientUtil  – Creating new http client, config:maxConnections=500&maxConnectionsPerHost=16&socketTimeout=0&connTimeout=0
2013-08-01 15:11:06.402 [localhost-startStop-1] INFO  org.apache.solr.common.cloud.ConnectionManager  – Waiting for client to connect to ZooKeeper
2013-08-01 15:11:06.461 [localhost-startStop-1-EventThread] INFO  org.apache.solr.common.cloud.ConnectionManager  – Watcher org.apache.solr.common.cloud.ConnectionManager@4b1707b4 name:ZooKeeperConnection Watcher:master:2188,slave1:2188,slave4:2188 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
2013-08-01 15:11:06.462 [localhost-startStop-1] INFO  org.apache.solr.common.cloud.ConnectionManager  – Client is connected to ZooKeeper
2013-08-01 15:11:06.485 [localhost-startStop-1] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /overseer/queue
2013-08-01 15:11:06.523 [localhost-startStop-1] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /overseer/collection-queue-work
2013-08-01 15:11:06.546 [localhost-startStop-1] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /live_nodes
2013-08-01 15:11:06.555 [localhost-startStop-1] INFO  org.apache.solr.cloud.ZkController  – Register node as live in ZooKeeper:/live_nodes/10.95.3.61:8888_solr-cloud
2013-08-01 15:11:06.562 [localhost-startStop-1] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /live_nodes/10.95.3.61:8888_solr-cloud
2013-08-01 15:11:06.578 [localhost-startStop-1] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /overseer_elect/election
2013-08-01 15:11:06.626 [localhost-startStop-1] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /overseer_elect/leader
2013-08-01 15:11:06.644 [localhost-startStop-1] INFO  org.apache.solr.cloud.Overseer  – Overseer (id=234248255751323650-10.95.3.61:8888_solr-cloud-n_0000000000) starting
2013-08-01 15:11:06.667 [localhost-startStop-1] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /overseer/queue-work
2013-08-01 15:11:06.697 [Overseer-234248255751323650-10.95.3.61:8888_solr-cloud-n_0000000000] INFO  org.apache.solr.cloud.OverseerCollectionProcessor  – Process current queue of collection creations
2013-08-01 15:11:06.698 [localhost-startStop-1] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /clusterstate.json
2013-08-01 15:11:06.711 [localhost-startStop-1] INFO  org.apache.solr.common.cloud.SolrZkClient  – makePath: /aliases.json
2013-08-01 15:11:06.720 [localhost-startStop-1] INFO  org.apache.solr.common.cloud.ZkStateReader  – Updating cluster state from ZooKeeper...
2013-08-01 15:11:06.780 [Thread-2] INFO  org.apache.solr.cloud.Overseer  – Starting to work on the main queue
2013-08-01 15:11:06.829 [localhost-startStop-1] INFO  org.apache.solr.servlet.SolrDispatchFilter  – user.dir=/home/hadoop/servers/apache-tomcat-7.0.42
2013-08-01 15:11:06.829 [localhost-startStop-1] INFO  org.apache.solr.servlet.SolrDispatchFilter  – SolrDispatchFilter.init() done
Aug 01, 2013 3:11:06 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /home/hadoop/servers/apache-tomcat-7.0.42/webapps/docs
Aug 01, 2013 3:11:06 PM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["http-bio-8888"]
Aug 01, 2013 3:11:06 PM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["ajp-bio-8009"]
Aug 01, 2013 3:11:06 PM org.apache.catalina.startup.Catalina start
INFO: Server startup in 3163 ms
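I ran with DEBUG logging enabled to make debugging easier.
At this point the SolrCloud cluster has only one live node, and a default collection1 instance has been created. This instance is effectively virtual: opening http://master:8888/solr-cloud/ in the web UI shows no SolrCloud information at all, as shown in the figure: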

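4. Sync data and configuration, then start the other nodes
To install Tomcat and the Solr server on the other two nodes, simply copy the corresponding directories: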
[hadoop@master ~]$ scp -r servers/ hadoop@slave1:~/
[hadoop@master ~]$ scp -r servers/ hadoop@slave4:~/

[hadoop@master ~]$ scp -r applications/solr/cloud hadoop@slave1:~/applications/solr/
[hadoop@master ~]$ scp -r applications/solr/cloud hadoop@slave4:~/applications/solr/

[hadoop@slave1 ~]$ mkdir -p applications/storage/cloud/data/
[hadoop@slave4 ~]$ mkdir -p applications/storage/cloud/data/
Start the other Solr server nodes:
[hadoop@slave1 ~]$ cd servers/apache-tomcat-7.0.42
[hadoop@slave1 apache-tomcat-7.0.42]$ bin/catalina.sh start

[hadoop@slave4 ~]$ cd servers/apache-tomcat-7.0.42
[hadoop@slave4 apache-tomcat-7.0.42]$ bin/catalina.sh start
Check the data state in the ZooKeeper ensemble:
[zk: master:2188(CONNECTED) 3] ls /live_nodes
[10.95.3.65:8888_solr-cloud, 10.95.3.61:8888_solr-cloud, 10.95.3.62:8888_solr-cloud]
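There are now 3 live nodes, but the SolrCloud cluster still shows no further information: visiting http://master:8888/solr-cloud/ gives the same page as in the figure above, with no SolrCloud data.
5. Create the Collection, Shards and replicas
  • Create the Collection and its initial Shards
Create the Collection directly through the REST API (the Collections API):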
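Each name under /live_nodes identifies one Solr node. Judging from the entries above, a name has the form host:port_context, so it can be mapped back to the node's base URL. The following is a minimal sketch; the class LiveNodes is hypothetical, and it assumes the context part is stored verbatim (it does not decode any characters Solr may have escaped in deeper context paths).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr: converts live-node names such as
// "10.95.3.61:8888_solr-cloud" (host:port_context) back into base URLs.
final class LiveNodes {

    static String toBaseUrl(String nodeName) {
        int colon = nodeName.indexOf(':');
        // The context starts after the first '_' that follows the port.
        int underscore = colon < 0 ? -1 : nodeName.indexOf('_', colon + 1);
        if (underscore < 0) {
            throw new IllegalArgumentException("Unexpected node name: " + nodeName);
        }
        String host = nodeName.substring(0, colon);
        String port = nodeName.substring(colon + 1, underscore);
        String context = nodeName.substring(underscore + 1);
        return "http://" + host + ":" + port + "/" + context;
    }

    // Parses zkCli output such as "[a, b, c]" into one base URL per live node.
    static List<String> toBaseUrls(String zkLsOutput) {
        String body = zkLsOutput.trim();
        if (body.startsWith("[") && body.endsWith("]")) {
            body = body.substring(1, body.length() - 1);
        }
        List<String> urls = new ArrayList<String>();
        for (String name : body.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                urls.add(toBaseUrl(trimmed));
            }
        }
        return urls;
    }
}
```

Fed the ls output above, toBaseUrls returns one URL per node, e.g. http://10.95.3.61:8888/solr-cloud for master.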
[hadoop@master ~]$ curl 'http://master:8888/solr-cloud/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=1'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">4103</int></lst><lst name="success"><lst><lst name="responseHeader"><int name="status">0</int><int name="QTime">3367</int></lst><str name="core">mycollection_shard2_replica1</str><str name="saved">/home/hadoop/applications/solr/cloud/multicore/solr.xml</str></lst><lst><lst name="responseHeader"><int name="status">0</int><int name="QTime">3280</int></lst><str name="core">mycollection_shard1_replica1</str><str name="saved">/home/hadoop/applications/solr/cloud/multicore/solr.xml</str></lst><lst><lst name="responseHeader"><int name="status">0</int><int name="QTime">3690</int></lst><str name="core">mycollection_shard3_replica1</str><str name="saved">/home/hadoop/applications/solr/cloud/multicore/solr.xml</str></lst></lst>
</response>
The parameters in the URL above mean:
name                 name of the Collection to create
numShards            number of shards
replicationFactor    number of copies of each shard (the leader counts as one)
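The same request can also be assembled in code. This is only a sketch: CollectionsApi and createUrl are hypothetical names, and the base URL is this cluster's http://master:8888/solr-cloud.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that assembles a Collections API CREATE request.
final class CollectionsApi {

    static String createUrl(String baseUrl, String name, int numShards, int replicationFactor) {
        if (numShards < 1 || replicationFactor < 1) {
            throw new IllegalArgumentException("numShards and replicationFactor must be >= 1");
        }
        return baseUrl + "/admin/collections?action=CREATE"
                + "&name=" + encode(name)
                + "&numShards=" + numShards
                + "&replicationFactor=" + replicationFactor;
    }

    // Percent-encodes a parameter value as UTF-8.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

createUrl("http://master:8888/solr-cloud", "mycollection", 3, 1) should produce exactly the URL passed to curl above.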
If the command completes without errors, a Collection named mycollection has been created, with one shard on each node. You can also check the state in ZooKeeper:
[zk: master:2188(CONNECTED) 5] ls /collections
[mycollection, collection1]
[zk: master:2188(CONNECTED) 6] ls /collections/mycollection
[leader_elect, leaders]
You can view the cluster's shard layout in the web admin page at http://master:8888/solr-cloud/#/~cloud, as shown in the figure:

As the figure above shows, the shards map to the nodes as follows:
shard3     10.95.3.61          master
shard1     10.95.3.62          slave1
shard2     10.95.3.65          slave4
In fact, on the master node you can see that the Solr configuration file (solr.xml) has changed:
[hadoop@master ~]$ cat applications/solr/cloud/multicore/solr.xml
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores defaultCoreName="collection1" host="${host:}" adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}" hostPort="8888" hostContext="${hostContext:solr-cloud}">
    <core loadOnStartup="true" shard="shard3" instanceDir="mycollection_shard3_replica1/" transient="false" name="mycollection_shard3_replica1" collection="mycollection"/>
  </cores>
</solr>
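In this Solr 4.x-style solr.xml, every core element records the collection and shard it belongs to, so a node's share of the cluster can be read directly from the file. Below is a minimal sketch using the JDK's DOM parser; SolrXmlCores is a hypothetical name, and the method simply maps each core name to "collection/shard".

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists the cores declared in a legacy (Solr 4.x) solr.xml.
final class SolrXmlCores {

    // Returns core name -> "collection/shard" for every <core> element, in file order.
    static Map<String, String> coresByName(String solrXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(solrXml.getBytes(StandardCharsets.UTF_8)));
            NodeList cores = doc.getElementsByTagName("core");
            Map<String, String> result = new LinkedHashMap<String, String>();
            for (int i = 0; i < cores.getLength(); i++) {
                Element core = (Element) cores.item(i);
                result.put(core.getAttribute("name"),
                        core.getAttribute("collection") + "/" + core.getAttribute("shard"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse solr.xml", e);
        }
    }
}
```

Applied to the file above, it would return a single entry, mycollection_shard3_replica1 mapped to mycollection/shard3.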
  • Create replicas
Next, replicate the initial shards that were just created.
shard1 already lives on slave1; to replicate it to master and slave4, run the following commands:
[hadoop@master ~]$ curl 'http://master:8888/solr-cloud/admin/cores?action=CREATE&collection=mycollection&name=mycollection_shard1_replica_2&shard=shard1'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1485</int></lst><str name="core">mycollection_shard1_replica_2</str><str name="saved">/home/hadoop/applications/solr/cloud/multicore/solr.xml</str>
</response>

[hadoop@master ~]$ curl 'http://master:8888/solr-cloud/admin/cores?action=CREATE&collection=mycollection&name=mycollection_shard1_replica_3&shard=shard1'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">2543</int></lst><str name="core">mycollection_shard1_replica_3</str><str name="saved">/home/hadoop/applications/solr/cloud/multicore/solr.xml</str>
</response>

[hadoop@slave4 ~]$ curl 'http://slave4:8888/solr-cloud/admin/cores?action=CREATE&collection=mycollection&name=mycollection_shard1_replica_4&shard=shard1'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">2405</int></lst><str name="core">mycollection_shard1_replica_4</str><str name="saved">/home/hadoop/applications/solr/cloud/multicore/solr.xml</str>
</response>
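As a result, shard1 (whose first copy is on slave1) now has two replicas on master, named mycollection_shard1_replica_2 and mycollection_shard1_replica_3, and one replica on slave4, named mycollection_shard1_replica_4.
The change is also visible in the directories on master and slave4: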
[hadoop@master ~]$ ll applications/solr/cloud/multicore/
total 24
drwxrwxr-x. 4 hadoop hadoop 4096 Aug  1 09:58 collection1
drwxrwxr-x. 3 hadoop hadoop 4096 Aug  1 15:41 mycollection_shard1_replica_2
drwxrwxr-x. 3 hadoop hadoop 4096 Aug  1 15:42 mycollection_shard1_replica_3
drwxrwxr-x. 3 hadoop hadoop 4096 Aug  1 15:23 mycollection_shard3_replica1
-rw-rw-r--. 1 hadoop hadoop  784 Aug  1 15:42 solr.xml
-rw-rw-r--. 1 hadoop hadoop 1004 Aug  1 10:02 zoo.cfg

[hadoop@slave4 ~]$ ll applications/solr/cloud/multicore/
total 20
drwxrwxr-x. 4 hadoop hadoop 4096 Aug  1 14:53 collection1
drwxrwxr-x. 3 hadoop hadoop 4096 Aug  1 15:44 mycollection_shard1_replica_4
drwxrwxr-x. 3 hadoop hadoop 4096 Aug  1 15:23 mycollection_shard2_replica1
-rw-rw-r--. 1 hadoop hadoop  610 Aug  1 15:44 solr.xml
-rw-rw-r--. 1 hadoop hadoop 1004 Aug  1 15:08 zoo.cfg
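Here, mycollection_shard3_replica1 and mycollection_shard2_replica1 are the shards generated automatically when the Collection was created, i.e. the first replica of each shard.
The web UI shows the state of shard1 more directly, as in the figure: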

Looking at the master node again, the Solr configuration file has changed once more:
[hadoop@master ~]$ cat applications/solr/cloud/multicore/solr.xml
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores defaultCoreName="collection1" host="${host:}" adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}" hostPort="8888" hostContext="${hostContext:solr-cloud}">
    <core loadOnStartup="true" shard="shard3" instanceDir="mycollection_shard3_replica1/" transient="false" name="mycollection_shard3_replica1" collection="mycollection"/>
    <core loadOnStartup="true" shard="shard1" instanceDir="mycollection_shard1_replica_2/" transient="false" name="mycollection_shard1_replica_2" collection="mycollection"/>
    <core loadOnStartup="true" shard="shard1" instanceDir="mycollection_shard1_replica_3/" transient="false" name="mycollection_shard1_replica_3" collection="mycollection"/>
  </cores>
</solr>
At this point, we have finished configuring a SolrCloud cluster on 3 physical nodes.

Indexing data

Based on the schema.xml defined earlier, we generated our own sample data set with the following code:
package org.shirdrn.solr.data;

import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Random;

public class BuildingSampleGenerator {

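     // The trailing 'Z' labels timestamps as UTC, so the formatter must use UTC.
     private final DateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
     {
          df.setTimeZone(java.util.TimeZone.getTimeZone("UTC"));
     }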
     private Random random = new Random();
    
     static String[] areas = {
          "北京", "上海", "深圳", "廣州", "天津", "重慶","成都",
          "銀川", "瀋陽", "大連", "吉林", "鄭州", "徐州", "蘭州",
          "東京", "紐約", "貴州", "長春", "大連", "武漢","南京",
          "海口", "太原", "濟南", "日照", "菏澤", "包頭", "松原"
     };
    
     // Last id handed out; ids must be unique because they are the documents' keys.
     private long pre = 0L;
     public synchronized long genId() {
          long current = System.nanoTime();
          // nanoTime() can return the same value on consecutive calls, so bump
          // past the previous id to keep every id unique and increasing.
          if(current <= pre) {
               current = pre + 1;
          }
          pre = current;
          return current;
     }
    
     public String genArea() {
          return areas[random.nextInt(areas.length)];
     }
    
     private int maxLatitude = 90;
     private int maxLongitude = 180;
    
     public Coordinate genCoordinate() {
          int beforeDot = random.nextInt(maxLatitude);
          double afterDot = random.nextDouble();
          double lat = beforeDot + afterDot;
         
          beforeDot = random.nextInt(maxLongitude);
          afterDot = random.nextDouble();
          double lon = beforeDot + afterDot;
         
          return new Coordinate(lat, lon);
     }
    
     private Random random1 = new Random(System.currentTimeMillis());
     private Random random2 = new Random(2 * System.currentTimeMillis());
     public int genFloors() {
          return 1 + random1.nextInt(50) + random2.nextInt(50);
     }
    
     public class Coordinate {
         
          double latitude;
          double longitude;
         
          public Coordinate() {
               super();
          }
         
          public Coordinate(double latitude, double longitude) {
               super();
               this.latitude = latitude;
               this.longitude = longitude;
          }

          public double getLatitude() {
               return latitude;
          }

          public double getLongitude() {
               return longitude;
          }
     }
    
    
     static int[] signs = {-1, 1};
     public int genTemperature() {
          return signs[random.nextInt(2)] * random.nextInt(81);
     }
    
     static String[] codes = {"A", "B", "C", "D", "E", "F", "G", "H", "I",
          "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V",
          "W", "X", "Y", "Z"};
     public String genCode() {
          return codes[random.nextInt(codes.length)];
     }
    
     static int[] types = {0, 1, 2, 3};
     public int genBuildingType() {
          return types[random.nextInt(types.length)];
     }
    
     static String[] categories = {
          "辦公建築", "教育建築", "商業建築", "文教建築", "醫衛建築",
          "住宅", "宿舍", "公寓", "工業建築"};
     public String genBuildingCategory() {
          return categories[random.nextInt(categories.length)];
     }
    
     public void generate(String file, int count) throws IOException {
          BufferedWriter w = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file), "UTF-8"));
          w.write("id,area,building_type,category,temperature,code,latitude,longitude,when");
          w.newLine();
         
         
          for(int i=0; i<count; i++) {
               String when = df.format(new Date());
              
               StringBuffer sb = new StringBuffer();
               sb.append(genId()).append(",")
                    .append("\"").append(genArea()).append("\"").append(",")
                    .append(genBuildingType()).append(",")
                    .append("\"").append(genBuildingCategory()).append("\"").append(",")
                    .append(genTemperature()).append(",")
                    .append(genCode()).append(",");
               Coordinate coord = genCoordinate();
               sb.append(coord.latitude).append(",")
                    .append(coord.longitude).append(",")
                    .append("\"").append(when).append("\"");
               w.write(sb.toString());
               w.newLine();
          }
          w.close();
          System.out.println("Finished: file=" + file);
     }
    
     public static void main(String[] args) throws Exception {
          BuildingSampleGenerator gen = new BuildingSampleGenerator();
          String file = "E:\\Develop\\eclipse-jee-kepler\\workspace\\solr-data\\building_files";
          // Ten files, building_files_100w_00.csv .. _09.csv; "100w" means 1,000,000
          // rows per file, which matches the ~108 MB file sizes listed below.
          for(int i=0; i<=9; i++) {
               String f = file + "_100w_0" + i + ".csv";
               gen.generate(f, 1000000);
          }
     }

}
The generated files:
[hadoop@master solr-data]$ ll building_files_100w*
-rw-rw-r--. 1 hadoop hadoop 109025853 Jul 26 14:05 building_files_100w_00.csv
-rw-rw-r--. 1 hadoop hadoop 108015504 Jul 26 10:53 building_files_100w_01.csv
-rw-rw-r--. 1 hadoop hadoop 108022184 Jul 26 11:00 building_files_100w_02.csv
-rw-rw-r--. 1 hadoop hadoop 108016854 Jul 26 11:00 building_files_100w_03.csv
-rw-rw-r--. 1 hadoop hadoop 108021750 Jul 26 11:00 building_files_100w_04.csv
-rw-rw-r--. 1 hadoop hadoop 108017496 Jul 26 11:00 building_files_100w_05.csv
-rw-rw-r--. 1 hadoop hadoop 108016193 Jul 26 11:00 building_files_100w_06.csv
-rw-rw-r--. 1 hadoop hadoop 108023537 Jul 26 11:00 building_files_100w_07.csv
-rw-rw-r--. 1 hadoop hadoop 108014684 Jul 26 11:00 building_files_100w_08.csv
-rw-rw-r--. 1 hadoop hadoop 108022044 Jul 26 11:00 building_files_100w_09.csv
The data files look like this:
[hadoop@master solr-data]$ head building_files_100w_00.csv
id,area,building_type,category,temperature,code,latitude,longitude,when
18332617097417,"廣州",2,"醫衛建築",61,N,5.160762478343409,62.92919119315037,"2013-07-26T14:05:55.832Z"
18332617752331,"成都",1,"教育建築",10,Q,77.34792453477195,72.59812030045762,"2013-07-26T14:05:55.833Z"
18332617815833,"大連",0,"教育建築",18,T,81.47569061530493,0.2177194388096203,"2013-07-26T14:05:55.833Z"
18332617903711,"廣州",0,"辦公建築",31,D,51.85825084513671,13.60710950097155,"2013-07-26T14:05:55.833Z"
18332617958555,"深圳",3,"商業建築",5,H,22.181374031472675,119.76001810254823,"2013-07-26T14:05:55.833Z"
18332618020454,"濟南",3,"公寓",-65,L,84.49607030736806,29.93095171443135,"2013-07-26T14:05:55.834Z"
18332618075939,"北京",2,"住宅",-29,J,86.61660177436184,39.20847527640485,"2013-07-26T14:05:55.834Z"
18332618130141,"菏澤",0,"醫衛建築",24,J,70.57574551258345,121.21977908377244,"2013-07-26T14:05:55.834Z"
18332618184343,"徐州",2,"辦公建築",31,W,0.10129771041097524,153.40533210345387,"2013-07-26T14:05:55.834Z"
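Because the generator never writes commas or quotes inside a value, a line of these files can be split back into its nine fields with a plain split, for instance to spot-check the files. The sketch below relies on that assumption and is not a general CSV parser (a real one must handle quoted commas and escaped quotes); BuildingCsv is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the files produced by BuildingSampleGenerator.
final class BuildingCsv {

    // Splits one data line into fields and strips surrounding double quotes.
    // Assumes no field contains a comma or an escaped quote.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<String>();
        for (String raw : line.split(",", -1)) {
            String f = raw.trim();
            if (f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")) {
                f = f.substring(1, f.length() - 1);
            }
            fields.add(f);
        }
        return fields;
    }
}
```

For the first data row above, the result is id, area, building_type, category, temperature, code, latitude, longitude and when, in the header's order. Alternatively, Solr's CSV update handler can load such files directly, provided it is enabled in the collection's solrconfig.xml.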
Next, we index data into the SolrCloud cluster we just built, using a simple client:
package org.shirdrn.solr.indexing;

import java.io.IOException;
import java.net.MalformedURLException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.shirdrn.solr.data.BuildingSampleGenerator;
import org.shirdrn.solr.data.BuildingSampleGenerator.Coordinate;

public class CloudSolrClient {

     private CloudSolrServer cloudSolrServer;

     public synchronized void open(final String zkHost, final String  defaultCollection,
               int  zkClientTimeout, final int zkConnectTimeout) {
          if (cloudSolrServer == null) {
               try {
                    cloudSolrServer = new CloudSolrServer(zkHost);
                    cloudSolrServer.setDefaultCollection(defaultCollection);
                    cloudSolrServer.setZkClientTimeout(zkClientTimeout);
                    cloudSolrServer.setZkConnectTimeout(zkConnectTimeout);
               } catch (MalformedURLException e) {
                     System.err.println("The zkHost address is not valid; it must be of the form host:port");
                    e.printStackTrace();
               } catch (Exception e) {
                    e.printStackTrace();
               }
          }
     }

     public void addDoc(long id, String area, int buildingType, String category,
               int temperature, String code, double latitude, double longitude, String when) {
          try {
               SolrInputDocument doc = new SolrInputDocument();
               doc.addField("id", id);
               doc.addField("area", area);
               doc.addField("building_type", buildingType);
               doc.addField("category", category);
               doc.addField("temperature", temperature);
               doc.addField("code", code);
               doc.addField("latitude", latitude);
               doc.addField("longitude", longitude);
               doc.addField("when", when);
               // Only send the document here: committing after every add is very
               // slow, so commits are issued in batches by the caller (see main).
               cloudSolrServer.add(doc);
          } catch (SolrServerException e) {
               System.err.println("Add docs Exception !!!");
               e.printStackTrace();
          } catch (IOException e) {
               e.printStackTrace();
          } catch (Exception e) {
               System.err.println("Unknown exception!");
               e.printStackTrace();
          }
     }

     public void commit() {
          try {
               cloudSolrServer.commit();
          } catch (SolrServerException e) {
               e.printStackTrace();
          } catch (IOException e) {
               e.printStackTrace();
          }
     }

     public void close() {
          if (cloudSolrServer != null) {
               // Closes the ZooKeeper connection and HTTP resources held by the client.
               cloudSolrServer.shutdown();
          }
     }

     public static void main(String[] args) {
          final String zkHost = "master:2188";
          final String defaultCollection = "mycollection";
          final int zkClientTimeout = 20000;
          final int zkConnectTimeout = 1000;

          CloudSolrClient client = new CloudSolrClient();
          client.open(zkHost, defaultCollection, zkClientTimeout, zkConnectTimeout);

          BuildingSampleGenerator gen = new BuildingSampleGenerator();
          final DateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
          // The trailing 'Z' marks the timestamp as UTC, so format it in UTC.
          df.setTimeZone(java.util.TimeZone.getTimeZone("UTC"));

          for(int i = 0; i < 10000; i++) {
               long id = gen.genId();
               String area = gen.genArea();
               int buildingType = gen.genBuildingType();
               String category = gen.genBuildingCategory();
               int temperature = gen.genTemperature();
               String code = gen.genCode();
               Coordinate coord = gen.genCoordinate();
               double latitude = coord.getLatitude();
               double longitude = coord.getLongitude();
               String when = df.format(new Date());
               client.addDoc(id, area, buildingType, category, temperature, code, latitude, longitude, when);
               // Commit every 1000 documents rather than after each one.
               if((i + 1) % 1000 == 0) {
                    client.commit();
               }
          }
          client.commit();
          client.close();
     }

}
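Afterwards, the SolrCloud admin page (or the servers themselves) shows that the indexed data is spread fairly evenly across the shards.
You can also manage each shard's replicas from the web admin page. For example, if a shard has too many replicas, you can remove one by unloading it on the page. Unloading only deletes the replica's metadata from the state maintained in ZooKeeper; the replica's data on that node is not deleted. The replica simply goes offline and stops serving requests.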
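The even spread comes from hash-based routing. For a collection created with numShards, Solr 4.x's default compositeId router hashes each document id with MurmurHash3 and gives every shard an equal, contiguous slice of the 32-bit hash range, so even the consecutive, nanoTime-based ids used here scatter across shards. The sketch below illustrates only the idea: it applies MurmurHash3's 32-bit finalizer (fmix32) to the numeric id as a stand-in for Solr's hash of the id string, so it will not reproduce Solr's actual shard assignments. HashRangeRouting and its methods are hypothetical.

```java
// Illustration of hash-range routing; NOT Solr's exact hash or range boundaries.
final class HashRangeRouting {

    // MurmurHash3 32-bit finalizer: spreads sequential inputs across the int range.
    static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // 0-based index of the equal-width slice of [Integer.MIN_VALUE, Integer.MAX_VALUE]
    // that contains hash.
    static int shardFor(int hash, int numShards) {
        long offset = (long) hash - Integer.MIN_VALUE;   // 0 .. 2^32 - 1
        return (int) ((offset * numShards) >>> 32);      // floor(offset * n / 2^32)
    }

    // Counts how many of `count` consecutive ids land in each shard.
    static int[] distribution(long firstId, int count, int numShards) {
        int[] perShard = new int[numShards];
        for (long id = firstId; id < firstId + count; id++) {
            int h = fmix32((int) (id ^ (id >>> 32)));
            perShard[shardFor(h, numShards)]++;
        }
        return perShard;
    }
}
```

For example, distribution(18332617097417L, 30000, 3) returns three counts that add up to 30000, each roughly a third of the total, although the ids are consecutive.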

Searching data

Now we can run a search with the following query:
http://master:8888/solr-cloud/mycollection/select?q=北京 紐約&fl=*&fq=category:公寓&fq=building_type:2&start=0&rows=10
The search results:
<response>
	<lst name="responseHeader">
		<int name="status">0</int>
		<int name="QTime">570</int>
	</lst>
	<result name="response" numFound="201568" start="0" maxScore="1.5322487">
		<doc>
			<long name="id">37109751480918</long>
			<long name="_version_">1442164237143113728</long>
		</doc>
		<doc>
			<long name="id">37126929150371</long>
			<long name="_version_">1442164255154503680</long>
		</doc>
		<doc>
			<long name="id">37445266827945</long>
			<long name="_version_">1442164588949798912</long>
		</doc>
		<doc>
			<long name="id">37611390043867</long>
			<long name="_version_">1442164763138195456</long>
		</doc>
		<doc>
			<long name="id">37892268870281</long>
			<long name="_version_">1442165057653833728</long>
		</doc>
		<doc>
			<long name="id">89820941817153</long>
			<long name="_version_">1442219517734289408</long>
		</doc>
		<doc>
			<long name="id">89825667635450</long>
			<long name="_version_">1442219522665742336</long>
		</doc>
		<doc>
			<long name="id">89830029550692</long>
			<long name="_version_">1442219527207124993</long>
		</doc>
		<doc>
			<long name="id">93932235463589</long>
			<long name="_version_">1442223828610580480</long>
		</doc>
		<doc>
			<long name="id">93938975733467</long>
			<long name="_version_">1442223835684274177</long>
		</doc>
	</result>
</response>
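The query URL above is written with a literal space and Chinese characters. A browser encodes these automatically, but with curl or from code each parameter value has to be percent-encoded as UTF-8 (and, under Tomcat, the HTTP connector must decode URIs as UTF-8, typically via URIEncoding="UTF-8" in server.xml, for the terms to arrive intact). A minimal sketch with the JDK's URLEncoder; SelectUrl is a hypothetical name, and parameters are passed as ordered pairs so that fq can repeat.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that builds a /select URL with properly encoded parameters.
final class SelectUrl {

    // Builds collectionUrl + "/select?k1=v1&k2=v2...", encoding every value as UTF-8.
    static String build(String collectionUrl, String[][] params) {
        StringBuilder sb = new StringBuilder(collectionUrl).append("/select?");
        for (int i = 0; i < params.length; i++) {
            if (i > 0) {
                sb.append('&');
            }
            sb.append(params[i][0]).append('=').append(encode(params[i][1]));
        }
        return sb.toString();
    }

    private static String encode(String value) {
        try {
            // Space becomes '+', ':' becomes %3A, non-ASCII becomes %XX per UTF-8 byte.
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For the search above, the pairs would be {"q", "北京 紐約"}, {"fl", "*"}, {"fq", "category:公寓"}, {"fq", "building_type:2"}, {"start", "0"}, {"rows", "10"}; q is then sent as q=%E5%8C%97%E4%BA%AC+%E7%B4%90%E7%B4%84.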
The corresponding log entries look like this:
2013-08-05 18:38:26.814 [http-bio-8888-exec-228] INFO  org.apache.solr.core.SolrCore  – [mycollection_shard1_0_replica2] webapp=/solr-cloud path=/select params={NOW=1375699145633&shard.url=10.95.3.62:8888/solr-cloud/mycollection_shard1_0_replica1/|10.95.3.61:8888/solr-cloud/mycollection_shard1_0_replica3/&fl=id,score&start=0&q=北京+紐約&distrib=false&wt=javabin&isShard=true&fsv=true&fq=category:公寓&fq=building_type:2&version=2&rows=10} hits=41529 status=0 QTime=102

2013-08-05 18:39:06.203 [http-bio-8888-exec-507] INFO  org.apache.solr.core.SolrCore  – [mycollection_shard3_replica1] webapp=/solr-cloud path=/select params={fl=*&start=0&q=北京+紐約&fq=category:公寓&fq=building_type:2&rows=10} hits=201568 status=0 QTime=570
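These two log lines show SolrCloud's distributed search. The node that received the request queried each shard with distrib=false and fl=id,score (first line, 41529 hits on one shard) and then merged the per-shard answers; the top-level request (second line) reports hits=201568, the total across all shards, matching numFound in the response. The sketch below illustrates only the merge step in simplified form: keep the global top rows by score and sum the hit counts. It is not Solr's implementation, which also supports sorting by fields and paging, and runs a second request to fetch the stored fields of the winning ids. ShardMerge and Hit are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Simplified illustration of merging per-shard results; not Solr's code.
final class ShardMerge {

    // A (document id, score) pair as returned by one shard's first-phase request.
    static final class Hit {
        final long id;
        final float score;

        Hit(long id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    // Merges per-shard hit lists into the global top `rows`, highest score first.
    static List<Hit> mergeTopN(List<List<Hit>> perShard, int rows) {
        Comparator<Hit> byScore = Comparator.comparingDouble(h -> h.score);
        // Min-heap: the head is the weakest hit currently in the global top `rows`.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Math.max(1, rows), byScore);
        for (List<Hit> hits : perShard) {
            for (Hit h : hits) {
                heap.offer(h);
                if (heap.size() > rows) {
                    heap.poll(); // evict the lowest score
                }
            }
        }
        List<Hit> top = new ArrayList<>(heap);
        top.sort(byScore.reversed());
        return top;
    }

    // The overall numFound is the sum of the shards' hit counts.
    static long totalHits(long... perShardHits) {
        long sum = 0;
        for (long n : perShardHits) {
            sum += n;
        }
        return sum;
    }
}
```

A min-heap of size rows keeps the merge at O(total hits × log rows) without sorting every shard's list together.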


Common problems

1. When creating a Collection while 4 nodes were registered in the ZooKeeper ensemble, I ran the following command:
[hadoop@slave1 multicore]$ curl 'http://slave1:8888/solr-cloud/admin/collections?action=CREATE&name=tinycollection&numShards=2&replicationFactor=3'
It failed with an exception:
<?xml version="1.0" encoding="UTF-8"?>
<response>
     <lst name="responseHeader">
          <int name="status">400</int>
          <int name="QTime">81</int>
     </lst>
     <str name="Operation createcollection caused exception:">org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Cannot create collection tinycollection. Value of maxShardsPerNode is 1, and the number of live nodes is 4. This allows a maximum of 4 to be created. Value of numShards is 2 and value of replicationFactor is 3. This requires 6 shards to be created (higher than the allowed number)</str>
     <lst name="exception">
          <str name="msg">Cannot create collection tinycollection. Value of maxShardsPerNode is 1, and the number of live nodes is 4. This allows a maximum of 4 to be created. Value of numShards is 2 and value of replicationFactor is 3. This requires 6 shards to be created (higher than the allowed number)</str>
          <int name="rspCode">400</int>
     </lst>
     <lst name="error">
          <str name="msg">Cannot create collection tinycollection. Value of maxShardsPerNode is 1, and the number of live nodes is 4. This allows a maximum of 4 to be created. Value of numShards is 2 and value of replicationFactor is 3. This requires 6 shards to be created (higher than the allowed number)</str>
          <int name="code">400</int>
     </lst>
</response>
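The exception shows that 4 nodes are available, but the Collection was requested with 2 shards and a replication factor of 3, i.e. 6 cores; with maxShardsPerNode at 1 (one core per node), that requires at least 6 nodes. Reducing the replication factor fixes this: with replicationFactor=2 there are two copies of each shard in total (the leader and one replica), 4 cores in all, and the Collection is created without errors.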
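The check behind this error is simple arithmetic, reconstructed here from the message itself: CREATE needs numShards × replicationFactor cores, and the cluster may host at most live nodes × maxShardsPerNode of them. Besides lowering replicationFactor, the message points at the other lever: raising maxShardsPerNode (a parameter of the same CREATE call), e.g. to 2, which allows 8 cores on these 4 nodes at the cost of packing more cores onto each node. CollectionCapacity is a hypothetical name used for illustration.

```java
// Hypothetical helper reproducing the capacity check from the error message.
final class CollectionCapacity {

    // Cores the CREATE call needs: each shard gets replicationFactor copies, leader included.
    static int requiredCores(int numShards, int replicationFactor) {
        return numShards * replicationFactor;
    }

    // Cores the cluster may host for the collection: at most maxShardsPerNode per live node.
    static int allowedCores(int liveNodes, int maxShardsPerNode) {
        return liveNodes * maxShardsPerNode;
    }

    static boolean canCreate(int liveNodes, int maxShardsPerNode, int numShards, int replicationFactor) {
        return requiredCores(numShards, replicationFactor) <= allowedCores(liveNodes, maxShardsPerNode);
    }
}
```

With the values from the error, canCreate(4, 1, 2, 3) is false (6 > 4), while canCreate(4, 1, 2, 2) and canCreate(4, 2, 2, 3) are both true.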

