Search engine: apache-solr

SOLR

 

1. Solr server setup

Java environment setup

Download the Linux JDK 6 from this website:

http://java.sun.com/javase/downloads/index.jsp

After installing the JDK, edit /etc/profile and add these lines to the end of the file:

JAVA_HOME=/usr/java/jdk1.6.0_16

PATH=$JAVA_HOME/bin:$PATH

CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export JAVA_HOME

export PATH

export CLASSPATH

 

/usr/java/jdk1.6.0_16 is the folder where the JDK is installed. Change it if you installed the JDK somewhere else.

 Solr setup

1. Download Solr (apache-solr-1.3.0.zip) from this website:

http://ftp.kddilabs.jp/infosystems/apache/lucene/solr/

 

2. Install Solr with the following steps:

#unzip -q apache-solr-1.3.0.zip
#cd apache-solr-1.3.0/example/
# java -jar start.jar
To confirm that Solr is running, load http://localhost:8983/solr/admin/ in a web browser. This page is the main starting point for administering Solr.

The Solr tutorial is at http://lucene.apache.org/solr/tutorial.html.

2. Search Apache Solr with PHP

This tutorial walks through a PHP Solr client example:

http://www.ibm.com/developerworks/opensource/library/os-php-apachesolr/

We use the PHP Solr Client to access the Solr server. Download it from this website: http://code.google.com/p/solr-php-client/downloads/list

 

Change the default Solr index schema

The Solr index schema is defined in “apache-solr-1.3.0/example/solr/conf/schema.xml”.

This is a snippet of the default schema:

<schema name="example" version="1.1">
 ...
 <fields>
 <field name="id" type="string" indexed="true" stored="true" required="true" /> 
   <field name="sku" type="textTight" indexed="true" stored="true" omitNorms="true"/>
   <field name="name" type="text" indexed="true" stored="true"/>
 ...
 </fields>
 <uniqueKey>id</uniqueKey>
 ...
 <defaultSearchField>text</defaultSearchField>
 ...
</schema>

Edit the field elements and the default search field as shown below:

<field name="id" type="string" indexed="true" stored="true" required="true" />

 <field name="product_name" type="text" indexed="true" stored="true"/>

<defaultSearchField>product_name</defaultSearchField>

To apply the change, stop the running Solr server and start it again from the example folder:

#java -jar start.jar

 

Create index by PHP

With the PHP Solr Client, we can access Solr easily. This is an example of how to create an index with PHP.

<?php

require_once 'Apache/Solr/Service.php';

// 10.60.0.111 is the IP address of the Solr server.
$solr = new Apache_Solr_Service('10.60.0.111', '8983', '/solr');

if (!$solr->ping()) {
    // Stop here: indexing cannot work without a reachable server.
    exit("service not responding");
}
echo("solr Service is available<br />");

// Documents to index, keyed by an arbitrary label.
$parts = array(
    '1' => array(
        'id' => 'a123',
        'product_name' => 'garoontest'
    ),
    '2' => array(
        'id' => 'a456',
        'product_name' => 'share360,test'
    )
);

$documents = array();

foreach ($parts as $item => $fields) {
    $part = new Apache_Solr_Document();
    foreach ($fields as $key => $value) {
        if (is_array($value)) {
            // Multi-valued field: add each value separately.
            foreach ($value as $datum) {
                $part->setMultiValue($key, $datum);
            }
        } else {
            $part->$key = $value;
        }
    }
    $documents[] = $part;
}

try {
    $solr->addDocuments($documents);
    $solr->commit();
    $solr->optimize();
} catch (Exception $e) {
    echo $e->getMessage();
}

?>

Search index by PHP

This is an example of searching the index with PHP:

<?php

require_once 'Apache/Solr/Service.php';

$solr = new Apache_Solr_Service('10.60.0.111', '8983', '/solr');

if (!$solr->ping()) {
    // Stop here: searching cannot work without a reachable server.
    exit("service not responding");
}
echo("success");

$offset = 0;   // index of the first result to return
$limit = 10;   // maximum number of results to return
$query = "garoon";

$response = $solr->search($query, $offset, $limit);

if ($response->getHttpStatus() == 200) {
    if ($response->response->numFound > 0) {
        echo "$query <br />";

        foreach ($response->response->docs as $doc) {
            echo "id: " . $doc->id . " product_name: " . $doc->product_name . " --";
            echo '<br />';
        }
        echo '<br />';
    }
} else {
    echo $response->getHttpStatusMessage();
}

?>

Delete index by PHP

<?php

require_once 'Apache/Solr/Service.php';

// 10.60.0.111 is the IP address of the Solr server.
$solr = new Apache_Solr_Service('10.60.0.111', '8983', '/solr');

if (!$solr->ping()) {
    // Stop here: deleting cannot work without a reachable server.
    exit("service not responding <br />");
}
echo("solr Service is available<br />");

$response = $solr->deleteById("a123");
echo($response->getHttpStatusMessage());

// The deletion only becomes visible to searches after a commit.
$solr->commit();

?>

Update index by PHP

To update a document in the index, there are two methods:

     Method 1: delete the document by id, and then add a new one to the index.

     Method 2: add the document directly with the add method. Because id is the unique key field, the Solr server overwrites the old document with the new one.
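
Method 2 can be sketched as follows. This is a minimal example rather than a definitive implementation: it assumes the same PHP Solr Client and server address used in the earlier examples, that id is the uniqueKey in schema.xml, and that a document with id a123 is already indexed. The new product_name value is made up for illustration.

```php
<?php

require_once 'Apache/Solr/Service.php';

// Same server address as in the earlier examples.
$solr = new Apache_Solr_Service('10.60.0.111', '8983', '/solr');

// Build a replacement document that reuses the id of an existing one.
$doc = new Apache_Solr_Document();
$doc->id = 'a123';
$doc->product_name = 'garoontest-renamed'; // hypothetical new value

try {
    // Same id as an indexed document, so Solr replaces the old one.
    $solr->addDocument($doc);
    $solr->commit();
} catch (Exception $e) {
    echo $e->getMessage();
}

?>
```

Searching for the old product_name after the commit should no longer return a123, because the whole document is replaced, not merged field by field.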

How can Solr support full-text search in Chinese, Japanese and English? Apache provides a CJK library for this purpose; for details see: http://chaifeng.com/blog/2008/01/_apache_solr.html

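By default, Apache Solr does not support Chinese search: if a document contains Chinese, it can only be found by searching for a complete Chinese sentence.
The following uses the Apache Solr demo application as an example. Note: the parts to change were marked in bold in the original post.
Find the following three lines:
     <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
       <analyzer type="index">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
Change them to: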
     <fieldType name="text" class="solr.TextField">
       <analyzer type="index" class="org.apache.lucene.analysis.cjk.CJKAnalyzer">
         <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>
Find the following two lines:
       <analyzer type="query">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
Change them to:
       <analyzer type="query" class="org.apache.lucene.analysis.cjk.CJKAnalyzer">
         <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>
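After making these changes, restart Apache Solr and Chinese text becomes searchable. Documents that were imported earlier must be imported again.
Remember that the original configuration contains positionIncrementGap="100"; it must be removed, otherwise an exception is thrown.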
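
For intuition about why this change helps: Lucene's CJKTokenizer indexes CJK text as overlapping two-character tokens, so a query no longer has to match a whole whitespace-delimited sentence. The sketch below only imitates that bigram splitting in PHP. cjkBigrams is a hypothetical helper, not part of Solr or the PHP Solr Client, and it ignores how the real tokenizer treats Latin text, digits and punctuation. It assumes the mbstring extension is available and the file is saved as UTF-8.

```php
<?php

// Illustration only: split a string into overlapping two-character tokens.
function cjkBigrams($text)
{
    $length = mb_strlen($text, 'UTF-8');
    if ($length < 2) {
        // A single character is its own token; an empty string has none.
        return $length === 1 ? array($text) : array();
    }
    $tokens = array();
    for ($i = 0; $i < $length - 1; $i++) {
        $tokens[] = mb_substr($text, $i, 2, 'UTF-8');
    }
    return $tokens;
}

print_r(cjkBigrams('全文搜索')); // tokens: 全文, 文搜, 搜索

?>
```

With tokens like these in the index, a two-character query such as 搜索 can match a document that contains it anywhere in a longer sentence.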

Note: when programming in PHP, make sure the source code files are saved with UTF-8 encoding; otherwise creating the index will fail.
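
The same concern applies to field values that come from a database or file in a legacy encoding. One possible guard, sketched under assumptions: toUtf8 is a hypothetical helper, the source encoding (CP936 here) must be known in advance, and the mbstring extension is required. The validity check is a heuristic, since a legacy-encoded string can occasionally also be valid UTF-8.

```php
<?php

// Sketch only: return $value as UTF-8, converting from $sourceEncoding
// when the bytes are not already valid UTF-8.
function toUtf8($value, $sourceEncoding)
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($value, 'UTF-8', $sourceEncoding);
}

// "\xB2\xE2\xCA\xD4" is the GBK/CP936 byte sequence for 测试.
$name = toUtf8("garoon\xB2\xE2\xCA\xD4", 'CP936');
echo $name, "\n";

?>
```

A value passed through such a helper can then be assigned to a document field, for example $part->product_name, before calling addDocuments.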
