My Full-Text Search Wrapper: The Lucene Part

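    I recently spent evenings after work and weekends putting together my own full-text search wrapper (based on Lucene and Solr). Here is a rough outline of the approach:

    1. First, the index object, which also serves as the query VO. It wraps several common fields (primary key, owner ID, owner name, link to the detail page, creation time, and so on). Fields that belong to individual modules (title, content, email, and so on) are declared by each subclass and carried in a map.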

SearchBean.java

The fields are as follows:

/******** Common fields begin ***********/
    /**
     * Content to search for
     */
    protected String keyword;
    /**
     * Owner ID
     */
    protected String owerId;
    /**
     * Owner name
     */
    protected String owerName;
    /**
     * Value of the unique identifier of the indexed object
     */
    protected String id;
    /**
     * Link to the detail page of a retrieved object
     */
    protected String link;
    /**
     * Creation time
     */
    protected String createDate;
    /**
     * Index type
     */
    protected String indexType;

    // setters and getters omitted
/******** Common fields end ***********/

/************* Other fields begin ************/
    /**
     * Map of retrieved field names to their values
     */
    private Map<String, String> searchValues;

    /**
     * The value object
     */
    private Object object;

    /**
     * Returns the values of the doIndexFields fields retrieved by a search
     *
     * @return
     */
    public Map<String, String> getSearchValues() {
        return searchValues;
    }

    /**
     * Sets the values of the doIndexFields fields retrieved by a search
     *
     * @param searchValues
     */
    public void setSearchValues(Map<String, String> searchValues) {
        this.searchValues = searchValues;
    }
    /******************** Other fields end *******************/

The abstract methods are as follows:


/***************** Abstract methods begin ******************/
    /**
     * Returns the fields to search on
     *
     * @return
     */
    public abstract String[] getDoSearchFields();

    /**
     * Returns the fields to index
     *
     * @return
     */
    public abstract String[] getDoIndexFields();

    /**
     * Initializes the common fields of the SearchBean (index fields that every object must provide)
     * @throws Exception
     */
    public abstract void initPublicFields() throws Exception;

    /**
     * Returns the index type
     * 
     * @return
     */
    public abstract String getIndexType();
    /***************** Abstract methods end ********************/
Common methods:



/******************* Common methods begin **********************/
    /**
     * Returns a map of field name to value for the fields to be indexed
     *
     * @return
     */
    public Map<String, String> getIndexFieldValues() {
        if(this.object == null){
            logger.warn("given object is null!");
            return Collections.emptyMap();
        }

        String[] doIndexFields = this.getDoIndexFields();
        if(doIndexFields == null || doIndexFields.length < 1){
            logger.debug("given no doIndexFields!");
            return Collections.emptyMap();
        }

        Map<String, String> extInfo = new HashMap<String, String>();
        for(String f : doIndexFields){
            String value = getValue(f, object);
            extInfo.put(f, value);
        }

        return extInfo;
    }

    /**
     * Reads the value of one field from an object and converts it to a String
     *
     * @param field         field name
     * @param obj           object to read from
     * @return
     */
    private String getValue(String field, Object obj){
        if(StringUtils.isEmpty(field)){
            logger.warn("field is empty!");
            return StringUtils.EMPTY;
        }

        String result = StringUtils.EMPTY;
        try {
            // Read from the obj parameter rather than the instance field
            Object value = ObjectUtils.getFieldValue(obj, field);
            if (value == null)
                result = StringUtils.EMPTY;
            else if (value instanceof String)
                result = (String) value;
            // java.util.Collection and java.util.Map values are dumped reflectively
            else if (value instanceof Collection || value instanceof Map)
                result = ToStringBuilder.reflectionToString(value);
            else if (value instanceof Date)
                result = DateUtils.formatDate((Date) value);
            else
                result = value.toString();

        } catch (IllegalAccessException e) {
            logger.error("can not find a value for field '{}' in object class '{}'!", field, obj.getClass());
        }

        return result;
    }

    /**
     * Must be called before creating an index: sets the object whose index will be created.
     *
     * @param object            the object to be indexed
     */
    public void setObject(Object object){
        this.object = object;
    }

    /**
     * Returns the object to be indexed.
     *
     * @return
     */
    public Object getObject(){
        return this.object;
    }
    /*************** Common methods end *************/
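
The reflective lookup in getValue depends on ObjectUtils.getFieldValue, whose source is not shown in this post. As a rough, JDK-only sketch of the same idea (the names FieldValueReader and Message are hypothetical and not part of my wrapper, and the date pattern is an assumption), reading named fields from an arbitrary object and turning them into strings could look like this:

```java
import java.lang.reflect.Field;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the reflective helper used by SearchBean; JDK only. */
final class FieldValueReader {

    /** Reads one field (private fields included) and renders it as a String; never returns null. */
    static String readAsString(Object target, String fieldName) {
        if (target == null || fieldName == null || fieldName.isEmpty()) {
            return "";
        }
        Field f = findField(target.getClass(), fieldName);
        if (f == null) {
            return "";                      // unknown field: index an empty value
        }
        try {
            f.setAccessible(true);
            Object value = f.get(target);
            if (value == null) {
                return "";
            }
            if (value instanceof Date) {
                // Assumed pattern; the original delegates to DateUtils.formatDate
                return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format((Date) value);
            }
            return value.toString();        // String, Collection, Map, numbers, ...
        } catch (IllegalAccessException e) {
            return "";
        }
    }

    /** Walks up the class hierarchy until a field with the given name is found. */
    private static Field findField(Class<?> type, String name) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredField(name);
            } catch (NoSuchFieldException ignored) {
                // keep looking in the superclass
            }
        }
        return null;
    }

    /** Builds the field name to value map, in the order the names were given. */
    static Map<String, String> readAll(Object target, String... fieldNames) {
        Map<String, String> values = new LinkedHashMap<String, String>();
        for (String name : fieldNames) {
            values.put(name, readAsString(target, name));
        }
        return values;
    }
}

/** Hypothetical domain object used only for the demonstration. */
final class Message {
    private final String title = "hello";
    private final int replies = 3;
    private final String email = null;
}
```

Calling FieldValueReader.readAll(new Message(), "title", "replies", "email") yields a map in which null or missing fields are represented by empty strings, which is the behaviour getIndexFieldValues relies on when it hands values to the engine.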

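2. There are many open-source and closed-source search engines that can be used in a project, so I wrote an interface plus an abstract class that factors out some common methods. Put the engine-specific code for indexing, searching, and so on in a subclass of that abstract class, and you can switch target engines freely. Here are the interface and the abstract class: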

SearchEngine.java

package com.message.base.search.engine;

import com.message.base.pagination.PaginationSupport;
import com.message.base.search.SearchBean;

import java.util.List;

/**
 * A search engine implementation builds, deletes and updates indexes and runs searches.
 *
 * @author sunhao
 * @version V1.0
 * @createTime 13-5-5 1:38 AM
 */
public interface SearchEngine {

    /**
     * Creates indexes (implementations should be thread-safe)
     *
     * @param searchBeans       objects to index
     * @throws Exception
     */
    public void doIndex(List<SearchBean> searchBeans) throws Exception;

    /**
     * Deletes an index entry
     *
     * @param bean              object to remove
     * @throws Exception
     */
    public void deleteIndex(SearchBean bean) throws Exception;

    /**
     * Deletes several index entries
     *
     * @param beans             objects to remove
     * @throws Exception
     */
    public void deleteIndexs(List<SearchBean> beans) throws Exception;

    /**
     * Runs a search
     *
     * @param bean              search object (usually only keyword, the search term, needs to be set)
     * @param isHighlighter     whether to highlight matches
     * @param start             index of the first record
     * @param num               number of records to return
     * @return
     * @throws Exception
     */
    public PaginationSupport doSearch(SearchBean bean, boolean isHighlighter, int start, int num) throws Exception;

    /**
     * Runs a search over several search objects
     *
     * @param beans             search objects (usually only keyword, the search term, needs to be set)
     * @param isHighlighter     whether to highlight matches
     * @param start             index of the first record
     * @param num               number of records to return
     * @return
     * @throws Exception
     */
    public PaginationSupport doSearch(List<SearchBean> beans, boolean isHighlighter, int start, int num) throws Exception;

    /**
     * Deletes all index entries of one type (implementations should be thread-safe)
     *
     * @param clazz             index type
     * @throws Exception
     */
    public void deleteIndexsByIndexType(Class<? extends SearchBean> clazz) throws Exception;

    /**
     * Deletes all index entries of one type (implementations should be thread-safe)
     *
     * @param indexType         index type
     * @throws Exception
     */
    public void deleteIndexsByIndexType(String indexType) throws Exception;

    /**
     * Deletes all indexes
     *
     * @throws Exception
     */
    public void deleteAllIndexs() throws Exception;

    /**
     * Updates an index entry
     *
     * @param searchBean        bean to update
     * @throws Exception
     */
    public void updateIndex(SearchBean searchBean) throws Exception;

    /**
     * Updates index entries in bulk
     *
     * @param searchBeans       beans to update
     * @throws Exception
     */
    public void updateIndexs(List<SearchBean> searchBeans) throws Exception;
}

AbstractSearchEngine.java


package com.message.base.search.engine;

import com.message.base.pagination.PaginationSupport;
import com.message.base.pagination.PaginationUtils;
import com.message.base.search.SearchBean;
import com.message.base.utils.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Collections;

/**
 * Methods shared by all search engines.
 *
 * @author sunhao
 * @version V1.0
 * @createTime 13-5-8 10:53 PM
 */
public abstract class AbstractSearchEngine implements SearchEngine {
    private static final Logger logger = LoggerFactory.getLogger(AbstractSearchEngine.class);

    /**
     * HTML prefix placed before a highlighted fragment
     */
    private String htmlPrefix = "<p>";
    /**
     * HTML suffix placed after a highlighted fragment
     */
    private String htmlSuffix = "</p>";

    public String getHtmlPrefix() {
        return htmlPrefix;
    }

    public void setHtmlPrefix(String htmlPrefix) {
        this.htmlPrefix = htmlPrefix;
    }

    public String getHtmlSuffix() {
        return htmlSuffix;
    }

    public void setHtmlSuffix(String htmlSuffix) {
        this.htmlSuffix = htmlSuffix;
    }

    public PaginationSupport doSearch(SearchBean bean, boolean isHighlighter, int start, int num) throws Exception {
        if(bean == null){
            logger.debug("given search bean is empty!");
            return PaginationUtils.getNullPagination();
        }

        return doSearch(Collections.singletonList(bean), isHighlighter, start, num);
    }

    /**
     * Returns the index type, falling back to the bean's simple class name
     *
     * @param bean
     * @return
     */
    public String getIndexType(SearchBean bean){
        return StringUtils.isNotEmpty(bean.getIndexType()) ? bean.getIndexType() : bean.getClass().getSimpleName();
    }
}
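
To make the point of this split concrete, here is a stripped-down, self-contained sketch of the same pattern. Everything in it is hypothetical (MiniSearchEngine, AbstractMiniSearchEngine, InMemoryEngine and SearchService are not part of my code, and plain strings stand in for SearchBean): the interface is the contract, the abstract class holds shared guards, and the caller never learns which backend it is talking to.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical engine contract: index and search plain strings. */
interface MiniSearchEngine {
    void doIndex(List<String> docs);
    List<String> doSearch(String keyword);
}

/** Shared argument checks live here, in the same spirit as AbstractSearchEngine. */
abstract class AbstractMiniSearchEngine implements MiniSearchEngine {
    public final List<String> doSearch(String keyword) {
        if (keyword == null || keyword.isEmpty()) {
            return new ArrayList<String>();     // nothing to search for
        }
        return search(keyword);
    }

    /** Engine-specific lookup, implemented once per backend. */
    protected abstract List<String> search(String keyword);
}

/** One backend: naive in-memory substring matching, standing in for Lucene or Solr. */
final class InMemoryEngine extends AbstractMiniSearchEngine {
    private final List<String> docs = new ArrayList<String>();

    public void doIndex(List<String> newDocs) {
        docs.addAll(newDocs);
    }

    protected List<String> search(String keyword) {
        List<String> hits = new ArrayList<String>();
        for (String doc : docs) {
            if (doc.contains(keyword)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}

/** Calling code depends only on the interface, so the backend can be swapped freely. */
final class SearchService {
    private final MiniSearchEngine engine;

    SearchService(MiniSearchEngine engine) {
        this.engine = engine;
    }

    List<String> find(String keyword) {
        return engine.doSearch(keyword);
    }
}
```

Switching engines then comes down to passing a different MiniSearchEngine implementation to the SearchService constructor (or, in a Spring setup, changing one bean definition).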


3. Now for Lucene

Code first:

LuceneSearchEngine.java


package com.message.base.search.engine;

import com.message.base.pagination.PaginationSupport;
import com.message.base.pagination.PaginationUtils;
import com.message.base.search.SearchBean;
import com.message.base.search.SearchInitException;
import com.message.base.utils.StringUtils;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.BeanUtils;

import java.io.File;
import java.io.IOException;
import java.util.*;

/**
 * Search engine implemented on top of Lucene.
 *
 * @author sunhao
 * @version V1.0
 * @createTime 13-5-5 10:38 AM
 */
public class LuceneSearchEngine extends AbstractSearchEngine {
    private static final Logger logger = LoggerFactory.getLogger(LuceneSearchEngine.class);
    /**
     * Path where the indexes are stored
     */
    private String indexPath;
    /**
     * Analyzer (tokenizer)
     */
    private Analyzer analyzer = new SimpleAnalyzer();

    public synchronized void doIndex(List<SearchBean> searchBeans) throws Exception {
        this.createOrUpdateIndex(searchBeans, true);
    }

    public synchronized void deleteIndex(SearchBean bean) throws Exception {
        if(bean == null){
            logger.warn("Get search bean is empty!");
            return;
        }

        String id = bean.getId();

        if(StringUtils.isEmpty(id)){
            logger.warn("get id and id value from bean is empty!");
            return;
        }
        String indexType = getIndexType(bean);
        Directory indexDir = this.getIndexDir(indexType);
        IndexWriter writer = this.getWriter(indexDir);

        writer.deleteDocuments(new Term("pkId", id));
        writer.commit();
        this.destroy(writer);
    }

    public synchronized void deleteIndexs(List<SearchBean> beans) throws Exception {
        if(beans == null){
            logger.warn("Get beans is empty!");
            return;
        }

        for(SearchBean bean : beans){
            this.deleteIndex(bean);
        }
    }

    public PaginationSupport doSearch(List<SearchBean> beans, boolean isHighlighter, int start, int num) throws Exception {
        if(beans == null || beans.isEmpty()){
            logger.debug("given search beans is empty!");
            return PaginationUtils.getNullPagination();
        }

        List queryResults = new ArrayList();
        int count = 0;
        for(SearchBean bean : beans){
            String indexType = getIndexType(bean);

            IndexReader reader = IndexReader.open(this.getIndexDir(indexType));

            List<String> fieldNames = new ArrayList<String>();             // names of the fields to query
            List<String> queryValue = new ArrayList<String>();             // value to look for in each field
            List<BooleanClause.Occur> flags = new ArrayList<BooleanClause.Occur>();

            // Fields to search on
            String[] doSearchFields = bean.getDoSearchFields();
            if(doSearchFields == null || doSearchFields.length == 0){
                reader.close();     // release the reader before returning early
                return PaginationUtils.getNullPagination();
            }

            // Default: search the keyword in every search field
            if(StringUtils.isNotEmpty(bean.getKeyword())){
                for(String field : doSearchFields){
                    fieldNames.add(field);
                    queryValue.add(bean.getKeyword());
                    flags.add(BooleanClause.Occur.SHOULD);
                }
            }

            Query query = MultiFieldQueryParser.parse(Version.LUCENE_CURRENT, queryValue.toArray(new String[]{}), fieldNames.toArray(new String[]{}),
                    flags.toArray(new BooleanClause.Occur[]{}), analyzer);

            logger.debug("make query string is '{}'!", query.toString());
            IndexSearcher searcher = new IndexSearcher(reader);
            ScoreDoc[] scoreDocs = searcher.search(query, 1000000).scoreDocs;

            // Index of the first record to return
            int begin = (start == -1 && num == -1) ? 0 : start;
            // Index just past the last record to return
            int end = (start == -1 && num == -1) ? scoreDocs.length : Math.min(begin + num, scoreDocs.length);

            // Highlighting
            Highlighter highlighter = null;
            if(isHighlighter){
                SimpleHTMLFormatter formatter = new SimpleHTMLFormatter(this.getHtmlPrefix(), this.getHtmlSuffix());
                highlighter = new Highlighter(formatter, new QueryScorer(query));
            }

            List<SearchBean> results = new ArrayList<SearchBean>();
            for (int i = begin; i < end; i++) {
                SearchBean result = BeanUtils.instantiate(bean.getClass());

                int docID = scoreDocs[i].doc;
                Document hitDoc = searcher.doc(docID);

                result.setId(hitDoc.get("pkId"));
                result.setLink(hitDoc.get("link"));
                result.setOwerId(hitDoc.get("owerId"));
                result.setOwerName(hitDoc.get("owerName"));
                result.setCreateDate(hitDoc.get("createDate"));
                result.setIndexType(indexType);

                String keyword = StringUtils.EMPTY;
                if(isHighlighter && highlighter != null)
                    keyword = highlighter.getBestFragment(analyzer, "keyword", hitDoc.get("keyword"));

                if(StringUtils.isEmpty(keyword))
                    keyword = hitDoc.get("keyword");

                result.setKeyword(keyword);

                Map<String, String> extendValues = new HashMap<String, String>();
                for(String field : doSearchFields){
                    String value = hitDoc.get(field);
                    if(isHighlighter && highlighter != null)
                        value = highlighter.getBestFragment(analyzer, field, hitDoc.get(field));

                    if(StringUtils.isEmpty(value))
                        value = hitDoc.get(field);

                    extendValues.put(field, value);
                }

                result.setSearchValues(extendValues);

                results.add(result);
            }

            queryResults.addAll(results);
            count += scoreDocs.length;
            searcher.close();
            reader.close();
        }

        PaginationSupport paginationSupport = PaginationUtils.makePagination(queryResults, count, num, start);
        return paginationSupport;
    }

    public synchronized void deleteIndexsByIndexType(Class<? extends SearchBean> clazz) throws Exception {
        String indexType = getIndexType(BeanUtils.instantiate(clazz));
        this.deleteIndexsByIndexType(indexType);
    }

    public synchronized void deleteIndexsByIndexType(String indexType) throws Exception {
        // Pass readOnly=false explicitly; readers are read-only by default
        IndexReader reader = IndexReader.open(this.getIndexDir(indexType), false);
        int result = reader.deleteDocuments(new Term("indexType", indexType));
        reader.close();
        logger.debug("the rows of delete index is '{}'! index type is '{}'!", result, indexType);
    }

    public synchronized void deleteAllIndexs() throws Exception {
        File indexFolder = new File(this.indexPath);
        if(indexFolder == null || !indexFolder.isDirectory()){
            // Does not exist or is not a directory
            logger.debug("indexPath is not a folder! indexPath: '{}'!", indexPath);
            return;
        }

        File[] children = indexFolder.listFiles();
        for(File child : children){
            if(child == null || !child.isDirectory()) continue;

            String indexType = child.getName();
            logger.debug("Get indexType is '{}'!", indexType);

            this.deleteIndexsByIndexType(indexType);
        }
    }

    public void updateIndex(SearchBean searchBean) throws Exception {
        this.updateIndexs(Collections.singletonList(searchBean));
    }

    public void updateIndexs(List<SearchBean> searchBeans) throws Exception {
        this.createOrUpdateIndex(searchBeans, false);
    }

    /**
     * Creates or updates index entries
     *
     * @param searchBeans       objects to create or update
     * @param isCreate          true to create index entries, false to update them
     * @throws Exception
     */
    private synchronized void createOrUpdateIndex(List<SearchBean> searchBeans, boolean isCreate) throws Exception {
        if(searchBeans == null || searchBeans.isEmpty()){
            logger.debug("do no index!");
            return;
        }

        Directory indexDir = null;
        IndexWriter writer = null;
        for(Iterator<SearchBean> it = searchBeans.iterator(); it.hasNext(); ){
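            SearchBean sb = it.next();
            // Check for null first: getIndexType(sb) dereferences the bean
            if(sb == null){
                logger.debug("give SearchBean is null!");
                continue;   // skip this entry so the writer is still closed after the loop
            }
            String indexType = getIndexType(sb);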
            boolean anotherSearchBean = indexDir != null && !indexType.equals(((FSDirectory) indexDir).getFile().getName());
            if(indexDir == null || anotherSearchBean){
                indexDir = this.getIndexDir(indexType);
            }
            if(writer == null || anotherSearchBean){
                this.destroy(writer);
                writer = this.getWriter(indexDir);
            }

            Document doc = new Document();

            // Initialize the common fields
            sb.initPublicFields();
            String id = sb.getId();

            // Primary-key field: not used as a search field and not analyzed
            Field idField = new Field("pkId", id, Field.Store.YES, Field.Index.NOT_ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS);
            doc.add(idField);

            logger.debug("create id index for '{}', value is '{}'! index is '{}'!", new Object[]{"pkId", id, idField});

            String owerId = sb.getOwerId();
            if(StringUtils.isEmpty(owerId)){
                throw new SearchInitException("you must give a owerId");
            }
            Field owerId_ = new Field("owerId", owerId, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS);
            doc.add(owerId_);

            String owerName = sb.getOwerName();
            if(StringUtils.isEmpty(owerName)){
                throw new SearchInitException("you must give a owerName");
            }
            Field owerName_ = new Field("owerName", owerName, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS);
            doc.add(owerName_);

            String link = sb.getLink();
            if(StringUtils.isEmpty(link)){
                throw new SearchInitException("you must give a link");
            }
            Field link_ = new Field("link", link, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS);
            doc.add(link_);

            String keyword = sb.getKeyword();
            if(StringUtils.isEmpty(keyword)){
                throw new SearchInitException("you must give a keyword");
            }
            Field keyword_ = new Field("keyword", keyword, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS);
            doc.add(keyword_);

            String createDate = sb.getCreateDate();
            if(StringUtils.isEmpty(createDate)){
                throw new SearchInitException("you must give a createDate");
            }
            Field createDate_ = new Field("createDate", createDate, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS);
            doc.add(createDate_);

            // Index-type field
            Field indexType_ = new Field("indexType", indexType, Field.Store.YES, Field.Index.NOT_ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS);
            doc.add(indexType_);

            // Fields to index
            String[] doIndexFields = sb.getDoIndexFields();
            Map<String, String> indexFieldValues = sb.getIndexFieldValues();
            if(doIndexFields != null && doIndexFields.length > 0){
                for(String field : doIndexFields){
                    Field extInfoField = new Field(field, indexFieldValues.get(field), Field.Store.YES, Field.Index.ANALYZED,
                            Field.TermVector.WITH_POSITIONS_OFFSETS);

                    doc.add(extInfoField);
                }
            }

            if(isCreate)
                writer.addDocument(doc);
            else
                writer.updateDocument(new Term("pkId", sb.getId()), doc);

            writer.optimize();
        }

        this.destroy(writer);
        logger.debug("create or update index success!");
    }

    public Directory getIndexDir(String suffix) throws Exception {
        return FSDirectory.open(new File(indexPath + File.separator + suffix));
    }

    public IndexWriter getWriter(Directory indexDir) throws IOException {
        return new IndexWriter(indexDir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
    }

    public void destroy(IndexWriter writer) throws Exception {
        if(writer != null)
            writer.close();
    }

    public void setIndexPath(String indexPath) {
        this.indexPath = indexPath;
    }

    public void setAnalyzer(Analyzer analyzer) {
        this.analyzer = analyzer;
    }

}
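
The paging arithmetic in doSearch (begin and end derived from start and num, with -1/-1 meaning "everything") is easy to get subtly wrong, so here is a small JDK-only sketch of that calculation on its own. PageWindow is a hypothetical helper, not part of the class above. Unlike the original lines, it also clamps a start value that lies beyond the number of hits, and it ignores a negative num.

```java
/** Hypothetical helper: the half-open range [begin, end) of hits to return for one page. */
final class PageWindow {
    final int begin;
    final int end;

    private PageWindow(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    /**
     * @param start index of the first record, or -1 together with num == -1 for "all records"
     * @param num   number of records wanted
     * @param total number of hits actually available
     */
    static PageWindow of(int start, int num, int total) {
        if (start == -1 && num == -1) {
            return new PageWindow(0, total);            // no paging requested
        }
        int begin = Math.max(0, Math.min(start, total)); // clamp into [0, total]
        long wanted = (long) begin + Math.max(0, num);   // long avoids int overflow
        int end = (int) Math.min(wanted, (long) total);
        return new PageWindow(begin, end);
    }

    int size() {
        return end - begin;
    }
}
```

One thing to keep in mind when reading doSearch: the window is applied once per index type, so a search over several beans can return more than num rows in total. Whether that is acceptable depends on how PaginationUtils.makePagination is meant to be used.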



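I won't repeat how to use Lucene here; there is plenty of material online, and anything unclear can be searched for. Below are a few of my own thoughts. If something is wrong, feel free to criticize: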

....

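There isn't much to say yet; I'll add more when something comes to mind. One thing does annoy me, though:

FSDirectory.open(new File("D:\\index\\xxx" /* a directory that does not exist, or one that is not an index */));
When the index Directory obtained this way is then used for searching, an error is raised if the directory does not exist (or is not an index). Some may say that is fine and exactly as it should be, but the code I wrapped here really does have to cope with this case.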


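SearchBean.java above has a field called indexType. When it is not set, it defaults to the class name, for example MessageSearchBean. If I have never created an index for Message, searching for it throws an error. I still need to think of a way to solve this.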
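
One possible way around it, sketched here rather than taken from my code, is to check whether the per-type directory looks like an index before opening a reader, and to return an empty page if it does not. The sketch below uses only the JDK. IndexDirs and looksLikeIndex are hypothetical names, and the check rests on the assumption that a Lucene index directory always contains at least one file whose name starts with "segments". Lucene 3.x also appears to offer a static IndexReader.indexExists(Directory), which would probably be the more robust check; verify it against the version in use.

```java
import java.io.File;

/** Hypothetical helper for guarding searches against index types that were never indexed. */
final class IndexDirs {

    private IndexDirs() {
    }

    /**
     * Heuristic check, assuming a Lucene index directory contains a file
     * whose name starts with "segments" (for example segments_1).
     */
    static boolean looksLikeIndex(File dir) {
        if (dir == null || !dir.isDirectory()) {
            return false;                   // missing path or a plain file
        }
        String[] names = dir.list();
        if (names == null) {
            return false;                   // I/O error while listing
        }
        for (String name : names) {
            if (name.startsWith("segments")) {
                return true;
            }
        }
        return false;
    }
}
```

In doSearch, the guard would sit before IndexReader.open: if looksLikeIndex(new File(indexPath, indexType)) is false, skip that bean or return PaginationUtils.getNullPagination() instead of letting the open call fail.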


Finally, a PS: this is a blog and I can't upload code here, so I uploaded it to the code-sharing section instead. Link: http://www.oschina.net/code/snippet_151849_21445

