Indexing and Searching Across Multiple Lucene Indexes

Lucene supports creating multiple index directories and storing several indexes at the same time. A natural concern is whether, after documents have been scattered across multiple index directories during indexing, a search can still produce globally consistent relevance scores. In fact, Lucene's ParallelMultiSearcher and MultiSearcher both support global score computation: even though the index is distributed over several directories, at search time all of the index data is aggregated for query matching and score calculation.
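
To make this concrete, here is a minimal sketch (Lucene 3.x API; the directory paths are placeholders, not from the original code) showing that a MultiSearcher built over several per-directory IndexSearchers scores against the merged statistics:

import java.io.File;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class GlobalScoringDemo {
	public static void main(String[] args) throws Exception {
		// Two independent index directories (placeholder paths).
		Searchable s1 = new IndexSearcher(FSDirectory.open(new File("indexes/a")));
		Searchable s2 = new IndexSearcher(FSDirectory.open(new File("indexes/b")));
		// MultiSearcher computes document frequencies across all
		// sub-indexes first, so scores are globally comparable.
		Searcher searcher = new MultiSearcher(new Searchable[] { s1, s2 });
		TopDocs top = searcher.search(new TermQuery(new Term("content", "lucene")), 10);
		System.out.println("total hits: " + top.totalHits);
		searcher.close();
	}
}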


Index Directory Handling


Below we distribute an index randomly across 26 directories named a~z and implement an indexing and searching program to verify Lucene's score computation.

First, we implement a utility class that builds the index directories and supports searching, as shown below:

package org.shirdrn.lucene;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.Similarity;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.shirdrn.lucene.MultipleIndexing.IndexWriterObj;

/**
 * Indexing across multiple Lucene indexes.
 * 
 * @author shirdrn
 * @date   2011-12-12
 */
public class IndexHelper {
	
	private static WriterHelper writerHelper = null;
	private static SearcherHelper searcherHelper = null;
	
	public static WriterHelper newWriterHelper(String root, IndexWriterConfig indexConfig) {
		return WriterHelper.newInstance(root, indexConfig);
	}
	
	public static SearcherHelper newSearcherHelper(String root, IndexWriterConfig indexConfig) {
		return SearcherHelper.newInstance(root, indexConfig);
	}

	protected static class WriterHelper {
		private String alphabet = "abcdefghijklmnopqrstuvwxyz";
		private Lock locker = new ReentrantLock();
		private String indexRootDir = null;
		private IndexWriterConfig indexConfig;
		private Map<Character, IndexWriterObj> indexWriters = new HashMap<Character, IndexWriterObj>();
		private static Random random = new Random();
		private WriterHelper() {
			
		}
		private synchronized static WriterHelper newInstance(String root, IndexWriterConfig indexConfig) {
			if(writerHelper==null) {
				writerHelper = new WriterHelper();
				writerHelper.indexRootDir = root;
				writerHelper.indexConfig = indexConfig;
			}
			return writerHelper;
		}
		public IndexWriterObj selectIndexWriter() {
			int pos = random.nextInt(alphabet.length());
			char ch = alphabet.charAt(pos);
			String dir = new String(new char[] {ch});
			IndexWriterObj obj = null;
			locker.lock();
			try {
				File path = new File(indexRootDir, dir);
				if(!path.exists()) {
					path.mkdirs();
				}
				if(!indexWriters.containsKey(ch)) {
					IndexWriter indexWriter = new IndexWriter(FSDirectory.open(path), indexConfig.getAnalyzer(), MaxFieldLength.UNLIMITED);
					indexWriters.put(ch, new IndexWriterObj(indexWriter, dir));
				}
				// Read the map while still holding the lock: HashMap is not thread-safe.
				obj = indexWriters.get(ch);
			} catch (CorruptIndexException e) {
				e.printStackTrace();
			} catch (LockObtainFailedException e) {
				e.printStackTrace();
			} catch (IOException e) {
				e.printStackTrace();
			} finally {
				locker.unlock();
			}
			return obj;
		}
		@SuppressWarnings("deprecation")
		public void closeAll(boolean autoOptimize) {
			Iterator<Map.Entry<Character, IndexWriterObj>> iter = indexWriters.entrySet().iterator();
			while(iter.hasNext()) {
				Map.Entry<Character, IndexWriterObj> entry = iter.next();
				try {
					if(autoOptimize) {
						entry.getValue().indexWriter.optimize();
					}
					entry.getValue().indexWriter.close();
				} catch (CorruptIndexException e) {
					e.printStackTrace();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
	}
	
	protected static class SearcherHelper {
		private List<IndexSearcher> searchers = new ArrayList<IndexSearcher>();
		private Similarity similarity = new DefaultSimilarity();
		private SearcherHelper() {
			
		}
		private synchronized static SearcherHelper newInstance(String root, IndexWriterConfig indexConfig) {
			if(searcherHelper==null) {
				searcherHelper = new SearcherHelper();
				if(indexConfig.getSimilarity()!=null) {
					searcherHelper.similarity = indexConfig.getSimilarity();
				}
				File indexRoot = new File(root);
				File[] files = indexRoot.listFiles();
				for(File f : files) {
					IndexSearcher searcher = null;
					try {
						searcher = new IndexSearcher(FSDirectory.open(f));
					} catch (CorruptIndexException e) {
						e.printStackTrace();
					} catch (IOException e) {
						e.printStackTrace();
					}
					if(searcher!=null) {
						searcher.setSimilarity(searcherHelper.similarity);
						searcherHelper.searchers.add(searcher);
					}
				}
			}
			return searcherHelper;
		}
		public void closeAll() {
			Iterator<IndexSearcher> iter = searchers.iterator();
			while(iter.hasNext()) {
				try {
					iter.next().close();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
		public Searchable[] getSearchers() {
			Searchable[] a = new Searchable[searchers.size()];
			return searchers.toArray(a);
		}
	}
}
Since indexing opens multiple Directory instances at once, each with its own IndexWriter, we use the 26 letters a~z as the names of the IndexWriters and wrap each IndexWriter together with its directory name in an IndexWriterObj object, which makes it easy to see the actual data distribution from the logs. When building a Lucene Document, the name of its index directory (one of the characters a~z) is added as a Field. At indexing time, you only need to call the selectIndexWriter() method of IndexHelper.WriterHelper, and the appropriate IndexWriter instance is selected automatically.

At search time, the IndexHelper.SearcherHelper utility supplies the array of Searchable instances: call getSearchers() to obtain it and hand it to MultiSearcher to build the search. The sketch below condenses both sides of this flow:
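
This is a minimal sketch only (the class and method are illustrative and assume the IndexHelper above):

package org.shirdrn.lucene;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searcher;
import org.shirdrn.lucene.IndexHelper.SearcherHelper;
import org.shirdrn.lucene.IndexHelper.WriterHelper;
import org.shirdrn.lucene.MultipleIndexing.IndexWriterObj;

public class HelperUsageSketch {
	public static void run(String indexRoot, IndexWriterConfig indexConfig) throws Exception {
		// Indexing side: selectIndexWriter() randomly picks one of the a~z directories.
		WriterHelper writerHelper = IndexHelper.newWriterHelper(indexRoot, indexConfig);
		IndexWriterObj obj = writerHelper.selectIndexWriter();
		Document doc = new Document();
		// Record the directory name (a~z) so search results reveal the distribution.
		doc.add(new Field("path", obj.name, Store.YES, Index.NOT_ANALYZED_NO_NORMS));
		obj.indexWriter.addDocument(doc);
		writerHelper.closeAll(true);

		// Search side: getSearchers() returns one Searchable per sub-directory.
		SearcherHelper searcherHelper = IndexHelper.newSearcherHelper(indexRoot, indexConfig);
		Searcher searcher = new MultiSearcher(searcherHelper.getSearchers());
		// ... run queries, then close:
		searcher.close();
		searcherHelper.closeAll();
	}
}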


Indexing Implementation


For the data source, we read directly from MongoDB according to specified query conditions, so the code for interacting with MongoDB is encapsulated in the indexing code as inner classes. The implementation that performs the indexing is shown below:

package org.shirdrn.lucene;

import java.io.IOException;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.shirdrn.lucene.IndexHelper.WriterHelper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.Mongo;
import com.mongodb.MongoException;

/**
 * Indexing across multiple Lucene indexes.
 * 
 * @author shirdrn
 * @date   2011-12-12
 */
public class MultipleIndexing {

	private static Logger LOG = LoggerFactory.getLogger(MultipleIndexing.class);
	private DBCollection pageColletion;
	private WriterHelper writerHelper;
	private Map<IndexWriter, IntCounter> docCountPerIndexWriter = new HashMap<IndexWriter, IntCounter>();
	private int maxIndexCommitCount = 100;
	private AtomicInteger docCounter = new AtomicInteger();

	public MultipleIndexing(String indexRoot, int maxIndexCommitCount, MongoConfig mongoConfig, IndexWriterConfig indexConfig) {
		super();
		if (maxIndexCommitCount != 0) {
			this.maxIndexCommitCount = maxIndexCommitCount;
		}
		pageColletion = MongoHelper.newHelper(mongoConfig).getCollection(mongoConfig.collectionName);
		writerHelper = IndexHelper.newWriterHelper(indexRoot, indexConfig);
	}

	/**
	 * Indexing
	 * @param conditions
	 */
	public void index(Map<String, Object> conditions) {
		DBCursor cursor = pageColletion.find(new BasicDBObject(conditions));
		try {
			while (cursor.hasNext()) {
				try {
					IndexWriterObj obj = writerHelper.selectIndexWriter();
					Document document = encapsulate(cursor.next().toMap(), obj.name);
					obj.indexWriter.addDocument(document);
					docCounter.addAndGet(1);
					LOG.info("Global docCounter: " + docCounter.get());
					increment(obj.indexWriter);
					checkCommit(obj.indexWriter);
				} catch (MongoException e) {
					e.printStackTrace();
				} catch (CorruptIndexException e) {
					e.printStackTrace();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
			finallyCommitAll();
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			cursor.close();
			writerHelper.closeAll(true);
			LOG.info("Close all indexWriters.");
		}
	}

	private void finallyCommitAll() throws Exception {
		Iterator<IndexWriter> iter = docCountPerIndexWriter.keySet().iterator();
		while(iter.hasNext()) {
			iter.next().commit();
		}
	}

	private void checkCommit(IndexWriter indexWriter) throws Exception {
		if(docCountPerIndexWriter.get(indexWriter).value%maxIndexCommitCount==0) {
			indexWriter.commit();
			LOG.info("Commit: " + indexWriter + ", " + docCountPerIndexWriter.get(indexWriter).value);
		}
	}

	private void increment(IndexWriter indexWriter) {
		IntCounter counter = docCountPerIndexWriter.get(indexWriter);
		if (counter == null) {
			counter = new IntCounter(1);
			docCountPerIndexWriter.put(indexWriter, counter);
		} else {
			++counter.value;
		}
	}

	@SuppressWarnings("unchecked")
	private Document encapsulate(Map map, String path) {
		String title = (String) map.get("title");
		String content = (String) map.get("content");
		String url = (String) map.get("url");
		Document doc = new Document();
		doc.add(new Field(FieldName.TITLE, title, Store.YES, Index.ANALYZED_NO_NORMS));
		doc.add(new Field(FieldName.CONTENT, content, Store.NO, Index.ANALYZED));
		doc.add(new Field(FieldName.URL, url, Store.YES, Index.NOT_ANALYZED_NO_NORMS));
		doc.add(new Field(FieldName.PATH, path, Store.YES, Index.NOT_ANALYZED_NO_NORMS));
		return doc;
	}

	protected interface FieldName {
		public static final String TITLE = "title";
		public static final String CONTENT = "content";
		public static final String URL = "url";
		public static final String PATH = "path";
	}

	protected class IntCounter {
		public IntCounter(int value) {
			super();
			this.value = value;
		}
		private int value;
	}
	
	protected static class IndexWriterObj {
		IndexWriter indexWriter;
		String name;
		public IndexWriterObj(IndexWriter indexWriter, String name) {
			super();
			this.indexWriter = indexWriter;
			this.name = name;
		}
		@Override
		public String toString() {
			return "[" + name + "]";
		}
	}
	
	public static class MongoConfig implements Serializable {
		private static final long serialVersionUID = -3028092758346115702L;
		private String host;
		private int port;
		private String dbname;
		private String collectionName;

		public MongoConfig(String host, int port, String dbname, String collectionName) {
			super();
			this.host = host;
			this.port = port;
			this.dbname = dbname;
			this.collectionName = collectionName;
		}

		@Override
		public boolean equals(Object obj) {
			MongoConfig other = (MongoConfig) obj;
			return host.equals(other.host) && port == other.port && dbname.equals(other.dbname) && collectionName.equals(other.collectionName);
		}
	}

	protected static class MongoHelper {
		private static Mongo mongo;
		private static MongoHelper helper;
		private MongoConfig mongoConfig;

		private MongoHelper(MongoConfig mongoConfig) {
			super();
			this.mongoConfig = mongoConfig;
		}

		public synchronized static MongoHelper newHelper(MongoConfig mongoConfig) {
			try {
				if (helper == null) {
					helper = new MongoHelper(mongoConfig);
					mongo = new Mongo(mongoConfig.host, mongoConfig.port);
					Runtime.getRuntime().addShutdownHook(new Thread() {
						@Override
						public void run() {
							if (mongo != null) {
								mongo.close();
							}
						}
					});
				}
			} catch (Exception e) {
				e.printStackTrace();
			}
			return helper;
		}

		public DBCollection getCollection(String collectionName) {
			DBCollection c = null;
			try {
				c = mongo.getDB(mongoConfig.dbname).getCollection(collectionName);
			} catch (Exception e) {
				e.printStackTrace();
			}
			return c;
		}
	}
}
The code above is single-threaded; if your application handles massive amounts of data, this will inevitably limit indexing throughput. However, it is easy to turn it into a multi-threaded version. The basic idea: buffer the data source in memory as appropriate, then, following the producer-consumer model, start multiple consumer threads that fetch data and concurrently push documents to the IndexWriters (note in particular that you should not synchronize on the same IndexWriter instance, which can easily lead to deadlock; IndexWriter is already thread-safe). A minimal sketch of this idea follows.
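
Below is a minimal producer-consumer sketch; the ConcurrentIndexing class and its names are illustrative, not part of the code above, and only WriterHelper and IndexWriterObj are reused:

package org.shirdrn.lucene;

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.shirdrn.lucene.IndexHelper.WriterHelper;
import org.shirdrn.lucene.MultipleIndexing.IndexWriterObj;

public class ConcurrentIndexing {

	private final BlockingQueue<Document> queue = new LinkedBlockingQueue<Document>(1000);
	private final WriterHelper writerHelper;
	private volatile boolean finished = false;

	public ConcurrentIndexing(WriterHelper writerHelper) {
		this.writerHelper = writerHelper;
	}

	/** Producer side: blocks when the in-memory buffer is full. */
	public void put(Document doc) throws InterruptedException {
		queue.put(doc);
	}

	/** Signal the consumers to drain the queue and exit. */
	public void finish() {
		finished = true;
	}

	public void startConsumers(int nThreads) {
		ExecutorService pool = Executors.newFixedThreadPool(nThreads);
		for (int i = 0; i < nThreads; i++) {
			pool.execute(new Runnable() {
				public void run() {
					try {
						while (!finished || !queue.isEmpty()) {
							Document doc = queue.poll(1, TimeUnit.SECONDS);
							if (doc == null) {
								continue;
							}
							// No synchronization on the IndexWriter itself:
							// IndexWriter is thread-safe, and selectIndexWriter()
							// already guards its internal map with a lock.
							IndexWriterObj obj = writerHelper.selectIndexWriter();
							doc.add(new Field("path", obj.name, Store.YES, Index.NOT_ANALYZED_NO_NORMS));
							obj.indexWriter.addDocument(doc);
						}
					} catch (Exception e) {
						e.printStackTrace();
					}
				}
			});
		}
		pool.shutdown();
	}
}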

One more point concerns index data distribution and updates. Randomly selecting an index directory as above spreads the data fairly evenly across the directories, but because of the randomness, updates can produce duplicates if not handled carefully. The way to avoid duplicates is to add an external duplicate-detection step that prevents duplicate (or near-duplicate) documents from being indexed again, as the sketch below shows.
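
A minimal sketch of such a check (illustrative; it assumes the URL uniquely identifies a document):

package org.shirdrn.lucene;

import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class DuplicateChecker {

	private final Set<String> seenUrls = Collections.synchronizedSet(new HashSet<String>());

	/** Returns true exactly once per URL; callers skip indexing when it returns false. */
	public boolean accept(String url) {
		return url != null && seenUrls.add(url);
	}
}

In MultipleIndexing.index(), a document would then only be added when accept(url) returns true. For large or restartable jobs, the in-memory set would be replaced by a persistent store or near-duplicate fingerprinting (e.g. SimHash).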

Below is the test case for indexing:

package org.shirdrn.lucene;

import java.util.HashMap;
import java.util.Map;

import junit.framework.TestCase;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.util.Version;
import org.shirdrn.lucene.MultipleIndexing.MongoConfig;

public class TestMultipleIndexing extends TestCase {

	MultipleIndexing indexer;
	
	@Override
	protected void setUp() throws Exception {
		MongoConfig mongoConfig = new MongoConfig("192.168.0.184", 27017, "page", "Article");
		String indexRoot = "E:\\Store\\indexes";
		int maxIndexCommitCount = 200;
		Analyzer a = new SmartChineseAnalyzer(Version.LUCENE_35, true);
		IndexWriterConfig indexConfig = new IndexWriterConfig(Version.LUCENE_35, a);
		indexConfig.setOpenMode(OpenMode.CREATE);
		indexer = new MultipleIndexing(indexRoot, maxIndexCommitCount, mongoConfig, indexConfig);
	}
	
	@Override
	protected void tearDown() throws Exception {
		super.tearDown();
	}
	
	public void testIndexing() {
		Map<String, Object> conditions = new HashMap<String, Object>();
		conditions.put("spiderName", "sinaSpider");
		indexer.index(conditions);
	}
}
In my case, I indexed more than 90,000 documents, and the resulting index is distributed fairly evenly across the 26 directories named a~z.

Searching Implementation


When searching, you can choose either ParallelMultiSearcher or MultiSearcher. MultiSearcher collects results by looping over the sub-indexes one after another, whereas ParallelMultiSearcher starts multiple threads to search in parallel; their relative efficiency varies with the machine configuration, so choose according to your needs. I simply used MultiSearcher to build the search (a one-line ParallelMultiSearcher variant is sketched after the implementation); the code is as follows:

package org.shirdrn.lucene;

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.util.Version;
import org.shirdrn.lucene.IndexHelper.SearcherHelper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Searching across multiple Lucene indexes.
 * 
 * @author shirdrn
 * @date   2011-12-12
 */
public class MultipleSearching {

	private static Logger LOG = LoggerFactory.getLogger(MultipleSearching.class);
	private SearcherHelper searcherHelper;
	private Searcher searcher;
	private QueryParser queryParser;
	private IndexWriterConfig indexConfig;
	
	private Query query;
	private ScoreDoc[] scoreDocs;
	
	public MultipleSearching(String indexRoot, IndexWriterConfig indexConfig) {
		searcherHelper = IndexHelper.newSearcherHelper(indexRoot, indexConfig);
		this.indexConfig = indexConfig;
		try {
			searcher = new MultiSearcher(searcherHelper.getSearchers());
			searcher.setSimilarity(indexConfig.getSimilarity());
			queryParser = new QueryParser(Version.LUCENE_35, "content", indexConfig.getAnalyzer());
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
	
	public void search(String queries) {
		try {
			query = queryParser.parse(queries);
			TopScoreDocCollector collector = TopScoreDocCollector.create(100000, true);
			searcher.search(query, collector);
			scoreDocs = collector.topDocs().scoreDocs;
		} catch (ParseException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
	
	public void iterateDocs(int start, int end) {
		for (int i = start; i < Math.min(scoreDocs.length, end); i++) {
			try {
				LOG.info(searcher.doc(scoreDocs[i].doc).toString());
			} catch (CorruptIndexException e) {
				e.printStackTrace();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}
	
	public void explain(int start, int end) {
		for (int i = start; i < Math.min(scoreDocs.length, end); i++) {
			try {
				System.out.println(searcher.explain(query, scoreDocs[i].doc));
			} catch (CorruptIndexException e) {
				e.printStackTrace();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}
	
	public void close() {
		searcherHelper.closeAll();
		try {
			searcher.close();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}
One of our goals is to verify that the search really runs across multiple index directories and that the final relevance ranking is computed over all of them. The implementation above is closer to a test harness: iterateDocs() iterates over the result documents and prints them, and the explain() method shows the details of the score calculation.
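
For reference, here is the one-line ParallelMultiSearcher variant mentioned earlier (a sketch; only the Searcher construction differs from MultipleSearching):

package org.shirdrn.lucene;

import org.apache.lucene.search.ParallelMultiSearcher;
import org.apache.lucene.search.Searcher;
import org.shirdrn.lucene.IndexHelper.SearcherHelper;

public class ParallelSearchingSketch {
	public static Searcher newSearcher(SearcherHelper searcherHelper) throws Exception {
		// Each sub-index is searched on its own thread; global scoring
		// (combined docFreq/maxDocs) behaves the same as with MultiSearcher.
		return new ParallelMultiSearcher(searcherHelper.getSearchers());
	}
}
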
The test case for searching is shown below:

package org.shirdrn.lucene;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;

import junit.framework.TestCase;

public class TestMultipleSearching extends TestCase {

	MultipleSearching searcher;
	
	@Override
	protected void setUp() throws Exception {
		String indexRoot = "E:\\Store\\indexes";
		Analyzer a = new SmartChineseAnalyzer(Version.LUCENE_35, true);
		IndexWriterConfig indexConfig = new IndexWriterConfig(Version.LUCENE_35, a);
		searcher = new MultipleSearching(indexRoot, indexConfig);
	}
	
	@Override
	protected void tearDown() throws Exception {
		searcher.close();
	}
	
	public void testSearching() {
		searcher.search("+title:拉斯維加斯^1.25 (+content:美國^1.50 +content:拉斯維加斯)");
		searcher.iterateDocs(0, 10);
		searcher.explain(0, 5);
	}
}
The document data iterated out of the search results is as follows:

2011-12-13 12:12:55 org.shirdrn.lucene.MultipleSearching iterateDocs
信息: Document<stored,indexed,tokenized,omitNorms<title:全新體驗 拉斯維加斯的完美24小時(組圖)(4)_新浪旅遊_新浪網> stored,indexed,omitNorms<url:http://travel.sina.com.cn/world/2010-08-16/1400141443_4.shtml> stored,indexed,omitNorms<path:x>>
2011-12-13 12:12:55 org.shirdrn.lucene.MultipleSearching iterateDocs
信息: Document<stored,indexed,tokenized,omitNorms<title:拉斯維加斯 觸摸你的奢侈底線(組圖)_新浪旅遊_新浪網> stored,indexed,omitNorms<url:http://travel.sina.com.cn/world/2009-05-21/095684952.shtml> stored,indexed,omitNorms<path:v>>
2011-12-13 12:12:55 org.shirdrn.lucene.MultipleSearching iterateDocs
信息: Document<stored,indexed,tokenized,omitNorms<title:美國拉斯維加斯地圖_新浪旅遊_新浪網> stored,indexed,omitNorms<url:http://travel.sina.com.cn/world/2008-08-20/113317460.shtml> stored,indexed,omitNorms<path:a>>
2011-12-13 12:12:55 org.shirdrn.lucene.MultipleSearching iterateDocs
信息: Document<stored,indexed,tokenized,omitNorms<title:美國拉斯維加斯:潮野水上樂園_新浪旅遊_新浪網> stored,indexed,omitNorms<url:http://travel.sina.com.cn/world/2008-08-20/093217358.shtml> stored,indexed,omitNorms<path:e>>
2011-12-13 12:12:55 org.shirdrn.lucene.MultipleSearching iterateDocs
信息: Document<stored,indexed,tokenized,omitNorms<title:美國拉斯維加斯:米高梅歷險遊樂園_新浪旅遊_新浪網> stored,indexed,omitNorms<url:http://travel.sina.com.cn/world/2008-08-20/095617381.shtml> stored,indexed,omitNorms<path:k>>
2011-12-13 12:12:55 org.shirdrn.lucene.MultipleSearching iterateDocs
信息: Document<stored,indexed,tokenized,omitNorms<title:美國拉斯維加斯主要景點_新浪旅遊_新浪網> stored,indexed,omitNorms<url:http://travel.sina.com.cn/world/2008-08-20/114817479.shtml> stored,indexed,omitNorms<path:m>>
2011-12-13 12:12:55 org.shirdrn.lucene.MultipleSearching iterateDocs
信息: Document<stored,indexed,tokenized,omitNorms<title:娛樂之都拉斯維加斯宣佈在中國推旅遊市場新戰略_新浪旅遊_新浪網> stored,indexed,omitNorms<url:http://travel.sina.com.cn/news/2008-11-19/094337435.shtml> stored,indexed,omitNorms<path:j>>
2011-12-13 12:12:55 org.shirdrn.lucene.MultipleSearching iterateDocs
信息: Document<stored,indexed,tokenized,omitNorms<title:美國拉斯維加斯簡介_新浪旅遊_新浪網> stored,indexed,omitNorms<url:http://travel.sina.com.cn/world/2008-08-19/160017116.shtml> stored,indexed,omitNorms<path:v>>
2011-12-13 12:12:55 org.shirdrn.lucene.MultipleSearching iterateDocs
信息: Document<stored,indexed,tokenized,omitNorms<title:拉斯維加斯“貓王模仿秀”亮相國際旅交會_新浪旅遊_新浪網> stored,indexed,omitNorms<url:http://travel.sina.com.cn/news/2009-11-23/1004116788.shtml> stored,indexed,omitNorms<path:j>>
2011-12-13 12:12:55 org.shirdrn.lucene.MultipleSearching iterateDocs
信息: Document<stored,indexed,tokenized,omitNorms<title:10大美食家的饕餮名城:拉斯維加斯(圖)(4)_新浪旅遊_新浪網> stored,indexed,omitNorms<url:http://travel.sina.com.cn/food/2009-01-16/090855088.shtml> stored,indexed,omitNorms<path:s>>

From the path field we can see that the search results aggregate hits from multiple index directories.

Below are the relevance scores of the search results:

7.3240967 = (MATCH) sum of:
  6.216492 = (MATCH) weight(title:拉斯維加斯^1.25 in 616), product of:
    0.7747233 = queryWeight(title:拉斯維加斯^1.25), product of:
      1.25 = boost
      8.024145 = idf(docFreq=82, maxDocs=93245)
      0.07723921 = queryNorm
    8.024145 = (MATCH) fieldWeight(title:拉斯維加斯 in 616), product of:
      1.0 = tf(termFreq(title:拉斯維加斯)=1)
      8.024145 = idf(docFreq=82, maxDocs=93245)
      1.0 = fieldNorm(field=title, doc=616)
  1.1076047 = (MATCH) sum of:
    0.24895692 = (MATCH) weight(content:美國^1.5 in 616), product of:
      0.39020002 = queryWeight(content:美國^1.5), product of:
        1.5 = boost
        3.3678925 = idf(docFreq=8734, maxDocs=93245)
        0.07723921 = queryNorm
      0.63802385 = (MATCH) fieldWeight(content:美國 in 616), product of:
        1.7320508 = tf(termFreq(content:美國)=3)
        3.3678925 = idf(docFreq=8734, maxDocs=93245)
        0.109375 = fieldNorm(field=content, doc=616)
    0.8586478 = (MATCH) weight(content:拉斯維加斯 in 616), product of:
      0.49754182 = queryWeight(content:拉斯維加斯), product of:
        6.4415708 = idf(docFreq=403, maxDocs=93245)
        0.07723921 = queryNorm
      1.7257802 = (MATCH) fieldWeight(content:拉斯維加斯 in 616), product of:
        2.4494898 = tf(termFreq(content:拉斯維加斯)=6)
        6.4415708 = idf(docFreq=403, maxDocs=93245)
        0.109375 = fieldNorm(field=content, doc=616)

7.2405667 = (MATCH) sum of:
  6.216492 = (MATCH) weight(title:拉斯維加斯^1.25 in 2850), product of:
    0.7747233 = queryWeight(title:拉斯維加斯^1.25), product of:
      1.25 = boost
      8.024145 = idf(docFreq=82, maxDocs=93245)
      0.07723921 = queryNorm
    8.024145 = (MATCH) fieldWeight(title:拉斯維加斯 in 2850), product of:
      1.0 = tf(termFreq(title:拉斯維加斯)=1)
      8.024145 = idf(docFreq=82, maxDocs=93245)
      1.0 = fieldNorm(field=title, doc=2850)
  1.0240744 = (MATCH) sum of:
    0.17423354 = (MATCH) weight(content:美國^1.5 in 2850), product of:
      0.39020002 = queryWeight(content:美國^1.5), product of:
        1.5 = boost
        3.3678925 = idf(docFreq=8734, maxDocs=93245)
        0.07723921 = queryNorm
      0.44652367 = (MATCH) fieldWeight(content:美國 in 2850), product of:
        1.4142135 = tf(termFreq(content:美國)=2)
        3.3678925 = idf(docFreq=8734, maxDocs=93245)
        0.09375 = fieldNorm(field=content, doc=2850)
    0.8498409 = (MATCH) weight(content:拉斯維加斯 in 2850), product of:
      0.49754182 = queryWeight(content:拉斯維加斯), product of:
        6.4415708 = idf(docFreq=403, maxDocs=93245)
        0.07723921 = queryNorm
      1.7080793 = (MATCH) fieldWeight(content:拉斯維加斯 in 2850), product of:
        2.828427 = tf(termFreq(content:拉斯維加斯)=8)
        6.4415708 = idf(docFreq=403, maxDocs=93245)
        0.09375 = fieldNorm(field=content, doc=2850)

7.1128473 = (MATCH) sum of:
  6.216492 = (MATCH) weight(title:拉斯維加斯^1.25 in 63), product of:
    0.7747233 = queryWeight(title:拉斯維加斯^1.25), product of:
      1.25 = boost
      8.024145 = idf(docFreq=82, maxDocs=93245)
      0.07723921 = queryNorm
    8.024145 = (MATCH) fieldWeight(title:拉斯維加斯 in 63), product of:
      1.0 = tf(termFreq(title:拉斯維加斯)=1)
      8.024145 = idf(docFreq=82, maxDocs=93245)
      1.0 = fieldNorm(field=title, doc=63)
  0.896355 = (MATCH) sum of:
    0.1451946 = (MATCH) weight(content:美國^1.5 in 63), product of:
      0.39020002 = queryWeight(content:美國^1.5), product of:
        1.5 = boost
        3.3678925 = idf(docFreq=8734, maxDocs=93245)
        0.07723921 = queryNorm
      0.37210304 = (MATCH) fieldWeight(content:美國 in 63), product of:
        1.4142135 = tf(termFreq(content:美國)=2)
        3.3678925 = idf(docFreq=8734, maxDocs=93245)
        0.078125 = fieldNorm(field=content, doc=63)
    0.7511604 = (MATCH) weight(content:拉斯維加斯 in 63), product of:
      0.49754182 = queryWeight(content:拉斯維加斯), product of:
        6.4415708 = idf(docFreq=403, maxDocs=93245)
        0.07723921 = queryNorm
      1.5097432 = (MATCH) fieldWeight(content:拉斯維加斯 in 63), product of:
        3.0 = tf(termFreq(content:拉斯維加斯)=9)
        6.4415708 = idf(docFreq=403, maxDocs=93245)
        0.078125 = fieldNorm(field=content, doc=63)

7.1128473 = (MATCH) sum of:
  6.216492 = (MATCH) weight(title:拉斯維加斯^1.25 in 2910), product of:
    0.7747233 = queryWeight(title:拉斯維加斯^1.25), product of:
      1.25 = boost
      8.024145 = idf(docFreq=82, maxDocs=93245)
      0.07723921 = queryNorm
    8.024145 = (MATCH) fieldWeight(title:拉斯維加斯 in 2910), product of:
      1.0 = tf(termFreq(title:拉斯維加斯)=1)
      8.024145 = idf(docFreq=82, maxDocs=93245)
      1.0 = fieldNorm(field=title, doc=2910)
  0.896355 = (MATCH) sum of:
    0.1451946 = (MATCH) weight(content:美國^1.5 in 2910), product of:
      0.39020002 = queryWeight(content:美國^1.5), product of:
        1.5 = boost
        3.3678925 = idf(docFreq=8734, maxDocs=93245)
        0.07723921 = queryNorm
      0.37210304 = (MATCH) fieldWeight(content:美國 in 2910), product of:
        1.4142135 = tf(termFreq(content:美國)=2)
        3.3678925 = idf(docFreq=8734, maxDocs=93245)
        0.078125 = fieldNorm(field=content, doc=2910)
    0.7511604 = (MATCH) weight(content:拉斯維加斯 in 2910), product of:
      0.49754182 = queryWeight(content:拉斯維加斯), product of:
        6.4415708 = idf(docFreq=403, maxDocs=93245)
        0.07723921 = queryNorm
      1.5097432 = (MATCH) fieldWeight(content:拉斯維加斯 in 2910), product of:
        3.0 = tf(termFreq(content:拉斯維加斯)=9)
        6.4415708 = idf(docFreq=403, maxDocs=93245)
        0.078125 = fieldNorm(field=content, doc=2910)

7.1128473 = (MATCH) sum of:
  6.216492 = (MATCH) weight(title:拉斯維加斯^1.25 in 2920), product of:
    0.7747233 = queryWeight(title:拉斯維加斯^1.25), product of:
      1.25 = boost
      8.024145 = idf(docFreq=82, maxDocs=93245)
      0.07723921 = queryNorm
    8.024145 = (MATCH) fieldWeight(title:拉斯維加斯 in 2920), product of:
      1.0 = tf(termFreq(title:拉斯維加斯)=1)
      8.024145 = idf(docFreq=82, maxDocs=93245)
      1.0 = fieldNorm(field=title, doc=2920)
  0.896355 = (MATCH) sum of:
    0.1451946 = (MATCH) weight(content:美國^1.5 in 2920), product of:
      0.39020002 = queryWeight(content:美國^1.5), product of:
        1.5 = boost
        3.3678925 = idf(docFreq=8734, maxDocs=93245)
        0.07723921 = queryNorm
      0.37210304 = (MATCH) fieldWeight(content:美國 in 2920), product of:
        1.4142135 = tf(termFreq(content:美國)=2)
        3.3678925 = idf(docFreq=8734, maxDocs=93245)
        0.078125 = fieldNorm(field=content, doc=2920)
    0.7511604 = (MATCH) weight(content:拉斯維加斯 in 2920), product of:
      0.49754182 = queryWeight(content:拉斯維加斯), product of:
        6.4415708 = idf(docFreq=403, maxDocs=93245)
        0.07723921 = queryNorm
      1.5097432 = (MATCH) fieldWeight(content:拉斯維加斯 in 2920), product of:
        3.0 = tf(termFreq(content:拉斯維加斯)=9)
        6.4415708 = idf(docFreq=403, maxDocs=93245)
        0.078125 = fieldNorm(field=content, doc=2920)

As you can see, the relevance scores of the search results are computed over all of the indexes together (maxDocs=93245).
