Lucene (Part 1)

What is Lucene

Lucene is an open-source full-text search toolkit. It is not a complete full-text search engine, but rather a framework for one: it provides a complete query engine and indexing engine, plus partial text-analysis engines (for two Western languages, English and German).

Indexing and Search Flow

1. Green in the flowchart denotes the indexing process, which builds an index from the original content to be searched. The indexing process is:

           determine the original content (what is to be searched) → acquire documents → create documents → analyze documents → index documents

2. Red denotes the search process, which retrieves content from the index. The search process is:

           user enters keywords in the search UI → create a query → execute the search against the index → render the search results
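The two flows above can be sketched with a toy inverted index in plain Java. This is an illustrative model only (the class and method names are made up, not Lucene's API): indexing analyzes each document into terms and records which documents contain each term, and searching is then a single lookup in that term-to-documents map.

```java
import java.util.*;

// Toy inverted index: each term maps to the sorted set of document IDs that
// contain it. Lucene's real index additionally stores positions, frequencies,
// stored fields, and scoring information.
public class ToyInvertedIndex {
    private final Map<String, SortedSet<Integer>> index = new HashMap<>();

    // "Analyze" a document (lowercase, split on non-letters) and index its terms.
    public void addDocument(int docId, String text) {
        for (String term : text.toLowerCase().split("[^a-z]+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Search: look the term up in the index and return the matching doc IDs.
    public Set<Integer> search(String term) {
        return index.getOrDefault(term.toLowerCase(), Collections.<Integer>emptySortedSet());
    }

    public static void main(String[] args) {
        ToyInvertedIndex idx = new ToyInvertedIndex();
        idx.addDocument(0, "Lucene is a Java full-text search engine");
        idx.addDocument(1, "MyBatis is a persistence framework");
        idx.addDocument(2, "Solr is built on Lucene");
        System.out.println(idx.search("Lucene")); // documents 0 and 2
    }
}
```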

Data Acquisition

Gathering the original information to be searched from the Internet, from databases, from file systems, and so on is called data acquisition. The purpose of acquisition is to index that original content.

Sources of acquired data:

1. Web pages on the Internet: a tool can crawl pages and save them locally as HTML files.

2. Data in a database: connect to the database and read the rows directly.

3. Files in a file system: read the file contents through I/O operations.

Software that gathers information from the Internet is usually called a crawler or spider (also a web robot); a crawler visits web pages and stores the content it retrieves.

Lucene does not provide data-acquisition classes; you either write your own crawler or use open-source software such as:

Solr (http://lucene.apache.org/solr): Solr is an Apache subproject that can extract raw data from relational databases and XML documents.

Nutch (http://lucene.apache.org/nutch): Nutch is an Apache subproject that includes a large-scale crawler able to fetch and parse web-site data.

jsoup (http://jsoup.org/): jsoup is a Java HTML parser that can parse a URL or HTML text directly. It offers a very convenient API for extracting and manipulating data using DOM traversal, CSS selectors, and jQuery-like methods.


/*
 * Sample data (MySQL 5.1.72-community)
 */
/*!40101 SET NAMES utf8 */;

create table `book` (
	`id` int (11),
	`name` varchar (192),
	`price` float ,
	`pic` varchar (96),
	`description` text 
); 
insert into `book` (`id`, `name`, `price`, `pic`, `description`) values('1','java 編程思想','71.5','23488292934.jpg','作者簡介  Bruce Eckel,是MindView公司的總裁,該公司向客戶提供軟件諮詢和培訓。他是C++標準委員會擁有表決權的成員之一,擁有應用物理學學士和計算機工程碩士學位。除本書外,他還是《C++編程思想》的作者,並與人合著了《C++編程思想第2卷》。\r\n\r\n《計算機科學叢書:Java編程思想(第4版)》贏得了全球程序員的廣泛讚譽,即使是最晦澀的概念,在BruceEckel的文字親和力和小而直接的編程示例面前也會化解於無形。從Java的基礎語法到最高級特性(深入的面向對象概念、多線程、自動項目構建、單元測試和調試等),本書都能逐步指導你輕鬆掌握。\r\n  從《計算機科學叢書:Java編程思想(第4版)》獲得的各項大獎以及來自世界各地的讀者評論中,不難看出這是一本經典之作。本書的作者擁有多年教學經驗,對C、C++以及Java語言都有獨到、深入的見解,以通俗易懂及小而直接的示例解釋了一個個晦澀抽象的概念。本書共22章,包括操作符、控制執行流程、訪問權限控制、複用類、多態、接口、通過異常處理錯誤、字符串、泛型、數組、容器深入研究、JavaI/O系統、枚舉類型、併發以及圖形化用戶界面等內容。這些豐富的內容,包含了Java語言基礎語法以及高級特性,適合各個層次的Java程序員閱讀,同時也是高等院校講授面向對象程序設計語言以及Java語言的絕佳教材和參考書。\r\n  《計算機科學叢書:Java編程思想(第4版)》特點:\r\n  適合初學者與專業人員的經典的面向對象敘述方式,爲更新的JavaSE5/6增加了新的示例和章節。\r\n  測驗框架顯示程序輸出。\r\n  設計模式貫穿於衆多示例中:適配器、橋接器、職責鏈、命令、裝飾器、外觀、工廠方法、享元、點名、數據傳輸對象、空對象、代理、單例、狀態、策略、模板方法以及訪問者。\r\n  爲數據傳輸引入了XML,爲用戶界面引入了SWT和Flash。\r\n  重新撰寫了有關併發的章節,有助於讀者掌握線程的相關知識。\r\n  專門爲第4版以及JavaSE5/6重寫了700多個編譯文件中的500多個程序。\r\n  支持網站包含了所有源代碼、帶註解的解決方案指南、網絡日誌以及多媒體學習資料。\r\n  覆蓋了所有基礎知識,同時論述了高級特性。\r\n  詳細地闡述了面向對象原理。\r\n  在線可獲得Java講座CD,其中包含BruceEckel的全部多媒體講座。\r\n  在網站上可以觀看現場講座、諮詢和評論。\r\n  專門爲第4版以及JavaSE5/6重寫了700多個編譯文件中的500多個程序。\r\n  支持網站包含了所有源代碼、帶註解的解決方案指南、網絡日誌以及多媒體學習資料。\r\n  覆蓋了所有基礎知識,同時論述了高級特性。\r\n  詳細地闡述了面向對象原理。\r\n\r\n\r\n');
insert into `book` (`id`, `name`, `price`, `pic`, `description`) values('2','apache lucene','66.0','77373773737.jpg','lucene是apache的開源項目,是一個全文檢索的工具包。\r\n# Apache Lucene README file\r\n\r\n## Introduction\r\n\r\nLucene is a Java full-text search engine.  Lucene is not a complete\r\napplication, but rather a code library and API that can easily be used\r\nto add search capabilities to applications.\r\n\r\n * The Lucene web site is at: http://lucene.apache.org/\r\n * Please join the Lucene-User mailing list by sending a message to:\r\n   [email protected]\r\n\r\n## Files in a binary distribution\r\n\r\nFiles are organized by module, for example in core/:\r\n\r\n* `core/lucene-core-XX.jar`:\r\n  The compiled core Lucene library.\r\n\r\nTo review the documentation, read the main documentation page, located at:\r\n`docs/index.html`\r\n\r\nTo build Lucene or its documentation for a source distribution, see BUILD.txt');
insert into `book` (`id`, `name`, `price`, `pic`, `description`) values('3','mybatis','55.0','88272828282.jpg','MyBatis介紹\r\n\r\nMyBatis 本是apache的一個開源項目iBatis, 2010年這個項目由apache software foundation 遷移到了google code,並且改名爲MyBatis。 \r\nMyBatis是一個優秀的持久層框架,它對jdbc的操作數據庫的過程進行封裝,使開發者只需要關注 SQL 本身,而不需要花費精力去處理例如註冊驅動、創建connection、創建statement、手動設置參數、結果集檢索等jdbc繁雜的過程代碼。\r\nMybatis通過xml或註解的方式將要執行的statement配置起來,並通過java對象和statement中的sql進行映射生成最終執行的sql語句,最後由mybatis框架執行sql並將結果映射成java對象並返回。\r\n');
insert into `book` (`id`, `name`, `price`, `pic`, `description`) values('4','spring','56.0','83938383222.jpg','## Spring Framework\r\nspringmvc.txt\r\nThe Spring Framework provides a comprehensive programming and configuration model for modern\r\nJava-based enterprise applications - on any kind of deployment platform. A key element of Spring is\r\ninfrastructural support at the application level: Spring focuses on the \"plumbing\" of enterprise\r\napplications so that teams can focus on application-level business logic, without unnecessary ties\r\nto specific deployment environments.\r\n\r\nThe framework also serves as the foundation for\r\n[Spring Integration](https://github.com/SpringSource/spring-integration),\r\n[Spring Batch](https://github.com/SpringSource/spring-batch) and the rest of the Spring\r\n[family of projects](http://springsource.org/projects). Browse the repositories under the\r\n[SpringSource organization](https://github.com/SpringSource) on GitHub for a full list.\r\n\r\n[.NET](https://github.com/SpringSource/spring-net) and\r\n[Python](https://github.com/SpringSource/spring-python) variants are available as well.\r\n\r\n## Downloading artifacts\r\nInstructions on\r\n[downloading Spring artifacts](https://github.com/SpringSource/spring-framework/wiki/Downloading-Spring-artifacts)\r\nvia Maven and other build systems are available via the project wiki.\r\n\r\n## Documentation\r\nSee the current [Javadoc](http://static.springsource.org/spring-framework/docs/current/api)\r\nand [Reference docs](http://static.springsource.org/spring-framework/docs/current/reference).\r\n\r\n## Getting support\r\nCheck out the [Spring forums](http://forum.springsource.org) and the\r\n[Spring tag](http://stackoverflow.com/questions/tagged/spring) on StackOverflow.\r\n[Commercial support](http://springsource.com/support/springsupport) is available too.\r\n\r\n## Issue Tracking\r\nSpring\'s JIRA issue tracker can be found [here](http://jira.springsource.org/browse/SPR). 
Think\r\nyou\'ve found a bug? Please consider submitting a reproduction project via the\r\n[spring-framework-issues](https://github.com/springsource/spring-framework-issues) repository. The\r\n[readme](https://github.com/springsource/spring-framework-issues#readme) provides simple\r\nstep-by-step instructions.\r\n\r\n## Building from source\r\nInstructions on\r\n[building Spring from source](https://github.com/SpringSource/spring-framework/wiki/Building-from-source)\r\nare available via the project wiki.\r\n\r\n## Contributing\r\n[Pull requests](http://help.github.com/send-pull-requests) are welcome; you\'ll be asked to sign our\r\ncontributor license agreement ([CLA](https://support.springsource.com/spring_committer_signup)).\r\nTrivial changes like typo fixes are especially appreciated (just\r\n[fork and edit!](https://github.com/blog/844-forking-with-the-edit-button)). For larger changes,\r\nplease search through JIRA for similiar issues, creating a new one if necessary, and discuss your\r\nideas with the Spring team.\r\n\r\n## Staying in touch\r\nFollow [@springframework](http://twitter.com/springframework) and its\r\n[team members](http://twitter.com/springframework/team/members) on Twitter. In-depth articles can be\r\nfound at the SpringSource [team blog](http://blog.springsource.org), and releases are announced via\r\nour [news feed](http://www.springsource.org/news-events).\r\n\r\n## License\r\nThe Spring Framework is released under version 2.0 of the\r\n[Apache License](http://www.apache.org/licenses/LICENSE-2.0).\r\n');
insert into `book` (`id`, `name`, `price`, `pic`, `description`) values('5','solr','78.0','99999229292.jpg','solr是一個全文檢索服務\r\n# Licensed to the Apache Software Foundation (ASF) under one or more\r\n# contributor license agreements.  See the NOTICE file distributed with\r\n# this work for additional information regarding copyright ownership.\r\n# The ASF licenses this file to You under the Apache License, Version 2.0\r\n# (the \"License\"); you may not use this file except in compliance with\r\n# the License.  You may obtain a copy of the License at\r\n#\r\n#     http://www.apache.org/licenses/LICENSE-2.0\r\n#\r\n# Unless required by applicable law or agreed to in writing, software\r\n# distributed under the License is distributed on an \"AS IS\" BASIS,\r\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\r\n# See the License for the specific language governing permissions and\r\n# limitations under the License.\r\n\r\n\r\nWelcome to the Apache Solr project!\r\n-----------------------------------\r\n\r\nSolr is the popular, blazing fast open source enterprise search platform\r\nfrom the Apache Lucene project.\r\n\r\nFor a complete description of the Solr project, team composition, source\r\ncode repositories, and other details, please see the Solr web site at\r\nhttp://lucene.apache.org/solr\r\n\r\n\r\nGetting Started\r\n---------------\r\n\r\nSee the \"example\" directory for an example Solr setup.  
A tutorial\r\nusing the example setup can be found at\r\n   http://lucene.apache.org/solr/tutorial.html\r\nor linked from \"docs/index.html\" in a binary distribution.\r\nAlso, there are Solr clients for many programming languages, see \r\n   http://wiki.apache.org/solr/IntegratingSolr\r\n\r\n\r\nFiles included in an Apache Solr binary distribution\r\n----------------------------------------------------\r\n\r\nexample/\r\n  A self-contained example Solr instance, complete with a sample\r\n  configuration, documents to index, and the Jetty Servlet container.\r\n  Please see example/README.txt for information about running this\r\n  example.\r\n\r\ndist/solr-XX.war\r\n  The Apache Solr Application.  Deploy this WAR file to any servlet\r\n  container to run Apache Solr.\r\n\r\ndist/solr-<component>-XX.jar\r\n  The Apache Solr libraries.  To compile Apache Solr Plugins,\r\n  one or more of these will be required.  The core library is\r\n  required at a minimum. (see http://wiki.apache.org/solr/SolrPlugins\r\n  for more information).\r\n\r\ndocs/index.html\r\n  The Apache Solr Javadoc API documentation and Tutorial\r\n\r\n\r\nInstructions for Building Apache Solr from Source\r\n-------------------------------------------------\r\n\r\n1. Download the Java SE 7 JDK (Java Development Kit) or later from http://java.sun.com/\r\n   You will need the JDK installed, and the $JAVA_HOME/bin (Windows: %JAVA_HOME%\\bin) \r\n   folder included on your command path. To test this, issue a \"java -version\" command \r\n   from your shell (command prompt) and verify that the Java version is 1.7 or later.\r\n\r\n2. Download the Apache Ant binary distribution (1.8.2+) from \r\n   http://ant.apache.org/  You will need Ant installed and the $ANT_HOME/bin (Windows: \r\n   %ANT_HOME%\\bin) folder included on your command path. To test this, issue a \r\n   \"ant -version\" command from your shell (command prompt) and verify that Ant is \r\n   available. 
\r\n\r\n   You will also need to install Apache Ivy binary distribution (2.2.0) from \r\n   http://ant.apache.org/ivy/ and place ivy-2.2.0.jar file in ~/.ant/lib -- if you skip \r\n   this step, the Solr build system will offer to do it for you.\r\n\r\n3. Download the Apache Solr distribution, linked from the above web site. \r\n   Unzip the distribution to a folder of your choice, e.g. C:\\solr or ~/solr\r\n   Alternately, you can obtain a copy of the latest Apache Solr source code\r\n   directly from the Subversion repository:\r\n\r\n     http://lucene.apache.org/solr/versioncontrol.html\r\n\r\n4. Navigate to the \"solr\" folder and issue an \"ant\" command to see the available options\r\n   for building, testing, and packaging Solr.\r\n  \r\n   NOTE: \r\n   To see Solr in action, you may want to use the \"ant example\" command to build\r\n   and package Solr into the example/webapps directory. See also example/README.txt.\r\n\r\n\r\nExport control\r\n-------------------------------------------------\r\nThis distribution includes cryptographic software.  The country in\r\nwhich you currently reside may have restrictions on the import,\r\npossession, use, and/or re-export to another country, of\r\nencryption software.  BEFORE using any encryption software, please\r\ncheck your country\'s laws, regulations and policies concerning the\r\nimport, possession, or use, and re-export of encryption software, to\r\nsee if this is permitted.  See <http://www.wassenaar.org/> for more\r\ninformation.\r\n\r\nThe U.S. Government Department of Commerce, Bureau of Industry and\r\nSecurity (BIS), has classified this software as Export Commodity\r\nControl Number (ECCN) 5D002.C.1, which includes information security\r\nsoftware using or performing cryptographic functions with asymmetric\r\nalgorithms.  
The form and manner of this Apache Software Foundation\r\ndistribution makes it eligible for export under the License Exception\r\nENC Technology Software Unrestricted (TSU) exception (see the BIS\r\nExport Administration Regulations, Section 740.13) for both object\r\ncode and source code.\r\n\r\nThe following provides more details on the included cryptographic\r\nsoftware:\r\n    Apache Solr uses the Apache Tika which uses the Bouncy Castle generic encryption libraries for\r\n    extracting text content and metadata from encrypted PDF files.\r\n    See http://www.bouncycastle.org/ for more details on Bouncy Castle.\r\n');

Implementing the Indexing Flow in Code

/*
 * The Book java bean
 */
public class Book {
	// book ID
	private Integer id;
	// book name
	private String name;
	// book price
	private Float price;
	// book picture
	private String pic;
	// book description
	private String desc;

	// getters and setters omitted...
}
/*
 * The DAO interface
 */
public interface BookDao {

	/**
	 * Query all book records
	 * 
	 * @return the list of books
	 */
	List<Book> queryBookList();
}
/*
 * The DAO implementation
 */
public class BookDaoImpl implements BookDao {

	@Override
	public List<Book> queryBookList() {
		// database connection
		Connection connection = null;
		// precompiled statement
		PreparedStatement preparedStatement = null;
		// result set
		ResultSet resultSet = null;
		// book list
		List<Book> list = new ArrayList<Book>();

		try {
			// load the database driver
			Class.forName("com.mysql.jdbc.Driver");
			// connect to the database
			connection = DriverManager.getConnection("jdbc:mysql://localhost:3306/solrTest", "root", "root");

			// SQL statement
			String sql = "SELECT * FROM book";
			// create the PreparedStatement
			preparedStatement = connection.prepareStatement(sql);
			// execute the query and get the result set
			resultSet = preparedStatement.executeQuery();
			// parse the result set
			while (resultSet.next()) {
				Book book = new Book();
				book.setId(resultSet.getInt("id"));
				book.setName(resultSet.getString("name"));
				book.setPrice(resultSet.getFloat("price"));
				book.setPic(resultSet.getString("pic"));
				// the table column is named "description"
				book.setDesc(resultSet.getString("description"));
				list.add(book);
			}
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			// release JDBC resources
			try {
				if (resultSet != null) resultSet.close();
				if (preparedStatement != null) preparedStatement.close();
				if (connection != null) connection.close();
			} catch (SQLException e) {
				e.printStackTrace();
			}
		}

		return list;
	}
}
/*
 * The indexing flow:
 * 1. Acquire the data
 * 2. Create Document objects
 * 3. Create the Analyzer (tokenizer)
 * 4. Create the Directory object that points at the index location
 * 5. Create the IndexWriterConfig
 * 6. Create the IndexWriter
 * 7. Write the Documents into the index
 * 8. Release resources
 */
public class CreateIndexTest {
	@Test
	public void testCreateIndex() throws Exception {
		// 1. Acquire the data
		BookDao bookDao = new BookDaoImpl();
		List<Book> bookList = bookDao.queryBookList();

		// 2. Create the Document objects
		List<Document> documents = new ArrayList<>();
		for (Book book : bookList) {
			Document document = new Document();

			// Add Field instances to the Document

			// book id
			// Store.YES means the value is stored in the document store
			// not analyzed, not indexed, stored
			document.add(new StoredField("id", book.getId().toString()));
			// book name
			// analyzed, indexed, stored
			document.add(new TextField("name", book.getName(), Store.YES));
			// book price
			// indexed as a numeric value, stored
			document.add(new FloatField("price", book.getPrice(), Store.YES));
			// book picture path
			// not analyzed, not indexed, stored
			document.add(new StoredField("pic", book.getPic()));
			// book description
			// analyzed, indexed, not stored
			document.add(new TextField("desc", book.getDesc(), Store.NO));

			// collect the Document
			documents.add(document);
		}

		// 3. Create the Analyzer that tokenizes document content
		Analyzer analyzer = new StandardAnalyzer();

		// 4. Create the Directory object that points at the index location
		Directory directory = FSDirectory.open(new File("D:/lucene/index"));

		// 5. Create the IndexWriterConfig with the settings the writer needs
		IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer);

		// 6. Create the IndexWriter
		IndexWriter indexWriter = new IndexWriter(directory, config);

		// 7. Write each Document into the index through the IndexWriter
		for (Document doc : documents) {
			indexWriter.addDocument(doc);
		}

		// 8. Release resources
		indexWriter.close();
	}
}

Viewing the Index with Luke

Luke (http://www.getopt.org/luke/) is a tool that accompanies the Lucene toolkit; it provides a GUI for browsing, querying, and modifying index files.

Common Field Types

StringField: indexed as a single token, not analyzed; suited to exact-match values such as IDs and codes.
TextField: analyzed and indexed; suited to full-text content such as the name and desc fields above.
StoredField: stored only, neither analyzed nor indexed; suited to display-only values such as the pic field above.
IntField / LongField / FloatField / DoubleField: numeric fields, indexed so they support range queries.

Search-Time Analysis

The user's query keywords are also analyzed; as a rule, indexing and searching should use the same analyzer.

IndexSearcher offers the following search methods:

indexSearcher.search(query, n)

Search by Query and return the n highest-scoring records.

indexSearcher.search(query, filter, n)

Search by Query with a filter applied, returning the n highest-scoring records.

indexSearcher.search(query, n, sort)

Search by Query with a sort applied, returning the top n records.

indexSearcher.search(booleanQuery, filter, n, sort)

Search by Query with both a filter and a sort applied, returning the top n records.
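All four overloads share the same core behavior: collect the matching documents, score them, and return the n highest-scoring hits. A minimal sketch of that top-n step in plain Java (ScoredDoc here is a hypothetical stand-in for Lucene's ScoreDoc, which pairs an internal document ID with a relevance score):

```java
import java.util.*;

// Sketch of the "return the top n hits by score" behavior shared by the
// IndexSearcher.search overloads above. Not Lucene code; illustrative only.
public class TopNSketch {
	static class ScoredDoc {
		final int docId;
		final float score;
		ScoredDoc(int docId, float score) { this.docId = docId; this.score = score; }
	}

	// Sort all hits by descending score and keep the first n.
	static List<ScoredDoc> topN(List<ScoredDoc> hits, int n) {
		List<ScoredDoc> sorted = new ArrayList<>(hits);
		sorted.sort((a, b) -> Float.compare(b.score, a.score));
		return sorted.subList(0, Math.min(n, sorted.size()));
	}

	public static void main(String[] args) {
		List<ScoredDoc> hits = Arrays.asList(
				new ScoredDoc(1, 0.4f), new ScoredDoc(2, 0.9f), new ScoredDoc(3, 0.7f));
		for (ScoredDoc d : topN(hits, 2)) {
			System.out.println("doc " + d.docId + " score " + d.score);
		}
	}
}
```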

public class SearchIndexTest {
	@Test
	public void testSearchIndex() throws Exception {
		// 1. Create the Query object
		// create the analyzer
		Analyzer analyzer = new StandardAnalyzer();
		// create the QueryParser; first argument: the default Field, second argument: the analyzer
		QueryParser queryParser = new QueryParser("desc", analyzer);

		// create the query
		Query query = queryParser.parse("desc:java AND lucene");

		// 2. Create the Directory object pointing at the index location
		Directory directory = FSDirectory.open(new File("D:/lucene/index"));

		// 3. Create the IndexReader
		IndexReader reader = DirectoryReader.open(directory);

		// 4. Create the IndexSearcher
		IndexSearcher searcher = new IndexSearcher(reader);

		// 5. Execute the search; returns the result set as TopDocs
		// first argument: the query; second argument: the number of top-scoring hits to return
		TopDocs topDocs = searcher.search(query, 10);
		System.out.println("Total hits: " + topDocs.totalHits);
		// get the result array
		ScoreDoc[] docs = topDocs.scoreDocs;

		// 6. Iterate over the results
		for (ScoreDoc scoreDoc : docs) {
			// fetch the document
			int docID = scoreDoc.doc;
			Document doc = searcher.doc(docID);

			System.out.println("=============================");
			System.out.println("docID:" + docID);
			System.out.println("bookId:" + doc.get("id"));
			System.out.println("name:" + doc.get("name"));
			System.out.println("price:" + doc.get("price"));
			System.out.println("pic:" + doc.get("pic"));
			// System.out.println("desc:" + doc.get("desc")); // desc was not stored
		}
		// 7. Release resources
		reader.close();
	}
}

Analyzers

Before a Document's content is indexed it must be run through an analyzer; analysis exists to make search work. Analysis is tokenizing followed by filtering.

Tokenizing: the acquired data is stored in the Document's Field values, and tokenizing splits each Field value into individual terms.

Filtering: includes stripping punctuation, removing stop words (的, 是, a, an, the, ...), lowercasing, and reducing word forms (plural to singular, past tense to present tense, and so on).
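The tokenize-then-filter pipeline can be sketched in a few lines of plain Java. This toy analyzer (not Lucene's Analyzer API) splits on whitespace, then lowercases, strips punctuation, and drops a small English stop-word set:

```java
import java.util.*;
import java.util.stream.*;

// Toy analyzer illustrating "tokenize, then filter". Lucene implements the
// same idea with Tokenizer + TokenFilter chains inside an Analyzer.
public class ToyAnalyzer {
	private static final Set<String> STOP_WORDS =
			new HashSet<>(Arrays.asList("a", "an", "the", "is", "and"));

	public static List<String> analyze(String text) {
		return Arrays.stream(text.split("\\s+"))                       // tokenize on whitespace
				.map(t -> t.toLowerCase().replaceAll("[^a-z0-9]", "")) // lowercase + strip punctuation
				.filter(t -> !t.isEmpty())
				.filter(t -> !STOP_WORDS.contains(t))                  // stop-word filter
				.collect(Collectors.toList());
	}

	public static void main(String[] args) {
		System.out.println(analyze("Lucene is a Java full-text search engine."));
		// [lucene, java, fulltext, search, engine]
	}
}
```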

Third-Party Chinese Analyzers

Lucene's built-in Chinese analyzer cannot meet typical development needs, so a third-party analyzer such as IK Analyzer is used. The latest release, which supports Lucene 4.10, is at https://code.google.com/p/ik-analyzer/. Since version 1.0 came out in December 2006, IK Analyzer has gone through four major versions. It began as a Chinese analysis component built around the open-source Lucene project, combining dictionary-based segmentation with grammar analysis; from version 3.0 onward, IK became a general-purpose Java analysis component independent of Lucene, while still providing an optimized default integration for Lucene.

IKAnalyzer.cfg.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">  
<properties>  
	<comment>IK Analyzer extension configuration</comment>
	<!-- configure your own extension dictionary here -->
	<entry key="ext_dict">ext.dic;</entry> 

	<!-- configure your own extension stop-word dictionary here -->
	<entry key="ext_stopwords">stopword.dic;</entry> 
	
</properties>

Index Maintenance

Deleting from the Index

@Test
public void testIndexDelete() throws Exception {
	// create the Directory object
	Directory directory = FSDirectory.open(new File("d:/lucene/index"));
	// no analyzer is needed for deletes
	IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_3, null);
	// create the IndexWriter
	IndexWriter indexWriter = new IndexWriter(directory, config);

	// delete by Term: every document whose name field contains the term "java"
	indexWriter.deleteDocuments(new Term("name", "java"));
	// delete the entire index (use with caution)
	// indexWriter.deleteAll();

	// release resources
	indexWriter.close();
}

Updating the Index

@Test
public void testIndexUpdate() throws Exception {
	// create the analyzer
	Analyzer analyzer = new IKAnalyzer();
	// create the Directory object
	Directory directory = FSDirectory.open(new File("d:/lucene/index"));
	IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer);
	// create the IndexWriter
	IndexWriter indexWriter = new IndexWriter(directory, config);

	// create the replacement Document
	Document document = new Document();
	document.add(new TextField("id", "1002", Store.YES));
	document.add(new TextField("name", "lucene測試test 002", Store.YES));

	// update: deletes every Document matching the Term, then adds the new Document
	indexWriter.updateDocument(new Term("name", "test"), document);

	// release resources
	indexWriter.close();
}
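Note that updateDocument is not an in-place modification: it deletes every document matching the Term and then adds the new document. A toy plain-Java sketch of that delete-then-add behavior (documents modeled as field maps; in real Lucene the Term matches an analyzed token inside the field, not the whole value):

```java
import java.util.*;

// Sketch of IndexWriter.updateDocument(term, doc) semantics: remove every
// document whose field matches, then append the replacement document.
public class UpdateSemantics {

	static List<Map<String, String>> update(List<Map<String, String>> docs,
			String field, String value, Map<String, String> newDoc) {
		List<Map<String, String>> result = new ArrayList<>();
		for (Map<String, String> d : docs) {
			// drop every document matching the "term"
			if (!value.equals(d.get(field))) {
				result.add(d);
			}
		}
		// then append the replacement document
		result.add(newDoc);
		return result;
	}

	public static void main(String[] args) {
		List<Map<String, String>> docs = new ArrayList<>();
		docs.add(Collections.singletonMap("name", "test"));
		docs.add(Collections.singletonMap("name", "lucene"));
		docs = update(docs, "name", "test", Collections.singletonMap("name", "test 002"));
		// the matching document was deleted and the new one was added
		System.out.println(docs);
	}
}
```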

Searching the Index

To search, create a Query object describing what to find; Lucene generates the final query from the Query object. Just as relational databases have SQL, Lucene has its own query syntax: for example, "name:lucene" matches documents whose name Field contains the term "lucene".

A query object can be created in two ways:

1) Use one of Lucene's Query subclasses

Query is an abstract class, and Lucene provides many concrete queries, such as TermQuery for exact term matches and NumericRangeQuery for numeric ranges.

	Query query = new TermQuery(new Term("name", "lucene"));

2) Use QueryParser to parse a query expression

QueryParser parses a user-entered query expression into a Query instance.

	QueryParser queryParser = new QueryParser("name", new IKAnalyzer());
	Query query = queryParser.parse("name:lucene");

Searching with a Query Subclass

TermQuery is a term query. It does not run an analyzer; it matches the search keyword exactly against the terms in a Field, which suits exact values such as order numbers and category IDs. It is the Lucene analogue of SQL's WHERE name = 'lucene'. Creating the query:

// creating the query:
@Test
public void testSearchTermQuery() throws Exception {
	// create the TermQuery
	Query query = new TermQuery(new Term("name", "lucene"));

	doSearch(query);
}
// extracted search logic:
private void doSearch(Query query) throws IOException {
	// 2. Execute the search and return the result set
	// create the Directory object
	Directory directory = FSDirectory.open(new File("D:/lucene/index"));

	// create the IndexReader
	IndexReader reader = DirectoryReader.open(directory);

	// create the IndexSearcher
	IndexSearcher searcher = new IndexSearcher(reader);

	// execute the search; returns the result set as TopDocs
	// first argument: the query; second argument: the number of top-scoring hits to return
	TopDocs topDocs = searcher.search(query, 10);

	System.out.println("Total hits: " + topDocs.totalHits);

	// get the result array
	ScoreDoc[] docs = topDocs.scoreDocs;

	// iterate over the results
	for (ScoreDoc scoreDoc : docs) {
		// fetch the document by internal id
		int docID = scoreDoc.doc;
		Document doc = searcher.doc(docID);

		System.out.println("======================================");

		System.out.println("docID:" + docID);
		System.out.println("bookId:" + doc.get("id"));
		System.out.println("name:" + doc.get("name"));
		System.out.println("price:" + doc.get("price"));
		System.out.println("pic:" + doc.get("pic"));
		// System.out.println("desc:" + doc.get("desc")); // desc was not stored
	}
	// 3. Release resources
	reader.close();
}