Getting Started with Combining Hadoop and Full-Text Search (Work in Progress)

Original

2020-02-21 11:22

Still experimenting; this post is being updated.

This article is based on Hadoop 1.0 and Lucene 4.4.

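/**
 * Builds the index from a file, one document per line.
 *
 * @param indexFiles path of the file to index
 * @throws Exception if reading the file or writing the index fails
 */
public void doIndex(String indexFiles) throws Exception {
    BufferedReader filereader = getBR(FILE.HDF, indexFiles); // buffered reader over the file
    if (filereader == null) return;
    try {
        String row;
        while ((row = filereader.readLine()) != null) {
            Document document = genDoc(row); // one document per line; genDoc is user-defined
            indexWriter.addDocument(document); // add it to the IndexWriter
        }
        indexWriter.forceMerge(1); // index optimization: merge to one segment (the Lucene 4.x name for the old optimize())
        indexWriter.close(); // commit and close the index
    } finally {
        filereader.close();
    }
}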
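
genDoc is left as a user-defined step above. As a minimal sketch of what it might do before building a Lucene Document, the following splits one row into named fields. The tab-delimited layout, the field names id, title and body, and the class name RowParser are all assumptions made for illustration; none of them come from the original code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits one input row into named fields.
// Assumes rows look like "id<TAB>title<TAB>body"; this layout is an assumption.
class RowParser {
    private static final String[] FIELD_NAMES = {"id", "title", "body"};

    // Returns the fields in column order, or null when the row
    // does not have exactly three columns.
    static Map<String, String> parse(String row) {
        if (row == null) return null;
        String[] cols = row.split("\t", -1); // -1 keeps trailing empty columns
        if (cols.length != FIELD_NAMES.length) return null;
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < cols.length; i++) {
            fields.put(FIELD_NAMES[i], cols[i].trim());
        }
        return fields;
    }
}
```

A genDoc implementation could then add each map entry to a Document as a field, and doIndex could skip rows for which parse returns null instead of indexing a malformed line.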

private BufferedReader getBR(FILE where, String indexFiles) throws IOException {
    switch (where) {
        case LOCAL: { // read from the local file system
            return new BufferedReader(new FileReader(indexFiles));
        }
        case HDF: { // read from HDFS
            // FileSystem.get needs the Hadoop Configuration; the key setting is the
            // address of the HDFS namespace to read from (fs.default.name in Hadoop 1.x)
            FileSystem dfs = FileSystem.get(config);
            FSDataInputStream fsin = dfs.open(new Path(indexFiles));
            // FSDataInputStream is a subclass of InputStream
            return new BufferedReader(new InputStreamReader(fsin, "UTF-8"));
        }
        default:
            return null;
    }
}
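
Because FSDataInputStream is an InputStream, the HDFS branch is ordinary stream wrapping. Below is a minimal sketch of the same read loop written with try-with-resources (Java 7+), so the reader is closed even if a line fails to process. The class name LineReading and the method readLines are hypothetical, and any InputStream can be passed in, which makes the loop easy to try without a cluster.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original code.
class LineReading {
    // Reads every line from the stream as UTF-8 and closes the stream when done.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> rows = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String row;
            while ((row = reader.readLine()) != null) {
                rows.add(row);
            }
        }
        return rows;
    }
}
```

Collecting all rows in a list is only for demonstration; for a large HDFS file it would be more reasonable to index each row inside the loop, as doIndex does.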

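public String doSearch(String searchField) throws Exception {
    if (searcher == null) {
        System.out.println("start search...");
        // Note: rdir is a RAMDirectory, and it must be the one the index was built in inside the current JVM
        searcher = new IndexSearcher(DirectoryReader.open(rdir));
        System.out.println("Total Documents = " + searcher.getIndexReader().maxDoc()); // number of documents already indexed
    }
    if (queryParser == null) {
        // Field is user-defined: the name of the field to search
        queryParser = new QueryParser(Version.LUCENE_30, Field,
                new StandardAnalyzer(Version.LUCENE_30));
    }

    Query query = queryParser.parse(searchField); // parse the value to search for
    TopDocs hits = searcher.search(query, n); // run the search; n is user-defined: the maximum number of results wanted
    System.out.println("Number of matching documents = " + hits.totalHits); // total number of matches
    // scoreDocs holds at most n entries, which can be fewer than totalHits
    for (int i = 0; i < hits.scoreDocs.length; i++) {
        ScoreDoc sdoc = hits.scoreDocs[i]; // one entry of the result set
        Document doc = searcher.doc(sdoc.doc); // load the stored document, if needed
        handle(doc); // user-defined: process the matching document
    }
    return String.valueOf(hits.totalHits); // assumption: the original does not show what is returned
}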
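
One detail of the result loop is worth spelling out: totalHits counts every match in the index, while scoreDocs holds at most the n entries that were requested, so the two numbers can differ. Below is a small sketch of the safe loop bound, using plain integers in place of Lucene objects; the class and method names are hypothetical.

```java
// Hypothetical helper, not part of the original code.
class HitWindow {
    // Number of results that can safely be read from the returned array:
    // never more than were requested, never more than actually matched.
    static int readable(int totalHits, int requested) {
        return Math.max(0, Math.min(totalHits, requested));
    }
}
```

With 250 matches and n = 10, only 10 entries are available, so a loop bounded by totalHits would run past the end of the array. Bounding the loop by the array length has the same effect as this calculation.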

sg_0504
