Java for Web Study Notes (124): Search (6): Lucene and Hibernate Search

Lucene

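Lucene is a powerful search tool. Hibernate Search combines Lucene core with JPA/Hibernate ORM: when we add or modify data through JPA, the entity is automatically indexed in Lucene. At search time, the Lucene core search engine runs the query, and JPA entity objects are returned.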

<dependency>
     <groupId>org.hibernate</groupId>
     <artifactId>hibernate-search-orm</artifactId>
     <version>5.9.1.Final</version>
</dependency>

Setting up Hibernate Search

In the context configuration:

@Bean
public LocalContainerEntityManagerFactoryBean entityManagerFactory() throws PropertyVetoException{
    Map<String, Object> properties = new Hashtable<>();
    properties.put("javax.persistence.schema-generation.database.action","none");
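    /* Enable Hibernate Search for Hibernate ORM. This uses standalone Lucene (no Solr) and stores the index on the local file system.
     * When Hibernate ORM in this WAR writes to the database, Hibernate Search indexes the affected content and writes it to files.
     * This setup does not suit a Tomcat cluster; with a Tomcat cluster, a Solr server is needed instead. */
    properties.put("hibernate.search.default.directory_provider", "filesystem");
    /* This example uses ../searchIndexes, which in the development environment is the top-level Eclipse directory */
    properties.put("hibernate.search.default.indexBase", "../searchIndexes");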
    properties.put("hibernate.show_sql", "true");
    properties.put("hibernate.dialect", "org.hibernate.dialect.MySQL5InnoDBDialect");

    LocalContainerEntityManagerFactoryBean factory = new LocalContainerEntityManagerFactoryBean();
    factory.setJpaVendorAdapter(new HibernateJpaVendorAdapter());
    factory.setDataSource(this.springJpaDataSource());
    factory.setPackagesToScan("cn.wei.flowingflying.chapter23.entities");
    factory.setSharedCacheMode(SharedCacheMode.ENABLE_SELECTIVE);
    factory.setValidationMode(ValidationMode.NONE);
    factory.setJpaPropertyMap(properties);

    return factory;
}

Entity data used by the example, and what the example does

This is the database table UserPrincipal_23, mapped to the entity User:
mysql> select * from UserPrincipal_23;
+--------+----------+
| UserId | Username |
+--------+----------+
|      4 | John     |
|      3 | Mike     |
|      1 | Nicholas |
|      2 | Sarah    |
+--------+----------+

The table Post_23 is mapped to the entity ForumPost:

mysql> select * from Post_23;
+--------+--------+----------------+--------------------------------------+------------+
| PostId | UserId | Title          | Body                                 | Keywords   |
+--------+--------+----------------+--------------------------------------+------------+
|      1 |      3 | Title One      | Test One. Hello world! Java!         | one java   |
|      2 |      1 | Title Two      | Hello, my friend! This is title two. | two friend |
|      3 |      1 | Hello Nicholas | My name is Nicholas! Hi, Nicholas    | Nicholas   |
+--------+--------+----------------+--------------------------------------+------------+

mysql> select Post_23.*,UserPrincipal_23.Username from Post_23 left join UserPrincipal_23 on Post_23.UserId = UserPrincipal_23.UserId;
+--------+--------+----------------+--------------------------------------+------------+----------+
| PostId | UserId | Title          | Body                                 | Keywords   | Username |
+--------+--------+----------------+--------------------------------------+------------+----------+
|      1 |      3 | Title One      | Test One. Hello world! Java!         | one java   | Mike     |
|      2 |      1 | Title Two      | Hello, my friend! This is title two. | two friend | Nicholas |
|      3 |      1 | Hello Nicholas | My name is Nicholas! Hi, Nicholas    | Nicholas   | Nicholas |
+--------+--------+----------------+--------------------------------------+------------+----------+

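The example searches title, body, keywords, and username.

The search spans a join of the two tables. On the ORM side this is handled with @ManyToOne, which will be covered later. On the Lucene side it is handled with Hibernate Search annotations.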
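To make the goal concrete, here is a minimal plain-Java sketch (no Lucene involved) of what searching the four fields across the join amounts to. It assumes a toy "document" that is just a map from field path to text, with the author's name stored under the dotted path `user.username`. The class and method names (`FlattenSketch`, `toDocument`, `matches`, `search`) are hypothetical, and the word splitting is a crude stand-in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: a "document" is a map from field path to text.
// This is NOT how Lucene stores or analyzes documents.
class FlattenSketch {

    // Flatten a post and its author into one document; the author's
    // username lands under the dotted path "user.username".
    static Map<String, String> toDocument(String title, String body,
                                          String keywords, String username) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", title);
        doc.put("body", body);
        doc.put("keywords", keywords);
        doc.put("user.username", username);
        return doc;
    }

    // The three rows of Post_23 joined with UserPrincipal_23, keyed by PostId.
    static Map<Long, Map<String, String>> sampleIndex() {
        Map<Long, Map<String, String>> index = new LinkedHashMap<>();
        index.put(1L, toDocument("Title One", "Test One. Hello world! Java!", "one java", "Mike"));
        index.put(2L, toDocument("Title Two", "Hello, my friend! This is title two.", "two friend", "Nicholas"));
        index.put(3L, toDocument("Hello Nicholas", "My name is Nicholas! Hi, Nicholas", "Nicholas", "Nicholas"));
        return index;
    }

    // True if any field contains the term as a whole, lower-cased word.
    static boolean matches(Map<String, String> doc, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        for (String text : doc.values()) {
            String[] tokens = text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
            if (Arrays.asList(tokens).contains(needle)) {
                return true;
            }
        }
        return false;
    }

    // Return the PostIds of all matching documents, in insertion order.
    static List<Long> search(Map<Long, Map<String, String>> index, String term) {
        List<Long> hits = new ArrayList<>();
        for (Map.Entry<Long, Map<String, String>> e : index.entrySet()) {
            if (matches(e.getValue(), term)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }
}
```

With the sample rows, `FlattenSketch.search(FlattenSketch.sampleIndex(), "nicholas")` finds posts 2 and 3. Post 2 matches only through its author's username, which is why the username has to be part of the post's document.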

The indexed entity

@Entity
@Table(name="UserPrincipal_23")
public class User {
    private long id;
    private String username;
	
    @Basic
    @Field // marks this property as an indexable (searchable) field in Lucene
    public String getUsername() {
        return username;
    }
    ... ...		
}

Main entity

@Entity
@Table(name="Post_23")
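/*【1】@Indexed: marks this class for full-text search in Hibernate Search, which will automatically create or update a Lucene document for the entity.
 * The document ID is identified by @DocumentId; if it is omitted, the entity's @Id is used automatically. */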
@Indexed 
public class ForumPost {
    private long id;
    private User user; // the tables are linked through the foreign key UserId
    private String title;
    private String body;
    private String keywords;

    @Id
    @Column(name="PostId")
    @GeneratedValue(strategy=GenerationType.IDENTITY)
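    /*【2】Set the document ID: Hibernate Search automatically creates and updates a document for this entity. @DocumentId marks the document ID.
     * Here it is placed on the @Id property as the unique identifier; if it is omitted, the @Id property is used automatically. */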
    @DocumentId
    public long getId() { ... }

    @ManyToOne(fetch=FetchType.EAGER,optional=false)
    @JoinColumn(name="UserId")
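    /*【4】Index the associated entity (User in this example) as part of this root entity: tells Hibernate Search that this property is another entity.
     * The indexed properties of the associated object, here the @Field String username in User, are indexed as well, somewhat like a cascade setting. */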
    @IndexedEmbedded 
    public User getUser() { ... }

    @Basic
    @Field //【3】this property needs full-text search
    public String getTitle() { ... }

    @Lob
    @Field //【3】this property needs full-text search
    public String getBody() { ... }
	
    @SuppressWarnings("deprecation")
    @Basic
    /* Deprecated.  Index-time boosting will not be possible anymore starting from Lucene 7. 
     * You should use query-time boosting instead, for instance by calling boostedTo(float) 
     * when building queries with the Hibernate Search query DSL.
     * @Boost: relevance weighting */
    @Field(boost = @Boost(2.0F))
    public String getKeywords() { ... }
    ... ...	
}
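To show what a boost of 2.0 on keywords means for ranking, here is a toy plain-Java sketch. It assumes a deliberately simplified score: the sum over fields of boost times the number of occurrences of the term. This is not Lucene's actual scoring formula, and `BoostSketch` and its methods are hypothetical names. It only illustrates that a match in a boosted field counts for more.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Toy scoring only: score = sum over fields of (boost * occurrences of the term).
// Lucene's real similarity formula is different; this shows only the effect of a weight.
class BoostSketch {

    // Count whole-word, case-insensitive occurrences of term in text.
    static int countOccurrences(String text, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.equals(needle)) {
                count++;
            }
        }
        return count;
    }

    // Fields without an entry in boosts get the neutral weight 1.0.
    static double score(Map<String, String> fields, Map<String, Double> boosts, String term) {
        double total = 0.0;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            double boost = boosts.getOrDefault(e.getKey(), 1.0);
            total += boost * countOccurrences(e.getValue(), term);
        }
        return total;
    }

    // Title, body and keywords of post 3 from the sample table.
    static Map<String, String> post3() {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "Hello Nicholas");
        fields.put("body", "My name is Nicholas! Hi, Nicholas");
        fields.put("keywords", "Nicholas");
        return fields;
    }
}
```

For the term "nicholas", post 3 has one hit in title, two in body, and one in keywords. Under this toy formula the score is 4.0 with no boosts and 5.0 when keywords is weighted 2.0.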

Search-related code

As before, we provide SearchResult to hold the entity and its relevance score.

public class SearchResult<T> {
	private final T entity;
	private final double relevance;	
	......
}
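The class body is elided above. A minimal sketch of what it might contain, assuming a two-argument constructor (which matches the `new SearchResult<>(entity, score)` call in the repository code) and getter names `getEntity` and `getRelevance` that are my own guess:

```java
// Sketch of the elided class body; the getter names are assumptions.
// Declared package-private here so the snippet compiles in any file;
// the class in the notes is public.
class SearchResult<T> {
    private final T entity;
    private final double relevance;

    SearchResult(T entity, double relevance) {
        this.entity = entity;
        this.relevance = relevance;
    }

    // The matched entity, for example a ForumPost.
    T getEntity() {
        return entity;
    }

    // The relevance score reported for this match.
    double getRelevance() {
        return relevance;
    }
}
```

A `Float` score can be passed straight to this constructor, because Java unboxes it and widens it to `double`.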

Setting up the repository interfaces

public interface SearchableRepository<T>{
    Page<SearchResult<T>> search(String query, Pageable pageable);
}
public interface ForumPostRepository extends JpaRepository<ForumPost, Long>,SearchableRepository<ForumPost>{
}

Hibernate Search works with Lucene documents, and its API resembles the JPA API; the Lucene API can of course be used directly as well. Let's look at the actual code:

public class ForumPostRepositoryImpl implements SearchableRepository<ForumPost>{
    //【1】Obtain Hibernate Search's full-text entity manager, which is similar to the JPA EntityManager; the Lucene full-text
    // search methods are all based on this FullTextEntityManager. Note that the entityManagerFactory configured in the root
    // context already includes the Hibernate Search settings.
    @PersistenceContext EntityManager entityManager;
    EntityManagerProxy entityManagerProxy;

    // 1.1) The @PersistenceContext EntityManager entityManager injected by Spring is actually an EntityManager proxy
    // (it delegates to a new EntityManager for each transaction); initialize() obtains that proxy.
    @PostConstruct
    public void initialize(){
        if(!(this.entityManager instanceof EntityManagerProxy))
            throw new FatalBeanException("Entity manager " + this.entityManager + " was not a proxy.");
        this.entityManagerProxy = (EntityManagerProxy) entityManager;
    }

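    // 1.2) FullTextEntityManager is a real object, not a proxy, so we need to create a new one for each search
    // (nothing does this silently for us the way Spring's injected EntityManager proxy does). It must also be
    // obtained from a real Hibernate ORM EntityManager implementation, not from the proxy.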
    private FullTextEntityManager getFullTextEntityManager(){
        return Search.getFullTextEntityManager(this.entityManagerProxy.getTargetEntityManager());
    }

    //【2】The full-text search implementation using Hibernate Search.
    @Override
    public Page<SearchResult<ForumPost>> search(String query, Pageable pageable) {
        // 2.1) Obtain the FullTextEntityManager within the transaction
        FullTextEntityManager manager = getFullTextEntityManager();

        // 2.2) Run the search. The Hibernate Search API resembles the JPA API, since both belong to the Hibernate family.
        QueryBuilder builder = manager.getSearchFactory().buildQueryBuilder().forEntity(ForumPost.class).get();
        Query lucene = builder.keyword()
               .onFields("title", "body", "keywords", "user.username") // the properties to search; note user.username
               .matching(query) // the text to search for
               .createQuery();

        FullTextQuery q = manager.createFullTextQuery(lucene, ForumPost.class);
        q.setProjection(FullTextQuery.THIS,FullTextQuery.SCORE); // return the ForumPost and its relevance score

        // 2.3) Get the total number of search results
        long total = q.getResultSize(); 

        // 2.4) Fetch the results for the requested page
        @SuppressWarnings("unchecked")
        List<Object[]> results = q.setFirstResult((int) pageable.getOffset()) // cast needed where getOffset() returns long
                                  .setMaxResults(pageable.getPageSize())
                                  .getResultList();
        List<SearchResult<ForumPost>> list = new ArrayList<>();
        results.forEach(o -> list.add(
                                   new SearchResult<>((ForumPost)o[0], (Float)o[1])) );

        return new PageImpl<>(list, pageable, total);
    }
}
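The paging in step 2.4 can be illustrated without Lucene or Spring. The sketch below assumes an in-memory list standing in for the full result set and mimics what a first-result offset and a maximum result count select from it. `PageSliceSketch`, `slice`, and `totalPages` are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory illustration of offset/limit paging; not the Hibernate Search implementation.
class PageSliceSketch {

    // Return at most maxResults items starting at index firstResult.
    // Out-of-range values are clamped, so a page past the end is simply empty.
    static <T> List<T> slice(List<T> all, int firstResult, int maxResults) {
        int from = Math.min(Math.max(firstResult, 0), all.size());
        long wantedEnd = (long) from + Math.max(maxResults, 0); // long avoids int overflow
        int to = (int) Math.min(wantedEnd, (long) all.size());
        return new ArrayList<>(all.subList(from, to));
    }

    // Number of pages needed to show `total` results, pageSize per page.
    static long totalPages(long total, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (total + pageSize - 1) / pageSize;
    }
}
```

With five results and a page size of two, an offset of 2 selects the third and fourth items, an offset of 4 selects only the fifth, and three pages are needed in total. This is why the total count from step 2.3 is fetched separately from the page contents.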

Related link: my articles on Professional Java for Web Applications
