Extending Lucene with custom field-based sorting

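We recently had to change our product search so that results are ordered first by whether the product is in stock and then by sales volume. Stock is stored as an integer count, so simply sorting on it in descending order does not meet the requirement: a product with a huge stock but few sales would outrank a better-selling product with a small stock. Lucene has no built-in support for this particular business rule, but it can be implemented by extension.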
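To make the requirement concrete, here is a minimal plain-Java sketch that does not use Lucene at all. The Product class, the sample data and both comparator names are hypothetical and exist only for illustration. It contrasts sorting on the raw stock count with sorting on an "in stock" flag followed by sales volume.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class StockFirstOrdering {

    /** Hypothetical product holder, used only for this illustration. */
    static final class Product {
        final String id;
        final long stock;
        final int salesVolume;

        Product(String id, long stock, int salesVolume) {
            this.id = id;
            this.stock = stock;
            this.salesVolume = salesVolume;
        }
    }

    /** Naive ordering: raw stock count descending, then sales volume descending. */
    static final Comparator<Product> RAW_STOCK_THEN_SALES =
            Comparator.comparingLong((Product p) -> p.stock).reversed()
                    .thenComparing(Comparator.comparingInt((Product p) -> p.salesVolume).reversed());

    /** Desired ordering: in-stock products first, then sales volume descending. */
    static final Comparator<Product> IN_STOCK_THEN_SALES =
            Comparator.comparingInt((Product p) -> p.stock > 0 ? 0 : 1)
                    .thenComparing(Comparator.comparingInt((Product p) -> p.salesVolume).reversed());

    /** Sorts a copy of the list and returns the product ids in the resulting order. */
    static List<String> sortedIds(List<Product> products, Comparator<Product> order) {
        List<Product> copy = new ArrayList<>(products);
        copy.sort(order);
        List<String> ids = new ArrayList<>();
        for (Product p : copy) {
            ids.add(p.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Product> products = Arrays.asList(
                new Product("A", 100000L, 1),   // huge stock, few sales
                new Product("B", 3L, 12),       // small stock, better sales
                new Product("C", 0L, 50));      // sold out
        System.out.println(sortedIds(products, RAW_STOCK_THEN_SALES)); // [A, B, C]
        System.out.println(sortedIds(products, IN_STOCK_THEN_SALES));  // [B, A, C]
    }
}
```

With the raw count, product A wins purely because of its large stock. With the flag, A and B tie on the first key and sales volume decides between them.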
 
 
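import java.io.IOException;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.FieldComparator;
import org.apache.lucene.search.FieldComparatorSource;
import org.apache.lucene.util.Bits;

public class EmptyStockComparatorSource extends FieldComparatorSource {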
    @Override
    public FieldComparator<?> newComparator(String fieldname, int numHits, int sortPos, boolean reversed)
            throws IOException {
        return new LongComparator(numHits, fieldname, 0L);
    }

    public static class LongComparator extends FieldComparator.NumericComparator<Long> {
        private final long[] values;
        private long bottom;
        private long topValue;

        /**
         * Creates a new comparator based on {@link Long#compare} for {@code numHits}.
         * When a document has no value for the field, {@code missingValue} is substituted.
         */
        public LongComparator(int numHits, String field, Long missingValue) {
            super(field, missingValue);
            values = new long[numHits];
        }

        @Override
        protected void doSetNextReader(LeafReaderContext context) throws IOException {
            currentReaderValues = getNumericDocValues(context, field);
            if (missingValue != null) {
                docsWithField = getDocsWithValue(context, field);
                // optimization to remove unneeded checks on the bit interface:
                if (docsWithField instanceof Bits.MatchAllBits) {
                    docsWithField = null;
                }
            } else {
                docsWithField = null;
            }
        }

        @Override
        public int compare(int slot1, int slot2) {
            return Long.compare(values[slot1], values[slot2]);
        }

        @Override
        public int compareBottom(int doc) {
            // TODO: there are sneaky non-branch ways to compute
            // -1/+1/0 sign
            long v2 = currentReaderValues.get(doc);
            // Test for v2 == 0 to save Bits.get method call for
            // the common case (doc has value and value is non-zero):
            if (docsWithField != null && v2 == 0 && !docsWithField.get(doc)) {
                v2 = missingValue;
            }

            // bottom holds the 0/1 flag written by copy(), so map the
            // document's raw stock to the same flag before comparing.
            return Long.compare(bottom, v2 > 0L ? 1L : 0L);
        }

        @Override
        public void copy(int slot, int doc) {
            long v2 = currentReaderValues.get(doc);
            // Test for v2 == 0 to save Bits.get method call for
            // the common case (doc has value and value is non-zero):
            if (docsWithField != null && v2 == 0 && !docsWithField.get(doc)) {
                v2 = missingValue;
            }

            values[slot] = v2 > 0L ? 1L : 0L;
        }

        @Override
        public void setBottom(final int bottom) {
            this.bottom = values[bottom];
        }

        @Override
        public void setTopValue(Long value) {
            topValue = value;
        }

        @Override
        public Long value(int slot) {
            return Long.valueOf(values[slot]);
        }

        @Override
        public int compareTop(int doc) {
            long docValue = currentReaderValues.get(doc);
            // Test for docValue == 0 to save Bits.get method call for
            // the common case (doc has value and value is non-zero):
            if (docsWithField != null && docValue == 0 && !docsWithField.get(doc)) {
                docValue = missingValue;
            }
            // topValue comes from value(slot), i.e. it is a 0/1 flag, so
            // compare it with the flag of this document, not the raw stock.
            return Long.compare(topValue, docValue > 0L ? 1L : 0L);
        }
    }
}
 
 
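LongComparator is copied straight from the Lucene source with only minor changes. The key change is in copy(int slot, int doc): when the comparison value is copied into its slot, any positive stock is stored as 1 and everything else as 0, which yields the ordering we want. Because the slots then hold 0/1 flags, compareBottom and compareTop have to compare against the same flag rather than the raw stock count.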
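One detail worth checking in a comparator of this kind is that every comparison works on the same 0/1 flag. The sketch below is plain Java with hypothetical helper names (toStockFlag, compareBottomRaw, compareBottomNormalized). It only mimics the arithmetic of copy and compareBottom, not Lucene's collector, and shows how comparing a stored flag with a raw stock count turns a tie into a non-tie.

```java
final class StockFlagDemo {

    /** The mapping applied when a value is copied into a slot: 1 if in stock, otherwise 0. */
    static long toStockFlag(long stock) {
        return stock > 0L ? 1L : 0L;
    }

    /** Compares the stored bottom flag with the document's raw stock count. */
    static int compareBottomRaw(long bottomFlag, long docStock) {
        return Long.compare(bottomFlag, docStock);
    }

    /** Compares the stored bottom flag with the document's flag. */
    static int compareBottomNormalized(long bottomFlag, long docStock) {
        return Long.compare(bottomFlag, toStockFlag(docStock));
    }

    public static void main(String[] args) {
        long bottomFlag = toStockFlag(3L); // the queue's bottom entry is in stock
        long docStock = 5L;                // the incoming document is in stock too

        // Negative: the bottom looks "smaller" although both are simply in stock.
        System.out.println(compareBottomRaw(bottomFlag, docStock) < 0);         // true
        // Zero: a tie, so the next sort field (sales volume) gets to decide.
        System.out.println(compareBottomNormalized(bottomFlag, docStock) == 0); // true
    }
}
```

When the first sort field reports a tie, the secondary sort on sales volume decides whether the incoming document is competitive, which is the behaviour the requirement asks for.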
 
The test case we used:
 
// Open the index built by Solr and search it directly with Lucene.
Directory directory1 = FSDirectory.open(Paths.get(
        "/Users/xxx/develop/tools/solr-5.5.0/server/solr/product/data/index"));
DirectoryReader directoryReader1 = DirectoryReader.open(directory1);
IndexSearcher searcher1 = new IndexSearcher(directoryReader1);

// In-stock documents first (custom comparator, reversed), then sales volume descending.
Sort sort1 = new Sort(new SortField("psfixstock", new EmptyStockComparatorSource(), true),
        new SortField("salesVolume", SortField.Type.INT, true));

TopFieldDocs topDocs1 = searcher1.search(new TermQuery(new Term("gender_text", "女士")), 10, sort1);
for (ScoreDoc scoreDoc : topDocs1.scoreDocs) {
    int doc = scoreDoc.doc;
    Document document = searcher1.doc(doc);
    System.out.println(String.format("docId=%s, psfixstock=%s, salesVolumn=%s", doc, document.get("psfixstock"), document.get("salesVolume")));
}
 
 
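To sort with it, the comparator source has to be added to the Sort object. Running the test, however, failed with an error reporting an unexpected docvalues type: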
 
Exception in thread "main" java.lang.IllegalStateException: unexpected docvalues type NONE for field 'psfixstock' (expected=NUMERIC). Use UninvertingReader or index with docvalues.
    at org.apache.lucene.index.DocValues.checkField(DocValues.java:208)
    at org.apache.lucene.index.DocValues.getNumeric(DocValues.java:227)
    at org.apache.lucene.search.FieldComparator$NumericComparator.getNumericDocValues(FieldComparator.java:167)
    at com.zp.solr.handler.component.EmptyStockComparatorSource$LongComparator.doSetNextReader(EmptyStockComparatorSource.java:36)
    at org.apache.lucene.search.SimpleFieldComparator.getLeafComparator(SimpleFieldComparator.java:36)
    at org.apache.lucene.search.FieldValueHitQueue.getComparators(FieldValueHitQueue.java:183)
    at org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector.getLeafCollector(TopFieldCollector.java:164)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:812)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:535)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:744)
    at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:729)
    at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:671)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:577)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:627)
    at com.zp.solr.handler.component.EmptyStockSortingTest.main(EmptyStockSortingTest.java:57)
 
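After some digging we found the cause (reference: http://qindongliang.iteye.com/blog/2297280): the field we sort on had no docValues configured. In Solr, a field that needs this kind of custom sorting must be declared with docValues="true" and then reindexed (we used full-import):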
 
   
<field name="psfixstock" type="tint" indexed="true" stored="true" multiValued="false" docValues="true" />
 
 
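The index must be rebuilt before this works. Note that Solr and Elasticsearch take different approaches here: Solr (5.5 in our case) defaults to docValues=false, whereas ES enables doc values by default. Doc values have some impact on performance, so enable them with care.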
 
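When indexing with Lucene directly, add a numeric doc-values field (NumericDocValuesField) to each document, otherwise the error above occurs. The field name must match the one passed to the SortField (psfixstock in the test above):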
 
document.add(new NumericDocValuesField("stock", stock));
 
 
The final results are now sorted the way we need:
 
docId=2629, psfixstock=98391, salesVolumn=4685
docId=305, psfixstock=991, salesVolumn=14
docId=16762, psfixstock=3, salesVolumn=12
docId=22350, psfixstock=993, salesVolumn=10
docId=29021, psfixstock=11076, salesVolumn=10
docId=3635, psfixstock=61, salesVolumn=6
docId=4111, psfixstock=1104, salesVolumn=5
docId=10608, psfixstock=4395, salesVolumn=5
docId=4874, psfixstock=4975, salesVolumn=4
docId=4911, psfixstock=6, salesVolumn=4
docId=15071, psfixstock=998, salesVolumn=4
docId=4837, psfixstock=9, salesVolumn=3
docId=4860, psfixstock=1002, salesVolumn=3
docId=3749, psfixstock=2240, salesVolumn=2
docId=4109, psfixstock=1493, salesVolumn=2
docId=15068, psfixstock=1000, salesVolumn=2
docId=25901, psfixstock=11110, salesVolumn=2
docId=3688, psfixstock=21, salesVolumn=1
docId=4912, psfixstock=17, salesVolumn=1
docId=5035, psfixstock=2, salesVolumn=1
docId=11835, psfixstock=8, salesVolumn=1
docId=12044, psfixstock=1, salesVolumn=1
docId=13508, psfixstock=2, salesVolumn=1
docId=20019, psfixstock=1, salesVolumn=1
docId=20884, psfixstock=100000, salesVolumn=1
docId=22620, psfixstock=1, salesVolumn=1
docId=24128, psfixstock=1, salesVolumn=1
docId=0, psfixstock=2, salesVolumn=0
docId=9, psfixstock=1, salesVolumn=0
docId=11, psfixstock=4, salesVolumn=0
docId=15, psfixstock=3, salesVolumn=0
docId=20, psfixstock=4, salesVolumn=0
docId=23, psfixstock=2, salesVolumn=0
docId=24, psfixstock=5, salesVolumn=0
docId=25, psfixstock=7, salesVolumn=0
docId=35, psfixstock=2, salesVolumn=0
docId=53, psfixstock=2, salesVolumn=0
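A listing like this can also be checked mechanically instead of by eye. The following is a minimal plain-Java sketch under stated assumptions: the class and method names are hypothetical, and the pattern assumes exactly the line format printed by the test program, including its "salesVolumn" spelling. It verifies that in-stock rows come before sold-out rows and that sales volume never increases within each group.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SortedOutputCheck {

    // Matches the lines printed by the test program.
    private static final Pattern LINE =
            Pattern.compile("docId=(\\d+), psfixstock=(\\d+), salesVolumn=(\\d+)");

    /** Returns true if in-stock rows precede sold-out rows and sales never increase within a group. */
    static boolean isStockFirstThenSalesDesc(List<String> lines) {
        long prevFlag = 1L;              // best possible stock flag
        long prevSales = Long.MAX_VALUE; // no upper bound before the first row
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (!m.matches()) {
                throw new IllegalArgumentException("Unrecognized line: " + line);
            }
            long flag = Long.parseLong(m.group(2)) > 0L ? 1L : 0L;
            long sales = Long.parseLong(m.group(3));
            if (flag > prevFlag) {
                return false; // an in-stock row after a sold-out row
            }
            if (flag == prevFlag && sales > prevSales) {
                return false; // sales volume went up inside the same group
            }
            prevFlag = flag;
            prevSales = sales;
        }
        return true;
    }

    public static void main(String[] args) {
        // The first three rows of the listing above.
        List<String> sample = Arrays.asList(
                "docId=2629, psfixstock=98391, salesVolumn=4685",
                "docId=305, psfixstock=991, salesVolumn=14",
                "docId=16762, psfixstock=3, salesVolumn=12");
        System.out.println(isStockFirstThenSalesDesc(sample)); // true
    }
}
```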
 
 
 
 