SolrRelevancyFAQ -- Sorting

 

 

Solr Relevancy FAQ

Relevancy is the quality of results returned from a query, encompassing both which documents are found and their relative ranking (the order in which they are returned to the user).

 

 

Should I use the standard or dismax request handler

The standard request handler uses SolrQuerySyntax to specify the query via the q parameter. The query must be well-formed or an error will be returned. It's good for specifying exact, arbitrarily complex queries.

The dismax request handler has a more forgiving query parser for the q parameter, useful for directly passing in a user-supplied query string. The other parameters make it easy to search across multiple fields using disjunctions and sloppy phrase queries to return highly relevant results.

For servicing user-entered queries, start by using dismax.

 

 

How can I search for "superman" in both the title and subject fields

The standard request handler uses SolrQuerySyntax for q:

q=title:superman subject:superman

Using the dismax request handler, specify the query fields using the qf param.

q=superman&qf=title subject

 

How can I make "superman" in the title field score higher than in the subject field

For the standard request handler, "boost" the clause on the title field:

q=title:superman^2 subject:superman

Using the dismax request handler, one can specify boosts on fields in parameters such as qf:

q=superman&qf=title^2 subject
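As a rough sketch of how a client might assemble these two requests, the Python snippet below URL-encodes the parameters with the standard library. The helper name build_query is made up for illustration, and selecting dismax through defType=dismax is an assumption that depends on how your handlers are configured.

```python
from urllib.parse import urlencode


def build_query(params):
    """Return a URL-encoded query string for a Solr select request."""
    return urlencode(params)


# Standard query syntax: the boost is attached to the title clause inside q.
standard = build_query({"q": "title:superman^2 subject:superman"})

# dismax: q stays as the raw user input and the boost moves into qf.
dismax = build_query(
    {"defType": "dismax", "q": "superman", "qf": "title^2 subject"}
)

print(standard)
print(dismax)
```

Encoding matters here because characters such as "^", ":" and spaces are not safe to send unescaped in a URL.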

 

Why are search results returned in the order they are?

If no other sort order is specified, the default is by relevancy score.

 

How can I see the relevancy scores for search results

Request that the pseudo-field named "score" be returned by adding it to the fl (field list) parameter. The "score" will then appear along with the stored fields in returned documents.

q=Justice League&fl=*,score

 

Why doesn't my query of "flash" match a field containing "Flash" (with a capital "F")

The fieldType for the field containing "Flash" must have an analyzer that lowercases terms. This will cause all searches on that field to be case insensitive.

See AnalyzersTokenizersTokenFilters for more.

 

How can I make exact-case matches score higher

Example: a query of "Penguin" should score documents containing "Penguin" higher than docs containing "penguin".

The general strategy is to index the content twice, using different fields with different fieldTypes (and different analyzers associated with those fieldTypes). One analyzer will contain a lowercase filter for case-insensitive matches, and one will preserve case for exact-case matches.

Use copyField commands in the schema to index a single input field multiple times.

Once the content is indexed into multiple fields that are analyzed differently, query across both fields.
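The idea can be illustrated with a toy model in Python. This is not how Solr scores documents; it only shows why indexing the same text into a case-preserving field and a lowercased field lets an exact-case match collect two hits instead of one. The field names text_exact and text_lower and all function names are hypothetical.

```python
def analyze_exact(text):
    """Case-preserving analysis: split on whitespace only."""
    return text.split()


def analyze_lower(text):
    """Case-insensitive analysis: lowercase, then split on whitespace."""
    return text.lower().split()


def index(doc_text):
    """Mimic copyField: the same input lands in two differently analyzed fields."""
    return {
        "text_exact": set(analyze_exact(doc_text)),
        "text_lower": set(analyze_lower(doc_text)),
    }


def score(query, indexed):
    """Count how many of the two fields match the single-term query."""
    hits = 0
    if query in indexed["text_exact"]:
        hits += 1
    if query.lower() in indexed["text_lower"]:
        hits += 1
    return hits


exact_case_doc = index("The Penguin escapes")
lower_case_doc = index("a penguin colony")

print(score("Penguin", exact_case_doc))
print(score("Penguin", lower_case_doc))
```

In this model the document containing "Penguin" matches in both fields, while the one containing "penguin" matches only in the lowercased field, so the exact-case document ranks first.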

 

I'm getting query parse exceptions when making queries

For the standard request handler, the q parameter must be correctly formatted SolrQuerySyntax, with any special characters escaped. If this is a user-entered query, consider using the dismax handler.

Many other parameters such as fq and facet.query must also conform to SolrQuerySyntax regardless of which handler is used.

 

How can I make queries of "spiderman" and "spider man" match "Spider-Man"

WordDelimiterFilter can be used in the analyzer for the field being queried to match words with intra-word delimiters such as dashes or case changes.
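The following Python sketch is a toy approximation of that behaviour, not the actual filter: it splits a word on dashes and case changes and also keeps the joined form, roughly what the filter does when its catenateWords option is enabled, then lowercases everything. The function names are hypothetical, and real analysis also tracks token positions, which this ignores.

```python
import re


def index_tokens(text):
    """Split on delimiters and case changes, keep the joined form, lowercase."""
    tokens = set()
    for word in text.split():
        # Sub-words: capitalized or lowercase runs, all-caps runs, digit runs.
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+|\d+", word)
        tokens.update(part.lower() for part in parts)
        if len(parts) > 1:
            # Also index the concatenation, e.g. "spiderman".
            tokens.add("".join(parts).lower())
    return tokens


def matches(query, field_value):
    """True if every lowercased query word is among the indexed tokens."""
    return set(query.lower().split()) <= index_tokens(field_value)


print(sorted(index_tokens("Spider-Man")))
print(matches("spiderman", "Spider-Man"))
print(matches("spider man", "Spider-Man"))
```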

 

How can I search for one term near another term (say, "batman" and "movie")

A proximity search can be done with a sloppy phrase query. The closer together the two terms appear in the document, the higher the score will be. A sloppy phrase query specifies a maximum "slop", or the number of positions tokens need to be moved to get a match.

This example for the standard request handler will find all documents where "batman" occurs within 100 words of "movie":

q=text:"batman movie"~100

The dismax handler can easily create sloppy phrase queries with the pf (phrase fields) and ps (phrase slop) parameters:

q=batman movie&pf=text&ps=100

The dismax handler also allows users to explicitly specify a phrase query with double quotes, and the qs (query slop) parameter can be used to add slop to any explicit phrase queries:

q="batman movie"&qs=100
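To make "slop" more concrete, here is a simplified Python model for a two-term phrase. It assumes one position per whitespace-separated token and ignores details such as position increments, so treat it as an illustration rather than Lucene's implementation. The function name required_slop is hypothetical.

```python
def required_slop(tokens, first, second):
    """Smallest slop that lets the phrase "first second" match, or None.

    A term found at position p that sits at offset o in the phrase is
    shifted to p - o; the slop needed is the gap between the shifted
    positions of the two terms.
    """
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    if not first_positions or not second_positions:
        return None
    return min(
        abs(a - (b - 1)) for a in first_positions for b in second_positions
    )


in_order = "batman is the best superhero movie".split()
reversed_order = "movie batman".split()

print(required_slop(in_order, "batman", "movie"))
print(required_slop(reversed_order, "batman", "movie"))
```

Under this model, terms that appear in reverse order cost more slop than the same distance in query order, which is why an adjacent but swapped pair still needs a slop of 2.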

 

How can I increase the score for specific documents

 

index-time boosts

To increase the scores for certain documents that match a query, regardless of what that query may be, one can use index-time boosts.

Index-time boosts can also be specified per field, so only queries matching on that specific field will get the extra boost. An index-time boost on a value of a multiValued field applies to all values for that field.

Index-time boosts are assigned with the optional attribute "boost" in the <doc> section of the XML updating messages. See UpdateXmlMessages for more information.
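As a minimal sketch, the Python snippet below builds such an update message with the standard library. The document id, field names, and boost values are invented for illustration, and it assumes a Solr version that still accepts index-time boosts in XML update messages.

```python
import xml.etree.ElementTree as ET

# <add> wraps one or more <doc> elements in an XML update message.
add = ET.Element("add")

# Document-level boost: applies whichever field the query matches.
doc = ET.SubElement(add, "doc", boost="2.5")

id_field = ET.SubElement(doc, "field", name="id")
id_field.text = "superman-1"

# Field-level boost: only affects queries that match on this field.
title_field = ET.SubElement(doc, "field", name="title", boost="2.0")
title_field.text = "Superman"

message = ET.tostring(add, encoding="unicode")
print(message)
```

The resulting string would be sent as the body of an update request; how it is posted depends on your setup.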

 

Query Elevation Component

To raise certain documents to the top of the result list for certain queries, one can use the QueryElevationComponent.

 

How can I change the score of a document based on the *value* of a field (say, "popularity")

Use a FunctionQuery as part of your query.

Solr can parse function queries in the following syntax.

Some examples...

 

# simple boosts by popularity
q=%2Bsupervillians+_val_:"popularity"
defType=dismax&qf=text&q=supervillians&bf=popularity

# boosts based on complex functions of the popularity field
q=%2Bsupervillians+_val_:"scale(popularity,0,100)"
defType=dismax&qf=text&q=supervillians&bf=sqrt(popularity)
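The %2B in the examples above is an encoded "+" (the "required" operator), while the bare "+" stands for a space. A small Python sketch, using only the standard library, shows the round trip; the variable names are arbitrary.

```python
from urllib.parse import quote_plus, unquote_plus

# The query as you would write it in Solr query syntax.
raw_q = '+supervillians _val_:"popularity"'

# Encoding turns the literal "+" into %2B and the space into "+".
encoded = quote_plus(raw_q)
print(encoded)

# Decoding the q value from the example recovers the raw query.
decoded = unquote_plus('%2Bsupervillians+_val_:"popularity"')
print(decoded)
```

If the leading "+" were sent unencoded, the server would decode it as a space and the term would no longer be marked as required.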

 

How are documents scored

Basic scoring factors:

  • tf stands for term frequency - the more times a search term appears in a document, the higher the score
  • idf stands for inverse document frequency - matches on rarer terms count more than matches on common terms
  • coord is the coordination factor - if there are multiple terms in a query, the more terms that match, the higher the score
  • lengthNorm - matches on a smaller field score higher than matches on a larger field
  • index-time boost - if a boost was specified for a document at index time, scores for searches that match that document will be boosted.
  • query clause boost - a user may explicitly boost the contribution of one part of a query over another.

See the Lucene scoring documentation for more info.

 

Why does id:archangel come before id:hawkgirl when querying for "wings"

Add debugQuery=on to your request, and you will get (fairly dense) detailed scoring information for each document returned.

q=wings&indent=on&debugQuery=on

This extra information will appear in the "explain" section of the "debug" section in the response.

 

<response>
<result>[...]</result>
<lst name="debug">
<str name="rawquerystring">wings</str>
<str name="querystring">wings</str>
<str name="parsedquery">text:wings</str>
<str name="parsedquery_toString">text:wings</str>
<lst name="explain">
<str name="id=archangel,internal_docid=4">
0.46632254 = (MATCH) fieldWeight(text:wings in 4), product of:
1.7320508 = tf(termFreq(text:wings)=3)
2.871802 = idf(docFreq=2)
0.09375 = fieldNorm(field=text, doc=4)
</str>
<str name="id=hawkgirl,internal_docid=24">
0.35897526 = (MATCH) fieldWeight(text:wings in 24), product of:
1.0 = tf(termFreq(text:wings)=1)
2.871802 = idf(docFreq=2)
0.125 = fieldNorm(field=text, doc=24)
</str>
[...]

In this specific example, we see that the main scoring difference between the two documents is the tf (term frequency) factor. The text field for the id:archangel document contains the term wings 3 times (termFreq(text:wings)=3), while the id:hawkgirl document only contains it once.
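The numbers in the explain output can be reproduced by hand. The sketch below assumes the classic Lucene similarity, where tf is the square root of the term frequency, and multiplies the three factors listed for each document. The function name field_weight simply mirrors the label in the explain output.

```python
import math


def field_weight(term_freq, idf, field_norm):
    """tf * idf * fieldNorm, with tf taken as sqrt(termFreq)."""
    return math.sqrt(term_freq) * idf * field_norm


# Values copied from the explain section above.
archangel = field_weight(3, 2.871802, 0.09375)
hawkgirl = field_weight(1, 2.871802, 0.125)

print(archangel)
print(hawkgirl)
print(archangel > hawkgirl)
```

Note that id:archangel has the lower fieldNorm (0.09375 versus 0.125), so its higher term frequency is partly offset by the length normalization but still wins overall.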

Debug info is expensive to generate, and should only be used for debugging problems with specific queries.

Debug info can also be selected from the admin query page, http://localhost:8983/solr/admin/form.jsp

 

Why doesn't document id:juggernaut appear in the top 10 results for my query

Since debugQuery=on only gives you scoring "explain" info for the documents returned, the explainOther parameter can be used to specify other documents you want detailed scoring info for.

q=supervillians&debugQuery=on&explainOther=id:juggernaut

Now you should be able to examine the scoring explain info of the top matching documents, compare it to the explain info for documents matching id:juggernaut, and determine why the rankings are not as you expect.

 

How can I boost the score of newer documents

A full example of a query for "ipod" with the score boosted higher the newer the product is:

 

http://localhost:8983/solr/select?q={!boost b=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)}ipod

One can simplify the implementation by decomposing the query into multiple arguments:

 

http://localhost:8983/solr/select?q={!boost b=$dateboost v=$qq}&dateboost=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)&qq=ipod

Now the main "q" argument as well as the "dateboost" argument may be specified as defaults in a search handler in solrconfig.xml, and clients would only need to pass "qq", the user query.

To boost another query type such as a dismax query, the value of the boost query is a full sub-query and hence can use the {!querytype} syntax. Alternately, the defType param can be used in the boost local params to set the default type to dismax. The other dismax parameters may be set as top level parameters.

 

http://localhost:8983/solr/select?q={!boost b=$dateboost v=$qq defType=dismax}&dateboost=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)&qf=text&pf=text&qq=ipod
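To get a feel for the date boost used in these URLs, the Python sketch below evaluates the function outside Solr. It relies on recip(x,m,a,b) being defined as a / (m*x + b) and on the boost being multiplied into the score; the helper names and the sample ages are illustrative only.

```python
MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000  # roughly 3.15e10 milliseconds


def recip(x, m, a, b):
    """Reciprocal function: a / (m * x + b)."""
    return a / (m * x + b)


def date_boost(age_ms):
    """Boost for a document whose manufacture date is age_ms in the past."""
    return recip(age_ms, 3.16e-11, 1, 1)


for years in (0, 1, 2, 10):
    print(years, round(date_boost(years * MS_PER_YEAR), 3))
```

Because 3.16e-11 is approximately one divided by the number of milliseconds in a year, a brand-new product keeps its full score, a one-year-old product gets about half, and older products decay smoothly from there.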

 

How do I give a very low boost to documents that match my query

In general, the problem is that a "low" boost is still a boost: it can only improve the score of documents that match. One way to fake a "negative boost" is to give a high boost to everything that does *not* match. For example:

  • bq=(*:* -field_a:54)^10000

TODO: If "bq" supports pure negative queries then you can simplify that to bq=-field_a:54^10000

 

 
