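Solr is a full-text search server built on Lucene. Its index is stored as files on disk, so once the data grows large enough, disk I/O starts to limit search performance. Caching is the natural way to optimize for this. External stores such as redis, memcache, and mongodb are well known and often used for caching, but this article does not discuss them; it looks only at the caching that Solr itself already implements.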
1. HTTP cache
- XML configuration
The solrconfig.xml file contains the following configuration:
<httpCaching never304="true" />
<!-- If you include a <cacheControl> directive, it will be used to
generate a Cache-Control header (as well as an Expires header
if the value contains "max-age=")
By default, no Cache-Control header is generated.
You can use the <cacheControl> option even if you have set
never304="true"
-->
<!--
<httpCaching never304="true" >
<cacheControl>max-age=30, public</cacheControl>
</httpCaching>
-->
<!-- To enable Solr to respond with automatically generated HTTP
Caching headers, and to respond to Cache Validation requests
correctly, set the value of never304="false"
-->
To enable HTTP caching, configure it as follows:
<httpCaching never304="false" >
<cacheControl>max-age=30, public</cacheControl>
</httpCaching>
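max-age: how long a response may be cached, in seconds.
public: the response may be stored by any cache, including shared caches such as proxies.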
- Solr internals
Excerpt from the HttpSolrCall source:
HttpCacheHeaderUtil.setCacheControlHeader(this.config, resp, reqMethod);
// Execute the query unless the client's cached copy is still valid
if ((this.config.getHttpCachingConfig().isNever304()) ||
    (!HttpCacheHeaderUtil.doCacheHeaderValidation(this.solrReq, this.req, reqMethod, resp)))
{
  solrRsp = new SolrQueryResponse();
  SolrRequestInfo.setRequestInfo(new SolrRequestInfo(this.solrReq, solrRsp));
  execute(solrRsp);
  HttpCacheHeaderUtil.checkHttpCachingVeto(solrRsp, resp, reqMethod);
  Iterator headers = solrRsp.httpHeaders();
  while (headers.hasNext()) {
    Map.Entry entry = (Map.Entry)headers.next();
    resp.addHeader((String)entry.getKey(), (String)entry.getValue());
  }
  QueryResponseWriter responseWriter = this.core.getQueryResponseWriter(this.solrReq);
  if (this.invalidStates != null) this.solrReq.getContext().put("_stateVer_", this.invalidStates);
  writeResponse(solrRsp, responseWriter, reqMethod);
}
Now look at the HttpCacheHeaderUtil utility class:
// Set the cache-related response headers
public static void setCacheControlHeader(SolrConfig conf, HttpServletResponse resp, Method method)
{
  if ((Method.POST == method) || (Method.OTHER == method)) {
    return;
  }
  // Read the <cacheControl> value from the XML configuration
  String cc = conf.getHttpCachingConfig().getCacheControlHeader();
  if (null != cc) {
    resp.setHeader("Cache-Control", cc);
  }
  Long maxAge = conf.getHttpCachingConfig().getMaxAge();
  if (null != maxAge)
    resp.setDateHeader("Expires", timeNowForHeader() + maxAge.longValue() * 1000L);
}
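To make the link between max-age and the Expires header concrete, here is a minimal standalone sketch. The class MaxAgeSketch and its methods are hypothetical and not part of Solr, and the regular expression is only an assumption about how a max-age value could be pulled out of the cacheControl string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Solr: derives an Expires timestamp from a Cache-Control value. */
final class MaxAgeSketch {
    // Matches "max-age=<digits>" anywhere in the header value.
    private static final Pattern MAX_AGE = Pattern.compile("\\bmax-age=(\\d+)");

    /** Returns max-age in seconds, or null if the value has no max-age directive. */
    static Long parseMaxAge(String cacheControl) {
        if (cacheControl == null) {
            return null;
        }
        Matcher m = MAX_AGE.matcher(cacheControl);
        return m.find() ? Long.valueOf(m.group(1)) : null;
    }

    /** Same arithmetic as in setCacheControlHeader: now + maxAge * 1000 ms. */
    static long expiresAt(long nowMillis, long maxAgeSeconds) {
        return nowMillis + maxAgeSeconds * 1000L;
    }

    public static void main(String[] args) {
        Long maxAge = parseMaxAge("max-age=30, public");
        System.out.println("max-age seconds: " + maxAge);
        System.out.println("Expires (epoch ms): " + expiresAt(System.currentTimeMillis(), maxAge));
    }
}
```

With the configuration shown earlier (max-age=30), the Expires value ends up 30 seconds after the time the response is generated.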
The part to focus on is the following call:
HttpCacheHeaderUtil.doCacheHeaderValidation(this.solrReq, this.req, reqMethod, resp)
public static boolean doCacheHeaderValidation(SolrQueryRequest solrReq, HttpServletRequest req, Method reqMethod, HttpServletResponse resp)
{
  if ((Method.POST == reqMethod) || (Method.OTHER == reqMethod)) {
    return false;
  }
  long lastMod = calcLastModified(solrReq);
  String etag = calcEtag(solrReq);
  resp.setDateHeader("Last-Modified", lastMod);
  resp.setHeader("ETag", etag);
  if (checkETagValidators(req, resp, reqMethod, etag)) {
    return true;
  }
  if (checkLastModValidators(req, resp, lastMod)) {
    return true;
  }
  return false;
}
public static boolean checkETagValidators(HttpServletRequest req, HttpServletResponse resp, Method reqMethod, String etag)
{
  List ifNoneMatchList = Collections.list(req
      .getHeaders("If-None-Match"));
  if ((ifNoneMatchList.size() > 0) && (isMatchingEtag(ifNoneMatchList, etag))) {
    if ((reqMethod == Method.GET) || (reqMethod == Method.HEAD))
      sendNotModified(resp);
    else {
      sendPreconditionFailed(resp);
    }
    return true;
  }
  List ifMatchList = Collections.list(req
      .getHeaders("If-Match"));
  if ((ifMatchList.size() > 0) && (!isMatchingEtag(ifMatchList, etag))) {
    sendPreconditionFailed(resp);
    return true;
  }
  return false;
}
public static boolean checkLastModValidators(HttpServletRequest req, HttpServletResponse resp, long lastMod)
{
  try
  {
    long modifiedSince = req.getDateHeader("If-Modified-Since");
    if ((modifiedSince != -1L) && (lastMod <= modifiedSince))
    {
      sendNotModified(resp);
      return true;
    }
    long unmodifiedSince = req.getDateHeader("If-Unmodified-Since");
    if ((unmodifiedSince != -1L) && (lastMod > unmodifiedSince))
    {
      sendPreconditionFailed(resp);
      return true;
    }
  }
  catch (IllegalArgumentException localIllegalArgumentException)
  {
    // A date header could not be parsed; ignore it and fall through
  }
  return false;
}
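This logic inspects the conditional headers of the incoming search request. If-None-Match and If-Match are compared against the current ETag, and If-Modified-Since and If-Unmodified-Since are compared against the computed Last-Modified time. When the client's cached copy is still valid, Solr answers 304 Not Modified (or 412 Precondition Failed when a precondition is not met) and skips executing the query.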
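The decision order above can be condensed into a small standalone sketch. ConditionalRequestSketch is a hypothetical class, not Solr code: it handles only GET/HEAD semantics, leaves out If-Unmodified-Since, and uses a simplified ETag comparison (exact match or the wildcard "*").

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch, not Solr code: picks the HTTP status for a conditional GET/HEAD. */
final class ConditionalRequestSketch {
    static final int OK = 200;
    static final int NOT_MODIFIED = 304;
    static final int PRECONDITION_FAILED = 412;

    /** True if any client-supplied tag equals the current ETag or is the wildcard "*". */
    static boolean matches(List<String> clientTags, String etag) {
        for (String tag : clientTags) {
            String t = tag.trim();
            if ("*".equals(t) || etag.equals(t)) {
                return true;
            }
        }
        return false;
    }

    /**
     * @param ifNoneMatch     values of If-None-Match (empty if the header is absent)
     * @param ifMatch         values of If-Match (empty if the header is absent)
     * @param ifModifiedSince epoch millis, or -1 if the header is absent
     */
    static int decide(List<String> ifNoneMatch, List<String> ifMatch,
                      long ifModifiedSince, String etag, long lastModified) {
        // ETag validators are checked first, as in doCacheHeaderValidation.
        if (!ifNoneMatch.isEmpty() && matches(ifNoneMatch, etag)) {
            return NOT_MODIFIED;
        }
        if (!ifMatch.isEmpty() && !matches(ifMatch, etag)) {
            return PRECONDITION_FAILED;
        }
        // Then the date validator.
        if (ifModifiedSince != -1L && lastModified <= ifModifiedSince) {
            return NOT_MODIFIED;
        }
        return OK; // no validator matched: the query has to be executed
    }

    public static void main(String[] args) {
        List<String> none = Collections.emptyList();
        System.out.println(decide(Arrays.asList("\"v1\""), none, -1L, "\"v1\"", 1000L));
        System.out.println(decide(none, none, 500L, "\"v1\"", 1000L));
    }
}
```

A client that resends the ETag it received earlier in If-None-Match therefore gets a body-less 304 for as long as the ETag stays the same.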
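2. Other caches
- filterCache
Filter cache: caches the document sets matched by filters (the fq query parameter). Solr can also use it for term-enumeration faceting (facet.method=enum).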
<!-- Filter Cache
Cache used by SolrIndexSearcher for filters (DocSets),
unordered sets of *all* documents that match a query. When a
new searcher is opened, its caches may be prepopulated or
"autowarmed" using data from caches in the old searcher.
autowarmCount is the number of items to prepopulate. For
LRUCache, the autowarmed items will be the most recently
accessed items.
Parameters:
class - the SolrCache implementation
(LRUCache or FastLRUCache)
size - the maximum number of entries in the cache
initialSize - the initial capacity (number of entries) of
the cache. (see java.util.HashMap)
autowarmCount - the number of entries to prepopulate from
an old cache.
-->
<filterCache class="solr.FastLRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>
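As a rough mental model of what a filter cache buys you, the toy sketch below keeps one document-ID set per fq string and intersects the sets at query time. Everything here (FilterCacheSketch, the evaluator function, the BitSet representation, the field names in the example) is a hypothetical illustration and not Solr's DocSet implementation; it also has no size limit or eviction.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy illustration, not Solr code: caches one doc-ID set per filter query string. */
final class FilterCacheSketch {
    private final Map<String, BitSet> cache = new HashMap<>();
    // Hypothetical stand-in for "run this filter against the index".
    private final Function<String, BitSet> evaluator;
    int evaluations = 0; // how many times a filter was actually computed

    FilterCacheSketch(Function<String, BitSet> evaluator) {
        this.evaluator = evaluator;
    }

    /** Builds a BitSet with the given doc IDs set. */
    static BitSet bits(int... ids) {
        BitSet set = new BitSet();
        for (int id : ids) {
            set.set(id);
        }
        return set;
    }

    /** Returns the cached doc set for fq, computing it only on first use. */
    BitSet docSet(String fq) {
        BitSet cached = cache.get(fq);
        if (cached == null) {
            evaluations++;
            cached = evaluator.apply(fq);
            cache.put(fq, cached);
        }
        return cached;
    }

    /** Intersects the doc sets of all filters; each filter is cached on its own. */
    BitSet intersect(String... fqs) {
        BitSet result = null;
        for (String fq : fqs) {
            BitSet set = docSet(fq);
            if (result == null) {
                result = (BitSet) set.clone(); // never mutate the cached set
            } else {
                result.and(set);
            }
        }
        return result == null ? new BitSet() : result;
    }

    public static void main(String[] args) {
        FilterCacheSketch fc = new FilterCacheSketch(
            fq -> fq.equals("inStock:true") ? bits(0, 1, 2) : bits(1, 2, 3));
        System.out.println(fc.intersect("inStock:true", "cat:book"));
        System.out.println(fc.intersect("cat:book"));
        System.out.println("filters computed: " + fc.evaluations);
    }
}
```

Because each fq is cached independently, a filter that appears in many different queries is computed once and then reused.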
- queryResultCache
Query result cache: caches query result sets, that is, the ordered lists of document IDs that queries return.
<!-- Query Result Cache
Caches results of searches - ordered lists of document ids
(DocList) based on a query, a sort, and the range of documents requested.
-->
<queryResultCache class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>
- documentCache
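Document cache: caches Lucene Document objects, that is, the stored fields of each document.
Note: Lucene's internal document IDs are transient, so this cache is not autowarmed.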
<!-- Document Cache
Caches Lucene Document objects (the stored fields for each
document). Since Lucene internal document ids are transient,
this cache will not be autowarmed.
-->
<documentCache class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>
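Configuration parameters:
1. class: which Solr cache implementation to use.
The three cache configurations above use two implementations: solr.LRUCache and solr.FastLRUCache.
LRUCache: based on a synchronized (thread-safe) LinkedHashMap.
FastLRUCache: based on ConcurrentHashMap.
2. size: the maximum number of entries the cache may hold.
3. initialSize: the initial capacity of the cache, in entries.
4. autowarmCount: the number of entries to prepopulate from the old searcher's cache when a new searcher is opened.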
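To show how size and initialSize map onto a LinkedHashMap-based LRU cache, here is a minimal sketch. LruSketch is a hypothetical class written in the same spirit as solr.LRUCache, not its actual source: it relies on LinkedHashMap's access-order mode plus removeEldestEntry, and wraps the map with Collections.synchronizedMap for thread safety.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a size-bounded, thread-safe LRU map (not Solr's LRUCache source). */
final class LruSketch {
    /**
     * @param initialSize initial capacity of the backing map (like initialSize)
     * @param maxSize     maximum number of entries kept (like size)
     */
    static <K, V> Map<K, V> create(final int initialSize, final int maxSize) {
        // accessOrder=true moves an entry to the end on every get/put,
        // so the first entry is always the least recently used one.
        LinkedHashMap<K, V> lru = new LinkedHashMap<K, V>(initialSize, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize; // evict once the limit is exceeded
            }
        };
        return Collections.synchronizedMap(lru);
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = create(2, 2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // exceeds maxSize, so "b" is evicted
        System.out.println(cache.keySet());
    }
}
```

A single lock around every operation is simple but serializes readers, which is the usual motivation for a ConcurrentHashMap-based design such as FastLRUCache.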
- queryResultWindowSize
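queryResultWindowSize: used together with queryResultCache.
Put simply, it helps paginated queries. With a value of 50, a request for documents 10 through 19 makes Solr collect and cache documents 0 through 49, so later page requests within that range are served directly from the cache.
The configuration looks like this: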
<!-- Result Window Size
An optimization for use with the queryResultCache. When a search
is requested, a superset of the requested number of document ids
are collected. For example, if a search for a particular query
requests matching documents 10 through 19, and queryWindowSize is 50,
then documents 0 through 49 will be collected and cached. Any further
requests in that range can be satisfied via the cache.
-->
<queryResultWindowSize>20</queryResultWindowSize>
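The windowing arithmetic can be sketched as follows. ResultWindowSketch is a hypothetical helper, and rounding the requested end offset up to the next multiple of the window size is an assumption that agrees with the 10-through-19 example in the comment above; Solr's actual implementation may differ in details. The window size is assumed to be positive.

```java
/** Hypothetical sketch: how many leading documents get collected for a given page request. */
final class ResultWindowSketch {
    /**
     * @param start      offset of the first requested document
     * @param rows       number of requested documents
     * @param windowSize the queryResultWindowSize value (assumed > 0)
     * @return the number of documents, counted from 0, to collect and cache
     */
    static int supersetMaxDoc(int start, int rows, int windowSize) {
        int maxDocRequested = start + rows;
        if (maxDocRequested < windowSize) {
            return windowSize; // the whole request fits in the first window
        }
        // Round up to the next multiple of windowSize.
        return ((maxDocRequested - 1) / windowSize + 1) * windowSize;
    }

    public static void main(String[] args) {
        // Documents 10 through 19 with a window of 50: collect documents 0..49.
        System.out.println(supersetMaxDoc(10, 10, 50));
        // The same request with the configured value of 20: collect documents 0..19.
        System.out.println(supersetMaxDoc(10, 10, 20));
    }
}
```

A larger window makes more follow-up pages cache hits, at the cost of collecting more document IDs on the first request.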
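In practice, size each of these caches according to how often queries repeat and how many entries need to be cached, then observe the real behavior and adjust.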