Solr Indexing Process: A Source Code Walkthrough

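In the article http://blog.csdn.net/jj380382856/article/details/51603818 we analyzed how the SolrJ source code handles an index update; it ends by sending an /update request to Solr. Below we continue by analyzing what Solr does when it receives that request.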

1. The request is first intercepted by SolrDispatchFilter, which runs its doFilter method.

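2. doFilter calls Action result = call.call();, which enters HttpSolrCall.call(). That method calls the class's init() method, whose main job is to locate, based on the servlet and solrconfig configuration, the SolrRequestHandler object that will handle the current request. init() in turn calls extractHandlerFromURLPath(parser);, whose code is shown below.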

private void extractHandlerFromURLPath(SolrRequestParsers parser) throws Exception {
    if (handler == null && path.length() > 1) { // don't match "" or "/" as valid path
      handler = core.getRequestHandler(path);
      // ......
  }

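After this code runs, handler holds the request handler registered for the request path (for /update, typically an UpdateRequestHandler).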
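The lookup can be pictured as a map from request path to handler. The following is a minimal sketch, not Solr's actual code: HandlerRegistrySketch, its register and lookup methods, and the string handler names are hypothetical stand-ins. Only the guard that skips "" and "/" mirrors the excerpt above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a path -> handler lookup; not Solr's real classes.
class HandlerRegistrySketch {
    private final Map<String, String> handlers = new HashMap<>();

    void register(String path, String handlerName) {
        handlers.put(path, handlerName);
    }

    // Mirrors the guard in the excerpt: "" and "/" are never treated as handler paths.
    String lookup(String path) {
        if (path == null || path.length() <= 1) {
            return null;
        }
        return handlers.get(path);
    }

    public static void main(String[] args) {
        HandlerRegistrySketch registry = new HandlerRegistrySketch();
        registry.register("/update", "update-handler");
        System.out.println(registry.lookup("/update"));
        System.out.println(registry.lookup("/"));
    }
}
```

A path with no registered handler returns null here, which is the case the later null check on handler has to deal with.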

After init() finishes, action is PROCESS and HttpSolrCall.call() continues with the code below. It mostly wraps up the request; the key statement is execute(solrRsp);

 switch (action) {
        case ADMIN:
          handleAdminRequest();
          return RETURN;
        case REMOTEQUERY:
          remoteQuery(coreUrl + path, resp);
          return RETURN;
        case PROCESS:
          final Method reqMethod = Method.getMethod(req.getMethod());
          HttpCacheHeaderUtil.setCacheControlHeader(config, resp, reqMethod);
          // unless we have been explicitly told not to, do cache validation
          // if we fail cache validation, execute the query
          if (config.getHttpCachingConfig().isNever304() ||
              !HttpCacheHeaderUtil.doCacheHeaderValidation(solrReq, req, reqMethod, resp)) {
            SolrQueryResponse solrRsp = new SolrQueryResponse();
              /* even for HEAD requests, we need to execute the handler to
               * ensure we don't get an error (and to make sure the correct
               * QueryResponseWriter is selected and we get the correct
               * Content-Type)
               */
            SolrRequestInfo.setRequestInfo(new SolrRequestInfo(solrReq, solrRsp));
            execute(solrRsp); // key call
            HttpCacheHeaderUtil.checkHttpCachingVeto(solrRsp, resp, reqMethod);
            Iterator<Map.Entry<String, String>> headers = solrRsp.httpHeaders();
            while (headers.hasNext()) {
              Map.Entry<String, String> entry = headers.next();
              resp.addHeader(entry.getKey(), entry.getValue());
            }
            QueryResponseWriter responseWriter = core.getQueryResponseWriter(solrReq);
            if (invalidStates != null) solrReq.getContext().put(CloudSolrClient.STATE_VERSION, invalidStates);
            writeResponse(solrRsp, responseWriter, reqMethod);
          }
          return RETURN;
        default: return action;

 protected void execute(SolrQueryResponse rsp) {
    // a custom filter could add more stuff to the request before passing it on.
    // for example: sreq.getContext().put( "HttpServletRequest", req );
    // used for logging query stats in SolrCore.execute()
    solrReq.getContext().put("webapp", req.getContextPath());
    solrReq.getCore().execute(handler, solrReq, rsp);
  }
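The control flow of the switch can be reduced to a few lines. This is a simplified sketch under stated assumptions: ActionSketch and CallDispatchSketch are hypothetical names, the enum lists only the values visible in the excerpt plus one placeholder (PASSTHROUGH) for the default branch, and the trace strings stand in for the real work done in each case.

```java
// Simplified stand-in for the action values seen in the excerpt; not Solr's full enum.
enum ActionSketch { ADMIN, REMOTEQUERY, PROCESS, RETURN, PASSTHROUGH }

class CallDispatchSketch {
    // Records which branch ran so the control flow is observable.
    static ActionSketch dispatch(ActionSketch action, StringBuilder trace) {
        switch (action) {
            case ADMIN:
                trace.append("admin");
                return ActionSketch.RETURN;
            case REMOTEQUERY:
                trace.append("remote");
                return ActionSketch.RETURN;
            case PROCESS:
                // In the excerpt this branch runs the handler and writes the response.
                trace.append("execute+write");
                return ActionSketch.RETURN;
            default:
                // Any other action is handed back to the caller unchanged.
                return action;
        }
    }
}
```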

The SolrCore.execute() method invoked through solrReq.getCore().execute(handler, solrReq, rsp) looks like this:

public void execute(SolrRequestHandler handler, SolrQueryRequest req, SolrQueryResponse rsp) {
    if (handler==null) {
      String msg = "Null Request Handler '" +
        req.getParams().get(CommonParams.QT) + "'";

      if (log.isWarnEnabled()) log.warn(logid + msg + ":" + req);

      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, msg);
    }

    preDecorateResponse(req, rsp);

    if (requestLog.isDebugEnabled() && rsp.getToLog().size() > 0) {
      // log request at debug in case something goes wrong and we aren't able to log later
      requestLog.debug(rsp.getToLogAsString(logid));
    }

    // TODO: this doesn't seem to be working correctly and causes problems with the example server and distrib (for example /spell)
    // if (req.getParams().getBool(ShardParams.IS_SHARD,false) && !(handler instanceof SearchHandler))
    //   throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,"isShard is only acceptable with search handlers");


    handler.handleRequest(req,rsp); // key call
    postDecorateResponse(handler, req, rsp);

    if (rsp.getToLog().size() > 0) {
      if (requestLog.isInfoEnabled()) {
        requestLog.info(rsp.getToLogAsString(logid));
      }

      if (log.isWarnEnabled() && slowQueryThresholdMillis >= 0) {
        final long qtime = (long) (req.getRequestTimer().getTime());
        if (qtime >= slowQueryThresholdMillis) {
          log.warn("slow: " + rsp.getToLogAsString(logid));
        }
      }
    }
  }

The key line above is handler.handleRequest(req,rsp);. It invokes RequestHandlerBase.handleRequest, which in turn calls the abstract method handleRequestBody, declared as follows:

 public abstract void handleRequestBody( SolrQueryRequest req, SolrQueryResponse rsp ) throws Exception;
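This is the template method pattern: the base class fixes the skeleton and subclasses fill in one step. A minimal sketch follows, assuming a String in place of the real request and response objects. BaseHandlerSketch and UpdateHandlerSketch are hypothetical names, and the setup and error-handling steps are assumptions about what a base class might do around the hook, not a copy of RequestHandlerBase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class illustrating the template-method shape described above.
abstract class BaseHandlerSketch {
    final List<String> trace = new ArrayList<>();

    // Fixed skeleton: shared setup, then the subclass hook, with shared error handling.
    public final void handleRequest(String request) {
        trace.add("setup");
        try {
            handleRequestBody(request);
        } catch (Exception e) {
            trace.add("error:" + e.getMessage());
        }
    }

    // Subclasses supply only this step.
    protected abstract void handleRequestBody(String request) throws Exception;
}

class UpdateHandlerSketch extends BaseHandlerSketch {
    @Override
    protected void handleRequestBody(String request) throws Exception {
        if (request.isEmpty()) {
            throw new IllegalArgumentException("missing content stream");
        }
        trace.add("body:" + request);
    }
}
```

Callers only ever see handleRequest; which body runs depends on the concrete subclass that was looked up for the path.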

ContentStreamHandlerBase implements this method as follows:

@Override
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
    SolrParams params = req.getParams();
    UpdateRequestProcessorChain processorChain =
        req.getCore().getUpdateProcessorChain(params); // obtain the update processor chain

    UpdateRequestProcessor processor = processorChain.createProcessor(req, rsp);

    try {
      ContentStreamLoader documentLoader = newLoader(req, processor);


      Iterable<ContentStream> streams = req.getContentStreams();
      if (streams == null) {
        if (!RequestHandlerUtils.handleCommit(req, processor, params, false) && !RequestHandlerUtils.handleRollback(req, processor, params, false)) {
          throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "missing content stream");
        }
      } else {

        for (ContentStream stream : streams) {
          documentLoader.load(req, rsp, stream, processor);
        }

        // Perhaps commit from the parameters
        RequestHandlerUtils.handleCommit(req, processor, params, false);
        RequestHandlerUtils.handleRollback(req, processor, params, false);
      }
    } finally {
      // finish the request
      processor.finish();
    }
  }

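The code above first obtains the update processor chain.

An update passes through three processors in this chain: LogUpdateProcessor, which logs the update; DistributedUpdateProcessor, which handles distributed forwarding; and RunUpdateProcessor, which performs the actual update.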
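A minimal sketch of how such a chain hands a document along, in plain Java. ProcessorSketch, LogSketch, DistribSketch, RunSketch and ChainSketch are hypothetical names, not Solr classes. The only behaviour borrowed from Solr is that the logging step delegates first and records afterwards, as the LogUpdateProcessor code later in this post does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base processor: by default it just forwards to the next link.
abstract class ProcessorSketch {
    protected final ProcessorSketch next;
    protected final List<String> trace;

    ProcessorSketch(ProcessorSketch next, List<String> trace) {
        this.next = next;
        this.trace = trace;
    }

    void processAdd(String doc) {
        if (next != null) next.processAdd(doc);
    }
}

class LogSketch extends ProcessorSketch {
    LogSketch(ProcessorSketch next, List<String> trace) { super(next, trace); }

    @Override
    void processAdd(String doc) {
        super.processAdd(doc);      // delegate first
        trace.add("log:" + doc);    // then record what happened
    }
}

class DistribSketch extends ProcessorSketch {
    DistribSketch(ProcessorSketch next, List<String> trace) { super(next, trace); }

    @Override
    void processAdd(String doc) {
        trace.add("distrib:" + doc);
        super.processAdd(doc);
    }
}

class RunSketch extends ProcessorSketch {
    RunSketch(ProcessorSketch next, List<String> trace) { super(next, trace); }

    @Override
    void processAdd(String doc) {
        trace.add("run:" + doc);
        super.processAdd(doc);
    }
}

class ChainSketch {
    // Builds log -> distrib -> run and pushes one document through it.
    static List<String> run(String doc) {
        List<String> trace = new ArrayList<>();
        ProcessorSketch chain =
            new LogSketch(new DistribSketch(new RunSketch(null, trace), trace), trace);
        chain.processAdd(doc);
        return trace;
    }
}
```

Because the logging step delegates before recording, its entry appears last in the trace even though it sits first in the chain.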

The method contains the following code, which hands each content stream in the request to the documentLoader chosen for it:

for (ContentStream stream : streams) {
          documentLoader.load(req, rsp, stream, processor);
        }
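One way to picture the choice of loader is a lookup keyed by content type. The sketch below is an assumption about the general shape, not Solr's implementation: LoaderSelectionSketch, its select method, the listed content types, and the loader names are all hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical content-type -> loader table; names are illustrative only.
class LoaderSelectionSketch {
    private static final Map<String, String> LOADERS = new HashMap<>();
    static {
        LOADERS.put("application/xml", "xml-loader");
        LOADERS.put("text/xml", "xml-loader");
        LOADERS.put("application/json", "json-loader");
        LOADERS.put("text/csv", "csv-loader");
    }

    // Strips parameters such as "; charset=UTF-8" before the lookup.
    static String select(String contentType) {
        if (contentType == null) {
            return null;
        }
        int semi = contentType.indexOf(';');
        String type = (semi >= 0 ? contentType.substring(0, semi) : contentType)
            .trim()
            .toLowerCase(Locale.ROOT);
        return LOADERS.get(type);
    }
}
```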

Taking XML as an example, this calls XMLLoader's load method, which in turn calls XMLLoader's processUpdate method.

That method calls processAdd on the current processor, starting with LogUpdateProcessor. Its processAdd code is shown below:

 @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    if (logDebug) { log.debug("PRE_UPDATE " + cmd.toString() + " " + req); }

    // call delegate first so we can log things like the version that get set later
    if (next != null) next.processAdd(cmd); // delegate to the next processor in the chain

    // Add a list of added id's to the response
    if (adds == null) {
      adds = new ArrayList<>();
      toLog.add("add",adds);
    }

    if (adds.size() < maxNumToLog) {
      long version = cmd.getVersion();
      String msg = cmd.getPrintableId();
      if (version != 0) msg = msg + " (" + version + ')';
      adds.add(msg);
    }

    numAdds++;
  }

Because this is not SolrCloud, DistributedUpdateProcessor does essentially nothing, so processing moves on to the next processor, RunUpdateProcessor:

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    
    if (AtomicUpdateDocumentMerger.isAtomicUpdate(cmd)) {
      throw new SolrException
        (SolrException.ErrorCode.BAD_REQUEST,
         "RunUpdateProcessor has received an AddUpdateCommand containing a document that appears to still contain Atomic document update operations, most likely because DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain");
    }

    updateHandler.addDoc(cmd); // key call
    super.processAdd(cmd);
    changesSinceCommit = true;
  }

This addDoc call ends up in DirectUpdateHandler2's addDoc0 method, shown below:

  private int addDoc0(AddUpdateCommand cmd) throws IOException {
    int rc = -1;
    RefCounted<IndexWriter> iw = solrCoreState.getIndexWriter(core);
    try {
      IndexWriter writer = iw.get();
      addCommands.incrementAndGet();
      addCommandsCumulative.incrementAndGet();
      
      // if there is no ID field, don't overwrite
      if (idField == null) {
        cmd.overwrite = false;
      }
      
      try {
        IndexSchema schema = cmd.getReq().getSchema();
        
        if (cmd.overwrite) {
          
          // Check for delete by query commands newer (i.e. reordered). This
          // should always be null on a leader
          List<UpdateLog.DBQ> deletesAfter = null;
          if (ulog != null && cmd.version > 0) {
            deletesAfter = ulog.getDBQNewer(cmd.version);
          }
          
          if (deletesAfter != null) {
            log.info("Reordered DBQs detected.  Update=" + cmd + " DBQs="
                + deletesAfter);
            List<Query> dbqList = new ArrayList<>(deletesAfter.size());
            for (UpdateLog.DBQ dbq : deletesAfter) {
              try {
                DeleteUpdateCommand tmpDel = new DeleteUpdateCommand(cmd.req);
                tmpDel.query = dbq.q;
                tmpDel.version = -dbq.version;
                dbqList.add(getQuery(tmpDel));
              } catch (Exception e) {
                log.error("Exception parsing reordered query : " + dbq, e);
              }
            }
            
            addAndDelete(cmd, dbqList);
          } else {
            // normal update
            
            Term updateTerm;
            Term idTerm = new Term(cmd.isBlock() ? "_root_" : idField.getName(), cmd.getIndexedId());
            boolean del = false;
            if (cmd.updateTerm == null) {
              updateTerm = idTerm;
            } else {
              // this is only used by the dedup update processor
              del = true;
              updateTerm = cmd.updateTerm;
            }

            if (cmd.isBlock()) {
              writer.updateDocuments(updateTerm, cmd);
            } else {
              Document luceneDocument = cmd.getLuceneDocument();
              // SolrCore.verbose("updateDocument",updateTerm,luceneDocument,writer);
              writer.updateDocument(updateTerm, luceneDocument); // calls updateDocument on Lucene's IndexWriter
            }

As you can see, this is where Solr finally hands off to Lucene, through IndexWriter. The core call is writer.updateDocument(updateTerm, luceneDocument);
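Lucene documents updateDocument as first deleting the document(s) containing the given term and then adding the new document, as one atomic operation. The toy model below illustrates only that delete-then-add semantics in plain Java. ToyIndexSketch and its methods are hypothetical, a list of maps stands in for the index, and a synchronized method stands in for Lucene's atomicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory "index"; illustrates delete-then-add, not how Lucene stores data.
class ToyIndexSketch {
    private final List<Map<String, String>> docs = new ArrayList<>();

    // Removes every document whose field equals value, then adds the new one.
    synchronized void updateDocument(String field, String value, Map<String, String> doc) {
        docs.removeIf(d -> value.equals(d.get(field)));
        docs.add(new HashMap<>(doc));
    }

    synchronized int size() {
        return docs.size();
    }

    // Returns the requested field of the first document with the given id, or null.
    synchronized String fieldOf(String id, String field) {
        for (Map<String, String> d : docs) {
            if (id.equals(d.get("id"))) {
                return d.get(field);
            }
        }
        return null;
    }

    // Small helper for building a two-field document.
    static Map<String, String> doc(String id, String title) {
        Map<String, String> d = new HashMap<>();
        d.put("id", id);
        d.put("title", title);
        return d;
    }
}
```

Indexing the same id twice therefore leaves a single document holding the newer field values, which corresponds to the overwrite path in addDoc0.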

For the sake of rich functionality and extensibility, Solr uses so many design patterns that the code can be dizzying to follow.

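Once RunUpdateProcessor finishes, control returns to the LogUpdateProcessor code shown earlier, which writes the log entry and does some cleanup. That completes the insertion of one document. A lot is involved in this process; in a later post I will go through the indexWriter.updateDocument() method in detail.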

If anything here is wrong, corrections are welcome. Thanks.
