Spring Cloud Feign: retrying timed-out calls with Retryer

  • A look at the Retryer interface
  • A look at RetryableException
  • A look at FeignException
  • How we apply it in practice

A brief introduction to the Retryer interface

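  • As the source below shows, the Retryer interface extends Cloneable.

  • Retryer declares a method continueOrPropagate, which takes a RetryableException and returns void.

  • Retryer also declares a clone() method whose return type is Retryer.

  • The interface contains a static nested class, Default, which implements Retryer.

    • Default has a no-argument constructor and a three-argument constructor (period, maxPeriod, maxAttempts).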

The source is as follows:

package feign;

import static java.util.concurrent.TimeUnit.SECONDS;

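/**
 * Cloned for each invocation of {@code Client.execute(Request, Request.Options)}.
 * Implementations may keep state to determine whether retry operations should continue.
 */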
public interface Retryer extends Cloneable {

  /**
   * If retry is permitted, return (possibly after sleeping). Otherwise propagate the exception.
   */
  void continueOrPropagate(RetryableException e);

  Retryer clone();

  public static class Default implements Retryer {

    // maximum number of attempts
    private final int maxAttempts;
    // base retry interval, in milliseconds
    private final long period;
    // maximum retry interval, in milliseconds
    private final long maxPeriod;
    int attempt;
    long sleptForMillis;

    // No-argument constructor of Default:
    // base interval 100 ms, maximum interval 1 s, at most 5 attempts
    public Default() {
      this(100, SECONDS.toMillis(1), 5);
    }

    // base interval, maximum interval, maximum attempts; attempt starts at 1
    public Default(long period, long maxPeriod, int maxAttempts) {
      this.period = period;
      this.maxPeriod = maxPeriod;
      this.maxAttempts = maxAttempts;
      this.attempt = 1;
    }

    // visible for testing;
    protected long currentTimeMillis() {
      return System.currentTimeMillis();
    }

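    // implements Retryer.continueOrPropagate
    public void continueOrPropagate(RetryableException e) {
      // once the attempt counter has reached maxAttempts, rethrow the RetryableException;
      // otherwise increment the counter and back off before the next try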
      if (attempt++ >= maxAttempts) {
        throw e;
      }

      long interval;
      if (e.retryAfter() != null) {
        interval = e.retryAfter().getTime() - currentTimeMillis();
        if (interval > maxPeriod) {
          interval = maxPeriod;
        }
        if (interval < 0) {
          return;
        }
      } else {
        interval = nextMaxInterval();
      }
      try {
        Thread.sleep(interval);
      } catch (InterruptedException ignored) {
        Thread.currentThread().interrupt();
      }
      sleptForMillis += interval;
    }

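    /**
     * Calculates the time interval to a retry attempt. The interval increases exponentially
     * with each attempt, at a rate of nextInterval *= 1.5 (where 1.5 is the backoff factor),
     * up to the maximum interval.
     * @return time in milliseconds from now until the next attempt (the upstream Javadoc says
     *         nanoseconds, but the value is passed straight to Thread.sleep)
     */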
    long nextMaxInterval() {
      long interval = (long) (period * Math.pow(1.5, attempt - 1));
      return interval > maxPeriod ? maxPeriod : interval;
    }

    @Override
    public Retryer clone() {
      return new Default(period, maxPeriod, maxAttempts);
    }
  }

  /**
   * Implementation that never retries a request. It propagates the RetryableException.
   */
  Retryer NEVER_RETRY = new Retryer() {

    @Override
    public void continueOrPropagate(RetryableException e) {
      throw e;
    }

    @Override
    public Retryer clone() {
      return this;
    }
  };
}
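
To make the back-off arithmetic in Default concrete, here is a minimal stand-alone sketch that reproduces only the counting and interval logic shown above, without any Feign dependency. The class name BackoffSchedule and the method sleeps are hypothetical names chosen for illustration, and the Retry-After branch is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-alone sketch of the back-off arithmetic in Retryer.Default (not Feign code). */
public class BackoffSchedule {

    /** Same formula as nextMaxInterval(): period * 1.5^(attempt - 1), capped at maxPeriod. */
    public static long nextMaxInterval(long period, long maxPeriod, int attempt) {
        long interval = (long) (period * Math.pow(1.5, attempt - 1));
        return Math.min(interval, maxPeriod);
    }

    /** Sleeps (in ms) performed before the exception is finally propagated. */
    public static List<Long> sleeps(long period, long maxPeriod, int maxAttempts) {
        List<Long> result = new ArrayList<>();
        int attempt = 1; // Default starts counting at 1
        // Mirrors "if (attempt++ >= maxAttempts) throw e;" followed by nextMaxInterval()
        while (attempt++ < maxAttempts) {
            result.add(nextMaxInterval(period, maxPeriod, attempt));
        }
        return result;
    }

    public static void main(String[] args) {
        // Values of the no-argument constructor: 100 ms, 1 s, 5 attempts
        System.out.println(sleeps(100, 1000, 5));
    }
}
```

With the default settings this yields four sleeps, starting at 150 ms and growing by a factor of 1.5 each time (roughly 150, 225, 337 and 506 ms), so a call is tried five times in total before the exception propagates.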

A brief introduction to RetryableException

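  • The exception extends FeignException, so it is also a RuntimeException.
  • It declares a field retryAfter of type Long, which stores the retry time as epoch milliseconds.
  • The class has two constructors:
    • RetryableException(String message, Throwable cause, Date retryAfter)
    • RetryableException(String message, Date retryAfter)
  • It also has a no-argument method retryAfter() that returns a Date, or null if no retry time is known.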
The source is as follows:

package feign;

import java.util.Date;

/**
 * This exception is raised when the Response is deemed to be retryable, typically via a
 * feign.codec.ErrorDecoder when the status is 503.
 */
public class RetryableException extends FeignException {

  private static final long serialVersionUID = 1L;

  private final Long retryAfter;

  /**
   * @param retryAfter usually corresponds to the Util.RETRY_AFTER header.
   */
  public RetryableException(String message, Throwable cause, Date retryAfter) {
    super(message, cause);
    this.retryAfter = retryAfter != null ? retryAfter.getTime() : null;
  }

  /**
   * @param retryAfter usually corresponds to the Util.RETRY_AFTER header.
   */
  public RetryableException(String message, Date retryAfter) {
    super(message);
    this.retryAfter = retryAfter != null ? retryAfter.getTime() : null;
  }

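  /**
   * HTTP 503 means Service Unavailable.
   * Sometimes corresponds to the Util.RETRY_AFTER header present in a 503 response. Other
   * times it is parsed from an application-specific response. Null if unknown.
   */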
  public Date retryAfter() {
    return retryAfter != null ? new Date(retryAfter) : null;
  }
}
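
The way Retryer.Default consumes retryAfter() can be sketched in isolation as follows. This is an illustrative stand-alone helper, not Feign code: the class name RetryAfterWait, the method waitMillis and its parameters are assumptions, and the "now" timestamp is passed in explicitly so the logic is easy to check.

```java
import java.util.Date;

/** Stand-alone sketch of how a Retry-After date is turned into a sleep time. */
public class RetryAfterWait {

    /**
     * @param retryAfter       value of RetryableException.retryAfter(), may be null
     * @param nowMillis        current time in epoch milliseconds
     * @param maxPeriod        upper bound for a single sleep, in milliseconds
     * @param fallbackInterval exponential back-off value used when no date is given
     * @return milliseconds to sleep before retrying; 0 means retry immediately
     */
    public static long waitMillis(Date retryAfter, long nowMillis, long maxPeriod, long fallbackInterval) {
        if (retryAfter == null) {
            return fallbackInterval; // no hint from the server
        }
        long interval = retryAfter.getTime() - nowMillis;
        if (interval > maxPeriod) {
            return maxPeriod; // never sleep longer than maxPeriod
        }
        // A date in the past means the retry may happen right away (Default returns without sleeping)
        return Math.max(interval, 0);
    }

    public static void main(String[] args) {
        long now = 10_000L;
        System.out.println(waitMillis(new Date(now + 500), now, 1000, 150));
        System.out.println(waitMillis(new Date(now + 60_000), now, 1000, 150));
        System.out.println(waitMillis(null, now, 1000, 150));
    }
}
```

Note that the cap means a server asking for a long pause is only honoured up to maxPeriod, which is 1 s with the default constructor.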

A brief introduction to FeignException

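  • The class extends RuntimeException.
  • It has a private int field, status, which holds the HTTP status code.
  • It has three static factory methods:
    • errorReading(Request request, Response ignored, IOException cause)
    • errorStatus(String methodKey, Response response)
    • errorExecuting(Request request, IOException cause)
  • It is mainly I/O failures that get retried: errorExecuting wraps the IOException in a RetryableException, whereas an error status such as 404 produces a plain FeignException from errorStatus and is not retried.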

The source is as follows:

package feign;

import java.io.IOException;

import static java.lang.String.format;

public class FeignException extends RuntimeException {

  private static final long serialVersionUID = 0;
  // HTTP status
  private int status;

  protected FeignException(String message, Throwable cause) {
    super(message, cause);
  }

  protected FeignException(String message) {
    super(message);
  }

  protected FeignException(int status, String message) {
    super(message);
    this.status = status;
  }

  public int status() {
    return this.status;
  }

  static FeignException errorReading(Request request, Response ignored, IOException cause) {
    return new FeignException(
        format("%s reading %s %s", cause.getMessage(), request.method(), request.url()),
        cause);
  }

  public static FeignException errorStatus(String methodKey, Response response) {
    String message = format("status %s reading %s", response.status(), methodKey);
    try {
      if (response.body() != null) {
        String body = Util.toString(response.body().asReader());
        message += "; content:\n" + body;
      }
    } catch (IOException ignored) { // NOPMD
    }
    return new FeignException(response.status(), message);
  }

  static FeignException errorExecuting(Request request, IOException cause) {
    return new RetryableException(
        format("%s executing %s %s", cause.getMessage(), request.method(), request.url()), cause,
        null);
  }
}

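How do we apply the retry mechanism in a project?

The sections above covered the Retryer interface, its Default class, and the RetryableException class. To customise retrying, we can override continueOrPropagate from the Retryer interface, for example: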

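import java.util.function.Supplier;
import java.util.stream.Stream;

import feign.RetryableException;
import feign.Retryer;
import lombok.extern.slf4j.Slf4j;

@Slf4j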
public class ConnectTimeoutRetryer extends Retryer.Default {
    Supplier<Stream<String>> streamSupplier = () -> Stream.of("connect timed out");

    public ConnectTimeoutRetryer(){
        super();
    }

    @Override
    public void continueOrPropagate(RetryableException e) {
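        // Kibana analysis of production logs shows that Feign connect timeouts always carry
        // "connect timed out" in the cause message. Any other failure is logged and rethrown.
        // The cause and its message may be null, so guard against a NullPointerException.
        String causeMessage = e.getCause() != null ? e.getCause().getMessage() : null;
        if (causeMessage == null || streamSupplier.get().noneMatch(causeMessage::contains)) {
            log.warn("ConnectTimeoutRetryerFeign failed", e);
            throw e;
        }
        // SLF4J treats the trailing Throwable as the exception to log, so the second {} stays unfilled.
        log.error("begin to retry:{} ,{}" , e.getMessage(), e);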
        super.continueOrPropagate(e);
    }

    // override Retryer.clone() so that each request gets a fresh ConnectTimeoutRetryer
    @Override
    public Retryer clone() {
        return new ConnectTimeoutRetryer();
    }
}

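This approach mainly addresses timeouts in Feign calls between microservices, for example those caused by an unstable network.

Below is the stack trace logged during a retry: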

2020-05-28 21:17:08,954 [hystrix-zis-zzzz-193] ERROR [com.xxxx.common.service.share.feign.ConnectTimeoutRetryer] [?:?] [trace=xxx,span=xxx] - begin to retry:connect timed out executing POST http://xxx.com/search/rrr ,{}
feign.RetryableException: connect timed out executing POST http://xxx.com/search/rrr
    at feign.FeignException.errorExecuting(FeignException.java:67)
    at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:104)
    at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:76)
    at feign.hystrix.HystrixInvocationHandler$1.run(HystrixInvocationHandler.java:108)
    at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:302)
    at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:298)
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:46)
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:35)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.Observable.unsafeSubscribe(Observable.java:10211)
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:51)
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:35)
    at rx.Observable.unsafeSubscribe(Observable.java:10211)
    at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:41)
    at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:30)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.Observable.unsafeSubscribe(Observable.java:10211)
    at rx.internal.operators.OperatorSubscribeOn$1.call(OperatorSubscribeOn.java:94)
    at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction$1.call(HystrixContexSchedulerAction.java:56)
    at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction$1.call(HystrixContexSchedulerAction.java:47)
    at org.springframework.cloud.sleuth.instrument.hystrix.SleuthHystrixConcurrencyStrategy$HystrixTraceCallable.call(SleuthHystrixConcurrencyStrategy.java:188)
    at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction.call(HystrixContexSchedulerAction.java:69)
    at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
    at sun.net.www.http.HttpClient.New(HttpClient.java:339)
    at sun.net.www.http.HttpClient.New(HttpClient.java:357)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1334)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1309)
    at feign.Client$Default.convertAndSend(Client.java:133)
    at feign.Client$Default.execute(Client.java:73)
    at org.springframework.cloud.sleuth.instrument.web.client.feign.TraceFeignClient.execute(TraceFeignClient.java:92)
    at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:97)
    ... 32 common frames omitted

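Drawback: this approach does solve Feign call timeouts between microservices, but Supplier<Stream<String>> streamSupplier = () -> Stream.of("connect timed out"); is not flexible enough. A retry happens only when the message of the exception's cause contains "connect timed out"; every other failure is rethrown immediately without a retry.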
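
One possible way to loosen the hard-coded keyword is sketched below: a small matcher that walks the whole cause chain and accepts both configurable exception types and configurable message keywords. This is only an illustration under stated assumptions, not part of Feign or of the original solution. The class name RetryableCauseMatcher and its constructor parameters are hypothetical, and keywords are assumed to be given in lower case.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: decides whether a failure looks retryable by inspecting its cause chain. */
public final class RetryableCauseMatcher {
    private final List<Class<? extends Throwable>> types;
    private final List<String> keywords; // assumed to be lower case

    public RetryableCauseMatcher(List<Class<? extends Throwable>> types, List<String> keywords) {
        this.types = types;
        this.keywords = keywords;
    }

    public boolean matches(Throwable error) {
        int depth = 0;
        // Walk the cause chain; the depth limit protects against cyclic chains
        for (Throwable t = error; t != null && depth < 16; t = t.getCause(), depth++) {
            for (Class<? extends Throwable> type : types) {
                if (type.isInstance(t)) {
                    return true;
                }
            }
            String message = t.getMessage();
            if (message == null) {
                continue; // null-safe: some exceptions carry no message
            }
            String lower = message.toLowerCase(Locale.ROOT);
            for (String keyword : keywords) {
                if (lower.contains(keyword)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RetryableCauseMatcher matcher = new RetryableCauseMatcher(
                Arrays.asList(ConnectException.class),
                Arrays.asList("connect timed out"));
        Throwable connectTimeout =
                new RuntimeException("wrapper", new SocketTimeoutException("connect timed out"));
        Throwable readTimeout =
                new RuntimeException("wrapper", new SocketTimeoutException("Read timed out"));
        System.out.println(matcher.matches(connectTimeout));
        System.out.println(matcher.matches(readTimeout));
    }
}
```

A retryer like ConnectTimeoutRetryer could call matches(e) instead of comparing a single cause message, and the type and keyword lists could then come from configuration. Whether a read timeout should be retried as well depends on whether the remote call is idempotent, so that decision is left to the caller.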