- Get to know the `Retryer` interface
- Get to know the `RetryableException` exception
- Get to know the `FeignException` exception
- How we apply them in practice
A brief introduction to the Retryer interface
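- As the source below shows, the `Retryer` interface extends the `Cloneable` interface.
- `Retryer` declares a method named `continueOrPropagate`, which takes a `RetryableException` (the retry exception object) and returns `void`.
- `Retryer` also declares a `clone()` method whose return type is `Retryer`.
- The interface contains a static nested class, `Default`, which implements `Retryer`.
- `Default` has a no-argument constructor and a constructor that takes parameters.
The source is as follows: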
package feign;

import static java.util.concurrent.TimeUnit.SECONDS;

/**
 * Cloned for each invocation of `Client.execute(Request, Request.Options)`. Implementations may
 * keep state to determine whether retry operations should continue.
 */
public interface Retryer extends Cloneable {

  /**
   * If retry is permitted, return (possibly after sleeping). Otherwise propagate the exception.
   */
  void continueOrPropagate(RetryableException e);

  Retryer clone();

  public static class Default implements Retryer {
    // Maximum number of attempts (the first call counts as attempt 1)
    private final int maxAttempts;
    // Initial retry interval, in milliseconds
    private final long period;
    // Upper bound on the retry interval, in milliseconds
    private final long maxPeriod;
    int attempt;
    long sleptForMillis;

    // No-argument constructor of Default:
    // 100 ms initial interval, 1 s maximum interval, at most 5 attempts
    public Default() {
      this(100, SECONDS.toMillis(1), 5);
    }

    // Initial interval, maximum interval, maximum attempts; attempt starts at 1
    public Default(long period, long maxPeriod, int maxAttempts) {
      this.period = period;
      this.maxPeriod = maxPeriod;
      this.maxAttempts = maxAttempts;
      this.attempt = 1;
    }

    // visible for testing;
    protected long currentTimeMillis() {
      return System.currentTimeMillis();
    }
    // Implements continueOrPropagate from Retryer
    public void continueOrPropagate(RetryableException e) {
      // Once attempt has reached maxAttempts, rethrow the RetryableException
      if (attempt++ >= maxAttempts) {
        throw e;
      }
      long interval;
      if (e.retryAfter() != null) {
        // The server suggested a retry time: wait until then, capped at maxPeriod
        interval = e.retryAfter().getTime() - currentTimeMillis();
        if (interval > maxPeriod) {
          interval = maxPeriod;
        }
        if (interval < 0) {
          // The suggested time has already passed: retry immediately
          return;
        }
      } else {
        interval = nextMaxInterval();
      }
      try {
        Thread.sleep(interval);
      } catch (InterruptedException ignored) {
        Thread.currentThread().interrupt();
      }
      sleptForMillis += interval;
    }
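    /**
     * Calculates the time interval before a retry attempt. The interval grows exponentially with
     * each attempt, at a rate of nextInterval *= 1.5 (where 1.5 is the backoff factor), up to the
     * maximum interval.
     * @return the time to wait before the next attempt. The upstream Javadoc says nanoseconds,
     *         but the value is passed to Thread.sleep, so it is effectively milliseconds.
     */
    long nextMaxInterval() {
      long interval = (long) (period * Math.pow(1.5, attempt - 1));
      return interval > maxPeriod ? maxPeriod : interval;
    }

    @Override
    public Retryer clone() {
      return new Default(period, maxPeriod, maxAttempts);
    }
  }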
  /**
   * Implementation that never retries a request. It propagates the RetryableException.
   */
  Retryer NEVER_RETRY = new Retryer() {
    @Override
    public void continueOrPropagate(RetryableException e) {
      throw e;
    }

    @Override
    public Retryer clone() {
      return this;
    }
  };
}
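To make the back-off arithmetic in `nextMaxInterval()` concrete, here is a minimal stand-alone sketch that lists the sleep intervals `Default` would use when no `retryAfter` hint is present. `BackoffSchedule` and `intervals` are hypothetical names invented for this illustration; the class only mirrors the arithmetic shown above and does not depend on Feign.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-alone model of the back-off arithmetic in Retryer.Default (hypothetical helper). */
final class BackoffSchedule {

    private BackoffSchedule() {}

    /** Returns the sleep in ms before each retry; the list ends where Default would rethrow. */
    static List<Long> intervals(long period, long maxPeriod, int maxAttempts) {
        List<Long> sleeps = new ArrayList<>();
        int attempt = 1; // Default starts counting at 1
        // Mirrors `if (attempt++ >= maxAttempts) throw e;`
        while (attempt++ < maxAttempts) {
            long interval = (long) (period * Math.pow(1.5, attempt - 1));
            sleeps.add(Math.min(interval, maxPeriod));
        }
        return sleeps;
    }

    public static void main(String[] args) {
        // Same values as the no-argument constructor: 100 ms, 1 s, 5 attempts
        System.out.println(intervals(100, 1000, 5)); // prints [150, 225, 337, 506]
    }
}
```

With the default settings there are therefore at most five calls in total: the original request plus four retries, and the first retry already waits 150 ms because `attempt` is incremented before the interval is computed.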
A brief introduction to RetryableException
- This exception extends `FeignException`, so it is also a `RuntimeException`.
- It declares a field named `retryAfter` of type `Long`.
- It has two constructors:
RetryableException(String message, Throwable cause, Date retryAfter)
RetryableException(String message, Date retryAfter)
- It also has a no-argument method named `retryAfter`, which returns a `Date`.
The source is as follows:
package feign;

import java.util.Date;

/**
 * This exception is raised when the Response is deemed to be retryable, typically via a
 * feign.codec.ErrorDecoder when the status is 503.
 */
public class RetryableException extends FeignException {

  private static final long serialVersionUID = 1L;

  private final Long retryAfter;

  /**
   * @param retryAfter usually corresponds to the Util.RETRY_AFTER header.
   */
  public RetryableException(String message, Throwable cause, Date retryAfter) {
    super(message, cause);
    this.retryAfter = retryAfter != null ? retryAfter.getTime() : null;
  }

  /**
   * @param retryAfter usually corresponds to the Util.RETRY_AFTER header.
   */
  public RetryableException(String message, Date retryAfter) {
    super(message);
    this.retryAfter = retryAfter != null ? retryAfter.getTime() : null;
  }

  /**
   * HTTP 503: Service Unavailable.
   * Sometimes corresponds to the Util.RETRY_AFTER header present in a 503 response. Other times
   * it is parsed from an application-specific response. Null if unknown.
   */
  public Date retryAfter() {
    return retryAfter != null ? new Date(retryAfter) : null;
  }
}
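The `retryAfter` value matters because `Retryer.Default` prefers it over its own back-off when it is present. The following sketch isolates that decision so the three cases (no hint, hint in the future, hint in the past) are easy to see. `RetryAfterDelay`, `sleepMillis` and the `backoffMillis` parameter are hypothetical names used only for this illustration; the logic follows the `continueOrPropagate` source shown earlier.

```java
import java.util.Date;

/** Stand-alone model of how Retryer.Default turns retryAfter into a sleep (hypothetical helper). */
final class RetryAfterDelay {

    private RetryAfterDelay() {}

    /**
     * @param retryAfter    server hint, or null when the response carried none
     * @param nowMillis     current time in epoch milliseconds
     * @param maxPeriod     upper bound on any single sleep, in milliseconds
     * @param backoffMillis interval the exponential back-off would have produced
     * @return milliseconds to sleep; 0 means "retry immediately"
     */
    static long sleepMillis(Date retryAfter, long nowMillis, long maxPeriod, long backoffMillis) {
        if (retryAfter == null) {
            return backoffMillis; // no hint from the server: fall back to the back-off
        }
        long interval = retryAfter.getTime() - nowMillis;
        if (interval > maxPeriod) {
            return maxPeriod; // never wait longer than maxPeriod
        }
        return Math.max(interval, 0); // a hint in the past means no waiting at all
    }
}
```

For example, with `maxPeriod` set to 1000, a hint five seconds in the future is clamped to a one-second sleep, so a server cannot make the client block for longer than the configured maximum.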
A brief introduction to FeignException
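- This class extends `RuntimeException`.
- It has a private `int` field named `status`, which holds the HTTP status code.
- It has three static factory methods:
errorReading(Request request, Response ignored, IOException cause)
errorStatus(String methodKey, Response response)
errorExecuting(Request request, IOException cause)
- The failures that can be retried are mainly I/O errors: `errorExecuting` returns a `RetryableException`. A 404 goes through `errorStatus`, which returns a plain `FeignException`, so it is not retried.
The source is as follows: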
package feign;

import java.io.IOException;

import static java.lang.String.format;

public class FeignException extends RuntimeException {

  private static final long serialVersionUID = 0;

  // HTTP status
  private int status;

  protected FeignException(String message, Throwable cause) {
    super(message, cause);
  }

  protected FeignException(String message) {
    super(message);
  }

  protected FeignException(int status, String message) {
    super(message);
    this.status = status;
  }

  public int status() {
    return this.status;
  }

  static FeignException errorReading(Request request, Response ignored, IOException cause) {
    return new FeignException(
        format("%s reading %s %s", cause.getMessage(), request.method(), request.url()),
        cause);
  }

  public static FeignException errorStatus(String methodKey, Response response) {
    String message = format("status %s reading %s", response.status(), methodKey);
    try {
      if (response.body() != null) {
        String body = Util.toString(response.body().asReader());
        message += "; content:\n" + body;
      }
    } catch (IOException ignored) { // NOPMD
    }
    return new FeignException(response.status(), message);
  }

  static FeignException errorExecuting(Request request, IOException cause) {
    return new RetryableException(
        format("%s executing %s %s", cause.getMessage(), request.method(), request.url()), cause,
        null);
  }
}
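Why only `RetryableException` leads to a retry becomes clearer with a model of the calling loop. The sketch below is a simplified stand-in, not Feign's code: `RetryLoopSketch`, `RetryableFailure`, `Policy`, `upTo` and `invoke` are all hypothetical names, and sleeping is left out. It assumes, in line with the `SynchronousMethodHandler.invoke` and `executeAndDecode` frames in the stack trace further down, that the caller catches the retryable exception, asks the retryer whether to continue, and loops.

```java
import java.util.function.Supplier;

/** Simplified model of a retry loop driven by a Retryer-like policy (hypothetical types). */
final class RetryLoopSketch {

    private RetryLoopSketch() {}

    /** Stand-in for feign.RetryableException. */
    static class RetryableFailure extends RuntimeException {
        RetryableFailure(String message) {
            super(message);
        }
    }

    /** Stand-in for feign.Retryer: return to allow another attempt, throw to give up. */
    interface Policy {
        void continueOrPropagate(RetryableFailure e);
    }

    /** Policy that rethrows once maxAttempts calls have been made (no sleeping). */
    static Policy upTo(int maxAttempts) {
        int[] attempt = {1};
        return e -> {
            if (attempt[0]++ >= maxAttempts) {
                throw e;
            }
        };
    }

    /** Runs the call until it succeeds or the policy propagates the failure. */
    static <T> T invoke(Supplier<T> call, Policy policy) {
        while (true) {
            try {
                return call.get();
            } catch (RetryableFailure e) {
                // Returns normally: loop and call again. Throws: the failure reaches the caller.
                policy.continueOrPropagate(e);
            }
        }
    }
}
```

Any other `RuntimeException`, such as the plain `FeignException` that `errorStatus` builds for a 404, is not caught by this loop and reaches the caller on the first failure, which is why such responses see no retry.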
How do we apply the retry mechanism in a project?
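From the introduction above we know the `Retryer` interface, its `Default` class, and the retry exception class `RetryableException`. We can implement our own retry behaviour by overriding the `continueOrPropagate` method of `Retryer`, for example: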
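import feign.RetryableException;
import feign.Retryer;
import java.util.function.Supplier;
import java.util.stream.Stream;
import lombok.extern.slf4j.Slf4j;

@Slf4j
public class ConnectTimeoutRetryer extends Retryer.Default {

  // Keywords that mark a failure as worth retrying
  Supplier<Stream<String>> streamSupplier = () -> Stream.of("connect timed out");

  public ConnectTimeoutRetryer() {
    super();
  }

  @Override
  public void continueOrPropagate(RetryableException e) {
    // Kibana analysis of production (prd) logs shows that Feign timeouts always carry
    // "connect timed out" in the cause, so we check for that keyword here. If the cause does
    // not contain it, log "ConnectTimeoutRetryerFeign failed" and rethrow e.
    // A missing cause or a null cause message counts as "no match" rather than raising a
    // NullPointerException.
    Throwable cause = e.getCause();
    String causeMessage = cause != null && cause.getMessage() != null ? cause.getMessage() : "";
    if (streamSupplier.get().noneMatch(causeMessage::contains)) {
      log.warn("ConnectTimeoutRetryerFeign failed", e);
      throw e;
    }
    // SLF4J treats the trailing exception as the throwable to log, so the second {} is
    // printed literally.
    log.error("begin to retry:{} ,{}", e.getMessage(), e);
    super.continueOrPropagate(e);
  }

  // Override clone so that every invocation gets a fresh retryer with its own attempt counter
  @Override
  public Retryer clone() {
    return new ConnectTimeoutRetryer();
  }
}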
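This approach mainly addresses timeouts on `feign` calls between our microservices, for example those caused by an unstable network.
Below is the stack trace logged when a retry happens: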
2020-05-28 21:17:08,954 [hystrix-zis-zzzz-193] ERROR [com.xxxx.common.service.share.feign.ConnectTimeoutRetryer] [?:?] [trace=xxx,span=xxx] - begin to retry:connect timed out executing POST http://xxx.com/search/rrr ,{}
feign.RetryableException: connect timed out executing POST http://xxx.com/search/rrr
    at feign.FeignException.errorExecuting(FeignException.java:67)
    at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:104)
    at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:76)
    at feign.hystrix.HystrixInvocationHandler$1.run(HystrixInvocationHandler.java:108)
    at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:302)
    at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:298)
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:46)
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:35)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.Observable.unsafeSubscribe(Observable.java:10211)
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:51)
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:35)
    at rx.Observable.unsafeSubscribe(Observable.java:10211)
    at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:41)
    at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:30)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.Observable.unsafeSubscribe(Observable.java:10211)
    at rx.internal.operators.OperatorSubscribeOn$1.call(OperatorSubscribeOn.java:94)
    at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction$1.call(HystrixContexSchedulerAction.java:56)
    at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction$1.call(HystrixContexSchedulerAction.java:47)
    at org.springframework.cloud.sleuth.instrument.hystrix.SleuthHystrixConcurrencyStrategy$HystrixTraceCallable.call(SleuthHystrixConcurrencyStrategy.java:188)
    at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction.call(HystrixContexSchedulerAction.java:69)
    at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
    at sun.net.www.http.HttpClient.New(HttpClient.java:339)
    at sun.net.www.http.HttpClient.New(HttpClient.java:357)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1334)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1309)
    at feign.Client$Default.convertAndSend(Client.java:133)
    at feign.Client$Default.execute(Client.java:73)
    at org.springframework.cloud.sleuth.instrument.web.client.feign.TraceFeignClient.execute(TraceFeignClient.java:92)
    at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:97)
    ... 32 common frames omitted
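Drawback: this approach does solve the timeout problem on `feign` calls between microservices, but `Supplier<Stream<String>> streamSupplier = () -> Stream.of("connect timed out");` is not flexible enough. A call is retried only when the cause in the stack trace contains `connect timed out`; any other `RetryableException` is rethrown without a retry.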
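One possible way to loosen that restriction, sketched here under assumptions rather than taken from the original project, is to move the keyword check into a small matcher that accepts any number of keywords, ignores case, and walks the whole cause chain. `RetryableCauseMatcher`, its `matches` method and the extra keyword `connection refused` are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: decides whether a failure looks retryable from its messages. */
final class RetryableCauseMatcher {

    private static final int MAX_DEPTH = 16; // guards against cyclic cause chains

    private final List<String> keywords = new ArrayList<>();

    RetryableCauseMatcher(String... keywords) {
        for (String keyword : keywords) {
            this.keywords.add(keyword.toLowerCase(Locale.ROOT));
        }
    }

    /** Returns true when any message in the cause chain contains one of the keywords. */
    boolean matches(Throwable error) {
        int depth = 0;
        for (Throwable t = error; t != null && depth < MAX_DEPTH; t = t.getCause(), depth++) {
            String message = t.getMessage();
            if (message == null) {
                continue; // nothing to inspect at this level
            }
            String lower = message.toLowerCase(Locale.ROOT);
            for (String keyword : keywords) {
                if (lower.contains(keyword)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Inside `ConnectTimeoutRetryer.continueOrPropagate`, the stream check could then become something like `if (!matcher.matches(e)) { throw e; }`, with the keyword list supplied through the constructor or configuration so that new failure types do not require a code change.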