httpclient proxy ocketRead0問題解決

問題現場

在多線程環境下使用HttpClient組件對某個HTTP服務發起請求,運行一段時間之後發現客戶端主機CPU利用率呈現出下降趨勢,而不是一個穩定的狀態。 而且,從程序日誌中判斷有線程處於hang住的狀態,應該是被阻塞了。

問題排查

一開始找不到原因,懷疑是多線程併發導致的死鎖問題,但是通過代碼審查並未定位到任何可能的多線程併發問題。 甚至開始懷疑是否是因爲內存資源不夠引起JVM頻繁GC到導致業務線程被暫停,但是從GC的日誌輸出結果看,GC是正常的。 於是,進入一種丈二和尚摸不着頭腦頭腦的狀態,再次Review代碼,發現並未設置請求超時時間,於是設置超時控制,發現問題依然存在,徹底懵逼了。 最後,dump線程堆棧和內存堆棧,再對堆棧數據進行分析。從分析結果看,確認是因爲Socket連接在讀取數據時被阻塞引起線程hang住。搜索“httpclient 超時”關鍵字,找到各式各樣設置HttpClient超時控制的方式,均嘗試過但是並未生效。 實際上到後來才知道,HttpCient的超時控制在不同的版本中設置請求超時參數的方式均各不相同,這才導致了我使用了網上看到的方式設置之後並未生效。當然,根本原因還是因爲對HttpClient這個組件不熟悉導致的.

問題重現

1.HttpClient版本

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.2</version>
</dependency>

2.Java代碼

public class HttpClientTest {
    private AtomicInteger counter = new AtomicInteger(0);
    private String url = "http://www.baidu.com/";

    public static void main(String[] args) {
        new HttpClientTest().test();
    }

    // 執行測試
    private void test() {
        int number = 100000; // 總請求數
        int concurrent = 50; // 每次併發請求數
        CountDownLatch countDownLatch = new CountDownLatch(number); // 計數器
        ExecutorService threadPool = Executors.newFixedThreadPool(concurrent); // 線程池
        int concurrentPer = concurrent;
        boolean over = false;
        while(!over) {
            number = number - concurrent;
            if(number <= 0) {
                concurrentPer = number + concurrent;
                over = true;
            }

            // 線程池批量提交
            for(int i = 0; i < concurrentPer; i++) {
                threadPool.execute(new Runnable() {
                    @Override
                    public void run() {
                        try {
                            request(url);
                            Thread.sleep(100);
                        } catch (IOException | InterruptedException e) {
                            e.printStackTrace();
                        } finally {
                            countDownLatch.countDown();
                        }
                    }
                });
            }
        }
        try {
            countDownLatch.await();
            threadPool.shutdown();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    // 訪問指定地址
    private void request(String url) throws IOException {
        HttpGet httpGet = new HttpGet(url);
        commnicate(httpGet);
    }

    // 負責底層通信處理
    private void commnicate(HttpRequestBase request) throws IOException {
        ResponseHandler<String> responseHandler = new ResponseHandler<String>() {
            @Override
            public String handleResponse(HttpResponse response) throws ClientProtocolException, IOException {
                return EntityUtils.toString(response.getEntity());
            }
        };

        HttpClient client = HttpClients.createDefault();
        String body = client.execute(request, responseHandler); // 線程可能會在這裏被阻塞
        System.out.println(String.format("body size: %s, counter: %s", body.length(), counter.incrementAndGet()));
    }
}

運行上述代碼一段時間後很容易可以重現出問題,如下爲運行控制檯信息:

並且線程全部hang住,進程無法正常結束.

查看端口狀態存在大量請求處於建立連接狀態(ESTABLISHED):

# netstat -anpt
tcp        0      0 172.17.7.81:56408           180.118.128.179:56033       ESTABLISHED 3766/java           
tcp        0      0 172.17.7.81:57644           115.202.238.177:27016       ESTABLISHED 3847/java           
tcp        0      0 172.17.7.81:36616           117.57.21.0:25719           ESTABLISHED 3766/java           
tcp        0      0 172.17.7.81:59944           112.245.197.118:57220       ESTABLISHED 3847/java           
tcp        0      0 172.17.7.81:48722           218.5.215.10:40835          ESTABLISHED 4007/java           
tcp        0      0 172.17.7.81:52734           115.194.17.14:45210         ESTABLISHED 4007/java           
tcp        0      0 172.17.7.81:60586           59.32.37.129:16637          ESTABLISHED 3686/java           
tcp        0      0 172.17.7.81:36776           222.89.86.109:21667         ESTABLISHED 3766/java           
tcp        0      0 172.17.7.81:51690           60.161.249.162:59039        ESTABLISHED 3927/java           
tcp        0      0 172.17.7.81:42226           58.218.200.59:80            TIME_WAIT   -                   
tcp        0      0 172.17.7.81:56566           117.70.47.194:40879         ESTABLISHED 3686/java           
tcp        0      0 172.17.7.81:43266           182.120.202.204:45893       ESTABLISHED 3766/java           
tcp        0      0 172.17.7.81:55630           60.169.223.16:21280         ESTABLISHED 3927/java           
tcp        0      0 172.17.7.81:54922           60.168.81.26:25464          ESTABLISHED 3927/java           
tcp        0      0 172.17.7.81:53352           112.252.97.83:53584         ESTABLISHED 3847/java           
tcp        0      0 172.17.7.81:52684           113.121.242.43:14447        ESTABLISHED 3927/java           
tcp        0      0 172.17.7.81:54750           113.121.241.168:45173       ESTABLISHED 3686/java           
tcp        0      0 172.17.7.81:41510           113.105.202.106:47288       ESTABLISHED 4007/java           
tcp        0      0 172.17.7.81:38804           121.232.148.62:57938        ESTABLISHED 3847/java           
tcp        0      0 172.17.7.81:41468           113.105.202.106:47288       ESTABLISHED 3927/java           
tcp        0      0 172.17.7.81:45444           123.163.81.185:22012        ESTABLISHED 3766/java           
tcp        0      0 172.17.7.81:54810           113.121.241.168:45173       ESTABLISHED 4007/java           
tcp        0      0 172.17.7.81:51542           175.153.23.147:20766        ESTABLISHED 3927/java           
tcp        0      0 172.17.7.81:45644           218.5.215.10:40835          ESTABLISHED 4007/java           
tcp        0      0 172.17.7.81:35730           116.53.197.198:30042        ESTABLISHED 3766/java           
tcp        0      0 172.17.7.81:54738           113.121.241.168:45173       ESTABLISHED 3686/java           
tcp        0      0 172.17.7.81:60600           59.32.37.129:16637          ESTABLISHED 3686/java           
tcp        0      0 172.17.7.81:54862           113.121.241.168:45173       ESTABLISHED 4007/java           
tcp        0      0 172.17.7.81:40980           115.225.153.215:17292       ESTABLISHED 3686/java           
tcp        0      0 172.17.7.81:54166           123.149.162.129:18269       ESTABLISHED 3766/java           
tcp        0      0 172.17.7.81:60712           120.35.190.184:33054        ESTABLISHED 3766/java           
tcp        0      0 172.17.7.81:55802           106.42.211.65:59547         ESTABLISHED 3766/java 

同時,分析線程堆棧信息(jstack -F -l pid > thread_stack.log)可以看到如下信息:

"pool-1-thread-45" #55 prio=5 os_prio=0 tid=0x00007f78702df000 nid=0x33d5 runnable [0x00007f7830c1d000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:139)
    at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:155)
    at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:284)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
    at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:165)
    at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:71)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:220)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:164)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:139)
    at org.chench.extra.HttpClientTest.commnicate(HttpClientTest.java:106) # 線程在這裏阻塞
    at org.chench.extra.HttpClientTest.request(HttpClientTest.java:93)
    at org.chench.extra.HttpClientTest.access$100(HttpClientTest.java:31)
    at org.chench.extra.HttpClientTest$1.run(HttpClientTest.java:62)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - <0x0000000086d50638> (a java.util.concurrent.ThreadPoolExecutor$Worker)

從堆棧日誌中可以看到,線程處於RUNNABLE狀態,並且阻塞在at org.chench.extra.HttpClientTest.commnicate(HttpClientTest.java:106)處.

解決方案

通過線程堆棧日誌分析可以定位到線程hang住是因爲HttpClient在執行訪問時被阻塞了,結合源代碼找到阻塞原因是因爲未設置請求超時時間.

上述問題本質上是因爲HttpClient組件並未設置請求超時控制導致的:雖然連接超時,但是讀取失敗,導致線程一直被阻塞. 那麼,應該如何設置HttpClient的超時時間呢?鑑於HttpClient的官方文檔沒有明確說明,並且不同版本的HttpClient組件設置超時控制的方式不一致,所以建議直接查看源碼. HttpClient執行訪問請求時序圖如下:

順藤摸瓜,在MainClientExec.javaexecute()方法中看到有2處使用了timeout參數,其含義各不相同: (1)在獲取HttpClientConnection對象時需要讀取配置參數中的ConnectionRequestTimeout值,該參數用於控制獲取連接的超時時間.

(2)獲取到HttpClientConnection對象之後讀取配置參數中的SocketTimeout值,設置Socket超時時間.

顯然,這2個timeout參數都需要從RequestConfig對象中獲取. 既然找到了使用timeout參數的地方,下一步需要確定該參數是如何設置的.沿着HttpClient的請求時序圖路徑往回查找,在InternalHttpClient.java類的doExecute()方法中可以很清晰地看到設置了RequestConfig對象參數.

@Override
    protected CloseableHttpResponse doExecute(
            final HttpHost target,
            final HttpRequest request,
            final HttpContext context) throws IOException, ClientProtocolException {
        Args.notNull(request, "HTTP request");
        HttpExecutionAware execAware = null;
        if (request instanceof HttpExecutionAware) {
            execAware = (HttpExecutionAware) request;
        }
        try {
            final HttpRequestWrapper wrapper = HttpRequestWrapper.wrap(request, target);
            final HttpClientContext localcontext = HttpClientContext.adapt(
                    context != null ? context : new BasicHttpContext());
            // 通過RequestConfig對象配置連接參數
            RequestConfig config = null; 
            if (request instanceof Configurable) {
                // 如果在HttpRequest對象中設置了RequestConfig屬性,直接使用
                config = ((Configurable) request).getConfig(); 
            }
            if (config == null) {
                // 如果在HttpRequest對象中未設置RequestConfig對象屬性,則獲取HttpParams屬性構造RequestConfig對象
                final HttpParams params = request.getParams(); 
                if (params instanceof HttpParamsNames) {
                    if (!((HttpParamsNames) params).getNames().isEmpty()) {
                        config = HttpClientParamConfig.getRequestConfig(params);
                    }
                } else {
                    config = HttpClientParamConfig.getRequestConfig(params);
                }
            }
            if (config != null) {
                // 使用RequestConfig對象配置連接參數
                localcontext.setRequestConfig(config); 
            }
            setupContext(localcontext);
            final HttpRoute route = determineRoute(target, wrapper, localcontext);
            return this.execChain.execute(route, wrapper, localcontext, execAware);
        } catch (final HttpException httpException) {
            throw new ClientProtocolException(httpException);
        }
    }

(3)HttpClient默認使用的連接池爲PoolingHttpClientConnectionManager,在建立連接時(connect()方法)會使用其中的SocketConfig配置參數對Socket進行配置,如下所示:

  • PoolingHttpClientConnectionManager.java
@Override
public void connect(
        final HttpClientConnection managedConn,
        final HttpRoute route,
        final int connectTimeout,
        final HttpContext context) throws IOException {
    Args.notNull(managedConn, "Managed Connection");
    Args.notNull(route, "HTTP route");
    final ManagedHttpClientConnection conn;
    synchronized (managedConn) {
        final CPoolEntry entry = CPoolProxy.getPoolEntry(managedConn);
        conn = entry.getConnection();
    }
    final HttpHost host;
    if (route.getProxyHost() != null) {
        host = route.getProxyHost();
    } else {
        host = route.getTargetHost();
    }
    final InetSocketAddress localAddress = route.getLocalSocketAddress();
    SocketConfig socketConfig = this.configData.getSocketConfig(host);
    if (socketConfig == null) {
        // 使用配置參數SocketConfig
        socketConfig = this.configData.getDefaultSocketConfig();
    }
    if (socketConfig == null) {
        socketConfig = SocketConfig.DEFAULT;
    }
    this.connectionOperator.connect(
            conn, host, localAddress, connectTimeout, socketConfig, context);
}
  • DefaultHttpClientConnectionOperator.java
@Override
    public void connect(
            final ManagedHttpClientConnection conn,
            final HttpHost host,
            final InetSocketAddress localAddress,
            final int connectTimeout,
            final SocketConfig socketConfig,
            final HttpContext context) throws IOException {
        final Lookup<ConnectionSocketFactory> registry = getSocketFactoryRegistry(context);
        final ConnectionSocketFactory sf = registry.lookup(host.getSchemeName());
        if (sf == null) {
            throw new UnsupportedSchemeException(host.getSchemeName() +
                    " protocol is not supported");
        }
        final InetAddress[] addresses = host.getAddress() != null ?
                new InetAddress[] { host.getAddress() } : this.dnsResolver.resolve(host.getHostName());
        final int port = this.schemePortResolver.resolve(host);
        for (int i = 0; i < addresses.length; i++) {
            final InetAddress address = addresses[i];
            final boolean last = i == addresses.length - 1;

            Socket sock = sf.createSocket(context);
            // 使用socketConfig參數中的超時時間對Socket進行配置
            sock.setSoTimeout(socketConfig.getSoTimeout());
            sock.setReuseAddress(socketConfig.isSoReuseAddress());
            sock.setTcpNoDelay(socketConfig.isTcpNoDelay());
            sock.setKeepAlive(socketConfig.isSoKeepAlive());
            if (socketConfig.getRcvBufSize() > 0) {
                sock.setReceiveBufferSize(socketConfig.getRcvBufSize());
            }
            if (socketConfig.getSndBufSize() > 0) {
                sock.setSendBufferSize(socketConfig.getSndBufSize());
            }

            final int linger = socketConfig.getSoLinger();
            if (linger >= 0) {
                sock.setSoLinger(true, linger);
            }
            conn.bind(sock);

            // ...
        }
    }

經過源碼解讀可以很明確地知道,在HttpClient 4.5.2版本中,設置連接參數有3種方式: (1)在HttpRequest對象中設置RequestConfig對象屬性 (2)在HttpRequest對象中設置HttpParams對象屬性. (3)在連接池對象中設置SocketConfig對象屬性

既然找到了根源,下面分別通過這3種方式設置超時參數進行驗證. 方式1: 通過RequestConfig對象設置超時參數

int timeOut = 5000;
RequestConfig requestConfig = RequestConfig.custom()
        .setConnectionRequestTimeout(timeOut) // 獲取連接超時時間
        .setConnectTimeout(timeOut) // 設置HTTP連接超時時間
        .setSocketTimeout(timeOut) // 設置Socket超時時間
        .build();
request.setConfig(requestConfig);

方式2: 通過HttpParams對象設置超時參數

int timeOut = 5000;
HttpParams  params = new BasicHttpParams();
params.setParameter(CoreConnectionPNames.SO_TIMEOUT, timeOut);  // 設置Socket超時時間
params.setParameter(CoreConnectionPNames.CONNECTION_TIMEOUT, timeOut); // 設置HTTP連接超時時間
request.setParams(params);

方式3: 通過連接池對象設置超時參數

int timeOut = 5000;
PoolingHttpClientConnectionManager connManager = new PoolingHttpClientConnectionManager();
// 對連接池設置SocketConfig對象
connManager.setDefaultSocketConfig(SocketConfig.custom().setSoTimeout(timeOut).build());
client = HttpClients.custom().setConnectionManager(connManager).build();

通過上述3種方式分別設置超時參數驗證,雖然在運行過程中會有報錯,但是不會導致線程被阻塞,進程能正常運行結束:

解決問題之後客戶端CPU使用率恢復正常:

總結/教訓/反思

  1. 之所以會遇到這樣的問題,還是因爲對HttpClient組件不熟悉導致的;另外,在發現問題之後如何快速定位問題,並搜索穩定的解決方案很重要。
  2. HttpClient組件每個版本的API變化都比較大,在使用時一定要徹底清楚當前使用的版本是如何設置超時時間的。而如何確定知道超時時間控制,通過源代碼查看最爲妥當.
  3. 在Java平臺使用Http客戶端組件,可以有多個選擇: (1)直接使用JDK提供的URL類訪問 (2)使用HttpClient組件,有坑,不同版本設置參數的方式變動較大,最好是閱讀一下當前使用版本的源碼實現,正確設置好超時時間等參數 (3)如果使用了Spring MVC框架,還可以使用Spring MVC提供的RestTemplate組件,底層是使用Netty實現的客戶端
  4. 遇到的這個坑本身並不屬於技術難點,但是面對這個問題的解決思路值得總結: (1)程序日誌,運行日誌非常關鍵,是定位問題時第一時間需要查看的 (2)代碼review,逐行逐行地審查,首先排除可能存在的代碼邏輯問題,比如:死鎖等 (3)通過jstack命令查看線程堆棧信息: jstack -l -F <pid> > stack.log (4)通過jmap命令查看內存堆棧信息: jmap -dump:live format=b,file=heap.bin <pid> (5)如果結合搜索引擎和上述排查步驟依然未能解決問題,應該第一時間考慮直接閱讀組件的源代碼實現,特別是使用了開源組件時這可能纔是真正解決問題的最佳路徑

【參考】 https://blog.csdn.net/u011191463/article/details/78664896 HttpClient超時設置詳解 https://my.oschina.net/jywm/blog/1834702 解決httpclient超時設置不生效的問題 https://www.jianshu.com/p/4b3e172c4f2d HttpClient 4.5.2-(四)連接超時的配置 https://www.jianshu.com/p/6a41c95855e3 HttpClient 4.5.2-(五)連接池的配置 https://www.jianshu.com/p/c852cbcf3d68 HttpClient高併發下性能優化-http連接池 https://blog.csdn.net/u011402596/article/details/44619443 HttpClient 多線程處理 https://field-notes.iteye.com/blog/2383759 多線程消費使用HttpClient引發的坑 http://blog.51cto.com/lihao001/1788490 httpclient4.3 導致線程阻塞 https://study121007.iteye.com/blog/2304274 HttpClient4.5.2 連接管理 https://www.jianshu.com/p/c852cbcf3d68 HttpClient高併發下性能優化-http連接池 https://alafqq.iteye.com/blog/2325041 httpclient 多線程執行(網上版本太多了。。。誤人子弟) https://gaozzsoft.iteye.com/blog/2352241 HttpClient 4.5.2版本設置連接超時時間-CloseableHttpClient設置Timeout https://www.cnblogs.com/softidea/p/6964347.html HttpClient 專題 https://blog.csdn.net/Fhaohaizi/article/details/78217903 httpclient4.5如何確保資源釋放 https://blog.csdn.net/u010634066/article/details/83120122 一場HttpClient調用未關閉流引發的問題 https://www.cnblogs.com/mumuxinfei/p/5066633.html Apache HttpClient使用之阻塞陷阱 https://issues.apache.org/jira/browse/HTTPCLIENT-1584 CloseableHttpClient - SSL Handshake has no Socket Timeout https://monkeyissexy.github.io/2016/11/11/httpclient_ssl_handshake_socketTimeout_bug/ httpclient ssl handshake socketTimeout bug 分析解決過程 http://geekerwang.com/2017/10/22/%E8%AE%B0%E5%BD%95%E4%B8%80%E6%AC%A1%E5%9D%91%EF%BC%9AsocketRead-hang/ 記錄一次坑:socketRead hang https://www.cnblogs.com/jessezeng/p/7448636.html 解決: httpclient ssl 驗證導致死鎖問題 http://itbang.me/solu/detail/201 JAVA線程卡死問題如何定位?

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章