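Troubleshooting an API error: java.io.EOFException: Premature EOF [Solved]

Problem description

A customer reported that one of our production API endpoints was failing with: java.io.EOFException: Premature EOF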

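Troubleshooting

The endpoint's architecture is: nginx as the load balancer, Tomcat as the application server.
A Premature EOF from an API usually means the returned data is incomplete.
I first checked whether other customers were affected. They were not: every other request worked, and only this one customer's request with one specific parameter failed.
I then reproduced that specific request in Postman and found that the response body was incomplete. It should have been a complete JSON document, but every attempt returned a truncated JSON string, and the length was different each time.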
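The truncation check can also be scripted instead of being judged by eye in Postman. The following is a minimal Python sketch, not part of the original investigation: check_json_body is a hypothetical helper name, and the sample body is generated locally rather than taken from the real endpoint.

```python
import json


def check_json_body(body: bytes):
    """Return (is_complete, length_in_bytes, error_message) for a response body."""
    try:
        json.loads(body.decode("utf-8"))
    except ValueError as exc:  # covers JSONDecodeError and UnicodeDecodeError
        return False, len(body), str(exc)
    return True, len(body), None


# Build a sample body locally, then cut it off mid-stream to imitate
# the truncated responses seen in Postman.
full = json.dumps({"items": list(range(1000))}).encode("utf-8")
truncated = full[: len(full) // 2]

for label, body in (("full", full), ("truncated", truncated)):
    ok, size, err = check_json_body(body)
    print(label, ok, size, err)
```

Running the same check against several saved responses makes it easy to see both symptoms at once: the parse failure and the changing length.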

I then used Fiddler to capture the traffic from Eclipse and saw the same thing: the JSON was incomplete and the response could not be parsed.
The Tomcat log showed java.net.SocketException: Connection reset, which means the network connection was dropped while Tomcat was still sending data:

SEVERE: An I/O error has occurred while writing a response message entity to the container output stream.
org.glassfish.jersey.server.internal.process.MappableException: org.apache.catalina.connector.ClientAbortException: java.net.SocketException: Connection reset
        at org.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor.aroundWriteTo(MappableExceptionWrapperInterceptor.java:92)
        at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:162)
        at org.glassfish.jersey.message.internal.MessageBodyFactory.writeTo(MessageBodyFactory.java:1130)
        at org.glassfish.jersey.server.ServerRuntime$Responder.writeResponse(ServerRuntime.java:711)
        at org.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:444)

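I asked a network engineer to check the TCP handshake. The answer was that the handshake was fine and there was no network fault.
So the failure was probably in the interaction between Tomcat and the nginx in front of it.
Checking the nginx logs turned up a lot of errors.
The access log entries below are repeated client requests with the same parameters. They confirm the symptom: the client receives an incomplete result, and the size differs every time: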

172.30.3.253 - - [09/Sep/2019:23:14:20 -0400] "POST /NifaServer1/context/task/json/upload HTTP/1.1" 200 153489 "-" "PostmanRuntime/7.16.3" "-"
172.30.3.253 - - [09/Sep/2019:23:14:25 -0400] "POST /NifaServer1/context/task/json/upload HTTP/1.1" 200 144581 "-" "PostmanRuntime/7.16.3" "-"
172.30.3.253 - - [09/Sep/2019:23:14:31 -0400] "POST /NifaServer1/context/task/json/upload HTTP/1.1" 200 161033 "-" "PostmanRuntime/7.16.3" "-"
172.30.3.253 - - [09/Sep/2019:23:14:36 -0400] "POST /NifaServer1/context/task/json/upload HTTP/1.1" 200 156409 "-" "PostmanRuntime/7.16.3" "-"
172.30.3.253 - - [09/Sep/2019:23:14:42 -0400] "POST /NifaServer1/context/task/json/upload HTTP/1.1" 200 151098 "-" "PostmanRuntime/7.16.3" "-"
172.30.3.253 - - [09/Sep/2019:23:14:47 -0400] "POST /NifaServer1/context/task/json/upload HTTP/1.1" 200 203697 "-" "PostmanRuntime/7.16.3" "-"
172.30.3.253 - - [09/Sep/2019:23:14:53 -0400] "POST /NifaServer1/context/task/json/upload HTTP/1.1" 200 153981 "-" "PostmanRuntime/7.16.3" "-"
172.30.3.253 - - [09/Sep/2019:23:14:58 -0400] "POST /NifaServer1/context/task/json/upload HTTP/1.1" 200 216269 "-" "PostmanRuntime/7.16.3" "-"
172.30.3.253 - - [09/Sep/2019:23:15:04 -0400] "POST /NifaServer1/context/task/json/upload HTTP/1.1" 200 134753 "-" "PostmanRuntime/7.16.3" "-"
172.30.3.253 - - [09/Sep/2019:23:15:09 -0400] "POST /NifaServer1/context/task/json/upload HTTP/1.1" 200 152029 "-" "PostmanRuntime/7.16.3" "-"
172.30.3.253 - - [09/Sep/2019:23:15:15 -0400] "POST /NifaServer1/context/task/json/upload HTTP/1.1" 200 154949 "-" "PostmanRuntime/7.16.3" "-"
172.30.3.253 - - [09/Sep/2019:23:15:21 -0400] "POST /NifaServer1/context/task/json/upload HTTP/1.1" 200 153165 "-" "PostmanRuntime/7.16.3" "-"
172.30.3.253 - - [09/Sep/2019:23:15:26 -0400] "POST /NifaServer1/context/task/json/upload HTTP/1.1" 200 138285 "-" "PostmanRuntime/7.16.3" "-"
172.30.3.253 - - [09/Sep/2019:23:15:32 -0400] "POST /NifaServer1/context/task/json/upload HTTP/1.1" 200 154625 "-" "PostmanRuntime/7.16.3" "-"
172.30.3.253 - - [09/Sep/2019:23:15:37 -0400] "POST /NifaServer1/context/task/json/upload HTTP/1.1" 200 117518 "-" "PostmanRuntime/7.16.3" "-"
172.30.3.253 - - [09/Sep/2019:23:15:42 -0400] "POST /NifaServer1/context/task/json/upload HTTP/1.1" 200 153489 "-" "PostmanRuntime/7.16.3" "-"
172.30.3.253 - - [09/Sep/2019:23:15:48 -0400] "POST /NifaServer1/context/task/json/upload HTTP/1.1" 200 133293 "-" "PostmanRuntime/7.16.3" "-"
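To compare the response sizes without reading the log by eye, the size field can be pulled out with a short script. This is only a sketch: it assumes the access log format shown above (request line, status, then bytes sent), and body_sizes is a hypothetical helper name. The three sample lines are copied from the log above.

```python
import re

# Matches the quoted request line, the status code, and the bytes-sent field.
LINE_RE = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+)'
)


def body_sizes(lines, path):
    """Collect the bytes-sent value of every 200 response for the given path."""
    sizes = []
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("path") == path and m.group("status") == "200":
            sizes.append(int(m.group("bytes")))
    return sizes


sample = [
    '172.30.3.253 - - [09/Sep/2019:23:14:20 -0400] "POST /NifaServer1/context/task/json/upload HTTP/1.1" 200 153489 "-" "PostmanRuntime/7.16.3" "-"',
    '172.30.3.253 - - [09/Sep/2019:23:14:25 -0400] "POST /NifaServer1/context/task/json/upload HTTP/1.1" 200 144581 "-" "PostmanRuntime/7.16.3" "-"',
    '172.30.3.253 - - [09/Sep/2019:23:14:31 -0400] "POST /NifaServer1/context/task/json/upload HTTP/1.1" 200 161033 "-" "PostmanRuntime/7.16.3" "-"',
]

sizes = body_sizes(sample, "/NifaServer1/context/task/json/upload")
print("distinct sizes:", len(set(sizes)), "min:", min(sizes), "max:", max(sizes))
```

Identical requests that all return 200 but with different sizes point to the response being cut off at a different place each time, not to a different payload.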

The error log below shows Permission denied:

open() "/apps/icfcc/nginx/proxy_temp/6/80/0000000806" failed (13: Permission denied) while reading upstream

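nginx probably lacked permission on the directory. Checking the permissions showed that the files under /apps/icfcc/nginx/proxy_temp were owned by user 500, and some of nginx's other directories had also been created by that user. Our nginx process, however, is started as the nginx user, so this was most likely the cause.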
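The ownership check can be automated along these lines. This is a sketch under stated assumptions: find_foreign_owned is a hypothetical helper, and the demo runs against a throwaway temporary directory instead of the real proxy_temp path. On the real server one would pass the nginx directory and the UID of the user that runs nginx, for example pwd.getpwnam("nginx").pw_uid.

```python
import os
import tempfile


def find_foreign_owned(root, expected_uid):
    """Return every path under root (root included) not owned by expected_uid."""
    mismatched = []
    if os.lstat(root).st_uid != expected_uid:
        mismatched.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return mismatched


# Demo on a temporary directory owned by the current user.
with tempfile.TemporaryDirectory() as demo_root:
    open(os.path.join(demo_root, "0000000806"), "w").close()
    owned_ok = find_foreign_owned(demo_root, os.getuid())
    owned_by_other = find_foreign_owned(demo_root, os.getuid() + 1)

print("mismatches for current uid:", len(owned_ok))
print("mismatches for a different uid:", len(owned_by_other))
```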
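/apps/icfcc/nginx/proxy_temp is nginx's temporary directory, and it was owned by nginx when we first deployed. So why did the owner change? The blog post "nginx的proxy_temp目錄權限爲nobody nginx -t操作" (on the proxy_temp directory ending up owned by nobody after running nginx -t) offers this explanation: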

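Later I was chatting with an expert and this came up. Our nginx service runs as the nginx user, but I had run nginx -t as root. If the user who runs nginx -t is not the owner of the nginx directories, the ownership of the temporary directories underneath gets forcibly changed.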

Solution

Change the owner of the affected directories to the nginx user:

chown -R nginx:nginx nginx/

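Then restart nginx.
That fixed the problem, and the endpoint works normally again.

As for why only this particular request failed:
My analysis is that any response beyond a certain length would fail. The response to this customer's particular request is fairly long, too long to fit in nginx's in-memory proxy buffers (proxy_buffer_size and proxy_buffers), so nginx writes part of it to temporary files under the proxy_temp directory. The proxy_temp_file_write_size parameter only limits how much is written to a temporary file at a time. If nginx has no permission on that directory, the write fails, nginx drops its connection to Tomcat, and it returns whatever data it has already received to the client.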
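The size threshold can be estimated with a rough model. The sketch below is a simplification and not how nginx itself decides: it treats the memory available for one proxied response as proxy_buffer_size plus all of proxy_buffers and ignores that nginx drains buffers as the client reads. The values 4k and 8 x 4k are assumed here because they are common documented defaults; the actual configuration of this server is not known. Both function names are hypothetical.

```python
def parse_size(value):
    """Convert an nginx-style size such as '4k' or '1m' to bytes."""
    units = {"k": 1024, "m": 1024 * 1024}
    suffix = value[-1].lower()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)


def may_exceed_memory_buffers(response_bytes,
                              proxy_buffer_size="4k",
                              proxy_buffers=(8, "4k")):
    """Rough check: can the response be larger than the in-memory buffers?"""
    count, each = proxy_buffers
    in_memory = parse_size(proxy_buffer_size) + count * parse_size(each)
    return response_bytes > in_memory


# 153489 is one of the truncated sizes from the access log; 2000 stands in
# for a small response of the kind other customers were getting.
for size in (2000, 153489):
    print(size, may_exceed_memory_buffers(size))
```

Under these assumed values, a response of a few kilobytes stays in memory while one in the 150 KB range does not, which is consistent with only the long response hitting the unwritable proxy_temp directory.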

Corrections are welcome.
