New Findings on Garbled Chinese Output from Servlets

Original

chenyg2000

2020-02-21 10:50

http://nanhaochen.blog.51cto.com/228629/47081

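I ran into garbled Chinese output from a servlet again, which is maddening. After digging into it, I came away with some new findings and a better understanding.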

Original code:

Java code

protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    PrintWriter pw = response.getWriter();
    response.setCharacterEncoding("utf-8");
    response.setContentType("text/html; charset=utf-8");
    pw.print("中文");
}

Whether I changed lines 3 and 4 (the setCharacterEncoding and setContentType calls) to gbk or to utf-8, the page always showed ??.

Out of frustration I captured the traffic with WPE and found that, whether I set utf-8 or gbk, the response was always:

HTTP code

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 2
Date: Thu, 08 Mar 2007 06:04:55 GMT

??

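This shows that lines 3 and 4 had no effect. Going over the code, I tried moving line 2 (the getWriter() call) after lines 3 and 4, and the garbled output went away.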
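The capture shows exactly two ? characters and Content-Length: 2, which is what ISO-8859-1 does to these two characters: neither exists in Latin-1, so Java's encoder replaces each one with a single ? byte (0x3F). The stdlib-only sketch below reproduces those two bytes outside any servlet container; the class and method names are made up for illustration, and it assumes the container's ISO-8859-1 conversion behaves like String.getBytes here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: what ISO-8859-1 encoding does to Chinese text.
public class Latin1Demo {
    // String.getBytes replaces every character the charset cannot map
    // with the charset's replacement byte, which is '?' (63) for ISO-8859-1.
    static byte[] encodeLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] body = encodeLatin1("中文");
        System.out.println(Arrays.toString(body)); // [63, 63]
        System.out.println(body.length);            // 2, like Content-Length: 2
        System.out.println(new String(body, StandardCharsets.ISO_8859_1)); // ??
    }
}
```

The information is lost at encoding time, so no browser setting can recover the original characters from these bytes.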

Checking the API documentation, I found the following description:

PrintWriter getWriter() throws IOException

Returns a PrintWriter object that can send character text to the client. The PrintWriter uses the character encoding returned by getCharacterEncoding(). If the response's character encoding has not been specified as described in getCharacterEncoding (i.e., the method just returns the default value ISO-8859-1), getWriter updates it to ISO-8859-1.

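I infer that the character encoding used by the PrintWriter returned by getWriter() is fixed at the moment getWriter() returns, but I found no evidence as to whether this is an internal property of the returned PrintWriter or a check performed at runtime.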

Looking at Tomcat's implementation of setCharacterEncoding, I found the following code:

Java code

public void setCharacterEncoding(String charset) {
    if (isCommitted())
        return;
    // Ignore any call from an included servlet
    if (included)
        return;
    // Ignore any call made after the getWriter has been invoked
    // The default should be used
    if (usingWriter)
        return;
    coyoteResponse.setCharacterEncoding(charset);
    isCharacterEncodingSet = true;
}

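The usingWriter flag is set in the getWriter() method, so the control logic is: once the PrintWriter has been handed out, this method no longer has any effect. This still gives no further evidence for the inference above about where the PrintWriter's encoding is fixed, though.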
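To make the ordering problem concrete, here is a heavily simplified, hypothetical model of a response object. MockResponse and its bodyBytes() method are invented for illustration; this is not Tomcat's Response class. It only mimics the usingWriter check quoted above and the ISO-8859-1 default from the getWriter() documentation. Where the real container fixes the writer's encoding is still the open question from above; in this model it is assumed to be fixed when the OutputStreamWriter is constructed.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical model, NOT Tomcat's code: mirrors only the usingWriter check.
public class MockResponse {
    private final ByteArrayOutputStream body = new ByteArrayOutputStream();
    private String characterEncoding = "ISO-8859-1"; // servlet default
    private boolean usingWriter = false;
    private PrintWriter writer;

    public void setCharacterEncoding(String charset) {
        // Like the Tomcat snippet: ignore any call made after getWriter()
        if (usingWriter) {
            return;
        }
        characterEncoding = charset;
    }

    public String getCharacterEncoding() {
        return characterEncoding;
    }

    public PrintWriter getWriter() {
        if (writer == null) {
            usingWriter = true;
            // Assumption of this model: the charset is baked into the writer here
            writer = new PrintWriter(new OutputStreamWriter(body, Charset.forName(characterEncoding)));
        }
        return writer;
    }

    // Flush pending characters and return the bytes that would go on the wire
    public byte[] bodyBytes() {
        if (writer != null) {
            writer.flush();
        }
        return body.toByteArray();
    }

    public static void main(String[] args) {
        // Original order: getWriter() first, then setCharacterEncoding()
        MockResponse wrong = new MockResponse();
        PrintWriter pw = wrong.getWriter();
        wrong.setCharacterEncoding("UTF-8"); // silently ignored
        pw.print("中文");
        System.out.println(wrong.getCharacterEncoding() + " " + wrong.bodyBytes().length); // ISO-8859-1 2

        // Fixed order: set the encoding first, then call getWriter()
        MockResponse right = new MockResponse();
        right.setCharacterEncoding("UTF-8");
        right.getWriter().print("中文");
        System.out.println(right.getCharacterEncoding() + " " + right.bodyBytes().length); // UTF-8 6
    }
}
```

The first response ends up with the same two ? bytes as the packet capture, while the reordered one produces six UTF-8 bytes.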

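We also notice that there is a usingWriter flag but no usingOutputStream flag. Guessing that output through ServletOutputStream is not subject to this restriction, I tested it with the following code.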

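Java code

ServletOutputStream out = response.getOutputStream();
out.print("中文");
// Uncomment one of the following lines to test each case.
// Case 1: displays correctly; the browser views the page as utf-8
//response.setContentType("text/html; charset=utf-8");
// Case 2: the browser defaults to Simplified Chinese; displays correctly after manually switching to utf-8
//response.setCharacterEncoding("utf-8");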

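Note: with this approach, not only is there no need to set the charset before calling getOutputStream(), but setting it even after print() still takes effect.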
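One way to read the two cases: with an output stream the body is just bytes, and the charset in the header only tells the browser how to decode them. Assuming, as case 2 suggests, that the bytes on the wire were UTF-8, the stdlib-only sketch below (hypothetical class name) decodes the same six bytes three ways. It does not show how Tomcat's ServletOutputStream.print() itself turns a String into bytes; that remains untested here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same body bytes read with different charsets,
// the way a browser would depending on the Content-Type header or its default.
public class DecodeDemo {
    static String decode(byte[] body, String charsetName) {
        return new String(body, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Assumption: the servlet sent "中文" as UTF-8 bytes
        byte[] body = "中文".getBytes(StandardCharsets.UTF_8);
        System.out.println(body.length);                         // 6 (3 bytes per character)
        System.out.println(decode(body, "UTF-8"));               // 中文
        System.out.println(decode(body, "ISO-8859-1").length()); // 6 garbled characters
        if (Charset.isSupported("GBK")) {
            // A Simplified Chinese default decodes the bytes into the wrong characters
            System.out.println(decode(body, "GBK").equals("中文")); // false
        }
    }
}
```

Under this assumption, case 2 is the GBK line: the bytes are fine, the browser just guesses the wrong charset until it is told utf-8.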

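(It turns out there is a length limit, and the content was simply dropped without any warning, which is annoying. I had to split this into two posts; to be continued.)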

Looking further at the setCharacterEncoding API documentation, I found:

Calling setContentType(java.lang.String) with the String text/html and calling this method with the String UTF-8 is equivalent with calling setContentType with the String text/html; charset=UTF-8.

So a single call, response.setContentType("text/html; charset=utf-8");, is enough; there is no need to call both methods. The documentation goes further:

This method can be called repeatedly to change the character encoding. ...... If the character encoding has already been set by setContentType(java.lang.String) or setLocale(java.util.Locale), this method overrides it.

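So the encoding can be set repeatedly, with each call overriding the previous one. Based on this, I wrote the following test code: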

Java code

// Case 1: displays correctly; the browser views the page as utf-8
response.setContentType("text/html; charset=gbk");
response.setCharacterEncoding("utf-8");
// Case 2: displays correctly; the browser views the page as Simplified Chinese
//response.setContentType("text/html; charset=utf-8");
//response.setCharacterEncoding("gbk");
PrintWriter pw = response.getWriter();
pw.print("中文");
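The "last call wins" rule from the documentation, and the equivalence between one setContentType call with a charset and setContentType plus setCharacterEncoding, can be sketched with a small hypothetical model. ContentTypeModel is invented for illustration; its charset parsing is deliberately naive, and it ignores the rule that calls made after getWriter() have no effect. The header format copies the one seen in the packet capture above.

```java
import java.util.Locale;

// Hypothetical model of the override rules quoted above; not a real container's code.
public class ContentTypeModel {
    private String mimeType = "text/html";
    private String characterEncoding = "ISO-8859-1"; // servlet default

    public void setContentType(String type) {
        // Split e.g. "text/html; charset=gbk" into the MIME type and a charset parameter
        String[] parts = type.split(";");
        mimeType = parts[0].trim();
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                characterEncoding = param.substring("charset=".length()).trim();
            }
        }
    }

    public void setCharacterEncoding(String charset) {
        // Overrides any charset previously given via setContentType
        characterEncoding = charset;
    }

    public String getContentType() {
        return mimeType + ";charset=" + characterEncoding;
    }

    public static void main(String[] args) {
        ContentTypeModel case1 = new ContentTypeModel();
        case1.setContentType("text/html; charset=gbk");
        case1.setCharacterEncoding("utf-8");
        System.out.println(case1.getContentType()); // text/html;charset=utf-8

        ContentTypeModel case2 = new ContentTypeModel();
        case2.setContentType("text/html; charset=utf-8");
        case2.setCharacterEncoding("gbk");
        System.out.println(case2.getContentType()); // text/html;charset=gbk

        // One call with a charset parameter equals the two-call form
        ContentTypeModel twoCalls = new ContentTypeModel();
        twoCalls.setContentType("text/html");
        twoCalls.setCharacterEncoding("UTF-8");
        ContentTypeModel oneCall = new ContentTypeModel();
        oneCall.setContentType("text/html; charset=UTF-8");
        System.out.println(twoCalls.getContentType().equals(oneCall.getContentType())); // true
    }
}
```

The two printed headers match the charsets the browser picked in case 1 and case 2.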

Conclusions:

1. To output Chinese from a servlet through a PrintWriter, call setContentType or setCharacterEncoding before calling getWriter(). With ServletOutputStream this restriction does not apply.

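2. As far as setting the character encoding goes, setContentType and setCharacterEncoding have the same effect on the server, so there is no need to call both. For text output, response.setContentType("text/html; charset=utf-8"); seems the more convenient choice.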
