Scraping web page data with jsoup produced the following problem:
Jsoup failed to fetch data from some pages: org.jsoup.UnsupportedMimeTypeException: Unhandled content type. Must be text/*, application/xml, or application/xhtml+xml.
On analysis, the content type of the response does not meet jsoup's requirements: the server returned application/javascript, which jsoup does not parse by default. The full error is:
org.jsoup.UnsupportedMimeTypeException: Unhandled content type. Must be text/*, application/xml, or application/xhtml+xml. Mimetype=application/javascript, URL=....
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:472)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:424)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:178)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:167)
    at calendarSpider.SpiderTest.testOuGuanMatch(SpiderTest.java:174)
    at calendarSpider.SpiderTest.main(SpiderTest.java:39)
A Baidu search turned up the fix: adding ignoreContentType(true) to the connection resolves the problem.
The original code was:
Connection conn = Jsoup.connect(url).userAgent("Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.15)");
The modified code is:
Connection conn = Jsoup.connect(url).ignoreContentType(true).userAgent("Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.15)");
After redeploying and debugging, it works.
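To see why application/javascript is rejected unless ignoreContentType(true) is set, the rule stated in the exception message can be sketched in plain Java. This is only an illustration of that message, not jsoup's actual source. The class name ContentTypeCheck and the method isParseableByDefault are hypothetical, and accepting a missing Content-Type header is an assumption.

```java
import java.util.Locale;

public class ContentTypeCheck {

    // Hypothetical helper mirroring the rule in the exception message:
    // "Must be text/*, application/xml, or application/xhtml+xml".
    // It is NOT jsoup's real implementation.
    static boolean isParseableByDefault(String contentType) {
        if (contentType == null) {
            return true; // assumption: a missing header is not rejected
        }
        // Drop parameters such as "; charset=utf-8" and normalize case.
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mime.startsWith("text/")
                || mime.equals("application/xml")
                || mime.equals("application/xhtml+xml");
    }

    public static void main(String[] args) {
        // The MIME type from the stack trace above is rejected.
        System.out.println(isParseableByDefault("application/javascript"));   // prints false
        // A typical HTML response is accepted.
        System.out.println(isParseableByDefault("text/html; charset=utf-8")); // prints true
    }
}
```

Since the response here is JavaScript rather than HTML, it may be more appropriate to read the raw text with conn.execute().body() than to parse it into a Document with conn.get(). Which one fits depends on what the page actually returns.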