jsoup throws an exception when crawling page data

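While scraping pages with jsoup, I ran into the following problem:

Jsoup fails to fetch data from some pages with: org.jsoup.UnsupportedMimeTypeException: Unhandled content type. Must be text/*, application/xml, or application/xhtml+xml.

On analysis, the cause is that the Content-Type of the response (here application/javascript) is not one of the types jsoup accepts by default. The full error is: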

org.jsoup.UnsupportedMimeTypeException: Unhandled content type. Must be text/*, application/xml, or application/xhtml+xml. Mimetype=application/javascript, URL=....
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:472)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:424)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:178)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:167)
    at calendarSpider.SpiderTest.testOuGuanMatch(SpiderTest.java:174)
    at calendarSpider.SpiderTest.main(SpiderTest.java:39)
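
The rule spelled out in the exception message can be sketched in a few lines of plain Java. This is only an illustration of the message text, not jsoup's own implementation; the class name ContentTypeCheck and the method matchesMessageRule are hypothetical names introduced here.

```java
import java.util.Locale;

public class ContentTypeCheck {

    // Returns true if the given Content-Type value (non-null) falls under the
    // types listed in the exception message: text/*, application/xml, or
    // application/xhtml+xml. Illustration only, not jsoup's source code.
    public static boolean matchesMessageRule(String contentType) {
        // Drop parameters such as "; charset=UTF-8" and normalize the case.
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mime.startsWith("text/")
                || mime.equals("application/xml")
                || mime.equals("application/xhtml+xml");
    }

    public static void main(String[] args) {
        System.out.println(matchesMessageRule("text/html; charset=UTF-8")); // true
        System.out.println(matchesMessageRule("application/javascript"));   // false
    }
}
```

The second call corresponds to the Mimetype=application/javascript reported in the stack trace, which is why the request is rejected.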

A Baidu search turned up the fix: adding ignoreContentType(true) to the connection solves the problem.

The original code was:
Connection conn = Jsoup.connect(url).userAgent("Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.15)");

After the change:
Connection conn = Jsoup.connect(url).ignoreContentType(true).userAgent("Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.15)");

After redeploying and debugging, it works.
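
When the response is something like application/javascript, the raw body is usually more useful than a parsed Document. The following is a minimal sketch of that approach, assuming jsoup is on the classpath. The class name IgnoreContentTypeExample and the helpers newConnection and fetchBody are hypothetical; the URL is supplied by the caller.

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class IgnoreContentTypeExample {

    static final String USER_AGENT =
            "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.15)";

    // Builds the connection without sending anything yet.
    // ignoreContentType(true) tells jsoup not to reject unsupported MIME types.
    public static Connection newConnection(String url) {
        return Jsoup.connect(url)
                .ignoreContentType(true)
                .userAgent(USER_AGENT);
    }

    // Executes the request and returns the response body as a string,
    // without parsing it as HTML.
    public static String fetchBody(String url) throws IOException {
        Connection.Response response = newConnection(url).execute();
        return response.body();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: IgnoreContentTypeExample <url>");
            return;
        }
        System.out.println(fetchBody(args[0]));
    }
}
```

Calling conn.get() as in the original code also works once ignoreContentType(true) is set, but it parses the response as HTML, which may not be what you want for JavaScript or JSON payloads.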
