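Wuhan Epidemic Series (3) | Crawling Tencent's Jiaozhen fact-checking platform (Real-Time Rumor Debunking for the Novel Coronavirus Pneumonia) with Java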

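Contents

Preface

I. What to crawl

II. Capturing the traffic

Home page article list

1. Analysis

2. Code demo

Search article list

1. Analysis

2. Code demo

Article content display

1. Analysis

2. Display options

III. Utility class

IV. Complete code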


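Preface

Since the 3rd, most of you have probably been working from home, and so have I. Because my days were spent working remotely, I did not update the Wuhan Epidemic Series in time. After a full day of coding I need some downtime in the evening before I can pick the series up again; otherwise staring at code all day gets tiring.

As we all know, the novel coronavirus outbreak in Wuhan has swept across almost all of China and has even spread overseas. Practically everyone is doing what they can, programmers included: some developers spontaneously organized an open-source project on GitHub for Wuhan, the Wuhan epidemic-prevention information collection platform. I will not repeat the details here; see my earlier article "United Against Pneumonia: Programmers Play a Big Part Too".

Over these few days I want to make a small contribution as well. As a programmer, I am writing a short crawling tutorial here in the hope that it helps you and other programmers develop and integrate further features quickly, for example aggregating all the information into a separate display platform, website, or mini program. I look forward to what you build. If you need a bit of help from me, leave a comment at the bottom of the article or send me a private message, and I will reply as soon as I can and do my small part.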
 


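I. What to crawl

Tencent runs a platform called "Real-Time Rumor Debunking for the Novel Coronavirus Pneumonia", at https://vp.fact.qq.com/home

If you are aggregating these information platforms, for example packaging them into a mini program, I think this rumor-debunking information is indispensable, so it is worth crawling and displaying. When you display it, be sure to credit the source (Tencent's Jiaozhen platform, xxxx, and so on) to avoid copyright disputes with Tencent. We are good citizens who respect copyright, after all.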

 


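II. Capturing the traffic

You can capture the traffic with a packet-capture tool or simply inspect the requests in the browser's developer tools (F12). I will skip the details here; search for a tutorial if you want to learn more. The capture tool I use is Fiddler.

Home page article list

Request URL: https://vp.fact.qq.com/home

1. Analysis

When I request the home page, these are the only requests I see, so getting the data looks a bit awkward: we would have to parse the HTML with regular expressions to extract what we want. The site's architecture is not the focus of this article, so I will not analyze or guess at it any further.

The data we want from the home page is the list shown in the figure below.

View the page source and you can see where the data for the figure above lives.

On a phone, this page also lets you scroll to the bottom and keep loading older entries indefinitely. That does not seem to work in a desktop browser. Before reaching for a phone emulator, I tried the desktop WeChat client, and it worked: scrolling to the bottom in desktop WeChat keeps loading more data. So I captured the traffic again to see which requests the infinite scroll makes.

In the part I circled there is a /loadmore request. That is the breakthrough, and its request parameters are clearly visible:

artnum, page, _, callback

artnum: I do not know what this means yet.

page: this one is simple, it is the page number. page=1, page=2, page=3 show that my scrolling loaded three pages. So here is a guess: would page=0 return exactly the content that I said earlier we would have to parse out of the HTML? If page=0 returns the initial home page data, we do not need to parse the HTML with regular expressions at all, which is even better. This is only a guess for now; I will test it shortly.

(When I tested it later, the guess turned out to be correct: page=0 does return the data shown when the page first loads, so we do not need to parse the HTML with regular expressions. Great!)

_: presumably a timestamp.

callback: this is the JSONP callback name used for cross-origin requests. I will not dig into it further here; we simply append it exactly as captured.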
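To make the callback parameter concrete: judging from the regular expression used in this article's code, the server wraps the JSON in a call to the named callback, for example jsonp0({...}). The sketch below strips that wrapper with plain string operations instead of a regular expression. The class name JsonpUnwrap, the method unwrap, and the shortened sample body are illustrative assumptions, not part of the original code.

```java
// Sketch: strip a JSONP wrapper such as jsonp0({...}) and return the JSON payload.
final class JsonpUnwrap {

    /**
     * Returns the text between "callback(" and the last ')' in the body.
     * Throws IllegalArgumentException if the body is not wrapped by that callback.
     */
    static String unwrap(String callback, String body) {
        String trimmed = body.trim();
        String prefix = callback + "(";
        if (!trimmed.startsWith(prefix)) {
            throw new IllegalArgumentException("Body is not wrapped in " + callback + "(...)");
        }
        int end = trimmed.lastIndexOf(')');
        if (end < prefix.length()) {
            throw new IllegalArgumentException("Missing closing parenthesis");
        }
        return trimmed.substring(prefix.length(), end);
    }

    public static void main(String[] args) {
        // Hypothetical response body, shortened for illustration.
        String body = "jsonp0({\"code\":0,\"content\":[]})";
        System.out.println(unwrap("jsonp0", body));
    }
}
```

Using the last closing parenthesis rather than the first keeps payloads that contain parentheses inside string values intact.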

 

So the new request URL is: https://vp.fact.qq.com/loadmore?artnum=0&page=0&_=1580922845373&callback=jsonp0
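As a minimal sketch, the URL above can be assembled from a page number and a timestamp. The class and method names are hypothetical. artnum is fixed at 0 as in the captured request because its meaning is unknown, and naming the callback "jsonp" plus the page number is the convention this article's code follows, not something the server is known to require.

```java
// Sketch: build the /loadmore request URL from a page number and a timestamp.
final class LoadMoreUrl {

    static final String ENDPOINT = "https://vp.fact.qq.com/loadmore";

    /** page starts at 0 (the initial home page list); timestampMillis fills the "_" parameter. */
    static String build(int page, long timestampMillis) {
        if (page < 0) {
            throw new IllegalArgumentException("page must be >= 0");
        }
        return ENDPOINT
                + "?artnum=0"                  // meaning unknown; value copied from the capture
                + "&page=" + page
                + "&_=" + timestampMillis
                + "&callback=jsonp" + page;    // assumed naming convention
    }

    public static void main(String[] args) {
        // Rebuilds the captured request, then the next page with the current time.
        System.out.println(build(0, 1580922845373L));
        System.out.println(build(1, System.currentTimeMillis()));
    }
}
```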


2. Code demo

/**
     * Tencent Jiaozhen fact-checking platform (Real-Time Rumor Debunking for the Novel Coronavirus Pneumonia): article list data
     * @param pageNo page number, starting from 0
     * @return the JSON payload with the JSONP wrapper stripped
     */
    public static String getTxFactIndexData(String pageNo){
        //String url="https://vp.fact.qq.com/home";
        //String url="https://vp.fact.qq.com/loadmore?artnum=0&page=0&_=1580922845373&callback=jsonp0";
        String url="https://vp.fact.qq.com/loadmore";
        if(pageNo==null||"".equals(pageNo)){ //default to the first page (page 0) when no page number is passed
            pageNo="0";
        }
        String jsonpNum = "jsonp"+pageNo;
        Map<String, Object> paramObj = new HashMap<>();
        paramObj.put("artnum","0");
        paramObj.put("page",pageNo);
        paramObj.put("_",System.currentTimeMillis());
        paramObj.put("callback",jsonpNum);

        //simulate a browser request
        HttpPojo httpPojo = new HttpPojo();
        httpPojo.setHttpHost("vp.fact.qq.com");
        httpPojo.setHttpAccept("text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3");
        httpPojo.setHttpConnection("keep-alive");
        httpPojo.setHttpUserAgent("Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3724.8 Mobile Safari/537.36");
        httpPojo.setHttpReferer("https://vp.fact.qq.com/home");
        httpPojo.setHttpOrigin("https://vp.fact.qq.com/home");

        String htmlResult = httpSendGet(url, paramObj, httpPojo); //raw JSONP response body
        //System.out.println(htmlResult);
        //strip the JSONP wrapper, e.g. jsonp0({...}) becomes {...}
        htmlResult = getRegContent(jsonpNum+"\\((.*?)\\)$", htmlResult, 1);
        System.out.println(htmlResult);

        //iterate over the entries to store them in a database, Redis, etc.
        /*JSONObject dataJo = JSONObject.parseObject(htmlResult);
        String content = dataJo.getString("content");//all article entries
        JSONArray array = JSONArray.parseArray(content);
        for (int i = 0; i < array.size(); i++) {
            JSONObject tripJo = JSONObject.parseObject(array.getString(i));
            String title = tripJo.getString("title");
            System.out.println("title:"+title);

            //persist to the database
        }*/

        return htmlResult;
    }

Result: you can copy the printed output into an online JSON formatter to inspect it: http://www.bejson.com/count.html

{"code":0,"content":[{"title":"紙尿褲材料被徵用去做口罩,很多紙尿褲廠家已停產","author":"較真查證員","authordesc":"較真查證員","id":"8f8296a9feda4fddd02930b1aa08c538","date":"2020-02-05","markstyle":"fake","result":"假","explain":"謠言","abstract":"醫用口罩和紙尿褲雖然都用到無紡布,但是具體的材質並不一樣...","tag":["口罩","紙尿褲"],"type":1,"videourl":"","cover":"//jiaozhen-70111.picnjc.qpic.cn/v99dWFPnih38WmpN72RGyw?imageView2/2/w/150/h/90","coverrect":"//jiaozhen-70111.picnjc.qpic.cn/v99dWFPnih38WmpN72RGyw","coversqual":"//jiaozhen-70111.picnjc.qpic.cn/v99dWFPnih38WmpN72RGyw?imageView2/2/w/300/h/300","section":"","iscolled":false,"arttype":"normal"},{"title":"紅糖、生薑、大蔥白和大蒜熬水喝,不會感染新冠病毒","author":"美國普渡大學農業與生物系食品工程專業博士","authordesc":"美國普渡大學農業與生物系食品工程專業博士","id":"4216858ab0ff60218452bd9430e51d86","date":"2020-02-05","markstyle":"fake","result":"假","explain":"謠言","abstract":"可以把這些東西煮水喝當做一種“代茶飲”,如果再加上排骨或雞塊...","tag":["紅糖","生薑"],"type":1,"videourl":"","cover":"https://p.qpic.cn/jiaozhen/0/157b3c9c7a37dc5fae9fa3c53fe5317c/0?imageView2/2/w/150/h/90","coverrect":"https://p.qpic.cn/jiaozhen/0/157b3c9c7a37dc5fae9fa3c53fe5317c/0","coversqual":"https://p.qpic.cn/jiaozhen/0/157b3c9c7a37dc5fae9fa3c53fe5317c/0?imageView2/2/w/300/h/300","section":"","iscolled":false,"arttype":"normal"},{"title":"防溢乳墊貼在非醫用一次性口罩上能過濾病毒","author":"騰訊醫典","authordesc":"騰訊旗下專業醫學科普平臺","id":"ff20cea8f85f7aadaea1b1eabdac7c4e","date":"2020-02-05","markstyle":"fake","result":"假","explain":"謠言","abstract":"防溢乳墊貼的主要材質包括滌綸、拉絨棉、全面、無紡布、木漿棉等...","tag":["防溢乳墊貼","口罩"],"type":1,"videourl":"","cover":"https://p.qpic.cn/jiaozhen/0/19c29c181ef653df8d4d3c09acea5b99/0?imageView2/2/w/150/h/90","coverrect":"https://p.qpic.cn/jiaozhen/0/19c29c181ef653df8d4d3c09acea5b99/0","coversqual":"https://p.qpic.cn/jiaozhen/0/19c29c181ef653df8d4d3c09acea5b99/0?imageView2/2/w/300/h/300","section":"","iscolled":false,"arttype":"normal"},{"title":"預防肺炎的疫苗能防止新型冠狀病毒感染","author":"聯合國系統內衛生問題的指導和協調機構","authordesc":"聯合國系統內衛生問題的指導和協調機構","id":"b4983676dce077f2ad05cec14f5ce986","date":"2020-02-05","markstyle":"fake","re
sult":"假","explain":"謠言","abstract":"肺炎球菌疫苗和乙型流感嗜血桿菌疫苗等肺炎疫苗不能預防新型冠狀病毒肺...","tag":["新型冠狀病毒","肺炎"],"type":1,"videourl":"","cover":"//jiaozhen-70111.picnjc.qpic.cn/4H8VTJkYMuVbN5UfqiaGkT?imageView2/2/w/150/h/90","coverrect":"//jiaozhen-70111.picnjc.qpic.cn/4H8VTJkYMuVbN5UfqiaGkT","coversqual":"//jiaozhen-70111.picnjc.qpic.cn/4H8VTJkYMuVbN5UfqiaGkT?imageView2/2/w/300/h/300","section":"","iscolled":false,"arttype":"normal"},{"title":"美國醫生援助中國前集體禱告","author":"國際謠言查證機構","authordesc":"國際謠言查證機構","id":"9e7d1b13fa9032d57c7270036c01cbee","date":"2020-02-05","markstyle":"fake","result":"假","explain":"謠言","abstract":"經查,這是2018年的老視頻...","tag":["新型冠狀病毒","美國"],"type":1,"videourl":"","cover":"//jiaozhen-70111.picnjc.qpic.cn/qVhuWax5vVkzmNCQJToYjr?imageView2/2/w/150/h/90","coverrect":"//jiaozhen-70111.picnjc.qpic.cn/qVhuWax5vVkzmNCQJToYjr","coversqual":"//jiaozhen-70111.picnjc.qpic.cn/qVhuWax5vVkzmNCQJToYjr?imageView2/2/w/300/h/300","section":"","iscolled":false,"arttype":"normal"},{"title":"美國選擇貨機撤僑,是給中國送防疫物資","author":"國際謠言查證機構","authordesc":"國際謠言查證機構","id":"7835c7a3b0e74d22235ad3734c4bce33","date":"2020-02-05","markstyle":"fake","result":"假","explain":"謠言","abstract":"經查,這並非事實。美國從武漢撤僑時,並未給中國運送過防疫物資...","tag":["美國","撤僑"],"type":1,"videourl":"","cover":"//jiaozhen-70111.picnjc.qpic.cn/4a9V2XdtqMVLdKEkagXw3k?imageView2/2/w/150/h/90","coverrect":"//jiaozhen-70111.picnjc.qpic.cn/4a9V2XdtqMVLdKEkagXw3k","coversqual":"//jiaozhen-70111.picnjc.qpic.cn/4a9V2XdtqMVLdKEkagXw3k?imageView2/2/w/300/h/300","section":"","iscolled":false,"arttype":"normal"},{"title":"適量飲酒可以抵抗新型冠狀病毒","author":"美國普渡大學農業與生物系食品工程專業博士","authordesc":"美國普渡大學農業與生物系食品工程專業博士","id":"cb9d7a0c4a284758b2122698bf7836c1","date":"2020-02-05","markstyle":"fake","result":"假","explain":"謠言","abstract":"把“適量飲酒”寫入防疫建議,是沒有任何依據的想當然...","tag":["飲酒","新型冠狀病毒"],"type":1,"videourl":"","cover":"https://p.qpic.cn/jiaozhen/0/1fb11efe3946fc0ad0e69392779668b1/0?imageView2/2/w/150/h/90","coverrect":"https://p.qpic.cn/jiaozhen/0/1fb11efe3946fc0ad0e693
92779668b1/0","coversqual":"https://p.qpic.cn/jiaozhen/0/1fb11efe3946fc0ad0e69392779668b1/0?imageView2/2/w/300/h/300","section":"","iscolled":false,"arttype":"normal"},{"title":"新風系統會傳播病毒,導致感染新型冠狀病毒肺炎","author":"卓正醫療皮膚科醫生","authordesc":"卓正醫療皮膚科醫生","id":"3f7766a2e99254d43e96f43479d7892b","date":"2020-02-05","markstyle":"fake","result":"假","explain":"謠言","abstract":"在室內當感染者打噴嚏、說話或咳嗽時...","tag":["新型冠狀病毒","新風系統"],"type":1,"videourl":"","cover":"//jiaozhen-70111.picnjc.qpic.cn/3LesBmNMRfSWNGojnJ9QwR?imageView2/2/w/150/h/90","coverrect":"//jiaozhen-70111.picnjc.qpic.cn/3LesBmNMRfSWNGojnJ9QwR","coversqual":"//jiaozhen-70111.picnjc.qpic.cn/3LesBmNMRfSWNGojnJ9QwR?imageView2/2/w/300/h/300","section":"","iscolled":false,"arttype":"normal"},{"title":"湖北仙桃開來的轎車上有人被測出高溫後逃跑","author":"泉州晚報社出版的綜合性市民生活報","authordesc":"泉州晚報社出版的綜合性市民生活報","id":"14c3f8f60176dd488393ad113baf521e","date":"2020-02-05","markstyle":"fake","result":"假","explain":"謠言","abstract":"黃山等地網警已經對該信息進行了闢謠...","tag":["湖北","新塘"],"type":1,"videourl":"","cover":"//jiaozhen-70111.picnjc.qpic.cn/hemuxFCyzDZigyHcpZeXV2?imageView2/2/w/150/h/90","coverrect":"//jiaozhen-70111.picnjc.qpic.cn/hemuxFCyzDZigyHcpZeXV2","coversqual":"//jiaozhen-70111.picnjc.qpic.cn/hemuxFCyzDZigyHcpZeXV2?imageView2/2/w/300/h/300","section":"","iscolled":false,"arttype":"normal"},{"title":"口罩接水不滲水,說明能抵抗新型冠狀病毒肺炎","author":"騰訊旗下專業醫學科普平臺","authordesc":"騰訊旗下專業醫學科普平臺","id":"f0b756d05e5c67f88cc062c127461abc","date":"2020-02-04","markstyle":"fake","result":"假","explain":"謠言","abstract":"非N95或外科口罩,即便不滲水,也不能抵抗新型肺炎病毒的傳播...","tag":["口罩","新型冠狀病毒肺炎"],"type":1,"videourl":"","cover":"//jiaozhen-70111.picnjc.qpic.cn/mncY8ccuamxW4sCrtE5ZH4?imageView2/2/w/150/h/90","coverrect":"//jiaozhen-70111.picnjc.qpic.cn/mncY8ccuamxW4sCrtE5ZH4","coversqual":"//jiaozhen-70111.picnjc.qpic.cn/mncY8ccuamxW4sCrtE5ZH4?imageView2/2/w/300/h/300","section":"","iscolled":false,"arttype":"normal"}]}

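Search article list

The list above is what is shown by default. Another important feature is searching for a keyword and showing the matching list, as in the figure below.

1. Analysis

You can see the /searchresult request and its parameters: title, num, _, callback

title: the search keyword

num: the number of entries already displayed. Comparing several requests reveals the pattern num = pageNo * 20 (note: pageNo starts at 0)

_: timestamp

callback: the number after jsonp equals pageNo + 1 (note: pageNo starts at 0)

So the request URL is: https://vp.fact.qq.com/searchresult?title=%E7%97%85%E6%AF%92&num=0&_=1580925707024&callback=jsonp1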
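The two formulas above (num = pageNo * 20 and callback = jsonp followed by pageNo + 1) can be sketched as a small URL builder. The class name SearchUrl, the constant PAGE_SIZE, and the method build are hypothetical names chosen for illustration; the page size of 20 comes from the pattern observed in the capture, not from any documented API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch: build the /searchresult request URL for a keyword and a zero-based page number.
final class SearchUrl {

    static final String ENDPOINT = "https://vp.fact.qq.com/searchresult";
    static final int PAGE_SIZE = 20; // observed in the capture, assumed constant

    static String build(String keyword, int pageNo, long timestampMillis) {
        if (pageNo < 0) {
            throw new IllegalArgumentException("pageNo must be >= 0");
        }
        String encoded;
        try {
            // Percent-encode the keyword as UTF-8 so non-ASCII text is safe in the URL.
            encoded = URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is required on every Java platform", e);
        }
        return ENDPOINT
                + "?title=" + encoded
                + "&num=" + (pageNo * PAGE_SIZE)
                + "&_=" + timestampMillis
                + "&callback=jsonp" + (pageNo + 1);
    }

    public static void main(String[] args) {
        // The keyword from the captured request, written with Unicode escapes.
        System.out.println(build("\u75c5\u6bd2", 0, 1580925707024L));
    }
}
```

Note that URLEncoder performs form encoding, so a space in the keyword becomes a plus sign rather than %20.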


2. Code demo:

/**
     * Tencent Jiaozhen fact-checking platform (Real-Time Rumor Debunking for the Novel Coronavirus Pneumonia): search the article list by keyword
     * @param pageNo page number, starting from 0
     * @param titleText search keyword
     * @return the JSON payload with the JSONP wrapper stripped, or "fail" if an exception occurs before the response is read
     */
    public static String getTxFactSearchResultByTitle(String pageNo,String titleText){
        String htmlResult="fail";
        try {
            //String url="https://vp.fact.qq.com/searchresult?title=%E7%97%85%E6%AF%92&num=0&_=1580925707024&callback=jsonp1";
            String url="https://vp.fact.qq.com/searchresult";
            if(pageNo==null||"".equals(pageNo)){ //default to the first page (page 0) when no page number is passed
                pageNo="0";
            }
            String jsonpNum = "jsonp"+(Integer.parseInt(pageNo)+1);
            Map<String, Object> paramObj = new HashMap<>();
            paramObj.put("title", URLEncoder.encode(titleText, "utf-8"));
            paramObj.put("num",Integer.parseInt(pageNo)*20);
            paramObj.put("_",System.currentTimeMillis());
            paramObj.put("callback",jsonpNum);

            //simulate a browser request
            HttpPojo httpPojo = new HttpPojo();
            httpPojo.setHttpHost("vp.fact.qq.com");
            httpPojo.setHttpAccept("text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3");
            httpPojo.setHttpConnection("keep-alive");
            httpPojo.setHttpUserAgent("Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3724.8 Mobile Safari/537.36");
            httpPojo.setHttpReferer("https://vp.fact.qq.com/home");
            httpPojo.setHttpOrigin("https://vp.fact.qq.com/home");

            htmlResult = httpSendGet(url, paramObj, httpPojo); //raw JSONP response body
            //System.out.println(htmlResult);
            //strip the JSONP wrapper, e.g. jsonp1({...}) becomes {...}
            htmlResult = getRegContent(jsonpNum+"\\((.*?)\\)$", htmlResult, 1);
            System.out.println(htmlResult);

            //iterate over the entries to store them in a database, Redis, etc.
            /*JSONObject dataJo = JSONObject.parseObject(htmlResult);
            String content = dataJo.getString("content");//all search hits
            JSONArray array = JSONArray.parseArray(content);
            for (int i = 0; i < array.size(); i++) {
                JSONObject tripJo = JSONObject.parseObject(array.getString(i));
                String source = tripJo.getString("_source");
                JSONObject sourceJo = JSONObject.parseObject(source);
                String title = sourceJo.getString("title");
                System.out.println("title:"+title);

                //persist to the database
            }*/


        }catch (Exception e){
            e.printStackTrace();
        }
        return htmlResult;
    }

Result: you can copy the printed output into an online JSON formatter to inspect it: http://www.bejson.com/count.html

{"code":0,"total":637,"content":[{"_index":"jiaozhen","_type":"article","_id":"aa3546e01b77bc4b58499915a551e971","_score":3.0637636,"_source":{"id":"aa3546e01b77bc4b58499915a551e971","title":"腹瀉是由病毒引起的","result":"疑-分情況","source":"騰訊較真","abstract":"腹瀉分爲感染性腹瀉與非感染性腹瀉兩種類型。\n非感染性腹瀉不是由病毒引起的,往往是消化不良、暴飲暴食、油膩或辛辣食物刺激等食源性因素所致的,但也有可能是受寒、水土不服、精神緊張等外界刺激導致。\n感染性腹瀉其中有一多半是由病毒引起的,也有可能是細菌感染引起的。","cover":"http://p.qpic.cn/jiaozhen/0/7dd2422271f829467d5e94cb8ff6a531/0","updatedAt":"2019-03-22 08:48:30","date":"2018-11-27","author":"騰訊醫典","authordesc":"騰訊旗下專業醫學科普平臺"},"sort":[3.0637636,72423]},{"_index":"jiaozhen","_type":"article","_id":"99c05bceb88fd80418da72d2b2e1ade0","_score":2.9355164,"_source":{"id":"99c05bceb88fd80418da72d2b2e1ade0","title":"新型冠狀病毒是SARS病毒的進化","result":"假-謠言","source":"騰訊醫典","oriurl":"https://h5.baike.qq.com/mobile/article.html?docid=tx20406001znr0fh&adtag=op.co.jiaoz.ydyw","abstract":"不是!\n近期發佈的《武漢新型冠狀病毒的進化來源和傳染人的分子作用通路》,揭露了新型冠狀病毒的“身份”。\n目前已明確,新型冠狀病毒屬於Beta冠狀病毒屬(Betacoronavirus),與SARS病毒(導致2002年“非典”)、MERS冠狀病毒(“中東呼吸綜合徵”),平均分別有~70%和~40%的序列相似性。\n新型冠狀病毒與SARS病毒的共同祖先是和HKU9-1類似的病毒。在進化樹的位置上,它與SARS病毒相鄰,但並不屬於SARS病毒。\n因此,說新型冠狀病毒是SARS進化是不準確的。","cover":"//jiaozhen-70111.picnjc.qpic.cn/f8e2c79bc3bb4682ab1b317575bdcfd0","updatedAt":"2020-02-03 20:48:04","date":"2020-01-26","author":"梁曉麗","authordesc":"北京醫院執業藥師 "},"sort":[2.9355164,81245]},{"_index":"jiaozhen","_type":"article","_id":"8556230fe0cf93288e72973c9e90587f","_score":2.690024,"_source":{"id":"8556230fe0cf93288e72973c9e90587f","title":"把飯做熟病毒就會被殺死","result":"真-確實如此","source":"騰訊醫典","abstract":"目前已經知道,新型冠狀病毒對熱敏感,56℃ 加熱 30 分鐘即可滅活。在中國疾病控制中心發佈的家庭預防指南中,明確的建議是:不要接觸、購買和食用野生動物(即野味),避免前往售賣活體動物(禽類、海產品、野生動物等)的市場,禽肉蛋要充分煮熟後食用。\n將肉類蛋類做熟,是可以滅活新型冠狀病毒的。爲了保證滅活病毒,加熱時間可以儘量保證在30分鐘。\n那麼,蔬菜也一樣嗎?在傳染病流行期間,不建議喫沙拉這類未經煮熟的蔬菜。最好也加熱煮熟。此外,做飯前、喫飯前要注意洗手,喫飯時使用公筷,飯後徹底清洗餐具,這樣也有助於預防。","cover":"//jiaozhen-70111.picnjc.qpic.cn/9e8cabc0de8b5030592091541fa7d17f","updatedAt":"2020-02-03 
20:52:50","date":"2020-02-03","author":"較真團隊","authordesc":"騰訊新聞專業事實查證平臺"},"sort":[2.690024,81544]},{"_index":"jiaozhen","_type":"article","_id":"34da5686374365359ba3f1058b64f23b","_score":2.690024,"_source":{"id":"34da5686374365359ba3f1058b64f23b","title":"風油精能抑制病毒感染","result":"假-謠言","source":"騰訊醫典","oriurl":"https://h5.baike.qq.com/mobile/article.html?docid=tx20406001bvmo9n&adtag=op.co.jiaoz.ydyw","abstract":"不能!\n風油精是由天然藥物和芳香植物精油配製而成,根據國家藥典委員會2016年新修訂的風油精質量標準,它的主要成分爲薄荷腦、水楊酸甲酯(冬青油)、樟腦、丁香酚和桉油。\n這些成分中,薄荷腦和桉油可以激活負責感受溫度的神經細胞,使人感覺到涼,而水楊酸甲脂又能刺激皮膚,帶來灼燒感。兩種感覺疊加,能起到止癢的效果。而風油精中添加的芳香植物精油,有驅蚊效果。\n但風油精及其任何一個主要成分,都沒有證據證明能夠抑制細菌或病毒感染。而且,其中的水楊酸甲脂是有一定毒性的,如果口服,是會引起中毒的。對兒童來說,攝入超過4毫升可能致命。\n塗抹風油精不能抑制病毒,口服還可能致命。疫情正熱,科學地保護家人和自己最重要。","cover":"//jiaozhen-70111.picnjc.qpic.cn/775b6baf32184eb9a51938fa8b75a528","updatedAt":"2020-02-03 20:47:54","date":"2020-01-26","author":"騰訊醫典","authordesc":"騰訊旗下專業醫學科普平臺"},"sort":[2.690024,81254]},{"_index":"jiaozhen","_type":"article","_id":"c727ee3a7a738837963258e28384f513","_score":2.690024,"_source":{"id":"c727ee3a7a738837963258e28384f513","title":"“李詠去世”的視頻是病毒,不要打開","result":"假-謠言","source":"騰訊較真","abstract":"安慶網警官方微博@安慶網警巡查執法迅速查證,並進行闢謠:未查到“李詠去世”的視頻連接,該消息爲謠言。\n但手機點開陌生鏈接確實有可能會中毒,請大家對手機上收到的任何不明來源鏈接保持警惕,尤其是陌生人或疑似朋友被盜號後發來的鏈接。","cover":"http://p.qpic.cn/jiaozhen/0/fcd117eb65cc170339b60c51551300b2/0","updatedAt":"2019-03-22 08:48:30","date":"2018-10-30","author":"騰訊醫典","authordesc":"騰訊旗下專業醫學科普平臺"},"sort":[2.690024,72143]},{"_index":"jiaozhen","_type":"article","_id":"4bc5e92a03bd1fe57dc2b772728e5e51","_score":2.690024,"_source":{"id":"4bc5e92a03bd1fe57dc2b772728e5e51","title":"狗會把埃博拉病毒傳染給人","result":"疑-尚無定論","source":"騰訊較真","abstract":"感染埃博拉病毒的病人或動物的血液、尿液、排泄物、嘔吐物、精液以及其他分泌物中均可能帶有病毒,所以埃博拉病毒的感染渠道有很多,並不能直接證明是狗將埃博拉病毒傳染給人的。\n埃博拉流行疫區的犬隻身上可以查到病毒抗體,證明這些動物確實曾被埃博拉病毒感染。不過這些犬隻感染後並無特殊症狀,說明就算犬類感染了病毒,傳染性也不強,所以目前還沒有直接證據證明病毒可以從感染犬隻傳染至人。","cover":"http://p.qpic.cn/jiaozhen/0/ebb7f04f65e89451951b75b755162b0c/0","updatedAt":"2019-03-22 
08:48:29","date":"2018-08-19","author":"趙言昌","authordesc":"石河子大學臨牀醫學院"},"sort":[2.690024,70197]},{"_index":"jiaozhen","_type":"article","_id":"35e6847e3a5607fee13aa23fc5364a3c","_score":2.690024,"_source":{"id":"35e6847e3a5607fee13aa23fc5364a3c","title":"流感只能由流感病毒引起","result":"真-確實如此","source":"蝌蚪五線譜","oriurl":"http://news.kedo.gov.cn/c/2018-01-12/908395.shtml","abstract":"流感只能由流感病毒引起。流感病毒分爲甲型、乙型和丙型三種。不管哪一種,它們在進入呼吸道以後,都會進入上皮細胞大肆繁殖,然後藉助神經酸氨酶,從細胞內逃逸出來,如此循環,不斷“攻城掠地”,引起廣泛的上皮細胞死亡。","updatedAt":"2019-03-22 08:48:28","date":"2018-01-12","author":"","authordesc":""},"sort":[2.690024,69236]},{"_index":"jiaozhen","_type":"article","_id":"2cce4a92af5fd2cce83918b996d7091f","_score":2.690024,"_source":{"id":"2cce4a92af5fd2cce83918b996d7091f","title":"點開順豐上市的紅包會中病毒","result":"假-謠言","source":"謠言過濾器","oriurl":"https://mp.weixin.qq.com/s/z4tW5hrAYcssugPKjbh34w","abstract":"類似所謂的“緊急通知”以前也時有出現,其內容和格式基本一致,唯一不同的是,主角在變,有着“《女人必看》、《釣魚島開戰啦》、《工資調整方案》”等各個不同的版本,這些都是借公安網監“緊急通知”名義來傳播的不實信息。首先,公安部門並未發佈此內容的通知,若有需要,往往會以單位名義發佈,而不會以個人名義。其次,對不明的陌生鏈接、可疑紅包請不要點開,以免感染木馬和上當受騙。最後,希望大家不要輕信這種誇大渲染網絡安全風險的行爲,並且養成良好的安全上網習慣。","updatedAt":"2018-09-03 17:50:05","date":"2017-03-09","author":"於暘","authordesc":"騰訊玄武實驗室負責人  
資深專家工程師"},"sort":[2.690024,67758]},{"_index":"jiaozhen","_type":"article","_id":"ddda352e82976a253fb4870fe2958e3a","_score":2.690024,"_source":{"id":"ddda352e82976a253fb4870fe2958e3a","title":"打開《特朗普患中風》的圖片會中病毒","result":"假-謠言","source":"謠言過濾器","oriurl":"https://mp.weixin.qq.com/s?__biz=MjM5MjU0NTc4OQ==&mid=2651747810&idx=1&sn=fdb9dbaa6df3e056f1d08192bd5411b2&chksm=bd5e7b708a29f26660c18373017f8fc4a61051a792804fe57391d3ab9b8dc49dd7bc82688847&mpshare=1&scene=1&srcid=0402sqKTKdDGkxES34O7t4AL&key=c7e8e862d11f55a9bcd2b8264219569ba9e6f38ba2a7d45e288a786704a6cac166dd3e285b44fbfa1a60be1632afb2518ec41f02be69216d2c582d97b0073bc038d6e4bdbca8e8e153c0ce14e4c670fb&ascene=1&uin=MTI2NDA1MTg0MQ%3D%3D&devicetype=Windows-QQBrowser&version=6103000b&lang=zh_CN&pass_ticket=RdmY2","abstract":"此爲各類以“緊急通知”名義來傳播不實信息的內容變種之一。謠言內容一般爲假借各種文件名稱名義,告知用戶不要打開,否則會中毒,甚至導致賬戶財產損失。類似謠言有“微信紅包圖片是病毒”“新版人民幣視頻是病毒”“朴槿惠去世視頻是病毒”等等。目的都在於誇大渲染網絡安全風險。","updatedAt":"2018-09-03 17:39:47","date":"2018-04-02","author":"","authordesc":""},"sort":[2.690024,67684]},{"_index":"jiaozhen","_type":"article","_id":"8fea58e72d44ea1d3d7d291b473011a8","_score":2.690024,"_source":{"id":"8fea58e72d44ea1d3d7d291b473011a8","title":"“朴槿惠死了”的視頻是病毒","result":"假-謠言","source":"騰訊較真","oriurl":"http://view.inews.qq.com/a/20180104A0WF0T00","abstract":"“朴槿惠死了的視頻是病毒”,這是一條老舊謠言。它有各種版本,比如“新版人民幣視頻是病毒”“太原、日本、韓國地震視頻是病毒”。\n在手機上打開別人發過來的圖片或視頻,會不會中毒?這要分兩種情況。一種是“有人給你直接發了一個視頻文件或圖片文件,你看了內容;另一種是“有人給你發了一個視頻或圖片的鏈接,你點擊鏈接,查看了內容”。理論上,只要存在合適的漏洞,這兩種情況下都有可能實現入侵。但第一種漏洞比較罕見,對目前主流的手機和應用來說,並不存在公開的這類漏洞。至於第二種,騰訊玄武實驗室最新的研究表明,部分安全性上欠缺考慮的 APP 確實比較容易在這種情況下被攻擊,其中就包括不少大家很常用的 APP 。","cover":"http://p.qpic.cn/jiaozhen/0/49a9cd89e8794c4788813b5adddfead8/0","updatedAt":"2019-03-22 
08:48:27","date":"2018-01-04","author":"夏安","authordesc":"傳染病學博士"},"sort":[2.690024,67155]},{"_index":"jiaozhen","_type":"article","_id":"2e2318d578fafba4b9df1ed5a6838a95","_score":2.690024,"_source":{"id":"2e2318d578fafba4b9df1ed5a6838a95","title":"微信羣中秋祝福信息是病毒","result":"假-謠言","source":"上海闢謠平臺","oriurl":"http://www.shobserver.com/news/detail?id=30850","abstract":"從事軟件開發的極盟網絡CEO鄭志偉告訴記者:不要緊張,此類鏈接只是微信官方跳轉接口。\n雖然沒必要一看到鏈接就與釣魚詐騙聯繫起來,但是也需提示風險:原鏈接被篡改後,有可能會被不法分子利用。","cover":"http://p.qpic.cn/jiaozhen/0/1505286698109/0","updatedAt":"2019-03-22 08:48:26","date":"2016-09-16","author":"宋慧","authordesc":"上海闢謠平臺記者"},"sort":[2.690024,65898]},{"_index":"jiaozhen","_type":"article","_id":"e9868f3db14806442916ac4f23555633","_score":2.690024,"_source":{"id":"e9868f3db14806442916ac4f23555633","title":"“4G流量免費送”鏈接是病毒","result":"假-謠言","source":"上海闢謠平臺","oriurl":"http://www.shobserver.com/news/detail?id=42523","abstract":"這條“緊急通知”是假冒權威部門的名義發佈,內容也大量照搬其他謠言。\n不過網上一些所謂“流量免費送”的鏈接,確實可能存在風險,不要隨便亂點。","cover":"http://p.qpic.cn/jiaozhen/0/1505285877247/0","updatedAt":"2019-03-22 08:48:25","date":"2017-01-17","author":"李曉寧","authordesc":"卓正醫療皮膚科醫生"},"sort":[2.690024,65692]},{"_index":"jiaozhen","_type":"article","_id":"4d48863635b80727b40498903a94c541","_score":2.690024,"_source":{"id":"4d48863635b80727b40498903a94c541","title":"上海68名男女死於CTC5病毒","result":"假-謠言","source":"上海闢謠平臺","oriurl":"http://www.shobserver.com/news/detail?id=54027","abstract":"2016年3月前後,“一名女子感染SB250病毒死亡”的謠言就開始在網上流傳。炮製用“SB”、“250”作爲病毒名稱,就有惡作劇甚至挑戰公衆智商的嫌疑。\nSB250病毒謠言在傳播過程中,出現多個分支,比如“2SB500病毒”和“sk5”,直到最新版本的“CTC5”。\n謠言中所說的上海第三人民醫院,在2014年就已更名爲上海交通大學醫學院附屬第九人民醫院(北院)。上海九院宣傳科工作人員表示,該院近期並沒有“68名男女生感染sB250病毒死亡”事件。工作人員表示,在此之前已關注到這則傳言,希望市民不要輕信誤信。","cover":"http://p.qpic.cn/jiaozhen/0/1505285069735/0","updatedAt":"2019-03-22 
08:48:25","date":"2017-05-23","author":"鄭子愚","authordesc":"上海闢謠平臺記者"},"sort":[2.690024,65498]},{"_index":"jiaozhen","_type":"article","_id":"08b449aba0d975b50c5917596535d91b","_score":2.690024,"_source":{"id":"08b449aba0d975b50c5917596535d91b","title":"諾如病毒有望被攻克","result":"真-確實如此","source":"話食科普","oriurl":"https://mp.weixin.qq.com/s/i8UK0OuXOUw34RmEMcHcvw","abstract":"美國貝勒醫學院的Khalil Ettayebi教授帶領的研究團隊終於攻克了這個困擾科學家近半個世紀的難題,首次成功地在幹細胞來源的非轉化人類腸道單層培養系統中培養出了多種諾如病毒毒株。這爲開啓諾如病毒神祕之門找到了鑰匙。","updatedAt":"2019-03-22 08:48:24","date":"2017-01-13","author":"騰訊醫典","authordesc":"騰訊旗下專業醫學科普平臺"},"sort":[2.690024,61953]},{"_index":"jiaozhen","_type":"article","_id":"37a792801918bad2b7001cc1f462ff01","_score":2.690024,"_source":{"id":"37a792801918bad2b7001cc1f462ff01","title":"太原12級地震視頻有病毒","result":"假-謠言","source":"山西晚報","oriurl":"http://www.sxwbs.com/wb/news_13/sx_0/6820561.shtml","abstract":"太原市公安局網絡警察支隊民警表示,首先,單單是太原12級地震的說法就是假消息。全世界範圍內有記錄的最大震級的地震,是1960年智利地震,震級9.5級。在日常生活中,凡是能將發震時間、地點精確“預報”的,肯定都是謠言。因爲目前全世界的地震預報水平都無法達到這樣的精度。其次,這則所謂的病毒提醒消息,是一條藉助勒索病毒蹭熱點的謠言。警方提醒大家,要堅持不信謠、不傳謠,做到對來源不明的鏈接不點擊、不理睬。","updatedAt":"2018-08-31 08:10:30","date":"2017-05-22","author":"曹瀟","authordesc":"話食科普團隊成員"},"sort":[2.690024,22112]},{"_index":"jiaozhen","_type":"article","_id":"73a6c539f8a90a089d71629345ae2d6f","_score":2.6712425,"_source":{"id":"73a6c539f8a90a089d71629345ae2d6f","title":"新型冠狀病毒插入的基因片段是精心選擇,是人工病毒","result":"假-謠言","source":"騰訊較真","abstract":"首先,暗示“新冠肺炎病毒可能誕生於人爲基因改造”的印度作者已經撤稿了。其次,與HIV近似的基因片段並不是“插入”的,這四個短基因序列在很多物種中都存在,且保守。蝙蝠冠狀病毒spike也具有該結構,且100%同源。\n“插入增加蛋白的可塑性和水溶性”也毫無道理。新冠肺炎病毒主要是進入呼吸道,通過S蛋白接觸人的呼吸道尤其是下呼吸道黏膜上皮的ACE2受體。病毒是個生物體,不是個化合物,病毒沒有水溶性一說。病毒作爲生物體,不能溶於水,也沒有必要溶於水。","cover":"//jiaozhen-70111.picnjc.qpic.cn/3e5ed1cf9c6941c8a784442cfc943fdc","updatedAt":"2020-02-03 
20:52:56","date":"2020-02-03","author":"一節生薑","authordesc":"賓夕法尼亞大學醫學院病理及實驗醫藥系研究副教授"},"sort":[2.6712425,81514]},{"_index":"jiaozhen","_type":"article","_id":"0ee5b891a540fd29f88cc54ea95c0ce5","_score":2.6712425,"_source":{"id":"0ee5b891a540fd29f88cc54ea95c0ce5","title":"新型冠狀病毒感染者糞便裏查出病毒 ","result":"真-確有此事","source":"騰訊較真","oriurl":"https://view.inews.qq.com/a/20200202A04B6000","abstract":"深圳市第三人民醫院的研究發現某些新型冠狀病毒感染者的糞便裏檢測到病毒,此前美國第一例新型肺炎患者的醫學案例報道中,也提到患者的糞便中查到新型冠狀病毒,這表明病毒很大可能會通過“糞口傳播”。\n冠狀病毒也存在於蝙蝠的糞便中,在“非典”患者的糞便和尿液中,也有SARS病毒,在低溫下還可以長期存活。\n預防“糞口傳染”、“接觸傳染”,關鍵是要勤洗手,洗對手。","cover":"//jiaozhen-70111.picnjc.qpic.cn/442e51a694ea0a5faf43b1fc89c4e05a","updatedAt":"2020-02-03 20:53:06","date":"2020-02-02","author":"王思露","authordesc":"內蒙古營養健康促進會副會長、國家二級公共營養師"},"sort":[2.6712425,81483]},{"_index":"jiaozhen","_type":"article","_id":"2b75236662394dd34dedc5f12a86c8a7","_score":2.6712425,"_source":{"id":"2b75236662394dd34dedc5f12a86c8a7","title":"接觸沾染了病毒的物品屬於接觸傳播,也可能感染病毒","result":"真-確實如此","source":"騰訊醫典","oriurl":"https://h5.baike.qq.com/mobile/article.html?docid=tx20406001fqzx34&adtag=op.co.jiaoz.ydyw","abstract":"最新發布的《新型冠狀病毒感染的肺炎診療方案(試行第四版)》表明,新型冠狀病毒主要感染呼吸道上皮,但傳播途徑除了空氣傳播,還包括接觸傳播。這裏的接觸傳播包括直接接觸和間接接觸:直接接觸主要是指接觸到病人的身體的分泌物,例如眼淚、鼻涕、口水、痰液、嘔吐物、尿、便等;間接接觸則指的是接觸到沾染了病毒的物品,包括門把手、樓梯扶手、桌面、手機、玩具、筆記本電腦等。\n病毒在56℃ 下30分鐘纔會失活,氣溫是不可能達到這個溫度的,這也就意味着,粘在門把手等物體上的病毒是具有傳染性的,一旦用手觸碰後,如果再用手摸臉、揉眼睛等,就可能被傳染。","cover":"//jiaozhen-70111.picnjc.qpic.cn/eb81e82d17df4ef1ad4ef53de5a0da2a","updatedAt":"2020-02-03 
20:47:12","date":"2020-01-29","author":"","authordesc":""},"sort":[2.6712425,81348]},{"_index":"jiaozhen","_type":"article","_id":"44ff3c9c3d7a57d8acb22003eec27be6","_score":2.5557194,"_source":{"id":"44ff3c9c3d7a57d8acb22003eec27be6","title":"草莓內含有“諾如病毒”,吃了會中毒","result":"假-僞常識","source":"王思露營養師","oriurl":"http://view.inews.qq.com/a/20180119A0OS3A00","abstract":"“諾如病毒”的確是一種傳染性較強的腸道病毒,如果攝入會引起嘔吐、腹瀉等症狀,而它主要通過空氣和接觸傳播。食用草莓感染諾如病毒的原因主要是由於草莓被含有病毒的污染物侵襲,比如被感染的生物或含病毒的污水等,附着在草莓上,如果沒洗乾淨直接吃了的話就可能感染上諾如病毒。\n草莓本身並不含有“諾如病毒”,罪魁禍首並不是它,而是額外的污染物。\n日常生活中您喫的蔬果,無論是草莓還是橙子,喫之前徹底清洗問題不大。","updatedAt":"2019-03-22 08:48:27","date":"2018-01-19","author":"宋慧","authordesc":"上海闢謠平臺記者"},"sort":[2.5557194,67904]},{"_index":"jiaozhen","_type":"article","_id":"3f7766a2e99254d43e96f43479d7892b","_score":2.4069686,"_source":{"id":"3f7766a2e99254d43e96f43479d7892b","title":"新風系統會傳播病毒,導致感染新型冠狀病毒肺炎","result":"假-謠言","source":"騰訊較真","abstract":"在室內當感染者打噴嚏、說話或咳嗽時,病毒可以通過飛沫傳播和接觸傳播。已經進行的幾項研究數據證實,通風率越高,室內空氣中的飛沫顆粒數量衰減越快,缺乏通氣或低通氣率增加空氣傳播感染率和疾病暴發率。未發現任何研究提供了通氣率與空氣傳播感染之間關係的流行病學證據。\n新風系統是由送風系統和排風系統組成的一套獨立空氣處理系統,無論是管道式新風系統和無管道新風系統都通過新風機淨化室外空氣導入室內,通過管道將室內空氣排出,增加通風可以減少疾病感染機率,辦公環境要預防感染還是要帶口罩和勤洗手。","cover":"//jiaozhen-70111.picnjc.qpic.cn/3LesBmNMRfSWNGojnJ9QwR","updatedAt":"2020-02-05 01:40:20","date":"2020-02-05","author":"王宇歌","authordesc":"美國國立衛生研究院博士後研究員,免疫學博士"},"sort":[2.4069686,81584]}]}

 


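Article content display

Now that we have the article list, clicking a title opens the full article. We need to fetch that content as well for the result to be complete.

1. Analysis

Clicking into an article triggers an /article request. This is easy to understand: we take an article id from the article list and request the following URL to get the article content: https://vp.fact.qq.com/article?id=7b962367fdc55a96e6e94d0d2822ae82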
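A minimal sketch of turning an id from the list into the article URL follows. The class name ArticleUrl and the method forId are hypothetical. The check that an id is exactly 32 lowercase hexadecimal characters is an assumption based on the sample ids in this article; relax it if the platform turns out to use other formats.

```java
import java.util.regex.Pattern;

// Sketch: build the /article URL for an id taken from the list or search response.
final class ArticleUrl {

    static final String ENDPOINT = "https://vp.fact.qq.com/article";

    // Assumption: ids look like the samples in this article (32 lowercase hex characters).
    private static final Pattern ID_FORMAT = Pattern.compile("[0-9a-f]{32}");

    static String forId(String id) {
        if (id == null || !ID_FORMAT.matcher(id).matches()) {
            throw new IllegalArgumentException("Unexpected article id: " + id);
        }
        return ENDPOINT + "?id=" + id;
    }

    public static void main(String[] args) {
        System.out.println(forId("7b962367fdc55a96e6e94d0d2822ae82"));
    }
}
```

Validating the id before concatenating it keeps stray characters from a malformed response out of the URL.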

 

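The article content falls into roughly three parts: 1. the circulating claim; 2. Jiaozhen's verdict; 3. further background

Items 1, 2, and 3 in the two figures above (the rendered page and the page source) correspond one to one.

2. Display options

I have two approaches:

(1) The simple one: after fetching the article list and its ids, when the user taps a title, send them straight to the real page:

https://vp.fact.qq.com/article?id=xxxxxxxx (viable)

For a mini program, a personal mini program can use a customer-service conversation to open the link, and an enterprise mini program can display the web page directly. This is simpler, and because it lands on Tencent's own page it is also better from a copyright standpoint. (viable)

(2) Parse the three parts above faithfully with regular expressions. For a website, you can simply extract the outermost div. (viable)

For a mini program, you can likewise extract the outermost div and render the HTML with the wxParse plugin. (viable)

Alternatively, parse each part individually and return the pieces to the mini program page (slightly more complex; not recommended).

I will not show the regular expressions here. If you need them, leave a comment below and I will add them.

The utility class and the complete code used for crawling follow.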

 

III. Utility class

For the utility class used here, see: https://blog.csdn.net/qq_27471405/article/details/104140618

IV. Complete code



import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.common.apiV2.beans.HttpPojo;

import org.springframework.stereotype.Service;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;



/**
 * Created by yjl on 2020-02-01.
 */
@Service("WuhanService")
public class WuhanService {
    public static void main(String[] args) {
        //getTxFactIndexData("0");
        //getTxFactIndexData("1");
        getTxFactSearchResultByTitle("0","病毒");
    }




    /**
     * Tencent Jiaozhen fact-checking platform (Real-Time Rumor Debunking for the Novel Coronavirus Pneumonia): article list data
     * @param pageNo page number, starting from 0
     * @return the JSON payload with the JSONP wrapper stripped
     */
    public static String getTxFactIndexData(String pageNo){
        //String url="https://vp.fact.qq.com/home";
        //String url="https://vp.fact.qq.com/loadmore?artnum=0&page=0&_=1580922845373&callback=jsonp0";
        String url="https://vp.fact.qq.com/loadmore";
        if(pageNo==null||"".equals(pageNo)){ //default to the first page (page 0) when no page number is passed
            pageNo="0";
        }
        String jsonpNum = "jsonp"+pageNo;
        Map<String, Object> paramObj = new HashMap<>();
        paramObj.put("artnum","0");
        paramObj.put("page",pageNo);
        paramObj.put("_",System.currentTimeMillis());
        paramObj.put("callback",jsonpNum);

        //simulate a browser request
        HttpPojo httpPojo = new HttpPojo();
        httpPojo.setHttpHost("vp.fact.qq.com");
        httpPojo.setHttpAccept("text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3");
        httpPojo.setHttpConnection("keep-alive");
        httpPojo.setHttpUserAgent("Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3724.8 Mobile Safari/537.36");
        httpPojo.setHttpReferer("https://vp.fact.qq.com/home");
        httpPojo.setHttpOrigin("https://vp.fact.qq.com/home");

        String htmlResult = httpSendGet(url, paramObj, httpPojo); //raw JSONP response body
        //System.out.println(htmlResult);
        //strip the JSONP wrapper, e.g. jsonp0({...}) becomes {...}
        htmlResult = getRegContent(jsonpNum+"\\((.*?)\\)$", htmlResult, 1);
        System.out.println(htmlResult);

        //iterate over the entries to store them in a database, Redis, etc.
        /*JSONObject dataJo = JSONObject.parseObject(htmlResult);
        String content = dataJo.getString("content");//all article entries
        JSONArray array = JSONArray.parseArray(content);
        for (int i = 0; i < array.size(); i++) {
            JSONObject tripJo = JSONObject.parseObject(array.getString(i));
            String title = tripJo.getString("title");
            System.out.println("title:"+title);

            //persist to the database
        }*/

        return htmlResult;
    }


    /**
     * Tencent Jiaozhen fact-checking platform (Real-Time Rumor Debunking for the Novel Coronavirus Pneumonia): search the article list by keyword
     * @param pageNo page number, starting from 0
     * @param titleText search keyword
     * @return the JSON payload with the JSONP wrapper stripped, or "fail" if an exception occurs before the response is read
     */
    public static String getTxFactSearchResultByTitle(String pageNo,String titleText){
        String htmlResult="fail";
        try {
            //String url="https://vp.fact.qq.com/searchresult?title=%E7%97%85%E6%AF%92&num=0&_=1580925707024&callback=jsonp1";
            String url="https://vp.fact.qq.com/searchresult";
            if(pageNo==null||"".equals(pageNo)){ //default to the first page (page 0) when no page number is passed
                pageNo="0";
            }
            String jsonpNum = "jsonp"+(Integer.parseInt(pageNo)+1);
            Map<String, Object> paramObj = new HashMap<>();
            paramObj.put("title", URLEncoder.encode(titleText, "utf-8"));
            paramObj.put("num",Integer.parseInt(pageNo)*20);
            paramObj.put("_",System.currentTimeMillis());
            paramObj.put("callback",jsonpNum);

            //simulate a browser request
            HttpPojo httpPojo = new HttpPojo();
            httpPojo.setHttpHost("vp.fact.qq.com");
            httpPojo.setHttpAccept("text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3");
            httpPojo.setHttpConnection("keep-alive");
            httpPojo.setHttpUserAgent("Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3724.8 Mobile Safari/537.36");
            httpPojo.setHttpReferer("https://vp.fact.qq.com/home");
            httpPojo.setHttpOrigin("https://vp.fact.qq.com/home");

            htmlResult = httpSendGet(url, paramObj, httpPojo); //raw JSONP response body
            //System.out.println(htmlResult);
            //strip the JSONP wrapper, e.g. jsonp1({...}) becomes {...}
            htmlResult = getRegContent(jsonpNum+"\\((.*?)\\)$", htmlResult, 1);
            System.out.println(htmlResult);

            //iterate over the entries to store them in a database, Redis, etc.
            /*JSONObject dataJo = JSONObject.parseObject(htmlResult);
            String content = dataJo.getString("content");//all search hits
            JSONArray array = JSONArray.parseArray(content);
            for (int i = 0; i < array.size(); i++) {
                JSONObject tripJo = JSONObject.parseObject(array.getString(i));
                String source = tripJo.getString("_source");
                JSONObject sourceJo = JSONObject.parseObject(source);
                String title = sourceJo.getString("title");
                System.out.println("title:"+title);

                //persist to the database
            }*/


        }catch (Exception e){
            e.printStackTrace();
        }
        return htmlResult;
    }



    /**
     * Sends an HTTP GET request and returns the response body
     * @param url endpoint URL without the query string
     * @param paramObj query parameters
     * @param httpPojo request header values
     * @return the response body (empty if the request fails before any data is read)
     */
    private static String httpSendGet(String url, Map paramObj, HttpPojo httpPojo){
        String result = "";
        String urlName = url + "?" + parseParam(paramObj);
        //System.out.println("urlName:"+urlName);
        BufferedReader in=null;
        try {

            URL realURL = new URL(urlName);
            URLConnection conn = realURL.openConnection();
            //send a randomly generated IP in the forwarding headers
            String ip = randIP();
            System.out.println("Current spoofed IP: "+ip);
            conn.setRequestProperty("X-Forwarded-For", ip);
            conn.setRequestProperty("HTTP_X_FORWARDED_FOR", ip);
            conn.setRequestProperty("HTTP_CLIENT_IP", ip);
            conn.setRequestProperty("REMOTE_ADDR", ip);
            conn.setRequestProperty("Host", httpPojo.getHttpHost());
            conn.setRequestProperty("accept", httpPojo.getHttpAccept());
            conn.setRequestProperty("connection", httpPojo.getHttpConnection());
            conn.setRequestProperty("user-agent", httpPojo.getHttpUserAgent());
            conn.setRequestProperty("Referer",httpPojo.getHttpReferer()); //spoofed referrer
            conn.setRequestProperty("Origin", httpPojo.getHttpOrigin()); //spoofed origin
            conn.connect();
            Map<String, List<String>> map = conn.getHeaderFields();
            for (String s : map.keySet()) {
                //System.out.println(s + "-->" + map.get(s));
            }
            in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "utf-8"));
            String line;
            while ((line = in.readLine()) != null) {
                result += "\n" + line;
            }


        } catch (IOException e) {
            e.printStackTrace();
        }finally {
            if (in!=null){
                try {
                    in.close();
                }catch (Exception e){
                    e.printStackTrace();
                }

            }
        }
        return result;
    }


    /**
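     * Serializes the parameter map into a query string (key=value pairs joined by "&", with a trailing "&")
     * @param paramObj query parameters
     * @return the query string, or an empty string for a null or empty map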
     */
    public static String parseParam(Map paramObj){
        String param="";
        if (paramObj!=null&&!paramObj.isEmpty()){
            for (Object key:paramObj.keySet()){
                String value = paramObj.get(key).toString();
                param+=(key+"="+value+"&");

            }
        }
        return param;
    }

    /**
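     * Generates a random IPv4 address to send in the forwarding headers
     * @return a dotted-quad address with each octet between 1 and 255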
     */
    public static String randIP() {
        Random random = new Random(System.currentTimeMillis());
        return (random.nextInt(255) + 1) + "." + (random.nextInt(255) + 1)
                + "." + (random.nextInt(255) + 1) + "."
                + (random.nextInt(255) + 1);
    }

    /**
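     * Extracts data with a regular expression
     * @param reg regular expression
     * @param content text to search
     * @param index capture group to return
     * @return the requested group of the last match, or an empty string if nothing matches
     */
    public static String getRegContent(String reg,String content,int index){
        Pattern pattern = Pattern.compile(reg); // compile the regular expression into a Pattern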
        Matcher matcher = pattern.matcher(content);
        String group="";
        while (matcher.find()){
            group= matcher.group(index);
            //System.out.println(group);
        }
        return group;
    }

}
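The parseParam helper in the listing above concatenates values as they are and leaves a trailing "&", which is why the search method has to URL-encode the keyword itself. As a hedged alternative, the sketch below encodes every key and value and joins the pairs without a trailing separator. The class name QueryString and the method of are hypothetical, and this is one possible design rather than the author's utility.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: turn a parameter map into an encoded query string without a trailing '&'.
final class QueryString {

    static String of(Map<String, ?> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, ?> entry : params.entrySet()) {
            joiner.add(encode(entry.getKey()) + "=" + encode(String.valueOf(entry.getValue())));
        }
        return joiner.toString();
    }

    private static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output order is predictable.
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("artnum", "0");
        params.put("page", "0");
        params.put("_", System.currentTimeMillis());
        params.put("callback", "jsonp0");
        System.out.println("https://vp.fact.qq.com/loadmore?" + of(params));
    }
}
```

With this helper the raw keyword would be put into the map directly; encoding it beforehand as the original code does would encode it twice.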

 


More from this series:

https://blog.csdn.net/qq_27471405/category_9693036.html
