DedeCMS collection: fixing the failure to collect URLs whose port is not 80

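DedeCMS collection rarely has to deal with source URLs that carry a port, but the few URLs that do cannot be collected at all. This post summarizes the fix for DedeCMS failing to collect from URLs whose port is not 80: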

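Problem description: the URL being collected includes a port (the real domain has been replaced with xxx so this does not look like promotion):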

Test collection URL: http://www.xxx.com:89/index.php/main/news/index.html?channel_id=104&page=1

The list test returns an array of links that have all lost the port:

List URL tested: http://www.xxx.com:89/index.php/main/news/index.html?channel_id=104&page=1
Array
(
    [0] => Array
        (
            [title] => 講座回放|施奠東—西湖,世界風景園林的
            [link] => http://www.xxx.com/index.php/main/news/15529.html
            [image] => http://www.xxx.com/uploadfiles/articles/20190528/15529.png
        )

    [1] => Array
        (
            [title] => 喜報|恭賀我院2019年度西湖杯榮獲佳績!
            [link] => http://www.xxx.com/index.php/main/news/15528.html
            [image] => http://www.xxx.com/uploadfiles/articles/20190522/15528.jpg
        )

    [2] => Array
        (
            [title] => 講座預告|西湖——世界風景園林的傑出範
            [link] => http://www.xxx.com/index.php/main/news/15526.html
            [image] => http://www.xxx.com/uploadfiles/articles/20190516/15526.jpg
        )

    [3] => Array
        (
            [title] => 講座回放|胡理琛—西湖七十年流變憶勝
            [link] => http://www.xxx.com/index.php/main/news/15524.html
            [image] => http://www.xxx.com/uploadfiles/articles/20190513/15524.png
        )

    [4] => Array
        (
            [title] => 講座回放|彭嘉恆—“南師、禪及其在西方
            [link] => http://www.xxx.com/index.php/main/news/15518.html
            [image] => http://www.xxx.com/uploadfiles/articles/20190507/15518.png
        )

    [5] => Array
        (
            [title] => 講座預告|胡理琛—西湖七十年流變憶勝
            [link] => http://www.xxx.com/index.php/main/news/15516.html
            [image] => http://www.xxx.com/uploadfiles/articles/20190430/15516.jpg
        )

)

The resulting URLs are clearly wrong. They cannot be opened, so nothing can be collected from them.

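After some digging, the cause turned out to be the DedeCMS function that sets the HTML content and the source URL: it builds the site root from the host alone and never checks for a port.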
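To see why the port goes missing, it helps to look at what PHP's parse_url() returns. The sketch below is only an illustration using the placeholder URL from this post; the variable names are made up for the example. The 'host' component never contains the port, which lives in a separate 'port' key, so any code that rebuilds links from 'host' alone silently drops it.

```php
<?php
// Placeholder URL from this post (the real domain is not known).
$url = 'http://www.xxx.com:89/index.php/main/news/index.html?channel_id=104&page=1';
$parts = parse_url($url);

// 'host' holds only the host name; the port is a separate integer component.
$hostOnly = $parts['host'];

// Rebuilding the site root correctly means appending the port when one is present.
$hostWithPort = $parts['host'] . (isset($parts['port']) ? ':' . $parts['port'] : '');

echo $hostOnly, "\n";      // host without the port
echo $hostWithPort, "\n";  // host with the port restored
```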

include/dedehtml2.class.php

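In the SetSource function, at around line 79, add the port check (the line marked with the lyy comment in the code at the end of this post):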

Test again and it works: the links now open normally and can be collected.

The code:

function SetSource(&$html, $url = '', $linktype='')
    {
        $this->__construct();
        $this->CAtt = new DedeAttribute2();
        $url = trim($url);
        $this->SourceHtml = $html;
        $this->BaseUrl = $url;
        // Work out the document's path relative to the current URL
        $urls = @parse_url($url);
        // lyy: omit the port when it is 80 or absent, otherwise append it
        // (the isset() check avoids a stray ':' when the URL has no port)
        $port = (isset($urls['port']) && $urls['port'] != 80) ? ':'.$urls['port'] : '';
        $this->HomeUrl = $urls['host'].$port;
        $this->BaseUrlPath = $this->HomeUrl.$urls['path'];
        // Strip the file name, then any trailing slash, to get the base directory
        $this->BaseUrlPath = preg_replace("/\/([^\/]*)\.(.*)$/","/",$this->BaseUrlPath);
        $this->BaseUrlPath = preg_replace("/\/$/",'',$this->BaseUrlPath);
        if($linktype!='')
        {
            $this->GetLinkType = $linktype;
        }
        if($html != '')
        {
            $this->Analyser();
        }
    }
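
The port check can also be tried outside DedeCMS. The following is a minimal sketch, assuming a hypothetical helper named build_home_url() that is not part of DedeCMS. It mirrors the idea of the lyy line and also covers URLs with no explicit port, for which parse_url() returns no 'port' key.

```php
<?php
// Hypothetical helper (not a DedeCMS function): returns "host" or "host:port".
function build_home_url($url)
{
    $parts = parse_url(trim($url));
    if ($parts === false || !isset($parts['host'])) {
        return ''; // malformed URL or no host component
    }
    // Port 80 is treated as the default and omitted; a missing port adds nothing.
    $port = (isset($parts['port']) && (int)$parts['port'] !== 80)
        ? ':' . $parts['port']
        : '';
    return $parts['host'] . $port;
}

echo build_home_url('http://www.xxx.com:89/index.php/main/news/index.html'), "\n";
echo build_home_url('http://www.xxx.com:80/index.php'), "\n";
echo build_home_url('http://www.xxx.com/index.php'), "\n";
```

Like the original patch, this sketch treats only port 80 as the default. A source site served over https on port 443 would need the same kind of check for 443.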

 
