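DedeCMS (織夢) collection rarely has to handle source URLs that include a port, but the few URLs that do cannot be collected. This is a summary of how to fix DedeCMS failing to collect from URLs whose port is not 80.
Problem description: the failure appears when the URL being collected carries a port (the domain is replaced with xxx here to avoid any appearance of promotion):
Test collection URL: http://www.xxx.com:89/index.php/main/news/index.html?channel_id=104&page=1
The links returned by the list test have lost the port; the result is an array of port-less URLs:
List URL tested: http://www.xxx.com:89/index.php/main/news/index.html?channel_id=104&page=1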
Array
(
    [0] => Array
        (
            [title] => 講座回放|施奠東—西湖,世界風景園林的
            [link] => http://www.xxx.com/index.php/main/news/15529.html
            [image] => http://www.xxx.com/uploadfiles/articles/20190528/15529.png
        )

    [1] => Array
        (
            [title] => 喜報|恭賀我院2019年度西湖杯榮獲佳績!
            [link] => http://www.xxx.com/index.php/main/news/15528.html
            [image] => http://www.xxx.com/uploadfiles/articles/20190522/15528.jpg
        )

    [2] => Array
        (
            [title] => 講座預告|西湖——世界風景園林的傑出範
            [link] => http://www.xxx.com/index.php/main/news/15526.html
            [image] => http://www.xxx.com/uploadfiles/articles/20190516/15526.jpg
        )

    [3] => Array
        (
            [title] => 講座回放|胡理琛—西湖七十年流變憶勝
            [link] => http://www.xxx.com/index.php/main/news/15524.html
            [image] => http://www.xxx.com/uploadfiles/articles/20190513/15524.png
        )

    [4] => Array
        (
            [title] => 講座回放|彭嘉恆—“南師、禪及其在西方
            [link] => http://www.xxx.com/index.php/main/news/15518.html
            [image] => http://www.xxx.com/uploadfiles/articles/20190507/15518.png
        )

    [5] => Array
        (
            [title] => 講座預告|胡理琛—西湖七十年流變憶勝
            [link] => http://www.xxx.com/index.php/main/news/15516.html
            [image] => http://www.xxx.com/uploadfiles/articles/20190430/15516.jpg
        )

)
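These URLs are clearly wrong. The site only answers on port 89, so the port-less links cannot be opened and nothing can be collected.
After some digging, the cause turned out to be the DedeCMS function that sets the HTML content and the source URL: it builds the base URL from the host alone and never checks for a port.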
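To make the cause concrete, here is a minimal sketch of what PHP's parse_url() returns for the test URL and how a base built from the host alone loses the port. The variable names $withoutPort and $withPort are illustrative only and do not appear in DedeCMS.

```php
<?php
// Placeholder test URL, mirroring the one used in this post.
$url = 'http://www.xxx.com:89/index.php/main/news/index.html?channel_id=104&page=1';
$parts = parse_url($url);

// parse_url() returns the host and the port as separate components.
echo $parts['host'], "\n"; // www.xxx.com
echo $parts['port'], "\n"; // 89

// A base built from the host alone silently drops the port ...
$withoutPort = $parts['host'];
// ... while appending the port keeps rebuilt links on the right server.
$withPort = $parts['host'] . ':' . $parts['port'];

echo 'http://' . $withoutPort . '/index.php/main/news/15529.html', "\n";
echo 'http://' . $withPort . '/index.php/main/news/15529.html', "\n";
```

The first rebuilt link matches the broken entries in the array output; the second is the address the server actually listens on.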
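In include/dedehtml2.class.php, inside the SetSource function (around line 79), add the port handling. The original screenshot marked it with a red box; it corresponds to the $port line and the changed HomeUrl assignment in the code below.
Test again and it works: the links now open normally and the content can be collected.
The code: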
function SetSource(&$html, $url = '', $linktype = '')
{
    $this->__construct();
    $this->CAtt = new DedeAttribute2();
    $url = trim($url);
    $this->SourceHtml = $html;
    $this->BaseUrl = $url;
    // Work out the document path relative to the current URL
    $urls = @parse_url($url);
    // lyy: omit the port when it is absent or 80, otherwise append it.
    // The isset() check matters: without it a URL with no port yields "host:".
    $port = (isset($urls['port']) && $urls['port'] != 80) ? ':' . $urls['port'] : '';
    $this->HomeUrl = $urls['host'] . $port;
    $this->BaseUrlPath = $this->HomeUrl . $urls['path'];
    // Strip a trailing "file.ext" segment, then any trailing slash
    $this->BaseUrlPath = preg_replace("/\/([^\/]*)\.(.*)$/", "/", $this->BaseUrlPath);
    $this->BaseUrlPath = preg_replace("/\/$/", '', $this->BaseUrlPath);
    if ($linktype != '')
    {
        $this->GetLinkType = $linktype;
    }
    if ($html != '')
    {
        $this->Analyser();
    }
}
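The port logic can also be tried on its own, outside DedeCMS. The sketch below uses a hypothetical helper, host_with_port(), which is not part of DedeCMS. It assumes that a URL without an explicit port should keep the bare host (parse_url() returns no 'port' key in that case) and, as an extension beyond the patch above, that 443 may be dropped for https in the same way 80 is dropped for http.

```php
<?php
/**
 * Hypothetical helper (not part of DedeCMS): returns "host" or "host:port",
 * omitting the port when it is absent or is the default for the scheme.
 */
function host_with_port($url)
{
    $parts = parse_url(trim($url));
    if ($parts === false || !isset($parts['host'])) {
        return ''; // malformed URL or no host component
    }
    // Default ports that may safely be left out (assumption: http/https only).
    $defaults = array('http' => 80, 'https' => 443);
    $scheme = isset($parts['scheme']) ? strtolower($parts['scheme']) : 'http';
    $port = isset($parts['port']) ? (int) $parts['port'] : null;

    if ($port === null || (isset($defaults[$scheme]) && $defaults[$scheme] === $port)) {
        return $parts['host'];
    }
    return $parts['host'] . ':' . $port;
}

echo host_with_port('http://www.xxx.com:89/index.php/main/news/index.html'), "\n"; // www.xxx.com:89
echo host_with_port('http://www.xxx.com:80/index.html'), "\n";                     // www.xxx.com
echo host_with_port('http://www.xxx.com/index.html'), "\n";                        // www.xxx.com
echo host_with_port('https://www.xxx.com:8443/'), "\n";                            // www.xxx.com:8443
```

Running it against the test URL from this post should give www.xxx.com:89, which is the value HomeUrl needs for the collected links to resolve.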