爬蟲算法

多次被人問到這個問題，看來需要做一點說明，這裏簡單介紹一下搜索引擎的機器爬蟲的製作和一些基本要注意的事項。



說的簡單易懂一些，網絡爬蟲跟你使用的〖離線閱讀〗工具差不多。說離線，其實還是要跟網絡聯結，否則怎麼抓東西下來？



那麼不同的地方在哪裏？



1】 網絡爬蟲高度可配置性。


2】 網絡爬蟲可以解析抓到的網頁裏的鏈接


3】 網絡爬蟲有簡單的存儲配置


4】 網絡爬蟲擁有智能的根據網頁更新分析功能


5】 網絡爬蟲的效率相當的高



那麼依據特徵，其實也就是要求了，如何設計爬蟲呢？要注意哪些步驟呢？



1】 url 的遍歷和紀錄


這點 larbin 做得非常的好，其實對於url的遍歷是很簡單的，例如：


cat [what you got]| tr /" //n | gawk '{print $2}' | pcregrep ^http://

 


就可以得到一個所由的 url 列表



2】多進程 VS 多線程


各有優點了，現在一臺普通的PC 例如 booso.com 一天可以輕鬆爬下5個G的數據。大約20萬網頁。



3】時間更新控制


最傻的做法是沒有時間更新權重，一通的爬，回頭再一通的爬。


通常在下一次爬的的數據要跟上一次進行比較，如果連續5次都沒有變化，那麼將爬這個網頁的時間間隔擴大1倍。



如果一個網頁在連續5次爬取的時候都有更新，那麼將設置的爬取時間縮短爲原來的1／2。



注意，效率是取勝的關鍵之一。



4】爬的深度是多少呢？


看情況了。如果你比較牛，有幾萬臺服務器做網絡爬蟲，我勸您跳過這一點。


如果你同我一樣只有一臺服務器做網絡爬蟲，那麼這樣一個統計您應該知道：



網頁深度：網頁個數：網頁重要程度


0 : 1 : : 10


1 :20 : :8


2: :600: :5


3: :2000: :2


4 above: 6000: 一般無法計算



好了，爬到三級就差不多了，再深入一是數據量擴大了3／4倍，二是重要度確下降了許多，這叫做“種下的是龍種，收穫的是跳蚤。”



5】爬蟲一般不之間爬對方的網頁，一般是通過一個Proxy出去，這個proxy有緩解壓力的功能，因爲當對方的網頁沒有更新的時候，只要拿到 header 的 tag就可以了，沒有必要全部傳輸一次了，可以大大節約網絡帶寬。



apache webserver裏面紀錄的 304 一般就是被cache的了。



6】請有空的時候照看一下robots.txt



7】存儲結構。


這個人人見智，google 用 gfs 系統，如果你有7／8臺服務器，我勸你用NFS系統，要是你有70／80個服務器的話我建議你用afs 系統，要是你只有一臺服務器，那麼隨便。



給一個代碼片斷，是我寫的新聞搜索引擎是如何進行數據存儲的：



NAME=`echo $URL |perl -p -e 's/([^/w/-/./@])/$1 eq "/n" ? "/n":sprintf("%%%2.2x",ord($1))/eg'`


mkdir -p $AUTHOR


newscrawl.pl $URL --user-agent="news.booso.com+(+http://booso.com

)" -outfile=$AUTHOR/$NAME


<%

' BSD 2.0 license,

' http://www.opensource.org/licenses/bsd-license.php


'

'轉貼或修改請保留bug提供人和bug修復人的信息,包括郵箱和網站名稱,

'如bug提供人和bug修復人另有要求的除外

'可以在 http://www.vtalkback.com/site-map
 對代碼進行測試

'版本 0.1.2

'------------------------------變量初始化-----------------------------------------------------------

ver="0.1.2"

'script configuration

'debug =0

'Response.CharSet="gb2312";

'current_charset="utf-8"

current_charset="gb2312"    '必須使用小寫

'Url="http://www.vtalkback.com
"

'Url="http://www.jwmodel.com
"

Url=request("url")

Url=trim(url)

if right(Url,1)="/" then

  Url=left(url,len(url)-1)

end if


first_page=Url

'response.write first_page& " "& url&" "

'response.flush


'first_page=""

none_http_url=right(url,len(url)-len("http://
"))     '生成無 http://的url



root_url_len=instr(none_http_url,"/")


if(root_url_len=0) then

  root_url_len=len(none_http_url)

end if


root_url="http://
" & left(none_http_url,root_url_len)  '去掉尾部的 '/'

if right(root_url,1)="/" then

  root_url=left(root_url,len(root_url)-1)

end if


'response.write root_url & " <br>"

'response.flush


str_depth = request("url_depth")

FinalDepth=CInt(str_depth)

'---------------Depth limit----------------------

'if FinalDepth>2 then 

'  FinalDepth=2

'end if

'FinalDepth=1


'response.write "str_depth =" & str_depth &  " <br>"

'response.flush


'------------------------------------------------

LimitUrl=1000

'leave sitemapDate empty if you want sitemapDate=now

sitemapDate=""

'sitemapPriority possible value are from 0.1 to 1.0

sitemapPriority="0.7"

'sitemapChangefreq possible value are: always, hourly, daily, weekly, monthly, yearly, never

sitemapChangefreq="monthly"

'see http://www.time.gov/
 for utcOffset

utcOffset=1


Dim objRegExp,objUrlArchive,strHTML,objMatch,crawledUrlArchive,BytesStream,CharsetRegExp,CharsetUrlArchive,oHttp

Set oHttp=Server.CreateObject("WinHttp.WinHttpRequest.5.1")

Server.ScriptTimeout=300

set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")

Set BytesStream   =   Server.CreateObject("ADODB.Stream")   


Set objUrlArchive=Server.CreateObject("Scripting.Dictionary")

Set crawledUrlArchive=Server.CreateObject("Scripting.Dictionary")

Set CharsetUrlArchive=Server.CreateObject("Scripting.Dictionary")

Set objRegExp = New RegExp

objRegExp.IgnoreCase = True

objRegExp.Global = True


'you can change this patterns for your needs

'objRegExp.Pattern = "href=(.*?)[/s|>]"

'objRegExp.Pattern

 = "<!--(.*?)-->|<(/s*)a(/s*)href=(.*?)[" & chr(34) & 

"](.*?)" & "[" & chr(34)&"]"

objRegExp.Pattern = "<!--(.*?)-->|<(/s*)a(/s*)href=(.*?)[/s|>]"

Set CharsetRegExp = New RegExp

CharsetRegExp.IgnoreCase = True

CharsetRegExp.Global = True

CharsetRegExp.Pattern = "<META(.*?)Content-Type(.*?)>"

'to remove elements from html urls

RemoveText=array("<",">","a href=",chr(34),"'","href=")

'to exclude elements from urls

ExcludeUrl=array("mailto:","javascript:",".css",".ico","file:")


'if you want sitemapDate=now

if sitemapDate="" then filelmdate=now()


sitemapDate=iso8601date(filelmdate,utcOffset)


'------------------------------</變量初始化>-----------------------------------------------------------

crawl first_page,1


For Depth=0 to FinalDepth-1


  arrUrl=objUrlArchive.Keys

  arrDepth=objUrlArchive.Items

  For LoopUrl= 0 to ubound(arrurl)

    willCrawlUrl=url&"/"&arrUrl(LoopUrl)

    willCrawldepth=arrDepth(LoopUrl)


'    response.write "willCraw="& willCrawlUrl &" depth="&willCrawldepth&" <br>"

'    response.flush

    

    if crawledUrlArchive.Exists(willCrawlUrl)=false and willCrawldepth < FinalDepth then 

'      response.write "Craw="& willCrawlUrl &" depth="&willCrawldepth&" <br>"

'      response.flush

      crawledUrlArchive.add willCrawlUrl,1

      'if ubound(arrurl)>max_url_count then

      '  Exit For

      'end if

      'debugging

      'response.write "<!-- pagefound='"&loopurl&"'-->"

      crawl willCrawlUrl,willCrawldepth+1      

      if objUrlArchive.Count-1>LimitUrl then exit for   'if you want to limit the url number

    end if


  Next

  erase arrUrl

  erase arrDepth

Next


' create the xml on the fly

'arrDepth=objUrlArchive.Items

'response.write

  "<textarea rows=" &chr(34)& "93" &chr(34)& "name=" 

&chr(34)& "S1" &chr(34)& "cols=" &chr(34)& "138"

 &chr(34)& ">"


writeHead  '輸出文件頭


arrCharset=CharsetUrlArchive.items

arrurl=objUrlArchive.Keys

For LoopUrl=0 to ubound(arrurl)

'  response.write "<loc>"&server.htmlencode(url&"/"&arrUrl(LoopUrl))&"</loc>"     '輸出url

'  response.write


"<loc>&&&&"&server.urlEncode(url&"/"&arrUrl(LoopUrl))&"</loc>"

   '輸出url

  cur_charset=arrCharset(LoopUrl)

  cur_url=arrUrl(LoopUrl)

  writelink cur_url,cur_charset

Next


response.write  Chr(13) & Chr(10)

response.write "</urlset>"

response.write  Chr(13) & Chr(10)


'response.write "</textarea>"

'arrUrl=objUrlArchive.Keys


'response.write "<!-- pagefound='"&ubound(arrurl)+2&"'--> " 

'---------------------------<清除環境變量>-----------------------------------------------

erase arrUrl

erase arrCharset  


'erase arrDepth

objUrlArchive.RemoveAll()

crawledUrlArchive.RemoveAll()

CharsetUrlArchive.RemoveAll()

Set BytesStream   =   Nothing   

set xmlhttp = nothing

set oHttp = nothing

'---------------------------</清除環境變量>-----------------------------------------------

'*************************************************************************************************************

function writeHead()

  response.ContentType = "text/xml; charset=gb2312"

  response.write "<?xml version='1.0' encoding='gb2312'?>"

  response.write Chr(13) & Chr(10)

  response.write "<!-- generator='http://www.vtalkback.com/sitemap/&#39;
 ver='" &ver &"'-->"  

  response.write  Chr(13) & Chr(10) 

  response.write "<!-- pagefound='"&ubound(objUrlArchive.Keys)+2&"'--> "  

  response.write  Chr(13) & Chr(10)

  response.write "<urlset xmlns='http://www.google.com/schemas/sitemap/0.84&#39;
>"

  response.write  Chr(13) & Chr(10)


  response.write "<url>"

  response.write  Chr(13) & Chr(10)

  response.write "<loc>"&url&"/</loc>"

  response.write  Chr(13) & Chr(10)

  response.write "<lastmod>"&sitemapDate&"</lastmod>"

  response.write  Chr(13) & Chr(10)

  response.write "<priority>"&sitemapPriority&"</priority>"

  response.write  Chr(13) & Chr(10)

  response.write "<changefreq>"&sitemapChangefreq&"</changefreq>"

  response.write  Chr(13) & Chr(10)

  response.write "</url>"

  response.write  Chr(13) & Chr(10)

end function

'*************************************************************************************************************

Function writeLink(write_url,write_charset)

'  response.write "<loc>"&write_charset&"</loc>"         


  response.write "<url>"

  response.write  Chr(13) & Chr(10)

  

  if write_charset="gb2312" then      

'    write_url=num2gb(write_url)  

    write_url=urldecode(write_url)

'    response.write write_charset

  else  'utf8

'    response.write write_charset&"aa"

    write_url=urldecode(write_url)  'new_url=url2utf(cur_url) new_url=num2gb(cur_url) new_url=gb2utf(new_url)  

  end if

'  write_url=replace(write_url,"&amp;",GB2UTF8("&"))

    


  response.write "<loc>"&url&"/"&write_url&"</loc>"         '輸出url

  response.write  Chr(13) & Chr(10)

'-------------------------------------------------------------------------------------

  response.write "<lastmod>"&sitemapDate&"</lastmod>"

  response.write  Chr(13) & Chr(10)

  response.write "<priority>"&sitemapPriority&"</priority>"

  response.write  Chr(13) & Chr(10)

  response.write "<changefreq>"&sitemapChangefreq&"</changefreq>"

  response.write  Chr(13) & Chr(10)

  response.write "</url>"

  response.write  Chr(13) & Chr(10)

end function


'***********************************爬行**************************************************************************

Function crawl(sub_url,crawl_depth) 

  sub_url=urldecode(sub_url)

'----------------<sub_url處理>-------------------------------------------

'--------------不能處理重定向bug---由(http://www.xiaoyezi.com
 )報告---------------------------------

'----------------(http://www.vtalkback.com
 , [email protected]) 修補----------------------------------------

'  sub_url1=sub_url

'  response.write "sub_url="& sub_url&"<br>  "

  sub_url=GetAbsoluteURL(sub_url,1)  '讀取重定向後的 url

'  if sub_url1<>sub_url then

'    response.write  sub_url1&" "&sub_url

'    response.flush  

'  end if 

'------------</sub_url處理>-------------------------------------------


'------------<sub_dir處理>-------------------------------------------

'  response.write "sub_url="& sub_url&"<br>  "&len(sub_url)&" "& len(url) &"</br>"

'  response.flush  

  sub_dir=right(sub_url,len(sub_url) - len(url))  ' http://www.vtalkback.com/blog
 -> blog

'  response.write "sub_dir="& sub_dir&" <br>"


  if instr(sub_dir,"?")>0 then

    sub_dir=left(sub_dir,InStr(sub_dir,"?")-1)

  end if

  

  if instr(sub_dir,".")>0 or instr(sub_dir,"?")>0 or instr(sub_dir,"=")>0 or instr(sub_dir,"#")>0 then

    sub_dir=left(sub_dir,InStrRev(sub_dir,"/"))  

  end if

  

  if sub_dir<>"" and right(sub_dir ,1)<>"/" then

    sub_dir=sub_dir&"/"

  end if 

  

'  response.write "sub_url="& sub_url&" <br>"

'  response.flush

'------------</sub_dir處理>----------------------------------------------------

'  response.write "sub_url="& sub_url&" <br>"

'  response.flush

  xmlhttp.open "GET", sub_url, false

  xmlhttp.send ""


'------------------------------------<取網頁charset值>------------------------------------------------

  if   XmlHttp.readystate <> 4 then 

    exit function

  end if


  htmlText   =   xmlhttp.responseText  

  if htmlText="" then

    exit function

  end if

  

  For Each CharsetMatch in CharsetRegExp.Execute(htmlText)

    CharsetMatch=lcase(CharsetMatch)

    char_index=instr(CharsetMatch,"charset=")

    if char_index>0 then

      CharsetMatch=right(CharsetMatch,len(CharsetMatch)-char_index-7)

'      CharsetMatch=trim(CharsetMatch)

      char_index=instr(CharsetMatch,chr(34))

      CharsetMatch=left(CharsetMatch,char_index-1) '去掉雙引號

      current_charset=trim(CharsetMatch)

'      current_charset=CharsetMatch

  

    end if

  next

  

'  response.write "--------------" & current_charset &"--------------------- <br>"

'  response.flush  

'------------------------------------</取網頁charset值>------------------------------------------------

  


'-------------------------------<編碼讀取>------------------------------------------------------------------------------

'-------------------不能處理gb2312的bug---由([email protected])(http://www.sijiholiday.com
 )報告-------------------

'-------------------------(http://www.vtalkback.com
)([email protected]) 修補------------------------------------------


'  strHTML=bytes2BSTR(xmlHttp.responseBody)   


  BytesStream.Type = 1

  BytesStream.Mode =3

  BytesStream.Open

  BytesStream.Write xmlHttp.responseBody

  BytesStream.Position = 0

  BytesStream.Type = 2

  BytesStream.Charset = current_charset 

'    strHTML = xmlhttp.responseText

     strHTML=BytesStream.ReadText   

    BytesStream.close   


'    response.binarywrite   htmlbody 

'  response.write(strHtml)

'  response.flush

'----------------------------------</編碼讀取>-----------------------------------------------------------------------------


  For Each objMatch in objRegExp.Execute(strHTML)

'    response.write objMatch & "<br>"

'    response.flush


    if left(objMatch,4)<>"<!--" then


    for i=0 to ubound(excludeUrl)

      if instr(objmatch,excludeUrl(i))>0 then objmatch=""

    next


    if objmatch<>"" then

'      response.write "objmatch1="& objMatch& "    <br>"

    

'      response.write "obj match is   "&right(objMatch,len(objMatch)-1)&"<br>"

'      response.flush

'------------------------<url整理>---------------------------------------------------------------

'---------------不能處理gb2312的bug---由([email protected])(http://www.sijiholiday.com
 )報告--------------------

'-------------------------(http://www.vtalkback.com
)([email protected]) 修補------------------------------------------

'      objMatch=server.htmlencode(objMatch)

'      response.write objMatch & "<br>"

'      response.flush


'      for i=0 to ubound(RemoveText)     '清除無效字符 chr(34),"'"

'        objMatch=replace(lcase(objMatch),lcase(RemoveText(i))," ")

'      next


      objMatch=lcase(objMatch)

      objMatch=replace(objMatch,chr(34)," ") '去掉url中的符號

      objMatch=replace(objMatch,"'"," ")

      objMatch=replace(objMatch,">"," ")

'      response.write "objmatch2="& objMatch& "    <br>"


'      str_index=instr(objMatch,chr(34))   '去掉引號和引號左邊的內容 

'      objMatch=right(objMatch,len(objMatch)-str_index)


      str_index=instr(objMatch,"=")     '去掉第一個等號和等號左邊的內容 

      objMatch=right(objMatch,len(objMatch)-str_index)


      objMatch=ltrim(objMatch)       '取出有效字符


'      response.write objMatch & "<br>"

'      response.flush


      str_index=instr(objMatch," ")     '去掉空格和空格右邊的內容


'      response.write objMatch &"  "&str_index &"<br>"

'      response.flush

      if str_index <> 0 then      

        objMatch=left(objMatch,str_index-1)

      end if

'      response.write objMatch & str_index &"<br>"

'      response.flush


'      str_index=instr(objMatch,chr(34))   '去掉引號和引號右邊的內容

'      objMatch=left(objMatch,str_index-1)

'------------------------</url整理>---------------------------------------------------------------


'------------------------<root

反斜線處

理>---------------------------------------------------------------------------------  '--------------------

不能處理/url格式的bug---由([email protected])(http://www.gamelee.cn
 )報告-----------------------

'-------------------------(http://www.vtalkback.com
)([email protected]) 修補------------------------------------------

  


      if left(objMatch,1)="/" then           '/blog --> http://www.vtalkback.com/blog
 

        objMatch=root_url & objMatch

      end if


'      response.write objMatch & "<br>"

'      response.flush

'------------------------</root反斜線處理>-------------------------------------------------------------------------


'--------------------------------<去掉root url>----------------------------------------------------------

'      response.write "url2="& url& "    <br>"

'      response.flush


      'in some cases this is better if left(objMatch,len(url))=Url then


      if left(objMatch,len(url))=Url then

        the_url=right(objMatch,len(objMatch) - len(url))

        if the_url<>"" and left(the_url,1)="/" then 

          the_url=right(the_url,len(the_url) - 1) '去掉左邊 "/"

        end if

        objMatch = the_url

        

      elseif left(objMatch,len(none_http_url))=none_http_url then

        the_url=right(objMatch,len(objMatch) - len(none_http_url))

        if the_url<>"" and left(the_url,1)="/" then 

          the_url=right(the_url,len(the_url) - 1) '去掉左邊 "/"

        end if

        objMatch = the_url


      elseif instr(objMatch,"http://
")=0 and objmatch<>"" then

        the_url=sub_dir&objMatch

        if the_url<>"" and left(the_url,1)="/" then 

          the_url=right(the_url,len(the_url) - 1) '去掉左邊 "/"

        end if

        objMatch = the_url

'        response.write "subdir="& sub_dir& "    <br>"

'        response.flush

        

        

      else  '(out of domain)

        objMatch=""

      end if

'--------------------------------</去掉root url>------------------------------------------------------

           end if  


    if objmatch<>"" then


'------------------------<&符號轉換>--------------------------------------------------------------------------------


      objMatch=replace(objMatch,"&","&amp;")      '& to &amp

      objMatch=replace(objMatch,"&amp;#","&#")

      objMatch=replace(objMatch,"&amp;amp;","&amp;")


      if right(objMatch,1)="/" then       '右邊的去掉 "/"

        objMatch=left(objMatch,len(objMatch)-1)

      end if


'      response.write objMatch & "<br>"

'      response.flush

'------------------------</&符號轉換>---------------------------------------------------------------------------


'--------------------------------<編碼處理>------------------------------------------------      

'注 如果原始頁面有%表示的url編碼,此時%會被轉換成爲%25


        if current_charset="gb2312"  then

'          objMatch=gb2num(objMatch)


'        response.write current_charset&" "& sub_url & "<br>"

'        response.write objMatch & "<br>"

        objMatch= server.urlEncode(objMatch)          

'        response.write objMatch &" " & "<br>"

'        response.flush

      else

'        response.write "url2 " & current_charset&" "&sub_url & "<br>"

'        response.write objMatch &" " & "<br>"

'        objMatch= urlDecode(objMatch)

        objMatch= server.urlEncode(objMatch)

'        response.write objMatch &" " & "<br>"

'        response.flush


'        objMatch=server.htmlencode(objMatch)  

'        objMatch=encodeURI(objMatch)

'        objMatch=UTF2GB(objMatch)

'          objMatch=gb2num(objMatch)

      end if


'--------------------------------</編碼處理>------------------------------------------------      

      

'      if objMatch<>newMatch then

'        response.write objMatch & "<br>"

'        response.write newMatch & "<br>"

'        response.flush

'      end if


      if objUrlArchive.Exists(objMatch)= false and the_url<>"" then

        objUrlArchive.Add objMatch,crawl_depth

        CharsetUrlArchive.Add objMatch,current_charset

'        response.write objMatch &"  "&sub_url&  "<br>"  '顯示url所在頁面----------

'        response.flush

        

'        writelink ObjMatch,current_charset        

      end if

    

    end if

    end if  

  Next

End Function


'*************************************************************************************************************


function gb2num(str)

  newStr=""

  for i=1 to len(str)   'gb2312處理

    c=mid(str,i,1)

    if asc(c)<0 then

      gb2312Code=ascW(c)

      if gb2312Code <0 then 

        gb2312Code =gb2312Code+65536

      end if

      newStr=newStr & "&#" & gb2312Code & ";"

    else

      newStr=newStr&c

    end if

  next 

  gb2num=newStr

end function

'*************************************************************************************************************

function url2utf(str)

  url2utf=decodeURI(str)

end function


'*************************************************************************************************************


Function URLDecode(enStr) '注 如果原始頁面有%表示的url編碼,到此時%會成爲%25,decode後還原爲%

  dim deStr 

  dim c,i,v 

  deStr="" 

  

  for i=1 to len(enStr) 

    c=Mid(enStr,i,1) 

    if c="%" then 

      v=eval("&h"+Mid(enStr,i+1,2))   'eval 計算一個表達式的值

      if v<128 then 

        deStr=deStr&chr(v) 

        i=i+2 

      else 

        

        if isvalidhex(mid(enstr,i,3)) then  '雙字節url符號

          if isvalidhex(mid(enstr,i+3,3)) then 

            v=eval("&h"+Mid(enStr,i+1,2)+Mid(enStr,i+4,2)) '+65536

            

            deStr=deStr& chr(v)


            i=i+5 

          else       '單個url符號

            v=eval("&h"+Mid(enStr,i+1,2)+cstr(hex(asc(Mid(enStr,i+3,1))))) 

            deStr=deStr&chr(v) 

            i=i+3 

          end if 

        else 

'          destr=destr&c 

        end if 

      end if 

    else 

      if c="+" then 

        deStr=deStr&" " 

      else 

        deStr=deStr&c 

      end if 

    end if 

  next 

'  response.write  Chr(13) & Chr(10)

'  response.write "enstr="&enStr  

'  response.write  Chr(13) & Chr(10)

'  response.write "destr="&deStr  

'  response.write  Chr(13) & Chr(10)

'  response.flush

  

  URLDecode=deStr 

end function 

'*************************************************************************************************************

function isvalidhex(str) 

  isvalidhex=true 

  str=ucase(str) 

  if len(str)<>3 then isvalidhex=false:exit function 

  if left(str,1)<>"%" then isvalidhex=false:exit function 

    c=mid(str,2,1) 

  if not (((c>="0") and (c<="9")) or ((c>="A") and (c<="Z"))) then isvalidhex=false:exit function 

    c=mid(str,3,1) 

  if not (((c>="0") and (c<="9")) or ((c>="A") and (c<="Z"))) then isvalidhex=false:exit function 

end function 


'*************************************************************************************************************


function num2gb(str)

  newStr=""

  for i=1 to len(str)

    c=mid(str,i,1)

    if c="&" and mid(str,i+1,1)="#" then

      num=""

      for j=i+2 to len(str)

        ch=mid(str,j,1)

        if ch=";" then

          i=j

          exit for

        end if

        num=num &ch

      next

      newStr=newStr & chrW(CLng(num)) 'GB2UTF8(chrW(CLng(num)))        

    else

      newStr=newStr & c

    end if

  next

  num2gb=newStr

end function

'*************************************************************************************************************

'Function GB2UTF(Chinese) 

'  For i = 1 to Len (Chinese) 

'   a = Mid(Chinese, i, 1) 

'   GB2UTF = GB2UTF & "&#x" & Hex(Ascw(a)) & ";" 

'  Next 

'End Function 

'*************************************************************************************************************

Function GB2UTF(Chinese) 

  For i = 1 to Len (Chinese) 

   a = Mid(Chinese, i, 1) 

   GB2UTF = GB2UTF &  Ascw(a)

  Next 

End Function 

'*************************************************************************************************************


function UTF2GB(UTFStr) 

    for Dig=1 to len(UTFStr) 

        if mid(UTFStr,Dig,1)="%" then 

            if len(UTFStr) >= Dig+8 then 

                GBStr=GBStr & ConvChinese(mid(UTFStr,Dig,9)) 

                Dig=Dig+8 

            else 

                GBStr=GBStr & mid(UTFStr,Dig,1) 

            end if 

        else 

            GBStr=GBStr & mid(UTFStr,Dig,1) 

        end if 

    next 

    UTF2GB=GBStr 

end function 

'*************************************************************************************************************


Function iso8601date(dLocal,utcOffset)  

  Dim d

  ' convert local time into UTC

  d = DateAdd("H",-1 * utcOffset,dLocal)


  ' compose the date

  iso8601date = Year(d) & "-" & Right("0" & Month(d),2) & "-" & Right("0" & Day(d),2)

'

 & "T" & _Right("0" & Hour(d),2) & ":" & Right("0" 

& Minute(d),2) & ":" & Right("0" & Second(d),2) & 

"Z"

End Function


'*************************************************************************************************************

Function GetAbsoluteURL(sUrl,iStep) '讀取重定向後的 url

  Dim bUrl,bDat

  GetAbsoluteURL=sUrl

  If iStep>15 Then

          Response.Write "遞歸嵌套超過15層" & "<br />"

    exit function

  End If


  If InStr(sUrl,"?")>0 THen

    Dim tmpUrl : tmpUrl=split(sUrl,"?")

    bUrl=tmpUrl(0)

    bDat=tmpUrl(1)

  Else

    bUrl=sUrl

    bDat=""

  End If


'  Response.Write "<p style=""border-top:solid 1px silver;padding:0px;margin:0px;"">"

'  Response.Write "正在準備獲取 " & sUrl & "<br />"


'  if bDat<>"" Then Response.Write "3 &nbsp;&nbsp;>>參數： " & bDat & "<br />"


  oHttp.Option(6)=0   '禁止自動Redirect

  oHttp.SetTimeouts 5000,5000,30000,5000

'  Response.Write burl&"<br /> "

  oHttp.Open "HEAD",sUrl,False

  On Error Resume Next

  oHttp.Send bDat


'  response.write  oHttp.responseText 

'  response.flush

'  Response.Write "  <br /> "

  

  If Err.Number<>0 Then

'    Response.Write "<font color=""red"">發生錯誤：" & Err.Description & "</font><br />"

    Err.Clear

'    GetAbsoluteURL=""

'    Set oHttp=Nothing

'    Response.Write "</p>"

    Exit Function

  End If

'  Response.Write " <br /> "


  On Error Goto 0

'  Response.Write "&nbsp;&nbsp;>>HTTP 狀態：" & oHttp.Status  & "<br />"


  If oHttp.Status<>200 And oHttp.Status<>302 and oHttp.Status<>301 Then

'    Response.Write "<font color=""red"">HTTP錯誤：" & oHttp.StatusText  & "</font><br />"

    Err.Clear

    GetAbsoluteURL=""

'    Set oHttp=Nothing

'    Response.Write "</p>"

    Exit Function

  End If

  Dim sLoca 

  On Error Resume Next

  sLoca=oHttp.getResponseHeader("Location")


  If Err.Number<>0 Then

    Err.Clear

    sLoca=""

  End If

  

  On Error Goto 0

'  Response.Write "  <br /> "


  If sLoca = "" Then

'    Response.Write

 "&nbsp;&nbsp;>>Content-Type:" & 

oHttp.getResponseHeader("Content-Type") & "<br />"

'    Response.Write "&nbsp;&nbsp;>>沒有返回Location頭<br />"

    GetAbsoluteURL=sUrl

'    Set oHttp=Nothing

'    Response.Write " </p>"

    GetAbsoluteURL=sUrl

    Exit Function

  Else


'    Response.Write

 " &nbsp;&nbsp;>>Content-Type:" & 

oHttp.getResponseHeader("Content-Type") & "<br />"

'    Response.Write " 收到Location頭：" & sLoca & "<br />"

'    Response.Write " </p>"


    '這裏要生成新的URL


    If InStr(sLoca,"://")<=0 Then

      '沒有指定協議，按當前URL的位置重新設置

      Dim ind : ind=InstrRev(sUrl,"/")

      sUrl=Left(sUrl,ind)

      sLoca=sUrl & sLoca

    End If

    GetAbsoluteURL=GetAbsoluteURL(sLoca,iStep+1)


  End If

End Function


%>

<script language="javaScript" runat="Server">

function UTF8toGB(str){

  return decodeURIComponent(str)


}

function encodeURI(str){

  return encodeURIComponent(str)

}

function decodeURI(str){

  return decodeURIComponent(str)

}


function GB2UTF8(str){

  return encodeURIComponent(str)

}


function convert(str) { 

  return string(str.getBytes("UTF-8"),"gb2312"); 

} 


</script>
使用c#強大的表達式樹實現對象的深克隆之解決循環引用的問題

free AI online tools All In One

痞子衡嵌入式：恩智浦i.MX RT1xxx系列MCU啓動那些事（12.A）- uSDHC eMMC啓動時間(RT1170)

linux安裝cuda和cudnn

Mellanox網卡開啓SR-IOV

模擬手機設備：使用 Playwright 實現移動端自動化測試

HTML 00 Tutorial

全面系統的AI學習路徑，幫助普通人也能玩轉AI

從零開始：使用 Playwright 腳本錄製實現自動化測試

uni-app實現上拉加載

第 18 章 TTY 驅動

android的binder機制研究二

設計模式解析和實現(C++， java)之二十--狀態（state）模式

設計模式解析和實現(C++)之十五-Observer模式

征服 Apache + Tomcat

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結