Earlier I reposted an article introducing the ChilkatDotNet component; below I will use that component to build a tool that collects Email addresses from web pages.
Collecting information from web pages involves two problems. The first is writing a spider that can traverse pages by following links — ChilkatDotNet already gives us good support for this. The second is extracting the desired information from each page — there are many ways to do this, and here I chose regular expressions.
First, a screenshot of the program at runtime:
The UI design is simple: three TextBoxes, one RichTextBox, and two Buttons. The three TextBoxes take the site address, the starting URL, and the number of links to traverse; the RichTextBox holds the information collected from the pages — here I save each page's URL and the Email addresses found on it.
The program has two main parts. The first traverses the site; the code is as follows:
Chilkat.Spider spider = new Chilkat.Spider();
string website = this.textWebsite.Text;
string url = this.textUrl.Text;
int links = Int32.Parse(this.textLinks.Text);

// The Spider object crawls a single web site at a time. Initialize it
// with the site's domain, then seed it with the starting URL.
spider.Initialize(website);

// Add the 1st URL:
spider.AddUnspidered(url);

// Begin crawling the site by calling CrawlNext repeatedly,
// up to the number of links entered in the textbox.
for (int i = 0; i < links; i++)
{
    bool success = spider.CrawlNext();
    if (success)
    {
        // Show the URL just crawled, then scan that page for Email addresses.
        Invoke(new AppendTextDelegate(AppendText), new object[] { spider.LastUrl + "\r\n" });
        GetAllURL(spider.LastUrl);
    }
    else
    {
        // Did we get an error, or are there no more URLs to crawl?
        if (spider.NumUnspidered == 0)
        {
            MessageBox.Show("No more URLs to spider");
        }
        else
        {
            MessageBox.Show(spider.LastErrorText);
        }
        break;
    }

    // Sleep 1 second before spidering the next URL.
    spider.SleepMs(1000);
}
This is much like the sample code that ships with ChilkatDotNet; the only addition is reading the initial settings from the textboxes. Once a URL has been obtained, the program downloads the page content and then extracts Email addresses with a regular expression.
Downloading the page content:
// Issue a plain GET request for the page and read the whole body as text.
HttpWebRequest webRequest1 = (HttpWebRequest)WebRequest.Create(new Uri(URlStr));
webRequest1.Method = "GET";
HttpWebResponse response = (HttpWebResponse)webRequest1.GetResponse();
Stream stream = response.GetResponseStream();
StreamReader streamReader = new StreamReader(stream, Encoding.Default);
string textData = streamReader.ReadToEnd();
streamReader.Close();
response.Close();
The Email addresses are then pulled out of textData with the following regular expression (note that it must be applied with RegexOptions.IgnoreCase, since its character classes list only upper-case letters):
@"(?<EmailStr>\b[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,4}\b)"
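To show how this pattern would be applied to the downloaded text, here is a minimal self-contained sketch (the sample input string is mine; in the real program textData comes from the download snippet above):

```csharp
using System;
using System.Text.RegularExpressions;

class EmailExtractor
{
    static void Main()
    {
        // Stand-in for the page text read by the HttpWebRequest code above.
        string textData = "Contact us at sales@example.com or SUPPORT@example.org.";

        // IgnoreCase is required: the pattern's character classes are upper-case only.
        Regex regex = new Regex(
            @"(?<EmailStr>\b[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,4}\b)",
            RegexOptions.IgnoreCase);

        // Each match's named group "EmailStr" holds one Email address.
        foreach (Match m in regex.Matches(textData))
        {
            Console.WriteLine(m.Groups["EmailStr"].Value);
        }
    }
}
```

In the real tool, each matched address would be appended to the RichTextBox under its page URL instead of written to the console.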
There are plenty of tutorials on regular expressions online; picking any one of them to learn the basics is enough.
Here I only collect Email addresses from a single site. With the ChilkatDotNet component it would not be hard to extend this to collect information across the whole web; interested readers can explore that on their own.
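As a rough sketch of how that extension might look: the Chilkat samples suggest that after each CrawlNext the spider exposes off-site links through NumOutboundLinks and GetOutboundLink (treat these member names as my reading of the Chilkat API, not verified here), so new domains could be queued and each given its own Spider pass:

```csharp
using System.Collections.Generic;

// Sketch: after crawling a page, collect its outbound (off-site) links
// and queue each one so a new Spider can be started on that domain later.
Chilkat.Spider spider = new Chilkat.Spider();
Queue<string> pendingSites = new Queue<string>();

spider.Initialize("www.chilkatsoft.com");
spider.AddUnspidered("http://www.chilkatsoft.com/");

if (spider.CrawlNext())
{
    for (int i = 0; i < spider.NumOutboundLinks; i++)
    {
        pendingSites.Enqueue(spider.GetOutboundLink(i));
    }
}
// Each queued URL would then seed a fresh Spider (Initialize + AddUnspidered)
// and be processed with the same crawl loop shown earlier.
```

A real whole-web collector would also need to deduplicate domains and respect politeness delays, but the queue-of-sites idea is the core of it.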