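HtmlUnit is a headless ("GUI-less") browser library for Java that is commonly used together with JUnit. It simulates browser behaviour and provides an API for working with the elements of a web page. HtmlUnit supports HTTP, HTTPS, cookies, and form submission via POST and GET; it wraps the HTML document so that the elements of a page can be used as objects.
HtmlUnit wraps a web page in an object, and the developer then calls methods on that object. Download page: http://htmlunit.sourceforge.net/ . After downloading the JARs, create a Java project in Eclipse, right-click it and choose Properties -> Java Build Path -> Add External JARs to import them, then create a class or a JUnit test case.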
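1. Loading a page
// HtmlUnit 2.x package names (3.x uses org.htmlunit instead); JUnit 4 is assumed
import static org.junit.Assert.assertEquals;
import java.net.URL;
import org.junit.Test;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

@Test
public void testHomePage() throws Exception {
    // Create a new WebClient object, which plays the role of the browser
    final WebClient webclient = new WebClient();
    // Disable CSS and JavaScript processing
    webclient.getOptions().setCssEnabled(false);
    webclient.getOptions().setJavaScriptEnabled(false);
    // Build a URL that points to the page under test, e.g. www.baidu.com
    URL url = new URL("http://www.baidu.com/");
    // getPage() returns the corresponding page
    HtmlPage page = (HtmlPage) webclient.getPage(url);
    System.out.print(page.getTitleText());
    assertEquals("百度一下,你就知道", page.getTitleText());
}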
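WebClient represents a browser and exposes many of the operations a browser can perform; getPage fetches the page at the given URL. The document returned by getPage is converted into an HtmlPage object, that is, an object wrapping the HTML document, from which you can read the page content, the title, a table, and so on.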
2. Simulating a specific browser
// Simulate the Chrome browser
final WebClient webclient = new WebClient(BrowserVersion.CHROME);
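3. Getting specific elements with the get or XPath methods
// page1 and page2 are HtmlPage objects obtained from getPage()
HtmlDivision div2 = (HtmlDivision) page2.getHtmlElementById("breadcrumbs"); // look up by id
HtmlDivision div1 = (HtmlDivision) page1.getByXPath("//div").get(0); // the first div on the page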
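To see what an expression such as "//div" selects without needing a network connection, the sketch below evaluates the same kind of XPath with the JDK's built-in javax.xml.xpath API rather than HtmlUnit itself. The sample markup, the class name XPathSketch and the helper matchedIds are made up for illustration, and a real page may not be well-formed XML the way this sample is.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSketch {
    // Hypothetical, well-formed sample markup (not taken from any real page).
    static final String SAMPLE =
        "<html><body>"
        + "<div id=\"header\">Top</div>"
        + "<div id=\"breadcrumbs\">Home / Docs</div>"
        + "</body></html>";

    // Evaluates an XPath expression against SAMPLE and returns the id
    // attribute of each matched element.
    static List<String> matchedIds(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expression, new InputSource(new StringReader(SAMPLE)), XPathConstants.NODESET);
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(((Element) nodes.item(i)).getAttribute("id"));
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        // "//div" matches every div; index 0 is the first one in the document.
        System.out.println(matchedIds("//div"));
        // An id predicate narrows the match to a single element.
        System.out.println(matchedIds("//div[@id='breadcrumbs']"));
    }
}
```

With this sample, "//div" matches both divs and the element at index 0 is the one with id "header", while "//div[@id='breadcrumbs']" matches only the element that a lookup by id would return.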
4. Printing page content
System.out.println(div1.asXml()); // print as XML
System.out.println(div2.asText()); // print as plain text
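5. Entering text and submitting
// HtmlUnit 2.x package names (3.x uses org.htmlunit instead); JUnit 4 is assumed
import java.net.URL;
import org.junit.Test;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

@Test
public void testSearch() throws Exception {
    // Create a new WebClient object, which plays the role of the browser
    final WebClient webclient = new WebClient();
    // Disable CSS and JavaScript processing
    webclient.getOptions().setCssEnabled(false);
    webclient.getOptions().setJavaScriptEnabled(false);
    // Build a URL that points to the page under test, e.g. www.baidu.com
    URL url = new URL("http://www.baidu.com/");
    // getPage() returns the corresponding page
    HtmlPage page = (HtmlPage) webclient.getPage(url);
    final HtmlForm form = page.getFormByName("f");
    final HtmlSubmitInput button1 = form.getInputByValue("百度一下"); // get the button by its value
    final HtmlTextInput textField = form.getInputByName("wd"); // get the text field by its name
    textField.setValueAttribute("python學習");
    final HtmlPage nextPage = button1.click(); // submit the search keyword
    String result = nextPage.asXml(); // get the search result page as XML
    System.out.println(result);
}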
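If the search form is submitted with the GET method (an assumption here, as is UTF-8 encoding), clicking the button amounts to requesting a URL whose query string carries the text field as a name=value pair. The sketch below shows only that encoding step using the JDK's URLEncoder; the class name QueryStringSketch, the helper encodeField and the example.com URL are placeholders, not part of HtmlUnit or of the real search endpoint.

```java
import java.net.URLEncoder;

class QueryStringSketch {
    // Encodes one form field as "name=value" using the
    // application/x-www-form-urlencoded rules (UTF-8 assumed).
    static String encodeField(String name, String value) throws Exception {
        return URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // "python\u5b78\u7fd2" is the keyword "python學習" written with Unicode
        // escapes so the source file compiles regardless of its encoding.
        String query = encodeField("wd", "python\u5b78\u7fd2");
        // Placeholder base URL, not the real search endpoint.
        System.out.println("http://example.com/search?" + query);
        // prints http://example.com/search?wd=python%E5%AD%B8%E7%BF%92
    }
}
```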