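HtmlUnit is a Java library that simulates a web browser and is commonly used together with JUnit. It provides an API for operating on the elements of a web page. HtmlUnit supports HTTP, HTTPS, cookies, and form submission via both POST and GET. It wraps an HTML document so that the elements of the page can be handled as objects.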
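HtmlUnit wraps a web page in an object, and your code then calls methods on that object. HtmlUnit can be downloaded from http://htmlunit.sourceforge.net/. After downloading the jar files, create a Java project in Eclipse. Right-click the project and choose Properties -> Java Build Path -> Add External JARs to import the jars. Then create a class or a JUnit Test Case.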
1. Loading a page
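import static org.junit.Assert.assertEquals;

import java.net.URL;
import org.junit.Test;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HomePageTest {
    @Test
    public void testHomePage() throws Exception {
        // Create a WebClient, which plays the role of a browser; it is closed when the block ends
        try (final WebClient webclient = new WebClient()) {
            // Do not load CSS or JavaScript
            webclient.getOptions().setCssEnabled(false);
            webclient.getOptions().setJavaScriptEnabled(false);
            // Build a URL that points to the page under test, here www.baidu.com
            URL url = new URL("http://www.baidu.com/");
            // getPage() fetches the page and returns it as an HtmlPage
            HtmlPage page = webclient.getPage(url);
            System.out.println(page.getTitleText());
            // Title of Baidu's home page at the time of writing
            assertEquals("百度一下，你就知道", page.getTitleText());
        }
    }
}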
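WebClient represents the browser and offers the operations a browser can perform. Its getPage() method fetches the page at the given URL. The document that getPage() returns is converted into an HtmlPage object, which wraps the page as an HTML object. From that object you can read the page's content, its title, a table, and so on.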
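To try getPage() and HtmlPage without depending on a live site, the sketch below serves a small page from memory. It assumes HtmlUnit 2.x, where classes live under com.gargoylesoftware.htmlunit (3.x moved them to org.htmlunit). It also uses HtmlUnit's MockWebConnection, so the address http://example.test/ and the HTML are hypothetical and never fetched over the network.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;

public class LoadPageOffline {
    // Hypothetical address: MockWebConnection answers it locally, nothing goes over the network.
    static final String PAGE_URL = "http://example.test/";

    // Loads the given HTML through a WebClient and returns the text of its <title>.
    static String loadTitle(String html) {
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setCssEnabled(false);
            webClient.getOptions().setJavaScriptEnabled(false);

            // Serve `html` (as text/html) whenever PAGE_URL is requested.
            MockWebConnection connection = new MockWebConnection();
            connection.setResponse(new URL(PAGE_URL), html);
            webClient.setWebConnection(connection);

            // getPage() parses the response and returns it as an HtmlPage.
            HtmlPage page = webClient.getPage(PAGE_URL);
            return page.getTitleText();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadTitle("<html><head><title>Offline page</title></head><body></body></html>"));
    }
}
```

Serving canned HTML this way keeps a test repeatable even if the real page's title or markup changes.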
2. Simulating a specific browser
// Simulate the Chrome browser
final WebClient webclient = new WebClient(BrowserVersion.CHROME);
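The BrowserVersion passed to the constructor determines which browser HtmlUnit emulates, including the User-Agent string it reports. The following is a small sketch that compares the default version with CHROME, again assuming HtmlUnit 2.x. The helper name userAgentFor is hypothetical.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;

public class BrowserVersionDemo {
    // Builds a WebClient that emulates `version` and returns the User-Agent string it reports.
    static String userAgentFor(BrowserVersion version) {
        try (WebClient webClient = new WebClient(version)) {
            return webClient.getBrowserVersion().getUserAgent();
        }
    }

    public static void main(String[] args) {
        System.out.println("default: " + userAgentFor(BrowserVersion.getDefault()));
        System.out.println("chrome:  " + userAgentFor(BrowserVersion.CHROME));
    }
}
```

The exact strings depend on the HtmlUnit release, because each release tracks particular browser versions.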
3. Getting specific elements with get methods or XPath
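// page1 and page2 are HtmlPage objects loaded with getPage() as in section 1
HtmlDivision div2 = (HtmlDivision) page2.getHtmlElementById("breadcrumbs"); // get the div whose id is "breadcrumbs"
HtmlDivision div1 = (HtmlDivision) page1.getByXPath("//div").get(0); // get the first div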
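Here is a self-contained version of the two lookups, run against an in-memory page. It assumes HtmlUnit 2.x and MockWebConnection, and the HTML with ids "header" and "breadcrumbs" is made up for illustration. getHtmlElementById() finds a single element by its id, while getByXPath() returns every match in document order.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlDivision;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class FindElements {
    static final String PAGE_URL = "http://example.test/"; // hypothetical, never fetched
    static final String HTML = "<html><body>"
            + "<div id='header'>Site header</div>"
            + "<div id='breadcrumbs'>Home / Docs</div>"
            + "</body></html>";

    // Returns [id found by getHtmlElementById, id of the first XPath match, number of XPath matches].
    static List<String> lookUp() {
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setJavaScriptEnabled(false);
            MockWebConnection connection = new MockWebConnection();
            connection.setResponse(new URL(PAGE_URL), HTML);
            webClient.setWebConnection(connection);
            HtmlPage page = webClient.getPage(PAGE_URL);

            // Lookup by id; throws ElementNotFoundException if no element has that id.
            HtmlDivision byId = (HtmlDivision) page.getHtmlElementById("breadcrumbs");
            // XPath lookup: all <div> elements, in document order.
            List<?> divs = page.getByXPath("//div");
            HtmlDivision first = (HtmlDivision) divs.get(0);

            List<String> result = new ArrayList<>();
            result.add(byId.getId());
            result.add(first.getId());
            result.add(String.valueOf(divs.size()));
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lookUp());
    }
}
```

Because "//div" matches every div, get(0) returns the header div, not the breadcrumbs div. A more specific XPath such as "//div[@id='breadcrumbs']" would be needed to reach that one.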
4. Outputting page content
System.out.println(div1.asXml()); // print as XML
System.out.println(div2.asText()); // print as plain text
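The difference between the two output methods shows up on a single element: asXml() keeps the markup, while asText() keeps only the visible text. The sketch assumes an HtmlUnit 2.x release in which asText() is still available. The page content is hypothetical and served through MockWebConnection.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlDivision;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;

public class OutputFormats {
    static final String PAGE_URL = "http://example.test/"; // hypothetical, never fetched
    static final String HTML = "<html><body><div id='breadcrumbs'>Home / Docs</div></body></html>";

    // Returns {asXml(), asText()} for the div whose id is "breadcrumbs".
    static String[] render() {
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setJavaScriptEnabled(false);
            MockWebConnection connection = new MockWebConnection();
            connection.setResponse(new URL(PAGE_URL), HTML);
            webClient.setWebConnection(connection);
            HtmlPage page = webClient.getPage(PAGE_URL);

            HtmlDivision div = (HtmlDivision) page.getHtmlElementById("breadcrumbs");
            return new String[] { div.asXml(), div.asText() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] out = render();
        System.out.println("asXml():\n" + out[0]);
        System.out.println("asText():\n" + out[1]);
    }
}
```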
5. Entering text and submitting
// Uses JUnit 4 (@Test) and HtmlUnit 2.x: WebClient, HtmlPage, HtmlForm, HtmlSubmitInput, HtmlTextInput; plus java.net.URL
@Test
public void testSearch() throws Exception {
    // Create a WebClient, which plays the role of a browser; it is closed when the block ends
    try (final WebClient webclient = new WebClient()) {
        // Do not load CSS or JavaScript
        webclient.getOptions().setCssEnabled(false);
        webclient.getOptions().setJavaScriptEnabled(false);
        // Build a URL that points to the page under test, here www.baidu.com
        URL url = new URL("http://www.baidu.com/");
        // getPage() fetches the page and returns it as an HtmlPage
        HtmlPage page = webclient.getPage(url);
        // Baidu's search form is named "f"
        final HtmlForm form = page.getFormByName("f");
        final HtmlSubmitInput button1 = form.getInputByValue("百度一下"); // get the button by its value
        final HtmlTextInput textField = form.getInputByName("wd"); // get the text field by its name
        textField.setValueAttribute("python学习");
        final HtmlPage nextPage = button1.click(); // submit the search keyword
        String result = nextPage.asXml(); // get the search result page as XML
        System.out.println(result);
    }
}
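The same fill-in-and-click flow can be checked offline by inspecting the URL the form submission produces. This is a minimal sketch assuming HtmlUnit 2.x and MockWebConnection. The form keeps the names from the test (form "f", text field "wd"), but the action "/s", the button value "Search", and the address are hypothetical and do not reproduce Baidu's real markup.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;

public class SubmitFormOffline {
    static final String PAGE_URL = "http://example.test/"; // hypothetical, served by MockWebConnection
    // Hypothetical search page: form "f", text field "wd", one unnamed submit button.
    static final String FORM_HTML = "<html><head><title>Search</title></head><body>"
            + "<form name='f' action='/s' method='get'>"
            + "<input type='text' name='wd'>"
            + "<input type='submit' value='Search'>"
            + "</form></body></html>";
    static final String RESULT_HTML = "<html><head><title>Results</title></head><body></body></html>";

    // Fills in the text field, clicks the submit button and returns the URL of the resulting page.
    static URL submitSearch(String keyword) {
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setCssEnabled(false);
            webClient.getOptions().setJavaScriptEnabled(false);

            MockWebConnection connection = new MockWebConnection();
            connection.setResponse(new URL(PAGE_URL), FORM_HTML);
            connection.setDefaultResponse(RESULT_HTML); // answers the form submission
            webClient.setWebConnection(connection);

            HtmlPage page = webClient.getPage(PAGE_URL);
            HtmlForm form = page.getFormByName("f");
            HtmlSubmitInput button = form.getInputByValue("Search");
            HtmlTextInput textField = form.getInputByName("wd");
            textField.setValueAttribute(keyword);

            // A GET form sends its named fields in the query string of the new request.
            HtmlPage nextPage = button.click();
            return nextPage.getUrl();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(submitSearch("htmlunit"));
    }
}
```

Because the submit button has no name attribute, only the "wd" field should appear in the query string.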