16. jsoup: basic usage

1. Introduction

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.

2. Usage

2.1 Adding the dependency

implementation 'org.jsoup:jsoup:1.13.1'

2.2 Parsing an HTML string

String html = "<html><head><title>First parse</title></head>"
  + "<body><p>Parsed HTML into a doc.</p></body></html>";
Document doc = Jsoup.parse(html);
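
Once the string is parsed, the resulting Document can be queried directly. The following is a minimal sketch of that step; the class name ParseStringExample and its helper methods are hypothetical and exist only for illustration, while Jsoup.parse, Document.title, Document.select and Elements.text are part of the jsoup API.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class, not part of jsoup.
class ParseStringExample {

    // Parses the HTML string and returns the text of its <title> element.
    static String titleOf(String html) {
        Document doc = Jsoup.parse(html);
        return doc.title();
    }

    // Parses the HTML string and returns the combined text of all <p> elements.
    static String paragraphText(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("p").text();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>First parse</title></head>"
            + "<body><p>Parsed HTML into a doc.</p></body></html>";
        System.out.println(titleOf(html));
        System.out.println(paragraphText(html));
    }
}
```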

2.3 Parsing a body fragment

String html = "<div><p>Lorem ipsum.</p>";
Document doc = Jsoup.parseBodyFragment(html);
Element body = doc.body();
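
The fragment above leaves its div unclosed. The sketch below suggests one way to check how the parser handles that: it wraps the fragment in a full document and closes the open element, so the body ends up with a single div child. The class BodyFragmentExample and its method are hypothetical names chosen for this illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of jsoup.
class BodyFragmentExample {

    // Parses a partial piece of HTML and returns the <body> that jsoup builds around it.
    static Element parseFragment(String fragment) {
        Document doc = Jsoup.parseBodyFragment(fragment);
        return doc.body();
    }

    public static void main(String[] args) {
        Element body = parseFragment("<div><p>Lorem ipsum.</p>");
        // The unclosed <div> is balanced by the parser and becomes the only child of <body>.
        System.out.println(body.children().size());
        System.out.println(body.child(0).tagName());
        System.out.println(body.select("p").text());
    }
}
```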

2.4 Loading from a URL

Document doc = Jsoup.connect("http://example.com/").get();
String title = doc.title();

A POST request can be sent as well:

Document doc = Jsoup.connect("http://example.com")
  .data("query", "Java")
  .userAgent("Mozilla")
  .cookie("auth", "token")
  .timeout(3000)
  .post();
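
Both get() and post() throw IOException, which also covers HTTP error statuses and timeouts, so real code usually needs some error handling. The sketch below separates building the request from sending it. FetchExample, buildSearchRequest and fetchTitleOrNull are hypothetical names, and returning null on failure is only one possible policy, assumed here for brevity.

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class, not part of jsoup.
class FetchExample {

    // Configures a request without sending it; nothing touches the network here.
    static Connection buildSearchRequest(String url, String query) {
        return Jsoup.connect(url)
            .data("query", query)
            .userAgent("Mozilla")
            .cookie("auth", "token")
            .timeout(3000); // milliseconds
    }

    // Sends the request as a POST and returns the page title, or null if the request fails.
    static String fetchTitleOrNull(Connection connection) {
        try {
            Document doc = connection.post();
            return doc.title();
        } catch (IOException e) {
            System.err.println("Request failed: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        Connection request = buildSearchRequest("http://example.com/", "Java");
        System.out.println("Timeout (ms): " + request.request().timeout());
        // Only send the request when explicitly asked to.
        if (args.length > 0 && args[0].equals("--send")) {
            System.out.println("Title: " + fetchTitleOrNull(request));
        }
    }
}
```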

2.5 Loading from a file

File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
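
The third argument is the base URI, which jsoup uses to resolve relative links in the file. A minimal sketch of that behaviour follows, assuming a throwaway temporary file instead of /tmp/input.html; FileBaseUriExample and its methods are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of jsoup.
class FileBaseUriExample {

    // Parses the file and returns the first link resolved against baseUri, or null if there is none.
    static String firstAbsoluteLink(File input, String baseUri) throws IOException {
        Document doc = Jsoup.parse(input, "UTF-8", baseUri);
        Element link = doc.select("a[href]").first();
        return link == null ? null : link.absUrl("href");
    }

    // Writes a one-link page to a temporary file, resolves its link, then deletes the file.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("jsoup-demo", ".html");
            try {
                String html = "<a href=\"/docs/page.html\">Docs</a>";
                Files.write(tmp, html.getBytes(StandardCharsets.UTF_8));
                return firstAbsoluteLink(tmp.toFile(), "http://example.com/");
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```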

3. Extracting data

3.1 Navigating the document with DOM methods

File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

Element content = doc.getElementById("content");
Elements links = content.getElementsByTag("a");
for (Element link : links) {
  String linkHref = link.attr("href");
  String linkText = link.text();
}
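
Note that getElementById returns null when no element has the given id, so the loop above would fail on a page without a "content" element. The following sketch guards against that and collects the href values into a list. DomNavigationExample and linkHrefs are hypothetical names, and the sample HTML is made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of jsoup.
class DomNavigationExample {

    // Returns the href of every <a> inside the element with the given id,
    // or an empty list when no such element exists.
    static List<String> linkHrefs(String html, String containerId) {
        List<String> hrefs = new ArrayList<>();
        Document doc = Jsoup.parse(html);
        Element content = doc.getElementById(containerId);
        if (content == null) {
            return hrefs;
        }
        for (Element link : content.getElementsByTag("a")) {
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String html = "<div id='content'><a href='/a'>A</a><a href='/b'>B</a></div>"
            + "<a href='/c'>C</a>";
        System.out.println(linkHrefs(html, "content"));
        System.out.println(linkHrefs(html, "missing"));
    }
}
```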

3.2 Finding elements with selectors

File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
// a elements that have an href attribute
Elements links = doc.select("a[href]");

// img elements whose src ends with .png
Elements pngs = doc.select("img[src$=.png]");

// the first div with class "masthead"
Element masthead = doc.select("div.masthead").first();

// a elements that are direct children of an h3 with class "r"
Elements resultLinks = doc.select("h3.r > a");
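
To see what each of these selectors matches, they can be run against a small in-memory page. The sketch below does that; the SelectorExample class, its count helper and the SAMPLE markup are all hypothetical and made up for this illustration.

```java
import org.jsoup.Jsoup;

// Hypothetical helper class, not part of jsoup.
class SelectorExample {

    // Made-up sample page: two images, two links with href (one nested in a span),
    // and one anchor without href.
    static final String SAMPLE =
        "<div class='masthead'><img src='logo.png'><img src='photo.jpg'></div>"
        + "<h3 class='r'><a href='/one'>One</a><span><a href='/two'>Two</a></span></h3>"
        + "<a name='anchor'>No href</a>";

    // Parses the HTML and returns how many elements match the CSS query.
    static int count(String html, String cssQuery) {
        return Jsoup.parse(html).select(cssQuery).size();
    }

    public static void main(String[] args) {
        System.out.println(count(SAMPLE, "a[href]"));        // links that carry an href
        System.out.println(count(SAMPLE, "img[src$=.png]")); // images whose src ends with .png
        System.out.println(count(SAMPLE, "div.masthead"));   // divs with class masthead
        System.out.println(count(SAMPLE, "h3.r > a"));       // direct children only, so the nested link is skipped
    }
}
```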
