Parsing HTML into a PDF with iText in Java

Original

2019-07-30 09:30

1. Dependencies

        <dependency>
            <groupId>com.itextpdf</groupId>
            <artifactId>itextpdf</artifactId>
            <version>5.5.13</version>
        </dependency>

        <dependency>
            <groupId>com.itextpdf</groupId>
            <artifactId>itext-asian</artifactId>
            <version>5.2.0</version>
        </dependency>

        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.11.2</version>
        </dependency>

        <dependency>
            <groupId>com.itextpdf.tool</groupId>
            <artifactId>xmlworker</artifactId>
            <version>5.5.11</version>
        </dependency>

2. Demo

package com.zdj;

import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import org.jsoup.Jsoup;

import java.io.ByteArrayInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;

/**
 * @author zhangdj
 * @date 2019/7/24
 */
public class ItextParseHtml {
    public static void main(String[] args) throws Exception{
        // 1. Create the document.
        Document document = new Document();
        // 2. Create a PdfWriter bound to the document; the writer is what writes the document to disk.
        // The first argument is the document; the second is the output stream, whose file name also gives the output path.
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("D:/test.pdf"));
        // 3. Open the document.
        document.open();
        // The HTML to parse.
        String content = "<h1 style=\"text-align:center\"><span style=\"color:#3498db\"><strong>內容標題</strong></span></h1>\n\n<p>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&nbsp;<span style=\"font-size:22px\"><span style=\"color:#2ecc71\">拜訪計劃內容拜訪計劃內容拜訪計劃內容,拜訪計劃內容拜訪計劃內容拜訪計劃內容,拜訪計劃內容拜訪計劃內容,拜訪計劃內容拜訪計劃內容拜訪計劃內容拜訪計劃內容拜訪計劃內容拜訪計劃內容拜訪計劃內容拜訪計劃內容拜訪計劃內容.</span></span></p>\n\n<figure class=\"easyimage easyimage-full\"><img alt=\"\" sizes=\"100vw\" src=\"http://oss.job.cnoocmall.com/cnooc/crmB40245B912624D3DB30BA81B435FD087.jpg\" srcset=\"\" width=\"1024\" />\n<figcaption></figcaption>\n</figure>\n\n<p><span style=\"font-size:22px\"><span style=\"color:#2ecc71\">&nbsp; &nbsp; &nbsp; 拜訪計劃內容2拜訪計劃內容拜訪計劃,內容拜訪計劃內容拜訪計劃內容拜訪計劃,內容拜訪計劃內容拜訪計劃內容拜訪計劃內容拜訪計劃內容拜訪計劃內容拜訪計劃內容拜訪計劃內容拜訪計劃內容拜訪計,劃內容拜訪計劃內容拜訪計劃內容拜訪計劃內容拜訪計劃內容拜訪計,劃內容拜訪計劃內容拜,訪計劃內容拜訪計劃內容,拜訪計劃內容拜訪計劃內容拜訪計劃內容拜訪計劃內容拜訪計劃內容拜訪計劃內容拜訪計劃內容.</span></span></p>";
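        // Normalize the HTML into well-formed XHTML (XML syntax), which XMLWorker expects:
        org.jsoup.nodes.Document contentDoc = Jsoup.parseBodyFragment(content);
        org.jsoup.nodes.Document.OutputSettings outputSettings = new org.jsoup.nodes.Document.OutputSettings();
        outputSettings.syntax(org.jsoup.nodes.Document.OutputSettings.Syntax.xml);
        contentDoc.outputSettings(outputSettings);
        String parsedHtml = contentDoc.outerHtml();
        // 4. Parse the XHTML into the document.
        // The font-family here does not accept Chinese font names; {font-family:仿宋} (FangSong) will not work.
        InputStream cssIs = new ByteArrayInputStream("* {font-family: PingFang-SC-Medium.otf;}".getBytes("UTF-8"));
        // The fourth argument is the input stream of the CSS applied to the HTML.
        // An optional fifth argument is the font provider; it can be omitted when the fonts the system supports by default are enough.
        // The HTML bytes are encoded explicitly as UTF-8 instead of relying on the platform default charset.
        XMLWorkerHelper.getInstance().parseXHtml(writer, document, new ByteArrayInputStream(parsedHtml.getBytes("UTF-8")), cssIs);
        // 5. Close the document.
        document.close();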
    }
}
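
The jsoup step in the demo exists because XMLWorker expects well-formed XHTML, while HTML from a rich-text editor often contains unclosed tags such as <br>. The sketch below is not part of the original demo and uses only the JDK's XML parser, not iText or jsoup; the class name XhtmlCheck and the method isWellFormed are hypothetical names chosen for illustration. It shows the difference between an HTML-style fragment and its XHTML equivalent, and encodes the text explicitly as UTF-8 so Chinese characters survive on any platform.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper for illustration; not part of iText or jsoup. */
class XhtmlCheck {

    /** Returns true if the fragment parses as well-formed XML. */
    static boolean isWellFormed(String fragment) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            // Encode explicitly as UTF-8 instead of using the platform default charset.
            byte[] bytes = fragment.getBytes(StandardCharsets.UTF_8);
            builder.parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input: it is not well-formed XML.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "\u5167\u5bb9" is the Chinese sample text used in the demo, written as escapes.
        System.out.println(isWellFormed("<p>\u5167\u5bb9<br>\u5167\u5bb9</p>"));   // prints false
        System.out.println(isWellFormed("<p>\u5167\u5bb9<br />\u5167\u5bb9</p>")); // prints true
    }
}
```

A check like this can help when debugging input that fails to parse: if the string produced by jsoup passes, the problem is more likely in fonts or CSS than in the markup.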

3. Rendered result

4. Source

https://blog.csdn.net/sprionzgyp/article/details/79583391
