9. XML


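1. Introduction to XML

XML is short for Extensible Markup Language. It is a data representation format that can describe very complex data structures, and it is commonly used to transmit and store data.

A typical XML document looks roughly like this: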

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE book SYSTEM "book.dtd">
<book id="1">
    <name>Java核心技術</name>
    <author>Cay S. Horstmann</author>
    <isbn lang="CN">1234567</isbn>
    <tags>
        <tag>Java</tag>
        <tag>Network</tag>
    </tags>
    <pubDate/>
</book>

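XML uses UTF-8 encoding by default.

XML content is often transmitted over the network as messages.

1.1 Structure of XML

XML has a fixed structure. The first line is normally the XML declaration <?xml version="1.0" ?>, which may also carry an optional encoding attribute. Strictly speaking the declaration itself is optional in XML 1.0, but when it is present it must be the very first thing in the file.

The <!DOCTYPE ... SYSTEM "book.dtd"> line is the document type declaration, which points to a DTD (Document Type Definition). It is optional. For a document to be valid, the name after DOCTYPE must match the root element.

Everything after that is the document content. Note that an XML document has exactly one root element.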
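
The single-root rule is easy to observe by handing a parser a document with two top-level elements. The following is only a minimal sketch and not part of the original example: the class name WellFormedCheck and the helper isWellFormed are hypothetical, and it uses the JDK's SAX parser (introduced in section 2.2) because it reports a rejected document as a SAXException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only
final class WellFormedCheck {

    // Returns true if the parser accepts the text as well-formed XML
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;   // the parser rejected the document
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        // One root element: accepted
        System.out.println(isWellFormed("<book><name>Core Java</name></book>"));
        // Two top-level elements: rejected
        System.out.println(isWellFormed("<book/><book/>"));
    }
}
```

If several records need to travel in one document, the usual approach is to wrap them in a single enclosing element so that the document still has one root.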

When special characters appear in the content they must be escaped, because XML already uses <, >, &, " and ' as markup delimiters.

Character	Escape sequence
<	`&lt;`
>	`&gt;`
&	`&amp;`
"	`&quot;`
'	`&apos;`

For example, the content Java<tm> should be written as:

<name>Java&lt;tm&gt;</name>
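
When XML text is assembled by hand, the five predefined entities (&lt; &gt; &amp; &quot; &apos;) have to be substituted before the value is written. The following is a minimal sketch of such a substitution; XmlEscapeSketch and escape are hypothetical names chosen for this illustration, and in real code an XML writer library would normally take care of escaping.

```java
// Hypothetical helper: replaces the five characters that have predefined XML entities
final class XmlEscapeSketch {

    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints <name>Java&lt;tm&gt;</name>
        System.out.println("<name>" + escape("Java<tm>") + "</name>");
    }
}
```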

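2. Parsing XML

XML is a tree-structured document, and there are two standard APIs for parsing it:

DOM: reads the whole XML document at once and represents it in memory as a tree;
SAX: reads the XML as a stream and reports what it finds through event callbacks.

Take the following XML (book.xml) as an example: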

<?xml version="1.0" encoding="UTF-8" ?>
<book id="1" category="computer">
    <name>Java核心技術</name>
    <author>Cay S. Horstmann</author>
    <isbn lang="CN">1234567</isbn>
    <tags>
        <tag>Java</tag>
        <tag>Network</tag>
    </tags>
    <pubDate>2020-03-13</pubDate>
</book>

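2.1 Using DOM

Parsing book.xml with DOM produces an in-memory tree whose nodes mirror the elements, attributes and text of the document.

Java provides a DOM API for parsing XML. It uses the following objects to represent the content of an XML document:

Document: represents the entire XML document;
Element: represents an XML element, that is, everything from a start tag to its matching end tag. An element can contain:
- other elements
- text
- attributes (see Attribute below)
Attribute: represents one attribute of an element (the org.w3c.dom.Attr interface).

The code for parsing an XML document with the DOM API looks like this: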

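import java.io.IOException;
import java.io.InputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

public class XMLUtil {

    public static void main(String[] args) throws IOException, SAXException, ParserConfigurationException {
        parseXML();
    }

    public static void parseXML() throws ParserConfigurationException, IOException, SAXException {
        // Load book.xml from the classpath root; the stream is closed automatically
        try (InputStream input = XMLUtil.class.getResourceAsStream("/book.xml")) {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(input);
            printNode(doc, 0);
        }
    }

    // Recursively print a node and all of its children, indenting one space per level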
    static void printNode(Node n, int indent) {
        for (int i = 0; i < indent; i++) {
            System.out.print(' ');
        }
        switch (n.getNodeType()) {
            case Node.DOCUMENT_NODE:
                System.out.println("Document: " + n.getNodeName());
                break;
            case Node.ELEMENT_NODE:
                System.out.println("Element: " + n.getNodeName());
                break;
            case Node.TEXT_NODE:
                System.out.println("Text: " + n.getNodeName() + " = " + n.getNodeValue());
                break;
            case Node.ATTRIBUTE_NODE:
                System.out.println("Attr: " + n.getNodeName() + " = " + n.getNodeValue());
                break;
            case Node.CDATA_SECTION_NODE:
                System.out.println("CDATA: " + n.getNodeName() + " = " + n.getNodeValue());
                break;
            case Node.COMMENT_NODE:
                System.out.println("Comment: " + n.getNodeName() + " = " + n.getNodeValue());
                break;
            default:
                System.out.println("NodeType: " + n.getNodeType() + ", NodeName: " + n.getNodeName());
        }
        for (Node child = n.getFirstChild(); child != null; child = child.getNextSibling()) {
            printNode(child, indent + 1);
        }
    }

}

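DocumentBuilder.parse() parses an XML document. It accepts an InputStream, a File, a URI string or an InputSource, and returns a Document object that represents the tree structure of the entire XML document.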
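
To make the different parse() inputs concrete, the sketch below parses the same XML text once through an InputStream and once through an InputSource wrapping a Reader. It is an illustration under assumptions: DomParseInputsSketch and its two helpers are hypothetical names, and the XML is an inline stand-in so that no book.xml is needed on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only
final class DomParseInputsSketch {

    // Parse from a byte stream; the parser detects the encoding itself
    static String rootFromStream(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Parse from characters that are already decoded, via InputSource
    static String rootFromReader(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<book id=\"1\"><name>Core Java</name></book>";
        System.out.println(rootFromStream(xml));
        System.out.println(rootFromReader(xml));
    }
}
```

Both calls return the same Document structure; getDocumentElement() gives direct access to the root element without walking the tree.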

The output looks like this:

// Output
Document: #document
 Element: book
  Text: #text = 
    
  Element: name
   Text: #text = Java核心技術
  Text: #text = // this node appears because the name and author elements are separated by a line break in the XML, which the parser treats as text
    
  Element: author
   Text: #text = Cay S. Horstmann
  Text: #text = 
    
  Element: isbn
   Text: #text = 1234567
  Text: #text = 
    
  Element: tags
   Text: #text = 
        
   Element: tag
    Text: #text = Java
   Text: #text = 
        
   Element: tag
    Text: #text = Network
   Text: #text = 
    
  Text: #text = 
    
  Element: pubDate
   Text: #text = 2020-03-13
  Text: #text =
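
The whitespace-only text nodes in this output are usually noise. One common way to deal with them, sketched below, is to skip any text node whose trimmed value is empty while walking the tree. This is not the article's code: DomWhitespaceSketch, collect and outline are hypothetical names, and the sketch reports only element and text nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Node;

// Hypothetical helper class, for illustration only
final class DomWhitespaceSketch {

    // Adds one line per element or non-blank text node, indented by depth
    static void collect(Node n, int depth, List<String> out) {
        StringBuilder pad = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            pad.append(' ');
        }
        if (n.getNodeType() == Node.ELEMENT_NODE) {
            out.add(pad + "Element: " + n.getNodeName());
        } else if (n.getNodeType() == Node.TEXT_NODE && !n.getNodeValue().trim().isEmpty()) {
            out.add(pad + "Text: " + n.getNodeValue().trim());
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, depth + 1, out);
        }
    }

    // Parses the XML text and returns the outline starting at the root element
    static List<String> outline(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            List<String> out = new ArrayList<>();
            collect(root, 0, out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<book>\n    <name>Core Java</name>\n    <author>Cay S. Horstmann</author>\n</book>";
        for (String line : outline(xml)) {
            System.out.println(line);
        }
    }
}
```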

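2.2 Using SAX

DOM parsing is simple and convenient, but its main drawback is memory use: the whole document is held in memory, so a large file consumes a lot of it.

SAX (Simple API for XML) is another way to parse XML that addresses this problem. It is a stream-based approach: it parses the XML while reading it and hands the data to the caller through event callbacks. Because reading and parsing happen together, the parser itself needs little memory however large the XML file is, as long as the handler does not accumulate everything it receives.

SAX parsing triggers a series of events:

startDocument: parsing of the XML document begins;
startElement: a start tag was read, for example <book>;
characters: character data was read;
endElement: an end tag was read, for example </book>;
endDocument: the end of the XML document was reached.

Parsing XML with the SAX API looks like this in Java: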

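import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class XMLUtil {

    public static void main(String[] args) throws Exception {
        parseXML2();
    }

    public static void parseXML2() throws Exception {
        // Load book.xml from the classpath root; the stream is closed automatically
        try (InputStream input = XMLUtil.class.getResourceAsStream("/book.xml")) {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser saxParser = spf.newSAXParser();
            saxParser.parse(input, new MyHandler());
        }
    }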
}
class MyHandler extends DefaultHandler {
    public void startDocument() throws SAXException {
        print("start document");
    }

    public void endDocument() throws SAXException {
        print("end document");
    }

    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        print("start element:", localName, qName);
    }

    public void endElement(String uri, String localName, String qName) throws SAXException {
        print("end element:", localName, qName);
    }

    public void characters(char[] ch, int start, int length) throws SAXException {
        print("characters:", new String(ch, start, length));
    }

    public void error(SAXParseException e) throws SAXException {
        print("error:", e);
    }

    void print(Object... objs) {
        for (Object obj : objs) {
            System.out.print(obj);
            System.out.print(" ");
        }
        System.out.println();
    }
}
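
A handler that only prints events is useful for exploring, but in practice a handler usually collects values. The sketch below gathers the text of every <tag> element. It is an assumption-laden illustration, not the article's code: SaxCollectSketch, TagHandler and readTags are hypothetical names. Text is buffered in a StringBuilder because the parser is allowed to deliver one run of text through several characters() calls.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only
final class SaxCollectSketch {

    // Gathers the text content of every <tag> element
    static final class TagHandler extends DefaultHandler {
        final List<String> tags = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            text.setLength(0);              // start a fresh buffer for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called several times per text run
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("tag".equals(qName)) {
                tags.add(text.toString().trim());
            }
            text.setLength(0);
        }
    }

    static List<String> readTags(String xml) {
        try {
            TagHandler handler = new TagHandler();
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.tags;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<book><tags><tag>Java</tag><tag>Network</tag></tags></book>";
        System.out.println(readTags(xml));
    }
}
```

The handler matches on qName because a SAXParserFactory is not namespace-aware unless setNamespaceAware(true) is called, in which case localName would be the better choice.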

What do the parameters of startElement mean? The Javadoc of the method reads:

/**
     * Receive notification of the start of an element.
     *
     * <p>By default, do nothing.  Application writers may override this
     * method in a subclass to take specific actions at the start of
     * each element (such as allocating a new tree node or writing
     * output to a file).</p>
     *
     * @param uri The Namespace URI, or the empty string if the
     *        element has no Namespace URI or if Namespace
     *        processing is not being performed.
     * @param localName The local name (without prefix), or the
     *        empty string if Namespace processing is not being
     *        performed.
     * @param qName The qualified name (with prefix), or the
     *        empty string if qualified names are not available.
     * @param attributes The attributes attached to the element.  If
     *        there are no attributes, it shall be an empty
     *        Attributes object.
     * @exception org.xml.sax.SAXException Any SAX exception, possibly
     *            wrapping another exception.
     * @see org.xml.sax.ContentHandler#startElement
     */
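
The difference between uri, localName and qName only becomes visible once namespace processing is switched on. The sketch below parses the same prefixed element with and without setNamespaceAware(true) and records what startElement receives. The names SaxNamespaceSketch and firstElementNames, and the namespace urn:example:books, are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only
final class SaxNamespaceSketch {

    // Returns "uri|localName|qName" as reported for the first element in the document
    static String firstElementNames(String xml, boolean namespaceAware) {
        final String[] seen = new String[1];
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(namespaceAware);
            spf.newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes attributes) {
                            if (seen[0] == null) {
                                seen[0] = uri + "|" + localName + "|" + qName;
                            }
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        String xml = "<b:book xmlns:b=\"urn:example:books\"/>";
        System.out.println(firstElementNames(xml, false)); // namespace processing off
        System.out.println(firstElementNames(xml, true));  // namespace processing on
    }
}
```

With namespace processing off, which is the factory default, the handler should see only the qualified name b:book. With it on, uri carries urn:example:books and localName carries book. This is why a handler on a default factory typically finds localName empty.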

2.3 Converting to a JavaBean

DOM and SAX are the two standard interfaces for parsing XML, but neither is very intuitive to use.
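
To see why, consider what filling an object by hand with the DOM API involves. The sketch below maps a cut-down version of the book document onto a small class. ManualBookMapping, SimpleBook and fromXml are hypothetical names used only here, and only three of the fields are handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, for illustration only
final class ManualBookMapping {

    // Cut-down bean with just three fields
    static final class SimpleBook {
        long id;
        String name;
        final List<String> tags = new ArrayList<>();
    }

    static SimpleBook fromXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            SimpleBook book = new SimpleBook();
            // Every field needs its own lookup and type conversion
            book.id = Long.parseLong(root.getAttribute("id"));
            book.name = root.getElementsByTagName("name").item(0).getTextContent();
            NodeList tags = root.getElementsByTagName("tag");
            for (int i = 0; i < tags.getLength(); i++) {
                book.tags.add(tags.item(i).getTextContent());
            }
            return book;
        } catch (Exception e) {
            throw new IllegalStateException("could not map XML to SimpleBook", e);
        }
    }

    public static void main(String[] args) {
        SimpleBook b = fromXml("<book id=\"1\"><name>Core Java</name>"
                + "<tags><tag>Java</tag><tag>Network</tag></tags></book>");
        System.out.println(b.id + " " + b.name + " " + b.tags);
    }
}
```

Each additional field means another lookup and conversion, and a missing element would surface here as a NullPointerException. That repetitive work is what a data-binding library takes over.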

Fortunately, we can use the open-source Jackson library to convert XML into a JavaBean.

First, add the dependency:

<dependency>
  <groupId>com.fasterxml.jackson.dataformat</groupId>
  <artifactId>jackson-dataformat-xml</artifactId>
  <version>2.10.3</version>
</dependency>

Create a JavaBean, Book.java:

/*Book.java*/
import java.util.List;

public class Book {
    private long id;
    private String name;
    private String category;
    private String author;
    private String isbn;
    private List<String> tags;
    private String pubDate;
    // getters and setters omitted
    // toString omitted
}

Parsing test:

@Test
public void m2() throws IOException {
  InputStream input = Main.class.getResourceAsStream("/book.xml");
  JacksonXmlModule module = new JacksonXmlModule();
  XmlMapper mapper = new XmlMapper(module);
  Book book = mapper.readValue(input, Book.class);

  System.out.println(book);
}

// Output
Book{id=1, name='Java核心技術', category='computer', author='Cay S. Horstmann', isbn='1234567', tags=[Java, Network], pubDate='2020-03-13'}
