(Repost) Aspose.Words Programming Guide: Revisiting the DOM Tree and How Its Layers Relate

Original link: https://blog.csdn.net/sinat_30276961/article/details/48155223

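In the previous post, "Aspose.Words Programming Guide: A First Look at the DOM Tree, Node Class Inheritance and Descriptions", I ran a first simple application and covered how Aspose.Words loads, saves, and converts documents. I then introduced the DOM concept from the point of view of the library's design. This post continues with the basic DOM concepts and shows how to get from one node to another. If you read it carefully, I believe you will come away with a solid understanding of the Aspose.Words DOM structure.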

Document Object Model

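Structure Diagrams

In the previous post we started from an example, got a rough idea of the DOM tree, and went through the inheritance hierarchy of the Node class. Here we use structure diagrams to look at how the nodes relate to each other at a higher level.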

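Document and Section

From the previous post we know:

Document is the root node of the document tree and the entry point to the whole document.
Section corresponds to one section of the document.

In other words, a Document contains one or more Sections.

Here is a diagram that shows how the two relate.

The diagram lays out the roles that Document and Section play in a document. Let's list them: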

1. A Document contains one or more Sections.
2. A Section contains one Body node and zero or more HeaderFooter nodes.
3. Body and HeaderFooter nodes each contain zero or more Block-level Nodes.
4. A Document can contain zero or one GlossaryDocument.

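Generally, a Word document contains one or more sections. A section can define its own page size, margins, text direction, and number of text columns, as well as its own headers and footers. The sections of a document are separated by section breaks.

A section contains the main text plus headers and footers. Together these are called "stories". In Aspose.Words, a Section node contains story nodes: Body and HeaderFooter. The main text lives in Body; headers and footers live in HeaderFooter nodes.

Every text story is made up of paragraphs and tables (zero or more of each, as the list above says). These are the Block-level nodes.

In addition, a Document can contain a glossary document, which stores zero or more building blocks. In node terms: a Document can contain one GlossaryDocument, a GlossaryDocument contains zero or more BuildingBlock nodes, and each BuildingBlock contains zero or more Sections that it manages for insertion, copying, and removal.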
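To keep the containment rules of this section in one place, here is a small lookup-table sketch in plain Java. It is not Aspose.Words code: the class ContainmentRules and the method canContain are hypothetical names made up for illustration, node types are plain strings, and only the paragraph and table block-level nodes named in this section are listed.

```java
import java.util.List;
import java.util.Map;

// Hypothetical lookup table that restates the containment rules from the text.
// It is not part of Aspose.Words; node types are represented as plain strings.
class ContainmentRules {
    static final Map<String, List<String>> ALLOWED_CHILDREN = Map.of(
        "Document", List.of("Section", "GlossaryDocument"),
        "GlossaryDocument", List.of("BuildingBlock"),
        "BuildingBlock", List.of("Section"),
        "Section", List.of("Body", "HeaderFooter"),
        "Body", List.of("Paragraph", "Table"),
        "HeaderFooter", List.of("Paragraph", "Table")
    );

    // Returns true when the text lists childType as a direct child of parentType.
    static boolean canContain(String parentType, String childType) {
        return ALLOWED_CHILDREN.getOrDefault(parentType, List.of()).contains(childType);
    }

    public static void main(String[] args) {
        System.out.println(canContain("Document", "Section"));  // true
        System.out.println(canContain("Section", "Paragraph")); // false: paragraphs live in Body or HeaderFooter
    }
}
```

The table only answers "may this type sit directly under that type"; the counts (one Body per Section, at most one GlossaryDocument per Document) are not modelled.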

Block-level Nodes

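As mentioned above, every text story is made up of paragraphs and tables. These are the Block-level Nodes.

Let's start with a diagram to see what this level contains.

Now let's go through the diagram: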

1. Block-level Nodes can appear under many kinds of nodes in a DOM tree. The diagram lists eight: Body, HeaderFooter, Footnote, Comment, Shape, Cell, CustomXmlMarkup, and StructuredDocumentTag.

2. The two most important Block-level Nodes are Table and Paragraph.

3. A Table contains zero or more Row-level Nodes.

4. A Paragraph contains zero or more Inline-level Nodes.

5. CustomXmlMarkup and StructuredDocumentTag can themselves contain nested Block-level Nodes.

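In plain language, Block-level Nodes are the blocks of content in a document: tables and paragraphs. A table holds its content row by row, and a paragraph holds inline content. A block can also be a markup tag, and a tag can in turn wrap further blocks of content.

Hopefully that makes things a bit clearer. Still, to make the code easier to read, it is best to stick to the official terms; you get used to them. If something is really unclear, see the descriptions of the Node classes in the previous post.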

Inline-level Nodes

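As we saw above, a Paragraph contains zero or more Inline-level Nodes. Let's look at them now.

As usual, a diagram first.

Wow, this level has a lot of members! No worries, we will take them one at a time.

Again, let's summarize the diagram: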

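1. Paragraph, SmartTag, CustomXmlMarkup, and StructuredDocumentTag can all contain Inline-level nodes. Of these, Paragraph is by far the most common container.

2. A Paragraph can contain multiple Run nodes, and each Run can have its own formatting.

3. A Paragraph can contain bookmarks: BookmarkStart and BookmarkEnd.

4. A Paragraph can contain annotations: CommentRangeStart, CommentRangeEnd, Comment, and Footnote.

5. A Paragraph can contain Word fields (I am not very familiar with this area; it should be easier to follow if you have used fields in Word): FieldStart, FieldSeparator, FieldEnd, and FormField.

6. A Paragraph can contain shapes, drawings, images, and so on, through Shape and GroupShape nodes.

7. A Paragraph can contain markup tags: SmartTag, CustomXmlMarkup, and StructuredDocumentTag.

A GroupShape can contain Shapes or further nested GroupShapes. Shape, Footnote, and Comment can contain Block-level nodes.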

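So the inline level has many members. Some of them can nest further inline-level members, and some can even contain block-level nodes. The relationships get fairly involved.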

Table, Row and Cell

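In the block-level section we mentioned that a Table contains zero or more Row-level Nodes. Let's look at how tables, rows, and cells relate.

As usual, a diagram first.

And again, let's read it off the diagram.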

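1. A Table can contain many Rows.
2. A Row can contain many Cells.
3. A Cell can in turn contain block-level nodes. Yes, nesting again.
4. CustomXmlMarkup and StructuredDocumentTag are members of the Block-level, Row-level, and Cell-level alike, which means nesting can continue across all of these levels.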

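Viewing the Document Tree

Document Explorer

Since Aspose.Words parses a document into a DOM tree, is there a tool that shows the DOM tree of a given Word document clearly?

We could write one ourselves: parse the document and log each node. That is a perfectly good idea, but it turns out a ready-made tool already exists.

We can look for demos on the official site, which include a Document Explorer: Aspose.Words for Java libs & examples. Unfortunately, at the moment the Android side has only one useful example, DocumentViewer, whose job is to display Word documents. So I had to look at the examples for other platforms. Both .NET and Java ship a Document Explorer, so I set up an environment and ran the Java example. It works reasonably well.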

Tree Nodes

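Let's look at one more UML diagram:

As in any tree structure, every node sits somewhere in the tree and stands in a definite relationship to the nodes around it.

A node that contains other nodes is their parent, and the contained nodes are its children; children of the same parent are siblings. Note that this is about containment in the document tree, not about class inheritance: CompositeNode and Inline both derive from Node in the class hierarchy, but that does not make them child nodes of a Node. Remember that the Document node is always the root of a document's DOM tree.

Siblings are ordered. In the document parsed by Document Explorer above, for instance, the Body has many Paragraph children, and they come in a fixed order. Here is another diagram:

This diagram also shows the ordering among siblings. Body has two children, a Paragraph and a Table, and their order is given by their index in the child collection.

Only classes derived from CompositeNode can act as parents and contain child nodes. Classes that derive from Node without going through CompositeNode cannot contain children.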
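The rule that only composite nodes can hold children is the classic composite pattern. Here is a minimal sketch in plain Java. SketchNode and SketchComposite are hypothetical stand-ins invented for this illustration, not the Aspose.Words classes, and the child list is an assumption about how such a structure could be stored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins, not the Aspose.Words classes.
class SketchNode {
    final String name;
    SketchComposite parent; // null until the node is attached to a parent

    SketchNode(String name) { this.name = name; }
}

// Only the composite subclass owns a child list, so only it can act as a parent.
class SketchComposite extends SketchNode {
    private final List<SketchNode> children = new ArrayList<>();

    SketchComposite(String name) { super(name); }

    void appendChild(SketchNode child) {
        child.parent = this;
        children.add(child); // insertion order is the sibling order
    }

    int indexOf(SketchNode child) { return children.indexOf(child); }

    int childCount() { return children.size(); }
}

class CompositeSketchDemo {
    public static void main(String[] args) {
        SketchComposite body = new SketchComposite("Body");
        SketchComposite paragraph = new SketchComposite("Paragraph");
        SketchComposite table = new SketchComposite("Table");
        body.appendChild(paragraph);
        body.appendChild(table);

        SketchNode run = new SketchNode("Run"); // a leaf: it has no appendChild method
        paragraph.appendChild(run);

        System.out.println(body.indexOf(paragraph)); // 0
        System.out.println(body.indexOf(table));     // 1
        System.out.println(run.parent.name);         // Paragraph
    }
}
```

Because appendChild exists only on the composite class, trying to add a child to a leaf is a compile-time error rather than a runtime surprise.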

Next, let's see how these node relationships show up in code.

Parent Node

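To find a node's parent, use the Node.ParentNode property. A node that has just been created and not yet added to the DOM tree, or one that has been removed from the tree, has no parent, so Node.ParentNode is null. You can detach a node from its parent by calling Node.Remove on the node.

The root node, of course, has no parent.

The following code shows how to get the parent node: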

// Create a new empty document. It has one section.
Document doc = new Document();

// The section is the first child node of the document.
Node section = doc.getFirstChild();

// The section's parent node is the document.
System.out.println("Section parent is the document: " + (doc == section.getParentNode()));

Owner Document

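One point deserves emphasis: a node always belongs to some Document, whether it has just been created or has already been removed from the DOM tree. The Node.Document property tells you which Document that is.

Let's look at an example: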

// Create a new empty document.
Document doc = new Document();

// Creating a new node of any type requires a document passed into the constructor.
Paragraph para = new Paragraph(doc);

// The new paragraph node does not yet have a parent.
System.out.println("Paragraph has no parent node: " + (para.getParentNode() == null));

// But the paragraph node knows its document.
System.out.println("Both nodes' documents are the same: " + (para.getDocument() == doc));

// The fact that a node always belongs to a document allows us to access and modify
// properties that reference the document-wide data such as styles or lists.
para.getParagraphFormat().setStyleName("Heading 1");

// Now add the paragraph to the main text of the first section.
doc.getFirstSection().getBody().appendChild(para);

// The paragraph node is now a child of the Body node.
System.out.println("Paragraph has a parent node: " + (para.getParentNode() != null));

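As the code shows, the Document object is passed in as soon as a node is created, so the node keeps a reference to its owner document. At that point it has no parent yet. It only gets one later, through doc.getFirstSection().getBody().appendChild(para);.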
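The difference between "owner" and "parent" can be modelled in a few lines of plain Java. This is only a sketch under my own assumptions: OwnedNode is a hypothetical class, not Aspose.Words code. The owner is fixed at construction, while the parent link comes and goes as the node is attached and removed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Aspose.Words code: every node stores its owner at construction
// time, while the parent link changes as the node is attached and detached.
class OwnedNode {
    final OwnedNode owner; // fixed for the node's lifetime (the root refers to itself)
    OwnedNode parent;      // null while the node is detached
    final List<OwnedNode> children = new ArrayList<>();

    OwnedNode(OwnedNode owner) { this.owner = (owner == null) ? this : owner; }

    void appendChild(OwnedNode child) {
        child.parent = this;
        children.add(child);
    }

    // Detaches this node from its parent; the owner reference is left untouched.
    void remove() {
        if (parent != null) {
            parent.children.remove(this);
            parent = null;
        }
    }
}

class OwnerSketchDemo {
    public static void main(String[] args) {
        OwnedNode doc = new OwnedNode(null);
        OwnedNode para = new OwnedNode(doc);
        System.out.println(para.parent == null); // true: created but not attached
        doc.appendChild(para);
        System.out.println(para.parent == doc);  // true
        para.remove();
        System.out.println(para.parent == null && para.owner == doc); // true
    }
}
```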

Child Nodes

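The most efficient way to reach child nodes is through the CompositeNode.FirstChild and CompositeNode.LastChild properties. Both return null when there are no children.

CompositeNode also exposes the CompositeNode.ChildNodes collection, which makes enumeration convenient.

Keep in mind that CompositeNode.ChildNodes is a live collection: it is updated whenever a node is added or removed. I will go into this in a later post.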
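Why a live collection matters is easiest to see when you remove items while looping over it. The sketch below uses a plain java.util.ArrayList of strings as a stand-in for a live child collection. It is an assumption-laden illustration of the general pitfall, not a statement about how NodeCollection is implemented, and LiveCollectionSketch is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

// A plain ArrayList stands in for a live child collection (an assumption for illustration).
class LiveCollectionSketch {
    // Forward loop over a live list: after remove(i) the next element slides into slot i
    // and is skipped by i++.
    static List<String> removeRunsForward(List<String> live) {
        for (int i = 0; i < live.size(); i++) {
            if (live.get(i).equals("Run")) {
                live.remove(i);
            }
        }
        return live;
    }

    // Walking backwards keeps the indexes of not-yet-visited elements stable.
    static List<String> removeRunsBackward(List<String> live) {
        for (int i = live.size() - 1; i >= 0; i--) {
            if (live.get(i).equals("Run")) {
                live.remove(i);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        System.out.println(removeRunsForward(new ArrayList<>(List.of("Run", "Run", "Shape"))));  // [Run, Shape]
        System.out.println(removeRunsBackward(new ArrayList<>(List.of("Run", "Run", "Shape")))); // [Shape]
    }
}
```

The forward version leaves one "Run" behind because the collection shrinks under the loop; iterating backwards (or over a snapshot copy) avoids that.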

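To check directly whether a node has children, use the CompositeNode.HasChildNodes property.

The two snippets below show how to enumerate child nodes, first with a for-each loop and then by index: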

// Option 1: enumerate the immediate children with a for-each loop.
NodeCollection children = paragraph.getChildNodes();
for (Node child : (Iterable<Node>) children)
{
    // Paragraph may contain children of various types such as runs, shapes and so on.
    if (child.getNodeType() == NodeType.RUN)
    {
        // Say we found the node that we want, do something useful.
        Run run = (Run)child;
        System.out.println(run.getText());
    }
}

// Option 2 (a separate snippet): the same enumeration using indexed access.
NodeCollection children = paragraph.getChildNodes();
for (int i = 0; i < children.getCount(); i++)
{
    Node child = children.get(i);

    // Paragraph may contain children of various types such as runs, shapes and so on.
    if (child.getNodeType() == NodeType.RUN)
    {
        // Say we found the node that we want, do something useful.
        Run run = (Run)child;
        System.out.println(run.getText());
    }
}

The previous post introduced NodeType; this is exactly the kind of place to use it.

Sibling Nodes

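I said earlier that sibling nodes are ordered, much like brothers and sisters in a family are ordered by birth.

So how do you get the older or younger sibling? Use the Node.PreviousSibling and Node.NextSibling properties. For the last child, Node.NextSibling is null; likewise, for the first child, Node.PreviousSibling is null.

Note that Aspose.Words keeps sibling nodes internally in a singly linked list, so Node.NextSibling is much more efficient than Node.PreviousSibling.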
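To see why a singly linked list makes "next" cheap and "previous" expensive, here is a minimal sketch in plain Java. LinkedNode and LinkedParent are hypothetical classes written for this post; this is not the Aspose.Words source, only one way such a sibling chain could look.

```java
// Hypothetical singly linked sibling chain; this is not Aspose.Words source code.
class LinkedNode {
    final String name;
    LinkedParent parent;
    LinkedNode next; // the only sibling link that is stored

    LinkedNode(String name) { this.name = name; }

    // O(1): follow the stored link.
    LinkedNode nextSibling() { return next; }

    // O(n): start at the parent's first child and walk until the node before this one.
    LinkedNode previousSibling() {
        if (parent == null || parent.first == this) return null;
        LinkedNode cursor = parent.first;
        while (cursor.next != this) cursor = cursor.next;
        return cursor;
    }
}

class LinkedParent {
    LinkedNode first;
    LinkedNode last;

    void appendChild(LinkedNode child) {
        child.parent = this;
        if (first == null) first = child; else last.next = child;
        last = child;
    }
}

class SiblingSketchDemo {
    public static void main(String[] args) {
        LinkedParent body = new LinkedParent();
        LinkedNode a = new LinkedNode("a");
        LinkedNode b = new LinkedNode("b");
        LinkedNode c = new LinkedNode("c");
        body.appendChild(a);
        body.appendChild(b);
        body.appendChild(c);

        System.out.println(b.nextSibling().name);        // c
        System.out.println(b.previousSibling().name);    // a
        System.out.println(a.previousSibling() == null); // true
    }
}
```

With only a forward link stored, a backward loop over n siblings costs on the order of n squared steps in this model, which is why forward traversal is the one to prefer.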

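The traverseAllNodes listing at the end of this post shows how to walk every descendant of a node with these properties.

In that listing, childNode.isComposite() checks whether the node is a CompositeNode.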

Typed Access to Children and Parent

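As you can see in the code examples above, when we enumerate nodes we have to check each node's NodeType and then cast it.

If you dislike that brute-force style, there are alternatives:

1. Parent nodes expose typed FirstXXX and LastXXX properties for their children. For example, Document has Document.FirstSection and Document.LastSection, and Table has Table.FirstRow and Table.LastRow.
2. Parent nodes also expose typed child collections, such as Document.Sections and Body.Paragraphs.
3. Child nodes expose typed parents, such as Run.ParentParagraph and Paragraph.ParentSection.

The listing below is the recursive traversal referred to in the Sibling Nodes section. It walks the tree with getFirstChild() and getNextSibling() and casts only when it descends into a CompositeNode: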

public void recurseAllNodes() throws Exception
{
    // Open a document.
    Document doc = new Document(getMyDir() + "Node.RecurseAllNodes.doc");

    // Invoke the recursive function that will walk the tree.
    traverseAllNodes(doc);
}

/**
 * A simple function that will walk through all children of a specified node recursively
 * and print the type of each node to the screen.
 */
public void traverseAllNodes(CompositeNode parentNode) throws Exception
{
    // This is the most efficient way to loop through immediate children of a node.
    for (Node childNode = parentNode.getFirstChild(); childNode != null; childNode = childNode.getNextSibling())
    {
        // Do some useful work.
        System.out.println(Node.nodeTypeToString(childNode.getNodeType()));

        // Recurse into the node if it is a composite node.
        if (childNode.isComposite())
            traverseAllNodes((CompositeNode)childNode);
    }
}
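The typed accessors described in this section (FirstXXX, typed collections, ParentXXX) save you the NodeType check and the cast. A rough way to picture the idea is the generic helper below, in plain Java. All names in it (TypedNode, firstChildOfType, ancestorOfType, and the four small subclasses) are hypothetical and invented for illustration. It is not how Aspose.Words implements its properties, and ancestorOfType simply returns the nearest matching ancestor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of typed access; not the Aspose.Words API.
class TypedNode {
    TypedNode parent;
    final List<TypedNode> children = new ArrayList<>();

    // Attaches the child and returns it with its static type preserved.
    <T extends TypedNode> T append(T child) {
        child.parent = this;
        children.add(child);
        return child;
    }

    // Typed counterpart of "first child": first direct child of the requested class, or null.
    <T extends TypedNode> T firstChildOfType(Class<T> type) {
        for (TypedNode child : children) {
            if (type.isInstance(child)) return type.cast(child);
        }
        return null;
    }

    // Typed counterpart of "parent": nearest ancestor of the requested class, or null.
    <T extends TypedNode> T ancestorOfType(Class<T> type) {
        for (TypedNode n = parent; n != null; n = n.parent) {
            if (type.isInstance(n)) return type.cast(n);
        }
        return null;
    }
}

class SectionNode extends TypedNode {}
class BodyNode extends TypedNode {}
class ParagraphNode extends TypedNode {}
class RunNode extends TypedNode {}

class TypedAccessDemo {
    public static void main(String[] args) {
        SectionNode section = new SectionNode();
        BodyNode body = section.append(new BodyNode());
        ParagraphNode para = body.append(new ParagraphNode());
        RunNode run = para.append(new RunNode());

        ParagraphNode first = body.firstChildOfType(ParagraphNode.class);     // no cast at the call site
        System.out.println(first == para);                                    // true
        System.out.println(run.ancestorOfType(SectionNode.class) == section); // true
    }
}
```

The cast still happens, but in one place inside the helper, so calling code stays free of type checks.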

That's all for the DOM structure of Aspose.Words. The next post, "Aspose.Words Programming Guide: Working with Document", takes a closer look at the most central Node class: Document.