Generating XML with xml.dom.minidom: fixing attribute order and putting the XML declaration on its own line

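1. Problem description

Attributes are not written in the order they were set (minidom sorts them by name), and the XML declaration shares a line with the root element.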

# cat HKEX-EPS_20180830_003249795.xml

<?xml version="1.0" encoding="UTF-8"?><ETCML><IISHeadline><News Encoding="UTF-8" Language="en-us" TimeStamp="20180830194015"><NewsID>2468438</NewsID><NewsDate>20180830194015</NewsDate><ProviderID>HKEX-EPS</ProviderID><Type>AMENDED</Type><Language>en-us</Language><HeadlineTChi></HeadlineTChi><HeadlineSChi></HeadlineSChi><HeadlineEng>CHANGE OF COMPANY NAME,STOCK SHORT NAME AND COMPANY LOGO</HeadlineEng><ExpiryDate>20180831</ExpiryDate><MktCode>MAIN</MktCode><Cancel>false</Cancel><AttachmentList Total="1"><Attachement><FilePath>HKEX-EPS_20180830_003249795_0.PDF</FilePath><FileContentType>APPLICATION/PDF</FileContentType><FileSize>521386</FileSize></Attachement></AttachmentList><AnnouncementTypeList Total="4"><AnnTypeCd>12700</AnnTypeCd><AnnTypeCd>19790</AnnTypeCd><AnnTypeCd>10000</AnnTypeCd><AnnTypeCd>18540</AnnTypeCd></AnnouncementTypeList><RelatedStockList Total="1"><RelatedStock><Code>1400</Code><NameTChi>?地科技股份</NameTChi><NameSChi>滿地科技股份</NameSChi><NameEng>MOODY TECH HLDG</NameEng></RelatedStock></RelatedStockList></News></IISHeadline><Product>ET Net IIS Category List</Product><Provider>ET Net Ltd</Provider><Copyright>?2018 ET Net Limited. All rights reserved.</Copyright></ETCML>

 

Desired result:

cat HKEX-EPS_20180830_003249795.xml

<?xml version="1.0" encoding="UTF-8"?>

<ETCML><IISHeadline><News TimeStamp="20180830194015" Encoding="UTF-8" Language="en-us"><NewsID>2468438</NewsID><NewsDate>20180830194015</NewsDate><ProviderID>HKEX-EPS</ProviderID><Type>AMENDED</Type><Language>en-us</Language><HeadlineTChi></HeadlineTChi><HeadlineSChi></HeadlineSChi><HeadlineEng>CHANGE OF COMPANY NAME,STOCK SHORT NAME AND COMPANY LOGO</HeadlineEng><ExpiryDate>20180831</ExpiryDate><MktCode>MAIN</MktCode><Cancel>false</Cancel><AttachmentList Total="1"><Attachement><FilePath>HKEX-EPS_20180830_003249795_0.PDF</FilePath><FileContentType>APPLICATION/PDF</FileContentType><FileSize>521386</FileSize></Attachement></AttachmentList><AnnouncementTypeList Total="4"><AnnTypeCd>12700</AnnTypeCd><AnnTypeCd>19790</AnnTypeCd><AnnTypeCd>10000</AnnTypeCd><AnnTypeCd>18540</AnnTypeCd></AnnouncementTypeList><RelatedStockList Total="1"><RelatedStock><Code>1400</Code><NameTChi>?地科技股份</NameTChi><NameSChi>滿地科技股份</NameSChi><NameEng>MOODY TECH HLDG</NameEng></RelatedStock></RelatedStockList></News></IISHeadline><Product>ET Net IIS Category List</Product><Provider>ET Net Ltd</Provider><Copyright>?2018 ET Net Limited. All rights reserved.</Copyright></ETCML>

 

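2. Steps

2.1 Environment

The system ships with Python 2.6.6; it was upgraded to Python 2.7.10.

If Python has not been upgraded to 2.7, check the module search path: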

>>> import sys

>>> sys.path

In that case the module lives under /usr/lib64/python2.6/xml/dom.

The module used is:

import xml.dom.minidom
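To confirm which copy of minidom the interpreter actually loads before editing anything, the module's own `__file__` attribute can be inspected. This is only a small sketch; the helper name `minidom_location` is made up for illustration.

```python
import sys
import xml.dom.minidom


def minidom_location():
    """Return the running Python (major, minor) version and the path of the loaded minidom module."""
    return tuple(sys.version_info[:2]), xml.dom.minidom.__file__


version, path = minidom_location()
print("Python %d.%d loads minidom from %s" % (version[0], version[1], path))
```

The printed path is the file that the changes in the following sections would have to be applied to.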

 

2.2 Putting the XML declaration on its own line

# cd /usr/local/lib/python2.7/xml/dom/

Original code (Document.writexml in minidom.py):

    def writexml(self, writer, indent="", addindent="", newl="",
                 encoding = None):
        if encoding is None:
            writer.write('<?xml version="1.0" ?>'+newl)
        else:
            writer.write('<?xml version="1.0" encoding="%s"?>%s' % (encoding, newl))
        for node in self.childNodes:
            node.writexml(writer, indent, addindent, newl)

Modified code (the declaration is always followed by '\n', regardless of newl):

    def writexml(self, writer, indent="", addindent="", newl="",
                 encoding = None):
        if encoding is None:
            writer.write('<?xml version="1.0" ?>'+'\n')
        else:
            writer.write('<?xml version="1.0" encoding="%s"?>%s' % (encoding, '\n'))
        for node in self.childNodes:
            node.writexml(writer, indent, addindent, newl)
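If editing the standard library is not an option, roughly the same effect can be had by writing the declaration yourself and serializing only the root element. The following is a sketch rather than the method used in this post: the helper name `serialize_with_declaration_line` is hypothetical, it assumes Python 3, and it drops any comments or processing instructions that sit outside the root element.

```python
from xml.dom import minidom


def serialize_with_declaration_line(doc, encoding="UTF-8"):
    """Return the document as bytes, with the XML declaration on a line of its own."""
    declaration = '<?xml version="1.0" encoding="%s"?>\n' % encoding
    # Serializing the root element alone yields no declaration,
    # so the one built above is the only one in the output.
    body = doc.documentElement.toxml()
    return (declaration + body).encode(encoding)


doc = minidom.parseString(
    "<ETCML><Product>ET Net IIS Category List</Product></ETCML>"
)
output = serialize_with_declaration_line(doc)
print(output.decode("UTF-8"))
```

The rest of the document stays on a single line, as in the desired result shown in section 1.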

 

2.3 Preserving attribute order

Original code (Element.__init__ and Element.writexml):

    def __init__(self, tagName, namespaceURI=EMPTY_NAMESPACE, prefix=None,
                 localName=None):
        self.tagName = self.nodeName = tagName
        self.prefix = prefix
        self.namespaceURI = namespaceURI
        self.childNodes = NodeList()

        self._attrs = {}   # attributes are double-indexed:
        self._attrsNS = {} #    tagName -> Attribute
                           #    URI,localName -> Attribute
                           # in the future: consider lazy generation
                           # of attribute objects this is too tricky
                           # for now because of headaches with
                           # namespaces.
......
    def writexml(self, writer, indent="", addindent="", newl=""):
        # indent = current indentation
        # addindent = indentation to add to higher levels
        # newl = newline string
        writer.write(indent+"<" + self.tagName)
        attrs = self._get_attributes()
        a_names = attrs.keys()
        a_names.sort()

 

Modified code (attributes are stored in an OrderedDict and no longer sorted on output):

    # Requires "from collections import OrderedDict" at the top of minidom.py.
    def __init__(self, tagName, namespaceURI=EMPTY_NAMESPACE, prefix=None,
                 localName=None):
        self.tagName = self.nodeName = tagName
        self.prefix = prefix
        self.namespaceURI = namespaceURI
        self.childNodes = NodeList()
        #self._attrs = {}   # attributes are double-indexed:
        self._attrs = OrderedDict()   # attributes are double-indexed:
        self._attrsNS = {} #    tagName -> Attribute
                           #    URI,localName -> Attribute
                           # in the future: consider lazy generation
                           # of attribute objects this is too tricky
                           # for now because of headaches with
                           # namespaces.
......
    def writexml(self, writer, indent="", addindent="", newl=""):
        # indent = current indentation
        # addindent = indentation to add to higher levels
        # newl = newline string
        writer.write(indent+"<" + self.tagName)
        attrs = self._get_attributes()
        a_names = attrs.keys()
        # Sorting disabled so attributes keep their insertion order.
        #a_names.sort()
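For comparison, on Python 3.8 and later minidom's writexml() already preserves the order in which attributes were set, so no patch is needed there. A minimal sketch, assuming Python 3.8+ and reusing the attribute names from the sample file:

```python
from xml.dom import minidom

doc = minidom.Document()
news = doc.createElement("News")
# Attributes are set in the order they should appear in the output.
news.setAttribute("TimeStamp", "20180830194015")
news.setAttribute("Encoding", "UTF-8")
news.setAttribute("Language", "en-us")
doc.appendChild(news)

xml_text = news.toxml()
print(xml_text)
```

On older interpreters, including the unpatched Python 2.7 described here, the same code would emit the attributes sorted by name instead.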

 

 

3. Summary

Tested and confirmed to work.

