XMLFeedSpider

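This spider is designed for parsing XML feeds by iterating over them node by node, using a given node name. The iterator can be iternodes, xml, or html. The xml and html iterators have to read the whole DOM at once before they can parse it, which hurts performance, so iternodes is the recommended choice. The html iterator can still be helpful when the XML contains bad markup.
To set the iterator and the tag name, XMLFeedSpider requires the following class attributes: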
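iterator: the iterator to use. Its value can be:
- 'iternodes' - a fast iterator based on regular expressions.
- 'html' - an iterator that uses Selector. It relies on DOM parsing and must load the whole DOM into memory, which can be a problem for large feeds.
- 'xml' - an iterator that uses Selector. It relies on DOM parsing and must load the whole DOM into memory, which can be a problem for large feeds.
The default value is 'iternodes'.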
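
The memory trade-off between node-by-node iteration and DOM parsing can be illustrated outside Scrapy. The sketch below uses only Python's standard xml.etree.ElementTree module, which is an assumption made for illustration: Scrapy's iternodes iterator is regex based and is not implemented this way. The sample feed and the function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample feed, used only for this illustration.
FEED = b"""<?xml version="1.0"?>
<products>
  <product id="1"><name>Tea</name></product>
  <product id="2"><name>Coffee</name></product>
</products>"""


def iter_streaming(data, tag):
    """Yield (id, name) pairs one node at a time while the document is parsed."""
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == tag:
            yield elem.get("id"), elem.findtext("name")
            elem.clear()  # release the node's children once it has been handled


def iter_dom(data, tag):
    """Build the whole tree in memory first, then walk it."""
    root = ET.fromstring(data)
    for elem in root.iter(tag):
        yield elem.get("id"), elem.findtext("name")


streamed = list(iter_streaming(FEED, "product"))
dom = list(iter_dom(FEED, "product"))
```

Both functions yield the same pairs. The difference is that the DOM variant holds the entire tree in memory before the first node is produced, which is the cost attributed to the xml and html iterators.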

itertag
The name of the node (or element) to iterate over. For example:

itertag = 'product'

namespaces
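A list of (prefix, uri) tuples that define the namespaces available in the documents this spider will process. Each prefix and uri is registered automatically through the register_namespace method.
The itertag value can then refer to a node (or element) through one of these namespace prefixes. For example: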

class YourSpider(XMLFeedSpider):
    namespaces = [('n', 'http://www.sitemaps.org/schemas/sitemap/0.9')]
    itertag = 'n:url'
    # ...
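
To see why the prefix matters, here is a minimal sketch of how a prefix-to-URI mapping resolves namespaced nodes. It uses the standard library's xml.etree.ElementTree rather than Scrapy's Selector and register_namespace, so treat it as an analogy; the sample sitemap is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap that declares a default namespace.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/a</loc></url>
  <url><loc>http://www.example.com/b</loc></url>
</urlset>"""

# Same (prefix, uri) pair as in the spider above, expressed as a dict.
NAMESPACES = {"n": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(SITEMAP)

# A bare tag name does not match elements that live in a namespace.
without_prefix = root.findall("url")

# With the prefix mapped to the URI, the nodes are found.
with_prefix = root.findall("n:url", NAMESPACES)
locs = [url.findtext("n:loc", namespaces=NAMESPACES) for url in with_prefix]
```

The unprefixed lookup comes back empty, while the prefixed one finds both url nodes.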

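Besides these attributes, the spider has the following overridable methods:
adapt_response(response)
A method that receives the response as soon as it arrives from the spider middleware, before the spider starts parsing it. It can be used to modify the response body before parsing. It must return a response.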
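
As one possible use of adapt_response, the sketch below cleans a feed body before it is parsed. The helper clean_feed_body is hypothetical and not part of Scrapy; it is kept as a plain function so it can be tried without a running crawl. The commented lines show how it could be wired into a spider, assuming Scrapy's Response.replace is used to build the modified response.

```python
# Hypothetical helper (not part of Scrapy): remove a UTF-8 byte order mark
# and any bytes that precede the first "<" in the feed body.
_UTF8_BOM = b"\xef\xbb\xbf"


def clean_feed_body(body: bytes) -> bytes:
    if body.startswith(_UTF8_BOM):
        body = body[len(_UTF8_BOM):]
    start = body.find(b"<")
    # Only trim when there is leading junk; leave empty or non-XML bodies alone.
    return body[start:] if start > 0 else body


# Inside an XMLFeedSpider subclass the helper could be wired in like this
# (requires Scrapy; Response.replace returns a copy with the given fields changed):
#
#     def adapt_response(self, response):
#         return response.replace(body=clean_feed_body(response.body))

cleaned = clean_feed_body(b"\xef\xbb\xbf\n  <rss/>")
```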

parse_node(response, selector)
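This method is called for every node (or element) that matches the provided tag name. It receives the response and a Selector for the node. Overriding this method is mandatory; otherwise the spider will not work. It must return an Item object, a Request object, or an iterable containing either of them.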

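process_results(response, results)
This method is called for each result (Item or Request) returned by the spider. Its purpose is to do any last-minute processing before the results reach the framework core, for example setting the item IDs. It receives a list of results and the response that produced them, and it must return a list of results (Items or Requests).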
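
A minimal sketch of the "set the item ID" idea follows. It assumes that items are plain dicts and that an ID derived from the name field is acceptable; both are assumptions for illustration, and with_stable_ids is a hypothetical helper, not a Scrapy function. The ID is computed from the item itself, so the helper does not depend on how often process_results is called.

```python
import hashlib


def with_stable_ids(results, key="name"):
    """Return the results as a list, adding an 'id' to dict items that lack one."""
    processed = []
    for result in results:
        if isinstance(result, dict) and "id" not in result:
            raw = str(result.get(key, "")).encode("utf-8")
            # Copy the item instead of mutating the caller's object.
            result = {**result, "id": hashlib.sha1(raw).hexdigest()[:12]}
        processed.append(result)  # non-dict results (e.g. Requests) pass through
    return processed


# Inside an XMLFeedSpider subclass (requires Scrapy):
#
#     def process_results(self, response, results):
#         return with_stable_ids(results)

sample = [{"name": "Tea"}, {"id": 7, "name": "Coffee"}, "not-an-item"]
processed = with_stable_ids(sample)
```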
Example:

from scrapy.spiders import XMLFeedSpider
from myproject.items import TestItem

class MySpider(XMLFeedSpider):
    name = 'example.com'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com/feed.xml']
    iterator = 'iternodes' # This is actually unnecessary, since it's the default value
    itertag = 'item'

    def parse_node(self, response, node):
        self.logger.info('Hi, this is a <%s> node!: %s', self.itertag, ''.join(node.extract()))
        item = TestItem()
        item['id'] = node.xpath('@id').extract()
        item['name'] = node.xpath('name').extract()
        item['description'] = node.xpath('description').extract()
        return item
