DPDK (14): rte_mbuf

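This article describes how the rte_mbuf and rte_mempool data structures are organized in relation to each other, and how data received by the NIC ends up stored in an rte_mbuf.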

 

1. Memory layout of rte_mbuf, rte_mempool, and the packets received by the NIC

      

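When rte_mempool_create() is called to create an rte_mempool, the caller specifies how many rte_mbuf elements to allocate and the elt_size of each one. elt_size is the size of a whole element: as the initialization code later in this section shows, it covers struct rte_mbuf itself plus the buffer pre-allocated for packets received by the NIC, and rte_mbuf->pkt.data points into that buffer. The layout is shown in the figure above.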

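Within the memory block allocated for the rte_mempool, the struct rte_mempool data structure comes first, followed immediately by the rte_pktmbuf_pool_private data, and then by the N rte_mbuf memory blocks.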
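As a rough illustration of the layout just described, the sketch below computes where each element sits inside a pool. It is a simplified model rather than DPDK code: struct toy_mempool, toy_pool_bytes(), toy_pool_create(), and toy_pool_elt() are hypothetical names, and the sketch assumes the elements are packed back to back with no per-object headers, padding, or alignment, all of which a real rte_mempool may add.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct rte_mempool. */
struct toy_mempool {
    uint32_t elt_size;      /* size of one element (mbuf header + buffer) */
    uint32_t private_size;  /* size of the private area after the header */
    uint32_t n;             /* number of elements in the pool */
};

/* Total bytes needed: pool header + private area + n packed elements. */
static size_t toy_pool_bytes(uint32_t n, uint32_t elt_size,
                             uint32_t private_size)
{
    return sizeof(struct toy_mempool) + private_size +
           (size_t)n * elt_size;
}

/* Allocate one contiguous, zeroed block holding the whole pool. */
static struct toy_mempool *toy_pool_create(uint32_t n, uint32_t elt_size,
                                           uint32_t private_size)
{
    struct toy_mempool *mp = calloc(1, toy_pool_bytes(n, elt_size,
                                                      private_size));
    if (mp == NULL)
        return NULL;
    mp->elt_size = elt_size;
    mp->private_size = private_size;
    mp->n = n;
    return mp;
}

/* Address of element i: skip the header, the private area, and i elements. */
static void *toy_pool_elt(struct toy_mempool *mp, uint32_t i)
{
    assert(i < mp->n);
    return (char *)mp + sizeof(*mp) + mp->private_size +
           (size_t)i * mp->elt_size;
}
```

With a 64-byte private area and an elt_size of 2048, for instance, toy_pool_elt(mp, 3) lands exactly 3 * 2048 bytes after toy_pool_elt(mp, 0). The caller releases the whole pool with a single free(mp).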

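Within each rte_mbuf memory block, the struct rte_mbuf data structure likewise comes first, followed by RTE_PKTMBUF_HEADROOM bytes of headroom, and finally the actual data received by the NIC, as the following initialization code shows: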

    struct rte_mbuf *m = _m;
    uint32_t buf_len = mp->elt_size - sizeof(struct rte_mbuf);

    RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct rte_mbuf));

    memset(m, 0, mp->elt_size);

    /* start of buffer is just after mbuf structure */
    m->buf_addr = (char *)m + sizeof(struct rte_mbuf);
    m->buf_physaddr = rte_mempool_virt2phy(mp, m) +
            sizeof(struct rte_mbuf);
    m->buf_len = (uint16_t)buf_len;

    /* keep some headroom between start of buffer and data */
    m->pkt.data = (char*) m->buf_addr + RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len);

    /* init some constant fields */
    m->type = RTE_MBUF_PKT;
    m->pool = mp;
    m->pkt.nb_segs = 1;
    m->pkt.in_port = 0xff;
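To make the arithmetic in the listing above easier to follow, here is a minimal standalone sketch of the same per-element layout. It is an illustration under stated assumptions, not the DPDK implementation: struct toy_mbuf, toy_mbuf_init(), toy_mbuf_alloc(), and toy_mbuf_data_room() are hypothetical names, TOY_HEADROOM is an assumed value standing in for RTE_PKTMBUF_HEADROOM, and physical addresses are left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_HEADROOM 128u  /* assumed stand-in for RTE_PKTMBUF_HEADROOM */

/* Hypothetical, heavily reduced stand-in for struct rte_mbuf. */
struct toy_mbuf {
    void    *buf_addr;  /* start of the buffer, right after this header */
    void    *data;      /* start of packet data (pkt.data in the listing) */
    uint16_t buf_len;   /* bytes in the element after the header */
};

/* Lay out one element: header, then headroom, then packet data. */
static void toy_mbuf_init(void *elt, uint32_t elt_size)
{
    struct toy_mbuf *m = elt;
    uint32_t buf_len, headroom;

    assert(elt_size >= sizeof(struct toy_mbuf));
    buf_len = elt_size - (uint32_t)sizeof(struct toy_mbuf);
    assert(buf_len <= UINT16_MAX);
    /* Same clamp as RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len). */
    headroom = buf_len < TOY_HEADROOM ? buf_len : TOY_HEADROOM;

    memset(m, 0, elt_size);
    m->buf_addr = (char *)m + sizeof(struct toy_mbuf);
    m->buf_len = (uint16_t)buf_len;
    m->data = (char *)m->buf_addr + headroom;
}

/* Allocate and initialize a single element of elt_size bytes. */
static struct toy_mbuf *toy_mbuf_alloc(uint32_t elt_size)
{
    void *elt;

    if (elt_size < sizeof(struct toy_mbuf) ||
        elt_size - sizeof(struct toy_mbuf) > UINT16_MAX)
        return NULL;
    elt = malloc(elt_size);
    if (elt != NULL)
        toy_mbuf_init(elt, elt_size);
    return elt;
}

/* Bytes left for packet data once the headroom is subtracted. */
static uint16_t toy_mbuf_data_room(const struct toy_mbuf *m)
{
    return (uint16_t)(m->buf_len -
                      ((char *)m->data - (char *)m->buf_addr));
}
```

The point of the sketch is that elt_size is split three ways: the header, the headroom, and whatever remains for packet data. If elt_size is too small to hold the full headroom, the clamp leaves no data room at all.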

 

2. How is data received by the NIC stored in an rte_mbuf?

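Take the e1000 NIC as an example. During NIC initialization, eth_igb_rx_init() is called to initialize the NIC's RX queues. Each RX queue is described by the following data structure: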

/**
 * Structure associated with each RX queue.
 */
struct igb_rx_queue {
    struct rte_mempool  *mb_pool;   /**< mbuf pool to populate RX ring. */
    volatile union e1000_adv_rx_desc *rx_ring; /**< RX ring virtual address. */
    uint64_t            rx_ring_phys_addr; /**< RX ring DMA address. */
    volatile uint32_t   *rdt_reg_addr; /**< RDT register address. */
    volatile uint32_t   *rdh_reg_addr; /**< RDH register address. */
    struct igb_rx_entry *sw_ring;   /**< address of RX software ring. */
    struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
    struct rte_mbuf *pkt_last_seg;  /**< Last segment of current packet. */
    uint16_t            nb_rx_desc; /**< number of RX descriptors. */
    uint16_t            rx_tail;    /**< current value of RDT register. */
    uint16_t            nb_rx_hold; /**< number of held free RX desc. */
    uint16_t            rx_free_thresh; /**< max free RX desc to hold. */
    uint16_t            queue_id;   /**< RX queue index. */
    uint16_t            reg_idx;    /**< RX queue register index. */
    uint8_t             port_id;    /**< Device port identifier. */
    uint8_t             pthresh;    /**< Prefetch threshold register. */
    uint8_t             hthresh;    /**< Host threshold register. */
    uint8_t             wthresh;    /**< Write-back threshold register. */
    uint8_t             crc_len;    /**< 0 if CRC stripped, 4 otherwise. */
    uint8_t             drop_en;  /**< If not 0, set SRRCTL.Drop_En. */
};

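We focus on only two members: rx_ring and sw_ring. rx_ring is an array of union e1000_adv_rx_desc; each descriptor holds the DMA address at which the NIC should store received data, and when a packet arrives the NIC writes directly to that address. sw_ring is an array that records the address of each rte_mbuf. For each rte_mbuf, the DMA address corresponding to rte_mbuf->buf_physaddr + RTE_PKTMBUF_HEADROOM is stored in the matching union e1000_adv_rx_desc in rx_ring, and that value is the physical address of the location rte_mbuf->pkt.data points to. At this point the rte_mbuf, rte_mbuf->pkt.data, and the NIC's RX queue are all tied together, as follows: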

static int
igb_alloc_rx_queue_mbufs(struct igb_rx_queue *rxq)
{
    struct igb_rx_entry *rxe = rxq->sw_ring;
    uint64_t dma_addr;
    unsigned i;

    /* Initialize software ring entries. */
    for (i = 0; i < rxq->nb_rx_desc; i++) {
        volatile union e1000_adv_rx_desc *rxd;
        struct rte_mbuf *mbuf = rte_rxmbuf_alloc(rxq->mb_pool);

        if (mbuf == NULL) {
            PMD_INIT_LOG(ERR, "RX mbuf alloc failed "
                "queue_id=%hu\n", rxq->queue_id);
            return (-ENOMEM);
        }
        dma_addr =
            rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mbuf));
        rxd = &rxq->rx_ring[i];
        rxd->read.hdr_addr = dma_addr;
        rxd->read.pkt_addr = dma_addr;
        rxe[i].mbuf = mbuf;
    }

    return 0;
}
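The relationship between the two rings can be reproduced without any hardware. The sketch below mirrors the loop above using plain C types. Everything in it is hypothetical: the toy_* names are invented for illustration, TOY_HEADROOM is an assumed value, and the "physical" address is simply the buffer's virtual address cast to an integer, which is not how real DMA addresses are obtained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_HEADROOM  128u   /* assumed stand-in for RTE_PKTMBUF_HEADROOM */
#define TOY_DATA_ROOM 2048u  /* assumed data area size */

/* Hypothetical reduced mbuf: a fake physical address plus its buffer. */
struct toy_mbuf {
    uint64_t      buf_physaddr;
    unsigned char buf[TOY_HEADROOM + TOY_DATA_ROOM];
};

/* What the NIC reads: only the address it should DMA into. */
struct toy_rx_desc {
    uint64_t pkt_addr;
};

/* What software keeps: the mbuf that owns the buffer in the same slot. */
struct toy_rx_entry {
    struct toy_mbuf *mbuf;
};

struct toy_rx_queue {
    struct toy_rx_desc  *rx_ring;
    struct toy_rx_entry *sw_ring;
    uint16_t             nb_rx_desc;
};

/* Assumption: identity mapping, so "physical" == virtual address. */
static struct toy_mbuf *toy_mbuf_alloc(void)
{
    struct toy_mbuf *m = calloc(1, sizeof(*m));
    if (m != NULL)
        m->buf_physaddr = (uint64_t)(uintptr_t)m->buf;
    return m;
}

/*
 * Fill every slot: slot i of rx_ring gets the data address of the mbuf
 * stored in slot i of sw_ring. On failure the mbufs allocated so far
 * stay in sw_ring for the caller to free.
 */
static int toy_alloc_rx_queue_mbufs(struct toy_rx_queue *rxq)
{
    for (uint16_t i = 0; i < rxq->nb_rx_desc; i++) {
        struct toy_mbuf *m = toy_mbuf_alloc();
        if (m == NULL)
            return -1;
        rxq->rx_ring[i].pkt_addr = m->buf_physaddr + TOY_HEADROOM;
        rxq->sw_ring[i].mbuf = m;
    }
    return 0;
}
```

After the call, the same index selects both views of a slot: rx_ring[i] is the address the hardware sees, and sw_ring[i] is the software handle for the buffer at that address.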

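When the NIC receives data, it writes to the DMA addresses given in rx_ring, which means it is writing into each rte_mbuf->pkt.data. When the application calls rte_eth_rx_burst() to receive packets, the call ends up, for the e1000 NIC, in eth_igb_recv_pkts(). For each RX queue, that function takes the filled rte_mbuf out of sw_ring, allocates a new rte_mbuf to take its place in the ring, and re-associates the new rte_mbuf, the union e1000_adv_rx_desc, the sw_ring entry, and the DMA address of rte_mbuf->pkt.data. The simplified diagram below illustrates this.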
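The take-and-replace step can be sketched as a small receive loop. This is a simplified model under stated assumptions, not the body of eth_igb_recv_pkts(): all toy_* names are hypothetical, TOY_DESC_DONE is an invented status flag, the descriptor uses separate fields instead of the driver's read/write-back union, and the RDT register update and rx_free_thresh handling are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_HEADROOM  128u   /* assumed stand-in for RTE_PKTMBUF_HEADROOM */
#define TOY_DATA_ROOM 2048u  /* assumed data area size */
#define TOY_DESC_DONE 0x1u   /* hypothetical "NIC filled this slot" flag */

struct toy_mbuf {
    uint64_t      buf_physaddr;
    uint16_t      data_len;
    unsigned char buf[TOY_HEADROOM + TOY_DATA_ROOM];
};

struct toy_rx_desc {
    uint64_t pkt_addr;  /* where the NIC writes the packet */
    uint16_t length;    /* written back by the NIC */
    uint16_t status;    /* written back by the NIC */
};

struct toy_rx_entry { struct toy_mbuf *mbuf; };

struct toy_rx_queue {
    struct toy_rx_desc  *rx_ring;
    struct toy_rx_entry *sw_ring;
    uint16_t             nb_rx_desc;
    uint16_t             rx_tail;  /* next slot software will check */
};

/* Assumption: identity mapping, so "physical" == virtual address. */
static struct toy_mbuf *toy_mbuf_alloc(void)
{
    struct toy_mbuf *m = calloc(1, sizeof(*m));
    if (m != NULL)
        m->buf_physaddr = (uint64_t)(uintptr_t)m->buf;
    return m;
}

/* Hand filled mbufs to the caller and re-arm each slot with a new one. */
static uint16_t toy_recv_pkts(struct toy_rx_queue *rxq,
                              struct toy_mbuf **rx_pkts, uint16_t nb_pkts)
{
    uint16_t nb_rx = 0;

    while (nb_rx < nb_pkts) {
        struct toy_rx_desc *rxd = &rxq->rx_ring[rxq->rx_tail];
        struct toy_rx_entry *rxe = &rxq->sw_ring[rxq->rx_tail];
        struct toy_mbuf *nmb;

        if (!(rxd->status & TOY_DESC_DONE))
            break;  /* the NIC has not filled this slot yet */
        nmb = toy_mbuf_alloc();
        if (nmb == NULL)
            break;  /* leave the filled mbuf in the ring for later */

        /* The filled mbuf leaves the ring and goes to the caller. */
        rxe->mbuf->data_len = rxd->length;
        rx_pkts[nb_rx++] = rxe->mbuf;

        /* The new mbuf takes over the slot in both rings. */
        rxe->mbuf = nmb;
        rxd->pkt_addr = nmb->buf_physaddr + TOY_HEADROOM;
        rxd->length = 0;
        rxd->status = 0;

        rxq->rx_tail = (uint16_t)((rxq->rx_tail + 1) % rxq->nb_rx_desc);
    }
    return nb_rx;
}
```

Allocating the replacement before handing out the filled mbuf is a deliberate choice in this sketch: if allocation fails, the slot is left untouched and the packet can still be collected on a later call. Ownership of each returned mbuf passes to the caller, who frees it when done.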

    

 

If you spot any errors, please point them out.

When reposting, please credit the source: http://www.cnblogs.com/MerlinJ/p/4284706.html
