Basic flow for adding a device to DPDK

    This article uses an ixgbe device as an example to outline how a device is added to DPDK.

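1. A program that uses DPDK (such as OVS) calls rte_dev_probe to register a device with DPDK. The core handler behind rte_dev_probe is local_dev_probe, which covers matching the device to its bus, mapping the PCI device's BAR space, and finally attaching the ixgbe driver to the device. The function is shown below: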

int
local_dev_probe(const char *devargs, struct rte_device **new_dev)
{
	struct rte_device *dev;
	struct rte_devargs *da;
	int ret;

	*new_dev = NULL;
	da = calloc(1, sizeof(*da));
	if (da == NULL)
		return -ENOMEM;

	/* find the bus this device belongs to (the PCI bus) */
	ret = rte_devargs_parse(da, devargs);
	if (ret)
		goto err_devarg;

	if (da->bus->plug == NULL) {
		RTE_LOG(ERR, EAL, "Function plug not supported by bus (%s)\n",
			da->bus->name);
		ret = -ENOTSUP;
		goto err_devarg;
	}

	ret = rte_devargs_insert(&da);
	if (ret)
		goto err_devarg;

	/* the rte_devargs will be referenced in the matching rte_device */
	/* for a PCI device this calls rte_pci_scan, which adds the device to the bus */
	ret = da->bus->scan();
	if (ret)
		goto err_devarg;

	dev = da->bus->find_device(NULL, cmp_dev_name, da->name);
	if (dev == NULL) {
		RTE_LOG(ERR, EAL, "Cannot find device (%s)\n",
			da->name);
		ret = -ENODEV;
		goto err_devarg;
	}
	/* Since there is a matching device, it is now its responsibility
	 * to manage the devargs we've just inserted. From this point
	 * those devargs shouldn't be removed manually anymore.
	 */
	/* map the device's BAR resources and find the matching driver */
	ret = dev->bus->plug(dev);
	if (ret && !rte_dev_is_probed(dev)) { /* if hasn't ever succeeded */
		RTE_LOG(ERR, EAL, "Driver cannot attach the device (%s)\n",
			dev->name);
		return ret;
	}

	*new_dev = dev;
	return ret;

err_devarg:
	if (rte_devargs_remove(da) != 0) {
		free(da->args);
		free(da);
	}
	return ret;
}
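
The ordering in local_dev_probe (check that the bus supports plug, scan, find the device by name, then plug) can be seen in isolation with a small mock. This is only a sketch: struct demo_bus, struct demo_device and every demo_* function are hypothetical stand-ins invented for illustration, not DPDK types, and devargs handling is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for rte_device and rte_bus. */
struct demo_device {
	const char *name;
	int probed;                     /* set once a driver has attached */
};

struct demo_bus {
	const char *name;
	int (*scan)(void);                                /* enumerate devices */
	struct demo_device *(*find_device)(const char *); /* look up by name   */
	int (*plug)(struct demo_device *);                /* map + bind driver */
};

/* A one-slot "bus": the device only becomes visible after scan(). */
static struct demo_device demo_slot = { "0000:01:00.0", 0 };
static int demo_scanned;

static int demo_scan(void)
{
	demo_scanned = 1;
	return 0;
}

static struct demo_device *demo_find(const char *name)
{
	if (!demo_scanned || strcmp(name, demo_slot.name) != 0)
		return NULL;
	return &demo_slot;
}

static int demo_plug(struct demo_device *dev)
{
	dev->probed = 1;
	return 0;
}

struct demo_bus demo_pci_bus = {
	"demo_pci", demo_scan, demo_find, demo_plug
};

/* Same ordering as local_dev_probe: check plug, scan, find, then plug. */
int demo_dev_probe(struct demo_bus *bus, const char *name,
		   struct demo_device **new_dev)
{
	struct demo_device *dev;
	int ret;

	assert(bus != NULL && name != NULL && new_dev != NULL);
	*new_dev = NULL;

	if (bus->plug == NULL)
		return -ENOTSUP;        /* bus has no hotplug support */

	ret = bus->scan();
	if (ret)
		return ret;

	dev = bus->find_device(name);
	if (dev == NULL)
		return -ENODEV;         /* scan did not produce this device */

	ret = bus->plug(dev);
	if (ret && !dev->probed)
		return ret;             /* the driver never attached */

	*new_dev = dev;
	return ret;
}
```

Probing an address that the mock bus does not know returns -ENODEV and leaves *new_dev as NULL; probing "0000:01:00.0" returns 0 and marks the device as probed.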

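2. The plug call in local_dev_probe ends up in pci_plug, which walks all drivers registered on the bus to find one that matches the device. The matching is done by rte_pci_match. As the function shows, it simply checks whether any entry in the driver's id_table matches the device's PCI identifiers. For ixgbe, the id_table is defined statically and stored in struct rte_pci_driver rte_ixgbe_pmd, and the ixgbe PMD is registered on the PCI bus through RTE_PMD_REGISTER_PCI.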

int
rte_pci_match(const struct rte_pci_driver *pci_drv,
	      const struct rte_pci_device *pci_dev)
{
	const struct rte_pci_id *id_table;

	for (id_table = pci_drv->id_table; id_table->vendor_id != 0;
	     id_table++) {
		/* check if device's identifiers match the driver's ones */
		if (id_table->vendor_id != pci_dev->id.vendor_id &&
				id_table->vendor_id != PCI_ANY_ID)
			continue;
		if (id_table->device_id != pci_dev->id.device_id &&
				id_table->device_id != PCI_ANY_ID)
			continue;
		if (id_table->subsystem_vendor_id !=
		    pci_dev->id.subsystem_vendor_id &&
		    id_table->subsystem_vendor_id != PCI_ANY_ID)
			continue;
		if (id_table->subsystem_device_id !=
		    pci_dev->id.subsystem_device_id &&
		    id_table->subsystem_device_id != PCI_ANY_ID)
			continue;
		if (id_table->class_id != pci_dev->id.class_id &&
				id_table->class_id != RTE_CLASS_ANY_ID)
			continue;

		return 1;
	}

	return 0;
}
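
To make the wildcard and sentinel behaviour concrete, here is a reduced sketch of the same matching loop. It assumes a trimmed-down ID structure with only vendor and device fields; struct demo_pci_id, DEMO_ANY_ID, demo_pci_match and both tables are hypothetical names for illustration, and the two device IDs are example entries rather than the real ixgbe table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ANY_ID 0xffff      /* wildcard, plays the role of PCI_ANY_ID */

/* Hypothetical, trimmed-down version of struct rte_pci_id. */
struct demo_pci_id {
	uint16_t vendor_id;
	uint16_t device_id;
};

/* Driver table, terminated by an entry with vendor_id == 0. */
const struct demo_pci_id demo_ixgbe_ids[] = {
	{ 0x8086, 0x10fb },     /* Intel 82599 SFP+ (example entry) */
	{ 0x8086, 0x1528 },     /* Intel X540-AT2 (example entry)   */
	{ 0, 0 },               /* sentinel                         */
};

/* A table whose single entry accepts any device from one vendor. */
const struct demo_pci_id demo_any_intel_ids[] = {
	{ 0x8086, DEMO_ANY_ID },
	{ 0, 0 },
};

/* Return 1 if any table entry matches the device, 0 otherwise. */
int demo_pci_match(const struct demo_pci_id *table,
		   const struct demo_pci_id *dev)
{
	assert(table != NULL && dev != NULL);

	for (; table->vendor_id != 0; table++) {
		/* a field matches if it is equal or the entry is a wildcard */
		if (table->vendor_id != dev->vendor_id &&
		    table->vendor_id != DEMO_ANY_ID)
			continue;
		if (table->device_id != dev->device_id &&
		    table->device_id != DEMO_ANY_ID)
			continue;
		return 1;
	}
	return 0;
}
```

A device with IDs 8086:10fb matches demo_ixgbe_ids, a device with the same vendor but an unlisted device ID does not, and that second device still matches demo_any_intel_ids because of the wildcard.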

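3. Once a driver has been found for the device, the next important step is mapping the device's resources. With the vfio driver this is done by pci_vfio_map_resource. It first calls rte_vfio_setup_device to assign the device a vfio_container_id and a vfio_group_id and to set the iommu_type, then calls dma_map_func to DMA-map the memory described by rte_eal_get_configuration()->mem_config, where mem_config describes the memory managed by DPDK. (Judging from this, DPDK appears to DMA-map all of its memory up front. When the driver later assigns DMA addresses for rx_ring and tx_ring, there seems to be no further DMA mapping; the IOVA addresses set up here are used directly.) After the DMA mapping is complete, VFIO_DEVICE_GET_INFO is used to fetch the device information (mainly the number of PCI BARs and the interrupt information).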

    Once the number of BARs is known, pci_vfio_get_region_info fetches the offset and size of each BAR region, and pci_vfio_mmap_bar then maps the region into user space. The mapped BAR addresses are stored in rte_pci_device->mem_resource.
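
The core of mapping a BAR into user space is an mmap on a file descriptor at the offset and length of the region. The sketch below shows only that step and makes a strong simplifying assumption: an anonymous temporary file stands in for the VFIO device fd, so it can run without hardware. struct demo_region, demo_map_region and demo_make_fake_device are hypothetical names and not part of DPDK or VFIO.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record of one mapped region, similar in spirit to an
 * entry of rte_pci_device->mem_resource. */
struct demo_region {
	void *addr;     /* user-space address returned by mmap */
	size_t len;     /* region size in bytes */
};

/* Map len bytes starting at offset of fd into user space. With VFIO,
 * fd would be the device fd and offset/len would be the values
 * reported for the region; here any mmap-able fd works. */
int demo_map_region(int fd, off_t offset, size_t len,
		    struct demo_region *out)
{
	void *p;

	assert(out != NULL && len > 0);

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
	if (p == MAP_FAILED)
		return -1;

	out->addr = p;
	out->len = len;
	return 0;
}

/* Stand-in for a device fd: a temporary file holding two regions of
 * region_len bytes each, laid out back to back. */
int demo_make_fake_device(size_t region_len)
{
	FILE *f = tmpfile();
	int fd;

	if (f == NULL)
		return -1;
	fd = dup(fileno(f));    /* keep an fd after the stream is closed */
	fclose(f);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)(2 * region_len)) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

With a page-sized region length, mapping offset 0 and offset one page yields two independent windows onto the same fd, in the same way that each BAR is mapped from its own offset of the device fd.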

4. Next, the driver's probe function is called to initialize the device. For ixgbe this ends up in eth_ixgbe_pci_probe, which mainly calls rte_eth_dev_create:

int __rte_experimental
rte_eth_dev_create(struct rte_device *device, const char *name,
	size_t priv_data_size,
	ethdev_bus_specific_init ethdev_bus_specific_init,
	void *bus_init_params,
	ethdev_init_t ethdev_init, void *init_params)
{
	struct rte_eth_dev *ethdev;
	int retval;

	RTE_FUNC_PTR_OR_ERR_RET(*ethdev_init, -EINVAL);

	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
		/* allocate an ethdev structure */
		ethdev = rte_eth_dev_allocate(name);
		if (!ethdev)
			return -ENODEV;

		if (priv_data_size) {
			ethdev->data->dev_private = rte_zmalloc_socket(
				name, priv_data_size, RTE_CACHE_LINE_SIZE,
				device->numa_node);

			if (!ethdev->data->dev_private) {
				RTE_LOG(ERR, EAL, "failed to allocate private data");
				retval = -ENOMEM;
				goto probe_failed;
			}
		}
	} else {
		ethdev = rte_eth_dev_attach_secondary(name);
		if (!ethdev) {
			RTE_LOG(ERR, EAL, "secondary process attach failed, "
				"ethdev doesn't exist");
			return  -ENODEV;
		}
	}

	ethdev->device = device;

	if (ethdev_bus_specific_init) {
		/* initialize bus-specific fields such as the device's numa_node */
		retval = ethdev_bus_specific_init(ethdev, bus_init_params);
		if (retval) {
			RTE_LOG(ERR, EAL,
				"ethdev bus specific initialisation failed");
			goto probe_failed;
		}
	}

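	/* Initialize the hardware. For ixgbe this sets up the rx/tx functions,
	 * the MAC address and the device's PCI information, and disables
	 * interrupt mode on the device. Importantly, it also copies the BAR0
	 * address mapped earlier into ixgbe_hw->hw_addr, which the driver
	 * later uses to access the device registers.
	 */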
	retval = ethdev_init(ethdev, init_params);
	if (retval) {
		RTE_LOG(ERR, EAL, "ethdev initialisation failed");
		goto probe_failed;
	}

	rte_eth_dev_probing_finish(ethdev);

	return retval;

probe_failed:
	rte_eth_dev_release_port(ethdev);
	return retval;
}
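
Once the BAR0 address is stored in hw_addr, register access is just a volatile load or store at base plus offset. The sketch below illustrates that idea under stated assumptions: struct demo_hw, demo_read_reg, demo_write_reg and the two register offsets are hypothetical, an ordinary array stands in for the mapped BAR, and the byte-order conversion and memory barriers a real driver needs are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register offsets, in bytes from the start of BAR0. */
#define DEMO_REG_CTRL   0x00u
#define DEMO_REG_STATUS 0x08u

/* Hypothetical, minimal stand-in for struct ixgbe_hw. */
struct demo_hw {
	volatile uint8_t *hw_addr;      /* base of the mapped BAR0 */
};

/* Read a 32-bit register at byte offset reg from the BAR base. */
uint32_t demo_read_reg(const struct demo_hw *hw, uint32_t reg)
{
	assert(hw != NULL && hw->hw_addr != NULL);
	assert((reg & 3u) == 0);        /* registers are 4-byte aligned */

	return *(volatile uint32_t *)(hw->hw_addr + reg);
}

/* Write a 32-bit register at byte offset reg from the BAR base. */
void demo_write_reg(struct demo_hw *hw, uint32_t reg, uint32_t val)
{
	assert(hw != NULL && hw->hw_addr != NULL);
	assert((reg & 3u) == 0);

	*(volatile uint32_t *)(hw->hw_addr + reg) = val;
}
```

Pointing hw_addr at a zeroed uint32_t array and writing 1 to DEMO_REG_STATUS changes the third word of the array (byte offset 8), and reading the same offset back returns 1. The volatile qualifier keeps the compiler from caching or dropping these accesses, which matters once the memory behind hw_addr is a real device.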

 
