This article uses an ixgbe device as an example to outline how such a device is added to DPDK.
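1. An application that uses DPDK (such as OVS) registers a device with DPDK by calling rte_dev_probe. The core of rte_dev_probe is local_dev_probe, which finds the bus the device belongs to, maps the PCI device's BAR space, and finally attaches the ixgbe driver to the device. The function is shown below: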
int
local_dev_probe(const char *devargs, struct rte_device **new_dev)
{
    struct rte_device *dev;
    struct rte_devargs *da;
    int ret;

    *new_dev = NULL;
    da = calloc(1, sizeof(*da));
    if (da == NULL)
        return -ENOMEM;

    // Parse devargs and find the bus the device belongs to (here, the PCI bus)
    ret = rte_devargs_parse(da, devargs);
    if (ret)
        goto err_devarg;

    if (da->bus->plug == NULL) {
        RTE_LOG(ERR, EAL, "Function plug not supported by bus (%s)\n",
            da->bus->name);
        ret = -ENOTSUP;
        goto err_devarg;
    }

    ret = rte_devargs_insert(&da);
    if (ret)
        goto err_devarg;

    /* the rte_devargs will be referenced in the matching rte_device */
    // For the PCI bus this calls rte_pci_scan, which adds the device to the bus
    ret = da->bus->scan();
    if (ret)
        goto err_devarg;

    dev = da->bus->find_device(NULL, cmp_dev_name, da->name);
    if (dev == NULL) {
        RTE_LOG(ERR, EAL, "Cannot find device (%s)\n",
            da->name);
        ret = -ENODEV;
        goto err_devarg;
    }
    /* Since there is a matching device, it is now its responsibility
     * to manage the devargs we've just inserted. From this point
     * those devargs shouldn't be removed manually anymore.
     */

    // Map the device's BAR resources and attach the matching driver
    ret = dev->bus->plug(dev);
    if (ret && !rte_dev_is_probed(dev)) { /* if hasn't ever succeeded */
        RTE_LOG(ERR, EAL, "Driver cannot attach the device (%s)\n",
            dev->name);
        return ret;
    }

    *new_dev = dev;
    return ret;

err_devarg:
    if (rte_devargs_remove(da) != 0) {
        free(da->args);
        free(da);
    }
    return ret;
}
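The control flow of local_dev_probe can be reduced to a small model. The sketch below is hypothetical and heavily simplified: mini_device, mini_bus, mini_dev_probe and the toy_* bus are illustrative names, not DPDK APIs, and devargs parsing and bookkeeping are left out. It only shows how the generic code drives a bus through its scan/find_device/plug function pointers, so the same probe path works for any bus type.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for struct rte_device / struct rte_bus. */
struct mini_device {
    const char *name;
    int probed;                 /* set once a driver has attached */
};

struct mini_bus {
    const char *name;
    int (*scan)(void);
    struct mini_device *(*find_device)(const char *name);
    int (*plug)(struct mini_device *dev);   /* NULL if hot-plug is unsupported */
};

/* Same order of operations as local_dev_probe: check plug support,
 * scan the bus, look the device up by name, then plug it. */
static int mini_dev_probe(struct mini_bus *bus, const char *name,
                          struct mini_device **new_dev)
{
    struct mini_device *dev;
    int ret;

    assert(bus != NULL && new_dev != NULL);
    *new_dev = NULL;
    if (bus->plug == NULL)
        return -ENOTSUP;

    ret = bus->scan();
    if (ret)
        return ret;

    dev = bus->find_device(name);
    if (dev == NULL)
        return -ENODEV;

    ret = bus->plug(dev);
    if (ret && !dev->probed)    /* fail only if no driver ever attached */
        return ret;

    *new_dev = dev;
    return ret;
}

/* A toy "PCI bus" with a single device, used to exercise the flow. */
static struct mini_device toy_devs[] = { { "0000:03:00.0", 0 } };

static int toy_scan(void) { return 0; }

static struct mini_device *toy_find(const char *name)
{
    for (size_t i = 0; i < sizeof(toy_devs) / sizeof(toy_devs[0]); i++)
        if (strcmp(toy_devs[i].name, name) == 0)
            return &toy_devs[i];
    return NULL;
}

static int toy_plug(struct mini_device *dev)
{
    dev->probed = 1;            /* pretend BARs were mapped and a driver attached */
    return 0;
}

static struct mini_bus toy_pci_bus = {
    .name = "pci",
    .scan = toy_scan,
    .find_device = toy_find,
    .plug = toy_plug,
};
```

Keeping the bus behind a table of function pointers is what lets the generic probe path stay bus-agnostic; the PCI-specific work (scanning, BAR mapping, driver matching) lives in the PCI bus's implementations of these callbacks.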
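2. The plug call in local_dev_probe ends up in pci_plug, which (via pci_probe_all_drivers) iterates over every driver registered on the bus to find one for the device. Matching is done by rte_pci_match. As the code below shows, it simply checks whether some entry in the driver's id_table matches the device's PCI identifiers. For ixgbe, the id_table is defined statically and stored in struct rte_pci_driver rte_ixgbe_pmd, which RTE_PMD_REGISTER_PCI then registers on the PCI bus.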
int
rte_pci_match(const struct rte_pci_driver *pci_drv,
    const struct rte_pci_device *pci_dev)
{
    const struct rte_pci_id *id_table;

    for (id_table = pci_drv->id_table; id_table->vendor_id != 0;
         id_table++) {
        /* check if device's identifiers match the driver's ones */
        if (id_table->vendor_id != pci_dev->id.vendor_id &&
            id_table->vendor_id != PCI_ANY_ID)
            continue;
        if (id_table->device_id != pci_dev->id.device_id &&
            id_table->device_id != PCI_ANY_ID)
            continue;
        if (id_table->subsystem_vendor_id !=
            pci_dev->id.subsystem_vendor_id &&
            id_table->subsystem_vendor_id != PCI_ANY_ID)
            continue;
        if (id_table->subsystem_device_id !=
            pci_dev->id.subsystem_device_id &&
            id_table->subsystem_device_id != PCI_ANY_ID)
            continue;
        if (id_table->class_id != pci_dev->id.class_id &&
            id_table->class_id != RTE_CLASS_ANY_ID)
            continue;

        return 1;
    }

    return 0;
}
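To make the registration-plus-matching path concrete, here is a minimal sketch under simplifying assumptions: all sketch_* names are hypothetical, only vendor and device IDs are compared (the real rte_pci_match also checks subsystem IDs and class), and a GCC/Clang constructor stands in for the load-time registration that RTE_PMD_REGISTER_PCI performs. The pair 0x8086/0x10FB is the PCI ID of the Intel 82599 SFP+ NIC, one of the devices ixgbe drives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ANY_ID 0xffff    /* plays the role of PCI_ANY_ID */

struct sketch_pci_id {
    uint16_t vendor_id;
    uint16_t device_id;
};

struct sketch_pci_driver {
    const char *name;
    const struct sketch_pci_id *id_table;   /* ends with a zero-vendor sentinel */
    struct sketch_pci_driver *next;
};

/* Global driver list, filled in by constructors before main() runs. */
static struct sketch_pci_driver *driver_list;

static void sketch_register(struct sketch_pci_driver *drv)
{
    assert(drv != NULL && drv->id_table != NULL);
    drv->next = driver_list;
    driver_list = drv;
}

/* Simplified match: vendor and device ID only, with wildcard support. */
static int sketch_match(const struct sketch_pci_driver *drv,
                        uint16_t vendor, uint16_t device)
{
    for (const struct sketch_pci_id *id = drv->id_table; id->vendor_id != 0; id++) {
        if (id->vendor_id != vendor && id->vendor_id != SKETCH_ANY_ID)
            continue;
        if (id->device_id != device && id->device_id != SKETCH_ANY_ID)
            continue;
        return 1;
    }
    return 0;
}

/* Walk every registered driver and return the first one that matches. */
static const struct sketch_pci_driver *sketch_find_driver(uint16_t vendor,
                                                          uint16_t device)
{
    for (const struct sketch_pci_driver *drv = driver_list; drv != NULL; drv = drv->next)
        if (sketch_match(drv, vendor, device))
            return drv;
    return NULL;
}

/* Example driver: a one-entry ID table plus the terminating sentinel. */
static const struct sketch_pci_id sketch_ixgbe_ids[] = {
    { 0x8086, 0x10FB },
    { 0, 0 },                   /* sentinel: vendor_id == 0 ends the loop */
};

static struct sketch_pci_driver sketch_ixgbe_drv = {
    .name = "net_ixgbe_sketch",
    .id_table = sketch_ixgbe_ids,
};

/* Runs before main(), similar in spirit to RTE_PMD_REGISTER_PCI. */
static void __attribute__((constructor)) sketch_ixgbe_register(void)
{
    sketch_register(&sketch_ixgbe_drv);
}
```

The zero-vendor sentinel is what terminates the matching loop, which is why every id_table needs such a final entry.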
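3. Once a driver has been found, the next important step is mapping the device's resources. When the device is bound to vfio, this is done by pci_vfio_map_resource. It first calls rte_vfio_setup_device, which sets up the VFIO container and group for the device and sets the IOMMU type, and then calls dma_map_func to DMA-map the memory described by rte_eal_get_configuration()->mem_config, i.e. the memory managed by DPDK. (This suggests that DPDK DMA-maps all of its memory up front: when the driver later allocates DMA addresses for its rx_ring and tx_ring, there appears to be no further DMA mapping; the driver simply uses the IOVA addresses already set up here.) After the DMA mapping, the VFIO_DEVICE_GET_INFO ioctl retrieves the device information, mainly the number of PCI BARs (regions) and the interrupt information.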
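The observation that rings reuse existing IOVAs can be illustrated with a small lookup. This is a hypothetical sketch, not DPDK code: it assumes a list of memory segments that were each DMA-mapped once, recording their virtual address, IOVA and length, and translates an address inside one of them into its IOVA without creating any new mapping. In IOVA-as-VA mode a segment's IOVA equals its virtual address, so the translation degenerates to the identity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified memory segment: a virtually contiguous chunk
 * that was DMA-mapped once, at the IOVA recorded here. */
struct sketch_memseg {
    uintptr_t va;
    uint64_t iova;
    size_t len;
};

#define SKETCH_BAD_IOVA UINT64_MAX

/* Translate a virtual address inside any pre-mapped segment to its IOVA.
 * Nothing is mapped here; the lookup only reuses mappings made earlier. */
static uint64_t sketch_virt2iova(const struct sketch_memseg *segs, size_t n,
                                 const void *addr)
{
    uintptr_t va = (uintptr_t)addr;

    assert(segs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (va >= segs[i].va && va - segs[i].va < segs[i].len)
            return segs[i].iova + (va - segs[i].va);
    }
    return SKETCH_BAD_IOVA;     /* address is not in DMA-mapped memory */
}
```

Under this model, a descriptor ring carved out of an already-mapped segment gets its device-visible address by a pure lookup, which is consistent with the driver not issuing any extra DMA mapping for it.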
With the number of BARs known, pci_vfio_get_region_info fetches the offset and size of each BAR region, pci_vfio_mmap_bar maps the region into user space, and the resulting BAR addresses are stored in rte_pci_device->mem_resource.
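How the mapped BAR is used afterwards can be sketched as follows. The resource struct mirrors the addr/len pair kept in rte_pci_device->mem_resource, while sketch_hw, sketch_read_reg and sketch_write_reg are hypothetical names standing in for ixgbe_hw->hw_addr and the driver's register accessors (roughly what IXGBE_READ_REG / IXGBE_WRITE_REG do). A real BAR needs a device and a VFIO mmap, so this sketch assumes an ordinary memory buffer in its place; the point is only that a register access is a volatile 32-bit load or store at BAR0's base plus the register offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_BARS 6       /* a PCI function has at most six BARs */

/* Shaped like an entry of rte_pci_device->mem_resource (addr + len only). */
struct sketch_mem_resource {
    void *addr;                 /* user-space address returned by mmap */
    uint64_t len;
};

/* Hypothetical driver-side handle; plays the role of ixgbe_hw->hw_addr. */
struct sketch_hw {
    volatile uint8_t *hw_addr;
};

static void sketch_hw_init(struct sketch_hw *hw,
                           const struct sketch_mem_resource res[SKETCH_MAX_BARS])
{
    assert(res[0].addr != NULL);
    hw->hw_addr = (volatile uint8_t *)res[0].addr;   /* registers live in BAR0 */
}

/* 32-bit MMIO read at a byte offset from the BAR0 base. */
static uint32_t sketch_read_reg(const struct sketch_hw *hw, uint32_t reg)
{
    assert((reg & 3) == 0);     /* registers are 4-byte aligned */
    return *(volatile uint32_t *)(hw->hw_addr + reg);
}

/* 32-bit MMIO write at a byte offset from the BAR0 base. */
static void sketch_write_reg(struct sketch_hw *hw, uint32_t reg, uint32_t val)
{
    assert((reg & 3) == 0);
    *(volatile uint32_t *)(hw->hw_addr + reg) = val;
}
```

The volatile qualifier matters on real hardware: it stops the compiler from caching or eliding accesses to device registers whose values can change underneath the program.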
4. Next, the driver's probe function is called to initialize the device. For ixgbe this is eth_ixgbe_pci_probe, which mainly calls rte_eth_dev_create:
int __rte_experimental
rte_eth_dev_create(struct rte_device *device, const char *name,
    size_t priv_data_size,
    ethdev_bus_specific_init ethdev_bus_specific_init,
    void *bus_init_params,
    ethdev_init_t ethdev_init, void *init_params)
{
    struct rte_eth_dev *ethdev;
    int retval;

    RTE_FUNC_PTR_OR_ERR_RET(*ethdev_init, -EINVAL);

    if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
        // Allocate an rte_eth_dev structure (a new port)
        ethdev = rte_eth_dev_allocate(name);
        if (!ethdev)
            return -ENODEV;

        if (priv_data_size) {
            ethdev->data->dev_private = rte_zmalloc_socket(
                name, priv_data_size, RTE_CACHE_LINE_SIZE,
                device->numa_node);

            if (!ethdev->data->dev_private) {
                RTE_LOG(ERR, EAL, "failed to allocate private data");
                retval = -ENOMEM;
                goto probe_failed;
            }
        }
    } else {
        ethdev = rte_eth_dev_attach_secondary(name);
        if (!ethdev) {
            RTE_LOG(ERR, EAL, "secondary process attach failed, "
                "ethdev doesn't exist");
            return -ENODEV;
        }
    }

    ethdev->device = device;

    if (ethdev_bus_specific_init) {
        // Bus-specific initialization, e.g. copying the device's
        // numa_node and other PCI information into the ethdev
        retval = ethdev_bus_specific_init(ethdev, bus_init_params);
        if (retval) {
            RTE_LOG(ERR, EAL,
                "ethdev bus specific initialisation failed");
            goto probe_failed;
        }
    }

    // Initialize the hardware device: for ixgbe this sets up the rx/tx
    // functions, the MAC address and the device's PCI information, and
    // disables interrupts, among other things.
    // Importantly, it also copies the BAR0 address mapped earlier into
    // ixgbe_hw->hw_addr; the driver later accesses the device registers
    // through this address.
    retval = ethdev_init(ethdev, init_params);
    if (retval) {
        RTE_LOG(ERR, EAL, "ethdev initialisation failed");
        goto probe_failed;
    }

    rte_eth_dev_probing_finish(ethdev);

    return retval;

probe_failed:
    rte_eth_dev_release_port(ethdev);
    return retval;
}
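The allocate, initialize, roll-back structure of rte_eth_dev_create can be modelled in a few lines. In this hypothetical sketch (sketch_ethdev, sketch_dev_create and the example callbacks are illustrative names, not DPDK APIs), only the primary-process path is modelled: a port slot is taken from a fixed table, private data is allocated, the bus-specific and device init callbacks run in order, and any failure releases the slot, much like the probe_failed label calling rte_eth_dev_release_port.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_PORTS 4

/* Hypothetical, trimmed-down port structure. */
struct sketch_ethdev {
    int in_use;
    char name[32];
    void *dev_private;          /* e.g. the driver's adapter struct */
};

static struct sketch_ethdev sketch_ports[SKETCH_MAX_PORTS];

typedef int (*sketch_init_t)(struct sketch_ethdev *dev, void *params);

static struct sketch_ethdev *sketch_allocate(const char *name)
{
    for (int i = 0; i < SKETCH_MAX_PORTS; i++) {
        if (!sketch_ports[i].in_use) {
            sketch_ports[i].in_use = 1;
            strncpy(sketch_ports[i].name, name, sizeof(sketch_ports[i].name) - 1);
            return &sketch_ports[i];
        }
    }
    return NULL;
}

static void sketch_release(struct sketch_ethdev *dev)
{
    free(dev->dev_private);
    memset(dev, 0, sizeof(*dev));   /* the slot becomes free again */
}

/* Primary-process path only: allocate, run callbacks, undo on failure. */
static int sketch_dev_create(const char *name, size_t priv_size,
                             sketch_init_t bus_init, void *bus_params,
                             sketch_init_t dev_init, void *init_params)
{
    struct sketch_ethdev *dev;
    int ret;

    assert(dev_init != NULL);
    dev = sketch_allocate(name);
    if (dev == NULL)
        return -ENODEV;

    if (priv_size) {
        dev->dev_private = calloc(1, priv_size);
        if (dev->dev_private == NULL) {
            ret = -ENOMEM;
            goto failed;
        }
    }
    if (bus_init) {
        ret = bus_init(dev, bus_params);
        if (ret)
            goto failed;
    }
    ret = dev_init(dev, init_params);
    if (ret)
        goto failed;
    return 0;

failed:
    sketch_release(dev);
    return ret;
}

/* Example callbacks: one that succeeds, one that simulates a hardware error. */
static int sketch_init_ok(struct sketch_ethdev *dev, void *params)
{
    (void)params;
    return dev->dev_private != NULL ? 0 : -EINVAL;
}

static int sketch_init_fail(struct sketch_ethdev *dev, void *params)
{
    (void)dev;
    (void)params;
    return -EIO;
}
```

Passing the bus-specific and device initialization as callbacks is what lets one helper serve many PCI PMDs: each driver only supplies its private-data size and its own init function, and the error handling lives in one place.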