OVS源碼--netdev、ofproto、dpif(十三)

How to Port Open vSwitch to New Software or Hardware

Open vSwitch (OVS) is intended to be easily ported to new software and hardware platforms. This document describes the types of changes that are most likely to be necessary in porting OVS to Unix-like platforms.(Porting OVS to other kinds of platforms is likely to be more difficult.)

Vocabulary

For historical reasons, different words are used for essentially the same concept in different areas of the Open vSwitch source tree. Here is a concordance, indexed by the area of the source tree:

    datapath/       vport           ---
    vswitchd/       iface           port
    ofproto/        port            bundle
    ofproto/bond.c  slave           bond
    lib/lacp.c      slave           lacp
    lib/netdev.c    netdev          ---
    database        Interface       Port

Open vSwitch Architectural Overview

The following diagram shows the very high-level architecture of Open vSwitch from a porter’s perspective.

               +-------------------+
               |    ovs-vswitchd   |<-->ovsdb-server
               +-------------------+
               |      ofproto      |<-->OpenFlow controllers
               +--------+-+--------+
               | netdev | | ofproto|
               +--------+ |provider|
               | netdev | +--------+
               |provider|
               +--------+

Some of the components are generic. Modulo bugs or inadequacies,these components should not need to be modified as part of a port:

  • “ovs-vswitchd” is the main Open vSwitch userspace program, invswitchd/. It reads the desired Open vSwitch configuration from the ovsdb-server program over an IPC channel and passes this configuration down to the “ofproto” library. It also passes certain status and statistical information from ofproto back into the database.

  • “ofproto” is the Open vSwitch library, in ofproto/, that implements an OpenFlow switch. It talks to OpenFlow controllers over the network and to switch hardware or software through an “ofproto provider”, explained further below.

  • “netdev” is the Open vSwitch library, in lib/netdev.c, that abstracts interacting with network devices, that is, Ethernet interfaces. The netdev library is a thin layer over “netdev provider” code, explained further below.

The other components may need attention during a port. You will almost certainly have to implement a “netdev provider”. Depending on the type of port you are doing and the desired performance, you may also have to implement an “ofproto provider” or a lower-level component called a “dpif” provider.

The following sections talk about these components in more detail.

Writing a netdev Provider

A “netdev provider” implements an operating system and hardware specific interface to “network devices”, e.g. eth0 on Linux. Open vSwitch must be able to open each port on a switch as a netdev, so you will need to implement a “netdev provider” that works with your switch hardware and software.

struct netdev_class, in lib/netdev-provider.h, defines the interfaces required to implement a netdev. That structure contains many function pointers, each of which has a comment that is meant to describe its behavior in detail. If the requirements are unclear, please report this as a bug.

The netdev interface can be divided into a few rough categories:

  • Functions required to properly implement OpenFlow features. For example, OpenFlow requires the ability to report the Ethernet hardware address of a port. These functions must be implemented for minimally correct operation.

  • Functions required to implement optional Open vSwitch features. For example, the Open vSwitch support for in-band control requires netdev support for inspecting the TCP/IP stack’s ARP table. These functions must be implemented if the corresponding OVS features are to work, but may be omitted initially.

  • Functions needed in some implementations but not in others. For example, most kinds of ports (see below) do not need functionality to receive packets from a network device.

The existing netdev implementations may serve as useful examples during a port:

  • lib/netdev-linux.c implements netdev functionality for Linux network devices, using Linux kernel calls. It may be a good place to start for full-featured netdev implementations.

  • lib/netdev-vport.c provides support for “virtual ports” implemented by the Open vSwitch datapath module for the Linux kernel. This may serve as a model for minimal netdev implementations.

  • lib/netdev-dummy.c is a fake netdev implementation useful only for testing.

Porting Strategies

After a netdev provider has been implemented for a system’s network devices, you may choose among three basic porting strategies.

The lowest-effort strategy is to use the “userspace switch” implementation built into Open vSwitch. This ought to work, without writing any more code, as long as the netdev provider that you implemented supports receiving packets. It yields poor performance, however, because every packet passes through the ovs-vswitchd process. See [INSTALL.userspace.md] for instructions on how to configure a userspace switch.

If the userspace switch is not the right choice for your port, then you will have to write more code. You may implement either an “ofproto provider” or a “dpif provider”. Which you should choose depends on a few different factors:

  • Only an ofproto provider can take full advantage of hardware with built-in support for wildcards (e.g. an ACL table or a TCAM).

  • A dpif provider can take advantage of the Open vSwitch built-in implementations of bonding, LACP, 802.1ag, 802.1Q VLANs, and other features. An ofproto provider has to provide its own implementations, if the hardware can support them at all.

  • A dpif provider is usually easier to implement, but most appropriate for software switching. It “explodes” wildcard rules into exact-match entries (with an optional wildcard mask).This allows fast hash lookups in software, but makes inefficient use of TCAMs in hardware that support wildcarding.

The following sections describe how to implement each kind of port.

ofproto Providers

An “ofproto provider” is what ofproto uses to directly monitor and control an OpenFlow-capable switch. struct ofproto_class, in ofproto/ofproto-provider.h, defines the interfaces to implement an ofproto provider for new hardware or software. That structure contains many function pointers, each of which has a comment that is meant to describe its behavior in detail. If the requirements are unclear, please report this as a bug.

The ofproto provider interface is preliminary. Please let us know if it seems unsuitable for your purpose. We will try to improve it.

Writing a dpif Provider

Open vSwitch has a built-in ofproto provider named “ofproto-dpif”, which is built on top of a library for manipulating datapaths, called “dpif”. A “datapath” is a simple flow table, one that is only required to support exact-match flows, that is, flows without wildcards. When a packet arrives on a network device, the datapath looks for it in this table. If there is a match, then it performs the associated actions. If there is no match, the datapath passes the packet up to ofproto-dpif, which maintains the full OpenFlow flow table. If the packet matches in this flow table, then ofproto-dpif executes its actions and inserts a new entry into the dpif flow table. (Otherwise, ofproto-dpif passes the packet up to ofproto to send the packet to the OpenFlow controller, if one is configured.)

When calculating the dpif flow, ofproto-dpif generates an exact-match flow that describes the missed packet. It makes an effort to figure out what fields can be wildcarded based on the switch’s configuration and OpenFlow flow table. The dpif is free to ignore the suggested wildcards and only support the exact-match entry. However, if the dpif supports wildcarding, then it can use the masks to match multiple flows with fewer entries and potentially significantly reduce the number of flow misses handled by ofproto-dpif.

The “dpif” library in turn delegates much of its functionality to a “dpif provider”. The following diagram shows how dpif providers fit into the Open vSwitch architecture:

            _
           |   +-------------------+
           |   |    ovs-vswitchd   |<-->ovsdb-server
           |   +-------------------+
           |   |      ofproto      |<-->OpenFlow controllers
           |   +--------+-+--------+  _
           |   | netdev | |ofproto-|   |
 userspace |   +--------+ |  dpif  |   |
           |   | netdev | +--------+   |
           |   |provider| |  dpif  |   |
           |   +---||---+ +--------+   |
           |       ||     |  dpif  |   | implementation of
           |       ||     |provider|   | ofproto provider
           |_      ||     +---||---+   |
                   ||         ||       |
            _  +---||-----+---||---+   |
           |   |          |datapath|   |
    kernel |   |          +--------+  _|
           |   |                   |
           |_  +--------||---------+
                        ||
                     physical
                       NIC

struct dpif_class, in lib/dpif-provider.h, defines the interfaces required to implement a dpif provider for new hardware or software. That structure contains many function pointers, each of which has a comment that is meant to describe its behavior in detail. If the requirements are unclear, please report this as a bug.

There are two existing dpif implementations that may serve as useful examples during a port:

  • lib/dpif-netlink.c is a Linux-specific dpif implementation that talks to an Open vSwitch-specific kernel module (whose sources are in the “datapath” directory). The kernel module performs all of the switching work, passing packets that do not match any flow table entry up to userspace. This dpif implementation is essentially a wrapper around calls into the kernel module.

  • lib/dpif-netdev.c is a generic dpif implementation that performs all switching internally. This is how the Open vSwitch userspace switch is implemented.

數據結構

vswitchd是ovs中最核心的組件,openflow的相關邏輯都在vswitchd裏實現,一般來說,ovs分爲datapath, vswitchd以及ovsdb三個部分,datapath一般是和具體是數據面平臺相關的,比如白盒交換機,或者linux內核等,同時datapath不是必須的組件。ovsdb用於存儲vswitch本身的配置信息,比如端口,拓撲,規則等。vswitchd在ovs dist包裏是以用戶態進程形式呈現的,但這個不是絕對的,上文摘錄的部分給出了把ovs移植到其他平臺上的方法,也算是目前官方僅有的一篇大致描述了ovs架構的文檔

可以看出vswitchd本身是分層的結構,最上面的daemon層主要用於和ovsdb通信,做配置的下發和更新等,中間是ofproto層,用於和openflow控制器通信,以及通過ofproto_class暴露了ofproto provider接口,不同平臺上openflow的具體實現就通過ofproto_class統一了接口。

在ovs的定義裏,netdev代表了具體平臺的設備實現,e.g. linux內核的net_device或者移植到交換機平臺下的port等,struct netdev_class定義了netdev-provider的具體實現需要的接口,具體的平臺實現需要支持這些統一的接口,從而完成netdev設備的創建,銷燬,打開,關閉等一系列操作。
不同的netdev類型通過netdev_register_provider被註冊,vswitchd內部會保存一個struct cmap netdev_classes保存所有註冊的netdev類型,struct netdev定義如下

struct netdev

/* A network device (e.g. an Ethernet device).
 *
 * Network device implementations may read these members but should not modify
 * them. */
struct netdev {
    /* The following do not change during the lifetime of a struct netdev. */
    char *name;                         /* Name of network device. */
    const struct netdev_class *netdev_class; /* Functions to control
                                                this device. */
 
    /* A sequence number which indicates changes in one of 'netdev''s
     * properties.   It must be nonzero so that users have a value which
     * they may use as a reset when tracking 'netdev'.
     *
     * Minimally, the sequence number is required to change whenever
     * 'netdev''s flags, features, ethernet address, or carrier changes. */
    uint64_t change_seq;
 
    /* A netdev provider might be unable to change some of the device's
     * parameter (n_rxq, mtu) when the device is in use.  In this case
     * the provider can notify the upper layer by calling
     * netdev_request_reconfigure().  The upper layer will react by stopping
     * the operations on the device and calling netdev_reconfigure() to allow
     * the configuration changes.  'last_reconfigure_seq' remembers the value
     * of 'reconfigure_seq' when the last reconfiguration happened. */
    struct seq *reconfigure_seq;
    uint64_t last_reconfigure_seq;
 
    /* If this is 'true', the user explicitly specified an MTU for this
     * netdev.  Otherwise, Open vSwitch is allowed to override it. */
    bool mtu_user_config;
 
    /* The core netdev code initializes these at netdev construction and only
     * provide read-only access to its client.  Netdev implementations may
     * modify them. */
    int n_txq;
    int n_rxq;
    int ref_cnt;                        /* Times this devices was opened. */
    struct shash_node *node;            /* Pointer to element in global map. */
    struct ovs_list saved_flags_list; /* Contains "struct netdev_saved_flags". */
};

struct netdev_class

struct netdev_class的定義如下,可以看出netdev_class更接近於一個ops結構體,同時加入了設備的隊列管理操作

/* Network device class structure, to be defined by each implementation of a
 * network device.
 *
 * These functions return 0 if successful or a positive errno value on failure,
 * except where otherwise noted.
 *
 *
 * Data Structures
 * ===============
 *
 * These functions work primarily with two different kinds of data structures:
 *
 *   - "struct netdev", which represents a network device.
 *
 *   - "struct netdev_rxq", which represents a handle for capturing packets
 *     received on a network device
 *
 * Each of these data structures contains all of the implementation-independent
 * generic state for the respective concept, called the "base" state.  None of
 * them contains any extra space for implementations to use.  Instead, each
 * implementation is expected to declare its own data structure that contains
 * an instance of the generic data structure plus additional
 * implementation-specific members, called the "derived" state.  The
 * implementation can use casts or (preferably) the CONTAINER_OF macro to
 * obtain access to derived state given only a pointer to the embedded generic
 * data structure.
 *
 *
 * Life Cycle
 * ==========
 *
 * Four stylized functions accompany each of these data structures:
 *
 *            "alloc"          "construct"        "destruct"       "dealloc"
 *            ------------   ----------------  ---------------  --------------
 * netdev      ->alloc        ->construct        ->destruct        ->dealloc
 * netdev_rxq  ->rxq_alloc    ->rxq_construct    ->rxq_destruct    ->rxq_dealloc
 *
 * Any instance of a given data structure goes through the following life
 * cycle:
 *
 *   1. The client calls the "alloc" function to obtain raw memory.  If "alloc"
 *      fails, skip all the other steps.
 *
 *   2. The client initializes all of the data structure's base state.  If this
 *      fails, skip to step 7.
 *
 *   3. The client calls the "construct" function.  The implementation
 *      initializes derived state.  It may refer to the already-initialized
 *      base state.  If "construct" fails, skip to step 6.
 *
 *   4. The data structure is now initialized and in use.
 *
 *   5. When the data structure is no longer needed, the client calls the
 *      "destruct" function.  The implementation uninitializes derived state.
 *      The base state has not been uninitialized yet, so the implementation
 *      may still refer to it.
 *
 *   6. The client uninitializes all of the data structure's base state.
 *
 *   7. The client calls the "dealloc" to free the raw memory.  The
 *      implementation must not refer to base or derived state in the data
 *      structure, because it has already been uninitialized.
 *
 * If netdev support multi-queue IO then netdev->construct should set initialize
 * netdev->n_rxq to number of queues.
 *
 * Each "alloc" function allocates and returns a new instance of the respective
 * data structure.  The "alloc" function is not given any information about the
 * use of the new data structure, so it cannot perform much initialization.
 * Its purpose is just to ensure that the new data structure has enough room
 * for base and derived state.  It may return a null pointer if memory is not
 * available, in which case none of the other functions is called.
 *
 * Each "construct" function initializes derived state in its respective data
 * structure.  When "construct" is called, all of the base state has already
 * been initialized, so the "construct" function may refer to it.  The
 * "construct" function is allowed to fail, in which case the client calls the
 * "dealloc" function (but not the "destruct" function).
 *
 * Each "destruct" function uninitializes and frees derived state in its
 * respective data structure.  When "destruct" is called, the base state has
 * not yet been uninitialized, so the "destruct" function may refer to it.  The
 * "destruct" function is not allowed to fail.
 *
 * Each "dealloc" function frees raw memory that was allocated by the
 * "alloc" function.  The memory's base and derived members might not have ever
 * been initialized (but if "construct" returned successfully, then it has been
 * "destruct"ed already).  The "dealloc" function is not allowed to fail.
 *
 *
 * Device Change Notification
 * ==========================
 *
 * Minimally, implementations are required to report changes to netdev flags,
 * features, ethernet address or carrier through connectivity_seq. Changes to
 * other properties are allowed to cause notification through this interface,
 * although implementations should try to avoid this. connectivity_seq_get()
 * can be used to acquire a reference to the struct seq. The interface is
 * described in detail in seq.h. */
struct netdev_class {
    /* Type of netdevs in this class, e.g. "system", "tap", "gre", etc.
     *
     * One of the providers should supply a "system" type, since this is
     * the type assumed if no type is specified when opening a netdev.
     * The "system" type corresponds to an existing network device on
     * the system. */
    const char *type;
 
    /* If 'true' then this netdev should be polled by PMD threads. */
    bool is_pmd;
 
/* ## ------------------- ## */
/* ## Top-Level Functions ## */
/* ## ------------------- ## */
 
    /* Called when the netdev provider is registered, typically at program
     * startup.  Returning an error from this function will prevent any network
     * device in this class from being opened.
     *
     * This function may be set to null if a network device class needs no
     * initialization at registration time. */
    int (*init)(void);
 
    /* Performs periodic work needed by netdevs of this class.  May be null if
     * no periodic work is necessary.
     *
     * 'netdev_class' points to the class.  It is useful in case the same
     * function is used to implement different classes. */
    void (*run)(const struct netdev_class *netdev_class);
 
    /* Arranges for poll_block() to wake up if the "run" member function needs
     * to be called.  Implementations are additionally required to wake
     * whenever something changes in any of its netdevs which would cause their
     * ->change_seq() function to change its result.  May be null if nothing is
     * needed here.
     *
     * 'netdev_class' points to the class.  It is useful in case the same
     * function is used to implement different classes. */
    void (*wait)(const struct netdev_class *netdev_class);
 
/* ## ---------------- ## */
/* ## netdev Functions ## */
/* ## ---------------- ## */
 
    /* Life-cycle functions for a netdev.  See the large comment above on
     * struct netdev_class. */
    struct netdev *(*alloc)(void);
    int (*construct)(struct netdev *);
    void (*destruct)(struct netdev *);
    void (*dealloc)(struct netdev *);
 
    /* Fetches the device 'netdev''s configuration, storing it in 'args'.
     * The caller owns 'args' and pre-initializes it to an empty smap.
     *
     * If this netdev class does not have any configuration options, this may
     * be a null pointer. */
    int (*get_config)(const struct netdev *netdev, struct smap *args);
 
    /* Changes the device 'netdev''s configuration to 'args'.
     *
     * If this netdev class does not support configuration, this may be a null
     * pointer. */
    int (*set_config)(struct netdev *netdev, const struct smap *args);
 
    /* Returns the tunnel configuration of 'netdev'.  If 'netdev' is
     * not a tunnel, returns null.
     *
     * If this function would always return null, it may be null instead. */
    const struct netdev_tunnel_config *
        (*get_tunnel_config)(const struct netdev *netdev);
 
    /* Build Tunnel header.  Ethernet and ip header parameters are passed to
     * tunnel implementation to build entire outer header for given flow. */
    int (*build_header)(const struct netdev *, struct ovs_action_push_tnl *data,
                        const struct netdev_tnl_build_header_params *params);
 
    /* build_header() can not build entire header for all packets for given
     * flow.  Push header is called for packet to build header specific to
     * a packet on actual transmit.  It uses partial header build by
     * build_header() which is passed as data. */
    void (*push_header)(struct dp_packet *packet,
                        const struct ovs_action_push_tnl *data);
 
    /* Pop tunnel header from packet, build tunnel metadata and resize packet
     * for further processing.
     * Returns NULL in case of error or tunnel implementation queued packet for further
     * processing. */
    struct dp_packet * (*pop_header)(struct dp_packet *packet);
 
    /* Returns the id of the numa node the 'netdev' is on.  If there is no
     * such info, returns NETDEV_NUMA_UNSPEC. */
    int (*get_numa_id)(const struct netdev *netdev);
 
    /* Configures the number of tx queues of 'netdev'. Returns 0 if successful,
     * otherwise a positive errno value.
     *
     * 'n_txq' specifies the exact number of transmission queues to create.
     *
     * The caller will call netdev_reconfigure() (if necessary) before using
     * netdev_send() on any of the newly configured queues, giving the provider
     * a chance to adjust its settings.
     *
     * On error, the tx queue configuration is unchanged. */
    int (*set_tx_multiq)(struct netdev *netdev, unsigned int n_txq);
 
    /* Sends buffers on 'netdev'.
     * Returns 0 if successful (for every buffer), otherwise a positive errno
     * value.  Returns EAGAIN without blocking if one or more packets cannot be
     * queued immediately. Returns EMSGSIZE if a partial packet was transmitted
     * or if a packet is too big or too small to transmit on the device.
     *
     * If the function returns a non-zero value, some of the packets might have
     * been sent anyway.
     *
     * If 'may_steal' is false, the caller retains ownership of all the
     * packets.  If 'may_steal' is true, the caller transfers ownership of all
     * the packets to the network device, regardless of success.
     *
     * If 'concurrent_txq' is true, the caller may perform concurrent calls
     * to netdev_send() with the same 'qid'. The netdev provider is responsible
     * for making sure that these concurrent calls do not create a race
     * condition by using locking or other synchronization if required.
     *
     * The network device is expected to maintain one or more packet
     * transmission queues, so that the caller does not ordinarily have to
     * do additional queuing of packets.  'qid' specifies the queue to use
     * and can be ignored if the implementation does not support multiple
     * queues.
     *
     * May return EOPNOTSUPP if a network device does not implement packet
     * transmission through this interface.  This function may be set to null
     * if it would always return EOPNOTSUPP anyhow.  (This will prevent the
     * network device from being usefully used by the netdev-based "userspace
     * datapath".  It will also prevent the OVS implementation of bonding from
     * working properly over 'netdev'.) */
    int (*send)(struct netdev *netdev, int qid, struct dp_packet_batch *batch,
                bool may_steal, bool concurrent_txq);
 
    /* Registers with the poll loop to wake up from the next call to
     * poll_block() when the packet transmission queue for 'netdev' has
     * sufficient room to transmit a packet with netdev_send().
     *
     * The network device is expected to maintain one or more packet
     * transmission queues, so that the caller does not ordinarily have to
     * do additional queuing of packets.  'qid' specifies the queue to use
     * and can be ignored if the implementation does not support multiple
     * queues.
     *
     * May be null if not needed, such as for a network device that does not
     * implement packet transmission through the 'send' member function. */
    void (*send_wait)(struct netdev *netdev, int qid);
 
    /* Sets 'netdev''s Ethernet address to 'mac' */
    int (*set_etheraddr)(struct netdev *netdev, const struct eth_addr mac);
 
    /* Retrieves 'netdev''s Ethernet address into 'mac'.
     *
     * This address will be advertised as 'netdev''s MAC address through the
     * OpenFlow protocol, among other uses. */
    int (*get_etheraddr)(const struct netdev *netdev, struct eth_addr *mac);
 
    /* Retrieves 'netdev''s MTU into '*mtup'.
     *
     * The MTU is the maximum size of transmitted (and received) packets, in
     * bytes, not including the hardware header; thus, this is typically 1500
     * bytes for Ethernet devices.
     *
     * If 'netdev' does not have an MTU (e.g. as some tunnels do not), then
     * this function should return EOPNOTSUPP.  This function may be set to
     * null if it would always return EOPNOTSUPP. */
    int (*get_mtu)(const struct netdev *netdev, int *mtup);
 
    /* Sets 'netdev''s MTU to 'mtu'.
     *
     * If 'netdev' does not have an MTU (e.g. as some tunnels do not), then
     * this function should return EOPNOTSUPP.  This function may be set to
     * null if it would always return EOPNOTSUPP. */
    int (*set_mtu)(struct netdev *netdev, int mtu);
 
    /* Returns the ifindex of 'netdev', if successful, as a positive number.
     * On failure, returns a negative errno value.
     *
     * The desired semantics of the ifindex value are a combination of those
     * specified by POSIX for if_nametoindex() and by SNMP for ifIndex.  An
     * ifindex value should be unique within a host and remain stable at least
     * until reboot.  SNMP says an ifindex "ranges between 1 and the value of
     * ifNumber" but many systems do not follow this rule anyhow.
     *
     * This function may be set to null if it would always return -EOPNOTSUPP.
     */
    int (*get_ifindex)(const struct netdev *netdev);
 
    /* Sets 'carrier' to true if carrier is active (link light is on) on
     * 'netdev'.
     *
     * May be null if device does not provide carrier status (will be always
     * up as long as device is up).
     */
    int (*get_carrier)(const struct netdev *netdev, bool *carrier);
 
    /* Returns the number of times 'netdev''s carrier has changed since being
     * initialized.
     *
     * If null, callers will assume the number of carrier resets is zero. */
    long long int (*get_carrier_resets)(const struct netdev *netdev);
 
    /* Forces ->get_carrier() to poll 'netdev''s MII registers for link status
     * instead of checking 'netdev''s carrier.  'netdev''s MII registers will
     * be polled once every 'interval' milliseconds.  If 'netdev' does not
     * support MII, another method may be used as a fallback.  If 'interval' is
     * less than or equal to zero, reverts ->get_carrier() to its normal
     * behavior.
     *
     * Most network devices won't support this feature and will set this
     * function pointer to NULL, which is equivalent to returning EOPNOTSUPP.
     */
    int (*set_miimon_interval)(struct netdev *netdev, long long int interval);
 
    /* Retrieves current device stats for 'netdev' into 'stats'.
     *
     * A network device that supports some statistics but not others, it should
     * set the values of the unsupported statistics to all-1-bits
     * (UINT64_MAX). */
    int (*get_stats)(const struct netdev *netdev, struct netdev_stats *);
 
    /* Stores the features supported by 'netdev' into each of '*current',
     * '*advertised', '*supported', and '*peer'.  Each value is a bitmap of
     * NETDEV_F_* bits.
     *
     * This function may be set to null if it would always return EOPNOTSUPP.
     */
    int (*get_features)(const struct netdev *netdev,
                        enum netdev_features *current,
                        enum netdev_features *advertised,
                        enum netdev_features *supported,
                        enum netdev_features *peer);
 
    /* Set the features advertised by 'netdev' to 'advertise', which is a
     * set of NETDEV_F_* bits.
     *
     * This function may be set to null for a network device that does not
     * support configuring advertisements. */
    int (*set_advertisements)(struct netdev *netdev,
                              enum netdev_features advertise);
 
    /* Attempts to set input rate limiting (policing) policy, such that up to
     * 'kbits_rate' kbps of traffic is accepted, with a maximum accumulative
     * burst size of 'kbits' kb.
     *
     * This function may be set to null if policing is not supported. */
    int (*set_policing)(struct netdev *netdev, unsigned int kbits_rate,
                        unsigned int kbits_burst);
 
    /* Adds to 'types' all of the forms of QoS supported by 'netdev', or leaves
     * it empty if 'netdev' does not support QoS.  Any names added to 'types'
     * should be documented as valid for the "type" column in the "QoS" table
     * in vswitchd/vswitch.xml (which is built as ovs-vswitchd.conf.db(8)).
     *
     * Every network device must support disabling QoS with a type of "", but
     * this function must not add "" to 'types'.
     *
     * The caller is responsible for initializing 'types' (e.g. with
     * sset_init()) before calling this function.  The caller retains ownership
     * of 'types'.
     *
     * May be NULL if 'netdev' does not support QoS at all. */
    int (*get_qos_types)(const struct netdev *netdev, struct sset *types);
 
    /* Queries 'netdev' for its capabilities regarding the specified 'type' of
     * QoS.  On success, initializes 'caps' with the QoS capabilities.
     *
     * Should return EOPNOTSUPP if 'netdev' does not support 'type'.  May be
     * NULL if 'netdev' does not support QoS at all. */
    int (*get_qos_capabilities)(const struct netdev *netdev,
                                const char *type,
                                struct netdev_qos_capabilities *caps);
 
    /* Queries 'netdev' about its currently configured form of QoS.  If
     * successful, stores the name of the current form of QoS into '*typep'
     * and any details of configuration as string key-value pairs in
     * 'details'.
     *
     * A '*typep' of "" indicates that QoS is currently disabled on 'netdev'.
     *
     * The caller initializes 'details' before calling this function.  The
     * caller takes ownership of the string key-values pairs added to
     * 'details'.
     *
     * The netdev retains ownership of '*typep'.
     *
     * '*typep' will be one of the types returned by netdev_get_qos_types() for
     * 'netdev'.  The contents of 'details' should be documented as valid for
     * '*typep' in the "other_config" column in the "QoS" table in
     * vswitchd/vswitch.xml (which is built as ovs-vswitchd.conf.db(8)).
     *
     * May be NULL if 'netdev' does not support QoS at all. */
    int (*get_qos)(const struct netdev *netdev,
                   const char **typep, struct smap *details);
 
    /* Attempts to reconfigure QoS on 'netdev', changing the form of QoS to
     * 'type' with details of configuration from 'details'.
     *
     * On error, the previous QoS configuration is retained.
     *
     * When this function changes the type of QoS (not just 'details'), this
     * also resets all queue configuration for 'netdev' to their defaults
     * (which depend on the specific type of QoS).  Otherwise, the queue
     * configuration for 'netdev' is unchanged.
     *
     * 'type' should be "" (to disable QoS) or one of the types returned by
     * netdev_get_qos_types() for 'netdev'.  The contents of 'details' should
     * be documented as valid for the given 'type' in the "other_config" column
     * in the "QoS" table in vswitchd/vswitch.xml (which is built as
     * ovs-vswitchd.conf.db(8)).
     *
     * May be NULL if 'netdev' does not support QoS at all. */
    int (*set_qos)(struct netdev *netdev,
                   const char *type, const struct smap *details);
 
    /* Queries 'netdev' for information about the queue numbered 'queue_id'.
     * If successful, adds that information as string key-value pairs to
     * 'details'.  Returns 0 if successful, otherwise a positive errno value.
     *
     * Should return EINVAL if 'queue_id' is greater than or equal to the
     * number of supported queues (as reported in the 'n_queues' member of
     * struct netdev_qos_capabilities by 'get_qos_capabilities').
     *
     * The caller initializes 'details' before calling this function.  The
     * caller takes ownership of the string key-values pairs added to
     * 'details'.
     *
     * The returned contents of 'details' should be documented as valid for the
     * given 'type' in the "other_config" column in the "Queue" table in
     * vswitchd/vswitch.xml (which is built as ovs-vswitchd.conf.db(8)).
     */
    int (*get_queue)(const struct netdev *netdev,
                     unsigned int queue_id, struct smap *details);
 
    /* Configures the queue numbered 'queue_id' on 'netdev' with the key-value
     * string pairs in 'details'.  The contents of 'details' should be
     * documented as valid for the given 'type' in the "other_config" column in
     * the "Queue" table in vswitchd/vswitch.xml (which is built as
     * ovs-vswitchd.conf.db(8)).  Returns 0 if successful, otherwise a positive
     * errno value.  On failure, the given queue's configuration should be
     * unmodified.
     *
     * Should return EINVAL if 'queue_id' is greater than or equal to the
     * number of supported queues (as reported in the 'n_queues' member of
     * struct netdev_qos_capabilities by 'get_qos_capabilities'), or if
     * 'details' is invalid for the type of queue.
     *
     * This function does not modify 'details', and the caller retains
     * ownership of it.
     *
     * May be NULL if 'netdev' does not support QoS at all. */
    int (*set_queue)(struct netdev *netdev,
                     unsigned int queue_id, const struct smap *details);
 
    /* Attempts to delete the queue numbered 'queue_id' from 'netdev'.
     *
     * Should return EINVAL if 'queue_id' is greater than or equal to the
     * number of supported queues (as reported in the 'n_queues' member of
     * struct netdev_qos_capabilities by 'get_qos_capabilities').  Should
     * return EOPNOTSUPP if 'queue_id' is valid but may not be deleted (e.g. if
     * 'netdev' has a fixed set of queues with the current QoS mode).
     *
     * May be NULL if 'netdev' does not support QoS at all, or if all of its
     * QoS modes have fixed sets of queues. */
    int (*delete_queue)(struct netdev *netdev, unsigned int queue_id);
 
    /* Obtains statistics about 'queue_id' on 'netdev'.  Fills 'stats' with the
     * queue's statistics.  May set individual members of 'stats' to all-1-bits
     * if the statistic is unavailable.
     *
     * May be NULL if 'netdev' does not support QoS at all. */
    int (*get_queue_stats)(const struct netdev *netdev, unsigned int queue_id,
                           struct netdev_queue_stats *stats);
 
    /* Attempts to begin dumping the queues in 'netdev'.  On success, returns 0
     * and initializes '*statep' with any data needed for iteration.  On
     * failure, returns a positive errno value.
     *
     * May be NULL if 'netdev' does not support QoS at all. */
    int (*queue_dump_start)(const struct netdev *netdev, void **statep);
 
    /* Attempts to retrieve another queue from 'netdev' for 'state', which was
     * initialized by a successful call to the 'queue_dump_start' function for
     * 'netdev'.  On success, stores a queue ID into '*queue_id' and fills
     * 'details' with the configuration of the queue with that ID.  Returns EOF
     * if the last queue has been dumped, or a positive errno value on error.
     * This function will not be called again once it returns nonzero once for
     * a given iteration (but the 'queue_dump_done' function will be called
     * afterward).
     *
     * The caller initializes and clears 'details' before calling this
     * function.  The caller takes ownership of the string key-values pairs
     * added to 'details'.
     *
     * The returned contents of 'details' should be documented as valid for the
     * given 'type' in the "other_config" column in the "Queue" table in
     * vswitchd/vswitch.xml (which is built as ovs-vswitchd.conf.db(8)).
     *
     * May be NULL if 'netdev' does not support QoS at all. */
    int (*queue_dump_next)(const struct netdev *netdev, void *state,
                           unsigned int *queue_id, struct smap *details);
 
    /* Releases resources from 'netdev' for 'state', which was initialized by a
     * successful call to the 'queue_dump_start' function for 'netdev'.
     *
     * May be NULL if 'netdev' does not support QoS at all. */
    int (*queue_dump_done)(const struct netdev *netdev, void *state);
 
    /* Iterates over all of 'netdev''s queues, calling 'cb' with the queue's
     * ID, its statistics, and the 'aux' specified by the caller.  The order of
     * iteration is unspecified, but (when successful) each queue must be
     * visited exactly once.
     *
     * 'cb' will not modify or free the statistics passed in. */
    int (*dump_queue_stats)(const struct netdev *netdev,
                            void (*cb)(unsigned int queue_id,
                                       struct netdev_queue_stats *,
                                       void *aux),
                            void *aux);
 
    /* Assigns 'addr' as 'netdev''s IPv4 address and 'mask' as its netmask.  If
     * 'addr' is INADDR_ANY, 'netdev''s IPv4 address is cleared.
     *
     * This function may be set to null if it would always return EOPNOTSUPP
     * anyhow. */
    int (*set_in4)(struct netdev *netdev, struct in_addr addr,
                   struct in_addr mask);
 
    /* Returns all assigned IP address to  'netdev' and returns 0.
     * API allocates array of address and masks and set it to
     * '*addr' and '*mask'.
     * Otherwise, returns a positive errno value and sets '*addr', '*mask
     * and '*n_addr' to NULL.
     *
     * The following error values have well-defined meanings:
     *
     *   - EADDRNOTAVAIL: 'netdev' has no assigned IPv6 address.
     *
     *   - EOPNOTSUPP: No IPv6 network stack attached to 'netdev'.
     *
     * 'addr' may be null, in which case the address itself is not reported. */
    int (*get_addr_list)(const struct netdev *netdev, struct in6_addr **in,
                         struct in6_addr **mask, int *n_in6);
 
    /* Adds 'router' as a default IP gateway for the TCP/IP stack that
     * corresponds to 'netdev'.
     *
     * This function may be set to null if it would always return EOPNOTSUPP
     * anyhow. */
    int (*add_router)(struct netdev *netdev, struct in_addr router);
 
    /* Looks up the next hop for 'host' in the host's routing table.  If
     * successful, stores the next hop gateway's address (0 if 'host' is on a
     * directly connected network) in '*next_hop' and a copy of the name of the
     * device to reach 'host' in '*netdev_name', and returns 0.  The caller is
     * responsible for freeing '*netdev_name' (by calling free()).
     *
     * This function may be set to null if it would always return EOPNOTSUPP
     * anyhow. */
    int (*get_next_hop)(const struct in_addr *host, struct in_addr *next_hop,
                        char **netdev_name);
 
    /* Retrieves driver information of the device.
     *
     * Populates 'smap' with key-value pairs representing the status of the
     * device.  'smap' is a set of key-value string pairs representing netdev
     * type specific information.  For more information see
     * ovs-vswitchd.conf.db(5).
     *
     * The caller is responsible for destroying 'smap' and its data.
     *
     * This function may be set to null if it would always return EOPNOTSUPP
     * anyhow. */
    int (*get_status)(const struct netdev *netdev, struct smap *smap);
 
    /* Looks up the ARP table entry for 'ip' on 'netdev' and stores the
     * corresponding MAC address in 'mac'.  A return value of ENXIO, in
     * particular, indicates that there is no ARP table entry for 'ip' on
     * 'netdev'.
     *
     * This function may be set to null if it would always return EOPNOTSUPP
     * anyhow. */
    int (*arp_lookup)(const struct netdev *netdev, ovs_be32 ip,
                      struct eth_addr *mac);
 
    /* Retrieves the current set of flags on 'netdev' into '*old_flags'.  Then,
     * turns off the flags that are set to 1 in 'off' and turns on the flags
     * that are set to 1 in 'on'.  (No bit will be set to 1 in both 'off' and
     * 'on'; that is, off & on == 0.)
     *
     * This function may be invoked from a signal handler.  Therefore, it
     * should not do anything that is not signal-safe (such as logging). */
    int (*update_flags)(struct netdev *netdev, enum netdev_flags off,
                        enum netdev_flags on, enum netdev_flags *old_flags);
 
    /* If the provider called netdev_request_reconfigure(), the upper layer
     * will eventually call this.  The provider can update the device
     * configuration knowing that the upper layer will not call rxq_recv() or
     * send() until this function returns.
     *
     * On error, the configuration is indeterminant and the device cannot be
     * used to send and receive packets until a successful configuration is
     * applied. */
    int (*reconfigure)(struct netdev *netdev);
 
/* ## -------------------- ## */
/* ## netdev_rxq Functions ## */
/* ## -------------------- ## */
 
/* If a particular netdev class does not support receiving packets, all these
 * function pointers must be NULL. */
 
    /* Life-cycle functions for a netdev_rxq.  See the large comment above on
     * struct netdev_class. */
    struct netdev_rxq *(*rxq_alloc)(void);
    int (*rxq_construct)(struct netdev_rxq *);
    void (*rxq_destruct)(struct netdev_rxq *);
    void (*rxq_dealloc)(struct netdev_rxq *);
 
    /* Attempts to receive a batch of packets from 'rx'.  In 'batch', the
     * caller supplies 'packets' as the pointer to the beginning of an array
     * of NETDEV_MAX_BURST pointers to dp_packet.  If successful, the
     * implementation stores pointers to up to NETDEV_MAX_BURST dp_packets into
     * the array, transferring ownership of the packets to the caller, stores
     * the number of received packets into 'count', and returns 0.
     *
     * The implementation does not necessarily initialize any non-data members
     * of 'packets' in 'batch'.  That is, the caller must initialize layer
     * pointers and metadata itself, if desired, e.g. with pkt_metadata_init()
     * and miniflow_extract().
     *
     * Implementations should allocate buffers with DP_NETDEV_HEADROOM bytes of
     * headroom.
     *
     * Returns EAGAIN immediately if no packet is ready to be received or
     * another positive errno value if an error was encountered. */
    int (*rxq_recv)(struct netdev_rxq *rx, struct dp_packet_batch *batch);
 
    /* Registers with the poll loop to wake up from the next call to
     * poll_block() when a packet is ready to be received with
     * netdev_rxq_recv() on 'rx'. */
    void (*rxq_wait)(struct netdev_rxq *rx);
 
    /* Discards all packets waiting to be received from 'rx'. */
    int (*rxq_drain)(struct netdev_rxq *rx);
};

目前已經實現的netdev_class包括,netdev_linux_class, netdev_internal_class, netdev_tap_class, dpdk_class, dpdk_ring_class, dpdk_vhost_class, dpdk_vhost_client_class, patch_clas等。lib/netdev-linux.c裏面實現的netdev通過調用內核api實現了基於內核的網絡設備netdev,lib/netdev-vport.c則基於datapath模塊實現了基於vport的網絡設備netdev。
ofproto層通過ofproto_class定義了openflow的接口,除此之外,還有幾個重要的數據結構和ofproto相關,struct ofproto, struct ofport, struct rule, struct oftable, struct ofgroup

stuct ofproto

struct ofproto代表了一個openflow switch結構體,內部包含了struct ofproto_class, struct ofport的hash map,struct oftable, struct ofgroup的hash map etc.

/* An OpenFlow switch.
 *
 * With few exceptions, ofproto implementations may look at these fields but
 * should not modify them. */
struct ofproto {
    struct hmap_node hmap_node; /* In global 'all_ofprotos' hmap. */
    const struct ofproto_class *ofproto_class;
    char *type;                 /* Datapath type. */
    char *name;                 /* Datapath name. */
 
    /* Settings. */
    uint64_t fallback_dpid;     /* Datapath ID if no better choice found. */
    uint64_t datapath_id;       /* Datapath ID. */
    bool forward_bpdu;          /* Option to allow forwarding of BPDU frames
                                 * when NORMAL action is invoked. */
    char *mfr_desc;             /* Manufacturer (NULL for default). */
    char *hw_desc;              /* Hardware (NULL for default). */
    char *sw_desc;              /* Software version (NULL for default). */
    char *serial_desc;          /* Serial number (NULL for default). */
    char *dp_desc;              /* Datapath description (NULL for default). */
    enum ofputil_frag_handling frag_handling;
 
    /* Datapath. */
    struct hmap ports;          /* Contains "struct ofport"s. */
    struct shash port_by_name;
    struct simap ofp_requests;  /* OpenFlow port number requests. */
    uint16_t alloc_port_no;     /* Last allocated OpenFlow port number. */
    uint16_t max_ports;         /* Max possible OpenFlow port num, plus one. */
    struct hmap ofport_usage;   /* Map ofport to last used time. */
    uint64_t change_seq;        /* Change sequence for netdev status. */
 
    /* Flow tables. */
    long long int eviction_group_timer; /* For rate limited reheapification. */
    struct oftable *tables;
    int n_tables;
    ovs_version_t tables_version;  /* Controls which rules are visible to
                                    * table lookups. */
 
    /* Rules indexed on their cookie values, in all flow tables. */
    struct hindex cookies OVS_GUARDED_BY(ofproto_mutex);
    struct hmap learned_cookies OVS_GUARDED_BY(ofproto_mutex);
 
    /* List of expirable flows, in all flow tables. */
    struct ovs_list expirable OVS_GUARDED_BY(ofproto_mutex);
 
    /* Meter table.
     * OpenFlow meters start at 1.  To avoid confusion we leave the first
     * pointer in the array un-used, and index directly with the OpenFlow
     * meter_id. */
    struct ofputil_meter_features meter_features;
    struct meter **meters; /* 'meter_features.max_meter' + 1 pointers. */
 
    /* OpenFlow connections. */
    struct connmgr *connmgr;
 
    /* Delayed rule executions.
     *
     * We delay calls to ->ofproto_class->rule_execute() past releasing
     * ofproto_mutex during a flow_mod, because otherwise a "learn" action
     * triggered by the executing the packet would try to recursively modify
     * the flow table and reacquire the global lock. */
    struct guarded_list rule_executes; /* Contains "struct rule_execute"s. */
 
    int min_mtu;                    /* Current MTU of non-internal ports. */
 
    /* Groups. */
    struct cmap groups;               /* Contains "struct ofgroup"s. */
    uint32_t n_groups[4] OVS_GUARDED; /* # of existing groups of each type. */
    struct ofputil_group_features ogf;
};

struct ofport

struct ofport代表了openflow switch的一個端口,同時關聯一個struct netdev的設備抽象

/* An OpenFlow port within a "struct ofproto".
 *
 * The port's name is netdev_get_name(port->netdev).
 *
 * With few exceptions, ofproto implementations may look at these fields but
 * should not modify them. */
struct ofport {
    struct hmap_node hmap_node; /* In struct ofproto's "ports" hmap. */
    struct ofproto *ofproto;    /* The ofproto that contains this port. */
    struct netdev *netdev;
    struct ofputil_phy_port pp;
    ofp_port_t ofp_port;        /* OpenFlow port number. */
    uint64_t change_seq;
    long long int created;      /* Time created, in msec. */
    int mtu;
};

struct rule

struct rule表示一條openflow規則,rule裏面包含了一組struct rule_actions

struct rule {
    /* Where this rule resides in an OpenFlow switch.
     *
     * These are immutable once the rule is constructed, hence 'const'. */
    struct ofproto *const ofproto; /* The ofproto that contains this rule. */
    const struct cls_rule cr;      /* In owning ofproto's classifier. */
    const uint8_t table_id;        /* Index in ofproto's 'tables' array. */
    bool removed;                  /* Rule has been removed from the ofproto
                                    * data structures. */
 
    /* Protects members marked OVS_GUARDED.
     * Readers only need to hold this mutex.
     * Writers must hold both this mutex AND ofproto_mutex.
     * By implication writers can read *without* taking this mutex while they
     * hold ofproto_mutex. */
    struct ovs_mutex mutex OVS_ACQ_AFTER(ofproto_mutex);
 
    /* Number of references.
     * The classifier owns one reference.
     * Any thread trying to keep a rule from being freed should hold its own
     * reference. */
    struct ovs_refcount ref_count;
 
    /* A "flow cookie" is the OpenFlow name for a 64-bit value associated with
     * a flow.. */
    ovs_be64 flow_cookie OVS_GUARDED;
    struct hindex_node cookie_node OVS_GUARDED_BY(ofproto_mutex);
 
    enum ofputil_flow_mod_flags flags OVS_GUARDED;
 
    /* Timeouts. */
    uint16_t hard_timeout OVS_GUARDED; /* In seconds from ->modified. */
    uint16_t idle_timeout OVS_GUARDED; /* In seconds from ->used. */
 
    /* Eviction precedence. */
    const uint16_t importance;
 
    /* Removal reason for sending flow removed message.
     * Used only if 'flags' has OFPUTIL_FF_SEND_FLOW_REM set and if the
     * value is not OVS_OFPRR_NONE. */
    uint8_t removed_reason;
 
    /* Eviction groups (see comment on struct eviction_group for explanation) .
     *
     * 'eviction_group' is this rule's eviction group, or NULL if it is not in
     * any eviction group.  When 'eviction_group' is nonnull, 'evg_node' is in
     * the ->eviction_group->rules hmap. */
    struct eviction_group *eviction_group OVS_GUARDED_BY(ofproto_mutex);
    struct heap_node evg_node OVS_GUARDED_BY(ofproto_mutex);
 
    /* OpenFlow actions.  See struct rule_actions for more thread-safety
     * notes. */
    const struct rule_actions * const actions;
 
    /* In owning meter's 'rules' list.  An empty list if there is no meter. */
    struct ovs_list meter_list_node OVS_GUARDED_BY(ofproto_mutex);
 
    /* Flow monitors (e.g. for NXST_FLOW_MONITOR, related to struct ofmonitor).
     *
     * 'add_seqno' is the sequence number when this rule was created.
     * 'modify_seqno' is the sequence number when this rule was last modified.
     * See 'monitor_seqno' in connmgr.c for more information. */
    enum nx_flow_monitor_flags monitor_flags OVS_GUARDED_BY(ofproto_mutex);
    uint64_t add_seqno OVS_GUARDED_BY(ofproto_mutex);
    uint64_t modify_seqno OVS_GUARDED_BY(ofproto_mutex);
 
    /* Optimisation for flow expiry.  In ofproto's 'expirable' list if this
     * rule is expirable, otherwise empty. */
    struct ovs_list expirable OVS_GUARDED_BY(ofproto_mutex);
 
    /* Times.  Last so that they are more likely close to the stats managed
     * by the provider. */
    long long int created OVS_GUARDED; /* Creation time. */
 
    /* Must hold 'mutex' for both read/write, 'ofproto_mutex' not needed. */
    long long int modified OVS_GUARDED; /* Time of last modification. */
};
 
struct rule_actions {
    /* Flags.
     *
     * 'has_meter' is true if 'ofpacts' contains an OFPACT_METER action.
     *
     * 'has_learn_with_delete' is true if 'ofpacts' contains an OFPACT_LEARN
     * action whose flags include NX_LEARN_F_DELETE_LEARNED. */
    bool has_meter;
    bool has_learn_with_delete;
    bool has_groups;
 
    /* Actions. */
    uint32_t ofpacts_len;         /* Size of 'ofpacts', in bytes. */
    struct ofpact ofpacts[];      /* Sequence of "struct ofpacts". */
};
 
struct ofpact {
    /* We want the space advantage of an 8-bit type here on every
     * implementation, without giving up the advantage of having a useful type
     * on implementations that support packed enums. */
#ifdef HAVE_PACKED_ENUM
    enum ofpact_type type;      /* OFPACT_*. */
#else
    uint8_t type;               /* OFPACT_* */
#endif
 
    uint8_t raw;                /* Original type when added, if any. */
    uint16_t len;               /* Length of the action, in bytes, including
                                 * struct ofpact, excluding padding. */
};

struct ofproto_class

struct ofproto_class是一個接口工廠類,對應的實現是ofproto-dpif,我們先來看接口定義

struct ofproto_class {
/* ## ----------------- ## */
/* ## Factory Functions ## */
/* ## ----------------- ## */
 
    /* Initializes provider.  The caller may pass in 'iface_hints',
     * which contains an shash of "struct iface_hint" elements indexed
     * by the interface's name.  The provider may use these hints to
     * describe the startup configuration in order to reinitialize its
     * state.  The caller owns the provided data, so a provider must
     * make copies of anything required.  An ofproto provider must
     * remove any existing state that is not described by the hint, and
     * may choose to remove it all. */
    void (*init)(const struct shash *iface_hints);
 
    /* Enumerates the types of all supported ofproto types into 'types'.  The
     * caller has already initialized 'types'.  The implementation should add
     * its own types to 'types' but not remove any existing ones, because other
     * ofproto classes might already have added names to it. */
    void (*enumerate_types)(struct sset *types);
 
    /* Enumerates the names of all existing datapath of the specified 'type'
     * into 'names' 'all_dps'.  The caller has already initialized 'names' as
     * an empty sset.
     *
     * 'type' is one of the types enumerated by ->enumerate_types().
     *
     * Returns 0 if successful, otherwise a positive errno value.
     */
    int (*enumerate_names)(const char *type, struct sset *names);
 
    /* Deletes the datapath with the specified 'type' and 'name'.  The caller
     * should have closed any open ofproto with this 'type' and 'name'; this
     * function is allowed to fail if that is not the case.
     *
     * 'type' is one of the types enumerated by ->enumerate_types().
     * 'name' is one of the names enumerated by ->enumerate_names() for 'type'.
     *
     * Returns 0 if successful, otherwise a positive errno value.
     */
    int (*del)(const char *type, const char *name);
 
    /* Returns the type to pass to netdev_open() when a datapath of type
     * 'datapath_type' has a port of type 'port_type', for a few special
     * cases when a netdev type differs from a port type.  For example,
     * when using the userspace datapath, a port of type "internal"
     * needs to be opened as "tap".
     *
     * Returns either 'type' itself or a string literal, which must not
     * be freed. */
    const char *(*port_open_type)(const char *datapath_type,
                                  const char *port_type);
 
    /* Performs any periodic activity required on ofprotos of type
     * 'type'.
     *
     * An ofproto provider may implement it or not, depending on whether
     * it needs type-level maintenance.
     *
     * Returns 0 if successful, otherwise a positive errno value. */
    int (*type_run)(const char *type);
 
    /* Causes the poll loop to wake up when a type 'type''s 'run'
     * function needs to be called, e.g. by calling the timer or fd
     * waiting functions in poll-loop.h.
     *
     * An ofproto provider may implement it or not, depending on whether
     * it needs type-level maintenance. */
    void (*type_wait)(const char *type);
 
    /* Performs any periodic activity required by 'ofproto'.  It should:
     *
     *   - Call connmgr_send_packet_in() for each received packet that missed
     *     in the OpenFlow flow table or that had a OFPP_CONTROLLER output
     *     action.
     *
     *   - Call ofproto_rule_expire() for each OpenFlow flow that has reached
     *     its hard_timeout or idle_timeout, to expire the flow.
     *
     * Returns 0 if successful, otherwise a positive errno value. */
    int (*run)(struct ofproto *ofproto);
 
    /* Causes the poll loop to wake up when 'ofproto''s 'run' function needs to
     * be called, e.g. by calling the timer or fd waiting functions in
     * poll-loop.h.  */
    void (*wait)(struct ofproto *ofproto);
 
    /* Every "struct rule" in 'ofproto' is about to be deleted, one by one.
     * This function may prepare for that, for example by clearing state in
     * advance.  It should *not* actually delete any "struct rule"s from
     * 'ofproto', only prepare for it.
     *
     * This function is optional; it's really just for optimization in case
     * it's cheaper to delete all the flows from your hardware in a single pass
     * than to do it one by one. */
    void (*flush)(struct ofproto *ofproto);
 
    /* Helper for the OpenFlow OFPT_TABLE_FEATURES request.
     *
     * The 'features' array contains 'ofproto->n_tables' elements.  Each
     * element is initialized as:
     *
     *   - 'table_id' to the array index.
     *
     *   - 'name' to "table#" where # is the table ID.
     *
     *   - 'metadata_match' and 'metadata_write' to OVS_BE64_MAX.
     *
     *   - 'config' to the table miss configuration.
     *
     *   - 'max_entries' to 1,000,000.
     *
     *   - Both 'nonmiss' and 'miss' to:
     *
     *     * 'next' to all 1-bits for all later tables.
     *
     *     * 'instructions' to all instructions.
     *
     *     * 'write' and 'apply' both to:
     *
     *       - 'ofpacts': All actions.
     *
     *       - 'set_fields': All fields.
     *
     *   - 'match', 'mask', and 'wildcard' to all fields.
     *
     * If 'stats' is nonnull, it also contains 'ofproto->n_tables' elements.
     * Each element is initialized as:
     *
     *   - 'table_id' to the array index.
     *
     *   - 'active_count' to the 'n_flows' of struct ofproto for the table.
     *
     *   - 'lookup_count' and 'matched_count' to 0.
     *
     * The implementation should update any members in each element for which
     * it has better values:
     *
     *   - Any member of 'features' to better describe the implementation's
     *     capabilities.
     *
     *   - 'lookup_count' to the number of packets looked up in this flow table
     *     so far.
     *
     *   - 'matched_count' to the number of packets looked up in this flow
     *     table so far that matched one of the flow entries.
     */
    void (*query_tables)(struct ofproto *ofproto,
                         struct ofputil_table_features *features,
                         struct ofputil_table_stats *stats);
 
    /* Sets the current tables version the provider should use for classifier
     * lookups.  This must be called with a new version number after each set
     * of flow table changes has been completed, so that datapath revalidation
     * can be triggered. */
    void (*set_tables_version)(struct ofproto *ofproto, ovs_version_t version);
 
    struct ofproto *(*alloc)(void);
    int (*construct)(struct ofproto *ofproto);
    void (*destruct)(struct ofproto *ofproto);
    void (*dealloc)(struct ofproto *ofproto);
 
    struct ofport *(*port_alloc)(void);
    int (*port_construct)(struct ofport *ofport);
    void (*port_destruct)(struct ofport *ofport, bool del);
    void (*port_dealloc)(struct ofport *ofport);
 
    /* Called after 'ofport->netdev' is replaced by a new netdev object.  If
     * the ofproto implementation uses the ofport's netdev internally, then it
     * should switch to using the new one.  The old one has been closed.
     *
     * An ofproto implementation that doesn't need to do anything in this
     * function may use a null pointer. */
    void (*port_modified)(struct ofport *ofport);
 
    /* Called after an OpenFlow request changes a port's configuration.
     * 'ofport->pp.config' contains the new configuration.  'old_config'
     * contains the previous configuration.
     *
     * The caller implements OFPUTIL_PC_PORT_DOWN using netdev functions to
     * turn NETDEV_UP on and off, so this function doesn't have to do anything
     * for that bit (and it won't be called if that is the only bit that
     * changes). */
    void (*port_reconfigured)(struct ofport *ofport,
                              enum ofputil_port_config old_config);
 
    /* Looks up a port named 'devname' in 'ofproto'.  On success, returns 0 and
     * initializes '*port' appropriately. Otherwise, returns a positive errno
     * value.
     *
     * The caller owns the data in 'port' and must free it with
     * ofproto_port_destroy() when it is no longer needed. */
    int (*port_query_by_name)(const struct ofproto *ofproto,
                              const char *devname, struct ofproto_port *port);
 
    /* Attempts to add 'netdev' as a port on 'ofproto'.  Returns 0 if
     * successful, otherwise a positive errno value.  The caller should
     * inform the implementation of the OpenFlow port through the
     * ->port_construct() method.
     *
     * It doesn't matter whether the new port will be returned by a later call
     * to ->port_poll(); the implementation may do whatever is more
     * convenient. */
    int (*port_add)(struct ofproto *ofproto, struct netdev *netdev);
 
    /* Deletes port number 'ofp_port' from the datapath for 'ofproto'.  Returns
     * 0 if successful, otherwise a positive errno value.
     *
     * It doesn't matter whether the new port will be returned by a later call
     * to ->port_poll(); the implementation may do whatever is more
     * convenient. */
    int (*port_del)(struct ofproto *ofproto, ofp_port_t ofp_port);
 
    /* Refreshes datapath configuration of 'port'.
     * Returns 0 if successful, otherwise a positive errno value. */
    int (*port_set_config)(const struct ofport *port, const struct smap *cfg);
 
    /* Get port stats */
    int (*port_get_stats)(const struct ofport *port,
                          struct netdev_stats *stats);
 
    /* Port iteration functions.
     *
     * The client might not be entirely in control of the ports within an
     * ofproto.  Some hardware implementations, for example, might have a fixed
     * set of ports in a datapath.  For this reason, the client needs a way to
     * iterate through all the ports that are actually in a datapath.  These
     * functions provide that functionality.
     *
     * The 'state' pointer provides the implementation a place to
     * keep track of its position.  Its format is opaque to the caller.
     *
     * The ofproto provider retains ownership of the data that it stores into
     * ->port_dump_next()'s 'port' argument.  The data must remain valid until
     * at least the next call to ->port_dump_next() or ->port_dump_done() for
     * 'state'.  The caller will not modify or free it.
     *
     * Details
     * =======
     *
     * ->port_dump_start() attempts to begin dumping the ports in 'ofproto'.
     * On success, it should return 0 and initialize '*statep' with any data
     * needed for iteration.  On failure, returns a positive errno value, and
     * the client will not call ->port_dump_next() or ->port_dump_done().
     *
     * ->port_dump_next() attempts to retrieve another port from 'ofproto' for
     * 'state'.  If there is another port, it should store the port's
     * information into 'port' and return 0.  It should return EOF if all ports
     * have already been iterated.  Otherwise, on error, it should return a
     * positive errno value.  This function will not be called again once it
     * returns nonzero once for a given iteration (but the 'port_dump_done'
     * function will be called afterward).
     *
     * ->port_dump_done() allows the implementation to release resources used
     * for iteration.  The caller might decide to stop iteration in the middle
     * by calling this function before ->port_dump_next() returns nonzero.
     *
     * Usage Example
     * =============
     *
     * int error;
     * void *state;
     *
     * error = ofproto->ofproto_class->port_dump_start(ofproto, &state);
     * if (!error) {
     *     for (;;) {
     *         struct ofproto_port port;
     *
     *         error = ofproto->ofproto_class->port_dump_next(
     *                     ofproto, state, &port);
     *         if (error) {
     *             break;
     *         }
     *         // Do something with 'port' here (without modifying or freeing
     *         // any of its data).
     *     }
     *     ofproto->ofproto_class->port_dump_done(ofproto, state);
     * }
     * // 'error' is now EOF (success) or a positive errno value (failure).
     */
    int (*port_dump_start)(const struct ofproto *ofproto, void **statep);
    int (*port_dump_next)(const struct ofproto *ofproto, void *state,
                          struct ofproto_port *port);
    int (*port_dump_done)(const struct ofproto *ofproto, void *state);
 
    struct rule *(*rule_alloc)(void);
    enum ofperr (*rule_construct)(struct rule *rule)
        /* OVS_REQUIRES(ofproto_mutex) */;
    void (*rule_insert)(struct rule *rule, struct rule *old_rule,
                        bool forward_counts)
        /* OVS_REQUIRES(ofproto_mutex) */;
    void (*rule_delete)(struct rule *rule) /* OVS_REQUIRES(ofproto_mutex) */;
    void (*rule_destruct)(struct rule *rule);
    void (*rule_dealloc)(struct rule *rule);
 
    /* Applies the actions in 'rule' to 'packet'.  (This implements sending
     * buffered packets for OpenFlow OFPT_FLOW_MOD commands.)
     *
     * Takes ownership of 'packet' (so it should eventually free it, with
     * ofpbuf_delete()).
     *
     * 'flow' reflects the flow information for 'packet'.  All of the
     * information in 'flow' is extracted from 'packet', except for
     * flow->tunnel and flow->in_port, which are assigned the correct values
     * for the incoming packet.  The register values are zeroed.  'packet''s
     * header pointers and offsets (e.g. packet->l3) are appropriately
     * initialized.  packet->l3 is aligned on a 32-bit boundary.
     *
     * The implementation should add the statistics for 'packet' into 'rule'.
     *
     * Returns 0 if successful, otherwise an OpenFlow error code. */
    enum ofperr (*rule_execute)(struct rule *rule, const struct flow *flow,
                                struct dp_packet *packet);
 
    /* Implements the OpenFlow OFPT_PACKET_OUT command.  The datapath should
     * execute the 'ofpacts_len' bytes of "struct ofpacts" in 'ofpacts'.
     *
     * The caller retains ownership of 'packet' and of 'ofpacts', so
     * ->packet_out() should not modify or free them.
     *
     * This function must validate that it can correctly implement 'ofpacts'.
     * If not, then it should return an OpenFlow error code.
     *
     * 'flow' reflects the flow information for 'packet'.  All of the
     * information in 'flow' is extracted from 'packet', except for
     * flow->in_port (see below).  flow->tunnel and its register values are
     * zeroed.
     *
     * flow->in_port comes from the OpenFlow OFPT_PACKET_OUT message.  The
     * implementation should reject invalid flow->in_port values by returning
     * OFPERR_OFPBRC_BAD_PORT.  (If the implementation called
     * ofproto_init_max_ports(), then the client will reject these ports
     * itself.)  For consistency, the implementation should consider valid for
     * flow->in_port any value that could possibly be seen in a packet that it
     * passes to connmgr_send_packet_in().  Ideally, even an implementation
     * that never generates packet-ins (e.g. due to hardware limitations)
     * should still allow flow->in_port values for every possible physical port
     * and OFPP_LOCAL.  The only virtual ports (those above OFPP_MAX) that the
     * caller will ever pass in as flow->in_port, other than OFPP_LOCAL, are
     * OFPP_NONE and OFPP_CONTROLLER.  The implementation should allow both of
     * these, treating each of them as packets generated by the controller as
     * opposed to packets originating from some switch port.
     *
     * (Ordinarily the only effect of flow->in_port is on output actions that
     * involve the input port, such as actions that output to OFPP_IN_PORT,
     * OFPP_FLOOD, or OFPP_ALL.  flow->in_port can also affect Nicira extension
     * "resubmit" actions.)
     *
     * 'packet' is not matched against the OpenFlow flow table, so its
     * statistics should not be included in OpenFlow flow statistics.
     *
     * Returns 0 if successful, otherwise an OpenFlow error code. */
    enum ofperr (*packet_out)(struct ofproto *ofproto, struct dp_packet *packet,
                              const struct flow *flow,
                              const struct ofpact *ofpacts,
                              size_t ofpacts_len);
 
    enum ofperr (*nxt_resume)(struct ofproto *ofproto,
                              const struct ofputil_packet_in_private *);
 
    /* Registers meta-data associated with the 'n_qdscp' Qualities of Service
     * 'queues' attached to 'ofport'.  This data is not intended to be
     * sufficient to implement QoS.  Instead, providers may use this
     * information to implement features which require knowledge of what queues
     * exist on a port, and some basic information about them.
     *
     * EOPNOTSUPP as a return value indicates that this ofproto_class does not
     * support QoS, as does a null pointer. */
    int (*set_queues)(struct ofport *ofport,
                      const struct ofproto_port_queue *queues, size_t n_qdscp);
 
    /* If 's' is nonnull, this function registers a "bundle" associated with
     * client data pointer 'aux' in 'ofproto'.  A bundle is the same concept as
     * a Port in OVSDB, that is, it consists of one or more "slave" devices
     * (Interfaces, in OVSDB) along with VLAN and LACP configuration and, if
     * there is more than one slave, a bonding configuration.  If 'aux' is
     * already registered then this function updates its configuration to 's'.
     * Otherwise, this function registers a new bundle.
     *
     * If 's' is NULL, this function unregisters the bundle registered on
     * 'ofproto' associated with client data pointer 'aux'.  If no such bundle
     * has been registered, this has no effect.
     *
     * This function affects only the behavior of the NXAST_AUTOPATH action and
     * output to the OFPP_NORMAL port.  An implementation that does not support
     * it at all may set it to NULL or return EOPNOTSUPP.  An implementation
     * that supports only a subset of the functionality should implement what
     * it can and return 0. */
    int (*bundle_set)(struct ofproto *ofproto, void *aux,
                      const struct ofproto_bundle_settings *s);
 
    /* If 'port' is part of any bundle, removes it from that bundle.  If the
     * bundle now has no ports, deletes the bundle.  If the bundle now has only
     * one port, deconfigures the bundle's bonding configuration. */
    void (*bundle_remove)(struct ofport *ofport);
 
    /* These functions should be NULL if an implementation does not support
     * them.  They must be all null or all non-null.. */
 
    /* Initializes 'features' to describe the metering features supported by
     * 'ofproto'. */
    void (*meter_get_features)(const struct ofproto *ofproto,
                               struct ofputil_meter_features *features);
 
    /* If '*id' is UINT32_MAX, adds a new meter with the given 'config'.  On
     * success the function must store a provider meter ID other than
     * UINT32_MAX in '*id'.  All further references to the meter will be made
     * with the returned provider meter id rather than the OpenFlow meter id.
     * The caller does not try to interpret the provider meter id, giving the
     * implementation the freedom to either use the OpenFlow meter_id value
     * provided in the meter configuration, or any other value suitable for the
     * implementation.
     *
     * If '*id' is a value other than UINT32_MAX, modifies the existing meter
     * with that meter provider ID to have configuration 'config', while
     * leaving '*id' unchanged.  On failure, the existing meter configuration
     * is left intact. */
    enum ofperr (*meter_set)(struct ofproto *ofproto, ofproto_meter_id *id,
                             const struct ofputil_meter_config *config);
 
    /* Gets the meter and meter band packet and byte counts for maximum of
     * 'stats->n_bands' bands for the meter with provider ID 'id' within
     * 'ofproto'.  The caller fills in the other stats values.  The band stats
     * are copied to memory at 'stats->bands' provided by the caller.  The
     * number of returned band stats is returned in 'stats->n_bands'. */
    enum ofperr (*meter_get)(const struct ofproto *ofproto,
                             ofproto_meter_id id,
                             struct ofputil_meter_stats *stats);
 
    /* Deletes a meter, making the 'ofproto_meter_id' invalid for any
     * further calls. */
    void (*meter_del)(struct ofproto *, ofproto_meter_id);
 
/* ## --------------------- ## */
/* ## Datapath information  ## */
/* ## --------------------- ## */
    /* Retrieve the version string of the datapath. The version
     * string can be NULL if it can not be determined.
     *
     * The version retuned is read only. The caller should not
     * free it.
     *
     * This function should be NULL if an implementation does not support it.
     */
    const char *(*get_datapath_version)(const struct ofproto *);
 
/* ## ------------------- ## */
/* ## Connection tracking ## */
/* ## ------------------- ## */
    /* Flushes the connection tracking tables. If 'zone' is not NULL,
     * only deletes connections in '*zone'. */
    void (*ct_flush)(const struct ofproto *, const uint16_t *zone);
};

下面來看看ofproto_class的具體實現ofproto-dpif,ofproto-dpif用於datapath未命中的報文,通過查找openflow流表計算出具體的action,並下發到datapath中,同時ofproto-dpif還會進一步把報文上送給ofproto,最終交給集中式控制器來處理。

/* Ofproto-dpif -- DPIF based ofproto implementation.
 *
 * Ofproto-dpif provides an ofproto implementation for those platforms which
 * implement the netdev and dpif interface defined in netdev.h and dpif.h.  The
 * most important of which is the Linux Kernel Module (dpif-linux), but
 * alternatives are supported such as a userspace only implementation
 * (dpif-netdev), and a dummy implementation used for unit testing.
 *
 * Ofproto-dpif is divided into three major chunks.
 *
 * - ofproto-dpif.c
 *   The main ofproto-dpif module is responsible for implementing the
 *   provider interface, installing and removing datapath flows, maintaining
 *   packet statistics, running protocols (BFD, LACP, STP, etc), and
 *   configuring relevant submodules.
 *
 * - ofproto-dpif-upcall.c
 *   Ofproto-dpif-upcall is responsible for retrieving upcalls from the kernel,
 *   processing miss upcalls, and handing more complex ones up to the main
 *   ofproto-dpif module.  Miss upcall processing boils down to figuring out
 *   what each packet's actions are, executing them (i.e. asking the kernel to
 *   forward it), and handing it up to ofproto-dpif to decided whether or not
 *   to install a kernel flow.
 *
 * - ofproto-dpif-xlate.c
 *   Ofproto-dpif-xlate is responsible for translating OpenFlow actions into
 *   datapath actions. */

ofproto-dpif的結構圖如下

           |   +-------------------+
           |   |    ovs-vswitchd   |<-->ovsdb-server
           |   +-------------------+
           |   |      ofproto      |<-->OpenFlow controllers
           |   +--------+-+--------+  _
           |   | netdev | |ofproto-|   |
 userspace |   +--------+ |  dpif  |   |
           |   | netdev | +--------+   |
           |   |provider| |  dpif  |   |
           |   +---||---+ +--------+   |
           |       ||     |  dpif  |   | implementation of
           |       ||     |provider|   | ofproto provider
           |_      ||     +---||---+   |
                   ||         ||       |
            _  +---||-----+---||---+   |
           |   |          |datapath|   |
    kernel |   |          +--------+  _|
           |   |                   |
           |_  +--------||---------+
                        ||
                     physical
                       NIC

struct dpif_class

struct dpif_class是datapath interface實現的工廠接口類,用於和實際的datapath, e.g. openvswitch.ko或者userspace datapath交互。目前已有的兩個dpif的實現是dpif-netlink和dpif-netdev,前者是基於內核datapath的dpif實現,後者基於用戶態datapath。代碼可以在lib/dpif-netlink.c以及lib/dpif-netdev.c裏找到。

struct dpif_class的接口定義在lib/dpif-provider.h中,僅供參考

原文鏈接:https://blog.csdn.net/majieyue/article/details/52746921

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章