Neutron-server Initialization: Neutron L2 Agent Service Initialization

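Open vSwitch (OVS) is virtual switching software used mainly in virtual machine (VM) environments. As a virtual switch it supports several virtualization technologies, including Xen/XenServer, KVM, and VirtualBox. In a virtualized environment on a single host, a virtual switch (vswitch) has two main jobs: 1. Forward traffic between VMs. 2. Let VMs communicate with external networks.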

In OpenStack, the most widely used L2 agent today is probably the openvswitch agent. This article gives a rough analysis of what the openvswitch agent does.

OVS agent initialization

Taking the commonly used openvswitch agent as an example, the agent service can be started with the following command:
CLI:

service neutron-openvswitch-agent start

The following entry in the setup.cfg configuration file shows that the method actually executed is:
neutron.plugins.openvswitch.agent.ovs_neutron_agent:main

[entry_points]  
console_scripts =
    ...  
    neutron-openvswitch-agent = neutron.plugins.openvswitch.agent.ovs_neutron_agent:main  
    ...
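
To make the `module:function` notation concrete, here is a minimal sketch of how such a string can be resolved to a callable. The helper name `resolve_entry_point` is hypothetical and is not part of Neutron or of any packaging tool; real installers generate a small wrapper script instead. The target `json:dumps` is used only so the snippet runs without Neutron installed.

```python
import importlib


def resolve_entry_point(spec):
    """Resolve a 'package.module:attribute' string to the object it names.

    Hypothetical helper for illustration only.
    """
    module_name, _, attr_path = spec.partition(":")
    if not module_name or not attr_path:
        raise ValueError("expected 'module:attribute', got %r" % spec)
    obj = importlib.import_module(module_name)
    # The attribute part may itself be dotted, e.g. 'Class.method'.
    for name in attr_path.split("."):
        obj = getattr(obj, name)
    return obj


if __name__ == "__main__":
    dumps = resolve_entry_point("json:dumps")
    print(dumps({"agent": "neutron-openvswitch-agent"}))
```

With Neutron installed, the same helper applied to the string from setup.cfg would return the agent's `main` function.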

a. Startup process walkthrough

neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:main

def main(bridge_classes):
    try:
        # Read the agent configuration from the config file, mainly
        # network_mappings and the names of the bridges
        agent_config = create_agent_config_map(cfg.CONF)
    except ValueError:
        LOG.exception(_LE("Agent failed to create agent config map"))
        raise SystemExit(1)
    prepare_xen_compute()
    validate_local_ip(agent_config['local_ip'])
    try:
        # Create the agent instance
        agent = OVSNeutronAgent(bridge_classes, **agent_config)
    except (RuntimeError, ValueError) as e:
        LOG.error(_LE("%s Agent terminated!"), e)
        sys.exit(1)
    # Agent initialized successfully
    agent.daemon_loop()

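At startup the agent does the following:
1. Sets up plugin_rpc, which is used to communicate with neutron-server.
2. Sets up state_rpc, which is used to report agent state.
3. Sets up connection, which is used to receive messages from neutron-server.
4. Starts periodic state reporting.
5. Sets up br-int.
6. Sets up the bridges listed in bridge_mappings.
7. Initializes sg_agent, which handles security groups.
8. Periodically checks br-int for port changes and calls process_network_ports to handle added and removed ports.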
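
The bridge mappings in step 6 are normally written in the agent configuration as `physical_network:bridge` pairs. The sketch below shows one way such a list could be turned into the dict that the agent constructor receives. `parse_bridge_mappings` is a hypothetical helper written for this article, not Neutron's own parser, and the network and bridge names are made up.

```python
def parse_bridge_mappings(entries):
    """Turn ['physnet1:br-eth1', ...] into {'physnet1': 'br-eth1', ...}.

    Hypothetical helper; it rejects malformed entries and duplicates.
    """
    mappings = {}
    for entry in entries:
        physnet, sep, bridge = entry.partition(":")
        physnet, bridge = physnet.strip(), bridge.strip()
        if not sep or not physnet or not bridge:
            raise ValueError(
                "invalid mapping %r, expected 'physnet:bridge'" % entry)
        if physnet in mappings:
            raise ValueError(
                "physical network %r mapped more than once" % physnet)
        if bridge in mappings.values():
            raise ValueError("bridge %r mapped more than once" % bridge)
        mappings[physnet] = bridge
    return mappings


if __name__ == "__main__":
    print(parse_bridge_mappings(["physnet1:br-eth1", "physnet2:br-eth2"]))
```

Rejecting duplicates up front keeps a later step from wiring two physical networks to one bridge by accident.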

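b. How neutron-server and nova interact with the OVS agent

  1. The message queues between neutron-server and neutron-openvswitch-agent are as follows:
    [Figure: message queues between neutron-server and neutron-openvswitch-agent; image not available]
    neutron-server may broadcast the four kinds of messages shown in the figure to neutron-openvswitch-agent. The openvswitch agent first checks whether the port concerned is local and, if it is, performs the corresponding action.

  2. The interaction between nova and neutron-openvswitch-agent. The figure comes from GongYongSheng's slides at the Hong Kong summit:
    [Figure: interaction between nova and neutron-openvswitch-agent; image not available]
    When a VM is booted, nova-compute first sends a message to neutron-server asking it to create a port. Then, once the driver has created the port on br-int, neutron-openvswitch-agent, which polls br-int in a loop, detects the new port, sets up the appropriate OpenFlow rules and local VLAN for it, and finally sets the port status to ACTIVE.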
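
The "is this port local?" check described in item 1 can be sketched as follows. `PortUpdateQueue` and its method names are hypothetical stand-ins for the agent's notification handling. The real agent may defer the locality check to its polling loop rather than doing it in the RPC callback.

```python
class PortUpdateQueue(object):
    """Hypothetical stand-in for the agent's handling of broadcasts."""

    def __init__(self, local_port_ids):
        # Ports currently plugged into this host's integration bridge.
        self.local_port_ids = set(local_port_ids)
        # Ports waiting to be processed by the main loop.
        self.updated_ports = set()

    def port_update(self, port_id):
        """Handle a broadcast port update; return True if it was queued."""
        if port_id not in self.local_port_ids:
            # The port lives on another host: nothing to do here.
            return False
        self.updated_ports.add(port_id)
        return True


if __name__ == "__main__":
    queue = PortUpdateQueue({"port-1", "port-2"})
    print(queue.port_update("port-1"))  # local port
    print(queue.port_update("port-9"))  # port on some other host
```

The callback only records the port id. The actual rewiring is left to the main loop, so the RPC handler stays cheap.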

neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:__init__

c. OVSNeutronAgent class walkthrough

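The OVSNeutronAgent docstring outlines how the agent implements virtual networking:
1) It creates br-int, br-tun, and one bridge for each physical network interface.
2) Every VM virtual NIC (VIF) is plugged into br-int. VIFs on the same virtual network share one local VLAN. This local VLAN is independent of VLANs on the external network, so VLAN ids may overlap. The local VLAN id is mapped to the physical networking details that realize the virtual network, for example an external VLAN id.
3) For networks whose network_type is VLAN or FLAT, a veth or a pair of patch ports connects br-int to the corresponding physical network bridge, and flow rules add, modify, or strip VLAN ids as needed.
4) For networks whose network_type is GRE, tenant traffic between hypervisors is distinguished by a Logical Switch identifier, and a mesh of tunnels is built between the br-tun bridges of the hypervisors. Port patching connects the local VLANs on br-int to the inter-hypervisor tunnels on br-tun.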
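
The local VLAN bookkeeping in point 2 can be pictured with a small allocator. This is a sketch under stated assumptions: `LocalVlanAllocator` and its methods are hypothetical names, and the default bounds 1 and 4094 are assumed values for MIN_VLAN_TAG and MAX_VLAN_TAG. The agent itself keeps comparable state in `available_local_vlans` and `local_vlan_map`.

```python
class LocalVlanAllocator(object):
    """Hypothetical per-host pool of local VLAN ids, keyed by network."""

    def __init__(self, min_tag=1, max_tag=4094):
        # Like set(range(MIN_VLAN_TAG, MAX_VLAN_TAG)): max_tag is excluded.
        self.available = set(range(min_tag, max_tag))
        self.by_network = {}

    def provision(self, network_id):
        """Return the local VLAN for a network, allocating one if needed."""
        if network_id in self.by_network:
            return self.by_network[network_id]
        if not self.available:
            raise RuntimeError("no local VLAN available")
        vlan = self.available.pop()
        self.by_network[network_id] = vlan
        return vlan

    def reclaim(self, network_id):
        """Give a network's local VLAN back to the pool, if it had one."""
        vlan = self.by_network.pop(network_id, None)
        if vlan is not None:
            self.available.add(vlan)
        return vlan


if __name__ == "__main__":
    allocator = LocalVlanAllocator()
    first = allocator.provision("net-a")
    # Ports on the same network share the same local VLAN.
    print(first == allocator.provision("net-a"))
```

Because the ids only have meaning inside one host, two hypervisors can hand out different local VLANs for the same virtual network without conflict.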

neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:OVSNeutronAgent

class OVSNeutronAgent(sg_rpc.SecurityGroupAgentRpcCallbackMixin,
                      l2population_rpc.L2populationRpcCallBackTunnelMixin,
                      dvr_rpc.DVRAgentRpcCallbackMixin):
    '''Implements OVS-based tunneling, VLANs and flat networks.

    Two local bridges are created: an integration bridge (defaults to
    'br-int') and a tunneling bridge (defaults to 'br-tun'). An
    additional bridge is created for each physical network interface
    used for VLANs and/or flat networks.

    All VM VIFs are plugged into the integration bridge. VM VIFs on a
    given virtual network share a common "local" VLAN (i.e. not
    propagated externally). The VLAN id of this local VLAN is mapped
    to the physical networking details realizing that virtual network.

    For virtual networks realized as GRE tunnels, a Logical Switch
    (LS) identifier is used to differentiate tenant traffic on
    inter-HV tunnels. A mesh of tunnels is created to other
    Hypervisors in the cloud. These tunnels originate and terminate on
    the tunneling bridge of each hypervisor. Port patching is done to
    connect local VLANs on the integration bridge to inter-hypervisor
    tunnels on the tunnel bridge.

    For each virtual network realized as a VLAN or flat network, a
    veth or a pair of patch ports is used to connect the local VLAN on
    the integration bridge with the physical network bridge, with flow
    rules adding, modifying, or stripping VLAN tags as necessary.
    '''

    # history
    #   1.0 Initial version
    #   1.1 Support Security Group RPC
    #   1.2 Support DVR (Distributed Virtual Router) RPC
    #   1.3 Added param devices_to_update to security_groups_provider_updated
    #   1.4 Added support for network_update
    target = oslo_messaging.Target(version='1.4')

    def __init__(self, bridge_classes, integ_br, tun_br, local_ip,
                 bridge_mappings, polling_interval, tunnel_types=None,
                 veth_mtu=None, l2_population=False,
                 enable_distributed_routing=False,
                 minimize_polling=False,
                 ovsdb_monitor_respawn_interval=(
                     constants.DEFAULT_OVSDBMON_RESPAWN),
                 arp_responder=False,
                 prevent_arp_spoofing=True,
                 use_veth_interconnection=False,
                 quitting_rpc_timeout=None,
                 conf=None):
        '''Constructor.

        :param bridge_classes: a dict for bridge classes.
        :param integ_br: name of the integration bridge.
        :param tun_br: name of the tunnel bridge.
        :param local_ip: local IP address of this hypervisor.
        :param bridge_mappings: mappings from physical network name to bridge.
        :param polling_interval: interval (secs) to poll DB.
        :param tunnel_types: A list of tunnel types to enable support for in
               the agent. If set, will automatically set enable_tunneling to
               True.
        :param veth_mtu: MTU size for veth interfaces.
        :param l2_population: Optional, whether L2 population is turned on
        :param minimize_polling: Optional, whether to minimize polling by
               monitoring ovsdb for interface changes.
        :param ovsdb_monitor_respawn_interval: Optional, when using polling
               minimization, the number of seconds to wait before respawning
               the ovsdb monitor.
        :param arp_responder: Optional, enable local ARP responder if it is
               supported.
        :param prevent_arp_spoofing: Optional, enable suppression of any ARP
               responses from ports that don't match an IP address that belongs
               to the ports. Spoofing rules will not be added to ports that
               have port security disabled.
        :param use_veth_interconnection: use veths instead of patch ports to
               interconnect the integration bridge to physical bridges.
        :param quitting_rpc_timeout: timeout in seconds for rpc calls after
               SIGTERM is received
        :param conf: an instance of ConfigOpts
        '''
        super(OVSNeutronAgent, self).__init__()
        self.conf = conf or cfg.CONF

        self.fullsync = True
        # init bridge classes with configured datapath type.
        self.br_int_cls, self.br_phys_cls, self.br_tun_cls = (
            functools.partial(bridge_classes[b],
                              datapath_type=self.conf.OVS.datapath_type)
            for b in ('br_int', 'br_phys', 'br_tun'))

        self.use_veth_interconnection = use_veth_interconnection
        self.veth_mtu = veth_mtu
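        # Local VLAN ids are drawn from range(MIN_VLAN_TAG, MAX_VLAN_TAG),
        # i.e. [1, 4093] if the constants are 1 and 4094 (range() excludes
        # the upper bound)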
        self.available_local_vlans = set(moves.range(p_const.MIN_VLAN_TAG,
                                                     p_const.MAX_VLAN_TAG))
        self.tunnel_types = tunnel_types or []
        self.l2_pop = l2_population
        # TODO(ethuleau): Change ARP responder so it's not dependent on the
        #                 ML2 l2 population mechanism driver.
        self.enable_distributed_routing = enable_distributed_routing
        self.arp_responder_enabled = arp_responder and self.l2_pop
        self.prevent_arp_spoofing = prevent_arp_spoofing

        if tunnel_types:
            self.enable_tunneling = True
        else:
            self.enable_tunneling = False

        # Validate agent configurations
        self._check_agent_configurations()

        # Keep track of int_br's device count for use by _report_state()
        self.int_br_device_count = 0

        self.agent_uuid_stamp = uuid.uuid4().int & UINT64_BITMASK
        # Create br-int, reset its flow table rules, and so on; this is done
        # by invoking commands such as brctl, ovs-vsctl, and ip
        self.int_br = self.br_int_cls(integ_br)
        self.setup_integration_br()
        # Stores port update notifications for processing in main rpc loop
        self.updated_ports = set()
        # Stores port delete notifications
        self.deleted_ports = set()

        self.network_ports = collections.defaultdict(set)
        # keeps association between ports and ofports to detect ofport change
        self.vifname_to_ofport_map = {}
        # Set up the RPC API connection to the plugin (topic='q-plugin',
        # interface neutron.agent.rpc.py:PluginApi) and listen for RPC calls
        # that other services make to the agent (topic='q-agent-notifier')
        self.setup_rpc()
        self.init_extension_manager(self.connection)
        # Value passed in from the configuration file
        self.bridge_mappings = bridge_mappings
        # Create a bridge for each mapping and connect it to br-int
        self.setup_physical_bridges(self.bridge_mappings)
        self.local_vlan_map = {}

        self._reset_tunnel_ofports()

        self.polling_interval = polling_interval
        self.minimize_polling = minimize_polling
        self.ovsdb_monitor_respawn_interval = ovsdb_monitor_respawn_interval
        self.local_ip = local_ip
        self.tunnel_count = 0
        self.vxlan_udp_port = self.conf.AGENT.vxlan_udp_port
        self.dont_fragment = self.conf.AGENT.dont_fragment
        self.tunnel_csum = cfg.CONF.AGENT.tunnel_csum
        self.tun_br = None
        self.patch_int_ofport = constants.OFPORT_INVALID
        self.patch_tun_ofport = constants.OFPORT_INVALID
        if self.enable_tunneling:
            # The patch_int_ofport and patch_tun_ofport are updated
            # here inside the call to setup_tunnel_br()
            self.setup_tunnel_br(tun_br)

        self.dvr_agent = ovs_dvr_neutron_agent.OVSDVRNeutronAgent(
            self.context,
            self.dvr_plugin_rpc,
            self.int_br,
            self.tun_br,
            self.bridge_mappings,
            self.phys_brs,
            self.int_ofports,
            self.phys_ofports,
            self.patch_int_ofport,
            self.patch_tun_ofport,
            self.conf.host,
            self.enable_tunneling,
            self.enable_distributed_routing)

        self.agent_state = {
            'binary': 'neutron-openvswitch-agent',
            'host': self.conf.host,
            'topic': n_const.L2_AGENT_TOPIC,
            'configurations': {'bridge_mappings': bridge_mappings,
                               'tunnel_types': self.tunnel_types,
                               'tunneling_ip': local_ip,
                               'l2_population': self.l2_pop,
                               'arp_responder_enabled':
                               self.arp_responder_enabled,
                               'enable_distributed_routing':
                               self.enable_distributed_routing,
                               'log_agent_heartbeats':
                               self.conf.AGENT.log_agent_heartbeats,
                               'extensions': self.ext_manager.names()},
            'agent_type': self.conf.AGENT.agent_type,
            'start_flag': True}

        report_interval = self.conf.AGENT.report_interval
        if report_interval:
            heartbeat = loopingcall.FixedIntervalLoopingCall(
                self._report_state)
            heartbeat.start(interval=report_interval)

        if self.enable_tunneling:
            self.setup_tunnel_br_flows()

        self.dvr_agent.setup_dvr_flows()

        # Collect additional bridges to monitor
        self.ancillary_brs = self.setup_ancillary_bridges(integ_br, tun_br)

        # In order to keep existed device's local vlan unchanged,
        # restore local vlan mapping at start
        self._restore_local_vlan_map()

        # Security group agent support
        self.sg_agent = sg_rpc.SecurityGroupAgentRpc(self.context,
                self.sg_plugin_rpc, self.local_vlan_map,
                defer_refresh_firewall=True)

        # Initialize iteration counter
        self.iter_num = 0
        self.run_daemon_loop = True

        self.catch_sigterm = False
        self.catch_sighup = False

        # The initialization is complete; we can start receiving messages
        self.connection.consume_in_threads()

        self.quitting_rpc_timeout = quitting_rpc_timeout
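
One detail in the constructor worth a closer look is `agent_uuid_stamp`, which masks a random UUID down to 64 bits. The sketch below reproduces only that arithmetic. The value given for `UINT64_BITMASK` is an assumption (all 64 low bits set), and `make_agent_stamp` is a hypothetical helper name. The stamp is presumably used to tag the flows this agent run installs so that flows from an earlier run can be recognized later, but that use is not shown in the code above.

```python
import uuid

# Assumed value of the constant: the low 64 bits all set.
UINT64_BITMASK = (1 << 64) - 1


def make_agent_stamp():
    """Return a random per-run identifier that fits in 64 bits."""
    # uuid4().int is a 128-bit integer; the mask keeps the low 64 bits.
    return uuid.uuid4().int & UINT64_BITMASK


if __name__ == "__main__":
    stamp = make_agent_stamp()
    print(hex(stamp))
    print(stamp.bit_length() <= 64)
```

A new stamp is generated each time the agent starts, so two runs of the same agent are very unlikely to share one.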

d. Starting agent.daemon_loop()

Once OVSNeutronAgent has been initialized, agent.daemon_loop() is started.

1) daemon_loop
    def daemon_loop(self):
        # Start everything.
        LOG.info(_LI("Agent initialized successfully, now running... "))
        signal.signal(signal.SIGTERM, self._handle_sigterm)
        if hasattr(signal, 'SIGHUP'):
            signal.signal(signal.SIGHUP, self._handle_sighup)
        with polling.get_polling_manager(
            self.minimize_polling,
            self.ovsdb_monitor_respawn_interval) as pm:

            self.rpc_loop(polling_manager=pm)
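
daemon_loop registers signal handlers and then hands control to a loop that checks a flag on every iteration. A minimal sketch of that pattern, using only the standard library, could look like the following. `LoopController` and its method names are hypothetical. They only mirror the roles of `catch_sigterm`, `_handle_sigterm` and `_check_and_handle_signal`, whose bodies are not shown in this article.

```python
import signal


class LoopController(object):
    """Hypothetical loop that stops cleanly after SIGTERM."""

    def __init__(self):
        self.catch_sigterm = False
        self.run_loop = True

    def handle_sigterm(self, signum, frame):
        # Do as little as possible in the handler: just record the signal.
        self.catch_sigterm = True

    def check_and_handle_signal(self):
        """Return False once a recorded SIGTERM has been acted upon."""
        if self.catch_sigterm:
            self.run_loop = False
            self.catch_sigterm = False
        return self.run_loop

    def run(self, work, max_iterations=None):
        """Call work(controller, i) until stopped; return iterations run."""
        iterations = 0
        while self.check_and_handle_signal():
            if max_iterations is not None and iterations >= max_iterations:
                break
            work(self, iterations)
            iterations += 1
        return iterations


if __name__ == "__main__":
    controller = LoopController()
    signal.signal(signal.SIGTERM, controller.handle_sigterm)
    # Bounded here so the demo terminates without sending a signal.
    print(controller.run(lambda ctrl, i: None, max_iterations=3))
```

Checking the flag at the top of each iteration means an iteration that is already in progress always finishes before the loop exits.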
2) rpc_loop

The two most important functions in rpc_loop() are tunnel_sync (queries and establishes tunnels) and process_network_ports (handles port and security group changes).

    def rpc_loop(self, polling_manager=None):
        if not polling_manager:
            polling_manager = polling.get_polling_manager(
                minimize_polling=False)
        sync = True
        ports = set()
        updated_ports_copy = set()
        ancillary_ports = set()
        tunnel_sync = True
        ovs_restarted = False
        consecutive_resyncs = 0
        need_clean_stale_flow = True
        while self._check_and_handle_signal():
            if self.fullsync:
                LOG.info(_LI("rpc_loop doing a full sync."))
                sync = True
                self.fullsync = False
            port_info = {}
            ancillary_port_info = {}
            start = time.time()
            LOG.debug("Agent rpc_loop - iteration:%d started",
                      self.iter_num)
            if sync:
                LOG.info(_LI("Agent out of sync with plugin!"))
                polling_manager.force_polling()
                consecutive_resyncs = consecutive_resyncs + 1
                if consecutive_resyncs >= constants.MAX_DEVICE_RETRIES:
                    LOG.warn(_LW("Clearing cache of registered ports, retrials"
                                 " to resync were > %s"),
                             constants.MAX_DEVICE_RETRIES)
                    ports.clear()
                    ancillary_ports.clear()
                    sync = False
                    consecutive_resyncs = 0
            else:
                consecutive_resyncs = 0
            ovs_status = self.check_ovs_status()
            if ovs_status == constants.OVS_RESTARTED:
                self.setup_integration_br()
                self.setup_physical_bridges(self.bridge_mappings)
                if self.enable_tunneling:
                    self._reset_tunnel_ofports()
                    self.setup_tunnel_br()
                    self.setup_tunnel_br_flows()
                    tunnel_sync = True
                if self.enable_distributed_routing:
                    self.dvr_agent.reset_ovs_parameters(self.int_br,
                                                 self.tun_br,
                                                 self.patch_int_ofport,
                                                 self.patch_tun_ofport)
                    self.dvr_agent.reset_dvr_parameters()
                    self.dvr_agent.setup_dvr_flows()
            elif ovs_status == constants.OVS_DEAD:
                # Agent doesn't apply any operations when ovs is dead, to
                # prevent unexpected failure or crash. Sleep and continue
                # loop in which ovs status will be checked periodically.
                port_stats = self.get_port_stats({}, {})
                self.loop_count_and_wait(start, port_stats)
                continue
            # Notify the plugin of tunnel IP
            if self.enable_tunneling and tunnel_sync:
                LOG.info(_LI("Agent tunnel out of sync with plugin!"))
                try:
                    tunnel_sync = self.tunnel_sync()
                except Exception:
                    LOG.exception(_LE("Error while synchronizing tunnels"))
                    tunnel_sync = True
            ovs_restarted |= (ovs_status == constants.OVS_RESTARTED)
            if self._agent_has_updates(polling_manager) or ovs_restarted:
                try:
                    LOG.debug("Agent rpc_loop - iteration:%(iter_num)d - "
                              "starting polling. Elapsed:%(elapsed).3f",
                              {'iter_num': self.iter_num,
                               'elapsed': time.time() - start})
                    # Save updated ports dict to perform rollback in
                    # case resync would be needed, and then clear
                    # self.updated_ports. As the greenthread should not yield
                    # between these two statements, this will be thread-safe
                    updated_ports_copy = self.updated_ports
                    self.updated_ports = set()
                    reg_ports = (set() if ovs_restarted else ports)
                    # Work out from br-int which ports were added, updated,
                    # or deleted
                    port_info = self.scan_ports(reg_ports, sync,
                                                updated_ports_copy)
                    self.process_deleted_ports(port_info)
                    ofport_changed_ports = self.update_stale_ofport_rules()
                    if ofport_changed_ports:
                        port_info.setdefault('updated', set()).update(
                            ofport_changed_ports)
                    LOG.debug("Agent rpc_loop - iteration:%(iter_num)d - "
                              "port information retrieved. "
                              "Elapsed:%(elapsed).3f",
                              {'iter_num': self.iter_num,
                               'elapsed': time.time() - start})
                    # Treat ancillary devices if they exist
                    if self.ancillary_brs:
                        ancillary_port_info = self.scan_ancillary_ports(
                            ancillary_ports, sync)
                        LOG.debug("Agent rpc_loop - iteration:%(iter_num)d - "
                                  "ancillary port info retrieved. "
                                  "Elapsed:%(elapsed).3f",
                                  {'iter_num': self.iter_num,
                                   'elapsed': time.time() - start})
                    sync = False
                    # Secure and wire/unwire VIFs and update their status
                    # on Neutron server
                    if (self._port_info_has_changes(port_info) or
                        self.sg_agent.firewall_refresh_needed() or
                        ovs_restarted):
                        LOG.debug("Starting to process devices in:%s",
                                  port_info)
                        # If treat devices fails - must resync with plugin
                        # This method queries the plugin for port details
                        # and, depending on each port's admin_state_up,
                        # calls self.port_bound() or self.port_dead(). It
                        # then calls update_device_up or update_device_down
                        # over the plugin RPC to update the port status.
                        sync = self.process_network_ports(port_info,
                                                          ovs_restarted)
                        if not sync and need_clean_stale_flow:
                            self.cleanup_stale_flows()
                            need_clean_stale_flow = False
                        LOG.debug("Agent rpc_loop - iteration:%(iter_num)d - "
                                  "ports processed. Elapsed:%(elapsed).3f",
                                  {'iter_num': self.iter_num,
                                   'elapsed': time.time() - start})

                    ports = port_info['current']

                    if self.ancillary_brs:
                        sync |= self.process_ancillary_network_ports(
                            ancillary_port_info)
                        LOG.debug("Agent rpc_loop - iteration: "
                                  "%(iter_num)d - ancillary ports "
                                  "processed. Elapsed:%(elapsed).3f",
                                  {'iter_num': self.iter_num,
                                   'elapsed': time.time() - start})
                        ancillary_ports = ancillary_port_info['current']

                    polling_manager.polling_completed()
                    # Keep this flag in the last line of "try" block,
                    # so we can sure that no other Exception occurred.
                    if not sync:
                        ovs_restarted = False
                        self._dispose_local_vlan_hints()
                except Exception:
                    LOG.exception(_LE("Error while processing VIF ports"))
                    # Put the ports back in self.updated_port
                    self.updated_ports |= updated_ports_copy
                    sync = True
            port_stats = self.get_port_stats(port_info, ancillary_port_info)
            self.loop_count_and_wait(start, port_stats)
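
The port scanning step in the loop above boils down to set arithmetic between the ports currently on the bridge and the ports seen in the previous iteration. The sketch below illustrates that idea. It is not the body of the agent's own scan_ports method, which is not shown in this article, and the port names are made up.

```python
def scan_ports(current, registered, updated=None):
    """Classify ports by comparing the bridge with the last known state.

    current:    port ids found on the bridge right now
    registered: port ids known from the previous iteration
    updated:    port ids for which an update notification arrived
    """
    current = set(current)
    registered = set(registered)
    port_info = {
        "current": current,
        "added": current - registered,
        "removed": registered - current,
    }
    if updated:
        # An update only matters for ports that still exist locally.
        still_present = set(updated) & current
        if still_present:
            port_info["updated"] = still_present
    return port_info


if __name__ == "__main__":
    info = scan_ports(current={"tap-a", "tap-b", "tap-c"},
                      registered={"tap-b", "tap-c", "tap-d"},
                      updated={"tap-c", "tap-x"})
    for key in sorted(info):
        print(key, sorted(info[key]))
```

Passing an empty `registered` set makes every current port count as added, which matches the loop's use of `set()` for `reg_ports` after an OVS restart.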

References:
aboutyun: http://www.aboutyun.com/thread-10306-1-1.html
CSDN blog: http://blog.csdn.net/u013920085/article/details/50099147
