stackstorm 29. Source code analysis: concurrency of the StackStorm actionrunner service

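Goal:
Understand how much concurrency st2actionrunner can handle.

1 The actionrunner flow
The st2actionrunner service runs two consumers, ActionExecutionScheduler and ActionExecutionDispatcher.
ActionExecutionScheduler is both a consumer and a producer: it consumes a message and then publishes one for ActionExecutionDispatcher.
ActionExecutionDispatcher receives that message, calls a runner to execute the action, and produces the execution, which completes the flow.
With the mistral runner, a workflow and an execution are also created in Mistral, and st2 links the Mistral execution to its own execution
so that the execution's progress can be traced.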
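
The two-stage flow described above can be modelled with a toy sketch. It is not StackStorm code: the two in-memory queues, the message dictionary, and fake_python_runner are all hypothetical stand-ins for the real message bus, LiveActionDB payloads, and runners.

```python
import queue

# Hypothetical in-memory stand-ins for the two message queues.
scheduler_queue = queue.Queue()
dispatcher_queue = queue.Queue()


def scheduler_step():
    """Consume one request and republish it for the dispatcher."""
    request = scheduler_queue.get_nowait()
    dispatcher_queue.put(request)


def dispatcher_step(runners):
    """Consume one forwarded request and run it with the matching runner."""
    request = dispatcher_queue.get_nowait()
    runner = runners[request["runner"]]
    return {"action": request["action"], "result": runner(request["params"])}


def fake_python_runner(params):
    # Placeholder for a real runner: just adds the numbers it is given.
    return sum(params["numbers"])


scheduler_queue.put(
    {"action": "demo.add", "runner": "python", "params": {"numbers": [1, 2, 3]}}
)
scheduler_step()
execution = dispatcher_step({"python": fake_python_runner})
print(execution)  # {'action': 'demo.add', 'result': 6}
```

The point of the sketch is only the shape of the pipeline: the scheduler never runs anything itself, it forwards work to the stage that does.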

2 ActionExecutionScheduler only forwards messages, so the component that actually invokes runners is
ActionExecutionDispatcher, which calls RunnerContainer.
The python runner lives in:
st2/contrib/runners/python_runner/python_runner/python_runner.py
The run method of the PythonRunner class calls run_command,
which is imported with
from st2common.util.green.shell import run_command
and defined in
st2/st2common/st2common/util/green/shell.py:
def run_command(cmd, stdin=None, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=False,
                cwd=None, env=None, timeout=60, preexec_func=None, kill_func=None,
                read_stdout_func=None, read_stderr_func=None,
                read_stdout_buffer=None, read_stderr_buffer=None):
    """
    Run the provided command in a subprocess and wait until it completes.

    :param cmd: Command to run.
    :type cmd: ``str`` or ``list``

    :param stdin: Process stdin.
    :type stdin: ``object``

    :param stdout: Process stdout.
    :type stdout: ``object``

    :param stderr: Process stderr.
    :type stderr: ``object``

    :param shell: True to use a shell.
    :type shell ``boolean``

    :param cwd: Optional working directory.
    :type cwd: ``str``

    :param env: Optional environment to use with the command. If not provided,
                environment from the current process is inherited.
    :type env: ``dict``

    :param timeout: How long to wait before timing out.
    :type timeout: ``float``

    :param preexec_func: Optional pre-exec function.
    :type preexec_func: ``callable``

    :param kill_func: Optional function which will be called on timeout to kill the process.
                      If not provided, it defaults to `process.kill`
    :type kill_func: ``callable``

    :param read_stdout_func: Function which is responsible for reading process stdout when
                                 using live read mode.
    :type read_stdout_func: ``func``

    :param read_stdout_func: Function which is responsible for reading process stderr when
                                 using live read mode.
    :type read_stdout_func: ``func``


    :rtype: ``tuple`` (exit_code, stdout, stderr, timed_out)
    """
    LOG.debug('Entering st2common.util.green.run_command.')

    assert isinstance(cmd, (list, tuple) + six.string_types)

    if (read_stdout_func and not read_stderr_func) or (read_stderr_func and not read_stdout_func):
        raise ValueError('Both read_stdout_func and read_stderr_func arguments need '
                         'to be provided.')

    if read_stdout_func and not (read_stdout_buffer or read_stderr_buffer):
        raise ValueError('read_stdout_buffer and read_stderr_buffer arguments need to be provided '
                         'when read_stdout_func is provided')

    if not env:
        LOG.debug('env argument not provided. using process env (os.environ).')
        env = os.environ.copy()

    # Note: We are using eventlet friendly implementation of subprocess
    # which uses GreenPipe so it doesn't block
    LOG.debug('Creating subprocess.')
    process = subprocess.Popen(args=cmd, stdin=stdin, stdout=stdout, stderr=stderr,
                               env=env, cwd=cwd, shell=shell, preexec_fn=preexec_func)

    if read_stdout_func:
        LOG.debug('Spawning read_stdout_func function')
        read_stdout_thread = eventlet.spawn(read_stdout_func, process.stdout, read_stdout_buffer)

    if read_stderr_func:
        LOG.debug('Spawning read_stderr_func function')
        read_stderr_thread = eventlet.spawn(read_stderr_func, process.stderr, read_stderr_buffer)

    def on_timeout_expired(timeout):
        global timed_out

        try:
            LOG.debug('Starting process wait inside timeout handler.')
            process.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # Command has timed out, kill the process and propagate the error.
            # Note: We explicitly set the returncode to indicate the timeout.
            LOG.debug('Command execution timeout reached.')
            process.returncode = TIMEOUT_EXIT_CODE

            if kill_func:
                LOG.debug('Calling kill_func.')
                kill_func(process=process)
            else:
                LOG.debug('Killing process.')
                process.kill()

            if read_stdout_func and read_stderr_func:
                LOG.debug('Killing read_stdout_thread and read_stderr_thread')
                read_stdout_thread.kill()
                read_stderr_thread.kill()

    LOG.debug('Spawning timeout handler thread.')
    timeout_thread = eventlet.spawn(on_timeout_expired, timeout)
    LOG.debug('Attaching to process.')

    if read_stdout_func and read_stderr_func:
        LOG.debug('Using real-time stdout and stderr read mode, calling process.wait()')
        process.wait()
    else:
        LOG.debug('Using delayed stdout and stderr read mode, calling process.communicate()')
        stdout, stderr = process.communicate()

    timeout_thread.cancel()
    exit_code = process.returncode

    if read_stdout_func and read_stderr_func:
        # Wait on those green threads to finish reading from stdout and stderr before continuing
        read_stdout_thread.wait()
        read_stderr_thread.wait()

        stdout = read_stdout_buffer.getvalue()
        stderr = read_stderr_buffer.getvalue()

    if exit_code == TIMEOUT_EXIT_CODE:
        LOG.debug('Timeout.')
        timed_out = True
    else:
        LOG.debug('No timeout.')
        timed_out = False

    LOG.debug('Returning.')
    return (exit_code, stdout, stderr, timed_out)

Analysis:
The key part is:
    process = subprocess.Popen(args=cmd, stdin=stdin, stdout=stdout, stderr=stderr,
                               env=env, cwd=cwd, shell=shell, preexec_fn=preexec_func)

    if read_stdout_func:
        LOG.debug('Spawning read_stdout_func function')
        read_stdout_thread = eventlet.spawn(read_stdout_func, process.stdout, read_stdout_buffer)

    if read_stderr_func:
        LOG.debug('Spawning read_stderr_func function')
        read_stderr_thread = eventlet.spawn(read_stderr_func, process.stderr, read_stderr_buffer)

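This uses the subprocess module from eventlet.green:
from eventlet.green import subprocess
eventlet.spawn starts green threads (coroutines) that read the process's stdout and stderr; once the command finishes, the collected output is written to the actionexecution result.
Another green thread checks for the timeout.
The function finally returns: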
return (exit_code, stdout, stderr, timed_out)

No special eventlet parameters are set here; the module is used as is:
eventlet.green.subprocess
Reference documentation:
https://kite.com/python/docs/eventlet.green.subprocess
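
The pattern in run_command (start the process, drain stdout and stderr concurrently, enforce a timeout, return a 4-tuple) can be sketched with the standard library alone. This is a simplified model, not the StackStorm implementation: ordinary threads stand in for eventlet green threads, the timeout is handled with process.wait(timeout=...) in the caller instead of a separate watcher, and run_with_live_read and read_stream are names made up for this example.

```python
import io
import subprocess
import sys
import threading


def read_stream(stream, buffer):
    """Copy everything from a pipe into an in-memory buffer."""
    for line in stream:
        buffer.write(line)


def run_with_live_read(cmd, timeout=60):
    """Run cmd and return (exit_code, stdout, stderr, timed_out)."""
    process = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    out_buf, err_buf = io.StringIO(), io.StringIO()
    # One reader per pipe, so neither pipe can fill up and block the child.
    readers = [
        threading.Thread(target=read_stream, args=(process.stdout, out_buf)),
        threading.Thread(target=read_stream, args=(process.stderr, err_buf)),
    ]
    for reader in readers:
        reader.start()

    timed_out = False
    try:
        process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The command ran too long: kill it and reap the process.
        timed_out = True
        process.kill()
        process.wait()

    # Wait for the readers to finish draining before using the buffers.
    for reader in readers:
        reader.join()
    return process.returncode, out_buf.getvalue(), err_buf.getvalue(), timed_out


if __name__ == "__main__":
    print(run_with_live_read([sys.executable, "-c", "print('hello')"]))
```

With green threads, as in the original, the same structure costs far less per command, which is what makes one actionrunner process able to supervise many subprocesses at once.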

Along the way, the following function is also called:
st2/st2common/st2common/services/action.py

def store_execution_output_data(execution_db, action_db, data, output_type='output',
                                timestamp=None):
    """
    Store output from an execution as a new document in the collection.
    """
    execution_id = str(execution_db.id)
    action_ref = action_db.ref
    runner_ref = getattr(action_db, 'runner_type', {}).get('name', 'unknown')
    timestamp = timestamp or date_utils.get_datetime_utc_now()

    output_db = ActionExecutionOutputDB(execution_id=execution_id,
                                        action_ref=action_ref,
                                        runner_ref=runner_ref,
                                        timestamp=timestamp,
                                        output_type=output_type,
                                        data=data)
    output_db = ActionExecutionOutput.add_or_update(output_db, publish=True,
                                                    dispatch_trigger=False)

    return output_db

Reference:
https://blog.csdn.net/qingyuanluofeng/java/article/details/105398730

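Summary:
1) The service that dispatches action execution is a message consumer, deployed here with 3 replicas. For a scheduled task, the consumer calls
the python runner to execute the task's Python script, and the script is run
from green threads (coroutines), specifically: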
from eventlet.green import subprocess
process = subprocess.Popen(args=cmd, stdin=stdin, stdout=stdout, stderr=stderr,
                               env=env, cwd=cwd, shell=shell, preexec_fn=preexec_func)
read_stdout_thread = eventlet.spawn(read_stdout_func, process.stdout, read_stdout_buffer)
read_stderr_thread = eventlet.spawn(read_stderr_func, process.stderr, read_stderr_buffer)

3 The consumers themselves
3.1 ActionExecutionDispatcher
The code is in st2/st2actions/st2actions/worker.py
class ActionExecutionDispatcher(MessageHandler):

    message_type = LiveActionDB

    def __init__(self, connection, queues):
        super(ActionExecutionDispatcher, self).__init__(connection, queues)
        self.container = RunnerContainer()
        self._running_liveactions = set()

    def get_queue_consumer(self, connection, queues):
        # We want to use a special ActionsQueueConsumer which uses 2 dispatcher pools
        return ActionsQueueConsumer(connection=connection, queues=queues, handler=self)

    def process(self, liveaction):
      ......

Analysis:
The key is the consumer ActionsQueueConsumer(connection=connection, queues=queues, handler=self):
does it process messages concurrently?

3.2 ActionsQueueConsumer
st2/st2common/st2common/transport/consumers.py
The code is:
class ActionsQueueConsumer(QueueConsumer):
    """
    Special Queue Consumer for action runner which uses multiple BufferedDispatcher pools:

    1. For regular (non-workflow) actions
    2. One for workflow actions

    This way we can ensure workflow actions never block non-workflow actions.
    """

    def __init__(self, connection, queues, handler):
        self.connection = connection

        self._queues = queues
        self._handler = handler

        workflows_pool_size = cfg.CONF.actionrunner.workflows_pool_size
        actions_pool_size = cfg.CONF.actionrunner.actions_pool_size
        self._workflows_dispatcher = BufferedDispatcher(dispatch_pool_size=workflows_pool_size,
                                                        name='workflows-dispatcher')
        self._actions_dispatcher = BufferedDispatcher(dispatch_pool_size=actions_pool_size,
                                                      name='actions-dispatcher')

    def process(self, body, message):
        try:
            if not isinstance(body, self._handler.message_type):
                raise TypeError('Received an unexpected type "%s" for payload.' % type(body))

            action_is_workflow = getattr(body, 'action_is_workflow', False)
            if action_is_workflow:
                # Use workflow dispatcher queue
                dispatcher = self._workflows_dispatcher
            else:
                # Use queue for regular or workflow actions
                dispatcher = self._actions_dispatcher

            LOG.debug('Using BufferedDispatcher pool: "%s"', str(dispatcher))
            dispatcher.dispatch(self._process_message, body)
        except:
            LOG.exception('%s failed to process message: %s', self.__class__.__name__, body)
        finally:
            # At this point we will always ack a message.
            message.ack()

    def shutdown(self):
        self._workflows_dispatcher.shutdown()
        self._actions_dispatcher.shutdown()
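
The routing decision inside process() can be isolated in a few lines. In this sketch pick_dispatcher is a hypothetical helper and SimpleNamespace objects stand in for LiveActionDB messages; only the getattr check and the two pool names come from the code above.

```python
from types import SimpleNamespace


def pick_dispatcher(body):
    """Return the name of the pool that would handle this message."""
    if getattr(body, "action_is_workflow", False):
        return "workflows-dispatcher"
    return "actions-dispatcher"


workflow_msg = SimpleNamespace(action_is_workflow=True)
regular_msg = SimpleNamespace(action_is_workflow=False)
bare_msg = SimpleNamespace()  # attribute missing: treated as a regular action

print(pick_dispatcher(workflow_msg))  # workflows-dispatcher
print(pick_dispatcher(regular_msg))   # actions-dispatcher
print(pick_dispatcher(bare_msg))      # actions-dispatcher
```

Because the default in getattr is False, a message without the attribute lands in the regular actions pool.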

Analysis:
self._actions_dispatcher = BufferedDispatcher(dispatch_pool_size=actions_pool_size, name='actions-dispatcher')
The green thread pool sizes come from:
workflows_pool_size = cfg.CONF.actionrunner.workflows_pool_size
actions_pool_size = cfg.CONF.actionrunner.actions_pool_size

These options are defined in:
st2/st2common/st2common/config.py
    dispatcher_pool_opts = [
        cfg.IntOpt('workflows_pool_size', default=40,
                   help='Internal pool size for dispatcher used by workflow actions.'),
        cfg.IntOpt('actions_pool_size', default=60,
                   help='Internal pool size for dispatcher used by regular actions.')
    ]
    do_register_opts(dispatcher_pool_opts, group='actionrunner')

So the workflow pool defaults to 40 green threads and the action pool defaults to 60.
Contents of /etc/st2/st2.conf:
[actionrunner]
logging = /etc/st2/logging.actionrunner.conf
[api]
# allow_origin is required for handling CORS in st2 web UI.
allow_origin = *
# Host and port to bind the API server.
host = 127.0.0.1
logging = /etc/st2/logging.api.conf
mask_secrets = true
port = 9101
[auth]
# Base URL to the API endpoint excluding the version (e.g. http://myhost.net:9101/)
api_url = http://st2api:9101
mode = standalone
# Note: Settings below are only used in "standalone" mode
# backend: flat_file
# backend_kwargs: '{"file_path": "/etc/st2/htpasswd"}'
backend = keystone
backend_kwargs = {"keystone_url": "http://keystone-api.openstack.svc.cluster.local:80", "keystone_version": 3, "keystone_mode": "email"}
debug = false
enable = true
host = 127.0.0.1
logging = /etc/st2/logging.auth.conf
port = 9100
use_ssl = false
[content]
packs_base_paths = /opt/stackstorm/packs.dev
[coordination]
url = redis://[email protected]:6379
[database]
host = mongodb.openstack.svc.cluster.local
password = dozer
port = 27017
username = dozer
[exporter]
logging = /etc/st2/logging.exporter.conf
[garbagecollector]
action_executions_output_ttl = 14
action_executions_ttl = 14
logging = /etc/st2/logging.garbagecollector.conf
purge_inquiries = true
trigger_instances_ttl = 14
[keyvalue]
encryption_key_path = /etc/st2/keys/datastore_key.json
[log]
excludes = requests,paramiko
mask_secrets = true
redirect_stderr = false
[messaging]
url = amqp://rabbitmq:[email protected]:5672
[mistral]
api_url = http://st2api:9101
v2_base_url = http://mistral-api:8989/v2
[notifier]
logging = /etc/st2/logging.notifier.conf
[rbac]
enable = false
permission_isolation = true
sync_remote_groups = true
[resultstracker]
logging = /etc/st2/logging.resultstracker.conf
[rulesengine]
logging = /etc/st2/logging.rulesengine.conf
[sensorcontainer]
logging = /etc/st2/logging.sensorcontainer.conf
[ssh_runner]
remote_dir = /tmp
[stream]
logging = /etc/st2/logging.stream.conf
[syslog]
facility = local7
host = 127.0.0.1
port = 514
protocol = udp
[system]
base_path = /opt/stackstorm
[system_user]
ssh_key_file = /home/adminATexample.org/.ssh/admin_rsa
user = [email protected]
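
The [actionrunner] section of this file sets only logging, so neither pool size is overridden. The fallback can be illustrated with a small sketch that uses the standard library's configparser as a stand-in for oslo.config (which StackStorm actually uses). effective_pool_sizes is a hypothetical helper, and the value 100 in the second snippet is an invented example, not a recommendation.

```python
import configparser

# Defaults copied from dispatcher_pool_opts in st2common/config.py.
DEFAULTS = {"workflows_pool_size": 40, "actions_pool_size": 60}


def effective_pool_sizes(conf_text):
    """Return the pool sizes that an [actionrunner] section would yield."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    sizes = {}
    for option, default in DEFAULTS.items():
        # fallback applies when the section or the option is missing.
        sizes[option] = parser.getint("actionrunner", option, fallback=default)
    return sizes


shipped = """
[actionrunner]
logging = /etc/st2/logging.actionrunner.conf
"""

tuned = """
[actionrunner]
logging = /etc/st2/logging.actionrunner.conf
actions_pool_size = 100
"""

print(effective_pool_sizes(shipped))  # {'workflows_pool_size': 40, 'actions_pool_size': 60}
print(effective_pool_sizes(tuned))    # {'workflows_pool_size': 40, 'actions_pool_size': 100}
```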

The [actionrunner] section above sets only logging, so the default pool sizes apply. The key line is:
self._actions_dispatcher = BufferedDispatcher(dispatch_pool_size=actions_pool_size, name='actions-dispatcher')


3.3 BufferedDispatcher
This is implemented in:
st2/st2common/st2common/util/greenpooldispatch.py

class BufferedDispatcher(object):

    def __init__(self, dispatch_pool_size=50, monitor_thread_empty_q_sleep_time=5,
                 monitor_thread_no_workers_sleep_time=1, name=None):
        self._pool_limit = dispatch_pool_size
        self._dispatcher_pool = eventlet.GreenPool(dispatch_pool_size)
        self._dispatch_monitor_thread = eventlet.greenthread.spawn(self._flush)
        self._monitor_thread_empty_q_sleep_time = monitor_thread_empty_q_sleep_time
        self._monitor_thread_no_workers_sleep_time = monitor_thread_no_workers_sleep_time
        self._name = name

        self._work_buffer = Queue.Queue()

        # Internal attributes we use to track how long the pool is busy without any free workers
        self._pool_last_free_ts = time.time()

    @property
    def name(self):
        return self._name or id(self)

    def dispatch(self, handler, *args):
        self._work_buffer.put((handler, args), block=True, timeout=1)
        self._flush_now()

    def shutdown(self):
        self._dispatch_monitor_thread.kill()

    def _flush(self):
        while True:
            while self._work_buffer.empty():
                eventlet.greenthread.sleep(self._monitor_thread_empty_q_sleep_time)
            while self._dispatcher_pool.free() <= 0:
                eventlet.greenthread.sleep(self._monitor_thread_no_workers_sleep_time)
            self._flush_now()

    def _flush_now(self):
        if self._dispatcher_pool.free() <= 0:
            now = time.time()

            if (now - self._pool_last_free_ts) >= POOL_BUSY_THRESHOLD_SECONDS:
                LOG.info(POOL_BUSY_LOG_MESSAGE % (self.name, POOL_BUSY_THRESHOLD_SECONDS))

            return

        # Update the time of when there were free threads available
        self._pool_last_free_ts = time.time()

        while not self._work_buffer.empty() and self._dispatcher_pool.free() > 0:
            (handler, args) = self._work_buffer.get_nowait()
            self._dispatcher_pool.spawn(handler, *args)

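Analysis:
1) self._dispatcher_pool = eventlet.GreenPool(dispatch_pool_size)
The pool is an eventlet GreenPool capped at dispatch_pool_size green threads.

2) def dispatch(self, handler, *args):
Pushes the handler and its arguments onto the work buffer, then calls _flush_now.

3) _flush_now(self):
While the work buffer is not empty and the pool still has a free slot, it takes the next handler and its arguments from the buffer and spawns them in the pool.
If the pool is full it returns immediately, and the monitor green thread (_flush) retries once a slot is free.

4) The key step is processing the message in the green thread pool: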
self._dispatcher_pool.spawn(handler, *args)
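
The buffer-plus-bounded-pool idea can be tried out with a toy model. It is a sketch under stated assumptions, not the StackStorm class: ordinary threads replace green threads, a finished worker re-flushes the buffer instead of a polling monitor thread, and TinyBufferedDispatcher and run_demo are names invented for this example.

```python
import queue
import threading
import time


class TinyBufferedDispatcher:
    """Toy model: an unbounded work buffer in front of a bounded worker pool."""

    def __init__(self, pool_size):
        self._pool_size = pool_size
        self._work_buffer = queue.Queue()
        self._lock = threading.Lock()
        self._running = 0  # occupied pool slots
        self._workers = []

    def dispatch(self, handler, *args):
        self._work_buffer.put((handler, args))
        self._flush_now()

    def _flush_now(self):
        # Move buffered work into the pool while work remains and a slot is free.
        with self._lock:
            while self._running < self._pool_size and not self._work_buffer.empty():
                handler, args = self._work_buffer.get_nowait()
                self._running += 1
                worker = threading.Thread(target=self._run, args=(handler, args))
                self._workers.append(worker)
                worker.start()

    def _run(self, handler, args):
        try:
            handler(*args)
        finally:
            with self._lock:
                self._running -= 1
            self._flush_now()  # a slot was freed, so drain the buffer again

    def join(self):
        # Keep joining until no new workers have been started.
        joined = 0
        while True:
            with self._lock:
                pending = self._workers[joined:]
            if not pending:
                return
            for worker in pending:
                worker.join()
            joined += len(pending)


def run_demo(pool_size=3, tasks=10):
    """Dispatch `tasks` short jobs and report (peak concurrency, jobs done)."""
    stats = {"active": 0, "peak": 0, "done": 0}
    guard = threading.Lock()

    def handler(_task_id):
        with guard:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
        time.sleep(0.01)  # pretend to run an action
        with guard:
            stats["active"] -= 1
            stats["done"] += 1

    dispatcher = TinyBufferedDispatcher(pool_size)
    for task_id in range(tasks):
        dispatcher.dispatch(handler, task_id)
    dispatcher.join()
    return stats["peak"], stats["done"]


if __name__ == "__main__":
    print(run_demo())
```

All ten jobs complete, but the peak number running at once never exceeds the pool size; the rest wait in the buffer. That is the same bound dispatch_pool_size places on an actionrunner process.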


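4 Summary
1) By default st2actionrunner uses a pool of 40 green threads for workflows and a pool of 60 for regular actions.
With 3 actionrunner services, that allows up to 120 workflows and 180 actions to be in flight at once.
2) ActionExecutionDispatcher, the main entry class of st2actionrunner, uses the consumer
ActionsQueueConsumer, which creates two BufferedDispatcher instances up front: one for workflow actions
and one for regular (non-workflow) actions, such as those using the python runner. Each incoming message is routed to the matching BufferedDispatcher.
Internally, a BufferedDispatcher is a queue plus a green thread pool.
The message and its handler are put on the queue, taken off again, and then processed in the pool.
So st2actionrunner is inherently concurrent.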
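
The arithmetic in point 1 is simply replicas times pool size. A minimal sketch, where in_flight_ceiling is a hypothetical helper and the figures are upper bounds on simultaneously running green threads, not measured throughput:

```python
# Default pool sizes from st2common/config.py.
DEFAULT_POOL_SIZES = {"workflows": 40, "actions": 60}


def in_flight_ceiling(replicas, pool_sizes=DEFAULT_POOL_SIZES):
    """Upper bound on simultaneously running items across all replicas."""
    return {kind: replicas * size for kind, size in pool_sizes.items()}


print(in_flight_ceiling(3))  # {'workflows': 120, 'actions': 180}
```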


Reference:
StackStorm 2.6 source code
