A First Look at Flume Internals: An Overview of the Basic Execution Flow

Flume

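Flume is an Apache open-source tool widely used for data collection. It is a distributed, reliable, and highly available system that efficiently collects, aggregates, and moves large volumes of log data from many different sources into structured storage. Its use is not limited to log aggregation: because data sources are customizable, Flume can transport large volumes of event data, including but not limited to network traffic data, data generated by social media, email messages, and almost any other data source. In practice Flume is widely used in the big-data field. This article gives a brief overview of Flume's basic architecture and then uses the example configuration file from the official website to walk through how Flume works.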

Flume Architecture

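Flume is simple to use: configuration files specify where the data comes from and where it should be delivered, and choosing different channel types lets you meet the data-safety requirements of different scenarios. These three aspects correspond to Flume's three core concepts: Source, Sink, and Channel. First, take a look at the architecture diagram.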

[Figure: Flume architecture diagram]

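First, define the data source to collect from: it can be a file, or a network port on which data is received. A Flume Source captures that data and hands it to the configured Channel(s), which forward it on. A single Source can write to multiple Channels, so the same data can be sent to different destinations through different Channels. The Channel also defines how the data is buffered: it can be kept in memory or persisted to files. Finally, a Sink takes the buffered events out of its Channel, typically in batches, and delivers them to HDFS, HBase, or another consumer, which completes the journey of the data.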
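
The Source → Channel → Sink flow can be modeled in a few lines of plain Java. This is a hypothetical toy, not Flume's API: the names ToySource, ToyChannel, and ToySink are made up, a bounded ArrayBlockingQueue stands in for a memory channel's capacity, and the channel transactions that real Flume wraps around every put and take are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical toy model of an agent, NOT the Flume API.
// A bounded queue plays the role of a memory channel.
class ToyChannel {
    private final BlockingQueue<String> queue;

    ToyChannel(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity); // like the channel "capacity" setting
    }

    // Returns false instead of blocking when the channel is full.
    boolean put(String event) {
        return queue.offer(event);
    }

    // Takes up to batchSize events without blocking.
    List<String> take(int batchSize) {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, batchSize);
        return batch;
    }
}

// A source that copies every event to all of its channels (fan-out).
class ToySource {
    private final List<ToyChannel> channels;

    ToySource(List<ToyChannel> channels) {
        this.channels = channels;
    }

    // Returns how many channels accepted the event.
    int receive(String event) {
        int accepted = 0;
        for (ToyChannel channel : channels) {
            if (channel.put(event)) {
                accepted++;
            }
        }
        return accepted;
    }
}

// A sink reads from exactly one channel and "delivers" by recording events.
class ToySink {
    private final ToyChannel channel;
    private final List<String> delivered = new ArrayList<>();

    ToySink(ToyChannel channel) {
        this.channel = channel;
    }

    // One processing step: pull a batch and deliver it; returns the batch size.
    int process(int batchSize) {
        List<String> batch = channel.take(batchSize);
        delivered.addAll(batch);
        return batch.size();
    }

    List<String> delivered() {
        return delivered;
    }
}
```

The point it illustrates is decoupling: the source only needs room in the channel, and the sink drains the channel at its own pace, so a slow sink shows up as a filling channel that the source has to deal with (in this toy, put() simply returns false).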

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = node1
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

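This configuration is the example provided by Flume. It defines an agent named a1. The agent's source r1 has type netcat: it listens on host node1, port 44444, and receives whatever data is sent to that port. The sink k1 has type logger, which writes each event to the log, i.e. the console output. The channel c1 is a memory channel that holds at most 1000 events (capacity) and moves at most 100 events per transaction (transactionCapacity). Finally, the channel c1 is bound to both r1 and k1.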
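
To see how these flat key/value lines describe one agent, here is a small sketch that loads the example with java.util.Properties and looks up a1's components and bindings. ExampleConfigReader is a hypothetical helper for illustration only; Flume's own parsing is done by FlumeConfiguration, which also validates the configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, NOT Flume's FlumeConfiguration: reads the example config
// with java.util.Properties and looks up the keys that describe one agent.
class ExampleConfigReader {
    private final Properties props = new Properties();
    private final String agent;

    ExampleConfigReader(String configText, String agent) {
        try {
            props.load(new StringReader(configText)); // lines starting with '#' are comments
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.agent = agent;
    }

    // names("sources") -> [r1]: the whitespace-separated component list of a kind
    List<String> names(String kind) {
        return splitList(props.getProperty(agent + "." + kind));
    }

    // property("sources", "r1", "type") -> "netcat"
    String property(String kind, String component, String key) {
        return props.getProperty(agent + "." + kind + "." + component + "." + key);
    }

    // A source lists one or more channels under the plural key "channels" ...
    List<String> sourceChannels(String source) {
        return splitList(property("sources", source, "channels"));
    }

    // ... while a sink names exactly one channel under the singular key "channel".
    String sinkChannel(String sink) {
        return property("sinks", sink, "channel");
    }

    private static List<String> splitList(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(value.trim().split("\\s+"));
    }
}
```

The asymmetry between the plural "channels" key for sources and the singular "channel" key for sinks reflects the model above: a source may fan out to several channels, while each sink reads from one.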

Walking Through the Flume Example

Initialization and Configuration Parsing

You can launch this configuration with the startup command given on the official website. Reading bin/flume-ng shows that the entry class is flume-ng-node/src/main/java/org/apache/flume/node/Application.java:

  public static void main(String[] args) {

    try {
      SSLUtil.initGlobalSSLParameters();

      Options options = new Options(); 
 
      Option option = new Option("n", "name", true, "the name of this agent");    // agent name, set on the command line
      option.setRequired(true);                                                   // mark it as required
      options.addOption(option);                                                  // register it with the command-line parser

      option = new Option("f", "conf-file", true,
          "specify a config file (required if -z missing)");                      // path to the configuration file
      option.setRequired(false);
      options.addOption(option);

      option = new Option(null, "no-reload-conf", false,
          "do not reload config file if changed");
      options.addOption(option);                                                  // whether to skip reloading the config when it changes

      // Options for Zookeeper
      option = new Option("z", "zkConnString", true,
          "specify the ZooKeeper connection to use (required if -f missing)");    // ZooKeeper connection string
      option.setRequired(false);
      options.addOption(option);

      option = new Option("p", "zkBasePath", true,
          "specify the base path in ZooKeeper for agent configs");                // base path of the agent configs in ZooKeeper
      option.setRequired(false);
      options.addOption(option);

      option = new Option("h", "help", false, "display help text");               // show help text
      options.addOption(option);

      CommandLineParser parser = new GnuParser();
      CommandLine commandLine = parser.parse(options, args);                      // parse the command-line arguments

      if (commandLine.hasOption('h')) {
        new HelpFormatter().printHelp("flume-ng agent", options, true);
        return;
      }

      String agentName = commandLine.getOptionValue('n');                         // agent name
      boolean reload = !commandLine.hasOption("no-reload-conf");                  // reload automatically after the config file changes

      boolean isZkConfigured = false;
      if (commandLine.hasOption('z') || commandLine.hasOption("zkConnString")) {
        isZkConfigured = true;
      }

      Application application;
      if (isZkConfigured) {
        // get options
        String zkConnectionStr = commandLine.getOptionValue('z');
        String baseZkPath = commandLine.getOptionValue('p');

        if (reload) {
          EventBus eventBus = new EventBus(agentName + "-event-bus");
          List<LifecycleAware> components = Lists.newArrayList();
          PollingZooKeeperConfigurationProvider zookeeperConfigurationProvider =
              new PollingZooKeeperConfigurationProvider(
                  agentName, zkConnectionStr, baseZkPath, eventBus);
          components.add(zookeeperConfigurationProvider);
          application = new Application(components);
          eventBus.register(application);
        } else {
          StaticZooKeeperConfigurationProvider zookeeperConfigurationProvider =
              new StaticZooKeeperConfigurationProvider(
                  agentName, zkConnectionStr, baseZkPath);
          application = new Application();
          application.handleConfigurationEvent(zookeeperConfigurationProvider.getConfiguration());
        }
      } else {
        File configurationFile = new File(commandLine.getOptionValue('f'));         // configuration file

        /*
         * The following is to ensure that by default the agent will fail on
         * startup if the file does not exist.
         */
        if (!configurationFile.exists()) {                                          // check that the config file exists
          // If command line invocation, then need to fail fast
          if (System.getProperty(Constants.SYSPROP_CALLED_FROM_SERVICE) ==
              null) {
            String path = configurationFile.getPath();                              // file path for the error message
            try {
              path = configurationFile.getCanonicalPath();
            } catch (IOException ex) {
              logger.error("Failed to read canonical path for file: " + path,
                  ex);
            }
            throw new ParseException(
                "The specified configuration file does not exist: " + path);
          }
        }
        List<LifecycleAware> components = Lists.newArrayList();                   // lifecycle-managed components

        if (reload) {
          EventBus eventBus = new EventBus(agentName + "-event-bus");             // event bus that carries reloaded configurations
          PollingPropertiesFileConfigurationProvider configurationProvider =
              new PollingPropertiesFileConfigurationProvider(
                  agentName, configurationFile, eventBus, 30);                    // check every 30 seconds whether the file changed
          components.add(configurationProvider);
          application = new Application(components);
          eventBus.register(application);                                         // on a config change, handleConfigurationEvent restarts the components
        } else {
          PropertiesFileConfigurationProvider configurationProvider =
              new PropertiesFileConfigurationProvider(agentName, configurationFile);    // parses the config file once
          application = new Application();
          application.handleConfigurationEvent(configurationProvider.getConfiguration());   // load the configuration and start its components
        }
      }
      application.start();                                                              // start the application and its lifecycle components

      final Application appReference = application;
      Runtime.getRuntime().addShutdownHook(new Thread("agent-shutdown-hook") {          // stop all components when the JVM exits
        @Override
        public void run() {
          appReference.stop();
        }
      });

    } catch (Exception e) {
      logger.error("A fatal error occurred while running. Exception follows.", e);
    }
  }
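
When reload is enabled (the default), the agent uses PollingPropertiesFileConfigurationProvider, which checks the file every 30 seconds and publishes new configurations on the EventBus, where the @Subscribe method Application.handleConfigurationEvent picks them up. The polling idea can be sketched as follows. PollingFileWatcher is a hypothetical class, the Consumer callback stands in for posting to the EventBus, and detecting a change by comparing File.lastModified() is an assumption of this sketch rather than a quote of Flume's code.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch of the "reload" branch, NOT Flume's provider class.
class PollingFileWatcher {
    private final File file;
    private final Consumer<File> onChange; // stands in for publishing a new configuration
    private long lastChange;               // newest modification time already handled

    PollingFileWatcher(File file, Consumer<File> onChange) {
        this.file = file;
        this.onChange = onChange;
        this.lastChange = 0L; // so the first check also triggers the initial load
    }

    // One polling step: notify only if the file is newer than what was last handled.
    boolean checkForChange() {
        long modified = file.lastModified(); // 0 if the file does not exist
        if (modified > lastChange) {
            lastChange = modified;
            onChange.accept(file);
            return true;
        }
        return false;
    }

    // Run checkForChange() every intervalSeconds (the code above passes 30).
    ScheduledExecutorService start(long intervalSeconds) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(this::checkForChange, 0, intervalSeconds, TimeUnit.SECONDS);
        return executor;
    }
}
```

With -no-reload-conf the static PropertiesFileConfigurationProvider is used instead, so the configuration is read exactly once at startup.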

In short, main loads the configuration and then starts the agent. Next, let's look at how PropertiesFileConfigurationProvider loads the configuration, which happens through getConfiguration().

public class PropertiesFileConfigurationProvider extends
    AbstractConfigurationProvider {

  private static final Logger LOGGER = LoggerFactory
      .getLogger(PropertiesFileConfigurationProvider.class);
  private static final String DEFAULT_PROPERTIES_IMPLEMENTATION = "java.util.Properties";

  private final File file;

  public PropertiesFileConfigurationProvider(String agentName, File file) {
    super(agentName);
    this.file = file;
  }

  @Override
  public FlumeConfiguration getFlumeConfiguration() {
    BufferedReader reader = null;
    try {
      reader = new BufferedReader(new FileReader(file));          // open the config file
      String resolverClassName = System.getProperty("propertiesImplementation",
          DEFAULT_PROPERTIES_IMPLEMENTATION);
      Class<? extends Properties> propsclass = Class.forName(resolverClassName)
          .asSubclass(Properties.class);                          // class used to parse the config file
      Properties properties = propsclass.newInstance();           // instantiate it
      properties.load(reader);                                    // load the key/value pairs
      return new FlumeConfiguration(toMap(properties));           // build a FlumeConfiguration from them
    } catch (IOException ex) {
      LOGGER.error("Unable to load file:" + file
          + " (I/O failure) - Exception follows.", ex);
    } catch (ClassNotFoundException e) {
      LOGGER.error("Configuration resolver class not found", e);
    } catch (InstantiationException e) {
      LOGGER.error("Instantiation exception", e);
    } catch (IllegalAccessException e) {
      LOGGER.error("Illegal access exception", e);
    } finally {
      if (reader != null) {
        try {
          reader.close();
        } catch (IOException ex) {
          LOGGER.warn(
              "Unable to close file reader for file: " + file, ex);
        }
      }
    }
    return new FlumeConfiguration(new HashMap<String, String>());     // on error, return an empty configuration
  }
}
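
The propertiesImplementation system property lets a different Properties subclass parse the file. The sketch below shows the same load-by-class-name idea with try-with-resources in place of the explicit finally block. PluggablePropertiesLoader is a hypothetical name, and it returns a plain Map rather than a FlumeConfiguration.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch, NOT Flume code: instantiate a Properties implementation by
// class name, load the reader into it, and fall back to an empty map on any error.
class PluggablePropertiesLoader {
    static Map<String, String> load(Reader reader, String propertiesClassName) {
        try (Reader r = reader) { // the reader is closed even if loading fails
            Properties props = Class.forName(propertiesClassName)
                    .asSubclass(Properties.class)  // ClassCastException if not a Properties
                    .getDeclaredConstructor()
                    .newInstance();
            props.load(r);
            Map<String, String> result = new HashMap<>();
            for (String name : props.stringPropertyNames()) { // same role as toMap()
                result.put(name, props.getProperty(name));
            }
            return result;
        } catch (IOException | ReflectiveOperationException | ClassCastException e) {
            return new HashMap<>(); // mirrors "return an empty configuration on failure"
        }
    }
}
```

A typical call would be load(new FileReader(file), System.getProperty("propertiesImplementation", "java.util.Properties")).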

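Since this class extends AbstractConfigurationProvider, the getConfiguration() call goes to AbstractConfigurationProvider.getConfiguration(), which in turn calls the getFlumeConfiguration() method implemented above: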

  public AbstractConfigurationProvider(String agentName) {
    super();
    this.agentName = agentName;                                 // agent name
    this.sourceFactory = new DefaultSourceFactory();            // source factory
    this.sinkFactory = new DefaultSinkFactory();                // sink factory
    this.channelFactory = new DefaultChannelFactory();          // channel factory

    channelCache = new HashMap<Class<? extends Channel>, Map<String, Channel>>();
  }

  protected abstract FlumeConfiguration getFlumeConfiguration();

  public MaterializedConfiguration getConfiguration() {
    MaterializedConfiguration conf = new SimpleMaterializedConfiguration();         // holds the instantiated components
    FlumeConfiguration fconfig = getFlumeConfiguration();                           // raw configuration read from the config file
    AgentConfiguration agentConf = fconfig.getConfigurationFor(getAgentName());     // configuration for this agent
    if (agentConf != null) {
      Map<String, ChannelComponent> channelComponentMap = Maps.newHashMap();        // channels by name
      Map<String, SourceRunner> sourceRunnerMap = Maps.newHashMap();                // source runners by name
      Map<String, SinkRunner> sinkRunnerMap = Maps.newHashMap();                    // sink runners by name
      try {
        loadChannels(agentConf, channelComponentMap);                               // instantiate the configured channels
        loadSources(agentConf, channelComponentMap, sourceRunnerMap);               // instantiate sources and wire them to their channels
        loadSinks(agentConf, channelComponentMap, sinkRunnerMap);                   // instantiate sinks and wire them to their channels
        Set<String> channelNames = new HashSet<String>(channelComponentMap.keySet());
        for (String channelName : channelNames) {
          ChannelComponent channelComponent = channelComponentMap.get(channelName);   // channels with no connected source/sink are dropped
          if (channelComponent.components.isEmpty()) {
            LOGGER.warn(String.format("Channel %s has no components connected" +
                " and has been removed.", channelName));
            channelComponentMap.remove(channelName);
            Map<String, Channel> nameChannelMap =
                channelCache.get(channelComponent.channel.getClass());
            if (nameChannelMap != null) {
              nameChannelMap.remove(channelName);
            }
          } else {
            LOGGER.info(String.format("Channel %s connected to %s",
                channelName, channelComponent.components.toString()));
            conf.addChannel(channelName, channelComponent.channel);                   // add to the configuration
          }
        }
        for (Map.Entry<String, SourceRunner> entry : sourceRunnerMap.entrySet()) {    // add every source runner to the configuration
          conf.addSourceRunner(entry.getKey(), entry.getValue());
        }
        for (Map.Entry<String, SinkRunner> entry : sinkRunnerMap.entrySet()) {        // add every sink runner to the configuration
          conf.addSinkRunner(entry.getKey(), entry.getValue());
        }
      } catch (InstantiationException ex) {
        LOGGER.error("Failed to instantiate component", ex);
      } finally {
        channelComponentMap.clear();
        sourceRunnerMap.clear();
        sinkRunnerMap.clear();
      }
    } else {
      LOGGER.warn("No configuration found for this host:{}", getAgentName());
    }
    return conf;
  }

  public String getAgentName() {
    return agentName;
  }

  private void loadChannels(AgentConfiguration agentConf,
      Map<String, ChannelComponent> channelComponentMap)
          throws InstantiationException {
    LOGGER.info("Creating channels");

    /*
     * Some channels will be reused across re-configurations. To handle this,
     * we store all the names of current channels, perform the reconfiguration,
     * and then if a channel was not used, we delete our reference to it.
     * This supports the scenario where you enable channel "ch0" then remove it
     * and add it back. Without this, channels like memory channel would cause
     * the first instances data to show up in the seconds.
     */
    ListMultimap<Class<? extends Channel>, String> channelsNotReused =
        ArrayListMultimap.create();
    // assume all channels will not be re-used
    for (Map.Entry<Class<? extends Channel>, Map<String, Channel>> entry :
         channelCache.entrySet()) {                                 
      Class<? extends Channel> channelKlass = entry.getKey();
      Set<String> channelNames = entry.getValue().keySet();
      channelsNotReused.get(channelKlass).addAll(channelNames);
    }

    Set<String> channelNames = agentConf.getChannelSet();
    Map<String, ComponentConfiguration> compMap = agentConf.getChannelConfigMap();   // channel configurations
    /*
     * Components which have a ComponentConfiguration object
     */
    for (String chName : channelNames) {
      ComponentConfiguration comp = compMap.get(chName);                // configuration for this channel
      if (comp != null) {
        Channel channel = getOrCreateChannel(channelsNotReused,
            comp.getComponentName(), comp.getType());                   // reuse a cached instance or create a new one
        try {
          Configurables.configure(channel, comp);                       // apply the configuration to the instance
          channelComponentMap.put(comp.getComponentName(),
              new ChannelComponent(channel));                           // register the channel under its name
          LOGGER.info("Created channel " + chName);
        } catch (Exception e) {
          String msg = String.format("Channel %s has been removed due to an " +
              "error during configuration", chName);
          LOGGER.error(msg, e);
        }
      }
    }
    /*
     * Components which DO NOT have a ComponentConfiguration object
     * and use only Context
     */
    for (String chName : channelNames) {
      Context context = agentConf.getChannelContext().get(chName);
      if (context != null) {
        Channel channel = getOrCreateChannel(channelsNotReused, chName,
            context.getString(BasicConfigurationConstants.CONFIG_TYPE));
        try {
          Configurables.configure(channel, context);
          channelComponentMap.put(chName, new ChannelComponent(channel));
          LOGGER.info("Created channel " + chName);
        } catch (Exception e) {
          String msg = String.format("Channel %s has been removed due to an " +
              "error during configuration", chName);
          LOGGER.error(msg, e);
        }
      }
    }
    /*
     * Any channel which was not re-used, will have it's reference removed
     */
    for (Class<? extends Channel> channelKlass : channelsNotReused.keySet()) {    // drop cached channels that were not reused
      Map<String, Channel> channelMap = channelCache.get(channelKlass);
      if (channelMap != null) {
        for (String channelName : channelsNotReused.get(channelKlass)) {
          if (channelMap.remove(channelName) != null) {
            LOGGER.info("Removed {} of type {}", channelName, channelKlass);
          }
        }
        if (channelMap.isEmpty()) {
          channelCache.remove(channelKlass);
        }
      }
    }
  }

  private Channel getOrCreateChannel(
      ListMultimap<Class<? extends Channel>, String> channelsNotReused,
      String name, String type)
      throws FlumeException {

    Class<? extends Channel> channelClass = channelFactory.getClass(type);      // resolve the channel class for this type via the factory
    /*
     * Channel has requested a new instance on each re-configuration
     */
    if (channelClass.isAnnotationPresent(Disposable.class)) {
      Channel channel = channelFactory.create(name, type);
      channel.setName(name);
      return channel;
    }
    Map<String, Channel> channelMap = channelCache.get(channelClass);     // cached instances of this class
    if (channelMap == null) {
      channelMap = new HashMap<String, Channel>();
      channelCache.put(channelClass, channelMap);
    }
    Channel channel = channelMap.get(name);
    if (channel == null) {
      channel = channelFactory.create(name, type);                        // create a new instance and cache it
      channel.setName(name);
      channelMap.put(name, channel);
    }
    channelsNotReused.get(channelClass).remove(name);
    return channel;
  }

  private void loadSources(AgentConfiguration agentConf,
      Map<String, ChannelComponent> channelComponentMap,
      Map<String, SourceRunner> sourceRunnerMap)
      throws InstantiationException {

    Set<String> sourceNames = agentConf.getSourceSet();                   // configured source names
    Map<String, ComponentConfiguration> compMap =
        agentConf.getSourceConfigMap();
    /*
     * Components which have a ComponentConfiguration object
     */
    for (String sourceName : sourceNames) {
      ComponentConfiguration comp = compMap.get(sourceName);
      if (comp != null) {
        SourceConfiguration config = (SourceConfiguration) comp;

        Source source = sourceFactory.create(comp.getComponentName(),
            comp.getType());                                              // create the source via the factory
        try {
          Configurables.configure(source, config);
          Set<String> channelNames = config.getChannels();                // channels this source writes to
          List<Channel> sourceChannels =
                  getSourceChannels(channelComponentMap, source, channelNames);     // resolve those names to loaded channels
          if (sourceChannels.isEmpty()) {
            String msg = String.format("Source %s is not connected to a " +
                "channel",  sourceName);
            throw new IllegalStateException(msg);
          }
          ChannelSelectorConfiguration selectorConfig =
              config.getSelectorConfiguration();                        // channel selector configuration

          ChannelSelector selector = ChannelSelectorFactory.create(
              sourceChannels, selectorConfig);                          // create the channel selector for this source

          ChannelProcessor channelProcessor = new ChannelProcessor(selector);   // wrap the selector in a ChannelProcessor
          Configurables.configure(channelProcessor, config);                    // configure the ChannelProcessor

          source.setChannelProcessor(channelProcessor);                         // attach the processor to the source
          sourceRunnerMap.put(comp.getComponentName(),
              SourceRunner.forSource(source));                                  // store a runner for the source
          for (Channel channel : sourceChannels) {
            ChannelComponent channelComponent =
                Preconditions.checkNotNull(channelComponentMap.get(channel.getName()),
                                           String.format("Channel %s", channel.getName()));
            channelComponent.components.add(sourceName);
          }
        } catch (Exception e) {
          String msg = String.format("Source %s has been removed due to an " +
              "error during configuration", sourceName);
          LOGGER.error(msg, e);
        }
      }
    }
    /*
     * Components which DO NOT have a ComponentConfiguration object
     * and use only Context
     */
    Map<String, Context> sourceContexts = agentConf.getSourceContext();           // Context-only source configurations
    for (String sourceName : sourceNames) {
      Context context = sourceContexts.get(sourceName);
      if (context != null) {
        Source source =
            sourceFactory.create(sourceName,
                                 context.getString(BasicConfigurationConstants.CONFIG_TYPE));   // create the configured source
        try {
          Configurables.configure(source, context);
          String[] channelNames = context.getString(
              BasicConfigurationConstants.CONFIG_CHANNELS).split("\\s+");
          List<Channel> sourceChannels =
                  getSourceChannels(channelComponentMap, source, Arrays.asList(channelNames));    // resolve the channel list
          if (sourceChannels.isEmpty()) {
            String msg = String.format("Source %s is not connected to a " +
                "channel",  sourceName);
            throw new IllegalStateException(msg);
          }
          Map<String, String> selectorConfig = context.getSubProperties(
              BasicConfigurationConstants.CONFIG_SOURCE_CHANNELSELECTOR_PREFIX);

          ChannelSelector selector = ChannelSelectorFactory.create(
              sourceChannels, selectorConfig);                                      // create a ChannelSelector

          ChannelProcessor channelProcessor = new ChannelProcessor(selector);
          Configurables.configure(channelProcessor, context);                     // configure it, then register the source runner
          source.setChannelProcessor(channelProcessor);
          sourceRunnerMap.put(sourceName,
              SourceRunner.forSource(source));
          for (Channel channel : sourceChannels) {
            ChannelComponent channelComponent =
                Preconditions.checkNotNull(channelComponentMap.get(channel.getName()),
                                           String.format("Channel %s", channel.getName()));
            channelComponent.components.add(sourceName);
          }
        } catch (Exception e) {
          String msg = String.format("Source %s has been removed due to an " +
              "error during configuration", sourceName);
          LOGGER.error(msg, e);
        }
      }
    }
  }

  private List<Channel> getSourceChannels(Map<String, ChannelComponent> channelComponentMap,
                  Source source, Collection<String> channelNames) throws InstantiationException {
    List<Channel> sourceChannels = new ArrayList<Channel>();                
    for (String chName : channelNames) {                                        // iterate over the requested channels
      ChannelComponent channelComponent = channelComponentMap.get(chName);      // look up the loaded channel
      if (channelComponent != null) {
        checkSourceChannelCompatibility(source, channelComponent.channel);
        sourceChannels.add(channelComponent.channel);                           // add it to the list
      }
    }
    return sourceChannels;
  }

  private void checkSourceChannelCompatibility(Source source, Channel channel)
      throws InstantiationException {
    if (source instanceof BatchSizeSupported && channel instanceof TransactionCapacitySupported) {
      long transCap = ((TransactionCapacitySupported) channel).getTransactionCapacity();
      long batchSize = ((BatchSizeSupported) source).getBatchSize();
      if (transCap < batchSize) {
        String msg = String.format(
            "Incompatible source and channel settings defined. " +
                "source's batch size is greater than the channels transaction capacity. " +
                "Source: %s, batch size = %d, channel %s, transaction capacity = %d",
            source.getName(), batchSize,
            channel.getName(), transCap);
        throw new InstantiationException(msg);
      }
    }
  }

  private void checkSinkChannelCompatibility(Sink sink, Channel channel)
      throws InstantiationException {
    if (sink instanceof BatchSizeSupported && channel instanceof TransactionCapacitySupported) {
      long transCap = ((TransactionCapacitySupported) channel).getTransactionCapacity();
      long batchSize = ((BatchSizeSupported) sink).getBatchSize();
      if (transCap < batchSize) {
        String msg = String.format(
            "Incompatible sink and channel settings defined. " +
                "sink's batch size is greater than the channels transaction capacity. " +
                "Sink: %s, batch size = %d, channel %s, transaction capacity = %d",
            sink.getName(), batchSize,
            channel.getName(), transCap);
        throw new InstantiationException(msg);
      }
    }
  }

  private void loadSinks(AgentConfiguration agentConf,
      Map<String, ChannelComponent> channelComponentMap, Map<String, SinkRunner> sinkRunnerMap)
      throws InstantiationException {
    Set<String> sinkNames = agentConf.getSinkSet();                     // configured sink names
    Map<String, ComponentConfiguration> compMap =
        agentConf.getSinkConfigMap();
    Map<String, Sink> sinks = new HashMap<String, Sink>();
    /*
     * Components which have a ComponentConfiguration object
     */
    for (String sinkName : sinkNames) {
      ComponentConfiguration comp = compMap.get(sinkName);              // configuration for this sink
      if (comp != null) {
        SinkConfiguration config = (SinkConfiguration) comp;
        Sink sink = sinkFactory.create(comp.getComponentName(), comp.getType());    // create the sink via the factory
        try {
          Configurables.configure(sink, config);                                    // apply the configuration
          ChannelComponent channelComponent = channelComponentMap.get(config.getChannel());  // the channel this sink reads from
          if (channelComponent == null) {
            String msg = String.format("Sink %s is not connected to a " +
                "channel",  sinkName);
            throw new IllegalStateException(msg);
          }
          checkSinkChannelCompatibility(sink, channelComponent.channel);
          sink.setChannel(channelComponent.channel);                                // attach the channel to the sink
          sinks.put(comp.getComponentName(), sink);                                 // keep the sink
          channelComponent.components.add(sinkName);                                // record the sink as a consumer of the channel
        } catch (Exception e) {
          String msg = String.format("Sink %s has been removed due to an " +
              "error during configuration", sinkName);
          LOGGER.error(msg, e);
        }
      }
    }
    /*
     * Components which DO NOT have a ComponentConfiguration object
     * and use only Context
     */
    Map<String, Context> sinkContexts = agentConf.getSinkContext();                    // Context-only sink configurations
    for (String sinkName : sinkNames) {
      Context context = sinkContexts.get(sinkName);
      if (context != null) {
        Sink sink = sinkFactory.create(sinkName, context.getString(
            BasicConfigurationConstants.CONFIG_TYPE));
        try {
          Configurables.configure(sink, context);
          ChannelComponent channelComponent =
              channelComponentMap.get(
                  context.getString(BasicConfigurationConstants.CONFIG_CHANNEL));
          if (channelComponent == null) {
            String msg = String.format("Sink %s is not connected to a " +
                "channel",  sinkName);
            throw new IllegalStateException(msg);
          }
          checkSinkChannelCompatibility(sink, channelComponent.channel);
          sink.setChannel(channelComponent.channel);
          sinks.put(sinkName, sink);
          channelComponent.components.add(sinkName);
        } catch (Exception e) {
          String msg = String.format("Sink %s has been removed due to an " +
              "error during configuration", sinkName);
          LOGGER.error(msg, e);
        }
      }
    }

    loadSinkGroups(agentConf, sinks, sinkRunnerMap);                    // build sink groups; ungrouped sinks get their own runner
  }

  private void loadSinkGroups(AgentConfiguration agentConf,
      Map<String, Sink> sinks, Map<String, SinkRunner> sinkRunnerMap)
          throws InstantiationException {
    Set<String> sinkGroupNames = agentConf.getSinkgroupSet();
    Map<String, ComponentConfiguration> compMap =
        agentConf.getSinkGroupConfigMap();
    Map<String, String> usedSinks = new HashMap<String, String>();
    for (String groupName: sinkGroupNames) {
      ComponentConfiguration comp = compMap.get(groupName);
      if (comp != null) {
        SinkGroupConfiguration groupConf = (SinkGroupConfiguration) comp;
        List<Sink> groupSinks = new ArrayList<Sink>();
        for (String sink : groupConf.getSinks()) {
          Sink s = sinks.remove(sink);
          if (s == null) {
            String sinkUser = usedSinks.get(sink);
            if (sinkUser != null) {
              throw new InstantiationException(String.format(
                  "Sink %s of group %s already " +
                      "in use by group %s", sink, groupName, sinkUser));
            } else {
              throw new InstantiationException(String.format(
                  "Sink %s of group %s does "
                      + "not exist or is not properly configured", sink,
                      groupName));
            }
          }
          groupSinks.add(s);
          usedSinks.put(sink, groupName);
        }
        try {
          SinkGroup group = new SinkGroup(groupSinks);
          Configurables.configure(group, groupConf);
          sinkRunnerMap.put(comp.getComponentName(),
              new SinkRunner(group.getProcessor()));
        } catch (Exception e) {
          String msg = String.format("SinkGroup %s has been removed due to " +
              "an error during configuration", groupName);
          LOGGER.error(msg, e);
        }
      }
    }
    // add any unassigned sinks to solo collectors
    for (Entry<String, Sink> entry : sinks.entrySet()) {
      if (!usedSinks.containsValue(entry.getKey())) {
        try {
          SinkProcessor pr = new DefaultSinkProcessor();
          List<Sink> sinkMap = new ArrayList<Sink>();
          sinkMap.add(entry.getValue());
          pr.setSinks(sinkMap);
          Configurables.configure(pr, new Context());
          sinkRunnerMap.put(entry.getKey(), new SinkRunner(pr));
        } catch (Exception e) {
          String msg = String.format("SinkGroup %s has been removed due to " +
              "an error during configuration", entry.getKey());
          LOGGER.error(msg, e);
        }
      }
    }
  }
  private static class ChannelComponent {
    final Channel channel;
    final List<String> components;

    ChannelComponent(Channel channel) {
      this.channel = channel;
      components = Lists.newArrayList();
    }
  }

  protected Map<String, String> toMap(Properties properties) {
    Map<String, String> result = Maps.newHashMap();
    Enumeration<?> propertyNames = properties.propertyNames();
    while (propertyNames.hasMoreElements()) {
      String name = (String) propertyNames.nextElement();
      String value = properties.getProperty(name);
      result.put(name, value);
    }
    return result;
  }
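
The channel-reuse logic in loadChannels() and getOrCreateChannel() (assume nothing is reused, mark what is reused, evict the rest) is easier to follow in isolation. This is a simplified, hypothetical sketch: ChannelCache is not a Flume class, it keys the cache by the type string rather than by the Channel class, and it ignores the @Disposable case.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of channel reuse across reconfigurations, NOT Flume code.
class ChannelCache<C> {
    private final Map<String, Map<String, C>> cache = new HashMap<>(); // type -> (name -> channel)
    private final Function<String, C> factory;                         // creates a channel of a type

    ChannelCache(Function<String, C> factory) {
        this.factory = factory;
    }

    // config maps channel name -> channel type for one (re)configuration.
    Map<String, C> reconfigure(Map<String, String> config) {
        // 1. Assume every cached channel will NOT be reused.
        Map<String, Set<String>> notReused = new HashMap<>();
        for (Map.Entry<String, Map<String, C>> e : cache.entrySet()) {
            notReused.put(e.getKey(), new HashSet<>(e.getValue().keySet()));
        }
        // 2. Get or create each configured channel and mark it as reused.
        Map<String, C> active = new HashMap<>();
        for (Map.Entry<String, String> e : config.entrySet()) {
            String name = e.getKey();
            String type = e.getValue();
            Map<String, C> byName = cache.computeIfAbsent(type, t -> new HashMap<>());
            C channel = byName.computeIfAbsent(name, n -> factory.apply(type));
            Set<String> pending = notReused.get(type);
            if (pending != null) {
                pending.remove(name);
            }
            active.put(name, channel);
        }
        // 3. Evict cached channels that were not reused, and empty type entries.
        for (Map.Entry<String, Set<String>> e : notReused.entrySet()) {
            Map<String, C> byName = cache.get(e.getKey());
            if (byName != null) {
                byName.keySet().removeAll(e.getValue());
                if (byName.isEmpty()) {
                    cache.remove(e.getKey());
                }
            }
        }
        return active;
    }

    int cachedCount() {
        int count = 0;
        for (Map<String, C> byName : cache.values()) {
            count += byName.size();
        }
        return count;
    }
}
```

Evicting unused entries covers the scenario described in the source comment: if ch0 is removed and later added back, it gets a fresh instance instead of a memory channel still holding events from the earlier configuration.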

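The main job of this class is to use the configuration file to instantiate and wire up the configured channels, sources, and sinks. In the example configuration the channel type is memory, the source type is netcat, and the sink type is logger; the classes that implement them are: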

memory --> flume-ng-core/src/main/java/org/apache/flume/channel/MemoryChannel.java
netcat --> flume-ng-core/src/main/java/org/apache/flume/source/NetcatSource.java
logger --> flume-ng-core/src/main/java/org/apache/flume/sink/LoggerSink.java
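
How does a short type such as memory become one of these classes? Roughly, Flume's default factories (DefaultChannelFactory, DefaultSourceFactory, DefaultSinkFactory) map built-in aliases to class names and otherwise treat the type as a fully qualified class name, which is how custom components are plugged in. The sketch below shows only that idea; ComponentTypeResolver and its three-entry alias table are hypothetical simplifications, not Flume's actual lookup tables.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, NOT Flume's factory classes: a short alias maps to a
// built-in class name; any other value is treated as a fully qualified class name.
class ComponentTypeResolver {
    private final Map<String, String> aliases = new HashMap<>();

    ComponentTypeResolver() {
        aliases.put("memory", "org.apache.flume.channel.MemoryChannel");
        aliases.put("netcat", "org.apache.flume.source.NetcatSource");
        aliases.put("logger", "org.apache.flume.sink.LoggerSink");
    }

    // Aliases are matched case-insensitively in this sketch.
    String resolveClassName(String type) {
        String builtIn = aliases.get(type.toLowerCase(Locale.ROOT));
        return builtIn != null ? builtIn : type;
    }

    // Load the resolved class; a factory would then instantiate it reflectively.
    Class<?> load(String type) throws ClassNotFoundException {
        return Class.forName(resolveClassName(type));
    }
}
```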

With the configuration loaded, all instantiated sources, channels, and sinks need to be started. This brings us back to Application.handleConfigurationEvent, which runs the loaded components:
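
The overall pattern (stop the old components under a lock, then start the new ones in dependency order: channels, then sinks, then sources) can be reduced to a toy sketch. ToyApplication is hypothetical: it records the order instead of starting real components, it does not wait for channels to reach START as the real code does, and stopping in reverse start order is this sketch's own choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the restart pattern, NOT Flume's Application class.
class ToyApplication {
    private final ReentrantLock lifecycleLock = new ReentrantLock();
    private final List<String> running = new ArrayList<>(); // in start order
    final List<String> log = new ArrayList<>();             // records what happened, in order

    void handleConfiguration(List<String> channels, List<String> sinks, List<String> sources)
            throws InterruptedException {
        lifecycleLock.lockInterruptibly(); // one reconfiguration at a time
        try {
            stopAll();
            startAll(channels, "channel"); // channels must run before anyone uses them
            startAll(sinks, "sink");       // consumers are ready before producers start
            startAll(sources, "source");
        } finally {
            lifecycleLock.unlock();
        }
    }

    private void startAll(List<String> names, String kind) {
        for (String name : names) {
            log.add("start " + kind + " " + name);
            running.add(name);
        }
    }

    // Stop in reverse start order: sources first, channels last.
    private void stopAll() {
        for (int i = running.size() - 1; i >= 0; i--) {
            log.add("stop " + running.get(i));
        }
        running.clear();
    }
}
```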

  @Subscribe
  public void handleConfigurationEvent(MaterializedConfiguration conf) {
    try {
      lifecycleLock.lockInterruptibly();
      stopAllComponents();                  // stop the components that are currently running
      startAllComponents(conf);             // start the components of the new configuration
    } catch (InterruptedException e) {
      logger.info("Interrupted while trying to handle configuration event");
      return;
    } finally {
      // If interrupted while trying to lock, we don't own the lock, so must not attempt to unlock
      if (lifecycleLock.isHeldByCurrentThread()) {
        lifecycleLock.unlock();
      }
    }
  }

  private void startAllComponents(MaterializedConfiguration materializedConfiguration) {
    logger.info("Starting new configuration:{}", materializedConfiguration);

    this.materializedConfiguration = materializedConfiguration;

    for (Entry<String, Channel> entry :
        materializedConfiguration.getChannels().entrySet()) {             // first, every channel instantiated from the configuration
      try {
        logger.info("Starting Channel " + entry.getKey());
        supervisor.supervise(entry.getValue(),
            new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);    // hand it to the supervisor
      } catch (Exception e) {
        logger.error("Error while starting {}", entry.getValue(), e);
      }
    }

    /*
     * Wait for all channels to start.
     */
    for (Channel ch : materializedConfiguration.getChannels().values()) {     // wait until every channel is in START (or in an error state)
      while (ch.getLifecycleState() != LifecycleState.START
          && !supervisor.isComponentInErrorState(ch)) {
        try {
          logger.info("Waiting for channel: " + ch.getName() +
              " to start. Sleeping for 500 ms");
          Thread.sleep(500);
        } catch (InterruptedException e) {
          logger.error("Interrupted while waiting for channel to start.", e);
          Throwables.propagate(e);
        }
      }
    }

    for (Entry<String, SinkRunner> entry : materializedConfiguration.getSinkRunners().entrySet()) { // then every SinkRunner
      try {
        logger.info("Starting Sink " + entry.getKey());
        supervisor.supervise(entry.getValue(),
            new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);        // start the SinkRunner
      } catch (Exception e) {
        logger.error("Error while starting {}", entry.getValue(), e);
      }
    }

    for (Entry<String, SourceRunner> entry :
         materializedConfiguration.getSourceRunners().entrySet()) {      // finally every SourceRunner, which starts its source
      try {
        logger.info("Starting Source " + entry.getKey());
        supervisor.supervise(entry.getValue(),
            new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
      } catch (Exception e) {
        logger.error("Error while starting {}", entry.getValue(), e);
      }
    }

    this.loadMonitoring();              // start monitoring
  }

Once initialization is complete, the channels are started first, then the sinks, and finally the sources; each component is run through supervisor.supervise:
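The ordering can be illustrated with a stripped-down sketch. It assumes a hypothetical Startable interface and calls start() directly, whereas Flume hands each component to the supervisor and polls the channels every 500 ms; only the order (channels, wait for channels, sinks, sources) is what the sketch is meant to show.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the start order used by Application.startAllComponents. */
public class StartOrder {
  public interface Startable {
    String name();
    void start();
    boolean isStarted();
  }

  /** Demo component that counts as started as soon as start() is called. */
  public static Startable named(String name) {
    return new Startable() {
      private volatile boolean started;
      public String name() { return name; }
      public void start() { started = true; }
      public boolean isStarted() { return started; }
    };
  }

  /** Channels first, wait for them, then sinks (consumers), then sources (producers). */
  public static List<String> startAll(List<Startable> channels, List<Startable> sinks,
                                      List<Startable> sources) {
    List<String> order = new ArrayList<>();
    for (Startable ch : channels) {
      ch.start();
      order.add(ch.name());
    }
    for (Startable ch : channels) {
      while (!ch.isStarted()) {
        Thread.yield(); // Flume sleeps 500 ms between checks instead of spinning
      }
    }
    for (Startable sink : sinks) {
      sink.start();
      order.add(sink.name());
    }
    for (Startable source : sources) {
      source.start();
      order.add(source.name());
    }
    return order;
  }
}
```

Starting the consumers (sinks) before the producers (sources) presumably means something is already draining each channel by the time events begin to arrive.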

  public synchronized void supervise(LifecycleAware lifecycleAware,
      SupervisorPolicy policy, LifecycleState desiredState) {
    if (this.monitorService.isShutdown()
        || this.monitorService.isTerminated()
        || this.monitorService.isTerminating()) {
      throw new FlumeException("Supervise called on " + lifecycleAware + " " +
          "after shutdown has been initiated. " + lifecycleAware + " will not" +
          " be started");
    }               // refuse if the monitor executor is shutting down or terminated

    Preconditions.checkState(!supervisedProcesses.containsKey(lifecycleAware),
        "Refusing to supervise " + lifecycleAware + " more than once");

    if (logger.isDebugEnabled()) {
      logger.debug("Supervising service:{} policy:{} desiredState:{}",
          new Object[] { lifecycleAware, policy, desiredState });
    }

    Supervisoree process = new Supervisoree();      // bookkeeping record for this component
    process.status = new Status();                  // with a fresh status

    process.policy = policy;
    process.status.desiredState = desiredState;
    process.status.error = false;

    MonitorRunnable monitorRunnable = new MonitorRunnable();      // task that checks and drives the component
    monitorRunnable.lifecycleAware = lifecycleAware;              // the supervised component
    monitorRunnable.supervisoree = process;
    monitorRunnable.monitorService = monitorService;              // the executor it runs on

    supervisedProcesses.put(lifecycleAware, process);             // remember the component

    ScheduledFuture<?> future = monitorService.scheduleWithFixedDelay(
        monitorRunnable, 0, 3, TimeUnit.SECONDS);                 // run now, then 3 s after each run completes
    monitorFutures.put(lifecycleAware, future);
  }


    @Override
    public void run() {
      logger.debug("checking process:{} supervisoree:{}", lifecycleAware,
          supervisoree);

      long now = System.currentTimeMillis();              // current time

      try {
        if (supervisoree.status.firstSeen == null) {
          logger.debug("first time seeing {}", lifecycleAware);

          supervisoree.status.firstSeen = now;
        }

        supervisoree.status.lastSeen = now;
        synchronized (lifecycleAware) {
          if (supervisoree.status.discard) {
            // Unsupervise has already been called on this.
            logger.info("Component has already been stopped {}", lifecycleAware);
            return;
          } else if (supervisoree.status.error) {
            logger.info("Component {} is in error state, and Flume will not"
                + "attempt to change its state", lifecycleAware);
            return;
          }

          supervisoree.status.lastSeenState = lifecycleAware.getLifecycleState();     // record the state observed now

          if (!lifecycleAware.getLifecycleState().equals(
              supervisoree.status.desiredState)) {

            logger.debug("Want to transition {} from {} to {} (failures:{})",
                new Object[] { lifecycleAware, supervisoree.status.lastSeenState,
                    supervisoree.status.desiredState,
                    supervisoree.status.failures });

            switch (supervisoree.status.desiredState) {                 // act on the desired state
              case START:
                try {
                  lifecycleAware.start();                               // desired state START: start the component
                } catch (Throwable e) {
                  logger.error("Unable to start " + lifecycleAware
                      + " - Exception follows.", e);
                  if (e instanceof Error) {
                    // This component can never recover, shut it down.
                    supervisoree.status.desiredState = LifecycleState.STOP;     // unrecoverable Error: switch to STOP
                    try {
                      lifecycleAware.stop();                                    // and stop the component
                      logger.warn("Component {} stopped, since it could not be"
                          + "successfully started due to missing dependencies",
                          lifecycleAware);
                    } catch (Throwable e1) {
                      logger.error("Unsuccessful attempt to "
                          + "shutdown component: {} due to missing dependencies."
                          + " Please shutdown the agent"
                          + "or disable this component, or the agent will be"
                          + "in an undefined state.", e1);
                      supervisoree.status.error = true;
                      if (e1 instanceof Error) {
                        throw (Error) e1;
                      }
                      // Set the state to stop, so that the conf poller can
                      // proceed.
                    }
                  }
                  supervisoree.status.failures++;
                }
                break;
              case STOP:
                try {
                  lifecycleAware.stop();                                // desired state STOP: stop the component
                } catch (Throwable e) {
                  logger.error("Unable to stop " + lifecycleAware
                      + " - Exception follows.", e);
                  if (e instanceof Error) {
                    throw (Error) e;
                  }
                  supervisoree.status.failures++;
                }
                break;
              default:
                logger.warn("I refuse to acknowledge {} as a desired state",
                    supervisoree.status.desiredState);
            }

            if (!supervisoree.policy.isValid(lifecycleAware, supervisoree.status)) {
              logger.error(
                  "Policy {} of {} has been violated - supervisor should exit!",
                  supervisoree.policy, lifecycleAware);
            }
          }
        }
      } catch (Throwable t) {
        logger.error("Unexpected error", t);
      }
      logger.debug("Status check complete");
    }
  }

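In short, every component is driven by a thread pool: each one is wrapped in a MonitorRunnable that the scheduled executor runs every 3 seconds, and as long as the component is not yet in its desired state, the runnable calls the component's start method.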
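The supervision loop is essentially desired-state reconciliation on a ScheduledExecutorService. The sketch below is a hypothetical simplification (MiniSupervisor, State and Component are not Flume types): it keeps calling start() until the component reports the desired state. Like MonitorRunnable.run, it catches everything inside the task, because scheduleWithFixedDelay suppresses all later executions once a run throws.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Minimal desired-state supervisor in the spirit of LifecycleSupervisor (names are hypothetical). */
public class MiniSupervisor {
  public enum State { IDLE, START, STOP }

  public interface Component {
    void start();
    void stop();
    State getState();
  }

  private final ScheduledExecutorService monitorService = Executors.newScheduledThreadPool(2);

  /** Re-checks the component every periodMillis and calls start()/stop() until it converges. */
  public ScheduledFuture<?> supervise(Component c, State desired, long periodMillis) {
    Runnable check = () -> {
      try {
        synchronized (c) {
          if (c.getState() == desired) {
            return; // already converged, nothing to do on this tick
          }
          if (desired == State.START) {
            c.start();
          } else if (desired == State.STOP) {
            c.stop();
          }
        }
      } catch (Throwable t) {
        // Must not escape: a throwing task would cancel every future run of this schedule.
        System.err.println("Transition failed, retrying on next tick: " + t);
      }
    };
    return monitorService.scheduleWithFixedDelay(check, 0, periodMillis, TimeUnit.MILLISECONDS);
  }

  public void shutdown() {
    monitorService.shutdownNow();
  }
}
```

A component whose first start() throws is simply retried on the next tick, which is roughly what AlwaysRestartPolicy amounts to.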

NetcatSource startup flow
  @Override
  public void start() {

    logger.info("Source starting");

    counterGroup.incrementAndGet("open.attempts");

    try {
      SocketAddress bindPoint = new InetSocketAddress(hostName, port);      // address and port to bind

      serverSocket = ServerSocketChannel.open();                            // open the server socket channel
      serverSocket.socket().setReuseAddress(true);                          // allow the address to be reused
      serverSocket.socket().bind(bindPoint);

      logger.info("Created serverSocket:{}", serverSocket);
    } catch (IOException e) {
      counterGroup.incrementAndGet("open.errors");
      logger.error("Unable to bind to socket. Exception follows.", e);
      stop();
      throw new FlumeException(e);
    }

    handlerService = Executors.newCachedThreadPool(new ThreadFactoryBuilder()
        .setNameFormat("netcat-handler-%d").build());                         // pool that handles accepted connections

    AcceptHandler acceptRunnable = new AcceptHandler(maxLineLength);          // the runnable that accepts connections
    acceptThreadShouldStop.set(false);                                        // configure it
    acceptRunnable.counterGroup = counterGroup;
    acceptRunnable.handlerService = handlerService;
    acceptRunnable.shouldStop = acceptThreadShouldStop;
    acceptRunnable.ackEveryEvent = ackEveryEvent;
    acceptRunnable.source = this;
    acceptRunnable.serverSocket = serverSocket;
    acceptRunnable.sourceEncoding = sourceEncoding;

    acceptThread = new Thread(acceptRunnable);                                // dedicated accept thread

    acceptThread.start();                                                     // start accepting connections

    logger.debug("Source started");
    super.start();                                                            // the parent class sets the state to START
  }

So start() essentially launches one thread that accepts connections and hands each accepted connection to a thread pool.
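The same accept-thread-plus-handler-pool design can be sketched with plain JDK classes. MiniNetcatServer is hypothetical: it binds a ServerSocketChannel, accepts on a dedicated thread, and gives each connection to a cached thread pool that treats every newline-terminated line as one event. Each event goes into an in-memory queue that stands in for the channel, and the server replies OK in the way NetcatSource does when acknowledging every event. Line-length limits, counters and charset configuration are left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.channels.Channels;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of NetcatSource's accept thread + handler pool design. */
public class MiniNetcatServer {
  private final BlockingQueue<String> received = new LinkedBlockingQueue<>(); // stands in for the channel
  private final AtomicBoolean shouldStop = new AtomicBoolean(false);
  private ServerSocketChannel serverSocket;
  private ExecutorService handlerService;
  private Thread acceptThread;

  /** Binds, starts the accept thread and returns the bound port (pass 0 for an ephemeral port). */
  public int start(String host, int port) {
    try {
      serverSocket = ServerSocketChannel.open();
      serverSocket.socket().setReuseAddress(true);
      serverSocket.socket().bind(new InetSocketAddress(host, port));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    handlerService = Executors.newCachedThreadPool();
    acceptThread = new Thread(this::acceptLoop, "accept-thread");
    acceptThread.start();
    return serverSocket.socket().getLocalPort();
  }

  private void acceptLoop() {
    while (!shouldStop.get()) {
      try {
        SocketChannel client = serverSocket.accept();   // blocks until a client connects
        handlerService.submit(() -> handle(client));     // one pooled task per connection
      } catch (IOException e) {
        if (shouldStop.get() || !serverSocket.isOpen()) {
          return;                                        // stop() closed the server socket
        }
      }
    }
  }

  private void handle(SocketChannel client) {
    String enc = StandardCharsets.UTF_8.name();
    try (BufferedReader in = new BufferedReader(Channels.newReader(client, enc));
         Writer out = Channels.newWriter(client, enc)) {
      String line;
      while ((line = in.readLine()) != null) {           // one event per newline-terminated line
        received.add(line);
        out.write("OK\n");                               // acknowledge every event
        out.flush();
      }
    } catch (IOException ignored) {
      // broken connection; the real source counts this as "sessions.broken"
    }
  }

  /** Returns the next received event, or null after the timeout. */
  public String poll(long timeoutMillis) {
    try {
      return received.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }

  public void stop() {
    shouldStop.set(true);
    try {
      serverSocket.close();                              // unblocks accept()
    } catch (IOException ignored) {
    }
    try {
      acceptThread.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    handlerService.shutdownNow();
  }

  /** Demo client: sends one line and returns the server's reply line. */
  public static String sendLine(int port, String line) {
    try (Socket s = new Socket("127.0.0.1", port)) {
      OutputStream os = s.getOutputStream();
      os.write((line + "\n").getBytes(StandardCharsets.UTF_8));
      os.flush();
      return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8)).readLine();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Binding to port 0 lets the operating system choose a free port, which start() returns; closing the server socket in stop() is what makes the blocked accept() call return.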

  private static class AcceptHandler implements Runnable {

    private ServerSocketChannel serverSocket;
    private CounterGroup counterGroup;
    private ExecutorService handlerService;
    private EventDrivenSource source;
    private AtomicBoolean shouldStop;
    private boolean ackEveryEvent;
    private String sourceEncoding;

    private final int maxLineLength;

    public AcceptHandler(int maxLineLength) {
      this.maxLineLength = maxLineLength;
    }

    @Override
    public void run() {
      logger.debug("Starting accept handler");

      while (!shouldStop.get()) {                                         // loop until asked to stop
        try {
          SocketChannel socketChannel = serverSocket.accept();            // block until a client connects

          NetcatSocketHandler request = new NetcatSocketHandler(maxLineLength);

          request.socketChannel = socketChannel;
          request.counterGroup = counterGroup;
          request.source = source;
          request.ackEveryEvent = ackEveryEvent;
          request.sourceEncoding = sourceEncoding;

          handlerService.submit(request);                                 // hand the connection to the pool

          counterGroup.incrementAndGet("accept.succeeded");               // count the accepted connection
        } catch (ClosedByInterruptException e) {
          // Parent is canceling us.
        } catch (IOException e) {
          logger.error("Unable to accept connection. Exception follows.", e);
          counterGroup.incrementAndGet("accept.failed");
        }
      }

      logger.debug("Accept handler exiting");
    }
  }

Data received on the port is processed by NetcatSocketHandler tasks running in that pool:

    @Override
    public void run() {
      logger.debug("Starting connection handler");
      Event event = null;

      try {
        Reader reader = Channels.newReader(socketChannel, sourceEncoding);          // reader over the socket
        Writer writer = Channels.newWriter(socketChannel, sourceEncoding);          // writer over the socket
        CharBuffer buffer = CharBuffer.allocate(maxLineLength);                     // holds at most one line
        buffer.flip(); // flip() so fill() sees buffer as initially empty

        while (true) {
          // this method blocks until new data is available in the socket
          int charsRead = fill(buffer, reader);
          logger.debug("Chars read = {}", charsRead);

          // attempt to process all the events in the buffer
          int eventsProcessed = processEvents(buffer, writer);                      // turn complete lines into events
          logger.debug("Events processed = {}", eventsProcessed);

          if (charsRead == -1) {                                                    // -1 means EOF: the client closed the connection
            // if we received EOF before last event processing attempt, then we
            // have done everything we can
            break;
          } else if (charsRead == 0 && eventsProcessed == 0) {
            if (buffer.remaining() == buffer.capacity()) {                          // buffer is full of unread data
              // If we get here it means:
              // 1. Last time we called fill(), no new chars were buffered
              // 2. After that, we failed to process any events => no newlines
              // 3. The unread data in the buffer == the size of the buffer
              // Therefore, we are stuck because the client sent a line longer
              // than the size of the buffer. Response: Drop the connection.
              logger.warn("Client sent event exceeding the maximum length");
              counterGroup.incrementAndGet("events.failed");
              writer.write("FAILED: Event exceeds the maximum length (" +
                  buffer.capacity() + " chars, including newline)\n");
              writer.flush();
              break;
            }
          }
        }

        socketChannel.close();                                                      // close the connection

        counterGroup.incrementAndGet("sessions.completed");                         // count the completed session
      } catch (IOException e) {
        counterGroup.incrementAndGet("sessions.broken");
        try {
          socketChannel.close();
        } catch (IOException ex) {
          logger.error("Unable to close socket channel. Exception follows.", ex);
        }
      }

      logger.debug("Connection handler exiting");
    }

    /**
     * <p>Consume some number of events from the buffer into the system.</p>
     *
     * Invariants (pre- and post-conditions): <br/>
     *   buffer should have position @ beginning of unprocessed data. <br/>
     *   buffer should have limit @ end of unprocessed data. <br/>
     *
     * @param buffer The buffer containing data to process
     * @param writer The channel back to the client
     * @return number of events successfully processed
     * @throws IOException
     */
    private int processEvents(CharBuffer buffer, Writer writer)
        throws IOException {

      int numProcessed = 0;

      boolean foundNewLine = true;
      while (foundNewLine) { 
        foundNewLine = false;

        int limit = buffer.limit();                                 // end of unprocessed data
        for (int pos = buffer.position(); pos < limit; pos++) {     // scan from the start of unprocessed data
          if (buffer.get(pos) == '\n') {

            // parse event body bytes out of CharBuffer
            buffer.limit(pos); // temporary limit
            ByteBuffer bytes = Charsets.UTF_8.encode(buffer);
            buffer.limit(limit); // restore limit

            // build event object
            byte[] body = new byte[bytes.remaining()];
            bytes.get(body);
            Event event = EventBuilder.withBody(body);

            // process event
            ChannelException ex = null;
            try {
              source.getChannelProcessor().processEvent(event);     // hand the event to the ChannelProcessor
            } catch (ChannelException chEx) {
              ex = chEx;
            }

            if (ex == null) {
              counterGroup.incrementAndGet("events.processed");       // count the successful event
              numProcessed++;
              if (true == ackEveryEvent) {
                writer.write("OK\n");
              }
            } else {
              counterGroup.incrementAndGet("events.failed");          // count the failed event
              logger.warn("Error processing event. Exception follows.", ex);
              writer.write("FAILED: " + ex.getMessage() + "\n");
            }
            writer.flush();                                           // flush the reply

            // advance position after data is consumed
            buffer.position(pos + 1); // skip the newline and continue with the next line
            foundNewLine = true;

            break;
          }
        }

      }

      return numProcessed;
    }
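The position/limit bookkeeping in run() and processEvents() is easy to get wrong, so here it is in isolation. LineFramer is a hypothetical helper: fill() stands in for reading from the socket (compact, append, flip), drainLines() uses the same temporary-limit trick as processEvents() but returns Strings instead of building UTF-8 encoded events, and lineTooLong() is the "buffer full, no newline" condition that makes the handler drop the connection.

```java
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper reproducing the CharBuffer bookkeeping of NetcatSocketHandler. */
public class LineFramer {
  private final CharBuffer buffer;

  public LineFramer(int maxLineLength) {
    buffer = CharBuffer.allocate(maxLineLength);
    buffer.flip(); // empty "read mode": position == limit == 0, as in the handler
  }

  /** Appends as many chars as fit and returns how many were taken (stands in for a socket read). */
  public int fill(CharSequence incoming) {
    buffer.compact();                                  // keep unread chars, switch to write mode
    int n = Math.min(incoming.length(), buffer.remaining());
    buffer.append(incoming, 0, n);
    buffer.flip();                                     // back to read mode: [position, limit) is unprocessed
    return n;
  }

  /** Extracts every complete '\n'-terminated line; a trailing partial line stays in the buffer. */
  public List<String> drainLines() {
    List<String> lines = new ArrayList<>();
    boolean foundNewLine = true;
    while (foundNewLine) {
      foundNewLine = false;
      int limit = buffer.limit();
      for (int pos = buffer.position(); pos < limit; pos++) {
        if (buffer.get(pos) == '\n') {
          buffer.limit(pos);                           // temporary limit hides the newline
          lines.add(buffer.toString());                // toString() copies [position, limit)
          buffer.limit(limit);                         // restore the real limit
          buffer.position(pos + 1);                    // skip past the newline
          foundNewLine = true;
          break;
        }
      }
    }
    return lines;
  }

  /** Meaningful after drainLines(): a full buffer with no newline means the line is too long. */
  public boolean lineTooLong() {
    return buffer.remaining() == buffer.capacity();
  }
}
```

With a 4-char buffer, filling "abcdef" stores only "abcd", no line can be extracted and lineTooLong() returns true; that is the case where the handler replies FAILED and closes the connection.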

The handler reads the data arriving on the port and passes each event to the source's ChannelProcessor, whose processEvent method looks like this:

  public void processEvent(Event event) {

    event = interceptorChain.intercept(event);          // run the interceptor chain, which may drop the event
    if (event == null) {
      return;
    }

    // Process required channels
    List<Channel> requiredChannels = selector.getRequiredChannels(event);   // channels the event must reach
    for (Channel reqChannel : requiredChannels) {                           // for each of them
      Transaction tx = reqChannel.getTransaction();                         // get the channel's transaction
      Preconditions.checkNotNull(tx, "Transaction object must not be null");
      try {
        tx.begin();                                                         // begin the transaction

        reqChannel.put(event);                                              // stage the event in the channel

        tx.commit();                                                        // commit
      } catch (Throwable t) {
        tx.rollback();                                                      // roll back on any failure
        if (t instanceof Error) {
          LOG.error("Error while writing to required channel: " + reqChannel, t);
          throw (Error) t;
        } else if (t instanceof ChannelException) {
          throw (ChannelException) t;
        } else {
          throw new ChannelException("Unable to put event on required " +
              "channel: " + reqChannel, t);
        }
      } finally {
        if (tx != null) {
          tx.close();
        }
      }
    }

    // Process optional channels
    List<Channel> optionalChannels = selector.getOptionalChannels(event);     // channels where failures are only logged
    for (Channel optChannel : optionalChannels) {
      Transaction tx = null;
      try {
        tx = optChannel.getTransaction();                                     // get the transaction
        tx.begin();

        optChannel.put(event);                                                // stage the event

        tx.commit();                                                          // commit
      } catch (Throwable t) {
        tx.rollback();                                                        // roll back on failure
        LOG.error("Unable to put event on optional channel: " + optChannel, t);
        if (t instanceof Error) {
          throw (Error) t;
        }
      } finally {
        if (tx != null) {
          tx.close();
        }
      }
    }
  }

The default selector is ReplicatingChannelSelector, which places every channel the source is bound to in the requiredChannels list. At this point the Source's data has been delivered to the channel.
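The control flow of processEvent can be modeled in a few lines. In this hypothetical sketch (MiniChannelProcessor, Chan, Tx and ListChan are illustrative, not Flume interfaces) every channel receives the event, as with a replicating selector. A failure on a required channel rolls back that channel's transaction and propagates, while a failure on an optional channel is only logged.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of ChannelProcessor.processEvent with a replicating selector. */
public class MiniChannelProcessor {
  public interface Tx { void begin(); void put(String event); void commit(); void rollback(); void close(); }
  public interface Chan { String name(); Tx getTransaction(); }

  private final List<Chan> required = new ArrayList<>();
  private final List<Chan> optional = new ArrayList<>();

  /** Every channel gets the event; channels named in optionalNames are treated as optional. */
  public MiniChannelProcessor(List<? extends Chan> all, List<String> optionalNames) {
    for (Chan c : all) {
      if (optionalNames.contains(c.name())) {
        optional.add(c);
      } else {
        required.add(c);
      }
    }
  }

  public void processEvent(String event) {
    for (Chan c : required) {                 // one transaction per channel
      Tx tx = c.getTransaction();
      try {
        tx.begin();
        tx.put(event);
        tx.commit();
      } catch (RuntimeException e) {
        tx.rollback();
        throw new IllegalStateException("Unable to put event on required channel " + c.name(), e);
      } finally {
        tx.close();
      }
    }
    for (Chan c : optional) {                 // failures here are logged, never propagated
      Tx tx = null;
      try {
        tx = c.getTransaction();
        tx.begin();
        tx.put(event);
        tx.commit();
      } catch (RuntimeException e) {
        if (tx != null) tx.rollback();
        System.err.println("Unable to put event on optional channel " + c.name() + ": " + e.getMessage());
      } finally {
        if (tx != null) tx.close();
      }
    }
  }

  /** Demo channel: stages puts, publishes them on commit, and can be told to fail every put. */
  public static class ListChan implements Chan {
    private final String name;
    private final boolean failPuts;
    private final List<String> committed = new ArrayList<>();

    public ListChan(String name, boolean failPuts) { this.name = name; this.failPuts = failPuts; }
    public String name() { return name; }
    public List<String> committed() { return committed; }

    public Tx getTransaction() {
      List<String> staged = new ArrayList<>();
      return new Tx() {
        public void begin() { }
        public void put(String e) {
          if (failPuts) throw new RuntimeException(name + " rejects puts");
          staged.add(e);
        }
        public void commit() { committed.addAll(staged); }
        public void rollback() { staged.clear(); }
        public void close() { }
      };
    }
  }
}
```

Because each channel has its own transaction, delivery across channels is not atomic: if the second required channel fails, the event has already been committed to the first one.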

MemoryChannel execution flow
  @Override
  public synchronized void start() {
    channelCounter.start();                                   // start the counter
    channelCounter.setChannelSize(queue.size());              // record the current queue size
    channelCounter.setChannelCapacity(Long.valueOf(
            queue.size() + queue.remainingCapacity()));       // record the capacity (size + free slots)
    super.start();                                            // set the state to START
  }

At startup the channel only records its queue metrics. When the Source commits data through a transaction, the following code runs:

public abstract class BasicTransactionSemantics implements Transaction {

  private State state;
  private long initialThreadId;

  protected void doBegin() throws InterruptedException {}
  protected abstract void doPut(Event event) throws InterruptedException;
  protected abstract Event doTake() throws InterruptedException;
  protected abstract void doCommit() throws InterruptedException;
  protected abstract void doRollback() throws InterruptedException;
  protected void doClose() {}

  protected BasicTransactionSemantics() {
    state = State.NEW;
    initialThreadId = Thread.currentThread().getId();
  }

  /**
   * <p>
   * The method to which {@link BasicChannelSemantics} delegates calls
   * to <code>put</code>.
   * </p>
   */
  protected void put(Event event) {
    Preconditions.checkState(Thread.currentThread().getId() == initialThreadId,
        "put() called from different thread than getTransaction()!");
    Preconditions.checkState(state.equals(State.OPEN),
        "put() called when transaction is %s!", state);
    Preconditions.checkArgument(event != null,
        "put() called with null event!");

    try {
      doPut(event);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new ChannelException(e.toString(), e);
    }
  }

  /**
   * <p>
   * The method to which {@link BasicChannelSemantics} delegates calls
   * to <code>take</code>.
   * </p>
   */
  protected Event take() {
    Preconditions.checkState(Thread.currentThread().getId() == initialThreadId,
        "take() called from different thread than getTransaction()!");
    Preconditions.checkState(state.equals(State.OPEN),
        "take() called when transaction is %s!", state);

    try {
      return doTake();                                  // delegate to doTake
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }

  /**
   * @return the current state of the transaction
   */
  protected State getState() {
    return state;
  }

  @Override
  public void begin() {
    Preconditions.checkState(Thread.currentThread().getId() == initialThreadId,
        "begin() called from different thread than getTransaction()!");
    Preconditions.checkState(state.equals(State.NEW),
        "begin() called when transaction is " + state + "!");

    try {
      doBegin();                                          // doBegin, then mark the transaction OPEN
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new ChannelException(e.toString(), e);
    }
    state = State.OPEN;
  }

  @Override
  public void commit() {
    Preconditions.checkState(Thread.currentThread().getId() == initialThreadId,
        "commit() called from different thread than getTransaction()!");
    Preconditions.checkState(state.equals(State.OPEN),
        "commit() called when transaction is %s!", state);

    try {
      doCommit();                                         // doCommit, then mark the transaction COMPLETED
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new ChannelException(e.toString(), e);
    }
    state = State.COMPLETED;
  }

  @Override
  public void rollback() {
    Preconditions.checkState(Thread.currentThread().getId() == initialThreadId,
        "rollback() called from different thread than getTransaction()!");
    Preconditions.checkState(state.equals(State.OPEN),
        "rollback() called when transaction is %s!", state);

    state = State.COMPLETED;
    try {
      doRollback();                                       // state is already COMPLETED; now undo the work
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new ChannelException(e.toString(), e);
    }
  }

  @Override
  public void close() {
    Preconditions.checkState(Thread.currentThread().getId() == initialThreadId,
        "close() called from different thread than getTransaction()!");
    Preconditions.checkState(
            state.equals(State.NEW) || state.equals(State.COMPLETED),
            "close() called when transaction is %s"
            + " - you must either commit or rollback first", state);

    state = State.CLOSED;                                   // mark the transaction CLOSED
    doClose();
  }

  @Override
  public String toString() {
    StringBuilder builder = new StringBuilder();
    builder.append("BasicTransactionSemantics: {");
    builder.append(" state:").append(state);
    builder.append(" initialThreadId:").append(initialThreadId);
    builder.append(" }");
    return builder.toString();
  }

  /**
   * <p>
   * The state of the {@link Transaction} to which it belongs.
   * </p>
   * <dl>
   * <dt>NEW</dt>
   * <dd>A newly created transaction that has not yet begun.</dd>
   * <dt>OPEN</dt>
   * <dd>A transaction that is open. It is permissible to commit or rollback.
   * </dd>
   * <dt>COMPLETED</dt>
   * <dd>This transaction has been committed or rolled back. It is illegal to
   * perform any further operations beyond closing it.</dd>
   * <dt>CLOSED</dt>
   * <dd>A closed transaction. No further operations are permitted.</dd>
   * </dl>
   */
  protected static enum State {
    NEW, OPEN, COMPLETED, CLOSED
  }
}
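This state machine, together with the putList/takeList staging that MemoryTransaction adds, can be condensed into a toy channel. ToyMemoryChannel is hypothetical and deliberately small. Puts stay private until commit(). Takes leave the shared queue immediately but keep their slot reserved until commit, so rollback() can always push them back to the front in their original order. Calling a method in the wrong state throws IllegalStateException, much like the Preconditions checks. Flume's semaphores, byte accounting and thread-id checks are omitted.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy (hypothetical) transactional in-memory channel with staged puts and takes. */
public class ToyMemoryChannel {
  private enum State { NEW, OPEN, COMPLETED, CLOSED }

  private final Deque<String> queue = new ArrayDeque<>();
  private final int capacity;
  private int inFlightTakes; // slots still reserved by uncommitted takes

  public ToyMemoryChannel(int capacity) { this.capacity = capacity; }

  public Tx newTransaction() { return new Tx(); }

  public synchronized int size() { return queue.size(); }

  public class Tx {
    private State state = State.NEW;
    private final Deque<String> putList = new ArrayDeque<>();
    private final Deque<String> takeList = new ArrayDeque<>();

    private void expect(State s, String op) {
      if (state != s) throw new IllegalStateException(op + "() called when transaction is " + state);
    }

    public void begin() { expect(State.NEW, "begin"); state = State.OPEN; }

    /** Staged only: invisible to other transactions until commit(). */
    public void put(String event) { expect(State.OPEN, "put"); putList.addLast(event); }

    /** Removes the head of the shared queue but remembers it for rollback(). */
    public String take() {
      expect(State.OPEN, "take");
      synchronized (ToyMemoryChannel.this) {
        String e = queue.pollFirst();
        if (e != null) {
          takeList.addLast(e);
          inFlightTakes++;
        }
        return e;
      }
    }

    public void commit() {
      expect(State.OPEN, "commit");
      synchronized (ToyMemoryChannel.this) {
        // Committed takes free their reserved slots; the staged puts must fit in what is left.
        if (queue.size() + inFlightTakes - takeList.size() + putList.size() > capacity) {
          throw new IllegalStateException("channel full"); // state stays OPEN, caller should roll back
        }
        inFlightTakes -= takeList.size();
        queue.addAll(putList);                              // appended in put order
      }
      putList.clear();
      takeList.clear();
      state = State.COMPLETED;
    }

    public void rollback() {
      expect(State.OPEN, "rollback");
      state = State.COMPLETED;
      synchronized (ToyMemoryChannel.this) {
        inFlightTakes -= takeList.size();
        while (!takeList.isEmpty()) {
          queue.addFirst(takeList.removeLast());            // restores the original order
        }
      }
      putList.clear();
    }

    public void close() {
      if (state != State.NEW && state != State.COMPLETED) {
        throw new IllegalStateException("close() called when transaction is " + state
            + " - you must either commit or rollback first");
      }
      state = State.CLOSED;
    }
  }
}
```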

MemoryChannel and its inner MemoryTransaction are implemented as follows:

public class MemoryChannel extends BasicChannelSemantics implements TransactionCapacitySupported {
  private static Logger LOGGER = LoggerFactory.getLogger(MemoryChannel.class);
  private static final Integer defaultCapacity = 100;
  private static final Integer defaultTransCapacity = 100;
  private static final double byteCapacitySlotSize = 100;
  private static final Long defaultByteCapacity = (long)(Runtime.getRuntime().maxMemory() * .80);
  private static final Integer defaultByteCapacityBufferPercentage = 20;

  private static final Integer defaultKeepAlive = 3;

  private class MemoryTransaction extends BasicTransactionSemantics {
    private LinkedBlockingDeque<Event> takeList;
    private LinkedBlockingDeque<Event> putList;
    private final ChannelCounter channelCounter;
    private int putByteCounter = 0;
    private int takeByteCounter = 0;

    public MemoryTransaction(int transCapacity, ChannelCounter counter) {
      putList = new LinkedBlockingDeque<Event>(transCapacity);        // bounded staging list for puts
      takeList = new LinkedBlockingDeque<Event>(transCapacity);       // bounded staging list for takes

      channelCounter = counter;
    }

    @Override
    protected void doPut(Event event) throws InterruptedException {
      channelCounter.incrementEventPutAttemptCount();
      int eventByteSize = (int) Math.ceil(estimateEventSize(event) / byteCapacitySlotSize);

      if (!putList.offer(event)) {                                  // putList full: fail the put
        throw new ChannelException(
            "Put queue for MemoryTransaction of capacity " +
            putList.size() + " full, consider committing more frequently, " +
            "increasing capacity or increasing thread count");
      }
      putByteCounter += eventByteSize;
    }

    @Override
    protected Event doTake() throws InterruptedException {
      channelCounter.incrementEventTakeAttemptCount();
      if (takeList.remainingCapacity() == 0) {                            // takeList full: fail the take
        throw new ChannelException("Take list for MemoryTransaction, capacity " +
            takeList.size() + " full, consider committing more frequently, " +
            "increasing capacity, or increasing thread count");
      }
      if (!queueStored.tryAcquire(keepAlive, TimeUnit.SECONDS)) {
        return null;
      }
      Event event;
      synchronized (queueLock) {
        event = queue.poll();                                             // take the head of the shared queue
      }
      Preconditions.checkNotNull(event, "Queue.poll returned NULL despite semaphore " +
          "signalling existence of entry");
      takeList.put(event);                                                // remember it in takeList for a rollback

      int eventByteSize = (int) Math.ceil(estimateEventSize(event) / byteCapacitySlotSize);
      takeByteCounter += eventByteSize;

      return event;                                                       // return the event
    }

    @Override
    protected void doCommit() throws InterruptedException {
      int remainingChange = takeList.size() - putList.size();             // net change in free slots (takes - puts)
      if (remainingChange < 0) {
        if (!bytesRemaining.tryAcquire(putByteCounter, keepAlive, TimeUnit.SECONDS)) {
          throw new ChannelException("Cannot commit transaction. Byte capacity " +
              "allocated to store event body " + byteCapacity * byteCapacitySlotSize +
              "reached. Please increase heap space/byte capacity allocated to " +
              "the channel as the sinks may not be keeping up with the sources");
        }
        if (!queueRemaining.tryAcquire(-remainingChange, keepAlive, TimeUnit.SECONDS)) {
          bytesRemaining.release(putByteCounter);
          throw new ChannelFullException("Space for commit to queue couldn't be acquired." +
              " Sinks are likely not keeping up with sources, or the buffer size is too tight");
        }
      }
      int puts = putList.size();                                          // number of staged puts
      int takes = takeList.size();                                        // number of staged takes
      synchronized (queueLock) {
        if (puts > 0) {
          while (!putList.isEmpty()) {                                    // while staged puts remain
            if (!queue.offer(putList.removeFirst())) {                    // move each one into the shared queue
              throw new RuntimeException("Queue add failed, this shouldn't be able to happen");
            }
          }
        }
        putList.clear();                                                  // clear both staging lists
        takeList.clear();
      }
      bytesRemaining.release(takeByteCounter);
      takeByteCounter = 0;
      putByteCounter = 0;

      queueStored.release(puts);
      if (remainingChange > 0) {
        queueRemaining.release(remainingChange);
      }
      if (puts > 0) {
        channelCounter.addToEventPutSuccessCount(puts);                 // update counters
      }
      if (takes > 0) {
        channelCounter.addToEventTakeSuccessCount(takes);               // update counters
      }

      channelCounter.setChannelSize(queue.size());
    }

    @Override
    protected void doRollback() {
      int takes = takeList.size();                                          // number of takes to return
      synchronized (queueLock) {
        Preconditions.checkState(queue.remainingCapacity() >= takeList.size(),
            "Not enough space in memory channel " +
            "queue to rollback takes. This should never happen, please report");
        while (!takeList.isEmpty()) {                                       // while takes remain
          queue.addFirst(takeList.removeLast());                            // push them back onto the front, preserving order
        }
        putList.clear();                                                    // discard staged puts
      }
      putByteCounter = 0;
      takeByteCounter = 0;

      queueStored.release(takes);
      channelCounter.setChannelSize(queue.size());
    }

  }

  // lock to guard queue, mainly needed to keep it locked down during resizes
  // it should never be held through a blocking operation
  private Object queueLock = new Object();

  @GuardedBy(value = "queueLock")
  private LinkedBlockingDeque<Event> queue;

  // invariant that tracks the amount of space remaining in the queue(with all uncommitted takeLists deducted)
  // we maintain the remaining permits = queue.remaining - takeList.size()
  // this allows local threads waiting for space in the queue to commit without denying access to the
  // shared lock to threads that would make more space on the queue
  private Semaphore queueRemaining;

  // used to make "reservations" to grab data from the queue.
  // by using this we can block for a while to get data without locking all other threads out
  // like we would if we tried to use a blocking call on queue
  private Semaphore queueStored;

  // maximum items in a transaction queue
  private volatile Integer transCapacity;
  private volatile int keepAlive;
  private volatile int byteCapacity;
  private volatile int lastByteCapacity;
  private volatile int byteCapacityBufferPercentage;
  private Semaphore bytesRemaining;
  private ChannelCounter channelCounter;

  public MemoryChannel() {
    super();
  }

  /**
   * Read parameters from context
   * <li>capacity = type long that defines the total number of events allowed at one time in the queue.
   * <li>transactionCapacity = type long that defines the total number of events allowed in one transaction.
   * <li>byteCapacity = type long that defines the max number of bytes used for events in the queue.
   * <li>byteCapacityBufferPercentage = type int that defines the percent of buffer between byteCapacity and the estimated event size.
   * <li>keep-alive = type int that defines the number of second to wait for a queue permit
   */
  @Override
  public void configure(Context context) {                              // apply the channel settings from the agent configuration
    Integer capacity = null;
    try {
      capacity = context.getInteger("capacity", defaultCapacity);       // maximum number of events in the queue
    } catch (NumberFormatException e) {
      capacity = defaultCapacity;                                       // value is not a valid integer: fall back to the default
      LOGGER.warn("Invalid capacity specified, initializing channel to "
          + "default capacity of {}", defaultCapacity);
    }

    if (capacity <= 0) {
      capacity = defaultCapacity;
      LOGGER.warn("Invalid capacity specified, initializing channel to "
          + "default capacity of {}", defaultCapacity);
    }
    try {
      transCapacity = context.getInteger("transactionCapacity", defaultTransCapacity);  // maximum number of events per transaction
    } catch (NumberFormatException e) {
      transCapacity = defaultTransCapacity;                                             // invalid value: fall back to the default
      LOGGER.warn("Invalid transation capacity specified, initializing channel"
          + " to default capacity of {}", defaultTransCapacity);
    }

    if (transCapacity <= 0) {
      transCapacity = defaultTransCapacity;
      LOGGER.warn("Invalid transation capacity specified, initializing channel"
          + " to default capacity of {}", defaultTransCapacity);
    }
    Preconditions.checkState(transCapacity <= capacity,
        "Transaction Capacity of Memory Channel cannot be higher than " +
            "the capacity.");                                                           // capacity must be >= transactionCapacity

    try {
      byteCapacityBufferPercentage = context.getInteger("byteCapacityBufferPercentage",
                                                        defaultByteCapacityBufferPercentage);     // percent of byteCapacity held back to account for header data
    } catch (NumberFormatException e) {
      byteCapacityBufferPercentage = defaultByteCapacityBufferPercentage;
    }

    try {
      byteCapacity = (int) ((context.getLong("byteCapacity", defaultByteCapacity).longValue() *
          (1 - byteCapacityBufferPercentage * .01)) / byteCapacitySlotSize);                // usable bytes after the buffer, expressed in slots
      if (byteCapacity < 1) {
        byteCapacity = Integer.MAX_VALUE;                                                   // fewer than 1 slot: treat as unlimited
      }
    } catch (NumberFormatException e) {
      byteCapacity = (int) ((defaultByteCapacity * (1 - byteCapacityBufferPercentage * .01)) /
          byteCapacitySlotSize);
    }

    try {
      keepAlive = context.getInteger("keep-alive", defaultKeepAlive);         // seconds to wait for a queue permit
    } catch (NumberFormatException e) {
      keepAlive = defaultKeepAlive;
    }

    if (queue != null) {
      try {
        resizeQueue(capacity);                                              // queue already exists (reconfiguration): resize it
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    } else {
      synchronized (queueLock) {
        queue = new LinkedBlockingDeque<Event>(capacity);                   // first configuration: create the bounded blocking deque
        queueRemaining = new Semaphore(capacity);
        queueStored = new Semaphore(0);
      }
    }

    if (bytesRemaining == null) {
      bytesRemaining = new Semaphore(byteCapacity);
      lastByteCapacity = byteCapacity;
    } else {
      if (byteCapacity > lastByteCapacity) {               
        bytesRemaining.release(byteCapacity - lastByteCapacity);
        lastByteCapacity = byteCapacity;
      } else {
        try {
          if (!bytesRemaining.tryAcquire(lastByteCapacity - byteCapacity, keepAlive,
                                         TimeUnit.SECONDS)) {
            LOGGER.warn("Couldn't acquire permits to downsize the byte capacity, resizing has been aborted");
          } else {
            lastByteCapacity = byteCapacity;
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }

    if (channelCounter == null) {
      channelCounter = new ChannelCounter(getName());             // create the metrics counter for this channel
    }
  }

  private void resizeQueue(int capacity) throws InterruptedException {
    int oldCapacity;
    synchronized (queueLock) {
      oldCapacity = queue.size() + queue.remainingCapacity();         // current capacity
    }

    if (oldCapacity == capacity) {                                    // unchanged: nothing to do
      return;
    } else if (oldCapacity > capacity) {
      if (!queueRemaining.tryAcquire(oldCapacity - capacity, keepAlive, TimeUnit.SECONDS)) {
        LOGGER.warn("Couldn't acquire permits to downsize the queue, resizing has been aborted");
      } else {
        synchronized (queueLock) {
          LinkedBlockingDeque<Event> newQueue = new LinkedBlockingDeque<Event>(capacity);  // shrinking: permits acquired, copy events into a smaller deque
          newQueue.addAll(queue);
          queue = newQueue;
        }
      }
    } else {
      synchronized (queueLock) {
        LinkedBlockingDeque<Event> newQueue = new LinkedBlockingDeque<Event>(capacity);
        newQueue.addAll(queue);
        queue = newQueue;
      }
      queueRemaining.release(capacity - oldCapacity);         // growing: release permits for the extra space
    }
  }

  @Override
  public synchronized void start() {
    channelCounter.start();                                   // start the metrics counter
    channelCounter.setChannelSize(queue.size());              // report the current number of events
    channelCounter.setChannelCapacity(Long.valueOf(
            queue.size() + queue.remainingCapacity()));       // report the total capacity
    super.start();                                            // update the lifecycle state
  }

  @Override
  public synchronized void stop() {
    channelCounter.setChannelSize(queue.size());
    channelCounter.stop();
    super.stop();
  }

  @Override
  protected BasicTransactionSemantics createTransaction() {
    return new MemoryTransaction(transCapacity, channelCounter);      // each transaction gets its own putList/takeList
  }

  private long estimateEventSize(Event event) {           // estimate the event size from the body length (headers are ignored)
    byte[] body = event.getBody();
    if (body != null && body.length != 0) {
      return body.length;
    }
    //Each event occupies at least 1 slot, so return 1.
    return 1;
  }

  @VisibleForTesting
  int getBytesRemainingValue() {
    return bytesRemaining.availablePermits();
  }

  public long getTransactionCapacity() {
    return transCapacity;
  }
}
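The byteCapacity arithmetic in configure() can be checked in isolation. The sketch below copies the formula into a standalone method. SlotMath, usableSlots and ASSUMED_SLOT_SIZE are hypothetical names. The 100-byte slot size is an assumption based on the byteCapacitySlotSize constant in Flume's MemoryChannel source, which is not part of this excerpt.

```java
// Standalone sketch of the byteCapacity arithmetic in MemoryChannel.configure().
// SlotMath and usableSlots are hypothetical names; a slot size of 100 is an ASSUMPTION
// mirroring Flume's byteCapacitySlotSize constant, which this excerpt does not show.
final class SlotMath {
    static final int ASSUMED_SLOT_SIZE = 100;

    private SlotMath() {}

    /**
     * Converts a byte budget into semaphore permits ("slots"): bufferPercentage
     * percent is held back (to account for headers) and the rest is divided into slots.
     */
    static int usableSlots(long byteCapacity, int bufferPercentage, int slotSize) {
        int slots = (int) ((byteCapacity * (1 - bufferPercentage * .01)) / slotSize);
        // Same rule as configure(): fewer than 1 slot is treated as "unlimited".
        return slots < 1 ? Integer.MAX_VALUE : slots;
    }

    public static void main(String[] args) {
        // 1,000,000 bytes with a 20% buffer leaves about 800,000 bytes, i.e. about 8000 slots.
        System.out.println(usableSlots(1_000_000L, 20, ASSUMED_SLOT_SIZE));
        // 50 bytes rounds down to 0 slots, which becomes Integer.MAX_VALUE.
        System.out.println(usableSlots(50L, 20, ASSUMED_SLOT_SIZE));
    }
}
```

With the default 20% buffer, only about 80% of the configured byteCapacity is available for event bodies. A budget too small to fill a single slot is silently treated as unlimited rather than rejected.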

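MemoryChannel stages each transaction in two lists. The putList holds events that a source has put but not yet committed, and the takeList records the events a sink has taken. On commit, the staged puts are moved into the shared queue. On rollback, the staged puts are discarded and the taken events are pushed back onto the head of the queue, as the rollback code above shows.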
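Here is a deliberately simplified, single-threaded sketch of the putList/takeList idea. MiniChannel and Tx are hypothetical names. The real MemoryChannel also coordinates concurrent transactions with the queueLock, the queueRemaining and queueStored semaphores, and byte accounting. None of that is modelled here, and the sketch assumes at most one open transaction at a time.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified sketch of MemoryChannel's putList/takeList idea (hypothetical names).
// Single-threaded; assumes at most one open transaction at a time.
final class MiniChannel {
    private final Deque<String> queue = new ArrayDeque<>();
    private final int capacity;

    MiniChannel(int capacity) {
        this.capacity = capacity;
    }

    int size() {
        return queue.size();
    }

    Tx begin() {
        return new Tx();
    }

    final class Tx {
        private final Deque<String> putList = new ArrayDeque<>();
        private final Deque<String> takeList = new ArrayDeque<>();

        /** Stages an event; takers cannot see it until commit(). */
        void put(String event) {
            putList.addLast(event);
        }

        /** Removes the head of the shared queue and remembers it for a possible rollback. */
        String take() {
            String event = queue.pollFirst();
            if (event != null) {
                takeList.addLast(event);
            }
            return event;
        }

        /** Publishes staged puts; taken events are consumed for good. */
        void commit() {
            if (queue.size() + putList.size() > capacity) {
                throw new IllegalStateException("not enough space in channel; caller should roll back");
            }
            while (!putList.isEmpty()) {
                queue.addLast(putList.removeFirst());
            }
            takeList.clear();
        }

        /** Discards staged puts and pushes taken events back to the head, in their original order. */
        void rollback() {
            while (!takeList.isEmpty()) {
                queue.addFirst(takeList.removeLast());
            }
            putList.clear();
        }
    }

    public static void main(String[] args) {
        MiniChannel channel = new MiniChannel(10);
        Tx source = channel.begin();
        source.put("e1");
        source.put("e2");
        System.out.println(channel.size());   // 0: puts are not visible before commit
        source.commit();
        System.out.println(channel.size());   // 2

        Tx sink = channel.begin();
        sink.take();
        sink.take();
        sink.rollback();                      // e.g. the downstream write failed
        System.out.println(channel.size());   // 2: both events are back, e1 first
    }
}
```

The design choice matches MemoryChannel. A source's events become visible only after commit, and a sink's take stays provisional until commit. A failed delivery can therefore be undone without losing or reordering events.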

SinkRunner execution flow

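While the sinks are being parsed, each sink's processing logic is wrapped in a SinkRunner. The LoggerSink in this example is wrapped in a SinkRunner in the same way. SinkRunner.start() then launches the polling thread: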

  @Override
  public void start() {
    SinkProcessor policy = getPolicy();       // the SinkProcessor wrapping the sink(s)

    policy.start();                           // start the processor

    runner = new PollingRunner();             // create the polling Runnable

    runner.policy = policy;
    runner.counterGroup = counterGroup;
    runner.shouldStop = new AtomicBoolean();    // stop flag, set by stop()

    runnerThread = new Thread(runner);
    runnerThread.setName("SinkRunner-PollingRunner-" +
        policy.getClass().getSimpleName());
    runnerThread.start();                         // run the polling loop on its own thread

    lifecycleState = LifecycleState.START;
  }
  
  public static class PollingRunner implements Runnable {

    private SinkProcessor policy;
    private AtomicBoolean shouldStop;
    private CounterGroup counterGroup;

    @Override
    public void run() {
      logger.debug("Polling sink runner starting");

      while (!shouldStop.get()) {                                         // loop until stop() is requested
        try {
          if (policy.process().equals(Sink.Status.BACKOFF)) {             // BACKOFF: there was nothing to do, so sleep
            counterGroup.incrementAndGet("runner.backoffs");

            Thread.sleep(Math.min(
                counterGroup.incrementAndGet("runner.backoffs.consecutive")
                * backoffSleepIncrement, maxBackoffSleep));
          } else {
            counterGroup.set("runner.backoffs.consecutive", 0L);
          }
        } catch (InterruptedException e) {
          logger.debug("Interrupted while processing an event. Exiting.");
          counterGroup.incrementAndGet("runner.interruptions");
        } catch (Exception e) {
          logger.error("Unable to deliver event. Exception follows.", e);
          if (e instanceof EventDeliveryException) {
            counterGroup.incrementAndGet("runner.deliveryErrors");
          } else {
            counterGroup.incrementAndGet("runner.errors");
          }
          try {
            Thread.sleep(maxBackoffSleep);
          } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
          }
        }
      }
      logger.debug("Polling runner exiting. Metrics:{}", counterGroup);
    }

  }
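The back-off rule in PollingRunner.run() is a linear increase capped at a maximum. The sketch isolates it. Backoff and onResult are hypothetical names. The 1000 ms increment and the 5000 ms cap are assumed values of SinkRunner's backoffSleepIncrement and maxBackoffSleep constants, which this excerpt does not show.

```java
// Sketch of the PollingRunner back-off rule: sleep = min(consecutive * increment, max).
// Backoff/onResult are hypothetical; the constant values are ASSUMED to match
// SinkRunner's backoffSleepIncrement and maxBackoffSleep.
final class Backoff {
    static final long BACKOFF_SLEEP_INCREMENT_MS = 1000;
    static final long MAX_BACKOFF_SLEEP_MS = 5000;

    private long consecutiveBackoffs = 0;

    /** Called after each process() call; returns how long the runner would sleep. */
    long onResult(boolean backoff) {
        if (!backoff) {
            consecutiveBackoffs = 0;   // READY resets the streak, like counterGroup.set(..., 0L)
            return 0;
        }
        consecutiveBackoffs++;         // like incrementAndGet("runner.backoffs.consecutive")
        return Math.min(consecutiveBackoffs * BACKOFF_SLEEP_INCREMENT_MS, MAX_BACKOFF_SLEEP_MS);
    }

    public static void main(String[] args) {
        Backoff b = new Backoff();
        boolean[] results = {true, true, true, false, true, true, true, true, true, true};
        StringBuilder sleeps = new StringBuilder();
        for (boolean r : results) {
            sleeps.append(b.onResult(r)).append(' ');
        }
        System.out.println(sleeps.toString().trim());   // 1000 2000 3000 0 1000 2000 3000 4000 5000 5000
    }
}
```

Exceptions take the other branch of run(). There the runner logs the error, increments runner.deliveryErrors or runner.errors, and sleeps for the full maxBackoffSleep without touching the consecutive back-off count.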

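While the configuration file is being loaded, any sink that is not assigned to a sink group is wrapped by default in a DefaultSinkProcessor and its own SinkRunner: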

    // excerpt from loadSinkGroups (called from loadSinks)
    // add any unassigned sinks to solo collectors
    for (Entry<String, Sink> entry : sinks.entrySet()) {
      if (!usedSinks.containsValue(entry.getKey())) {
        try {
          SinkProcessor pr = new DefaultSinkProcessor();             // default processor for a single sink
          List<Sink> sinkMap = new ArrayList<Sink>();
          sinkMap.add(entry.getValue());
          pr.setSinks(sinkMap);
          Configurables.configure(pr, new Context());
          sinkRunnerMap.put(entry.getKey(), new SinkRunner(pr));    // one SinkRunner per ungrouped sink
        } catch (Exception e) {
          String msg = String.format("SinkGroup %s has been removed due to " +
              "an error during configuration", entry.getKey());
          LOGGER.error(msg, e);
        }
      }
    }
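The effect of this loop on runner assignment can be modelled with plain strings. SoloRunnerPlanner and plan() are hypothetical. Flume builds DefaultSinkProcessor and SinkRunner objects rather than returning names, and keying group runners by the group name is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: one runner per sink group, plus a "solo" runner for every ungrouped sink.
// Hypothetical names; keying group runners by group name is an ASSUMPTION.
final class SoloRunnerPlanner {
    private SoloRunnerPlanner() {}

    /**
     * @param sinkNames  all configured sink names, in configuration order
     * @param sinkGroups sink group name -> names of its member sinks
     * @return runner key -> names of the sinks driven by that runner
     */
    static Map<String, List<String>> plan(List<String> sinkNames, Map<String, List<String>> sinkGroups) {
        Map<String, List<String>> runners = new LinkedHashMap<>();
        Set<String> grouped = new HashSet<>();
        for (Map.Entry<String, List<String>> group : sinkGroups.entrySet()) {
            runners.put(group.getKey(), new ArrayList<>(group.getValue()));
            grouped.addAll(group.getValue());
        }
        for (String sink : sinkNames) {
            if (!grouped.contains(sink)) {
                runners.put(sink, List.of(sink));   // single-sink runner keyed by the sink name
            }
        }
        return runners;
    }

    public static void main(String[] args) {
        // The example agent a1 defines only k1 and no sink groups, so k1 gets its own runner.
        System.out.println(plan(List.of("k1"), Map.of()));   // {k1=[k1]}
    }
}
```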

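The runner thread therefore keeps calling the processor's process() method to check for incoming data. DefaultSinkProcessor simply forwards the call to its single sink, which in this example is LoggerSink: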

public class LoggerSink extends AbstractSink implements Configurable {

  private static final Logger logger = LoggerFactory
      .getLogger(LoggerSink.class);

  // Default Max bytes to dump
  public static final int DEFAULT_MAX_BYTE_DUMP = 16;

  // Max number of bytes to be dumped
  private int maxBytesToLog = DEFAULT_MAX_BYTE_DUMP;

  public static final String MAX_BYTES_DUMP_KEY = "maxBytesToLog";

  @Override
  public void configure(Context context) {
    String strMaxBytes = context.getString(MAX_BYTES_DUMP_KEY);
    if (!Strings.isNullOrEmpty(strMaxBytes)) {
      try {
        maxBytesToLog = Integer.parseInt(strMaxBytes);
      } catch (NumberFormatException e) {
        logger.warn(String.format(
            "Unable to convert %s to integer, using default value(%d) for maxByteToDump",
            strMaxBytes, DEFAULT_MAX_BYTE_DUMP));
        maxBytesToLog = DEFAULT_MAX_BYTE_DUMP;
      }
    }
  }

  @Override
  public Status process() throws EventDeliveryException {
    Status result = Status.READY;
    Channel channel = getChannel();
    Transaction transaction = channel.getTransaction();       // get the transaction for this thread
    Event event = null;

    try {
      transaction.begin();                                    // begin the transaction
      event = channel.take();                                 // take one event from the channel

      if (event != null) {                                    // an event is available: log it if INFO is enabled
        if (logger.isInfoEnabled()) {
          logger.info("Event: " + EventHelper.dumpEvent(event, maxBytesToLog));   // dump at most maxBytesToLog bytes of the body
        }
      } else {
        // No event found, request back-off semantics from the sink runner
        result = Status.BACKOFF;                              // nothing to do: ask the runner to back off
      }
      transaction.commit();                                   // commit: the take becomes permanent
    } catch (Exception ex) {
      transaction.rollback();
      throw new EventDeliveryException("Failed to log event: " + event, ex);
    } finally {
      transaction.close();
    }

    return result;
  }
}
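LoggerSink has one tuning knob, maxBytesToLog, which limits how much of each body is dumped. The sketch mirrors its parse-with-fallback logic from configure() and illustrates the truncation. BodyPreview is a hypothetical class, and its hex layout is only an illustration. It is not the exact output format of EventHelper.dumpEvent.

```java
import java.nio.charset.StandardCharsets;

// Illustration of LoggerSink's maxBytesToLog handling. BodyPreview is hypothetical,
// and the hex layout is NOT the exact format produced by EventHelper.dumpEvent.
final class BodyPreview {
    static final int DEFAULT_MAX_BYTE_DUMP = 16;   // same default as LoggerSink

    private BodyPreview() {}

    /** Mirrors LoggerSink.configure(): missing or unparsable values fall back to the default. */
    static int parseMaxBytes(String raw) {
        if (raw == null || raw.isEmpty()) {
            return DEFAULT_MAX_BYTE_DUMP;
        }
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            return DEFAULT_MAX_BYTE_DUMP;
        }
    }

    /** Hex-encodes at most maxBytes bytes of the body; a trailing "..." marks truncation. */
    static String preview(byte[] body, int maxBytes) {
        int n = Math.min(body.length, maxBytes);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", body[i] & 0xFF));
        }
        if (body.length > maxBytes) {
            sb.append(" ...");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] body = "hello from netcat".getBytes(StandardCharsets.UTF_8);
        System.out.println(preview(body, parseMaxBytes("8")));
    }
}
```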

At this point, the flow on the output side is complete as well.

Summary

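The above covers the main path of Flume's distributed, reliable delivery, traced through the simplest possible configuration. The walkthrough shows that the channel exchanges data with the source and the sink through a queue. Transaction-like semantics make sure that events are handed over correctly and consumed properly. If the sink fails, its take is rolled back instead of committed, so events that have not been successfully consumed stay in the channel. My understanding is limited, so corrections are welcome.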
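The reliability argument above can be demonstrated with a small simulation. A sink that fails once causes a rollback, the event stays in the channel, and the next round delivers it. TinyChannel, RetryDemo and processOnce are hypothetical, single-threaded stand-ins that follow the take, then commit-or-rollback structure of LoggerSink.process(). Unlike LoggerSink, processOnce returns false instead of throwing EventDeliveryException.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Simulation: a failed delivery rolls the take back, so the event is retried later.
// RetryDemo and TinyChannel are hypothetical stand-ins for LoggerSink and MemoryChannel.
final class RetryDemo {
    private RetryDemo() {}

    /** One process() round: true if an event was delivered, false for back-off or failure. */
    static boolean processOnce(TinyChannel channel, Consumer<String> sink) {
        try {
            String event = channel.take();
            if (event != null) {
                sink.accept(event);            // may throw, like an unavailable downstream system
            }
            channel.commit();
            return event != null;
        } catch (RuntimeException ex) {
            channel.rollback();                // LoggerSink rethrows as EventDeliveryException here
            return false;
        }
    }

    public static void main(String[] args) {
        TinyChannel channel = new TinyChannel();
        channel.offer("e1");
        List<String> delivered = new ArrayList<>();
        boolean[] failNext = {true};                     // the first delivery attempt fails
        Consumer<String> flakySink = event -> {
            if (failNext[0]) {
                failNext[0] = false;
                throw new IllegalStateException("downstream unavailable");
            }
            delivered.add(event);
        };
        System.out.println(processOnce(channel, flakySink) + " " + channel.size()); // false 1
        System.out.println(processOnce(channel, flakySink) + " " + channel.size()); // true 0
        System.out.println(delivered);                                               // [e1]
    }
}

final class TinyChannel {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Deque<String> takeList = new ArrayDeque<>();

    void offer(String event) { queue.addLast(event); }   // stands in for a committed source put

    int size() { return queue.size(); }

    String take() {
        String event = queue.pollFirst();
        if (event != null) {
            takeList.addLast(event);
        }
        return event;
    }

    void commit() { takeList.clear(); }                  // taken events are consumed for good

    void rollback() {                                    // taken events go back to the head
        while (!takeList.isEmpty()) {
            queue.addFirst(takeList.removeLast());
        }
    }
}
```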
