TensorFlow Serving Architecture and Code Analysis

TensorFlow Serving

Architecture

[TensorFlow Serving architecture diagram]

Components

Servables

Servables are the objects that clients use to perform computation (for example, inference). A typical servable wraps a model, but the abstraction is flexible enough to also cover:

streaming results

experimental APIs

asynchronous modes of operation

Models

TensorFlow Serving represents a model as one or more servables. A machine-learned model may include one or more algorithms (including learned weights) and lookup or embedding tables.

You can represent a composite model as either of the following:

multiple independent servables

single composite servable

A servable may also correspond to a fraction of a model. For example, a large lookup table could be sharded across many TensorFlow Serving instances.

Loaders

Loaders manage a servable's life cycle. They standardize the APIs for loading and unloading a servable, independent of the specific learning algorithm or data involved.
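
To make the life-cycle idea concrete, here is a minimal, self-contained sketch of what a loader does conceptually. It deliberately does not use the real tensorflow_serving/core/loader.h interface (whose exact method signatures vary between releases); the class name, the ToyModel type, and the path are all illustrative.

#include <iostream>
#include <memory>
#include <string>

// Illustrative stand-in for a servable: here, just a "model" identified by a path.
struct ToyModel {
  std::string weights_path;
};

// Conceptual loader: knows how to bring one specific version of a servable
// into memory, hand it out, and release it again.
class ToyLoader {
 public:
  explicit ToyLoader(std::string version_path) : path_(std::move(version_path)) {}

  // Called by the manager once it decides this version should be served.
  bool Load() {
    model_ = std::make_unique<ToyModel>();
    model_->weights_path = path_;
    std::cout << "loaded " << path_ << "\n";
    return true;
  }

  // Called by the manager when the version is retired.
  void Unload() {
    model_.reset();
    std::cout << "unloaded " << path_ << "\n";
  }

  // Hands out the loaded servable to whoever holds a handle to it.
  ToyModel* servable() { return model_.get(); }

 private:
  std::string path_;
  std::unique_ptr<ToyModel> model_;
};

int main() {
  ToyLoader loader("/models/my_model/1");  // hypothetical version directory
  loader.Load();
  std::cout << "serving from " << loader.servable()->weights_path << "\n";
  loader.Unload();
  return 0;
}

In the real system the Manager drives Load()/Unload(); clients never touch the Loader directly and only see the servable through a handle obtained from the Manager.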

Sources

Sources are plugin modules that find and provide servables. For each servable version it makes available, a Source supplies one Loader instance. In practice a Source is usually chained through one or more SourceAdapters: Source -> SourceAdapter -> Loader.

Aspired versions

Aspired versions are the set of servable versions that should be loaded and ready. A Source communicates this set for one servable stream at a time; when a new list no longer contains a previously aspired version, that version becomes a candidate for unloading.

Managers

Managers handle the full life cycle of Servables, including:

loading Servables

serving Servables

unloading Servables

A Manager tries to fulfill a Source's requests, but it may refuse to load an aspired version if, for example, the required resources aren't available. It may also postpone an unload, waiting until a newer version finishes loading so that at least one version is always available for serving.

In short:
1. Sources create Loaders for Servable Versions.
2. Loaders are sent as Aspired Versions to the Manager, which loads and serves them to client requests.

In more detail:

1. A Source plugin creates a Loader for a specific version. The Loader contains whatever metadata it needs to load the Servable.

2. The Source uses a callback to notify the Manager of the Aspired Version.

3. The Manager applies the configured Version Policy to determine the next action to take, which could be to unload a previously loaded version or to load the new version.

4. If the Manager determines that it's safe, it gives the Loader the required resources and tells the Loader to load the new version.

5. Clients ask the Manager for the Servable, either specifying a version explicitly or just requesting the latest version. The Manager returns a handle for the Servable.
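
Step 5 is the part a request handler actually touches. Below is a hedged sketch of asking the Manager (through ServerCore) for a handle and running inference with it; the header paths, the GetServableHandle usage, and the model name "my_model" are assumptions based on my reading of the TensorFlow Serving sources and may differ between versions.

#include <string>
#include <utility>
#include <vector>

#include "tensorflow/cc/saved_model/loader.h"             // tensorflow::SavedModelBundle
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/lib/core/errors.h"              // TF_RETURN_IF_ERROR
#include "tensorflow_serving/apis/model.pb.h"             // tensorflow::serving::ModelSpec
#include "tensorflow_serving/core/servable_handle.h"
#include "tensorflow_serving/model_servers/server_core.h"

namespace tfs = tensorflow::serving;

// Runs one inference against whatever version of "my_model" is currently served.
tensorflow::Status RunInference(
    tfs::ServerCore* core,
    const std::vector<std::pair<std::string, tensorflow::Tensor>>& inputs,
    const std::vector<std::string>& output_names,
    std::vector<tensorflow::Tensor>* outputs) {
  tfs::ModelSpec spec;
  spec.set_name("my_model");  // hypothetical model name
  // Leaving spec.version unset asks the Manager for the latest loaded version.

  // The handle is reference counted: while it is held, the Manager will not
  // unload this version out from under the request.
  tfs::ServableHandle<tensorflow::SavedModelBundle> bundle;
  TF_RETURN_IF_ERROR(core->GetServableHandle(spec, &bundle));

  return bundle->session->Run(inputs, output_names,
                              /*target_node_names=*/{}, outputs);
}

Releasing the handle (here, when it goes out of scope) is what eventually allows the Manager to proceed with a pending unload of that version.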

As a concrete example:

1. The Source detects a new version of the model weights. It creates a Loader that contains a pointer to the model data on disk.

2. The Source notifies the Dynamic Manager of the Aspired Version.

3. The Dynamic Manager applies the Version Policy and decides to load the new version.

4. The Dynamic Manager tells the Loader that there is enough memory. The Loader instantiates the TensorFlow graph with the new weights.

5. A client requests a handle to the latest version of the model, and the Dynamic Manager returns a handle to the new version of the Servable.

Loader vs. Manager

Loader: manages the life cycle of a model (a single servable version).

Manager: manages the life cycle of serving (which versions get loaded, served, and unloaded).

Features

Multiple versions of multiple models can be served simultaneously, configured through the Model Version Policy (see the config sketch after this list).

By default, only the latest version of a model is loaded.

Models are discovered and loaded automatically from the file system.

Request-handling latency is low.

The server is stateless and supports horizontal scaling.

Different model versions can be A/B tested.

TensorFlow models can be scanned and loaded from the local file system.

TensorFlow models can be scanned and loaded from HDFS.

A gRPC interface is provided for client calls.

Batched requests are accepted.
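
As a concrete illustration of the version-policy feature above (and of A/B testing two versions side by side), here is a hedged sketch of a ModelServerConfig built from a text proto. The field layout follows model_server_config.proto as I understand it; the model name, base path, and version numbers are made up.

#include "google/protobuf/text_format.h"
#include "tensorflow/core/platform/logging.h"
#include "tensorflow_serving/config/model_server_config.pb.h"

// Builds a config that keeps versions 1 and 2 of "my_model" loaded at once.
tensorflow::serving::ModelServerConfig MakeTwoVersionConfig() {
  const char* kConfigText = R"pb(
    model_config_list {
      config {
        name: "my_model"               # hypothetical model name
        base_path: "/models/my_model"  # hypothetical export directory
        model_platform: "tensorflow"
        model_version_policy { specific { versions: 1 versions: 2 } }
      }
    }
  )pb";
  tensorflow::serving::ModelServerConfig config;
  CHECK(google::protobuf::TextFormat::ParseFromString(kConfigText, &config));
  return config;
}

Such a config is what ends up in ServerCore::Options::model_server_config, as the main.cc excerpt in the source-code section below shows.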

Batcher

Batching of multiple requests into a single request can significantly reduce the cost of performing inference, especially in the presence of hardware accelerators such as GPUs. TensorFlow Serving includes a request batching widget that lets clients easily batch their type-specific inferences across requests into batch requests that algorithm systems can more efficiently process. See the Batching Guide for more information.
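
The sketch below shows how batching is commonly enabled when launching the stock model server. The flag names and the BatchingParameters fields reflect my understanding of the batching guide; treat both the names and the numeric values as assumptions to verify against the release you run.

tensorflow_model_server \
    --port=8500 \
    --model_name=my_model \
    --model_base_path=/models/my_model \
    --enable_batching=true \
    --batching_parameters_file=/config/batching.conf

# /config/batching.conf (BatchingParameters, protobuf text format)
max_batch_size { value: 32 }           # largest batch handed to one Session::Run
batch_timeout_micros { value: 5000 }   # wait at most 5 ms for a batch to fill
num_batch_threads { value: 8 }         # threads that execute batches
max_enqueued_batches { value: 100 }    # back-pressure limit on queued batches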

Overall flow:

The Source scans the local model directory and creates a Loader for each configured model/version it finds; the resulting Aspired Versions are reported to the Manager. The Manager checks whether enough resources are available and, if so, tells the Loader to load the new model and unloads the old version. Once loading completes, the Manager makes the new version available for serving.

When a client request arrives, the server obtains a ServableHandle for the requested model from the Manager, runs the prediction through it, and returns the result.

Source Code Walkthrough

ServerCore (main.cc)

int main(int argc, char** argv) {
  ...

  ServerCore::Options options;
  options.model_server_config = model_server_config;
  options.servable_state_monitor_creator = &CreateServableStateMonitor;
  options.custom_model_config_loader = &LoadCustomModelConfig;

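  // Register a SavedModelBundleSourceAdapter config under the "tensorflow"
  // platform; ServerCore uses it to turn discovered storage paths into Loaders.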
  ::google::protobuf::Any source_adapter_config;
  SavedModelBundleSourceAdapterConfig
      saved_model_bundle_source_adapter_config;
  source_adapter_config.PackFrom(saved_model_bundle_source_adapter_config);
  (*(*options.platform_config_map.mutable_platform_configs())
      [kTensorFlowModelPlatform].mutable_source_adapter_config()) =
      source_adapter_config;

  std::unique_ptr<ServerCore> core;
  TF_CHECK_OK(ServerCore::Create(options, &core));
  RunServer(port, std::move(core));

  return 0;
}

Building the file system source config from the models declared in the configuration

FileSystemStoragePathSourceConfig ServerCore::CreateStoragePathSourceConfig(
    const ModelServerConfig& config) const {
  FileSystemStoragePathSourceConfig source_config;
  source_config.set_file_system_poll_wait_seconds(
      options_.file_system_poll_wait_seconds);
  for (const auto& model : config.model_config_list().config()) {
    LOG(INFO) << " (Re-)adding model: " << model.name();
    FileSystemStoragePathSourceConfig::ServableToMonitor* servable =
        source_config.add_servables();
    servable->set_servable_name(model.name());
    servable->set_base_path(model.base_path());
    // TODO(akhorlin): remove this logic once the corresponding deprecated
    // field is removed (b/62834753).
    if (!model.has_model_version_policy()) {
      switch (model.version_policy()) {
        case FileSystemStoragePathSourceConfig::LATEST_VERSION:
          servable->mutable_servable_version_policy()->mutable_latest();
          break;
        case FileSystemStoragePathSourceConfig::ALL_VERSIONS:
          servable->mutable_servable_version_policy()->mutable_all();
          break;
        default:
          LOG(FATAL) << "Unknown version policy: "  // Crash ok.
                     << model.version_policy();
      }
    } else {
      *servable->mutable_servable_version_policy() =
          model.model_version_policy();
    }
  }
  return source_config;
}

Starting the gRPC server

void RunServer(int port, std::unique_ptr<ServerCore> core,
           bool use_saved_model) {
  // "0.0.0.0" is the way to listen on localhost in gRPC.
  const string server_address = "0.0.0.0:" + std::to_string(port);
  PredictionServiceImpl service(std::move(core), use_saved_model);
  ServerBuilder builder;
  std::shared_ptr<grpc::ServerCredentials> creds = InsecureServerCredentials();
  builder.AddListeningPort(server_address, creds);
  builder.RegisterService(&service);
  builder.SetMaxMessageSize(tensorflow::kint32max);
  std::unique_ptr<Server> server(builder.BuildAndStart());
  LOG(INFO) << "Running ModelServer at " << server_address << " ...";
  server->Wait();
}
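
For the other side of this connection, here is a hedged sketch of a minimal C++ gRPC client calling PredictionService::Predict against a server started as above. It assumes the generated prediction_service stubs are available on the include path; the address and port, model name, signature name, and the input tensor name "x" are hypothetical.

#include <iostream>
#include <memory>

#include <grpcpp/grpcpp.h>
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow_serving/apis/predict.pb.h"
#include "tensorflow_serving/apis/prediction_service.grpc.pb.h"

int main() {
  // Connect to the ModelServer started by RunServer() above.
  auto channel = grpc::CreateChannel("localhost:8500",
                                     grpc::InsecureChannelCredentials());
  auto stub = tensorflow::serving::PredictionService::NewStub(channel);

  tensorflow::serving::PredictRequest request;
  request.mutable_model_spec()->set_name("my_model");
  request.mutable_model_spec()->set_signature_name("serving_default");

  // Build a 1x2 float input and attach it under a hypothetical input name.
  tensorflow::Tensor input(tensorflow::DT_FLOAT, tensorflow::TensorShape({1, 2}));
  input.flat<float>()(0) = 1.0f;
  input.flat<float>()(1) = 2.0f;
  input.AsProtoTensorContent(&(*request.mutable_inputs())["x"]);

  tensorflow::serving::PredictResponse response;
  grpc::ClientContext context;
  grpc::Status status = stub->Predict(&context, request, &response);
  if (!status.ok()) {
    std::cerr << "Predict RPC failed: " << status.error_message() << "\n";
    return 1;
  }
  // Each entry in response.outputs() is a named TensorProto produced by the model.
  for (const auto& output : response.outputs()) {
    std::cout << "output " << output.first << ": "
              << output.second.DebugString() << "\n";
  }
  return 0;
}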

Periodically scanning the local file system and loading models

void PeriodicFunction::RunLoop(const int64 start) {
  {
    if (options_.startup_delay_micros > 0) {
      const int64 deadline = start + options_.startup_delay_micros;
      options_.env->SleepForMicroseconds(deadline - start);
    }

    while (!stop_thread_.HasBeenNotified()) {
      VLOG(3) << "Running function.";
      const int64 begin = options_.env->NowMicros();
      function_();

      // Take the max() here to guard against time going backwards which
      // sometimes happens in multiproc machines.
      const int64 end =
          std::max(static_cast<int64>(options_.env->NowMicros()), begin);

      // The deadline is relative to when the last function started.
      const int64 deadline = begin + interval_micros_;

      // We want to sleep until 'deadline'.
      if (deadline > end) {
        if (end > begin) {
          VLOG(3) << "Reducing interval_micros from " << interval_micros_
                  << " to " << (deadline - end);
        }
        options_.env->SleepForMicroseconds(deadline - end);
      } else {
        VLOG(3) << "Function took longer than interval_micros, so not sleeping";
      }
    }
  }
}

ServerCore internally wraps an AspiredVersionsManager.

ServerCore::Create() takes a ServerCore::Options parameter. Here are a few commonly used options:

ModelServerConfig that specifies models to be loaded. Models are declared either through model_config_list, which declares a static list of models, or through custom_model_config, which defines a custom way to declare a list of models that may get updated at runtime.

PlatformConfigMap that maps from the name of the platform (such as tensorflow) to the PlatformConfig, which is used to create the SourceAdapter. The SourceAdapter adapts a StoragePath (the path where a model version is discovered) to a model Loader (which loads the model version from the storage path and provides state transition interfaces to the Manager). If the PlatformConfig contains a SavedModelBundleSourceAdapterConfig, a SavedModelBundleSourceAdapter will be created, which we will explain later.

SavedModelBundle

SavedModelBundle is a key component of TensorFlow Serving. It represents a TensorFlow model loaded from a given path and provides the same Session::Run interface as TensorFlow to run inference. SavedModelBundleSourceAdapter adapts a storage path to a Loader so that the model's lifetime can be managed by the Manager. Note that SavedModelBundle is the successor of the deprecated SessionBundle; users are encouraged to use SavedModelBundle, as support for SessionBundle will soon be removed.
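
To see what the Loader produced by SavedModelBundleSourceAdapter ultimately wraps, here is a hedged sketch that loads a SavedModel directly into a SavedModelBundle and runs it through Session::Run, the same interface the server uses internally. The export path and the tensor names "x:0"/"y:0" are hypothetical.

#include <iostream>
#include <vector>

#include "tensorflow/cc/saved_model/loader.h"         // LoadSavedModel, SavedModelBundle
#include "tensorflow/cc/saved_model/tag_constants.h"  // kSavedModelTagServe
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/lib/core/status.h"          // TF_CHECK_OK

int main() {
  tensorflow::SavedModelBundle bundle;
  TF_CHECK_OK(tensorflow::LoadSavedModel(
      tensorflow::SessionOptions(), tensorflow::RunOptions(),
      "/models/my_model/1",  // hypothetical version directory
      {tensorflow::kSavedModelTagServe}, &bundle));

  tensorflow::Tensor x(tensorflow::DT_FLOAT, tensorflow::TensorShape({1, 2}));
  x.flat<float>()(0) = 1.0f;
  x.flat<float>()(1) = 2.0f;

  std::vector<tensorflow::Tensor> outputs;
  TF_CHECK_OK(bundle.session->Run({{"x:0", x}}, {"y:0"}, {}, &outputs));
  std::cout << outputs[0].DebugString() << "\n";
  return 0;
}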

ServerCore works as follows:

1. Instantiates a FileSystemStoragePathSource that monitors the model export paths declared in model_config_list.

2. Instantiates a SourceAdapter using the PlatformConfigMap with the model platform declared in model_config_list and connects the FileSystemStoragePathSource to it. This way, whenever a new model version is discovered under the export path, the SavedModelBundleSourceAdapter adapts it to a Loader. (The chain FileSystemStoragePathSource -> SavedModelBundleSourceAdapter -> Loader is wired together through SetAspiredVersionsCallback, which is what lets new model versions be picked up automatically.)

3. Instantiates a specific implementation of Manager called AspiredVersionsManager that manages all the Loader instances created by the SavedModelBundleSourceAdapter. ServerCore exports the Manager interface by delegating calls to the AspiredVersionsManager.
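
The three steps above can be reproduced in stripped-down form outside ServerCore. The following is a hedged sketch of that wiring; the Create() signatures, option fields, and header paths come from my reading of the sources and may differ between versions, and the servable name and base path are made up.

#include <memory>
#include <utility>

#include "tensorflow/core/lib/core/status.h"  // TF_CHECK_OK
#include "tensorflow_serving/core/aspired_versions_manager.h"
#include "tensorflow_serving/core/availability_preserving_policy.h"
#include "tensorflow_serving/core/target.h"   // ConnectSourceToTarget
#include "tensorflow_serving/servables/tensorflow/saved_model_bundle_source_adapter.h"
#include "tensorflow_serving/sources/storage_path/file_system_storage_path_source.h"

namespace tfs = tensorflow::serving;

int main() {
  // 1. A file-system source that polls /models/my_model for new version directories.
  tfs::FileSystemStoragePathSourceConfig source_config;
  source_config.set_file_system_poll_wait_seconds(5);
  auto* servable = source_config.add_servables();
  servable->set_servable_name("my_model");      // hypothetical
  servable->set_base_path("/models/my_model");  // hypothetical
  std::unique_ptr<tfs::FileSystemStoragePathSource> path_source;
  TF_CHECK_OK(tfs::FileSystemStoragePathSource::Create(source_config, &path_source));

  // 2. An adapter that turns each discovered StoragePath into a SavedModelBundle Loader.
  std::unique_ptr<tfs::SavedModelBundleSourceAdapter> adapter;
  TF_CHECK_OK(tfs::SavedModelBundleSourceAdapter::Create(
      tfs::SavedModelBundleSourceAdapterConfig(), &adapter));

  // 3. A manager that loads/serves/unloads those Loaders according to a policy.
  tfs::AspiredVersionsManager::Options options;
  options.aspired_version_policy =
      std::make_unique<tfs::AvailabilityPreservingPolicy>();
  std::unique_ptr<tfs::AspiredVersionsManager> manager;
  TF_CHECK_OK(tfs::AspiredVersionsManager::Create(std::move(options), &manager));

  // Wire the chain downstream-first: adapter -> manager, then path source -> adapter.
  // Each call registers an aspired-versions callback on the upstream source.
  tfs::ConnectSourceToTarget(adapter.get(), manager.get());
  tfs::ConnectSourceToTarget(path_source.get(), adapter.get());
  return 0;
}

Connecting the downstream target first mirrors the order used inside ServerCore, so no aspired-version callbacks are dropped while the chain is being assembled.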

References:

https://blog.csdn.net/wuhuaiyu/article/details/77336372

https://blog.csdn.net/xlie/article/details/81949947

https://blog.csdn.net/appletesttest/article/details/89647758

https://naurril.github.io/howtos/2018/08/22/inside_tfs.html#par20
