Ceph RGW Pub-Sub Module

Overview

As its name suggests, the Pub-Sub module provides a publish-subscribe mechanism for object storage change events. Publish-subscribe architectures are used very widely, for example the pub/sub services of public clouds such as Google Cloud and AWS, or Redis's publish/subscribe mechanism. Publish-subscribe provides many-to-many asynchronous messaging that decouples senders from receivers.

Events are published to predefined topics. Topics can be subscribed to, and events can also be pulled from a topic. Once an event is acknowledged it is removed from the subscription history; unacknowledged events are acknowledged automatically after events_retention_days (7 days by default).

The Pub-Sub module is still under development, and the most recent full backport to Nautilus has not been released yet (https://github.com/ceph/ceph/pull/30579 is not included in the latest Nautilus release, 14.2.4).

The Pub-Sub module has four basic concepts:

  • Topic: a topic is associated with specific buckets (the association is made through a notification); one bucket can be associated with multiple topics, and each topic holds a list of subscriptions.
  • Notification: created for a given topic and bucket, a notification publishes the events of that bucket to the associated topic. A notification does not specify an endpoint (the push endpoint is specified on the topic instead). The notification APIs come in an S3-compatible flavour (bucket notification, an operation under the bucket) and a non-S3-compatible flavour.
  • Subscription: created against a topic, a subscription receives the events pushed for that topic and can also pull events from it. A subscription specifies the endpoint used for subsequent event pushes.
  • Event: an event occurs whenever a bucket or one of its objects changes, e.g. ObjectCreated or ObjectRemoved. Depending on whether it goes through a plain subscription or a notification, the event is stored and pushed, or only pushed as a notification (pushing requires a configured endpoint).

Usage

The pub-sub sync module is still under development and incomplete: the pub-sub related radosgw-admin commands have no CLI help text, and pulling events from the CLI triggers a core dump (see Q&A).

Configuring multisite

All sync modules are built on the multisite framework: multisite links several zones, each containing one or more RGWs, and synchronizes data and/or metadata between them according to the sync module in use.

What is usually called multisite is simply the default sync module, which synchronizes both data and metadata. The pub-sub module is another kind of sync module, so it likewise needs a multisite framework built from synchronizing zones, on top of which it performs its own data synchronization.

For a more detailed explanation of sync modules and multisite, see: RGW Sync Module and RGW Multisite.

The multisite configuration for the pub-sub module is illustrated below with two zones.

Create the pubsub zone and configure tier-type=pubsub and the tier-config:

bin/radosgw-admin -c ceph.conf realm create --rgw-realm=default --default --master

bin/radosgw-admin -c ceph.conf zonegroup modify --rgw-realm=default --rgw-zonegroup=default --default --master --endpoints="http://192.168.180.138:8000"

bin/radosgw-admin -c ceph.conf zone modify --rgw-realm=default --rgw-zonegroup=default --rgw-zone=default --access-key=bl_deliver --secret-key=bl_deliver --bl-deliver --default --master --endpoints="http://192.168.180.138:8000"

bin/radosgw-admin -c ceph.conf zone modify --rgw-realm=default --rgw-zonegroup=default --rgw-zone=default --access-key=ms_sync --secret-key=ms_sync --system

bin/radosgw-admin -c ceph.conf zone create --rgw-zone pubsub --rgw-zonegroup default --rgw-realm default --tier-type=pubsub --tier-config=uid=user1,data_bucket_prefix=pubsub,data_oid_prefix=pubsub-,events_retention_days=1 --sync-from-all=false --sync-from=default --endpoints="http://192.168.180.138:8001"
bin/radosgw-admin -c ceph.conf zone modify --rgw-realm default --rgw-zonegroup default --rgw-zone pubsub --access-key=ms_sync --secret-key=ms_sync --system

bin/radosgw-admin -c ceph.conf period update --commit

Configure one of the RGWs as the pubsub RGW:

[client.rgw.8001]
        rgw zone = pubsub

Check the sync status:

[root@stor14 build]# bin/radosgw-admin sync status -c ceph.conf
          realm 3528d7ca-aac2-4161-ab7b-e16d63e7faaa (default)
      zonegroup 8a290331-13a5-4822-81c7-b840ef228312 (default)
           zone 49c1fd93-060c-441b-9a64-7b9e10efc7f6 (default)
  metadata sync no sync (zone is master)
      data sync source: daff70b4-6df3-4d21-ae20-9673d06e89db (pubsub)
                        not syncing from zone
[root@stor14 build]# bin/radosgw-admin sync status -c ceph.conf --rgw-zone pubsub
          realm 3528d7ca-aac2-4161-ab7b-e16d63e7faaa (default)
      zonegroup 8a290331-13a5-4822-81c7-b840ef228312 (default)
           zone daff70b4-6df3-4d21-ae20-9673d06e89db (pubsub)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 49c1fd93-060c-441b-9a64-7b9e10efc7f6 (default)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
[root@stor14 build]# bin/radosgw-admin bucket list -c ceph.conf
[
    "test1"
]
[root@stor14 build]# bin/radosgw-admin bucket list -c ceph.conf --rgw-zone pubsub
[
    "test1"
]

Sync is working and bucket test1 has been replicated to the pubsub zone.

Pub-sub configuration:

Create a topic
[root@stor12 build]# radosgw-admin pubsub topic create --uid user1 --topic topic1 --rgw-zone pubsub --endpoints="http://192.168.180.136:8000" # the push target is set to node 136
List topics
[root@stor12 build]# radosgw-admin pubsub topics list --uid user1 --topic topic1 --rgw-zone pubsub
{
    "topics": [
        {
            "topic": {
                "user": "user1",
                "name": "topic1",
                "dest": {
                    "bucket_name": "",
                    "oid_prefix": "",
                    "push_endpoint": "",
                    "push_endpoint_args": "",
                    "push_endpoint_topic": ""
                },
                "arn": ""
            },
            "subs": []
        }
    ]
}
Get a topic
[root@stor12 build]# radosgw-admin pubsub topic get --uid user1 --topic topic1 --rgw-zone pubsub
{
    "topic": {
        "user": "user1",
        "name": "topic1",
        "dest": {
            "bucket_name": "",
            "oid_prefix": "",
            "push_endpoint": "",
            "push_endpoint_args": "",
            "push_endpoint_topic": ""
        },
        "arn": ""
    },
    "subs": []
}
Create a notification, specifying the bucket and associating it with a topic, so that the notification publishes the bucket's events to that topic
[root@stor12 build]# radosgw-admin pubsub notification create --uid user1 --topic topic1 --rgw-zone pubsub --bucket test1
Create a subscription
[root@stor12 build]# radosgw-admin pubsub sub create --sub-name sub1 --uid user1 --topic topic1 --rgw-zone pubsub --sub-push-endpoint="http://192.168.180.136:8000" --sub-dest-bucket test1
Get a subscription
[root@stor12 build]# radosgw-admin pubsub sub get --uid user1 --sub-name sub1 --rgw-zone pubsub
{
    "user": "user1",
    "name": "sub1",
    "topic": "topic1",
    "dest": {
        "bucket_name": "pubsubuser1-topic1",
        "oid_prefix": "pubsub-",
        "push_endpoint": "",
        "push_endpoint_args": "",
        "push_endpoint_topic": ""
    },
    "s3_id": ""
}
Pull events
[root@stor12 build]# radosgw-admin pubsub sub pull --sub-name sub1 --uid user1 --topic topic1 --rgw-zone pubsub
{
    "next_marker": "",
    "is_truncated": "false",
    "events": [
        {
            "id": "1576056050.369301.c7191f0d",
            "event": "OBJECT_CREATE",
            "timestamp": "2019-12-11T09:20:50.369301Z",
            "info": {
                "attrs": {
                    "mtime": "2019-12-11T09:20:48.036424Z"
                },
                "bucket": {
                    "bucket_id": "96733e46-48af-4b89-a3dc-3f3b39ffb2a3.4539.1",
                    "name": "test1",
                    "tenant": ""
                },
                "key": {
                    "instance": "",
                    "name": "f3"
                }
            }
        }
    ]
} 

Pub-Sub test

cd ceph/build
RGW_MULTI_TEST_CONF=./test_multi.conf nosetests -s --verbosity=2  ../src/test/rgw/test_multi.py -m "test_ps*"

Pub-Sub REST API

On the pub-sub REST API the Ceph implementation differs from the AWS standard. The comparison below is organized per API, listing the AWS behaviour, the Ceph behaviour, and notes.

Topic

AWS: the AWS Simple Notification Service (SNS) API contains many operations

  • GetTopic returns information for all topics
  • CreateTopic does not support specifying an endpoint

Ceph: only CreateTopic, DeleteTopic, ListTopics and GetTopic are implemented

  • GetTopic can target a single topic or return all topics
  • CreateTopic supports specifying an endpoint

Notes: Ceph implements only part of the AWS SNS API, but also adds some extensions of its own.

Subscription

AWS has no corresponding API, while Ceph provides a dedicated Subscription API. In the AWS pub-sub services the subscription concept is implicit in the notification: creating or deleting a notification creates or deletes the subscription.

Notification

AWS: the AWS S3 Bucket Notification API

  • GetNotification returns all notifications of a bucket
  • no object prefix/suffix filtering
  • the same event cannot be sent to different notifications
  • deleting a notification: delete the corresponding bucket, or set the notification configuration to empty

Ceph: two API flavours, one compatible with the AWS S3 Bucket Notification API and one non-S3-compatible

  • GetNotification can target a single notification
  • object prefix/suffix filtering is supported
  • the same event can be sent to different notifications
  • deleting a notification: must be deleted explicitly; setting the notification configuration to empty is not supported

Notes: Ceph implements both flavours and adds some extensions.

Event

AWS delivers events as notification pushes. In Ceph the response differs between the two API flavours (S3-compatible or not); see the event data structures in the code analysis below. Only a subset of event types is supported, and the coverage differs between bucket notifications and the pub-sub module; see #event-types.

Note that the pub-sub REST API must be accessed through an RGW endpoint belonging to the pub-sub zone.
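The pub-sub REST handlers derive from RGWHandler_REST_S3 (see the code layout later in this article), so requests must carry S3-style authentication and go to an RGW in the pub-sub zone. The following is only a rough Python sketch of such a signed-request helper, assuming botocore and requests are available; the endpoint URL, credentials and region (zonegroup) name are placeholders, not values tied to this setup:

import requests
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
from botocore.credentials import Credentials

# placeholders: an RGW endpoint in the pubsub zone and a user's credentials
PUBSUB_RGW = "http://192.168.180.138:8001"
CREDS = Credentials("ACCESS_KEY", "SECRET_KEY")

def signed(method, path, params=None):
    # sign the request with SigV4 (service "s3", region = zonegroup name) and send it
    req = AWSRequest(method=method, url=PUBSUB_RGW + path, params=params or {})
    SigV4Auth(CREDS, "s3", "default").add_auth(req)
    prepared = req.prepare()
    return requests.request(method, prepared.url, headers=dict(prepared.headers))

The signed() helper is reused in the illustrative snippets in the sections below.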

Topic

CREATE TOPIC

Create a topic (an example request follows the parameter list):

PUT /topics/<topic-name>[?push-endpoint=<endpoint>[&amqp-exchange=<exchange>][&amqp-ack-level=none|broker][&verify-ssl=true|false][&kafka-ack-level=none|broker]]
  • push-endpoint: URI of the endpoint that push notifications are sent to (only needed for notifications; a subscription created directly specifies its own endpoint). Three types are supported:
    • HTTP endpoint: http[s]://<fqdn>[:<port>]
    • AMQP endpoint: amqp://[<user>:<password>@]<fqdn>[:<port>][/<vhost>]
    • Kafka endpoint: kafka://<fqdn>[:<port>]
  • A topic is a resource; in responses it is represented as an ARN of the following form:
    • arn:aws:sns:<zone-group>:<tenant>:<topic>
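As a rough illustration using the signed() helper sketched above (the topic name and push endpoint URL are placeholders):

# create topic "topic1" that pushes to an HTTP endpoint (illustrative values)
resp = signed("PUT", "/topics/topic1",
              params={"push-endpoint": "http://192.168.180.136:8000"})
print(resp.status_code)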

GET TOPIC INFORMATION

Get topic information:

GET /topics/<topic-name>

The response is a JSON document:

{
    "topic":{
        "user":"",
        "name":"",
        "dest":{
            "bucket_name":"",
            "oid_prefix":"",
            "push_endpoint":"",
            "push_endpoint_args":""
        },
        "arn":""
    },
    "subs":[]
}

DELETE TOPIC

Delete the specified topic:

DELETE /topics/<topic-name>

LIST TOPICS

GET /topics

S3-compatible Notification

Notes on notifications:

  • creating a notification automatically creates a subscription with the same name as the notification id;
  • deleting a notification automatically deletes the subscription that was created with it;
  • deleting a bucket automatically deletes its notifications, but not the corresponding subscriptions, whose events therefore remain accessible;
  • an S3 notification is an operation under the bucket.

CREATE NOTIFICATION

Create a notification (publisher) for the given topic on the given bucket:

PUT /<bucket name>?notification HTTP/1.1

DELETE NOTIFICATION

Delete a specific notification, or all notifications, on a bucket:

DELETE /bucket?notification[=<notification-id>] HTTP/1.1

GET/LIST NOTIFICATION

Get a specific notification, or list all notifications on a bucket:

GET /bucket?notification[=<notification-id>] HTTP/1.1
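Because these are plain S3 bucket-notification calls, a standard S3 SDK can drive them. A minimal boto3 sketch follows; the endpoint, credentials, bucket name, notification id and topic ARN are all placeholders, and the topic is assumed to exist already:

import boto3

# placeholders: RGW endpoint, region (zonegroup) and user credentials
s3 = boto3.client("s3",
                  endpoint_url="http://192.168.180.138:8001",
                  region_name="default",
                  aws_access_key_id="ACCESS_KEY",
                  aws_secret_access_key="SECRET_KEY")

s3.put_bucket_notification_configuration(
    Bucket="test1",
    NotificationConfiguration={
        "TopicConfigurations": [{
            "Id": "notif1",                             # a same-named subscription is created automatically
            "TopicArn": "arn:aws:sns:default::topic1",  # ARN of an existing topic
            "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"],
        }]
    })

print(s3.get_bucket_notification_configuration(Bucket="test1"))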

Non-S3-compatible Notification

CREATE A NOTIFICATION

PUT /notifications/bucket/<bucket>?topic=<topic-name>[&events=<event>[,<event>]]

DELETE NOTIFICATION INFORMATION

DELETE /notifications/bucket/<bucket>?topic=<topic-name>

LIST NOTIFICATIONS

List all notifications defined on the given bucket:

GET /notifications/bucket/<bucket>
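An illustrative call with the signed() helper from the Topic section; the event names follow the Ceph-specific style seen in the pulled events in the Usage section (OBJECT_CREATE), but treat the exact list of names as an assumption:

# bind bucket "test1" to "topic1" for create/delete events (illustrative values)
resp = signed("PUT", "/notifications/bucket/test1",
              params={"topic": "topic1", "events": "OBJECT_CREATE,OBJECT_DELETE"})
print(resp.status_code)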

Subscription

CREATE A SUBSCRIPTION

Create a subscription.

PUT /subscriptions/<sub-name>?topic=<topic-name>[&push-endpoint=<endpoint>[&amqp-exchange=<exchange>][&amqp-ack-level=none|broker][&verify-ssl=true|false][&kafka-ack-level=none|broker]]

Request parameters:

  • topic-name: name of the topic

  • push-endpoint: URI of the endpoint that push notifications are sent to; the same three types are supported: http, amqp and kafka (an example follows)
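For example, again a sketch using the signed() helper, with names and URL as placeholders:

# subscribe "sub1" to "topic1" and push its events to an HTTP endpoint
resp = signed("PUT", "/subscriptions/sub1",
              params={"topic": "topic1",
                      "push-endpoint": "http://192.168.180.136:8000"})
print(resp.status_code)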

GET SUBSCRIPTION INFORMATION

Get information about the specified subscription:

GET /subscriptions/<sub-name>

Response:

{
    "user":"",
    "name":"",
    "topic":"",
    "dest":{
        "bucket_name":"",
        "oid_prefix":"",
        "push_endpoint":"",
        "push_endpoint_args":""
    },
    "s3_id":""
}

DELETE SUBSCRIPTION

Delete the specified subscription.

DELETE /subscriptions/<sub-name>

Events

PULL EVENTS

Pull the events associated with the specified subscription.

GET /subscriptions/<sub-name>?events[&max-entries=<max-entries>][&marker=<marker>]

Request parameters:

  • marker: pagination marker for the event list; if not specified, listing starts from the oldest events

  • max-entries: maximum number of events to return; the default is 100

ACK EVENT

Acknowledge an event; once acknowledged, the event is removed from the subscription history (a pull-and-ack sketch follows the parameter list below).

POST /subscriptions/<sub-name>?ack&event-id=<event-id>

Request parameters:

  • event-id: id of the event to acknowledge
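A rough pull-and-ack loop with the signed() helper from the Topic section (the subscription name is a placeholder; the "id" and "event" fields match the pull output shown in the Usage section):

# pull up to 10 events from sub1 and acknowledge each one (illustrative sketch)
resp = signed("GET", "/subscriptions/sub1",
              params={"events": "", "max-entries": "10"})
for ev in resp.json().get("events", []):
    print(ev["id"], ev["event"])
    signed("POST", "/subscriptions/sub1",
           params={"ack": "", "event-id": ev["id"]})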

Implementation of the Pub-Sub Module

Code organization

The Pub-Sub module code can be roughly divided into ten parts; the layering of the sub-modules is as follows:

0. sync service

After RGW was split into services, a separate sync module service was created.

See the analysis of the common sync module code for details.

services/svc_sync_modules.h
services/svc_sync_modules.cc
 
class RGWSI_SyncModules : public RGWServiceInstance

1. sync module base classes

Base classes for sync module instantiation, management and sync processing; they register the individual sub-modules, which inherit from them:

rgw_sync_module.h
rgw_sync_module.cc

class RGWDataSyncModule 
class RGWSyncModuleInstance // base class for sync module instances
class RGWSyncModule
class RGWSyncModulesManager
class RGWStatRemoteObjCBCR : public RGWCoroutine
class RGWCallStatRemoteObjCR : public RGWCoroutine
void rgw_register_sync_modules(RGWSyncModulesManager *modules_manager);

2. pubsub submodule instantiation and management

Instantiation and management of the pubsub submodule, derived from rgw_sync_module:

rgw_sync_module_pubsub.h
rgw_sync_module_pubsub.cc

class RGWPSSyncModule : public RGWSyncModule // creates the RGWPSSyncModuleInstance instance
class RGWPSSyncModuleInstance : public RGWSyncModuleInstance
class RGWPSDataSyncModule : public RGWDataSyncModule // data sync implementation; in practice it produces various coroutine function objects

struct PSConfig // basic pubsub configuration: S3 user, topics list, subs list, etc.
struct PSTopicConfig // topic configuration: topic name, list of related subs
struct PSSubConfig // subscription configuration: the endpoint to push to, the related topic, etc.
struct PSNotificationConfig // notification configuration: the related topic
struct objstore_event // object store event definition: id, bucket, obj, mtime, attrs list, etc.
class PSEvent
class RGWSingletonCR
class PSSubscription
class PSManager
class RGWPSFindBucketTopicsCR : public RGWCoroutine
class RGWPSHandleObjEventCR : public RGWCoroutine
class RGWPSHandleRemoteObjCR : public RGWCallStatRemoteObjCR 
class RGWPSGenericObjEventCBCR : public RGWCoroutine

class RGWPSDataSyncModule : public RGWDataSyncModule

3. pubsub op implementation

Implementation of the pubsub ops (before these two files were added, this code lived in rgw_sync_module_pubsub_rest):

rgw_rest_pubsub_common.h
rgw_rest_pubsub_common.cc

// create a topic
class RGWPSCreateTopicOp : public RGWDefaultResponseOp
// list all topics
class RGWPSListTopicsOp : public RGWOp
// get topic information
class RGWPSGetTopicOp : public RGWOp
// delete a topic
class RGWPSDeleteTopicOp : public RGWDefaultResponseOp
// create a subscription
class RGWPSCreateSubOp : public RGWDefaultResponseOp
// get subscription information (including push-endpoint if exist)
class RGWPSGetSubOp : public RGWOp
// delete subscription
class RGWPSDeleteSubOp : public RGWDefaultResponseOp
// acking of an event
class RGWPSAckSubEventOp : public RGWDefaultResponseOp
// fetching events from a subscription
// depending on whether the subscription was created via s3 compliant API or not
// the matching events will be returned
class RGWPSPullSubEventsOp : public RGWOp
// notification creation
class RGWPSCreateNotifOp : public RGWDefaultResponseOp
// delete a notification
class RGWPSDeleteNotifOp : public RGWDefaultResponseOp
// get topics/notifications on a bucket
class RGWPSListNotifsOp : public RGWOp

4. pubsub REST API

The pubsub REST API, derived from the classes in rgw_rest_pubsub_common.h:

rgw_sync_module_pubsub_rest.h
rgw_sync_module_pubsub_rest.cc

// command: PUT /topics/<topic-name>[&push-endpoint=<endpoint>[&<arg1>=<value1>]]
class RGWPSCreateTopic_ObjStore : public RGWPSCreateTopicOp
// command: GET /topics
class RGWPSListTopics_ObjStore : public RGWPSListTopicsOp
// command: GET /topics/<topic-name>
class RGWPSGetTopic_ObjStore : public RGWPSGetTopicOp
// command: DELETE /topics/<topic-name>
class RGWPSDeleteTopic_ObjStore : public RGWPSDeleteTopicOp
// ceph specific topics handler factory
class RGWHandler_REST_PSTopic : public RGWHandler_REST_S3
// command: PUT /subscriptions/<sub-name>?topic=<topic-name>[&push-endpoint=<endpoint>[&<arg1>=<value1>]]...
class RGWPSCreateSub_ObjStore : public RGWPSCreateSubOp
// command: GET /subscriptions/<sub-name>
class RGWPSGetSub_ObjStore : public RGWPSGetSubOp
// command: DELETE /subscriptions/<sub-name>
class RGWPSDeleteSub_ObjStore : public RGWPSDeleteSubOp
....

5. pubsub resource read/write implementation

Create/delete/get implementation for the pubsub resources: data structure and method definitions:

rgw_pubsub.h
rgw_pubsub.cc

struct rgw_pubsub_event; // event
struct rgw_pubsub_sub_dest; // subscription destination
struct rgw_pubsub_sub_config; // subscription configuration
struct rgw_pubsub_topic // topic
struct rgw_pubsub_topic_subs // list of subscriptions of a topic
struct rgw_pubsub_bucket_topics // list of topics of a bucket
struct rgw_pubsub_user_topics // a user's topic list

// pubsub access methods
class RGWUserPubSub {
  class Bucket;
  class Sub;
};

6. pubsub push implementation

The pubsub push implementation and the endpoint definitions (http, amqp, kafka, ...); they are used by publish() for pubsub notifications:

rgw_pubsub_push.h
rgw_pubsub_push.cc

class RGWPubSubEndpoint // endpoint base class
class RGWPubSubHTTPEndpoint : public RGWPubSubEndpoint // HTTP endpoint implementation
class RGWPubSubAMQPEndpoint : public RGWPubSubEndpoint { // AMQP endpoint implementation
  class NoAckPublishCR : public RGWCoroutine
  class AckPublishCR : public RGWCoroutine, public RGWIOProvider
}
class RGWPubSubKafkaEndpoint : public RGWPubSubEndpoint // Kafka endpoint implementation

7. pubsub: allow pubsub REST API on master

rgw_rest_pubsub.h
rgw_rest_pubsub.cc

class RGWHandler_REST_PSNotifs_S3 : public RGWHandler_REST_S3
class RGWHandler_REST_PSTopic_AWS : public RGWHandler_REST
class RGWPSCreateTopic_ObjStore_AWS : public RGWPSCreateTopicOp
class RGWPSListTopics_ObjStore_AWS : public RGWPSListTopicsOp

8. publish notification

The publish function:

int publish(const req_state* s,
            const ceph::real_time& mtime,
            const std::string& etag,
            EventType event_type,
            rgw::sal::RGWRadosStore* store);
rgw_notify.h
rgw_notify.cc

rgw_notify_event_types.h
rgw_notify_event_types.cc 
 
// event types
  enum EventType {
    ObjectCreated                        = 0xF,
    ObjectCreatedPut                     = 0x1,
    ObjectCreatedPost                    = 0x2,
    ObjectCreatedCopy                    = 0x4,
    ObjectCreatedCompleteMultipartUpload = 0x8,
    ObjectRemoved                        = 0xF0,
    ObjectRemovedDelete                  = 0x10,
    ObjectRemovedDeleteMarkerCreated     = 0x20,
    UnknownEvent                         = 0x100
  };

rgw_op.cc: at the end of each op's execution, a request is sent to the notification manager via rgw::notify::publish()
rgw_rest_s3.cc: dispatches the pubsub ops
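The enum values above are bit flags, so a generic type such as ObjectCreated (0xF) covers all of its specific creation sub-types. A small Python illustration of that matching logic (the constants mirror the enum; the matches() helper is purely illustrative, not RGW code):

# event-type bit flags mirroring rgw_notify_event_types.h
ObjectCreatedPut                     = 0x1
ObjectCreatedPost                    = 0x2
ObjectCreatedCopy                    = 0x4
ObjectCreatedCompleteMultipartUpload = 0x8
ObjectCreated                        = 0xF   # covers all creation sub-types
ObjectRemovedDelete                  = 0x10
ObjectRemovedDeleteMarkerCreated     = 0x20
ObjectRemoved                        = 0xF0  # covers the removal sub-types

def matches(subscribed, actual):
    # a subscription to a generic type matches any of its sub-types (illustrative)
    return (subscribed & actual) != 0

assert matches(ObjectCreated, ObjectCreatedPut)        # generic covers specific
assert not matches(ObjectRemoved, ObjectCreatedCopy)   # different families do not overlap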

9. Supporting resources

rgw_arn.h: AWS resource namespace (ARN), see [Other - ARN] below
rgw_amqp.h: AMQP resources
rgw_kafka.h: Kafka resources

Publish-subscribe implementation

Setting up publications and subscriptions

The first step is configuring publications and subscriptions, which is fairly simple.

Topics and subscriptions are created mainly through the radosgw-admin CLI and the HTTP API.

Creating a subscription through the CLI is used as the example below; the other paths are similar.

rgw_admin.cc calls straight into rgw_pubsub.cc. As explained above, rgw_pubsub.cc implements create/delete/get for the pubsub resources (topics, subscriptions and so on), including their data structures and methods.

  if (opt_cmd == OPT_PUBSUB_SUB_CREATE) {
    if (get_tier_type(store) != "pubsub") {
      cerr << "ERROR: only pubsub tier type supports this command" << std::endl;
      return EINVAL;
    }
    ...

    rgw_pubsub_topic_subs topic;
    int ret = ups.get_topic(topic_name, &topic);
    ...

    rgw_pubsub_sub_dest dest_config;
    dest_config.bucket_name = sub_dest_bucket;
    dest_config.oid_prefix = sub_oid_prefix;
    dest_config.push_endpoint = sub_push_endpoint;

    auto psmodule = static_cast<RGWPSSyncModuleInstance *>(store->getRados()->get_sync_module().get());
    auto conf = psmodule->get_effective_conf();
    ...
    auto sub = ups.get_sub(sub_name);
    ret = sub->subscribe(topic_name, dest_config); // persist the configured subscription info
    ...
  }

Handling of the subscription configuration

int RGWUserPubSub::Sub::subscribe(const string& topic, const rgw_pubsub_sub_dest& dest, const std::string& s3_id)
{
  RGWObjVersionTracker user_objv_tracker;
  rgw_pubsub_user_topics topics;
  rgw::sal::RGWRadosStore *store = ps->store;
 
  int ret = ps->read_user_topics(&topics, &user_objv_tracker);
  if (ret < 0) {
    ldout(store->ctx(), 1) << "ERROR: failed to read topics info: ret=" << ret << dendl;
    return ret != -ENOENT ? ret : -EINVAL;
  }
  auto iter = topics.topics.find(topic);
  ...
  auto& t = iter->second;
  rgw_pubsub_sub_config sub_conf;
  sub_conf.user = ps->user;
  sub_conf.name = sub;
  sub_conf.topic = topic;
  sub_conf.dest = dest;
  sub_conf.s3_id = s3_id;
  t.subs.insert(sub);
  ret = ps->write_user_topics(topics, &user_objv_tracker);
  if (ret < 0) {
    ldout(store->ctx(), 1) << "ERROR: failed to write topics info: ret=" << ret << dendl;
    return ret;
  }
  // add the configured subscription info to the current user's pubsub context
  ret = write_sub(sub_conf, nullptr);
  if (ret < 0) {
    ldout(store->ctx(), 1) << "ERROR: failed to write subscription info: ret=" << ret << dendl;
    return ret;
  }
  return 0;
}

Event handling for publish and subscribe

Next comes the common sync module processing, starting from rgw service initialization:

  1. The sync module service RGWSI_SyncModules is instantiated and produces the data handler, data_handler: RGWPSDataSyncModule, which provides init(), start_sync(), sync_object(), remove_object(), create_delete_marker().
  2. The already-running data sync thread starts the RGWDataSyncCR coroutine, which obtains the data handler.
  3. The data_handler starts syncing.
  4. The events under topics->subs are processed: the event object is stored and/or pushed to the remote endpoint (http, amqp or kafka).

1. Sync module service instantiation and data handler creation

First look at svc_sync_modules.h:

class RGWSI_SyncModules : public RGWServiceInstance
{
  RGWSyncModulesManager *sync_modules_manager{nullptr};
  RGWSyncModuleInstanceRef sync_module;
 
  struct Svc {
    RGWSI_Zone *zone{nullptr};
  } svc;
public:
  RGWSI_SyncModules(CephContext *cct): RGWServiceInstance(cct) {}
  ~RGWSI_SyncModules();
  RGWSyncModulesManager *get_manager() {
    return sync_modules_manager;
  }
  void init(RGWSI_Zone *zone_svc);
  int do_start() override;
  RGWSyncModuleInstanceRef& get_sync_module() { return sync_module; }
};
 
// after initialization, the registered sync_modules_manager is called to create the corresponding sync_module instance
int RGWSI_SyncModules::do_start()
{
  auto& zone_public_config = svc.zone->get_zone();
  // create the corresponding instance
  int ret = sync_modules_manager->create_instance(cct, zone_public_config.tier_type, svc.zone->get_zone_params().tier_config, &sync_module);
  ...
    return ret;
  }
...
  return 0;
}

Creating the data_handler

RGWPSSyncModule::create_instance() is called next; it simply constructs an RGWPSSyncModuleInstance object, from which the data handler is obtained.

rgw_sync_module_pubsub.cc 

int RGWPSSyncModule::create_instance(CephContext *cct, const JSONFormattable& config, RGWSyncModuleInstanceRef *instance) {
  instance->reset(new RGWPSSyncModuleInstance(cct, config));
  return 0;
}

// what the RGWPSSyncModuleInstance constructor does
RGWPSSyncModuleInstance::RGWPSSyncModuleInstance(CephContext *cct, const JSONFormattable& config)
{
  // the crucial step: create the data_handler
  // subsequent data sync operations are mainly handled by the data_handler: init(), start_sync(), sync_object(), remove_object(), create_delete_marker()
  data_handler = std::unique_ptr<RGWPSDataSyncModule>(new RGWPSDataSyncModule(cct, config));
  string jconf = json_str("conf", *data_handler->get_conf());
  JSONParser p;
  if (!p.parse(jconf.c_str(), jconf.size())) {
    ldout(cct, 1) << "ERROR: failed to parse sync module effective conf: " << jconf << dendl;
    effective_conf = config;
  } else {
    effective_conf.decode_json(&p);
  }
// below, the AMQP and Kafka endpoint managers are initialized depending on the build configuration
#ifdef WITH_RADOSGW_AMQP_ENDPOINT
  if (!rgw::amqp::init(cct)) {
    ldout(cct, 1) << "ERROR: failed to initialize AMQP manager in pubsub sync module" << dendl;
  }
#endif
#ifdef WITH_RADOSGW_KAFKA_ENDPOINT
  if (!rgw::kafka::init(cct)) {
    ldout(cct, 1) << "ERROR: failed to initialize Kafka manager in pubsub sync module" << dendl;
  }
#endif
}

2. Obtaining the data handler and starting the sync

So how does the handler work? RGWDataSyncCR is likewise a coroutine wrapped in a function object.

How the RGWDataSyncCR coroutine is started is covered in the sync module write-up.

class RGWDataSyncCR : public RGWCoroutine {
  RGWDataSyncEnv *sync_env;
  uint32_t num_shards;
  rgw_data_sync_status sync_status;
  ...
public:
  RGWDataSyncCR(RGWDataSyncEnv *_sync_env, uint32_t _num_shards, RGWSyncTraceNodeRef& _tn, bool *_reset_backoff);
  ~RGWDataSyncCR() override;

  int operate() override {
    reenter(this) {
      // read the sync status
      yield call(new RGWReadDataSyncStatusCoroutine(sync_env, &sync_status));
      // get the data handler
      data_sync_module = sync_env->sync_module->get_data_handler();
      ...
      // initialize the sync state
      if ((rgw_data_sync_info::SyncState)sync_status.sync_info.state == rgw_data_sync_info::StateInit) {
        tn->log(20, SSTR("init"));
        sync_status.sync_info.num_shards = num_shards;
        uint64_t instance_id;
        instance_id = ceph::util::generate_random_number<uint64_t>();
        yield call(new RGWInitDataSyncStatusCoroutine(sync_env, num_shards, instance_id, tn, &sync_status));
        if (retcode < 0) {
          tn->log(0, SSTR("ERROR: failed to init sync, retcode=" << retcode));
          return set_cr_error(retcode);
        }
        // sets state = StateBuildingFullSyncMaps

        *reset_backoff = true;
      }
      // initialize the data handler
      data_sync_module->init(sync_env, sync_status.sync_info.instance_id);

      // sync state update during full sync
      if  ((rgw_data_sync_info::SyncState)sync_status.sync_info.state == rgw_data_sync_info::StateBuildingFullSyncMaps) {
        tn->log(10, SSTR("building full sync maps"));
        /* call sync module init here */
        sync_status.sync_info.num_shards = num_shards;
        yield call(data_sync_module->init_sync(sync_env));
        if (retcode < 0) {
          tn->log(0, SSTR("ERROR: sync module init_sync() failed, retcode=" << retcode));
          return set_cr_error(retcode);
        }
        /* state: building full sync maps */
        yield call(new RGWListBucketIndexesCR(sync_env, &sync_status));
        if (retcode < 0) {
          tn->log(0, SSTR("ERROR: failed to build full sync maps, retcode=" << retcode));
          return set_cr_error(retcode);
        }
        sync_status.sync_info.state = rgw_data_sync_info::StateSync;

        /* update new state */
        yield call(set_sync_info_cr());
        if (retcode < 0) {
          tn->log(0, SSTR("ERROR: failed to write sync status, retcode=" << retcode));
          return set_cr_error(retcode);
        }

        *reset_backoff = true;
      }
      // call the subclass's start_sync()
      yield call(data_sync_module->start_sync(sync_env));

      // per-shard handling while syncing
      yield {
        if  ((rgw_data_sync_info::SyncState)sync_status.sync_info.state == rgw_data_sync_info::StateSync) {
          tn->log(10, SSTR("spawning " << num_shards << " shards sync"));
          for (map<uint32_t, rgw_data_sync_marker>::iterator iter = sync_status.sync_markers.begin();
               iter != sync_status.sync_markers.end(); ++iter) {
            RGWDataSyncShardControlCR *cr = new RGWDataSyncShardControlCR(sync_env, sync_env->store->svc()->zone->get_zone_params().log_pool,
                                                                          iter->first, iter->second, tn);
            ...
          }
        }
      }

      return set_cr_done();
    }
    return 0;
  }
...
};

The data handler works as follows internally:

// the constructor mainly does two things:
//  1. assign env and conf
//  2. initialize env
  RGWPSDataSyncModule(CephContext *cct, const JSONFormattable& config) : env(std::make_shared<PSEnv>()), conf(env->conf) {
    env->init(cct, config);
  }
// PSConfigRef& conf holds the current basic pubsub configuration
// PSEnv is structured as follows:
struct PSEnv {
  PSConfigRef conf;
  shared_ptr<RGWUserInfo> data_user_info;
  PSManagerRef manager; // contains a subscription-related class GetSubCR, a coroutine usable as a function object
  PSEnv() : conf(make_shared<PSConfig>()),
            data_user_info(make_shared<RGWUserInfo>()) {}
  void init(CephContext *cct, const JSONFormattable& config) {
    conf->init(cct, config); // initialize the PSConfig assigned above
  }
}
 
// besides the constructor, RGWPSDataSyncModule has several other member functions
// they drive the data sync processing; the actual logic lives in the returned function objects
void init(RGWDataSyncEnv *sync_env, uint64_t instance_id);
RGWCoroutine *start_sync(RGWDataSyncEnv *sync_env);
RGWCoroutine *sync_object(RGWDataSyncEnv *sync_env, RGWBucketInfo& bucket_info, 
      rgw_obj_key& key, std::optional<uint64_t> versioned_epoch, rgw_zone_set *zones_trace) override {
    ldout(sync_env->cct, 10) << conf->id << ": sync_object: b=" << bucket_info.bucket << 
          " k=" << key << " versioned_epoch=" << versioned_epoch.value_or(0) << dendl;
    return new RGWPSHandleObjCreateCR(sync_env, bucket_info, key, env, versioned_epoch); // return an RGWPSHandleObjCreateCR function object
  }
RGWCoroutine *remove_object(...);
RGWCoroutine *create_delete_marker(...);
By this point the sync between the zones has already started.

Taking sync_object() as an example, let us trace the execution of the returned RGWCoroutine function object.
RGWPSHandleObjCreateCR wraps a coroutine internally; executing that coroutine is the actual implementation.
class RGWPSHandleObjCreateCR : public RGWCoroutine {
  ...
public:
  RGWPSHandleObjCreateCR(RGWDataSyncEnv *_sync_env,...
  ~RGWPSHandleObjCreateCR() override {}
  // operate() is overridden here; the coroutine body is built with boost::asio::coroutine (the reenter/yield macros)
  int operate() override {
    reenter(this) { // the reenter() scope defines the coroutine body
      yield call(new RGWPSFindBucketTopicsCR(sync_env, env, bucket_info.owner, // RGWPSFindBucketTopicsCR is also a coroutine wrapped in a function object; it fetches the bucket's topics
                                             bucket_info.bucket, key,
                                             rgw::notify::ObjectCreated,
                                             &topics));
      ...
      yield call(new RGWPSHandleRemoteObjCR(sync_env, bucket_info, key, env, versioned_epoch, topics)); // the sync_object() handling once the topics are known
      ...
    }
    return 0;
  }
};
  
// not a self-contained function object this time; it provides a callback
class RGWPSHandleRemoteObjCR : public RGWCallStatRemoteObjCR {
  PSEnvRef env;
  std::optional<uint64_t> versioned_epoch;
  TopicsRef topics;
public:
  RGWPSHandleRemoteObjCR(RGWDataSyncEnv *_sync_env,
                        RGWBucketInfo& _bucket_info, rgw_obj_key& _key,
                        PSEnvRef _env, std::optional<uint64_t> _versioned_epoch,
                        TopicsRef& _topics) : RGWCallStatRemoteObjCR(_sync_env, _bucket_info, _key),   // base class RGWCallStatRemoteObjCR, which triggers the callback allocate_callback()
                                                           env(_env), versioned_epoch(_versioned_epoch),
                                                           topics(_topics) {
  }
 
  ~RGWPSHandleRemoteObjCR() override {}
  // the callback returns a coroutine wrapped in an RGWPSHandleRemoteObjCBCR function object
  // it overrides the base-class virtual function RGWCallStatRemoteObjCR::allocate_callback()
  RGWStatRemoteObjCBCR *allocate_callback() override {
    return new RGWPSHandleRemoteObjCBCR(sync_env, bucket_info, key, env, versioned_epoch, topics);
  }
};
 
// coroutine invoked on remote object creation
class RGWPSHandleRemoteObjCBCR : public RGWStatRemoteObjCBCR {
  RGWDataSyncEnv *sync_env;
  PSEnvRef env;
  std::optional<uint64_t> versioned_epoch;
  EventRef<rgw_pubsub_event> event; // ceph event
  EventRef<rgw_pubsub_s3_record> record; // s3 record
  TopicsRef topics;
public:
  RGWPSHandleRemoteObjCBCR(RGWDataSyncEnv *_sync_env,
                          RGWBucketInfo& _bucket_info, rgw_obj_key& _key,
                          PSEnvRef _env, std::optional<uint64_t> _versioned_epoch,
                          TopicsRef& _topics) : RGWStatRemoteObjCBCR(_sync_env, _bucket_info, _key),
                                                                      sync_env(_sync_env),
                                                                      env(_env),
                                                                      versioned_epoch(_versioned_epoch),
                                                                      topics(_topics) {
  }
  int operate() override {
    reenter(this) {
      ldout(sync_env->cct, 20) << ": stat of remote obj: z=" << sync_env->source_zone
                               << " b=" << bucket_info.bucket << " k=" << key << " size=" << size << " mtime=" << mtime
                               << " attrs=" << attrs << dendl;
      {
        std::vector<std::pair<std::string, std::string> > attrs;
        for (auto& attr : attrs) {
          string k = attr.first;
          if (boost::algorithm::starts_with(k, RGW_ATTR_PREFIX)) {
            k = k.substr(sizeof(RGW_ATTR_PREFIX) - 1);
          }
          attrs.push_back(std::make_pair(k, attr.second));
        }
        // at this point we don't know whether we need the ceph event or S3 record
        // this is why both are created here, once we have information about the
        // subscription, we will store/push only the relevant ones
        make_event_ref(sync_env->cct,
                       bucket_info.bucket, key,
                       mtime, &attrs,
                       rgw::notify::ObjectCreated, &event);
        make_s3_record_ref(sync_env->cct,
                       bucket_info.bucket, bucket_info.owner, key,
                       mtime, &attrs,
                       rgw::notify::ObjectCreated, &record);
      }
      // from here each topic and subscription is processed; this is the core of pubsub event handling
      // depending on the subscription, the ceph event or the s3 record is stored/pushed
      yield call(new RGWPSHandleObjEventCR(sync_env, env, bucket_info.owner, event, record, topics));
      if (retcode < 0) {
        return set_cr_error(retcode);
      }
      return set_cr_done();
    }
    return 0;
  }
};

3. Event handling under topics->subs

RGWPSHandleObjEventCR likewise wraps a coroutine in a function object.

This is the core of the processing:

- iterate over all topics of the bucket/object

- then iterate over all subscriptions of each topic; depending on whether the subscription was made through the S3-compatible API or not, the handling splits into two variants, each of which:

  • stores the event in the current cluster;
  • pushes the event to the configured endpoint: an http, amqp or kafka endpoint

class RGWPSHandleObjEventCR : public RGWCoroutine {
  ...
public:
  RGWPSHandleObjEventCR(RGWDataSyncEnv* const _sync_env,....
 
  int operate() override {
    reenter(this) {
      ldout(sync_env->cct, 20) << ": handle event: obj: z=" << sync_env->source_zone
                               << " event=" << json_str("event", *event, false)
                               << " owner=" << owner << dendl;
 
      ldout(sync_env->cct, 20) << "pubsub: " << topics->size() << " topics found for path" << dendl;
      
      // outside caller should check that
      ceph_assert(!topics->empty());
 
      if (perfcounter) perfcounter->inc(l_rgw_pubsub_event_triggered);
 
      // loop over all topics related to the bucket/object
      for (titer = topics->begin(); titer != topics->end(); ++titer) {
        ldout(sync_env->cct, 20) << ": notification for " << event->source << ": topic=" <<
          (*titer)->name << ", has " << (*titer)->subs.size() << " subscriptions" << dendl;
        // loop over all subscriptions of the topic
        for (siter = (*titer)->subs.begin(); siter != (*titer)->subs.end(); ++siter) {
          ldout(sync_env->cct, 20) << ": subscription: " << *siter << dendl;
          has_subscriptions = true;
          sub_conf_found = false;
          // try to read subscription configuration from global/user cond
          // configuration is considered missing only if does not exist in either
          for (oiter = owners.begin(); oiter != owners.end(); ++oiter) {
            yield PSManager::call_get_subscription_cr(sync_env, env->manager, this, *oiter, *siter, &sub);
            if (retcode < 0) {
              if (sub_conf_found) {
                // not a real issue, sub conf already found
                retcode = 0;
              }
              last_sub_conf_error = retcode;
              continue;
            }
            sub_conf_found = true;
            // handle separately depending on whether the subscription was made via the S3-compatible API
            if (sub->sub_conf->s3_id.empty()) {
              // subscription was not made by S3 compatible API
              ldout(sync_env->cct, 20) << "storing event for subscription=" << *siter << " owner=" << *oiter << " ret=" << retcode << dendl;
              // store the event: the RGWObjectSimplePutCR coroutine stores the event object (rgw_cr_tools.h, rgw_cr_rados.h)
              yield call(PSSubscription::store_event_cr(sync_env, sub, event)); // non-S3-compatible rgw_pubsub_event
              if (retcode < 0) {
                if (perfcounter) perfcounter->inc(l_rgw_pubsub_store_fail);
                ldout(sync_env->cct, 1) << "ERROR: failed to store event for subscription=" << *siter << " ret=" << retcode << dendl;
              } else {
                if (perfcounter) perfcounter->inc(l_rgw_pubsub_store_ok);
                event_handled = true;
              }
              if (sub->sub_conf->push_endpoint) {
                ldout(sync_env->cct, 20) << "push event for subscription=" << *siter << " owner=" << *oiter << " ret=" << retcode << dendl;
                // push the event to the endpoint configured in the subscription
                yield call(PSSubscription::push_event_cr(sync_env, sub, event)); // non-S3-compatible rgw_pubsub_event
                if (retcode < 0) {
                  if (perfcounter) perfcounter->inc(l_rgw_pubsub_push_failed);
                  ldout(sync_env->cct, 1) << "ERROR: failed to push event for subscription=" << *siter << " ret=" << retcode << dendl;
                } else {
                  if (perfcounter) perfcounter->inc(l_rgw_pubsub_push_ok);
                  event_handled = true;
                }
              }
            } else {
              // subscription was made by S3 compatible API
              ldout(sync_env->cct, 20) << "storing record for subscription=" << *siter << " owner=" << *oiter << " ret=" << retcode << dendl;
              record->configurationId = sub->sub_conf->s3_id;
              yield call(PSSubscription::store_event_cr(sync_env, sub, record)); // S3-compatible rgw_pubsub_s3_record
              if (retcode < 0) {
                if (perfcounter) perfcounter->inc(l_rgw_pubsub_store_fail);
                ldout(sync_env->cct, 1) << "ERROR: failed to store record for subscription=" << *siter << " ret=" << retcode << dendl;
              } else {
                if (perfcounter) perfcounter->inc(l_rgw_pubsub_store_ok);
                event_handled = true;
              }
              if (sub->sub_conf->push_endpoint) {
                  ldout(sync_env->cct, 20) << "push record for subscription=" << *siter << " owner=" << *oiter << " ret=" << retcode << dendl;
                yield call(PSSubscription::push_event_cr(sync_env, sub, record)); // S3-compatible rgw_pubsub_s3_record
                if (retcode < 0) {
                  if (perfcounter) perfcounter->inc(l_rgw_pubsub_push_failed);
                  ldout(sync_env->cct, 1) << "ERROR: failed to push record for subscription=" << *siter << " ret=" << retcode << dendl;
                } else {
                  if (perfcounter) perfcounter->inc(l_rgw_pubsub_push_ok);
                  event_handled = true;
                }
              }
            }
          }
          if (!sub_conf_found) {
            // could not find conf for subscription at user or global levels
            ...
          }
        }
      }
      ....
      return set_cr_done();
    }
    return 0;
  }
};

The details of the event handling also differ depending on whether the S3-compatible API is used.

rgw_pubsub_s3_record is defined exactly according to the AWS S3 Event Message Structure standard; version 2.1 is currently used (S3 no longer uses version 2.0).
The non-S3-compatible, Ceph-specific rgw_pubsub_event is much simpler and only records the essentials: the event id, the event name, and the event object event_obj to be stored.
struct rgw_pubsub_s3_record {
  constexpr static const char* const json_type_single = "Record";
  constexpr static const char* const json_type_plural = "Records";
  // 2.1
  std::string eventVersion;
  // aws:s3
  std::string eventSource;
  // zonegroup
  std::string awsRegion;
  // time of the request
  ceph::real_time eventTime;
  // type of the event
  std::string eventName;
  // user that sent the request (not implemented)
  std::string userIdentity;
  // IP address of source of the request (not implemented)
  std::string sourceIPAddress;
  // request ID (not implemented)
  std::string x_amz_request_id;
  // radosgw that received the request
  std::string x_amz_id_2;
  // 1.0
  std::string s3SchemaVersion;
  // ID received in the notification request
  std::string configurationId;
  // bucket name
  std::string bucket_name;
  // bucket owner (not implemented)
  std::string bucket_ownerIdentity;
  // bucket ARN; see the ARN section at the end of this article
  std::string bucket_arn;
  // object key
  std::string object_key;
  // object size (not implemented)
  uint64_t object_size;
  // object etag
  std::string object_etag;
  // object version id bucket is versioned
  std::string object_versionId;
  // hexadecimal value used to determine event order for specific key
  std::string object_sequencer;
  // this is an rgw extension (not S3 standard)
  // used to store a globally unique identifier of the event
  // that could be used for acking
  std::string id;
  // this is an rgw extension holding the internal bucket id
  std::string bucket_id;
  // meta data
  std::map<std::string, std::string> x_meta_map;
...
}
The S3-compatible rgw_pubsub_s3_record follows the AWS S3 Event Message Structure standard.
The current version of the standard is v2.2:
{  
   "Records":[  
      {  
         "eventVersion":"2.2",
         "eventSource":"aws:s3",
         "awsRegion":"us-west-2",
         "eventTime":The time, in ISO-8601 format, for example, 1970-01-01T00:00:00.000Z, when Amazon S3 finished processing the request,
         "eventName":"event-type",
         "userIdentity":{  
            "principalId":"Amazon-customer-ID-of-the-user-who-caused-the-event"
         },
         "requestParameters":{  
            "sourceIPAddress":"ip-address-where-request-came-from"
         },
         "responseElements":{  
            "x-amz-request-id":"Amazon S3 generated request ID",
            "x-amz-id-2":"Amazon S3 host that processed the request"
         },
         "s3":{  
            "s3SchemaVersion":"1.0",
            "configurationId":"ID found in the bucket notification configuration",
            "bucket":{  
               "name":"bucket-name",
               "ownerIdentity":{  
                  "principalId":"Amazon-customer-ID-of-the-bucket-owner"
               },
               "arn":"bucket-ARN" // 見ARN 說明
            },
            "object":{  
               "key":"object-key",
               "size":object-size,
               "eTag":"object eTag",
               "versionId":"object version if bucket is versioning-enabled, otherwise null",
               "sequencer": "a string representation of a hexadecimal value used to determine event sequence, 
                   only used with PUTs and DELETEs"
            }
         },
         "glacierEventData": {
            "restoreEventData": {
               "lifecycleRestorationExpiryTime": "The time, in ISO-8601 format, for example, 1970-01-01T00:00:00.000Z, of Restore Expiry",
               "lifecycleRestoreStorageClass": "Source storage class for restore"
            }
         }
      }
   ]
}
Comparing rgw_pubsub_s3_record with the S3 standard, RGW follows the standard except for its own extensions, most notably the globally unique event id used later for acknowledgement (and the internal bucket_id).
 
struct rgw_pubsub_event {
  constexpr static const char* const json_type_single = "event";
  constexpr static const char* const json_type_plural = "events";
  std::string id; // event id
  std::string event_name; // event name
  std::string source; // bucket and object where the event happened: bucket.name + "/" + key.name
  ceph::real_time timestamp; // time the event happened
  JSONFormattable info; // essentially struct objstore_event: bucket, key, mtime, attrs, etc.
}


- Event storage goes through the rgw_cr_* machinery and is written asynchronously to the RADOS cluster.

  Events are stored in a dedicated bucket of a dedicated user and cannot be accessed directly; they can only be read through the provided API.

  The dedicated user uid and the prefix parameters for the bucket and the event objects (data_bucket_prefix, data_oid_prefix) are set in the zone tier-config; data_oid_prefix controls the prefix of the stored event objects.

class PSSubscription {
  class InitCR;
  friend class InitCR;
  friend class RGWPSHandleObjEventCR;

  RGWDataSyncEnv *sync_env;
  PSEnvRef env;
  PSSubConfigRef sub_conf;
  std::shared_ptr<rgw_get_bucket_info_result> get_bucket_info_result;
  RGWBucketInfo *bucket_info{nullptr};
  RGWDataAccessRef data_access;
  RGWDataAccess::BucketRef bucket; // this is the bucket where the event objects are stored

- Event push goes through RGWPubSubEndpoint::send_to_completion_async(). Three endpoint types are currently supported:

  • RGWPubSubHTTPEndpoint
  • RGWPubSubAMQPEndpoint
  • RGWPubSubKafkaEndpoint
// endpoint base class - all endpoint types should derive from it
class RGWPubSubEndpoint {
public:
  RGWPubSubEndpoint() = default;
  // endpoint should not be copied
  RGWPubSubEndpoint(const RGWPubSubEndpoint&) = delete;
  const RGWPubSubEndpoint& operator=(const RGWPubSubEndpoint&) = delete;
 
  typedef std::unique_ptr<RGWPubSubEndpoint> Ptr;
 
  // factory method for the actual notification endpoint
  // derived class specific arguments are passed in http args format
  // may throw a configuration_error if creation fails
  static Ptr create(const std::string& endpoint, const std::string& topic, const RGWHTTPArgs& args, CephContext *cct=nullptr);
  
  // this method is used in order to send notification (Ceph specific) and wait for completion
  // in async manner via a coroutine when invoked in the data sync environment
  virtual RGWCoroutine* send_to_completion_async(const rgw_pubsub_event& event, RGWDataSyncEnv* env) = 0;
 
  // this method is used in order to send notification (S3 compliant) and wait for completion
  // in async manner via a coroutine when invoked in the data sync environment
  virtual RGWCoroutine* send_to_completion_async(const rgw_pubsub_s3_record& record, RGWDataSyncEnv* env) = 0;
 
  // this method is used in order to send notification (S3 compliant) and wait for completion
  // in async manner via a coroutine when invoked in the frontend environment
  virtual int send_to_completion_async(CephContext* cct, const rgw_pubsub_s3_record& record, optional_yield y) = 0;
 
  // present as string
  virtual std::string to_str() const { return ""; }
   
  virtual ~RGWPubSubEndpoint() = default;
   
  // exception object for configuration error
  struct configuration_error : public std::logic_error {
    configuration_error(const std::string& what_arg) :
      std::logic_error("pubsub endpoint configuration error: " + what_arg) {}
  };
};

Notification implementation

The notification discussed here is the Bucket Notification. A bucket notification is essentially event push only: it is S3-compatible event push and does not store events.

test ref: https://github.com/ceph/ceph/pull/28971

The main implementation is rgw::notify::publish() in rgw_notify.cc. This function is called at the end of every op that modifies an object, for example RGWPutObj:

void RGWPutObj::execute()
{
  // send request to notification manager
  const auto ret = rgw::notify::publish(s, mtime, etag, rgw::notify::ObjectCreatedPut, store);
  if (ret < 0) {
    ldpp_dout(this, 5) << "WARNING: publishing notification failed, with error: " << ret << dendl;
  // TODO: we should have conf to make send a blocking coroutine and reply with error in case sending failed
  // this should be global conf (probably returnign a different handler)
    // so we don't need to read the configured values before we perform it
  }
}

rgw::notify::publish()

int publish(const req_state* s,
        const ceph::real_time& mtime,
        const std::string& etag,
        EventType event_type,
        rgw::sal::RGWRadosStore* store) {
    RGWUserPubSub ps_user(store, s->user->user_id);
    RGWUserPubSub::Bucket ps_bucket(&ps_user, s->bucket);
    rgw_pubsub_bucket_topics bucket_topics;
    auto rc = ps_bucket.get_topics(&bucket_topics);
    ...
    rgw_pubsub_s3_record record;
    populate_record_from_request(s, mtime, etag, event_type, record);
    bool event_handled = false;
    bool event_should_be_handled = false;
    for (const auto& bucket_topic : bucket_topics.topics) {
        const rgw_pubsub_topic_filter& topic_filter = bucket_topic.second;
        const rgw_pubsub_topic& topic_cfg = topic_filter.topic;
        if (!match(topic_filter, s, event_type)) {
            // topic does not apply to req_state
            continue;
        }
        event_should_be_handled = true;
        record.configurationId = topic_filter.s3_id;
        ...
        try {
            // TODO add endpoint LRU cache
            const auto push_endpoint = RGWPubSubEndpoint::create(topic_cfg.dest.push_endpoint,
                    topic_cfg.dest.arn_topic,
                    RGWHTTPArgs(topic_cfg.dest.push_endpoint_args),
                    s->cct);
            const std::string push_endpoint_str = push_endpoint->to_str();
            ldout(s->cct, 20) << "push endpoint created: " << push_endpoint_str << dendl;
            auto rc = push_endpoint->send_to_completion_async(s->cct, record, s->yield); // push the event notification to the remote endpoint; as in the subscription-triggered path, three types are supported: http, amqp, kafka
            ...
            if (perfcounter) perfcounter->inc(l_rgw_pubsub_push_ok);
            ldout(s->cct, 20) << "successfull push to endpoint " << push_endpoint_str << dendl;
            event_handled = true;
        } catch (const RGWPubSubEndpoint::configuration_error& e) {
            ...
        }
    }
 
    if (event_should_be_handled) {
        // not counting events with no notifications or events that are filtered
        // counting a single event, regardless of the number of notifications it sends
        if (perfcounter) perfcounter->inc(l_rgw_pubsub_event_triggered);
        if (!event_handled) {
            // all notifications for this event failed
            if (perfcounter) perfcounter->inc(l_rgw_pubsub_event_lost);
        }
    }
 
    return 0;
}

Note: the code analysis above is based on the latest Ceph community master branch (as of 2019/11/27).

Other

ARN

When creating a topic you need to specify the topic ARN, and when events are pulled and a standard S3 event record is returned, the bucket in it is also identified by a bucket ARN. So what is an ARN?

An ARN (Amazon Resource Name) uniquely identifies an AWS resource. An ARN is required whenever a resource must be referenced unambiguously across AWS, for example in IAM policies, Amazon Relational Database Service (Amazon RDS) tags, and API calls.

ARN format

The general ARN formats are shown below; the exact components and values depend on the AWS service. rgw_arn.h and rgw_arn.cc define the corresponding ARN type.

arn:<partition>:<service>:<region>:<account-id>:<resource-id>
arn:<partition>:<service>:<region>:<account-id>:<resource-type>/<resource-id>
arn:<partition>:<service>:<region>:<account-id>:<resource-type>:<resource-id>
- partition

The partition the resource is in. For standard AWS regions the partition is aws. Resources in other partitions use aws-partitionname; for example, resources in the China (Beijing) region use the partition aws-cn.

The RGW implementation supports the following partitions:

enum struct Partition {
  aws, aws_cn, aws_us_gov, wildcard
  // If we wanted our own ARNs for principal type unique to us
  // (maybe to integrate better with Swift) or for anything else we
  // provide that doesn't map onto S3, we could add an 'rgw'
  // partition type.
};

- service

The service namespace that identifies the AWS product (for example Amazon S3, IAM or Amazon RDS).

The RGW implementation supports the following services:

enum struct Service {
  apigateway, appstream, artifact, autoscaling, aws_portal, acm,
  cloudformation, cloudfront, cloudhsm, cloudsearch, cloudtrail,
  cloudwatch, events, logs, codebuild, codecommit, codedeploy,
  codepipeline, cognito_idp, cognito_identity, cognito_sync,
  config, datapipeline, dms, devicefarm, directconnect,
  ds, dynamodb, ec2, ecr, ecs, ssm, elasticbeanstalk, elasticfilesystem,
  elasticloadbalancing, elasticmapreduce, elastictranscoder, elasticache,
  es, gamelift, glacier, health, iam, importexport, inspector, iot,
  kms, kinesisanalytics, firehose, kinesis, lambda, lightsail,
  machinelearning, aws_marketplace, aws_marketplace_management,
  mobileanalytics, mobilehub, opsworks, opsworks_cm, polly,
  redshift, rds, route53, route53domains, sts, servicecatalog,
  ses, sns, sqs, s3, swf, sdb, states, storagegateway, support,
  trustedadvisor, waf, workmail, workspaces, wildcard
};
- region

The region the resource resides in. Some resource ARNs do not require a region, so this component may be omitted.

- account-id

The ID of the AWS account that owns the resource, without hyphens, for example 123456789012. Some resource ARNs do not require an account number, so this component may be omitted.

- resource-type or resource-id

This part of the ARN varies by service. The resource identifier can be the name or ID of the resource (for example user/Bob or instance/i-1234567890abcdef0) or a resource path. Some resource identifiers include a parent resource (sub-resource-type/parent-resource/sub-resource) or a qualifier such as a version (resource-type:resource-name:qualifier).

The current rgw implementation requires:

Valid Resource formats (resource part only):
 * 'resource'
 * 'resourcetype/resource'
 * 'resourcetype/resource/qualifier'
 * 'resourcetype/resource:qualifier'
 * 'resourcetype:resource'
 * 'resourcetype:resource:qualifier'
 
Note: wildcards are not allowed in 'resourceType'. For example, the following is not valid:
   arn:aws:iam::123456789012:u*

Below are ARNs for an S3 bucket; the second ARN includes the path /Development/:

arn:aws:s3:::my_corporate_bucket/*
arn:aws:s3:::my_corporate_bucket/Development/*

Resource ARNs

ref: https://docs.aws.amazon.com/IAM/latest/UserGuide/list_amazons3.html#amazons3-resources-for-iam-policies

Q&A

1. Notification has two kinds of APIs. How do they differ, and why are there two?

A: There are two notification APIs: S3-compatible and non-S3-compatible. The S3-compatible API follows the AWS S3 Bucket Notification standard and treats the notification as an attribute of the bucket; see s3/bucketops/#create-notification. A non-S3-compatible call such as PUT /notifications/bucket/<bucket> associates the notification with a bucket without making it a subresource of the bucket.

    In the pub-sub module, the non-compatible notification API follows the same style as the topic and subscription APIs, all of which are top-level resources; that is the reason the non-compatible API exists. The S3-compatible API is mainly used for the Ceph implementation of AWS Bucket Notification (Bucket Notification), which of course has to follow the S3 API standard.

2. Deletion logic of notifications

A: 1) Deleting a notification directly also deletes its subscription. 2) Deleting a bucket deletes its notifications, but not the subscriptions corresponding to those notifications (the subscriptions have to be deleted explicitly).

3. What is the difference and the relationship between Subscription and Notification?

A: Both the subscription-to-topic and the notification-to-topic relationships are many-to-one: a topic can have multiple subscriptions and also multiple notifications.

    1) When a notification is created, a subscription with the same name as the notification id is created, and that subscription can be accessed through the subscription APIs. 2) The deletion logic differs: a subscription is deleted simply by calling its delete API, while notification deletion works as described in the previous question.

4. Event storage

A: The pub-sub module stores events as objects in a dedicated bucket of a dedicated user (configurable in the zone tier-config). These event objects cannot be accessed directly; they must be read through the Pub-Sub REST API. For details, see the event storage part of "3. Event handling under topics->subs" above.

5. Sync between the master zone and the pub-sub zone

A: Sync between the zones happens once the data sync thread runs the RGWDataSyncCR coroutine, which processes each bucket shard; the sync itself is divided into full sync and incremental sync (see the RGWDataSyncCR walkthrough above).

6. Current problems with the pubsub module CLI

A: The pubsub module CLI currently provides no command help, and pulling the events list triggers a core dump.

The problem found so far is how the sub is obtained when sub pull lists the events:

 if (opt_cmd == OPT_PUBSUB_SUB_PULL) {
    ...
    //auto sub = ups.get_sub(sub_name); 
    auto sub = ups.get_sub_with_events(sub_name);
    ret = sub->list_events(marker, max_entries);
    if (ret < 0) {
      cerr << "ERROR: could not list events: " << cpp_strerror(-ret) << std::endl;
      return -ret;
    }
    encode_json("result", *sub, formatter);
    formatter->flush(cout);
 }
The same change is needed where the subscription is created:
 if (opt_cmd == OPT_PUBSUB_SUB_CREATE) {
    ...
    //auto sub = ups.get_sub(sub_name);
    auto sub = ups.get_sub_with_events(sub_name);
    ret = sub->subscribe(topic_name, dest_config);
    ...
 }

The original get_sub() call returned a base-class sub object, so the subsequent list_events() call also resolved to the base-class implementation and triggered the core dump.
