[Repost] The implementation of Manager in controller-runtime

Introduction

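controller-runtime uses a Manager interface to manage Controllers. Besides controllers, a Manager can also manage Admission Webhooks, and it holds the client, cache, scheme, and other objects used to access resources, as shown in the figure below: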

image

How to use the Manager

Let's first look at how the Manager is used in controller-runtime. Based on the example in the controller-runtime repository, using a Manager involves the following steps:

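1. Instantiate the manager, passing a config as the argument

2. Add a scheme to the manager

3. Add a controller to the manager. The controller contains a reconciler struct, and we implement our processing logic in that struct

4. Add a webhook to the manager, which likewise needs its own logic implemented

5. Start the manager with mgr.Start()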

The code looks like this:

func main() {
    ctrl.SetLogger(zap.New())

    // Instantiate the Manager from a config.
    // ctrl.GetConfigOrDie() falls back to the default ~/.kube/config when no other configuration is found.
    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
    if err != nil {
        setupLog.Error(err, "unable to start manager")
        os.Exit(1)
    }

    // in a real controller, we'd create a new scheme for this
    // Register the API types with the Scheme; the Scheme provides the mapping from GVK to Go type.
    // With multiple CRDs, AddToScheme has to be called once for each API package.
    err = api.AddToScheme(mgr.GetScheme())
    if err != nil {
        setupLog.Error(err, "unable to add scheme")
        os.Exit(1)
    }

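    // Register the Controller with the Manager.
    // For: the resource being watched and reconciled; equivalent to calling Watches(&source.Kind{Type: apiType}, &handler.EnqueueRequestForObject{})
    // Owns: owned child resources. If a corev1.Pod{} is owned by an api.ChaosPod{}, it is watched as well; equivalent to calling Watches(&source.Kind{Type: <ForType-apiType>}, &handler.EnqueueRequestForOwner{OwnerType: apiType, IsController: true})
    // reconciler struct: implements the Reconciler interface; we have to define the struct and its Reconcile method
    // mgr.GetClient() and mgr.GetScheme() return the Client and Scheme that were initialized earlier by ctrl.NewManager (manager.New)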
    err = ctrl.NewControllerManagedBy(mgr).
        For(&api.ChaosPod{}).
        Owns(&corev1.Pod{}).
        Complete(&reconciler{
            Client: mgr.GetClient(),
            scheme: mgr.GetScheme(),
        })
    if err != nil {
        setupLog.Error(err, "unable to create controller")
        os.Exit(1)
    }

    // Build the webhook
    err = ctrl.NewWebhookManagedBy(mgr).
        For(&api.ChaosPod{}).
        Complete()
    if err != nil {
        setupLog.Error(err, "unable to create webhook")
        os.Exit(1)
    }

    // Start the manager, which in effect starts the controllers
    setupLog.Info("starting manager")
    if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
        setupLog.Error(err, "problem running manager")
        os.Exit(1)
    }
}

A Manager is a structure that initializes shared dependencies. Its interface is defined as follows:

// pkg/manager/manager.go

// Manager initializes shared dependencies such as Caches and Clients, and provides them to Runnables.
// A Manager is required to create Controllers.
type Manager interface {
    // Cluster holds a variety of methods to interact with a cluster.
    cluster.Cluster

    // Add will set requested dependencies on the component, and cause the component to be
    // started when Start is called.  Add will inject any dependencies for which the argument
    // implements the inject interface - e.g. inject.Client.
    // Depending on if a Runnable implements LeaderElectionRunnable interface, a Runnable can be run in either
    // non-leaderelection mode (always running) or leader election mode (managed by leader election if enabled).
    Add(Runnable) error

    // Elected is closed when this manager is elected leader of a group of
    // managers, either because it won a leader election or because no leader
    // election was configured.
    Elected() <-chan struct{}

    // AddMetricsExtraHandler adds an extra handler served on path to the http server that serves metrics.
    // Might be useful to register some diagnostic endpoints e.g. pprof. Note that these endpoints meant to be
    // sensitive and shouldn't be exposed publicly.
    // If the simple path -> handler mapping offered here is not enough, a new http server/listener should be added as
    // Runnable to the manager via Add method.
    AddMetricsExtraHandler(path string, handler http.Handler) error

    // AddHealthzCheck allows you to add Healthz checker
    AddHealthzCheck(name string, check healthz.Checker) error

    // AddReadyzCheck allows you to add Readyz checker
    AddReadyzCheck(name string, check healthz.Checker) error

    // Start starts all registered Controllers and blocks until the context is cancelled.
    // Returns an error if there is an error starting any controller.
    //
    // If LeaderElection is used, the binary must be exited immediately after this returns,
    // otherwise components that need leader election might continue to run after the leader
    // lock was lost.
    Start(ctx context.Context) error

    // GetWebhookServer returns a webhook.Server
    GetWebhookServer() *webhook.Server

    // GetLogger returns this manager's logger.
    GetLogger() logr.Logger

    // GetControllerOptions returns controller global configuration options.
    GetControllerOptions() v1alpha1.ControllerConfigurationSpec
}

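The Manager manages the lifecycle (add/start) of Runnables. If you start a Runnable without going through the Manager, you have to take care of the various common dependencies yourself.

The Manager also holds the shared dependencies: client, cache, scheme, and so on.

  • It provides getters (for example GetClient)
  • It has a simple injection mechanism (runtime/inject)

In addition, it supports leader election, which only needs to be enabled through an option, and it provides a signal handler for graceful shutdown.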
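To make the Runnable lifecycle idea concrete, here is a minimal sketch in plain Go that uses only the standard library. It is not controller-runtime's implementation: miniManager, RunnableFunc, and demo are hypothetical names invented for this illustration, and only the shape of the Runnable interface is taken from the library. The sketch collects Runnables through Add, and its Start blocks until the context is cancelled or one Runnable returns an error.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// Runnable has the same shape as the Runnable interface discussed in this article.
type Runnable interface {
	Start(ctx context.Context) error
}

// RunnableFunc adapts a plain function to Runnable (hypothetical helper).
type RunnableFunc func(ctx context.Context) error

func (f RunnableFunc) Start(ctx context.Context) error { return f(ctx) }

// miniManager is a hypothetical, heavily reduced stand-in for a manager.
type miniManager struct {
	mu        sync.Mutex
	runnables []Runnable
}

// Add registers a Runnable to be started later by Start.
func (m *miniManager) Add(r Runnable) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.runnables = append(m.runnables, r)
}

// Start runs every registered Runnable in its own goroutine. It returns nil
// when ctx is cancelled, or the first error reported by a Runnable.
func (m *miniManager) Start(ctx context.Context) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	m.mu.Lock()
	rs := append([]Runnable(nil), m.runnables...)
	m.mu.Unlock()

	errCh := make(chan error, len(rs)) // buffered so no goroutine blocks on send
	var wg sync.WaitGroup
	for _, r := range rs {
		wg.Add(1)
		go func(r Runnable) {
			defer wg.Done()
			if err := r.Start(ctx); err != nil {
				errCh <- err
			}
		}(r)
	}

	var err error
	select {
	case <-ctx.Done():
	case err = <-errCh:
	}
	cancel()  // ask the remaining Runnables to stop
	wg.Wait() // wait until they have all returned
	return err
}

// demo wires up one well-behaved Runnable and one failing Runnable.
func demo() error {
	m := &miniManager{}
	m.Add(RunnableFunc(func(ctx context.Context) error {
		<-ctx.Done() // blocks until shutdown, as a Runnable is expected to
		return nil
	}))
	m.Add(RunnableFunc(func(ctx context.Context) error {
		return errors.New("boom")
	}))
	return m.Start(context.Background())
}

func main() {
	fmt.Println("manager stopped:", demo())
}
```

In a real program the context passed to Start would normally come from a signal handler, as with ctrl.SetupSignalHandler() in the example earlier, so that a termination signal cancels it.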


Instantiating the Manager

Let's look at the implementation of New, the function that instantiates a Manager:

// pkg/manager/manager.go

// New returns a new Manager for creating Controllers.
func New(config *rest.Config, options Options) (Manager, error) {
    // Set default values for options fields
    options = setOptionsDefaults(options)

    // Construct the cluster
    cluster, err := cluster.New(config, func(clusterOptions *cluster.Options) {
        clusterOptions.Scheme = options.Scheme
        clusterOptions.MapperProvider = options.MapperProvider
        clusterOptions.Logger = options.Logger
        clusterOptions.SyncPeriod = options.SyncPeriod
        clusterOptions.Namespace = options.Namespace
        clusterOptions.NewCache = options.NewCache
        clusterOptions.NewClient = options.NewClient
        clusterOptions.ClientDisableCacheFor = options.ClientDisableCacheFor
        clusterOptions.DryRunClient = options.DryRunClient
        clusterOptions.EventBroadcaster = options.EventBroadcaster //nolint:staticcheck
    })
    if err != nil {
        return nil, err
    }

    // Create the recorder provider to inject event recorders for the components.
    // TODO(directxman12): the log for the event provider should have a context (name, tags, etc) specific
    // to the particular controller that it's being injected into, rather than a generic one like is here.
    recorderProvider, err := options.newRecorderProvider(config, cluster.GetScheme(), options.Logger.WithName("events"), options.makeBroadcaster)
    if err != nil {
        return nil, err
    }

    // Create the resource lock to enable leader election
    var leaderConfig *rest.Config
    var leaderRecorderProvider *intrec.Provider

    if options.LeaderElectionConfig == nil {
        leaderConfig = rest.CopyConfig(config)
        leaderRecorderProvider = recorderProvider
    } else {
        leaderConfig = rest.CopyConfig(options.LeaderElectionConfig)
        leaderRecorderProvider, err = options.newRecorderProvider(leaderConfig, cluster.GetScheme(), options.Logger.WithName("events"), options.makeBroadcaster)
        if err != nil {
            return nil, err
        }
    }

    resourceLock, err := options.newResourceLock(leaderConfig, leaderRecorderProvider, leaderelection.Options{
        LeaderElection:             options.LeaderElection,
        LeaderElectionResourceLock: options.LeaderElectionResourceLock,
        LeaderElectionID:           options.LeaderElectionID,
        LeaderElectionNamespace:    options.LeaderElectionNamespace,
    })
    if err != nil {
        return nil, err
    }

    // Create the metrics listener. This will throw an error if the metrics bind
    // address is invalid or already in use.
    metricsListener, err := options.newMetricsListener(options.MetricsBindAddress)
    if err != nil {
        return nil, err
    }

    // By default we have no extra endpoints to expose on metrics http server.
    metricsExtraHandlers := make(map[string]http.Handler)

    // Create health probes listener. This will throw an error if the bind
    // address is invalid or already in use.
    healthProbeListener, err := options.newHealthProbeListener(options.HealthProbeBindAddress)
    if err != nil {
        return nil, err
    }

    errChan := make(chan error)
    runnables := newRunnables(options.BaseContext, errChan)

    return &controllerManager{
        stopProcedureEngaged:          pointer.Int64(0),
        cluster:                       cluster,
        runnables:                     runnables,
        errChan:                       errChan,
        recorderProvider:              recorderProvider,
        resourceLock:                  resourceLock,
        metricsListener:               metricsListener,
        metricsExtraHandlers:          metricsExtraHandlers,
        controllerOptions:             options.Controller,
        logger:                        options.Logger,
        elected:                       make(chan struct{}),
        port:                          options.Port,
        host:                          options.Host,
        certDir:                       options.CertDir,
        webhookServer:                 options.WebhookServer,
        leaseDuration:                 *options.LeaseDuration,
        renewDeadline:                 *options.RenewDeadline,
        retryPeriod:                   *options.RetryPeriod,
        healthProbeListener:           healthProbeListener,
        readinessEndpointName:         options.ReadinessEndpointName,
        livenessEndpointName:          options.LivenessEndpointName,
        gracefulShutdownTimeout:       *options.GracefulShutdownTimeout,
        internalProceduresStop:        make(chan struct{}),
        leaderElectionStopped:         make(chan struct{}),
        leaderElectionReleaseOnCancel: options.LeaderElectionReleaseOnCancel,
    }, nil
}

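New performs the initialization work for the Manager and finally returns a controllerManager instance. That struct is an implementation of the Manager interface, so the real work of a Manager is done by this struct.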

Next comes the most important part, registering the Controller with the Manager:

err = ctrl.NewControllerManagedBy(mgr).
        For(&api.ChaosPod{}).
        Owns(&corev1.Pod{}).
        Complete(&reconciler{
            Client: mgr.GetClient(),
            scheme: mgr.GetScheme(),
        })

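builder.ControllerManagedBy (for which ctrl.NewControllerManagedBy is an alias) returns a new controller builder, a Builder object. The controller it produces will be started by the provided Manager. The function itself is very simple: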

// pkg/builder/controller.go

// Builder builds a Controller.
type Builder struct {
	forInput         ForInput
	ownsInput        []OwnsInput
	watchesInput     []WatchesInput
	mgr              manager.Manager
	globalPredicates []predicate.Predicate
	ctrl             controller.Controller
	ctrlOptions      controller.Options
	name             string
}

// ControllerManagedBy returns a new controller builder that will be started by the provided Manager.
func ControllerManagedBy(m manager.Manager) *Builder {
	return &Builder{mgr: m}
}

As you can see, controller-runtime wraps controller creation in a Builder struct and hands the Manager to that builder. The next call is the builder's For function:

// pkg/builder/controller.go

// ForInput represents the information set by For method.
type ForInput struct {
    object           client.Object
    predicates       []predicate.Predicate
    objectProjection objectProjection
    err              error
}

// For defines the type of Object being *reconciled*, and configures the ControllerManagedBy to respond to create / delete /
// update events by *reconciling the object*.
// This is the equivalent of calling
// Watches(&source.Kind{Type: apiType}, &handler.EnqueueRequestForObject{}).
func (blder *Builder) For(object client.Object, opts ...ForOption) *Builder {
    if blder.forInput.object != nil {
        blder.forInput.err = fmt.Errorf("For(...) should only be called once, could not assign multiple objects for reconciliation")
        return blder
    }
    input := ForInput{object: object}
    for _, opt := range opts {
        opt.ApplyToFor(&input)
    }

    blder.forInput = input
    return blder
}

For defines the type of object we want to reconcile. Next comes the call to Owns:

// pkg/builder/controller.go

// Owns defines types of Objects being *generated* by the ControllerManagedBy, and configures the ControllerManagedBy to respond to
// create / delete / update events by *reconciling the owner object*.  This is the equivalent of calling
// Watches(&source.Kind{Type: <ForType-forInput>}, &handler.EnqueueRequestForOwner{OwnerType: apiType, IsController: true}).
func (blder *Builder) Owns(object client.Object, opts ...OwnsOption) *Builder {
    input := OwnsInput{object: object}
    for _, opt := range opts {
        opt.ApplyToOwns(&input)
    }

    blder.ownsInput = append(blder.ownsInput, input)
    return blder
}
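The chaining style used by For and Owns can be shown with a small standard-library sketch. The names builder, newBuilder, and the use of plain strings in place of client.Object are assumptions made for illustration, not controller-runtime's API. The point it illustrates is that each step returns the builder so calls can be chained, and that a misuse such as calling For twice is recorded and only reported when Build runs.

```go
package main

import (
	"errors"
	"fmt"
)

// builder is a hypothetical, simplified analogue of the Builder struct.
// Resource types are represented by plain strings to keep the sketch small.
type builder struct {
	forType string
	owns    []string
	err     error // first misuse seen while chaining, surfaced by Build
}

func newBuilder() *builder { return &builder{} }

// For sets the primary type; calling it twice records an error instead of panicking.
func (b *builder) For(t string) *builder {
	if b.forType != "" {
		b.err = errors.New("For(...) should only be called once")
		return b
	}
	b.forType = t
	return b
}

// Owns may be called any number of times; each call appends another owned type.
func (b *builder) Owns(t string) *builder {
	b.owns = append(b.owns, t)
	return b
}

// Build validates the accumulated input and returns a description of the result.
func (b *builder) Build() (string, error) {
	if b.err != nil {
		return "", b.err
	}
	if b.forType == "" {
		return "", errors.New("must provide an object for reconciliation")
	}
	return fmt.Sprintf("reconcile %s, also watch %v", b.forType, b.owns), nil
}

func main() {
	desc, err := newBuilder().For("ChaosPod").Owns("Pod").Build()
	fmt.Println(desc, err)

	_, err = newBuilder().For("ChaosPod").For("Pod").Build()
	fmt.Println("calling For twice:", err)
}
```

Deferring the error to Build keeps the chain readable, because the caller checks exactly one error at the end rather than one per step.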

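Owns configures the child resources of the object we are watching. If we also want changes to those child resources to trigger reconciliation of the owner, we have to declare them with Owns. Then comes the most important function, Complete, which delegates to Build: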

// pkg/builder/controller.go

// Build builds the Application Controller and returns the Controller it created.
func (blder *Builder) Build(r reconcile.Reconciler) (controller.Controller, error) {
    if r == nil {
        return nil, fmt.Errorf("must provide a non-nil Reconciler")
    }
    if blder.mgr == nil {
        return nil, fmt.Errorf("must provide a non-nil Manager")
    }
    if blder.forInput.err != nil {
        return nil, blder.forInput.err
    }
    // Checking the reconcile type exist or not
    if blder.forInput.object == nil {
        return nil, fmt.Errorf("must provide an object for reconciliation")
    }

    // Set the ControllerManagedBy
    if err := blder.doController(r); err != nil {
        return nil, err
    }

    // Set the Watch
    if err := blder.doWatch(); err != nil {
        return nil, err
    }

    return blder.ctrl, nil
}

Complete builds the Controller by calling Build. The two important functions in there are doController and doWatch. doController is the function that actually instantiates the Controller:

// pkg/builder/controller.go

func (blder *Builder) doController(r reconcile.Reconciler) error {
    globalOpts := blder.mgr.GetControllerOptions()

    ctrlOptions := blder.ctrlOptions
    if ctrlOptions.Reconciler == nil {
        ctrlOptions.Reconciler = r
    }

    // Retrieve the GVK from the object we're reconciling
    // to prepopulate logger information, and to optionally generate a default name.
    gvk, err := getGvk(blder.forInput.object, blder.mgr.GetScheme())
    if err != nil {
        return err
    }

    // Setup concurrency.
    if ctrlOptions.MaxConcurrentReconciles == 0 {
        groupKind := gvk.GroupKind().String()

        if concurrency, ok := globalOpts.GroupKindConcurrency[groupKind]; ok && concurrency > 0 {
            ctrlOptions.MaxConcurrentReconciles = concurrency
        }
    }

    // Setup cache sync timeout.
    if ctrlOptions.CacheSyncTimeout == 0 && globalOpts.CacheSyncTimeout != nil {
        ctrlOptions.CacheSyncTimeout = *globalOpts.CacheSyncTimeout
    }

    // Derive the controller name from the GVK
    controllerName := blder.getControllerName(gvk)

    // Setup the logger.
    if ctrlOptions.LogConstructor == nil {
        log := blder.mgr.GetLogger().WithValues(
            "controller", controllerName,
            "controllerGroup", gvk.Group,
            "controllerKind", gvk.Kind,
        )

        lowerCamelCaseKind := strings.ToLower(gvk.Kind[:1]) + gvk.Kind[1:]

        ctrlOptions.LogConstructor = func(req *reconcile.Request) logr.Logger {
            log := log
            if req != nil {
                log = log.WithValues(
                    lowerCamelCaseKind, klog.KRef(req.Namespace, req.Name),
                    "namespace", req.Namespace, "name", req.Name,
                )
            }
            return log
        }
    }

    // Build the controller and return.
    // newController is a package-level variable: var newController = controller.New
    blder.ctrl, err = newController(controllerName, blder.mgr, ctrlOptions)
    return err
}

The function above derives the Controller name from the GVK of the resource object, and finally instantiates a real Controller through newController (a variable that points to controller.New):

// pkg/controller/controller.go

// New returns a new Controller registered with the Manager.  The Manager will ensure that shared Caches have
// been synced before the Controller is Started.
func New(name string, mgr manager.Manager, options Options) (Controller, error) {
	c, err := NewUnmanaged(name, mgr, options)
	if err != nil {
		return nil, err
	}

	// Add the controller as a Manager components
	return c, mgr.Add(c)
}

// NewUnmanaged returns a new controller without adding it to the manager. The
// caller is responsible for starting the returned controller.
func NewUnmanaged(name string, mgr manager.Manager, options Options) (Controller, error) {
	if options.Reconciler == nil {
		return nil, fmt.Errorf("must specify Reconciler")
	}

	if len(name) == 0 {
		return nil, fmt.Errorf("must specify Name for Controller")
	}

	if options.LogConstructor == nil {
		log := mgr.GetLogger().WithValues(
			"controller", name,
		)
		options.LogConstructor = func(req *reconcile.Request) logr.Logger {
			log := log
			if req != nil {
				log = log.WithValues(
					"object", klog.KRef(req.Namespace, req.Name),
					"namespace", req.Namespace, "name", req.Name,
				)
			}
			return log
		}
	}

	if options.MaxConcurrentReconciles <= 0 {
		options.MaxConcurrentReconciles = 1
	}

	if options.CacheSyncTimeout == 0 {
		options.CacheSyncTimeout = 2 * time.Minute
	}

	if options.RateLimiter == nil {
		options.RateLimiter = workqueue.DefaultControllerRateLimiter()
	}

	// Inject dependencies into Reconciler
	if err := mgr.SetFields(options.Reconciler); err != nil {
		return nil, err
	}

	// Create controller with dependencies set
	return &controller.Controller{
		Do: options.Reconciler,
		MakeQueue: func() workqueue.RateLimitingInterface {
			return workqueue.NewNamedRateLimitingQueue(options.RateLimiter, name)
		},
		MaxConcurrentReconciles: options.MaxConcurrentReconciles,
		CacheSyncTimeout:        options.CacheSyncTimeout,
		SetFields:               mgr.SetFields,
		Name:                    name,
		LogConstructor:          options.LogConstructor,
		RecoverPanic:            options.RecoverPanic,
	}, nil
}
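Read together, doController and NewUnmanaged resolve MaxConcurrentReconciles in three steps: an explicit per-controller value wins, otherwise the global GroupKindConcurrency entry is used, otherwise the value falls back to 1. The sketch below restates that precedence with the standard library only. The names globalOptions and resolveConcurrency and the example key "ChaosPod.example.io" are hypothetical and do not exist in controller-runtime.

```go
package main

import "fmt"

// globalOptions is a hypothetical stand-in for the manager-wide controller options.
type globalOptions struct {
	// GroupKindConcurrency maps a "Kind.group" string to a concurrency value.
	GroupKindConcurrency map[string]int
}

// resolveConcurrency mirrors the precedence described above:
// explicit per-controller value > global per-GroupKind value > default of 1.
func resolveConcurrency(explicit int, groupKind string, global globalOptions) int {
	n := explicit
	if n == 0 {
		// No explicit value: consult the global per-GroupKind setting.
		if c, ok := global.GroupKindConcurrency[groupKind]; ok && c > 0 {
			n = c
		}
	}
	if n <= 0 {
		// Still unset (or invalid): fall back to a single worker.
		n = 1
	}
	return n
}

func main() {
	global := globalOptions{
		GroupKindConcurrency: map[string]int{"ChaosPod.example.io": 3},
	}
	fmt.Println(resolveConcurrency(5, "ChaosPod.example.io", global)) // explicit value
	fmt.Println(resolveConcurrency(0, "ChaosPod.example.io", global)) // global value
	fmt.Println(resolveConcurrency(0, "Other.example.io", global))    // default
}
```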

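As you can see, NewUnmanaged is where the Controller is really instantiated, which finally ties back to the Controller covered in the previous article. Once the Controller has been instantiated, mgr.Add(c) adds it to the Manager to be managed, so we also need to look at the implementation of the Manager's Add function, specifically the concrete implementation in controllerManager: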

// pkg/manager/manager.go

// Runnable allows a component to be started.
// It's very important that Start blocks until
// it's done running.
type Runnable interface {
    // Start starts running the component.  The component will stop running
    // when the context is closed. Start blocks until the context is closed or
    // an error occurs.
    Start(context.Context) error
}

//  pkg/manager/internal.go

// Add sets dependencies on i, and adds it to the list of Runnables to start.
func (cm *controllerManager) Add(r Runnable) error {
    cm.Lock()
    defer cm.Unlock()
    return cm.add(r)
}

func (cm *controllerManager) add(r Runnable) error {
    // Set dependencies on the object
    if err := cm.SetFields(r); err != nil {
        return err
    }
    return cm.runnables.Add(r)
}

// pkg/manager/runnable_group.go

// Add adds a runnable to closest group of runnable that they belong to.
//
// Add should be able to be called before and after Start, but not after StopAndWait.
// Add should return an error when called during StopAndWait.
// The runnables added before Start are started when Start is called.
// The runnables added after Start are started directly.
func (r *runnables) Add(fn Runnable) error {
    switch runnable := fn.(type) {
    case hasCache:
        return r.Caches.Add(fn, func(ctx context.Context) bool {
            return runnable.GetCache().WaitForCacheSync(ctx)
        })
    case *webhook.Server:
        return r.Webhooks.Add(fn, nil)
    case LeaderElectionRunnable:
        if !runnable.NeedLeaderElection() {
            return r.Others.Add(fn, nil)
        }
        return r.LeaderElection.Add(fn, nil)
    default:
        return r.LeaderElection.Add(fn, nil)
    }
}

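controllerManager's Add takes a Runnable. Runnable is an interface that represents a component that can be started, and a Controller happens to implement its Start function, so a Controller instance can be added through Add. Besides dependency injection, Add also inspects the Runnable to decide which group it belongs to: components that have a cache go into Caches, webhook servers into Webhooks, components that implement LeaderElectionRunnable and report that they do not need leader election go into Others, and everything else goes into LeaderElection. (Earlier versions of controller-runtime kept two lists instead, leaderElectionRunnables and nonLeaderElectionRunnables.) This matters a great deal, because it determines how the controller is started later.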
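The grouping decision relies on a Go type switch over optional interfaces. The sketch below reproduces only the leader-election part of that decision with the standard library. The concrete types plainController, metricsServer, and migrator and the classify function are hypothetical, and the Caches and Webhooks cases are left out on purpose.

```go
package main

import (
	"context"
	"fmt"
)

// Runnable and LeaderElectionRunnable have the same shape as the interfaces in the article.
type Runnable interface {
	Start(ctx context.Context) error
}

type LeaderElectionRunnable interface {
	NeedLeaderElection() bool
}

// plainController (hypothetical) does not implement LeaderElectionRunnable at all.
type plainController struct{}

func (plainController) Start(context.Context) error { return nil }

// metricsServer (hypothetical) opts out of leader election.
type metricsServer struct{}

func (metricsServer) Start(context.Context) error { return nil }
func (metricsServer) NeedLeaderElection() bool    { return false }

// migrator (hypothetical) explicitly asks for leader election.
type migrator struct{}

func (migrator) Start(context.Context) error { return nil }
func (migrator) NeedLeaderElection() bool    { return true }

// classify returns the name of the group a Runnable would land in,
// following the last two cases of the type switch in runnables.Add.
func classify(r Runnable) string {
	switch rn := r.(type) {
	case LeaderElectionRunnable:
		if !rn.NeedLeaderElection() {
			return "Others"
		}
		return "LeaderElection"
	default:
		// Anything that does not say otherwise is treated as needing leader election.
		return "LeaderElection"
	}
}

func main() {
	for _, r := range []Runnable{plainController{}, metricsServer{}, migrator{}} {
		fmt.Printf("%T -> %s\n", r, classify(r))
	}
}
```

Defaulting to the LeaderElection group is the conservative choice: a component only runs on every replica if it states explicitly that this is safe.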

When the Manager has already started

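If the Manager has already started and Add is now called to add a Runnable, the Runnable must be started immediately: as the comment on runnables.Add above says, runnables added after Start are started directly. In older versions of controller-runtime this was the job of a startRunnable function, which calls the Runnable's Start function in a goroutine. In effect this calls the Controller's Start function to start the controller.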
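The "queue before Start, launch directly after Start" behaviour can be sketched with the standard library as follows. This is a simplification under stated assumptions: group, launch, Wait, and demo are hypothetical names, the work items are plain functions rather than Runnables, and readiness signalling and shutdown are omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// group is a hypothetical, reduced analogue of a runnable group.
type group struct {
	mu      sync.Mutex
	started bool
	queue   []func()
	wg      sync.WaitGroup
}

// Add queues fn if the group has not started yet, otherwise launches it right away.
func (g *group) Add(fn func()) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if !g.started {
		g.queue = append(g.queue, fn)
		return
	}
	g.launch(fn)
}

// Start launches everything queued so far; later calls have no effect.
func (g *group) Start() {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.started {
		return
	}
	g.started = true
	for _, fn := range g.queue {
		g.launch(fn)
	}
	g.queue = nil
}

// launch runs fn in its own goroutine and tracks it in the WaitGroup.
func (g *group) launch(fn func()) {
	g.wg.Add(1)
	go func() {
		defer g.wg.Done()
		fn()
	}()
}

// Wait blocks until every launched function has returned.
func (g *group) Wait() { g.wg.Wait() }

// demo reports how many functions had run before Start and after a late Add.
func demo() (beforeStart, afterStart int) {
	var g group
	var mu sync.Mutex
	ran := 0
	inc := func() { mu.Lock(); ran++; mu.Unlock() }

	g.Add(inc) // queued only, because the group has not started
	g.Wait()
	mu.Lock()
	beforeStart = ran
	mu.Unlock()

	g.Start()  // launches the queued function
	g.Add(inc) // launched directly, because the group is already running
	g.Wait()
	mu.Lock()
	afterStart = ran
	mu.Unlock()
	return beforeStart, afterStart
}

func main() {
	before, after := demo()
	fmt.Println("ran before Start:", before, "ran after Start and late Add:", after)
}
```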

At this point the Controller has been instantiated. Back in the Builder's Build function, the doController call has returned, and next comes doWatch:

// pkg/builder/controller.go

func (blder *Builder) doWatch() error {
    // Reconcile type
    typeForSrc, err := blder.project(blder.forInput.object, blder.forInput.objectProjection)
    if err != nil {
        return err
    }
    src := &source.Kind{Type: typeForSrc}
    hdler := &handler.EnqueueRequestForObject{}
    allPredicates := append(blder.globalPredicates, blder.forInput.predicates...)
    if err := blder.ctrl.Watch(src, hdler, allPredicates...); err != nil {
        return err
    }

    // Watches the managed types
    for _, own := range blder.ownsInput {
        typeForSrc, err := blder.project(own.object, own.objectProjection)
        if err != nil {
            return err
        }
        src := &source.Kind{Type: typeForSrc}
        hdler := &handler.EnqueueRequestForOwner{
            OwnerType:    blder.forInput.object,
            IsController: true,
        }
        allPredicates := append([]predicate.Predicate(nil), blder.globalPredicates...)
        allPredicates = append(allPredicates, own.predicates...)
        if err := blder.ctrl.Watch(src, hdler, allPredicates...); err != nil {
            return err
        }
    }

    // Do the watch requests
    for _, w := range blder.watchesInput {
        allPredicates := append([]predicate.Predicate(nil), blder.globalPredicates...)
        allPredicates = append(allPredicates, w.predicates...)

        // If the source of this watch is of type *source.Kind, project it.
        if srckind, ok := w.src.(*source.Kind); ok {
            typeForSrc, err := blder.project(srckind.Type, w.objectProjection)
            if err != nil {
                return err
            }
            srckind.Type = typeForSrc
        }

        if err := blder.ctrl.Watch(w.src, w.eventhandler, allPredicates...); err != nil {
            return err
        }
    }
    return nil
}

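doWatch registers the resource objects we want to reconcile with the Controller so that they are watched, including the child types managed by those objects. This brings us back to the Controller's Watch operation covered earlier, which registers event handlers on the Informer and adds the resulting requests to the work queue. With that, the Controller is fully initialized and Watch has been set up for the resources we reconcile.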
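The difference between the two handlers used in doWatch is which key ends up in the work queue: the object's own key for the For type, and the owner's key for an Owns type. The following standard-library sketch models that mapping. The types object and request and the function mapEvent are hypothetical, and owner references are reduced to a single controller owner for brevity.

```go
package main

import "fmt"

// object is a hypothetical, minimal view of a Kubernetes object.
type object struct {
	Kind      string
	Namespace string
	Name      string
	OwnerKind string // kind of the controlling owner, empty if none
	OwnerName string // name of the controlling owner, empty if none
}

// request identifies the object to reconcile, like a namespaced name.
type request struct {
	Namespace string
	Name      string
}

// mapEvent decides which request, if any, an event on obj should enqueue.
//   - An event on the For type enqueues the object itself.
//   - An event on an owned type enqueues its owner, but only when the owner
//     is of the For type.
func mapEvent(forKind string, owns []string, obj object) (request, bool) {
	if obj.Kind == forKind {
		return request{Namespace: obj.Namespace, Name: obj.Name}, true
	}
	for _, k := range owns {
		if obj.Kind == k && obj.OwnerKind == forKind {
			return request{Namespace: obj.Namespace, Name: obj.OwnerName}, true
		}
	}
	return request{}, false
}

func main() {
	owns := []string{"Pod"}
	events := []object{
		{Kind: "ChaosPod", Namespace: "default", Name: "web"},
		{Kind: "Pod", Namespace: "default", Name: "web-0", OwnerKind: "ChaosPod", OwnerName: "web"},
		{Kind: "Pod", Namespace: "default", Name: "standalone"},
	}
	for _, ev := range events {
		req, ok := mapEvent("ChaosPod", owns, ev)
		fmt.Printf("%s/%s -> enqueue=%v request=%+v\n", ev.Kind, ev.Name, ok, req)
	}
}
```

Both the ChaosPod event and the event on its Pod map to the same request, which is why the Reconcile function only ever receives keys of the primary type.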

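Finally, the Manager's Start function is called to start the Manager. Since the Controller has already been added to the Manager, starting the Manager really means starting the associated Controllers. The start function is implemented as follows: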

// pkg/manager/internal.go

// Start starts the manager and waits indefinitely.
// There is only two ways to have start return:
// An error has occurred during in one of the internal operations,
// such as leader election, cache start, webhooks, and so on.
// Or, the context is cancelled.
func (cm *controllerManager) Start(ctx context.Context) (err error) {
    // Check whether the manager has already been started; if so, return an error right away
    cm.Lock()
    if cm.started {
        cm.Unlock()
        return errors.New("manager already started")
    }
    var ready bool
    defer func() {
        // Only unlock the manager if we haven't reached
        // the internal readiness condition.
        if !ready {
            cm.Unlock()
        }
    }()

    // Initialize the internal context.
    cm.internalCtx, cm.internalCancel = context.WithCancel(ctx)

    // This chan indicates that stop is complete, in other words all runnables have returned or timeout on stop request
    stopComplete := make(chan struct{})
    defer close(stopComplete)
    // This must be deferred after closing stopComplete, otherwise we deadlock.
    defer func() {
        // https://hips.hearstapps.com/hmg-prod.s3.amazonaws.com/images/gettyimages-459889618-1533579787.jpg
        stopErr := cm.engageStopProcedure(stopComplete)
        if stopErr != nil {
            if err != nil {
                // Utilerrors.Aggregate allows to use errors.Is for all contained errors
                // whereas fmt.Errorf allows wrapping at most one error which means the
                // other one can not be found anymore.
                err = kerrors.NewAggregate([]error{err, stopErr})
            } else {
                err = stopErr
            }
        }
    }()

    // Add the cluster runnable.
    if err := cm.add(cm.cluster); err != nil {
        return fmt.Errorf("failed to add cluster to runnables: %w", err)
    }

    // Metrics should be served whether the controller is leader or not.
    // (If we don't serve metrics for non-leaders, prometheus will still scrape
    // the pod but will get a connection refused).
    if cm.metricsListener != nil {
        cm.serveMetrics()
    }

    // Serve health probes.
    if cm.healthProbeListener != nil {
        cm.serveHealthProbes()
    }

    // First start any webhook servers, which includes conversion, validation, and defaulting
    // webhooks that are registered.
    //
    // WARNING: Webhooks MUST start before any cache is populated, otherwise there is a race condition
    // between conversion webhooks and the cache sync (usually initial list) which causes the webhooks
    // to never start because no cache can be populated.
    if err := cm.runnables.Webhooks.Start(cm.internalCtx); err != nil {
        if !errors.Is(err, wait.ErrWaitTimeout) {
            return err
        }
    }

    // Start and wait for caches.
    if err := cm.runnables.Caches.Start(cm.internalCtx); err != nil {
        if !errors.Is(err, wait.ErrWaitTimeout) {
            return err
        }
    }

    // Start the non-leaderelection Runnables after the cache has synced.
    if err := cm.runnables.Others.Start(cm.internalCtx); err != nil {
        if !errors.Is(err, wait.ErrWaitTimeout) {
            return err
        }
    }

    // Start the leader election and all required runnables.
    {
        ctx, cancel := context.WithCancel(context.Background())
        cm.leaderElectionCancel = cancel
        go func() {
            if cm.resourceLock != nil {
                if err := cm.startLeaderElection(ctx); err != nil {
                    cm.errChan <- err
                }
            } else {
                // Treat not having leader election enabled the same as being elected.
                if err := cm.startLeaderElectionRunnables(); err != nil {
                    cm.errChan <- err
                }
                close(cm.elected)
            }
        }()
    }

    ready = true
    cm.Unlock()
    select {
    case <-ctx.Done():
        // We are done
        return nil
    case err := <-cm.errChan:
        // Error starting or running a runnable
        return err
    }
}
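The startup order in Start can be illustrated with a small standard-library sketch: the Webhooks, Caches, and Others groups start one after another, and the LeaderElection group starts in a separate goroutine, after which the elected channel is closed. The functions startPhases and demo and the record callback are hypothetical, and the sketch covers only the branch where no leader election is configured.

```go
package main

import (
	"fmt"
	"sync"
)

// startPhases records the order in which the groups would be started and
// returns a channel that is closed once the leader-election group has started.
func startPhases(record func(phase string)) <-chan struct{} {
	// These groups are started synchronously, in this order.
	for _, phase := range []string{"Webhooks", "Caches", "Others"} {
		record(phase)
	}

	elected := make(chan struct{})
	go func() {
		// With no leader election configured, the manager is treated as
		// elected: start the leader-election runnables, then close the channel.
		record("LeaderElection")
		close(elected)
	}()
	return elected
}

// demo runs startPhases and returns the recorded order once elected is closed.
func demo() []string {
	var mu sync.Mutex
	var order []string
	elected := startPhases(func(phase string) {
		mu.Lock()
		defer mu.Unlock()
		order = append(order, phase)
	})

	<-elected // closing a channel wakes every receiver, like Elected()
	mu.Lock()
	defer mu.Unlock()
	return append([]string(nil), order...)
}

func main() {
	fmt.Println(demo())
}
```

A closed channel works well as a one-shot broadcast: any number of goroutines can wait on it, and all of them proceed once it is closed.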

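The start function above starts the Runnables (Controllers) we added to the Manager earlier, one group at a time: Webhooks, then Caches, then Others (the runnables that do not need leader election), and finally, in a separate goroutine, the LeaderElection group. Each group is started through runnableGroup.Start: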

// pkg/manager/runnable_group.go

// Start starts the group and waits for all
// initially registered runnables to start.
// It can only be called once, subsequent calls have no effect.
func (r *runnableGroup) Start(ctx context.Context) error {
    var retErr error

    r.startOnce.Do(func() {
        defer close(r.startReadyCh)

        // Start the internal reconciler.
        go r.reconcile()

        // Start the group and queue up all
        // the runnables that were added prior.
        r.start.Lock()
        r.started = true
        for _, rn := range r.startQueue {
            rn.signalReady = true
            r.ch <- rn
        }
        r.start.Unlock()

        // If we don't have any queue, return.
        if len(r.startQueue) == 0 {
            return
        }

        // Wait for all runnables to signal.
        for {
            select {
            case <-ctx.Done():
                if err := ctx.Err(); !errors.Is(err, context.Canceled) {
                    retErr = err
                }
            case rn := <-r.startReadyCh:
                for i, existing := range r.startQueue {
                    if existing == rn {
                        // Remove the item from the start queue.
                        r.startQueue = append(r.startQueue[:i], r.startQueue[i+1:]...)
                        break
                    }
                }
                // We're done waiting if the queue is empty, return.
                if len(r.startQueue) == 0 {
                    return
                }
            }
        }
    })

    return retErr
}

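The queued runnables are handed to the group's internal reconcile loop (not shown here), which in the end still calls the Runnable's Start function, and for us that is the Controller's Start function. That function starts a control loop that keeps consuming items from the work queue and hands them to a Reconciler interface for processing, which is the Reconcile(ctx context.Context, req Request) (Result, error) function where we implement our business logic.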
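The control loop described above can be sketched with the standard library alone. This is a deliberately simplified model and not the library's code: the real controller uses a rate-limited work queue and several workers, whereas the sketch uses a plain slice as a FIFO and one loop. Request, Result, Reconciler, ReconcileFunc, drain, and demo are defined locally for illustration, and the maxAttempts limit is an assumption added so that the loop always terminates.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Request and Result are local, simplified stand-ins for the reconcile types.
type Request struct{ Namespace, Name string }
type Result struct{ Requeue bool }

// Reconciler has the same method shape as the interface we implement.
type Reconciler interface {
	Reconcile(ctx context.Context, req Request) (Result, error)
}

// ReconcileFunc adapts a function to the Reconciler interface.
type ReconcileFunc func(ctx context.Context, req Request) (Result, error)

func (f ReconcileFunc) Reconcile(ctx context.Context, req Request) (Result, error) {
	return f(ctx, req)
}

// drain pops requests from a FIFO queue and calls Reconcile for each one.
// A request is put back when Reconcile fails or asks for a requeue, up to
// maxAttempts tries. It returns how often each request was attempted.
func drain(ctx context.Context, r Reconciler, initial []Request, maxAttempts int) map[Request]int {
	attempts := make(map[Request]int)
	queue := append([]Request(nil), initial...)
	for len(queue) > 0 {
		if ctx.Err() != nil {
			break // stop consuming once the context is cancelled
		}
		req := queue[0]
		queue = queue[1:]
		attempts[req]++
		res, err := r.Reconcile(ctx, req)
		if (err != nil || res.Requeue) && attempts[req] < maxAttempts {
			queue = append(queue, req)
		}
	}
	return attempts
}

// demo reconciles two requests, one of which fails on its first attempt.
func demo() map[Request]int {
	failOnce := map[Request]bool{{Namespace: "default", Name: "flaky"}: true}
	r := ReconcileFunc(func(ctx context.Context, req Request) (Result, error) {
		if failOnce[req] {
			delete(failOnce, req)
			return Result{}, errors.New("transient error")
		}
		return Result{}, nil
	})
	initial := []Request{
		{Namespace: "default", Name: "ok"},
		{Namespace: "default", Name: "flaky"},
	}
	return drain(context.Background(), r, initial, 3)
}

func main() {
	fmt.Println(demo())
}
```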


That completes the Manager's whole startup process, including how the Manager is initialized, how it is tied to a Controller, and how it starts the Controller.
