Implementing custom k8s HPA metrics with Prometheus (Part 4)

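In the previous part of this series, Implementing custom k8s HPA metrics with Prometheus (Part 3), we introduced the library needed to write a minimal custom metrics API server. That library is the foundation of the Prometheus adapter. In this part we analyze the Prometheus adapter itself.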


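By default, the adapter connects to the k8s apiserver using the Kubernetes in-cluster config. It takes the following additional flags that configure how it talks to Prometheus and to the k8s cluster: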

--lister-kubeconfig=<path-to-kubeconfig>: This configures how the adapter talks to a Kubernetes API server in order to list objects when operating with label selectors. By default, it will use in-cluster config.

--metrics-relist-interval=<duration>: This is the interval at which to update the cache of available metrics from Prometheus.

--rate-interval=<duration>: This is the duration used when requesting rate metrics from Prometheus. It must be larger than your Prometheus collection interval.

--prometheus-url=<url>: This is the URL used to connect to Prometheus. It will eventually contain query parameters to configure the connection.



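The adapter periodically fetches the names of the available metrics from Prometheus and only exposes series of the following two forms:

  • “container” metrics (cAdvisor container metrics): series whose names begin with container_ and that have non-empty namespace and pod_name labels.
  • “namespaced” metrics (metrics describing namespaced Kubernetes objects): series with a non-empty namespace label (and whose names do not begin with container_).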
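The two rules above can be expressed as a small predicate. This is a sketch under stated assumptions, not the adapter's code: seriesClass, classify and the label map are hypothetical names, a missing label is treated like an empty one (which is how Prometheus matchers behave), and the container_name != "POD" exclusion is taken from the selector in updateMetrics rather than from the two bullets.

```go
package main

import (
	"fmt"
	"strings"
)

// seriesClass is a hypothetical name for the categories described above.
type seriesClass string

const (
	containerSeries  seriesClass = "container"
	namespacedSeries seriesClass = "namespaced"
	ignoredSeries    seriesClass = "ignored"
)

// classify applies the two rules to a series name and its label set.
// A missing label reads as "", matching Prometheus's label!="" semantics.
func classify(name string, labels map[string]string) seriesClass {
	if strings.HasPrefix(name, "container_") {
		// the pause container ("POD") is excluded by the adapter's selector
		if labels["namespace"] != "" && labels["pod_name"] != "" && labels["container_name"] != "POD" {
			return containerSeries
		}
		return ignoredSeries
	}
	if labels["namespace"] != "" {
		return namespacedSeries
	}
	return ignoredSeries
}

func main() {
	fmt.Println(classify("container_cpu_usage_seconds_total",
		map[string]string{"namespace": "default", "pod_name": "web-0", "container_name": "web"})) // container
	fmt.Println(classify("http_requests_total",
		map[string]string{"namespace": "default", "pod": "web-0"})) // namespaced
	fmt.Println(classify("node_load1", map[string]string{"instance": "node-1"})) // ignored
}
```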



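Prometheus metrics are converted into custom metrics API metrics as follows:

  • The metric name and type are determined:
    • For container metrics, the container_ prefix is removed.
    • If the metric has the _total suffix, it is marked as a counter metric and the suffix is removed.
    • If the metric has the _seconds_total suffix, it is marked as a seconds counter metric and the suffix is removed.
    • If the metric has none of the above suffixes, it is marked as a gauge metric and the metric name is kept as-is.
  • Resources are associated with the metric:
    • Container metrics are always associated with pods.
    • For non-container metrics, each label on the series is considered. If the label names a resource available on the server (without the group), the metric is associated with that resource. A metric can be associated with several resources.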
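The naming rules translate almost directly into string operations. A minimal sketch, assuming the hypothetical names metricName and seriesKind; note that _seconds_total has to be checked before _total, otherwise a seconds counter would be classified as a plain counter named ..._seconds:

```go
package main

import (
	"fmt"
	"strings"
)

// seriesKind mirrors the three metric types described above; the type and
// function names here are illustrative, not the adapter's own.
type seriesKind string

const (
	gauge          seriesKind = "gauge"
	counter        seriesKind = "counter"
	secondsCounter seriesKind = "seconds-counter"
)

// metricName drops container_ for container metrics, then classifies the
// metric by suffix and strips that suffix.
func metricName(series string, isContainer bool) (string, seriesKind) {
	name := series
	if isContainer {
		name = strings.TrimPrefix(name, "container_")
	}
	switch {
	case strings.HasSuffix(name, "_seconds_total"): // must precede the _total case
		return strings.TrimSuffix(name, "_seconds_total"), secondsCounter
	case strings.HasSuffix(name, "_total"):
		return strings.TrimSuffix(name, "_total"), counter
	default:
		return name, gauge
	}
}

func main() {
	for _, s := range []string{"container_cpu_usage_seconds_total", "container_memory_usage_bytes"} {
		name, kind := metricName(s, true)
		fmt.Println(s, "->", name, kind)
	}
	name, kind := metricName("http_requests_total", false)
	fmt.Println("http_requests_total ->", name, kind)
}
```

For example, container_cpu_usage_seconds_total becomes cpu_usage (seconds counter), and http_requests_total becomes http_requests (counter).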

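When retrieving counter and seconds-counter metrics, the adapter requests them as a rate over the configured window (--rate-interval). For metrics associated with multiple resources, the adapter requests the metric aggregated (summed) over all the resources that were not requested.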



func (o PrometheusAdapterServerOptions) RunCustomMetricsAdapterServer(stopCh <-chan struct{}) error {
	config, err := o.Config()
	if err != nil {
		return err
	}

	config.GenericConfig.EnableMetrics = true

	// use --lister-kubeconfig if given, otherwise fall back to the in-cluster config
	var clientConfig *rest.Config
	if len(o.RemoteKubeConfigFile) > 0 {
		loadingRules := &clientcmd.ClientConfigLoadingRules{ExplicitPath: o.RemoteKubeConfigFile}
		loader := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(loadingRules, &clientcmd.ConfigOverrides{})

		clientConfig, err = loader.ClientConfig()
	} else {
		clientConfig, err = rest.InClusterConfig()
	}
	if err != nil {
		return fmt.Errorf("unable to construct lister client config to initialize provider: %v", err)
	}

	discoveryClient, err := discovery.NewDiscoveryClientForConfig(clientConfig)
	if err != nil {
		return fmt.Errorf("unable to construct discovery client for dynamic client: %v", err)
	}

	// dynamicMapper maps between resources and kinds, rebuilt from discovery every o.DiscoveryInterval
	dynamicMapper, err := dynamicmapper.NewRESTMapper(discoveryClient, apimeta.InterfacesForUnstructured, o.DiscoveryInterval)
	if err != nil {
		return fmt.Errorf("unable to construct dynamic discovery mapper: %v", err)
	}

	// NewClientPool returns no error, so there is nothing to check here
	clientPool := dynamic.NewClientPool(clientConfig, dynamicMapper, dynamic.LegacyAPIPathResolverFunc)

	// TODO: actually configure this client (strip query vars, etc)
	baseURL, err := url.Parse(o.PrometheusURL)
	if err != nil {
		// report the raw flag value: baseURL is nil when parsing fails
		return fmt.Errorf("invalid Prometheus URL %q: %v", o.PrometheusURL, err)
	}
	genericPromClient := prom.NewGenericAPIClient(http.DefaultClient, baseURL)
	instrumentedGenericPromClient := mprom.InstrumentGenericAPIClient(genericPromClient, baseURL.String())
	promClient := prom.NewClientForAPI(instrumentedGenericPromClient)

	cmProvider := cmprov.NewPrometheusProvider(dynamicMapper, clientPool, promClient, o.MetricsRelistInterval, o.RateInterval, stopCh)

	server, err := config.Complete().New("prometheus-custom-metrics-adapter", cmProvider)
	if err != nil {
		return err
	}
	return server.GenericAPIServer.PrepareRun().Run(stopCh)
}

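RunCustomMetricsAdapterServer first builds the client configuration (from --lister-kubeconfig if it is set, otherwise from the in-cluster config). It then creates discoveryClient to talk to the k8s apiserver and uses it to build dynamicMapper, which holds the mapping between Kubernetes resources and kinds. It also initializes promClient, the Prometheus client that supplies metrics to the adapter. Finally it initializes the provider, as shown below: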
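The first branch of RunCustomMetricsAdapterServer, choosing between --lister-kubeconfig and the in-cluster config, can be sketched with the standard library alone. This is a simplified illustration: configSource is a hypothetical helper, and it only reproduces the environment check that rest.InClusterConfig performs (KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT), not the service-account token handling:

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// configSource reports which client configuration would be used: an explicit
// kubeconfig path wins, otherwise in-cluster config, which requires the
// service environment variables that Kubernetes injects into every pod.
// getenv is injectable so the logic can be tested without a cluster.
func configSource(kubeconfigPath string, getenv func(string) string) (string, error) {
	if kubeconfigPath != "" {
		return "kubeconfig:" + kubeconfigPath, nil
	}
	if getenv("KUBERNETES_SERVICE_HOST") == "" || getenv("KUBERNETES_SERVICE_PORT") == "" {
		return "", errors.New("not running in a cluster and no --lister-kubeconfig given")
	}
	return "in-cluster", nil
}

func main() {
	src, err := configSource("", os.Getenv)
	fmt.Println(src, err)
}
```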

func NewPrometheusProvider(mapper apimeta.RESTMapper, kubeClient dynamic.ClientPool, promClient prom.Client, updateInterval time.Duration, rateInterval time.Duration, stopChan <-chan struct{}) provider.CustomMetricsProvider {
	lister := &cachingMetricsLister{
		updateInterval: updateInterval,
		promClient:     promClient,

		SeriesRegistry: &basicSeriesRegistry{
			namer: metricNamer{
				// TODO: populate the overrides list
				overrides: nil,
				mapper:    mapper,
			},
		},
	}

	// start the background loop that periodically refreshes the series cache
	lister.RunUntil(stopChan)

	return &prometheusProvider{
		mapper:     mapper,
		kubeClient: kubeClient,
		promClient: promClient,

		SeriesRegistry: lister,

		rateInterval: rateInterval,
	}
}


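func (l *cachingMetricsLister) RunUntil(stopChan <-chan struct{}) {
	// re-run updateMetrics every updateInterval until stopChan is closed
	go wait.Until(func() {
		if err := l.updateMetrics(); err != nil {
			utilruntime.HandleError(err)
		}
	}, l.updateInterval, stopChan)
}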

func (l *cachingMetricsLister) updateMetrics() error {
	startTime := pmodel.Now().Add(-1 * l.updateInterval)

	// container-specific metrics from cAdvisor have their own form, and need special handling
	containerSel := prom.MatchSeries("", prom.NameMatches("^container_.*"), prom.LabelNeq("container_name", "POD"), prom.LabelNeq("namespace", ""), prom.LabelNeq("pod_name", ""))
	namespacedSel := prom.MatchSeries("", prom.LabelNeq("namespace", ""), prom.NameNotMatches("^container_.*"))
	// TODO: figure out how to determine which metrics on non-namespaced objects are kubernetes-related

	// TODO: use an actual context here
	series, err := l.promClient.Series(context.Background(), pmodel.Interval{startTime, 0}, containerSel, namespacedSel)
	if err != nil {
		return fmt.Errorf("unable to update list of all available metrics: %v", err)
	}

	glog.V(10).Infof("Set available metric list from Prometheus to: %v", series)

	// hand the fetched series to the registry (basicSeriesRegistry.SetSeries)
	l.SetSeries(series)

	return nil
}


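The two selectors in updateMetrics therefore ask Prometheus only for the following series:

  1. Container metrics: series whose names begin with container_. A series is filtered out if its container_name label is "POD" or if its namespace or pod_name label is empty.
  2. Namespaced metrics: series whose names do not begin with container_ and that have a namespace label. Series whose namespace label is empty (or missing) are filtered out.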


The fetched series are handed to SetSeries, which sorts them into container, namespaced and root-scoped series and registers the resulting metrics:

func (r *basicSeriesRegistry) SetSeries(newSeries []prom.Series) error {
	newInfo := make(map[provider.MetricInfo]seriesInfo)
	for _, series := range newSeries {
		if strings.HasPrefix(series.Name, "container_") {
			r.namer.processContainerSeries(series, newInfo)
		} else if namespaceLabel, hasNamespaceLabel := series.Labels["namespace"]; hasNamespaceLabel && namespaceLabel != "" {
			// we also handle namespaced metrics here as part of the resource-association logic
			if err := r.namer.processNamespacedSeries(series, newInfo); err != nil {
				glog.Errorf("Unable to process namespaced series %q: %v", series.Name, err)
			}
		} else {
			if err := r.namer.processRootScopedSeries(series, newInfo); err != nil {
				glog.Errorf("Unable to process root-scoped series %q: %v", series.Name, err)
			}
		}
	}

	newMetrics := make([]provider.MetricInfo, 0, len(newInfo))
	for info := range newInfo {
		newMetrics = append(newMetrics, info)
	}

	// swap in the new data under the write lock
	r.mu.Lock()
	defer r.mu.Unlock()

	r.info = newInfo
	r.metrics = newMetrics

	return nil
}
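Because provider.MetricInfo is a comparable struct used as a map key, SetSeries deduplicates automatically: Prometheus returns one series per label set (one per pod, per container, and so on), yet each metric is registered only once per resource. A simplified sketch with hypothetical types that skips name processing and only looks at the pod label:

```go
package main

import (
	"fmt"
	"sort"
)

// metricInfo is a simplified stand-in for provider.MetricInfo; because it
// is a comparable struct it can be used as a map key.
type metricInfo struct {
	Resource   string
	Namespaced bool
	Metric     string
}

// series is a simplified Prometheus series: a name plus its labels.
type series struct {
	Name   string
	Labels map[string]string
}

// podMetrics registers every series that carries a pod label. Many series
// map to the same key, so the map collapses them into one metric each.
func podMetrics(all []series) []metricInfo {
	infos := make(map[metricInfo]struct{})
	for _, s := range all {
		if _, ok := s.Labels["pod"]; ok {
			infos[metricInfo{Resource: "pods", Namespaced: true, Metric: s.Name}] = struct{}{}
		}
	}
	out := make([]metricInfo, 0, len(infos))
	for info := range infos {
		out = append(out, info)
	}
	// map iteration order is random; sort for stable output
	sort.Slice(out, func(i, j int) bool { return out[i].Metric < out[j].Metric })
	return out
}

func main() {
	all := []series{
		{Name: "http_requests_total", Labels: map[string]string{"namespace": "default", "pod": "web-0"}},
		{Name: "http_requests_total", Labels: map[string]string{"namespace": "default", "pod": "web-1"}},
		{Name: "queue_length", Labels: map[string]string{"namespace": "default", "pod": "worker-0"}},
	}
	fmt.Println(podMetrics(all))
}
```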




// processContainerSeries performs special work to extract metric definitions
// from cAdvisor-sourced container metrics, which don't particularly follow any useful conventions consistently.
func (n *metricNamer) processContainerSeries(series prom.Series, infos map[provider.MetricInfo]seriesInfo) {

	originalName := series.Name

	var name string
	metricKind := GaugeSeries
	if override, hasOverride := n.overrides[series.Name]; hasOverride {
		name = override.metricName
		metricKind = override.kind
	} else {
		// chop off the "container_" prefix
		series.Name = series.Name[10:]
		name, metricKind = n.metricNameFromSeries(series)
	}

	info := provider.MetricInfo{
		GroupResource: schema.GroupResource{Resource: "pods"},
		Namespaced:    true,
		Metric:        name,
	}

	infos[info] = seriesInfo{
		kind:        metricKind,
		baseSeries:  prom.Series{Name: originalName},
		isContainer: true,
	}
}

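processContainerSeries classifies the series collected from cAdvisor and converts their names. If an override is configured for a series, the override's name and kind are used (v0.2.0 provides no way to configure overrides; newer versions do). Otherwise the container_ prefix is stripped and metricNameFromSeries determines the metric type from the remaining name: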

// metricNameFromSeries extracts a metric name from a series name, and indicates
// whether or not that series was a counter.  It also has special logic to deal with time-based
// counters, which generally get converted to milli-unit rate metrics.
func (n *metricNamer) metricNameFromSeries(series prom.Series) (name string, kind SeriesType) {
	kind = GaugeSeries
	name = series.Name
	if strings.HasSuffix(name, "_total") {
		kind = CounterSeries
		name = name[:len(name)-6]

		if strings.HasSuffix(name, "_seconds") {
			kind = SecondsCounterSeries
			name = name[:len(name)-8]
		}
	}

	return
}





Non-container series that carry a non-empty namespace label are handled by processNamespacedSeries, which registers one metric for each resource that the series' labels refer to:

// processNamespacedSeries adds the metric info for the given generic namespaced series to
// the map of metric info.
func (n *metricNamer) processNamespacedSeries(series prom.Series, infos map[provider.MetricInfo]seriesInfo) error {
	// NB: all errors must occur *before* we save the series info
	name, metricKind := n.metricNameFromSeries(series)
	resources, err := n.groupResourcesFromSeries(series)
	if err != nil {
		return fmt.Errorf("unable to process prometheus series %s: %v", series.Name, err)
	}

	// we add one metric for each resource that this could describe
	for _, resource := range resources {
		info := provider.MetricInfo{
			GroupResource: resource,
			Namespaced:    true,
			Metric:        name,
		}

		// metrics describing namespaces aren't considered to be namespaced
		if resource == (schema.GroupResource{Resource: "namespaces"}) {
			info.Namespaced = false
		}

		infos[info] = seriesInfo{
			kind:       metricKind,
			baseSeries: prom.Series{Name: series.Name},
		}
	}

	return nil
}


// groupResourcesFromSeries collects the possible group-resources that this series could describe by
// going through each label, checking to see if it corresponds to a known resource.  For instance,
// a series `ingress_http_hits_total{pod="foo",service="bar",ingress="baz",namespace="ns"}`
// would return three GroupResources: "pods", "services", and "ingresses".
// Returned MetricInfo is equivalent to the "normalized" info produced by metricInfo.Normalized.
func (n *metricNamer) groupResourcesFromSeries(series prom.Series) ([]schema.GroupResource, error) {
	var res []schema.GroupResource
	for label := range series.Labels {
		// TODO: figure out a way to let people specify a fully-qualified name in label-form
		gvr, err := n.mapper.ResourceFor(schema.GroupVersionResource{Resource: string(label)})
		if err != nil {
			if apimeta.IsNoMatchError(err) {
				// the label does not name a known resource; skip it
				continue
			}
			return nil, err
		}
		res = append(res, gvr.GroupResource())
	}

	return res, nil
}
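The association step can be illustrated with a hard-coded table standing in for the dynamic RESTMapper. In this sketch (knownResources and infosForSeries are hypothetical, and the table only knows four resources), each label of the ingress_http_hits_total example that names a resource yields one metric, unknown labels are skipped as in the IsNoMatchError branch, and the namespaces entry is marked non-namespaced as in processNamespacedSeries:

```go
package main

import (
	"fmt"
	"sort"
)

// knownResources is a hypothetical stand-in for the dynamic RESTMapper: it
// maps singular and plural label names to a resource.
var knownResources = map[string]string{
	"pod": "pods", "pods": "pods",
	"service": "services", "services": "services",
	"ingress": "ingresses", "ingresses": "ingresses",
	"namespace": "namespaces", "namespaces": "namespaces",
}

type metricInfo struct {
	Resource   string
	Namespaced bool
	Metric     string
}

// infosForSeries returns one metricInfo per label that names a known
// resource; metrics about namespaces themselves are not namespaced.
func infosForSeries(metric string, labels map[string]string) []metricInfo {
	var out []metricInfo
	for label := range labels {
		res, ok := knownResources[label]
		if !ok {
			continue // not a resource label, e.g. "code"
		}
		out = append(out, metricInfo{Resource: res, Namespaced: res != "namespaces", Metric: metric})
	}
	// label iteration order is random; sort for stable output
	sort.Slice(out, func(i, j int) bool { return out[i].Resource < out[j].Resource })
	return out
}

func main() {
	labels := map[string]string{"pod": "foo", "service": "bar", "ingress": "baz", "namespace": "ns", "code": "200"}
	for _, info := range infosForSeries("ingress_http_hits", labels) {
		fmt.Printf("%+v\n", info)
	}
}
```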



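Series handled by neither processContainerSeries nor processNamespacedSeries are called root-scoped series. Note that both selectors in updateMetrics require a non-empty namespace label, so in v0.2.0 this branch is not normally reached (compare the TODO about non-namespaced objects in updateMetrics).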

// processRootScopedSeries adds the metric info for the given generic root-scoped series to
// the map of metric info.
func (n *metricNamer) processRootScopedSeries(series prom.Series, infos map[provider.MetricInfo]seriesInfo) error {
	// NB: all errors must occur *before* we save the series info
	name, metricKind := n.metricNameFromSeries(series)
	resources, err := n.groupResourcesFromSeries(series)
	if err != nil {
		return fmt.Errorf("unable to process prometheus series %s: %v", series.Name, err)
	}

	// we add one metric for each resource that this could describe
	for _, resource := range resources {
		info := provider.MetricInfo{
			GroupResource: resource,
			Namespaced:    false,
			Metric:        name,
		}

		infos[info] = seriesInfo{
			kind:       metricKind,
			baseSeries: prom.Series{Name: series.Name},
		}
	}

	return nil
}


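Compare this with the special case in processNamespacedSeries, where a metric describing the namespaces resource is registered as non-namespaced; processRootScopedSeries needs no such check because it always sets Namespaced to false:

	// metrics describing namespaces aren't considered to be namespaced
	if resource == (schema.GroupResource{Resource: "namespaces"}) {
		info.Namespaced = false
	}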


All of these functions fill in basicSeriesRegistry, which implements the SeriesRegistry interface:

// SeriesRegistry provides conversions between Prometheus series and MetricInfo
type SeriesRegistry interface {
	// SetSeries replaces the known series in this registry
	SetSeries(series []prom.Series) error
	// ListAllMetrics lists all metrics known to this registry
	ListAllMetrics() []provider.MetricInfo
	// QueryForMetric looks up the minimum required series information to make a query for the given metric
	// against the given resource (namespace may be empty for non-namespaced resources)
	QueryForMetric(info provider.MetricInfo, namespace string, resourceNames ...string) (kind SeriesType, query prom.Selector, groupBy string, found bool)
	// MatchValuesToNames matches result values to resource names for the given metric and value set
	MatchValuesToNames(metricInfo provider.MetricInfo, values pmodel.Vector) (matchedValues map[string]pmodel.SampleValue, found bool)
}

Take QueryForMetric as an example: for a given MetricInfo it returns the series type, the Prometheus series selector to query, and the label to group the results by:

func (r *basicSeriesRegistry) QueryForMetric(metricInfo provider.MetricInfo, namespace string, resourceNames ...string) (kind SeriesType, query prom.Selector, groupBy string, found bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()

	if len(resourceNames) == 0 {
		glog.Errorf("no resource names requested while producing a query for metric %s", metricInfo.String())
		return 0, "", "", false
	}

	metricInfo, singularResource, err := metricInfo.Normalized(r.namer.mapper)
	if err != nil {
		glog.Errorf("unable to normalize group resource while producing a query: %v", err)
		return 0, "", "", false
	}

	// TODO: support container metrics
	if info, found := r.info[metricInfo]; found {
		// one name: exact match; several names: regex alternation
		targetValue := resourceNames[0]
		matcher := prom.LabelEq
		if len(resourceNames) > 1 {
			targetValue = strings.Join(resourceNames, "|")
			matcher = prom.LabelMatches
		}

		var expressions []string
		if info.isContainer {
			expressions = []string{matcher("pod_name", targetValue), prom.LabelNeq("container_name", "POD")}
			groupBy = "pod_name"
		} else {
			// TODO: copy base series labels?
			expressions = []string{matcher(singularResource, targetValue)}
			groupBy = singularResource
		}

		if metricInfo.Namespaced {
			expressions = append(expressions, prom.LabelEq("namespace", namespace))
		}

		return info.kind, prom.MatchSeries(info.baseSeries.Name, expressions...), groupBy, true
	}

	glog.V(10).Infof("metric %v not registered", metricInfo)
	return 0, "", "", false
}


The selector itself is assembled by prom.MatchSeries:

// MatchSeries takes a series name, and optionally some label expressions, and returns a series selector.
// TODO: validate series name and expressions?
func MatchSeries(name string, labelExpressions ...string) Selector {
	if len(labelExpressions) == 0 {
		return Selector(name)
	}

	return Selector(fmt.Sprintf("%s{%s}", name, strings.Join(labelExpressions, ",")))
}
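The selector that QueryForMetric hands back can be reproduced in a few lines. A sketch for the non-container, pods case, assuming a hypothetical podSelector helper and %q-style quoting of label values:

```go
package main

import (
	"fmt"
	"strings"
)

// podSelector mimics the selector built for a non-container metric on pods:
// one pod name gives an exact match (=), several are joined with "|" into a
// regex match (=~), and a non-empty namespace adds a namespace matcher.
func podSelector(seriesName, namespace string, pods ...string) string {
	if len(pods) == 0 {
		return "" // QueryForMetric also refuses to build a query without names
	}
	var exprs []string
	if len(pods) == 1 {
		exprs = append(exprs, fmt.Sprintf("pod=%q", pods[0]))
	} else {
		exprs = append(exprs, fmt.Sprintf("pod=~%q", strings.Join(pods, "|")))
	}
	if namespace != "" {
		exprs = append(exprs, fmt.Sprintf("namespace=%q", namespace))
	}
	// same shape as MatchSeries: name{expr1,expr2,...}
	return fmt.Sprintf("%s{%s}", seriesName, strings.Join(exprs, ","))
}

func main() {
	fmt.Println(podSelector("http_requests_total", "default", "web-0"))
	fmt.Println(podSelector("http_requests_total", "default", "web-0", "web-1"))
}
```

Prometheus anchors regex matchers at both ends, so pod=~"web-0|web-1" matches exactly those two pods and not, say, web-00.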

Where is QueryForMetric called? Searching the code leads to buildQuery. As its name suggests, it builds the Prometheus query, runs it, and returns the results:

func (p *prometheusProvider) buildQuery(info provider.MetricInfo, namespace string, names ...string) (pmodel.Vector, error) {
	kind, baseQuery, groupBy, found := p.QueryForMetric(info, namespace, names...)
	if !found {
		return nil, provider.NewMetricNotFoundError(info.GroupResource, info.Metric)
	}

	// counters are queried as a rate over the configured window; gauges as-is
	fullQuery := baseQuery
	switch kind {
	case CounterSeries:
		fullQuery = prom.Selector(fmt.Sprintf("rate(%s[%s])", baseQuery, pmodel.Duration(p.rateInterval).String()))
	case SecondsCounterSeries:
		// TODO: further modify for seconds?
		fullQuery = prom.Selector(fmt.Sprintf("rate(%s[%s])", baseQuery, pmodel.Duration(p.rateInterval).String()))
	}

	// NB: too small of a rate interval will return no results...

	// sum over all other dimensions of this query (e.g. if we select on route, sum across all pods,
	// but if we select on pods, sum across all routes), and split by the dimension of our resource
	// TODO: return/populate the by list in SeriesForMetric
	fullQuery = prom.Selector(fmt.Sprintf("sum(%s) by (%s)", fullQuery, groupBy))

	// TODO: use an actual context
	queryResults, err := p.promClient.Query(context.Background(), pmodel.Now(), fullQuery)
	if err != nil {
		glog.Errorf("unable to fetch metrics from prometheus: %v", err)
		// don't leak implementation details to the user
		return nil, apierr.NewInternalError(fmt.Errorf("unable to fetch metrics"))
	}

	if queryResults.Type != pmodel.ValVector {
		glog.Errorf("unexpected results from prometheus: expected %s, got %s on results %v", pmodel.ValVector, queryResults.Type, queryResults)
		return nil, apierr.NewInternalError(fmt.Errorf("unable to fetch metrics"))
	}

	return *queryResults.Vector, nil
}
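buildQuery returns the decoded vector from promClient.Query. To show what that vector looks like on the wire, here is a sketch that decodes a raw /api/v1/query JSON response and maps each group-by label value to its sample, rejecting non-vector results as buildQuery does. queryResponse and valuesByLabel are hypothetical; the adapter itself decodes into the Prometheus common model types:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// queryResponse models the parts of the /api/v1/query reply used here: each
// vector sample carries its labels and a [timestamp, "value"] pair whose
// value is encoded as a string.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"`
		} `json:"result"`
	} `json:"data"`
}

// valuesByLabel decodes the reply to a query such as sum(...) by (pod) and
// maps each value of the groupBy label to its sample value.
func valuesByLabel(body []byte, groupBy string) (map[string]float64, error) {
	var resp queryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" || resp.Data.ResultType != "vector" {
		return nil, fmt.Errorf("unexpected result: status=%q type=%q", resp.Status, resp.Data.ResultType)
	}
	out := make(map[string]float64, len(resp.Data.Result))
	for _, r := range resp.Data.Result {
		s, ok := r.Value[1].(string)
		if !ok {
			return nil, fmt.Errorf("sample value is not a string: %v", r.Value[1])
		}
		v, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return nil, err
		}
		out[r.Metric[groupBy]] = v
	}
	return out, nil
}

func main() {
	body := []byte(`{"status":"success","data":{"resultType":"vector","result":[{"metric":{"pod":"web-0"},"value":[1530000000.1,"2.5"]}]}}`)
	m, err := valuesByLabel(body, "pod")
	fmt.Println(m, err)
}
```

Keying the result by the group-by label is what lets the provider match values back to resource names, which is the job MatchValuesToNames does in the adapter.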


The HPA controller's REST metrics client

When writing an HPA YAML manifest, the metric type has three options: Resource, Pods, and Object. In the HPA controller they are handled by GetResourceMetric, GetRawMetric, and GetObjectMetric respectively:

// GetResourceMetric gets the given resource metric (and an associated oldest timestamp)
// for all pods matching the specified selector in the given namespace
func (c *resourceMetricsClient) GetResourceMetric(resource v1.ResourceName, namespace string, selector labels.Selector) (PodMetricsInfo, time.Time, error) {
	metrics, err := c.client.PodMetricses(namespace).List(metav1.ListOptions{LabelSelector: selector.String()})
	if err != nil {
		return nil, time.Time{}, fmt.Errorf("unable to fetch metrics from API: %v", err)
	}

	if len(metrics.Items) == 0 {
		return nil, time.Time{}, fmt.Errorf("no metrics returned from heapster")
	}

	res := make(PodMetricsInfo, len(metrics.Items))

	for _, m := range metrics.Items {
		podSum := int64(0)
		missing := len(m.Containers) == 0
		for _, c := range m.Containers {
			resValue, found := c.Usage[v1.ResourceName(resource)]
			if !found {
				missing = true
				glog.V(2).Infof("missing resource metric %v for container %s in pod %s/%s", resource, c.Name, namespace, m.Name)
				break // containers loop
			}
			podSum += resValue.MilliValue()
		}

		if !missing {
			res[m.Name] = int64(podSum)
		}
	}

	timestamp := metrics.Items[0].Timestamp.Time

	return res, timestamp, nil
}
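The per-pod aggregation in GetResourceMetric is worth spelling out. A sketch with simplified, hypothetical types (values already in milli-units): a pod's value is the sum over its containers, and a pod with no containers, or with any container lacking the resource, is dropped rather than under-counted:

```go
package main

import "fmt"

// containerUsage and podUsage are simplified stand-ins for the PodMetrics
// types; usage values are already in milli-units.
type containerUsage struct {
	Name  string
	Usage map[string]int64
}

type podUsage struct {
	Name       string
	Containers []containerUsage
}

// sumPerPod sums a resource over each pod's containers, skipping pods whose
// data is incomplete.
func sumPerPod(resource string, pods []podUsage) map[string]int64 {
	res := make(map[string]int64, len(pods))
	for _, p := range pods {
		sum := int64(0)
		missing := len(p.Containers) == 0
		for _, c := range p.Containers {
			v, ok := c.Usage[resource]
			if !ok {
				missing = true
				break
			}
			sum += v
		}
		if !missing {
			res[p.Name] = sum
		}
	}
	return res
}

func main() {
	pods := []podUsage{
		{Name: "web-0", Containers: []containerUsage{
			{Name: "app", Usage: map[string]int64{"cpu": 120}},
			{Name: "sidecar", Usage: map[string]int64{"cpu": 30}},
		}},
		{Name: "web-1", Containers: []containerUsage{{Name: "app", Usage: map[string]int64{}}}},
	}
	fmt.Println(sumPerPod("cpu", pods))
}
```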

// customMetricsClient implements the custom-metrics-related parts of MetricsClient,
// using data from the custom metrics API.
type customMetricsClient struct {
	client customclient.CustomMetricsClient
}

// GetRawMetric gets the given metric (and an associated oldest timestamp)
// for all pods matching the specified selector in the given namespace
func (c *customMetricsClient) GetRawMetric(metricName string, namespace string, selector labels.Selector) (PodMetricsInfo, time.Time, error) {
	metrics, err := c.client.NamespacedMetrics(namespace).GetForObjects(schema.GroupKind{Kind: "Pod"}, selector, metricName)
	if err != nil {
		return nil, time.Time{}, fmt.Errorf("unable to fetch metrics from API: %v", err)
	}

	if len(metrics.Items) == 0 {
		return nil, time.Time{}, fmt.Errorf("no metrics returned from custom metrics API")
	}

	res := make(PodMetricsInfo, len(metrics.Items))
	for _, m := range metrics.Items {
		res[m.DescribedObject.Name] = m.Value.MilliValue()
	}

	timestamp := metrics.Items[0].Timestamp.Time

	return res, timestamp, nil
}

// GetObjectMetric gets the given metric (and an associated timestamp) for the given
// object in the given namespace
func (c *customMetricsClient) GetObjectMetric(metricName string, namespace string, objectRef *autoscaling.CrossVersionObjectReference) (int64, time.Time, error) {
	gvk := schema.FromAPIVersionAndKind(objectRef.APIVersion, objectRef.Kind)
	var metricValue *customapi.MetricValue
	var err error
	if gvk.Kind == "Namespace" && gvk.Group == "" {
		// handle namespace separately
		// NB: we ignore namespace name here, since CrossVersionObjectReference isn't
		// supposed to allow you to escape your namespace
		metricValue, err = c.client.RootScopedMetrics().GetForObject(gvk.GroupKind(), namespace, metricName)
	} else {
		metricValue, err = c.client.NamespacedMetrics(namespace).GetForObject(gvk.GroupKind(), objectRef.Name, metricName)
	}

	if err != nil {
		return 0, time.Time{}, fmt.Errorf("unable to fetch metrics from API: %v", err)
	}

	return metricValue.Value.MilliValue(), metricValue.Timestamp.Time, nil
}
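For reference, GetRawMetric and GetObjectMetric end up as GET requests against the custom metrics API served by the adapter. A sketch of the request paths for the Pods and namespaced Object cases, assuming the custom.metrics.k8s.io/v1beta1 group version used by this generation of the HPA controller and a lowercase plural resource name passed in directly (the real client has to derive it from the object's kind). The Namespace special case, which goes through RootScopedMetrics, is not covered:

```go
package main

import (
	"fmt"
	"net/url"
)

const customMetricsBase = "/apis/custom.metrics.k8s.io/v1beta1"

// podsMetricPath sketches the request behind GetRawMetric (HPA type Pods):
// the metric for every pod matching a label selector in a namespace.
func podsMetricPath(namespace, metric, selector string) string {
	q := url.Values{}
	q.Set("labelSelector", selector)
	return fmt.Sprintf("%s/namespaces/%s/pods/*/%s?%s", customMetricsBase, namespace, metric, q.Encode())
}

// objectMetricPath sketches the request behind GetObjectMetric (HPA type
// Object) for a single namespaced object.
func objectMetricPath(namespace, resource, name, metric string) string {
	return fmt.Sprintf("%s/namespaces/%s/%s/%s/%s", customMetricsBase, namespace, resource, name, metric)
}

func main() {
	fmt.Println(podsMetricPath("default", "http_requests", "app=web"))
	fmt.Println(objectMetricPath("default", "services", "web", "http_requests"))
}
```

The same paths can be queried by hand (for example with kubectl get --raw) to check what the adapter returns before wiring it into an HPA.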



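This part walked through the main code of k8s-prometheus-adapter: how series are cached and processed, how the adapter interacts with Prometheus when a metric is requested, and how the HPA controller calls the custom metrics API for each metric type. Combined with the previous three parts, this lets us scale workloads on our own application-specific metrics. To make the HPA workflow easier to understand, the series also covered developing a custom metrics adapter and analyzed the key parts of the Prometheus adapter source, mainly to help when writing HPA YAML manifests. Of course, as the Chinese saying goes, what is learned from books always feels shallow, and to truly know something you must do it yourself. Be sure to practice hands-on: the more you experiment, the better you will understand the whole process.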

