Background

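Prometheus is a monitoring and alerting system that has become popular recently; plenty of articles online cover the details. Our company's backend is built on the Django framework, so we needed to work out how to use Prometheus together with Django, which led to the source code study that follows.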

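Before reading any source code, it helps to know why we are reading it. For me, there were a few questions I wanted to answer:

How does django-prometheus register the /metrics URI and serve it as an endpoint?
How exactly does django-prometheus collect data from the different interfaces?
Once django-prometheus has collected the metrics, do they need to be stored, and if so, where?
While working through these questions I found that django-prometheus calls into prometheus_client, which inevitably raised questions about prometheus_client as well, so I had to read its source too. That is how this article came about.

The first article essentially answered the first question, namely how django-prometheus serves metrics through /metrics. This one continues with the remaining questions.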

Source Code Analysis

Collector

First, we need to know how a Collector actually gathers data inside an application. A few examples:

from prometheus_client import Counter, Gauge, Histogram, Summary

c = Counter('my_requests_total', 'HTTP Failures', ['method', 'endpoint'])  # this is the parent object
c.labels('get', '/').inc()  # note how labels() is used
c.labels('post', '/submit').inc()


g = Gauge('gg', 'A gauge')
h = Histogram('hh', 'A histogram', buckets=(-5, 0, 5))
s = Summary('ss', 'A summary', ['label1', 'label2'])  # metric name, metric description, label names the metric supports

# A Gauge has three methods for changing its recorded value
g.inc()   # add 1
g.set(5)  # set the value to 5
g.dec(2)  # subtract 2

# A Histogram records values with observe()
h.observe(5)

Taking Counter's inc() method as an example, let's see how it records a value.

class Counter(MetricWrapperBase):
    ....
    def _metric_init(self):
        self._value = values.ValueClass(self._type, self._name, self._name + '_total', self._labelnames,
                                        self._labelvalues)
        self._created = time.time()
        
    def inc(self, amount=1):
        """Increment counter by the given amount."""
        if amount < 0:
            raise ValueError('Counters can only be incremented by non-negative amounts.')
        self._value.inc(amount)  # self._value is defined in _metric_init()
    ...

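When we call inc() on a Counter object, we are really calling the inc() method of ValueClass.
self._value is initialized in _metric_init(), and _metric_init() is called when the Collector is initialized.
Every concrete Collector class must implement _metric_init(); the __init__() initializer calls it.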
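
To make the "__init__() calls a hook that each subclass must implement" idea concrete, here is a minimal sketch. It is not the library's code: WrapperBase and MiniCounter are hypothetical names made up for illustration, and a plain float stands in for the library's value class.

```python
class WrapperBase:
    """Hypothetical stand-in for MetricWrapperBase (illustration only)."""

    def __init__(self, name, documentation):
        self._name = name
        self._documentation = documentation
        # The base class does not know what storage a metric needs,
        # so it delegates to a hook that every subclass must provide.
        self._metric_init()

    def _metric_init(self):
        raise NotImplementedError('_metric_init() must be implemented by %r' % self)


class MiniCounter(WrapperBase):
    """Hypothetical counter; a plain float replaces the library's value class."""

    def _metric_init(self):
        self._value = 0.0

    def inc(self, amount=1):
        if amount < 0:
            raise ValueError('Counters can only be incremented by non-negative amounts.')
        self._value += amount


counter = MiniCounter('demo_requests', 'A demo counter')
counter.inc()
counter.inc(2.5)
print(counter._value)
```

Instantiating WrapperBase directly raises NotImplementedError, which is how a base class of this shape forces each metric type to declare its own storage.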

So what exactly is ValueClass?

# prometheus_client/values.py

class MutexValue(object):
    """A float protected by a mutex."""

    _multiprocess = False

    def __init__(self, typ, metric_name, name, labelnames, labelvalues, **kwargs):
        self._value = 0.0   # a plain float holds the value
        self._lock = Lock() # a thread lock that keeps updates thread-safe

    def inc(self, amount):  # the actual implementation of inc
        with self._lock:
            self._value += amount

    def set(self, value):
        with self._lock:
            self._value = value

    def get(self):
        with self._lock:
            return self._value
            
...

def get_value_class():
    # Should we enable multi-process mode?
    # This needs to be chosen before the first metric is constructed,
    # and as that may be in some arbitrary library the user/admin has
    # no control over we use an environment variable.
    if 'prometheus_multiproc_dir' in os.environ or 'PROMETHEUS_MULTIPROC_DIR' in os.environ:
        return MultiProcessValue()
    else:
        return MutexValue # the key point: this returns the MutexValue class itself


ValueClass = get_value_class() # ValueClass is defined here

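Setting the multi-process case aside, ValueClass is simply MutexValue.
MutexValue is used instead of a bare float because it adds a thread lock acting as a mutex, so that changes to the value are thread-safe.
So at this point we know that all the data essentially lives in memory and is never persisted. In theory, when collect() is called to gather metrics, it also reads from memory (that is, from the objects held in memory).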
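
As a rough illustration of why the lock matters, the sketch below hammers a lock-protected float from several threads. LockedFloat and hammer are hypothetical names of my own; the class only mirrors the shape of MutexValue shown above and is not the library's implementation.

```python
import threading


class LockedFloat:
    """Hypothetical float guarded by a lock, modelled on MutexValue."""

    def __init__(self):
        self._value = 0.0
        self._lock = threading.Lock()

    def inc(self, amount):
        # The read-modify-write happens while holding the lock,
        # so concurrent increments cannot overwrite each other.
        with self._lock:
            self._value += amount

    def get(self):
        with self._lock:
            return self._value


def hammer(value, times):
    """Increment the shared value `times` times from one thread."""
    for _ in range(times):
        value.inc(1)


shared = LockedFloat()
threads = [threading.Thread(target=hammer, args=(shared, 10000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = shared.get()
print(total)
```

The value exists only as an attribute on an in-memory object, so it is gone once the process exits, which matches the observation that nothing is persisted.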

Next, let's look at what collect() actually does.

class MetricWrapperBase(object):
    ...
    def _get_metric(self):
        return Metric(self._name, self._documentation, self._type, self._unit)
        
    
    def collect(self):
        metric = self._get_metric()
        for suffix, labels, value in self._samples():
            metric.add_sample(self._name + suffix, labels, value)
        return [metric]
    ...

What does collect() mainly do? It obtains a Metric object (named metric), adds the samples to it, and returns it wrapped in a list.
This raises a few more questions:

What exactly is Metric?
What is self._samples?
What does add_sample do?

Metric

To answer the questions above, let's first look at the source of Metric:

# prometheus_client/metrics_core.py

class Metric(object):
    """A single metric family and its samples.
    This is intended only for internal use by the instrumentation client.
    Custom collectors should use GaugeMetricFamily, CounterMetricFamily
    and SummaryMetricFamily instead.
    """

    def __init__(self, name, documentation, typ, unit=''):
        if unit and not name.endswith("_" + unit):
            name += "_" + unit
        if not METRIC_NAME_RE.match(name):
            raise ValueError('Invalid metric name: ' + name)
        self.name = name
        self.documentation = documentation
        self.unit = unit
        if typ == 'untyped':
            typ = 'unknown'
        if typ not in METRIC_TYPES:
            raise ValueError('Invalid metric type: ' + typ)
        self.type = typ  # the kind of Metric, e.g. gauge or counter
        self.samples = [] # note that samples is a list

    def add_sample(self, name, labels, value, timestamp=None, exemplar=None):
        """Add a sample to the metric.
        Internal-only, do not use."""
        self.samples.append(Sample(name, labels, value, timestamp, exemplar))
        ...

This code shows that Metric maintains a member variable, samples. When add_sample() is called on a Metric object, it creates a Sample object and appends it to the samples list. Sample is a namedtuple, shown below.

Sample

Sample = namedtuple('Sample', ['name', 'labels', 'value', 'timestamp', 'exemplar'])
Sample.__new__.__defaults__ = (None, None) # defaults for the two rightmost fields: timestamp and exemplar default to None

Exemplar = namedtuple('Exemplar', ['labels', 'value', 'timestamp'])
Exemplar.__new__.__defaults__ = (None,)

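From this part of the source we can see that Sample is simply a namedtuple. Note the slightly unusual __new__.__defaults__ syntax here; it sets default values for the namedtuple's rightmost fields.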
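
For comparison, here is a small sketch of the two ways to give a namedtuple default values. SampleOld and SampleNew are names I chose for the demo; the second form relies on the defaults parameter that namedtuple has accepted since Python 3.7.

```python
from collections import namedtuple

FIELDS = ['name', 'labels', 'value', 'timestamp', 'exemplar']

# Style used in the source: patch the defaults onto __new__ after creation.
SampleOld = namedtuple('Sample', FIELDS)
SampleOld.__new__.__defaults__ = (None, None)

# Equivalent spelling available since Python 3.7.
SampleNew = namedtuple('Sample', FIELDS, defaults=(None, None))

# Only the first three fields are supplied; the last two fall back to None.
old = SampleOld('demo_total', {'method': 'get'}, 1.0)
new = SampleNew('demo_total', {'method': 'get'}, 1.0)

print(old.timestamp, old.exemplar)
print(old == new)
```

Defaults always apply to the rightmost fields, which is why a two-element tuple covers timestamp and exemplar and leaves name, labels and value required.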

labels

There was one more earlier question: what is self._samples?
Looking at the code below, we find that _samples is a method of MetricWrapperBase.


class MetricWrapperBase(object):
    ...
    
    def _samples(self):
        if self._is_parent():
            return self._multi_samples()
        else:
            return self._child_samples()

    def _multi_samples(self):
        with self._lock:
            metrics = self._metrics.copy()
        for labels, metric in metrics.items():
            # labels here is actually the tuple of label values
            # series_labels looks roughly like: [('method', 'post'), ('endpoint', '/submit')]
            series_labels = list(zip(self._labelnames, labels))
            
            # metric here is a child metric, so _samples() dispatches to _child_samples(), which returns the numbers the metric actually recorded
            for suffix, sample_labels, value in metric._samples():
                # each yielded result looks roughly like:
                # ('_total', {'method': 'post', 'endpoint': '/submit'}, 5)
                yield (suffix, dict(series_labels + list(sample_labels.items())), value)

    def _child_samples(self):  # pragma: no cover
        raise NotImplementedError('_child_samples() must be implemented by %r' % self)
    
    ...

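When I first read this code I was rather confused: why is there a parent and a child, and what do they mean?
After studying and analyzing the code more closely, I found that it comes from the way a metric is stored.

Take Counter as an example. When the metric has no labels, storing it only requires returning the current data, for example: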

{"_total": 5, "_created": 1619692360.740}

But when the metric has labels, the storage has to be layered. First, let's see how we use Counter:

c = Counter('my_requests_total', 'HTTP Failures', ['method', 'endpoint'])

Note that after this initialization, the object c has label names only and no label values, so it is treated as the parent, and _metrics is initialized as a dict:

...
        if self._is_parent():
            # Prepare the fields needed for child metrics.
            self._lock = Lock()
            self._metrics = {}
...

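Then, when the labels method is used, the first call for a given combination of label values creates and returns a brand-new Collector object: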

c.labels('get', '/').inc()
c.labels('post', '/submit').inc()

The key is the code of the labels method:

class MetricWrapperBase(object):
...

    def labels(self, *labelvalues, **labelkwargs):
        ...
        with self._lock:
            if labelvalues not in self._metrics:
                # note: the labelvalues tuple is the key, and the newly created Collector is the value
                self._metrics[labelvalues] = self.__class__(
                    self._name,
                    documentation=self._documentation,
                    labelnames=self._labelnames,
                    unit=self._unit,
                    labelvalues=labelvalues,
                    **self._kwargs
                )
            return self._metrics[labelvalues]
        ...
...

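The key point is that the tuple of label values is used as the key, and a newly created Collector object is stored as the value in the _metrics dict. Note that this new Collector's labelvalues is no longer None but holds actual values, so this new Collector is the child.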
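
The parent/child layout can be modelled in a few lines. This is only a sketch under my own simplifications: MiniMetric is a hypothetical class, it supports positional label values only, and it keeps a plain float instead of the library's value class.

```python
import threading


class MiniMetric:
    """Hypothetical model of the parent/child layout (not the library's code)."""

    def __init__(self, name, labelnames=(), labelvalues=None):
        self._name = name
        self._labelnames = tuple(labelnames)
        self._labelvalues = tuple(labelvalues or ())
        self._value = 0.0
        if self._is_parent():
            # Only the parent needs the fields that track children.
            self._lock = threading.Lock()
            self._metrics = {}  # label-value tuple -> child metric

    def _is_parent(self):
        # A parent has label names declared but no values bound yet.
        return bool(self._labelnames) and not self._labelvalues

    def labels(self, *labelvalues):
        if not self._is_parent():
            raise ValueError('labels() can only be called on a parent metric')
        if len(labelvalues) != len(self._labelnames):
            raise ValueError('Incorrect label count')
        with self._lock:
            if labelvalues not in self._metrics:
                # First time this combination is seen: create the child.
                self._metrics[labelvalues] = self.__class__(
                    self._name, self._labelnames, labelvalues)
            return self._metrics[labelvalues]

    def inc(self, amount=1):
        self._value += amount


parent = MiniMetric('demo_requests', ['method', 'endpoint'])
parent.labels('get', '/').inc()
parent.labels('get', '/').inc()
parent.labels('post', '/submit').inc()

print({key: child._value for key, child in parent._metrics.items()})
```

Calling labels() twice with the same values returns the same child object, so repeated increments accumulate on one time series rather than creating duplicates.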

At this point it is fairly clear how a Collector records data, and how that data is gathered and assembled when the upper layer calls collect().
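
To tie the pieces together, the sketch below shows how per-child values can be flattened into (suffix, labels, value) triples, in the spirit of _multi_samples(). The function name flatten and the shape of its children argument are assumptions of mine for illustration, not part of prometheus_client.

```python
def flatten(labelnames, children):
    """Yield (suffix, labels, value) triples for every child of a labelled metric.

    `children` maps a label-value tuple to that child's own samples, given as
    (suffix, extra_labels, value) triples. Both arguments are hypothetical
    stand-ins for the parent's _labelnames and _metrics attributes.
    """
    for labelvalues, child_samples in children.items():
        # Pair each label name with the value stored in the dict key.
        series_labels = list(zip(labelnames, labelvalues))
        for suffix, extra_labels, value in child_samples:
            yield suffix, dict(series_labels + list(extra_labels.items())), value


children = {
    ('get', '/'): [('_total', {}, 3.0)],
    ('post', '/submit'): [('_total', {}, 1.0)],
}

samples = list(flatten(('method', 'endpoint'), children))
for suffix, labels, value in samples:
    # The metric name plus suffix gives the final sample name.
    print('demo_requests' + suffix, labels, value)
```

Each triple carries everything needed to build one sample: the suffix is appended to the metric name, and the label dict is rebuilt from the parent's label names and the child's key.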

Finally, a diagram may make this clearer.
