OpenTracing and Jaeger in a Real Go Microservice System

Background

Microservices have profoundly changed how software is developed and delivered. A monolith is split into many microservices, the complexity of each individual service drops sharply, and dependencies between libraries turn into dependencies between services. The trade-off is that deployment becomes ever more fine-grained, and the sheer number of services puts enormous pressure on operations. Fortunately we have Kubernetes, which solves most of the operational challenges.

As the number of services grows and internal call chains get more complex, logs and performance monitoring alone can no longer let you "See the Whole Picture"; troubleshooting or performance analysis becomes like the blind men examining the elephant. Distributed tracing helps developers visualize request paths, quickly locate performance bottlenecks, and gradually optimize inter-service dependencies. It also helps developers understand the whole distributed system from a more macroscopic point of view.

A distributed tracing system roughly consists of three parts: data collection, data persistence, and data presentation. Data collection means instrumenting the code: marking which stages of a request to report, and recording which parent stage the current stage belongs to. Data persistence means writing the reported data to durable storage; Jaeger, for example, supports several storage backends, such as Cassandra or Elasticsearch. Data presentation means the frontend queries all the request stages associated with a Trace ID and renders them in the UI.
Microservice-3.png

Microservice communication architecture diagram

OpenTracing

History

As early as 2005, Google deployed an internal distributed tracing system called Dapper and published the paper "Dapper, a Large-Scale Distributed Systems Tracing Infrastructure", describing its design and implementation; it can be regarded as the ancestor of the field. Open-source implementations inspired by it followed, such as Zipkin, Appdash (open-sourced by SourceGraph), Red Hat's Hawkular APM, and Uber's Jaeger. But these distributed tracing solutions were mutually incompatible, which is why OpenTracing was born. OpenTracing is a library that defines a common set of data-reporting interfaces, which every distributed tracing system is expected to implement. An application then only needs to program against OpenTracing and no longer cares which tracing backend sits behind it, so developers can switch tracing systems seamlessly, and shared libraries can add distributed tracing support in a vendor-neutral way.

Today, the mainstream distributed tracing implementations all support OpenTracing, including Jaeger, Zipkin, and Appdash; see the official document "Supported Tracer Implementations" for details.

Data Model

This part is spelled out very clearly in the OpenTracing specification; below is a rough translation of the key parts. For details, refer to the original document, "The OpenTracing Semantic Specification".

Causal relationships between Spans in a single Trace

        [Span A]  ←←←(the root span)
            |
     +------+------+
     |             |
 [Span B]      [Span C] ←←←(Span C is a `ChildOf` Span A)
     |             |
 [Span D]      +---+-------+
               |           |
           [Span E]    [Span F] >>> [Span G] >>> [Span H]
                                       ↑
                                       ↑
                                       ↑
                         (Span G `FollowsFrom` Span F)

A Trace is a call chain, and each call chain is composed of multiple Spans. The word "span" means a range, and can be understood as one processing stage. The relationship between two Spans is called a Reference. In the diagram above there are 8 stages, labeled A through H.

Temporal relationships between Spans in a single Trace

––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–> time

 [Span A···················································]
   [Span B··············································]
      [Span D··········································]
    [Span C········································]
         [Span E·······]        [Span F··] [Span G··] [Span H··]

The diagram above shows the same call chain laid out in time order.

Each stage (Span) carries the following state:

  • Operation name
  • Start timestamp
  • Finish timestamp
  • A set of key-value pairs serving as the stage's tags (Span Tags)
  • Stage logs (Span Logs)
  • Stage context (SpanContext), which carries the Trace ID and Span ID
  • References to related spans

A stage (Span) can reference others through two relationship types: ChildOf and FollowsFrom. ChildOf expresses a parent-child relationship, meaning one stage happens inside another; it is the most common relationship, typical of calling an RPC interface, executing SQL, or writing data. FollowsFrom expresses a follows relationship, meaning one stage occurs after another, and describes sequential execution.

ChildOf relationship means that the rootSpan has a logical dependency on the child span before rootSpan can complete its operation. Another standard reference type in OpenTracing is FollowsFrom, which means the rootSpan is the ancestor in the DAG, but it does not depend on the completion of the child span, for example if the child represents a best-effort, fire-and-forget cache write.

Concepts and Terminology

Traces

A trace represents a potentially distributed, potentially concurrent data/execution path through a system. A trace can be thought of as a directed acyclic graph (DAG) of spans.

Spans

A span represents a logical unit of work in the system that has a start time and a duration. Spans establish logical causal relationships with one another by nesting or by ordering.

Operation Names

Each span has an operation name: a simple, highly readable string (for example an RPC method name, a function name, or the name of a subtask or phase within a larger computation). The operation name should be an abstract, general identifier that is well defined and statistically meaningful; use tags for more specific, sub-type descriptions.
For example, a span that fetches account information might be named as follows:

| Operation Name | Guidance |
|:---------------|:--------|
| get | Too general |
| get_account/792 | Too specific |
| get_account | Good, and account_id=792 would make a nice Span tag |

References between Spans

A Span may reference zero or more other SpanContexts that are causally related. OpenTracing presently defines two types of references: ChildOf and FollowsFrom. Both reference types specifically model direct causal relationships between a child Span and a parent Span. In the future, OpenTracing may also support reference types for Spans with non-causal relationships (e.g., Spans that are batched together, Spans that are stuck in the same queue, etc).

ChildOf references: A Span may be the ChildOf a parent Span. In a ChildOf reference, the parent Span depends on the child Span in some capacity. All of the following would constitute ChildOf relationships:

  • A Span representing the server side of an RPC may be the ChildOf a Span representing the client side of that RPC
  • A Span representing a SQL insert may be the ChildOf a Span representing an ORM save method
  • Many Spans doing concurrent (perhaps distributed) work may all individually be the ChildOf a single parent Span that merges the results for all children that return within a deadline

These could all be valid timing diagrams for children that are the ChildOf a parent.

    [-Parent Span---------]
         [-Child Span----]

    [-Parent Span--------------]
         [-Child Span A----]
          [-Child Span B----]
        [-Child Span C----]
         [-Child Span D---------------]
         [-Child Span E----]

FollowsFrom references: Some parent Spans do not depend in any way on the result of their child Spans. In these cases, we say merely that the child Span FollowsFrom the parent Span in a causal sense. There are many distinct FollowsFrom reference sub-categories, and in future versions of OpenTracing they may be distinguished more formally.

These can all be valid timing diagrams for children that "FollowsFrom" a parent.

    [-Parent Span-]  [-Child Span-]


    [-Parent Span--]
     [-Child Span-]


    [-Parent Span-]
                [-Child Span-]

Logs

Each span may log multiple times. Every log operation requires a timestamped event name, plus an optional payload of arbitrary size.
The specification defines some common use cases for logging and the standard keys for log events; see the Data Conventions Guidelines.

Tags

Each span may have zero or more key:value tags. Tags have no timestamps; they simply annotate and augment the span.
As with logs, tracers may pay special attention to tags whose keys are well known in application-specific scenarios. See the Data Conventions Guidelines for more information.

SpanContext

Each span must provide access to its SpanContext. The SpanContext represents the state that crosses process boundaries and is passed down to child spans (for example, it contains the <trace_id, span_id, sampled> tuple), and it encapsulates Baggage (explained below). The SpanContext is used when crossing process boundaries and when creating edges in the trace graph (a ChildOf relationship or another reference type; see References between Spans).

Baggage

Baggage is a collection of key-value pairs stored in the SpanContext. It is propagated globally to every span on a trace, via those spans' SpanContexts. In this sense, "Baggage" travels along with the trace, hence the name (think of it as luggage carried along as the trace runs). Given a full-stack OpenTracing integration, Baggage enables powerful features by transparently transporting arbitrary application data: for example, you can attach a Baggage item on an end user's mobile client, have it propagate through the distributed tracing system all the way down to the storage layer, and then reconstruct the call path to locate the expensive SQL query along the way.

Baggage is powerful, but it also carries a significant cost. Because Baggage is propagated globally, carrying too much data or too many items will reduce system throughput or increase RPC latency.

Baggage vs. Span Tags

  • Baggage is propagated across processes globally, along with the business call chain. Span tags are not propagated, because they are not inherited by child spans.
  • Span tags can record business data and store it in the tracing system. An OpenTracing implementation may choose whether to also store the non-business data held in Baggage; the OpenTracing specification does not require it.

Inject and Extract

A SpanContext can be Injected into a Carrier, or Extracted from one, where the carrier is the inter-process transport data (for example, HTTP headers). This is how a SpanContext crosses process boundaries while providing enough information to establish relationships between spans in different processes (and thus enables continuous tracing across processes).

Global and No-op Tracers

Every platform's OpenTracing API library (opentracing-go, opentracing-java, and so on) must ship a no-op Tracer. The No-op Tracer implementation must never fail and must have no side effects. That way, even when RPC, ORM, or other components have instrumentation built in, a service that never configures a collector, storage, or a global tracer simply gets the default No-op Tracer instance, with no impact whatsoever on the business logic.

Jaeger

Architecture

Jaeger can be deployed either as an all-in-one binary, where all Jaeger backend components run in a single process, or as a scalable distributed system, discussed below. There are two main deployment options:

  • Collectors write directly to storage.
  • Collectors write to Kafka as a preliminary buffer.

architecture-v1.png

Illustration of direct-to-storage architecture

architecture-v1.png

Illustration of architecture with Kafka as intermediate buffer

An instrumented service creates spans when receiving new requests and attaches context information (trace id, span id, and baggage) to outgoing requests. Only ids and baggage are propagated with requests; all other information that compose a span like operation name, logs, etc. are not propagated. Instead sampled spans are transmitted out of process asynchronously, in the background, to Jaeger Agents.

The instrumentation has very little overhead, and is designed to be always enabled in production.

Note that while all traces are generated, only a few are sampled. Sampling a trace marks the trace for further processing and storage. By default, Jaeger client samples 0.1% of traces (1 in 1000), and has the ability to retrieve sampling strategies from the agent.

Agent

The Jaeger agent is a network daemon that listens for spans sent over UDP, which it batches and sends to the collector. It is designed to be deployed to all hosts as an infrastructure component. The agent abstracts the routing and discovery of the collectors away from the client.

Collector

The Jaeger collector receives traces from Jaeger agents and runs them through a processing pipeline. Currently our pipeline validates traces, indexes them, performs any transformations, and finally stores them.

Jaeger’s storage is a pluggable component which currently supports Cassandra, Elasticsearch and Kafka.

Query

Query is a service that retrieves traces from storage and hosts a UI to display them.

Ingester

Ingester is a service that reads from a Kafka topic and writes to another storage backend (Cassandra or Elasticsearch).

Deployment

Agent

Jaeger client libraries expect the jaeger-agent process to run locally on each host.

It can be executed directly on the host or via Docker, as follows:

## make sure to expose only the ports you use in your deployment scenario!
docker run \
  --rm \
  -p6831:6831/udp \
  -p6832:6832/udp \
  -p5778:5778/tcp \
  -p5775:5775/udp \
  jaegertracing/jaeger-agent:1.12

The agents can connect point to point to a single collector address, which could be load balanced by another infrastructure component (e.g. DNS) across multiple collectors. The agent can also be configured with a static list of collector addresses.

On Docker, a command like the following can be used:

docker run \
  --rm \
  -p5775:5775/udp \
  -p6831:6831/udp \
  -p6832:6832/udp \
  -p5778:5778/tcp \
  jaegertracing/jaeger-agent:1.12 \
  --reporter.grpc.host-port=jaeger-collector.jaeger-infra.svc:14250

When using gRPC, you have several options for load balancing and name resolution:

  • Single connection and no load balancing. This is the default if you specify a single host:port. (example: --reporter.grpc.host-port=jaeger-collector.jaeger-infra.svc:14250)
  • Static list of hostnames and round-robin load balancing. This is what you get with a comma-separated list of addresses. (example: reporter.grpc.host-port=jaeger-collector1:14250,jaeger-collector2:14250,jaeger-collector3:14250)
  • Dynamic DNS resolution and round-robin load balancing. To get this behaviour, prefix the address with dns:/// and gRPC will attempt to resolve the hostname using SRV records (for external load balancing), TXT records (for service configs), and A records. Refer to the gRPC Name Resolution docs and the dns_resolver.go implementation for more info. (example: --reporter.grpc.host-port=dns:///jaeger-collector.jaeger-infra.svc:14250)

Collectors

The collectors are stateless and thus many instances of jaeger-collector can be run in parallel. Collectors require almost no configuration, except for the location of Cassandra cluster, via --cassandra.keyspace and --cassandra.servers options, or the location of Elasticsearch cluster, via --es.server-urls, depending on which storage is specified. To see all command line options run

go run ./cmd/collector/main.go -h

or, if you don’t have the source code

docker run -it --rm jaegertracing/jaeger-collector:1.12 -h

Storage Backends

Collectors require a persistent storage backend. Cassandra and Elasticsearch are the primary supported storage backends.

The storage type can be passed via SPAN_STORAGE_TYPE environment variable. Valid values are cassandra, elasticsearch, kafka (only as a buffer), grpc-plugin, badger (only with all-in-one) and memory (only with all-in-one).

Elasticsearch

Supported in Jaeger since 0.6.0. Supported versions: 5.x, 6.x.

Elasticsearch does not require initialization other than installing and running Elasticsearch. Once it is running, pass the correct configuration values to the Jaeger collector and query service.

Configuration

Minimal

docker run \
  -e SPAN_STORAGE_TYPE=elasticsearch \
  -e ES_SERVER_URLS=<...> \
  jaegertracing/jaeger-collector:1.12

To view the full list of configuration options, you can run the following command:

docker run \
  -e SPAN_STORAGE_TYPE=elasticsearch \
  jaegertracing/jaeger-collector:1.12 \
  --help

more info

Integrating the Microservice Framework with OpenTracing

The microservice framework has two parts, HTTP (gin) and gRPC: it exposes REST externally and gRPC internally.

Microservice framework diagram:

MicroserviceFramework-3.png

The following is the general flow for integrating OpenTracing into the framework.

Initialize a tracer for the service and register it as the global tracer:

tracer, closer := tracing.Init("hello-world")
defer closer.Close()
opentracing.SetGlobalTracer(tracer)

Create a span. If the HTTP headers carry trace and span information, extract it from them; otherwise start a new root span.

spanCtx, _ := tracer.Extract(opentracing.HTTPHeaders, opentracing.HTTPHeadersCarrier(r.Header))
span := tracer.StartSpan("format", ext.RPCServerOption(spanCtx))

defer span.Finish()

// Write the span into the context; in-process calls between functions must
// pass ctx along, which is how spans are linked to one another.
ctx := opentracing.ContextWithSpan(context.Background(), span)

Inside an HTTP/gRPC service function (in-process)

span, _ := opentracing.StartSpanFromContext(ctx, "formatString")
defer span.Finish()


// For a cross-process call, e.g. calling a REST API, the span information
// must be injected into the HTTP headers.
// tracing.InjectToHeaders(ctx, "GET", url, req.Header)
func InjectToHeaders(ctx context.Context, method string, url string, header http.Header) {
    span := opentracing.SpanFromContext(ctx)
    if span != nil {
        ext.SpanKindRPCClient.Set(span)
        ext.HTTPUrl.Set(span, url)
        ext.HTTPMethod.Set(span, method)
        span.Tracer().Inject(
            span.Context(),
            opentracing.HTTPHeaders,
            opentracing.HTTPHeadersCarrier(header),
        )
    }
}

span.LogFields(
        log.String("event", "string-format"),
        log.String("value", helloStr),
)

How to Instrument

The Gin Framework

Router instrumentation

Add the tracing.NewSpan middleware to every HTTP route that needs request tracing.

import ".../go_common/tracing"

...

authorized := r.Group("/v1")
authorized.Use(handlers.TokenCheck, handlers.MustLogin())
{
    authorized.GET("/user/:id", handlers.GetUserInfo)
    authorized.GET("/user", handlers.GetUserInfoByToken)
    authorized.PUT("/user/:id", tracing.NewSpan("put /user/:id", "handlers.Setting", false), handlers.Setting)
}

Parameters

NewSpan(service string, operationName string, abortOnErrors bool, opts ...opentracing.StartSpanOption)

service is generally filled with the API endpoint.
operationName can be filled with the handler function's name.

Handler instrumentation

func Setting(c *gin.Context) {
    ...
    // Get the span from the gin context; this instrumentation is required.
    span, found := tracing.GetSpan(c)

    // Add tags and logs.
    if found && span != nil {
        span.SetTag("req", req)
        span.LogFields(
            log.Object("uid", uid),
        )
    }

    // Bind the span to a context with opentracing.ContextWithSpan;
    // this instrumentation is also required in the handler.
    ctx, cancel := context.WithTimeout(opentracing.ContextWithSpan(context.Background(), span), time.Second*3)
    defer cancel()

    // Call via gRPC; no special handling is needed here.
    auth := passportpb.Authentication{
        LoginToken: c.GetHeader("Qsc-Peduli-Token"),
    }
    cli, _ := passportpb.Dial(ctx, grpc.WithPerRPCCredentials(&auth))
    reply, err := cli.Setting(ctx, req)

    // Or call the local RPC method directly; no special handling needed either.
    ctx = metadata.AppendToOutgoingContext(ctx, "logintoken", c.GetHeader("Qsc-Peduli-Token"))
    reply, err = rpc.Srv.Setting(ctx, req)

    ...
}

gRPC

gRPC client SDK

import "github.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing"

...

// Dial grpc server
func (c *Client) Dial(serviceName string, opts ...grpc.DialOption) (*grpc.ClientConn, error) {
    
    ...
    
    unaryInterceptor := grpc_middleware.ChainUnaryClient(
        grpc_opentracing.UnaryClientInterceptor(),
    )

    c.Dialopts = append(c.Dialopts, grpc.WithUnaryInterceptor(unaryInterceptor))

    conn, err := grpc.Dial(serviceName, c.Dialopts...)
    if err != nil {
        return nil, fmt.Errorf("Failed to dial %s: %v", serviceName, err)
    }
    return conn, nil
}

gRPC server SDK

import "github.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing"

...

func NewServer(serviceName, addr string) *Server {
    var opts []grpc.ServerOption
    opts = append(opts, grpc_middleware.WithUnaryServerChain(
        grpc_opentracing.UnaryServerInterceptor(),
    ))

    srv := grpc.NewServer(opts...)
    return &Server{
        serviceName: serviceName,
        addr:        addr,
        grpcServer:  srv,
    }
}

RPC function instrumentation

func (s *Service) Setting(ctx context.Context, req *passportpb.UserSettingRequest) (*passportpb.UserSettingReply, error) {
    // If this is not a gRPC call (i.e. a local RPC function call),
    // extract the span from the context ourselves.
    if !s.meta.IsGrpcRequest(ctx) {
        span, _ := opentracing.StartSpanFromContext(ctx, "rpc.srv.Setting")
        defer span.Finish()
    }

    // If this RPC function calls other gRPC functions, just call them
    // normally: the gRPC request context already carries the trace and
    // span information, so it propagates on its own.
    reqVerify := new(passportpb.VerifyRequest)
    reqVerify.UID = req.UserID
    cli, _ := passportpb.Dial(ctx)
    replyV, _ := cli.Verify(ctx, reqVerify)
    ...
}

The End Result in the Jaeger UI

opentracing-gin-grpc.jpg

References

jaegertracing

opentracing-tutorial

grpc-opentracing

gin-opentracing

go-opentracing-guides

OpenTracing documentation (Chinese translation)
