How Go Implements protobuf Encoding and Decoding (2): Source Code

Original link: https://mp.weixin.qq.com/s/oY...

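This article is one of a pair of companion pieces that take a brief look at how Go implements protobuf encoding and decoding:

  1. How Go Implements protobuf Encoding and Decoding (1): Principles
  2. How Go Implements protobuf Encoding and Decoding (2): Source Code

This is the second article.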

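Introduction

The previous article, How Go Implements protobuf Encoding and Decoding (1): Principles, pointed out that conversion between Go data and protobuf data is handled by the package github.com/golang/protobuf/proto. This article analyzes how the proto package implements encoding and decoding.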

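How Encoding and Decoding Work

Every codec package supports a fixed set of types, which we will call base types for now. Encoding and decoding come down to two things:

  1. Provide one or more encode/decode functions for each base type.
  2. Recursively break the fields of a struct down into base types, then pick the matching function to encode or decode each one.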

We look at encoding first, then decoding.

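Encoding

Convention: in all code snippets below, code from request.pb.go or main.go is marked with the file name on its first line; everything else is source code from the proto package.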
// main.go
package main

import (
    "fmt"

    "./types"
    "github.com/golang/protobuf/proto"
)

func main() {
    req := &types.Request{Data: "Hello Dabin"}

    // Marshal
    encoded, err := proto.Marshal(req)
    if err != nil {
        fmt.Printf("Encode to protobuf data error: %v\n", err)
        return
    }
    ...
}

Encoding goes through proto.Marshal, which serializes Go data into protobuf data and returns the serialized result or an error.

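Go structs generated from proto files all satisfy the Message interface. From Marshal we can see that a Go struct can be serialized in three ways:

  1. If the pb Message satisfies the newMarshaler interface, XXX_Marshal() is called to serialize it.
  2. If pb satisfies the Marshaler interface, Marshal() is called. This path suits types that define their own serialization rules.
  3. Otherwise the default path is used: a wrapper (InternalMessageInfo) is created and used to serialize pb. As shown later, way 1 actually ends up using way 3.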
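
The order of these three checks can be sketched with plain type assertions. The interfaces and types below (fastMarshaler, selfMarshaler, generatedMsg, customMsg, plainMsg) are simplified, hypothetical stand-ins and not the proto package's definitions; the default path is represented by a fixed string.

```go
package main

import "fmt"

// fastMarshaler and selfMarshaler are hypothetical, simplified stand-ins
// for the proto package's newMarshaler and Marshaler interfaces.
type fastMarshaler interface {
	XXX_Marshal(b []byte) ([]byte, error)
}

type selfMarshaler interface {
	Marshal() ([]byte, error)
}

type generatedMsg struct{} // stands in for a protoc-generated type

func (generatedMsg) XXX_Marshal(b []byte) ([]byte, error) {
	return append(b, "via XXX_Marshal"...), nil
}

type customMsg struct{} // stands in for a type with its own rules

func (customMsg) Marshal() ([]byte, error) {
	return []byte("via Marshal"), nil
}

type plainMsg struct{} // implements neither interface

// marshal tries the generated fast path, then the custom path, then
// falls back to a default encoder (represented here by a fixed string).
func marshal(msg interface{}) ([]byte, error) {
	if m, ok := msg.(fastMarshaler); ok {
		return m.XXX_Marshal(nil)
	}
	if m, ok := msg.(selfMarshaler); ok {
		return m.Marshal()
	}
	return []byte("via default wrapper"), nil
}

func main() {
	for _, msg := range []interface{}{generatedMsg{}, customMsg{}, plainMsg{}} {
		b, _ := marshal(msg)
		fmt.Printf("%T: %s\n", msg, b)
	}
}
```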
// Marshal takes a protocol buffer message
// and encodes it into the wire format, returning the data.
// This is the main entry point.
func Marshal(pb Message) ([]byte, error) {
    if m, ok := pb.(newMarshaler); ok {
        siz := m.XXX_Size()
        b := make([]byte, 0, siz)
        return m.XXX_Marshal(b, false)
    }
    if m, ok := pb.(Marshaler); ok {
        // If the message can marshal itself, let it do it, for compatibility.
        // NOTE: This is not efficient.
        return m.Marshal()
    }
    // in case somehow we didn't generate the wrapper
    if pb == nil {
        return nil, ErrNil
    }
    var info InternalMessageInfo
    siz := info.Size(pb)
    b := make([]byte, 0, siz)
    return info.Marshal(b, pb, false)
}

newMarshaler and Marshaler are defined as follows:

// newMarshaler is the interface representing objects that can marshal themselves.
//
// This exists to support protoc-gen-go generated messages.
// The proto package will stop type-asserting to this interface in the future.
//
// DO NOT DEPEND ON THIS.
type newMarshaler interface {
    XXX_Size() int
    XXX_Marshal(b []byte, deterministic bool) ([]byte, error)
}

// Marshaler is the interface representing objects that can marshal themselves.
type Marshaler interface {
    Marshal() ([]byte, error)
}

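Request implements the newMarshaler interface. Its XXX_Marshal is shown below; it simply calls xxx_messageInfo_Request.Marshal. xxx_messageInfo_Request is a global variable defined in request.pb.go, its type is InternalMessageInfo, and it is the wrapper mentioned earlier.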

// request.pb.go
func (m *Request) XXX_Marshal(b []byte, deterministic bool) ([]byte, error) {
    return xxx_messageInfo_Request.Marshal(b, m, deterministic)
}

var xxx_messageInfo_Request proto.InternalMessageInfo

In essence XXX_Marshal is also just a wrapper; the function that does the real serialization work lives in the proto package.

InternalMessageInfo mainly caches the information needed for serialization and deserialization.

// InternalMessageInfo is a type used internally by generated .pb.go files.
// This type is not intended to be used by non-generated code.
// This type is not subject to any compatibility guarantee.
type InternalMessageInfo struct {
    marshal   *marshalInfo   // cached marshal info
    unmarshal *unmarshalInfo // cached unmarshal info
    merge     *mergeInfo
    discard   *discardInfo
}

InternalMessageInfo.Marshal first obtains the serialization info u (a *marshalInfo) for the type being serialized, then serializes with u.marshal.

// Marshal is the entry point from generated code,
// and should be ONLY called by generated code.
// It marshals msg to the end of b.
// a is a pointer to a place to store cached marshal info.
func (a *InternalMessageInfo) Marshal(b []byte, msg Message, deterministic bool) ([]byte, error) {
    // Get the marshalInfo for this message type. It is cached,
    // so it is not rebuilt under heavy concurrency.
    u := getMessageMarshalInfo(msg, a)
    // Validate the argument.
    ptr := toPointer(&msg)
    if ptr.isNil() {
        // We get here if msg is a typed nil ((*SomeMessage)(nil)),
        // so it satisfies the interface, and msg == nil wouldn't
        // catch it. We don't want crash in this case.
        return b, ErrNil
    }
    // Marshal the data according to the marshalInfo.
    return u.marshal(b, ptr, deterministic)
}

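Because the serialization info is the same for every value of a given type, getMessageMarshalInfo caches it in a.marshal. If a holds no marshal info yet, the info is created (but not initialized) and then stored in a.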

func getMessageMarshalInfo(msg interface{}, a *InternalMessageInfo) *marshalInfo {
    // u := a.marshal, but atomically.
    // We use an atomic here to ensure memory consistency.
    // Read from the InternalMessageInfo.
    u := atomicLoadMarshalInfo(&a.marshal)
    // nil means nothing has been stored yet.
    if u == nil {
        // Get marshal information from type of message.
        t := reflect.ValueOf(msg).Type()
        if t.Kind() != reflect.Ptr {
            panic(fmt.Sprintf("cannot handle non-pointer message type %v", t))
        }
        u = getMarshalInfo(t.Elem())
        // Store it in the cache for later users.
        // a.marshal = u, but atomically.
        atomicStoreMarshalInfo(&a.marshal, u)
    }
    return u
}
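
The caching idea in getMessageMarshalInfo can be approximated with the standard library alone. The sketch below is an assumption-laden illustration, not the proto package's code: typeInfo, registry, lookup, cache, and request are invented names, and atomic.Value stands in for the package's own atomic load/store helpers.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
	"sync/atomic"
)

// typeInfo is a hypothetical stand-in for marshalInfo.
type typeInfo struct{ typ reflect.Type }

var (
	registryMu sync.Mutex
	registry   = map[reflect.Type]*typeInfo{} // one entry per message type
)

// lookup returns the shared typeInfo for t, creating it on first use.
func lookup(t reflect.Type) *typeInfo {
	registryMu.Lock()
	defer registryMu.Unlock()
	info, ok := registry[t]
	if !ok {
		info = &typeInfo{typ: t}
		registry[t] = info
	}
	return info
}

// cache is a per-message-type slot, similar in spirit to the marshal
// field of InternalMessageInfo. It only ever holds a *typeInfo.
type cache struct{ slot atomic.Value }

// get takes the fast path when the slot is filled and otherwise falls
// back to the locked registry, then remembers the result.
func (c *cache) get(msg interface{}) *typeInfo {
	if v := c.slot.Load(); v != nil {
		return v.(*typeInfo)
	}
	info := lookup(reflect.TypeOf(msg).Elem())
	c.slot.Store(info)
	return info
}

type request struct{ Data string }

func main() {
	var c cache
	first := c.get(&request{})
	second := c.get(&request{Data: "x"})
	fmt.Println(first == second) // true: both calls share one typeInfo
}
```

If two goroutines miss the cache at the same time, both call lookup and both store the same pointer, so the duplicated work is harmless.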

getMarshalInfo only creates a marshalInfo object and fills in the typ field; the remaining fields are left unset.

// getMarshalInfo returns the information to marshal a given type of message.
// The info it returns may not necessarily initialized.
// t is the type of the message (NOT the pointer to it).
// Return the marshalInfo for message type t, creating one if it does not exist yet.
func getMarshalInfo(t reflect.Type) *marshalInfo {
    marshalInfoLock.Lock()
    u, ok := marshalInfoMap[t]
    if !ok {
        u = &marshalInfo{typ: t}
        marshalInfoMap[t] = u
    }
    marshalInfoLock.Unlock()
    return u
}

// marshalInfo is the information used for marshaling a message.
type marshalInfo struct {
    typ          reflect.Type
    fields       []*marshalFieldInfo
    unrecognized field                      // offset of XXX_unrecognized
    extensions   field                      // offset of XXX_InternalExtensions
    v1extensions field                      // offset of XXX_extensions
    sizecache    field                      // offset of XXX_sizecache
    initialized  int32                      // 0 -- only typ is set, 1 -- fully initialized
    messageset   bool                       // uses message set wire format
    hasmarshaler bool                       // has custom marshaler
    sync.RWMutex                            // protect extElems map, also for initialization
    extElems     map[int32]*marshalElemInfo // info of extension elements
}

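marshalInfo.marshal is the real body of Marshal. It checks whether u has been initialized and, if not, calls computeMarshalInfo to compute what Marshal needs, which amounts to filling in the fields of marshalInfo.

u.hasmarshaler records whether the type implements the Marshaler interface; if it does, its Marshal method is called directly. So a type that implements Marshaler (way 2 above) is still serialized by its own Marshal method even when the call arrives through marshalInfo.marshal.

The body of the function is a for loop that walks every field of the type, checks required fields, and then calls f.marshaler to serialize the field according to its type. Where does f.marshaler come from?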

// marshal is the main function to marshal a message. It takes a byte slice and appends
// the encoded data to the end of the slice, returns the slice and error (if any).
// ptr is the pointer to the message.
// If deterministic is true, map is marshaled in deterministic order.
// This is the main body of Marshal: it encodes the message, appends the data to b, and returns b.
// If deterministic is true, maps are encoded in a deterministic order.
func (u *marshalInfo) marshal(b []byte, ptr pointer, deterministic bool) ([]byte, error) {
    // Initialize the basic marshalInfo data,
    // mainly by filling in the struct's fields from what is already known.
    if atomic.LoadInt32(&u.initialized) == 0 {
        u.computeMarshalInfo()
    }

    // If the message can marshal itself, let it do it, for compatibility.
    // NOTE: This is not efficient.
    // If the type implements Marshaler it can marshal itself; let it do so
    // and append the result to b.
    if u.hasmarshaler {
        m := ptr.asPointerTo(u.typ).Interface().(Marshaler)
        b1, err := m.Marshal()
        b = append(b, b1...)
        return b, err
    }

    var err, errLater error
    // The old marshaler encodes extensions at beginning.
    // Check extension fields and append the message's extensions to b.
    if u.extensions.IsValid() {
        // offset returns the given field of the message by pointer offset.
        e := ptr.offset(u.extensions).toExtensions()
        if u.messageset {
            b, err = u.appendMessageSet(b, e, deterministic)
        } else {
            b, err = u.appendExtensions(b, e, deterministic)
        }
        if err != nil {
            return b, err
        }
    }
    if u.v1extensions.IsValid() {
        m := *ptr.offset(u.v1extensions).toOldExtensions()
        b, err = u.appendV1Extensions(b, m, deterministic)
        if err != nil {
            return b, err
        }
    }

    // Walk every field of the message, validate and encode it, and append to b.
    for _, f := range u.fields {
        if f.required {
            // A required field is unset: record the error and handle it after all marshaling is done.
            if ptr.offset(f.field).getPointer().isNil() {
                // Required field is not set.
                // We record the error but keep going, to give a complete marshaling.
                if errLater == nil {
                    errLater = &RequiredNotSetError{f.name}
                }
                continue
            }
        }
        // A nil pointer field means the field is unset; there is nothing to encode.
        if f.isPointer && ptr.offset(f.field).getPointer().isNil() {
            // nil pointer always marshals to nothing
            continue
        }
        // Encode with this field's marshaler.
        b, err = f.marshaler(b, ptr.offset(f.field), f.wiretag, deterministic)
        if err != nil {
            if err1, ok := err.(*RequiredNotSetError); ok {
                // A required field is not set.
                // Required field in submessage is not set.
                // We record the error but keep going, to give a complete marshaling.
                if errLater == nil {
                    errLater = &RequiredNotSetError{f.name + "." + err1.field}
                }
                continue
            }
            // A repeated field (a "dynamic array") contains a nil element.
            if err == errRepeatedHasNil {
                err = errors.New("proto: repeated field " + f.name + " has nil element")
            }
            if err == errInvalidUTF8 {
                if errLater == nil {
                    fullName := revProtoTypes[reflect.PtrTo(u.typ)] + "." + f.name
                    errLater = &invalidUTF8Error{fullName}
                }
                continue
            }
            return b, err
        }
    }
    // Unrecognized fields are appended to b as raw bytes.
    // computeMarshalInfo has already located this field.
    if u.unrecognized.IsValid() {
        s := *ptr.offset(u.unrecognized).toBytes()
        b = append(b, s...)
    }
    return b, errLater
}
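
The errLater handling in marshal follows a "record the first error, keep going" pattern. A minimal sketch of that pattern, with an invented field type and encodeAll function that are not part of the proto package:

```go
package main

import (
	"errors"
	"fmt"
)

// field is a hypothetical, simplified description of one message field.
type field struct {
	name     string
	required bool
	value    *string // nil means the field is not set
}

// encodeAll appends every set field and remembers only the first
// missing required field, so the caller still gets complete output.
func encodeAll(fields []field) ([]byte, error) {
	var out []byte
	var errLater error
	for _, f := range fields {
		if f.value == nil {
			if f.required && errLater == nil {
				errLater = errors.New("required field not set: " + f.name)
			}
			continue // unset fields encode to nothing
		}
		out = append(out, *f.value...)
	}
	return out, errLater
}

func main() {
	a, c := "a", "c"
	out, err := encodeAll([]field{
		{name: "A", value: &a},
		{name: "B", required: true},
		{name: "C", value: &c},
	})
	fmt.Printf("%s %v\n", out, err) // ac required field not set: B
}
```

Returning both the bytes and the deferred error lets the caller decide whether a partially valid message is still useful.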

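computeMarshalInfo performs one full inspection of the type to be serialized and sets up the data that serialization needs, including each field's serialization function f.marshaler. That is the part we focus on: every field of the struct gets a marshalFieldInfo describing what is needed to serialize it, and computeMarshalFieldInfo is called to fill that object in.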

// computeMarshalInfo initializes the marshal info.
func (u *marshalInfo) computeMarshalInfo() {
    // Take the lock so that marshal info is not computed concurrently.
    u.Lock()
    defer u.Unlock()
    // Computing it once is enough.
    if u.initialized != 0 { // non-atomic read is ok as it is protected by the lock
        return
    }

    // The message type to marshal.
    t := u.typ
    u.unrecognized = invalidField
    u.extensions = invalidField
    u.v1extensions = invalidField
    u.sizecache = invalidField

    // If the message can marshal itself, let it do it, for compatibility.
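    // Check whether the type implements the Marshaler interface; if so, mark it as having its own marshaler.
    // A type assertion cannot be used because t is a reflect.Type, not a value stored in an interface.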
    // NOTE: This is not efficient.
    if reflect.PtrTo(t).Implements(marshalerType) {
        u.hasmarshaler = true
        atomic.StoreInt32(&u.initialized, 1)
        // Return right away; the type's own marshaler is used for encoding.
        return
    }

    // get oneof implementers
    // See which of the following interfaces *t implements (oneof support).
    var oneofImplementers []interface{}
    switch m := reflect.Zero(reflect.PtrTo(t)).Interface().(type) {
    case oneofFuncsIface:
        _, _, _, oneofImplementers = m.XXX_OneofFuncs()
    case oneofWrappersIface:
        oneofImplementers = m.XXX_OneofWrappers()
    }

    n := t.NumField()

    // deal with XXX fields first
    // Walk every XXX field of t.
    for i := 0; i < t.NumField(); i++ {
        f := t.Field(i)
        // Skip fields that do not start with XXX_.
        if !strings.HasPrefix(f.Name, "XXX_") {
            continue
        }
        // Handle the fields that protobuf itself adds.
        switch f.Name {
        case "XXX_sizecache":
            u.sizecache = toField(&f)
        case "XXX_unrecognized":
            u.unrecognized = toField(&f)
        case "XXX_InternalExtensions":
            u.extensions = toField(&f)
            u.messageset = f.Tag.Get("protobuf_messageset") == "1"
        case "XXX_extensions":
            u.v1extensions = toField(&f)
        case "XXX_NoUnkeyedLiteral":
            // nothing to do
        default:
            panic("unknown XXX field: " + f.Name)
        }
        n--
    }

    // normal fields
    // Handle the message's ordinary fields.
    fields := make([]marshalFieldInfo, n) // batch allocation
    u.fields = make([]*marshalFieldInfo, 0, n)
    for i, j := 0, 0; i < t.NumField(); i++ {
        f := t.Field(i)

        // Skip XXX fields.
        if strings.HasPrefix(f.Name, "XXX_") {
            continue
        }

        // Take a pointer to the next free slot in fields.
        // j counts the slots used so far; n is the number of fields excluding the XXX fields.
        field := &fields[j]
        j++
        field.name = f.Name
        // Add it to u.fields.
        u.fields = append(u.fields, field)
        // A field whose tag contains protobuf_oneof gets special handling.
        if f.Tag.Get("protobuf_oneof") != "" {
            field.computeOneofFieldInfo(&f, oneofImplementers)
            continue
        }
        // No protobuf tag: the field was not generated by protoc.
        if f.Tag.Get("protobuf") == "" {
            // field has no tag (not in generated message), ignore it
            // Drop the field info that was just saved.
            u.fields = u.fields[:len(u.fields)-1]
            j--
            continue
        }
        // Fill in the field's marshal info.
        field.computeMarshalFieldInfo(&f)
    }

    // fields are marshaled in tag order on the wire.
    // Sort the fields.
    sort.Sort(byTag(u.fields))

    // Initialization is complete.
    atomic.StoreInt32(&u.initialized, 1)
}
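
The combination of an atomic initialized flag (checked in marshal) and a lock (taken in computeMarshalInfo) is a form of double-checked lazy initialization. A small sketch of that pattern, using an invented lazyInfo type rather than the real marshalInfo:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// lazyInfo is a hypothetical stand-in for marshalInfo.
type lazyInfo struct {
	mu          sync.Mutex
	initialized int32    // 0: not computed yet, 1: fully computed
	fields      []string // the expensive-to-compute data
	computed    int      // counts how often the expensive step ran
}

// compute fills in the data once. The second check under the lock
// stops goroutines that raced past the first check from recomputing.
func (l *lazyInfo) compute() {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.initialized != 0 { // plain read is fine while holding the lock
		return
	}
	l.fields = []string{"Data"}
	l.computed++
	atomic.StoreInt32(&l.initialized, 1)
}

// get takes the lock-free fast path once initialization has finished.
func (l *lazyInfo) get() []string {
	if atomic.LoadInt32(&l.initialized) == 0 {
		l.compute()
	}
	return l.fields
}

func main() {
	var l lazyInfo
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.get()
		}()
	}
	wg.Wait()
	fmt.Println(l.computed, l.get()) // 1 [Data]
}
```

In new code sync.Once expresses the same idea more directly; the explicit flag is shown here because it mirrors the structure of the functions above.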

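Recall the definition of Request: it has one field, Data, followed by protobuf:... which describes what protobuf needs. The "bytes,..." part is called the tags; after splitting on commas:

  • tags[0]: bytes, meaning Data is encoded with the bytes wire type
  • tags[1]: 1, the field number
  • tags[2]: opt, meaning optional rather than required
  • tags[3]: name=data, the field name in the proto file
  • tags[4]: proto3, the protobuf syntax version in use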
// request.pb.go
type Request struct{
    Data                 string   `protobuf:"bytes,1,opt,name=data,proto3" json:"data,omitempty"`
    ...
}

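computeMarshalFieldInfo first reads the field number and the wire type and stores them in marshalFieldInfo, then calls setMarshaler, which uses the field f and the tags to obtain the serialization function for that field's type.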

// computeMarshalFieldInfo fills up the information to marshal a field.
func (fi *marshalFieldInfo) computeMarshalFieldInfo(f *reflect.StructField) {
    // parse protobuf tag of the field.
    // tag has format of "bytes,49,opt,name=foo,def=hello!"
    // Get the full protobuf tag and split it on commas to obtain the format above.
    tags := strings.Split(f.Tag.Get("protobuf"), ",")
    if tags[0] == "" {
        return
    }
    // The field number: for `string name = x` in the message, x is this field's tag ID.
    tag, err := strconv.Atoi(tags[1])
    if err != nil {
        panic("tag is not an integer")
    }
    // The wire type to encode as: bytes, varint, and so on.
    wt := wiretype(tags[0])
    // Mark whether the field is required or optional.
    if tags[2] == "req" {
        fi.required = true
    }
    // Store the field and tag information in marshalFieldInfo.
    fi.setTag(f, tag, wt)
    // Pick the marshaler function from the tag information (type and so on).
    fi.setMarshaler(f, tags)
}

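The key part of setMarshaler is typeMarshaler. typeMarshaler is a very long function, but all it does is return the serialization functions that match the type, such as Bool, Int32, Uint32, and so on. For composite types such as structs and slices, this becomes recursive.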

// setMarshaler fills up the sizer and marshaler in the info of a field.
func (fi *marshalFieldInfo) setMarshaler(f *reflect.StructField, tags []string) {
    // Map fields get special handling.
    switch f.Type.Kind() {
    case reflect.Map:
        // map field
        fi.isPointer = true
        fi.sizer, fi.marshaler = makeMapMarshaler(f)
        return
    case reflect.Ptr, reflect.Slice:
        // Pointer and slice fields are marked as pointer types.
        fi.isPointer = true
    }

    // Pick the marshaler from the field type and tags.
    fi.sizer, fi.marshaler = typeMarshaler(f.Type, tags, true, false)
}

// typeMarshaler returns the sizer and marshaler of a given field.
// t is the type of the field.
// tags is the generated "protobuf" tag of the field.
// If nozero is true, zero value is not marshaled to the wire.
// If oneof is true, it is a oneof field.
// The function is very long; most of it is omitted here.
func typeMarshaler(t reflect.Type, tags []string, nozero, oneof bool) (sizer, marshaler) {
    ...
    switch t.Kind() {
    case reflect.Bool:
        if pointer {
            return sizeBoolPtr, appendBoolPtr
        }
        if slice {
            if packed {
                return sizeBoolPackedSlice, appendBoolPackedSlice
            }
            return sizeBoolSlice, appendBoolSlice
        }
        if nozero {
            return sizeBoolValueNoZero, appendBoolValueNoZero
        }
        return sizeBoolValue, appendBoolValue
    case reflect.Uint32:
    ...
    case reflect.Int32:
    ....
    case reflect.Struct:
    ...
}

Below are two example serialization functions, for the Bool and String types:

func appendBoolValue(b []byte, ptr pointer, wiretag uint64, _ bool) ([]byte, error) {
    v := *ptr.toBool()
    b = appendVarint(b, wiretag)
    if v {
        b = append(b, 1)
    } else {
        b = append(b, 0)
    }
    return b, nil
}
func appendStringValue(b []byte, ptr pointer, wiretag uint64, _ bool) ([]byte, error) {
    v := *ptr.toString()
    b = appendVarint(b, wiretag)
    b = appendVarint(b, uint64(len(v)))
    b = append(b, v...)
    return b, nil
}

So the serialized []byte should follow this pattern:

| wiretag | data | wiretag | data | ... | data |
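
To make the pattern concrete, here is a self-contained sketch that encodes one string field by hand. It follows the protobuf wire format (the key is the field number shifted left by 3, OR-ed with the wire type, and a string's data is a varint length followed by the bytes), but appendVarint and appendStringField are written from scratch for illustration and are not the proto package's functions.

```go
package main

import "fmt"

const wireBytes = 2 // wire type for length-delimited values

// appendVarint appends v in base-128 varint form, low 7 bits first.
func appendVarint(b []byte, v uint64) []byte {
	for v >= 0x80 {
		b = append(b, byte(v)|0x80) // set the continuation bit
		v >>= 7
	}
	return append(b, byte(v))
}

// appendStringField appends the key, the length, and the payload of a
// string field with the given field number.
func appendStringField(b []byte, fieldNum uint64, s string) []byte {
	b = appendVarint(b, fieldNum<<3|wireBytes)
	b = appendVarint(b, uint64(len(s)))
	return append(b, s...)
}

func main() {
	out := appendStringField(nil, 1, "Hello Dabin")
	fmt.Printf("% x\n", out) // 0a 0b 48 65 6c 6c 6f 20 44 61 62 69 6e
}
```

The first byte 0a is the wiretag for field 1 with wire type 2, 0b is the length 11, and the remaining bytes are the UTF-8 text.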

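That covers the main encoding flow. A quick recap:

  1. proto.Marshal calls the wrapper function generated in *.pb.go; the wrapper calls InternalMessageInfo to serialize, and only then does the real serialization work begin.
  2. The marshal info u of the type to serialize is fetched first. If u is not initialized, it is initialized, which sets the serialization function for each struct field along with other information.
  3. Every field of the struct is visited and encoded using the information in u, and the result is appended to a []byte. Once all fields are encoded, the serialized []byte or an error is returned.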

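Decoding

Decoding is very similar to encoding and follows the same three steps recapped above. The main difference is in step 2: what is fetched is the unmarshal info u of the target type. If u is not initialized, it is initialized, which sets the deserialization function for each struct field along with other information.

For that reason the decoding functions are covered only briefly, without the detailed explanation given for encoding.

Below are the deserialization interfaces and functions defined in the proto package: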

// Unmarshaler is the interface representing objects that can
// unmarshal themselves.  The argument points to data that may be
// overwritten, so implementations should not keep references to the
// buffer.
// Unmarshal implementations should not clear the receiver.
// Any unmarshaled data should be merged into the receiver.
// Callers of Unmarshal that do not want to retain existing data
// should Reset the receiver before calling Unmarshal.
type Unmarshaler interface {
    Unmarshal([]byte) error
}

// newUnmarshaler is the interface representing objects that can
// unmarshal themselves. The semantics are identical to Unmarshaler.
//
// This exists to support protoc-gen-go generated messages.
// The proto package will stop type-asserting to this interface in the future.
//
// DO NOT DEPEND ON THIS.
type newUnmarshaler interface {
    // Implemented by types that provide XXX_Unmarshal.
    XXX_Unmarshal([]byte) error
}

// Unmarshal parses the protocol buffer representation in buf and places the
// decoded result in pb.  If the struct underlying pb does not match
// the data in buf, the results can be unpredictable.
//
// Unmarshal resets pb before starting to unmarshal, so any
// existing data in pb is always removed. Use UnmarshalMerge
// to preserve and append to existing data.
func Unmarshal(buf []byte, pb Message) error {
    pb.Reset()
    // pb has its own unmarshal function and implements newUnmarshaler.
    if u, ok := pb.(newUnmarshaler); ok {
        return u.XXX_Unmarshal(buf)
    }
    // pb has its own unmarshal function and implements Unmarshaler.
    if u, ok := pb.(Unmarshaler); ok {
        return u.Unmarshal(buf)
    }
    // Fall back to the default Unmarshal.
    return NewBuffer(buf).Unmarshal(pb)
}

Request implements the newUnmarshaler interface:

// request.pb.go
func (m *Request) XXX_Unmarshal(b []byte) error {
    return xxx_messageInfo_Request.Unmarshal(m, b)
}

Deserialization also goes through InternalMessageInfo.

// Unmarshal is the entry point from the generated .pb.go files.
// This function is not intended to be used by non-generated code.
// This function is not subject to any compatibility guarantee.
// msg contains a pointer to a protocol buffer struct.
// b is the data to be unmarshaled into the protocol buffer.
// a is a pointer to a place to store cached unmarshal information.
func (a *InternalMessageInfo) Unmarshal(msg Message, b []byte) error {
    // Load the unmarshal information for this message type.
    // The atomic load ensures memory consistency.
    // Get the unmarshal info stored in a.
    u := atomicLoadUnmarshalInfo(&a.unmarshal)
    if u == nil {
        // Slow path: find unmarshal info for msg, update a with it.
        u = getUnmarshalInfo(reflect.TypeOf(msg).Elem())
        atomicStoreUnmarshalInfo(&a.unmarshal, u)
    }
    // Then do the unmarshaling.
    // Run the unmarshal.
    err := u.unmarshal(toPointer(&msg), b)
    return err
}

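Below is the main deserialization function. When u is not initialized, it calls computeUnmarshalInfo to set up the information that deserialization needs.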

// unmarshal does the main work of unmarshaling a message.
// u provides type information used to unmarshal the message.
// m is a pointer to a protocol buffer message.
// b is a byte stream to unmarshal into m.
// This is top routine used when recursively unmarshaling submessages.
func (u *unmarshalInfo) unmarshal(m pointer, b []byte) error {
    if atomic.LoadInt32(&u.initialized) == 0 {
        // Fill u with unmarshal info and set the unmarshaler function for each field type.
        u.computeUnmarshalInfo()
    }
    if u.isMessageSet {
        return unmarshalMessageSet(b, m.offset(u.extensions).toExtensions())
    }
    var reqMask uint64 // bitmask of required fields we've seen.
    var errLater error
    for len(b) > 0 {
        // Read tag and wire type.
        // Special case 1 and 2 byte varints.
        var x uint64
        if b[0] < 128 {
            x = uint64(b[0])
            b = b[1:]
        } else if len(b) >= 2 && b[1] < 128 {
            x = uint64(b[0]&0x7f) + uint64(b[1])<<7
            b = b[2:]
        } else {
            var n int
            x, n = decodeVarint(b)
            if n == 0 {
                return io.ErrUnexpectedEOF
            }
            b = b[n:]
        }
        // Extract the field number (tag) and the wire type.
        tag := x >> 3
        wire := int(x) & 7

        // Dispatch on the tag to one of the unmarshal* functions below.
        // Use the tag to look up this field's unmarshalFieldInfo, f.
        var f unmarshalFieldInfo
        if tag < uint64(len(u.dense)) {
            f = u.dense[tag]
        } else {
            f = u.sparse[tag]
        }
        // If the field has an unmarshaler, decode it and handle errors.
        if fn := f.unmarshal; fn != nil {
            var err error
            // Parse from b and store the result in the field that f describes.
            b, err = fn(b, m.offset(f.field), wire)
            if err == nil {
                reqMask |= f.reqMask
                continue
            }
            if r, ok := err.(*RequiredNotSetError); ok {
                // Remember this error, but keep parsing. We need to produce
                // a full parse even if a required field is missing.
                if errLater == nil {
                    errLater = r
                }
                reqMask |= f.reqMask
                continue
            }
            if err != errInternalBadWireType {
                if err == errInvalidUTF8 {
                    if errLater == nil {
                        fullName := revProtoTypes[reflect.PtrTo(u.typ)] + "." + f.name
                        errLater = &invalidUTF8Error{fullName}
                    }
                    continue
                }
                return err
            }
            // Fragments with bad wire type are treated as unknown fields.
        }

        // Unknown tag.
        // Skip unknown tags. The message definition in the proto file may have gained new fields that an older version does not recognize.
        if !u.unrecognized.IsValid() {
            // Don't keep unrecognized data; just skip it.
            var err error
            b, err = skipField(b, wire)
            if err != nil {
                return err
            }
            continue
        }
        // Check whether the unrecognized field is an extension.
        // Keep unrecognized data around.
        // maybe in extensions, maybe in the unrecognized field.
        z := m.offset(u.unrecognized).toBytes()
        var emap map[int32]Extension
        var e Extension
        for _, r := range u.extensionRanges {
            if uint64(r.Start) <= tag && tag <= uint64(r.End) {
                if u.extensions.IsValid() {
                    mp := m.offset(u.extensions).toExtensions()
                    emap = mp.extensionsWrite()
                    e = emap[int32(tag)]
                    z = &e.enc
                    break
                }
                if u.oldExtensions.IsValid() {
                    p := m.offset(u.oldExtensions).toOldExtensions()
                    emap = *p
                    if emap == nil {
                        emap = map[int32]Extension{}
                        *p = emap
                    }
                    e = emap[int32(tag)]
                    z = &e.enc
                    break
                }
                panic("no extensions field available")
            }
        }

        // Use wire type to skip data.
        var err error
        b0 := b
        b, err = skipField(b, wire)
        if err != nil {
            return err
        }
        *z = encodeVarint(*z, tag<<3|uint64(wire))
        *z = append(*z, b0[:len(b0)-len(b)]...)

        if emap != nil {
            emap[int32(tag)] = e
        }
    }
    // Check which required fields were seen while parsing; report an error if they do not match what u records.
    if reqMask != u.reqMask && errLater == nil {
        // A required field of this message is missing.
        for _, n := range u.reqFields {
            if reqMask&1 == 0 {
                errLater = &RequiredNotSetError{n}
            }
            reqMask >>= 1
        }
    }
    return errLater
}
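
The core of the loop above is: read a varint key, split it into a field number and a wire type, then decode or skip the value. The sketch below shows only that skeleton using encoding/binary from the standard library. decodeStrings is an invented function that handles just two wire types, so it is far simpler than unmarshalInfo.unmarshal.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errTruncated = errors.New("truncated input")

// decodeStrings collects length-delimited fields by field number and
// skips varint fields. Other wire types are rejected to keep the
// sketch short.
func decodeStrings(b []byte) (map[uint64]string, error) {
	out := map[uint64]string{}
	for len(b) > 0 {
		key, n := binary.Uvarint(b)
		if n <= 0 {
			return nil, errTruncated
		}
		b = b[n:]
		field, wire := key>>3, key&7 // same split as tag and wire above
		switch wire {
		case 0: // varint: read it and discard the value
			_, n := binary.Uvarint(b)
			if n <= 0 {
				return nil, errTruncated
			}
			b = b[n:]
		case 2: // length-delimited: varint size, then the payload
			size, n := binary.Uvarint(b)
			if n <= 0 || uint64(len(b)-n) < size {
				return nil, errTruncated
			}
			out[field] = string(b[n : n+int(size)])
			b = b[n+int(size):]
		default:
			return nil, fmt.Errorf("unsupported wire type %d", wire)
		}
	}
	return out, nil
}

func main() {
	// Field 2 (varint, value 1) followed by field 1 (string "hi").
	data := []byte{0x10, 0x01, 0x0a, 0x02, 'h', 'i'}
	fmt.Println(decodeStrings(data)) // map[1:hi] <nil>
}
```

Because every value carries its own wire type, a decoder can step over fields it does not know, which is what makes the unknown-tag handling in the real function possible.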

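We skip how the field deserialization functions are set up and look only at how they are chosen. typeUnmarshaler picks the deserialization function for a field type, and these choices correspond one-to-one with the serialization functions.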

// typeUnmarshaler returns an unmarshaler for the given field type / field tag pair.
func typeUnmarshaler(t reflect.Type, tags string) unmarshaler {
    ...
    // Figure out packaging (pointer, slice, or both)
    slice := false
    pointer := false
    if t.Kind() == reflect.Slice && t.Elem().Kind() != reflect.Uint8 {
        slice = true
        t = t.Elem()
    }
    if t.Kind() == reflect.Ptr {
        pointer = true
        t = t.Elem()
    }
    ...
    switch t.Kind() {
    case reflect.Bool:
        if pointer {
            return unmarshalBoolPtr
        }
        if slice {
            return unmarshalBoolSlice
        }
        return unmarshalBoolValue
    }
}

unmarshalBoolValue is the default deserialization function for the Bool type. It decodes the protobuf data b, converts the result to a bool v, and assigns it to field f.

func unmarshalBoolValue(b []byte, f pointer, w int) ([]byte, error) {
    if w != WireVarint {
        return b, errInternalBadWireType
    }
    // Note: any length varint is allowed, even though any sane
    // encoder will use one byte.
    // See https://github.com/golang/protobuf/issues/76
    x, n := decodeVarint(b)
    if n == 0 {
        return nil, io.ErrUnexpectedEOF
    }
    // TODO: check if x>1? Tests seem to indicate no.
    // toBool returns a pointer to the bool field;
    // assigning through it sets field f.
    v := x != 0
    *f.toBool() = v
    return b[n:], nil
}

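Summary

This article analyzed how protobuf data is serialized and deserialized in Go. In brief:

  1. proto.Marshal and proto.Unmarshal call the wrapper functions generated in *.pb.go; the wrappers call InternalMessageInfo to (de)serialize, and only then does the real (de)serialization work begin.
  2. The (un)marshal info u of the target type is fetched first. If u is not initialized, it is initialized, which sets the (de)serialization function for each struct field along with other information.
  3. Every field of the struct is then processed using the information in u: encoding produces the serialized result, and decoding assigns values to the struct members.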

