Google ProtoBuf 一種語言無關、平臺無關、可擴展的序列化結構數據的方法

ProtoBuf

Welcome to the developer documentation for protocol buffers – a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.

protocol buffers 是一種語言無關、平臺無關、可擴展的序列化結構數據的方法。它可用於(數據)通信協議、數據存儲等。

官方網址:https://developers.google.cn/protocol-buffers/

Guide:https://developers.google.cn/protocol-buffers/docs/overview

github地址:https://github.com/protocolbuffers/protobuf

支持的語言

Protocol buffers currently support generated code in Java, Python, Objective-C, and C++. With our new proto3 language version, you can also work with Dart, Go, Ruby, and C#, with more languages to come.

它支持以下語言:

  • Java
  • Python
  • Objective-C
  • C++

在新的proto3版本中,又支持了:

  • Dart
  • Go
  • Ruby
  • C#
proto是如何工作的(C++舉例)?

You specify how you want the information you’re serializing to be structured by defining protocol buffer message types in .proto files. Each protocol buffer message is a small logical record of information, containing a series of name-value pairs. Here’s a very basic example of a .proto file that defines a message containing information about a person:

在類型爲.proto的文件中定義你想要如何序列化結構數據的方法,舉個例子如下:

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

As you can see, the message format is simple – each message type has one or more uniquely numbered fields, and each field has a name and a value type, where value types can be numbers (integer or floating-point), booleans, strings, raw bytes, or even (as in the example above) other protocol buffer message types, allowing you to structure your data hierarchically. You can specify optional fields, required fields, and repeated fields. You can find more information about writing .proto files in the Protocol Buffer Language Guide.

消息格式很簡單,每個消息類型都有一個或多個字段,每個字段都有一個名字和它的值類型,這個值類型可以是很多(整數,浮點,布爾,字符串,原始字節,或者是一個消息類型),允許你自由組織你數據的層次結構。這裏有三種屬性的字段,分別是:

  • optional
  • required
  • repeated

Once you’ve defined your messages, you run the protocol buffer compiler for your application’s language on your .proto file to generate data access classes. These provide simple accessors for each field (like name() and set_name()) as well as methods to serialize/parse the whole structure to/from raw bytes – so, for instance, if your chosen language is C++, running the compiler on the above example will generate a class called Person. You can then use this class in your application to populate, serialize, and retrieve Person protocol buffer messages. You might then write some code like this:

它會將.proto翻譯成你需要語言的文件,提供了set方法等設置它的方法。你可以使用它去填充,序列化,檢索信息。你可能會寫一些如下的代碼:

Person person;
person.set_name("John Doe");
person.set_id(1234);
person.set_email("[email protected]");
fstream output("myfile", ios::out | ios::binary);
person.SerializeToOstream(&output);

Then, later on, you could read your message back in:

讀取:

fstream input("myfile", ios::in | ios::binary);
Person person;
person.ParseFromIstream(&input);
cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;

You can add new fields to your message formats without breaking backwards-compatibility; old binaries simply ignore the new field when parsing. So if you have a communications protocol that uses protocol buffers as its data format, you can extend your protocol without having to worry about breaking existing code.

它是向後兼容的。如果你添加了一個字段,舊的庫就會在轉換的時候會忽略新添加的字段。

You’ll find a complete reference for using generated protocol buffer code in the API Reference section, and you can find out more about how protocol buffer messages are encoded in Protocol Buffer Encoding.

爲什麼不用XML?

Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:

  • are simpler
  • are 3 to 10 times smaller
  • are 20 to 100 times faster
  • are less ambiguous
  • generate data access classes that are easier to use programmatically

因爲protobuf更快更簡單更輕巧更清晰更方便。

For example, let’s say you want to model a person with a name and an email. In XML, you need to do:

舉個例子說明一下,你給一個有名字和郵箱的人抽象出來,用XML你需要做:

<person>
    <name>John Doe</name>
    <email>jdoe@example.com</email>
  </person>

while the corresponding protocol buffer message (in protocol buffer text format) is:

但是ProtoBuf:

# Textual representation of a protocol buffer.
# This is *not* the binary format used on the wire.
person {
  name: "John Doe"
  email: "[email protected]"
}

When this message is encoded to the protocol buffer binary format (the text format above is just a convenient human-readable representation for debugging and editing), it would probably be 28 bytes long and take around 100-200 nanoseconds to parse. The XML version is at least 69 bytes if you remove whitespace, and would take around 5,000-10,000 nanoseconds to parse.

protobuf更快更輕巧。

Also, manipulating a protocol buffer is much easier:

protobuf也更簡單:

cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;

Whereas with XML you would have to do something like:

但是你用XML就是這個樣子:

cout << "Name: "
       << person.getElementsByTagName("name")->item(0)->innerText()
       << endl;
cout << "E-mail: "
       << person.getElementsByTagName("email")->item(0)->innerText()
       << endl;

However, protocol buffers are not always a better solution than XML – for instance, protocol buffers would not be a good way to model a text-based document with markup (e.g. HTML), since you cannot easily interleave structure with text. In addition, XML is human-readable and human-editable; protocol buffers, at least in their native format, are not. XML is also – to some extent – self-describing. A protocol buffer is only meaningful if you have the message definition (the .proto file).

不是說ProtoBuf比起XML總是最優選擇,在標籤文本中(比如說HTML)就不是這樣。另外XML是一種易於理解,可以編輯的。但是ProtoBuf用了特定的格式。選擇合適的場景使用。

Introducing proto3

Our most recent version 3 release introduces a new language version - Protocol Buffers language version 3 (aka proto3), as well as some new features in our existing language version (aka proto2). Proto3 simplifies the protocol buffer language, both for ease of use and to make it available in a wider range of programming languages: our current release lets you generate protocol buffer code in Java, C++, Python, Java Lite, Ruby, JavaScript, Objective-C, and C#. In addition you can generate proto3 code for Go using the latest Go protoc plugin, available from the golang/protobuf Github repository. More languages are in the pipeline.

Note that the two language version APIs are not completely compatible. To avoid inconvenience to existing users, we will continue to support the previous language version in new protocol buffers releases.

You can see the major differences from the current default version in the release notes and learn about proto3 syntax in the Proto3 Language Guide. Full documentation for proto3 is coming soon!

(If the names proto2 and proto3 seem a little confusing, it’s because when we originally open-sourced protocol buffers it was actually Google’s second version of the language – also known as proto2. This is also why our open source version number started from v2.0.0).

proto3馬上就來啦。

它的歷史

Protocol buffers were initially developed at Google to deal with an index server request/response protocol. Prior to protocol buffers, there was a format for requests and responses that used hand marshalling/unmarshalling of requests and responses, and that supported a number of versions of the protocol. This resulted in some very ugly code, like:

本來ProtoBuf是用來處理請求/響應協議的。但是協議的版本太多,於是就出現了一下的醜醜的代碼:

 if (version == 3) {
   ...
 } else if (version > 4) {
   if (version == 5) {
     ...
   }
   ...
 }

Explicitly formatted protocols also complicated the rollout of new protocol versions, because developers had to make sure that all servers between the originator of the request and the actual server handling the request understood the new protocol before they could flip a switch to start using the new protocol.

Protocol buffers were designed to solve many of these problems:

  • New fields could be easily introduced, and intermediate servers that didn’t need to inspect the data could simply parse it and pass through the data without needing to know about all the fields.
  • Formats were more self-describing, and could be dealt with from a variety of languages (C++, Java, etc.)

However, users still needed to hand-write their own parsing code.

As the system evolved, it acquired a number of other features and uses:

  • Automatically-generated serialization and deserialization code avoided the need for hand parsing.
  • In addition to being used for short-lived RPC (Remote Procedure Call) requests, people started to use protocol buffers as a handy self-describing format for storing data persistently (for example, in Bigtable).
  • Server RPC interfaces started to be declared as part of protocol files, with the protocol compiler generating stub classes that users could override with actual implementations of the server’s interface.

Protocol buffers are now Google’s lingua franca for data – at time of writing, there are 306,747 different message types defined in the Google code tree across 348,952 .proto files. They’re used both in RPC systems and for persistent storage of data in a variety of storage systems.

ProtoBuf跟遠程調用(rpc)相關。

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章