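Google Protocol Buffers (Protobuf for short) began as Google's internal mixed-language data standard. It is a language-neutral, platform-neutral, extensible format for serializing structured data, and a lightweight, efficient way to store it. That makes it a good fit for data storage and for data exchange in RPC systems and other communication protocols. At the time of writing, APIs were provided for three languages: C++, Java, and Python. Later releases added support for more languages.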
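Advantages of Protobuf
Protobuf plays a role similar to XML, but it is smaller, faster, and simpler. You define your own data structures once, then read and write them through code produced by the code generator. You can even update the data structures without redeploying existing programs. A single Protobuf description of the data lets you read and write it from different languages and across different kinds of data streams.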
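A particularly useful property is good backward compatibility. You can upgrade data structures without breaking deployed programs that rely on the old format, so a change to a message does not force large-scale refactoring or migration. Adding a new field to a message requires no change to programs that have already shipped, because older readers simply skip fields they do not recognize.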
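Protobuf's semantics are clearer, and no XML-style parser is needed: the Protobuf compiler turns a .proto file into data-access classes that serialize and deserialize the Protobuf data.
Protobuf does not require learning a complex document object model. Its programming model is friendly and easy to learn, and it comes with good documentation and examples. For people who prefer simple tools, that makes it more attractive than the alternatives.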
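Disadvantages of Protobuf
Compared with XML, Protobuf also has drawbacks. Its feature set is deliberately simple, which limits how complex the concepts it can express can be.
XML has become the authoring tool for many industry standards. Protobuf started as a Google-internal tool, and even after being open-sourced it is far less established as a general-purpose standard.
Protobuf is also a poor fit for modeling text-based markup documents such as HTML, because it cannot easily interleave structure with text. In addition, XML is self-describing to some degree and can be read and edited directly by people. Protobuf's binary format cannot: without the .proto definition you can recover little more than field numbers and raw values. Protobuf also has a human-readable text format, which is what Caffe's .prototxt files use, but parsing it still requires the .proto definition.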
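Why Caffe uses protobuf
A good software framework should have clearly defined inputs and outputs. A CNN is described by two parts: the concrete network architecture, and the optimization algorithm together with its parameters. For users of the framework it is very convenient to supply just two description files and get back the optimized network.
Caffe uses Google's open-source protobuf tool to describe, parse, and store these two parts, which saves a great deal of code in Caffe's implementation.
Take the object-detection demo discussed earlier, py-faster-rcnn. It consists of a training stage and a testing stage, and the core files of both stages are text files in prototxt format.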
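For the training stage:
Inputs:
(1) solver.prototxt: the parameters used to train the network, such as the training strategy, the learning-rate schedule, and how often the model is saved.
(2) train.prototxt: the architecture of the network used for training.
(3) test.prototxt: the architecture of the network used for testing.
Output:
VGG16.caffemodel: the trained network parameters.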
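How protobuf is used
protobuf serializes, stores, and parses data. In practice it is used mainly as a code generator. It generates standard read/write code for the data structures you define, and your program then reads, parses, and stores data in files through those standard interfaces.
protobuf supports C++, Python, Java, and other languages; this article demonstrates the C++ usage found in Caffe.
The workflow is:
(1) Write an XXX.proto file that defines the data structures and the types of their fields, such as integers and strings.
(2) Compile XXX.proto with protoc. This generates standardized reader/writer code for those data structures, normally in files named XXX.pb.cc and XXX.pb.h.
(3) Use the code in XXX.pb.cc and XXX.pb.h from your own program.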
A simplified caffe.proto example
To make protobuf easier to understand, the following simplified caffe.proto is used to parse solver.prototxt and train.prototxt.
caffe.proto:
syntax = "proto2";
package caffe;  // generated C++ classes live in namespace caffe

message NetParameter {
  optional string name = 1;           // consider giving the network a name
  repeated LayerParameter layer = 2;  // the layers of the network, in order
}

message SolverParameter {
  optional string train_net = 1;
  optional float base_lr = 2;
  optional string lr_policy = 3;
  optional NetParameter net_param = 4;
}

message ParamSpec {
  optional string name = 1;
  optional float lr_mult = 3 [default = 1.0];
  optional float decay_mult = 4 [default = 1.0];
}

// Simplified LayerParameter: only the fields needed by train.prototxt below are kept.
message LayerParameter {
  optional string name = 1;    // the layer name
  optional string type = 2;    // the layer type
  repeated string bottom = 3;  // the name of each bottom blob
  repeated string top = 4;     // the name of each top blob
  repeated ParamSpec param = 6;
  // Layer type-specific parameters.
  optional ConvolutionParameter convolution_param = 106;
  optional PythonParameter python_param = 130;
}

message ConvolutionParameter {
  optional uint32 num_output = 1;  // The number of outputs for the layer
  // Pad, kernel size, and stride are all given as a single value for equal
  // dimensions in all spatial dimensions, or once per spatial dimension.
  repeated uint32 pad = 3;          // The padding size; defaults to 0
  repeated uint32 kernel_size = 4;  // The kernel size
  repeated uint32 stride = 6;       // The stride; defaults to 1
}

message PythonParameter {
  optional string module = 1;
  optional string layer = 2;
  // This value is set to the attribute `param_str` of the `PythonLayer` object
  // in Python before calling the `setup()` method. This could be a number,
  // string, dictionary in Python dict format, JSON, etc. You may parse this
  // string in `setup` method and use it in `forward` and `backward`.
  optional string param_str = 3 [default = ''];
}
// ...
Compile it to generate caffe.pb.cc and caffe.pb.h:
protoc caffe.proto --cpp_out=.    # writes caffe.pb.cc and caffe.pb.h to the current directory
Write the test program main.cpp:
#include <fcntl.h>
#include <unistd.h>
#include <iostream>
#include <string>
#include <google/protobuf/io/zero_copy_stream_impl.h>
#include <google/protobuf/text_format.h>
#include "caffe.pb.h"
using namespace caffe;
using namespace std;
using google::protobuf::io::FileInputStream;
using google::protobuf::Message;

// Parse a text-format file (e.g. a .prototxt) into any protobuf message.
bool ReadProtoFromTextFile(const char* filename, Message* proto) {
  int fd = open(filename, O_RDONLY);
  if (fd < 0) {
    return false;  // the file does not exist or cannot be opened
  }
  FileInputStream input(fd);
  input.SetCloseOnDelete(true);  // close fd when input goes out of scope
  return google::protobuf::TextFormat::Parse(&input, proto);
}

int main()
{
  SolverParameter SGD;
  if (!ReadProtoFromTextFile("solver.prototxt", &SGD)) {
    cout << "failed to read or parse solver.prototxt" << endl;
    return -1;
  }
  cout << "hello,world" << endl;
  cout << SGD.train_net() << endl;
  cout << SGD.base_lr() << endl;
  cout << SGD.lr_policy() << endl;

  NetParameter VGG16;
  if (!ReadProtoFromTextFile("train.prototxt", &VGG16)) {
    cout << "failed to read or parse train.prototxt" << endl;
    return -1;
  }
  cout << VGG16.name() << endl;
  return 0;
}
Write the solver and network description files
solver.prototxt:
train_net: "/home/bryant/cuda-test/train.prototxt"
base_lr: 0.001
lr_policy: "step"
train.prototxt:
name: "VGG_ILSVRC_16_layers"
layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 2"
  }
}
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
  }
}
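Compile and link to build main:
g++ caffe.pb.cc main.cpp -o main -lprotobuf
Recent protobuf releases also depend on Abseil, so -lprotobuf alone may not link. In that case pkg-config can supply the full set of flags:
g++ caffe.pb.cc main.cpp -o main $(pkg-config --cflags --libs protobuf)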
Output:
bryant@bryant:~/cuda-test/src$ ./main
hello,world
/home/bryant/cuda-test/train.prototxt
0.001
step
VGG_ILSVRC_16_layers
bryant@bryant:~/cuda-test/src$
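Compiling .proto files with CMake
With CMake, the .proto files can be compiled directly from CMakeLists.txt instead of running protoc by hand in a terminal, which is much more convenient. CMake ships a FindProtobuf.cmake module, which the find_package(Protobuf) command loads into CMakeLists.txt.
The module's contents (from an older CMake release) are as follows: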
#.rst:
# FindProtobuf
# ------------
#
# Locate and configure the Google Protocol Buffers library.
#
# The following variables can be set and are optional:
#
# ``PROTOBUF_SRC_ROOT_FOLDER``
# When compiling with MSVC, if this cache variable is set
# the protobuf-default VS project build locations
# (vsprojects/Debug and vsprojects/Release
# or vsprojects/x64/Debug and vsprojects/x64/Release)
# will be searched for libraries and binaries.
# ``PROTOBUF_IMPORT_DIRS``
# List of additional directories to be searched for
# imported .proto files.
#
# Defines the following variables:
#
# ``PROTOBUF_FOUND``
# Found the Google Protocol Buffers library
# (libprotobuf & header files)
# ``PROTOBUF_INCLUDE_DIRS``
# Include directories for Google Protocol Buffers
# ``PROTOBUF_LIBRARIES``
# The protobuf libraries
# ``PROTOBUF_PROTOC_LIBRARIES``
# The protoc libraries
# ``PROTOBUF_LITE_LIBRARIES``
# The protobuf-lite libraries
#
# The following cache variables are also available to set or use:
#
# ``PROTOBUF_LIBRARY``
# The protobuf library
# ``PROTOBUF_PROTOC_LIBRARY``
# The protoc library
# ``PROTOBUF_INCLUDE_DIR``
# The include directory for protocol buffers
# ``PROTOBUF_PROTOC_EXECUTABLE``
# The protoc compiler
# ``PROTOBUF_LIBRARY_DEBUG``
# The protobuf library (debug)
# ``PROTOBUF_PROTOC_LIBRARY_DEBUG``
# The protoc library (debug)
# ``PROTOBUF_LITE_LIBRARY``
# The protobuf lite library
# ``PROTOBUF_LITE_LIBRARY_DEBUG``
# The protobuf lite library (debug)
#
# Example:
#
# .. code-block:: cmake
#
# find_package(Protobuf REQUIRED)
# include_directories(${PROTOBUF_INCLUDE_DIRS})
# include_directories(${CMAKE_CURRENT_BINARY_DIR})
# protobuf_generate_cpp(PROTO_SRCS PROTO_HDRS foo.proto)
# protobuf_generate_python(PROTO_PY foo.proto)
# add_executable(bar bar.cc ${PROTO_SRCS} ${PROTO_HDRS})
# target_link_libraries(bar ${PROTOBUF_LIBRARIES})
#
# .. note::
# The ``protobuf_generate_cpp`` and ``protobuf_generate_python``
# functions and :command:`add_executable` or :command:`add_library`
# calls only work properly within the same directory.
#
# .. command:: protobuf_generate_cpp
#
# Add custom commands to process ``.proto`` files to C++::
#
# protobuf_generate_cpp (<SRCS> <HDRS> [<ARGN>...])
#
# ``SRCS``
# Variable to define with autogenerated source files
# ``HDRS``
# Variable to define with autogenerated header files
# ``ARGN``
# ``.proto`` files
#
# .. command:: protobuf_generate_python
#
# Add custom commands to process ``.proto`` files to Python::
#
# protobuf_generate_python (<PY> [<ARGN>...])
#
# ``PY``
# Variable to define with autogenerated Python files
# ``ARGN``
# ``.proto`` filess
#=============================================================================
# Copyright 2009 Kitware, Inc.
# Copyright 2009-2011 Philip Lowman
# Copyright 2008 Esben Mose Hansen, Ange Optimization ApS
#
# Distributed under the OSI-approved BSD License (the "License");
# see accompanying file Copyright.txt for details.
#
# This software is distributed WITHOUT ANY WARRANTY; without even the
# implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
# See the License for more information.
#=============================================================================
# (To distribute this file outside of CMake, substitute the full
# License text for the above reference.)
function(PROTOBUF_GENERATE_CPP SRCS HDRS)
if(NOT ARGN)
message(SEND_ERROR "Error: PROTOBUF_GENERATE_CPP() called without any proto files")
return()
endif()
if(PROTOBUF_GENERATE_CPP_APPEND_PATH)
# Create an include path for each file specified
foreach(FIL ${ARGN})
get_filename_component(ABS_FIL ${FIL} ABSOLUTE)
get_filename_component(ABS_PATH ${ABS_FIL} PATH)
list(FIND _protobuf_include_path ${ABS_PATH} _contains_already)
if(${_contains_already} EQUAL -1)
list(APPEND _protobuf_include_path -I ${ABS_PATH})
endif()
endforeach()
else()
set(_protobuf_include_path -I ${CMAKE_CURRENT_SOURCE_DIR})
endif()
if(DEFINED PROTOBUF_IMPORT_DIRS)
foreach(DIR ${PROTOBUF_IMPORT_DIRS})
get_filename_component(ABS_PATH ${DIR} ABSOLUTE)
list(FIND _protobuf_include_path ${ABS_PATH} _contains_already)
if(${_contains_already} EQUAL -1)
list(APPEND _protobuf_include_path -I ${ABS_PATH})
endif()
endforeach()
endif()
set(${SRCS})
set(${HDRS})
foreach(FIL ${ARGN})
get_filename_component(ABS_FIL ${FIL} ABSOLUTE)
get_filename_component(FIL_WE ${FIL} NAME_WE)
list(APPEND ${SRCS} "${CMAKE_CURRENT_BINARY_DIR}/${FIL_WE}.pb.cc")
list(APPEND ${HDRS} "${CMAKE_CURRENT_BINARY_DIR}/${FIL_WE}.pb.h")
add_custom_command(
OUTPUT "${CMAKE_CURRENT_BINARY_DIR}/${FIL_WE}.pb.cc"
"${CMAKE_CURRENT_BINARY_DIR}/${FIL_WE}.pb.h"
COMMAND ${PROTOBUF_PROTOC_EXECUTABLE}
ARGS --cpp_out ${CMAKE_CURRENT_BINARY_DIR} ${_protobuf_include_path} ${ABS_FIL}
DEPENDS ${ABS_FIL} ${PROTOBUF_PROTOC_EXECUTABLE}
COMMENT "Running C++ protocol buffer compiler on ${FIL}"
VERBATIM )
endforeach()
set_source_files_properties(${${SRCS}} ${${HDRS}} PROPERTIES GENERATED TRUE)
set(${SRCS} ${${SRCS}} PARENT_SCOPE)
set(${HDRS} ${${HDRS}} PARENT_SCOPE)
endfunction()
function(PROTOBUF_GENERATE_PYTHON SRCS)
if(NOT ARGN)
message(SEND_ERROR "Error: PROTOBUF_GENERATE_PYTHON() called without any proto files")
return()
endif()
if(PROTOBUF_GENERATE_CPP_APPEND_PATH)
# Create an include path for each file specified
foreach(FIL ${ARGN})
get_filename_component(ABS_FIL ${FIL} ABSOLUTE)
get_filename_component(ABS_PATH ${ABS_FIL} PATH)
list(FIND _protobuf_include_path ${ABS_PATH} _contains_already)
if(${_contains_already} EQUAL -1)
list(APPEND _protobuf_include_path -I ${ABS_PATH})
endif()
endforeach()
else()
set(_protobuf_include_path -I ${CMAKE_CURRENT_SOURCE_DIR})
endif()
if(DEFINED PROTOBUF_IMPORT_DIRS)
foreach(DIR ${PROTOBUF_IMPORT_DIRS})
get_filename_component(ABS_PATH ${DIR} ABSOLUTE)
list(FIND _protobuf_include_path ${ABS_PATH} _contains_already)
if(${_contains_already} EQUAL -1)
list(APPEND _protobuf_include_path -I ${ABS_PATH})
endif()
endforeach()
endif()
set(${SRCS})
foreach(FIL ${ARGN})
get_filename_component(ABS_FIL ${FIL} ABSOLUTE)
get_filename_component(FIL_WE ${FIL} NAME_WE)
list(APPEND ${SRCS} "${CMAKE_CURRENT_BINARY_DIR}/${FIL_WE}_pb2.py")
add_custom_command(
OUTPUT "${CMAKE_CURRENT_BINARY_DIR}/${FIL_WE}_pb2.py"
COMMAND ${PROTOBUF_PROTOC_EXECUTABLE} --python_out ${CMAKE_CURRENT_BINARY_DIR} ${_protobuf_include_path} ${ABS_FIL}
DEPENDS ${ABS_FIL} ${PROTOBUF_PROTOC_EXECUTABLE}
COMMENT "Running Python protocol buffer compiler on ${FIL}"
VERBATIM )
endforeach()
set(${SRCS} ${${SRCS}} PARENT_SCOPE)
endfunction()
if(CMAKE_SIZEOF_VOID_P EQUAL 8)
set(_PROTOBUF_ARCH_DIR x64/)
endif()
# Internal function: search for normal library as well as a debug one
# if the debug one is specified also include debug/optimized keywords
# in *_LIBRARIES variable
function(_protobuf_find_libraries name filename)
find_library(${name}_LIBRARY
NAMES ${filename}
PATHS ${PROTOBUF_SRC_ROOT_FOLDER}/vsprojects/${_PROTOBUF_ARCH_DIR}Release)
mark_as_advanced(${name}_LIBRARY)
find_library(${name}_LIBRARY_DEBUG
NAMES ${filename}
PATHS ${PROTOBUF_SRC_ROOT_FOLDER}/vsprojects/${_PROTOBUF_ARCH_DIR}Debug)
mark_as_advanced(${name}_LIBRARY_DEBUG)
if(NOT ${name}_LIBRARY_DEBUG)
# There is no debug library
set(${name}_LIBRARY_DEBUG ${${name}_LIBRARY} PARENT_SCOPE)
set(${name}_LIBRARIES ${${name}_LIBRARY} PARENT_SCOPE)
else()
# There IS a debug library
set(${name}_LIBRARIES
optimized ${${name}_LIBRARY}
debug ${${name}_LIBRARY_DEBUG}
PARENT_SCOPE
)
endif()
endfunction()
# Internal function: find threads library
function(_protobuf_find_threads)
set(CMAKE_THREAD_PREFER_PTHREAD TRUE)
find_package(Threads)
if(Threads_FOUND)
list(APPEND PROTOBUF_LIBRARIES ${CMAKE_THREAD_LIBS_INIT})
set(PROTOBUF_LIBRARIES "${PROTOBUF_LIBRARIES}" PARENT_SCOPE)
endif()
endfunction()
#
# Main.
#
# By default have PROTOBUF_GENERATE_CPP macro pass -I to protoc
# for each directory where a proto file is referenced.
if(NOT DEFINED PROTOBUF_GENERATE_CPP_APPEND_PATH)
set(PROTOBUF_GENERATE_CPP_APPEND_PATH TRUE)
endif()
# Google's provided vcproj files generate libraries with a "lib"
# prefix on Windows
if(MSVC)
set(PROTOBUF_ORIG_FIND_LIBRARY_PREFIXES "${CMAKE_FIND_LIBRARY_PREFIXES}")
set(CMAKE_FIND_LIBRARY_PREFIXES "lib" "")
find_path(PROTOBUF_SRC_ROOT_FOLDER protobuf.pc.in)
endif()
# The Protobuf library
_protobuf_find_libraries(PROTOBUF protobuf)
#DOC "The Google Protocol Buffers RELEASE Library"
_protobuf_find_libraries(PROTOBUF_LITE protobuf-lite)
# The Protobuf Protoc Library
_protobuf_find_libraries(PROTOBUF_PROTOC protoc)
# Restore original find library prefixes
if(MSVC)
set(CMAKE_FIND_LIBRARY_PREFIXES "${PROTOBUF_ORIG_FIND_LIBRARY_PREFIXES}")
endif()
if(UNIX)
_protobuf_find_threads()
endif()
# Find the include directory
find_path(PROTOBUF_INCLUDE_DIR
google/protobuf/service.h
PATHS ${PROTOBUF_SRC_ROOT_FOLDER}/src
)
mark_as_advanced(PROTOBUF_INCLUDE_DIR)
# Find the protoc Executable
find_program(PROTOBUF_PROTOC_EXECUTABLE
NAMES protoc
DOC "The Google Protocol Buffers Compiler"
PATHS
${PROTOBUF_SRC_ROOT_FOLDER}/vsprojects/${_PROTOBUF_ARCH_DIR}Release
${PROTOBUF_SRC_ROOT_FOLDER}/vsprojects/${_PROTOBUF_ARCH_DIR}Debug
)
mark_as_advanced(PROTOBUF_PROTOC_EXECUTABLE)
include(${CMAKE_CURRENT_LIST_DIR}/FindPackageHandleStandardArgs.cmake)
FIND_PACKAGE_HANDLE_STANDARD_ARGS(Protobuf DEFAULT_MSG
PROTOBUF_LIBRARY PROTOBUF_INCLUDE_DIR)
if(PROTOBUF_FOUND)
set(PROTOBUF_INCLUDE_DIRS ${PROTOBUF_INCLUDE_DIR})
endif()
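The module defines the functions PROTOBUF_GENERATE_CPP, PROTOBUF_GENERATE_PYTHON, and _protobuf_find_libraries, among others. PROTOBUF_GENERATE_CPP compiles .proto files into C++ sources and headers, and PROTOBUF_GENERATE_PYTHON generates Python modules. Both register custom commands, so protoc runs at build time and writes its output to CMAKE_CURRENT_BINARY_DIR.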
For the example above, a CMakeLists.txt that builds the sources with CMake and compiles the .proto file as part of the build looks like this:
cmake_minimum_required(VERSION 3.1)  # CMAKE_CXX_STANDARD needs CMake >= 3.1
project(protoTest)
set(CMAKE_CXX_STANDARD 14)
set(SRC_LIST main.cpp)
# Collect every .proto file under the source directory
file(GLOB_RECURSE SRC_PROTOCOL_LIST ${CMAKE_CURRENT_SOURCE_DIR}/*.proto)
message(STATUS "proto files: ${SRC_PROTOCOL_LIST}")
# Find required protobuf package
find_package(Protobuf REQUIRED)
if(PROTOBUF_FOUND)
  message(STATUS "protobuf library found")
else()
  message(FATAL_ERROR "protobuf library is needed but can't be found")
endif()
include_directories(${PROTOBUF_INCLUDE_DIRS})
# The generated *.pb.h headers are written to the build directory
include_directories(${CMAKE_CURRENT_BINARY_DIR})
PROTOBUF_GENERATE_CPP(PROTO_SRCS PROTO_HDRS ${SRC_PROTOCOL_LIST})
add_executable(protoTest ${SRC_LIST} ${PROTO_SRCS} ${PROTO_HDRS})
target_link_libraries(protoTest ${PROTOBUF_LIBRARIES})
Output:
suteng@suteng:~/Documents/findprotobuf/build$ ./protoTest
hello,world
0
Reference: https://blog.csdn.net/piaopiaopiaopiaopiao/article/details/84347377