This post is a translation of the original article in 🇫🇷 French
Today I want to talk about a tool that I now use almost every day: Protocol Buffers (or ProtoBuf for short).
Contrary to popular belief, it is perfectly possible (and efficient!) to use ProtoBuf in PHP.
ProtoBuf is:
My use case is quite basic: I need to transmit information between several microservices over a RabbitMQ bus, and I use ProtoBuf to encode those messages.
We will exchange data between a PHP application and a Go application. Let’s see how it works! 🎉
If you have looked at the official website, you see the word “Google” everywhere. Don’t panic, it remains very interoperable: the coupling to Google is practically non-existent, and the technology is used by many different actors. Google is mainly the initiator of the project.
The idea behind all this is to describe data in .proto files, which are standardized and language-agnostic. From these files, code is generated to serialize and deserialize the data, in binary or JSON.
For a basic example, we will describe a simple message representing a blog post:
// file src/BlogPost.proto
syntax = "proto3";

message BlogPost {
  string uuid = 1;
  string title = 2;
  string content = 3;
}
This is a simple message, which contains a UUID, a title and some content. Each field is associated with a field number (1, 2, 3, …), which must never change over time: serialization and deserialization rely on these numbers, not on the field names.
Let’s continue with our BlogPost, in order to add tags and an author (in a rather simplistic way, but the idea is there):
// file src/User.proto
syntax = "proto3";

message User {
  string uuid = 1;
  string name = 2;
  optional string avatar = 3;
}

// file src/Tag.proto
syntax = "proto3";

message Tag {
  string label = 1;
}
Let’s modify the BlogPost to connect everything. The file now looks like:
syntax = "proto3";

import "src/User.proto";
import "src/Tag.proto";

message BlogPost {
  string uuid = 1;
  string title = 2;
  string content = 3;
  User author = 4;
  repeated Tag tags = 5;
}
We can have as many tags as we want, thanks to the repeated keyword.
Finally, we will add a publication date to our BlogPost. To do this, we need the Timestamp type: it ships with ProtoBuf as one of the well-known types, but it must be imported before you can use it. There are quite a few of these types; I’ll let you discover them in the documentation.
// ...
import "google/protobuf/timestamp.proto";

message BlogPost {
  // ...
  google.protobuf.Timestamp published = 6;
}
Let’s go all the way with one last, rather useful feature: enums.
// ...
message BlogPost {
  // ...
  enum PublicationStatus {
    DRAFT = 0;
    PUBLISHED = 1;
    ARCHIVED = 2;
  }
  PublicationStatus status = 7;
}
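Enums travel as plain integers on the wire. In proto3 the first value must be 0, and it is also the default, so a post whose status was never set reads as DRAFT. The Go type below is a hypothetical hand-written approximation of the generated enum. protoc-gen-go actually names it BlogPost_PublicationStatus, with constants such as BlogPost_DRAFT, and generates more methods. This sketch keeps only a String method and uses the standard library.

```go
package main

import "fmt"

// PublicationStatus is a hand-written stand-in for the enum type that
// protoc-gen-go generates for BlogPost.PublicationStatus.
type PublicationStatus int32

const (
	StatusDraft     PublicationStatus = 0 // proto3: first value must be 0, and it is the default
	StatusPublished PublicationStatus = 1
	StatusArchived  PublicationStatus = 2
)

var statusNames = map[PublicationStatus]string{
	StatusDraft:     "DRAFT",
	StatusPublished: "PUBLISHED",
	StatusArchived:  "ARCHIVED",
}

// String returns the name used in the .proto file, or the raw number for a
// value this code does not know (for example one added by a newer schema).
func (s PublicationStatus) String() string {
	if name, ok := statusNames[s]; ok {
		return name
	}
	return fmt.Sprintf("%d", int32(s))
}

func main() {
	var status PublicationStatus      // zero value, as when the field was never set
	fmt.Println(status)               // DRAFT
	fmt.Println(PublicationStatus(2)) // ARCHIVED
	fmt.Println(PublicationStatus(7)) // 7
}
```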
If you want to test, and don’t have the courage to copy-paste everything, here is the complete code for the BlogPost:
syntax = "proto3";

import "src/User.proto";
import "src/Tag.proto";
import "google/protobuf/timestamp.proto";

message BlogPost {
  string uuid = 1;
  string title = 2;
  string content = 3;
  User author = 4;
  repeated Tag tags = 5;
  google.protobuf.Timestamp published = 6;
  enum PublicationStatus {
    DRAFT = 0;
    PUBLISHED = 1;
    ARCHIVED = 2;
  }
  PublicationStatus status = 7;
}
It is possible to define namespaces (packages) for the classes generated by ProtoBuf. It is even required for some languages (like Go).
We will add the following options to each of our .proto files:
option go_package = "blog/demo";
option php_namespace = "Blog\\Demo";
option php_metadata_namespace = "Blog\\Demo\\Metadata";
💡 Tips
As your project progresses, you will need to evolve your messages. If you are concerned about breaking backward compatibility (e.g. by making a field obsolete), a good practice is to add a version field:
message ... { optional int32 version = 999; }
Store the current version of your data there; you will then be able to handle each message according to its version without breaking everything.
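Here is a minimal Go sketch of that idea. It assumes the version field above, numbered 999, and uses a hand-written struct standing in for the generated one. protoc-gen-go represents a proto3 "optional int32" as *int32, so a missing version is nil. The upgrade rule is invented for illustration; the point is only to branch on the version before using the data.

```go
package main

import "fmt"

// post stands in for a generated message that carries
// "optional int32 version = 999;" (hypothetical example).
type post struct {
	Version *int32 // nil when the sender did not set the field
	Title   string
}

const currentVersion = 2

// normalize upgrades a decoded message in place to currentVersion.
// The per-version rule below is made up for the example.
func normalize(p *post) (*post, error) {
	version := int32(1) // messages written before the field existed
	if p.Version != nil {
		version = *p.Version
	}
	if version > currentVersion {
		return nil, fmt.Errorf("version %d is newer than the supported version %d", version, currentVersion)
	}
	if version < 2 && p.Title == "" {
		// hypothetical rule: version 1 allowed empty titles, version 2 does not
		p.Title = "(untitled)"
	}
	v := int32(currentVersion)
	p.Version = &v
	return p, nil
}

func main() {
	legacy := &post{} // sent before the version field existed
	upgraded, err := normalize(legacy)
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println(*upgraded.Version, upgraded.Title) // 2 (untitled)
}
```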
We’ve described a whole bunch of great things, but using them is even better! It’s time to install ProtoBuf.
Simply download the latest release from the official GitHub repository (look for the protoc-xxx file that corresponds to your distribution).
For example, in my case, I download version 21.2 for Ubuntu:
# -L follows GitHub's redirect to the actual release file
curl -L https://github.com/protocolbuffers/protobuf/releases/download/v21.2/protoc-21.2-linux-x86_64.zip \
  -o protoc-21.2-linux-x86_64.zip
unzip -qq protoc-21.2-linux-x86_64.zip -d protoc
chmod +x protoc/bin/protoc
I now have a protoc folder in my current directory, with the bin/protoc binary that we will use for everything else.
We will now do something quite magical: we will generate PHP code to serialize and deserialize BlogPost objects.
Still in bash, run:
mkdir -p generated # the "generated" folder will contain the generated code
# --proto_path=. lets imports such as "src/User.proto" resolve from the project root
protoc/bin/protoc --php_out=./generated --proto_path=. $(find src -name '*.proto')
You will find a set of ready-to-use PHP classes in the generated folder.
Let’s create a small script to use them. The first step is to install the ProtoBuf runtime library for PHP:
composer require google/protobuf
Then let’s create a script that uses the generated classes. So that Composer can autoload them, declare the generated folder in composer.json (for example with a PSR-4 fallback entry: "autoload": {"psr-4": {"": "generated/"}}) and run composer dump-autoload. The script:
<?php
require_once __DIR__ . '/vendor/autoload.php';

$blogPost = new \Blog\Demo\BlogPost();
$blogPost
    ->setTitle('My great post')
    ->setContent('Lorem ipsum')
    ->setAuthor(
        (new \Blog\Demo\User())
            ->setName('Jean-François')
    )
    // an empty Timestamp is the Unix epoch (1970-01-01T00:00:00Z)
    ->setPublished(new \Google\Protobuf\Timestamp());

// the content serialized in JSON:
$json = $blogPost->serializeToJsonString();

// the content serialized in binary:
$binary = $blogPost->serializeToString();

// We write the binary (not the JSON) to a temporary file, which will be read by Go
file_put_contents('blogpost.bin', $binary);
The JSON looks like:
{"title":"My great post","content":"Lorem ipsum","author":{"name":"Jean-François"},"published":"1970-01-01T00:00:00Z"}
And there you go! Now we will deserialize our post, but in Go this time.
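Let’s install the dependencies (protoc-gen-go is installed in $(go env GOPATH)/bin, which must be in your PATH so that protoc can find it):
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go mod init blog
go get google.golang.org/protobuf
Let’s use ProtoBuf to automatically generate the Go code. The module=blog option strips the module name from the go_package path, so the generated files land in ./demo and can be imported as blog/demo:
protoc/bin/protoc --go_out=. --go_opt=module=blog $(find src -name '*.proto')
Here is a very simple program to read and parse the blogpost.bin file that we generated in PHP: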
package main

import (
	"log"
	"os"

	"blog/demo" // package generated by protoc-gen-go (go_package = "blog/demo")

	"google.golang.org/protobuf/proto"
)

func main() {
	// Read the binary payload written by the PHP script
	in, err := os.ReadFile("blogpost.bin")
	if err != nil {
		log.Fatalf("Read file error: %s", err)
	}

	// Decode the bytes into the generated BlogPost struct
	blogpost := &demo.BlogPost{}
	if err := proto.Unmarshal(in, blogpost); err != nil {
		log.Fatalf("Deserialization error: %s", err)
	}

	// Generated getters are nil-safe, even if the author was never set
	log.Printf("BlogPost: %s", blogpost.GetTitle())
	log.Printf("Author is: %s", blogpost.GetAuthor().GetName())
}
Let’s run it:
go run demo.go
# BlogPost: My great post
# Author is: Jean-François
Our Go program successfully read the file serialized by PHP and extracted the information from it without any problem.
In both cases we worked with typed objects or structures. If the data deserializes, it matches the expected structure and types!
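Here, "valid" is structural: the bytes must follow the wire format and match the declared types. The standard-library Go sketch below decodes a single length-delimited field the way a parser would and rejects malformed input. The function name is hypothetical, and proto.Unmarshal performs these checks (and many more) for you.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// readLenField decodes one length-delimited field (key, length, payload)
// and returns the field number, the payload and the remaining bytes.
func readLenField(buf []byte) (field uint64, payload, rest []byte, err error) {
	key, n := binary.Uvarint(buf)
	if n <= 0 {
		return 0, nil, nil, errors.New("truncated or overlong key")
	}
	if key&7 != 2 {
		return 0, nil, nil, fmt.Errorf("wire type %d is not length-delimited", key&7)
	}
	buf = buf[n:]
	length, n := binary.Uvarint(buf)
	if n <= 0 {
		return 0, nil, nil, errors.New("truncated length")
	}
	buf = buf[n:]
	if length > uint64(len(buf)) {
		return 0, nil, nil, fmt.Errorf("length %d exceeds the %d remaining bytes", length, len(buf))
	}
	return key >> 3, buf[:length], buf[length:], nil
}

func main() {
	good := []byte{0x12, 0x02, 'H', 'i'} // field 2 ("title"), value "Hi"
	field, payload, _, err := readLenField(good)
	fmt.Println(field, string(payload), err) // 2 Hi <nil>

	_, _, _, err = readLenField(good[:3]) // last byte cut off
	fmt.Println(err) // length 2 exceeds the 1 remaining bytes
}
```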
The advantages of ProtoBuf are truly numerous:
- the data is standardized;
- you manipulate typed structures;
- there is no need to write validators for the structure and types of the data;
- you can use complex types;
- serialization/deserialization is efficient.
However, in my experience, there is one caveat: the documentation could be greatly simplified to make it more accessible for beginners.
I hope I’ve made you want to try this tool. Feel free to share your experience on the subject.
💡 Additional Resources
You can also discover a real production use case of ProtoBuf in a RabbitMQ data bus
© Jean-François Lépine, 2013 - 2024