Blog of Jean-François Lépine

DevOps, and software quality.

2022-07-13

ProtoBuf in PHP for ultra-efficient and agnostic serialization


You will find the 🇫🇷 French version of this article here

TL;DR (AI)

Discover how to efficiently serialize data between PHP and Go microservices using Protocol Buffers (ProtoBuf).
Learn to define data schemas with .proto files, generate PHP and Go code, and exchange typed, compact messages effortlessly.
Gain a practical, step-by-step guide to install, use, and benefit from ProtoBuf’s speed, standardization, and cross-language compatibility.

Summary generated by AI to help you skim.

Today I want to talk about a tool that I now use almost every day: Protocol Buffers (or ProtoBuf for short).

Contrary to popular belief, it is perfectly possible (and efficient!) to use ProtoBuf in PHP.

❔ What is ProtoBuf?

ProtoBuf is:

A standard for exchanging data (to structure and serialize it);
A code generator (Java, PHP, Go…) to process this data

My use case is quite basic: I need to transmit information between several microservices, via a RabbitMQ bus. So I use ProtoBuf for that.

We will exchange data between a PHP application and a Go application. Let’s see how it works! 🎉

📄 The standard

If you have looked at the official website, you have seen the word “Google” everywhere. Don’t panic: the format is very interoperable. There is almost no coupling to Google, and the technology is used by many different actors. Google simply initiated the project.

The idea is to describe data in .proto files, which are standardized and language-agnostic. From these files, data can be serialized and deserialized, in binary or JSON.

For a basic example, we will describe a simple message, of type blog post:

// file src/BlogPost.proto

syntax = "proto3";
message BlogPost {
  string uuid = 1;
  string title = 2;
  string content = 3;
}

This is a simple message, which contains a UUID, a title and content. Each attribute is associated with a field number (1, 2, 3, …), which should never change once the message is in use. Binary serialization and deserialization rely on these numbers, not on the attribute names.
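To make the role of these numbers concrete, here is a minimal sketch in Go (standard library only) of how a single string field ends up on the wire. It is not the code that protoc generates; the helper names `appendUvarint` and `appendStringField` are hypothetical and only illustrate the published wire format, where each field starts with a tag combining the field number and a wire type.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Wire type 2 ("length-delimited") is the one protobuf uses for strings.
const wireTypeLen = 2

// appendUvarint appends v to buf using the base-128 varint encoding.
func appendUvarint(buf []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(buf, tmp[:n]...)
}

// appendStringField appends one string field: a varint tag (field number
// and wire type), a varint length, then the raw bytes of the value.
func appendStringField(buf []byte, fieldNumber int, value string) []byte {
	tag := uint64(fieldNumber)<<3 | wireTypeLen
	buf = appendUvarint(buf, tag)
	buf = appendUvarint(buf, uint64(len(value)))
	return append(buf, value...)
}

func main() {
	// title = 2 in BlogPost.proto, so the tag byte is (2<<3)|2 = 0x12.
	encoded := appendStringField(nil, 2, "Hi")
	fmt.Printf("% x\n", encoded) // 12 02 48 69
}
```

The attribute name never appears in the output, which is why renaming a field is harmless while renumbering it silently breaks existing data.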

Let’s continue with our BlogPost, in order to add tags and an author (in a rather simplistic way, but the idea is there):

// file src/User.proto

syntax = "proto3";
message User {
    string uuid = 1;
    string name = 2;
    optional string avatar = 3;
}

// file src/Tag.proto

syntax = "proto3";
message Tag {
    string label = 1;
}

Let’s modify the BlogPost to connect everything. The file now looks like:

syntax = "proto3";

import "src/User.proto";
import "src/Tag.proto";

message BlogPost {
  string uuid = 1;
  string title = 2;
  string content = 3;
  User author = 4;
  repeated Tag tags = 5;
}

We can have as many tags as we want, thanks to the repeated keyword.

We will finally add a publication date to our BlogPost. To do this, we need the Timestamp type: it ships with ProtoBuf, but must be imported explicitly before use. There are quite a few other types; I’ll let you discover them in the documentation.

// ...
import "google/protobuf/timestamp.proto";

message BlogPost {
  // ...
  google.protobuf.Timestamp published = 6;
}

To go all the way and discover one last rather useful aspect, know that it is also possible to use enums:

// ...

message BlogPost {
  // ...
  enum PublicationStatus {
    DRAFT = 0;
    PUBLISHED = 1;
    ARCHIVED = 2;
  }
  PublicationStatus status = 7;
}
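One detail worth knowing: in proto3 the first enum value must be 0, and it is the default when the field is never set. The sketch below is a hand-written Go analogue of the enum, not the code protoc generates; the constant names and the `UNKNOWN(n)` format are assumptions chosen for the illustration.

```go
package main

import "fmt"

// PublicationStatus mirrors the enum declared in BlogPost.proto.
// Hand-written for illustration; the real type comes from protoc.
type PublicationStatus int32

const (
	Draft     PublicationStatus = 0
	Published PublicationStatus = 1
	Archived  PublicationStatus = 2
)

var statusNames = map[PublicationStatus]string{
	Draft:     "DRAFT",
	Published: "PUBLISHED",
	Archived:  "ARCHIVED",
}

// String returns the .proto name, or a placeholder for values that a
// newer version of the schema may have introduced.
func (s PublicationStatus) String() string {
	if name, ok := statusNames[s]; ok {
		return name
	}
	return fmt.Sprintf("UNKNOWN(%d)", int32(s))
}

func main() {
	var status PublicationStatus // never set: zero value
	fmt.Println(status)                // DRAFT
	fmt.Println(PublicationStatus(7))  // UNKNOWN(7)
}
```

Since an unset status cannot be told apart from an explicit DRAFT, it is usually safer to reserve 0 for the most neutral state, as the BlogPost enum does.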

If you want to test, and don’t have the courage to copy-paste everything, here is the complete code for the BlogPost:

syntax = "proto3";

import "src/User.proto";
import "src/Tag.proto";
import "google/protobuf/timestamp.proto";

message BlogPost {
  string uuid = 1;
  string title = 2;
  string content = 3;
  User author = 4;
  repeated Tag tags = 5;
  google.protobuf.Timestamp published = 6;
  enum PublicationStatus {
    DRAFT = 0;
    PUBLISHED = 1;
    ARCHIVED = 2;
  }
  PublicationStatus status = 7;
}

It is possible to define Namespaces for classes generated by ProtoBuf. It is even required for some languages (like Go).

We will add metadata to each of our .proto files, by adding:

option go_package = "blog/demo";
option php_namespace = "Blog\\Demo";
option php_metadata_namespace = "Blog\\Demo\\Metadata";

💡 Tips

As your project progresses, your messages will need to evolve. If you are concerned about breaking backward compatibility (e.g. by making an attribute obsolete), a good practice is to add a version attribute:
message ... {
 optional int32 version = 999;
}
Store the current version of your data there; you can then handle each message according to its version without breaking everything.
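Here is a rough sketch, in Go, of what handling a message according to its version could look like on the consumer side. The `Post` struct is a hypothetical stand-in for a generated message (an optional scalar becomes a pointer in generated Go code), and the version numbers and labels are invented for the example.

```go
package main

import "fmt"

// Post is a hypothetical stand-in for a generated message that declares
// "optional int32 version = 999;". A nil pointer means "field not set".
type Post struct {
	Version *int32
	Title   string
}

// schemaVersion treats messages written before the field existed as v1.
func schemaVersion(p Post) int32 {
	if p.Version == nil {
		return 1
	}
	return *p.Version
}

// describe dispatches on the version; real code would run a migration
// or pick a dedicated handler instead of returning a label.
func describe(p Post) string {
	switch v := schemaVersion(p); v {
	case 1:
		return "v1: legacy"
	case 2:
		return "v2: current"
	default:
		return fmt.Sprintf("v%d: unsupported", v)
	}
}

func main() {
	v2 := int32(2)
	fmt.Println(describe(Post{Title: "old post"}))               // v1: legacy
	fmt.Println(describe(Post{Version: &v2, Title: "new post"})) // v2: current
}
```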

🧬 Using ProtoBuf and Generating Code

We’ve described a whole bunch of great things, but using them is even better! It’s time to install ProtoBuf.

Simply download the latest release from the official GitHub repository (look for the protoc-xxx file that corresponds to your platform).

For example, in my case, I download version 21.2 for Ubuntu:

# -L follows the redirect that GitHub issues for release downloads
curl -L https://github.com/protocolbuffers/protobuf/releases/download/v21.2/protoc-21.2-linux-x86_64.zip \
  -o protoc-21.2-linux-x86_64.zip
unzip -qq protoc-21.2-linux-x86_64.zip -d protoc
chmod +x protoc/bin/protoc

I now have a protoc folder in my current directory, with the bin/protoc binary that we will use for everything else.

We will now do something quite magical: we will generate PHP code to serialize and deserialize BlogPost objects.

Still in bash, run:

mkdir -p generated # the "generated" folder will contain the generated code
# run from the project root so that imports such as "src/User.proto" resolve
protoc/bin/protoc --php_out=./generated $(find src -name '*.proto')

You will find a set of ready-to-use PHP classes in the generated folder.

Let’s create a small script to use them. The first step will be to install ProtoBuf for PHP:

composer require google/protobuf

Then let’s create a script that builds a BlogPost and serializes it:

<?php

// Assumption: the "generated" folder is registered in Composer's autoloader
// (for example via a PSR-4 entry in composer.json).
require_once __DIR__ . '/vendor/autoload.php';

$blogPost = new \Blog\Demo\BlogPost();
$blogPost
    ->setTitle('My great post')
    ->setContent('Lorem ipsum')
    ->setAuthor(
        (new \Blog\Demo\User())
        ->setName('Jean-François')
    )
    ->setPublished(new \Google\Protobuf\Timestamp());

// the content serialized in JSON:
$json = $blogPost->serializeToJsonString();

// the content serialized in binary
$binary = $blogPost->serializeToString();

// We put the binary in a temporary file, which will be read by Go
file_put_contents('blogpost.bin', $binary);

The JSON looks like:

{"title":"My great post","content":"Lorem ipsum","author":{"name":"Jean-François"},"published":"1970-01-01T00:00:00Z"}
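The published value reads 1970-01-01T00:00:00Z because an empty Timestamp holds zero seconds and zero nanoseconds since the Unix epoch, and the JSON form renders that instant as an RFC 3339 string. A small Go sketch of the conversion follows; `timestampJSON` is a hypothetical helper, and it drops fractional seconds for simplicity.

```go
package main

import (
	"fmt"
	"time"
)

// timestampJSON renders a (seconds, nanos) pair, counted from the Unix
// epoch, as an RFC 3339 string in UTC. Fractional seconds are omitted.
func timestampJSON(seconds int64, nanos int32) string {
	return time.Unix(seconds, int64(nanos)).UTC().Format(time.RFC3339)
}

func main() {
	// An empty Timestamp: both fields are zero.
	fmt.Println(timestampJSON(0, 0)) // 1970-01-01T00:00:00Z

	// The date of this article, expressed as seconds since the epoch.
	published := time.Date(2022, time.July, 13, 0, 0, 0, 0, time.UTC)
	fmt.Println(timestampJSON(published.Unix(), 0)) // 2022-07-13T00:00:00Z
}
```

In a real script you would set the seconds on the Timestamp before serializing, so that the message carries the actual publication date.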

And there you go! Now we will deserialize our post, but in Go this time.

🚀 Using ProtoBuf in Go

Let’s install the dependencies:

go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go mod init Version1
go get github.com/golang/protobuf
go get github.com/golang/protobuf/proto

Let’s use ProtoBuf to automatically generate the Go code:

mkdir -p vendor # protoc does not create the output folder
protoc/bin/protoc --go_out=vendor $(find src -name '*.proto')

Here is a very simple program that reads and parses the blogpost.bin file that we generated in PHP:

package main

import (
    "log"
    "os"

    "blog/demo"
    "github.com/golang/protobuf/proto"
)

func main() {
    // Read the binary payload written by the PHP script
    in, err := os.ReadFile("blogpost.bin")
    if err != nil {
        log.Fatalf("Read File Error: %s", err)
    }

    // Decode the bytes into the generated BlogPost struct
    blogpost := &demo.BlogPost{}
    if err := proto.Unmarshal(in, blogpost); err != nil {
        log.Fatalf("Deserialization error: %s", err)
    }

    // Generated getters are nil-safe, unlike direct field access
    log.Printf("BlogPost: %s", blogpost.GetTitle())
    log.Printf("Author is: %s", blogpost.GetAuthor().GetName())
}

Let’s run it:

go run demo.go

# BlogPost: My great post
# Author is: Jean-François

Our Go program successfully read the file serialized by PHP, and was able to extract the information from it, without any problems.
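For the curious, here is a minimal sketch of what a decoder does with those bytes, written in Go with the standard library only. It is not the real library: `lenFields` is a hypothetical helper that handles just two wire types (varint and length-delimited) and keeps only the last occurrence of each field, so repeated fields would be overwritten.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// lenFields walks a protobuf-encoded buffer and returns the
// length-delimited fields (strings, nested messages) by field number.
// Varint fields are read and skipped.
func lenFields(data []byte) (map[uint64][]byte, error) {
	fields := make(map[uint64][]byte)
	for len(data) > 0 {
		tag, n := binary.Uvarint(data)
		if n <= 0 {
			return nil, errors.New("invalid tag")
		}
		data = data[n:]
		fieldNumber, wireType := tag>>3, tag&7
		switch wireType {
		case 0: // varint: integers, booleans, enums
			_, n := binary.Uvarint(data)
			if n <= 0 {
				return nil, errors.New("invalid varint")
			}
			data = data[n:]
		case 2: // length-delimited: strings, bytes, nested messages
			size, n := binary.Uvarint(data)
			if n <= 0 || size > uint64(len(data)-n) {
				return nil, errors.New("invalid length")
			}
			end := n + int(size)
			fields[fieldNumber] = data[n:end]
			data = data[end:]
		default:
			return nil, fmt.Errorf("wire type %d not handled in this sketch", wireType)
		}
	}
	return fields, nil
}

func main() {
	// Hand-built payload: title (field 2) = "Hi", then status (field 7) = 1.
	payload := []byte{0x12, 0x02, 'H', 'i', 0x38, 0x01}
	fields, err := lenFields(payload)
	if err != nil {
		panic(err)
	}
	fmt.Printf("title=%s\n", fields[2]) // title=Hi
}
```

Because every field announces its own number and size, a reader can step over fields it does not know about, which is what lets producers and consumers evolve at different speeds.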

🔥 Conclusion

In both cases we were able to use typed objects or structures. If the data deserializes, it matches the schema: every field has the expected type.

The advantages of ProtoBuf are truly numerous:

The data is standardized.
You manipulate typed structures.
There is no need to write validators for types and structure.
You can use complex types.
Serialization/deserialization is efficient.

However, from my experience, there is one caveat: the documentation could be greatly simplified to make it more accessible for beginners.

I hope I’ve made you want to try this tool. Feel free to share your experience on the subject.

💡 Additional Resources

You can also discover a real production use case of ProtoBuf in a RabbitMQ data bus