Choosing a serialization method for your data can be a confusing task given the multitude of options out there, such as JSON, MessagePack, Avro, Thrift, Protobuf, FlatBuffers, etc. If you’re using gRPC or Thrift RPC to communicate between your services, the decision has already been made for you. But for event-driven systems, you’ll more than likely want to prioritize performance over other considerations.
Today we’ll evaluate two of the most common serialization methods for event-driven systems: Protocol Buffers and JSON. Let’s take a quick look at both options:
Protobuf vs JSON: A Quick Overview
JSON
JSON is probably the most obvious candidate for most teams:
- Supported in the standard library of almost any language.
- Human-readable.
- No strict schema to deal with.
Due to its ubiquity, dealing with JSON is going to be the easiest option. Even better, you don’t need any tooling to see what’s in your messages. Just pop open the Kafdrop web UI and view them in plain text, easy-peasy! Depending on your needs and/or point of view, the lack of a strictly defined schema can be a positive or a negative. We’ll discuss that further later on.
Protobuf
Protobuf is the less obvious choice of the two. But it definitely has its strengths:
- Schemas for free.
- Support for binary fields without the need to base64 back and forth.
- Built-in API support with gRPC.
- And most importantly: Speed!
There is a bit of initial setup work when dealing with protobuf:
- First, you have to define your events and their schemas ahead of time.
- Then you have to use protoc, which isn’t the world’s most intuitive tool, to generate the code needed to create/manipulate protobuf events.
- And finally, you have to include that generated code in all of your projects that are handling your events.
That’s a few more steps than just calling json.Marshal() and json.Unmarshal()!
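For contrast, the entire JSON round trip fits in a few lines of Go. Here’s a minimal sketch (the Event type is a hypothetical stand-in for whatever your services pass around):

package main

import (
	"encoding/json"
	"fmt"
)

// Event is a hypothetical example type, not from the benchmarks below.
type Event struct {
	Id      string `json:"id"`
	Message string `json:"message"`
}

func main() {
	// Marshal the struct into JSON bytes...
	data, err := json.Marshal(&Event{Id: "1", Message: "hello"})
	if err != nil {
		panic(err)
	}

	// ...and unmarshal those bytes back into a struct.
	var decoded Event
	if err := json.Unmarshal(data, &decoded); err != nil {
		panic(err)
	}

	fmt.Println(decoded.Message) // hello
}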
Protobuf vs JSON: Schemas
Do I even need a Schema?
Plain JSON is definitely going to be easier and quicker to iterate with. You won’t have to update any definitions, regenerate any code, or re-pull any changes into your code bases. Need a new field? Just add it in service A, and update service B to read the new field.
We can see that this is probably not going to scale well, however. Having multiple engineers on multiple teams adding, removing, or editing fields at will is going to result in things breaking at some point. It would be much better to have a single source of truth, a clearly defined schema.
Schema Options for JSON
The two big players in codifying the schema of a JSON message are Avro and JSON Schema. The specifics/benefits of these two are outside the scope of this discussion, but we’re including them to let you know that there are options out there.
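Just to give a flavor, a minimal JSON Schema for a simple event might look something like this (illustrative only):

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "MyJSONMessage",
  "type": "object",
  "properties": {
    "id": { "type": "string" },
    "message": { "type": "string" },
    "num": { "type": "integer" }
  },
  "required": ["id", "message", "num"]
}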
Schema Options for Protobuf
Here’s a key benefit of using protobuf: the definitions already define the schemas. No need to layer any other solution on top!
There are some caveats to be aware of though. Changing the type on a field will break backward compatibility, preventing you from replaying or reading messages generated before the change. It’s best to treat protobuf definitions as additive only. If your field’s data type needs to change, add a new field with the new data type, and “remove” the old field by marking it as deprecated (by appending deprecated = true to the end of the field definition), or by reserving the field number.
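As a sketch, here’s what that might look like for a hypothetical message whose num field needs to move from int32 to int64 (num_v2 is our made-up replacement name):

syntax = "proto3";

package events;

message MyPBMessage {
  string id = 1;
  string message = 2;
  // The old field sticks around, marked deprecated, so historical
  // messages can still be read.
  int32 num = 3 [deprecated = true];
  // The replacement field gets a brand-new field number.
  int64 num_v2 = 4;
}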
Protobuf vs JSON: Performance
If you’re going the event-driven architecture route, you’re probably interested in scale and want the best performance you can get out of a serialization method.
Let’s write some benchmarks to get an idea of the speed of both. At Streamdal, we’re a Golang shop, so we’ll use that for our benchmark. If you’re using a different language, its implementations of JSON and protobuf and the resulting performance will be different.
First, we’ll define both the JSON and protobuf events. Our JSON event will look like this:
type MyJSONMessage struct {
	Id      string `json:"id"`
	Message string `json:"message"`
	Num     int    `json:"num"`
}
And our protobuf event will look like this:
events/mypbmessage.proto:
syntax = "proto3";
package events;
option go_package = "github.com/streamdal/pbvsjson/events";
message MyPBMessage {
  string id = 1;
  string message = 2;
  int32 num = 3;
}
We’ll compile that proto definition into Go code:
$ protoc -I ./events --go_out=paths=source_relative:events events/*.proto
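One note: the --go_out flag hands the work off to the protoc-gen-go plugin, so it needs to be on your PATH. If you don’t already have it, it can be installed via the Go toolchain:

$ go install google.golang.org/protobuf/cmd/protoc-gen-go@latest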
Now let’s benchmark serializing and deserializing our JSON and protobuf events:
main_test.go:
package main

import (
	"encoding/json"
	"math"
	"testing"

	"github.com/golang/protobuf/proto"
	"github.com/streamdal/pbvsjson/events"
)

type MyJSONMessage struct {
	Id      string `json:"id"`
	Message string `json:"message"`
	Num     int    `json:"num"`
}

var msg *MyJSONMessage
var pbMsg *events.MyPBMessage

func init() {
	msg = &MyJSONMessage{
		Id:      "97ec560f-2b14-4930-9a27-f0427f08951c",
		Message: "Let's benchmark!",
		Num:     math.MaxInt32,
	}
	pbMsg = &events.MyPBMessage{
		Id:      "97ec560f-2b14-4930-9a27-f0427f08951c",
		Message: "Let's benchmark!",
		Num:     math.MaxInt32,
	}
}

func BenchmarkJSON(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// First marshal to JSON
		data, err := json.Marshal(msg)
		if err != nil {
			b.Error(err)
		}

		// Now unmarshal back into usable struct
		decoded := &MyJSONMessage{}
		if err := json.Unmarshal(data, decoded); err != nil {
			b.Error(err)
		}
	}
}

func BenchmarkProtobuf(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// First marshal to wire format
		data, err := proto.Marshal(pbMsg)
		if err != nil {
			b.Error(err)
		}

		// Now unmarshal back into usable struct
		tmpMsg := &events.MyPBMessage{}
		if err := proto.Unmarshal(data, tmpMsg); err != nil {
			b.Error(err)
		}
	}
}
And let’s see what those benchmarks look like:
$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/streamdal/pbvsjson
cpu: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
BenchmarkJSON-8 612400 1731 ns/op
BenchmarkProtobuf-8 3088689 385.9 ns/op
Protobuf is the clear winner here.
Let’s dig a little deeper and see why with some more benchmarks:
main_test.go:
func BenchmarkJSON_marshal(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// First marshal to JSON
		_, err := json.Marshal(msg)
		if err != nil {
			b.Error(err)
		}
	}
}

func BenchmarkProtobuf_marshal(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// First marshal to wire format
		_, err := proto.Marshal(pbMsg)
		if err != nil {
			b.Error(err)
		}
	}
}

func BenchmarkJSON_unmarshal(b *testing.B) {
	data, err := json.Marshal(msg)
	if err != nil {
		b.Error(err)
	}

	for i := 0; i < b.N; i++ {
		tmpMsg := &MyJSONMessage{}
		if err := json.Unmarshal(data, tmpMsg); err != nil {
			b.Error(err)
		}
	}
}

func BenchmarkProtobuf_unmarshal(b *testing.B) {
	data, err := proto.Marshal(pbMsg)
	if err != nil {
		b.Error(err)
	}

	for i := 0; i < b.N; i++ {
		tmpMsg := &events.MyPBMessage{}
		if err := proto.Unmarshal(data, tmpMsg); err != nil {
			b.Error(err)
		}
	}
}
These additional benchmarks give us:
$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/streamdal/pbvsjson
cpu: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
BenchmarkJSON-8 612400 1731 ns/op
BenchmarkProtobuf-8 3088689 385.9 ns/op
BenchmarkJSON_marshal-8 3975855 302.7 ns/op
BenchmarkProtobuf_marshal-8 8163854 145.1 ns/op
BenchmarkJSON_unmarshal-8 886795 1290 ns/op
BenchmarkProtobuf_unmarshal-8 5829130 217.1 ns/op
A 2x advantage when serializing protobuf vs JSON. Not bad! But the real win here is when deserializing. The overhead of parsing the text-based JSON format is too significant to ignore. At a 6x speed advantage, protobuf is the way to go for the performance-minded.
Protobuf vs JSON: Message Size
The JSON message clocks in at 91 bytes, and the protobuf message at 62 bytes. Our example events are trivial in structure for the sake of grok-ability, but your real events will grow in complexity, and so will the volume transmitted through your infrastructure. Size should be a consideration.
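If you want to verify those numbers yourself, a small test reusing the msg and pbMsg values from our benchmark file will log the wire sizes (a minimal sketch):

func TestMessageSizes(t *testing.T) {
	// Serialize the same logical message both ways.
	jsonData, err := json.Marshal(msg)
	if err != nil {
		t.Fatal(err)
	}

	pbData, err := proto.Marshal(pbMsg)
	if err != nil {
		t.Fatal(err)
	}

	// Compare the resulting wire sizes.
	t.Logf("JSON: %d bytes, protobuf: %d bytes", len(jsonData), len(pbData))
}

Run it with go test -run TestMessageSizes -v to see the output.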
Protobuf vs JSON: Observability & Readability
As a text-based format, JSON wins the readability challenge. That’s not to say protobuf can’t be made readable, though, whether you build a tool internally or use an open-source tool like Plumber.
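For quick spot checks, protoc itself can also decode binary messages back into text format, as long as you have the .proto definition on hand. Assuming a captured message saved to message.bin, something like:

$ protoc --decode=events.MyPBMessage events/mypbmessage.proto < message.bin
id: "97ec560f-2b14-4930-9a27-f0427f08951c"
message: "Let's benchmark!"
num: 2147483647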
Conclusions
In the world of scale, protobuf is definitely the way to go.
Schemas are a great benefit to a microservice architecture, giving you a single source of truth for how your events should be structured. It may seem like a bunch of busywork in the beginning, but the benefits pay off at scale.
Observability takes a hit when it comes to protobuf, but there are solutions.
For hobby projects, JSON is a solid choice for readability and development velocity. But in the world of scale, go binary, go protobuf.
Want to nerd out with me and other misfits about your experiences with monorepos, deep-tech, or anything engineering-related?
Join our Discord, we’d love to have you!