Fast made faster: Kryo and Externalizer4J

Kryo and Externalizer4J are a great match. Kryo already offers fast, zero-effort serialization, and in combination with Externalizer4J you can squeeze even better performance out of it.

How can Externalizer4J make Kryo even faster? Kryo already combines fast Serializers, efficient IO and ASM-based code generation. How can Externalizer4J improve upon that?

What Externalizer4J does when optimizing for Kryo in a nutshell

Externalizer4J optimizes Kryo serialization using Kryo’s own KryoSerializable interface. If you have never heard of KryoSerializable before, you can think of it as Kryo’s equivalent of the JDK’s Externalizable interface. It allows classes to implement their own serialization logic directly instead of using a separate Serializer implementation.
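To make the comparison concrete, here is a minimal sketch of the JDK's Externalizable pattern, using only the standard library. The Customer class and its fields are hypothetical and exist purely for illustration. KryoSerializable follows the same idea, with write(Kryo, Output) and read(Kryo, Input) methods in place of writeExternal and readExternal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical example class: it carries its own serialization logic.
class Customer implements Externalizable {
    String name;
    int age;

    // Externalizable needs a public no-arg constructor for deserialization.
    public Customer() {}

    Customer(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // The class decides exactly which fields are written, and how.
        out.writeUTF(name);
        out.writeInt(age);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // Fields must be read back in the same order they were written.
        name = in.readUTF();
        age = in.readInt();
    }
}

final class ExternalizableDemo {
    // Serializes and deserializes a Customer through an in-memory buffer.
    static Customer roundTrip(Customer original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Customer) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling ExternalizableDemo.roundTrip(new Customer("Ada", 36)) should return a copy with the same name and age.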

Kryo performance can be optimized by creating a KryoSerializable implementation for each class. And that’s what Externalizer4J does. It first analyzes the bytecode of each individual class as well as the whole class hierarchy. Based on that it generates an optimized KryoSerializable implementation for each class. Existing class files get modified and the methods of the KryoSerializable interface are generated in bytecode directly.
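The role of the class hierarchy can be sketched with the JDK's Externalizable as a stand-in for KryoSerializable. The class names ParentData and ChildData are borrowed from the benchmark described later, but the fields are hypothetical, and this is handwritten source rather than the bytecode Externalizer4J actually generates. The point is only that a subclass has to handle its parent's state as well as its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class ParentData implements Externalizable {
    long id; // hypothetical field

    public ParentData() {}

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeLong(id);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        id = in.readLong();
    }
}

class ChildData extends ParentData {
    String label; // hypothetical field

    public ChildData() {}

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        super.writeExternal(out); // parent state first
        out.writeUTF(label);      // then the fields declared here
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        super.readExternal(in);   // same order as in writeExternal
        label = in.readUTF();
    }
}

final class HierarchyDemo {
    // Serializes and deserializes a ChildData through an in-memory buffer.
    static ChildData roundTrip(ChildData original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ChildData) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Writing this by hand for every class in a hierarchy is tedious and easy to get out of sync, which is the work a generator takes over.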

See the benefits? You get optimized serialization with no additional coding on your part. No extra dependency since we use Kryo’s own API calls.

KryoSerializable generation vs Kryo FieldSerializer

But wait a minute! Kryo already has the FieldSerializer, which generates a class-specific Serializer at runtime. It uses code generation too, also at runtime, to obtain the best performance. How does that differ from the KryoSerializable generation offered by Externalizer4J?

Indeed the KryoSerializable implementations generated by Externalizer4J are very similar to what the FieldSerializer does. Of course the code is generated at compile time and not at runtime. But that makes little difference. The actual difference is the analysis used by Externalizer4J prior to the actual bytecode generation. Knowledge about the actual types and interdependencies makes it possible to use the most efficient method offered by the Kryo API. The FieldSerializer on the other hand relies on more general purpose calls.
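Why type knowledge matters can be illustrated with an analogy built on plain JDK streams. This is not Kryo's API or wire format, and OrderLineSketch with its fields is hypothetical. The general-purpose path has to describe every class it meets, while the type-aware path knows the element type up front and writes only the field values. In Kryo terms this roughly corresponds to the gap between a generic call such as Kryo.writeClassAndObject and a direct call such as Output.writeInt. Which calls Externalizer4J actually emits is an assumption here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical order line; the benchmark's real class is not shown in the text.
class OrderLineSketch implements Serializable {
    final String sku;
    final int quantity;

    OrderLineSketch(String sku, int quantity) {
        this.sku = sku;
        this.quantity = quantity;
    }
}

final class TypeAwareDemo {
    // General-purpose path: the stream records class metadata for the list
    // and its elements because it cannot assume anything about their types.
    static byte[] writeGeneric(List<OrderLineSketch> lines) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new ArrayList<>(lines));
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Type-aware path: element type and field types are known in advance,
    // so only a length prefix and the raw field values are written.
    static byte[] writeTypeAware(List<OrderLineSketch> lines) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeInt(lines.size());
                for (OrderLineSketch line : lines) {
                    out.writeUTF(line.sku);
                    out.writeInt(line.quantity);
                }
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back the format produced by writeTypeAware.
    static List<OrderLineSketch> readTypeAware(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int size = in.readInt();
            List<OrderLineSketch> lines = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                String sku = in.readUTF();
                int quantity = in.readInt();
                lines.add(new OrderLineSketch(sku, quantity));
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For the same list, the type-aware encoding is smaller than the generic one and needs no type lookups when it is read back. The trade-off is that reader and writer must agree on the exact layout, which is why generating both from one analysis of the class is attractive.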

Examples and results

Time to take a look at some results obtained using these optimizations. To do that we use a small JMH-based benchmark. We serialized the classes using plain Kryo, then repeated the run after Externalizer4J had optimized the same classes. Before looking at the classes, we illustrate the main points of the JMH benchmark setup shown in the image below. In the kryoSerDeser method a Kryo instance is used to serialize and deserialize our objects. After each benchmark iteration we reset() the Kryo instance to avoid overflowing the intermediate buffer (a simple array of bytes).


The classes we used in the benchmark can be found below. The main points are:

  • ChildData, which extends ParentData
  • the textbook Order class, which contains a List of OrderLine items.

Kryo 23% to 41% faster

The chart below compares the relative combined serialization and deserialization speed. The blue bars represent the reference speed, that is, the speed obtained using Kryo alone. The red bars represent the speed obtained with the same Kryo library and setup, but now using the KryoSerializable implementation generated by Externalizer4J for each class.

Comparing the speed of Kryo with that of Kryo and Externalizer4J together. With the generated KryoSerializable implementations, Kryo performance improves by 23% to 41%, depending on the class.

The examples above show that by using Kryo’s own API to the fullest, Externalizer4J is able to optimize performance even further. And all this without any handwritten code!

