Benchmarking Kryo v4 against v2 and v3
In this post we benchmark Kryo v4 against its predecessors v2 and v3. Version 4.0.0 of the Kryo high-performance serialization library was released in mid-2016, so it is time to take a look at the latest version of this cool library, created by Nathan Sweet and maintained by a group of contributors on GitHub.
The results presented here were obtained using the jvm-serializers benchmarking framework. We measure both the speed of serialization and deserialization and the size of the serialized data.
Benchmarking Kryo
Writing benchmarks can be fun, but it is also error-prone. So instead of writing my own, I decided to use the jvm-serializers benchmarks. This project provides a framework to benchmark serialization libraries written in Java. It comes with 50+ test cases covering a whole range of libraries, and each test case offers one or more benchmarks.
All the benchmarks record the serialization and deserialization speed. They also record the size of the serialized data and even its size after compression. We'll use these metrics to compare the following Kryo versions: v2.24, v3.0.3 and v4.0.0. You can find the full results in text form at the very end of this post.
Benchmark details
For this post I used the Kryo-specific test case provided by the jvm-serializers project. This test case contains 5 different benchmarks for Kryo, each with its own name and its own combination of Kryo features and settings. The list below gives the name of each benchmark together with the description provided by the jvm-serializers project.
The order of the benchmarks in the list reflects the amount of coding effort. The kryo-serializer benchmark uses the default Kryo settings: it requires no coding and makes no assumptions about the structure of the object graph or about which fields can be null. The kryo-manual benchmark, as the name suggests, is hand-coded and uses all the tricks of the trade offered by Kryo. It requires the most coding effort and cannot handle cycles in the object graph. It also uses knowledge about which fields can be null to maximize performance.
- Benchmark: kryo-serializer
  - supports full object graph write/read. The object graph may contain cycles. If an object is referenced twice, it will still be referenced twice after deserialization.
  - nothing is known in advance: no class generation, no preregistration of classes. Everything is captured at runtime, e.g. using reflection.
- Benchmark: kryo-flat
  - only cycle-free tree structures. An object referenced twice will be serialized twice.
  - no manual optimizations.
- Benchmark: kryo-flat-pre
  - only cycle-free tree structures. An object referenced twice will be serialized twice.
  - no manual optimizations.
  - the schema is known in advance (preregistration or even class generation).
- Benchmark: kryo-opt
  - only cycle-free tree structures. An object referenced twice will be serialized twice.
  - illustrates what's possible, at what level generic approaches can be optimized in case
  - hand-written code: configures the FieldSerializer for each serialized class.
- Benchmark: kryo-manual
  - only cycle-free tree structures. An object encountered twice will be serialized twice.
  - illustrates what's possible, at what level generic approaches can be optimized in case
  - hand-written code: implements a custom Kryo Serializer for each class and registers it with Kryo.
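The key difference between kryo-serializer and the flat benchmarks is reference tracking. Kryo exposes this choice through the `setReferences(boolean)` setting on a `Kryo` instance; we assume, without having checked the jvm-serializers source, that the flat variants switch it off. The toy Java model below is not Kryo's implementation (the `Node` type and the `countWrites` and `diamond` helpers are hypothetical): it only counts how many object payloads a serializer would write for a graph in which one object is referenced twice.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy model (not Kryo code) of reference tracking: counts how many object
// payloads a serializer would write when walking an object graph.
public class RefTrackingDemo {

    // Hypothetical node type, used only for this illustration.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Returns the number of node payloads written when walking the graph from root.
    static int countWrites(Node root, boolean trackReferences) {
        // IdentityHashMap compares objects by identity, not equals().
        return write(root, trackReferences, new IdentityHashMap<>());
    }

    private static int write(Node node, boolean track, Map<Node, Integer> seen) {
        if (track) {
            if (seen.containsKey(node)) {
                return 0; // already written: only a small back-reference id is emitted
            }
            seen.put(node, seen.size()); // assign the next reference id before recursing
        }
        int writes = 1; // this node's own payload
        for (Node child : node.children) {
            // Without tracking, a cycle makes this recursion never terminate,
            // which is why the flat benchmarks only support cycle-free trees.
            writes += write(child, track, seen);
        }
        return writes;
    }

    // Builds root -> a -> shared and root -> b -> shared (one object referenced twice).
    static Node diamond() {
        Node shared = new Node("shared");
        Node a = new Node("a");
        Node b = new Node("b");
        Node root = new Node("root");
        root.children.add(a);
        root.children.add(b);
        a.children.add(shared);
        b.children.add(shared);
        return root;
    }

    public static void main(String[] args) {
        Node root = diamond();
        System.out.println("payloads written with reference tracking:    " + countWrites(root, true));
        System.out.println("payloads written without reference tracking: " + countWrites(root, false));
    }
}
```

Running `main` prints 4 with tracking and 5 without: when tracking is off, the shared node is written twice and comes back as two separate objects, matching the "serialized twice" wording of the flat benchmarks. Tracking costs a map lookup per object, which is part of why the flat variants can be faster.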
Kryo v4 vs v2 and v3
Before we look at the actual results, one more detail: the results compare three versions of Kryo: v2.24, v3.0.3 and v4.0.0. The 5 benchmarks described above were run for each version. To reduce clutter, I grouped the benchmark results for each version under the names kryo2, kryo3 and kryo4 respectively.
Data size
The chart below shows the serialized data size for all benchmarks. Overall there is no big difference between the three versions of Kryo. It seems Kryo’s data encoding scheme has not changed much from one version to the next.
There can be several reasons for this. The first is backwards compatibility: by encoding the data the same way from one version to the next, the project makes it possible to keep deserializing data written by an older version. A second reason might be that the encoding is already the most compact platform-independent representation; in other words, it may not be possible to encode the data more efficiently. Note, however, that the changelog for version 4 explicitly mentions an incompatible change in the way it optimizes classes with generics, so compatibility across major versions is not guaranteed!
The serialized data size does not depend on the performance of the hardware on which we run the benchmark.
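One reason the encoding is compact is that Kryo's Output can write integers with a variable-length encoding, so small values such as lengths and registered class IDs take a single byte instead of four. The Java sketch below shows a generic varint of this kind: 7 data bits per byte, with the high bit signalling that another byte follows, plus a ZigZag mapping so small negative numbers stay short. It illustrates the idea only and is not Kryo's actual code; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Generic varint sketch (LEB128-style), not Kryo's implementation.
public class VarIntSketch {

    // Encodes value as an unsigned varint: 7 data bits per byte, least significant
    // group first, high bit set on every byte except the last.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7; // unsigned shift, so negative ints end after 5 bytes
        }
        out.write(value);
        return out.toByteArray();
    }

    // Decodes a varint produced by encode().
    static int decode(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return result;
    }

    // ZigZag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... so small negatives stay short.
    static int zigZag(int value) {
        return (value << 1) ^ (value >> 31);
    }

    // Inverse of zigZag().
    static int unZigZag(int value) {
        return (value >>> 1) ^ -(value & 1);
    }

    public static void main(String[] args) {
        int[] samples = {1, 300, 100_000, Integer.MAX_VALUE, -1};
        for (int v : samples) {
            System.out.printf("%11d -> %d byte(s) unsigned, %d byte(s) zigzag (fixed-size int: 4)%n",
                    v, encode(v).length, encode(zigZag(v)).length);
        }
    }
}
```

With this scheme 1 and 300 take 1 and 2 bytes, while `Integer.MAX_VALUE` needs 5; the encoding pays off because most integers in typical object graphs are small.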

Benchmarking Kryo versions: comparing the size using jvm-serializers benchmark framework
The chart makes it easy to compare the overall results; the actual values can be found in the table below. Kryo v2 and v3 produce identical sizes, while v4 is slightly larger: 2 bytes more in every benchmark except kryo-manual, where all three versions produce 211 bytes.
Size (bytes) | Serializer | Flat | Flat-pre | Opt | Manual |
kryo2 | 286 | 268 | 212 | 209 | 211 |
kryo3 | 286 | 268 | 212 | 209 | 211 |
kryo4 | 288 | 270 | 214 | 211 | 211 |
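To put "slightly larger" into numbers, the short Java sketch below computes the size change from v3 to v4 for each benchmark, using the values from the table above (the class and method names are only for illustration).

```java
// Relative size change from Kryo v3 to v4, using the values from the size table.
public class SizeDelta {

    static final String[] BENCHMARKS = {"Serializer", "Flat", "Flat-pre", "Opt", "Manual"};
    static final int[] KRYO3_BYTES = {286, 268, 212, 209, 211};
    static final int[] KRYO4_BYTES = {288, 270, 214, 211, 211};

    // Percentage change going from 'before' bytes to 'after' bytes.
    static double percentChange(int before, int after) {
        return 100.0 * (after - before) / before;
    }

    public static void main(String[] args) {
        for (int i = 0; i < BENCHMARKS.length; i++) {
            int delta = KRYO4_BYTES[i] - KRYO3_BYTES[i];
            System.out.printf("%-10s %+d bytes (%+.2f%%)%n",
                    BENCHMARKS[i], delta, percentChange(KRYO3_BYTES[i], KRYO4_BYTES[i]));
        }
    }
}
```

v4 adds at most 2 bytes, which stays below 1% in every benchmark (the largest relative increase is 2 / 209, about 0.96%, for kryo-opt).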
Speed
So if the serialized data size remains the same, what about the serialization and deserialization speed? Here again we find pretty much the same results, with some minor variations. Here are the numbers:
Total time (ser + deser) | Serializer | Flat | Flat-pre | Opt | Manual |
kryo2 | 7973 | 4827 | 3165 | 3183 | 2762 |
kryo3 | 7991 | 4735 | 3226 | 3284 | 2681 |
kryo4 | 7849 | 4631 | 3330 | 3338 | 2676 |
The table above contains the average sum of the serialization and deserialization times for each benchmark, so lower is better. The absolute numbers are not really relevant, since they depend on the hardware I used to run the benchmark. What matters is how the values compare across the three Kryo versions. The chart below offers an easier way to compare the performance visually.

Benchmarking Kryo v2, v3 and v4: results of speed benchmark using jvm-serializers framework
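To quantify the "minor variations", the Java sketch below computes, for each benchmark, the spread between the slowest and the fastest Kryo version relative to the fastest one, using the totals from the speed table (the class and method names are only for illustration).

```java
// Spread of the total times across Kryo v2, v3 and v4, per benchmark:
// (slowest - fastest) / fastest, using the values from the speed table.
public class SpeedSpread {

    static final String[] BENCHMARKS = {"Serializer", "Flat", "Flat-pre", "Opt", "Manual"};
    // Rows: kryo2, kryo3, kryo4; columns follow BENCHMARKS.
    static final int[][] TOTALS = {
        {7973, 4827, 3165, 3183, 2762},
        {7991, 4735, 3226, 3284, 2681},
        {7849, 4631, 3330, 3338, 2676},
    };

    // Percentage by which the slowest version exceeds the fastest one.
    static double spreadPercent(int column) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int[] row : TOTALS) {
            min = Math.min(min, row[column]);
            max = Math.max(max, row[column]);
        }
        return 100.0 * (max - min) / min;
    }

    public static void main(String[] args) {
        for (int i = 0; i < BENCHMARKS.length; i++) {
            System.out.printf("%-10s spread %.1f%%%n", BENCHMARKS[i], spreadPercent(i));
        }
    }
}
```

The largest spread is about 5.2% (kryo-flat-pre, 3165 vs 3330). v4 is the fastest of the three in the Serializer, Flat and Manual benchmarks and the slowest in Flat-pre and Opt, so there is no consistent winner.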
Conclusion
The jvm-serializers Kryo benchmarks give nearly identical results for both size and speed. Overall, the good news is that they show no clear performance regression from Kryo v2 to the current Kryo v4. This means the latest version of Kryo brings a number of fixes and enhancements without a decrease in performance.
As for the speed benchmarks, the differences reported by jvm-serializers are very small. In fact, they are so small that it is tempting to say they are not statistically significant. But since the jvm-serializers benchmarks do not report any statistical measure such as an error margin, we cannot say for sure. However, there is another JVM-specific benchmark tool that reports not only an average but also an error margin. Yes, you guessed it: JMH.
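The Java sketch below illustrates the kind of statistic meant here: a mean together with an error margin computed from repeated runs. It uses a simple normal approximation (z = 1.96, roughly 95% confidence) to stay self-contained; JMH itself reports the half-width of a 99.9% confidence interval based on the Student's t distribution, so its error bars are wider. The class name, method names and sample values are hypothetical.

```java
// Mean plus an approximate error margin from repeated runs (normal approximation).
public class ErrorMargin {

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation (divides by n - 1).
    static double stdDev(double[] xs) {
        double m = mean(xs);
        double squares = 0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / (xs.length - 1));
    }

    // Half-width of an approximate 95% confidence interval for the mean.
    static double errorMargin(double[] xs) {
        return 1.96 * stdDev(xs) / Math.sqrt(xs.length);
    }

    // Rough check: do the intervals mean +/- margin of both samples overlap?
    static boolean overlap(double[] a, double[] b) {
        return Math.abs(mean(a) - mean(b)) <= errorMargin(a) + errorMargin(b);
    }

    public static void main(String[] args) {
        // Illustrative values only, not measurements.
        double[] versionA = {10, 12, 11, 9, 13};
        double[] versionB = {11, 13, 12, 10, 14};
        System.out.printf("A: %.2f +/- %.2f%n", mean(versionA), errorMargin(versionA));
        System.out.printf("B: %.2f +/- %.2f%n", mean(versionB), errorMargin(versionB));
        System.out.println("intervals overlap: " + overlap(versionA, versionB));
    }
}
```

Non-overlapping intervals are a good hint that a difference is real; overlapping intervals do not prove the opposite, so this is only a rough first check.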
In a future post I'll revisit the speed benchmarks with JMH, using the data classes of the jvm-serializers project, just to be sure.

We are looking for beta users for the upcoming Kryo support in Externalizer4J.
Resources
- Kryo project page on GitHub: Kryo is a fast and efficient object graph serialization framework for Java. The goals of the project are speed, efficiency, and an easy-to-use API.
- Kryo changelog
- jvm-serializers project page on GitHub
- JMH: Java harness for building, running, and analysing nano/micro/milli/macro benchmarks written in Java and other languages targeting the JVM.
Run the benchmarks
The results presented here were obtained on my aging computer. It would be interesting to see whether similar results can be obtained on your machine(s). If you want to run the benchmarks yourself, here are the steps to follow:
- Go to the GitHub project page and download the zip with the source code.
- Compile the source code for all the test cases (this may take a while):
    make compile
- Run the benchmarks with the following command:
    ./run-bench.sh kryo-serializer,kryo-flat,kryo-flat-pre,kryo-opt,kryo-manual
- Rename the result files created in the results/raw subdirectory, for example:
    mv kryo-serializer-result.txt kryo3-serializer-result.txt
  Repeat this for all the kryo-XYZ-result.txt files.
- Download the jar files needed for Kryo v3 and Kryo v4 respectively, and put them in the lib subdirectory:
  - lib/reflectasm-1.11.3.jar
  - lib/objenesis-2.2.jar
  - lib/kryo-4.0.0.jar
  - lib/minlog-1.3.0.jar
  - lib/reflectasm-1.10.1.jar
  - lib/objenesis-2.1.jar
  - lib/kryo-3.0.3.jar
- Edit the run-bench.sh script, look for the line that defines "cp=" and replace it with:
    kryo4_cp=lib/reflectasm-1.11.3.jar:lib/objenesis-2.2.jar:lib/kryo-4.0.0.jar:lib/minlog-1.3.0.jar
    kryo3_cp=lib/reflectasm-1.10.1.jar:lib/objenesis-2.1.jar:lib/kryo-3.0.3.jar:lib/minlog-1.3.0.jar
    kryo_cp=$kryo3_cp
    cp=$kryo_cp$sep./build/bytecode/main$sep$cpgen$sep$cplib
- Change the value of kryo_cp to $kryo4_cp to run the benchmarks with Kryo v4.
Raw results
Full data
Benchmark | create | ser | deser | total | size | +dfl |
kryo4-manual | 185 | 1280 | 1396 | 2676 | 211 | 131 |
kryo3-manual | 159 | 1300 | 1381 | 2681 | 211 | 131 |
kryo2-manual | 178 | 1313 | 1448 | 2762 | 211 | 131 |
kryo2-flat-pre | 166 | 1458 | 1707 | 3165 | 212 | 132 |
kryo2-opt | 166 | 1477 | 1707 | 3183 | 209 | 129 |
kryo3-flat-pre | 183 | 1498 | 1729 | 3226 | 212 | 132 |
kryo3-opt | 167 | 1458 | 1826 | 3284 | 209 | 129 |
kryo4-flat-pre | 182 | 1682 | 1648 | 3330 | 214 | 134 |
kryo4-opt | 184 | 1592 | 1746 | 3338 | 211 | 131 |
kryo4-flat | 177 | 2035 | 2596 | 4631 | 270 | 179 |
kryo3-flat | 187 | 2143 | 2592 | 4735 | 268 | 177 |
kryo2-flat | 178 | 2075 | 2752 | 4827 | 268 | 177 |
kryo4-serializer | 178 | 4262 | 3587 | 7849 | 288 | 190 |
kryo2-serializer | 162 | 4395 | 3578 | 7973 | 286 | 188 |
kryo3-serializer | 184 | 4317 | 3673 | 7991 | 286 | 188 |
Benchmark | Effort | Format | Structure | Misc |
kryo4-manual | MANUAL_OPT | BINARY | FLAT_TREE | [] manually optimized |
kryo3-manual | MANUAL_OPT | BINARY | FLAT_TREE | [] manually optimized |
kryo2-manual | MANUAL_OPT | BINARY | FLAT_TREE | [] manually optimized |
kryo2-flat-pre | CLASSES_KNOWN | BINARY | FLAT_TREE | [] no shared refs, preregistered classes |
kryo2-opt | MANUAL_OPT | BINARY | FLAT_TREE | [] manually optimized |
kryo3-flat-pre | CLASSES_KNOWN | BINARY | FLAT_TREE | [] no shared refs, preregistered classes |
kryo3-opt | MANUAL_OPT | BINARY | FLAT_TREE | [] manually optimized |
kryo4-flat-pre | CLASSES_KNOWN | BINARY | FLAT_TREE | [] no shared refs, preregistered classes |
kryo4-opt | MANUAL_OPT | BINARY | FLAT_TREE | [] manually optimized |
kryo4-flat | ZERO_KNOWLEDGE | BINARY | FLAT_TREE | [] default, no shared refs |
kryo3-flat | ZERO_KNOWLEDGE | BINARY | FLAT_TREE | [] default, no shared refs |
kryo2-flat | ZERO_KNOWLEDGE | BINARY | FLAT_TREE | [] default, no shared refs |
kryo4-serializer | ZERO_KNOWLEDGE | BINARY | FULL_GRAPH | [] default |
kryo2-serializer | ZERO_KNOWLEDGE | BINARY | FULL_GRAPH | [] default |
kryo3-serializer | ZERO_KNOWLEDGE | BINARY | FULL_GRAPH | [] default |
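In the full data, the total column is the sum of the ser and deser columns: for kryo4-manual, 1280 + 1396 = 2676. The small Java sketch below checks this for every row; the column order (name, create, ser, deser, total, size, +dfl) is taken from the header of the raw output, and the class and method names are hypothetical.

```java
// Checks that 'total' equals ser + deser for each row of the raw jvm-serializers output.
public class RawResultsCheck {

    // Rows copied from the full data: name create ser deser total size +dfl
    static final String[] RAW = {
        "kryo4-manual 185 1280 1396 2676 211 131",
        "kryo3-manual 159 1300 1381 2681 211 131",
        "kryo2-manual 178 1313 1448 2762 211 131",
        "kryo2-flat-pre 166 1458 1707 3165 212 132",
        "kryo2-opt 166 1477 1707 3183 209 129",
        "kryo3-flat-pre 183 1498 1729 3226 212 132",
        "kryo3-opt 167 1458 1826 3284 209 129",
        "kryo4-flat-pre 182 1682 1648 3330 214 134",
        "kryo4-opt 184 1592 1746 3338 211 131",
        "kryo4-flat 177 2035 2596 4631 270 179",
        "kryo3-flat 187 2143 2592 4735 268 177",
        "kryo2-flat 178 2075 2752 4827 268 177",
        "kryo4-serializer 178 4262 3587 7849 288 190",
        "kryo2-serializer 162 4395 3578 7973 286 188",
        "kryo3-serializer 184 4317 3673 7991 286 188",
    };

    // Parses one whitespace-separated row and returns {ser, deser, total}.
    static int[] serDeserTotal(String row) {
        String[] f = row.trim().split("\\s+");
        if (f.length != 7) {
            throw new IllegalArgumentException("expected 7 fields: " + row);
        }
        return new int[] {Integer.parseInt(f[2]), Integer.parseInt(f[3]), Integer.parseInt(f[4])};
    }

    // Reported total minus (ser + deser).
    static int discrepancy(String row) {
        int[] v = serDeserTotal(row);
        return v[2] - (v[0] + v[1]);
    }

    public static void main(String[] args) {
        for (String row : RAW) {
            int d = discrepancy(row);
            if (d != 0) {
                System.out.println(row.split("\\s+")[0] + ": total differs from ser + deser by " + d);
            }
        }
    }
}
```

Eleven rows match exactly; kryo2-manual, kryo2-opt, kryo3-flat-pre and kryo3-serializer are off by one, presumably because the reported values are rounded averages.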