kryo-versions-size-benchmark-using-jvm-serializers


In this post we benchmark Kryo v4 against its predecessors v2 and v3. Version 4.0.0 of the Kryo high-performance serialization library was released in mid-2016. It is time to take a look at the latest version of this cool library, created by Nathan Sweet and maintained by a group of contributors on GitHub.

The results presented here were obtained using the jvm-serializers benchmarking framework. We benchmark both the speed of the serialization methods and the size of the serialized data.

Benchmarking Kryo

Writing benchmarks can be fun but is also error-prone. So instead of writing my own I decided to use the jvm-serializers benchmarks. This project provides a framework to benchmark serialization libraries, as long as they are written in Java. It comes with 50+ test cases for a whole range of libraries. Each test case offers one or more benchmarks.

All the benchmarks record the serialization and deserialization speed. They also record the size of the serialized data and even the size after compression. We’ll be using these metrics to compare the following Kryo versions: v2.24, v3.0.3 and v4.0.0. You can find the full results in text form at the very end of this post.
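
To make the two size metrics concrete, here is a minimal sketch of how a serialized size and a compressed size can be measured. It is not the jvm-serializers code: the class name `SizeMetrics` is my own, the JDK's built-in serialization stands in for Kryo so the example needs no extra jars, and I am assuming the compressed figure (the "+dfl" column in the raw results) is a DEFLATE-compressed size.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.zip.DeflaterOutputStream;

// Hypothetical helper, not part of jvm-serializers.
final class SizeMetrics {

    // Serialize with the JDK's built-in mechanism (a stand-in for Kryo here).
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Number of bytes left after DEFLATE compression of the given data.
    static int deflatedSize(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        // Made-up sample data, only there to have something to serialize.
        ArrayList<String> sample = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            sample.add("media-item-" + i);
        }
        byte[] raw = serialize(sample);
        System.out.println("size = " + raw.length + ", +dfl = " + deflatedSize(raw));
    }
}
```

Both numbers depend only on the bytes produced, not on how fast the machine is.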

Benchmark details

For this post I used the Kryo-specific test case provided by the jvm-serializers project. The test case has 5 different benchmarks for Kryo. Each benchmark has a name and uses different Kryo features and settings. The list below contains the names of the benchmarks and the descriptions provided by the jvm-serializers project.

The order of the benchmarks in the list reflects the amount of coding effort. The kryo-serializer benchmark uses the default Kryo settings. It requires no coding and makes no assumptions regarding the object graph structure or nullness of fields. The kryo-manual benchmark, as the name suggests, is hand-coded and uses all the tricks of the trade offered by Kryo. It requires the most coding effort and cannot handle cycles in the object graph. It also uses knowledge about the nullness of fields to maximize performance.

  • Benchmark: kryo-serializer
    • supporting full object graph write/read. Object graph may contain cycles. If an Object is referenced twice, it will be so after deserialization.
    • nothing is known in advance, no class generation, no preregistering of classes. Everything is captured at runtime using e.g. reflection.
  • Benchmark: kryo-flat
    • Only cycle free tree structures. An object referenced twice will be serialized twice.
    • no manual optimizations.
  • Benchmark: kryo-flat-pre
    • Only cycle free tree structures. An object referenced twice will be serialized twice.
    • no manual optimizations.
    • schema is known in advance (pre registration or even class generation).
  • Benchmark: kryo-opt
    • Only cycle free tree structures. An object referenced twice will be serialized twice.
    • illustrates what’s possible, at what level generic approaches can be optimized in case
    • Hand written code: configure the FieldSerializer for each serialized class
  • Benchmark: kryo-manual
    • Only cycle free tree structures. An object referenced twice will be serialized twice.
    • illustrates what’s possible, at what level generic approaches can be optimized in case
    • Hand written code: implement a custom Kryo Serializer for each class and register it with Kryo
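
The difference between "full object graph" and "flat tree" in the list above can be sketched without Kryo on the classpath. The example below is only an analogy: JDK serialization plays the role of a reference-tracking serializer, and a hand-written `DataOutputStream` writer plays the role of a flat, manually coded one. The `Person` class and all method names are hypothetical. In Kryo itself, reference tracking is toggled with `Kryo.setReferences(boolean)`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

final class GraphVsFlat {

    // Hypothetical data class, not one of the jvm-serializers data classes.
    static final class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        Person(String name) {
            this.name = name;
        }
    }

    // Full-graph style: JDK serialization tracks object identity, so an
    // object referenced twice is written once plus a back-reference.
    static List<Person> roundTripGraph(List<Person> people) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new ArrayList<>(people));
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                List<Person> copy = (List<Person>) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Flat style: hand-written code that writes the fields of every element
    // in place, so an object referenced twice is serialized twice.
    static List<Person> roundTripFlat(List<Person> people) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeInt(people.size());
                for (Person p : people) {
                    out.writeUTF(p.name);
                }
            }
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                int count = in.readInt();
                List<Person> copy = new ArrayList<>(count);
                for (int i = 0; i < count; i++) {
                    copy.add(new Person(in.readUTF()));
                }
                return copy;
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // A list that contains the very same Person instance twice.
    static List<Person> sharedTwice() {
        Person shared = new Person("shared");
        List<Person> people = new ArrayList<>();
        people.add(shared);
        people.add(shared);
        return people;
    }

    // True when both list entries are the same instance.
    static boolean identityPreserved(List<Person> copy) {
        return copy.get(0) == copy.get(1);
    }

    public static void main(String[] args) {
        System.out.println("graph keeps identity: "
                + identityPreserved(roundTripGraph(sharedTwice())));
        System.out.println("flat keeps identity:  "
                + identityPreserved(roundTripFlat(sharedTwice())));
    }
}
```

The graph round trip reports `true` and the flat one `false`. The flat variant does less bookkeeping, which is the trade-off the flat, opt and manual benchmarks exploit.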

Kryo v4 vs v2 and v3

Before we look at the actual results, one more detail: the results compare 3 versions of Kryo: v2.24, v3.0.3 and v4.0.0. The 5 benchmarks described above were run for each version. To minimize the clutter I grouped the benchmark results for each version using the names kryo2, kryo3 and kryo4 respectively.

Data size

The chart below shows the serialized data size for all benchmarks. Overall there is no big difference between the three versions of Kryo. It seems Kryo’s data encoding scheme has not changed much from one version to the next.

There can be several reasons for this. The first is backwards compatibility: by encoding the data in the same way from one version to the next, the project lets you keep deserializing data written by an earlier version. A second reason might be that the encoding is already the most compact platform-independent representation, in other words that it is not possible to encode the data more efficiently. Note, however, that the changelog for version 4 clearly mentions an incompatible change in the way it optimizes classes with generics!

The serialized data size does not depend on the performance of the hardware on which we run the benchmark.

Figure: Benchmarking Kryo versions: comparing the size using jvm-serializers benchmark framework

The chart makes it easy to compare the overall results. The actual values can be found in the table below. Kryo v2 and v3 have identical values, while v4 is 2 bytes larger in every benchmark except manual, where all three versions produce 211 bytes.

bytes    Serializer  Flat  Flat-pre  Opt  Manual
kryo2           286   268       212  209     211
kryo3           286   268       212  209     211
kryo4           288   270       214  211     211

Speed

So if the serialized data size remains the same, how about the serialization and deserialization speed? Here again we find pretty much the same results, with some minor variations. Here are the numbers:

Speed    Serializer  Flat  Flat-pre   Opt  Manual
kryo2          7973  4827      3165  3183    2762
kryo3          7991  4735      3226  3284    2681
kryo4          7849  4631      3330  3338    2676

The table above contains the average total time (serialization plus deserialization) for each benchmark. These absolute numbers are not really relevant since they depend on the hardware I used to run the benchmark. Instead it is all about comparing the values for the 3 Kryo versions. The chart below offers an easier way to compare the performance visually.
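
Since only the relative differences matter, it can help to turn the totals into percentages. The sketch below does that for Kryo v4 against Kryo v2, using the totals from the speed table. The class and method names are my own.

```java
// Hypothetical helper for comparing two benchmark totals.
final class RelativeDiff {

    // Percentage change of value relative to baseline.
    // A negative result means less time, i.e. faster than the baseline.
    static double percentChange(double baseline, double value) {
        return (value - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        String[] benchmarks = {"serializer", "flat", "flat-pre", "opt", "manual"};
        // Totals copied from the speed table above.
        double[] kryo2 = {7973, 4827, 3165, 3183, 2762};
        double[] kryo4 = {7849, 4631, 3330, 3338, 2676};
        for (int i = 0; i < benchmarks.length; i++) {
            System.out.printf("%-10s %+.1f%%%n",
                    benchmarks[i], percentChange(kryo2[i], kryo4[i]));
        }
    }
}
```

With these totals Kryo v4 lands between roughly 4% faster (flat) and roughly 5% slower (flat-pre) than Kryo v2, which is the kind of spread that calls for a closer look at measurement noise.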

Figure: Benchmarking Kryo v2, v3 and v4: results of speed benchmark using jvm-serializers framework

Conclusion

The results of the jvm-serializers Kryo benchmarks are nearly identical for both size and speed. Overall the good news is that the results obtained with the jvm-serializers benchmarks show no performance regression from Kryo v2 to the current Kryo v4. This means the latest version of Kryo brings a number of fixes and enhancements without a decrease in performance.

Regarding the speed benchmarks: the differences in speed reported by the jvm-serializers benchmarks are very small. In fact they are so small that it would be tempting to say that the differences are not statistically significant. But since the jvm-serializers benchmarks do not report any statistical measure of variation, we cannot say for sure. BUT there is another JVM-specific benchmark tool which provides not only an average measure but also the mean error… yes, you guessed it: JMH.
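
To illustrate what an average plus an error figure adds, here is a small sketch that computes the mean and the standard error of the mean over repeated runs. It is not JMH's exact formula (as far as I know JMH derives its error column from a confidence interval), and the sample values in `main` are invented for the example, not measurements.

```java
// Hypothetical helper, not part of JMH or jvm-serializers.
final class MeanError {

    static double mean(double[] samples) {
        double sum = 0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    // Standard error of the mean: sample standard deviation divided by sqrt(n).
    static double standardError(double[] samples) {
        double m = mean(samples);
        double squaredDeviations = 0;
        for (double s : samples) {
            squaredDeviations += (s - m) * (s - m);
        }
        double sampleVariance = squaredDeviations / (samples.length - 1);
        return Math.sqrt(sampleVariance / samples.length);
    }

    public static void main(String[] args) {
        // Invented totals from five imaginary runs; NOT measured data.
        double[] runs = {100, 102, 98, 101, 99};
        System.out.printf("mean = %.1f, standard error = %.2f%n",
                mean(runs), standardError(runs));
    }
}
```

If the means of two versions differ by less than a couple of standard errors, the difference is hard to distinguish from noise.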

In a next post I’ll revisit the speed benchmarks using the data classes of the jvm-serializers project and will benchmark them with JMH. Just to be sure 😉

Resources

  • Kryo project page on GitHub: Kryo is a fast and efficient object graph serialization framework for Java. The goals of the project are speed, efficiency, and an easy to use API.
  • Kryo changelog
  • jvm-serializers project page on GitHub
  • JMH: Java harness for building, running, and analysing nano/micro/milli/macro benchmarks written in Java and other languages targeting the JVM.

Run the benchmarks

The results presented here were obtained on my aging computer. It would be interesting to see if similar results can be obtained on your machine(s). If you want to run the benchmarks yourself, here are the steps to follow:

  1. Go to the jvm-serializers project page on GitHub and download the zip with the source code
  2. Compile the source code for all the test cases (this may take a while):
    make compile
  3. Run the benchmark with the following command:
    ./run-bench.sh kryo-serializer,kryo-flat,kryo-flat-pre,kryo-opt,kryo-manual
  4. Rename the result files created in the results/raw subdirectory
    1. mv kryo-serializer-result.txt kryo3-serializer-result.txt
    2. Repeat for all the kryo-XYZ-result.txt files
  5. Download the jar files needed for Kryo v4 and Kryo v3 to the lib subdirectory:
    1. lib/reflectasm-1.11.3.jar (Kryo v4)
    2. lib/objenesis-2.2.jar (Kryo v4)
    3. lib/kryo-4.0.0.jar (Kryo v4)
    4. lib/minlog-1.3.0.jar (Kryo v3 and v4)
    5. lib/reflectasm-1.10.1.jar (Kryo v3)
    6. lib/objenesis-2.1.jar (Kryo v3)
    7. lib/kryo-3.0.3.jar (Kryo v3)
  6. Edit the run-bench.sh script and look for the line which defines “cp=” and change it to this:
    kryo4_cp=lib/reflectasm-1.11.3.jar:lib/objenesis-2.2.jar:lib/kryo-4.0.0.jar:lib/minlog-1.3.0.jar
    kryo3_cp=lib/reflectasm-1.10.1.jar:lib/objenesis-2.1.jar:lib/kryo-3.0.3.jar:lib/minlog-1.3.0.jar
    
    kryo_cp=$kryo3_cp
    
    cp=$kryo_cp$sep./build/bytecode/main$sep$cpgen$sep$cplib
  7. Change the value of kryo_cp to $kryo4_cp to run the benchmarks with Kryo v4
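
The renaming in step 4 has to be repeated for every result file and for every Kryo version, so it is easy to automate. The steps use `mv`. The sketch below does the same with the JDK's file API, to stay in Java. It assumes the result files follow the kryo-XYZ-result.txt pattern shown in step 4 and live in results/raw. The class name and the version prefix argument are my own.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of jvm-serializers.
final class RenameResults {

    // kryo-serializer-result.txt -> kryo3-serializer-result.txt
    // Names that do not match the pattern are returned unchanged.
    static String versionedName(String fileName, String versionPrefix) {
        if (fileName.startsWith("kryo-") && fileName.endsWith("-result.txt")) {
            return versionPrefix + fileName.substring("kryo".length());
        }
        return fileName;
    }

    // Rename every kryo-*-result.txt file in dir and return the new paths.
    static List<Path> renameAll(Path dir, String versionPrefix) {
        // Collect first, then move, so the directory is not modified
        // while it is being listed.
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> files =
                     Files.newDirectoryStream(dir, "kryo-*-result.txt")) {
            for (Path file : files) {
                matches.add(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<Path> renamed = new ArrayList<>();
        for (Path file : matches) {
            String newName = versionedName(file.getFileName().toString(), versionPrefix);
            Path target = file.resolveSibling(newName);
            try {
                Files.move(file, target);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            renamed.add(target);
        }
        return renamed;
    }

    public static void main(String[] args) {
        Path dir = Paths.get("results", "raw");
        // Do nothing when not started from the jvm-serializers directory.
        if (Files.isDirectory(dir)) {
            for (Path p : renameAll(dir, "kryo3")) {
                System.out.println(p);
            }
        }
    }
}
```

Run it once after each benchmark run, with the prefix matching the Kryo version that was on the classpath, so the next run does not overwrite the previous results.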

Raw results

Full data

                                   create     ser   deser   total   size  +dfl
kryo4-manual                          185    1280    1396    2676    211   131
kryo3-manual                          159    1300    1381    2681    211   131
kryo2-manual                          178    1313    1448    2762    211   131
kryo2-flat-pre                        166    1458    1707    3165    212   132
kryo2-opt                             166    1477    1707    3183    209   129
kryo3-flat-pre                        183    1498    1729    3226    212   132
kryo3-opt                             167    1458    1826    3284    209   129
kryo4-flat-pre                        182    1682    1648    3330    214   134
kryo4-opt                             184    1592    1746    3338    211   131
kryo4-flat                            177    2035    2596    4631    270   179
kryo3-flat                            187    2143    2592    4735    268   177
kryo2-flat                            178    2075    2752    4827    268   177
kryo4-serializer                      178    4262    3587    7849    288   190
kryo2-serializer                      162    4395    3578    7973    286   188
kryo3-serializer                      184    4317    3673    7991    286   188
                                   Effort          Format         Structure  Misc
kryo4-manual                       MANUAL_OPT      BINARY         FLAT_TREE  [] manually optimized                                       
kryo3-manual                       MANUAL_OPT      BINARY         FLAT_TREE  [] manually optimized                                       
kryo2-manual                       MANUAL_OPT      BINARY         FLAT_TREE  [] manually optimized                                       
kryo2-flat-pre                     CLASSES_KNOWN   BINARY         FLAT_TREE  [] no shared refs, preregistered classes                    
kryo2-opt                          MANUAL_OPT      BINARY         FLAT_TREE  [] manually optimized                                       
kryo3-flat-pre                     CLASSES_KNOWN   BINARY         FLAT_TREE  [] no shared refs, preregistered classes                    
kryo3-opt                          MANUAL_OPT      BINARY         FLAT_TREE  [] manually optimized                                       
kryo4-flat-pre                     CLASSES_KNOWN   BINARY         FLAT_TREE  [] no shared refs, preregistered classes                    
kryo4-opt                          MANUAL_OPT      BINARY         FLAT_TREE  [] manually optimized                                       
kryo4-flat                         ZERO_KNOWLEDGE  BINARY         FLAT_TREE  [] default, no shared refs                                  
kryo3-flat                         ZERO_KNOWLEDGE  BINARY         FLAT_TREE  [] default, no shared refs                                  
kryo2-flat                         ZERO_KNOWLEDGE  BINARY         FLAT_TREE  [] default, no shared refs                                  
kryo4-serializer                   ZERO_KNOWLEDGE  BINARY         FULL_GRAPH [] default                                                  
kryo2-serializer                   ZERO_KNOWLEDGE  BINARY         FULL_GRAPH [] default                                                  
kryo3-serializer                   ZERO_KNOWLEDGE  BINARY         FULL_GRAPH [] default