Externalizer4J was created to take away the hassle of implementing serialization methods. It generates these methods for you, taking all the relevant constraints into account, so that your classes get the best possible serialization performance.

The entire optimization process is automated. Externalizer4J analyzes the bytecode of your classes to determine which classes can be optimized and how. In making its choices it always takes the safest assumptions when selecting the methods used to serialize and deserialize the fields of your classes. Because of this bias towards safety, the serialization logic can sometimes be optimized even further. This is not a problem: Externalizer4J lets you optimize serialization even more, provided that you supply it with additional hints.

Advanced optimization

Externalizer4J can generate even more efficient logic if it has additional knowledge about 2 aspects of the data stored in your classes:

  • the nullness of an object field: will the field ever be null? When Externalizer4J knows that a reference is guaranteed never to be null, it does not generate null checks for it. Those checks are then avoided at runtime, which raises performance even further.
  • the uniqueness of an object reference: does the reference to an object appear more than once in the object graph? A reference held in only a single field in the entire object graph is unique, and knowing this allows another significant performance gain. Serialization libraries track the objects they have already written, both to handle cycles in the object graph and to avoid serializing the same object twice. This tracking costs memory and extra computation. When Externalizer4J knows a reference is unique, it uses serialization calls that do not track references.

Consider the example shown in the code snippet below. The Item class has 3 String fields. In this example we can assume that the serial number and the name of each item are unique. On the other hand, a brand can have more than one item. So the serialNumber and name fields contain unique object references while the brand field does not. Serialization therefore needs to track the object reference of the brand field in order to avoid serializing and deserializing the string “Xpensive” more than once. The other two fields can be serialized and deserialized more quickly.

The next section explains how you can provide nullness and uniqueness information to Externalizer4J.

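public class Item {
  String serialNumber; // unique per item
  String brand;        // shared by all items of the same brand
  String name;         // unique per item

  public static void main(String[] args) {
    Item item1 = new Item();
    item1.serialNumber = "0001";
    item1.brand = "Xpensive";
    item1.name = "sports car";

    Item item2 = new Item();
    item2.serialNumber = "0002";
    item2.brand = "Xpensive"; // same interned String instance as item1.brand
    item2.name = "yacht";
  }
}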
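To see what reference tracking does with the shared brand, the sketch below writes the six field values of item1 and item2 with the JDK's ObjectOutputStream.writeObject() and reads them back. Both brand fields hold the same interned "Xpensive" literal. The stream therefore records it once and emits only a short back-reference the second time, and on reading both brands come back as one instance. The class TrackingDemo and its roundTrip method are illustrative names; this is plain JDK serialization, not Externalizer4J code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Illustrative only: shows how the JDK's ObjectOutputStream tracks references.
class TrackingDemo {

    // Writes each value with writeObject() and reads them all back in order.
    static Object[] roundTrip(Object... values) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (Object value : values) {
                // A reference seen before is written as a short handle, not a second copy.
                out.writeObject(value);
            }
        }
        Object[] result = new Object[values.length];
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            for (int i = 0; i < result.length; i++) {
                result[i] = in.readObject();
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Field values of item1 and item2; both brands are the same interned literal.
        Object[] restored = roundTrip("0001", "Xpensive", "sports car",
                                      "0002", "Xpensive", "yacht");
        // Tracking preserved the sharing: both brands are one instance again.
        System.out.println(restored[1] == restored[4]); // true
    }
}
```

The identity lookup behind that back-reference is exactly the per-object work that Externalizer4J skips for fields known to be unique.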

Guiding optimization with annotations

Externalizer4J’s advanced optimizations require knowledge about the nullness and uniqueness of the fields in your classes. Annotations, available since Java 5, are an ideal way to pass this extra information to a tool like Externalizer4J.

Annotations are easy to use, can be applied to individual fields and do not influence the logic of your application. Simply add annotations that tell Externalizer4J whether a field can be null and whether the object it references is unique, and Externalizer4J will generate even faster logic based on them. The next question, then, is: “which annotations?”

No annotations – your annotations

It may surprise you at first, but Externalizer4J does not provide any specific annotations for you to use! The reason is that Externalizer4J is a tool, not a framework: we don’t want your code to depend on any Externalizer4J-specific API.

As a build tool, Externalizer4J is not, and should not be, required on the classpath of your applications. Requiring an Externalizer4J-specific annotation would change this, so we chose not to make your code dependent on Externalizer4J. Instead, Externalizer4J looks for the annotations that you designate. This is a far more elegant solution.

Specifying annotations

You can specify annotations of your choice in the externalizer4j.properties file. There you can specify 3 types of annotations:

  1. uniquenessAnnotations: to indicate that a field contains a unique reference in the object graph.
  2. nonNullAnnotations: to indicate that the field will never contain a null reference.
  3. nonNullElementAnnotations: to indicate that the elements of an array or Collection will never be null references.
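Each of the three settings is a plain java.util.Properties entry whose value is a comma-separated list of fully qualified annotation names. The sketch below shows one way a build step could read them. It is a hypothetical illustration: the AnnotationSettings class is invented here and is not Externalizer4J's own loader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical reader for the three annotation settings of externalizer4j.properties.
class AnnotationSettings {
    final List<String> uniqueness;
    final List<String> nonNull;
    final List<String> nonNullElements;

    AnnotationSettings(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        uniqueness = split(props.getProperty("uniquenessAnnotations"));
        nonNull = split(props.getProperty("nonNullAnnotations"));
        nonNullElements = split(props.getProperty("nonNullElementAnnotations"));
    }

    // Splits a comma-separated list of class names; a missing key yields an empty list.
    private static List<String> split(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptyList();
        }
        List<String> names = new ArrayList<String>();
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String file = "uniquenessAnnotations=demo.serialization.UniqueObject\n"
                + "nonNullAnnotations=demo.serialization.NeverNull,demo.serialization.Optimize\n";
        AnnotationSettings settings = new AnnotationSettings(new StringReader(file));
        System.out.println(settings.nonNull);         // [demo.serialization.NeverNull, demo.serialization.Optimize]
        System.out.println(settings.nonNullElements); // []
    }
}
```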

The picture below illustrates how you can list the fully qualified class names of the annotations. The @NeverNull annotation tells Externalizer4J that a field will never store a null reference, and the @NoNull annotation can be used for the same purpose at the same time. Notice also that the same @NeverNull annotation is used to indicate that arrays and Collections will not contain null values.

Defining which annotations Externalizer4J should use to apply advanced optimization is done in the externalizer4j.properties file.

Types of annotations

You can use pretty much any annotation of your choice with Externalizer4J. As long as the annotation information is present in the class files, Externalizer4J can use it. This means the annotation should have the default CLASS retention policy. Annotations with SOURCE retention are discarded by the compiler and never reach the class files.

RetentionPolicy.CLASS

Annotation strategies

Since Externalizer4J does not impose any predefined annotations, you are free to choose. You can combine several strategies:

  • define your own annotations as suggested above
  • use annotations from existing libraries and frameworks:
    • JSR 303, JSR 305 and the JSR 308-related Checker Framework each define nullness-related annotations
    • JPA and Hibernate @Id and @IdClass annotations can often be reused to indicate uniqueness

Advanced optimization examples

Meet the class MyData: a simple class that implements the java.io.Serializable interface and nothing more. The typical hashCode(), equals() and toString() methods are not shown for brevity.

import java.io.Serializable;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

/**
 * Demonstration of the advanced optimization provided by Externalizer4J
 *
 */
public class MyData implements Serializable {
    // Array of primitives
    public long[] longs;

    // Collection
    public List<Long> wrappedLongs = new ArrayList<Long>();

    // Object types
    Date d = new Date();
    String s = "Ceci n'est un string";
}

Automatic optimization by Externalizer4J

Before looking at the advanced optimizations, we first look at the default optimizations implemented by Externalizer4J. First you need to create an externalizer4j.properties file; without it Externalizer4J simply does nothing.

IMPORTANT: in the example below we explicitly set the optimizer to externalizable to enable the conversion from Serializable to Externalizable classes. All the advanced optimization options described here, however, apply to every optimizer provided by Externalizer4J. The list of optimizers includes:

  • hazelcast.dataserializable: for conversion from Serializable to Hazelcast’s DataSerializable API
  • Kryo: for conversion from Serializable to Kryo’s KryoSerializable API
  • Infinispan: for conversion from Serializable to Infinispan’s Externalizer API, as well as annotation of the class with @SerializeWith

The externalizer4j.properties file contains:

#
# Accept the EULA by setting to true to activate
#
# default: false
#
acceptEULA=true

#
# Optimizer selection
#
optimizer=externalizable

We now compile our class and let Externalizer4J analyse and optimize it. We then use IntelliJ IDEA’s built-in decompiler to inspect the MyData.class file. The decompiled class is shown below. The most interesting points are in the writeExternal() method, so the readExternal() method is left out to keep the screenshots small. Notice the following points:

  1. MyData now implements both the Serializable and Externalizable interfaces
  2. The class implements the writeExternal() and readExternal() methods
    1. readExternal() is not shown here
  3. writeExternal() performs a null check before accessing each field
    1. Externalizer4J writes a status byte to track null fields. writeObject(null) is never called. This is both faster and produces less data!
  4. Each Object field is written to the ObjectOutput using the generic writeObject() method

The combination of all those points guarantees that the writeExternal() method is always correct. But writeObject() is not always optimal for performance: it tracks the objects written to the ObjectOutput, which costs speed and, to a lesser extent, memory.
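The sketch below is a rough, hand-written approximation of the behaviour described above. It is not the code Externalizer4J generates: the class name MyDataDefault and the status-byte layout (one bit per field) are assumptions made for illustration. Every non-null field is flagged in the status byte and then written with writeObject(). As a result, null is never passed to writeObject(), but every reference written is still tracked.

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Hand-written approximation of the default optimization: a status byte plus writeObject() per field.
class MyDataDefault implements Externalizable {
    public long[] longs;
    public List<Long> wrappedLongs = new ArrayList<Long>();
    Date d = new Date();
    String s = "Ceci n'est un string";

    public MyDataDefault() {
        // Externalizable requires a public no-arg constructor.
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // One bit per field, set when the field is non-null (assumed layout).
        int status = (longs != null ? 1 : 0)
                | (wrappedLongs != null ? 2 : 0)
                | (d != null ? 4 : 0)
                | (s != null ? 8 : 0);
        out.writeByte(status);
        // writeObject() is never called with null, but each call still tracks the reference.
        if (longs != null) out.writeObject(longs);
        if (wrappedLongs != null) out.writeObject(wrappedLongs);
        if (d != null) out.writeObject(d);
        if (s != null) out.writeObject(s);
    }

    @Override
    @SuppressWarnings("unchecked")
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        int status = in.readUnsignedByte();
        longs = (status & 1) != 0 ? (long[]) in.readObject() : null;
        wrappedLongs = (status & 2) != 0 ? (List<Long>) in.readObject() : null;
        d = (status & 4) != 0 ? (Date) in.readObject() : null;
        s = (status & 8) != 0 ? (String) in.readObject() : null;
    }
}
```

In this sketch a field whose bit is clear is simply not written, so the four nullable fields cost a single extra byte in total.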

Using annotations we can avoid the use of the generic writeObject() without sacrificing safety as we will see next.

Custom annotations

Before diving into the different optimization examples, we first take a look at the annotations we’ll use. As mentioned above, you can use any annotation of your choice. The images below show the 3 custom annotations used in the examples; as you can see, they are very simple.
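A minimal sketch of what such annotations could look like follows. The names match the properties files used in this article, while the @Target choice is an assumption. Each uses RetentionPolicy.CLASS (the default, spelled out here for clarity). The annotation is therefore recorded in the class file for bytecode analysis but is not visible through reflection at run time.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// In the article these types live in package demo.serialization
// (e.g. demo.serialization.UniqueObject); the package line is omitted so the sketch compiles standalone.

// Marks a field whose referenced object appears only once in the object graph.
@Retention(RetentionPolicy.CLASS)
@Target(ElementType.FIELD)
@interface UniqueObject {}

// Marks a field that never holds null (also reused for non-null elements).
@Retention(RetentionPolicy.CLASS)
@Target(ElementType.FIELD)
@interface NeverNull {}

// A second, interchangeable non-null marker.
@Retention(RetentionPolicy.CLASS)
@Target(ElementType.FIELD)
@interface Optimize {}

class AnnotatedExample {
    @UniqueObject @NeverNull
    List<Long> values = new ArrayList<Long>();

    public static void main(String[] args) throws Exception {
        // CLASS retention: stored in AnnotatedExample.class, but invisible to reflection.
        System.out.println(AnnotatedExample.class.getDeclaredField("values").getAnnotations().length); // 0
    }
}
```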

Unique object references

Externalizer4J can select more efficient serialization calls when it knows that an Object does not need to be tracked. You indicate unique Object references by annotating them and listing the annotation in the externalizer4j.properties file, as shown below. In the current example the @UniqueObject annotation is used: its fully qualified class name (FQCN) is the value of the uniquenessAnnotations setting.

When a Collection field or a non-primitive array is annotated as unique, this uniqueness extends to all the object references stored in the Collection or array!

IMPORTANT: marking Collection or array fields as unique can yield significant performance gains, but it can also have the opposite effect. When the references stored in the Collection or array also appear in other parts of the object graph, the same objects may be serialized multiple times. This reduces speed and increases the size of the serialized data.
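The risk described above is easy to reproduce. The sketch writes a list of Date objects without reference tracking, encoding each element as its getTime() value. That encoding is chosen for this illustration and is not necessarily what Externalizer4J's helpers do. When the same Date instance occurs twice in the list, it is written twice and comes back as two distinct objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

// Illustrative untracked list encoding: size followed by each element's epoch milliseconds.
class UntrackedDates {

    static byte[] write(List<Date> dates) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(dates.size());
        for (Date date : dates) {
            // No identity lookup: a repeated instance is simply written again (8 bytes each time).
            out.writeLong(date.getTime());
        }
        out.flush();
        return bytes.toByteArray();
    }

    static List<Date> read(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int size = in.readInt();
        List<Date> dates = new ArrayList<Date>(size);
        for (int i = 0; i < size; i++) {
            dates.add(new Date(in.readLong()));
        }
        return dates;
    }

    public static void main(String[] args) throws IOException {
        Date shared = new Date();
        byte[] data = write(Arrays.asList(shared, shared));
        List<Date> copy = read(data);
        System.out.println(data.length);                // 20: 4-byte size + 2 x 8-byte values
        System.out.println(copy.get(0) == copy.get(1)); // false: the sharing is lost
    }
}
```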

NOTE: you can specify a comma-separated list of annotations as the value of the uniquenessAnnotations setting.

#
# Accept the EULA by setting to true to activate
#
# default: false
#
acceptEULA=true
optimizer=externalizable

uniquenessAnnotations=demo.serialization.UniqueObject

The next step is to annotate all the fields of the MyData class with @UniqueObject. MyData now looks like this:

import java.io.Serializable;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import demo.serialization.UniqueObject;

public class MyData implements Serializable {
    // Array of primitives
    @UniqueObject
    public long[] longs;

    // Collection
    @UniqueObject
    public List<Long> wrappedLongs = new ArrayList<Long>();

    // Object types
    @UniqueObject
    Date d = new Date();
    @UniqueObject
    String s = "Ceci n'est un string";
}

When we look at the decompiled code of the MyData class, we now see a few differences from the first optimized version. The null checks are still there, but all the writeObject() calls are gone from the writeExternal() method!

Since Externalizer4J now knows that the field s points to a unique String reference, it can safely use writeUTF(). This method is faster than writeObject() and produces slightly less data. writeUTF() does not handle null values, but this is not a problem since Externalizer4J has generated the necessary null checks as well. So String values are now handled more efficiently. But what about the other fields?
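The pattern described here, a null check guarding writeUTF(), can be sketched as follows. The one-byte presence flag and the helper names writeNullableString/readNullableString are assumptions for illustration. Generated code may fold the flag into a shared status byte instead. writeUTF() is defined by java.io.DataOutput, which ObjectOutput extends. It throws a NullPointerException for null and cannot encode strings whose modified UTF-8 form exceeds 65,535 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helpers: a presence flag guards writeUTF(), which cannot write null itself.
class NullableStrings {

    static void writeNullableString(DataOutput out, String value) throws IOException {
        out.writeBoolean(value != null);
        if (value != null) {
            out.writeUTF(value); // no reference tracking, unlike writeObject()
        }
    }

    static String readNullableString(DataInput in) throws IOException {
        return in.readBoolean() ? in.readUTF() : null;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        writeNullableString(out, "Ceci n'est un string");
        writeNullableString(out, null);
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(readNullableString(in)); // Ceci n'est un string
        System.out.println(readNullableString(in)); // null
    }
}
```

Because the helpers take DataOutput/DataInput, they work unchanged with the ObjectOutput/ObjectInput passed to writeExternal()/readExternal().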

The writeExternal() method uses various static methods of a class called DataSerializer to write the other fields of the MyData class. These methods make it convenient to read and write commonly used types, and they perform better than writeObject() or readObject(). Generally speaking, their implementation assumes there are no cycles in the object graph!
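The DataSerializer method signatures are not shown here. The following hypothetical stand-in, SketchSerializer with invented method names, only illustrates the idea. It writes a long[] as its length followed by its elements and a Date as its millisecond value, with no writeObject() call and therefore no reference tracking. Like the helpers described above, it assumes an acyclic graph and unique references.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Date;

// Hypothetical special-purpose helpers; not the actual DataSerializer API.
final class SketchSerializer {
    private SketchSerializer() {}

    static void writeLongArray(DataOutput out, long[] values) throws IOException {
        out.writeInt(values.length);
        for (long value : values) {
            out.writeLong(value);
        }
    }

    static long[] readLongArray(DataInput in) throws IOException {
        long[] values = new long[in.readInt()];
        for (int i = 0; i < values.length; i++) {
            values[i] = in.readLong();
        }
        return values;
    }

    static void writeDate(DataOutput out, Date date) throws IOException {
        out.writeLong(date.getTime());
    }

    static Date readDate(DataInput in) throws IOException {
        return new Date(in.readLong());
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        writeLongArray(out, new long[] {1L, 2L, 3L});
        writeDate(out, new Date(0L));
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(readLongArray(in).length); // 3
        System.out.println(readDate(in).getTime());   // 0
    }
}
```

One design consequence of writing a Date as a plain long is that the runtime subclass (for example java.sql.Timestamp) is not recorded. Type-specific helpers like these are only safe when the declared type is known to be the actual type.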

The DataSerializer class is a utility class provided by Externalizer4J! When the generated serialization methods use it, the DataSerializer class is added to the compilation output. This avoids introducing a runtime dependency on Externalizer4J. The DataSerializer class should be packaged together with your other classes.

Externalizer4J can add helper classes to the output directory to avoid runtime dependencies

NOTE: Externalizer4J uses a different helper class, with a different name, for each serialization API: there is a separate class for Hazelcast, Kryo, the JDK and Infinispan.

The special-purpose methods used to serialize and deserialize the different fields don’t incur the overhead of object tracking, so the resulting class performs even better than the previously optimized version. This kind of optimization is not applied by default because Externalizer4J cannot safely determine the uniqueness of object references automatically. Manually annotating the fields is the only safe way to enable it.

Non null object references

In the previous example we used information about the uniqueness of object references to obtain an even more optimized class. But there is yet another way to optimize the serialization logic: you can get rid of the null checks performed during serialization when objects are known to be non-null.

To convey the non-nullness of object references in our class, we first need to define in the externalizer4j.properties file which annotations mark non-null objects. The example below shows how to use several annotations at once: we simply give a comma-separated list of the fully qualified class names of the @NeverNull and @Optimize annotation classes.

#
# Accept the EULA by setting to true to activate
#
# default: false
#
acceptEULA=true
optimizer=externalizable

nonNullAnnotations=demo.serialization.NeverNull,demo.serialization.Optimize

We annotate the MyData class again. Just to illustrate the point, we alternate between the @NeverNull and @Optimize annotations.

import java.io.Serializable;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import demo.serialization.NeverNull;
import demo.serialization.Optimize;

public class MyData implements Serializable {
    // Array of primitives
    @NeverNull
    public long[] longs;

    // Collection
    @Optimize
    public List<Long> wrappedLongs = new ArrayList<Long>();

    // Object types
    @NeverNull
    Date d = new Date();
    @Optimize
    String s = "Ceci n'est un string";
}

We recompile the MyData class and look again at the decompiled code. You can again see that the class now implements Externalizable, but this time there is much less code, so the screenshot includes the readExternal() method as well.

You’ll notice immediately that the methods got much shorter because all the null checks are gone: since the object references are guaranteed to be non-null, the checks can safely be skipped. Notice that writeObject() and readObject() are used again, since we removed the uniqueness annotations from the code as well as from the externalizer4j.properties file.
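A hand-written approximation of the non-null variant shows why the methods shrink. It is not the generated code, and the class name MyDataNonNull is invented. With every field guaranteed non-null there is no status byte and no branching. Each field still goes through writeObject()/readObject() because no uniqueness annotation is present.

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Hand-written approximation: all fields are annotated as never null, none as unique.
class MyDataNonNull implements Externalizable {
    public long[] longs = new long[0];
    public List<Long> wrappedLongs = new ArrayList<Long>();
    Date d = new Date();
    String s = "Ceci n'est un string";

    public MyDataNonNull() {
        // Public no-arg constructor required by Externalizable.
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // No null checks: the annotations promise every field is non-null.
        out.writeObject(longs);
        out.writeObject(wrappedLongs);
        out.writeObject(d);
        out.writeObject(s);
    }

    @Override
    @SuppressWarnings("unchecked")
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        longs = (long[]) in.readObject();
        wrappedLongs = (List<Long>) in.readObject();
        d = (Date) in.readObject();
        s = (String) in.readObject();
    }
}
```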

Non null elements

The previous example showed how to avoid unnecessary null checks in the serialization methods. In this example we look at an even more specialized use case: object arrays and Collection classes.

Both object arrays and collections can contain null references, so each element has to be null-checked before it is serialized. When the object references are guaranteed not to be null, we can speed up serialization by avoiding this check. Externalizer4J uses yet another setting to identify such array and collection fields. In the following example we declare the nonNullElementAnnotations setting in the externalizer4j.properties file, reusing the @NeverNull annotation from the previous example.

#
# Accept the EULA by setting to true to activate
#
# default: false
#
acceptEULA=true
optimizer=externalizable

nonNullElementAnnotations=demo.serialization.NeverNull

We annotate the wrappedLongs field.

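import java.io.Serializable;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import demo.serialization.NeverNull;

public class MyData implements Serializable {
    // Array of primitives
    public long[] longs;

    // Collection
    @NeverNull
    public List<Long> wrappedLongs = new ArrayList<Long>();

    // Object types
    Date d = new Date();
    String s = "Ceci n'est un string";
}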

Looking at the decompiled version of MyData, you can see that all the fields are null-checked again, including the wrappedLongs field! The collection does not contain null, but the field itself could be null, hence the check. What has changed is rather subtle. The Long wrapper values are written to the output using the DataSerializer helper class. The important difference is the last boolean parameter of the writeWrapperCollection() call: the value true disables null checks for the individual elements of the wrappedLongs collection.
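The effect of that boolean can be sketched with a hypothetical helper. The name writeLongCollection is invented, and the actual writeWrapperCollection() signature is not shown here. When the elements are declared non-null, the per-element presence flag is skipped, saving one byte and one branch per element. The collection field itself is still null-checked by the caller.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical helper for a Collection<Long>; elementsNonNull mirrors the boolean flag described above.
final class LongCollections {
    private LongCollections() {}

    static void writeLongCollection(DataOutput out, Collection<Long> values, boolean elementsNonNull)
            throws IOException {
        out.writeInt(values.size());
        for (Long value : values) {
            if (!elementsNonNull) {
                // Per-element null check and presence flag.
                out.writeBoolean(value != null);
                if (value == null) {
                    continue;
                }
            }
            out.writeLong(value); // auto-unboxing: throws NullPointerException if the promise is broken
        }
    }

    static List<Long> readLongCollection(DataInput in, boolean elementsNonNull) throws IOException {
        int size = in.readInt();
        List<Long> values = new ArrayList<Long>(size);
        for (int i = 0; i < size; i++) {
            boolean present = elementsNonNull || in.readBoolean();
            values.add(present ? Long.valueOf(in.readLong()) : null);
        }
        return values;
    }

    static int encodedSize(Collection<Long> values, boolean elementsNonNull) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeLongCollection(new DataOutputStream(bytes), values, elementsNonNull);
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        List<Long> wrappedLongs = Arrays.asList(1L, 2L, 3L);
        System.out.println(encodedSize(wrappedLongs, false)); // 31: 4 + 3 x (1 + 8)
        System.out.println(encodedSize(wrappedLongs, true));  // 28: 4 + 3 x 8
    }
}
```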


Combining the optimizations

Last but not least, you can use all the optimizations at once. To illustrate this, we define annotations for uniqueness, non-null fields and non-null elements, and apply them where needed in our code.

#
# Accept the EULA by setting to true to activate
#
# default: false
#
acceptEULA=true
optimizer=externalizable

uniquenessAnnotations=demo.serialization.UniqueObject
nonNullAnnotations=demo.serialization.NeverNull,demo.serialization.Optimize
nonNullElementAnnotations=demo.serialization.NeverNull

The decompiled code is shown below. Since none of the fields is ever supposed to be null, the methods are much smaller again, so we can look at both the writeExternal() and readExternal() methods at once. Serialization and deserialization are done using the helper methods of the DataSerializer class. These methods are used because the objects are unique and no object tracking is required. Note that for the String field “s” the code uses the writeUTF() and readUTF() methods. These methods, provided by the ObjectOutput and ObjectInput APIs, are fast as well since they don’t track String objects.
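Putting the pieces together, a hand-written approximation of the fully annotated class has no null checks and no reference tracking. It is not the generated code, and the class name MyDataCombined is invented. The sketch assumes every field is annotated as unique and non-null and the collection's elements as non-null. The array and collection are written element by element, the Date as its millisecond value and the String with writeUTF().

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Hand-written approximation: every field unique and never null, collection elements never null.
class MyDataCombined implements Externalizable {
    public long[] longs = new long[0];
    public List<Long> wrappedLongs = new ArrayList<Long>();
    Date d = new Date();
    String s = "Ceci n'est un string";

    public MyDataCombined() {
        // Public no-arg constructor required by Externalizable.
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(longs.length);
        for (long value : longs) {
            out.writeLong(value);
        }
        out.writeInt(wrappedLongs.size());
        for (Long value : wrappedLongs) {
            out.writeLong(value); // elements are promised non-null
        }
        out.writeLong(d.getTime());
        out.writeUTF(s);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        longs = new long[in.readInt()];
        for (int i = 0; i < longs.length; i++) {
            longs[i] = in.readLong();
        }
        int size = in.readInt();
        wrappedLongs = new ArrayList<Long>(size);
        for (int i = 0; i < size; i++) {
            wrappedLongs.add(in.readLong());
        }
        d = new Date(in.readLong());
        s = in.readUTF();
    }
}
```

In this variant a null field would cause a NullPointerException in writeExternal(), so the non-null annotations must really hold. The original MyData leaves longs uninitialized, which this sketch avoids by initializing it.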