Implementing readObject() or readExternal() for immutable objects is a problem. We show how the JDK uses Unsafe to assign values to final fields.

Have you ever wondered how Java serialization can set the final fields of your immutable classes? If not, don’t worry: most Java developers never stop to think about it. In this post we illustrate what the question is all about, and we show how the JDK serialization can ignore the fact that fields are declared final and still set their values during deserialization. Not only can it set field values, it does so efficiently.

Before we start, a short question for you! Do you think it involves:

  1. the use of reflection?
  2. special hooks in the javac compiler?
  3. a low-level JDK specific API?

Final fields

Declaring your fields final is what you do when declaring immutable objects. Your immutable class may look like our BusinessCard below. Nothing special: all fields are final and set by the constructor. While we’re at it, we make sure the class implements Serializable. It will come in handy in the next section.

import java.io.Serializable;

public class BusinessCard implements Serializable {
    public final String name;
    public final String title;

    public BusinessCard(String name, String title) {
        this.name = name;
        this.title = title;
    }
}

The problem illustrated

By implementing Serializable the BusinessCard class can already be serialized. To illustrate the problem with final fields we will implement the serialization and deserialization methods writeObject() and readObject() by hand. Serializing the data of our BusinessCard class is straightforward: the writeObject() method writes the name and title fields to the ObjectOutputStream. Deserializing the data will not prove that easy. In fact, it is impossible to implement readObject() with ordinary field assignments!

As you may already know, the Serializable interface does not define any methods that you need to implement. It is purely a tagging interface. The JDK provides a default way to serialize classes so that you don’t need to bother with this. The Sun engineers also foresaw a way to provide your own logic. Reflection is used to see whether you implemented very specific methods (the name and signature are predefined). If these methods exist, they are used instead of the default serialization. The methods we’ll look at here are readObject() and writeObject(); their required signatures are given in the javadoc of the Serializable interface.

 

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class BusinessCard implements Serializable {
    public final String name;
    public final String title;

    public BusinessCard(String name, String title) {
        this.name = name;
        this.title = title;
    }

    private void writeObject(ObjectOutputStream os) throws IOException {
        os.writeObject(name);
        os.writeObject(title);
    }

    // We cannot implement readObject() this way.
    // PROBLEM: this method does NOT compile, because javac rejects
    // assignments to final fields outside of a constructor.
    private void readObject(ObjectInputStream is) throws IOException, ClassNotFoundException {
        name = (String) is.readObject();
        title = (String) is.readObject();
    }
}

To understand why we implement readObject() as illustrated above, it helps to know how objects get deserialized. During deserialization the JDK code first creates a new instance of your class, in our example a new BusinessCard. But it does not call any constructor of your Serializable class to create that instance! Instead the algorithm climbs the class hierarchy until it finds the first superclass that is not Serializable (Object in our example) and runs the no-argument constructor of that class. At this point the fields of the newly created object are not set yet. This is where the readObject() method comes in.
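The instance creation step can be observed with a small experiment. The sketch below is ours, not JDK code: the class names ConstructorSkipDemo, Base and Card and the two counters are made up for illustration. It counts constructor calls before and after a serialization round trip through an in-memory byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class ConstructorSkipDemo {
    // Hypothetical counters, only here to make constructor calls visible.
    static int baseCalls = 0;
    static int cardCalls = 0;

    // Not Serializable: its no-argument constructor runs during deserialization.
    static class Base {
        Base() { baseCalls++; }
    }

    // Serializable: its own constructor is not run during deserialization.
    static class Card extends Base implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        Card(String name) {
            cardCalls++;
            this.name = name;
        }
    }

    // Serializes the card to a byte array and reads it back.
    static Card roundTrip(Card card) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(card);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Card) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Card copy = roundTrip(new Card("John Doe"));
        // prints "John Doe base=2 card=1"
        System.out.println(copy.name + " base=" + baseCalls + " card=" + cardCalls);
    }
}
```

The Card constructor runs once (for the original object), while the Base constructor runs twice: once for the original and once more when the copy is created during deserialization. The final name field of the copy is nevertheless populated.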

The JDK now uses the readObject() method (if it exists) to restore the object state by assigning values to the fields of the object. As you can see above, writing the readObject() method is not difficult in itself: we read the two string values in turn from the ObjectInputStream and assign them to the respective fields. But wait! The name and title fields are declared final, so we cannot assign values to them anywhere except in the constructor. You can try, but if your IDE does not stop you then javac will report an error. We simply cannot implement the readObject() method for our BusinessCard with plain assignments!

This brings us back to the original question: how does the JDK manage to assign values to those final fields while we can’t? How can it do this automatically for any class?

Set final fields using Reflection

OK, first off: the logic in our code above and the way the JDK serialization restores the field values are equivalent. So the difference is not one of semantics: they both do the same thing!

We need a way to set the field values at runtime. Using reflection is an option. The JDK serialization already makes heavy use of reflection. The Field class, part of the reflection API, can be used to set the value of an instance field even when the field was declared final. Look at the code snippet below. It shows how we take our BusinessCard and change the name and title fields at runtime.

import java.io.Serializable;
import java.lang.reflect.Field;

public class BusinessCard implements Serializable {
    public final String name;
    public final String title;

    public BusinessCard(String name, String title) {
        this.name = name;
        this.title = title;
    }

    public static void main(String[] args) throws NoSuchFieldException, IllegalAccessException {
        BusinessCard businessCard = new BusinessCard("John Doe", "CEO");
        // Look up the Field objects for both final fields
        Class<?> c = BusinessCard.class;
        Field nameField = c.getDeclaredField("name");
        Field titleField = c.getDeclaredField("title");
        // Bypass the final semantics
        nameField.setAccessible(true); // we now can call set()
        titleField.setAccessible(true); // to override the value of the fields
        // Print before
        System.out.println(businessCard.name + " " + businessCard.title); // prints "John Doe CEO"
        // Override value
        nameField.set(businessCard, "Jane Doe");
        titleField.set(businessCard, "CFO");
        // Print after
        System.out.println(businessCard.name + " " + businessCard.title); // prints "Jane Doe CFO"
    }
}

By invoking the setAccessible(true) method on the Field object we can bypass two things at runtime:

  1. A non-public field can be read using the get() method. The private, protected or package-private access modifiers have no effect.
  2. Using the set() method we can overwrite the value of an instance field, even when it was declared final.
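Putting the two pieces together, a readObject() method for a class with final fields could be written with reflection. The following is a minimal sketch under our own assumptions, not the way the JDK does it: the class name ReflectiveCard and the helpers setFinalField() and roundTrip() are made up for this example. Newer JDK releases may warn about or restrict reflective writes to final fields, so treat it as an illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;

final class ReflectiveCard implements Serializable {
    private static final long serialVersionUID = 1L;

    public final String name;
    public final String title;

    ReflectiveCard(String name, String title) {
        this.name = name;
        this.title = title;
    }

    private void writeObject(ObjectOutputStream os) throws IOException {
        os.writeObject(name);
        os.writeObject(title);
    }

    // Values are read in the same order in which writeObject() wrote them.
    private void readObject(ObjectInputStream is) throws IOException, ClassNotFoundException {
        setFinalField("name", is.readObject());
        setFinalField("title", is.readObject());
    }

    // Hypothetical helper: assigns a final instance field through reflection.
    private void setFinalField(String fieldName, Object value) throws IOException {
        try {
            Field field = ReflectiveCard.class.getDeclaredField(fieldName);
            field.setAccessible(true); // lifts the final restriction for set()
            field.set(this, value);
        } catch (ReflectiveOperationException e) {
            throw new InvalidObjectException("cannot restore field " + fieldName);
        }
    }

    // Serializes the card to a byte array and reads it back.
    static ReflectiveCard roundTrip(ReflectiveCard card) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(card);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ReflectiveCard) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ReflectiveCard copy = roundTrip(new ReflectiveCard("John Doe", "CEO"));
        System.out.println(copy.name + " " + copy.title);
    }
}
```

The copy returned by roundTrip() is a different object from the original, yet its final fields carry the same values.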

But that’s not how the Sun engineers implemented the deserialization algorithm. The reflection API comes at a cost: its calls perform plenty of security checks and lookups. To maximize performance the deserialization uses a JDK-specific power tool: sun.misc.Unsafe.

Set final fields: be Unsafe

The Java magic article gives you a good overview of the functionality exposed by the Unsafe class. But if you have not read the article yet, here is what Unsafe is in a nutshell: it is a power tool which exposes a low-level view of the JVM to Java developers. It gives access to the equivalent of pointers! Obviously, Java has no syntax for pointers the way C does. Instead, long values represent positions in memory. Unsafe allows Java developers to manipulate the memory content directly. This low-level API allows us to bypass the access restrictions of Java altogether. The Unsafe API does not deal in fields. Instead it works with objects and memory offsets. No field names, no access modifiers (like public or private); only offsets in memory.

The Unsafe class is not documented in the JDK javadoc. If you are interested, the OpenJDK project provides very good javadoc for the Unsafe class: http://www.docjar.com/docs/api/sun/misc/Unsafe.html

Using Unsafe we can read and write directly into memory that is normally managed only by the JVM. Unsafe gives us access to the memory behind the objects. We can access array elements as well as the fields of an object using simple pointer arithmetic. As the name suggests, Unsafe is a powerful but also very dangerous tool. We can easily corrupt the memory content by accident, creating hard-to-find bugs.
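To give an idea of what working with objects and offsets looks like, here is a minimal sketch that overwrites a final field through sun.misc.Unsafe. The class UnsafeFinalFieldDemo, its nested Card class and the helper methods are made up for this example. Obtaining the Unsafe instance through the private theUnsafe field is a common workaround rather than a supported API. The memory-access methods used here are deprecated in recent JDK releases, which may print warnings or refuse to run them, and javac warns that Unsafe is an internal API.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class UnsafeFinalFieldDemo {
    // Hypothetical immutable class with a single final field.
    static final class Card {
        final String name;

        Card(String name) {
            this.name = name;
        }
    }

    // Reads the singleton from the private static field "theUnsafe".
    static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Unsafe is not available", e);
        }
    }

    // Writes newName into the memory slot of card.name and returns the field value.
    static String overwriteName(Card card, String newName) {
        try {
            Unsafe unsafe = loadUnsafe();
            // The offset identifies the field; no name or modifier is involved afterwards.
            long offset = unsafe.objectFieldOffset(Card.class.getDeclaredField("name"));
            unsafe.putObject(card, offset, newName);
            return card.name;
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Card card = new Card("John Doe");
        System.out.println(overwriteName(card, "Jane Doe"));
    }
}
```

Note that the write itself never mentions the final modifier: once the offset is known, putObject() only sees an object and a position inside it.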

Back to our final fields problem. Unsafe is just what the Sun engineers needed: a way to modify the state of an object without using the reflection API. By modifying the memory content it completely ignores the final semantics of Java. We now take a quick glance inside the deserialization code to show you where this happens. It starts with a call to ObjectInputStream.readObject().

Below we sketch the different calls behind the deserialization of a single normal object. We assume the object neither implements Externalizable nor provides its own readObject() method. Here we go:

  1. ObjectInputStream.readObject() calls ObjectInputStream.readObject0()
  2. ObjectInputStream.readObject0() calls ObjectInputStream.readOrdinaryObject()
  3. ObjectInputStream.readOrdinaryObject() calls ObjectInputStream.readSerialData()
  4. ObjectInputStream.readSerialData() calls ObjectInputStream.defaultReadFields()
  5. ObjectInputStream.defaultReadFields() calls ObjectStreamClass.setPrimFieldValues() and ObjectStreamClass.setObjFieldValues()
    1. ObjectStreamClass.setPrimFieldValues() calls ObjectStreamClass.FieldReflector.setPrimFieldValues()
    2. ObjectStreamClass.FieldReflector.setPrimFieldValues() sets all the primitive fields
      1. It calls the Unsafe.putXYZ() method matching the type of each field
      2. Example for a byte field: unsafe.putByte(obj, key, buf[off]);
    3. ObjectStreamClass.setObjFieldValues() calls ObjectStreamClass.FieldReflector.setObjFieldValues()
    4. ObjectStreamClass.FieldReflector.setObjFieldValues() calls Unsafe.putObject() for each field
      1. Example: unsafe.putObject(obj, key, val);

 

The steps above show the calls that lead to the actual use of the Unsafe class during the deserialization of an object. It is worth pointing out that the serialization (starting from ObjectOutputStream) also makes use of Unsafe. There it is used for performance reasons only, since the final semantics are not a problem when reading fields.

To illustrate how the Unsafe class is used to set the field values, we show screenshots of the code below. First comes the setPrimFieldValues() method which sets, as the name suggests, the values of primitive fields. At this point the values have not been converted yet: they are passed as an array of bytes (the buf parameter). The Bits class handles the conversion from bytes to the appropriate primitive type.

[Screenshot: unsafe_putPrimitives, the setPrimFieldValues() method]
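The byte-to-primitive conversion can be sketched as follows. This is our own re-implementation for illustration, not the JDK source: the class name BigEndianBytes is made up, and only the int and short cases are shown. It relies on the fact that the serialization stream stores multi-byte values in big-endian order, most significant byte first.

```java
final class BigEndianBytes {
    // Combines four bytes starting at off into an int, most significant byte first.
    static int getInt(byte[] buf, int off) {
        return ((buf[off] & 0xFF) << 24)
             | ((buf[off + 1] & 0xFF) << 16)
             | ((buf[off + 2] & 0xFF) << 8)
             | (buf[off + 3] & 0xFF);
    }

    // Combines two bytes starting at off into a short.
    static short getShort(byte[] buf, int off) {
        return (short) (((buf[off] & 0xFF) << 8) | (buf[off + 1] & 0xFF));
    }

    public static void main(String[] args) {
        byte[] buf = {0x00, 0x00, 0x01, 0x02, (byte) 0xFF, (byte) 0xFE};
        System.out.println(getInt(buf, 0));   // prints 258 (0x00000102)
        System.out.println(getShort(buf, 4)); // prints -2 (0xFFFE)
    }
}
```

The masking with 0xFF matters because Java bytes are signed: without it, a byte such as 0xFE would be sign-extended and corrupt the higher bits of the result.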

The setObjFieldValues() method, which sets the values of the non-primitive fields of an object, is shown next. Note that in this case the object values have already been deserialized! The objects passed to the setObjFieldValues() method were created by calling the readObject() method recursively.

[Screenshot: unsafe_putObject, the setObjFieldValues() method]

 

Summary

Using a simple example we saw how declaring fields final is a problem when we want to implement deserialization by hand. The JDK, on the other hand, has a generic mechanism which handles final fields without a problem. The trick resides in the use of the Unsafe class. Although the reflection API also offers a way to assign final fields, the Unsafe-based solution is faster. It ignores the object-oriented model of Java by modifying the memory content directly.

If you liked the content please share it on your social media of choice. Thank you!

Resources

For more details on the use and abuse of the reflection API to assign final fields, see Java 5 – “final” is not final anymore.

Go one step further and consider the following question about final transient fields and serialization.

Contact us

If you have a question, or maybe you found a typo, please contact us!