Chapter 11 Flashcards — Serialization

flashcards effective-java serialization

What is the primary reason Bloch says to prefer alternatives to Java serialization?
?
Deserializing untrusted data with Java serialization is a remote code execution vector. Attackers craft “gadget chains”: sequences of method invocations on otherwise innocent Serializable classes, triggered during deserialization, that end up executing arbitrary code. The attack surface is the entire JVM classpath, since every Serializable class is a potential gadget. Modern alternatives (JSON via Jackson, Protocol Buffers, Avro) map data onto declared types instead of instantiating whatever class the stream names, which removes gadget chains as long as features such as Jackson's polymorphic default typing stay disabled.

What is a “deserialization bomb” and how does it work?
?
A deserialization bomb is a crafted byte stream that causes exponential CPU or memory consumption during deserialization — an effective denial-of-service attack. Example: a deeply nested HashSet of HashSets, where each level requires hashCode() computation on all nested sets. A compact byte stream can trigger 2^100 hashCode operations. It exploits the recursive nature of Java’s default deserialization.

What Java API was introduced in Java 9 (JEP 290) to mitigate deserialization attacks?
?
ObjectInputFilter: a deserialization filter that lets you allow or reject classes before they are deserialized. Pattern syntax: "com.example.MyClass;java.lang.String;!*" (allow the named classes, reject everything else). Java 17 added JEP 415, Context-Specific Deserialization Filters, which lets you install a JVM-wide filter factory via ObjectInputFilter.Config.setSerialFilterFactory. The factory is invoked for each ObjectInputStream and returns the filter for that stream, enabling more granular, context-based control.
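A minimal sketch of a per-stream allow-list filter, assuming Java 9 or later. The classes AllowedMessage, ForbiddenMessage, and FilterDemo are hypothetical names invented for this illustration; only ObjectInputFilter.Config.createFilter and ObjectInputStream.setObjectInputFilter come from the JDK.

```java
import java.io.*;

// Hypothetical demo types; not part of any real API.
class AllowedMessage implements Serializable {
    private static final long serialVersionUID = 1L;
    final String text;
    AllowedMessage(String text) { this.text = text; }
}

class ForbiddenMessage implements Serializable {
    private static final long serialVersionUID = 1L;
}

class FilterDemo {
    static byte[] serialize(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(o);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the deserialized object, or null if the filter rejected the stream.
    static Object readFiltered(byte[] bytes) {
        // Allow exactly two classes, then reject everything else with "!*".
        ObjectInputFilter filter = ObjectInputFilter.Config.createFilter(
                AllowedMessage.class.getName() + ";java.lang.String;!*");
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            in.setObjectInputFilter(filter); // must be set before the first read
            return in.readObject();
        } catch (InvalidClassException rejected) {
            return null; // a REJECTED filter status surfaces as InvalidClassException
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The filter is consulted before any field data of a class is read, so a rejected class never gets the chance to run its own readObject.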

What are the five hidden costs of implementing Serializable?
?

Serialized form becomes public API — internal field structure is locked in across all versions.
Exposes internal representation — private fields are visible in the byte stream.
Bypasses constructors — deserialization creates objects without constructor calls, bypassing invariant checks.
Version compatibility burden — every release must be tested against all prior serialized forms.
Security surface area — every Serializable class is a potential gadget chain participant, even if benign.

What is serialVersionUID and what happens if you don’t declare it?
?
serialVersionUID is a long constant that identifies the class version for serialization compatibility. If you don’t declare it, Java computes one at runtime by hashing the class’s structure (its name, implemented interfaces, and most of its fields, constructors, and methods). Seemingly harmless changes, such as adding a convenience method, alter the computed UID and cause InvalidClassException when deserializing old data; the computed value can also vary between compilers. Always declare it explicitly: private static final long serialVersionUID = 1L;

When should you change serialVersionUID versus keep it the same?
?

Keep same UID: when making backward-compatible changes (e.g., adding new fields). Old serialized data is still valid; new fields default to null/0/false.
Change UID: when making incompatible changes (removing fields, changing types, structural restructuring). Changing the UID makes deserialization of old data fail fast with InvalidClassException, rather than silently producing a corrupt object.
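A small sketch of how the declared value can be inspected, using the JDK's ObjectStreamClass. The Customer class and the UidDemo helper are hypothetical names used only for illustration.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical class: the UID stays at 1L across compatible releases.
class Customer implements Serializable {
    private static final long serialVersionUID = 1L;
    private String name;
    private String email; // imagined as added later; older streams leave it null
}

class UidDemo {
    // Reports the UID the serialization runtime will write and expect for a class.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }
}
```

Calling UidDemo.uidOf(Customer.class) returns the declared 1L. Without the explicit declaration, the same call would return a hash computed from the class structure.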

What is the default serialized form and why is it often wrong?
?
The default serialized form encodes the entire object graph rooted at the object — all non-transient fields, including private ones and internal implementation structures (e.g., Entry nodes in a linked list). It is often wrong because: (1) it exposes implementation details, (2) it locks in the internal structure as public API, (3) it may cause StackOverflowError on deep recursive structures, (4) it’s often larger than a logical representation would be. The logical content (what the object represents) is usually far simpler than its physical representation (how it’s stored).

What does the transient keyword do in serialization, and what value do transient fields get after deserialization?
?
transient marks a field to be excluded from serialization. It is used for: implementation details (internal node references, cached values), derived/computed fields, external resource handles (sockets, file handles, threads), and any field whose value is inherently session-specific. After deserialization, transient fields get their default values: null for objects, 0 for numeric types, false for booleans. If a non-default value is needed, initialize them in readObject.
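A short sketch of the behavior described above. Session and TransientDemo are hypothetical names; the example assumes a counter that should reset and a scratch buffer that must be rebuilt after deserialization.

```java
import java.io.*;

class Session implements Serializable {
    private static final long serialVersionUID = 1L;
    final String user;                 // logical state: serialized
    transient int hitCount;            // session-specific: skipped, comes back as 0
    transient StringBuilder scratch;   // skipped, would come back as null

    Session(String user) {
        this.user = user;
        this.scratch = new StringBuilder();
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        scratch = new StringBuilder(); // restore the non-default value by hand
    }
}

class TransientDemo {
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```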

What is the correct order of operations in a defensive readObject method?
?

Call s.defaultReadObject() first (always, for forward compatibility).
Make defensive copies of all mutable fields — before validation.
Validate invariants on the copied values.
Throw InvalidObjectException (not IllegalArgumentException) on invariant violations.
Never call overridable methods (object is not fully initialized yet).

The key insight: copy before validate, to sever any shared references an attacker’s crafted byte stream might exploit.
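The steps above can be sketched with a Period class modeled on the one in the book. This is an illustrative reconstruction, not the book's exact listing; PeriodDemo is a hypothetical helper added so the round trip can be exercised.

```java
import java.io.*;
import java.util.Date;

final class Period implements Serializable {
    private static final long serialVersionUID = 1L;
    private Date start; // not final: readObject has to reassign the copies
    private Date end;

    Period(Date start, Date end) {
        this.start = new Date(start.getTime());
        this.end = new Date(end.getTime());
        if (this.start.compareTo(this.end) > 0)
            throw new IllegalArgumentException(start + " after " + end);
    }

    Date start() { return new Date(start.getTime()); }
    Date end()   { return new Date(end.getTime()); }

    private void readObject(ObjectInputStream s)
            throws IOException, ClassNotFoundException {
        s.defaultReadObject();                 // 1. read the default form
        start = new Date(start.getTime());     // 2. defensive copies first
        end = new Date(end.getTime());
        if (start.compareTo(end) > 0)          // 3. validate the copies
            throw new InvalidObjectException(start + " after " + end); // 4.
    }
}

class PeriodDemo {
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The copies use the Date constructor rather than clone(), since an attacker could supply a malicious Date subclass whose clone() misbehaves.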

What is the “rogue object reference” attack on readObject?
?
An attacker takes a valid serialized instance and appends extra back-references that point at the target object’s private mutable fields (for example, its Date fields). Deserialization preserves shared references within a stream, so when the attacker reads those extra references, they obtain the very same instances the deserialized object holds internally. If readObject only validates and does not copy, the attacker can mutate those instances afterwards and break the object’s invariants. Defense: in readObject, defensively copy every mutable field before validating. The object then holds fresh copies, and the attacker’s references point only at the discarded originals.

Why should you throw InvalidObjectException (not IllegalArgumentException) in readObject?
?
InvalidObjectException is the semantically correct exception for invariant violations detected during deserialization. It extends ObjectStreamException, which extends IOException, a checked exception type that readObject already declares, so callers that handle I/O failures from ObjectInputStream.readObject() handle it too. IllegalArgumentException is unchecked, so it would compile without wrapping, but it would bypass those handlers and does not communicate that the failure is deserialization-specific. InvalidObjectException conveys that the byte stream itself is invalid, which is the correct framing.

Why do enums provide better singleton serialization safety than readResolve?
?
With readResolve, a new instance is created before readResolve is called, so a “stealer” object in the byte stream can capture that pre-readResolve instance and give the attacker a second reference. Enums are different: the Java Object Serialization Specification (§1.12, Serialization of Enum Constants) requires that an enum constant be serialized as just its name and deserialized via Enum.valueOf(Class, String), which looks up the existing constant, so no new object is created. Any writeObject, readObject, writeReplace, or readResolve methods declared on an enum type are ignored by the serialization mechanism. The guarantee comes from the platform specification (JLS §8.9 relies on it as well), not from application-level code.
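A quick sketch that checks the identity guarantee. Elvis follows the book's naming; EnumSingletonDemo is a hypothetical helper.

```java
import java.io.*;

enum Elvis {
    INSTANCE;
}

class EnumSingletonDemo {
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(o); // only the constant's name is written
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return in.readObject(); // resolved via Enum.valueOf
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The value returned by roundTrip(Elvis.INSTANCE) is the same object as Elvis.INSTANCE, so an identity comparison with == holds.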

If you must use readResolve for singleton control (cannot use an enum), what is the critical requirement for all instance fields?
?
All instance fields with object reference types must be transient. If any non-transient reference field exists, an attacker can use a stealer object, in the style of the rogue object reference attack, to capture the pre-readResolve instance through that field. This requirement is fragile: if a future developer adds a non-transient reference field without knowing the constraint, the singleton guarantee is silently broken. This fragility is why enums are strongly preferred.
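A minimal sketch of readResolve-based instance control, assuming an enum is not an option. Registry and ReadResolveDemo are hypothetical names; note that the only reference field is transient.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

final class Registry implements Serializable {
    private static final long serialVersionUID = 1L;
    static final Registry INSTANCE = new Registry();

    // Reference-typed state is transient so the stream cannot carry it.
    private transient Map<String, String> entries = new HashMap<>();

    private Registry() { }

    // Discards the freshly deserialized instance in favor of the canonical one.
    private Object readResolve() {
        return INSTANCE;
    }
}

class ReadResolveDemo {
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```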

Describe the four components of the serialization proxy pattern and what each does.
?

writeReplace() on the enclosing class: called before serialization; returns new SerializationProxy(this). The proxy (not this) is what gets serialized.
SerializationProxy: a private static nested class with a constructor that copies the enclosing object’s logical state. It is a simple data holder with no invariants.
readResolve() on the proxy: called after proxy deserialization; calls new EnclosingClass(...) — the real constructor — which enforces all invariants. Returns the properly constructed enclosing object.
readObject() on the enclosing class: throws InvalidObjectException("Proxy required") — prevents direct deserialization attacks that bypass the proxy.
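The pieces above fit together roughly as follows. This is a sketch modeled on the book's Period example; the class is named ProxyPeriod here, and ProxyDemo is a hypothetical helper, both chosen only for this illustration.

```java
import java.io.*;
import java.util.Date;

final class ProxyPeriod implements Serializable {
    private final Date start; // fields can stay final with this pattern
    private final Date end;

    ProxyPeriod(Date start, Date end) {
        this.start = new Date(start.getTime());
        this.end = new Date(end.getTime());
        if (this.start.compareTo(this.end) > 0)
            throw new IllegalArgumentException(start + " after " + end);
    }

    Date start() { return new Date(start.getTime()); }
    Date end()   { return new Date(end.getTime()); }

    // The proxy: a plain holder for the logical state.
    private static class SerializationProxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Date start;
        private final Date end;

        SerializationProxy(ProxyPeriod p) {
            this.start = p.start;
            this.end = p.end;
        }

        // Rebuilds the real object through its validating constructor.
        private Object readResolve() {
            return new ProxyPeriod(start, end);
        }
    }

    // Serialize the proxy instead of this object.
    private Object writeReplace() {
        return new SerializationProxy(this);
    }

    // Reject any stream that claims to contain a ProxyPeriod directly.
    private void readObject(ObjectInputStream s) throws InvalidObjectException {
        throw new InvalidObjectException("Proxy required");
    }
}

class ProxyDemo {
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```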

What are the advantages of the serialization proxy pattern over a defensive readObject?
?

Invariants enforced automatically — readResolve calls the real constructor; no duplicate validation code.
Fields can be final — no need to reassign them in readObject; the constructor handles it.
No defensive copies in readObject — the constructor already makes them; proxy is a simple data holder.
Can deserialize into a different type — readResolve can return any object (e.g., EnumSet uses this to choose between RegularEnumSet and JumboEnumSet based on size).
No overridable-method pitfall — readResolve on the proxy calls a constructor, not the object under construction.

What are the limitations of the serialization proxy pattern?
?

Cannot be used if the class can be subclassed — a subclass could bypass the proxy mechanism or override readResolve.
Performance overhead: on Bloch’s machine, serializing and deserializing Period instances was about 14% more expensive with serialization proxies than with defensive copying.
Incompatible with some classes whose object graphs contain circularities: if the proxy’s readResolve invokes a method on an object that is still being deserialized, it fails with a ClassCastException, because at that point only the proxy exists, not the real object.
More code — requires writing SerializationProxy, writeReplace, readResolve, and a defensive readObject on the enclosing class.

How do Java records (Java 16+) change the serialization safety story?
?
Records automatically invoke the canonical constructor during deserialization — unlike regular classes, which bypass all constructors. This means compact constructor invariant checks are automatically enforced on deserialization. For immutable records with immutable field types (Instant, String, etc.), this provides the safety of the serialization proxy pattern with far less code. Additionally, records pair naturally with Jackson for JSON serialization, making Java serialization unnecessary for most data transfer scenarios.
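A small sketch, assuming Java 16 or later, that makes the constructor call visible. Range and RecordDemo are hypothetical names, and the static counter exists only to demonstrate that the canonical constructor runs again during deserialization.

```java
import java.io.*;

record Range(int lo, int hi) implements Serializable {
    static int constructorCalls = 0; // demo-only counter, not part of the state

    // Compact canonical constructor: runs for new Range(...) and on deserialization.
    Range {
        constructorCalls++;
        if (lo > hi)
            throw new IllegalArgumentException(lo + " > " + hi);
    }
}

class RecordDemo {
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After one explicit construction and one round trip, the counter has advanced by two: once for the original and once for the deserialized copy.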

What is the difference between writeObject and writeReplace?
?

writeObject(ObjectOutputStream s): customizes how this object is written to the stream. You call s.defaultWriteObject() then write additional data. The same class instance is still what gets serialized — you are controlling its format.
writeReplace(): returns a replacement object to serialize instead of this. If it returns a different object, that object’s writeObject is called, not the original’s. Used in the serialization proxy pattern to replace the real object with a simple proxy.

What is the difference between readObject and readResolve?
?

readObject(ObjectInputStream s): called during deserialization to read the object’s state from the stream. You call s.defaultReadObject() and then read any additional data you wrote in writeObject. The object being populated is the final result (unless readResolve is also defined).
readResolve(): called after readObject completes. Its return value replaces the deserialized object. Used to enforce singleton/instance control (returning an existing canonical instance) or in the serialization proxy pattern (returning a new object constructed through the real constructor).
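To see how the four hooks relate, here is a sketch that records the order in which the serialization runtime invokes them. Traced and HookOrderDemo are hypothetical names; both replacement hooks return this, so the object itself is what gets written and returned.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

class Traced implements Serializable {
    private static final long serialVersionUID = 1L;
    static final List<String> LOG = new ArrayList<>(); // demo-only call log
    int value = 42;

    private Object writeReplace() {
        LOG.add("writeReplace");
        return this; // no substitution in this demo
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        LOG.add("writeObject");
        out.defaultWriteObject();
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        LOG.add("readObject");
        in.defaultReadObject();
    }

    private Object readResolve() {
        LOG.add("readResolve");
        return this; // no substitution in this demo
    }
}

class HookOrderDemo {
    // Serializes and deserializes one Traced instance, returning the hook order.
    static String traceRoundTrip() {
        Traced.LOG.clear();
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(new Traced());
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return String.join(",", Traced.LOG);
    }
}
```

HookOrderDemo.traceRoundTrip() returns "writeReplace,writeObject,readObject,readResolve": replacement is decided before writing, and resolution happens after reading.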

What serialization mechanism does EnumSet use internally, and why?
?
EnumSet uses a serialization proxy internally. EnumSet has two concrete implementations: RegularEnumSet (for enums with ≤64 constants, backed by a long bitmask) and JumboEnumSet (for larger enums, backed by a long[]). The serialization proxy captures the logical content (element type + element set) and readResolve calls EnumSet.noneOf(elementType) followed by adding all elements. This means: (1) the concrete type is not locked into the serialized form, and (2) the correct implementation is chosen at deserialization time based on the enum size — which may have grown since serialization.
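A brief sketch that round-trips an EnumSet and checks that the stream names the proxy rather than a concrete implementation. Day and EnumSetDemo are hypothetical names; the proxy class name java.util.EnumSet$SerializationProxy is what the JDK's serialized-form documentation lists, and the byte scan is a rough demonstration, not a parsing technique.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.EnumSet;

enum Day { MON, TUE, WED }

class EnumSetDemo {
    static byte[] serialize(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(o);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @SuppressWarnings("unchecked")
    static EnumSet<Day> deserialize(byte[] bytes) {
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (EnumSet<Day>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Crude check: class names appear as plain text inside the stream.
    static boolean streamMentions(byte[] bytes, String className) {
        return new String(bytes, StandardCharsets.ISO_8859_1).contains(className);
    }
}
```

Because only the element type and elements are stored, the deserializing JVM is free to pick whichever concrete EnumSet implementation fits the enum at that time.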

You have a class with a mutable Date field. What do you need to do to serialize it safely?
?
It depends on the field’s role. If the Date is an implementation detail or cached value, mark it transient and leave it out of the stream. If it is logical state that must be serialized, one option is a custom form: mark the field transient, implement writeObject to write date.getTime() (a long) instead of the Date object itself, and implement readObject to rebuild the Date from that long (new Date(time)) and validate any invariants. Another option is to keep the default form and, in readObject, replace the field with a defensive copy (new Date(date.getTime())) before validating. Alternatively, replace Date with Instant (Java 8+), which is immutable: no defensive copy is needed, and it serializes safely via its own serialized form.

What happens to private fields when a class implements Serializable?
?
Private fields are exposed in the serialized byte stream. Any reader of the byte stream can inspect the values of all non-transient private fields. This violates encapsulation: the serialized form reveals the internal implementation. For example, a BankAccount with a private double balance effectively makes that field visible to anyone who intercepts or stores the serialized form. This is one reason Bloch argues serialized form should be treated as a public API — it exposes what would otherwise be hidden.

What is the correct way to declare serialVersionUID and when should you change its value?
?

private static final long serialVersionUID = 1L;

Declare it as private static final long. The value itself is arbitrary; 1L is conventional for new classes. Keep the same value for backward-compatible changes (new optional fields). Change the value for incompatible changes (removed fields, changed types, restructured class) to produce a fail-fast InvalidClassException rather than silently creating a corrupt deserialized object. Never rely on auto-generated UIDs: they can change with seemingly harmless class modifications, such as adding a method.

Why are inner classes poor candidates for implementing Serializable?
?
The serialized form of a non-static inner class contains an implicit reference to the enclosing instance. This reference must also be serializable, which is often not the case. More critically, the serialized form of inner classes is compiler-defined and unspecified — different compilers may produce different forms for the same source code. Static nested classes do not have this problem and can implement Serializable more reliably. Bloch’s rule: never implement Serializable in an inner class (non-static nested class).

What does calling s.defaultReadObject() or s.defaultWriteObject() do, and why should you always call it?
?
s.defaultWriteObject() serializes all non-transient instance fields using the default mechanism; s.defaultReadObject() deserializes them. Call these first in custom writeObject/readObject, even if every instance field is currently transient, because: (1) the call shapes the serialized form so that non-transient instance fields can be added in a later release while preserving backward and forward compatibility, and (2) if an instance serialized by a later version is deserialized by an earlier version whose readObject does not call defaultReadObject, deserialization fails with StreamCorruptedException.

What is the safest modern alternative to Java serialization for a REST API response?
?
Jackson (com.fasterxml.jackson.databind.ObjectMapper) for JSON serialization. It is the de facto standard for Java REST APIs. Advantages over Java serialization: (1) only creates objects of declared types — no gadget chains, (2) human-readable format for debugging, (3) language-agnostic — any client can consume it, (4) schema-independent — field names and types, not class structure, define the format. Use Java records (Java 16+) or explicit DTO classes, annotated with Jackson annotations as needed. For high-throughput internal services, Protocol Buffers offer binary efficiency with type safety.

What does Bloch mean when he says “the serialized form is an eternal commitment”?
?
Once a class with Serializable is distributed to clients, the on-wire format of its serialized data becomes a permanent contract. If clients store serialized instances (in databases, caches, files) or transmit them over a network, you must ensure that any future version of the class can still deserialize those stored bytes. You cannot rename fields, change their types, remove fields, or restructure the class without breaking existing serialized data — unless you write explicit migration code in readObject. This commitment is “eternal” because serialized data can persist indefinitely, and you cannot control or upgrade all clients simultaneously.

How does the serialization proxy pattern handle the case where readObject is called directly (e.g., via a crafted byte stream)?
?
The enclosing class implements readObject as follows:

private void readObject(ObjectInputStream stream)
        throws InvalidObjectException {
    throw new InvalidObjectException("Proxy required");
}

This always throws if someone attempts to deserialize the enclosing class directly. Since writeReplace ensures the enclosing class is never serialized (only the proxy is), a direct deserialization attempt means a crafted or corrupt byte stream. Throwing InvalidObjectException prevents the attacker from bypassing the proxy and constructing the object in an invalid state.

Total Cards: 28
Review Time: ~30 minutes
Priority: MEDIUM
Last Updated: 2026-05-10

Study Notes by Niladri & AI
