Chapter 6: Lambdas and Streams

effective-java lambdas streams functional-programming java best-practices

Book: Effective Java, 3rd Edition, by Joshua Bloch
Status: 🟩 Complete
Difficulty: Medium
Items: 42-48 (7 items)
Time to complete: ~45 min


Overview

Java 8 introduced lambdas, method references, functional interfaces, and the Stream API, collectively the most significant additions to the platform since generics in Java 5. These features enable a more declarative, concise, and composable programming style, but they come with traps: overuse of streams harms readability, side effects in stream operations break the mental model, and parallel streams are frequently misused in ways that hurt performance rather than help it.

This chapter gives practical guidance on when each feature applies, what pitfalls to avoid, and where the traditional imperative style still wins. The goal is not to write streams everywhere; it is to use each tool where it genuinely improves clarity and correctness.

Updates through Java 17 (local records, Predicate.not(), Collectors.teeing(), Stream.toList()) are integrated throughout.


Items

Item 42: Prefer Lambdas to Anonymous Classes

The Problem

Before Java 8, the only way to create a function object (an object that represents a piece of behavior) was an anonymous class. Anonymous classes are verbose and bury the logic in boilerplate.

// BAD: Anonymous class as a Comparator (verbose, obscures intent)
Collections.sort(words, new Comparator<String>() {
    @Override
    public int compare(String s1, String s2) {
        return Integer.compare(s1.length(), s2.length());
    }
});
 
// BAD: Anonymous Runnable
new Thread(new Runnable() {
    @Override
    public void run() {
        System.out.println("Running in thread");
    }
}).start();

The Solution

Use lambdas for any functional interface: the boilerplate disappears, leaving only the logic. A functional interface is any interface with exactly one abstract method.

// GOOD: Lambda Comparator, concise and clear
Collections.sort(words, (s1, s2) -> Integer.compare(s1.length(), s2.length()));
 
// Even better: method reference + Comparator factory
words.sort(Comparator.comparingInt(String::length));
 
// GOOD: Lambda Runnable
new Thread(() -> System.out.println("Running in thread")).start();
 
// GOOD: Lambdas as enum constant behavior (replacing constant-specific method bodies)
// From Item 34: the Operation enum can store a lambda in a field
// (DoubleBinaryOperator lives in java.util.function)
public enum Operation {
    PLUS  ("+", (x, y) -> x + y),
    MINUS ("-", (x, y) -> x - y),
    TIMES ("*", (x, y) -> x * y),
    DIVIDE("/", (x, y) -> x / y);
 
    private final String symbol;
    private final DoubleBinaryOperator op;
 
    Operation(String symbol, DoubleBinaryOperator op) {
        this.symbol = symbol;
        this.op = op;
    }
 
    public double apply(double x, double y) { return op.applyAsDouble(x, y); }
}

When Lambdas Do NOT Replace Anonymous Classes

Lambdas have important limitations:

  1. Lambdas have no identity of their own: inside a lambda body, this refers to the enclosing instance, and the lambda has no name by which to call itself. If the function must refer to itself (for recursion or self-delegation) or carry its own fields, use an anonymous class or a named class.
  2. Neither form can directly implement several interfaces: an anonymous class names exactly one supertype. A lambda can target an intersection type through a cast such as (Runnable & Serializable) () -> ..., but only if the intersection has a single abstract method. For anything richer, use a named class.
  3. Lambdas are restricted to functional interfaces: Anonymous classes can implement abstract classes or interfaces with multiple abstract methods.
  4. Stack traces are cryptic: Lambda stack traces contain synthetic names like $$Lambda$1/0x0000..., making debugging harder.

// When an anonymous class is still appropriate: self-referential recursion
Comparator<String> comp = new Comparator<>() {
    @Override
    public int compare(String s1, String s2) {
        // Can reference 'this' for recursion or self-delegation
        if (s1.isEmpty()) return this.compare("a", s2); // contrived, but valid
        return s1.compareTo(s2);
    }
};

Why This Works

Lambdas rely on target typing: the compiler infers the lambda's type from the context (the expected functional interface type). Parameter types are usually inferred as well, keeping the syntax clean. javac compiles each lambda to an invokedynamic call site plus a private synthetic method rather than a separate anonymous inner class file, which keeps the class-file count down and generally makes lambdas cheaper to load.
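
To make target typing concrete, the sketch below assigns the same lambda text to two different functional interface types; the declared type alone decides which method the lambda implements. The class name TargetTypingDemo and its fields are illustrative and do not come from the book.

```java
import java.util.Comparator;
import java.util.function.ToIntBiFunction;

// Illustrative sketch: identical lambda text, two different target types.
final class TargetTypingDemo {
    // Here the lambda becomes Comparator.compare(String, String)
    static final Comparator<String> BY_LENGTH =
        (a, b) -> Integer.compare(a.length(), b.length());

    // Here the very same text becomes ToIntBiFunction.applyAsInt(String, String)
    static final ToIntBiFunction<String, String> LENGTH_ORDER =
        (a, b) -> Integer.compare(a.length(), b.length());

    // A lambda has no type without a target: when the declared type is Object,
    // a cast has to supply the functional interface.
    static final Object AS_OBJECT = (Runnable) () -> { };
}
```

Because the two fields have unrelated types, BY_LENGTH can be passed to List.sort while LENGTH_ORDER cannot, even though the bodies are character-for-character identical.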

When to Apply / When NOT to Apply

  • Apply for any single-method behavioral parameter: sorting, filtering, mapping, event handling, callbacks
  • Do NOT use lambdas longer than ~3 lines; extract the body to a named method and use a method reference instead
  • Do NOT use lambdas when you need this to refer to the function object itself (in a lambda, this is the enclosing instance; in an anonymous class, this is the anonymous instance)
  • Do NOT use lambdas for abstract classes or multi-method interfaces
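
The difference in how this resolves is easy to check directly. The following is a minimal sketch with hypothetical names (ThisDemo, fromLambda, fromAnonymousClass): each method returns a Supplier that hands back whatever this means inside it.

```java
import java.util.function.Supplier;

// Illustrative sketch of how 'this' resolves in each form.
final class ThisDemo {
    // Lambda: 'this' is the enclosing ThisDemo instance
    Supplier<Object> fromLambda() {
        return () -> this;
    }

    // Anonymous class: 'this' is the anonymous Supplier object itself
    Supplier<Object> fromAnonymousClass() {
        return new Supplier<Object>() {
            @Override
            public Object get() {
                return this;
            }
        };
    }
}
```

Calling fromLambda().get() on an instance returns that instance, whereas the Supplier from fromAnonymousClass() returns itself.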

Java 17 Update: Local Records

Java 16+ local records can replace anonymous classes used as data carriers in lambda chains. When you need a temporary named tuple to flow through a stream pipeline:

// Java 16+: Local record as a data carrier, replacing anonymous class
void processOrders(List<Order> orders) {
    // Local record defined inside the method
    record OrderSummary(Order order, double discountedPrice) {}
 
    orders.stream()
          .filter(o -> o.isActive())
          .map(o -> new OrderSummary(o, computeDiscount(o)))
          .filter(s -> s.discountedPrice() > 100)
          .forEach(s -> ship(s.order()));
}

This is cleaner than an anonymous class that implements a structural interface or a Map.Entry hack.


Item 43: Prefer Method References to Lambdas

The Problem

Lambdas are already concise, but when a lambda does nothing except call an existing method, a method reference is even more concise and communicates intent through the method name. Over-verbose lambdas obscure names that already carry meaning.

// VERBOSE lambda: obscures the existing method name
map.merge(key, 1, (count, incr) -> count + incr);
 
// Cleaner with method reference
map.merge(key, 1, Integer::sum);
 
// Another common example
list.stream()
    .filter(s -> s.isEmpty())  // verbose
    .count();
 
list.stream()
    .filter(String::isEmpty)   // cleaner: the method name says it all
    .count();

The Four Types of Method References: Complete Table

| Type | Syntax | Lambda Equivalent | Example |
| --- | --- | --- | --- |
| Static | Type::staticMethod | (args) -> Type.staticMethod(args) | Integer::parseInt |
| Bound instance | instance::instanceMethod | (args) -> instance.instanceMethod(args) | System.out::println |
| Unbound instance | Type::instanceMethod | (obj, args) -> obj.instanceMethod(args) | String::toLowerCase |
| Constructor | Type::new | (args) -> new Type(args) | ArrayList::new |

// STATIC: The method to call is a static method of the type
Function<String, Integer>  parser   = Integer::parseInt;
// Equivalent lambda: s -> Integer.parseInt(s)
 
// BOUND INSTANCE: The receiver is a specific instance captured at creation time
String prefix = "Hello, ";
Function<String, String>   greeter  = prefix::concat;
// Equivalent lambda: s -> prefix.concat(s)
 
// UNBOUND INSTANCE: The receiver is the first argument of the lambda
Function<String, String>   toLower  = String::toLowerCase;
// Equivalent lambda: s -> s.toLowerCase()
 
Comparator<String>         byLen    = Comparator.comparingInt(String::length);
// Equivalent lambda: s -> s.length() (used as a key extractor)
 
// CONSTRUCTOR: Creates a new instance
Supplier<List<String>>     listMaker = ArrayList::new;
// Equivalent lambda: () -> new ArrayList<>()
 
Function<Integer, int[]>   arrMaker = int[]::new;
// Equivalent lambda: n -> new int[n]

When Lambdas Are Preferable

A method reference is NOT always cleaner. If the method reference is longer than the equivalent lambda (typically because of a long class name), the lambda can be clearer:

// The method reference is longer and noisier here because of the class name
service.execute(GoshThisClassHasALongName::action);
// vs. the lambda, written inside GoshThisClassHasALongName
service.execute(() -> action());

A lambda is also clearer when the lambda parameter names provide important documentation:

// Lambda: parameter names 'numerator' and 'denominator' explain the operation
BiFunction<Double, Double, Double> divide = (numerator, denominator) -> numerator / denominator;
// A method reference drops the names: with Math::pow the reader has to
// remember that the first argument is the base and the second the exponent
BiFunction<Double, Double, Double> power = Math::pow;

Why This Works

Method references serve as shorthand for lambdas that simply forward to an existing method. The method's name is self-documenting. IDEs (IntelliJ, Eclipse) automatically suggest converting lambdas to method references where applicable, so the ecosystem reinforces this preference.

When to Apply / When NOT to Apply

  • Apply whenever a lambda does nothing but call a single existing method
  • Do NOT apply if the method name is longer or less clear than a simple lambda
  • Do NOT apply if the lambda's parameter names provide meaningful documentation that would be lost

Java 17 Update

No language changes to method references. However, since Java 11 (JEP 323) lambda parameters can be declared with var, which lets you annotate a parameter without spelling out its type:

// Java 11+: annotation on a lambda parameter (needs var or an explicit type;
// @NonNull comes from an external annotation library, not the JDK)
list.stream()
    .filter((@NonNull var s) -> !s.isEmpty())
    .collect(Collectors.toList());

This is a niche use case. The practical guidance of "prefer method references to lambdas" is unchanged.


Item 44: Favor the Use of Standard Functional Interfaces

The Problem

With lambdas, it's tempting to define custom functional interfaces for every callback. This is usually unnecessary and clutters the codebase: java.util.function ships 43 functional interfaces that cover almost every use case.

// BAD: Custom functional interface for a predicate (Predicate<String> already exists)
@FunctionalInterface
public interface StringChecker {
    boolean check(String s);
}
 
// BAD: Custom interface for a string transformer (UnaryOperator<String> already exists)
@FunctionalInterface
public interface StringTransformer {
    String transform(String s);
}

The Solution

Use the standard functional interfaces from java.util.function. The six primary interfaces and their roles:

| Interface | Signature | Description |
| --- | --- | --- |
| Predicate<T> | boolean test(T t) | Boolean-valued function of one argument |
| Function<T, R> | R apply(T t) | Function from T to R |
| Supplier<T> | T get() | Provides a T without input |
| Consumer<T> | void accept(T t) | Consumes a T, returns nothing |
| UnaryOperator<T> | T apply(T t) | Function where input and output types are the same |
| BinaryOperator<T> | T apply(T t1, T t2) | Binary function where all types are the same |

The six primary interfaces have primitive specializations to avoid boxing overhead:

// Primitive specializations: avoid boxing/unboxing
IntPredicate   evenCheck = n -> n % 2 == 0;         // int -> boolean (no Integer boxing)
LongFunction<String> fmt = l -> String.valueOf(l);  // long -> String
IntSupplier    counter   = () -> 42;                 // () -> int
IntConsumer    printer   = System.out::println;      // int -> void
IntUnaryOperator   negate = n -> -n;                // int -> int
IntBinaryOperator  add    = Integer::sum;            // (int, int) -> int
 
// BiFunction variants for two-argument functions
BiPredicate<String, Integer> check = (s, n) -> s.length() == n;
BiFunction<String, String, String> concat = String::concat;
BiConsumer<String, Integer>  print  = (s, n) -> System.out.println(s + n);

When to Define Your Own Functional Interface

Bloch recommends defining a custom @FunctionalInterface only when:

  1. It will be commonly used and could benefit from a descriptive name (e.g., Comparator<T> is more descriptive than the structurally identical ToIntBiFunction<T, T>)
  2. It has a strong contract associated with it (documented behavior expectations)
  3. It can benefit from default methods (e.g., Comparator's thenComparing, reversed)

// GOOD: Custom functional interface, worth defining because of its contract and name
@FunctionalInterface
public interface ElevatedTask<T, R> {
    R execute(T input) throws Exception; // Declares a checked exception, which Function<T, R> cannot
}
 
// Standard interfaces cannot declare checked exceptions, which is a valid reason to define a custom one

// Usage examples of standard functional interfaces
// Predicate: filtering
List<String> nonEmpty = names.stream()
    .filter(Predicate.not(String::isEmpty))  // Java 11+: Predicate.not()
    .collect(Collectors.toList());
 
// Function composition
Function<String, String> trim    = String::trim;
Function<String, String> toUpper = String::toUpperCase;
Function<String, String> normalize = trim.andThen(toUpper);
 
// Consumer chaining
Consumer<String> log  = s -> logger.info(s);
Consumer<String> save = repository::save;
Consumer<String> logAndSave = log.andThen(save);
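
The checked-exception case mentioned with ElevatedTask above often comes paired with a small adapter, so that throwing code can still be handed to APIs that expect a standard Function. The sketch below is one hedged way to do it; ThrowingFunction and unchecked are hypothetical names, and wrapping in IllegalStateException is an arbitrary choice for illustration.

```java
import java.util.function.Function;

// Hypothetical custom functional interface whose method may throw a checked exception.
@FunctionalInterface
interface ThrowingFunction<T, R> {
    R apply(T input) throws Exception;

    // Adapts a ThrowingFunction to a standard Function.
    // Runtime exceptions pass through unchanged; checked ones are wrapped.
    static <T, R> Function<T, R> unchecked(ThrowingFunction<T, R> f) {
        return input -> {
            try {
                return f.apply(input);
            } catch (RuntimeException e) {
                throw e;
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        };
    }
}
```

With this in place, a pipeline step such as map(ThrowingFunction.unchecked(this::loadFromDisk)) compiles even if the (hypothetical) loadFromDisk method declares IOException; the cost is that callers now see an unchecked wrapper instead of the original checked type.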

Why This Works

Standard functional interfaces form a shared vocabulary: every Java developer knows Predicate, Function, Supplier, Consumer. Using them means callers can pass lambdas, method references, or existing function objects without casting. Custom interfaces fragment the API and require users to learn new names.

When to Apply / When NOT to Apply

  • Always start with a standard functional interface; only define a custom one if there is a clear, compelling reason
  • Always annotate custom functional interfaces with @FunctionalInterface: this causes a compile error if the interface accidentally has more or fewer than one abstract method, and documents the intent
  • Use primitive specializations (IntPredicate, LongFunction, etc.) when working with primitives to avoid autoboxing overhead

Java 17 Update

No new functional interfaces in Java 17. However:

  • Predicate.not() (Java 11): Negates a method reference cleanly: filter(Predicate.not(String::isBlank))
  • Function.identity(): Still the standard no-op function
  • The functional interface ecosystem is stable; Quarkus/Micronaut use them extensively in their reactive APIs

Item 45: Use Streams Judiciously

The Problem

The Stream API is powerful but not universally better than loops. Overusing streams leads to code that is hard to read, debug, and maintain. Streams also have limitations: you cannot break or continue out of a stream pipeline, you cannot modify local variables in stream lambdas (effectively final constraint), and certain operations (e.g., reading adjacent elements) are awkward in streams.

// BAD: Stream where a loop is clearer (computing a Cartesian product)
// (Bloch's deck of cards example)
// Streams version: the nested flatMap is harder to follow
static List<String> deck() {
    return Stream.of(SUIT_VALUES)
        .flatMap(suit -> Stream.of(RANK_VALUES)
                               .map(rank -> rank + " of " + suit))
        .collect(Collectors.toList());
}
 
// Also BAD: Forcing streams where you need to modify local state
// (Lambdas cannot assign to local variables; captured locals must be effectively final)
int[] count = {0}; // Hack: defeats the purpose of functional style
list.stream().forEach(s -> count[0]++); // BAD: side effect in stream
 
// GOOD: Just use a loop
int count = 0;
for (String s : list) count++;
// Or: list.size()

The Solution: When Streams Shine

Streams excel at:

  1. Uniform transformations of sequences: map, filter, flatMap
  2. Collecting results: group, partition, count, find min/max
  3. Searching: findFirst, findAny, anyMatch, allMatch
  4. Aggregation: reduce, sum, average, count

// GOOD: Stream for a transformation pipeline, clear and concise
// Find the 5 longest distinct words in a text
List<String> top5Longest = words.stream()
    .distinct()
    .sorted(Comparator.comparingInt(String::length).reversed())
    .limit(5)
    .collect(Collectors.toList());
 
// GOOD: Anagram grouping (classic stream example)
Map<String, List<String>> anagramGroups = words.stream()
    .collect(Collectors.groupingBy(word -> word.chars()
                                               .sorted()
                                               .collect(StringBuilder::new,
                                                        (sb, c) -> sb.append((char) c),
                                                        StringBuilder::append)
                                               .toString()));

When NOT to Use Streams

// AWKWARD in a stream: reading adjacent elements
// Example: dropping consecutive duplicates
// There is no clean way to read element[i] and element[i-1] in a stream
// Use a loop instead:
List<String> deduped = new ArrayList<>();
for (int i = 0; i < list.size(); i++) {
    if (i == 0 || !list.get(i).equals(list.get(i - 1)))
        deduped.add(list.get(i));
}
 
// FINE as a stream: a single early exit
// (A return inside a lambda returns only from the lambda, not from the enclosing method,
// but findFirst() short-circuits and Optional carries the result out)
Optional<String> found = list.stream()
    .filter(s -> someComplexCondition(s))
    .findFirst();
 
// When the loop version is better: multiple exit points
for (String s : list) {
    if (condition1(s)) return "early exit 1";
    if (condition2(s)) return "early exit 2";
    process(s);
}

Streams vs. Traditional Loops: Performance Reality

| Scenario | Winner | Notes |
| --- | --- | --- |
| Sequential, small collections (< 1,000 elements) | Loop or Stream (tie) | JIT optimizes both; readability decides |
| Large data, simple operations (filter/map/collect) | Stream | Lazy evaluation avoids materializing intermediates |
| Large data, parallelizable, CPU-bound | Parallel stream (carefully) | See Item 48 |
| Adjacent element access | Loop | No clean stream equivalent |
| Early exit with multiple conditions | Loop | return/break inside a loop |
| Checked exceptions in the operation | Loop | Lambdas cannot throw checked exceptions cleanly |
| Mutable reduction (e.g., building a StringBuilder) | Both | reduce/collect vs. StringBuilder in a loop |
| Primitive int/long arrays | Stream (IntStream/LongStream) | Avoids autoboxing |

Why This Works

Streams use a lazy pipeline evaluation model: intermediate operations (filter, map) do not process elements until a terminal operation triggers execution. This avoids materializing intermediate collections and can be more memory-efficient for large data sets. However, the overhead of creating a stream, lambda objects, and the pipeline machinery is measurable for very small collections, where a simple loop is usually faster.
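
Laziness can be observed with a counter. The sketch below (hypothetical class LazinessDemo) counts how many elements reach the pipeline before and after the terminal operation; the counter in peek() is a deliberate side effect used only for observation, not a pattern to copy into production code.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

final class LazinessDemo {
    /**
     * Returns a two-element array:
     * [0] elements examined after the pipeline is built but before any terminal operation,
     * [1] elements examined once findFirst() has run.
     */
    static int[] elementsExamined(List<String> words, int minLength) {
        AtomicInteger examined = new AtomicInteger();

        // Building the pipeline runs nothing
        Stream<String> pipeline = words.stream()
            .peek(w -> examined.incrementAndGet())
            .filter(w -> w.length() >= minLength);
        int beforeTerminal = examined.get();

        // findFirst() pulls elements one at a time and stops at the first match
        pipeline.findFirst();
        return new int[] { beforeTerminal, examined.get() };
    }
}
```

For the sequential input List.of("a", "bb", "ccc", "dddd") with minLength 2, the first count is 0 and the second is 2: only "a" and "bb" are ever looked at.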

When to Apply / When NOT to Apply

  • Apply streams when the logic is a natural pipeline: transform, filter, aggregate
  • Do NOT apply when: you need to read adjacent elements, you need to modify local state, the logic has complex branching, or a simple loop is plainly more readable to your team
  • The best code is the most readable code: if the stream version requires a long comment to explain it, the loop is probably better

Java 17 Update: Stream.toList()

Java 16 added Stream.toList() as a concise alternative to collect(Collectors.toList()). The result is an unmodifiable list; unlike the list from Collectors.toUnmodifiableList(), it accepts null elements:

// Before Java 16
List<String> result = words.stream()
    .filter(s -> s.length() > 3)
    .collect(Collectors.toList());  // In practice a modifiable ArrayList (the spec makes no guarantee)
 
// Java 16+: Stream.toList(), unmodifiable and usually slightly more efficient
List<String> result = words.stream()
    .filter(s -> s.length() > 3)
    .toList();  // Returns an unmodifiable list
 
// Important differences:
// Collectors.toList()             -> modifiable in practice (current implementation is ArrayList)
// Stream.toList()                 -> unmodifiable (throws UnsupportedOperationException on add/remove), allows nulls
// Collectors.toUnmodifiableList() -> unmodifiable, throws NullPointerException on null elements
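
The mutability and null-handling behavior of these list factories is worth checking rather than memorizing. The following sketch (hypothetical class ToListDemo, requires Java 16+) probes each behavior and reports it as a boolean.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class ToListDemo {
    // True if the list refuses structural modification
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("extra");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Stream.toList() keeps null elements
    static boolean streamToListAcceptsNull() {
        List<String> withNull = Arrays.asList("a", null);
        return withNull.stream().toList().get(1) == null;
    }

    // Collectors.toUnmodifiableList() throws on null elements
    static boolean toUnmodifiableListRejectsNull() {
        List<String> withNull = Arrays.asList("a", null);
        try {
            withNull.stream().collect(Collectors.toUnmodifiableList());
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

On a Java 16+ runtime, rejectsAdd(Stream.of("a").toList()) is true, and both null probes return true, so code that may see nulls cannot swap the two unmodifiable variants freely.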

Item 46: Prefer Side-Effect-Free Functions in Streams

The Problem

The stream paradigm is built on pure functions: functions with no side effects whose result depends only on their inputs. Using streams with side-effecting lambdas undermines the whole model: it makes code hard to reason about, breaks parallelism, and defeats the purpose of using streams.

The most common mistake is using forEach as a glorified loop body:

// BAD: Side effects in stream operations (a disguised loop)
// Collects word frequencies into a map via forEach
Map<String, Long> freq = new HashMap<>();
words.stream()
     .forEach(word -> freq.merge(word.toLowerCase(), 1L, Long::sum)); // SIDE EFFECT
 
// This is worse than a simple loop: it provides none of the benefits of the stream model
// and adds the overhead of the stream pipeline
 
// BAD: forEach used to fill an external collection
List<String> collected = new ArrayList<>();
words.stream()
     .filter(s -> s.length() > 3)
     .forEach(s -> collected.add(s)); // Mutating an external list: use a collector instead

The Solution

Use collectors for aggregation, not side effects in forEach. forEach should be used only for reporting results (printing, logging, persisting), not for computation.

// GOOD: Collector-based aggregation, no side effects
Map<String, Long> freq = words.stream()
    .collect(Collectors.groupingBy(String::toLowerCase, Collectors.counting()));
 
// GOOD: Correct use of forEach, reporting only
freq.forEach((word, count) ->
    System.out.printf("%-15s %d%n", word, count));
 
// GOOD: toList, toSet, toMap collectors
List<String> topWords = freq.entrySet().stream()
    .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
    .limit(10)
    .map(Map.Entry::getKey)
    .collect(Collectors.toList());
// Java 16+: the final collect(Collectors.toList()) can be replaced with .toList()

Essential Collectors: Reference Table

// Collecting to a collection
Collectors.toList()               // list with no guarantee on type or mutability (an ArrayList in practice)
Collectors.toUnmodifiableList()   // unmodifiable list, rejects nulls (Stream.toList() allows them)
Collectors.toSet()                // set with no guarantee on type or mutability (a HashSet in practice)
Collectors.toUnmodifiableSet()    // unmodifiable set
Collectors.toCollection(TreeSet::new) // specific collection type
 
// Collectors.toMap: two-, three-, and four-argument forms
Collectors.toMap(keyFn, valueFn)                       // throws IllegalStateException on duplicate keys
Collectors.toMap(keyFn, valueFn, mergeFn)              // merge on duplicate keys
Collectors.toMap(keyFn, valueFn, mergeFn, mapSupplier) // specific map type
 
// Grouping and partitioning
Collectors.groupingBy(classifier)                          // Map<K, List<V>>
Collectors.groupingBy(classifier, downstream)              // with downstream collector
Collectors.groupingBy(classifier, mapFactory, downstream)  // specific map type
Collectors.partitioningBy(predicate)                       // Map<Boolean, List<V>>
Collectors.partitioningBy(predicate, downstream)
 
// Counting and statistics
Collectors.counting()
Collectors.summingInt(fn)
Collectors.averagingDouble(fn)
Collectors.summarizingInt(fn)   // IntSummaryStatistics (count, sum, min, max, avg)
 
// Joining strings
Collectors.joining()
Collectors.joining(delimiter)
Collectors.joining(delimiter, prefix, suffix)
 
// Reduction
Collectors.reducing(identity, fn)
Collectors.reducing(identity, mapper, fn)
 
// Mapping downstream
Collectors.mapping(fn, downstream)     // transform then collect
Collectors.flatMapping(fn, downstream) // Java 9+: flatMap then collect
Collectors.filtering(pred, downstream) // Java 9+: filter then collect
 
// Teeing (Java 12+): collect to two collectors and merge
Collectors.teeing(downstream1, downstream2, merger)
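
A few of the collectors listed above, put to work on a list of words. This is a minimal sketch with hypothetical names (CollectorExamples and its methods); it assumes every word is non-empty, since charAt(0) would fail otherwise.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class CollectorExamples {
    // partitioningBy: the result always contains both keys, true and false
    static Map<Boolean, List<String>> splitByLength(List<String> words, int minLength) {
        return words.stream()
            .collect(Collectors.partitioningBy(w -> w.length() >= minLength));
    }

    // Four-argument toMap: duplicate keys are merged by summing,
    // and the result is collected into a sorted TreeMap
    static Map<Character, Integer> totalLengthByInitial(List<String> words) {
        return words.stream()
            .collect(Collectors.toMap(
                w -> w.charAt(0),
                String::length,
                Integer::sum,
                TreeMap::new));
    }

    // joining with delimiter, prefix, and suffix
    static String bracketed(List<String> words) {
        return words.stream().collect(Collectors.joining(", ", "[", "]"));
    }
}
```

For List.of("apple", "avocado", "banana"), totalLengthByInitial yields {a=12, b=6}. Without the merge function, the two words starting with 'a' would make the two-argument toMap throw IllegalStateException.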

Collectors.teeing() Example (Java 12+)

// Compute count and sum simultaneously in a single pass
record Stats(long count, double sum) {}
 
Stats stats = numbers.stream()
    .collect(Collectors.teeing(
        Collectors.counting(),
        Collectors.summingDouble(Double::doubleValue),
        Stats::new
    ));
// One stream pass, two collectors, one merge: elegant but non-obvious
// Use sparingly, since it can harm readability

Why This Works

Pure functions make streams composable, testable, and parallelizable. A lambda with no side effects can be applied in any order, to any subset of elements, by any number of threads, which is exactly what the parallel stream infrastructure requires. Side-effecting lambdas introduce ordering dependencies and race conditions that make parallelism dangerous.

When to Apply / When NOT to Apply

  • Always use collectors for aggregation; never use forEach for computation
  • forEach is appropriate only at the "end" of a pipeline for I/O, logging, or invoking a method with external effects
  • Collectors.teeing() (Java 12+) is powerful, but use it only when a single-pass, dual-result collection genuinely simplifies the code, because it can be cryptic

Java 17 Update

Collectors.teeing() was added in Java 12; use it for two-result aggregation in a single pass without materializing intermediate lists. Collectors.flatMapping() and Collectors.filtering() (Java 9+) allow downstream composition without intermediate flatMap/filter calls on the stream itself.
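
Downstream filtering is not just a stylistic variant of filtering the stream first: the two differ in which groups survive. The sketch below (hypothetical class DownstreamFilteringDemo, Java 9+, assuming non-empty words) groups words by first letter both ways.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class DownstreamFilteringDemo {
    // Filtering on the stream: a group with no surviving element never appears
    static Map<Character, List<String>> filterThenGroup(List<String> words, int minLength) {
        return words.stream()
            .filter(w -> w.length() >= minLength)
            .collect(Collectors.groupingBy(
                w -> w.charAt(0),
                TreeMap::new,
                Collectors.toList()));
    }

    // Filtering downstream: every group is kept, possibly with an empty list
    static Map<Character, List<String>> groupThenFilter(List<String> words, int minLength) {
        return words.stream()
            .collect(Collectors.groupingBy(
                w -> w.charAt(0),
                TreeMap::new,
                Collectors.filtering(w -> w.length() >= minLength, Collectors.toList())));
    }
}
```

With List.of("ant", "apple", "bee") and minLength 4, the first method returns {a=[apple]} while the second returns {a=[apple], b=[]}. Which one is right depends on whether "no long words for b" is information the caller needs.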


Item 47: Prefer Collection to Stream as a Return Type

The Problem

When a method returns a sequence of elements, the choice between returning a Stream, an Iterable, or a Collection has significant consequences for callers.

// BAD: Returning Stream: callers who want a for-each loop are stuck
public Stream<ProcessHandle> parentProcess() {
    return ProcessHandle.current().parent().stream();
}
 
// A caller who just wants to iterate is pushed toward forEach
parentProcess().forEach(p -> System.out.println(p.pid()));
// Or they have to convert first: parentProcess().collect(toList())
 
// BAD: Returning Iterable: callers who want stream operations are stuck
public Iterable<String> getNames() { ... }
// Caller who wants a stream: StreamSupport.stream(getNames().spliterator(), false)
// Verbose, and easy to get wrong

The Solution

Return Collection (or a subtype like List or Set) when the sequence is small enough to materialize in memory. Collection extends Iterable (for for-each loops) and provides stream() for stream pipelines, so callers get both APIs.

// GOOD: Return Collection, which serves both iterator-based and stream-based callers
public List<Anagram> getAnagrams(String word) {
    return anagramMap.getOrDefault(word, Collections.emptyList());
}
 
// Caller using iteration:
for (Anagram a : dict.getAnagrams("silent")) { ... }
 
// Caller using streams:
dict.getAnagrams("silent").stream()
    .filter(a -> a.length() > 4)
    .forEach(System.out::println);

When the Sequence is Too Large to Materialize

For very large or infinite sequences, materializing into a Collection is impractical. In these cases, returning Stream is appropriate; just be aware that callers who want a for-each loop need an adapter such as stream::iterator.

// GOOD: Return Stream for large/computed/infinite sequences
public static Stream<BigInteger> primes() {
    return Stream.iterate(BigInteger.TWO, BigInteger::nextProbablePrime);
}
// Callers know this is a stream-only return: lazy, potentially infinite
 
// Providing both: two methods
public Stream<T> stream()   { return IntStream.range(0, size()).mapToObj(this::get); }
public Iterable<T> asIterable() { return this::iterator; }
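
There is also a middle ground for sequences that are large but have a compact representation: a custom Collection that computes elements on demand. The sketch below is adapted from the power-set example Bloch gives for this item; it builds on AbstractList, so only size(), get(), and an efficient contains() need to be written. The cap of 30 elements keeps size() within int range.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PowerSet {
    private PowerSet() { }

    // Returns all subsets of s as a Collection, without storing any of them
    static <E> Collection<Set<E>> of(Set<E> s) {
        List<E> src = new ArrayList<>(s);
        if (src.size() > 30) {
            throw new IllegalArgumentException("Set too big: " + src.size() + " elements");
        }
        return new AbstractList<Set<E>>() {
            @Override
            public int size() {
                return 1 << src.size(); // 2^n subsets
            }

            @Override
            public boolean contains(Object o) {
                return o instanceof Set && src.containsAll((Set<?>) o);
            }

            @Override
            public Set<E> get(int index) {
                // Bit i of index says whether src.get(i) belongs to this subset
                Set<E> result = new HashSet<>();
                for (int i = 0; index != 0; i++, index >>= 1) {
                    if ((index & 1) == 1) {
                        result.add(src.get(i));
                    }
                }
                return result;
            }
        };
    }
}
```

Callers can iterate PowerSet.of(Set.of("a", "b")) with a for-each loop or call stream() on it; memory use stays proportional to the input set rather than to the 2^n subsets.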

The Iterable Gap

There is an annoying gap: Stream implements AutoCloseable and has iterator(), but does NOT implement Iterable, so you cannot use a Stream directly in a for-each loop:

// DOES NOT COMPILE: Stream does not implement Iterable
for (ProcessHandle ph : ProcessHandle.allProcesses()) { ... } // ERROR
// ProcessHandle.allProcesses() returns Stream<ProcessHandle>
 
// Workaround 1: Cast (ugly but functional)
for (ProcessHandle ph : (Iterable<ProcessHandle>) ProcessHandle.allProcesses()::iterator) { ... }
 
// Workaround 2: Adapter method
public static <E> Iterable<E> iterableOf(Stream<E> stream) {
    return stream::iterator;
}
for (ProcessHandle ph : iterableOf(ProcessHandle.allProcesses())) { ... }
 
// Best solution: if you control the API, return Collection or List

Why This Works

Collection is the richest common abstraction for finite sequences in the JDK: it supports size(), contains(), stream(), iterator(), toArray(), and more. Returning Collection or a subtype maximizes compatibility. The only reason to return Stream is when the sequence is lazy, infinite, or too large to materialize, and even then, document this clearly.

When to Apply / When NOT to Apply

  • Return List or Set for most finite sequences, since they give callers the most flexibility
  • Return Collection (abstract) when you want to abstract over the specific collection type
  • Return Stream only for inherently lazy or unbounded sequences, or in strongly stream-centric APIs where callers will always chain stream operations
  • Do NOT return Iterable when a Collection is available: Iterable is a supertype of Collection that offers less, and callers lose stream() from the declared type (even though the underlying object probably supports it)

Java 17 Update

The Iterable gap still exists. Stream.toList() (Java 16+) makes converting a stream result to a list easier on the caller side, but doesn't change the advice for return types. The gap between Stream and Iterable is a known wart in the Java API that has not been resolved.


Item 48: Use Caution When Making Streams Parallel

The Problem

stream().parallel() looks like a free performance boost but is one of the most frequently misused features in Java. Parallel streams can cause incorrect results, liveness failures, and performance degradation if used incorrectly.

// BAD: Parallel stream on a poorly splittable source: no speedup, or worse
// LinkedList cannot be split efficiently
linkedList.parallelStream()
      .filter(...)
      .collect(...);
 
// BAD: Parallel stream over an inherently sequential source
// (Bloch's Mersenne primes example): each element of Stream.iterate depends on the
// previous one, and limit() is expensive on an ordered parallel stream
Stream.iterate(BigInteger.TWO, BigInteger::nextProbablePrime)
      .parallel()
      .map(p -> BigInteger.TWO.pow(p.intValueExact()).subtract(BigInteger.ONE))
      .filter(mersenne -> mersenne.isProbablePrime(50))
      .limit(20)
      .forEach(System.out::println);
 
// VERY BAD: Parallel stream with side effects (race condition)
List<Long> results = new ArrayList<>(); // not thread-safe!
LongStream.range(0, 1_000_000)
          .parallel()
          .filter(n -> isPrime(n))
          .forEach(results::add); // RACE CONDITION: data corruption

The Solution

Use parallel streams only when:

  1. The data source splits efficiently: ArrayList, arrays, IntStream.range(), and LongStream.range() split in O(1). LinkedList, Stream.iterate(), and BufferedReader.lines() split poorly.
  2. The operations are CPU-bound and non-trivial: trivial operations (just filtering or mapping with a cheap predicate) don't justify the parallelism overhead.
  3. The collection is large enough: a common rule of thumb is N * Q > 10,000, where N = number of elements and Q = operation cost per element (in basic operations). For small N, sequential is almost always faster.
  4. No shared mutable state: all operations must be stateless and non-interfering.
  5. Order doesn't matter, or you say so with unordered(): findAny is faster than findFirst in parallel; call unordered() on ordered streams if order doesn't matter.

// GOOD: Parallel stream where it makes sense
// Count prime numbers in a large range: CPU-bound, no shared state, range-based source
long primeCount = LongStream.rangeClosed(2, 2_000_000)
    .parallel()
    .filter(n -> isPrime(n))  // isPrime is CPU-bound, stateless
    .count();
// Speedup can approach the number of cores; measure on your own hardware before relying on it
 
// GOOD: Parallel stream on ArrayList with heavy processing
List<ProcessedItem> result = largeArrayList.parallelStream()
    .map(this::expensiveTransformation)  // each call takes ~1ms
    .collect(Collectors.toList());
// Efficient: ArrayList splits in O(1), operation is heavy enough to justify overhead
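
The racy forEach(results::add) from the problem section has a direct, safe replacement: let a collector own the accumulation. The sketch below uses hypothetical names (SafeParallelCollect, primesBelow) and a simple trial-division isPrime as a stand-in for whatever predicate the real code uses.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

final class SafeParallelCollect {
    // Simple trial division; stateless, so safe to call from many threads
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // No shared mutable list: each worker fills its own container
    // and the collector's combiner merges them in encounter order
    static List<Long> primesBelow(long limit) {
        return LongStream.range(0, limit)
            .parallel()
            .filter(SafeParallelCollect::isPrime)
            .boxed()
            .collect(Collectors.toList());
    }
}
```

Because the source is ordered and the collector preserves encounter order, primesBelow(20) returns 2, 3, 5, 7, 11, 13, 17, 19 in ascending order on every run, which the forEach version cannot promise even when it happens not to corrupt the list.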

Splittability of Common Data Sources

| Data Source | Splittability | Notes |
| --- | --- | --- |
| ArrayList | Excellent | O(1) split by index |
| int[], long[], double[] arrays | Excellent | O(1) split by index |
| IntStream.range(), LongStream.range() | Excellent | O(1) arithmetic split |
| HashSet, HashMap keySet/values | Good | Splits the internal bucket array by index range |
| TreeSet, TreeMap | Fair | Split by structural midpoints |
| LinkedList | Poor | O(n) to find midpoint |
| Stream.iterate() | Very Poor | Inherently sequential: each element is computed from the previous one |
| BufferedReader.lines() | Poor | Sequential file I/O |
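
Splittability can be inspected directly through the Spliterator each source exposes. The sketch below (class and method names are hypothetical) checks that an ArrayList reports an exact size and hands off half of its elements in one trySplit() call, while Stream.iterate() reports no known size at all:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Stream;

final class SplittabilityDemo {
    // Size of the prefix handed off by one trySplit() on a fresh ArrayList of n elements.
    static long arrayListPrefixSize(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        Spliterator<Integer> rest = list.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // index arithmetic only, no traversal
        return prefix == null ? 0 : prefix.estimateSize();
    }

    // SIZED and SUBSIZED together mean every split knows its exact size up front.
    static boolean arrayListIsSized() {
        return new ArrayList<>(List.of(1, 2, 3)).spliterator()
                .hasCharacteristics(Spliterator.SIZED | Spliterator.SUBSIZED);
    }

    // An infinite iterate() stream cannot report a size, so work cannot be divided evenly.
    static boolean iterateIsSized() {
        return Stream.iterate(0, i -> i + 1).spliterator()
                .hasCharacteristics(Spliterator.SIZED);
    }
}
```

A source that is SIZED and SUBSIZED lets the framework divide work evenly without touching the elements, which is what the "Excellent" rows in the table have in common.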

Parallel Streams and Collectors

// Collectors are generally safe with parallel streams: each subtask fills its own container, then a combiner merges them
Map<String, Long> freqParallel = words.parallelStream()
    .collect(Collectors.groupingByConcurrent(  // Use groupingByConcurrent for parallel
                 String::toLowerCase,
                 Collectors.counting()));
// groupingBy on a parallel stream builds one HashMap per subtask and merges them, which is costly for large maps
// groupingByConcurrent accumulates into a single shared ConcurrentHashMap instead (no merge step, but no ordering guarantees)
 
// NOTE: Collectors.toList() is safe in parallel (partial lists are merged by the combiner)
// Collectors.toMap() without a merge function throws IllegalStateException on duplicate keys,
// in sequential and parallel streams alike; pass a merge function if duplicates are possible
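
To make the collector behavior above concrete, here is a small sketch (class and method names are hypothetical). It shows groupingByConcurrent on a parallel stream, toMap with a merge function, and the duplicate-key failure of two-argument toMap, which occurs whether or not the stream is parallel:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ParallelCollectorsDemo {
    // All subtasks accumulate into one shared concurrent map; no per-thread maps to merge.
    static ConcurrentMap<String, Long> frequencies(List<String> words) {
        return words.parallelStream()
                .collect(Collectors.groupingByConcurrent(String::toLowerCase, Collectors.counting()));
    }

    // The third argument decides what to do with duplicate keys; here the first value wins.
    static Map<String, Integer> lengthsMerging(List<String> words) {
        return words.parallelStream()
                .collect(Collectors.toMap(Function.identity(), String::length, (a, b) -> a));
    }

    // Two-argument toMap rejects duplicate keys in sequential and parallel streams alike.
    static boolean throwsOnDuplicate(List<String> words, boolean parallel) {
        Stream<String> stream = parallel ? words.parallelStream() : words.stream();
        try {
            stream.collect(Collectors.toMap(Function.identity(), String::length));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```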

Performance Reality: Parallel Streams

Real-world benchmarks (JMH) show:

  • Sequential streams are often 10-40% slower than traditional for loops for simple operations (overhead from lambda dispatch and pipeline machinery); exact figures vary by JVM and workload
  • Parallel streams break even vs. sequential at roughly 10,000+ elements for cheap operations, and at 1,000+ elements for expensive operations
  • Parallel streams can easily be slower than sequential for: small collections, poorly splittable sources, operations with contention on shared state, and operations that collect into ordered structures
  • The best parallel speedup is achieved with: large ArrayList/arrays, CPU-bound stateless operations, and count()/sum()/reduce() terminal operations

Why This Works (When It Does)

Java’s parallel streams run on the fork/join framework (by default the common ForkJoinPool): they recursively split the data source, process the pieces as tasks on the pool’s worker threads, and combine the partial results. Speedup can approach linear in the number of CPU cores when the split is cheap, the operation is expensive enough to justify the coordination overhead, and there are no ordering or mutable-state constraints.
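
The split, process, and combine cycle described above can be sketched by hand with a RecursiveTask. This is only an illustration of the idea under stated assumptions, not the stream library's actual implementation; the class names and the THRESHOLD cutoff are hypothetical:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class ForkJoinSumSketch {
    // Hypothetical cutoff: below this size a plain loop is cheaper than splitting further.
    static final int THRESHOLD = 10_000;

    static final class SumTask extends RecursiveTask<Long> {
        private final long[] data;
        private final int from;
        private final int to;

        SumTask(long[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += data[i];
                }
                return sum;                         // leaf: sequential work
            }
            int mid = (from + to) >>> 1;            // cheap O(1) split by index
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                            // schedule the left half on the pool
            long rightSum = right.compute();        // process the right half in this thread
            return left.join() + rightSum;          // combine the partial results
        }
    }

    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }
}
```

The cheap index split is the reason arrays and ArrayList parallelize well: a LinkedList would have to walk half its nodes just to find mid.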

When to Apply / When NOT to Apply

  • Apply parallel streams when: data source is an ArrayList/array, N > 10,000, operation is CPU-bound and stateless, and results have been benchmarked (JMH)
  • Do NOT apply: on small collections, on LinkedList/Stream.iterate(), for I/O-bound operations, when there is shared mutable state, or when you haven’t measured the difference
  • Always benchmark with JMH before and after: the JVM’s adaptive JIT often closes the gap between sequential and parallel for small collections
  • Virtual Threads (Java 21): For I/O-bound work, virtual threads (Project Loom) are a better concurrency model than parallel streams. Parallel streams are for CPU-bound parallelism; virtual threads handle I/O-bound concurrency. Do not use parallel streams to compensate for blocking I/O; use virtual threads instead.

Java 17/21 Update: Virtual Threads

Java 21 finalized virtual threads (Project Loom, JEP 444), which significantly changes the concurrency landscape for I/O-bound work. Key guidance:

// WRONG mental model: Use parallel streams for I/O-bound work
// (waiting on DB, HTTP calls, file reads)
urls.parallelStream()
    .map(url -> fetchFromInternet(url)) // BAD: blocks a worker thread of the shared common ForkJoinPool
    .collect(toList());
 
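// CORRECT for I/O-bound work: virtual threads via ExecutorService (Java 21)
List<String> bodies = new ArrayList<>();
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    List<Future<String>> futures = urls.stream()
        .map(url -> executor.submit(() -> fetchFromInternet(url)))
        .collect(toList());        // submit every task before waiting on any result
    for (Future<String> future : futures) {
        bodies.add(future.get());  // may throw InterruptedException or ExecutionException
    }
}
// Virtual threads are cheap enough to dedicate one to each blocking task, even at very high task counts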
 
// Parallel streams remain the right choice for CPU-bound work
// (image processing, cryptography, large numerical computation)

The guidance from Item 48 is not weakened by virtual threads; it is clarified: parallel streams are for CPU-bound parallelism, virtual threads are for I/O-bound concurrency. Confusing the two is a common mistake in Java 21+.


Interview Questions & Exercises

Q1: What is the difference between map and flatMap in streams? When would you use each?

Context: Core stream API question, asked at all levels. Commonly followed by a hands-on coding exercise.

Answer:

  • map(Function<T, R>) applies a function to each element, producing exactly one output element per input element. The stream goes from Stream<T> to Stream<R>.
  • flatMap(Function<T, Stream<R>>) applies a function that produces a stream for each element, then flattens all those streams into a single stream. Used when the mapping produces a variable number of results per element.
// map: one-to-one
List<String> words = List.of("hello", "world");
List<Integer> lengths = words.stream()
    .map(String::length)   // "hello" -> 5, "world" -> 5
    .collect(toList());    // [5, 5]
 
// flatMap: one-to-many, then flatten
List<String> letters = words.stream()
    .flatMap(w -> Arrays.stream(w.split(""))) // "hello" -> ["h","e","l","l","o"]
    .collect(toList());                        // ["h","e","l","l","o","w","o","r","l","d"]
 
// Common use case: flattening nested collections
List<List<Integer>> nested = List.of(List.of(1, 2), List.of(3, 4));
List<Integer> flat = nested.stream()
    .flatMap(Collection::stream)
    .collect(toList()); // [1, 2, 3, 4]

Follow-up: “What does flatMap do with empty streams?” An empty stream simply contributes nothing to the output, effectively filtering out elements whose mapping produces an empty stream. This is useful for filtering and mapping simultaneously.
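
As a sketch of filtering and mapping in one step, the example below parses a list of tokens and silently drops the ones that are not integers. The tryParse helper and the class name are hypothetical; Optional.stream() requires Java 9+:

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

final class FlatMapFilterDemo {
    // Hypothetical helper: empty for anything that is not a valid int.
    static Optional<Integer> tryParse(String token) {
        try {
            return Optional.of(Integer.parseInt(token.trim()));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    static List<Integer> parseAll(List<String> tokens) {
        return tokens.stream()
                // A present Optional becomes a one-element stream, an empty one contributes nothing.
                .flatMap(token -> tryParse(token).stream())
                .collect(Collectors.toList());
    }
}
```

Calling parseAll on the tokens "1", "x", " 3 " and "" keeps only 1 and 3, in encounter order.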


Q2: Why can’t you use a non-effectively-final variable inside a lambda? How do you work around it?

Context: Common Java 8 interview question about lambda capture semantics.

Answer: Lambdas capture variables from the enclosing scope, but only if those variables are effectively final (never reassigned after their initial assignment). This constraint exists because:

  1. Lambdas may be executed on a different thread or at a later time, possibly after the enclosing method has returned; mutable captured variables would introduce race conditions and inconsistency
  2. A lambda captures a copy of each local variable’s value, not the variable itself (locals live in the stack frame of the enclosing method); if reassignment were allowed, the copy and the original could silently diverge

Workarounds:

// Problem:
int count = 0;
list.stream().forEach(s -> count++); // ERROR: count is not effectively final
 
// Workaround 1: Use an AtomicInteger for a mutable counter
AtomicInteger counter = new AtomicInteger(0);
list.stream().forEach(s -> counter.incrementAndGet()); // OK: the reference is effectively final
 
// Workaround 2: Use a terminal operation or collector (the correct approach for aggregation)
long total = list.stream().count(); // No mutation needed
 
// Workaround 3: One-element array (hack; avoid in production, not thread-safe)
int[] countArr = {0};
list.stream().forEach(s -> countArr[0]++); // The array reference is final; element is mutable

The correct answer for stream contexts is almost always Workaround 2: design the computation so no mutation is needed.

Follow-up: “What’s the difference between effectively final and explicitly final?” final is a keyword that prevents reassignment and is checked by the compiler; “effectively final” means the variable is never reassigned after initialization, so the compiler treats it as if it were final without requiring the keyword.
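
A short sketch of capture by value, assuming a hypothetical CaptureDemo class: the loop counter itself cannot be captured because i++ reassigns it, but a fresh local declared inside the loop body is effectively final on every iteration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

final class CaptureDemo {
    static List<IntSupplier> suppliers(int n) {
        List<IntSupplier> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            int captured = i;              // new, effectively final variable per iteration
            result.add(() -> captured);    // the lambda stores a copy of this value
            // result.add(() -> i);        // would not compile: i is reassigned by i++
        }
        return result;
    }
}
```

Each supplier keeps returning the value it captured, even though the loop and the method that created it have long finished.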


Q3: What is the difference between Stream.toList() and Collectors.toList()?

Context: Java 16+ question that tests awareness of recent API changes.

Answer:

  • Collectors.toList() (Java 8+): Returns a List that is mutable in practice (the current implementation is an ArrayList), although the specification makes no guarantee about type or mutability.
  • Stream.toList() (Java 16+): Returns an unmodifiable List. Calling add(), remove(), or set() on it throws UnsupportedOperationException.
  • Collectors.toUnmodifiableList() (Java 10+): Also returns an unmodifiable List, but it rejects null elements with a NullPointerException, whereas Stream.toList() permits them.
// Mutable in practice
List<String> mutable = words.stream().collect(Collectors.toList());
mutable.add("new element"); // OK
 
// Unmodifiable (Java 16+)
List<String> immutable = words.stream().toList();
immutable.add("new element"); // UnsupportedOperationException
 
// Choosing:
// - Use stream.toList() (Java 16+) when you don't need to mutate the result (default choice)
// - Use Collectors.toList(), or toCollection(ArrayList::new) for a guaranteed ArrayList, when you need a mutable list
// - Use List.copyOf(existingList) for a defensive unmodifiable copy of an existing list (rejects nulls)

Follow-up: “Does Stream.toList() return a null-permitting list?” Yes, it permits null elements (unlike List.of(), which does not). This is a subtle but important difference.
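
The differences between the three ways of producing a list can be checked with a small sketch like the one below (class and method names are hypothetical; Stream.toList() requires Java 16+). The mutability of Collectors.toList() shown here reflects current OpenJDK behavior rather than a documented guarantee:

```java
import java.util.List;
import java.util.stream.Collectors;

final class ToListDemo {
    static List<String> viaStreamToList(List<String> src) {
        return src.stream().toList();                      // unmodifiable, nulls allowed
    }

    static List<String> viaCollectorsToList(List<String> src) {
        return src.stream().collect(Collectors.toList());  // ArrayList in current OpenJDK
    }

    // True if the list refuses structural modification.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // True if Collectors.toUnmodifiableList() refuses a source containing null.
    static boolean unmodifiableCollectorRejectsNull(List<String> src) {
        try {
            src.stream().collect(Collectors.toUnmodifiableList());
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```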


Q4: What are the four types of method references? Give an example of each.

Context: Core Java 8 question tested at mid-to-senior level.

Answer:

| Type | Syntax | Example | Lambda Equivalent |
| --- | --- | --- | --- |
| Static method | ClassName::staticMethod | Integer::parseInt | s -> Integer.parseInt(s) |
| Bound instance method | instance::method | System.out::println | x -> System.out.println(x) |
| Unbound instance method | ClassName::instanceMethod | String::toUpperCase | s -> s.toUpperCase() |
| Constructor | ClassName::new | ArrayList::new | () -> new ArrayList<>() |
// Static
Function<String, Integer> parse = Integer::parseInt;
 
// Bound instance: System.out is the receiver, captured at creation time
Consumer<String> print = System.out::println;
 
// Unbound instance: the receiver is supplied as the first argument at call time
Function<String, String> upper = String::toUpperCase;
// Useful in sorted: words.stream().sorted(String::compareToIgnoreCase)
 
// Constructor
Supplier<List<String>> listFactory = ArrayList::new;
Function<Integer, int[]> arrayFactory = int[]::new;

Follow-up: “When would a lambda be preferable to a method reference?” When the method reference is longer than the equivalent lambda (for example, a method in the same class with a long class name), or when the lambda’s parameter names provide meaningful documentation (e.g., (dividend, divisor) -> Math.floorMod(dividend, divisor) says more about argument order than Math::floorMod).


Q5: When should you NOT use parallel streams?

Context: Senior/FAANG interview question that tests understanding of concurrency, JVM internals, and performance.

Answer: Avoid parallel streams when:

  1. Small collections (< ~10,000 elements for cheap operations): Parallelism overhead (thread coordination, splitting, combining) exceeds the speedup. Sequential streams or plain loops are faster.
  2. Poorly splittable sources: LinkedList, Stream.iterate(), BufferedReader.lines() cannot be split efficiently. The ForkJoin framework degrades to near-sequential performance.
  3. Shared mutable state: Any lambda that modifies a shared variable introduces race conditions. Example: adding to a non-thread-safe ArrayList in a parallel forEach.
  4. Ordered operations on unordered needs: findFirst, forEachOrdered, and limit on ordered parallel streams impose ordering constraints that can negate much of the parallelism benefit. Use findAny and unordered() where order doesn’t matter.
  5. I/O-bound operations: Parallel streams use the common ForkJoin pool (shared with other tasks). Blocking I/O in stream lambdas starves the pool. Use virtual threads (Java 21) instead.
  6. Short-circuiting operations on small streams: anyMatch, findFirst on small collections are faster sequentially because there’s less overhead.

Always benchmark with JMH before and after. The JVM’s JIT often surprises you.

Follow-up: “How would you make a parallel stream safe when using a collecting operation?” Use thread-safe collectors (Collectors.groupingByConcurrent for parallel grouping, or Collectors.toConcurrentMap), or use collectors that handle combining safely (most standard collectors do). The collect() terminal operation handles combining correctly by design.
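
A minimal sketch of the safe pattern, with the unsafe one left as a comment (class and method names are hypothetical). Because collect() gives each subtask its own container and merges them in encounter order, the parallel result matches the sequential one exactly:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class SafeParallelCollectDemo {
    static List<Integer> squaresParallel(int n) {
        // UNSAFE alternative (do not do this):
        //   List<Integer> out = new ArrayList<>();
        //   IntStream.range(0, n).parallel().forEach(i -> out.add(i * i));
        // ArrayList is not thread-safe, so elements can be lost or an exception thrown.
        return IntStream.range(0, n)
                .parallel()
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList()); // per-subtask lists, merged by the combiner
    }

    static List<Integer> squaresSequential(int n) {
        return IntStream.range(0, n)
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList());
    }
}
```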


Q6: What is the difference between reduce and collect in streams?

Context: Advanced stream API question.

Answer:

  • reduce is for immutable reduction: it combines elements by repeatedly applying a binary function to produce a single value. The result type is the same as (or related to) the element type. Best for: sum, product, max/min, concatenation.
  • collect is for mutable reduction: it mutates a result container (like a List, Map, or StringBuilder) by incorporating each element. Best for: building collections, maps, or any complex aggregation structure.
// reduce: immutable, builds a result by repeated combination
Optional<String> viaReduce = words.stream()
    .reduce((a, b) -> a + ", " + b); // String concatenation (use joining for efficiency)
// O(n²) for strings: each concat creates a new string!
 
// collect: mutable, accumulates into a container
String viaCollect = words.stream()
    .collect(Collectors.joining(", ")); // Uses a StringJoiner internally, O(n)
 
// reduce is appropriate for:
int sum = numbers.stream().reduce(0, Integer::sum);       // int accumulation
Optional<Integer> max = numbers.stream().reduce(Integer::max); // finding max
 
// collect is appropriate for:
List<String> filtered = words.stream()
    .filter(s -> s.length() > 3)
    .collect(Collectors.toList());
Map<Integer, List<String>> byLength = words.stream()
    .collect(Collectors.groupingBy(String::length));

Key insight: reduce requires an associative, stateless combiner. collect uses a mutable container with accumulator and combiner functions, a design that lets parallel streams work correctly.

Follow-up: “Is reduce parallelizable?” Yes, when the combiner is associative and the identity element is correct. reduce(0, Integer::sum) parallelizes correctly. String concatenation with reduce is associative but O(n²); use the joining collector instead.
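
The accumulator and combiner roles are easiest to see in the three-argument forms of both operations. In this sketch (class and method names are hypothetical), both pipelines run in parallel and still produce deterministic results because the combining functions are associative:

```java
import java.util.List;

final class ReduceVsCollectDemo {
    // Immutable reduction: identity, accumulator (Integer + String -> Integer), combiner.
    static int totalLength(List<String> words) {
        return words.parallelStream()
                .reduce(0, (acc, word) -> acc + word.length(), Integer::sum);
    }

    // Mutable reduction: supplier creates a container per subtask,
    // the accumulator appends one element, the combiner appends one container to another.
    static String concat(List<String> words) {
        return words.parallelStream()
                .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append)
                .toString();
    }
}
```

The combiner in reduce merges two partial values; the combiner in collect merges two partial containers. That is the whole difference in a parallel run.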


Q7: Explain Collectors.groupingBy with a downstream collector. Give a real-world example.

Context: Common in senior interviews, where it tests deep Stream API knowledge.

Answer: groupingBy(classifier, downstream) groups elements by a key function, then applies a downstream collector to each group. The downstream can be any collector: counting(), toList(), mapping(), summarizingInt(), etc.

// Group orders by customer, then count orders per customer
Map<Customer, Long> ordersPerCustomer = orders.stream()
    .collect(Collectors.groupingBy(Order::getCustomer, Collectors.counting()));
 
// Group by category, then collect product names only (mapping downstream)
Map<Category, List<String>> namesByCategory = products.stream()
    .collect(Collectors.groupingBy(
        Product::getCategory,
        Collectors.mapping(Product::getName, Collectors.toList())
    ));
 
// Group by department, then get average salary per department
Map<Department, Double> avgSalaryByDept = employees.stream()
    .collect(Collectors.groupingBy(
        Employee::getDepartment,
        Collectors.averagingDouble(Employee::getSalary)
    ));
 
// Multi-level grouping: by department, then by seniority
Map<Department, Map<Seniority, List<Employee>>> nested = employees.stream()
    .collect(Collectors.groupingBy(
        Employee::getDepartment,
        Collectors.groupingBy(Employee::getSeniority)
    ));
 
// With EnumMap for enum keys (efficient)
Map<Department, List<Employee>> byDept = employees.stream()
    .collect(Collectors.groupingBy(
        Employee::getDepartment,
        () -> new EnumMap<>(Department.class),
        Collectors.toList()
    ));

Follow-up: “How does partitioningBy differ from groupingBy?” partitioningBy(Predicate) is a specialized form of groupingBy that always produces exactly two groups: true and false. It returns Map<Boolean, List<T>> and is slightly more efficient than groupingBy with a boolean classifier because it knows the key set in advance.
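
One practical consequence of the fixed key set is that the map returned by partitioningBy always contains both keys, even when one group is empty, whereas groupingBy only creates keys it actually encounters. A small sketch (class and method names are hypothetical):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class PartitionVsGroupDemo {
    // Always yields entries for both true and false.
    static Map<Boolean, List<Integer>> partitionEvens(List<Integer> numbers) {
        return numbers.stream()
                .collect(Collectors.partitioningBy(n -> n % 2 == 0));
    }

    // Only yields entries for classifier values that occur in the input.
    static Map<Boolean, List<Integer>> groupEvens(List<Integer> numbers) {
        return numbers.stream()
                .collect(Collectors.groupingBy(n -> n % 2 == 0));
    }
}
```

With only odd inputs, partitionEvens(...).get(true) is an empty list, while groupEvens(...).get(true) is null, so callers of the groupingBy version need a null check or getOrDefault.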


Q8: What is Collectors.teeing() and when is it useful?

Context: Advanced Java 12+ question. Tests knowledge of newer API additions.

Answer: Collectors.teeing(downstream1, downstream2, merger) (Java 12+) applies two collectors to the same stream in a single pass, then combines their results using a merger function. This is useful when you need two different aggregations over the same data without iterating twice or materializing the stream.

// Compute min and max simultaneously in a single pass
record MinMax(Optional<Integer> min, Optional<Integer> max) {}
 
MinMax result = numbers.stream()
    .collect(Collectors.teeing(
        Collectors.minBy(Comparator.naturalOrder()),
        Collectors.maxBy(Comparator.naturalOrder()),
        MinMax::new
    ));
 
// Compute count and average simultaneously
record Stats(long count, double average) {}
Stats stats = transactions.stream()
    .collect(Collectors.teeing(
        Collectors.counting(),
        Collectors.averagingDouble(Transaction::getAmount),
        Stats::new
    ));
 
// Split into two filtered lists in one pass
record Partition<T>(List<T> positives, List<T> negatives) {}
Partition<Integer> parts = numbers.stream()
    .collect(Collectors.teeing(
        Collectors.filtering(n -> n >= 0, Collectors.toList()),
        Collectors.filtering(n -> n < 0,  Collectors.toList()),
        Partition::new
    ));

teeing is most valuable when the stream comes from an I/O source (and can only be traversed once) or when creating two separate streams would be more expensive. Use sparingly: nested collectors can reduce readability.


Key Takeaways

  • Lambdas over anonymous classes: Use lambdas for any functional interface. Keep lambdas short (≤ 3 lines); extract longer logic to named methods and use method references. Anonymous classes remain appropriate for multi-method interfaces, abstract classes, and self-referential code.
  • Method references over lambdas: When a lambda does nothing but delegate to an existing method, replace it with a method reference. The four types are static, bound instance, unbound instance, and constructor. Prefer lambdas when parameter names add documentation.
  • Standard functional interfaces: Use Predicate, Function, Supplier, Consumer, UnaryOperator, BinaryOperator (and their primitive specializations) before defining custom interfaces. Use @FunctionalInterface on any custom functional interface you do define.
  • Streams judiciously: Streams are not universally better than loops. Use them for transformation pipelines, filtering, aggregation, and searching. Use loops for adjacent-element access, multiple exit points, or when a loop is plainly more readable.
  • Side-effect-free functions: Use collectors for aggregation; use forEach only for reporting. Collectors.groupingBy, toMap, joining, counting, and teeing (Java 12+) cover almost every aggregation need.
  • Stream.toList() (Java 16+): Returns an unmodifiable list. Prefer it over Collectors.toList() when you don’t need a mutable result. Collectors.toList() returns a mutable ArrayList in practice, though its specification does not guarantee it.
  • Return Collection not Stream: APIs returning sequences should return List, Set, or Collection to serve both for-each and stream callers. Return Stream only for lazy or unbounded sequences.
  • Parallel streams with caution: Only use parallelStream() when: data source is ArrayList/array, N > ~10,000, operation is CPU-bound and stateless, and you have benchmarked the improvement. Never use parallel streams for I/O-bound work; use virtual threads (Java 21) instead.
  • EnumMap + streams: Use () -> new EnumMap<>(MyEnum.class) as the map supplier in groupingBy when keys are enum constants for maximum efficiency. See ch05-enums-and-annotations.
  • Virtual threads vs. parallel streams: Parallel streams = CPU-bound parallelism via ForkJoin. Virtual threads (Java 21) = I/O-bound concurrency. These are complementary, not interchangeable.

Last Updated: 2026-05-10