Java with Streams

Streams were one of the major features added in Java 8. A stream is a lazy, sequential (by default) data pipeline built from functional blocks. A stream isn't a data structure, and it doesn't change the elements of its source directly.

A stream by itself is just a dumb pipe: it provides the scaffolding, and the operations we plug into it are what make it a smart pipe.

Overview

Why are streams valuable to learn about?

The basic concept behind streams is very simple: we have a data source, perform zero or more intermediate operations, and then run a terminal operation to get a result.

In more detail, a stream pipeline can be separated into three main steps:

  • Obtaining the stream (source)
  • Doing tasks (intermediate operations)
  • Getting a result (terminal operation)

Obtaining the stream

The first step is obtaining a stream. Many data structures of the JDK already support providing a stream, for example via Collection#stream() or Arrays.stream(...).

Alternatively, we can create one with java.util.stream.Stream#of(T... values) from our own values. The class java.util.stream.StreamSupport also provides multiple static methods for creating streams.
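As a minimal sketch of the options above, the following obtains a stream from a collection, from an array, and from individual values. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ObtainingStreams {

    // From a Collection: every Collection offers stream().
    static List<String> fromCollection() {
        List<String> source = Arrays.asList("a", "b", "c");
        return source.stream().collect(Collectors.toList());
    }

    // From an array via Arrays.stream(...).
    static List<String> fromArray() {
        String[] source = {"a", "b", "c"};
        return Arrays.stream(source).collect(Collectors.toList());
    }

    // From individual values via Stream.of(...).
    static List<String> fromValues() {
        return Stream.of("a", "b", "c").collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fromCollection());
        System.out.println(fromArray());
        System.out.println(fromValues());
    }
}
```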

Doing tasks

The java.util.stream.Stream interface provides a lot of different operations.
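The headings that follow group these operations by purpose. As a rough sketch, here is one pipeline that touches each group. The sample data and the class name are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class IntermediateOps {

    static List<String> run(List<String> fruits) {
        return fruits.stream()
                .filter(s -> s.length() > 4)   // filtering: keep longer names
                .distinct()                    // filtering: drop duplicates
                .map(String::toUpperCase)      // mapping: transform each element
                .sorted()                      // sorting: natural order
                .skip(1)                       // sizing: drop the first element
                .limit(2)                      // sizing: keep at most two elements
                .peek(s -> System.out.println("passed: " + s)) // debugging
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> fruits =
                Arrays.asList("pear", "banana", "apple", "banana", "orange", "cherry");
        System.out.println(run(fruits)); // [BANANA, CHERRY]
    }
}
```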

Filtering

Mapping

Sizing/sorting

Debugging

Getting a result

Performing operations on the stream elements is great. But at some point, we want to get a result back from our data pipeline. Terminal operations start the lazy pipeline, do the actual work, and don't return a new stream.
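To make the groups of terminal operations listed below more concrete, here is a small sketch with one example per group. The numbers and names are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class TerminalOps {

    static final List<Integer> NUMBERS = Arrays.asList(3, 1, 4, 1, 5, 9);

    // Aggregate to a new collection.
    static List<Integer> greaterThanThree() {
        return NUMBERS.stream().filter(n -> n > 3).collect(Collectors.toList());
    }

    // Reduce to a single value.
    static int sum() {
        return NUMBERS.stream().reduce(0, Integer::sum);
    }

    // Calculations.
    static long count() {
        return NUMBERS.stream().count();
    }

    static Optional<Integer> max() {
        return NUMBERS.stream().max(Integer::compare);
    }

    // Matching.
    static boolean anyAboveEight() {
        return NUMBERS.stream().anyMatch(n -> n > 8);
    }

    // Finding.
    static Optional<Integer> first() {
        return NUMBERS.stream().findFirst();
    }

    public static void main(String[] args) {
        System.out.println(greaterThanThree()); // [4, 5, 9]
        System.out.println(sum());              // 23
        System.out.println(count());            // 6
        System.out.println(max());              // Optional[9]
        System.out.println(anyAboveEight());    // true
        System.out.println(first());            // Optional[3]
        // Consuming.
        NUMBERS.stream().forEach(System.out::println);
    }
}
```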

Aggregate to new collection/array

Reduce to a single value

Calculations

Matching

Finding

Consuming

Stream Characteristics

Streams aren’t just glorified loops. Sure, we can express any stream with a loop and most loops with streams. But this doesn’t mean they’re equal or one is always better than the other.

Laziness

The most significant advantage of streams over loops is laziness. Until we call a terminal operation on a stream, no work is done. We can build up our processing pipeline over time and only run it at the exact time we want it to.

And not just the building of the pipeline is lazy. Intermediate operations are lazy, too. Elements are only consumed as they're needed.
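A small sketch can make the laziness visible. The counter below exists only for instrumentation and is an assumption of this example, not something a real pipeline should do.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class LazinessDemo {

    // Counts how often the mapping function actually ran.
    static final AtomicInteger MAPPED = new AtomicInteger();

    // Returns {calls after building the pipeline, calls after findFirst()}.
    static int[] mappedBeforeAndAfter() {
        MAPPED.set(0);
        Stream<String> pipeline = Stream.of("a", "b", "c", "d")
                .map(s -> {
                    MAPPED.incrementAndGet();
                    return s.toUpperCase();
                });
        int before = MAPPED.get();                     // no terminal operation yet
        Optional<String> first = pipeline.findFirst(); // starts the pipeline
        System.out.println("first element: " + first.orElse("none"));
        return new int[] {before, MAPPED.get()};
    }

    public static void main(String[] args) {
        int[] counts = mappedBeforeAndAfter();
        System.out.println("mapped before terminal op: " + counts[0]);
        System.out.println("mapped after findFirst():  " + counts[1]);
    }
}
```

Building the pipeline runs nothing, and findFirst() pulls only as many elements through map as it needs.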

Optimizations included

Thanks to being (mostly) stateless, streams can optimize themselves quite efficiently. Stateless intermediate operations can be fused together to a combined consumer. Redundant operations might be removed. And some pipeline paths might be short-circuited.

The JVM will optimize traditional loops, too. But streams are an easier target due to their multi-operation design and their mostly stateless nature.

Stateless

One of the most important characteristics of functional programming is immutable state. Most intermediate operations are stateless, except for distinct(), sorted(...), limit(...), and skip(...).

Even though Java allows stateful lambdas, we should always strive to keep the lambdas we pass to a stream stateless. Any state can have a severe impact on safety and performance and might introduce unintended side effects.
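As a hedged illustration, here is the same transformation written once with a lambda that mutates outside state and once without. Both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StatelessLambdas {

    // Discouraged: the pipeline writes into a list that lives outside of it.
    // It works sequentially, but ArrayList isn't thread-safe, so this can
    // break as soon as the stream is made parallel.
    static List<Integer> doubledWithSideEffect(List<Integer> input) {
        List<Integer> result = new ArrayList<>();
        input.stream().map(n -> n * 2).forEach(result::add);
        return result;
    }

    // Preferred: no shared state, the stream builds the result itself.
    static List<Integer> doubledStateless(List<Integer> input) {
        return input.stream().map(n -> n * 2).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3);
        System.out.println(doubledWithSideEffect(input));
        System.out.println(doubledStateless(input));
    }
}
```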

Less boilerplate

Streams are often easier to read and comprehend. Compare a simple processing task written as a traditional for loop with its stream equivalent.

The stream version is a shorter, crisper code block that is more readable, with no loop boilerplate and no extra temporary variables, all packaged in a fluent API. This way, our code reflects the what, and we no longer need to care about the actual iteration process, the how.
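The following sketch shows such a comparison for an assumed task: take the names longer than three characters, upper-case them, and sort them. The task and the names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class LoopVersusStream {

    // Traditional loop: temporary list, explicit iteration, explicit sort.
    static List<String> withLoop(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.length() > 3) {
                result.add(name.toUpperCase());
            }
        }
        Collections.sort(result);
        return result;
    }

    // Stream: the same steps, stated as what should happen.
    static List<String> withStream(List<String> names) {
        return names.stream()
                .filter(name -> name.length() > 3)
                .map(String::toUpperCase)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Tom", "Zara", "Alice", "Bob", "Mike");
        System.out.println(withLoop(names));   // [ALICE, MIKE, ZARA]
        System.out.println(withStream(names)); // [ALICE, MIKE, ZARA]
    }
}
```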

Non Reusable

Being just a dumb pipeline, a stream can't be reused: once a terminal operation has run, the stream is consumed. But streams don't change the original data source, so we can always create another stream or collection from the source.
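A minimal sketch of what happens on reuse, and of one common way around it (a Supplier that creates a fresh stream each time). The names are hypothetical.

```java
import java.util.function.Supplier;
import java.util.stream.Stream;

class StreamReuse {

    // A second terminal operation on the same stream throws IllegalStateException.
    static boolean secondUseFails() {
        Stream<String> stream = Stream.of("a", "b");
        stream.forEach(s -> { });
        try {
            stream.forEach(s -> { });
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // A Supplier hands out a new stream over the same values on every call.
    static long countTwice() {
        Supplier<Stream<String>> fresh = () -> Stream.of("a", "b");
        return fresh.get().count() + fresh.get().count();
    }

    public static void main(String[] args) {
        System.out.println("second use fails: " + secondUseFails());
        System.out.println("elements counted over two streams: " + countTwice());
    }
}
```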

Easy parallelization

In the software ecosystem, concurrency is hard to do right and easy to do wrong. Streams support parallel execution (backed by the fork/join framework) and remove much of the overhead we would have if we did it ourselves. A stream can be parallelized by calling the intermediate operation parallel() and turned back to sequential by calling sequential(). But not every stream pipeline is a good match for parallel processing.

The source must be big enough and the operations costly enough to justify the overhead of multiple threads. Context switches are expensive. We shouldn’t parallelize a stream just because we can.
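As a sketch, the same pipeline can be run sequentially or in parallel by toggling one call. The method name and the workload are assumptions, and for a range this small the parallel version is unlikely to pay off.

```java
import java.util.stream.LongStream;

class ParallelSum {

    // Sums the squares of 1..n, optionally in parallel.
    static long sumOfSquares(long n, boolean parallel) {
        LongStream numbers = LongStream.rangeClosed(1, n);
        if (parallel) {
            numbers = numbers.parallel();
        }
        return numbers.map(x -> x * x).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000, false)); // 333833500
        System.out.println(sumOfSquares(1_000, true));  // 333833500
    }
}
```

The operation is stateless and associative, which is what makes it safe to split across threads.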

Primitive handling

Just like with functional interfaces, streams have specialized classes for dealing with primitives to avoid autoboxing/unboxing: IntStream, LongStream, and DoubleStream.
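A short sketch of the primitive variants in use. The helper names are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PrimitiveStreams {

    // mapToInt(...) switches from Stream<String> to IntStream, so sum() needs no boxing.
    static int totalLength(List<String> words) {
        return words.stream().mapToInt(String::length).sum();
    }

    // IntStream offers numeric helpers such as average().
    static double averageOfOneToFive() {
        return IntStream.rangeClosed(1, 5).average().orElse(0);
    }

    // boxed() converts back to Stream<Integer> when objects are needed.
    static List<Integer> firstThree() {
        return IntStream.range(0, 3).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(totalLength(Arrays.asList("a", "bb", "ccc"))); // 6
        System.out.println(averageOfOneToFive());                         // 3.0
        System.out.println(firstThree());                                 // [0, 1, 2]
    }
}
```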

Best Practices and Caveats

Small operations

Lambdas can be simple one-liners or huge code blocks wrapped in curly braces. To retain simplicity and conciseness, we should restrict ourselves to these two use cases for operations:

  • One-line expressions, e.g., .filter(employee -> employee.getAge() > 18)
  • Method references, e.g., .filter(this::myFilterCriteria)

By using method references, we can have more complex operations, reuse operational logic, and even unit test it more easily.
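A sketch of the idea, assuming a hypothetical Employee type and a named criterion isAdult. Because the criterion is an ordinary method, it can be tested on its own, without any stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SmallOperations {

    // Hypothetical domain type for illustration.
    static final class Employee {
        private final String name;
        private final int age;

        Employee(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // Reusable, separately testable filter logic.
    static boolean isAdult(Employee employee) {
        return employee.getAge() > 18;
    }

    static List<String> adultNames(List<Employee> employees) {
        return employees.stream()
                .filter(SmallOperations::isAdult)
                .map(Employee::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Employee> staff = Arrays.asList(
                new Employee("Ann", 30), new Employee("Ben", 17));
        System.out.println(adultNames(staff)); // [Ann]
    }
}
```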

Cast and type checks

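Don't forget that Class<T> is an object, too, and provides helpful methods such as isInstance(Object) and cast(Object), which can be used as method references for type checks and casts in a pipeline.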
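A minimal sketch, using String as the example target type and a made-up method name:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TypeFiltering {

    // Keeps only the String elements of a mixed list, with no manual instanceof or cast.
    static List<String> onlyStrings(List<Object> mixed) {
        return mixed.stream()
                .filter(String.class::isInstance) // type check
                .map(String.class::cast)          // cast
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Object> mixed = Arrays.<Object>asList("a", 1, "b", 2.0);
        System.out.println(onlyStrings(mixed)); // [a, b]
    }
}
```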

Method References

Method references affect more than simplicity and conciseness; they also have implications at the bytecode level.

A lambda body is compiled into a synthetic method that is invoked through invokedynamic, so a lambda that only forwards to an existing method creates more code than needed. The bytecode of a lambda and a method reference differs slightly, with the method reference generating less. Also, by using method references, we lose the visual noise of the lambda:
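For instance, both of the following hypothetical methods produce the same result, but the second one reads with less noise:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class LambdaVersusMethodReference {

    // Lambdas that only forward to an existing method.
    static List<Integer> withLambdas(List<String> input) {
        return input.stream()
                .map(s -> s.trim())
                .map(s -> s.length())
                .collect(Collectors.toList());
    }

    // The same pipeline with method references.
    static List<Integer> withMethodReferences(List<String> input) {
        return input.stream()
                .map(String::trim)
                .map(String::length)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(" a ", "bb");
        System.out.println(withLambdas(input));          // [1, 2]
        System.out.println(withMethodReferences(input)); // [1, 2]
    }
}
```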

Return or Null Check

Intermediate operations should either return a non-null value, or the next operation should handle null.

Adding a simple .filter(Objects::nonNull) might be enough to prevent a NullPointerException.
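A sketch under the assumption of a lookup that returns null for unknown keys. The CAPITALS map and the method name are invented for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

class NullSafePipeline {

    // Hypothetical lookup table; Map#get returns null for unknown countries.
    static final Map<String, String> CAPITALS = new HashMap<>();
    static {
        CAPITALS.put("France", "Paris");
        CAPITALS.put("Japan", "Tokyo");
    }

    static List<String> capitals(List<String> countries) {
        return countries.stream()
                .map(CAPITALS::get)        // may produce null
                .filter(Objects::nonNull)  // drop nulls before the next step
                .map(String::toUpperCase)  // safe: no null reaches this point
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(capitals(Arrays.asList("France", "Atlantis", "Japan")));
        // [PARIS, TOKYO]
    }
}
```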

Code format

By putting each pipeline step into a new line, we can improve readability:
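As an illustration, here is the same made-up pipeline written on a single line and with one step per line:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PipelineFormatting {

    // Everything on one line: hard to scan.
    static List<String> crammed(List<String> words) {
        return words.stream().filter(w -> !w.isEmpty()).map(String::toLowerCase).distinct().sorted().collect(Collectors.toList());
    }

    // One step per line: each operation stands out.
    static List<String> oneStepPerLine(List<String> words) {
        return words.stream()
                .filter(w -> !w.isEmpty())
                .map(String::toLowerCase)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("b", "A", "", "a", "B");
        System.out.println(crammed(words));        // [a, b]
        System.out.println(oneStepPerLine(words)); // [a, b]
    }
}
```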

It also allows us to set breakpoints at the correct pipeline step.

Not all iteration is a stream

Often a traditional loop might be a better choice than using forEach(...) on a stream. As written before, we shouldn't replace every loop. Just because code iterates doesn't make it a valid target for stream-based processing.

Effectively final

We can access variables declared outside of intermediate operations, as long as they are in scope and effectively final. This means a variable must not change after initialization, but it doesn't need an explicit final modifier.

Sometimes this restriction seems cumbersome, and we can work around it by changing the state of the object that an effectively final variable refers to, as long as the variable itself is never reassigned. But doing so undermines the concept of immutability and introduces unintended side effects.
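A sketch of both situations. The method names are hypothetical, and the array-based counter is shown only as the discouraged workaround.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class EffectivelyFinalDemo {

    // minLength is never reassigned, so it is effectively final and the lambda may use it.
    // Adding a line like "minLength = 0;" anywhere in this method would stop it compiling.
    static List<String> longerThan(List<String> words, int minLength) {
        return words.stream()
                .filter(w -> w.length() > minLength)
                .collect(Collectors.toList());
    }

    // Discouraged workaround: the variable 'counter' is never reassigned,
    // but the array it points to is mutated from inside the lambda.
    static int countWithMutableHolder(List<String> words) {
        int[] counter = {0};
        words.forEach(w -> counter[0]++);
        return counter[0];
    }

    // Preferred: let the stream compute the value.
    static long count(List<String> words) {
        return words.stream().count();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bbb", "cc");
        System.out.println(longerThan(words, 1));          // [bbb, cc]
        System.out.println(countWithMutableHolder(words)); // 3
        System.out.println(count(words));                  // 3
    }
}
```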

Checked exceptions

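Streams and exceptions are a subject that warrants its own article(s), but I'll try to summarize it. Calling a method that throws a checked exception directly inside a stream operation, for example .map(Class::forName), won't compile.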

The method Class.forName(String className) throws a checked exception, ClassNotFoundException, and requires a try-catch, which makes the code much harder to read:
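A sketch of the inline try-catch variant. Returning null for unknown classes and filtering afterwards is an assumption of this example, as is the explicit <Class<?>> type argument, which is only there to keep the element type simple.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class InlineTryCatch {

    static List<Class<?>> load(List<String> classNames) {
        return classNames.stream()
                // .map(Class::forName) would not compile here:
                // ClassNotFoundException is checked and Function#apply declares none.
                .<Class<?>>map(className -> {
                    try {
                        return Class.forName(className);
                    } catch (ClassNotFoundException e) {
                        return null; // assumption: unknown classes are skipped
                    }
                })
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(load(Arrays.asList("java.lang.String", "no.such.Type")));
    }
}
```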

By refactoring the className conversion to a dedicated method, we can retain the simplicity of the stream:
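One way this refactoring might look, with a hypothetical helper toClassOrNull that hides the try-catch and returns null when the class can't be found:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class ClassLoading {

    // Hypothetical helper: handles the checked exception in one place.
    static Class<?> toClassOrNull(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    static List<Class<?>> load(List<String> classNames) {
        return classNames.stream()
                .<Class<?>>map(ClassLoading::toClassOrNull)
                .filter(Objects::nonNull) // the helper may return null
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(load(Arrays.asList("java.lang.String", "no.such.Type")));
    }
}
```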

We still need to handle possible null values, but the checked exception isn’t visible in the stream code.

Another solution for dealing with checked exceptions is wrapping the intermediate operations in consumers, functions, etc. that catch the checked exceptions and re-throw them as unchecked. But, in my opinion, that's more like an ugly hack than a valid solution. If an operation throws a checked exception, we should refactor it to a method and handle its exception accordingly.
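For completeness, this is roughly what such a wrapper tends to look like. ThrowingFunction and unchecked(...) are names invented for this sketch; they are not part of the JDK.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class UncheckedWrapper {

    // Like java.util.function.Function, but allowed to throw checked exceptions.
    @FunctionalInterface
    interface ThrowingFunction<T, R> {
        R apply(T t) throws Exception;
    }

    // Adapts a ThrowingFunction to a Function by re-throwing checked exceptions as unchecked.
    static <T, R> Function<T, R> unchecked(ThrowingFunction<T, R> fn) {
        return t -> {
            try {
                return fn.apply(t);
            } catch (RuntimeException e) {
                throw e;
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        };
    }

    static List<String> simpleNames(List<String> classNames) {
        return classNames.stream()
                .map(unchecked(name -> Class.forName(name).getSimpleName()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(simpleNames(Arrays.asList("java.lang.String"))); // [String]
    }
}
```

The exception doesn't go away with this approach; it only surfaces later as a RuntimeException, which is part of why a dedicated method that handles it is often the cleaner choice.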

Unchecked exceptions

Even after we handle all checked exceptions, our streams can still blow up, thanks to unchecked exceptions.

There’s not a one-size-fits-all solution for preventing exceptions, just as there’s not in any other code. Developer discipline can greatly reduce the risk. Use small, well-defined operations with enough checks and validation. This way we can at least minimize the risk.

Debugging

Streams can be debugged like any other fluent call. If we have a single operation per line, a breakpoint will stop accordingly. But the synthetic methods generated for lambdas, together with the internal pipeline frames, can result in a really confusing stack trace.

During development, we could also utilize the intermediate operation peek(Consumer<? super T> action) to intercept an element. The operation is mainly for debugging purposes and shouldn't be used in the stream's final form. IntelliJ also provides a visual debugger.
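A small sketch of peek(...) used as a temporary tracing aid. The pipeline itself is an arbitrary example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PeekDebugging {

    static List<Integer> evenSquares(List<Integer> numbers) {
        return numbers.stream()
                .peek(n -> System.out.println("source:   " + n))
                .filter(n -> n % 2 == 0)
                .peek(n -> System.out.println("filtered: " + n))
                .map(n -> n * n)
                .peek(n -> System.out.println("mapped:   " + n))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(evenSquares(Arrays.asList(1, 2, 3, 4))); // [4, 16]
    }
}
```

The trace shows each element travelling through the whole pipeline before the next one starts, which is useful to see once and then remove.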

Order of operations

Think of a simple stream:
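A sketch of such a stream, instrumented with counters so the number of calls can be checked. The five sample values and the counters are assumptions of this example, and the number of comparator calls made by sorted depends on the sort implementation and the input order.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class MapSortFilter {

    // Call counters, used only for instrumentation.
    static final AtomicInteger MAPS = new AtomicInteger();
    static final AtomicInteger COMPARES = new AtomicInteger();
    static final AtomicInteger FILTERS = new AtomicInteger();
    static final AtomicInteger FOR_EACHES = new AtomicInteger();

    static void run() {
        MAPS.set(0);
        COMPARES.set(0);
        FILTERS.set(0);
        FOR_EACHES.set(0);
        Stream.of("ananas", "oranges", "apple", "pear", "banana")
                .map(s -> { MAPS.incrementAndGet(); return s.toUpperCase(); })
                .sorted((a, b) -> { COMPARES.incrementAndGet(); return a.compareTo(b); })
                .filter(s -> { FILTERS.incrementAndGet(); return s.startsWith("A"); })
                .forEach(s -> { FOR_EACHES.incrementAndGet(); System.out.println(s); });
    }

    public static void main(String[] args) {
        run();
        System.out.println("map=" + MAPS + " compare=" + COMPARES
                + " filter=" + FILTERS + " forEach=" + FOR_EACHES);
    }
}
```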

This code will run map five times, sorted eight times, filter five times, and forEach two times. This means a total of 20 operations to output two values.

If we reorder the pipeline parts, we can reduce the total operations count significantly without changing the actual outcome:
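A sketch of a reordered pipeline with the filter moved to the front, again with assumed sample values and counters added only for instrumentation:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class FilterMapSort {

    // Call counters, used only for instrumentation.
    static final AtomicInteger FILTERS = new AtomicInteger();
    static final AtomicInteger MAPS = new AtomicInteger();
    static final AtomicInteger COMPARES = new AtomicInteger();
    static final AtomicInteger FOR_EACHES = new AtomicInteger();

    static void run() {
        FILTERS.set(0);
        MAPS.set(0);
        COMPARES.set(0);
        FOR_EACHES.set(0);
        Stream.of("ananas", "oranges", "apple", "pear", "banana")
                .filter(s -> { FILTERS.incrementAndGet(); return s.startsWith("a"); })
                .map(s -> { MAPS.incrementAndGet(); return s.toUpperCase(); })
                .sorted((a, b) -> { COMPARES.incrementAndGet(); return a.compareTo(b); })
                .forEach(s -> { FOR_EACHES.incrementAndGet(); System.out.println(s); });
    }

    public static void main(String[] args) {
        run();
        System.out.println("filter=" + FILTERS + " map=" + MAPS
                + " compare=" + COMPARES + " forEach=" + FOR_EACHES);
    }
}
```

Only the two elements that pass the filter ever reach map, sorted, and forEach.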

By filtering first, we’re going to restrict the other operations to a minimum: filter five times, map two times, sort one time, and forEach two times, which saves us 10 operations in total.

Summary

In this blog we have gone through the features of the Java 8 Stream API and some best practices and caveats. Let me know if you have any feedback, and I would be happy to incorporate it. Happy learning!

Praveen G. Nair


I am a Software Developer and a Technologist. Interested in all cool stuffs of software development, Machine Learning and Cloud. https://praveeng-nair.web.app/