Java 8 collectors
Java 8 was released quite a long time ago, but since I moved to Ruby in 2011 I haven't worked with Java. I had heard that a few cool features came out with Java 8, like lambdas, streams, functional interfaces, and the new date API, but none of them attracted me, given that Ruby has shipped these features since the day it was born.
Recently, though, I tried to solve a problem with Java and found that it has changed a lot compared with my impression of it. In this post, I'm going to talk about the collectors shipped with Java 8. I'll also try to give a collector example in Scala.
The Problem
Reduction (i.e. iterating over a collection, applying a reduction computation to each element, and producing a single result or a smaller collection) is a common problem in any programming language.
Let’s look at a specific example:
Given a collection of employees, group these employees by age to produce a map from age to a list of employees.
Here is the class definition of Employee:
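The original listing did not survive in this copy of the post, so the following is only a minimal sketch of what Employee might look like. The age() accessor is taken from the text; the name field, the constructor and toString() are assumptions made for illustration.

```java
// Minimal sketch of the Employee class. Only age() is mentioned in the post;
// the name field, constructor and toString() are illustrative assumptions.
final class Employee {
    private final String name;
    private final int age;

    Employee(String name, int age) {
        this.name = name;
        this.age = age;
    }

    String name() { return name; }

    int age() { return age; }

    @Override
    public String toString() { return name + "(" + age + ")"; }
}
```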
The Solution
A simple implementation could be:
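A hedged sketch of such a loop-based implementation follows. The class name GroupByAgeLoop is hypothetical, and Employee is redefined (with an assumed name field) only so that the snippet compiles on its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Assumed shape of Employee; only age() comes from the post.
final class Employee {
    private final String name;
    private final int age;
    Employee(String name, int age) { this.name = name; this.age = age; }
    String name() { return name; }
    int age() { return age; }
}

final class GroupByAgeLoop {
    // Classic pre-Java-8 grouping: look up the bucket, create it if missing, add.
    static Map<Integer, List<Employee>> groupByAge(List<Employee> employees) {
        Map<Integer, List<Employee>> result = new HashMap<>();
        for (Employee e : employees) {
            List<Employee> sameAge = result.get(e.age());
            if (sameAge == null) {
                sameAge = new ArrayList<>();
                result.put(e.age(), sameAge);
            }
            sameAge.add(e);
        }
        return result;
    }
}
```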
If you have been working with Java for quite a long time, you may be sick of writing this kind of code; you must have written code with this structure many times. To demonstrate the duplication in this structure, let's rewrite the above code in this format:
All this code does is collect some information from a given collection, apply a reduction to the items in the collection, and produce a result container.
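One way to make that repeated structure visible is to pull the three steps (create a container, fold each item into it, finish) into a generic helper. This is only my guess at the shape of the rewritten code; ManualReduction and its reduce method are hypothetical names, and Employee is an assumed minimal class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Assumed shape of Employee; only age() comes from the post.
final class Employee {
    private final String name;
    private final int age;
    Employee(String name, int age) { this.name = name; this.age = age; }
    String name() { return name; }
    int age() { return age; }
}

final class ManualReduction {
    // The skeleton that every hand-written reduction loop repeats.
    static <T, A, R> R reduce(Iterable<T> items,
                              Supplier<A> supplier,
                              BiConsumer<A, T> accumulator,
                              Function<A, R> finisher) {
        A container = supplier.get();             // 1. create the result container
        for (T item : items) {
            accumulator.accept(container, item);  // 2. fold each item into it
        }
        return finisher.apply(container);         // 3. produce the final result
    }

    static Map<Integer, List<Employee>> groupByAge(List<Employee> employees) {
        Supplier<Map<Integer, List<Employee>>> supplier = HashMap::new;
        BiConsumer<Map<Integer, List<Employee>>, Employee> accumulator =
            (map, e) -> map.computeIfAbsent(e.age(), k -> new ArrayList<>()).add(e);
        Function<Map<Integer, List<Employee>>, Map<Integer, List<Employee>>> finisher =
            Collections::unmodifiableMap;
        return reduce(employees, supplier, accumulator, finisher);
    }
}
```

Only the three small functions are specific to grouping by age; the loop itself never changes.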
With Java 8's Collector interface, you can simply do:
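A minimal sketch of the stream version, using the standard Collectors.groupingBy. The wrapper class GroupByAgeStream is a hypothetical name and Employee is an assumed minimal class, repeated so the snippet compiles on its own.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Assumed shape of Employee; only age() comes from the post.
final class Employee {
    private final String name;
    private final int age;
    Employee(String name, int age) { this.name = name; this.age = age; }
    String name() { return name; }
    int age() { return age; }
}

final class GroupByAgeStream {
    // The whole loop collapses into one collect call with a built-in collector.
    static Map<Integer, List<Employee>> groupByAge(List<Employee> employees) {
        return employees.stream().collect(Collectors.groupingBy(Employee::age));
    }
}
```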
So what is the magic behind it?
The magic is in the Collector<T, A, R> interface.
Collectors.groupingBy is a built-in collector factory that accepts a function of type T -> K, which computes the key to group by (in this case, employee.age()). It produces a result of type Map<K, List<T>> (i.e. the result type R).
Here is the official definition of Collector from its API documentation:
A mutable reduction operation that accumulates input elements into a mutable result container, optionally transforming the accumulated result into a final representation after all input elements have been processed. Reduction operations can be performed either sequentially or in parallel.
As the documentation shows, Collector takes three type parameters, T, A and R, where T is the type of the elements in the collection, A is an intermediate type used to perform the mutable reduction, and R is the type of the result.
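To make T, A and R a bit more concrete, here is a small sketch that drives any collector by hand, roughly the way a sequential stream would. CollectByHand is a hypothetical helper, not part of the JDK; it only calls the real supplier(), accumulator() and finisher() methods of Collector.

```java
import java.util.List;
import java.util.stream.Collector;

final class CollectByHand {
    // T: element type, A: intermediate container, R: final result.
    static <T, A, R> R collect(List<T> items, Collector<T, A, R> collector) {
        A container = collector.supplier().get();
        for (T item : items) {
            collector.accumulator().accept(container, item);
        }
        // For collectors where A and R are the same, this is just an identity step.
        return collector.finisher().apply(container);
    }
}
```

Calling CollectByHand.collect(words, Collectors.counting()) on a list of strings, for example, uses T = String and R = Long, while A stays hidden behind the wildcard in the collector's type.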
There are four functions in this interface that work together to accumulate entries into a mutable result container:
- supplier(), with type () -> A: creates a new result container.
- accumulator(), with type (A, T) -> void: incorporates a new element into the result container, mutating it in place.
- combiner(), with type (A, A) -> A: combines two result containers into one.
- finisher(), with type A -> R: an optional final transformation on the result container to get the result. Optional means that in some scenarios A and R are the same, so the finisher function is not required. In other cases, when A and R are different, this function is required to get the final result.
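As an illustration of the four functions, here is a sketch of a hand-written collector where A and R really differ: it averages integers using a long[2] as the container and a Double as the result. AveragingCollector is an example I made up for this purpose (the JDK already has Collectors.averagingInt for real use).

```java
import java.util.Collections;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

// T = Integer, A = long[] holding {sum, count}, R = Double.
final class AveragingCollector implements Collector<Integer, long[], Double> {
    @Override
    public Supplier<long[]> supplier() {
        return () -> new long[2];                    // new empty container
    }

    @Override
    public BiConsumer<long[], Integer> accumulator() {
        return (acc, n) -> { acc[0] += n; acc[1]++; }; // fold one element in
    }

    @Override
    public BinaryOperator<long[]> combiner() {
        return (left, right) -> {                    // merge two partial containers
            left[0] += right[0];
            left[1] += right[1];
            return left;
        };
    }

    @Override
    public Function<long[], Double> finisher() {
        // A and R differ here, so the finisher is required; empty input yields 0.0.
        return acc -> acc[1] == 0 ? 0.0 : (double) acc[0] / acc[1];
    }

    @Override
    public Set<Collector.Characteristics> characteristics() {
        return Collections.emptySet();               // no special characteristics claimed
    }
}
```

It plugs into a stream like any built-in collector, e.g. Stream.of(20, 30, 40).collect(new AveragingCollector()).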
In the previous example, the result of Collectors.groupingBy has the type Collector<Employee, ?, Map<Integer, List<Employee>>>.
Let's extend this problem a little bit: how about grouping employees by age range (e.g. 20-29 as one group, 30-39 as another)? groupingBy can still handle this if its classifier maps each age to a range, but whenever no built-in collector is suitable for the reduction you need, you will need a customised collector.
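Here is a hedged sketch of the age-range grouping done two ways: by reusing groupingBy with a range-computing classifier, and by building a collector with the standard Collector.of factory. The names AgeRanges, rangeOf, withGroupingBy and byAgeRange are hypothetical, and Employee is an assumed minimal class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Assumed shape of Employee; only age() comes from the post.
final class Employee {
    private final String name;
    private final int age;
    Employee(String name, int age) { this.name = name; this.age = age; }
    String name() { return name; }
    int age() { return age; }
}

final class AgeRanges {
    // Maps an age to a decade label such as "20-29" (hypothetical helper).
    static String rangeOf(int age) {
        int lower = age / 10 * 10;
        return lower + "-" + (lower + 9);
    }

    // Option 1: reuse groupingBy with a classifier that computes the range.
    static Map<String, List<Employee>> withGroupingBy(List<Employee> employees) {
        return employees.stream().collect(Collectors.groupingBy(e -> rangeOf(e.age())));
    }

    // Option 2: a customised collector built from supplier, accumulator and combiner.
    static Collector<Employee, ?, Map<String, List<Employee>>> byAgeRange() {
        return Collector.of(
            TreeMap::new,
            (map, e) -> map.computeIfAbsent(rangeOf(e.age()), k -> new ArrayList<>()).add(e),
            (left, right) -> {
                right.forEach((range, group) ->
                    left.computeIfAbsent(range, k -> new ArrayList<>()).addAll(group));
                return left;
            });
    }
}
```

The custom version is more code, but it controls the container itself (a sorted TreeMap here), which is the kind of freedom a customised collector buys you.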
[This blog post](http://www.nurkiewicz.com/2014/07/introduction-to-writing-custom.html) is a fairly good guide to creating your own Collector implementation.
Collector in Scala
After finding this useful pattern, I wondered whether Scala's powerful collection system supports this kind of computation. Unfortunately, I could not find a similar API on any collection type. But I did find that we can easily build our own version of a collector based on scala.collection.mutable.Builder.
scala.collection.mutable.Builder plays the same role as the accumulator (the A) in Java's Collector. Let's see the following example of how we implement the collect method in Scala and how we use it to solve the word count problem:
And here is the code that uses the CounterBuilder:
Conclusion
- Java 8's Collector API provides a better way to encapsulate reduction computations: not only built-in reductions (e.g. max, min, sum, average) but also customised reductions (via a customised collector). Collector is designed to be composed, which means this reduction logic is much easier to reuse.
- Scala does not have native support for customised mutable reduction, but based on Scala's powerful collection system, we can create our own version.
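As a small illustration of the composition point, groupingBy accepts a downstream collector, so the same grouping can be reused with different reductions. This is a sketch using standard Collectors methods; ComposedCollectors and its method names are hypothetical, and Employee is an assumed minimal class.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Assumed shape of Employee; only age() comes from the post.
final class Employee {
    private final String name;
    private final int age;
    Employee(String name, int age) { this.name = name; this.age = age; }
    String name() { return name; }
    int age() { return age; }
}

final class ComposedCollectors {
    // groupingBy composed with counting(): how many employees per age.
    static Map<Integer, Long> headcountByAge(List<Employee> employees) {
        return employees.stream()
            .collect(Collectors.groupingBy(Employee::age, Collectors.counting()));
    }

    // groupingBy composed with mapping() and toList(): names per age.
    static Map<Integer, List<String>> namesByAge(List<Employee> employees) {
        return employees.stream()
            .collect(Collectors.groupingBy(Employee::age,
                Collectors.mapping(Employee::name, Collectors.toList())));
    }
}
```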