Scala Collections: A Group of groupBy() Examples

Scala provides a rich Collections API. Let's look at the useful groupBy() function.

What does groupBy() do? It takes a collection, assesses each item in that collection against a discriminator function, and returns a Map data structure. Each key in the returned map is a distinct result of the discriminator function, and the key's corresponding value is another collection which contains all elements of the original one that evaluate the same way against the discriminator function.

So, for example, here is a collection of Strings:
val sports = Seq("baseball", "ice hockey", "football", "basketball", "110m hurdles", "field hockey")

Running it through the Scala interpreter produces this output showing our value's definition:
sports: Seq[String] = List(baseball, ice hockey, football, basketball, 110m hurdles, field hockey)

We can group those sports names by, say, their first letter. To do so, we need a discriminator function that takes each element and returns the first character. For example:

sports.groupBy(_.charAt(0))

Running that in the interpreter shows the result:

res0: scala.collection.immutable.Map[Char,Seq[String]] = Map(b -> List(baseball, basketball), 1 -> List(110m hurdles), i -> List(ice hockey), f -> List(football, field hockey))

As you can see, the result is a Map with four key-value pairs. The keys are the letters b, i and f, and the digit 1. All of the sports names that begin with "b" are grouped into a new List, and likewise for each of the other keys.
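Since the result is an ordinary Map, the groups can be looked up by key. The following is a small sketch of three common ways to do that; the value names (byFirstLetter, bSports, zSports, zOrEmpty) are made up for this illustration.

```scala
val sports = Seq("baseball", "ice hockey", "football", "basketball", "110m hurdles", "field hockey")
val byFirstLetter = sports.groupBy(_.charAt(0))

// Direct lookup: fine when the key is known to exist.
val bSports = byFirstLetter('b')        // List(baseball, basketball)

// get returns an Option, so a missing key gives None instead of an exception.
val zSports = byFirstLetter.get('z')    // None

// getOrElse supplies a fallback collection for a missing key.
val zOrEmpty = byFirstLetter.getOrElse('z', Seq.empty[String])
```

Note that within each group the elements keep the relative order they had in the original collection.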

In the above case, the discriminator function produced a key that was of type Char, the character in the 0th position of each String. Here's another example, one that produces a Boolean type for the keys in the Map:

sports.groupBy(_.contains("ball"))

In the above, contains() is a method that returns true if "ball" is in the name of the sport and false otherwise. We would expect at most two entries in our Map, one with true as the key, one with false as the key. When we check it in the interpreter, we get:

res1: scala.collection.immutable.Map[Boolean,Seq[String]] = Map(false -> List(ice hockey, 110m hurdles, field hockey), true -> List(baseball, football, basketball))

In this case, groupBy() has partitioned the original collection into two new collections, mapped to true for the List(baseball, football, basketball) and to false for the non-ball sports, List(ice hockey, 110m hurdles, field hockey).
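For a Boolean discriminator, the standard library's partition() method is a close relative worth knowing about. The sketch below contrasts the two; the value names are invented for the example, and the second collection ("golf", "tennis") is made-up data chosen to show the "at most two entries" case.

```scala
val sports = Seq("baseball", "ice hockey", "football", "basketball", "110m hurdles", "field hockey")

// partition always returns a pair: (elements that match, elements that do not).
val (ballSports, otherSports) = sports.partition(_.contains("ball"))
// ballSports:  List(baseball, football, basketball)
// otherSports: List(ice hockey, 110m hurdles, field hockey)

// groupBy only creates keys that actually occur, so this Map has a single entry.
val onlyFalse = Seq("golf", "tennis").groupBy(_.contains("ball"))
// false -> List(golf, tennis), and no entry for true
```

So partition() may be the simpler choice when both halves are always needed, while groupBy() generalizes to any number of groups.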

Let's switch to numeric values instead of Chars, Strings and Booleans. The groupBy() principles are the same. Here is a new collection of Integers:

val s1 = List(1,3,5,7,9)

If our code needs to treat each element in the collection differently, depending on its remainder when divided by 3, we'd write the following:

s1.groupBy(_ % 3)

res2: scala.collection.immutable.Map[Int,List[Int]] = Map(2 -> List(5), 1 -> List(1, 7), 0 -> List(3, 9))

The Map produced by groupBy() has three pairs. The value mapped to key 0 is the collection of all elements that are evenly divisible by 3. Notice that the value mapped to key 2 is still a List, still a collection, even though it has only one element in it.
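Once the elements are grouped, each group can be processed on its own. As a rough illustration (the names sumsByRemainder and countsByRemainder are made up here), this sketch reduces every group to a single number:

```scala
val s1 = List(1, 3, 5, 7, 9)

// Sum the elements of each group.
val sumsByRemainder = s1.groupBy(_ % 3).map { case (remainder, values) =>
  remainder -> values.sum
}
// contains 0 -> 12, 1 -> 8, 2 -> 5

// Count the elements of each group.
val countsByRemainder = s1.groupBy(_ % 3).map { case (remainder, values) =>
  remainder -> values.size
}
// contains 0 -> 2, 1 -> 2, 2 -> 1
```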

We have seen groupBy() with Strings and Ints in the collections, and producing keys that can be Int, Char, even Boolean. Isn't the groupBy() function flexible? And powerful?

More formally, according to the Scaladoc API documentation, it has this signature:
def groupBy[K](f: (A) ⇒ K): immutable.Map[K, Seq[A]]

In that formal definition, A is the type of the elements in the original collection; K is the type of the keys in the map, as produced by the discriminator function; f is the discriminator function, which determines into which collection each item of the original collection will be placed; and the return type is an immutable Map with keys of type K and values that are collections (Seq[A]) of the original elements.
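To make the roles of the type parameters and f concrete, here is a hand-rolled sketch with the same shape as that signature. It is not how the standard library implements groupBy(); the name groupBySketch is hypothetical, and the fold-based approach is simply one straightforward way to get the same result.

```scala
// A simplified stand-in for groupBy: walk the collection once,
// appending each element to the group stored under its key.
def groupBySketch[A, K](xs: Seq[A])(f: A => K): Map[K, Seq[A]] =
  xs.foldLeft(Map.empty[K, Seq[A]]) { (groups, x) =>
    val key = f(x)
    groups.updated(key, groups.getOrElse(key, Seq.empty[A]) :+ x)
  }

val sketched = groupBySketch(List(1, 3, 5, 7, 9))(_ % 3)
// contains 0 -> List(3, 9), 1 -> List(1, 7), 2 -> List(5)
```

Here A is inferred as Int from the list and K as Int from the result of the discriminator function.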

Let's end with a couple more interesting examples of groupBy(). So far, our examples have used some pretty primitive data types. So let's define a more interesting type, and create a collection of objects of this type.

class Point(val x: Int, val y: Int)
val sp = Seq(new Point(1,1), new Point(1,2), new Point(2,2), new Point(2,1))

The resulting value definition looks like this (I have edited out some of the gory details to improve readability), showing that I have a collection of four Point objects:

sp: Seq[Point] = List(Point@1b80cb0, Point@13002da, Point@34f910, Point@120344c)

Now we can group them by the value of one of the members of the object. In this case, the discriminator function simply names the member. Let's group the Points by their x value, to partition the collection by their location on the x axis:

sp.groupBy(_.x)

The result is a Map with two key-value pairs, partitioning the original collection into the Points with x = 1 and those with x = 2:

res3: scala.collection.immutable.Map[Int,Seq[Point]] = Map(2 -> List(Point@34f910, Point@120344c), 1 -> List(Point@1b80cb0, Point@13002da))
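The Point@... entries are hard to read because a plain class prints only its class name and hash code. One possible variation, sketched below, is to use a case class, which gets a readable toString and value-based equality for free. The name Pt is made up for this example to avoid clashing with the Point class above, and the grouping by x + y is just an extra illustration of a computed key.

```scala
// A case class variant of Point; Pt is a hypothetical name for this sketch.
case class Pt(x: Int, y: Int)

val pts = Seq(Pt(1, 1), Pt(1, 2), Pt(2, 2), Pt(2, 1))

// Group by a member, as before.
val byX = pts.groupBy(_.x)
// contains 1 -> List(Pt(1,1), Pt(1,2)), 2 -> List(Pt(2,2), Pt(2,1))

// The key can also be computed from several members.
val byCoordinateSum = pts.groupBy(p => p.x + p.y)
// contains 2 -> List(Pt(1,1)), 3 -> List(Pt(1,2), Pt(2,1)), 4 -> List(Pt(2,2))
```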

One final example. So far, all our discriminator functions have been pretty simple, so let's do something a little more interesting. Let's group our original sports collection into sports that use balls, sports that use hockey sticks, and a catch-all group of other sports. One way to do so is to create a discriminator function that does a little pattern-matching:

sports.groupBy {
  case sport if sport.contains("ball") => "Balls"
  case sport if sport.contains("hockey") => "Sticks"
  case _ => "Other"
}

We expect the three ball sports in the original collection to be mapped to the key "Balls", the two hockey sports to be mapped to the key "Sticks", and everything else to be mapped to the key "Other". And that is exactly what groupBy() gives us:

res4: scala.collection.immutable.Map[String,Seq[String]] = Map(Sticks -> List(ice hockey, field hockey), Balls -> List(baseball, football, basketball), Other -> List(110m hurdles))
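When the discriminator grows beyond a line or two, it can be pulled out into a named method and passed to groupBy(). The sketch below does that with the same pattern match; the method name equipment and the test input "ball hockey" are made up for this illustration.

```scala
// The same pattern match as above, extracted into a named method.
def equipment(sport: String): String = sport match {
  case s if s.contains("ball")   => "Balls"
  case s if s.contains("hockey") => "Sticks"
  case _                         => "Other"
}

val sports = Seq("baseball", "ice hockey", "football", "basketball", "110m hurdles", "field hockey")

// Group with the named method, then count the members of each group.
val counts = sports.groupBy(equipment).map { case (kind, group) => kind -> group.size }
// contains Balls -> 3, Sticks -> 2, Other -> 1

// Cases are tried top to bottom, so a name matching both guards lands in "Balls".
val ballHockey = equipment("ball hockey")   // "Balls"
```

The order of the cases matters: swapping the first two would send "ball hockey" to "Sticks" instead.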

SJGP Software
