
Using Java 8 to Refactor an Iteration over a Collection

On a recent project, I came across an ideal algorithm to (re-)write using Java 8's Streams API and Lambda expressions. Here is the original code, modified to protect the client:

public String extractAllToAccountsAsCSV(List<AccountPair> collectionOfPairedAccounts)
{
    String result = "";
    for (AccountPair pair : collectionOfPairedAccounts)
    {
        if (result.length() > 0)
            result += ",";
        result += String.valueOf(pair.getToAccount());
    }
    return result;
}

This simple algorithm iterates over a collection of AccountPairs, objects of a data structure that associates two accounts. Its goal is to produce a comma-separated String of all the values of one kind of account in each AccountPair (here, the To Account): a CSV that the calling class consumes in some way.
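The post doesn't show the AccountPair class itself. To make the examples concrete, here is a minimal, hypothetical stand-in. The int field types and the (from, to) constructor order are assumptions inferred from the unit tests below, where new AccountPair(111,222) yields a To Account of 222; the fromAccount field and getFromAccount() are my own labels, not the client's.

```java
// Hypothetical stand-in for the client's AccountPair class (not shown in the post).
// Assumed: two int account numbers, constructor order (from, to),
// so new AccountPair(111, 222) has a To Account of 222.
class AccountPair {
    private final int fromAccount; // hypothetical name for the "other" account
    private final int toAccount;

    AccountPair(int fromAccount, int toAccount) {
        this.fromAccount = fromAccount;
        this.toAccount = toAccount;
    }

    int getFromAccount() { return fromAccount; }

    int getToAccount() { return toAccount; }
}
```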

This description sounds right in Java 8's wheelhouse. It would not be a refactoring that alters the structure of the class, per se, but it does change the detailed design behind the public-facing method, making it cleaner and more ready for parallelization.


Of course, this algorithm had no unit tests, a familiar story on that organization's projects. So our first step is to write some tests before making any changes.

At a minimum, the tests should cover a collection containing zero, one, or many AccountPairs. Here are some JUnit tests:
import static org.junit.Assert.assertEquals;

import java.util.ArrayList;
import java.util.List;

import org.junit.Test;

public class MyClassTest
{
    @Test
    public void extractAsCSVAllToAccounts_EmptyCollection_EmptyString()
    {
        MyClass objUnderTest = new MyClass();
        String result = objUnderTest.extractAllToAccountsAsCSV(new ArrayList<AccountPair>());
        assertEquals("Expected empty string from empty collection.", "", result);
    }

    @Test
    public void extractAsCSVAllToAccounts_CollectionOfOnePair_CommalessString()
    {
        MyClass objUnderTest = new MyClass();
        List<AccountPair> accounts = new ArrayList<AccountPair>();
        accounts.add(new AccountPair(111,222));
        String result = objUnderTest.extractAllToAccountsAsCSV(accounts);
        assertEquals("222", result);
    }

    @Test
    public void extractAsCSVAllToAccounts_CollectionOfManyPairs_CorrectString()
    {
        MyClass objUnderTest = new MyClass();
        List<AccountPair> accounts = new ArrayList<AccountPair>();
        accounts.add(new AccountPair(111,222));
        accounts.add(new AccountPair(333,444));
        accounts.add(new AccountPair(555,666));
        String result = objUnderTest.extractAllToAccountsAsCSV(accounts);
        assertEquals("222,444,666", result);
    }
}

With an appropriate set of unit tests in place to check that the behavior stays consistent, we can now convert the method to the Java 8 Streams API.

First, convert the collection to a Stream<AccountPair> using the stream() method.
collectionOfPairedAccounts.stream()

Next, extract the data we are interested in by mapping this Stream to a Stream of the To Account values. Since they are integers and we want a String result, convert each one using String.valueOf():
.map(pair -> String.valueOf(pair.getToAccount()))
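To see what the map() step produces on its own, one can stop the pipeline there and collect into a List instead of a String. This is a small sketch; MapStepDemo and toAccountStrings are hypothetical names, and AccountPair is the same assumed minimal stand-in (nested so the block compiles by itself).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MapStepDemo {
    // Hypothetical minimal AccountPair (the client's real class wasn't shown).
    static class AccountPair {
        private final int fromAccount;
        private final int toAccount;

        AccountPair(int fromAccount, int toAccount) {
            this.fromAccount = fromAccount;
            this.toAccount = toAccount;
        }

        int getToAccount() { return toAccount; }
    }

    // Applies only the stream() and map() steps, then gathers the
    // intermediate Strings into a List so they can be inspected.
    static List<String> toAccountStrings(List<AccountPair> pairs) {
        return pairs.stream()
                .map(pair -> String.valueOf(pair.getToAccount()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<AccountPair> pairs = Arrays.asList(
                new AccountPair(111, 222),
                new AccountPair(333, 444),
                new AccountPair(555, 666));
        System.out.println(toAccountStrings(pairs)); // prints [222, 444, 666]
    }
}
```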

Finally, we want a single String as the result, so use collect() to gather all of the To Account values into the final string. Since the result should be comma-separated, we use Collectors.joining() and specify the separator ",":
.collect(Collectors.joining(","))
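The reason the original if-check can disappear is that Collectors.joining() places the separator only between elements: an empty stream yields "", and a single element yields no comma. Those are exactly the zero/one/many cases the tests cover. A quick sketch to confirm it (JoiningDemo and joinWithCommas are hypothetical names):

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class JoiningDemo {
    // Joins the given values with "," the same way the refactored
    // method's final step does: the separator goes only between elements.
    static String joinWithCommas(String... values) {
        return Arrays.stream(values).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println("[" + joinWithCommas() + "]");         // prints []
        System.out.println(joinWithCommas("222"));               // prints 222
        System.out.println(joinWithCommas("222", "444", "666")); // prints 222,444,666
    }
}
```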

The result:
// Requires: import java.util.stream.Collectors;
public String extractAllToAccountsAsCSV(List<AccountPair> collectionOfPairedAccounts)
{
    return collectionOfPairedAccounts.stream()
            .map(pair -> String.valueOf(pair.getToAccount()))
            .collect(Collectors.joining(","));
}
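For readers without the client's MyClass, here is a self-contained sketch that puts the original loop and the refactored stream version side by side and runs both on the same three inputs the unit tests use. CsvComparison, loopVersion and streamVersion are hypothetical names, and AccountPair is the assumed minimal stand-in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CsvComparison {
    // Hypothetical minimal AccountPair (the client's real class wasn't shown).
    static class AccountPair {
        private final int fromAccount;
        private final int toAccount;

        AccountPair(int fromAccount, int toAccount) {
            this.fromAccount = fromAccount;
            this.toAccount = toAccount;
        }

        int getToAccount() { return toAccount; }
    }

    // Original loop-based version from the post.
    static String loopVersion(List<AccountPair> pairs) {
        String result = "";
        for (AccountPair pair : pairs) {
            if (result.length() > 0)
                result += ",";
            result += String.valueOf(pair.getToAccount());
        }
        return result;
    }

    // Refactored Streams version from the post.
    static String streamVersion(List<AccountPair> pairs) {
        return pairs.stream()
                .map(pair -> String.valueOf(pair.getToAccount()))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // The same zero / one / many inputs as the unit tests.
        List<List<AccountPair>> inputs = Arrays.asList(
                new ArrayList<AccountPair>(),
                Arrays.asList(new AccountPair(111, 222)),
                Arrays.asList(new AccountPair(111, 222),
                              new AccountPair(333, 444),
                              new AccountPair(555, 666)));
        for (List<AccountPair> input : inputs) {
            String before = loopVersion(input);
            String after = streamVersion(input);
            System.out.println("\"" + before + "\" vs \"" + after + "\" -> "
                    + (before.equals(after) ? "same" : "DIFFERENT"));
        }
    }
}
```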

When we rerun the unit tests, to ensure that this conversion to Java 8 did not break its behavior, we see green; all tests continue to pass.

Now the resulting method reads much like the original description of the problem: the iteration became stream(), getToAccount() and its String conversion sit right in the middle in map(), and the if-check and String concatenation became collect().
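As for being "more ready for parallelization": in the stream version, switching to parallel is a one-word change from stream() to parallelStream(). A List is an ordered source and Collectors.joining() respects encounter order, so the CSV comes out in the same order. This is a sketch only, with hypothetical names and the assumed AccountPair stand-in; for collections in the hundreds, the overhead of going parallel may well outweigh any gain, so it would be worth measuring before adopting it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class ParallelCsv {
    // Hypothetical minimal AccountPair (the client's real class wasn't shown).
    static class AccountPair {
        private final int fromAccount;
        private final int toAccount;

        AccountPair(int fromAccount, int toAccount) {
            this.fromAccount = fromAccount;
            this.toAccount = toAccount;
        }

        int getToAccount() { return toAccount; }
    }

    // The refactored method, unchanged.
    static String sequentialCsv(List<AccountPair> pairs) {
        return pairs.stream()
                .map(pair -> String.valueOf(pair.getToAccount()))
                .collect(Collectors.joining(","));
    }

    // Same pipeline, but parallelStream() lets the runtime split the work.
    // joining() preserves the List's encounter order in the final String.
    static String parallelCsv(List<AccountPair> pairs) {
        return pairs.parallelStream()
                .map(pair -> String.valueOf(pair.getToAccount()))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<AccountPair> pairs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            pairs.add(new AccountPair(i, 10_000 + i));
        }
        System.out.println(sequentialCsv(pairs).equals(parallelCsv(pairs))); // prints true
    }
}
```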

Some questions come to mind, ones that we will leave for another day:
(1) The original algorithm makes a single pass. Do stream(), map(), and collect() also make just a single pass? How do these functions work together? Both versions visit each element once, so they are O(n) in element visits, although the original's repeated String += also copies the growing result on every iteration. On this project, I know the collection will hold hundreds of items, not millions. But still, how do the two implementations compare?
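One way to start poking at question (1) is to instrument the pipeline with peek(), which the Stream documentation describes as mainly for debugging. In this sketch (hypothetical names, assumed AccountPair stand-in), the log on a sequential stream over a List shows each element being read and mapped before the next one is read, with no intermediate collection built between map() and collect(). That interleaving is what I'd expect from current JDK implementations; the Stream javadoc doesn't promise this exact ordering of side effects, so treat it as an observation rather than a guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PipelineTrace {
    // Hypothetical minimal AccountPair (the client's real class wasn't shown).
    static class AccountPair {
        private final int fromAccount;
        private final int toAccount;

        AccountPair(int fromAccount, int toAccount) {
            this.fromAccount = fromAccount;
            this.toAccount = toAccount;
        }

        int getToAccount() { return toAccount; }
    }

    // Runs the refactored pipeline with peek() hooks that record when
    // each element reaches each stage, then records the final CSV.
    static List<String> trace(List<AccountPair> pairs) {
        List<String> log = new ArrayList<>();
        String csv = pairs.stream()
                .peek(pair -> log.add("read " + pair.getToAccount()))
                .map(pair -> String.valueOf(pair.getToAccount()))
                .peek(s -> log.add("mapped " + s))
                .collect(Collectors.joining(","));
        log.add("result " + csv);
        return log;
    }

    public static void main(String[] args) {
        List<AccountPair> pairs = Arrays.asList(
                new AccountPair(111, 222),
                new AccountPair(333, 444));
        trace(pairs).forEach(System.out::println);
    }
}
```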

(2) The new algorithm is more concise and at a higher level of abstraction than the original. Is it the minimal implementation, or can it be reduced even more? Again, an exercise for another day.
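One possible variant to weigh against question (2), offered as a sketch rather than an answer: replace the lambda with two method references. It isn't shorter, and it boxes each int To Account into an Integer along the way, so whether it counts as "more minimal" is a matter of taste. MethodRefCsv is a hypothetical name, and AccountPair is the assumed stand-in.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MethodRefCsv {
    // Hypothetical minimal AccountPair (the client's real class wasn't shown).
    static class AccountPair {
        private final int fromAccount;
        private final int toAccount;

        AccountPair(int fromAccount, int toAccount) {
            this.fromAccount = fromAccount;
            this.toAccount = toAccount;
        }

        int getToAccount() { return toAccount; }
    }

    // Same pipeline with method references instead of a lambda:
    // extract the To Account (boxed to Integer), then convert it to a String.
    static String extractAllToAccountsAsCSV(List<AccountPair> pairs) {
        return pairs.stream()
                .map(AccountPair::getToAccount)
                .map(String::valueOf)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<AccountPair> pairs = Arrays.asList(
                new AccountPair(111, 222),
                new AccountPair(333, 444),
                new AccountPair(555, 666));
        System.out.println(extractAllToAccountsAsCSV(pairs)); // prints 222,444,666
    }
}
```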
