On a recent project, I came across an ideal candidate for a rewrite using Java 8's Streams API and lambda expressions. Here is the original code, modified to protect the client:
public String extractAllToAccountsAsCSV(List<AccountPair> collectionOfPairedAccounts)
{
    String result = "";
    for (AccountPair pair : collectionOfPairedAccounts)
    {
        if (result.length() > 0)
            result += ",";
        result += String.valueOf(pair.getToAccount());
    }
    return result;
}
This simple algorithm iterates over a collection of AccountPairs, which are objects of a data structure that associates two accounts. Its goal is to produce a comma-separated String output of all values of one kind of account in the AccountPair, a CSV that the calling class will consume in some way.
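The AccountPair class itself is not shown here. As a rough sketch only, and assuming it simply holds two integer account numbers behind a two-argument constructor, it might look something like the following. The field names and the getFromAccount() accessor are hypothetical; only getToAccount() appears in the code above.

```java
// Hypothetical sketch of AccountPair; the real class is not shown in this post.
final class AccountPair {
    private final int fromAccount; // assumed name for the first account
    private final int toAccount;   // the value the CSV method extracts

    AccountPair(int fromAccount, int toAccount) {
        this.fromAccount = fromAccount;
        this.toAccount = toAccount;
    }

    // Assumed accessor, included only for symmetry.
    int getFromAccount() {
        return fromAccount;
    }

    // The accessor used by extractAllToAccountsAsCSV().
    int getToAccount() {
        return toAccount;
    }
}
```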
That description sits right in the wheelhouse of Java 8. The change would not be a refactoring that alters the structure of the class, per se, but it does change the detailed design behind the public-facing API, making it cleaner and easier to parallelize later.
Of course, this algorithm had no unit tests, a common refrain on projects with that organization. So our first step is to write some tests before making any changes.
For example, cover the cases of the collection containing zero, one, or many AccountPairs. Here are some JUnit tests:
@Test
public void extractAsCSVAllToAccounts_EmptyCollection_EmptyString()
{
    MyClass objUnderTest = new MyClass();
    String result = objUnderTest.extractAllToAccountsAsCSV(new ArrayList<AccountPair>());
    assertEquals("Expected empty string from empty collection.", "", result);
}

@Test
public void extractAsCSVAllToAccounts_CollectionOfOnePair_CommalessString()
{
    MyClass objUnderTest = new MyClass();
    List<AccountPair> accounts = new ArrayList<AccountPair>();
    accounts.add(new AccountPair(111,222));
    String result = objUnderTest.extractAllToAccountsAsCSV(accounts);
    assertEquals("222", result);
}

@Test
public void extractAsCSVAllToAccounts_CollectionOfManyPairs_CorrectString()
{
    MyClass objUnderTest = new MyClass();
    List<AccountPair> accounts = new ArrayList<AccountPair>();
    accounts.add(new AccountPair(111,222));
    accounts.add(new AccountPair(333,444));
    accounts.add(new AccountPair(555,666));
    String result = objUnderTest.extractAllToAccountsAsCSV(accounts);
    assertEquals("222,444,666", result);
}
With an appropriate collection of unit tests in place to catch any change in behavior, we can now convert the method to the Java 8 Streams API.
First, convert the collection to a Stream<AccountPair> using the stream() method.
collectionOfPairedAccounts.stream()
Next, extract the data we are interested in by mapping this Stream to a Stream of the To Account values. Since they are integers and we want a String result, convert each one using String.valueOf().
map(pair -> String.valueOf(pair.getToAccount()))
Finally, we want a single String value as the result, so use collect() to gather all of the To Accounts into the final string. We want the result to be comma-separated, so we use Collectors.joining() and specify "," as the separator.
collect(Collectors.joining(","))
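To make the three steps easier to follow, here is a small self-contained sketch that gives each intermediate value a name and a type. The PipelineSteps class, the toCsv() method, and the nested AccountPair stand-in are hypothetical names used only for illustration; the stand-in assumes two integer accounts, as in the tests above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PipelineSteps {
    // Minimal stand-in for the real AccountPair (assumed shape).
    static final class AccountPair {
        private final int fromAccount;
        private final int toAccount;

        AccountPair(int fromAccount, int toAccount) {
            this.fromAccount = fromAccount;
            this.toAccount = toAccount;
        }

        int getToAccount() {
            return toAccount;
        }
    }

    static String toCsv(List<AccountPair> pairs) {
        // Step 1: view the collection as a stream of pairs.
        Stream<AccountPair> pairStream = pairs.stream();

        // Step 2: map each pair to the text form of its To Account.
        Stream<String> toAccounts =
            pairStream.map(pair -> String.valueOf(pair.getToAccount()));

        // Step 3: join the strings, placing a comma between neighbors.
        return toAccounts.collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<AccountPair> pairs = Arrays.asList(
            new AccountPair(111, 222),
            new AccountPair(333, 444),
            new AccountPair(555, 666));
        System.out.println(toCsv(pairs)); // prints 222,444,666
    }
}
```

Naming the intermediate streams is only a teaching aid. In practice the calls are usually chained, because a stream can be consumed only once and holding references to one invites accidental reuse.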
The result:
public String extractAllToAccountsAsCSV(List<AccountPair> collectionOfPairedAccounts)
{
    return collectionOfPairedAccounts.stream()
        .map(pair -> String.valueOf(pair.getToAccount()))
        .collect(Collectors.joining(","));
}
When we rerun the unit tests, to ensure that this conversion to Java 8 did not break its behavior, we see green; all tests continue to pass.
Now the resulting method reads much like the original statement of the problem: the loop became stream(), getToAccount() and its String conversion sit in the middle as map(), and the if-check and String concatenation became collect().
Some questions come to mind, ones that we will leave for another day:
(1) The original algorithm was a single pass. Do stream(), map() and collect() make just a single pass? How do these functions work together? It's still O(n), and I know that in the case of this project, the collection will be on the order of hundreds, not millions, in size. But still, how do the two implementations compare?
(2) The new algorithm is more concise and at a higher level of abstraction than the original. Is it the minimal implementation, or can it be reduced even more? Again, an exercise for another day.
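As a starting point for the first question, one small experiment is to record when each stage of the pipeline sees each element, using peek(). This is only a sketch: the SinglePassTrace class, the trace() method, and the nested AccountPair stand-in are hypothetical, and it says nothing about performance. For a sequential stream like this one, each element travels through every stage before the next element is pulled from the source, so the list is traversed once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class SinglePassTrace {
    // Minimal stand-in for the real AccountPair (assumed shape).
    static final class AccountPair {
        private final int fromAccount;
        private final int toAccount;

        AccountPair(int fromAccount, int toAccount) {
            this.fromAccount = fromAccount;
            this.toAccount = toAccount;
        }

        int getToAccount() {
            return toAccount;
        }
    }

    // Records the order in which the pipeline stages see each element.
    static List<String> trace(List<AccountPair> pairs) {
        List<String> events = new ArrayList<>();
        String csv = pairs.stream()
            .peek(pair -> events.add("source " + pair.getToAccount()))
            .map(pair -> String.valueOf(pair.getToAccount()))
            .peek(text -> events.add("mapped " + text))
            .collect(Collectors.joining(","));
        events.add("result " + csv);
        return events;
    }

    public static void main(String[] args) {
        List<AccountPair> pairs = Arrays.asList(
            new AccountPair(111, 222),
            new AccountPair(333, 444));
        // The events interleave per element: source 222, mapped 222,
        // source 444, mapped 444, and finally result 222,444.
        trace(pairs).forEach(System.out::println);
    }
}
```

The interleaved trace shows the stages being fused into one traversal rather than run as separate loops. How the per-element overhead compares with the original loop is a separate question that would need measurement.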