Testing in the Trenches (TinT) is an occasional series recounting some of the experiences I have had as a Unit Testing evangelist on various projects. Where appropriate, any references to actual details have been sanitized to protect the participants. This post is adapted from a regular email that I sent to the team, to promote and educate about testing. I secretly called these my Unit Test Propaganda Messages.
For a while now, I have been promoting the idea of automated testing, especially Unit Testing, among the team as a means to improve both our lives as developers and the software we deliver to our clients.
Recently, I was listening to a Software Engineering Radio podcast in which the speaker described seven different kinds of automated testing that his organization does. The speaker was Mike Barker, talking about his work on the LMAX architecture, and the seven kinds of automated testing he named (around the 50-minute mark) were:
- Unit Tests,
- Integration Tests,
- Acceptance Tests,
- Performance Tests (End-to-End),
- Performance Tests (Micro-Benchmarks),
- Static Analysis, and
- Database Tests.
It got me thinking about our own environment, practices, and needs. What kinds of testing do we do? What do we gain by automating them, and what do we give up when we do not? Some of these are the responsibility of developers; others may be more for the QA team. Sometimes there is confusion about what each kind of test is and does, so let's unpack Barker's list. Since he did not go into detail about the definitions of the different types, I will add some context.
And of course, other lists of essential automated tests will include different types, such as GUI Tests, Regression Tests, Input/Output Tests, etc. We will leave those for another day.
To help understand the differences, let's use this metaphor: imagine the system we are building is the human body. We have to assemble all of our code to create the internal structures and sub-systems, and wrap the whole together into the publicly visible skin, hair, etc.
Unit Testing - this is the kind of test that I promote most vigorously in our team. These are developer-written and developer-maintained tests that validate the functionality at the code level. They test small units of code - a method, a function, a component, a process, object-level behavior and state. They are so-called White-Box tests, meaning the developer can use their knowledge of the inner workings to shape their tests.
Unit Tests do add a little time to the programming task, but they save so much time in so many other ways, which speeds up the overall development process:
- they give quick feedback if your code is not working as expected;
- they act as localized regression tests that guard against broken or changed functionality;
- they document existing and expected behavior;
- they provide confidence when refactoring toward a better design.
A good unit-test suite gives developers confidence that their changes work and do not break or change other behavior. Because unit tests are fast and targeted, developers should run them many times a day.
We use JUnit as the framework to build and run our automated unit tests. It is not the only unit-testing framework, but it is well suited to the task and widely used in the industry.
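To make this concrete, here is a minimal sketch of what one of our JUnit tests could look like, written against the body metaphor; the Thumb class and its behavior are invented purely for illustration, not taken from our code base.
```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// A minimal JUnit 4 sketch. The Thumb class is a stand-in invented for
// illustration; in our code base the unit under test would be one of our
// own small classes or methods.
public class ThumbTest {

    // The unit under test: a tiny hypothetical class with one behavior.
    static class Thumb {
        private int angle = 0;

        void bend(int degrees) {
            // clamp to the thumb's assumed range of motion: 0..60 degrees
            angle = Math.max(0, Math.min(60, angle + degrees));
        }

        int angle() {
            return angle;
        }
    }

    @Test
    public void bendingBeyondTheLimitStopsAtSixtyDegrees() {
        Thumb thumb = new Thumb();
        thumb.bend(90);
        assertEquals(60, thumb.angle());
    }

    @Test
    public void bendingBackwardsStopsAtZero() {
        Thumb thumb = new Thumb();
        thumb.bend(-10);
        assertEquals(0, thumb.angle());
    }
}
```
Each test sets up one small object, exercises one behavior, and checks one expectation, which is part of what keeps unit tests fast enough to run many times a day.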
In our human-body metaphor, unit tests would verify all of the little activities, such as: Does the index finger curl up when you contract the outer two knuckles? What are the limits of the range of motion of the thumb? Does the wrist move through all of its expected degrees of freedom?
Integration Testing - these tests gather several units and test that they function together through their interfaces as expected and designed. They usually skip the user-interface portion, and test directly the integration of the components that do the work requested by the user. In our code base, too often the different tasks and units are so tangled together that it can be hard to isolate a unit to test it by itself. Some blocks of code depend too closely on the database or the GUI or other classes. As a result, we often quake fearfully in the face of the effort to create unit tests, throw up our hands, and move to a higher level of testing.
Now here is where the lines start to blur. Where I learned about testing, we used terms like Functional Tests to mean more or less what Integration tests mean here. More, actually, as their goal was not just the integration of the components but the end result of their collaboration.
In our human-body metaphor, integration / functional tests would verify that the fingers and thumb can work together to hold a pencil; or that lungs, jaw, tongue and lips coordinate their activity to form a certain sound.
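To show the difference in level, here is a hedged sketch of an integration-style test: two small components are wired together and exercised through their real interfaces, with no UI involved. Both classes are invented for the example; they stand in for the kind of collaborators we have in our own code.
```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// An integration-style sketch: an invented template engine and an invented
// composer that depends on it are tested together, rather than in isolation.
public class NewsletterComposerIntegrationTest {

    // First component: fills placeholders of the form {name}.
    static class TemplateEngine {
        String render(String template, String name) {
            return template.replace("{name}", name);
        }
    }

    // Second component: builds a newsletter greeting using the engine.
    static class NewsletterComposer {
        private final TemplateEngine engine;

        NewsletterComposer(TemplateEngine engine) {
            this.engine = engine;
        }

        String compose(String subscriberName) {
            return engine.render("Hello {name}, here is this week's news.", subscriberName);
        }
    }

    @Test
    public void composerAndEngineProduceAPersonalizedGreeting() {
        NewsletterComposer composer = new NewsletterComposer(new TemplateEngine());
        assertEquals("Hello Pat, here is this week's news.", composer.compose("Pat"));
    }
}
```
The point is not the trivial string handling; it is that the test crosses the boundary between two components and would catch a mismatch in the contract between them.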
Acceptance Testing - these tests are based on the Requirements specified for a given sub-system. They are the checks that an end-user does to sign off that the fix or feature is what they requested or paid for. Since we developers are notorious for interpreting requirements through our techno-centric lens, these tests double-check that the user is getting what they think, want, or need.
I have been on past projects where automated Acceptance Tests were part of the Developer's responsibility, to prove that the requirements were met, although there were usually other people who did further verification before letting the system out the door into the wild.
Also blurring the lines are what some call System Tests. System tests use a full running version of the system, including the UI (unlike Integration or Unit Tests). They may overlap with Acceptance tests, but may or may not be tied as tightly to the specified requirements. Or they may test some of the more technical requirements of the system that do not show up as user-level requirements.
The original team for whom I wrote this material was scheduled to move, over the final half of the year, toward using Selenium and WebDriver to drive automated testing of the Web side of the system. The QA team and developers were to work together to define, create and maintain these tests, which would sit somewhere in the spectrum of Functional / Integration / Acceptance tests, exactly where still to be determined.
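As a rough sketch of where that work could land, here is what a WebDriver-based test might look like; the URL, element ids, and expected page text are placeholders I made up, not details of our real system.
```java
import static org.junit.Assert.assertTrue;

import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

// A WebDriver sketch of an acceptance-style check. The URL and element ids
// are invented placeholders; a real test would target our own pages and
// would normally manage the driver in setup/teardown methods.
public class LoginAcceptanceTest {

    @Test
    public void registeredUserCanLogIn() {
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("https://example.com/login");
            driver.findElement(By.id("username")).sendKeys("testuser");
            driver.findElement(By.id("password")).sendKeys("secret");
            driver.findElement(By.id("login-button")).click();
            assertTrue(driver.getPageSource().contains("Welcome"));
        } finally {
            driver.quit();
        }
    }
}
```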
In our human-body metaphor, Acceptance tests would verify that the assembled body with all of its moving parts can work together to throw a baseball; or stand up from a sitting position.
Performance Testing (both End-to-End and Micro-Benchmarks) - these tests verify that the system meets its performance requirements. Automating them lets us monitor the state of our system easily, with less manual effort and more regularity. However, setting up and automating a Performance test is much more complex than writing smaller, simpler unit tests.
An End-to-End one might measure a complete process, such as our current investigation into the speed of the weekly newsletter emails in the upcoming release.
A Micro-benchmark might measure performance against expectations for smaller sub-systems, such as the template processing part of the newsletter emails, or the time to display a data-intensive screen.
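As a deliberately naive illustration, a micro-benchmark can be as simple as timing a piece of work and asserting that it stays under a budget; the workload and the 200 ms budget below are invented, and a serious benchmark would also warm up the JVM and average many runs.
```java
import static org.junit.Assert.assertTrue;

import org.junit.Test;

// A naive micro-benchmark sketch: time one chunk of work and compare it to
// a budget. The workload and the 200 ms threshold are invented numbers.
public class TemplateRenderingBenchmarkTest {

    @Test
    public void renderingAThousandGreetingsStaysUnderBudget() {
        long start = System.nanoTime();

        StringBuilder output = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            output.append("Hello subscriber ").append(i).append('\n');
        }

        long elapsedMillis = (System.nanoTime() - start) / 1000000;
        assertTrue("Took " + elapsedMillis + " ms for " + output.length() + " characters",
                elapsedMillis < 200);
    }
}
```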
In our human-body metaphor, an end-to-end performance test might verify that the baseball from the Acceptance test can be thrown at a speed of at least 80 mph. A Micro-benchmark performance test might verify that the hand can scoop up and hold a minimum number of peanuts - smaller in scale than an end-to-end test, but still meeting or exceeding some defined criteria.
Static Analysis - these are not tests in the same way as the other categories, in that they do not run the code against some requirement. Rather, these are automated runs of code review tools. They can check against team coding standards, or against industry best practices. Tools such as PMD and CPD do things like scan for empty try/catch/finally blocks or switch statements (possible bugs), unused variables and methods, over-complicated sections, excessively long classes or methods, duplicated code, and more. They can automatically run overnight on the day's changes and identify places that could be improved, corrected or refactored.
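To show the kind of thing these tools catch, here is an invented snippet containing a few of the smells mentioned above; the comments name the problems, though the exact rules that fire will depend on the ruleset in use.
```java
// An invented example of the kinds of smells a static analysis run flags.
public class SmellExamples {

    public String readGreeting() {
        String unusedGreeting = "hello";   // unused local variable
        try {
            return loadFromDisk();
        } catch (Exception e) {
            // empty catch block: the failure silently disappears (a possible bug)
        }
        return "";
    }

    private String loadFromDisk() throws Exception {
        return "hello from disk";
    }

    private void neverCalled() {
        // unused private method: dead code that still has to be read and maintained
    }
}
```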
For a lark, I ran PMD to find dead code and CPD to find duplicated code in one core package of our team's code base. The tools found literally hundreds of cases of dead code, unused variables, and copy-and-pasted code in that one package. These so-called "code smells" are places where our system is unnecessarily hard to read, understand and maintain, and are places where bugs could slip in, if they do not exist already.
Static Analysis is a little harder to fit into our human-body metaphor. It looks at the code that constitutes the system - the materials we use in building it, the nails and board sizes if our system were a house. In our human-body metaphor, that might mean verifying that the Finger Nail component meets our system's standards for length, growth rate, and position, and does not borrow excessively from the cell structures of other sub-systems, like the cells in the tongue.
Database Tests - Barker does not say anything in the podcast about his automated database tests. Too bad, because I would love to learn more. In our current projects, we do some query analysis toward improving the efficiency of some slower parts of the system, but they are generally manual one-off investigations based on customer complaints. The prospect of automating this process or some database validation checks is intriguing.
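One simple form such a check could take - purely my own illustration, not what Barker described - is a JUnit test that runs a validation query over JDBC and asserts on the result; the connection details, table names, and the orphaned-rows rule below are all invented.
```java
import static org.junit.Assert.assertEquals;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.junit.Test;

// A speculative sketch of a database validation check. The connection string,
// credentials, and table names are invented placeholders.
public class SubscriberIntegrityTest {

    @Test
    public void noNewsletterRowsPointAtMissingSubscribers() throws Exception {
        try (Connection connection = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/appdb", "app", "secret");
             Statement statement = connection.createStatement();
             ResultSet result = statement.executeQuery(
                     "SELECT COUNT(*) FROM newsletter n"
                     + " LEFT JOIN subscriber s ON n.subscriber_id = s.id"
                     + " WHERE s.id IS NULL")) {
            result.next();
            assertEquals(0, result.getInt(1));
        }
    }
}
```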
Since I can only speculate what Barker meant, and we do not have anything that I would consider an automated Database test, I am not sure how to map it to my human body metaphor. Possibly that sensory input from the nose gets stored in the correct format and location of the brain.
In conclusion, we currently automate Unit tests, have played around with automating Performance tests, and are working toward something closer to Acceptance tests. But measured against Mr. Barker's inspiring list, there is plenty of room to grow our automated testing, with the goals of increasing reliability, performance, and features while reducing bugs in our application.