At a recent software craftsmanship meetup I attended, there was an hour-long group discussion of the definitions of the terms “unit test”, “integration test” and “acceptance test”. How can this be, 2 decades after Kent Beck started developing and using the automated test software that has now become xUnit, that the very people who are most interested in pursuing best practices in object-oriented software development have doubts or confusion about the meaning of these fundamental terms?
The confusion is, in fact, widespread. Even J.B. Rainsberger, one of the world’s finest experts on test-driven design, ran into problems with this lexical quicksand.
The term “unit test” comes from a paradigm of scope, but there is some disagreement of what that scope is. Some experts, such as Roy Osherove and Michael Feathers, have tried to impose some precision on that notion of scope. Stack Overflow users seem to agree with them. There remains some ambiguity, however, as Martin Fowler has recognized and explained – particularly as a result of the differences between “mockist” and “classicist” testing approaches. Personally, I prefer Fowler’s more inclusive definition of unit tests (tests focused on a unit which assume that collaborators work – whether you mock them out or integrate them).
Integration Test vs. Integrated Test
Then there’s the term “integration test”. There is a paradigm of scope in this name, as well – the word “integration” implies combining multiple “unit” scopes (or at least combining 1 unit scope with a tool or service external to the code, such as a database). But is every test that involves more than 1 unit an integration test? There’s another paradigm involved, which is that there’s the notion of testing the integration of units rather than the logic of the units themselves. This is where Rainsberger mis-stepped. He now rightly makes a distinction between integration tests – tests which validate the integration of different components as well as databases, the file system, etc. – and integrated tests – tests which attempt to do the work of validating units without fully isolating those units (tests which he finds to be mostly counterproductive over time because they are often too heavy to use for thorough testing of the logic of individual units, and they give a false sense of security).
This is a more finessed approach than Osherove’s, who seems to say that any test which is not a “good” unit test by his definition is an integration test.
The third notion of scope is that of acceptance tests. The scope is the whole product (end-to-end), limiting the functionality tested to 1 user story.
So we have notions of 3 different categories of tests: unit, integration and acceptance. These 3 categories are not exhaustive. If we adhere to strict definitions and use Rainsberger’s term of “integrated tests” as a fourth category there are no longer ambiguities between unit and integration tests because those 2 categories of test serve different purposes (it’s no longer a question of scope). There remains an ambiguity between the notion of unit testing and integrated testing – over the notion of the boundaries of a unit (for example, testing a root aggregate may involve several non-mocked classes, but I would argue that it’s a unit test because the root aggregate and its aggregate elements form a single unit). Since Rainsberger has labeled integrated tests as often harmful (“a scam” in his words), defining this boundary can give a notion of the quality of testing.
As an aside, I find that sometimes a test can start as a unit test and then become an integrated test because of a useful refactoring. Enforcing SRP can have this effect. You have a classicist-style test for a method that eventually does too much. So you delegate part of the method’s implementation to another class. Do you also need to rewrite the test to mock out the new delegate, even though the test passes as-is? Probably the answer is to keep the original test unchanged and add additional focused unit tests to the delegate. I suppose Osherove would say in this case that either the original test’s definition of a unit was too broad (so for him, it’s an integration test from the start), or else the refactoring merely made the unit span multiple classes (so it remains a unit test).
More Useful Terms?
Are these terms the most useful ones? Maybe not, especially given the confusion they engender.
Rainsberger prefers to promote the idea of collaboration tests and contract tests. Collaboration tests are mockist-style tests where a test verifies the interactions between the method under test and its collaborators (which are mocked or otherwise replaced by test doubles). Contract tests verify that a method, given certain inputs, produces a certain output (or possibly an internal state change that can be verified). Generally, the contract test applies to an interface or abstract class, (that which was mocked out in the collaboration tests) and can be applied to any concrete subtype of the abstract type under test. The 2 categories are complementary: You test collaborations to verify high-level services and then you use contract tests to verify in detail the functionality that was mocked out in the collaboration tests. I believe that some contract tests can (and often should) be integrated tests – for example focused tests that touch a database to verify low-level data model logic (which goes beyond just checking that the database is integrated). This is a pattern which I came to use spontaneously in my exploration of the hexagonal architecture.
Rainsberger also prefers the term microtests over unit tests. The term does seem to give a clearer notion of scope than “unit”.
In Fowler’s article, we see 2 other interesting categorizations.
There’s Jay Fields’s distinction between “solitary” and “sociable” tests. In solitary tests, there are no collaborators which are not replaced by test doubles. Sociable tests generally involve at least one real collaborator.
Fowler also distinguishes between 2 test suites: a “compile” suite (run on every build) and a “commit” suite (run before a commit, or in continuous integration). This is a useful idea for TDD, and the categorization is less ambiguous: if it runs fast, it’s in “compile”, otherwise it’s in “commit”. The only ambiguity is in the speed threshold separating the 2 categories, and that’s more a question of practicality and personal or team preference – there isn’t a “right answer”.
There’s are also catch-all terms if you’re not sure which type of test you have:
- developer tests – this term is obviously about who is responsible for creating and maintaining the test rather than the purpose or scope of the test.
- automated tests – this term is obviously about how the tests are performed, so it would include developer tests and, for example, selenium tests created by Q.A.
A picture’s worth 1000 words
Using Martin Fowler’s more inclusive view of “unit tests”:
The colored area seems to be a source of some confusion – if you have a collaborating or delegate class that is not mocked out it can still be considered a unit test. Feathers does not address this area in his definition. Osherove does: “A unit of work can span a single method, a whole class or multiple classes…” So a unit test can be an integrated test. Whether it should is a different question based on notions of good design and the actual execution speed of the test.
London Mockists to the Rescue?
In “Growing Object-Oriented Software, Guided by Tests”, Steve Freeman and Nat Pryce manage, using simple questions, to define the major test scopes without getting hung up on details of things like unit boundaries:
- Acceptance: Does the whole system work?
- Integration: Does our code work against code we can’t change?
- Unit: Do our objects do the right thing, are they convenient to work with?
That seems like enough of a definition to do some useful tests.
In conclusion, don’t lose sleep over these definitions or let them stop you from doing tests. There exist terms which are clear and precise enough to describe the good practices you need, and we can live with some gray areas where the most commonly used terms are concerned. Just be careful when you’re discussing tests to make sure that everyone in the discussion has the same understanding of the terms.
I just discovered another interesting approach to this topic, by Simon Brown.