Tag: TDD

Acronyms in the Software Craftsman’s toolbox


Improved matching error messages in Extended-Mockito

I’ve recently made some improvements to Extended-Mockito in the area of failure messages.

In early versions, messages from a failure to match in a verify() call were a bit cryptic, especially when matching based on lambdas. This is what you used to get:

Wanted but not invoked:
exampleService.doAThingWithSomeParameters(
    <ExtendedMatchers$$Lambda$6/150268540>,
    <ExtendedMatchers$$Lambda$8/361571968>,
    <custom argument matcher>,
    <ExtendedMatchers$$Lambda$11/210506412>
);

In the first of my examples with the new improvements (available starting in version 2.0.78-beta.1 of extended-mockito, and transitively in version 0.9.0 of tdd-mixins-core and tdd-mixins-junit4), it’s now possible to show more clearly what kind of arguments were expected:

Wanted but not invoked:
exampleService.doAThingWithSomeParameters(
{String containing all of: [Butcher,Baker,Candlestick Maker]},
[All items matching the given Predicate],
[One or more items matching the given Predicate],
SomeBean where val1 > 5
);

For info, the expectation call which gives this failure message is:

verify(expecting).doAThingWithSomeParameters(this.containsAllOf("Butcher", "Baker", "Candlestick Maker"),
				allSetItemsMatch(s -> s.startsWith("A")),
				oneOrMoreListItemsMatch(s -> s.startsWith("B")),
				objectMatches((SomeBean o) -> o.getVal1() > 5, "SomeBean where val1 > 5"));
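
The key ingredient for messages like these is a matcher which carries a human-readable description. Here is a rough sketch of the general mechanism, using plain Mockito 2’s argThat and a made-up helper name – it’s illustrative only, and not necessarily how extended-mockito does it internally:

import static org.mockito.ArgumentMatchers.argThat;

import java.util.function.Predicate;

import org.mockito.ArgumentMatcher;

public class DescribedMatchers {

    // Wraps a predicate in a matcher whose toString() supplies the text
    // shown in "Wanted but not invoked" failure messages.
    public static <T> T matching(Predicate<T> predicate, String description) {
        return argThat(new ArgumentMatcher<T>() {
            @Override
            public boolean matches(T argument) {
                return argument != null && predicate.test(argument);
            }

            @Override
            public String toString() {
                return description;
            }
        });
    }
}

A failed verification then reports the given description instead of an anonymous lambda class name.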

The TestNest Pattern

I discovered the TestNest pattern – the idea of using a suite of nested test classes in order to create a hierarchical organization of tests – in this blog post by Robert C. Martin, but apart from one other article I haven’t found much information online about this technique. So I thought I’d try it out in a sort of hello-world to get a feel for it. As a bonus, I added some tests using the excellent Zohhak test library, which automatically generates tests for sets of parameter values via an annotation.

My test class can be found here.

Here are some of the highlights…

The outer class declaration with its JUnit Suite annotations:

@RunWith(Suite.class)
@SuiteClasses({ MyExampleClassTest.Method1.class, MyExampleClassTest.Method2.class,
        MyExampleClassTest.Calculation1.class, MyExampleClassTest.Calculation2.class })
public class MyExampleClassTest {

One of the nested classes in MyExampleClassTest is called SharedState. It contains the object under test (called underTest) and is the parent of all the other nested test classes (and it also uses a mixin from tdd-mixins-junit4 which gives all its subclasses the ability to call assertions non-statically).
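
In outline, the nesting looks something like this (MyExampleClass and the test method are placeholder names, and AllAssertions stands in for the tdd-mixins-junit4 assertions mixin; the real test class is linked above):

// nested inside MyExampleClassTest
public static abstract class SharedState implements AllAssertions {
    // the object under test, shared by all the nested test classes
    protected MyExampleClass underTest = new MyExampleClass();
}

public static class Method1 extends SharedState {
    @Test
    public void should_do_the_first_thing() {
        // assertThat() is available non-statically thanks to the assertions mixin
        assertThat(underTest.method1()).isNotNull();
    }
}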

There are 4 nested classes which contain tests – one for each method in the class under test. This seemed like a logical organization, though it’s not what Uncle Bob did in his aforementioned blog post. It might make sense to further subdivide between happy path tests and weird corner case tests (what Uncle Bob called “DegenerateTests” in his example), and there may be better ways to divide tests than along class and method lines (though I would hope that my classes and methods under test are cohesive units of organization).

Using the Suite runner in the parent class doesn’t prevent you from using other runners in the nested classes. Two of the nested classes use the Zohhak test runner:

@RunWith(ZohhakRunner.class)

I intentionally put 2 failures in the tests (one fail() in a normal test and one error in a Zohhak value set) to see what the failures would look like.
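
For illustration, one of the Zohhak-based nested classes can look roughly like this – the cube method and the parameter values are my own reconstruction, including the intentionally wrong value set:

// nested inside MyExampleClassTest; @TestWith comes from com.googlecode.zohhak.api
@RunWith(ZohhakRunner.class)
public static class Calculation2 extends SharedState {

    @TestWith({ "2, 8", "3, 27", "-1, 1" }) // "-1, 1" is intentionally wrong: (-1)³ is -1
    public void should_calculate_cube(int input, int expectedCube) {
        // assertEquals() is available non-statically via the JUnit assertions mixin
        assertEquals(expectedCube, underTest.cube(input));
    }
}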

Here’s part of what Maven had to say about these failures:

Failed tests: should_fail(org.example.MyExampleClassTest$Method2): intentional failure to illustrate nested test failure messages

should_calculate_cube [-1, 1](org.example.MyExampleClassTest$Calculation2): expected:<[]1> but was:<[-]1>

Here’s what I see in Eclipse’s JUnit sub-window:

Eclipse JUnit run results showing 2 nested errors

This seems like a minor improvement over long, descriptive test method names for quickly getting a feel for where the problem is, especially when there are multiple simultaneous failures. I didn’t have to put the name of the method under test in the test method name, and in the Eclipse JUnit runner UI the tests for each method are nicely grouped together. Zohhak works well with this approach, too (and seems like a pleasure to use in general for testing calculation results).

 

Does TDD “damage” your design?

I recently came across a couple of articles that challenged some of my beliefs about best practices.

In this article, Simon Brown makes the case for components which tightly couple a service with its data-access implementation, and for testing each component as a unit rather than testing the service with mocked-out data access. Brown also cites David Heinemeier Hansson, the creator of Rails, who has written a couple of incendiary articles discouraging isolated tests and even TDD in general. Heinemeier Hansson goes so far as to suggest that TDD results in “code that is warped out of shape solely to accommodate testing objectives.” Ouch.

These are thought-provoking articles written by smart, accomplished engineers, but I disagree with them.

For those unfamiliar with the (volatile and sometimes confusing and controversial) terminology, isolated tests are tests which mock out the dependencies of the unit under test. This is done both for performance reasons (which Heinemeier Hansson calls into question) and to keep the focus on the unit (if a service calls the database and the test fails, is the problem in the service, the SQL, the database tables, or the network connection?). There’s also the question of the difficulty of setting up and maintaining tests with database dependencies. There are tools for that, but there’s a learning curve and some set-up required (which hopefully can be Dockerized to make life easier). And there’s one more very important reason which I’ll get to later…

Both Brown and Heinemeier Hansson argue against adding what they consider unnecessary layers of indirection. If your design is test-driven, the need for unit tests will nudge you to de-couple things that Brown and Heinemeier Hansson think should remain coupled. The real dilemma is: where should we put the inevitable complexity of any design? As an extreme example, to avoid all sorts of “unnecessary” code you could just put all your business logic into stored procedures in the database.

“Gang of Four” member Ralph Johnson described a paradox:

There is no theoretical reason that anything is hard to change about software. If you pick any one aspect of software then you can make it easy to change, but we don’t know how to make everything easy to change. Making something easy to change makes the overall system a little more complex, and making everything easy to change makes the entire system very complex. Complexity is what makes software hard to change. That, and duplication.

TDD, especially the “mockist” variety, nudges us to add layers of indirection to separate responsibilities cleanly. Johnson seems to be implying that doing this systematically can add unnecessary complexity to the system, making it harder to change, paradoxically undermining one of TDD’s goals.

I do not think that lots of loose coupling makes things harder to change. It does increase the number of interfaces, but it makes it easier to swap out implementations or to limit behavior changes to a single class.

And what about the complexity of the test code? Brown and Heinemeier Hansson seem to act as if reducing the complexity of the test code does not matter. Or rather, as if you don’t need to write tests for code that’s hard to test, because you should just expand the scope of the tests to do verification at the level of whole components.

Here’s where I get back to that other important reason why “isolated” tests are necessary: math. J.B. Rainsberger simply destroys the kind of arguments that Brown and Heinemeier Hansson make and their emphasis on component-level tests. He points out that there’s an explosive multiplicative effect on the number of tests needed when you test classes in combination. For an oversimplified example, if your service class has 10 execution paths and its calls to your storage class have 10 execution paths on average, then testing them together as a component may require as many as 100 tests to get full coverage, whereas testing them as separate units requires only 20 tests for the same coverage. Imagine your component has 10 interdependent classes like that… Do you have the developer bandwidth to write all those tests? If you do write them all, how easy is it to change something in your component? How many of those tests will break if you make one simple change?

So I reject the idea that TDD “damages” the design. If you think TDD would damage your design, maybe you just don’t know how bad your design is, because most of your code is not really tested.

As for Heinemeier Hansson’s contention that it’s outdated thinking to isolate tests from database access, he may be right about the performance issues (not everyone has expensive development machines with fancy SSD drives, but there should be a way to run a modest number of database tests quickly). If a class’s single responsibility is closely linked to the database, I’m in favor of unit-testing it against a real database, but any other test that hits a real database should be considered an integration test. Brown proposes a re-shaped, “architecturally-aligned” testing pyramid with fewer unit tests and more integrated component tests. Because of the aforementioned combinatorial effect of coupling classes in the component, that approach would seem to require either writing (and frequently running) a lot more tests or releasing components which are not exhaustively tested.

A Testing Toolbox for Java 8

A few months ago, I presented interface-it, a Java 8 tool to generate mixin interfaces. In case you don’t know what that means or why mixins could be useful to you, I wrote a short article which explains it.

One of the key motivations for the interface-it tool was to be able to generate mixins for the latest versions of unit-testing libraries like Mockito and AssertJ. Now you no longer have to worry about that, because I’m doing it for you. And more.

I now have several more projects to present to you.

Presenting tdd-mixins-junit4

Working backwards – if you want to have a great test fixture by adding only one dependency to your build configuration (your Maven POM, Ivy XML, Gradle build, or just a fat jar added to your classpath), use tdd-mixins-junit4. It gives you all the basics you need to do mocking and assertions with fluidity, simplicity and power.

Normally, that’s all you should need for your tests. Mockito allows you to set up collaborating objects and verify behavior, and the extensions I added make it even easier to handle cases where, for example, you want to mock behavior based on arguments passed to the mock which are generated by your unit under test. As for verifying returned results, AssertJ and JUnit assertions allow you to verify any data returned by the unit under test.

Presenting tdd-mixins-core

If you do not want to use JUnit 4 (maybe you want to use TestNG or an early version of JUnit 5), then you can use tdd-mixins-core, which has everything that tdd-mixins-junit4 has, except the mixin for JUnit assertions and JUnit itself.

Presenting extended-mockito

So these tdd-mixins libraries notably give you mixins for the aforementioned libraries Mockito and AssertJ. As for Mockito, they use my extended-mockito library, which not only provides mixins for classes like Mockito and BDDMockito, but also provides extra matcher methods to simplify specifying matching arguments for mocked methods. For example:

when(myMockSpamFilter.isSpam(containsOneOrMoreOf("Viagra", "Cialis", "Payday loan")))
            .thenReturn(Boolean.TRUE);

See the project’s home page or the unit tests for more details.

Presenting template-example

As for AssertJ, it’s already quite extended for general-purpose use, so there is no extended-assertj project. But if you want to take things further, I did create a project called template-example, which shows how, with a little tweaking, you can use a Maven plugin to auto-generate custom assertions for your own JavaBeans and combine them with the AssertJ mixin from tdd-mixins-core. These custom assertions allow you to do smooth, fluent assertions for your own data types, allowing this sort of validation call:

assertThat(employee).employer().address().postalCode().startsWith("11");

With these tools, you can more productively write unit tests with powerful assertions and mocking. They give you a fixture that you can set up in any test class by implementing an interface or two – for example:

public class MyTest implements ExtendedMockito, AllAssertions {
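
Here’s a minimal sketch of what such a test class can look like in practice. SpamFilter and MessageRouter are hypothetical types; mock(), when() and assertThat() are all called non-statically, via the mixins:

import org.junit.Test;

public class MyTest implements ExtendedMockito, AllAssertions {

    // hypothetical collaborator and unit under test
    private final SpamFilter spamFilter = mock(SpamFilter.class);
    private final MessageRouter underTest = new MessageRouter(spamFilter);

    @Test
    public void routes_spam_to_the_junk_folder() {
        when(spamFilter.isSpam(containsOneOrMoreOf("Viagra", "Payday loan"))).thenReturn(true);

        assertThat(underTest.folderFor("Payday loan - act now!")).isEqualTo("junk");
    }
}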

Not Included

What’s missing from these tools? I wanted to keep the toolset light, so there are some excellent but more specialized tools which are not included. For example, I have generated a mixin for Jsoup, which is very useful if you need to validate generated HTML, but unless I hear a clamoring for it, I will leave it out of tdd-mixins-core because it adds a dependency that lots of people may not need. The same goes for extensions to AssertJ – I generated mixins for AssertJ-DB and for AssertJ Guava (UPDATE: also added one for Awaitility), but did not include them in tdd-mixins (you can copy and paste the generated mixins’ source files if you want to use them).

Another library which is useful but which does not lend itself to mixins (because it uses annotations rather than static calls) is Zohhak; it simplifies testing methods which return results that depend on a wide variety of possible input values (such as mathematical calculations or business rules).

API is Legacy

Now that I’ve released version 1.1 of interface-it, I understand better why so many projects decide to do a big 2.0 release which includes breaking changes to APIs.

Once you’ve decided to support backwards compatibility for the entire exposed API (normally all public and protected methods in the public classes), minor releases run into a lot of technical debt, because you can’t refactor everything you want to. TDD doesn’t save you from this.

With interface-it, I made a few assumptions about the API which changed very quickly when I discovered new requirements. For example, before the 1.0 release, I thought about encapsulating some of the arguments to key methods, but I didn’t see clear affiliations or behavior related to these arguments. It was just data needed for generating a mixin. For version 1.1, where you can generate a hierarchy of mixin types (just parent-child via the command-line interface, but the code API is more flexible), it was no longer data but behavior that I needed, because I had to manage multiple classes in the same call. Despite having excellent code coverage thanks to TDD, the change was a bit painful to implement, and I had to leave in place some methods which I would have preferred to delete.

TDD is our friend, but…

“Animals are our friends!!”, comedian Bob Goldthwait used to shout, adding “But they won’t lend you money.”

One famous definition of legacy code (from Michael Feathers) is “code without tests”. I would extend that definition to say that it’s code which does not have tests for the behavior you want to have. TDD will get you code that is well-tested for the behavior you want when you develop it. If your code does anything innovative, some of the behavior you want will almost certainly change over time.

So TDD is our friend, but it won’t eliminate all legacy code. What it will do is make it easier to know when we accidentally change behavior. It doesn’t necessarily make it easy to change our code’s behavior either, but it gives us the confidence to tackle the change, because we know that tests will catch any little missteps and let us fix them before they become more difficult and time-consuming to find and fix. And, importantly, it constantly nudges us to design our code in a way which allows it to be refactored with less trouble and risk.

Finding Fun

It’s hard to master software development if you don’t enjoy doing it, so you’ve got to find fun. There’s a lot of fun to be found in software development (unless you’re one of those people who really can’t program).

We should be thankful that our job allows us to have fun. There are plenty of jobs where it’s a lot more difficult to find fun.

So how to find fun?

Learn

Discovery and mastery are fun. If what you discover and master are relevant to your work, your employer will probably accept that at least a small part of your time is spent on learning. It’s in your employer’s interest to have developers who are up to speed. At a minimum you should be able to organize a “brown bag” session to learn during lunch time.

Share learning

Brown bag lunches are also a fun opportunity to share what you’ve learned with others. So is pair programming.

Gamify

Challenging yourself makes things more fun. Challenging others adds even more fun.

Doing TDD lets you create little challenges for yourself all the time. You can also challenge yourself to move the needle on certain code metrics (test coverage, cyclomatic complexity, PMD/FindBugs issue count, etc.), within reason (choose metrics which actually add value to your work). Following the Boy Scout Rule, you can challenge yourself to turn a big ugly method into small, beautiful methods.

One suggestion I heard recently to make pair programming more fun is to play “TDD pong”. One member of the pair writes a test and then challenges the other to make the test pass. Then the roles switch.

I also heard recently from a scrum master who invented a role-playing system (using Star Wars characters to bridge the gap between older and younger developers) where each developer is assigned a certain character for a sprint. Each character gets points for doing a different thing, such as writing “perfect” unit tests, facilitating communication between developers, enhancing code performance, etc. Whoever has the most points at the end of the sprint gets a small prize. The idea is to get less experienced developers to adopt good practices by having them focus on one practice at a time.

Take micro-breaks

When it’s not fun, stop for a minute. Even when it is fun, stop for a minute now and then. Your eyes and wrists will thank you, and things will stay fun longer.

Laugh

We can laugh at what we do, our processes, our teams, our way of thinking.

We live in a silly world. A generation ago, if you walked down the street alone while having a conversation, people would think you had some form of schizophrenia. Now they just think you have some form of Bluetooth device. It’s best not to take things too seriously.

Laughing at your test data:
Be careful: some of your test data might turn up in a commercial presentation which could make or break your employer’s reputation, and you don’t want to waste too much time on non-essential tasks. But you can still make your test data more fun.

For example, if you have a test database which needs a list of video game titles, you could enter “game1”, “game2”, “game3”, etc.  Or you could enter “Accountant’s Creed”, “Angry Nerds”, “Bassoon Hero”, “Mimecraft”, “Unreal Fantasy”, “Shower Defense”, “Handy Crutch Saga”, “Grand Theft Lotto”, “Call of Booty”, “Resident E-mail”, etc. Your choice.

Create

We’re fortunate enough to do work where we get to create things with our minds. Don’t be afraid to break out your metaphorical “box of crayons”.

 

Legacy Lexicon? Naming different types of tests

At a recent software craftsmanship meetup I attended, there was an hour-long group discussion of the definitions of the terms “unit test”, “integration test” and “acceptance test”. How can it be, two decades after Kent Beck started developing and using the automated test software that has now become xUnit, that the very people most interested in pursuing best practices in object-oriented software development still have doubts or confusion about the meaning of these fundamental terms?

The confusion is, in fact, widespread. Even J.B. Rainsberger, one of the world’s finest experts on test-driven design, ran into problems with this lexical quicksand.

Unit Test

The term “unit test” comes from a paradigm of scope, but there is some disagreement about what that scope is. Some experts, such as Roy Osherove and Michael Feathers, have tried to impose some precision on that notion of scope. Stack Overflow users seem to agree with them. There remains some ambiguity, however, as Martin Fowler has recognized and explained – particularly as a result of the differences between “mockist” and “classicist” testing approaches. Personally, I prefer Fowler’s more inclusive definition of unit tests (tests focused on a unit which assume that collaborators work – whether you mock them out or integrate them).

Integration Test vs. Integrated Test

Then there’s the term “integration test”. There is a paradigm of scope in this name as well – the word “integration” implies combining multiple “unit” scopes (or at least combining one unit scope with a tool or service external to the code, such as a database). But is every test that involves more than one unit an integration test? There’s another paradigm involved: the notion of testing the integration of units rather than the logic of the units themselves. This is where Rainsberger mis-stepped. He now rightly makes a distinction between integration tests – tests which validate the integration of different components as well as databases, the file system, etc. – and integrated tests – tests which attempt to do the work of validating units without fully isolating those units (tests which he finds to be mostly counterproductive over time, because they are often too heavy to use for thorough testing of the logic of individual units, and they give a false sense of security).

This is a more finessed approach than Osherove’s; he seems to say that any test which is not a “good” unit test by his definition is an integration test.

Acceptance Test

The third notion of scope is that of acceptance tests. The scope is the whole product (end-to-end), with the functionality tested limited to a single user story.

Boundaries

So we have notions of 3 different categories of tests: unit, integration and acceptance. These 3 categories are not exhaustive. If we adhere to strict definitions and use Rainsberger’s term “integrated tests” as a fourth category, there are no longer ambiguities between unit and integration tests, because those 2 categories of test serve different purposes (it’s no longer a question of scope). There remains an ambiguity between the notions of unit testing and integrated testing – over the boundaries of a unit (for example, testing a root aggregate may involve several non-mocked classes, but I would argue that it’s a unit test because the root aggregate and its aggregate elements form a single unit). Since Rainsberger has labeled integrated tests as often harmful (“a scam” in his words), defining this boundary can give a notion of the quality of testing.

As an aside, I find that sometimes a test can start as a unit test and then become an integrated test because of a useful refactoring. Enforcing the Single Responsibility Principle can have this effect: you have a classicist-style test for a method that eventually does too much, so you delegate part of the method’s implementation to another class (as in the sketch below). Do you also need to rewrite the test to mock out the new delegate, even though the test passes as-is? Probably the answer is to keep the original test unchanged and add additional, focused unit tests for the delegate. I suppose Osherove would say in this case that either the original test’s definition of a unit was too broad (so for him it was an integration test from the start), or else the refactoring merely made the unit span multiple classes (so it remains a unit test).
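
Here’s a contrived sketch of that situation (the invoice and tax classes are invented purely for illustration): a calculation originally inside the class under test gets extracted into a delegate, and the original classicist-style test keeps passing while quietly becoming an integrated test.

import static org.assertj.core.api.Assertions.assertThat;

import java.math.BigDecimal;

import org.junit.Test;

class TaxCalculator {
    BigDecimal taxFor(BigDecimal netAmount) {
        return netAmount.multiply(new BigDecimal("0.20")); // 20% rate, purely illustrative
    }
}

class InvoiceService {
    private final TaxCalculator taxCalculator = new TaxCalculator(); // extracted delegate, not mocked

    BigDecimal total(BigDecimal netAmount) {
        return netAmount.add(taxCalculator.taxFor(netAmount));
    }
}

public class InvoiceServiceTest {

    // Written before the extraction and still passing after it - but it now
    // exercises two classes, so strictly speaking it has become an integrated test.
    @Test
    public void total_includes_tax() {
        assertThat(new InvoiceService().total(new BigDecimal("100"))).isEqualByComparingTo("120");
    }
}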

More Useful Terms?

Are these terms the most useful ones? Maybe not, especially given the confusion they engender.

Rainsberger prefers to promote the idea of collaboration tests and contract tests. Collaboration tests are mockist-style tests where a test verifies the interactions between the method under test and its collaborators (which are mocked or otherwise replaced by test doubles). Contract tests verify that a method, given certain inputs, produces a certain output (or possibly an internal state change that can be verified). Generally, a contract test applies to an interface or abstract class (the thing that was mocked out in the collaboration tests) and can be applied to any concrete subtype of the abstract type under test. The 2 categories are complementary: you test collaborations to verify high-level services, and then you use contract tests to verify in detail the functionality that was mocked out in the collaboration tests. I believe that some contract tests can (and often should) be integrated tests – for example, focused tests that touch a database to verify low-level data model logic (which goes beyond just checking that the database is integrated). This is a pattern which I came to use spontaneously in my exploration of the hexagonal architecture.
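
To make the two categories concrete, here’s a hedged sketch with invented type names: a collaboration test which mocks out the collaborator, and an abstract contract test which any real implementation of that collaborator should extend and pass.

import static org.assertj.core.api.Assertions.assertThat;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import org.junit.Test;

interface StockChecker {
    boolean inStock(String sku);
}

class OrderService {
    private final StockChecker stockChecker;

    OrderService(StockChecker stockChecker) {
        this.stockChecker = stockChecker;
    }

    String accept(String sku) {
        return stockChecker.inStock(sku) ? "ACCEPTED" : "BACKORDER";
    }
}

public class OrderServiceCollaborationTest {

    // Collaboration test: verifies the interaction with the mocked-out collaborator.
    @Test
    public void accepts_an_order_when_the_item_is_in_stock() {
        StockChecker stockChecker = mock(StockChecker.class);
        when(stockChecker.inStock("SKU-42")).thenReturn(true);

        assertThat(new OrderService(stockChecker).accept("SKU-42")).isEqualTo("ACCEPTED");
        verify(stockChecker).inStock("SKU-42");
    }
}

// Contract test: every real StockChecker (in-memory, database-backed, etc.) must honour
// what the mock promised above; a database-backed subclass would be an integrated test.
abstract class StockCheckerContractTest {

    protected abstract StockChecker newCheckerWithItemInStock(String sku);

    @Test
    public void reports_a_stocked_item_as_in_stock() {
        assertThat(newCheckerWithItemInStock("SKU-42").inStock("SKU-42")).isTrue();
    }
}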

Rainsberger also prefers the term microtests over unit tests. The term does seem to give a clearer notion of scope than “unit”.

In Fowler’s article, we see 2 other interesting categorizations.

There’s Jay Fields’s distinction between “solitary” and “sociable” tests. In solitary tests, there are no collaborators which are not replaced by test doubles. Sociable tests generally involve at least one real collaborator.

Fowler also distinguishes between 2 test suites: a “compile” suite (run on every build) and a “commit” suite (run before a commit, or in continuous integration). This is a useful idea for TDD, and the categorization is less ambiguous: if it runs fast, it’s in “compile”, otherwise it’s in “commit”. The only ambiguity is in the speed threshold separating the 2 categories, and that’s more a question of practicality and personal or team preference – there isn’t a “right answer”.
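
As an illustration only (Fowler doesn’t prescribe a mechanism), one way to realize this split in JUnit 4 is to mark the slow tests with a category and define two suites:

import org.junit.experimental.categories.Categories;
import org.junit.experimental.categories.Categories.ExcludeCategory;
import org.junit.runner.RunWith;
import org.junit.runners.Suite;
import org.junit.runners.Suite.SuiteClasses;

// Marker interface; slow tests are annotated with @Category(SlowTest.class).
interface SlowTest {
}

// "Compile" suite, run on every build: everything except the slow tests.
@RunWith(Categories.class)
@ExcludeCategory(SlowTest.class)
@SuiteClasses({ /* list the test classes here */ })
class CompileSuite {
}

// "Commit" suite, run before a commit or in continuous integration: the whole set.
@RunWith(Suite.class)
@SuiteClasses({ /* list the test classes here */ })
class CommitSuite {
}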

Catch-all terms

There are also catch-all terms for when you’re not sure which type of test you have:

  • developer tests – this term is obviously about who is responsible for creating and maintaining the test, rather than the purpose or scope of the test.
  • automated tests – this term is obviously about how the tests are performed, so it would include developer tests and, for example, Selenium tests created by QA.

A picture’s worth 1000 words

Using Martin Fowler’s more inclusive view of “unit tests”:

[Diagram: TestLexiconFowler – test categories according to Fowler’s more inclusive definition of unit tests]

The colored area seems to be a source of some confusion – if you have a collaborating or delegate class that is not mocked out, the test can still be considered a unit test. Feathers does not address this area in his definition. Osherove does: “A unit of work can span a single method, a whole class or multiple classes…” So a unit test can be an integrated test. Whether it should be is a different question, based on notions of good design and the actual execution speed of the test.

London Mockists to the Rescue?

In “Growing Object-Oriented Software, Guided by Tests”, Steve Freeman and Nat Pryce manage, using simple questions, to define the major test scopes without getting hung up on details of things like unit boundaries:

  • Acceptance: Does the whole system work?
  • Integration: Does our code work against code we can’t change?
  • Unit: Do our objects do the right thing, are they convenient to work with?

That seems like enough of a definition to do some useful tests.

Conclusion

In conclusion, don’t lose sleep over these definitions or let them stop you from writing tests. There are terms which are clear and precise enough to describe the good practices you need, and we can live with some gray areas where the most commonly used terms are concerned. Just be careful when you’re discussing tests to make sure that everyone in the discussion has the same understanding of the terms.

 

UPDATE:

I just discovered another interesting approach to this topic, by Simon Brown.