Month: January 2016

Gentle Introduction: Mockito

From time to time I will write an article aimed at introducing a tool or a concept to newbies, under the category “Gentle Introduction”. Since development technologies evolve quickly, you’re surely a newbie at something even if you’re a brilliant, accomplished developer.

Here I present the mock object framework Mockito.

Why do you need it?

For most of your unit tests (unit tests which are not integration tests – I’m still struggling to find exactly the right term to make this clear), you need to isolate the unit under test from everything external. That way, for example, your suite of 500 tests can verify the logic of your code in 1 minute instead of wasting 20 minutes on database calls that are unnecessary for testing the logic. You could do this by writing a stub class for every collaborator, but that’s an unwieldy approach, and you may find yourself spending more and more time maintaining a bunch of stubs as APIs evolve. A test double library like Mockito saves time and helps you write clearer, more focused tests which run fast and are easier to maintain.
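To make the cost of the hand-written approach concrete, here is a minimal sketch of a stub for a storage collaborator, the kind of class Mockito lets you skip. The TodoItem, TodoStorage and UnknownIdException types are hypothetical stand-ins, loosely modelled on the TODO list example further down; the real definitions are in the pastebin file and may differ.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical types, loosely modelled on the TODO list example.
class UnknownIdException extends Exception {
    UnknownIdException(long id) {
        super("Unknown id: " + id);
    }
}

class TodoItem {
    final long id;
    final String title;
    final LocalDateTime deadline;

    TodoItem(long id, String title, LocalDateTime deadline) {
        this.id = id;
        this.title = title;
        this.deadline = deadline;
    }
}

interface TodoStorage {
    TodoItem getById(long id) throws UnknownIdException;

    void updateDeadline(long id, LocalDateTime deadline);
}

// A hand-written stub: it returns one canned item and records every
// updateDeadline() call so that a test can inspect it afterwards.
// Every method added to TodoStorage forces a change to this class.
class StubTodoStorage implements TodoStorage {
    private final TodoItem cannedItem;
    final List<LocalDateTime> recordedDeadlines = new ArrayList<>();

    StubTodoStorage(TodoItem cannedItem) {
        this.cannedItem = cannedItem;
    }

    @Override
    public TodoItem getById(long id) throws UnknownIdException {
        if (cannedItem == null || cannedItem.id != id) {
            throw new UnknownIdException(id);
        }
        return cannedItem;
    }

    @Override
    public void updateDeadline(long id, LocalDateTime deadline) {
        recordedDeadlines.add(deadline);
    }
}
```

With Mockito, both the canned return value and the recording of calls come for free from Mockito.mock(TodoStorage.class), so none of this class needs to be maintained.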

Why I chose Mockito

I used to use JMock as a mock objects framework, but I prefer Mockito because of its clearer syntax, its clear separation of stubbing and verification functionalities, and the fact that by default it ignores interactions which are not explicitly stubbed or verified (whereas with JMock, adding a new interaction to the code under test will almost always break your test).

What you can do with Mockito

The Mockito javadoc provides a lot of good introductory examples, but here’s a basic example to show what Mockito can do. It’s a piece of a TODO list service. The full example code (which runs with JUnit 4) is available in a pastebin file. Below is a test of a service method that delays the deadline of a TODO item by 1 day. Using Mockito.when(), we make the call that retrieves the item from storage return a known value (this is known as stubbing), and using Mockito.verify(), we check that the new deadline is correctly passed to storage:

	
@Test
public void can_delay_by_one_day() throws UnknownIdException {
	// Given:
	LocalDateTime originalDeadline = LocalDateTime.of(2016, Month.MARCH,
			20, 8, 00);
	// Stub the call to getById to return a known value
	Mockito.when(storage.getById(ITEM_ID)).thenReturn(
			new TodoItem(ITEM_ID, "Blog article", originalDeadline));

	// When:
	this.underTest.delayByOneDay(ITEM_ID);

	// Then:
	// Verify that the storage interface is called with the correct
	// arguments
	Mockito.verify(this.storage).updateDeadline(ITEM_ID,
			LocalDateTime.of(2016, Month.MARCH, 21, 8, 00));
}

If you use static imports (of LocalDateTime.of, Month.MARCH and Mockito.verify), you’ll get code that looks more like:

verify(storage).updateDeadline(ITEM_ID, of(2016, MARCH, 21, 8, 00));

Here’s an example where we stub the storage retrieval method to throw an exception to verify that it’s not caught and swallowed by the service method:

	
@Test(expected = UnknownIdException.class)
public void should_pass_on_unknown_id_exception() throws UnknownIdException {
	// Given:
	Mockito.when(storage.getById(ITEM_ID)).thenThrow(
			new UnknownIdException(3L));
	
	// When:
	this.underTest.delayByOneDay(ITEM_ID);

	// Then: expect exception
}

More stuff you should know

If you understood everything up to here (stubbing to return a value, stubbing to throw an exception, and verifying a call with specific arguments), you know more than half of what you need to know about Mockito to get a lot of value from it. It’s also important to know about matchers. If you have to verify a call but either don’t know (or don’t care about) an argument that will be passed, or don’t have access to it, you can use matchers to cover the case you need. If you use a matcher for one argument of a call, you must use matchers for all the arguments (wrapping fixed values in eq()). The Matchers class (as well as its subclass Mockito) contains a bunch of standard matchers like eq() and anyObject() for simple cases, and argThat(), which lets you plug in a more complex custom matcher.

You should know that stubbing void methods is a bit different: a call to a void method can’t be passed as an argument to when() (it won’t compile). Instead of calling

Mockito.when(myMock.voidMethod(arg0))...

you need to call:

Mockito.doAnswer(...).when(myMock).voidMethod(arg0);

There’s a nice explanation of this on Stack Overflow.

You also need to know about Mockito.verifyNoMoreInteractions(), in case you want the test to fail when there’s an unforeseen interaction with a particular mocked collaborator. There are also argument capturing (extracting arguments passed to mocked calls for later comparison or other analysis, using ArgumentCaptor) and spies (partial mocking of real objects), which can be useful in certain cases.

Also, if you prefer a BDD-style approach to tests and it bothers you to call when() in your “given” section, the class BDDMockito is for you. It provides given() in place of when(), and then(mock).should() in place of verify(mock).

Caveats and Limits

One caveat: I do not recommend using the auto-wiring annotations available with Mockito. There are known limitations to the @InjectMocks annotation (as described here and here), which create some confusion and encourage questionable design practices. It’s better to create and pass around mock objects explicitly: it makes your dependencies more visible, which is good design. Using @Mock (an annotation that marks a field to be initialized with a mock) might save you a few keystrokes if you have a lot of mocks in one class, but you must either use Mockito’s test runner (JUnit allows only 1 test runner per class, and I think some other runners are more useful than Mockito’s) or call MockitoAnnotations.initMocks(this) in your setup() method. So unless you have a lot of mocks in one test class (which might be a code smell), I don’t think it’s worth it.

Mockito is based on a low-level proxy library (cglib or ByteBuddy, depending on the version of Mockito), which means that there are some limits to what it can do, but those limits generally encourage good design. For example, Mockito, unlike PowerMock or JMockit, cannot mock static methods. If you’re developing new code, this is a good thing. For legacy code, my goal would be to get to the point where the test code works with Mockito only, even if I have to use PowerMock to test things before refactoring. Mockito can’t mock final classes or methods, either, but that’s another limitation which can be overcome by improving your design.
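As a sketch of the kind of design improvement meant here, assuming a hypothetical DeadlineChecker class: code that calls the static LocalDateTime.now() directly gives Mockito nothing to replace, whereas injecting a java.time.Clock makes the time dependency explicit and lets a test pass in a fixed clock.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical example. Hard to test: the static call to
// LocalDateTime.now() can't be stubbed with Mockito.
class HardToTestDeadlineChecker {
    boolean isOverdue(LocalDateTime deadline) {
        return LocalDateTime.now().isAfter(deadline);
    }
}

// Easier to test: the time source is injected, so a test can pass
// Clock.fixed(...) and production code can pass Clock.systemDefaultZone().
class DeadlineChecker {
    private final Clock clock;

    DeadlineChecker(Clock clock) {
        this.clock = clock;
    }

    boolean isOverdue(LocalDateTime deadline) {
        return LocalDateTime.now(clock).isAfter(deadline);
    }
}
```

For example, with Clock.fixed(Instant.parse("2016-03-21T09:00:00Z"), ZoneOffset.UTC), a deadline of 8:00 that day is overdue and one of 10:00 is not, on every run.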

Mockito can mock concrete classes, but normally you should mostly be mocking interfaces. If you follow the dependency inversion principle religiously, you can test the logic in your classes by mocking only interfaces and maybe some abstract classes. You can mock concrete classes you don’t own, but it’s generally better to wrap them in an abstraction with an interface that you do own and can mock instead.
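A minimal sketch of that wrapping, using java.util.Random as the concrete class you don’t own; RandomSource, JavaUtilRandomSource and DiceGame are hypothetical names chosen for illustration.

```java
import java.util.Random;

// An interface we own, exposing only what our code needs.
interface RandomSource {
    int nextInt(int bound);
}

// Thin adapter over java.util.Random, a concrete class we don't own.
// It contains no logic worth unit-testing on its own.
class JavaUtilRandomSource implements RandomSource {
    private final Random random;

    JavaUtilRandomSource(Random random) {
        this.random = random;
    }

    @Override
    public int nextInt(int bound) {
        return random.nextInt(bound);
    }
}

// The code under test depends only on the interface, so a test can use
// a lambda, a hand-written stub, or Mockito.mock(RandomSource.class).
class DiceGame {
    private final RandomSource randomSource;

    DiceGame(RandomSource randomSource) {
        this.randomSource = randomSource;
    }

    // Returns a die roll between 1 and 6 inclusive.
    int roll() {
        return randomSource.nextInt(6) + 1;
    }
}
```

Because RandomSource has a single method, a test can even rig the dice with a lambda such as bound -> bound - 1, which always makes roll() return 6.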

I tested the example code above with Mockito versions 2.0.2-beta and 2.0.36-beta. Note that 2.0.36-beta has some breaking changes vis-à-vis earlier versions where matchers are concerned, though the workarounds for these are pretty easy. The example code here runs on both versions without any changes.

How to get Mockito

To try Mockito, you can download the 2.0.2-beta mockito-all fat jar from bintray.com, though you’d be better off using a tool like Maven (or Gradle or Ivy) to download it automatically. For the latest beta (currently 2.0.40), there is no fat jar: you will need mockito-core and its separate dependency jars, so a dependency manager like Maven is even more useful.


Fear of Streams

There’s an article in the new issue of the French magazine Programmez which got my attention. It chronicles the efforts of some programmers for a B2C site to investigate converting their old for loops into more readable and maintainable Java 8 streams. The article is kind of a mess, actually, but for a software craftsman or craftswoman who works on Java legacy code, the message is a bit frightening. The programmers did performance tests of the methods they changed, but because some of the more complex modified methods took twice as long to execute as before, their employer is not convinced and does not want them to proceed with the refactoring.
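For readers who haven’t seen this kind of refactoring, here is a hypothetical example (not taken from the Programmez article) of the same filter-and-transform logic written as a classic for loop and as a Java 8 stream. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class LoopToStream {
    // Classic loop: explicit iteration and a mutable result list.
    static List<String> nonEmptyUpperCaseLoop(List<String> words) {
        List<String> result = new ArrayList<>();
        for (String word : words) {
            if (!word.isEmpty()) {
                result.add(word.toUpperCase(Locale.ROOT));
            }
        }
        return result;
    }

    // Java 8 stream: the same steps expressed as a filter/map pipeline.
    static List<String> nonEmptyUpperCaseStream(List<String> words) {
        return words.stream()
                .filter(word -> !word.isEmpty())
                .map(word -> word.toUpperCase(Locale.ROOT))
                .collect(Collectors.toList());
    }
}
```

Checking that both versions return equal results on the same inputs is a cheap way to confirm that a conversion didn’t change correctness; it says nothing about end-to-end performance.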

If you thought that it was difficult to sell your product owner on the idea of adding days to the schedule to clean up legacy code, imagine having to reveal that the clean-up also involves the risk of performance issues!

Not all of their changes slowed down execution – some even sped it up a bit. What troubled me about the article, though, is that I could not predict, looking at the examples they gave of their modifications, which ones would run faster and which ones would run slower than before.

I turned to Stack Overflow to see if somebody knows how to do that in general. Is there any way, in a code review, for example, to tell just by looking at the code whether a stream will run significantly slower (a constant-factor slowdown, but one large enough to annoy a user or kill a refactoring initiative)? Of course, it’s possible to code stupidly in a way that will obviously cause a slowdown. But, apart from some useful guidance about when to use parallel streams and when not to, the message I got from some smart, informed developers is that you can’t, or else that it’s not worth the trouble for what amounts to, at worst, a constant-factor slowdown. Of course you should review code changes to make sure they don’t affect the correctness or the complexity of the algorithm, and maybe you can tell whether going parallel would help or hurt performance, but that’s all you can usefully detect.

I think the biggest mistake that the developers in the Programmez article made was to evaluate performance based on isolated unit tests (microtests). The performance of a single method is not interesting to the end-user. A constant-factor decrease in the performance of a method will frighten a product owner, but the end-user may not even notice it. For example, doubling the execution time of a loop might slow down a web request by only 0.001%, because the loop processes the results of a heavy database query that dominates the response time.

If the code you work on is for a financial trading application where every nanosecond counts, then you’re not going to be using Java streams at all – you likely have customized collections that run super-fast with no syntactic frills.  For the rest of us, we need to do automated end-to-end performance tests for the scenarios where performance might be an issue.

If you convert for loops to streams, don’t convert everything on the same day. Space out these kinds of changes enough that end-to-end tests will catch only one performance issue at a time.  Then you can reassure the product owner that there will be no “noticeable” decrease in performance from the refactor, but that the code will be more readable and understandable, and thus easier to change and less bug-prone.  For certain iterations which can be parallelized efficiently, there may even be noticeable performance gains.

The String Thing: Java String Concatenation Performance

I’m researching an upcoming performance-related article, and it occurred to me that I had always just accepted it as given that concatenating Strings using the plus operator in a loop is bad. First I used StringBuffer instead (back in the day), and then StringBuilder, because StringBuffer is synchronized. Occasionally I have also used Guava’s Joiner (which has nice, clean syntax, but involves a 3rd-party dependency). Now Java 8 offers a few new approaches to String concatenation, the most elegant of which is String.join().

So, rather than rest on received wisdom, I thought I’d put these different approaches to the test using JMH benchmarks.  The source code for my benchmarks is here. It concatenates a list of 20000 Strings into one underscore-delimited String in 7 different ways.
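The benchmark source isn’t reproduced in this post, so the following is only a guess at what the ‘+’ and StringJoiner-based variants might look like; the class and method names are hypothetical. Collectors.joining() is included for comparison even though it is not one of the 7 benchmarked approaches.

```java
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.Collectors;

class JoinerVariants {
    static final char SEPARATOR = '_';

    // '+' in a loop: each += builds a brand-new String and copies every
    // character accumulated so far, so total work grows quadratically.
    static String plusInLoop(List<String> sourceList) {
        String result = "";
        boolean first = true;
        for (String s : sourceList) {
            if (!first) {
                result += SEPARATOR;
            }
            result += s;
            first = false;
        }
        return result;
    }

    // StringJoiner fed by a plain for-each loop.
    static String joinerOnForLoop(List<String> sourceList) {
        StringJoiner joiner = new StringJoiner(String.valueOf(SEPARATOR));
        for (String s : sourceList) {
            joiner.add(s);
        }
        return joiner.toString();
    }

    // StringJoiner fed from a stream; forEachOrdered handles elements one
    // at a time in encounter order, so the result matches the loop version.
    static String joinerInStreamWithForEachOrdered(List<String> sourceList) {
        StringJoiner joiner = new StringJoiner(String.valueOf(SEPARATOR));
        sourceList.stream().forEachOrdered(joiner::add);
        return joiner.toString();
    }

    // Not benchmarked here: the idiomatic stream collector.
    static String collectorsJoining(List<String> sourceList) {
        return sourceList.stream()
                .collect(Collectors.joining(String.valueOf(SEPARATOR)));
    }
}
```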

Here are the results (on an old Windows 7 laptop with 4GB of overly-solicited RAM and 2 cores, so your mileage may vary) – lower scores are faster:

Benchmark                                                 Mode  Cnt       Score       Error  Units
concatenationUsingGuavaJoiner                             avgt   20      70.193 ±     2.456  ns/op
concatenationUsingPlus                                    avgt   20  231599.954 ± 20386.920  ns/op
concatenationUsingStringBuffer                            avgt   20      58.894 ±     3.943  ns/op
concatenationUsingStringBuilder                           avgt   20      52.580 ±     1.514  ns/op
concatenationUsingStringJoin                              avgt   20      69.401 ±     5.827  ns/op
concatenationUsingStringJoinerInStreamWithForEachOrdered  avgt   20      64.491 ±     3.789  ns/op
concatenationUsingStringJoinerOnForLoop                   avgt   20      69.301 ±     6.815  ns/op

Briefly, this shows that concatenation using ‘+’ is truly awful (about 4400 times slower than the StringBuilder approach!). All other approaches are roughly equivalent in performance, though StringBuilder runs about 30% faster than the syntactically superior approaches. The biggest surprise for me was how little difference there is between StringBuffer and StringBuilder (StringBuilder is about 12% faster). I was also a bit surprised that the compiler/JVM doesn’t do a better job of optimizing ‘+’-based concatenation in a loop, given that this has been a known issue since the 1990s (it turns out the received wisdom of old still applies).
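To make these ratios easy to check, this small sketch recomputes them from the average scores in the table (the constants are copied from the results; the class and method names are hypothetical). A ratio of about 1.3 means that an approach takes about 1.3 times as long as StringBuilder.

```java
// Average scores copied from the JMH results above (ns/op, lower is faster).
class RatioCheck {
    static final double STRING_BUILDER = 52.580;
    static final double STRING_BUFFER = 58.894;
    static final double STRING_JOIN = 69.401;
    static final double GUAVA_JOINER = 70.193;
    static final double PLUS = 231599.954;

    // How many times as long an approach takes as StringBuilder.
    static double timesAsLongAsBuilder(double score) {
        return score / STRING_BUILDER;
    }

    static void printRatios() {
        System.out.printf("plus:          %.0f%n", timesAsLongAsBuilder(PLUS));
        System.out.printf("StringBuffer:  %.2f%n", timesAsLongAsBuilder(STRING_BUFFER));
        System.out.printf("String.join(): %.2f%n", timesAsLongAsBuilder(STRING_JOIN));
        System.out.printf("Guava Joiner:  %.2f%n", timesAsLongAsBuilder(GUAVA_JOINER));
    }
}
```

Calling printRatios() shows roughly 4405 for ‘+’, 1.12 for StringBuffer, and 1.32 and 1.33 for String.join() and Guava’s Joiner.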

So, if you must have speed at all costs, the winner is:

	@Benchmark
	public String concatenationUsingStringBuilder() {
		// The approved pre-Java-8 approach
		StringBuilder result = new StringBuilder();
		for (String s : sourceList) {
			if (result.length() > 0) {
				result.append(SEPARATOR);
			}
			result.append(s);
		}
		return result.toString();
	}

Otherwise (if having readable, maintainable code is worth more in the long run than a minor performance gain in a few parts of the code), here’s the real winner going forward:

	@Benchmark
	public String concatenationUsingStringJoin() {
		// The simplest, most concise Java-8 approach
		return String.join("" + SEPARATOR, sourceList);
	}

Addendum: Before someone adds a comment about it, let me say that, for readability with String.join(), the code would be prettier if SEPARATOR were a String constant instead of a char constant. Then you’d just have:

return String.join(SEPARATOR, sourceList);

I didn’t fix it in my benchmark because it’s a reminder that String.join() does not accept a char delimiter. Fixing it would have no effect on performance, which I can prove with the output of javap -c on my benchmark class: it shows that the compiler has converted “” + SEPARATOR into the String constant “_”, which is loaded with the ldc bytecode.
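The constant folding can also be observed without javap. In this sketch (apart from SEPARATOR, the class, field and method names are hypothetical), SEPARATOR is a static final char initialized with a literal, so “” + SEPARATOR is a compile-time constant expression. The Java Language Specification says that Strings computed by constant expressions are interned like literals, so the result is the very same object as the literal “_”.

```java
import java.util.List;

class SeparatorConstant {
    // A constant variable: static final, primitive type, literal initializer.
    static final char SEPARATOR = '_';

    // "" + SEPARATOR is a constant expression, so javac folds it into "_".
    static final String SEPARATOR_AS_STRING = "" + SEPARATOR;

    // String.join has no overload taking a char delimiter (only
    // CharSequence), hence the conversion to a String.
    static String join(List<String> parts) {
        return String.join("" + SEPARATOR, parts);
    }
}
```

Here SEPARATOR_AS_STRING == "_" is true. If SEPARATOR were not final, the concatenation would happen at run time and produce a new String object, which is one more reason to compare Strings with equals() in general.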

Legacy Lexicon? Naming different types of tests

At a recent software craftsmanship meetup I attended, there was an hour-long group discussion of the definitions of the terms “unit test”, “integration test” and “acceptance test”. How can this be, 2 decades after Kent Beck started developing and using the automated test software that has now become xUnit, that the very people who are most interested in pursuing best practices in object-oriented software development have doubts or confusion about the meaning of these fundamental terms?

The confusion is, in fact, widespread. Even J.B. Rainsberger, one of the world’s finest experts on test-driven design, ran into problems with this lexical quicksand.

Unit Test

The term “unit test” comes from a paradigm of scope, but there is some disagreement about what that scope is. Some experts, such as Roy Osherove and Michael Feathers, have tried to impose some precision on that notion of scope. Stack Overflow users seem to agree with them. There remains some ambiguity, however, as Martin Fowler has recognized and explained – particularly as a result of the differences between “mockist” and “classicist” testing approaches. Personally, I prefer Fowler’s more inclusive definition of unit tests (tests focused on a unit which assume that collaborators work – whether you mock them out or integrate them).

Integration Test vs. Integrated Test

Then there’s the term “integration test”. There is a paradigm of scope in this name, as well: the word “integration” implies combining multiple “unit” scopes (or at least combining 1 unit scope with a tool or service external to the code, such as a database). But is every test that involves more than 1 unit an integration test? Another paradigm is involved here: the notion of testing the integration of units rather than the logic of the units themselves. This is where Rainsberger misstepped. He now rightly distinguishes between integration tests – tests which validate the integration of different components as well as databases, the file system, etc. – and integrated tests – tests which attempt to do the work of validating units without fully isolating those units. He finds integrated tests mostly counterproductive over time, because they are often too heavy to use for thorough testing of the logic of individual units, and they give a false sense of security.

This is a more nuanced approach than Osherove’s; he seems to say that any test which is not a “good” unit test by his definition is an integration test.

Acceptance Test

The third notion of scope is that of acceptance tests. The scope is the whole product (end-to-end), limiting the functionality tested to 1 user story.

Boundaries

So we have notions of 3 different categories of tests: unit, integration and acceptance. These 3 categories are not exhaustive. If we adhere to strict definitions and use Rainsberger’s term “integrated tests” for a fourth category, there are no longer ambiguities between unit and integration tests, because those 2 categories of test serve different purposes (it’s no longer a question of scope). There remains an ambiguity between unit testing and integrated testing, concerning the boundaries of a unit (for example, testing a root aggregate may involve several non-mocked classes, but I would argue that it’s a unit test because the root aggregate and its aggregate elements form a single unit). Since Rainsberger has labeled integrated tests as often harmful (“a scam” in his words), defining this boundary can give a notion of the quality of testing.

As an aside, I find that sometimes a test can start as a unit test and then become an integrated test because of a useful refactoring. Enforcing SRP can have this effect. You have a classicist-style test for a method that eventually does too much. So you delegate part of the method’s implementation to another class.  Do you also need to rewrite the test to mock out the new delegate, even though the test passes as-is? Probably the answer is to keep the original test unchanged and add additional focused unit tests to the delegate. I suppose Osherove would say in this case that either the original test’s definition of a unit was too broad (so for him, it’s an integration test from the start), or else the refactoring merely made the unit span multiple classes (so it remains a unit test).

More Useful Terms?

Are these terms the most useful ones? Maybe not, especially given the confusion they engender.

Rainsberger prefers to promote the idea of collaboration tests and contract tests. Collaboration tests are mockist-style tests where a test verifies the interactions between the method under test and its collaborators (which are mocked or otherwise replaced by test doubles). Contract tests verify that a method, given certain inputs, produces a certain output (or possibly an internal state change that can be verified). Generally, a contract test applies to an interface or abstract class (the type that was mocked out in the collaboration tests) and can be applied to any concrete subtype of that type. The 2 categories are complementary: you test collaborations to verify high-level services, and then you use contract tests to verify in detail the functionality that was mocked out in the collaboration tests. I believe that some contract tests can (and often should) be integrated tests – for example, focused tests that touch a database to verify low-level data model logic (which goes beyond just checking that the database is integrated). This is a pattern which I came to use spontaneously in my exploration of the hexagonal architecture.
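A framework-free sketch of a contract test, assuming a hypothetical KeyValueStore interface: the contract is written once in an abstract class, and each concrete implementation gets a small subclass that only says how to create an instance. With JUnit, the contract methods would normally be @Test methods in an abstract test class; plain methods and AssertionError are used here only to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface that collaboration tests would mock out.
interface KeyValueStore {
    void put(String key, String value);

    String get(String key); // returns null if the key is absent
}

// The contract, written once against the interface.
abstract class KeyValueStoreContract {
    protected abstract KeyValueStore createStore();

    void storedValueCanBeRead() {
        KeyValueStore store = createStore();
        store.put("answer", "42");
        check("42".equals(store.get("answer")), "stored value should be readable");
    }

    void missingKeyReturnsNull() {
        check(createStore().get("missing") == null, "missing key should return null");
    }

    void runAll() {
        storedValueCanBeRead();
        missingKeyReturnsNull();
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }
}

class InMemoryKeyValueStore implements KeyValueStore {
    private final Map<String, String> data = new HashMap<>();

    @Override
    public void put(String key, String value) {
        data.put(key, value);
    }

    @Override
    public String get(String key) {
        return data.get(key);
    }
}

// Applying the contract to one concrete subtype. A database-backed store
// would get a sibling subclass, which might well be an integrated test.
class InMemoryKeyValueStoreContractTest extends KeyValueStoreContract {
    @Override
    protected KeyValueStore createStore() {
        return new InMemoryKeyValueStore();
    }
}
```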

Rainsberger also prefers the term microtests over unit tests. The term does seem to give a clearer notion of scope than “unit”.

In Fowler’s article, we see 2 other interesting categorizations.

There’s Jay Fields’s distinction between “solitary” and “sociable” tests. In solitary tests, every collaborator is replaced by a test double. Sociable tests generally involve at least one real collaborator.

Fowler also distinguishes between 2 test suites: a “compile” suite (run on every build) and a “commit” suite (run before a commit, or in continuous integration). This is a useful idea for TDD, and the categorization is less ambiguous: if it runs fast, it’s in “compile”, otherwise it’s in “commit”. The only ambiguity is in the speed threshold separating the 2 categories, and that’s more a question of practicality and personal or team preference – there isn’t a “right answer”.

Catch-all terms

There are also catch-all terms if you’re not sure which type of test you have:

  • developer tests – this term is obviously about who is responsible for creating and maintaining the test rather than the purpose or scope of the test.
  • automated tests – this term is obviously about how the tests are performed, so it would include developer tests and, for example, Selenium tests created by QA.

A picture’s worth 1000 words

Using Martin Fowler’s more inclusive view of “unit tests”:

[Image: TestLexiconFowler]

The colored area seems to be a source of some confusion – if you have a collaborating or delegate class that is not mocked out it can still be considered a unit test. Feathers does not address this area in his definition. Osherove does: “A unit of work can span a single method, a whole class or multiple classes…”  So a unit test can be an integrated test. Whether it should is a different question based on notions of good design and the actual execution speed of the test.

London Mockists to the Rescue?

In “Growing Object-Oriented Software, Guided by Tests”, Steve Freeman and Nat Pryce manage, using simple questions, to define the major test scopes without getting hung up on details of things like unit boundaries:

  • Acceptance: Does the whole system work?
  • Integration: Does our code work against code we can’t change?
  • Unit: Do our objects do the right thing, are they convenient to work with?

That seems like enough of a definition to do some useful tests.

Conclusion

In conclusion, don’t lose sleep over these definitions or let them stop you from doing tests.  There exist terms which are clear and precise enough to describe the good practices you need, and we can live with some gray areas where the most commonly used terms are concerned. Just be careful when you’re discussing tests to make sure that everyone in the discussion has the same understanding of the terms.

 

UPDATE:

I just discovered another interesting approach to this topic, by Simon Brown.