Tag: Java

See which lines of code are really tested with PIT

PIT (a.k.a. pitest) is a mutation testing tool that is fairly easy to set up (in Maven or Gradle) for most Java applications.

The basic idea with mutation testing is that the tool iterates over each line of code, changes that one line, and then runs the tests against the mutated code base. If the changed line makes at least one test fail, it’s considered covered. If no test fails, the line is not covered.
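To make that concrete, here is a hand-written sketch of one mutation and two tests. The class and method names are invented for illustration, and the mutant is written out by hand. PIT generates its mutants automatically, so treat this as a picture of the idea rather than of PIT's internals.

```java
import java.util.function.IntPredicate;

final class MutationIdeaSketch {

    /** The original production code. */
    static boolean isAdult(int age) {
        return age >= 18;
    }

    /** A hand-written mutant: the boundary condition >= has been changed to >. */
    static boolean isAdultMutant(int age) {
        return age > 18;
    }

    /** A weak test: it never checks the boundary value 18. */
    static boolean weakTestPasses(IntPredicate impl) {
        return impl.test(30) && !impl.test(5);
    }

    /** A stronger test: it checks both sides of the boundary. */
    static boolean boundaryTestPasses(IntPredicate impl) {
        return impl.test(18) && !impl.test(17);
    }

    public static void main(String[] args) {
        // The weak test passes against both versions, so the mutation goes unnoticed.
        System.out.println(weakTestPasses(MutationIdeaSketch::isAdult));
        System.out.println(weakTestPasses(MutationIdeaSketch::isAdultMutant));
        // The boundary test passes against the original but fails against the mutant.
        System.out.println(boundaryTestPasses(MutationIdeaSketch::isAdult));
        System.out.println(boundaryTestPasses(MutationIdeaSketch::isAdultMutant));
    }
}
```

With only the weak test in place, the comparison line would be reported as not really tested, even though a classic line-coverage tool would show it as executed.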

It’s possible to configure PIT to ignore your slow integration tests or end-to-end tests, which is a good idea, because PIT will run the same tests over and over again. With isolated unit tests, it’s pretty fast and a great coverage tool.

I tried it out on my interface-it project. Here’s a link to my pom.xml file which uses the pitest Maven plugin. I struggled a little with the configuration of target classes, and the PIT getting-started document was slightly off, but it did not take long to set up.

Here are a few screen captures of the resulting report. You can drill down to each line of code and see an explanation of the mutation applied to the line and its result:

[Screen captures: report main page, method-level drill-down, mutation explanation]

 


Key Software Factory Tools in the Java World

Here are some popular free and/or freemium tools for creating a software factory for a Java project:

I’ve written about these build tools before.

Here’s a chart showing the relative popularity of these 3 tools.

There are many more unit test facilitators available for free, and I’ve developed some lesser-known open-source unit test facilitators myself.

  • Integration test facilitators
    • Arquillian – allows you to run a test on a Java EE server (either a remote server or one configured and launched by your test code)
    • OpenEJB – a lightweight EJB framework implementation that can be launched in a test
    • DbUnit – set up and clean up after tests that use real database connections
    • Docker – actually, Docker is a facilitator for just about any sort of automation you want to do, but I put it in the integration test category because it allows you to spin up a database or other server from a test.

Note: do not mix Arquillian and OpenEJB in the same test suite, because jar conflicts can ensue when tests are run manually in Eclipse.

There are surely other tools I’ve left out.  Feel free to mention your favorites in the comments.

Are you sure you’ve sanitized your inputs?

This boggles the mind. Using an alphabet of just 6 non-alphanumeric characters, anyone can write any JavaScript code. The problem of how to allow some friendly JavaScript code while blocking anything unfriendly might be a subject worthy of computer science research.

In the meantime, eBay (and others) really should do something to reduce this vulnerability. I have a quick-and-dirty solution in Java based on detecting significantly long runs of the 6 characters in question. The weakness of the attack is of course that you need a lot of characters to do anything evil in the obfuscated JavaScript, so there should be long runs containing only the 6 characters. It’s possible to include spaces, line breaks and even comments to break up the runs, and I took this into account in my solution. I chose 10 as the run-length threshold for detecting the obfuscation, because I don’t know of anything legitimate you can do in JavaScript using 10 of these characters in a row that you couldn’t do another way using some alphabetic characters, and if I saw code with 10 of those characters in a row, I would suspect it right away.

Here’s some of the code in my solution. First, the implementation of containsSneakyJavascriptCode:

public static boolean containsSneakyJavascriptCode(final String userInput) {
	SneakyJSDetectionContext ctx = new SneakyJSDetectionContext(userInput);
	while (ctx.notDone()) {
		ctx.processCurrentChar();
		ctx.nextChar();
	}
	return ctx.detectedSneakyJS();
}

That’s code at a pretty high level of abstraction, so here’s more detail with the implementation of the processCurrentChar() call that you see in the code above. It ignores whitespace and characters inside comments and otherwise checks whether the current character adds to or ends the current run of suspect characters and whether it starts a comment:

void processCurrentChar() {
	if (insideAComment()) {
		checkForEndOfComment();
	} else if (isNotWhiteSpace()) {
		if (isInSneakyAlphabet(currentChar())) {
			incrementCurrentRunLength();
		} else {
			if (isStartOfComment()) {
				setCommentStart();
			} else {
				resetCurrentRunToEmpty();
			}
		}
	}
}

The full implementation code is here, and for good measure here are the unit tests for it.
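For readers who just want the gist without the context class, here is a condensed, self-contained sketch of the run-length idea. It is not the linked implementation. It skips comment handling entirely, and it assumes the six characters are [ ] ( ) ! and +, which the text above does not list. The class and method names are made up for this sketch.

```java
final class SneakyRunSketch {

    // Assumption: the six-character alphabet is []()!+ (not stated in the text above).
    private static final String SNEAKY_ALPHABET = "[]()!+";

    // The threshold of 10 comes from the reasoning above.
    private static final int RUN_LENGTH_THRESHOLD = 10;

    /** Returns true if the input contains a run of at least 10 suspect characters, ignoring whitespace. */
    static boolean containsLongSneakyRun(String userInput) {
        int currentRun = 0;
        for (int i = 0; i < userInput.length(); i++) {
            char c = userInput.charAt(i);
            if (Character.isWhitespace(c)) {
                continue; // whitespace neither extends nor ends the current run
            }
            if (SNEAKY_ALPHABET.indexOf(c) >= 0) {
                currentRun++;
                if (currentRun >= RUN_LENGTH_THRESHOLD) {
                    return true;
                }
            } else {
                currentRun = 0; // any other character ends the run
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsLongSneakyRun("[][(![]+[])[+[]]")); // long run of suspect characters
        System.out.println(containsLongSneakyRun("alert(items[0] + 1)")); // ordinary code, short runs only
    }
}
```

Unlike the real code, this version can be defeated by inserting comments between the suspect characters, which is exactly why the comment tracking in processCurrentChar() matters.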

You’re welcome, eBay.

The What and Why of Mixins in Java 8

A mixin is a type which can be mixed in to (i.e. included in) other types via multiple inheritance. This allows the encapsulation of certain behaviors which cut across the main type hierarchy. In C++ this is fairly standard, though it’s sometimes discouraged because of the potential for the “diamond problem”. In Java 8, mixins can be created using interfaces with default methods.

Why use mixins in Java?

Mixins give you design options which you did not have before. A class can inherit behavior that isn’t in its superclass. For example, you can have Goose implements Honker and Car implements Honker – with no need to implement honk() more than once. Some more practical uses include test fixtures (which I will address in more detail in my next article), and logging.
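The Goose and Car example can be sketched like this. The Animal and Vehicle superclasses are hypothetical, added only to show that the two classes share nothing in their main hierarchy.

```java
/** The mixin: an interface whose behavior comes with a default implementation. */
interface Honker {
    default String honk() {
        return "Honk!";
    }
}

// Hypothetical, unrelated superclasses.
class Animal { }

class Vehicle { }

// Neither class implements honk(); both inherit it from the mixin.
class Goose extends Animal implements Honker { }

class Car extends Vehicle implements Honker { }

final class HonkerDemo {
    public static void main(String[] args) {
        System.out.println(new Goose().honk());
        System.out.println(new Car().honk());
    }
}
```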

Default methods in mixins are also very useful for wrapping static method calls, especially in legacy code. This is what my interface-it tool automates. So you can replace static calls with calls to interface methods that can be mocked in tests, and/or overridden for special cases. It allows you to make a procedural design more object-oriented and pull hidden dependencies out into the open. It also lets you avoid using static imports, which can make the code less readable.

Finally, a benefit of mixins is polymorphism.  You can override or extend the default behavior in special cases, which is something you can’t do with static calls.
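Here is a small hand-written sketch of wrapping a static call in a mixin and then overriding it. The names SystemTimeMixin and SessionChecker are invented for illustration, and this is not output generated by interface-it.

```java
/** A mixin that wraps a static call in a default method. */
interface SystemTimeMixin {
    default long currentTimeMillis() {
        return System.currentTimeMillis();
    }
}

/** Hypothetical production class: it depends on the mixin instead of calling System directly. */
class SessionChecker implements SystemTimeMixin {
    boolean isExpired(long startedAtMillis, long timeoutMillis) {
        return currentTimeMillis() - startedAtMillis > timeoutMillis;
    }
}

final class SessionCheckerDemo {
    public static void main(String[] args) {
        // In a test, the wrapped static call can be overridden (or mocked) to freeze time.
        SessionChecker frozenAtTenSeconds = new SessionChecker() {
            @Override
            public long currentTimeMillis() {
                return 10_000L;
            }
        };
        System.out.println(frozenAtTenSeconds.isExpired(1_000L, 5_000L)); // 9 seconds elapsed
        System.out.println(frozenAtTenSeconds.isExpired(8_000L, 5_000L)); // 2 seconds elapsed
    }
}
```

The production code reads almost the same as it would with a static import, but the dependency on the system clock is now visible in the class declaration and replaceable per instance.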

NIO2 is great… until…

In Java 7, Oracle gave us a shiny new API to do dingy old file-related tasks with less boilerplate code (and with shiny new functionalities like file change listeners, memory mapping, asynchronous I/O, etc.), and in Java 8 Oracle extended the API to use Streams.

I recently discovered that it’s great… until you need to delete a directory on Windows. The odd thing is that I really wasn’t using much NIO code.

I have integration tests where the code under test creates a temporary directory (not using NIO). The code under test writes a file into that directory (not using NIO). The test uses NIO (Files.lines()) to read the generated file and a static file for comparison. The post-test clean-up deletes the contents of the temporary directory and then the temporary directory itself (not using NIO).

Without using NIO to do the deletion, I had only a boolean (the result of File.delete()) to tell me that the directory was not actually deleted. Switching to NIO (Files.delete(), which throws an exception when it fails), I could see that the deletion failed because the directory was supposedly not empty. This might be caused by the never-to-be-fixed bug 4715154 in the JDK.
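The difference in diagnostics between the two APIs is easy to reproduce on any platform with a directory that really is non-empty. This is a minimal sketch with made-up names. It does not reproduce the Windows problem itself, only the boolean-versus-exception behavior.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class DeleteDiagnosticsSketch {

    /** Tries to delete a non-empty temp directory with both APIs and describes what each one reports. */
    static String describeFailedDeletion() {
        try {
            Path dir = Files.createTempDirectory("delete-sketch");
            Path file = Files.createFile(dir.resolve("generated.txt"));
            try {
                // Old API: the only feedback is a boolean.
                boolean deletedWithOldApi = dir.toFile().delete();

                // NIO API: failure is reported as an exception that names the cause.
                String nioOutcome;
                try {
                    Files.delete(dir);
                    nioOutcome = "deleted";
                } catch (IOException e) {
                    nioOutcome = e.getClass().getSimpleName();
                }
                return "File.delete()=" + deletedWithOldApi + ", Files.delete()=" + nioOutcome;
            } finally {
                // Clean up properly: contents first, then the directory.
                Files.deleteIfExists(file);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFailedDeletion());
    }
}
```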

I tried looping with delays and forced garbage collection (as suggested in this StackOverflow comment), but to no avail. When I got rid of NIO in the test code that reads the files for comparison, then the deletion worked. All’s well that ends well, I suppose, and the hour spent debugging that issue was educational.
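One thing worth ruling out in a situation like this is an unclosed stream. The Javadoc for Files.lines() recommends try-with-resources when timely release of file system resources is required, because the returned Stream holds the file open until it is closed. Whether that was the cause here is an assumption, since the test code is not shown. The sketch below, with invented names, simply shows the read-then-delete sequence with the stream closed explicitly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ClosedLinesStreamSketch {

    /** Writes a file, reads it with Files.lines() inside try-with-resources, then deletes file and directory. */
    static boolean readThenDelete() {
        try {
            Path dir = Files.createTempDirectory("nio-sketch");
            Path file = dir.resolve("generated.txt");
            Files.write(file, Arrays.asList("first", "second"), StandardCharsets.UTF_8);

            List<String> lines;
            // The stream is closed at the end of this block, releasing the underlying file.
            try (Stream<String> stream = Files.lines(file)) {
                lines = stream.collect(Collectors.toList());
            }

            Files.delete(file); // throws if the deletion fails
            Files.delete(dir);
            return lines.size() == 2 && Files.notExists(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readThenDelete());
    }
}
```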

I thought it was worth writing up a warning about this, though, because when you’re doing TDD you don’t expect to spend an hour debugging code in your @AfterClass method which does nothing but delete a directory and its contents.

 

 

Fear of Streams

There’s an article in the new issue of the French magazine Programmez which got my attention. It chronicles the efforts of some programmers for a B2C site to investigate converting their old for loops into more readable and maintainable Java 8 streams. The article is kind of a mess, actually, but to a software craftsman or craftswoman who works on Java legacy code, the message is a bit frightening. The programmers did performance tests of the methods they changed, but because some of the more complex methods they modified took twice as long to execute as they did before, their employer was not convinced and does not want them to proceed with the refactoring.

If you thought that it was difficult to sell your product owner on the idea of adding days to the schedule to clean up legacy code, imagine having to reveal that the clean-up also involves the risk of performance issues!

Not all of their changes slowed down execution; some even sped it up a bit. What troubled me about the article, though, is that I could not predict, looking at the examples they gave of their modifications, which ones would run faster and which ones would run slower than before.
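To make the kind of change concrete, here is a toy before-and-after pair. It is not one of the article's examples, and the names are invented. The two methods return the same result, and nothing in the source tells you which one will run faster in a given application.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class LoopVersusStreamSketch {

    /** The "old" style: an explicit loop with a condition and a transformation. */
    static List<String> upperCaseLongNamesWithLoop(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.length() > 3) {
                result.add(name.toUpperCase());
            }
        }
        return result;
    }

    /** The same logic as a Java 8 stream pipeline. */
    static List<String> upperCaseLongNamesWithStream(List<String> names) {
        return names.stream()
                .filter(name -> name.length() > 3)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ann", "Robert", "Zoe", "Maria");
        System.out.println(upperCaseLongNamesWithLoop(names));
        System.out.println(upperCaseLongNamesWithStream(names));
    }
}
```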

I turned to StackOverflow to see if somebody knows how to do that in general. Is there any way, in a code review, for example, to see if a stream will run significantly slower (a constant-order slowdown, but enough to annoy a user or kill a refactoring initiative) based on just looking at the code?  Of course, it’s possible to code stupidly in a way that will obviously cause a slowdown.  But, apart from some useful guidance about when to use parallel streams and when not to, the message I got from some smart, informed developers is that you can’t, or else it’s not worth the trouble for what amounts to at worst a constant-order slowdown. Of course you should review code changes to make sure they don’t affect the correctness or the complexity of the algorithm, and maybe you can tell whether going parallel would help or hurt performance, but that’s all you can usefully detect.

I think the biggest mistake that the developers in the Programmez article made was to evaluate the performance based on isolated unit tests (microtests).  Performance of a method is not interesting to the end-user.  A  constant-order decrease in performance of a method will frighten a product owner, but maybe the end-user won’t even notice it.  For example, it may be that doubling the execution time of a loop results in a 0.001% decrease in performance of a web request because it’s a loop to process results of a heavy database query.

If the code you work on is for a financial trading application where every nanosecond counts, then you’re not going to be using Java streams at all – you likely have customized collections that run super-fast with no syntactic frills.  For the rest of us, we need to do automated end-to-end performance tests for the scenarios where performance might be an issue.

If you convert for loops to streams, don’t convert everything on the same day. Space out these kinds of changes enough that end-to-end tests will catch only one performance issue at a time.  Then you can reassure the product owner that there will be no “noticeable” decrease in performance from the refactor, but that the code will be more readable and understandable, and thus easier to change and less bug-prone.  For certain iterations which can be parallelized efficiently, there may even be noticeable performance gains.

The String Thing: Java String Concatenation Performance

I’m researching an upcoming performance-related article, and it occurred to me that I had always just accepted as given that concatenating Strings using the plus operator in a loop is bad. First I used StringBuffer instead (back in the day), and then StringBuilder because StringBuffer is synchronized. Later, I occasionally used Guava’s Joiner (which has nice, clean syntax, but involves a 3rd-party dependency). Now Java 8 offers a few new approaches to String concatenation, the most elegant of which is String.join().

So, rather than rest on received wisdom, I thought I’d put these different approaches to the test using JMH benchmarks.  The source code for my benchmarks is here. It concatenates a list of 20000 Strings into one underscore-delimited String in 7 different ways.

Here are the results (on an old Windows 7 laptop with 4GB of overly-solicited RAM and 2 cores, so your mileage may vary). Lower scores are faster:

Benchmark                                                 Mode  Cnt       Score       Error  Units
concatenationUsingGuavaJoiner                             avgt   20      70.193 ±     2.456  ns/op
concatenationUsingPlus                                    avgt   20  231599.954 ± 20386.920  ns/op
concatenationUsingStringBuffer                            avgt   20      58.894 ±     3.943  ns/op
concatenationUsingStringBuilder                           avgt   20      52.580 ±     1.514  ns/op
concatenationUsingStringJoin                              avgt   20      69.401 ±     5.827  ns/op
concatenationUsingStringJoinerInStreamWithForEachOrdered  avgt   20      64.491 ±     3.789  ns/op
concatenationUsingStringJoinerOnForLoop                   avgt   20      69.301 ±     6.815  ns/op

Briefly, this shows that concatenation using ‘+’ is truly awful (more than 4300 times slower than the StringBuilder approach!). All other approaches are roughly equivalent in performance, though StringBuilder runs about 30% faster than the syntactically superior approaches. The biggest surprise for me was how little difference there is between StringBuffer and StringBuilder (StringBuilder is about 12% faster). I was also a bit surprised that the compiler/JVM doesn’t do a better job of optimizing the ‘+’-based concatenation, given that this has been a known issue since the 1990s (it turns out the received wisdom of old still applies).
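For reference, a ‘+’-based loop of the kind being measured probably looks something like the sketch below. This is a guess at the shape of the code, not the actual benchmark method, and the class and method names are invented. Each += builds a brand-new String and copies everything accumulated so far, which is the usual explanation for the cost.

```java
import java.util.Arrays;
import java.util.List;

final class PlusConcatenationSketch {

    private static final char SEPARATOR = '_';

    /** Concatenation with '+': every iteration allocates a new String and copies the previous contents. */
    static String usingPlus(List<String> source) {
        String result = "";
        for (String s : source) {
            if (!result.isEmpty()) {
                result += SEPARATOR;
            }
            result += s;
        }
        return result;
    }

    /** The same output produced by String.join(), for comparison. */
    static String usingJoin(List<String> source) {
        return String.join("" + SEPARATOR, source);
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("a", "b", "c");
        System.out.println(usingPlus(source));
        System.out.println(usingJoin(source));
    }
}
```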

So, if you must have speed at all costs, the winner is:

	@Benchmark
	public String concatenationUsingStringBuilder() {
		// The approved pre-Java-8 approach
		StringBuilder result = new StringBuilder();
		for (String s : sourceList) {
			if (result.length() > 0) {
				result.append(SEPARATOR);
			}
			result.append(s);
		}
		return result.toString();
	}

Otherwise (if having readable, maintainable code is worth more in the long run than a minor performance gain in a few parts of the code), here’s the real winner going forward:

	@Benchmark
	public String concatenationUsingStringJoin() {
		// The simplest, most concise Java-8 approach
		return String.join("" + SEPARATOR, sourceList);
	}

Addendum: Before someone adds a comment about it, let me say that for readability with String.join(), the code would be prettier if SEPARATOR were a String constant instead of a char constant. Then you’d just have:

return String.join(SEPARATOR, sourceList);

I didn’t fix it in my benchmark because it’s a reminder that String.join() does not accept a char as its delimiter. Fixing it would have no effect on performance, which I can prove with the output from javap -c on my benchmark class. It shows that the compiler has converted “” + SEPARATOR to the String constant “_”, which is loaded with the ldc bytecode.
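If you don’t want to read bytecode, there is a quick way to see the same constant folding from Java code. It assumes SEPARATOR is declared as a static final char initialized with a literal, as in this invented sketch class. A compile-time constant expression of type String is interned, so it is the very same object as the literal.

```java
import java.util.Arrays;

final class SeparatorConstantSketch {

    // Assumption: the separator is a compile-time constant char, as in the benchmark.
    private static final char SEPARATOR = '_';

    /** True if "" + SEPARATOR was folded into the interned literal "_" at compile time. */
    @SuppressWarnings("StringEquality")
    static boolean separatorIsFoldedToLiteral() {
        String viaConcatenation = "" + SEPARATOR;
        // Reference comparison on purpose: both sides should be the same interned constant.
        return viaConcatenation == "_";
    }

    static String joinWithSeparator(String... parts) {
        return String.join("" + SEPARATOR, Arrays.asList(parts));
    }

    public static void main(String[] args) {
        System.out.println(separatorIsFoldedToLiteral());
        System.out.println(joinWithSeparator("a", "b"));
    }
}
```

If SEPARATOR were not final, the concatenation would happen at run time and the reference comparison would no longer be guaranteed to hold.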