Fear of Streams

There’s an article in the new issue of the French magazine Programmez which got my attention. It chronicles the efforts of programmers at a B2C site who investigated converting their old for loops into more readable and maintainable Java 8 streams. The article is kind of a mess, actually, but to a software craftsman or craftswoman who works on Java legacy code, the message is a bit frightening. The programmers did performance tests of the methods they changed, but because some of the more complex methods took twice as long to execute as before, their employer was not convinced and does not want them to proceed with the refactor.

If you thought that it was difficult to sell your product owner on the idea of adding days to the schedule to clean up legacy code, imagine having to reveal that the clean-up also involves the risk of performance issues!

Not all of their changes slowed down execution – some even sped things up a bit. What troubled me about the article, though, is that I could not predict, looking at the examples they gave of their modifications, which ones would run faster and which ones would run slower than before.
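To make the discussion concrete, here is the kind of conversion in question. This is my own invented example (the Order class and the business rule are made up, not code from the Programmez article); both versions do the same thing, and nothing in the code tells you which one will execute faster.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Locale;
    import java.util.stream.Collectors;

    public class OrderReport {

        // A made-up domain class, just to have something to iterate over.
        static class Order {
            private final String label;
            private final double total;
            Order(String label, double total) { this.label = label; this.total = total; }
            String label() { return label; }
            double total() { return total; }
        }

        // Old-school loop: collect the labels of orders above a threshold.
        static List<String> bigOrderLabelsLoop(List<Order> orders, double threshold) {
            List<String> labels = new ArrayList<>();
            for (Order order : orders) {
                if (order.total() > threshold) {
                    labels.add(order.label().toUpperCase(Locale.ROOT));
                }
            }
            return labels;
        }

        // The same logic expressed as a Java 8 stream pipeline.
        static List<String> bigOrderLabelsStream(List<Order> orders, double threshold) {
            return orders.stream()
                    .filter(order -> order.total() > threshold)
                    .map(order -> order.label().toUpperCase(Locale.ROOT))
                    .collect(Collectors.toList());
        }
    }

Both methods return the same list; the question the article raises is only which one runs faster, and that is exactly what I could not tell by inspection.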

I turned to StackOverflow to see whether anybody knows how to make that kind of prediction in general. Is there any way, in a code review, for example, to tell just from looking at the code that a stream will run significantly slower (a constant-order slowdown, but enough to annoy a user or kill a refactoring initiative)? Of course, it’s possible to code stupidly in a way that will obviously cause a slowdown. But apart from some useful guidance about when to use parallel streams and when not to, the message I got from some smart, informed developers is that you can’t, or that it’s not worth the trouble for what amounts to, at worst, a constant-order slowdown. Of course you should review code changes to make sure they don’t affect the correctness or the complexity of the algorithm, and maybe you can tell whether going parallel would help or hurt performance, but that’s about all you can usefully detect.

I think the biggest mistake that the developers in the Programmez article made was to evaluate performance based on isolated unit tests (microtests). The performance of an individual method is not what the end user experiences. A constant-order decrease in the performance of a method will frighten a product owner, but maybe the end user won’t even notice it. For example, doubling the execution time of a loop may result in a 0.001% decrease in the performance of a web request, because the loop merely processes the results of a heavy database query.
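To make those proportions concrete, here is an invented request handler (all class names and timings are hypothetical, purely for illustration): if the query takes about a second and the stream takes on the order of ten microseconds, doubling the cost of the stream changes the request time by roughly that 0.001%.

    import java.util.Arrays;
    import java.util.List;
    import java.util.Locale;
    import java.util.stream.Collectors;

    public class CustomerEndpoint {

        static class Customer {
            final String name;
            Customer(String name) { this.name = name; }
        }

        // Stand-in for a heavy database call: in this hypothetical scenario
        // it accounts for roughly a second of I/O and dominates the request.
        static List<Customer> findActiveCustomers() {
            return Arrays.asList(new Customer("Ada"), new Customer("Linus"));
        }

        // The stream over the results costs microseconds of CPU, so even a
        // 2x slowdown from a loop-to-stream conversion is invisible to the user.
        public static List<String> handleRequest() {
            List<Customer> customers = findActiveCustomers();
            return customers.stream()
                    .map(c -> c.name.toUpperCase(Locale.ROOT))
                    .collect(Collectors.toList());
        }
    }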

If the code you work on is for a financial trading application where every nanosecond counts, then you’re not going to be using Java streams at all – you likely have customized collections that run super-fast with no syntactic frills.  For the rest of us, we need to do automated end-to-end performance tests for the scenarios where performance might be an issue.

If you convert for loops to streams, don’t convert everything on the same day. Space out these kinds of changes enough that end-to-end tests will catch only one performance issue at a time.  Then you can reassure the product owner that there will be no “noticeable” decrease in performance from the refactor, but that the code will be more readable and understandable, and thus easier to change and less bug-prone.  For certain iterations which can be parallelized efficiently, there may even be noticeable performance gains.
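For the parallel case, the conversion is a one-word change, which is exactly why it deserves a measurement rather than a guess. Here is a minimal sketch with an invented CPU-bound computation (the class and the numbers are made up); whether .parallel() actually helps depends on the data size, the cost per element, and the machine.

    import java.util.stream.IntStream;

    public class ParallelExample {

        // Independent, CPU-heavy iterations: a reasonable candidate for parallelization.
        public static double sumOfExpensiveScores(int n) {
            return IntStream.range(0, n)
                    .parallel()               // the only change from the sequential version
                    .mapToDouble(ParallelExample::expensiveScore)
                    .sum();
        }

        // Stand-in for a CPU-bound computation with no shared state.
        private static double expensiveScore(int i) {
            double x = i;
            for (int k = 0; k < 1_000; k++) {
                x = Math.sqrt(x + k);
            }
            return x;
        }
    }

Measure both versions in a realistic end-to-end scenario before counting on a gain.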

16 thoughts on “Fear of Streams”

  1. The popular critique of this article on Reddit is “articles like this need code and at least some useless micro benchmarks.” This is a legitimate point, but it’s so general and so casually dismissive that I find it not especially constructive.

    I had thought about whether to put in some code and/or benchmarks, but I did not think it would be worth the extra reading time. It’s unfortunate that the Programmez article is behind a paywall, but I think the examples in the article are messy, and on closer examination I think the developers actually introduced serious bugs in their refactor. Of the 3 examples in the Programmez article (which all refer to proprietary classes that are not presented), there’s one which probably won’t even compile, one which looks like a clean and useful refactor (and which apparently improved performance) but which contains a bit of business logic that is unexplained and seems odd to me (you can find it in a file on Pastebin), and one which seems to mangle the error-handling code in the original nested “for” loops. This code would be a distraction from the message of my article.

    A legitimate use of code in this article would be to illustrate to skeptics why it’s actually worthwhile to refactor an old-school loop into a Java-8 stream. This would be a great subject for another article, but my focus here was more on the problem of what to do when you know you have code where conversion of a loop to a stream would obviously enhance readability but there is the fear of a constant-order drop in performance. No one, in response to my StackOverflow question, gave me any code that would be useful for understanding whether such a refactor should proceed or not based on performance. As for micro benchmarks, the StackOverflow answers and even the critic on Reddit seem to confirm that they’re not the key point. If you want to see an example micro-benchmark for Java 8 streams vs. loops that gives a general idea, try this StackOverflow answer.
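    For readers who just want to see the shape of such a micro-benchmark, here is a minimal JMH-style sketch (the data and the pipeline are invented for illustration, and, as discussed above, numbers like these say little about end-to-end impact):

        import java.util.List;
        import java.util.concurrent.TimeUnit;
        import java.util.stream.Collectors;
        import java.util.stream.IntStream;

        import org.openjdk.jmh.annotations.Benchmark;
        import org.openjdk.jmh.annotations.BenchmarkMode;
        import org.openjdk.jmh.annotations.Mode;
        import org.openjdk.jmh.annotations.OutputTimeUnit;
        import org.openjdk.jmh.annotations.Scope;
        import org.openjdk.jmh.annotations.Setup;
        import org.openjdk.jmh.annotations.State;

        @State(Scope.Benchmark)
        @BenchmarkMode(Mode.AverageTime)
        @OutputTimeUnit(TimeUnit.MICROSECONDS)
        public class LoopVsStreamBenchmark {

            List<Integer> numbers;

            @Setup
            public void setUp() {
                numbers = IntStream.range(0, 10_000).boxed().collect(Collectors.toList());
            }

            // Old-school loop: sum of the squares of the even numbers.
            @Benchmark
            public long forLoop() {
                long sum = 0;
                for (Integer n : numbers) {
                    if (n % 2 == 0) {
                        sum += (long) n * n;
                    }
                }
                return sum;
            }

            // The same computation expressed as a stream pipeline.
            @Benchmark
            public long stream() {
                return numbers.stream()
                        .filter(n -> n % 2 == 0)
                        .mapToLong(n -> (long) n * n)
                        .sum();
            }
        }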


  2. @scepticalmeerkat: Good question (and in keeping with your pseudonym 🙂 ). The developers in question were surely trying to see what Java 8 could do to improve their code in general. They wanted to improve the readability and maintainability of the code, but the original code was not shockingly ugly or buggy.


    1. So the employer made the correct decision? 🙂
      I suspect that, as these guys probably were not experienced with this new Stream API and were only trying to learn, it is not even guaranteed that the code, after this unnecessary refactoring, won’t be less readable or maintainable than the original.
      In my book, if someone asks for “time to do refactoring”, the correct answer is “No”. It is OK to increase your estimates for a task because the part of the code you will need to change will require some refactoring (for a good reason, not to replace “for” with streams), but it is never good to ask to allocate special time for global refactoring “to improve readability” (such a request smells bad).
      You need to follow the “Boy Scout rule” instead of doing large refactorings.

      And yes, in your average boring application the hit on performance from streams will be negligible. So the problem is solvable by having regular performance testing and refactoring only the classes that are being changed.


      1. The “Boy Scout rule” might be good enough to deal with readability issues like moving from loops to streams. Where I think some dedicated refactoring may be necessary is for issues of testability.


      2. You don’t want to refactor until you have tests. So build your tests first, even if it seems difficult. And once you have built tests, you won’t need any more refactoring for “testability”.

        I don’t believe in the design driven by testing. Tests should fit around the best possible production code design, not the other way around.


      3. Any code can be tested. You just need to put some effort into it. Don’t refactor before you have written your tests, and don’t refactor because you decided to write tests (refactoring without tests so that writing tests is easier is like putting the cart before the horse).
        And no, you don’t want to mock everything to have tests. You only really need to mock what is outside of your application (database, network, etc.). The fewer mocks the tests have, the less brittle they are (and it is very important for the tests not to be brittle).


      4. That approach may work for you, but I respectfully disagree with certain points you make.

        For example, an object which can’t be constructed without side effects like calling the database or debiting your credit card can’t be tested without a refactor.
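        To sketch what I mean (the classes here are invented, not from any real codebase): the “before” version opens a database connection in its constructor, so you cannot even instantiate it in a test; the “after” version takes its collaborator through the constructor, so a test can pass in a fake.

            import java.sql.Connection;
            import java.sql.DriverManager;
            import java.sql.SQLException;
            import java.util.List;

            // Hard to test: constructing the object immediately hits the outside world.
            class InvoiceMailerBefore {
                private final Connection connection;

                InvoiceMailerBefore() throws SQLException {
                    // Side effect in the constructor: opens a real database connection.
                    this.connection = DriverManager.getConnection("jdbc:postgresql://prod/billing");
                }
            }

            // After the refactor: the side effect lives behind an interface supplied by
            // the caller, so construction is side-effect free and the class is testable.
            interface InvoiceStore {
                List<String> unpaidInvoiceIds();
            }

            class InvoiceMailerAfter {
                private final InvoiceStore store;

                InvoiceMailerAfter(InvoiceStore store) {
                    this.store = store;   // no I/O here; a test can pass a fake or a mock
                }

                int countUnpaid() {
                    return store.unpaidInvoiceIds().size();
                }
            }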

        Even if the code is testable with effort, if the test requires 10 times the effort of the implementation, developers under time pressure will skip the test.

        Code which follows SOLID makes testing simpler, which has a number of positive effects:

        • There’s no excuse not to test, and little time is wasted setting up tests.
        • Tests are more readable because there’s less setup, and readable tests can be used as documentation – an indisputable specification of behavior.
        • Tests with mocks are not brittle, because even if the overall product specification is unstable, the mocked components have clear, limited responsibilities which are unlikely to change.


      5. Why do we have tests in our application?
        We have them because we want to be able to modify our code (including refactoring) in relative safety – we want to be able to change our classes and still be sure that the overall functionality works.
        If, when you write code, you have a tendency to mock all dependencies, every change in your classes’ interfaces (something you do when you refactor) will cause your tests to fail.
        And worse – your tests might even continue to pass while the application fails.
        Imagine you have a class A with a method that returns an amount in dollars. Classes B, C and D use class A. In your tests, class A is always mocked. One day you decide to change class A so that the amount is returned in cents. You make the change and you update classes B and C, but fail to update class D. Because you mocked class A in all tests, your class D tests are still passing, although class D does not work correctly when it uses the real class A instead of the mock.

        The more you mock, the more brittle your tests become. This is why you need to mock as little as possible.
        And yes, the database in your examples is the number one candidate for mocking – because it is an external dependency. You still need to write tests using the real database, but you will have a limited number of them because you need to compromise for test performance reasons.

        Tests are not documentation and should not be used as such. I personally never read tests to understand how production code works, and I suspect very few people do.


      6. Changing the units (and thus the magnitude) of the value returned by a method for a given state and/or inputs is not a refactor but a fundamental change in the meaning of the method. A smart refactor for the kind of situation in your example would be to replace the primitive return value in question with a value object (for example, change a float to a Money object).
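        To be concrete about the value-object idea (this Money class is a made-up minimal sketch, not a reference to any particular library): the unit travels with the number, so a dollars-versus-cents confusion cannot silently slip through call sites that only see a float.

            import java.math.BigDecimal;

            // A minimal Money value object: callers construct it explicitly in major or
            // minor units, and there is no raw number left to misinterpret.
            public final class Money {
                private final BigDecimal amount;   // always stored in major units (e.g. dollars)
                private final String currency;

                private Money(BigDecimal amount, String currency) {
                    this.amount = amount;
                    this.currency = currency;
                }

                public static Money ofMajor(BigDecimal amount, String currency) {
                    return new Money(amount, currency);
                }

                public static Money ofMinor(long minorUnits, String currency) {
                    // e.g. Money.ofMinor(199, "USD") is $1.99
                    return new Money(BigDecimal.valueOf(minorUnits).movePointLeft(2), currency);
                }

                public BigDecimal asMajor() { return amount; }
                public String currency()    { return currency; }
            }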

        I’m for mocking anything which could be called a service or a façade. Its interface is its contract. Changing the interface’s arguments, name and/or return type is a safe but slightly painful kind of change if the interface is widely used and widely mocked. Changes to the meaning of a method are more painful and should be made rarely and carefully, whatever types of tests are in place.


      7. You obviously can see how much mocking limits what you can and what you can’t refactor 🙂

        Internal interfaces (between your classes) are what usually change when you refactor, when you add functionality, etc. If you mock them, your tests become useless.

        One should prefer using real objects to mocks unless it is not practical (for long-running processes, external dependencies, etc.).


      8. There is clearly a trade-off. Mocking helps isolate the logic you’re testing. It makes tests focused. One error in logic will normally break one test and you can pinpoint the problem instantly without using a debugger. Mocks can also make tests simpler to write and read because there is generally less set-up.

        If the same logic is re-tested in a lot of different places because you didn’t mock it out, one bad change to that logic might break 50 different tests and you’ll have to fiddle around to find the problem.

        I don’t think it’s useless to have tests which assume that a component used in the code under test does its job correctly. If that component is also tested in isolation, the assumption is validated.

        The trade-off when using mocks is that it can take a little more thought and effort to change the existing method signatures of component interfaces. Of course, the more a component is used in different parts of the production code, the more thought and effort are required to evolve its interface regardless of how it’s tested. Adding a new argument to a method, for example, can be just as painful in tests which do not mock out the call.


      9. What does one achieve by “isolating the logic”?
        If you run tests regularly (as one should), it does not matter whether one test fails or many – you know exactly which change caused it (it is the one you just made), so there is no “fiddling around”.

        It is not a “trade-off” when you have to put in “a little bit more thought”. The point of having the tests is to ensure that if you did not put in enough thought, your tests will fail. If your tests are not failing when you make a change because you did not put in enough thought, your tests are not fit for purpose.
        I have seen enough tests that fail but get fixed immediately after a mock is replaced with a real object.


      10. Let’s take a fairly simple example. We have 4 classes: A, B, C and D.
        A is the class under test.
        B and C are dependencies of A.
        D is the data layer, and is a dependency of B and C.
        If you want to test A.m1(), which calls B.m2() and C.m3(), both of which make calls to various methods of D, rather than mocking out B.m2() and C.m3(), you’re going to mock out all the interactions with D in B.m2() and C.m3().
        This approach seems more brittle than mocking out B.m2() and C.m3(), because your test is dependent not on the signatures of B.m2() and C.m3(), which should be relatively stable, but on their internal implementation. It’s also a pain to write the test, I would think, because you have to know the internal dependencies of your dependencies.
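        Roughly, in code, the two styles look like this (a sketch using JUnit 4 and Mockito, which are not mentioned in the thread; the method bodies and D’s methods are made up):

            import static org.junit.Assert.assertEquals;
            import static org.mockito.Mockito.mock;
            import static org.mockito.Mockito.when;

            import org.junit.Test;

            public class ATest {

                // Style 1: mock A's direct collaborators, B and C.
                // The test depends only on the signatures of B.m2() and C.m3().
                @Test
                public void m1_withMockedBAndC() {
                    B b = mock(B.class);
                    C c = mock(C.class);
                    when(b.m2()).thenReturn(2);
                    when(c.m3()).thenReturn(3);

                    A a = new A(b, c);

                    assertEquals(5, a.m1());
                }

                // Style 2: use real B and C and mock only the data layer D.
                // The test now also depends on which methods of D the current
                // implementations of B.m2() and C.m3() happen to call.
                @Test
                public void m1_withRealBAndCAndMockedD() {
                    D d = mock(D.class);
                    when(d.loadLeft()).thenReturn(2);    // hypothetical methods of D
                    when(d.loadRight()).thenReturn(3);

                    A a = new A(new B(d), new C(d));

                    assertEquals(5, a.m1());
                }

                // Minimal stand-ins for the four classes in the example.
                interface D { int loadLeft(); int loadRight(); }
                static class B { private final D d; B(D d) { this.d = d; } int m2() { return d.loadLeft(); } }
                static class C { private final D d; C(D d) { this.d = d; } int m3() { return d.loadRight(); } }
                static class A {
                    private final B b;
                    private final C c;
                    A(B b, C c) { this.b = b; this.c = c; }
                    int m1() { return b.m2() + c.m3(); }
                }
            }

        Both tests pass today, but only the first one survives a change to how B and C talk to D.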

