“Premature optimization is the root of all evil” is the root of evil

Let’s face it: the quote in the title has been used to advocate bad programming far too often. Sometimes it’s complemented by the numbers from the full quote, as if they were permanent truth or at the very least a well-measured observation:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

Sometimes it’s just backed by the authority of Donald Knuth, the original author of the quote. However, Knuth himself attributes the “evil quote” to Hoare, and Hoare attributes it back to Knuth, so it’s more of a nobody’s child.

The thing is, it was first printed 45 years ago, in 1974, in an article devoted to justifying the use of the “goto” statement in structured programming. In terms of black and white, it was pro-optimization, not contra. And if you read TAOCP, you know that Donald Knuth himself puts algorithms and the machine far before paradigm, be it structured, functional, or any other type of programming. He publishes code examples in MMIX (formerly MIX) machine code instead of high-level languages because that is what really matters. An algorithm and its machine implementation were the core of computer programming 45 years ago, and there has never been a good reason for this to change.

However, nowadays it is common knowledge that you should build things to work and only then optimize them to work fast. To me it sounds like: “let’s build a house of straw and beeswax and call the fire brigade when it catches fire”. It’s absurd in every engineering discipline apart from software engineering. We are here to anticipate and prevent problems, not to spawn them ignorantly because of 45 years of self-deception.

In 1974, optimization indeed meant sacrificing code clarity for a few percent of performance improvement. But sometimes you manage to sacrifice less and gain more, which is good. In the very same article from which the “evil quote” is taken, Knuth also published actual results for one such optimization:

The improvement in speed from Example 2 to Example 2a is only about 12%, and many people would pronounce that insignificant. The conventional wisdom shared by many of today’s software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by penny-wise-and-pound-foolish programmers, who can’t debug or maintain their “optimized” programs. In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering. Of course I wouldn’t bother making such optimizations on a one-shot job, but when it’s a question of preparing quality programs, I don’t want to restrict myself to tools that deny me such efficiencies.

And this was in 1974. Today, precisely because of decades of performance negligence, there is so much low-hanging fruit that you don’t have to scramble your code to gain significant performance improvements. Here are some anecdotes from my practice as evidence.

Once we got a 10% speedup in one routine by changing unsigned short to size_t in a loop counter. It happened to be the inner loop of a matrix transformation, and apparently the compiler we used at the time didn’t unroll it solely because of the counter type. That’s rare, but compilers still fail to optimize things every now and then, which is why it’s a good idea to check their output with a disassembly tool, at least for the most intense parts of the system. It’s not even that hard; in fact, everyone can read disassembly.
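The loop in question looked roughly like this; the matrix-transposition body here is my reconstruction, not the original code:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reconstruction of the routine. With a 16-bit counter the
// compiler must honor possible wrap-around at 65535, which can block
// unrolling and vectorization.
void transpose_short(const std::vector<double>& src, std::vector<double>& dst,
                     unsigned short n) {
    for (unsigned short i = 0; i < n; ++i)
        for (unsigned short j = 0; j < n; ++j)
            dst[std::size_t(j) * n + i] = src[std::size_t(i) * n + j];
}

// The same loop with a native-width counter: no wrap-around concerns,
// so the optimizer has an easier time unrolling it.
void transpose_size_t(const std::vector<double>& src, std::vector<double>& dst,
                      std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            dst[j * n + i] = src[i * n + j];
}
```

Both functions compute the same transposition; whether the counter type actually matters depends on your compiler and target, which is exactly why checking the disassembly is worth it.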

Another time we got a 50% speedup by simply caching an operation that was considered cheap and hence not worth caching. Perhaps that was true some 10 or 20 years ago, but nowadays the gap between cache and RAM read latency is too drastic to ignore. Machines change; we can only account for these changes by re-verifying the facts we know, testing for things that were considered improbable in the past.
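A minimal sketch of the idea; the actual operation isn’t named in the story, so std::sqrt stands in for it here:

```cpp
#include <cmath>
#include <unordered_map>

// A stand-in for the "cheap" operation. The computation itself is
// trivial, but recomputing a value whose inputs must be fetched from
// RAM can still lose to a cache-resident lookup on a modern machine.
double cheap_operation(int key) {
    return std::sqrt(double(key));
}

double cached_operation(int key) {
    static std::unordered_map<int, double> memo;
    auto it = memo.find(key);
    if (it != memo.end())
        return it->second;            // cache hit: no recomputation
    double value = cheap_operation(key);
    memo.emplace(key, value);
    return value;
}
```

Whether the cache wins is an empirical question, which is the point of the anecdote: measure instead of trusting decade-old intuitions about what is cheap.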

We also got a 100% speedup by eliminating an extra call to a rather heavy procedure. It should have been called once, but because of an architectural quirk it was called twice in the same frame. We spotted this accidentally while profiling a completely different routine. The moral is, you don’t necessarily know what’s going on under the hood unless you peek there periodically. A profiler helps not only in looking for bottlenecks when the whole thing’s on fire, but as an investigation tool in its own right.

And that one time we got a 400% speedup by rearranging the image processing pipeline. Images were stored on disk in one of several integer formats, and when loaded, they were converted to double-precision floating point and back to integers only to undergo some minor transformations. The whole thing was losing most of its performance simply by creating and filling intermediate structures. Alas, as the processing pipeline was rather heavy and versatile, it wouldn’t have been wise to apply it to every image value separately because of the cost of dispatch. We found a way to statically generate all the possible combinations and dispatch once per image, enabling cheap per-value processing without any intermediate structures. And the resulting code happened to be just about the size of the original. Well, sometimes it pays to know your way around meta-programming.
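In C++, the skeleton of that trick can look like this; the original pipeline was far richer, and scale_and_offset is a made-up stand-in for the “minor transformations”:

```cpp
#include <cstdint>
#include <vector>

// The dispatch on the storage format happens once per image, when the
// caller picks the template instantiation; the per-value loop is then
// monomorphic and needs no intermediate double-precision image.
template <typename Pixel>
void scale_and_offset(std::vector<Pixel>& image, double scale, double offset) {
    for (Pixel& p : image)
        p = static_cast<Pixel>(double(p) * scale + offset);  // transient double only
}

// Usage: one instantiation per stored format, e.g.
//   std::vector<std::uint8_t>  img8  = ...; scale_and_offset(img8, 2.0, 1.0);
//   std::vector<std::uint16_t> img16 = ...; scale_and_offset(img16, 0.5, 0.0);
```

The compiler stamps out one tight loop per pixel format, which is exactly the “statically create all the possible combinations” part; only the one-time choice of instantiation is paid per image.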

But the largest performance boost I ever got came from a very simple thing. I used the default .NET facility for matrix multiplication, the Matrix.TransformPoints method, and it was very slow, so I decided to re-implement it on site. After all, it’s just a vector-matrix multiplication, a simple operation. This re-implementation easily gave me a 20000% improvement! When I peeked under the hood to see how the original operation works, I saw this:

- System.Drawing.Drawing2D.Matrix.TransformPoints:
  - System.Drawing.SafeNativeMethods.Gdip.ConvertPointToMemory,
  - System.Drawing.SafeNativeMethods.Gdip.ConvertGPPOINTFArrayF:
    - System.Drawing.UnsafeNativeMethods.PtrToStructure:
      - System.Drawing.Internal.GPPOINTF..ctor (which is empty),
      - System.RuntimeType.CreateInstanceSlow:
        - System.Runtime.InteropServices.Marshal.PtrToStructure.

This, ladies and gentlemen, is exactly what the fruit of performance unawareness looks like. The perfect specimen! It has conversions and marshaling, constructors and creators, and only somewhere deep at the bottom does it have the matrix multiplication I was looking for, and that doesn’t even show up on the profiler.
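For comparison, the hand-rolled replacement amounts to little more than this. The original was C#; what follows is a C++ sketch of a GDI+-style 3×2 affine transform, not the actual code:

```cpp
#include <vector>

struct Point { double x, y; };

// The six elements of a GDI+-style 3x2 affine matrix. Transforming an
// array of points is two multiplications and an addition per coordinate:
// no marshaling, no reflection, no per-point allocation.
struct Affine {
    double m11, m12, m21, m22, dx, dy;
    void transform_points(std::vector<Point>& pts) const {
        for (Point& p : pts) {
            const double x = p.x, y = p.y;
            p.x = x * m11 + y * m21 + dx;
            p.y = x * m12 + y * m22 + dy;
        }
    }
};
```

A dozen lines of arithmetic versus a stack of marshaling calls; the 20000% gap stops being surprising.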

My point is, it’s all bad. It is the fear of evil that brought us to the point where the most basic desktop software is sluggish as hell. At work I use Visual Studio, and Word, and Excel, and Outlook, and it’s all tremendously bad. Sometimes, when Visual Studio goes “busy”, I have to load my trusty Vim just to keep working. And I’m not even a Vim fan; it’s only good for me insofar as it works while everything else hangs.

And on the Internet, it’s even worse. A guy writes a message board engine in assembly and everybody is amazed at how fast it is. Well, it’s not. There’s no magic in assembly per se; it’s just that all the other implementations are horribly slow. Assembly only brings performance awareness up front.

So, it is bad. It’s all bad. But it’s also good. For me. Because I make money and reputation by making things better. I fear no evil walking through the darkest valley, and it pays well in the end. So for all those who cover their own negligence with Donald Knuth’s “evil quote”, I have one thing to say.

Thank you!