The Economics of Testing Ugly Code

wynand · on July 30, 2008

The author mentions that changing tested code decreases its value. The steepness of this curve varies greatly depending on the language. The strong guarantees of a language like Haskell mitigates the risk and so ensures a less dramatic drop.

But what I find more alarming, is that the author completely misses the human side from the perspective of programmers: a good programmer that is forbidden from changing ugly code will become frustrated and will either become less productive or leave.

If there are developers that don't care about ugliness and are very productive, then I am in big trouble (since they would be better hires from a business perspective than me). However, there is quite a strong connection between the caliber of a developer and his or her aesthetic sense in code.

If one looks at projects like Squeak, it's clear just how much good developers can achieve with a relatively small amount of well crafted code. A lot of good programmers are unable to reach this kind of potential because of the coding tar pits they have to negotiate every day at work.

briansmith · on July 30, 2008

"The strong guarantees of a language like Haskell mitigates the risk and so ensures a less dramatic drop."

Haskell's strong typing may help prevent some bugs, but Haskell's evaluation model means that even a slight change (literally one character) can change the performance characteristics of your program, in space and time, by orders of magnitude. Because of that, I've found that Haskell programs require more testing than Python programs, to ensure there are no edge cases that have space leaks.

I am a fan of Haskell for some classes of programs (like compilers and inference systems) where you can spend a lot of time perfecting a strongly-typed data model that is fairly unchanging--one where "Big Design Up Front" makes sense. But, I cannot say that Haskell's strong typing is always a net plus when the data model needs to be frequently updated. In Haskell you often have to write extra code, or code that is more abstract in order to get programs to typecheck. Then, you still have to write almost the same kinds of unit tests that you would have to write if you were using any other language. With Python or Ruby you can skip the type system workarounds, which often means you have less code to deploy. I am a big believer that less code == better code.

Dealing with ugly code is part of being a good developer. I believe you should leave ugly code around until you need to change it. When you need to change it, you can rewrite it to make it more beautiful, but only if that beautification facilitates the improvement you were originally tasked with making. But, it is usually a bad idea to go around tidying up code for the sake of tidiness.

Also, think about this: if the original code is really unclear, then what confidence do you have that you understand it well enough to rewrite it correctly; if the code is clear enough to be easily rewritten, then why does it need to be changed?

wynand · on July 31, 2008

My Haskell experience is too limited for me to have experienced performance changes resulting from small code changes. OCaml has a fairly predictable performance model and a language which combines aspects from OCaml and Haskell (this is the point where someone shouts "Scala!") will go a long way to reduce brittleness in the code and won't require much testing for unexpected performance issues.

I thought more about the original article yesterday and realized that - regardless of the language one uses - the author made a mistake by implying that the quality peak is equally high as one moves to the right of the graph. If by "beautiful code" we mean well designed code, then changes will have far less far-reaching effects. The quality peak will be lower (relative to the graph) and hence the descent induced by changes, less steep. And this happens because as you rightly state, less code == better code.

If a company needs to be agile (in the general sense of the word), their software should also be easy enough to adapt to changing circumstances. Given that good code allows for such changes, the extra effort to produce good code pays for itself.

On the other hand, for a piece of software that is at the end of its lifetime, there is little of sweeping changes. Beautification of end-of-life code is a bad use of time.

You are right that a programmer should be willing to endure some ugly code. Anything which encodes real world relationships (which are complex and messy) will have some ugliness in it. A programmer which rails against this makes life difficult for everyone around. But this can be taken too far and I have known people that are too conservative in this regard; this made the software unnecessarily hard to maintain and the programmers unhappy [1].

The answer to your last statement is that if you have to modify barely understandable code, then it is worth refactoring (and as a last resort, rewriting) it, even if you don't understand every aspect of it. If you don't you will likely spend more time trying to paste your code in somewhere in the hope that it will work and in so doing you the crufty monster will get even bigger and harder to maintain (and you don't gain an understanding of the code which is a very valuable asset to the owners of the code).

[1] No company has a legal obligation to keep employees happy. But such companies can expect mediocrity if they're lucky but probably much less.

Edit: I obviously don't know HN's markup. Fixed italization.

demallien · on July 31, 2008

Surround phrases to be shown in italics with asterixes...

silentbicycle · on July 30, 2008

The type systems in Haskell, OCaml, etc. fit the author's requirements for automated testing. Once you have enough experience to trust them when they check whether local changes are still globally correct, making bold changes to clean things up stops feeling reckless. (Automated testing suites are still helpful, but the type system writes/adapts many of the test cases for you.)

But what I find more alarming, is that the author completely misses the human side from the perspective of programmers: a good programmer that is forbidden from changing ugly code will become frustrated and will either become less productive or leave. YES. The inevitable pain and misery in maintaining a really huge, ill-factored codebase can scare good developers away. While solving hard technical problems is often intrinsically rewarding for hackers, solving problems largely caused by disorganization is just a sigh of relief at best.

Tamerlin · on July 30, 2008

"solving problems largely caused by disorganization is just a sigh of relief at best."

In most cases, it's a source of attrition and a waste of talent, because finding time to actually FIX it is such a huge task that no one has time for it, and good developers won't want to spend their time stagnating long enough to learn the ins and outs of the spaghetti code to be able to implement incremental fixes.

silentbicycle · on July 30, 2008

A core of ugly code perpetuates itself. Partially due to social causes, partially due to the sheer inertia.

Sad but true.

edw519 · on July 30, 2008

a good programmer that is forbidden from changing ugly code will become frustrated and will either become less productive or leave

Absolutely. Only so many bandaids can cover the Grand Canyon. Sometimes, you just gotta bite the bullet and make it right. It's extremely frustrating leaving a mess in your wake, regardless of source.

swombat · on July 30, 2008

Interesting. I'm not entirely sure how well the argument holds, but it does put forward some interesting points.

One of the weaker aspects of this article is that the author doesn't define "value". I'll take a stab at it: for a business, the value of its software is how much money it can make or save from it. There are two components to this value, an immediate, obvious one ("can we sell/use it tomorrow?") and a long term, less obvious one ("can we still sell/use it in two year's time?").

The graphs and explanations, I think, apply fairly well to the "immediate value", but less so to the long-term value. The long-term value of code you can keep evolving, cheaply, to meet user demands for years, is much greater than that of ugly code that will become a viscous nightmare within a few months. Immediate value is very important too, but generally is focused on at the expense of long-term value, particularly by non-developers, who aren't fully aware that code can reach a point where it's so nasty that the only solution is a bullet to the head and a rewrite.

So I think the problem is more one of making the long-term value of beautiful code clearer, rather than one of flattening spikes. The curve to the right of the spike rises much faster and much higher than in the graphs, and it's worth moving there if you have people who are capable of it.

pbh101 · on July 30, 2008

I think the graphs, bu fault of 'value' not being defined, are showing a hybrid of the long-term and immediate value of the code. Take, for example, a strict refactoring of the code, in the sense that only implementation is being changed and no features are being added. The immediate business value of that action is near zero, but the long term value is more. What the graphs point out is that, due to manual testing and production environments rigorously stress-testing a particular, perhaps brittle, snapshot of the code, the amount of refactoring and beautification of the codebase required to provide a baseline level of long-term value equivalent to the spike is much higher. Yes, the code will be cleaner, and easier to maintain and update for the future, and the resulting spikes from testing and production deployment will be that much higher, too.

Take a look at the very last graph. I would posit that once the code is beautified and a more flexible version deployed, non-source readers would see another spike at the end, but from the original spike, it's hard to justify the valley of little incremental value.

Tamerlin · on July 30, 2008

You're both missing something important. It's who decides on how to define "value" and in most cases, it's management, and it's almost invariably short-term value that wins out.

The reason?

Most organizations believe that software engineering is easy, and therefore software engineers are as interchangable as cogs. Most corporations don't understand or value experience.

swombat · on July 30, 2008

I'm not missing that. That's the gist of my comment...

13ren · on July 30, 2008

But testing also reduces ugliness: The act of making code unit testable makes it less ugly, because unit testing requires access to a portion of the code in isolation, thus making the code more modular. Modular code is less ugly (or so I claim).

That's an intriguing idea about DSLs enabling customers to see the ugliness. But why would customers look? (A parallel is non-lawyer customers looking at legal documents for beauty) Does anyone have experience with DSL's for non-developer customers? eg: SQL is a DSL for business analysts - but they develop in it, not just look at it.

This form of testing is called production. :)

gruseom · on July 31, 2008

Does anyone have experience with DSL's for non-developer customers?

I did that on one project. It didn't work. The response we got was "But I'm not a programmer." We replied, "But you don't understand! This is not a general-purpose programming language, it's a high-level domain-specific scripting language!" The response we got was, "But I'm not a programmer." I learned a lot from that experience.

learninglisp · on July 31, 2008

Yeah... but in my company, there are many cases where job positions have spontaneously materialized that are essentially programming positions, but that we have "con" someone into doing it even though it really is programming. The users are, for all intents and purposes, programmers... but we have to keep them from finding that out!

gruseom · on July 31, 2008

Surely you're not going to get very good programmers that way?

learninglisp · on July 31, 2008

But they're not technically programmers. I'm talking about applications that require extensive customizations to be useful. There's a lot of cases where people are having to trick people with the business knowledge into doing the configuration for them because the 'real' programmers can't or won't do the work. _This_ is where we should be looking at applying DSL's.

ars · on July 30, 2008

So in summary: don't write ugly code, because once you do it's permanent.

silentbicycle · on July 30, 2008

You missed a very important point: What makes it permanent (or at least provides much of the inertia) is the risk of reintroducing old bugs and not wanting to manually test all over again. Automated testing does not have this problem.

In practice, of course, this depends on how badly the testing framework is entangled with implementation details, rather than testing via a public interface. If making changes means having to rewrite 90% of your automated tests, you haven't gained much.

(The article has a couple other potential threads in it, as well, but that seems like the main idea to me.)

ibsulon · on July 30, 2008

Automated testing has less of this problem. It still has the problem that some testing must be done manually, and there is a large spike for code in production.

wynand · on July 30, 2008

"Automated testing does not have this problem."

Good point. I mentioned in another comment here that some languages (like Haskell with its strong, static type system) mitigate this problem. And for languages without static type systems of this nature, automated testing can produce the same effect (which is not to imply that tests for Haskell programs are useless).

silentbicycle · on July 30, 2008

Look like I simultaneously replied to your comment with the same comment you were posting on mine. :)

albertcardona · on July 30, 2008

Which means don't hire ugly-code-writing programmers, aka bad programmers.

silentbicycle · on July 30, 2008

Easier said than done.

albertcardona · on July 30, 2008

The only way to find out, in my experience, is to sit down and work with each candidate, for a couple of hours. Not easy indeed.