Quality is free

or, Let's heal the wound (again) between management and engineering

Feb 12, 2024

I apologize for not sending the newsletter last week. My family and I got sick, and I prioritized that. Plus, I didn’t want to send a low-quality email just to send it. We’re all better now. Enjoy!

The first time I heard the phrase “quality is free,” I laughed out loud. I had been putting so much effort into quality, it was hardly free. But then I started reading about what the phrase meant, and I had to admit it was true—at least in manufacturing. I think it’s true in software as well.

The phrase “quality is free” comes from Philip Crosby, who wrote a book of the same title. It plays off the common-sense tradeoff between quality and cost. I could buy a broom at the dollar store for $1. The bristles will fall out. Maybe it won’t even sweep up dust. But I can spend $10 and get a decent broom. When I’m cleaning the floor, I could do a quick job or put in more time and effort and get a higher-quality result. In many situations, quality has a cost. These situations form our intuitive notion of the tradeoff.

However, when we’re talking about complex tasks, the tradeoff doesn’t make sense anymore. If I’m stacking blocks as high as I can, it might take me 10 minutes to stack 20 blocks before they fall. To get back to 20, it will take me 10 minutes again. Or I could spend 12 minutes to get the stack of 20 very well aligned the first time so I don’t have to redo it. Because of an extra 2 minutes of investment, I saved 10 minutes of rework. Somewhere between 10 and 20 blocks, it became cost-effective to focus on quality.
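The block arithmetic can be written out as a tiny sketch. The function name and the "every collapse forces a full restart" model are my simplifying assumptions for illustration, not a measurement of anything:

```python
def total_minutes(minutes_per_attempt: float, collapses: int) -> float:
    """Total time to end with a standing stack.

    Assumes (for illustration only) that every collapse forces a
    full restart that takes as long as the original attempt.
    """
    return minutes_per_attempt * (collapses + 1)


# Quick stacking: 10 minutes per attempt, and the stack falls once.
quick = total_minutes(10, collapses=1)

# Careful stacking: 12 minutes per attempt, and the stack never falls.
careful = total_minutes(12, collapses=0)

print(f"quick: {quick} min, careful: {careful} min, saved: {quick - careful} min")
```

With these numbers the quick approach costs 20 minutes and the careful one costs 12. The extra 2 minutes avoided 10 minutes of rework, for a net saving of 8 minutes.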

The difference between sweeping and blocks is the dependence on the quality of prior work. You can’t stack high on unbalanced blocks. And if you sweep every day, you get a similar effect. Spending a little more time today might save you more time than that tomorrow when it’s time to sweep.

Software is more like stacking blocks than sweeping once. The current quality of the software has a big influence on how fast I can code. And we know this in our industry, even if we don’t practice it so well. The tradeoff between quality and speed is an illusion.

The DORA research also bears this out. In Accelerate, the authors point out that the metric associated with speed, deployment frequency, was correlated with the metric associated with quality, change failure rate. According to reason, anecdote, and empirical data, quality is free.

The hard part is that the point where you get quality for free always seems beyond the next sprint. We tell ourselves, “It’s worth it this sprint to neglect quality so I can log those sweet story points.” In other words, we can stack the first 10 blocks in this sprint. We can deal with the rest in the next sprint. Slowing down takes a lot of discipline when all the incentives align with immediate speed.

This tradeoff can lead to a lot of tension between management and workers. Mediocre managers seem to only have one way to make things happen: to increase pressure. Deadlines, quotas, targets. Whatever you want to call that pressure, that’s what managers do.

Programmers know when their code sucks. It’s pain and friction. Programmers get caught between a manager pushing them to go faster and the code slowing them down. So they either push back on the pressure, saying they need to clean up to make it easier and faster, or they trudge forward as fast as they can, creating more pain and friction as they do. It certainly feels like a tradeoff at the time.

The tension between management and development heightens. There’s an organizational pattern where the tradeoff between speed and quality is personified. The product owner represents management’s needs. The technical lead represents the technical quality of the code. The management/technical conflict is supposed to result in a kind of creative tension that finds good solutions that balance the needs of both. But I think it’s bogus.

Firstly, I’ve never seen it happen. I’ve only heard legends of great product owner/technical lead pairs who could strike a good balance. Secondly—and more importantly—if the tradeoff is an illusion, why are we personifying it? Manager/worker tensions can spiral out of control. I’ve lived through such situations. They’re no good for anybody.

The solution is to align all of the incentives with long-term speed. Short-term speed is a trap. Your goal should be a slow, upward trendline of more done per sprint. But you don’t get that way by working faster in the next sprint. You get that way by focusing on removing the pain and friction.

Toyota’s factories, run on the Toyota Production System, are among the best in the world. They didn’t start out that way, and they knew it. So the managers built into the factory a way to stop the whole line if there was a problem. If you saw a problem, you stopped everything, fixed the problem, and then started the factory up again. Stopping a lot seems like the opposite of improving speed, but over time, this practice has increased the overall speed of the factory. And they still pull the stop cord thousands of times per day.

Okay, so they don’t stop the line thousands of times per day. But they do pull the cord that many times. The andon cord, the cord above the workers’ heads that may stop the line, does two things: it signals the manager to help, and it starts a timer. The worker pulls the cord if there is an issue. If the manager and worker cannot resolve the issue within the time it takes for the car to move one step in the assembly line, the line stops. They don’t want a car with an issue continuing to the next station in the assembly line. And they want the manager to be aware of the problem.
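Here is a minimal sketch of that rule, to show why thousands of pulls don’t mean thousands of stops. Everything in it is hypothetical: the `CordPull` record, the station names, and the 60-second step time are made up for illustration and are not Toyota’s actual numbers or system.

```python
from dataclasses import dataclass


@dataclass
class CordPull:
    """One pull of the cord (a hypothetical record, not Toyota's data model)."""
    station: str
    seconds_to_resolve: float  # time the manager and worker needed to fix the issue


def line_stops(pull: CordPull, seconds_per_step: float) -> bool:
    """The line stops only if the fix takes longer than one step of the line."""
    return pull.seconds_to_resolve > seconds_per_step


# Assumed value for illustration: the car moves one station every 60 seconds.
SECONDS_PER_STEP = 60.0

pulls = [
    CordPull("door fit", 20.0),
    CordPull("loose bolt", 45.0),
    CordPull("missing part", 150.0),
]

# Every pull brings the manager over; only the slow fixes halt the line.
stops = [p for p in pulls if line_stops(p, SECONDS_PER_STEP)]
print(f"{len(pulls)} pulls, {len(stops)} line stops")
```

In this made-up day, three pulls bring the manager over three times, but only the missing part stops the line. The other two issues are fixed before the car reaches the next station.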

I’ve thought a lot about what an andon cord would look like in software development. I’ve heard of teams who stop everything if the build breaks. If you can’t deploy code, everybody stops what they’re doing and fixes CD. I’ve also suggested that if you can’t start a REPL because of some issue in your editor config, don’t try to code without it. It’s worth whatever time it takes to get the REPL going again.

However, I often wonder if we should stop at finer-grained problems. Fixing the build is a rather coarse signal. What about smaller things? What about if you find a messy function? Should you stop your task to refactor it? What about code without a test? Should you test it before doing anything else? What about an unclear variable name? I don’t know where the line should be, but after hearing that Toyota pulls the cord thousands of times daily, I think we should be stopping to fix much smaller problems than we do today. Stopping to fix problems frequently will cost short-term speed but pay back in long-term speed—and quality.

But there’s another part to the andon cord that’s often overlooked: The manager comes to help the worker. Not to apply pressure to make them go faster. Not to chide them for stopping the line. Not to remind them of the priorities and deadlines. To help. At the NUMMI factory, this Toyota way surprised GM workers. And it would surprise me if my manager slacked me to ask if I wanted help on the task I was working on. I think I’d cry.

The speed/quality tension between management and engineering is toxic. Your team might not get as bad as the GM plant in Fremont before it became NUMMI. It had alcohol, sabotage, prostitution, drugs, you name it. It was the worst GM plant. But the same forces were at every GM plant, and they’re on your team, too. What’s worse is the tension is based on an illusory tradeoff. We know it’s untrue. Yet we still persist. How do we align speed and quality through incentives? How can management and workers structurally align to build better products, faster? How do we know when to stop a task and fix what we find?

If you’ve got a story about the quality/speed tension, share it in the comments. And if you’ve got stories about better ways to work, I’d really like to hear them.

Eric Normand's Newsletter
