Some admin info before we begin:
I’m going on vacation for two weeks, so this will be the last email until July. Also, I’ll hold office hours this week, but the next session will be in July as well.
Reducing batch size pays for itself
There’s a book that stared at me from my shelf for four years before I found the courage to pick it up. It’s called Principles of Product Development Flow. It has 175 principles of “flow” from fields like internet congestion control and highway traffic management. And it’s written in a very dense style with no breaks. It felt like I was reading a single run-on sentence, desperately waiting till the full stop where I could get a complete sense of what I had just read, but, alas, never getting to it until the final page, where I couldn’t remember where the sentence even started. It was a big list of 175 principles.
I did eventually read it. And I think I remember nothing from it except two things:
1. The stuff in here is important enough to merit more than one reading.
2. I’m not going to remember any of it.
I exaggerate slightly. I gained a general appreciation for the importance of managing queues and how we can prioritize tasks by cost of delay (how the cost of something grows over time).
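To make “cost of delay” concrete, here’s a tiny sketch (all the numbers are invented) of one scheduling rule built on it, weighted shortest job first: do the task with the highest cost of delay per unit of work first.

```python
# Made-up tasks: cost_of_delay is what a week of waiting costs,
# duration is how many weeks the task takes.
tasks = [
    {"name": "A", "cost_of_delay": 10, "duration": 4},
    {"name": "B", "cost_of_delay": 8,  "duration": 1},
    {"name": "C", "cost_of_delay": 3,  "duration": 2},
]

# Weighted shortest job first: highest cost of delay per week of work wins.
tasks.sort(key=lambda t: t["cost_of_delay"] / t["duration"], reverse=True)
print([t["name"] for t in tasks])  # ['B', 'A', 'C']
```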
But at the same time, I wish I could build that thinking into my bones. My idea was to take each of the 175 principles and build mini simulators that could be embedded in a web page. That way, you could intuitively see the cost of waiting to add capacity or the effects of lengthening queues. Please be my guest and steal the idea. No credit needed. Just let me know when you’re done so I can play with it.
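To give a flavor of what one of those simulators might show, here’s a rough single-server queue simulation (parameters invented): the average wait grows gently with utilization, then explodes as you approach 100% busy.

```python
import random

def average_wait(utilization, service_time=1.0, jobs=100_000):
    """Single-server, first-come-first-served queue with random
    (exponential) arrival and service times. All parameters invented."""
    arrival_rate = utilization / service_time
    clock = free_at = total_wait = 0.0
    for _ in range(jobs):
        clock += random.expovariate(arrival_rate)   # next job arrives
        start = max(clock, free_at)                 # waits if server is busy
        total_wait += start - clock
        free_at = start + random.expovariate(1 / service_time)
    return total_wait / jobs

for u in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"{u:.0%} busy -> average wait {average_wait(u):.1f}x service time")
```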
But today, I just wanted to play off of the idea that Quality is Free and talk about another thing that costs time and money but is also free: reducing batch size.
So, I should clarify. It’s not actually free. But it pays for itself. So in the long run, it is free. Just like quality.
There is an optimum batch size based on how much it costs to deliver the batch and how much it costs to hold the batch. In our world, we see this clearly in software delivery. If delivering new software to customers costs a lot (say 2 weeks of labor), we tend to batch up features and deliver them less often. However, holding those features might also cost the business revenue, so we get pressure to deliver more often. There is some optimum batch size that minimizes the sum of those costs.
If we look at the total cost of a whole year of deployments, including the cost of doing the deploys and the cost of holding onto features that are ready but aren’t in customers’ hands, that total (at the optimal batch size) is proportional to the square root of the per-deployment cost. So reducing the cost of deployment reduces the total cost.
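This is the classic economic-lot-size tradeoff, and it’s easy to play with in code. Here’s a minimal sketch (all numbers and units are invented): yearly cost is deploys per year times the cost of a deploy, plus the average number of finished-but-unshipped features times a holding cost.

```python
from math import sqrt

def yearly_cost(batch_size, deploy_cost, features_per_year=100,
                holding_cost=2.0):
    """Cost of doing the deploys plus cost of holding finished features
    (on average, half a batch sits waiting). All units invented."""
    deploys_per_year = features_per_year / batch_size
    average_waiting = batch_size / 2
    return deploys_per_year * deploy_cost + average_waiting * holding_cost

def optimal_batch(deploy_cost, features_per_year=100, holding_cost=2.0):
    # Minimizing the curve above gives the square-root law.
    return sqrt(2 * features_per_year * deploy_cost / holding_cost)

for cost in (9, 4):
    batch = optimal_batch(cost)
    print(f"deploy cost {cost}: best batch {batch:.0f}, "
          f"yearly total {yearly_cost(batch, cost):.0f}")
# deploy cost 9: best batch 30, yearly total 60
# deploy cost 4: best batch 20, yearly total 40
```

Notice the totals: the minimum yearly cost works out to the square root of (2 × features × deploy cost × holding cost), so it scales with the square root of the deploy cost.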
In addition, if we lower the deployment cost, the optimal batch size will decrease. That has enormous benefits. First, each deployment has fewer features, so the deployment risk decreases. Second, the features get deployed faster, leading to faster feedback and learning.
But the magical one is that it pays for itself. If you work 40 hours to reduce the deployment cost from nine units to four units, you reduce the total cost for the year by one-third: the square root of the deployment cost falls from 3 to 2. And because the deployment costs and holding costs are equal at the optimum, that means you’re saving 1/3 of deployment costs and 1/3 of the missed revenue. That’s probably worth a week of time. Check out other benefits of automation.
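A one-liner to check that arithmetic:

```python
from math import sqrt
# Total yearly cost scales with the square root of the deployment cost,
# so cutting the per-deploy cost from 9 units to 4 saves a third.
print(1 - sqrt(4) / sqrt(9))  # 0.333...
```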
Ideal deployment cost
The ideal deployment cost is zero. Zero time, zero compute, zero bandwidth. The big factor is time. Obviously, no real deployment hits that ideal. But it tells us which direction to push.
You might say that your deployment cost is near zero. You’ve automated everything. You might be right. But are you taking into account all of the costs? What if a deploy doesn’t work? For instance, what happens if you submit a deploy and a test fails? How long does it take you to know about it? How long does it take to debug it? How often is a bug discovered during deploy?
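A back-of-the-envelope way to account for those hidden costs (every number below is invented for illustration):

```python
# The "real" cost of a deploy includes the failure cases.
happy_path_minutes = 10
failure_rate = 0.2         # how often a deploy breaks
minutes_to_notice = 30     # time until you find out a test failed
minutes_to_debug = 60

expected_cost = happy_path_minutes + failure_rate * (
    minutes_to_notice + minutes_to_debug + happy_path_minutes  # redeploy
)
print(expected_cost)  # 30.0 minutes: triple the happy path
```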
I once worked at a company where we couldn’t run tests on our local machines, so the only way to run the tests was to submit the code to the build server. The build server would spin up containers with all of the microservices and containers with third-party service stubs, then run the tests. That would take 10-20 minutes. And you’d have to monitor a Slack channel to know if the tests passed. And sometimes, if the tests failed because of a timeout, you wouldn’t know for an hour. Even though I wasn’t actively doing anything, I couldn’t really work on anything else while the automation ran. You have to take that time into account.
If you get your deployment cost down low enough, we call that continuous deployment.
Unfortunately, work on decreasing the deployment costs has diminishing returns. As the deployment cost goes down, the ideal batch size also goes down. But how small can your batch really be?
Ideal batch size
I used to say that the ideal batch size would be one feature. But then I learned about feature flags. Those allow you to push code that won’t run in production until you flip the flag on. So you can push the code before the feature is ready. That opened my mind to how small a batch could be. In theory, with feature flags properly in place, you could push code as soon as you typed a character that led to a successful compile and test run. If the cost to deploy is low enough, why not?
Unfortunately, I don’t have first-hand experience with feature flags. They seem like a handy way to reduce batch size, but they probably require a discipline that I don’t understand.
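Still, the basic shape of the pattern, as I understand it, is simple. Here’s a hand-rolled sketch (the names and structure are mine, not from any particular flag library):

```python
# A hand-rolled flag, just to show the shape. Real flag systems read
# from config or a service so flags can flip without a redeploy.
FLAGS = {"new_checkout": False}  # deployed "dark": shipped but never runs

def old_checkout(cart):
    return sum(cart)  # the battle-tested path

def new_checkout(cart):
    return sum(cart)  # half-finished work can live here safely

def checkout(cart):
    if FLAGS["new_checkout"]:
        return new_checkout(cart)
    return old_checkout(cart)

print(checkout([3, 4]))  # 7, via the old path until someone flips the flag
```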
Continuous Integration
Once you understand the principle, you see it everywhere. If reducing the cost of deploying our software is advantageous, what about reducing the cost of committing code?
Many code changes we make are not new features. For instance, we could do a code cleanup. While we might deploy small refactors to the code, they won’t directly make us more money. However, they will benefit the other developers. Each refactor makes the code easier to work with. But until they’re pushed to the repo and pulled onto the other developers’ machines, that value is unavailable. We should see the same benefits from reducing the cost of committing and pushing code as we did from reducing the cost of deployment.
If you get the cost of committing new code (and pulling that code from the repo) down low enough, we call that continuous integration. We’re merging our code into the trunk very often. And we do see the same benefits. The total cost of merging should go down (measured in the number and severity of merge conflicts).
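Here’s a crude simulation of that claim (every parameter invented): two developers edit random lines in the same file, and the chance their changes collide grows the longer they wait to merge.

```python
import random

def conflict_probability(days_between_merges, edits_per_day=5,
                         lines_in_file=400, trials=10_000):
    """Rough Monte Carlo: chance that two developers' accumulated edits
    touch at least one common line before they merge."""
    conflicts = 0
    for _ in range(trials):
        a = {random.randrange(lines_in_file)
             for _ in range(edits_per_day * days_between_merges)}
        b = {random.randrange(lines_in_file)
             for _ in range(edits_per_day * days_between_merges)}
        conflicts += bool(a & b)
    return conflicts / trials

for days in (1, 2, 5, 10):
    print(f"merge every {days} day(s): "
          f"{conflict_probability(days):.0%} chance of a conflict")
```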
This reminds me of Kent Beck's experiment to automatically commit code that passes the tests and revert it if it doesn’t. It’s an interesting experiment, but I think it incentivizes another thing: not writing tests! Anyway, it’s a cool experiment with reducing the cost to integrate code.
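For reference, the whole experiment fits in a few lines. Here’s a sketch using plain git and pytest (the tool choices are mine, not Beck’s):

```python
import subprocess

def tcr(message="tcr"):
    """test && commit || revert: keep the change only if the tests pass."""
    if subprocess.run(["pytest", "-q"]).returncode == 0:
        subprocess.run(["git", "commit", "-a", "-m", message])
    else:
        subprocess.run(["git", "checkout", "--", "."])  # discard the change

# Run tcr() after every small edit; failing work vanishes immediately.
```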
So, this was a meandering thought-stream about the size of batches. I know I’m preaching to the choir. Don’t we all believe that smaller batches are better, and the way to get them is to reduce the deployment time? I think so. But I’ve also encountered people who say “our deployment is not worth making faster.” But it probably is! It will pay for itself if you spend time on it. And imagine the greater feedback and lowered risk! Do it!
The "( Test && Commit ) || Revert" ("TCR") approach is, and has always been, intended to be used with Test-Driven Development (TDD). Writing tests should be an integral part of it.
It was criticized for breaking the Red-Green-Refactor cycle.
So I described how to fix it to re-enable Red-Green-Refactor (for TDD and TCR):
https://jeffgrigg.wordpress.com/2018/11/23/test-driven-development-with-test-commit-testcodeonly-revert/
By my calculation, that 40-hour investment in improving the build time that you give as an example above pays for itself about 17.5 times a year. In other words, it takes less than three weeks to pay for itself. That is, in less than a month, that one week of work is paid for, and then you get "pure profit" (i.e., savings) from then on.
Less Than One Month.