Our last Apropos was with Bobbi. Check it out. Our next episode is with David Nolen on Tuesday April 8. Please watch us live so you can ask questions.
Have you seen Grokking Simplicity, my book for beginners to functional programming? Please check it out or recommend it to a friend. You can also get it from Manning. Use coupon code TSSIMPLICITY for 50% off.
Anti-entropic functions
There’s a rule in internet protocols called Postel’s Law. It states that you should accept input liberally (doing your best to decipher faulty messages you receive) but be very strict in your output. If every node in a network follows this principle, we would expect the network to operate more robustly than one whose nodes do not. The receiver’s flexibility allows even buggy implementations of the sender’s protocol to operate effectively.
There is an analogous principle in electronics. In electronic components, noise is a major concern. So you build modules to tolerate some noise on the input but to do their best to minimize noise on the output. Noise is a fact of life, but if every component aims to reduce it, the system as a whole will work better.
I’m not a huge fan of Postel’s Law. I’ve had to work with some horrendous HTML as input to my software. It could stay horrendous only because the browser is so tolerant that it rendered fine visually. But if you tried to parse it with the intent of making sense of it, good luck. In short, tolerance of input allows all sorts of deviant behavior with no consequences.
But Postel’s Law does work for the user of the browser. Instead of showing just an error message, the browser shows something. The page is readable. The links are clickable. And the user is happy. It’s not nothing. Imagine a world where browsers weren’t tolerant: one stray < character could mess up the parse, which would cause the browser to crash, which would make the user lose their job, then their house, then they’re living on the street. All because of a <.
If Postel’s Law is about protocol errors and the electronics version is about noise, programmers need a law about complexity. Our systems are so complex. We complain about it all the time as the main source of difficulty. So I thought we could formulate a principle like this:
Aim to reduce complexity at every step.
We already do this a little bit. We receive text from an HTTP request and parse it as JSON. This vastly reduces the space of input by structuring it. Then we normalize the JSON into a form that we want to work with, often translating it to a different, less complex data structure, like, say, a Document. Then we call “accessors” on these data structures; for example, getStatus(). The accessors boil down the complex data structure we pass them to a small set of possible states.
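Here’s a minimal sketch of that pipeline. The Document shape, the status values, and every function name are invented for illustration; only the parse-normalize-access structure is the point.

```typescript
// Hypothetical domain types: a normalized Document and a small, closed
// set of status values. All names here are illustrative assumptions.
type Status = "draft" | "review" | "published" | "archived";

type Document = {
  id: number;
  status: Status;
  title: string;
};

// Step 1: parse the raw HTTP body. JSON.parse structures the text,
// shrinking "any string" down to "any JSON value".
function parseBody(raw: string): unknown {
  return JSON.parse(raw);
}

// Step 2: normalize the loose JSON into the Document shape we want,
// discarding the fields we don't care about.
function toDocument(json: any): Document {
  return {
    id: Number(json.id),
    status: json.status as Status,
    title: String(json.title ?? ""),
  };
}

// Step 3: an accessor boils the whole Document down to one of four states.
function getStatus(doc: Document): Status {
  return doc.status;
}

const doc = toDocument(parseBody('{"id": 1, "status": "review", "title": "Q3 plan"}'));
```

Each step accepts something messier than it emits: arbitrary text becomes JSON, JSON becomes a fixed shape, and the shape becomes one of four values.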
Combine this with the idea I discussed in my last issue: extending the spectrum from partial function, through total function, to a function that is very forgiving of its input but manages to reduce its output to a small set of states.
In other words, such functions accept more complexity than needed and output as little complexity as needed. getStatus() takes a whole Document, which includes all the information relevant to status plus all the non-relevant information, and it spits out one of just four valid values for status. That’s a big reduction in complexity. A word for this might be anti-entropic.
When you’re writing a function, ask yourself: Does this function reduce complexity or add complexity? Does it make the problem easier to solve? Does it introduce other problems that will need to be solved?
For example, in How variants can reduce complexity, I talk about how returning different types from a function and attaching meaning to those types actually makes the problem harder. It’s too common in Clojure that we return a string or an integer from a function, with the idea that if it’s a string, it’s the URL of an external resource, but if it’s an integer, it’s the id of an entity from our database.
Returning different types forces any code that receives that value to interpret it. That means the code:

- Is coupled to the meanings assigned to the types elsewhere. Action at a distance.
- Needs an if statement to do the interpretation. Added code complexity.
Instead, aim to return the smallest structure that solves the problem. There are many options that are super dependent on context. But here are several distinct approaches. Here’s our original function:
function getDocument(e: Person): string | number;
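To make the coupling concrete, here’s a sketch of what a caller has to do with that return value. The Person shape and the lookup rule are invented; the string-means-URL, integer-means-id convention is the one from the text.

```typescript
// Invented names throughout: the meaning lives in the runtime type,
// which is exactly the problem.
type Person = { id: number };

function getDocument(e: Person): string | number {
  // Pretend person 1 has an external document, everyone else an internal one.
  return e.id === 1 ? "https://example.com/doc.pdf" : 42;
}

// Every caller is coupled to the convention and needs an if.
function describeRef(ref: string | number): string {
  if (typeof ref === "string") {
    return "external URL: " + ref; // string means URL...
  }
  return "database id: " + ref;    // ...integer means entity id
}
```

Nothing in the types says what a string or a number *means* here; that knowledge lives somewhere else entirely.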
One improvement would be to use more precise types:
function getDocument(e: Person): URL | DocumentID;
That already looks much better. But we can improve it:
type DocumentLocator = {
  locatorType: "url";
  url: URL;
} | {
  locatorType: "document-id";
  documentId: DocumentID;
};
function getDocument(e: Person): DocumentLocator;
In this code, we’re defining a new type which captures the meaning we intend. You still have the if, but you don’t have the same kind of coupling.
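Here’s a self-contained sketch of the caller’s side, with simple stand-ins for URL and DocumentID (and the id field named documentId, my assumption). TypeScript narrows the union on the locatorType tag, so the meaning travels with the value:

```typescript
// Stand-in types for illustration; a real codebase would define these properly.
type URLString = string;
type DocumentID = number;

type DocumentLocator =
  | { locatorType: "url"; url: URLString }
  | { locatorType: "document-id"; documentId: DocumentID };

function describeLocator(loc: DocumentLocator): string {
  // The if is still here, but it dispatches on an explicit tag,
  // not on a convention defined somewhere far away.
  if (loc.locatorType === "url") {
    return "fetch from " + loc.url;
  } else {
    return "load entity " + loc.documentId;
  }
}
```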
But, more and more, I’m seeing the value of separating these two functions:
function getURLDocument(e: Person): URL | null;
function getDBDocument(e: Person): DocumentID | null;
You need an if statement to choose between them anyway. You might as well just ask for the one you want. I’ve added the null because, in my imaginary domain, a person may have neither, one, or both.
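A sketch of how a caller might use the split functions. The Person shape is invented, and I’ve used a plain string where the text has URL:

```typescript
// Invented Person shape: a person may have neither, one, or both documents.
type Person = { docUrl?: string; docId?: number };

function getURLDocument(e: Person): string | null {
  return e.docUrl ?? null;
}

function getDBDocument(e: Person): number | null {
  return e.docId ?? null;
}

// A caller that prefers the external URL and falls back to the database.
function locate(e: Person): string {
  const url = getURLDocument(e);
  if (url !== null) return "fetch " + url;
  const id = getDBDocument(e);
  return id !== null ? "load " + id : "no document";
}
```

The if is still present, but it now expresses the caller’s preference rather than decoding someone else’s convention.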
A final approach is to forget the data altogether. Instead of returning a way to get a resource (by URL or by DB id), we return a function you call to get it:
function getDocumentFetcher(e: Person): () => (Document | null);
This one requires no ifs where it is used.
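Here’s one way the fetcher version might be sketched. The Person shape, the lookup functions, and the in-memory database are all invented; the point is that the if moves inside getDocumentFetcher, so call sites just invoke the returned function.

```typescript
// All names are illustrative assumptions.
type Document = { title: string };
type Person = { docUrl?: string; docId?: number };

// Pretend lookups; a real version would hit the network or the database.
const db: Record<number, Document> = { 1: { title: "From DB" } };
function fetchByUrl(url: string): Document {
  return { title: "From " + url };
}

function getDocumentFetcher(e: Person): () => Document | null {
  // Decide *once*, here, which source applies; callers never see the if.
  if (e.docUrl !== undefined) {
    const url = e.docUrl;
    return () => fetchByUrl(url);
  }
  if (e.docId !== undefined) {
    const id = e.docId;
    return () => db[id] ?? null;
  }
  return () => null;
}
```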
On the other hand, if you’re looking at an existing function and it’s a mess, the way to clean it up is to look for opportunities to introduce steps that reduce complexity. For instance, I once helped someone wrangle some really gnarly code. It was a big nested if statement, where each test was itself a complex boolean expression. We could have sat there and tried to detangle it, looking for some better way to express it.
Instead, I asked the person: What does this code do? After some back-and-forth, it was clear that the if statements were managing the states and transitions of a state machine. They were ushering a complex entity through a process. So we introduced a function that determined what state the entity was in and named it. While the if statements were still there, it detangled choosing what state you were in from choosing what to do, and that helped. The code had fewer nested ifs and the logic was clearer.
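Here’s a hedged sketch of that shape of refactoring, with an invented order-processing entity standing in for the real code. The point is separating “which state am I in?” from “what do I do?”:

```typescript
// Invented example: the tangled booleans become one function that names
// the state, and a separate, flat dispatch that decides what to do.
type Order = { paid: boolean; shipped: boolean; cancelled: boolean };
type OrderState = "cancelled" | "awaiting-payment" | "awaiting-shipment" | "done";

// Step 1: one place that answers "what state are we in?"
function orderState(o: Order): OrderState {
  if (o.cancelled) return "cancelled";
  if (!o.paid) return "awaiting-payment";
  if (!o.shipped) return "awaiting-shipment";
  return "done";
}

// Step 2: choosing what to do is now a dispatch on a named state.
function nextAction(o: Order): string {
  switch (orderState(o)) {
    case "cancelled":          return "refund";
    case "awaiting-payment":   return "send invoice";
    case "awaiting-shipment":  return "ship";
    case "done":               return "archive";
  }
}
```

The ifs haven’t disappeared, but the boolean tangle now lives in one named function, and the transitions read off a flat switch.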
We could have attacked the if statements themselves head-on. I do that, too, sometimes. I do simple refactorings like eliminating double negatives, rearranging the tests to make them less nested, extracting a function, etc. That would be addressing the style. But in this case, we didn’t. The purpose was to understand what the code was doing: to understand the substance. I believe that the substance (the domain model) of our code is a much richer source for finding and eliminating code complexity.
So when I’m talking about reducing complexity at every step, I’m actually referring to model complexity. Reducing model complexity will reduce your code complexity, and code complexity is the biggest thing we complain about as programmers. We must look beyond the surface to bring clarity to the model and make the code more expressive of it. With this “principle” of reducing complexity at every step, we can ask ourselves not whether the code in the body of a function is simple, but whether the function returns something less complex than what it receives as arguments, and how it could perhaps reduce complexity even more. I want to believe that this could be a principle, but I’m jaded, so I don’t trust it. What do you think? Do you see any holes in this idea? Is it yet another principle that only applies sometimes?