Our last Apropos was with Sean Corfield. Check it out. Our next episode is with Bobbi on March 25. Please watch us live so you can ask questions.
Have you seen Grokking Simplicity, my book for beginners to functional programming? Please check it out or recommend it to a friend. You can also get it from Manning. Use coupon code TSSIMPLICITY for 50% off.
The presentation was chaotic. My slides weren’t working. My daughter was sitting on my lap. And, what’s worse, the audience did not believe what I had to say. My main point was that you didn’t have to live in a nest of if statements. You could find a model in any domain. But they weren’t buying it.
That presentation was my first step toward the idea of domain modeling as the heart of software design. That was years ago. And it seems obvious now (so obvious that it seems weird to need to say it), but here’s where I eventually arrived: A domain model is judged by how well it corresponds to the domain. The model and the domain must have similar (ideally: identical) structure. The closeness of correspondence between model and domain is where the power of abstraction comes from. How do we get computers to do useful work? Correspondence.
But correspondence doesn’t sound great when you’re looking at a messy domain. In the intimate audience of that presentation, someone mentioned their domain had lots of government regulation. He believed there was no way to find a clean model. The only choice was building up spaghetti code to match the spaghetti of laws that made up their domain. And here I was trying to sell an answer to complexity by finding a better model.
I empathized with him. I’ve faced similar domains. But I’ve also been surprised. I came away from that presentation with a new awareness of the difficulty of the challenge I had taken on. You see, many times in my career, when we were struggling to find a simple model, we eventually found one (though sometimes too late). Well, not just a simple model, but a much better model than what we currently had. These new models were so good, in fact, that when we did find them in time to build them into our code, they drastically reduced the complexity of our codebase. At the same time, they added dramatically to the business value.
I’ll tell a story that I’ve told before. But it’s very relevant. I worked at a company that helped Americans register to vote. In the US, each state determines its own voting registration requirements. A co-founder of the company had studied all of the laws of all the states, and she presented us with a big mess of overlapping categories, showing how crazy the logic was. It was clear she’d tried to organize the chaos but came up short.
We all struggled with the rules. Some states let you register the day of. In some, you didn’t need to register at all. In some, you had to register two weeks before. Some required you to live in the state for two years, unless the state you moved from didn’t allow you to vote there. Some just wanted you to vote where you lived. Ugh. It really was a mess. There were no clear seams that we could pull apart, which, in retrospect, makes sense when each state is acting mostly independently.
Eventually, I gave up. Let’s just do a big if statement! Why not? The first level of nesting will be by state. If you’re in Nebraska, then follow this logic. If you’re in New Jersey, follow this logic. We divide it up, and the task is 50x easier. We just have 50 new if statements to write, one for each state.
It’s sometimes useful to look at that worst case scenario. The worst case is that you have 50 small messes instead of one giant mess. Ask yourself: How would you do it in a straightforward way? It doesn’t mean you have to commit to going that way, but it helps jog the mind. And jog my mind it did.
Once it was a set of conditionals, I saw something we had overlooked. The question we were trying to answer was really simple: Can person x register in state y? That can be written in a nice function signature:
function canRegister(x: Person, y: State): boolean;
And each of those 50 branches of the conditional was then a question of this signature:
function canRegisterInNebraska(x: Person): boolean;
One of those functions for each state. That’s just a simple predicate on Person. And when you look at the rules of Nebraska (or any state), they’re of a similar form:
function isMajorityAge(x: Person): boolean;
function isResident(x: Person): boolean;
function isCitizen(x: Person): boolean;
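Here is a minimal sketch of what bodies for predicates like these might look like. The fields on `Person` (`age`, `citizen`, `stateOfResidence`), the age threshold of 18, and the `isResidentOf` helper are all assumptions made for illustration. They are not the company's actual model or any state's actual law.

```typescript
// Hypothetical shape: the article does not say what fields Person has.
interface Person {
  age: number;
  citizen: boolean;
  stateOfResidence: string; // e.g. "NE"
}

// Every rule answers one small yes/no question about a person.
type Rule = (x: Person) => boolean;

// Assumed threshold of 18, for illustration only.
const isMajorityAge: Rule = (x) => x.age >= 18;

const isCitizen: Rule = (x) => x.citizen;

// Residency depends on which state is asking, so this sketch builds
// the predicate for a given state instead of hard-coding one.
const isResidentOf = (state: string): Rule =>
  (x) => x.stateOfResidence === state;
```

Each predicate is tiny and can be tested on its own, which is what makes them easy to implement.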
We can easily combine those smaller, easy to implement questions into one bigger question with some boolean operations:
function and(f1: (x: Person) => boolean,
             f2: (x: Person) => boolean): (x: Person) => boolean;
function or(f1: (x: Person) => boolean,
            f2: (x: Person) => boolean): (x: Person) => boolean;
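Those two signatures could be filled in roughly like this. It is a sketch, not the original code (which was Clojure). The `Predicate` alias, the two `Person` fields, and the sample predicates `isAdult` and `holdsCitizenship` are assumptions added so the example stands alone.

```typescript
// Hypothetical minimal Person, just enough to demonstrate the combinators.
interface Person {
  age: number;
  citizen: boolean;
}

type Predicate = (x: Person) => boolean;

// Returns a new predicate that passes only when both inputs pass.
function and(f1: Predicate, f2: Predicate): Predicate {
  return (x) => f1(x) && f2(x);
}

// Returns a new predicate that passes when at least one input passes.
function or(f1: Predicate, f2: Predicate): Predicate {
  return (x) => f1(x) || f2(x);
}

// Illustrative predicates (assumed, not from the article).
const isAdult: Predicate = (x) => x.age >= 18;
const holdsCitizenship: Predicate = (x) => x.citizen;

// The result of combining is itself a Predicate, so it can be combined again.
const adultCitizen: Predicate = and(isAdult, holdsCitizenship);
```

Because `and` and `or` return the same type they accept, rules of any depth can be built from the same two operations.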
It turns out that isMajorityAge() is very reusable. Most states have that rule. So we were getting plenty of reuse. And we were able to build up rules that had a very similar structure to how the laws worked in each state. Each state was custom, but it was built out of modular, reusable parts.
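To make that concrete, here is a sketch of how per-state rules might be assembled from shared parts and then looked up by `canRegister`. The rules listed for `NE` and `NJ` are made up for illustration and are not the real laws. The `rulesByState` table, the `Person` fields, and the choice to return `false` for an unknown state are likewise assumptions.

```typescript
// Hypothetical shapes; the article only gives the names Person and State.
interface Person {
  age: number;
  citizen: boolean;
  stateOfResidence: string;
}
type State = string;
type Rule = (x: Person) => boolean;

const and = (f1: Rule, f2: Rule): Rule => (x) => f1(x) && f2(x);

// Reusable building blocks (thresholds and fields are assumed).
const isMajorityAge: Rule = (x) => x.age >= 18;
const isCitizen: Rule = (x) => x.citizen;
const isResidentOf = (s: State): Rule => (x) => x.stateOfResidence === s;

// One composed rule per state. These entries are invented examples,
// NOT the actual registration requirements of either state.
const rulesByState: Record<State, Rule> = {
  NE: and(and(isMajorityAge, isCitizen), isResidentOf("NE")),
  NJ: and(and(isMajorityAge, isCitizen), isResidentOf("NJ")),
};

// Can person x register in state y?
function canRegister(x: Person, y: State): boolean {
  const rule = rulesByState[y];
  // Design choice in this sketch: a state with no rule entry yields false.
  return rule !== undefined && rule(x);
}
```

The big conditional on state becomes a lookup, and each state's entry reads like a summary of that state's requirements.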
In essence, we looked beyond the mess to find structure underneath. The structure was at a level of higher generality than we were looking at. We wanted to find a hierarchy to fit the states into. Instead, we needed to accept that each state was complicated and build a platform of atomized decisions that could be composed to mirror the structure of the state’s laws. We built a platform on which to model the states’ laws. I sometimes call such a platform a core abstraction.
I think of that story every time someone claims their domain has no structure. There’s always a structure. Humans build things with structure, even if it’s very messy and complicated. It’s never random. You can find the structure.
However, I don’t think it’s merely a mindset shift. I don’t really believe mindset is enough. Over the years, I’ve realized that it takes a very trained eye to spot that structure. How did we see that there was Boolean logic at the base? Lots of practice? How did I come up with that function signature? Years of working in Haskell. A part of my brain is always asking “what is the type?” And the composing functions? Luckily, this was a Clojure company, so we were used to thinking about higher-order functions. Would the average Java programmer have thought of that? I would guess not because they’re not used to thinking that way.
We should judge a model by how well it corresponds to the domain. The better it corresponds, the simpler our code that uses that model. But many domains are messy. Their structure is entangled. Does that mean that our model should be messy as well? I say no. It needn’t be messy, but you have to look for structure at a different level of generality than you’re currently looking.
But this is a bittersweet synthesis. On the one hand, it’s hopeful. There is a core model that structures the mess. I’ve always found one, eventually. On the other hand, it’s not easy. I think of a passage from The Early History of Smalltalk (emphasis mine):
It started to hit home in the Spring of '74 after I taught Smalltalk to 20 PARC nonprogrammer adults. They were able to get through the initial material faster than the children, but just as it looked like an overwhelming success was at hand, they started to crash on problems that didn't look to me to be much harder than the ones they had just been doing well on. One of them was a project thought up by one of the adults, which was to make a little database system that could act like a card file or rolodex. They couldn't even come close to programming it. I was very surprised because I "knew" that such a project was well below the mythical "two pages" for end-users we were working within. That night I wrote it out, and the next day I showed all of them how to do it. Still, none of them were able to do it by themselves. Later, I sat in the room pondering the board from my talk. Finally, I counted the number of nonobvious ideas in this little program. They came to 17. And some of them were like the concept of the arch in building design: very hard to discover, if you don't already know them.
The connection to literacy was painfully clear. It isn't enough to just learn to read and write. There is also a literature that renders ideas. Language is used to read and write about them, but at some point the organization of ideas starts to dominate mere language abilities. And it helps greatly to have some powerful ideas under one's belt to better acquire more powerful ideas.
The challenge I’ve taken on in my book is to give people a handful of powerful ideas that help them model. I think the biggest idea is to judge code complexity not by how hard it is to read, but by how well it expresses the model. Then there are others, like using total functions. And total functions are what we’ll talk about next week.
Do you have examples of finding core abstractions in an otherwise intractable domain? Is there a domain you gave up on and resorted to spaghetti code?
Your article somehow evokes hope ;) There actually is a domain I partially gave
up on.
Imagine a PIM holding multiple things, referring to them as "articles" and
"variants". Every size (like, t-shirt, bicycle…) is a variant linked to the
article, enums can be managed and linked cleanly from other tables and so on. An
article can exist in multiple years, so there's a link to a model year enum,
too. Whenever attributes of a model or variant change, a new model entry with
all variants gets created and tagged with years.
Now there's business needs. Your translators, shop managers etc. should not know
secret plans, thus planned articles for the future are managed in another tool.
That tool allows managers to give state to every variant of an article, where
every state other than "keep as it is for next year" will require some kind of
"splitting" the model in the PIM. Yes, that includes adding a new size to a
well-known vintage article.
Then articles are planned some years in advance. So while 2028's business is
planned, someone decides articles need to be changed from 2026 on. Someone else
later reverts that decision. Maybe from 2027. There's a wild series of
transformations, splits, and merges of articles and variants, both in the tool's
independent database and in the PIM.
While most of the syncing issues could be caught by a small rule engine, there's
so many mutually exclusive business cases causing subtle errors with the state
in the PIM - even some attributes which need to be synced but only in one
specific year (writing that down gives me an idea for another rule…). So, parts
of the system required me to open that can of paste and liberally pour it over
the codebase.
Nice article. I resonated particularly with the part about acquiring a larger set of ideas as it's something I've definitely noticed in my own development.
The Smalltalk story is something I've experienced several times before; it reminded me of something I read once, "The only difference between you and Shakespeare is his use of idioms not vocabulary".
In other words, Shakespeare's greatness comes from his broader set of ideas and the way he could compose them to produce his art, not necessarily from knowing more words.
Besides that, I want to also touch on this part - "There’s always a structure".
This is something I've believed too but recently I've been questioning that assumption.
I think it's still true for many domains and even complicated ones but after being exposed to the Complexity Sciences through Residuality Theory I'm a bit more skeptical.
As software "eats" more of the world, can we ever find structure in those ever complex and uncertain contexts?
Even if we find the structure that exists today will it be the same tomorrow?
It is true that at the end of the day we cannot escape structure since software systems must have one but can they truly represent the supposed essential structure that exists underneath?
What do you think about looking at software and the domains through that lens?