Bad data models lead to code complexity

Mar 11

Extra ifs are the cost of poor data models

6 Comments

This is great stuff. At work, over time, we started with a Boolean flag for one aspect of a member profile, and later had another slightly related Boolean flag for a different aspect of a member profile, then we realized we needed to synthesize a new status from the two Booleans -- a three-state status. And of course now we have four possible states where only three are valid.

We did exactly what you advocate here: we introduced functions to "get" and "set" the status and replaced references to the old fields with calls to this status function. Later we modified the "getter" to heal the profiles that were in that fourth, invalid state -- because of course that happened to us, despite it "not being possible". We still have the two underlying Boolean columns in the database -- we didn't feel it was worth the time and effort to add a new status column and migrate the old flags into that...
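A minimal sketch of that getter/setter approach, assuming two hypothetical flags (`submitted`, `approved`) and three hypothetical status names. The real columns and the healing rule are not given in the comment, so everything named here is illustrative only:

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical status names; the comment does not name the real ones.
    NEW = "new"
    PENDING = "pending"
    APPROVED = "approved"


# The three valid flag combinations, as (submitted, approved).
_FLAGS_TO_STATUS = {
    (False, False): Status.NEW,
    (True, False): Status.PENDING,
    (True, True): Status.APPROVED,
}
_STATUS_TO_FLAGS = {status: flags for flags, status in _FLAGS_TO_STATUS.items()}


@dataclass
class Profile:
    # Stand-ins for the two legacy Boolean columns, which stay in place.
    submitted: bool = False
    approved: bool = False


def set_status(profile: Profile, status: Status) -> None:
    """Write the status by setting both flags together."""
    profile.submitted, profile.approved = _STATUS_TO_FLAGS[status]


def get_status(profile: Profile) -> Status:
    """Read the status, healing the one invalid flag combination."""
    flags = (profile.submitted, profile.approved)
    if flags not in _FLAGS_TO_STATUS:
        # (False, True) should be impossible. Falling back to PENDING is an
        # assumed repair rule, chosen only for this sketch.
        set_status(profile, Status.PENDING)
        flags = (profile.submitted, profile.approved)
    return _FLAGS_TO_STATUS[flags]
```

Because every caller goes through `get_status` and `set_status`, the fourth combination is handled in one place instead of in scattered `if` checks.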

Our database schema dates back to 2012 at this point, and it's been heavily modified over time as business requirements have changed, so it has a trail of now-incorrectly named and/or unused columns -- which we paper over in the code, to maintain the accuracy of the domain model's evolution.

For example: in our legacy (pre-2012) system, members could "wink" at each other. Then business renamed that action to "flirt" (with a slightly different set of rules), then it got changed to "like" with a reciprocal "connection" state, and then the rules changed to withhold delivery of likes until the initiating profile was approved, etc. We did a database migration from flirt to the initial version of like, and then built functions to support the domain for everything else.


Michał Moroz

Mar 12

That was part of a talk I presented at a couple of conferences. I showed the connection between removing "not-reflecting-reality" states from the model of reality and simplicity in reactive programming. In the end, if we can remove such states from the program, that program will have fewer errors and less complexity. :)

In reactive programming it's even more important to do that early (e.g. when handling the response from an API).
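One way to picture "doing it early" is to convert the raw response into a small set of valid states at the boundary, so nothing downstream ever sees a half-valid payload. This is only a sketch: the payload keys (`items`, `error`) and the state names are assumptions, not taken from the talk.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Loaded:
    items: list


@dataclass(frozen=True)
class Failed:
    message: str


# The only two shapes the rest of the program is allowed to see.
ViewState = Union[Loaded, Failed]


def parse_response(payload: dict) -> ViewState:
    """Collapse a raw API payload into exactly one valid state."""
    error = payload.get("error")
    if error is not None:
        # An error wins even if the payload also carries items.
        return Failed(message=str(error))
    items = payload.get("items")
    if isinstance(items, list):
        return Loaded(items=items)
    # Neither a usable list nor an error: treat as a failure up front.
    return Failed(message="malformed response")
```

Code that consumes `ViewState` then needs one branch per state and no defensive checks for combinations such as "has both items and an error".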


Jurjan-Paul Medema

Mar 15 (edited)

Thank you for sharing your thoughts on this!

It is indeed thought-provoking for me as I still hold a preference for recognising and reasoning over separate ‘boolean’-like event dimensions in a ‘flow’ through ‘states’ over an enumeration of fixed states, mostly because I expect new (event) dimensions to be needed at any time that do not fit in the original one-dimensional enumeration. But (?): YAGNI!?…

I have experienced major system failures because added states were not recognised and dealt with properly in every nook and cranny of the code base, because the semantics of the original states - now leaked through everywhere - have subtly changed in a way that is so easy to miss. Even in a recent project where these flow states were supposed to be completely clear and fixed forever, we soon discovered a previously unstated dimension: an ‘active’ flag that users can set to false. Would you model that with one new state or just double the original states?

In your (probably simpler than what you encountered in reality) example of wanting to know whether a document is in the approval state, this (boolean) question seems best served by having a (boolean!) `ready-for-approval?` function over the document anyway (which I’m sure you thought of). Calling that everywhere where that question is relevant requires discipline of course (or enforced encapsulation, of which I’m not necessarily in favour in this case). Ironically, for callers it is now of course no longer important how exactly the state is modelled internally.
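A rough sketch of such a predicate, written over event-style fields. The field names (`submitted_at`, `approved_at`, `rejected_at`) are hypothetical, and `ready_for_approval` is a Python spelling of the `ready-for-approval?` function mentioned above:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Document:
    # Hypothetical event timestamps; None means the event has not happened.
    submitted_at: Optional[datetime] = None
    approved_at: Optional[datetime] = None
    rejected_at: Optional[datetime] = None


def is_submitted(doc: Document) -> bool:
    return doc.submitted_at is not None


def is_decided(doc: Document) -> bool:
    return doc.approved_at is not None or doc.rejected_at is not None


def ready_for_approval(doc: Document) -> bool:
    """The one question callers ask; composed from smaller predicates."""
    return is_submitted(doc) and not is_decided(doc)
```

Callers only depend on `ready_for_approval(doc)`, so they do not need to know which fields exist or how they combine.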

As all reasoning deals with boolean decisions, I still find that it makes most sense for those decisions to be composed of ever smaller ‘units’ of truth (through separate functions). I would even go so far as to call out enumerated states as ‘complecting separate dimensions of reality’. 🙂 (I really don’t get the fear of allowing forbidden/impossible combinations; they either don’t occur, as was expected, or they provide interesting input to challenge one’s own assumptions. It might not be a problem at all or, in the worst case, there is indeed a bug somewhere that’s worth discovering, investigating and fixing.)

I have recently become more convinced that the way this flow state (separate event booleans/dates vs. an enumeration value) is modelled internally is less critical than how the code reasons about it (via small functions as above), but for ultimate readability there is of course a lot to be said for keeping things aligned.

But also in this recent project the jury is still out (in my mind). Some of our team would have preferred reasoning and talking about the fixed enumeration of possible states the way the customer talks about them. That ‘shared language’ is the strongest argument that I can think of to make that the base of our model after all. Realising (and having experienced) that customers’ insights progress over time remains my argument for having state change events at the core. But by having built small reasoning functions on top that the rest of the code uses, we should not need to touch a lot of code if we eventually decide to change the core model after all.
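To illustrate the last point, here is a small sketch in which the same caller works against either core model, because it only depends on a reasoning function. The two document shapes, the state names, and `approval_queue` are all assumptions made up for this example:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List


class State(Enum):
    DRAFT = "draft"
    AWAITING_APPROVAL = "awaiting_approval"
    APPROVED = "approved"


@dataclass
class EnumDocument:
    # Core model A: one enumerated state.
    title: str
    state: State = State.DRAFT


@dataclass
class EventDocument:
    # Core model B: separate event flags.
    title: str
    submitted: bool = False
    approved: bool = False


def ready_for_approval_enum(doc: EnumDocument) -> bool:
    return doc.state is State.AWAITING_APPROVAL


def ready_for_approval_events(doc: EventDocument) -> bool:
    return doc.submitted and not doc.approved


def approval_queue(docs: Iterable, ready: Callable[..., bool]) -> List[str]:
    """Caller code: it asks the question and never inspects the model."""
    return [doc.title for doc in docs if ready(doc)]
```

Switching the core model means swapping the predicate that is passed in (or reimplementing a single function of that name); `approval_queue` itself stays untouched.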


Tim King

Mar 13

Unfortunately, we see this kind of thing a lot in various systems: logic is duplicated and spread throughout the system, leading to low cohesion and tight coupling.

A general rule of thumb I try to keep in mind is that anytime there are branches or loops that depend on specific data, keep that logic close to the data. (And that's also the code that gets most heavily unit tested.) So documentStatus should have always been a method in the "document" module or object. And failing that, it should have been refactored into a method early in development. And failing that, the code was bound to become messy and confusing, especially as further modifications were demanded of that logic.

Another way to look at it is that every time you have to change the way document status is calculated, for example, to add a new document state, code all over the system has to be modified to implement that change. The very first time something like this happened, that could have been the signal to refactor that code, because it was indicating that this is a change that the system needs to be designed to handle reliably and cheaply. Those pieces of logic strewn throughout the system are highly cohesive with one another, and they are causing all those different parts of the system to be coupled together. Code that changes together should be kept together in the same module, object, function, whatever.
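A sketch of what "keep that logic close to the data" might look like, assuming a hypothetical `Document` with a `submitted` flag and an optional `approver`. The article's actual fields may differ; the point is only that the branching lives in one place:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DocumentStatus(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in_review"
    APPROVED = "approved"


@dataclass
class Document:
    # Hypothetical raw fields from which the status is derived.
    submitted: bool = False
    approver: Optional[str] = None

    @property
    def status(self) -> DocumentStatus:
        """The single place where the status rules are written down."""
        if self.approver is not None:
            return DocumentStatus.APPROVED
        if self.submitted:
            return DocumentStatus.IN_REVIEW
        return DocumentStatus.DRAFT


# Callers elsewhere in the system ask for the status, never the raw fields.
def can_edit(doc: Document) -> bool:
    return doc.status is DocumentStatus.DRAFT


def needs_reviewer(doc: Document) -> bool:
    return doc.status is DocumentStatus.IN_REVIEW
```

Adding a new state then means changing `DocumentStatus` and the `status` property, plus whichever callers genuinely care about the new state, rather than hunting down every copy of the flag checks. It also gives unit tests one obvious target.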
