Happy Thanksgiving! It’s this Thursday here in the US. I hope it finds you safe and peaceful.
Please check out the first chapters of Runnable Specifications. Someone just last week told me they didn’t realize you could already start reading it! He did and said it was just the book he was looking for. Oh, man. I rushed back home and started writing more. I need more feedback like that.
And if you want to read a completed book, Grokking Simplicity is just sitting there, waiting to be opened by you. If you already read it, tell the world you liked it by writing an Amazon review. It’s the best way to help me!
The end of algorithms
Algorithms don’t scale. Here’s a story to illustrate:
On Friday, I decided to spend some time accessing the ChatGPT API. I had a few tasks that I thought it would be good at, so I dove into it.
I had about 1k records that I wanted ChatGPT to process. I tried the stupidest thing that could possibly work, which was to submit them all at once, but that didn’t work well. So I developed an algorithm to process them one at a time, with some accumulating state that gets added to the prompt each time. My tests on the first 10 records showed signs of success, so I unleashed it on the whole 1k data set.
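The shape of that loop was roughly this (a sketch, not my actual code; `call-api` is a hypothetical stand-in for the ChatGPT client call, stubbed out here so the sketch runs):

```clojure
;; A sketch of the naive accumulating loop. `call-api` is a hypothetical
;; stand-in for the real ChatGPT client call, stubbed so this runs.
(defn call-api [accumulated-state record]
  (str "processed:" record))

(defn process-all [records]
  (reduce (fn [{:keys [state results]} record]
            (let [result (call-api state record)]
              {:state   (str state "\n" result) ; feed prior results into the next prompt
               :results (conj results result)}))
          {:state "" :results []}
          records))
```

It treats the API exactly like a pure function call: no retries, no rate awareness, and one failure loses everything accumulated so far.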
My algorithm stopped working. The logic of it was still sound, but now some other issues started popping up:
I was being rate limited. About 10% through, I was getting errors, which I was not handling.
There were transient failures occasionally.
When something did fail, I was throwing away all of the work.
Plus, I couldn’t track progress, because now it was measured in minutes instead of seconds.
My simple accumulating loop, which treated the API like a function call, could not scale to 1k rows. And this is not the first time this has happened to me. My usual conclusion is “distributed systems introduce lots of complexity.” But today, I want to explore a different conclusion, “Algorithms don’t scale.”
One person who often talks about the problems with data structures and algorithms is Alan Kay:
The algorithms are kind of gear like, and the idea that you can smash values and data structures and get away with it implies that you're thinking small.
Part of the epiphany of Smalltalk for him was a related idea:
For the first time I thought of the whole as the entire computer and wondered why anyone would want to divide it up into weaker things called data structures and procedures.
And on how poorly algorithms scale:
Systems were starting to become much more interesting than algorithms…but trying to scale up algorithms you get a hundred million line operating systems like Windows XP.
Alan Kay has been and continues to be one of my biggest inspirations. I still learn stuff even from rewatching lectures I’ve seen a dozen times. I don’t know if it’s because his ideas are so far from my thinking that it takes me that many times to get it, or that he’s not that great at explaining things. Either way, he’s brilliant and I’m still learning.
He explains that even back in the ’70s and ’80s, people saw that systems thinking was how we could get better software. We had the examples of Ethernet and the internet, both systems with no top-down algorithm. But languages that tried to lock down the difference between data and procedure—ALGOL and Pascal—won. Their descendants are still winning today.
When Kay talks about data structures and procedures, it sounds to me very close to how Rich Hickey answered in this exchange:
Q: What needs to be the focus of programming language design in terms of concepts and constructs?
Rich Hickey: Functions and data. <drops mic>
And of course, my whole book about the functional programming paradigm divides the world into data, calculations (pure functions), and actions (impure functions). I obviously believe there is a lot of merit to the function/data view.
But is this the same as what Kay is talking about? I think we need to let our minds wander back to the 1960s to really see what he means, just as when Kay says OOP, he doesn’t mean Java, even though Java is what most people think of when they think OOP. The meaning has changed over time.
Back in the day, programs were often written to follow a complex layout in memory. There was a lot of pointer chasing. And people realized that if they organized the memory in certain ways, they’d get certain desirable properties. Some operations, like finding something in memory, or modifying something, would get faster. Or the data would take less memory. Codifying this knowledge begat data structures as a field of knowledge.
People were also realizing that certain problems were very common, such as sorting and searching. If someone could just figure out the fastest way to sort, we could write down how to do it and share it. Likewise, if we wanted a way to traverse all the nodes in a tree, or to find the shortest path, we could figure out the minimal solution. Those procedures we called algorithms.
And so programming was largely about traversing data structures in memory using algorithms to process what was found. And it sucked. The logic for what you wanted to do at each node was intertwined with the logic for traversing the data structure. If you needed to change the data structure, the entire algorithm needed to be reworked. What’s more, the programs were very hard to read and modify, let alone get right. In short, they didn’t scale.
Alan Kay was looking for something that would scale, in the same way that the internet could scale. Nodes are added and removed from the internet all day long without a bottleneck and without anyone having to understand its topology. Could he find something like that for software?
What he and his team came up with was what they called “objects”. Each object sent messages to and received messages from other objects. The object had its own local process state, inaccessible except through messages, which the object interpreted according to its code.
You could, of course, imitate data structures using objects, but the idea was larger. As your system scaled up, the semantics of your objects was maintained not by a better and better algorithm, but by a careful separation of concerns: Each object maintained its own invariants. Because of that, the senders to those objects could omit those concerns from their own code. So, as each semantic idea became more resiliently implemented and coherent, the users of that idea could relax their own discipline and rely on them more.
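In Clojure terms, you can get a small taste of that discipline with an atom guarded by a validator. This is a toy sketch (the account example is mine, not Kay’s): the invariant lives with the state, so callers never have to re-check it.

```clojure
;; A toy "object": state plus an invariant it maintains itself.
;; The validator guarantees the balance can never go negative,
;; so senders can omit that check from their own code.
(def account (atom 100 :validator (fn [balance] (>= balance 0))))

(defn withdraw! [amount]
  (try
    (swap! account - amount)
    (catch IllegalStateException _
      @account))) ; rejected: the state is left unchanged
```

No matter how careless the callers of `withdraw!` are, the balance stays valid.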
Unfortunately, when C++ and Java took on the ideas of OOP, they didn’t really convey this idea well. Classes became a convenient means of method dispatch—the mechanical “gear” idea pervaded. And perhaps it was a little more disciplined to keep the code that accessed and modified data right next to the data definition itself. But this idea of coherent objects protecting their invariants from a chaotic outside world is present only in diffuse concentrations in the vast sea of object-oriented design advice.
Kay again:
The flaw in how things have played out is that very few in computing actually put in the effort to grok the implications of “universal scalable systems of processes”, and instead have clung to very old and poorly scalable ways to program.
I should mention that when I look at old Smalltalk code, the kind of stuff they were doing at Xerox PARC, it looks much more like what we would call functional programming today. It’s often recursive methods on tiny classes used like Algebraic Data Types.
And when I look at modern FP, it’s using ideas like reusable primitives that maintain invariants—almost like providing a service. The data structures we use in Clojure are not primitive like arrays. They guarantee a lot: immutability, structural sharing, and their participation in the core abstractions.
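A tiny illustration of those guarantees, using nothing beyond core Clojure:

```clojure
;; "Updating" a persistent vector produces a new value that shares
;; structure with the old one. The old value is never disturbed.
(def v1 [1 2 3])
(def v2 (conj v1 4))
```

Any code holding `v1` can rely on it staying `[1 2 3]`, which is exactly an invariant maintained by the data structure itself rather than by the discipline of its users.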
Here’s Alan Kay again:
It *is* like having independent subroutines that can call each other, but extended in the form of protected modules that provide “services”, and can do many helpful things internally and safely.
A well designed OOP system will feel as easy as doing a subroutine call for easy things, but can extend outwards to much more complex interactions.
For scaling etc. you want to have the invocation of “services” be a more flexible coupling than a subroutine call (for example, you should be able to do many other things while the service is happening, you shouldn’t have your control frozen waiting for the subroutine to respond, etc.).
To wit: Kay’s OOP is like subroutine calls, but enhanced to make them work better at scale.
In other words, Java is fine, but when you put getters and setters for every field by default, you’re essentially making a mutable data structure. The encapsulation that was protecting the invariants cannot work anymore. And it becomes too easy to write large algorithms that control things from the outside.
Which brings me back to the original ChatGPT API problem I was having. My algorithm needed to be enhanced with services:
I needed a decent error-handling policy.
I should retry transient errors.
I should retry with exponential backoff when I get rate limiting backpressure.
I should centralize the use of the API to control the rate, but in a way that doesn’t limit how I write the program.
I want to maintain a running state of previously completed rows.
To understand progress.
To resume from later in the case of failure.
That’s a lot more stuff that I have to handle. If I try to do it as an algorithm, I’m sure I would succeed but it would be very intricate work. In an algorithm, I have to figure out a list of steps that will always succeed.
Instead, I could start to think of it as a system—relax my grip on the controls—and see it as pieces interacting. Do I need to know what order things happen in? Are there properties of the system I can guarantee with local behavior instead of with global behavior? These are the kinds of questions that start opening us up to scale.
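For instance, the retry piece can become its own small service with a locally guaranteed behavior. Here’s a sketch; the exception handling is deliberately generic, since a real version would inspect the API’s actual error responses (e.g., HTTP 429 for rate limiting):

```clojure
;; Retry a thunk with exponential backoff: 100ms, 200ms, 400ms, ...
;; A real version would only retry errors known to be transient.
(defn retry-with-backoff
  [f & {:keys [max-tries base-ms] :or {max-tries 5 base-ms 100}}]
  (loop [attempt 1]
    (let [result (try {:ok (f)}
                      (catch Exception e {:err e}))]
      (cond
        (contains? result :ok) (:ok result)
        (< attempt max-tries)  (do (Thread/sleep (* base-ms (long (Math/pow 2 (dec attempt)))))
                                   (recur (inc attempt)))
        :else                  (throw (:err result))))))
```

At the call site it still reads like a plain function call, but the backoff policy is guaranteed locally, in one place.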
Some additions to the discussion
On the usefulness and ubiquity of data
Rich Hickey and Alan Kay had an interesting discussion about the importance of inert data. It’s worth a read. I gave my commentary on it, then commented on my own comments. The discussion is that rich.
Relevant to the topic here, I think data is a powerful idea with a lot of pragmatic benefits. It is durable (you can easily store it or transmit it to another system) and cross-platform. It has a long history (thousands of years), so its properties are well-understood. Unfortunately, we do not have a practical way of running real software on another machine across time, so sending code is not an option.
However, I love the research question that Kay poses: Can we get to a bigger scale if we send an interpreter along with the data? Alan Kay has done some research into a description of how to bootstrap early Smalltalk that could be etched in stone.
Primitives
I personally believe that one of the failures of the Java ecosystem is that it never invested in educating programmers to separate concerns effectively. Instead, the training budget was blown on Dog isA Animal-style class inheritance. What we see is a bunch of programmers who couldn’t identify a reusable abstraction if their lives depended on it.
In the functional world, we do look for those reusable abstractions. We appreciate the primitives that Clojure gives us, like atom and future, but we aren’t shy about making new ones. If there was something I could do to help Java programmers see that they do have the tools to build more abstract and scalable systems with less effort, it would be to teach them to recognize those seams for separating the problem.
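For example, a new primitive can be a few lines once you have atom as a building block. This one (my own toy, not from clojure.core) runs a side-effecting function at most once, no matter how many callers race to invoke it:

```clojure
;; Wrap f so it runs at most once. compare-and-set! makes the
;; check-and-flip atomic, so concurrent callers can't both win.
(defn once [f]
  (let [done? (atom false)]
    (fn [& args]
      (when (compare-and-set! done? false true)
        (apply f args)))))
```

The invariant—“this effect happens at most once”—is guaranteed locally by the primitive, so every caller gets to forget about it.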
Propagators
An interesting avenue I have explored and still think has a lot of potential is propagators. There are some algorithms that are just too hard to write and maintain. We need to intricately specify the order so that the right answer happens every time. Propagators give us a way to ignore the order.
Propagators do that by using associative and commutative functions that monotonically approach some answer. You form a network of these propagators, which share their current state with their neighbors. The neighbors combine those current answers to form their own current answer. Eventually, the network converges.
The nice thing about them is that they arrive at their answer without you ever having to know what order things happened in. You can reason locally about each connection in the network (make sure it’s correct), and the whole network just works to find the answer. You can add new nodes to the network and it should continue working.
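Here is a minimal sketch of the idea (my own toy, far simpler than a real propagator library): cells hold a current best answer, `max` is the associative, commutative, monotone merge, and connections push values whenever a cell improves.

```clojure
;; A cell holds a monotonically improving value plus its outgoing links.
(defn make-cell [init] (atom {:value init :watchers []}))

;; Merge new content into a cell with `max`; since max is associative,
;; commutative, and monotone, the order of updates doesn't matter.
(defn add-content! [cell v]
  (let [{:keys [value watchers]} @cell
        merged (max value v)]
    (when (not= merged value)
      (swap! cell assoc :value merged)
      (doseq [notify! watchers] (notify! merged)))))

;; Connect two cells through a function, pushing the current value across.
(defn connect! [from to f]
  (swap! from update :watchers conj (fn [v] (add-content! to (f v))))
  (add-content! to (f (:value @from))))
```

Each connection can be checked locally, and adding a new cell or connection doesn’t require re-deriving a global order of operations. (This single-threaded sketch glosses over the concurrency a real propagator system handles.)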
Learn more about propagators from this talk by Gerald Sussman. And for a Clojure library that parses using a similar idea, see this talk by Mark Engleberg.
Data and algorithms in the small
We started by saying that algorithms don’t scale. But you need some algorithms, right? I think you do, to some extent. Even the definition for calculating the length of a linked list is an algorithm, even if it’s tiny:
(defn length [coll]
  (if (empty? coll)
    0
    (+ 1 (length (rest coll)))))
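Even this tiny algorithm has a scaling wrinkle: the recursion isn’t in tail position, so a long enough list blows the stack. The same idea, restated with reduce, runs in constant stack space:

```clojure
;; Same algorithm, but reduce walks the collection iteratively.
(defn length [coll]
  (reduce (fn [n _] (inc n)) 0 coll))
```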
What that means to me is that the systems thinking that Kay prefers to algorithms is built out of small algorithms. We can build a retry-with-backoff algorithm, which becomes a piece of the robust, scalable system we are ultimately building.
And I think this is also the answer that resolves the troubled discussion between Hickey and Kay: In the small, you need data and algorithms. But it only scales so far. At some point, you need to build up to systems-thinking. I happen to believe that Clojure and many contemporary languages do provide a lever for doing that better than starting from low-level languages. Clojure’s built-ins let you scale rather far—using just data and algorithms. But, as in any language, the free lunch comes to an end. At some point, you need to start finding new abstractions.
Probably the ultimate language for writing complex systems is Erlang—though I’m excited by some of the JVM developments following Virtual Threads, including structured concurrency and scoped values, which should close some of the gap between Erlang and the JVM.
Do type systems help with scale?
Alan Kay’s words:
[Work was done on Algol and later C and Pascal for] a type system that could deal with parameter matching of polymorphic procedures to new data types.
A big problem was that “data” could move around and be acted on by any procedure, even if the procedure was not helpful or at odds with the larger goals. “Being careful” didn’t scale well.
A lot of people talk about type systems helping deal with software as it scales. And I know what they mean—it still might be hard to understand or change, but at least the machine is telling you it doesn’t fail in certain ways.
I think what Alan Kay means by this statement is that people noticed this scaling issue—lots more type errors as you scale—and addressed it in an incremental fashion by having a type checker by your side. But this incremental improvement did not really scale that much farther. The real scaling came with time-sharing operating systems. Kay continues:
Meanwhile, time-sharing and multiprocessing OSs were being developed, and “being careful” did not work at all. Instead, the decision was rightly made to protect entities from each other — and themselves — via hardware protection mechanisms. This allowed processes made by many different people to coexist while being run, and it also allowed some processes to be “servers” — to provide “services” — to others.
Processes were software manifestations of whole computers — containing both processing and state — both hidden and protected.
In other words, scaling came from writing independent processes that communicated, not by locking down the correctness of the code.
More:
Now the thing to realize is that this — whole processes offering protected services — is really a good idea at any scale. First it allows much larger and more elaborate services to be done safely.
But it also makes things that weren’t safe enough at the line by line programming scales to become much more safe.
Even faulty programs can be useful if you can run them in a bubble.
Conclusions
Algorithms don’t scale. That’s why code starts to suck as it gets bigger.
We need to take global invariants and turn them into locally guaranteed invariants. This is hard, but it’s the essence of software design.
We build complex systems out of many small algorithms.
Rock on!
Eric
I am having some problems understanding what you mean by "algorithm" here. There is no such thing as a program without an algorithm; even if you try to sort a list by picking elements at random, that is an algorithm, even if it is a bad one. Clearly you must use the word as a proxy for a certain subset of algorithms.
That certainly applies to the internet as well; it is full of algorithms. That TCP guarantees delivery is ruled by an algorithm, the routers that make the internet work have lots of algorithms to manage their routing tables, and there are also routing protocols that implement algorithms between the routers, which are crucial to keeping the information flowing (see for instance OSPF https://en.wikipedia.org/wiki/Open_Shortest_Path_First).