I’ve had these on my mind for a while and wanted to get them into text: partly for the benefit of my own thinking, and partly so that I’ve got something to direct people to the next time I have a conversation about it. firstname.lastname at gmail dot com if you’ve got feedback.
Caveats
The discourse around AI has reached a religious level of intensity, and I think it’s important to be up-front about where you stand on the whole thing. Too many times I’ve read pieces that only make clear in the closing paragraphs that the author has maniacal beliefs in a particular direction, and that’s important context for a piece like this.
I am a daily user of LLMs for a fairly small but growing set of tasks that includes: writing code (via a chat interface and sometimes via Claude Code), as a search engine replacement, and as a learning aid to verify my own understanding of whatever I’m working on. I find them useful and currently pay a small amount each month to use them (an amount that would increase if there were no free options).
At the same time, I am not convinced that agentic systems as built today are going to be effective for anything like the range of tasks they’re being applied to. I think that for a fixed amount of training data, LLMs have an upper bound on their usefulness for economically valuable tasks. At the 99th percentile of effectiveness for the amount of training data available on planet Earth, their economic impact will be marginal but measurable. Revenue-generating, but not revolutionary.
At a higher level I think something like:
- There’s a 99% chance that a generally intelligent system can be simulated in silico with some level of technology. This level may well involve a Dyson sphere.
- There’s a 20% chance that it can be done with ANNs as we know them.
- There’s a 1% chance that we’ll get there using something we’d recognise as an LLM.
LLMs as good news for VC-backed, non-SaaS founders
This follows on from a well-worn argument around LLMs having non-zero marginal costs that I’ll include here for completeness.
For a long time SaaS companies have been great for founders and investors, requiring very little capital and having near-zero marginal costs compared to basically any other kind of business. Doing literally anything else requires either investing a lot of money in plant or people, or selling a thing or service that has significant minimum unit costs. By 2020 most verticals had lots of players in them, but a decent product with a switched-on distribution team could still make great margins. Investor expectations have been set at a high level by these businesses.
There weren’t many things you could spend money on per unit that also made the product better for your users. Customer support is one (though for a functional product you’d expect marginal utility to decline pretty quickly - things should just work), and building custom features per customer is another (but you quickly stop being a SaaS company as you head down that road).
AI has changed this. While I don’t think current LLMs are going to make your SaaS product 10x better than your competition, I do think they’re going to make it 1-2x better. In markets with relatively low switching costs that’s huge, and most companies are going to have to play the game to keep up. The marginal cost of providing those features is no longer near-zero, and from what I’ve seen with agentic system implementations there’s a case to be made that utility scales linearly with cost, e.g. running the same bit of text through N models, or using N times more inference-time compute, gives you error rates that scale proportional to 1/N. If features like this are core to your product and your customers have a utility function that’s inversely proportional to error rate, then the more you spend the better your product is.
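A quick back-of-the-envelope sketch of that trade-off, using made-up numbers purely to illustrate the shape of the curve (the 1/N error assumption is the claim above, not an empirical result):

```python
# Illustrative only: assumes error rate ~ 1/N when you spend N times the baseline
# inference cost, as described above. Real systems won't follow this curve exactly.

BASE_COST = 0.01   # hypothetical cost per request at N = 1, in dollars
BASE_ERROR = 0.20  # hypothetical error rate at N = 1

for n in (1, 2, 5, 10):
    cost = BASE_COST * n
    error = BASE_ERROR / n
    print(f"N={n:>2}  cost/request=${cost:.2f}  error rate={error:.0%}")

# N= 1  cost/request=$0.01  error rate=20%
# N= 2  cost/request=$0.02  error rate=10%
# N= 5  cost/request=$0.05  error rate=4%
# N=10  cost/request=$0.10  error rate=2%
```

Under those assumptions, halving your error rate always costs a doubling of inference spend, so product quality becomes a line item you can dial up and down.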
Lots has been written about how this changes the economics for SaaS companies, but I wonder if this might end up being a big win for people building businesses that have been overlooked by VC money for the past decade. Whether you’re building in healthcare, in a space that needs a lot of expensive, well-trained people, or in a manufacturing business that needs a lot of electricity and million-pound machines, your cost structure might be starting to look more like a software business’s. Of course non-SaaS businesses can benefit from spending money on LLMs, but you’re not locked into the LLM-powered feature arms race in the same way as a SaaS company fighting 10 competitors in a mature vertical.
There is a competing effect here. If there is an impact on SaaS margins, you’d expect the average return on a VC-managed £ to fall, and as a result the total pool of VC-managed £s to fall as well. This might dwarf the effect of your non-SaaS business being relatively more attractive to a VC than a SaaS business (I’m not a pro, but I’d assume the less risky returns on SaaS companies subsidise the risk VCs are able to take on more “exciting” non-SaaS companies). If so, there’s a case to be made that in the long run LLMs make accessing capital relatively easier for non-SaaS companies, but they’re still going to feel the squeeze.
Trivial vs. nearly impossible
I’m reminded of this edition of XKCD from the pre-deep learning era, which has at its core the idea that to non-technical people, the distinction between trivial and almost impossible tasks can seem non-obvious. We’re going through something similar with respect to the capabilities of LLM systems.
For example, at this point most SOTA LLMs can probably interpret and fix a React re-rendering issue that might not be obvious at first glance to a front-end developer of average skill. Well-documented coding issues in libraries with vast amounts of documentation and online Q&As are pretty trivial for LLMs to solve. Now to a non-specialist, that problem looks extremely close to getting an LLM to fix a bug in an unmaintained driver for an obscure piece of hardware. In the domain of all possible things you could ask an LLM to do, those problems are in some obvious senses very close together. But on the critical dimension of training data availability, they’re about as far apart as you can get.
We’ve seen something similar with LLMs and mathematical ability. LLMs are pretty good at “solving” a wide range of high school (and undergraduate) mathematics problems, and to the untrained eye these are similar to problems from the International Mathematical Olympiad (IMO). The big difference is that the evidence suggests LLMs have never solved an unseen IMO problem. Once again these sets of problems are close in the “all possible things you could ask an LLM to do” domain, but at opposite ends of the training data availability dimension.
This is really just a roundabout way of saying “LLMs don’t generalise very well”, but I think it’s a good way of talking to less technical people about how task performance can easily be uncorrelated between tasks that are superficially similar. LLMs are unreasonably effective at interpolating their training data, but they can be unexpectedly bad when you take them off-road, especially if you’re not familiar with what’s going on behind the scenes.
Agent divergence
Agentic systems as usually implemented at the moment can be represented as a DAG of prompts and tool calls in which text flows in at the top, flows out at the bottom, and on the way a bunch of prompts are invoked and side effects happen via tool calls. These side effects tend to be the main object of interest and can be edits to files in your filesystem, calls to APIs, etc.
In a less excitable sense these are rules-based workflow engines with LLM-superpowers, but in an optimistic sense these “agents” are going to become adaptable autonomous systems that behave more like a human than something you might build in Zapier.
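To make that concrete, here’s a deliberately stripped-down sketch of the idea in Python. `call_llm` and `edit_files` are hypothetical stubs standing in for a model call and a tool call; this isn’t any particular framework’s API.

```python
# A minimal sketch of the "DAG of prompts and tool calls" idea, not a real framework.
# `call_llm` and `edit_files` are hypothetical stand-ins.
from typing import Callable

def call_llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    raise NotImplementedError

def edit_files(patch: str) -> str:
    """Stand-in for a tool call with side effects (writing to the filesystem)."""
    raise NotImplementedError

# Each node is just text in, text out; side effects happen inside tool-call nodes.
# A real agent graph branches and merges, but a straight path is the simplest DAG.
pipeline: list[Callable[[str], str]] = [
    lambda ticket: call_llm(f"Summarise this bug report:\n{ticket}"),  # prompt node
    lambda summary: call_llm(f"Write a patch for:\n{summary}"),        # prompt node
    edit_files,                                                        # tool-call node
]

def run(text: str) -> str:
    """Text flows in at the top and out at the bottom."""
    for step in pipeline:
        text = step(text)
    return text
```

The “agent” framing layers looping, branching on model output, and tool selection on top of this, but the text-in, text-out, side-effects-along-the-way shape is the same.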
The problem so far is that these agents tend to be divergent instead of convergent (and I use those terms in the most casual way) when presented with complex inputs. A small error at one node gets propagated without corrective feedback through the rest of the execution graph and leads to erroneous outputs. The problem gets worse with bigger DAGs because error rates are multiplicative. Claude Code incorporates a feedback mechanism that lets you offer natural-language error-correction advice to the model, but it can still get stuck and find itself in a bad state from which you just have to start again.
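To put a rough number on the multiplicative point: if each node independently does the right thing 95% of the time (a figure I’ve made up for illustration), the chance of an error-free run falls off quickly with the depth of the graph.

```python
# Back-of-the-envelope illustration: assumes every node succeeds independently with
# the same (made-up) probability, which real agents won't satisfy exactly.
PER_NODE_SUCCESS = 0.95

for depth in (3, 10, 20, 50):
    p_all_ok = PER_NODE_SUCCESS ** depth
    print(f"{depth:>2} nodes -> {p_all_ok:.0%} chance of an error-free run")

#  3 nodes -> 86% chance of an error-free run
# 10 nodes -> 60% chance of an error-free run
# 20 nodes -> 36% chance of an error-free run
# 50 nodes -> 8% chance of an error-free run
```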
Maybe we’ll stumble upon the one weird trick for incorporating error handling into these DAG-based agents. Maybe there’s a style of prompting. Maybe there’s a way to structure the DAG, and the data passed between its nodes, that makes error feedback easier. Until there’s a robust solution to keeping agents on target, I can’t see how we get from the agents we have now to the agents we’re being promised: ones that can operate independently over long execution sequences.