I grapple with how to transition from doing hands-on dev (what I’ve done so far), via using an agentic LLM as an assistant that responds to very specific instructions while I remain the one in charge (what I’m currently doing), to merely providing specs and tests.
There’s just so much detail in making software. And I see no way around that. This means we need very detailed specs and very detailed tests.
With human devs, the details surface as we implement things – some of which can be resolved on our own, but many requiring discussion and collaboration to figure out and decide how things should work.
Also with human devs, there’s a tradeoff in the level of detail. It’s usually faster (at least I believe so) to not try to get every detail right, but to begin implementation and uncover which details matter enough to decide collaboratively. It pays off to start early and let the design evolve – both the internal (code/architecture) and external (UI/UX).
In the future, when we’ll spec and test and let LLMs do much of the work, we’ll be able to iterate on functioning code. So maybe there’s some kind of tradeoff here as well – where we let specs be at a lower level of detail, and figure things out along the way.
But I can’t yet quite see for myself what that will look like.
As for tests, they have needed to be quite detailed – of course depending on context – but I have a sense that this will change as well. And I can’t yet imagine how.
Yes, ChatGPT, Claude, others, ’hallucinate’. Yes, they ’lie’. Yes, they’re just spitting out the statistically most relevant sequence of words – from the multi-dimensional region your own sequence of words happened to activate.
Still, they are more often than not very useful – far beyond merely ’generating’ text.
Also, they are, at the same time, familiar and very unfamiliar. We’re chatting with a completely new kind of entity, which appears human-like. But we can’t really tell when it responds with something incorrect and possibly misleading.
’ChatGPT can make errors. Check important info.’
An LLM might respond entirely correctly, because there was plenty of text about what you prompted it about – and your choice of words successfully activated the right multi-dimensional region – so the statistically relevant sequence of words you got back was something you could rely on.
(It might also have been the case that you asked the LLM about something for which a lot of human effort had been spent in post-training, to improve the quality of responses – likely because there were leaderboards ranking LLMs for inferences about those things.)
But the LLM might respond inaccurately not just due to the lack of training data (or post-training efforts). It could have been your choice of words. It could have been that you went off on a tangent earlier in the chat session that steered the LLM into another multi-dimensional region, where another (equally) statistically relevant sequence of words was fed back to you.
When ChatGPT, Claude, others, respond inaccurately – you often can’t tell. The LLM itself can’t tell. What it does is return text. It will always do so. And it will always feel as familiar as when it responds accurately.
Have LLMs gotten better in this regard? Yes. Will they continue to get better? I would think so. But even with GPT-5 – our newest frontier model – this still happens. Will we ever get models where we can trust all responses all the time? We’ll see.
Resilience – the ability to live and develop with change … Some misunderstand it … as bouncing back or recovering from something – a far too narrow and backward-looking interpretation.
Resilience … is about having the capacity to develop with change, to be able to adapt to a changed situation, to be able to live with uncertainty and complexity – and above all about the ability to create anew, to change, to shift course – about new possibilities, about moving towards a new, positive future.
Crisis can create new opportunities. A person with resilience comes out of the crisis stronger. A thriving forest is renewed by fires. A resilient company develops with changed conditions. Societies that have survived for thousands of years have built the capacity to keep developing with change.
When I learned programming as a kid, I wondered what ’real code’ looked like – written by real programmers.
This was in the 80s, so it wasn’t easy to find examples. I bought Dr Dobb’s Journal when I could afford to, and pored over the source listings.
Later, my first job was at the Stockholm Stock Exchange – where software was developed in-house! I remember trying to get glimpses of code as I passed by offices of programmers – on screens or dot-matrix print-outs.
This excitement lasts to this day – each time I start a new assignment and get to go over the codebase.
With LLMs in software development, the hope is to be able to make software at higher levels.
A tension here is the seemingly inevitable level of detail in many kinds of software – certainly in software that matters.
These are things like: for these kinds of users, in these kinds of situations, things should work this way, whereas for other users or situations, they should work like that.
But not just that: whenever there are options, complexity compounds.
If humans weren’t attracted by shiny new things, just because they’re shiny and new, new things wouldn’t catch on.
I don’t think this analogy is wrong, but as I’ve said earlier, I think it’s lacking. Same with the colour-blindness one. But I think both are useful – and important.1
Yes, in a way an LLM is like a database. It’s fed a lot of text and you can query it by giving it text and it will respond based on what’s a statistically likely continuation – with a bit of randomness thrown in to make things more interesting.
I find it odd to dismiss LLMs completely the way Schubert does because they will hallucinate and lie when you query them about things where there’s too little text. They are fed such huge quantities of text that it’s like querying the collective knowledge about nearly everything. Dismissing that is just missing out. It’s a little like refusing to use a search engine because it can’t return things that haven’t been indexed. (If you criticise my analogy I will just ignore you. Sure, it’s not perfect. They never are.)
In the beginning, LLMs just continued whatever text you fed them. Then they were trained to respond in prompt-response form. This is training that happens after feeding them all the text.
More recently they’ve been trained to ’reason’ before responding, to use chain of ’thought’.
The database analogy is important as a reminder that they don’t actually reason or think. They also do not understand things. You don’t actually teach them things.
Here’s where the database analogy starts to become inadequate and therefore less useful. Because LLMs can do so much more than simply look up textual data by querying using text.
You don’t just query an LLM. You’re also instructing it to do something. You’re telling it what you want it to do. And it will do things using language – by ’thinking out loud’ to itself. As it does this it outputs statistically relevant bits of text that also guide the LLM itself. In a way it sees its thoughts and thereby makes a path to the response it will deliver back to you.
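A toy sketch of that loop – not how any real model is implemented, and the names are made up – where each generated bit of text is appended to the context and so steers what comes next:

```typescript
// Toy stand-in for sampling a statistically likely continuation of `context`.
// A real model does something vastly more sophisticated; the point is only
// that the choice depends on everything generated so far.
function sampleNextToken(context: string): string {
  const cannedTokens = [" Let", " me", " think", " step", " by", " step", "."];
  return cannedTokens[context.length % cannedTokens.length];
}

function generate(prompt: string, maxTokens: number): string {
  let context = prompt;
  for (let i = 0; i < maxTokens; i++) {
    const token = sampleNextToken(context); // next bit of text, given everything so far
    context += token;                       // the output becomes part of the input
  }
  return context.slice(prompt.length);       // return only the generated part
}

console.log(generate("Why is the sky blue?", 7));
```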
Why I don’t like the database analogy is that it reduces LLMs to something that they once were but no longer are.
Large language models today are as much the result of feeding them lots of text as of the training that happens afterwards – the training that taught them the prompt-response form and chain-of-thought reasoning, but also the feeding of prompts along with human-written responses about things that we want the models to handle well.
I wouldn’t rule out that it’s possible to feed one of the currently available models some text that would return a response that could be deemed equivalent to human creativity. A prompt that would cause its chain of thought to produce something that goes far beyond just regurgitating bits of text from the places in the multidimensional space that the input text activated. (Most responses already go beyond that.)
No, we shouldn’t anthropomorphise LLMs. We should be careful about saying that they think or reason. We should remember that it is possible to get them to output anything with the right input. And we should remember that they will always output something – and that something might be a hallucination or a lie, so we must learn how to determine when that’s the case. For this, those two analogies are useful and important. But I think we need better ones.
What I do is never work unless I’m in the right state.
I assess my state by checking whether my body feels light and upright or heavy and slouched – or where between those I feel. I check what my mind feels like, clear or foggy. Am I having ideas? The breath is also a good signal – is it deep or shallow, tight or at ease? And do I feel a pull to work, or resistance? Do I feel that I can focus effortlessly, or would it require effort?
If I’m on the edge, I might do some light work, review my plans, write a little to myself.
But if too many indicators are off, I’ll do what I think of as a reset.
Take a walk around the block. Do some yoga. Meditate. Clear up the kitchen. Go for a brisk walk in the woods. Do NSDR, breathwork. Take a shower.
Which reset fits is a bit of a learned skill. A calm walk or light chores work in most situations. Meditation or NSDR is a bit tricky as they can pull you down further.
When I work I check in with myself after about 30–45 minutes to see if I can proceed.
Doing this seems to make the instances where I struggle to work rarer. They still happen on most days but don’t require as much in the way of resets as before.
A reset is a brief pause that helps your body and mind recover and refocus.
It’s not simply about resting – it’s about deliberately stepping back from what you’re doing, so you can return with greater clarity, energy, and focus.
A proper reset settles your nervous system, clears mental fog, and helps you change direction – whether you’re getting started, feeling stuck, or finishing up.
Rather than trying to block distractions, cultivate pull toward what you want to do.
Blocking apparent obstacles doesn’t remove the pull of distracting things.
You need to remind yourself clearly and often how you want to spend your time.
Say you’re writing a function or method. What does it do? It takes some input, returns some output based on some computation – or maybe changes some state. Are the inputs and outputs related? Do they make sense together? If not, they should be separate.1
This happens at all scales: within functions (separating by blank lines), within a group of functions (should they be together or apart), within a file, within a class, module, subsystem, and so on.
Separating things means making decisions about who gets to access what – call a function, read or write some state – and, even more subtly, who gets to know that some other part even exists.
This has a significant effect on the quality of software – how easy it is to understand and to change, and how well it allows you to keep complexity in check.
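A minimal sketch of what I mean, with made-up names – a function whose inputs don’t really belong together, and the two functions it separates into:

```typescript
// The inputs don't make sense together: profile fields and a visited page
// have little to do with each other.
function updateProfileAndLogVisit(name: string, email: string, pageUrl: string): void {
  // ...update the profile, and also record the visit
}

// Separated, each function has inputs that belong together, and callers
// only need to know about the part they actually use.
function updateProfile(name: string, email: string): void {
  // ...only concerned with profile data
}

function logVisit(pageUrl: string): void {
  // ...only concerned with the visit
}
```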
I don’t agree that life is short. A year is long if you look back. Sure, you remember things vividly and they feel like yesterday – but if you review your experiences, you’ll find many things that feel very distant – even things from a mere month ago.
I’m doing company admin today – tasks I’ve had rough checklists for over the years. Some months ago, I rewrote them in detail, mainly to document the process for Svante, who runs the company with me. Some tasks overlapped with his checklists, but others were things I’d handled without explaining how. Documenting them made sense. What I didn’t expect – given that I already had checklists – was how much more efficient the work has become – less cognitive load and it takes less time.
Many programmers struggle to name things because they try to find an abstract name that covers everything it might do. But names can easily be changed, so choosing a specific name for what the thing does now often unblocks you.
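A made-up example of the difference: the abstract name says nothing, while a name for what the function does right now is easy to choose – and cheap to change later.

```typescript
interface Order {
  items: { price: number; quantity: number }[];
}

// Hunting for a name abstract enough to cover everything this might ever do
// tends to end in something vague:
function processData(order: Order): number {
  return order.items.reduce((sum, item) => sum + item.price * item.quantity, 0);
}

// Naming it for what it does now unblocks you – renaming later is cheap:
function calculateOrderTotal(order: Order): number {
  return order.items.reduce((sum, item) => sum + item.price * item.quantity, 0);
}
```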
I took Jost Hochuli’s ’Detail in typography’ off the shelf this morning to look up what the full vertical extent of type is called. He refers to it as ’vertical height’ (a somewhat tautological phrase) – or ’hp-height’, measured from the ascender of the h to the descender of the p.
If I look at coding, programming – which is one area where AI is making the most progress – what we are finding is we are not far from a world – I think we are going to be there in three to six months, where AI is writing ninety percent of the code – and then in twelve months we may be in a world where AI is writing essentially all of the code.
But the programmer still needs to specify, you know, what are the conditions of what you’re doing, what is the overall app you’re trying to make, what’s the overall design decision, how do we collaborate with other code that’s been written – you know, how do we have some common sense on whether this is a secure design or an insecure design.
So as long as there are these small pieces that a programmer, a human programmer, needs to do – the AI isn’t good at – I think human productivity will actually be enhanced. But on the other hand, I think that eventually all those little islands will get picked off by AI systems – and then we will eventually reach the point where, you know, the AIs can do everything that humans can. And I think that will happen in every industry.
In developing software, you have an idea of how you want it to behave. Software is behaviour. That’s in a sense what you create in a software project.
Behaviour quickly becomes insanely complex. Even as a single developer, having written every line of code, it’s a challenge to maintain an accurate mental picture of the behaviour of the system.
With more people involved in making the software, this is increasingly the case – and I don’t think it’s linear.
As we use AI more and more to make software, maintaining a mental picture of the behaviour of the software becomes even more of a challenge.
As the provider of some software you are responsible for that behaviour. It’s what you offer customers for money. Understanding what it does has always been critical, and I don’t see that changing.
Now, what can one do to understand a piece of software? You can participate in writing the code for it. That gives you an understanding. You can read the code that others have written. And you can use the software itself.
As for documentation in the form of text and diagrams, that’s helpful to some extent, but it is mostly for getting an overview. When describing the details, I know of no better representation than the code.
What about asking an LLM how the code works? That’s a good idea but I don’t think it’s mature enough yet. But it might become like asking a longtime developer to explain it.
As code is increasingly written by machines, the practice of code review needs to evolve. I can’t see that we’ll soon come to a point where the makers of software wouldn’t want to peek under the hood. And I think we’ll want to put significant effort into review in the coming years.
One thing you can do with an LLM is to, in some sense, ask the collective consciousness things, in a conversation format. You ask something, get an answer – and that answer is a statistically relevant one for the region of the LLM that your question activated. Ask about perennial flowers and your question shoots off inside the LLM and reaches a region where there’s statistics about text about this topic. And the LLM spits out tokens plucked from this region, and there’s your answer.
The illusion is that you’re having a conversation with an entity that understands and responds to what you’re saying. But I think it’s important to try to remind yourself to not approach them in this way. I’m not sure that AGI will make a difference in this regard, but it will probably make the illusion much more convincing.
I think in some sense that the categories front and back end obscure the fact that it’s (or should be) modules all the way down.
Every part (module!) of the system should have boundaries as strong as those between front and back end. (For a moment ignoring that front and back ends don’t have strong enough boundaries.)
Every module should have integrity. Every module should keep some things to itself, thereby freeing other modules from having to concern themselves with those things.
It should be very clear which responsibilities belong to a module, and which don’t.
Are there really categories of responsibilities that wholly belong to either the front or the back end?
If a module accepts some input, that means accepting responsibility for it.
By rejecting it, the module signals to the sending party that it is their responsibility to prepare the input in some other way before sending it.
If the module accepts it, it is then responsible for transforming the input such that it can be processed.
The API should make clear what things will be accepted or rejected.
Anything that is not clear from inspecting the API will be left to exploration. And then there’s the risk that some variations aren’t found before release into the world. Then things will turn out to be rejected when they were expected to be accepted.
A module can choose to be very strict, which reduces the likelihood that such unexpected rejections happen in the wild.
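A sketch of what that can look like, with made-up names: the signature itself tells callers that some inputs will be rejected, so the strictness is visible in the API rather than discovered by exploration after release.

```typescript
type ParseResult =
  | { accepted: true; quantity: number }
  | { accepted: false; reason: string };

// A strict module: it only accepts plain positive integers, and rejects
// everything else explicitly, with a reason the sending party can act on.
function parseQuantity(input: string): ParseResult {
  const trimmed = input.trim();
  if (!/^\d+$/.test(trimmed)) {
    return { accepted: false, reason: "quantity must be a whole number" };
  }
  const quantity = Number(trimmed);
  if (quantity === 0) {
    return { accepted: false, reason: "quantity must be greater than zero" };
  }
  return { accepted: true, quantity }; // accepting means taking responsibility for it
}
```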
Making software is an exploration. Like hiking in the mountains, sometimes you are in an open landscape: you can see where you are going and you can see where you have been and how long it took to get to where you are now.
At other times, you can see a steep climb, and then you can’t know how long it will take to complete the next stage of your trip, and how much effort – or even if you’ll have to turn back and take another route.
At yet other times, your vision will be obscured, so you know nothing about what lies ahead.
When I’m a member of a software team, I want to help establish an environment where anyone at any level of experience dares to contribute ideas and be open about what they don’t know or understand.
In our profession as software makers, knowledge and experience are central to one’s status. In this field we are still struggling with what it takes to succeed with projects, to deliver things of quality at a competitive pace.
So being seen as someone who is experienced and knows a lot makes you attractive and in the end determines your value.
Sometimes there’s a lot of pressure on a team. The organisation might not be happy about what the team delivers or how long it takes.
If you, as a member of the team, are struggling to meet the estimates, that often makes you less likely to ask for help – so having a culture on the team where it’s safe to do so is important, for the individuals as well as for the team.
For a member of a team to ask for help, to say that they are confused about something, or worried that they might not be able to fulfil a task, they must see other team mates do so. So one factor that discourages this is that it seldom happens.
Another factor is that organisations often have a poor understanding of what it means to make software.
We desperately want things to be predictable – for it to be possible to have an idea and carry it out exactly as envisioned, with as much effort as we thought it would take.
Making software means being in a state of constant exploration. We can’t be certain about anything. And when we don’t recognise this openly, we also contribute to a culture where there’s great risk in being open about being uncertain.
In the early days of Agile, before the term was coined, the term ‘light-weight processes’ wasn’t deemed ideal – it was thought it wouldn’t gain traction among those who felt a heavy-weight process was called for. Perhaps it would do better today, when Agile has become bogged down.
Making software, you can’t be sure of anything until it’s in the hands of your users, and you can tell that they understand what it enables them to do, and that it has the desired effects on your business.
Talking to users and drawing the UI in Figma doesn’t make this not true.
We do many things in a software project to gain certainty: do research, draw designs, write specs, discuss things, estimate the effort to implement – and yet all this doesn’t make things as certain as we feel they are. I think it would be a good thing to recognise this uncertainty.
When projects are launched without detailed and rigorous plans, issues are left unresolved that will resurface during delivery, causing delays, cost overruns, and breakdowns. […]
Gehry and his team had spent two years up front thinking through and simulating every detail, in effect building the museum on computers before they built it in reality. […]
Relatively speaking, planning is cheap, delivery is expensive.
There was a time when we thought this was the case for software projects. We, on the other hand, want to move to delivery as soon as possible, and capture the details in code, not plans.
One major factor in the emergence of agile methods in the 90s was the idea that we needed to manage software projects in harmony with the nature of making software. We fixed some of that, with shorter releases, iterating on plans; the emergence of the web as platform also helped.
But wasn’t agile rather about handling uncertainty and embracing change? Wasn’t it about any type of project facing uncertainty? Well, earlier efforts to get control over software projects were rather about removing uncertainty, deciding early on what shouldn’t change.
The belief was that late change was the very reason projects failed – exceeding budgets and deadlines, if they finished at all. We looked to other engineering disciplines in these efforts. That was a mistake, as it meant rejecting the nature of making software.
Also, having frequent – as in at least daily – discussions with those responsible for the behaviour and appearance of the product as received by the users. You can’t successfully make software without this.
In some projects, UX becomes a barrier to such discussions. Those discussions have already taken place, and some artifact is handed over. Invariably there are aspects that haven’t been considered, usually leading to compromises in the resulting system design.
Civil dusk. When you can still read. Nautical dusk. When at sea you can still make out the horizon. Astronomical dusk. When you can begin to observe faint stars.
If you create an interface, be it a function or an API endpoint, you have two options for the inputs it takes. You can be strict, and then it’s the callers’ responsibility to prepare the inputs in an exact way. Or you can be loose, and then it is your responsibility to normalise.
Being strict means it’s your responsibility to reject every incorrect input. You can still reject things if you are loose, but for that which you don’t reject you have to be strict about shaping the input in the way the caller would have done if you had been strict.
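A small sketch of the two options, using a made-up registration function: the strict variant rejects anything the caller hasn’t prepared exactly, while the loose variant takes on the normalisation itself (and can still reject what it can’t make sense of).

```typescript
// Strict: the caller is responsible for preparing the input exactly.
function registerStrict(email: string): void {
  if (email !== email.trim().toLowerCase()) {
    throw new Error("email must be trimmed and lower-case");
  }
  // ...store `email`
}

// Loose: the interface normalises on the caller's behalf, but must then be
// just as strict internally about the shape it passes on.
function registerLoose(email: string): void {
  const normalised = email.trim().toLowerCase();
  if (!normalised.includes("@")) {
    throw new Error("not an email address");
  }
  // ...store `normalised`
}
```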
There are directors who think they have to provoke people to get the best work out of them. […] I try to create a very loose set, filled with jokes and concentration. It sounds surprising, but the two things go together nicely.
Above all, Hitchcock inspires comparison with Shakespeare, a conjunction that some may find hard to justify but which, having studied and written on each, I am more and more inclined to support. Both had a prescribed structure, adapted material taken from elsewhere to their own lexicon and vision, and used humour to season their drama. Both were enormously confident in their imaginations, prolific in their output, and astute in the ways of business and promotion. Both were moralists who took issue with the morality of their day while managing not to show their loyalties too clearly or ever lose popular support. “You have to design your film just as Shakespeare did his plays – for an audience”, Hitchcock told Truffaut.
Shakespeare’s greatness is wedded to language and Hitchcock’s to cinematic expression, in keeping with the ages in which they worked. Hitchcock’s apprenticeship during the silent era meant that he would always think, first, in images, and his greatest sequences are without dialogue (though often brilliantly scored). Some of my favourites: the camera closing in on the blinking drummer in Young and Innocent (1937); the train’s arrival with a cloud of black smoke bringing the suave killer to his sister’s family in Shadow of a Doubt; the extraordinary stalking and strangling scene in the fairground in Strangers on a Train; the reverse crane shot that shows us the key in Ingrid Bergman’s hand in Notorious (1946); the panning shot that opens Rear Window and tells us everything we need to know about the laid-up protagonist; the stabbing in the shower in Psycho; and the movement through the cemetery in Family Plot (1976). Even a disappointingly incoherent film like Topaz (1969) is made unforgettable by the image of a woman falling to the floor after being shot, her purple dress cascading around her like a pool of blood, but also like a flower, finding its fullest expression in the moment of death.