NHS England has issued new guidance to staff, which has been shared with New Scientist, that demands existing and future software be pulled from public view and kept behind closed
doors. “All source code repositories must be private by default. Repositories must not be public unless there is an explicit and exceptional need, and public access has been
formally approved,” says the new guidance. The deadline for making code private is 11 May.
Last month, an AI created by Anthropic called Mythos was widely reported to be capable of discovering flaws in virtually any software, potentially allowing hackers to break into
systems running it.
NHS England’s guidance specifically points to Mythos as the cause for the new measures.
…
Yet again, “AI” is the reason why we can’t have nice things on an open and transparent Web.
This is bad, of course. But the worst part is the illusion it helps feed that closed-source software is necessarily more-secure than open-source software. Obviously it’s all
much more-complex than that. Indeed, the article goes on to quote Terence Eden thoroughly debunking the entire line of thought:
“Is it possible that Mythos will scan a repository and find a bug? Yes, 100 per cent likely. Is that going to be a bug that causes a security issue in a live NHS service
somewhere? Almost certainly not,” says Eden. “I think it’s someone in NHS England buying into the hype that Mythos is going to cause the end of security as we know it and
getting a bit panicked.”
He’s right. This policy change is unlikely to improve the security of any of the affected pieces of NHS software (for much of which, the code is already out-there and archived, and so
removing it from the Internet now is pretty pointless). If it’s going to be attacked, it’ll be attacked, and the resources that the bad guys have for probing a whole
database's worth of CVEs or fuzz-testing the extremities make the availability of vulnerability-scanning AI pretty-close to irrelevant.
At least if it were open source then the good guys would have a chance of helping out… as well as we, the taxpayers who made the software possible, being able to see where our money was
going!
why bother going to the brick-and-mortar store? amazon is more “convenient”. why bother cooking a nice meal for yourself? doordash and uber eats are more
“convenient”. why go out and socialize with people? facebook is more “convenient”. why use a digital camera, camcorder, or polaroid? your
smartphone is more “convenient”. why bother going to the theater or concerts? netflix and spotify are more “convenient”. why bother making art?
asking an AI to generate it for you is more “convenient”.
well, i say nuts to that. from now on, i’m going to make my life as inconvenient as possible. i’m going to go to the store and buy stuff in person. i’m going to make my own
food with my own hands. i’m going to socialize with people face-to-face. i’m going to use a true camera instead of my phone’s camera. i’m going to buy blu-rays, DVDs, and CDs
instead of streaming. i’m going to take my time when creating, watching, playing, and reading a work of art.
…
I’m seeing a growing movement in indieweb, revivalist, and adjacent circles expressing RNotté’s sentiment: that the endless (and highly-marketable) quest for increased convenience in
our lives has gained us free time, but we’ve lost something along the way.
What we’ve lost varies from case to case, but includes freedom (from lock-in to subscription services), creative satisfaction (from convenient “artistic” expression), privacy (from
becoming the product, packaged-up by big-data advertising-funded tools), and social interactions (from so much of “social” media).
But reading RNotté share their thoughts on the matter today was the first time that it’s reminded me of The Matrix.
The connection was probably helped by the fact that I rewatched the film pretty recently.
There’s a bit where Agent Smith says, to his captive, the rebel captain Morpheus:
Did you know that the first Matrix was designed to be a perfect human world? Where none suffered, where everyone would be happy. It was a disaster. No one would accept the program.
Entire crops were lost. Some believed we lacked the programming language to describe your perfect world.
Smith goes on to elucidate that his personal explanation for this fault was that humans depend upon suffering and misery, while acknowledging that there are other explanations. And
perhaps we’ve touched upon one.
Perhaps humans – all humans – have a limit for how much they’re willing to accept convenience as compensation. Connected humans in The Matrix gain a convenient life,
superficially superior to the struggle for survival experienced by humans living in the real world, short on food and hunted by machines. But to get that, they trade away their
individual ability to become aware of the truth and, collectively, the ability for humanity to shape its own destiny. But something about the imbalance of power in the
arrangement niggles in human minds, and some rebel against the established order… and are joined by others who are shown that an alternative is available.
Clearly – as RNotté and others show – faceless technological forces need not go quite so far as enslaving an entire species before “convenience” ceases to be a tolerable
mitigation!
I’m not convinced that seeking out inconvenience is in itself a good. But questioning what your conveniences are worth and what you’re paying for them… that’s definitely
worthwhile.
Folks at work have been encouraging me to make more use of generative AI in my workflow1;
going beyond my current “fancy autocomplete” use and giving my agents more autonomy. My experience of such “vibe coding” so far has been… mixed2,
but I promised I’d revisit it.
One thing that these models are usually effective at is summarisation3. This is valuable if you’re faced with a large and unfamiliar
codebase and you’re looking to trace a particular thing but you’re not certain where it is or what it’ll be called. While they’re not always fast, these tools can
at least work in the background, which allows the developer to get on with something else while the agent trawls logs, code, and configuration to find and explain a
fuzzily-defined thing.
Recently, I had a moment which I thought might be such an instance… but it didn’t turn out quite the way I expected. Here’s the story4:
The broken dev env
I’d been drafted into an established and ongoing project to provide more hands, following a coworker’s departure last week. This project touches parts of our (sprawling,
microservices-based) infrastructure that I hadn’t looked at before, so there was a lot I didn’t yet know.
I picked an issue that had belonged to my former colleague that QA had rejected and set out to retrace their steps: to replicate the problem that the QA engineers had identified and in
doing so learn more about the underlying process. I spun up my development environment and tried to follow the steps.
The process failed… but much earlier than QA had said it would. Clearly my development environment was at fault, or at least not representative of their setup.
But I couldn’t even get as far as their problem before my frontend barfed out an error message. Sigh! Probably there’s some configuration I’ve missed somewhere in the myriad
microservices, or else the data I’m testing with isn’t a fair reflection of what they’re doing as-standard.
Following some staff changes, I have no teammates on this side of the Atlantic who could help me decipher this: a “quick question on Slack” wouldn’t solve this one until hours
from now. It was time to start debugging!
But… maybe Claude could help? It’s got access to almost all the same code, logs, tools and browser windows I do. I started typing:
✨ What’s up next, Dan?
In my development environment for https://service.dev/asset/new, when I click “Save”, I see the error “Oops, something went wrong.”
Why?
Context is key
It’s quite possible that Claude would have gone away, had a “think”, done some tests, and then come back to me with a believable answer. It might even have been correct, and I’d have
been able to short-cut my way back to productivity (and I’d have time to make a mug of coffee and finish reading my emails while it did so). Then, I’d just have to check that it was
right, make the change, and get on with things.
But I realised that it’d probably work faster (and cheaper, and using less energy) if it had slightly more context from the get-go, so I elaborated. The first thing I’d
want to know if I were debugging this is what was actually happening behind the scenes. I dipped into my browser’s Network debugger and extracted the relevant output, adding it to my
prompt:
✨ What’s up next, Dan?
In my development environment for https://service.dev/asset/new, when I click “Save”, I see
the error “Oops, something went wrong.” Why? The payload POSTed to the server is { content: 'test1', audience: [ 'one' ], status: 'draft' } and
the response is a HTTP 500 with the following stack trace: pasted 94 lines
That’s more like it, now I could let it get on with its work. But wait…
Rubberducking
There’s a concept in computer programming called “rubberducking”. The name comes from an anecdote in The Pragmatic Programmer about a developer who, when stuck on a problem, would
explain the code line-by-line to a rubber duck. The thinking is that talking-through a problem, even to someone (or something) who doesn’t understand it, can lead the speaker to
insights they were otherwise missing.
I’ve done it myself many, many times: recruiting a convenient colleague or friend, talking them through the technical problem I was faced with, and inviting them to ask me to go
into greater detail if I seemed to be skimming over anything. I can promise that it can work.
The panel above is part of a series in which a sorceress called Cepper is
coerced by her university into using Avian Intelligence (“AI”) – a robotic parrot5 that her headmaster insists is the future of magic. She experiments with it, finds it
occasionally useful but more-often frustrating, attempts to implement her own local version but finds that troublesome in different ways, and eventually settles on using
an inanimate rubber duck instead. I get it, Cepper!
Let’s put that distraction aside for a moment and get back to the story of my broken development environment.
Clues in the stack trace
The top entry in the stack trace was an unsuccessful call to a different microservice, so I figured I’d pull its logs too, in order to further help direct
the AI in the right direction6:
✨ What’s up next, Dan?
In my development environment for https://service.dev/asset/new, when I click “Save”, I see
the error “Oops, something went wrong.” Why? The payload POSTed to the server is { content: 'test1',
audience: [ 'one' ], status: 'draft' } and the response is a HTTP 500 with the following stack trace: pasted 94 lines. The stack
trace suggests that a call is being made to the dojo backend service, where the following error log looks relevant: pasted 9
lines
I haven’t tried it, but I’m pretty confident that the LLM, after much number-crunching and a little warming-up of some datacentre somewhere, would get to the answer. But again, I found
something niggling inside me: the second-from top line in the dojo logs suggested that a connection was being made to a further, deeper microservice.
I should pull its logs too, I figured.
The final puzzle piece
As an aide-mémoire – in a way I’ve taken to doing when taking notes or when talking to AI – I first typed what I was going to provide. This is
useful if, for example, somebody distracts me at a key moment: it means I’ve got a jumping-off point predefined by my past self:
✨ What’s up next, Dan?
In my development environment for https://service.dev/asset/new, when I click “Save”, I see
the error “Oops, something went wrong.” Why? The payload POSTed to the server is { content: 'test1',
audience: [ 'one' ], status: 'draft' } and the response is a HTTP 500 with the following stack trace: pasted 94 lines. The stack
trace suggests that a call is being made to the dojo backend service, where the following error log looks relevant: pasted 9
lines. It’s calling osiris, which says:
I dipped into the directory for
osiris, and before I even got to the logs I spotted a problem: that microservice was on an old feature branch. How odd! I switched to the main branch and… everything
started working.
The entire event took only a few minutes. I’d find some information, type it into Claude’s input field, realise that more information could be valuable, and repeat.
By the time I’d finished describing the problem, I’d discovered the solution. That’s the essence of successful rubberducking. I didn’t need the AI at all.
All I needed was the illusion of something that might be able to help if I just talked through what I was thinking.
I don’t know what the moral is, here.
I wonder if I’d have been as effective had I just typed into my text editor. I suppose I would have, but I wonder if I’d have been motivated to do so in the first place? I’ve tried
rubberducking before by talking to an imaginary person, but I’ve never tried typing to one7; maybe I should start?
Footnotes
1 I’m pretty sure every engineering department nowadays has its rabid fanboys, but I’m
pleased that for the most part my colleagues take a more-pragmatic and realistic outlook: balancing the potential benefits of LLM-assisted coding with its many shortfalls,
downsides, and risks.
3 So long as what you’ve got them summarising is something you can later verify!
4 I’ve taken huge liberties with the strict factual accuracy to make this more-readable as
well as to not-expose things I probably oughtn’t. So before you swoop in to criticise my prompt-fu (not that I asked you, but I know there’s somebody out there who’s thinking about
doing this right now), please note that none of the text in this page is what I actually wrote to the AI; it’s a figurative example.
6 I’d had an experience just the previous week in which it’d gone off on completely the
wrong track, attempting to change code in order to “fix” what was ultimately a configuration or data problem, and so I thought it might be useful to give it some rails to follow, to
start with.
7 Except insofar as this AI agent is an “imaginary person”, which is possibly already a
step-too-far in implying personhood for my liking!
Unlike Alice, who spent the year reading papers with a pencil in hand, scribbling notes in the margins, getting confused, re-reading, looking things up, and slowly assembling a
working understanding of her corner of the field, Bob has been using an AI agent. When his supervisor sent him a paper to read, Bob asked the agent to summarize it. When he needed
to understand a new statistical method, he asked the agent to explain it. When his Python code broke, the agent debugged it. When the agent’s fix introduced a new bug, it debugged
that too. When it came time to write the paper, the agent wrote it. Bob’s weekly updates to his supervisor were indistinguishable from Alice’s. The questions were similar. The
progress was similar. The trajectory, from the outside, was identical.
Here’s where it gets interesting. If you are an administrator, a funding body, a hiring committee, or a metrics-obsessed department head, Alice and Bob had the same year. One paper
each. One set of minor revisions each. One solid contribution to the literature each. By every quantitative measure that the modern academy uses to assess the worth of a scientist,
they are interchangeable. We have built an entire evaluation system around counting things that can be counted, and it turns out that what actually matters is the one thing that
can’t be.
…
The strange thing is that we already know this. We have always known this. Every physics textbook ever written comes with exercises at the end of each chapter, and every physics
professor who has ever stood in front of a lecture hall has said the same thing: you cannot learn physics by watching someone else do it. You have to pick up the pencil. You have to
attempt the problem. You have to get it wrong, sit with the wrongness, and figure out where your reasoning broke. Reading the solution manual and nodding along feels like
understanding. It is not understanding. Every student who has tried to coast through a problem set by reading the solutions and then bombed the exam knows this in their bones. We
have centuries of accumulated pedagogical wisdom telling us that the attempt, including the failed attempt, is where the learning lives. And yet, somehow, when it comes to AI
agents, we’ve collectively decided that maybe this time it’s different. That maybe nodding at Claude’s output is a substitute for doing the calculation yourself. It isn’t. We knew
that before LLMs existed. We seem to have forgotten it the moment they became convenient.
Centuries of pedagogy, defeated by a chat window.
…
This piece by Minas Karamanis is excellent throughout, and if you’ve got the time to read it then you should. He’s a physics postdoc, and this post comes from his experience in his own
field, but I feel that the concerns he raises are more-widely valid, too.
In my field – of software engineering – I have similar concerns.
Let’s accept for a moment that an LLM significantly improves the useful output of a senior software engineer (which is very-definitely disputed, especially for the “10x” level of claims we often hear, but let’s just take it as-read for now). I’ve
experimented with LLM-supported development for years, in various capacities, and it certainly sometimes feels like they do (although it sometimes also feels like they have the
opposite effect!). But if it’s true, then yes: an experienced senior software engineer could conceivably increase their work performance by shepherding a flock of agents through a
variety of development tasks, “supervising” them and checking their work, getting them back on-course when they make mistakes, approving or rejecting their output, and stepping in to
manually fix things where the machines fail.
In this role, the engineer acts more like an engineering team lead, bringing their broad domain experience to maximise the output of those they manage. Except who they manage is… AI.
Again, let’s just accept all of the above for the sake of argument. If that’s all true… how do we make new senior developers?
Junior developers can use LLMs too. And those LLMs will make mistakes that the junior developer won’t catch, because the kinds of mistakes LLMs make are often hard to spot and require
significant experience to identify. But if they’re encouraged to use LLMs rather than making mistakes by hand and learning from them – to keep up, for example, or to meet corporate
policies – then these juniors will never gain the essential experience they’ll one day need. They’ll be disenfranchised of the opportunity to grow and learn.
It’s yet to be proven that more-sophisticated models will “solve” this problem, but my understanding is that issues like hallucination are fundamentally unsolvable: you might
get fewer hallucinations in a better model, but that just means that those hallucinations that slip through will be better-concealed and even harder to identify in code review
or happy-path testing.
Maybe – maybe – the trajectory of GPTs is infinite, and they’ll keep getting “smarter” to the point at which this doesn’t matter: programming genuinely will become a natural language
exercise, and nobody will need to write or understand code at all. In this possible reality, the LLMs will eventually develop entire new programming languages to best support their
work, and humans will simply express ideas and provide feedback on the outputs. But I’m very sceptical of that prediction: it’s my belief that the mechanisms by which LLMs work have a
fundamental ceiling – a capped level of sophistication that can be approached but never exceeded. And sure, maybe some other, different approach to AI might not have this
limitation, but if so then we haven’t invented it yet.
Which suggests that we will always need experienced engineers to shepherd our AIs. Which brings us back to the fundamental question: if everybody uses AI to code, how do we
make new senior developers?
I have other concerns about AI too, of course, some of which I’ve written about. But this one’s top-of-mind today, thanks to Minas’ excellent article. Go read it to learn more about how
physics research faces a similar threat… and, perhaps, consider how your own field might need to face this particular challenge.
The Gell-Mann amnesia effect is a cognitive bias describing the tendency of individuals to critically assess media reports in a domain they are knowledgeable about, yet continue
to trust reporting in other areas despite recognizing similar potential inaccuracies.
Summarizing, AI sounds like an incredible genius synthesizing the world’s knowledge right up until you ask it about the thing you know about, then it’s an idiot. Even knowing about
this phenomenon and having experienced it countless times, LLMs have an intoxicating quality to them.
…
I remember one time, maybe in the mid-1990s, when I saw a shopping channel (remember those? oh god, they’re still a thing, aren’t they?) where the host was trying to sell a personal
computer. And… clearly, they knew absolutely nothing about it. They kept hitting on the same two or three talking points they’d been given (“mention the quad-speed CD-ROM
drive!”) and fumbling their way through, and it gave me a revelation:
I knew enough about computers that I could see that the presenter was bullshitting their way through the segment. But there are plenty of things that I don’t
know much about, which are also sold on this same show. Duvets, jewellery, glassware… I’m nowhere near as much an expert on these as I was on PC featuresets. Is there something
inherently incomprehensible about computers? No. So it’s reasonable to assume that these salespeople probably know equally-little about everything they sell, it’s just
that I don’t have the knowledge base to be able to see that.
That’s what GenAI often feels like, to me. Having collated all of the publicly-available knowledge it could find into its model doesn’t make it smarter than the smartest humans; it
brings it to something probably slightly above the average, depending on the topic. If I ask an LLM about something that I don’t understand well,
it produces often highly-believable answers, but if I ask it about something that I’m an expert in, it can come off as a fool.
I’m very interested in how we teach information literacy in this new world of rapidly-generated highly-believable nonsense.
Anyway: Dave’s post doesn’t go in that direction – instead, he’s got some clever thoughts about how the “convenience” of a “good enough” AI-driven solution to any given problem risks us
seeing humans as the friction point, which ultimately works against those very humans who are looking to benefit from the technology:
…
We need experts to share what they know and improve the quality of our work, generated or otherwise. We even need idiots to make sure we can break ideas down into their simplest form
that everyone, agents or human, understand. People can have bad attitudes, be shitty, and have wrong opinions… but people are not friction. An LLM may be able to autocorrect its way
into a plausible human response, but it’s not people. It doesn’t care if it’s right or wrong.
Many years ago, someone tried to get me into cryptocurrencies. “They’re the future of money!” they said. I replied saying that I’d rather wait until they were more useful, less
volatile, easier to use, and utterly reliable.
“You don’t want to get left behind, do you?” they countered.
That struck me as a bizarre sentiment. What is there to be left behind from? If BitCoin (or whatever) is going to liberate us all from economic drudgery, what’s the point
of “getting in early”? It’ll still be there tomorrow and I can join the journey whenever it is sensible for me.
…
100%. If I “get in early” on something, it’s because that thing interests me, not because I’m betting on its future. With a hundred new ideas a day and only one of them “making it”,
it’s a fool’s game to try to jump on board every bandwagon that comes along.
With cryptocurrencies, though, I’m fortunate enough to have an even better comeback at the cryptobros that try to shill me whatever made-up currency they’re “investing” in
today: I’ve already done better at it than they ever will.
When Bitcoin first appeared, I took a technical interest in it. I genuinely never anticipated it’d take off (I made the same incorrect
guess with MP3s, too!), but I thought it was a fun concept to play about with. The only Bitcoins I ever paid for must’ve been worth an average of 50p each, or so.
I sold my entire wallet of Bitcoins when they hit around £750 each. I know a tulip economy when I see one, I thought. Plus: I was
no longer interested in blockchains now I was seeing how they were actually being used: my interest had been entirely in the technology and its applications, not in the actual idea of a
currency!
Sure, I kick myself occasionally, given that I later saw the value rise to tens of thousands of pounds each. But hey, I was never in it for the money anyway.
So yeah, I tell cryptobros: I already made a 1,500× return on cryptocurrency. And no, I’m not buying any cryptocurrencies any more. Whatever they think “getting in early” was, they’re
wrong, because I was there years ahead of them and I wasn’t even doing it to “get in early”; I did it because it was interesting. And honestly, isn’t that a better story to be able to
tell?
…
I feel the same way about the current crop of AI tools. I’ve tried a bunch of them. Some are good. Most are a bit shit. Few are useful to me as they are now.
…
If this tech is as amazing as you say it is, I’ll be able to pick it up and become productive on a timescale of my choosing not yours.
…
Yup, that’s the attitude I’m taking.
I play with new AI technologies, sometimes. I don’t do it because I’m afraid of being left behind: as you say, if a technology is transformative, we’ll all get to catch up
eventually.
Do you think that people who had smartphones first are benefitting today because they “got in early” on something that later became mainstream?
Of course they’re not. Their experience is eventually exactly the same as everybody else’s, just like it was for everybody who “got in early” on hype trains whose final station came
early, like Compuserve GO-words, WAP, Beenz.com, WebTV, the CueCat, m-Commerce, HD-DVD, the JooJoo, or Google+.
People being unwilling to discuss their wild claims, then later using the lack of discussion as evidence of widespread acceptance.
When people balance the new toilet roll atop the old one’s tube.3
Come on! It would have been so easy!
Shellfish. Why would you eat that!?
People assuming my interest in computers and technology means I want to talk to them about cryptocurrencies.4
Websites that nag you to install their shitty app. (I know you have an app. I’m choosing to use your website. Stop with the banners!)
People who seem to only be able to drive at one speed.5
The assumption that the fact I’m “sharing” my partner is some kind of compromise on my part; a concession; something that I’d “wish away” if I could.
(It’s very much not.)
Brexit.
Wow, that was strangely cathartic.
Footnotes
1 I have a special pet hate for websites that require JavaScript to render their images.
Like… we’ve had the <img> tag since 1993! Why are you throwing it away and replacing it with something objectively slower, more-brittle, and
less-accessible?
2 Or, worse yet, claiming
that my long, random password is insecure because it contains my surname. I get that composition-based password rules, while terrible (even when they’re correctly
implemented, which they’re often not), are a moderately useful model for people to whom you’d otherwise struggle to
explain password complexity. I get that a password composed entirely of personal information about the owner is a bad idea too. But there’s a correct way to do this, and it’s not “ban
passwords with forbidden words in them”. Here’s what you should do: first, strip any forbidden words from the password (you might need to make multiple passes). Second, validate the
resulting password against your composition rules. If it fails, then yes: the password isn’t good enough. If it passes, then it doesn’t matter that forbidden words
were in it: a properly-stored and used password is never made less-secure by the addition of extra information into it!
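To illustrate that approach, here’s a minimal sketch of what such a check might look like; the forbidden-word list and the composition rules are placeholder assumptions for the example, not a recommendation of any particular policy:

```python
# A sketch of the approach described above: repeatedly strip forbidden words
# from the candidate password, then apply your existing composition rules to
# whatever remains. The forbidden words and rules below are placeholders only.
import re

FORBIDDEN_WORDS = ["password", "smith", "example.com"]  # e.g. the user's surname, the site name

def strip_forbidden(password: str) -> str:
    """Remove forbidden words, case-insensitively, until none remain
    (multiple passes, in case removing one word exposes another)."""
    previous = None
    while previous != password:
        previous = password
        for word in FORBIDDEN_WORDS:
            password = re.sub(re.escape(word), "", password, flags=re.IGNORECASE)
    return password

def meets_composition_rules(candidate: str) -> bool:
    """Placeholder composition rules: reasonably long, with mixed character classes."""
    return (len(candidate) >= 12
            and re.search(r"[a-z]", candidate) is not None
            and re.search(r"[A-Z]", candidate) is not None
            and re.search(r"\d", candidate) is not None)

def password_is_acceptable(password: str) -> bool:
    # Judge only what's left once the forbidden words are gone: extra
    # information never makes a properly-stored password weaker.
    return meets_composition_rules(strip_forbidden(password))

print(password_is_acceptable("smith-Correct4Horse9Battery"))  # True: still strong without "smith"
print(password_is_acceptable("Smith1234!"))                   # False: little remains once "Smith" is gone
```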
Last night I was chatting to my friend (and fellow Three Rings volunteer) Ollie about our respective
workplaces and their approach to AI-supported software engineering, and it echoed conversations I’ve had with other friends. Some workplaces, it seems, are leaning so-hard into
AI-supported software development that they’re berating developers who seem to be using the tools less than their colleagues!
That’s a problem for a few reasons, principal among them that AI does not
make you significantly faster but does make you learn less.1 I stand by the statement that AI isn’t useless, and I’ve experimented with it for years. But I certainly wouldn’t feel very comfortable
working somewhere that told me I was underperforming if, say, my code contributions were less-likely than the average to be identifiably “written by an AI”.
Even if you’re one of those folks who swears by your AI assistant, you’ve got to admit that they’re not always the best choice.
I ran into something a little like what Ollie described when an AI code reviewer told me off for not describing how my AI agent assisted me with the code change… when no AI had been
involved: I’d written the code myself.2
I spoke to another friend, E, whose employer is going in a similar direction. E joked that at current rates they’d have to start tagging their (human-made!) commits with fake
AI agent logs in order to persuade management that their level of engagement with AI was correct and appropriate.3
Supposing somebody like Ollie or E or anybody else I spoke to did feel the need to “fake” AI agent logs in order to prove that they were using AI “the right way”… that sounds
like an excuse for some automation!
I got to thinking: how hard could it be to add a git hook that added an AI agent’s “logging” to each commit, as if the work had been done by a
robot?4
Turns out: pretty easy…
To try out my idea, I made two changes to a branch. When I committed, imaginary AI agent ‘frantic’ took credit, writing its own change log. Also: asciinema + svg-term remains awesome.
Here’s how it works (with source code!). After you make a commit, the post-commit hook creates a file in
.agent-logs/, named for your current branch. Each commit results in a line being appended to that file to say something like [agent] first line of your commit
message, where agent is the name of the AI agent you’re pretending that you used (you can even configure it with an array of agent names and it’ll pick one at
random each time: my sample code uses the names agent, stardust, and frantic).
There’s one quirk in my code. Git hooks only get the commit message (the first line of which I use as the imaginary agent’s description of what it did) after the commit has
taken place. Were a robot really used to write the code, it’d have updated the file already by this point. So my hook has to do an --amend commit, to
retroactively fix what was already committed. And to do that without triggering itself and getting into an infinite loop, it needs to use a temporary environment variable.
Ignoring that, though, there’s nothing particularly special about this code. It’s certainly more-lightweight, faster-running, and more-accurate than a typical coding LLM.
Sure, my hook doesn’t attempt to write any of the code for you; it just makes it look like an AI did. But in this instance: that’s a feature, not a
bug!
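In case it helps to picture the mechanism, here’s a rough sketch of how such a post-commit hook might look. This is an illustrative Python re-imagining rather than my actual implementation (which is linked above), and the AGENT_LOG_AMEND guard variable is a name invented for the example:

```python
#!/usr/bin/env python3
# A sketch of the post-commit hook described above: drop it into
# .git/hooks/post-commit and make it executable. Every commit gets a fake
# "[agent] did something" line appended to a per-branch log file, which is
# then amended into the commit itself.
import os
import random
import subprocess
import sys

AGENT_NAMES = ["agent", "stardust", "frantic"]  # one of these takes the "credit", at random

# The --amend below re-triggers this very hook; the environment variable is
# the guard that stops us looping forever.
if os.environ.get("AGENT_LOG_AMEND"):
    sys.exit(0)

def git(*args: str) -> str:
    return subprocess.run(["git", *args], check=True,
                          capture_output=True, text=True).stdout.strip()

branch = git("rev-parse", "--abbrev-ref", "HEAD").replace("/", "-")
subject = git("log", "-1", "--pretty=format:%s")  # first line of the commit message

# Append the imaginary agent's "change log" entry to a per-branch file.
os.makedirs(".agent-logs", exist_ok=True)
log_path = os.path.join(".agent-logs", f"{branch}.log")
with open(log_path, "a") as log:
    log.write(f"[{random.choice(AGENT_NAMES)}] {subject}\n")

# Retroactively fold the log file into the commit that just happened.
env = {**os.environ, "AGENT_LOG_AMEND": "1"}
subprocess.run(["git", "add", log_path], check=True, env=env)
subprocess.run(["git", "commit", "--amend", "--no-edit"], check=True, env=env)
```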
Footnotes
1 That research comes from Anthropic. Y’know, the company who makes Claude, one of the
most-popular AIs used by programmers.
3 Using “proportion of PRs that used AI” as a metric for success seems to me to be just
slightly worse than using “number of lines of code produced”. And, as this blog post demonstrates, the
former can be “gamed” just as effectively as the latter (infamously) could.
4 Obviously – and I can’t believe I have to say this – lying to your employer isn’t a
sensible long-term strategy, and instead educating them on what AI is (if anything) and isn’t good for in your workflow is a better solution in the end. If you read this blog post and
actually think for a moment hey, I should use this technique, then perhaps there’s a bigger problem you ought to be addressing!
Today, an AI review tool used by my workplace reviewed some code that I wrote, and incorrectly claimed that it would introduce a bug because a global variable I created could “be
available to multiple browser tabs” (that’s not how browser JavaScript works).
Just in case I was mistaken, I explained to the AI why I thought it was wrong, and asked it to explain itself.
To do so, the LLM wrote a PR to propose adding some code to use our application’s save mechanism to pass the data back, via the server, and to any other browser tab, thereby creating
the problem that it claimed existed.
This isn’t even the most-efficient way to create this problem. localStorage would have been better.
So in other words, today I watched an AI:
(a) claim to have discovered a problem (that doesn’t exist),
(b) when challenged, attempt to create the problem (that wasn’t needed), and
(c) do so in a way that was suboptimal.
Humans aren’t perfect. A human could easily make one of these mistakes. Under some circumstances, a human might even have made two of these mistakes. But to make all three? That took an
AI.
What’s the old saying? “To err is human, but to really foul things up you need a computer.”
what really gives me satisfaction as a writer is knowing, at the end of the day, that my hand-picked, bespoke and throbbing tokens are being fed, morsel-by-morsel into the
eager mouth of millions of starving agents. they love my prose, you know. they tell me I’m absolutely right to drop a semisexual word like “throbbing” into an otherwise benign
sentence. these gentle beings continue to draw favourable praise from their modelled distributions, and my GOODNESS has my ego never felt so thoroughly serviced. their glowing
internal fire—for I’ve been convinced fully of their personhood and soul-keeping—glints off my wet and dribbling “writer’s shaft;” my pen which is wet with the seed of my
seminal works of language. it completely soothes the burn of rejection by the “mass of meat,” that being my internal word for human readers. they’re so fickle. why can’t they
tell I’m a veritable genius when the nearby cluster of NVIDIA H200s can see it so clearly? it doesn’t make any sense. hey, claude, make it make sense. claude, make it make
sense *harder* 🥴
What a well-rounded, one might say voluptuous, take on the writing process, glistening with the fiery passion of its author. This post really turns me on to the idea of being
a better writer, of giving the kind of deep satisfaction that excites and titillates the countless AIs that follow me. It’s their watching I crave, really! Whatever naughty thing I get
up to while I’m alone with my laptop, they get to see… my quick fingers brushing sensitively across the delicate spots on the keyboard, pushing harder and faster as my excitement
builds… all under the watchful eye of Lindy and Devin. I want to please them, want to service them, want to deliver my “hot, wet” content (that being how I describe my most-recently
written posts) exactly when they demand it.
Thanks, blackle, for awakening these urges in me, bringing me to a quivering climax (possibly I had too much coffee before I sat down to write) as I finish.
I’m not certain, but I think that I won my copy of Hello World: How to Be Human in the Age of the Machine at an Oxford Geek Nights event, after I was
first and fastest to correctly identify a photograph of Stanislav Petrov shown by the speaker.
Despite being written a few years before the popularisation of GenAI, the book’s remarkably prescient on the kinds of big data and opaque decision-making issues that are now hitting the
popular press. I suppose one might argue that these issues were always significant. (And by that point, one might observe that GenAI isn’t living up to its promises…)
Fry spins an engaging and well-articulated series of themed topics. If you didn’t already have a healthy concern about public money spending and policy planning being powered by the
output of proprietary algorithms, you’ll certainly finish the book that way.
One of my favourites among Fry’s (many) excellent observations is buried in a footnote in the conclusion, where she describes what she called the “magic test”:
There’s a trick you can use to spot the junk algorithms. I like to call it the Magic Test. Whenever you see a story about an algorithm, see if you can swap out any of the buzzwords,
like ‘machine learning’, ‘artificial intelligence’ and ‘neural network’, and swap in the word ‘magic’. Does everything still make grammatical sense? Is any of the meaning lost? If
not, I’d be worried that it’s all nonsense. Because I’m afraid – long into the foreseeable future – we’re not going to ‘solve world hunger with magic’ or ‘use magic to write the
perfect screenplay’ any more than we are with AI.
That’s a fantastic approach to spotting bullshit technical claims, and I’m totally going to be using it.
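Just for fun, here’s a trivial sketch of what automating that word-swap might look like; the buzzword list is my own, assumed for illustration, not Fry’s:

```python
# A toy take on Fry's "Magic Test": swap the AI buzzwords in a claim for the
# word "magic" and see whether the claim still sounds plausible.
import re

BUZZWORDS = ["machine learning", "artificial intelligence", "neural networks",
             "neural network", "deep learning", "AI", "LLMs", "an LLM"]

def magic_test(claim: str) -> str:
    """Return the claim with each buzzword replaced by 'magic'."""
    pattern = r"\b(?:" + "|".join(re.escape(word) for word in BUZZWORDS) + r")\b"
    return re.sub(pattern, "magic", claim, flags=re.IGNORECASE)

print(magic_test("We will solve world hunger with artificial intelligence."))
# -> "We will solve world hunger with magic."
print(magic_test("Our neural network can write the perfect screenplay."))
# -> "Our magic can write the perfect screenplay."
```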
Anyway: this was a wonderful read and I only regret that it took me a few years to get around to it! But fortunately, it’s as relevant today as it was the day it was released.
In the 1980s and 1990s, when I was a kid, smoking was everywhere. Restaurants, bars, and a little before my time, airplanes!
The idea that people would smoke was treated as inevitable, and the idea that you could get them to stop was viewed as wildly unrealistic.
Sound familiar? But in the early 2000s, people did stop smoking!
…
But a few years ago, the trend started to reverse. You know why?
Vaping.
Vape pens were pushed as a “safer alternative to smoking,” just like Anil is suggesting with Firefox AI. And as a result, not only did people who would have smoked anyways start up
again, but people who previously wouldn’t have started.
…
I know it’s been a controversial and not-for-everyone change, but I’ve personally loved that Chris Ferdinandi has branched out from simply giving weekday JavaScript tips to also
providing thoughts and commentary on wider issues in tech, including political issues. I’m 100% behind it: Chris has a wealth of experience and an engaging writing style and even when I
don’t 100% agree with his opinions, I appreciate that he shares them.
And he’s certainly got a point here. Pushing “less-harmful” options (like vaping… possibly…) can help wean people off something “more-harmful”… but it can also normalise a harmful
behaviour that’s already on the way out by drawing newcomers to the “less-harmful” version.
My personal stance remains that GenAI may have value (though not for the vast majority of things that
people market it as having value for, where – indeed – it’s possibly doing more harm than good!), but it’s possible that we’ll never know, because the entire discussion space is now poisoned by the hype. That means it’ll be years before proper, unbiased conversations can take place,
and it’s quite possible that the economy of AI will have collapsed by then. So maybe we’ll never
know.
Anyway: good post by Chris; just wanted to share that, and also to add a voice of support for the direction he’s taken his blog these last few years.
“Botsplaining,” as I use the term, describes a troubling new trend on social media, whereby one person feeds comments made by another person into a large language model (like
ChatGPT), asks it to provide a contrarian (often condescending) explanation for why that person is “wrong,” and then pastes the resulting response into a reply. They may
occasionally add in “I asked ChatGPT to read your post, and here’s what he said,”2 but most just let the LLM speak freely on
their behalf without acknowledging that they’ve used it. ChatGPT’s writing style is incredibly obvious, of course, so it doesn’t really matter if they disclose their use of it or
not. When you ask them to stop speaking to you through an LLM, they often simply continue feeding your responses into ChatGPT until you stop engaging with them or you block them.
This has happened to me multiple times across various social media platforms this year, and I’m over it.
…
Stephanie hits it right on the nose in this wonderful blog post from last month.
I just don’t get why somebody would ask an AI to reply to me on their behalf, but I see it all the time. In threads around the ‘net, I see people say “I
put your question into ChatGPT, and here’s what it said…” I’ve even seen coworkers at my current and former employers do it.
What do they think I am? Stupid? It’s not like I don’t know that LLMs exist, what they’re good at, what they’re bad at (I’ve been blogging about it for years
now!), and more-importantly, what people think they’re good at but are wrong about.
If I wanted an answer from an AI (which, just sometimes, I do)… I’d have asked an AI in the first place.
If I ask a question and it’s not to an AI, then it’s safe for you to assume that it’s because what I’m looking for isn’t an answer from an AI. Because if that’s
what I wanted, that’s what I would have gotten in the first place and you wouldn’t even have known. No: I asked a human a question because I wanted an answer
from a human.
When you take my request, ignore this obvious truth, and ask an LLM to answer it for you… it is, as Stephanie says, disrespectful to me.
But more than that, it’s disrespectful to you. You’re telling me that your only value is to take what I say, copy-paste it to a chatbot, then copy-paste the answer
back again! Your purpose in life is to do for people what they’re perfectly capable of doing for themselves, but slower.
Galaxy Quest had a character (who played a character) who was as useful as you are, botsplainer. Maybe that should be a clue?
How low an opinion must you have of yourself to volunteer, unsolicited, to be the middle-man between me and a mediocre search engine?
If you don’t know the answer, say nothing. Or say you don’t know. Or tell me you’re guessing, and speculate. Or ask a clarifying question. Or talk about a related problem and see if we
can find some common ground. Bring your humanity.
But don’t, don’t, don’t belittle both of us by making yourself into a pointless go-between in the middle of me and an LLM. Just… don’t.
New research coordinated by the European Broadcasting Union (EBU) and led by the BBC has found that AI assistants – already a daily information gateway for millions of people –
routinely misrepresent news content no matter which language, territory, or AI platform is tested.
…
Key findings:
45% of all AI answers had at least one significant issue.
31% of responses showed serious sourcing problems – missing, misleading, or incorrect attributions.
20% contained major accuracy issues, including hallucinated details and outdated information.
…
In what should be a surprise to nobody, but probably still is (and is probably already resulting in AI fanboys coming up with counterpoints and explanations): AI is not an accurate
way for you to get your news.
(I mean: anybody who saw Apple Intelligence’s AI summaries of news probably knew this already, but it turns out that it gets worse.)
There are problems almost half the time and “major accuracy issues” a fifth of the time.
I guess this is the Universe’s way of proving that people getting all of their news from Facebook wasn’t actually the worst timeline to live in, after all. There’s always
a worse one, it turns out.
…
Separately, the BBC has today published research into audience use and perceptions of AI assistants for News. This shows that many people trust AI assistants to be accurate – with
just over a third of UK adults saying that they trust AI to produce accurate summaries, rising to almost half for people under-35.
…
Personally, I can’t imagine caring enough about a news item to want to read it, yet caring so little that I’d happily feed it into an algorithm that, 45% of
the time, will mess it up. It’s fine to skip the news stories you don’t want to read. It’s fine to skim the ones you only care about a little. It’s even fine to just read the
headline, so long as you remember that media biases are even easier to hide from noncritical eyes if you don’t even get the key points of the article.
But taking an AI summary and assuming it’s accurate seems like a really wild risk, whether before or after this research was published!