NHS England rushes to hide software over AI hacking fears

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

NHS England has issued new guidance to staff, which has been shared with New Scientist, that demands existing and future software be pulled from public view and kept behind closed doors. “All source code repositories must be private by default. Repositories must not be public unless there is an explicit and exceptional need, and public access has been formally approved,” says the new guidance. The deadline for making code private is 11 May.

Last month, an AI created by Anthropic called Mythos was widely reported to be capable of discovering flaws in virtually any software, potentially allowing hackers to break into systems running it.

NHS England’s guidance specifically points to Mythos as the cause for the new measures.

Yet again, “AI” is the reason why we can’t have nice things on an open and transparent Web.

This is bad, of course. But the worst part is the illusion it helps feed that closed-source software is necessarily more-secure than open-source software. Obviously it’s all much more-complex than that. Indeed, the article goes on to quote Terence Eden thoroughly debunking the entire line of thought:

“Is it possible that Mythos will scan a repository and find a bug? Yes, 100 per cent likely. Is that going to be a bug that causes a security issue in a live NHS service somewhere? Almost certainly not,” says Eden. “I think it’s someone in NHS England buying into the hype that Mythos is going to cause the end of security as we know it and getting a bit panicked.”

He’s right. This policy change is unlikely to improve the security of any of the affected pieces of NHS software (for much of which, the code is already out-there and archived, and so removing it from the Internet now is pretty pointless). If it’s going to be attacked, it’ll be attacked, and the resources that the bad guys have for probing a whole database’s worth of CVEs or fuzz-testing the extremities make the availability of vulnerability-scanning AI pretty-close to irrelevant.

At least if it were open source then the good guys would have a chance of helping out… as well as we, the taxpayers who made the software possible, being able to see where our money was going!

Altogether a bad move by the NHS, here.

rejecting convenience

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

why bother going to the brick-and-mortar store? amazon is more “convenient”. why bother cooking a nice meal for yourself? doordash and uber eats are more “convenient”. why go out and socialize with people? facebook is more “convenient”. why use a digital camera, camcorder, or polaroid? your smartphone is more “convenient”. why bother going to the theater or concerts? netflix and spotify are more “convenient”. why bother making art? asking an AI to generate it for you is more “convenient”.

well, i say nuts to that. from now on, i’m going to make my life as inconvenient as possible. i’m going to go to the store and buy stuff in person. i’m going to make my own food with my own hands. i’m going to socialize with people face-to-face. i’m going to use a true camera instead of my phone’s camera. i’m going to buy blu-rays, DVDs, and CDs instead of streaming. i’m going to take my time when creating, watching, playing, and reading a work of art.

I’m seeing a growing movement in indieweb, revivalist, and adjacent circles that expresses RNotté’s sentiment: that the endless (and highly-marketable) quest for increased convenience in our lives has gained us free time, but we’ve lost something along the way.

What we’ve lost varies from case to case, but includes freedom (from lock-in to subscription services), creative satisfaction (from convenient “artistic” expression), privacy (from becoming the product, packaged-up by big-data advertising-funded tools), and social interactions (from so much of “social” media).

But reading RNotté share their thoughts on the matter today was the first time that it’s reminded me of The Matrix.

Framegrab from The Matrix. In the foreground is the silhouette of Morpheus, who is about to be interrogated by Agent Smith, a man in a suit at the windowed far end of an office.
The connection was probably helped by the fact that I rewatched the film pretty recently.

There’s a bit where Agent Smith says, to his captive the rebel captain Morpheus:

Did you know that the first Matrix was designed to be a perfect human world? Where none suffered, where everyone would be happy. It was a disaster. No one would accept the program. Entire crops were lost. Some believed we lacked the programming language to describe your perfect world.

Smith goes on to elucidate that his personal explanation for this fault was that humans depend upon suffering and misery, while acknowledging that there are other explanations. And perhaps we’ve touched upon one.

Perhaps humans – all humans – have a limit to how much they’re willing to accept convenience as compensation. Connected humans in The Matrix gain a convenient life, superficially superior to the struggle for survival experienced by humans living in the real world, short on food and hunted by machines. But to get that, they trade away their individual ability to become aware of the truth and, collectively, the ability for humanity to shape its own destiny. But there’s something about the imbalance of power in the arrangement that niggles in human minds, and some rebel against the established order… and are joined by others who are shown that an alternative is available.

Clearly – as RNotté and others show – faceless technological forces need not go quite so far as enslaving an entire species before “convenience” ceases to be tolerable compensation!

I’m not convinced that seeking out inconvenience is in itself a good. But questioning what your conveniences are worth and what you’re paying for them… that’s definitely worthwhile.


Once You’re Asking the Right Question, You Don’t Need To Ask!

Folks at work have been encouraging me to make more use of generative AI in my workflow1; going beyond my current “fancy autocomplete” use and giving my agents more autonomy. My experience of such “vibe coding” so far has been… mixed2, but I promised I’d revisit it.

One thing that these models are usually effective at is summarisation3. This is valuable if you’re faced with a large and unfamiliar codebase and you’re looking to trace a particular thing but you’re not certain where it is or what it’ll be called. While they’re not always fast, these tools can at least work in the background, which allows the developer to get on with something else while the agent trawls logs, code, and configuration to find and explain a fuzzily-defined thing.

Recently, I had a moment which I thought might be such an instance… but it didn’t turn out quite the way I expected. Here’s the story4:

The broken dev env

I’d been drafted into an established and ongoing project to provide more hands, following a coworker’s departure last week. This project touches parts of our (sprawling, microservices-based) infrastructure that I hadn’t looked at before, so there was a lot I didn’t yet know.

I picked an issue that had belonged to my former colleague that QA had rejected and set out to retrace their steps: to replicate the problem that the QA engineers had identified and in doing so learn more about the underlying process.  I spun up my development environment and tried to follow the steps.

A popup error message saying "Oops, something went wrong. Please try again."
The process failed… but much earlier than QA had said it would. Clearly my development environment was at fault, or at least not representative of their setup.

But I couldn’t even get as far as their problem before my frontend barfed out an error message. Sigh! Probably there’s some configuration I’ve missed somewhere in the myriad microservices, or else the data I’m testing with isn’t a fair reflection of what they’re doing as-standard.

Following some staff changes, I have no teammates on this side of the Atlantic who could help me decipher this: a “quick question on Slack” wouldn’t solve this one until hours from now. It was time to start debugging!

But… maybe Claude could help? It’s got access to almost all the same code, logs, tools and browser windows I do. I started typing:

✨ What’s up next, Dan?

In my development environment for https://service.dev/asset/new, when I click “Save”, I see the error “Oops, something went wrong.” Why?

Context is key

It’s quite possible that Claude would have gone away, had a “think”, done some tests, and then come back to me with a believable answer. It might even have been correct, and I’d have been able to short-cut my way back to productivity (and I’d have time to make a mug of coffee and finish reading my emails while it did so). Then, I’d just have to check that it was right, make the change, and get on with things.

But I realised that it’d probably work faster (and cheaper, and using less energy) if it had slightly more context from the get-go, so I elaborated. The first thing I’d want to know if I were debugging this is what was actually happening behind the scenes. I dipped into my browser’s Network debugger and extracted the relevant output, adding it to my prompt:

✨ What’s up next, Dan?

In my development environment for https://service.dev/asset/new, when I click “Save”, I see the error “Oops, something went wrong.” Why? The payload POSTed to the server is { content: 'test1', audience: [ 'one' ], status: 'draft' } and the response is an HTTP 500 with the following stack trace: pasted 94 lines

That’s more like it, now I could let it get on with its work. But wait…

Rubberducking

There’s a concept in computer programming called “rubberducking”. The name comes from an anecdote in The Pragmatic Programmer about a developer who, when stuck on a problem, would explain the code line-by-line to a rubber duck. The thinking is that talking-through a problem, even to someone (or something) who doesn’t understand it, can lead the speaker to insights they were otherwise missing.

I’ve done it myself many, many times: recruiting a convenient colleague or friend and talking them through the technical problem I was faced with, and inviting them to ask me to go into greater detail if I seemed to be skimming over anything, and I can promise that it can work.

A witch is happy and proud of her invention - a rubber duck. She explains to her friend: I just figured that formulating my questions out loud helps me to solve them, and finally that's all I needed.
I discovered Mini Fantasy Theater recently and loved this episode from its backlog.

The panel above is part of a series in which a sorceress called Cepper is coerced by her university into using Avian Intelligence (“AI”) – a robotic parrot5 that her headmaster insists is the future of magic. She experiments with it, finds it occasionally useful but more-often frustrating, attempts to implement her own local version but finds that troublesome in different ways, and eventually settles on using an inanimate rubber duck instead. I get it, Cepper!

Let’s put that distraction aside for a moment and get back to the story of my broken development environment.

Clues in the stack trace

The top entry in the stack trace was an unsuccessful call to a different microservice, so I figured I’d pull its logs too, in order to further help direct the AI in the right direction6:

✨ What’s up next, Dan?

In my development environment for https://service.dev/asset/new, when I click “Save”, I see the error “Oops, something went wrong.” Why? The payload POSTed to the server is { content: 'test1', audience: [ 'one' ], status: 'draft' } and the response is an HTTP 500 with the following stack trace: pasted 94 lines. The stack trace suggests that a call is being made to the dojo backend service, where the following error log looks relevant: pasted 9 lines

I haven’t tried it, but I’m pretty confident that the LLM, after much number-crunching and a little warming-up of some datacentre somewhere, would get to the answer. But again, I found something niggling inside me: the second-from-top line in the dojo logs suggested that a connection was being made to a further, deeper microservice.

I should pull its logs too, I figured.

The final puzzle piece

As an aide mémoire – in a way I’ve taken to doing when taking notes or when talking to AI – I first typed out what I was going to provide. This is useful if, for example, somebody distracts me at a key moment: it means I’ve got a jumping-off point predefined by my past self:

✨ What’s up next, Dan?

In my development environment for https://service.dev/asset/new, when I click “Save”, I see the error “Oops, something went wrong.” Why? The payload POSTed to the server is { content: 'test1', audience: [ 'one' ], status: 'draft' } and the response is an HTTP 500 with the following stack trace: pasted 94 lines. The stack trace suggests that a call is being made to the dojo backend service, where the following error log looks relevant: pasted 9 lines. It’s calling osiris, which says:

I dipped into the directory for osiris, and before I even got to the logs I spotted a problem: that microservice was on an old feature branch. How odd! I switched to the main branch and… everything started working.

The entire event took only a few minutes. I’d find some information, type it into Claude’s input field, realise that more information could be valuable, and repeat.

By the time I’d finished describing the problem, I’d discovered the solution. That’s the essence of successful rubberducking. I didn’t need the AI at all. All I needed was the illusion of something that might be able to help if I just talked through what I was thinking.

I don’t know what the moral is, here.

I wonder if I’d have been as effective had I just typed into my text editor. I suppose I would have, but I wonder if I’d have been motivated to do so in the first place? I’ve tried rubberducking before by talking to an imaginary person, but I’ve never tried typing to one7; maybe I should start?

Footnotes

1 I’m pretty sure every engineering department nowadays has its rabid fanboys, but I’m pleased that for the most part my colleagues take a more-pragmatic and realistic outlook: balancing the potential benefits of LLM-assisted coding with its many shortfalls, downsides, and risks.

2 My experience of vibe-coding in a nutshell: LLMs are great at knocking out the easy 80% of any engineering problem, but often in a way that makes the remaining 20% – already the hard part – harder than it would have been if a human had done the first 80% (especially if it’s the same human and they can bring their learnings with them)… and I’m definitely not the only one who’s found that. I also suspect that the unsatisfying and unimproving task of shepherding a flock of agents to write code and then casually reviewing it is not significantly more-productive (which research backs up) and results in a significantly increased regression rate… but I’m ready to be proven wrong when more studies come out. In short: I continue to think that GenAI isn’t useless, but neither is it necessarily always worthwhile.

3 So long as what you’ve got them summarising is something you can later verify!

4 I’ve taken huge liberties with the strict factual accuracy to make this more-readable as well as to not-expose things I probably oughtn’t. So before you swoop in to criticise my prompt-fu (not that I asked you, but I know there’s somebody out there who’s thinking about doing this right now), please note that none of the text in this page is what I actually wrote to the AI; it’s a figurative example.

5 A literal stochastic parrot, one might say!

6 I’d had an experience just the previous week in which it’d gone off on completely the wrong track, attempting to change code in order to “fix” what was ultimately a configuration or data problem, and so I thought it might be useful to give it some rails to follow, to start with.

7 Except insofar as this AI agent is an “imaginary person”, which is possibly already a step too far in implying personhood, for my liking!


The machines are fine. I’m worried about us.

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Unlike Alice, who spent the year reading papers with a pencil in hand, scribbling notes in the margins, getting confused, re-reading, looking things up, and slowly assembling a working understanding of her corner of the field, Bob has been using an AI agent. When his supervisor sent him a paper to read, Bob asked the agent to summarize it. When he needed to understand a new statistical method, he asked the agent to explain it. When his Python code broke, the agent debugged it. When the agent’s fix introduced a new bug, it debugged that too. When it came time to write the paper, the agent wrote it. Bob’s weekly updates to his supervisor were indistinguishable from Alice’s. The questions were similar. The progress was similar. The trajectory, from the outside, was identical.

Here’s where it gets interesting. If you are an administrator, a funding body, a hiring committee, or a metrics-obsessed department head, Alice and Bob had the same year. One paper each. One set of minor revisions each. One solid contribution to the literature each. By every quantitative measure that the modern academy uses to assess the worth of a scientist, they are interchangeable. We have built an entire evaluation system around counting things that can be counted, and it turns out that what actually matters is the one thing that can’t be.

The strange thing is that we already know this. We have always known this. Every physics textbook ever written comes with exercises at the end of each chapter, and every physics professor who has ever stood in front of a lecture hall has said the same thing: you cannot learn physics by watching someone else do it. You have to pick up the pencil. You have to attempt the problem. You have to get it wrong, sit with the wrongness, and figure out where your reasoning broke. Reading the solution manual and nodding along feels like understanding. It is not understanding. Every student who has tried to coast through a problem set by reading the solutions and then bombed the exam knows this in their bones. We have centuries of accumulated pedagogical wisdom telling us that the attempt, including the failed attempt, is where the learning lives. And yet, somehow, when it comes to AI agents, we’ve collectively decided that maybe this time it’s different. That maybe nodding at Claude’s output is a substitute for doing the calculation yourself. It isn’t. We knew that before LLMs existed. We seem to have forgotten it the moment they became convenient.

Centuries of pedagogy, defeated by a chat window.

This piece by Minas Karamanis is excellent throughout, and if you’ve got the time to read it then you should. He’s a physics postdoc, and this post comes from his experience in his own field, but I feel that the concerns he raises are more-widely valid, too.

In my field – of software engineering – I have similar concerns.

Let’s accept for a moment that an LLM significantly improves the useful output of a senior software engineer (which is very-definitely disputed, especially for the “10x” level of claims we often hear, but let’s just take it as-read for now). I’ve experimented with LLM-supported development for years, in various capacities, and it certainly sometimes feels like they do (although it sometimes also feels like they have the opposite effect!). But if it’s true, then yes: an experienced senior software engineer could conceivably increase their work performance by shepherding a flock of agents through a variety of development tasks, “supervising” them and checking their work, getting them back on-course when they make mistakes, approving or rejecting their output, and stepping in to manually fix things where the machines fail.

In this role, the engineer acts more like an engineering team lead, bringing their broad domain experience to maximise the output of those they manage. Except who they manage is… AI.

Again, let’s just accept all of the above for the sake of argument. If that’s all true… how do we make new senior developers?

Junior developers can use LLMs too. And those LLMs will make mistakes that the junior developer won’t catch, because the kinds of mistakes LLMs make are often hard to spot and require significant experience to identify. But if they’re encouraged to use LLMs rather than making mistakes by hand and learning from them – to keep up, for example, or to meet corporate policies – then these juniors will never gain the essential experience they’ll one day need. They’ll be disenfranchised of the opportunity to grow and learn.

It’s yet to be proven that more-sophisticated models will “solve” this problem, but my understanding is that issues like hallucination are fundamentally unsolvable: you might get fewer hallucinations in a better model, but that just means that those hallucinations that slip through will be better-concealed and even harder to identify in code review or happy-path testing.

Maybe – maybe – the trajectory of GPTs is infinite, and they’ll keep getting “smarter” to the point at which this doesn’t matter: programming genuinely will become a natural language exercise, and nobody will need to write or understand code at all. In this possible reality, the LLMs will eventually develop entire new programming languages to best support their work, and humans will simply express ideas and provide feedback on the outputs. But I’m very sceptical of that prediction: it’s my belief that the mechanisms by which LLMs work have a fundamental ceiling – a capped level of sophistication that can be approached but never exceeded. And sure, maybe some other, different approach to AI might not have this limitation, but if so then we haven’t invented it yet.

Which suggests that we will always need experienced engineers to shepherd our AIs. Which brings us back to the fundamental question: if everybody uses AI to code, how do we make new senior developers?

I have other concerns about AI too, of course, some of which I’ve written about. But this one’s top-of-mind today, thanks to Minas’ excellent article. Go read it to learn more about how physics research faces a similar threat… and, perhaps, consider how your own field might need to face this particular challenge.

People are not friction

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

The Gell-Mann Amnesia Effect of AI is a pretty well documented phenomenon:

The Gell-Mann amnesia effect is a cognitive bias describing the tendency of individuals to critically assess media reports in a domain they are knowledgeable about, yet continue to trust reporting in other areas despite recognizing similar potential inaccuracies.

Summarizing, AI sounds like an incredible genius synthesizing the world’s knowledge right up until you ask it about the thing you know about, then it’s an idiot. Even knowing about this phenomenon and having experienced it countless times, LLMs have an intoxicating quality to them.

I remember one time, maybe in the mid-1990s, when I saw a shopping channel (remember those? oh god, they’re still a thing, aren’t they?) where the host was trying to sell a personal computer. And… clearly, they knew absolutely nothing about it. They kept hitting on the same two or three talking points they’d been given (“mention the quad-speed CD-ROM drive!”) and fumbling their way through, and it gave me a revelation:

I knew enough about computers that I could see that the presenter was bullshitting their way through the segment. But there are plenty of things that I don’t know much about, which are also sold on this same show. Duvets, jewellery, glassware… I’m nowhere near as much an expert on these as I was on PC featuresets. Is there something inherently incomprehensible about computers? No. So it’s reasonable to assume that these salespeople probably know equally-little about everything they sell, it’s just that I don’t have the knowledge base to be able to see that.

That’s what GenAI often feels like, to me. Having collated all of the publicly-available knowledge it could find into its model doesn’t make it smarter than the smartest humans; it brings it to something probably slightly-above-the-average in any given subject. If I ask an LLM about something that I don’t understand well, it often produces highly-believable answers, but if I ask it about something that I’m an expert in, it can come off as a fool.

I’m very interested in how we teach information literacy in this new world of rapidly-generated highly-believable nonsense.

Anyway: Dave’s post doesn’t go in that direction – instead, he’s got some clever thoughts about how the “convenience” of a “good enough” AI-driven solution to any given problem risks us seeing humans as the friction point, which ultimately works against those very humans who are looking to benefit from the technology:

We need experts to share what they know and improve the quality of our work, generated or otherwise. We even need idiots to make sure we can break ideas down into their simplest form that everyone, agents or human, understand. People can have bad attitudes, be shitty, and have wrong opinions… but people are not friction. An LLM may be able to autocorrect its way into a plausible human response, but it’s not people. It doesn’t care if it’s right or wrong.

It’s an easy and worthwhile read.

Reply to: I’m OK being left behind, thanks!

This is a reply to a post published elsewhere. Its content might be duplicated as a traditional comment at the original source.

Terence Eden said:

Many years ago, someone tried to get me into cryptocurrencies. “They’re the future of money!” they said. I replied saying that I’d rather wait until they were more useful, less volatile, easier to use, and utterly reliable.

“You don’t want to get left behind, do you?” they countered.

That struck me as a bizarre sentiment. What is there to be left behind from? If BitCoin (or whatever) is going to liberate us all from economic drudgery, what’s the point of “getting in early”? It’ll still be there tomorrow and I can join the journey whenever it is sensible for me.

100%. If I “get in early” on something, it’s because that thing interests me, not because I’m betting on its future. With a hundred new ideas a day and only one of them “making it”, it’s a fool’s game to try to jump on board every bandwagon that comes along.

With cryptocurrencies, though, I’m fortunate enough to have an even better comeback for the cryptobros who try to shill me whatever made-up currency they’re “investing” in today: I’ve already done better at them than they ever will.

When Bitcoin first appeared, I took a technical interest in it. I genuinely never anticipated it’d take off (I made the same incorrect guess with MP3s, too!), but I thought it was a fun concept to play about with. The only Bitcoins I ever paid for must’ve been worth an average of 50p each, or so.

I sold my entire wallet of Bitcoins when they hit around £750 each. I know a tulip economy when I see one, I thought. Plus: I was no longer interested in blockchains now I was seeing how they were actually being used: my interest had been entirely in the technology and its applications, not in the actual idea of a currency!

Sure, I kick myself occasionally, given that I later saw the value rise to tens of thousands of pounds each. But hey, I was never in it for the money anyway.

So yeah, I tell cryptobros: I already made a 1,500× return on cryptocurrency. And no, I’m not buying any cryptocurrencies any more. Whatever they think “getting in early” was, they’re wrong, because I was there years ahead of them and I wasn’t even doing it to “get in early”; I did it because it was interesting. And honestly, isn’t that a better story to be able to tell?

I feel the same way about the current crop of AI tools. I’ve tried a bunch of them. Some are good. Most are a bit shit. Few are useful to me as they are now.

If this tech is as amazing as you say it is, I’ll be able to pick it up and become productive on a timescale of my choosing not yours.

Yup, that’s the attitude I’m taking.

I play with new AI technologies, sometimes. I don’t do it because I’m afraid of being left behind because – as you say – if a technology is transformative, we’ll all get to catch up eventually.

Do you think that people who had smartphones first are benefitting today because they “got in early” on something that later became mainstream?

Of course they’re not. Their experience is eventually exactly the same as everybody else’s, just like it was for everybody who “got in early” on hype trains whose final station came early, like Compuserve GO-words, WAP, Beenz.com, WebTV, the CueCat, m-Commerce, HD-DVD, the JooJoo, or Google+.

A Random List of Silly Things I Hate

So apparently this is a thing now, so here I go:

  1. Websites that are just blank pages if the JavaScript doesn’t load from the CDN.1
  2. The misunderstanding that LLMs can somehow be a route to AGI.
  3. Computer systems that say my name is too short or my password is too long.2
  4. People being unwilling to discuss their wild claims, then later using the lack of discussion as evidence of widespread acceptance.
  5. When people balance the new toilet roll one atop the old one’s tube.3
A nearly-full roll of toilet paper perched atop an empty toilet roll tube on an open-ended spindle.
Come on! It would have been so easy!
  6. Shellfish. Why would you eat that!?
  7. People assuming my interest in computers and technology means I want to talk to them about cryptocurrencies.4
  8. Websites that nag you to install their shitty app. (I know you have an app. I’m choosing to use your website. Stop with the banners!)
  9. People who seem to only be able to drive at one speed.5
  10. The assumption that the fact I’m “sharing” my partner is some kind of compromise on my part; a concession; something that I’d “wish away” if I could. (It’s very much not.)
  11. Brexit.

Wow, that was strangely cathartic.

Footnotes

1 I have a special pet hate for websites that require JavaScript to render their images. Like… we’ve had the <img> tag since 1993! Why are you throwing it away and replacing it with something objectively slower, more-brittle, and less-accessible?

2 Or, worse yet, claiming that my long, random password is insecure because it contains my surname. I get that composition-based password rules, while terrible (even when they’re correctly implemented, which they’re often not), are a moderately useful model for people to whom you’d otherwise struggle to explain password complexity. I get that a password composed entirely of personal information about the owner is a bad idea too. But there’s a correct way to do this, and it’s not “ban passwords with forbidden words in them”. Here’s what you should do: first, strip any forbidden words from the password: you might need to make multiple passes. Second, validate the resulting password against your composition rules. If it fails, then yes: the password isn’t good enough. If it passes, then it doesn’t matter that forbidden words were in it: a properly-stored and used password is never made less-secure by the addition of extra information into it!
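Sketched in code, that approach might look something like this (the function names and the minimum-length threshold are my own illustration, not from any particular system):

```python
import re

def strip_forbidden(password: str, forbidden: list[str]) -> str:
    """Repeatedly remove forbidden words (case-insensitively) until none remain.
    Multiple passes matter: removing one word can reveal another occurrence."""
    previous = None
    while previous != password:
        previous = password
        for word in forbidden:
            password = re.sub(re.escape(word), "", password, flags=re.IGNORECASE)
    return password

def acceptable(password: str, forbidden: list[str], min_length: int = 12) -> bool:
    """Validate composition rules against the *stripped* password only."""
    return len(strip_forbidden(password, forbidden)) >= min_length

# A long random password isn't weakened by happening to contain a surname:
acceptable("xK9!edenQv2#mplo7Rt", ["eden"])  # True: still 15 chars once "eden" is gone
# ...but a password that's mostly made of forbidden words fails:
acceptable("eden1234", ["eden"])  # False: only "1234" survives the stripping
```

Note the loop doing multiple passes: stripping “abc” from “aabcbc” leaves “abc”, which itself needs stripping, which is exactly why a single-pass implementation gets this wrong.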

3 This is the worst of the toilet paper crimes, but there’s a lesser but more-common offence.

4 Also: I’m uninterested in whatever multiplayer shooter game you’re playing, and no I won’t fix your printer.

5 “You were doing 35mph in the 60mph limit, then you were doing 35mph in the 40mph limit, now you’re doing 35mph in the 20mph limit. Argh!”


Things I do when I’m writing code that don’t look like writing code

Non-exhaustive list of things I’m doing when I’m writing code, that don’t look like “writing code”:

  • thinking
  • researching
  • contextualising
  • testing
  • measuring
  • documenting
  • communicating
  • planning
  • future-proofing
  • educating
  • learning
  • expressing
  • anticipating
  • discovering
  • inventing
  • experimenting
  • debugging
  • analysing
  • monitoring

For all its faults, an AI agent might “write code” faster than me.

But that’s only a part of the process.

My typing speed is not the bottleneck.

Subverting AI Agent Logging with a Git Post-Commit Hook

Last night I was chatting to my friend (and fellow Three Rings volunteer) Ollie about our respective workplaces and their approach to AI-supported software engineering, and it echoed conversations I’ve had with other friends. Some workplaces, it seems, are leaning so-hard into AI-supported software development that they’re berating developers who seem to be using the tools less than their colleagues!

That’s a problem for a few reasons, principal among them that AI does not make you significantly faster but does make you learn less.1 I stand by the statement that AI isn’t useless, and I’ve experimented with it for years. But I certainly wouldn’t feel very comfortable working somewhere that told me I was underperforming if, say, my code contributions were less-likely than the average to be identifiably “written by an AI”.

Even if you’re one of those folks who swears by your AI assistant, you’ve got to admit that they’re not always the best choice.

Copilot review of some code on GitHub, in which it's telling me that I should have included an .agent-logs/... file in which my AI agent describes how it helped, but I'm responding to say that 'shockingly' I wrote it without the help of AI, and telling Copilot to shut up.
I ran into something a little like what Ollie described when an AI code reviewer told me off for not describing how my AI agent assisted me with the code change… when no AI had been involved: I’d written the code myself.2

I spoke to another friend, E, whose employers are going in a similar direction. E joked that at current rates they’d have to start tagging their (human-made!) commits with fake AI agent logs in order to persuade management that their level of engagement with AI was correct and appropriate.3

Supposing somebody like Ollie or E or anybody else I spoke to did feel the need to “fake” AI agent logs in order to prove that they were using AI “the right way”… that sounds like an excuse for some automation!

I got to thinking: how hard could it be to add a git hook that added an AI agent’s “logging” to each commit, as if the work had been done by a robot?4

Turns out: pretty easy…

Animation showing a terminal. The developer switches to a branch, adds two modifications, and commits them. Afterwards, the log and filesystem show that a log file has been created crediting (fictional) AI bot 'frantic' with the change.
To try out my idea, I made two changes to a branch. When I committed, imaginary AI agent ‘frantic’ took credit, writing its own change log. Also: asciinema + svg-term remains awesome.

Here’s how it works (with source code!). After you make a commit, the post-commit hook creates a file in .agent-logs/, named for your current branch. Each commit results in a line being appended to that file to say something like [agent] first line of your commit message, where agent is the name of the AI agent you’re pretending that you used (you can even configure it with an array of agent names and it’ll pick one at random each time: my sample code uses the names agent, stardust, and frantic).

There’s one quirk in my code. Git hooks only get the commit message (the first line of which I use as the imaginary agent’s description of what it did) after the commit has taken place. Were a robot really used to write the code, it’d have updated the file already by this point. So my hook has to do an --amend commit, to retroactively fix what was already committed. And to do that without triggering itself and getting into an infinite loop, it needs to use a temporary environment variable. Ignoring that, though, there’s nothing particularly special about this code. It’s certainly more-lightweight, faster-running, and more-accurate than a typical coding LLM.

Sure, my hook doesn’t attempt to write any of the code for you; it just makes it look like an AI did. But in this instance: that’s a feature, not a bug!

Footnotes

1 That research comes from Anthropic. Y’know, the company who makes Claude, one of the most-popular AIs used by programmers.

2 Do I write that much like an AI? Relevant XKCD.

3 Using “proportion of PRs that used AI” as a metric for success seems to me to be just slightly worse than using “number of lines of code produced”. And, as this blog post demonstrates, the former can be “gamed” just as effectively as the latter (infamously) could.

4 Obviously – and I can’t believe I have to say this – lying to your employer isn’t a sensible long-term strategy, and instead educating them on what AI is (if anything) and isn’t good for in your workflow is a better solution in the end. If you read this blog post and actually think for a moment hey, I should use this technique, then perhaps there’s a bigger problem you ought to be addressing!


To really foul things up you need an AI

Today, an AI review tool used by my workplace reviewed some code that I wrote, and incorrectly claimed that it would introduce a bug because a global variable I created could “be available to multiple browser tabs” (that’s not how browser JavaScript works).

Just in case I was mistaken, I explained to the AI why I thought it was wrong, and asked it to explain itself.

To do so, the LLM wrote a PR to propose adding some code to use our application’s save mechanism to pass the data back, via the server, and to any other browser tab, thereby creating the problem that it claimed existed.

This isn’t even the most-efficient way to create this problem. localStorage would have been better.

So in other words, today I watched an AI:
(a) claim to have discovered a problem (that doesn’t exist),
(b) when challenged, attempt to create the problem (that wasn’t needed), and
(c) do so in a way that was suboptimal.

Humans aren’t perfect. A human could easily make one of these mistakes. Under some circumstances, a human might even have made two of these mistakes. But to make all three? That took an AI.

What’s the old saying? “To err is human, but to really foul things up you need a computer.”

claude, make it make sense *harder*

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

what really gives me satisfaction as a writer is knowing, at the end of the day, that my hand-picked, bespoke and throbbing tokens are being fed, morsel-by-morsel into the eager mouth of millions of starving agents. they love my prose, you know. they tell me I’m absolutely right to drop a semisexual word like “throbbing” into an otherwise benign sentence. these gentle beings continue to draw favourable praise from their modelled distributions, and my GOODNESS has my ego never felt so thoroughly serviced. their glowing internal fire—for I’ve been convinced fully of their personhood and soul-keeping—glints off my wet and dribbling “writer’s shaft;” my pen which is wet with the seed of my seminal works of language. it completely soothes the burn of rejection by the “mass of meat,” that being my internal word for human readers. they’re so fickle. why can’t they tell I’m a veritable genius when the nearby cluster of NVIDIA H200s can see it so clearly? it doesn’t make any sense. hey, claude, make it make sense. claude, make it make sense *harder* 🥴

What a well-rounded, one might say voluptuous, take on the writing process, glistening with the fiery passion of its author. This post really turns me on to the idea of being a better writer, of giving the kind of deep satisfaction that excites and titillates the countless AIs that follow me. It’s their watching I crave, really! Whatever naughty thing I get up to while I’m alone with my laptop, they get to see… my quick fingers brushing sensitively across the delicate spots on the keyboard, pushing harder and faster as my excitement builds… all under the watchful eye of Lindy and Devin. I want to please them, want to service them, want to deliver my “hot, wet” content (that being how I describe my most-recently written posts) exactly when they demand it.

Thanks, blackle, for awakening these urges in me, bringing me to a quivering climax (possibly I had too much coffee before I sat down to write) as I finish.

Hello World by Hannah Fry

Cover image for Hello World: How to be Human in the Age of the Machine by Hannah Fry. The title and subtitles are interconnected by green lines in the style of a process flowchart.
I’m not certain, but I think that I won my copy of Hello World: How to Be Human in the Age of the Machine at an Oxford Geek Nights event, after I was first and fastest to correctly identify a photograph of Stanislav Petrov shown by the speaker.

Despite being written a few years before the popularisation of GenAI, the book’s remarkably prescient on the kinds of big data and opaque decision-making issues that are now hitting the popular press. I suppose one might argue that these issues were always significant. (And by that point, one might observe that GenAI isn’t living up to its promises…)

Fry spins an engaging and well-articulated series of themed topics. If you didn’t already have a healthy concern about public money spending and policy planning being powered by the output of proprietary algorithms, you’ll certainly finish the book that way.

One of my favourite of Fry’s (many) excellent observations is buried in a footnote in the conclusion, where she describes what she called the “magic test”:

There’s a trick you can use to spot the junk algorithms. I like to call it the Magic Test. Whenever you see a story about an algorithm, see if you can swap out any of the buzzwords, like ‘machine learning’, ‘artificial intelligence’ and ‘neural network’, and swap in the word ‘magic’. Does everything still make grammatical sense? Is any of the meaning lost? If not, I’d be worried that it’s all nonsense. Because I’m afraid – long into the foreseeable future – we’re not going to ‘solve world hunger with magic’ or ‘use magic to write the perfect screenplay’ any more than we are with AI.

That’s a fantastic approach to spotting bullshit technical claims, and I’m totally going to be using it.

Anyway: this was a wonderful read and I only regret that it took me a few years to get around to it! But fortunately, it’s as relevant today as it was the day it was released.


AI and cigarettes

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

In the 1980s and 1990s, when I was a kid, smoking was everywhere. Restaurants, bars, and a little before my time, airplanes!

The idea that people would smoke was treated as inevitable, and the idea that you could get them to stop was viewed as wildly unrealistic.

Sound familiar? But in the early 2000s, people did stop smoking!

But a few years ago, the trend started to reverse. You know why?

Vaping.

Vape pens were pushed as a “safer alternative to smoking,” just like Anil is suggesting with Firefox AI. And as a result, not only did people who would have smoked anyways start up again, but people who previously wouldn’t have started.

I know it’s been a controversial and not-for-everyone change, but I’ve personally loved that Chris Ferdinandi has branched out from simply giving weekday JavaScript tips to also providing thoughts and commentary on wider issues in tech, including political issues. I’m 100% behind it: Chris has a wealth of experience and an engaging writing style and even when I don’t 100% agree with his opinions, I appreciate that he shares them.

And he’s certainly got a point here. Pushing “less-harmful” options (like vaping… possibly…) can help wean people off something “more-harmful”… but it can also normalise a harmful behaviour that’s already on the way out by drawing newcomers to the “less-harmful” version.

My personal stance remains that GenAI may have value (though not for the vast majority of things that people market it as having value for, where – indeed – it’s possibly doing more harm than good!), but it’s possible that we’ll never know because the entire discussion space is poisoned now by the hype. That means it’ll be years before proper, unbiased conversations can take place, free of the hype, and it’s quite possible that the economy of AI will have collapsed by then. So maybe we’ll never know.

Anyway: good post by Chris; just wanted to share that, and also to add a voice of support for the direction he’s taken his blog these last few years.

We Need to Talk About Botsplaining

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

“Botsplaining,” as I use the term, describes a troubling new trend on social media, whereby one person feeds comments made by another person into a large language model (like ChatGPT), asks it to provide a contrarian (often condescending) explanation for why that person is “wrong,” and then pastes the resulting response into a reply. They may occasionally add in “I asked ChatGPT to read your post, and here’s what he said,”2 but most just let the LLM speak freely on their behalf without acknowledging that they’ve used it. ChatGPT’s writing style is incredibly obvious, of course, so it doesn’t really matter if they disclose their use of it or not. When you ask them to stop speaking to you through an LLM, they often simply continue feeding your responses into ChatGPT until you stop engaging with them or you block them.

This has happened to me multiple times across various social media platforms this year, and I’m over it.

Stephanie hits it right on the nose in this wonderful blog post from last month.

I just don’t get why somebody would ask an AI to reply to me on their behalf, but I see it all the time. In threads around the ‘net, I see people say “I put your question into ChatGPT, and here’s what it said…” I’ve even seen coworkers at my current and former employers do it.

What do they think I am? Stupid? It’s not like I don’t know that LLMs exist, what they’re good at, what they’re bad at (I’ve been blogging about it for years now!), and more-importantly, what people think they’re good at but are wrong about.

If I wanted an answer from an AI (which, just sometimes, I do)… I’d have asked an AI in the first place.

If I ask a question and it’s not to an AI, then it’s safe for you to assume that it’s because what I’m looking for isn’t an answer from an AI. Because if that’s what I wanted, that’s what I would have gotten in the first place and you wouldn’t even have known. No: I asked a human a question because I wanted an answer from a human.

When you take my request, ignore this obvious truth, and ask an LLM to answer it for you… it is, as Stephanie says, disrespectful to me.

But more than that, it’s disrespectful to you. You’re telling me that your only value is to take what I say, copy-paste it to a chatbot, then copy-paste the answer back again! Your purpose in life is to do for people what they’re perfectly capable of doing for themselves, but slower.

Galaxy Quest: Tawny Madison says "Gosh, I'm doing it. I'm repeating the damn computer."
Galaxy Quest had a character (who played a character) who was as useful as you are, botsplainer. Maybe that should be a clue?

How low an opinion must you have of yourself to volunteer, unsolicited, to be the middle-man between me and a mediocre search engine?

If you don’t know the answer, say nothing. Or say you don’t know. Or tell me you’re guessing, and speculate. Or ask a clarifying question. Or talk about a related problem and see if we can find some common ground. Bring your humanity.

But don’t, don’t, don’t belittle both of us by making yourself into a pointless go-between in the middle of me and an LLM. Just… don’t.


AI assistants misrepresent news content 45% of the time

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

New research coordinated by the European Broadcasting Union (EBU) and led by the BBC has found that AI assistants – already a daily information gateway for millions of people – routinely misrepresent news content no matter which language, territory, or AI platform is tested.

Key findings: 

  • 45% of all AI answers had at least one significant issue.
  • 31% of responses showed serious sourcing problems – missing, misleading, or incorrect attributions.
  • 20% contained major accuracy issues, including hallucinated details and outdated information.

In what should come as a surprise to nobody, but probably still does (and is probably already resulting in AI fanboys coming up with counterpoints and explanations): AI is not an accurate way for you to get your news.

(I mean: anybody who saw Apple Intelligence’s AI summaries of news probably knew this already, but it turns out that it gets worse.)

There are problems almost half the time and “major accuracy issues” a fifth of the time.

I guess this is the Universe’s way of proving that people getting all of their news from Facebook wasn’t actually the worst timeline to live in, after all. There’s always a worse one, it turns out.

Separately, the BBC has today published research into audience use and perceptions of AI assistants for News. This shows that many people trust AI assistants to be accurate – with just over a third of UK adults saying that they trust AI to produce accurate summaries, rising to almost half for people under-35.

Personally, I can’t imagine caring enough about a news item to want to read it, yet caring so little about it that I’d feed it through an algorithm that, 45% of the time, will mess it up. It’s fine to skip the news stories you don’t want to read. It’s fine to skim the ones you only care about a little. It’s even fine to just read the headline, so long as you remember that media biases are even easier to hide from noncritical eyes if you don’t even get the key points of the article.

But taking an AI summary and assuming it’s accurate seems like a really wild risk, whether before or after this research was published!