It has been a busy couple of weeks for analysts covering new AI launches: several impactful new models and agentic systems have been announced that, taken together, add up to a significant advance in capabilities.
Hot on the heels of Gemini 2.5 Pro, Google announced AI Mode for search, which is expected to use Gemini's capabilities to deliver a better search experience. It is currently available to some early adopters through Google Labs.
Google also released what seems to be the most powerful AI video generation tool yet – Veo 3 – and people are already exploring the uncanny valley of plausible synthetic videos, from (fake) news presenters to 30-second product adverts that cost very little to make.
Two major agentic AI coding tools were also released in the past couple of weeks. OpenAI released Codex, a powerful web-based coding agent that runs each task in its own sandboxed environment, where it can move through directories and run commands to automate more of a developer’s coding workflow.
And, more recently, Anthropic released Claude Code, which integrates with developers’ local command line setups to enhance the coding process. It can understand and amend large codebases, integrates with VS Code and JetBrains IDEs, and plays nicely with existing test suites and build systems. We have already seen examples of developers leaving Claude Code to grind away for hours at a time, coding, refactoring, testing and documenting its work. It seems genuinely impressive.
None of these new capabilities will end up being as cheap as the basic LLMs many people have used to date. Veo 3 requires a $250 per month subscription to be used properly. Claude Code starts at $100 per month, but for many use cases it needs a higher tier that costs double that. OpenAI’s Codex also seems to require a $200 per month Pro tier subscription. If you are brave enough, you can use these platforms with API token-based fees, but you need to know what you are doing to avoid unexpected costs.
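To make the token-billing point concrete, here is a minimal sketch using Anthropic's Python SDK that logs each call's reported token usage and converts it into an estimated cost. The per-token prices are illustrative placeholders, not current rates, and the model name is just an example – both are assumptions you would need to check against the provider's own documentation.

```python
# A minimal sketch of cost-aware API usage with Anthropic's Python SDK.
# The prices below are illustrative placeholders, not real rates --
# always check the provider's current price list before relying on them.
import anthropic

# Hypothetical prices in USD per million tokens (assumption, not real rates).
PRICE_PER_MTOK_INPUT = 3.00
PRICE_PER_MTOK_OUTPUT = 15.00

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def ask(prompt: str, model: str = "claude-sonnet-4-20250514") -> str:
    """Send one prompt and print an estimated cost based on reported usage."""
    message = client.messages.create(
        model=model,
        max_tokens=1024,  # a hard cap on output tokens limits worst-case cost
        messages=[{"role": "user", "content": prompt}],
    )
    usage = message.usage  # the API reports input/output token counts per call
    cost = (
        usage.input_tokens * PRICE_PER_MTOK_INPUT
        + usage.output_tokens * PRICE_PER_MTOK_OUTPUT
    ) / 1_000_000
    print(f"{usage.input_tokens} in / {usage.output_tokens} out ~= ${cost:.4f}")
    return message.content[0].text


if __name__ == "__main__":
    print(ask("Summarise the trade-offs of subscription vs API pricing."))
```

Capping max_tokens and logging usage on every call are the simplest guards against a runaway bill; agentic tools that loop for hours multiply whatever per-call cost you have accepted.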
Claude has also been upgraded, with the release of Opus 4 (Anthropic’s largest model) and Sonnet 4, which the company claims are its most intelligent general-purpose models so far, and which bring some welcome efficiency improvements. One experienced developer credits Opus 4 with finding and fixing a ‘white whale’ bug he had sunk 200 hours into over the past few years – in just 30 prompts.
Nathan Lambert has shared a very informative deep dive on Claude 4, looking at the differences between benchmark scores and real-world outcomes, and he believes Anthropic is emerging as the leader in software development.
Where Anthropic’s consumer touchpoints, i.e. chat apps, have been constantly behind ChatGPT, their enterprise and software tools, i.e. Claude Code, have been leading the pack (or relatively much better, i.e. the API). Anthropic is shipping updates to the chat interface, but they feel half-hearted relative to the mass excitement around Claude Code. Claude Code is the agent experience I liked the best over the few I’ve tried in the last 6 months. Claude 4 is built to advance this — in doing so it makes Anthropic’s path narrower yet clearer.
From software development to org development
Simon Willison always does a great job of keeping up with the emerging world of AI-enhanced coding, and his first impressions of using Claude Code have been positive, but as he also reminds us, you can’t just jump into a CLI and expect great results – you need to plan, design, discuss and manage the coding process, which is a lot more involved.
It’s going extremely well. So far Claude has helped get MySQL working on an older laptop (fixing some inscrutable Homebrew errors), disabled a CAPTCHA plugin that didn’t work on localhost, toggled visible warnings on and off several times and figured out which CSS file to modify in the theme that the site is using. It even took a reasonable stab at making the site responsive on mobile!

I’m now calling Claude Code honey badger on account of its voracious appetite for crunching through code (and tokens) looking for the right thing to fix.
There is no doubt that how we think of and approach software development is changing very rapidly. Internet veteran Tim O’Reilly has launched a new conference on this topic, and he put it this way:
It’s an extraordinary time to be in software development. After years of incremental advances that made the field feel somewhat predictable, we’re entering a period of radical innovation. The fundamental building blocks of how we create software are changing.
This isn’t just about using AI tools to write code faster—though that’s valuable. It’s about reimagining what software can do, who can create it, and how we approach problems that previously seemed intractable.
Hype and excitement around agentic AI have been front-running actual use cases and deployments for some time, but I think we are starting to see more clearly how game-changing it could be, especially in the enterprise.
Enterprise AI agents are moving beyond simple chatbots as interfaces to information and automation, and will mostly work in the background and with each other, rather than competing for human attention. For an overview of the various types of agents and agentic systems we can expect to see, this paper is a good starting point that helps clarify the terminology.
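To make the "agents working with each other in the background" idea concrete, here is a deliberately simplified sketch – with entirely hypothetical agent names and a toy message bus – of how one agent's output can become another agent's task without a human in the loop. It illustrates the orchestration pattern in general, not any particular vendor's framework.

```python
# A toy sketch of background agent hand-offs -- hypothetical names throughout,
# illustrating the orchestration pattern rather than any vendor's framework.
from dataclasses import dataclass
from queue import Queue


@dataclass
class Task:
    kind: str      # which agent should handle it, e.g. "triage" or "report"
    payload: str   # the work item itself


def triage_agent(task: Task) -> list[Task]:
    """Classifies an incoming item and emits follow-up work for other agents."""
    # In a real system this step would call an LLM; here we fake the decision.
    severity = "high" if "outage" in task.payload else "low"
    return [Task("report", f"{severity}: {task.payload}")]


def report_agent(task: Task) -> list[Task]:
    """Files the final summary; terminal step, emits no further tasks."""
    print(f"[report] filed -> {task.payload}")
    return []


HANDLERS = {"triage": triage_agent, "report": report_agent}


def run(bus: Queue) -> None:
    """Drain the bus, letting each agent's output feed the next agent."""
    while not bus.empty():
        task = bus.get()
        for follow_up in HANDLERS[task.kind](task):
            bus.put(follow_up)  # agents hand work to each other, not to a human


bus: Queue = Queue()
bus.put(Task("triage", "customer reports checkout outage"))
bus.put(Task("triage", "typo on pricing page"))
run(bus)
```

The point of the sketch is the shape of the flow: tasks move between specialised agents over a shared bus, and a human only sees the end result.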
As the analysts at Constellation wrote recently, this shift will shake up enterprise software, and we can expect several players to battle it out in the hope of owning the role of ‘agent orchestrator’ in the enterprise. In fact, as Deutsche Bank analysts recently suggested, Microsoft may be the Goldilocks option here for many firms, given its commitment to agentic AI in the workplace.
Even leaving the impact of AI to one side, enterprise software is ripe for change and has been under-delivering for some time, as this McKinsey analysis argues:
Enterprise technology spending in the United States has been growing by 8 percent per year on average since 2022. This surge is not surprising, given the increasing role technology plays in how businesses function and create value. The issue lies in what companies are getting for that spend, and the track record on that score is mixed.
While analysis linking tech spend to labor productivity is notoriously inexact, labor productivity has grown by close to 2 percent over the same period of time.
Instead of being passive, locked-in users of one-size-fits-all enterprise platforms, leading companies will do more of their own development, and with agentic AI they can adapt, augment and automate systems to their own specific requirements. As their systems and processes become more programmable as a result, the rest of the business will also start to move in the same direction.
Anthropic’s Sholto Douglas claims that in the next few years AI will be capable of automating almost every white-collar job, which echoes Sergey Brin’s recent statement that management is one of the easiest sets of tasks to do with AI.
What could go wrong?
At the same time, AI-enhanced software development clearly poses a lot of risks that need to be considered and mitigated. We have seen the amusing failures of Duolingo, the less amusing failures of fictitious legal analyses that can cause real-world harms, and the rather predictable back-pedalling of AI cheerleaders like Klarna, who tried to remove humans from the loop too quickly.
But these are nothing compared to what could happen when AI meets cyber-security threats or data leaks, or worse. In the United States, DOGE’s recklessness in grabbing sensitive data and running it through untested AI tools has probably created such wide-open backdoors for hostile state actors that the real level of penetration may be too great to admit. In business, cyber attacks such as the Marks & Spencer episode show how attackers can exploit small mistakes by IT contractors to paralyse entire retail operations, at an estimated cost of hundreds of millions of pounds.
We need smarter approaches to security, and also to AI governance. Perhaps we will end up in an agentic arms race, with white-hat and black-hat agents battling it out at the edges of our networks.
But we also need to urgently work on human preparedness, skills and learning.
The author Neal Stephenson recently shared his fears about AI at a talk in New Zealand:
Speaking of the effects of technology on individuals and society as a whole, Marshall McLuhan wrote that every augmentation is also an amputation. I first heard that quote twenty years ago from a computer scientist at Stanford who was addressing a room full of colleagues—all highly educated, technically proficient, motivated experts who well understood the import of McLuhan’s warning and who probably thought about it often, as I have done, whenever they subsequently adopted some new labor-saving technology.

Today, quite suddenly, billions of people have access to AI systems that provide augmentations, and inflict amputations, far more substantial than anything McLuhan could have imagined. This is the main thing I worry about currently as far as AI is concerned.

I follow conversations among professional educators who all report the same phenomenon, which is that their students use ChatGPT for everything, and in consequence learn nothing. We may end up with at least one generation of people who are like the Eloi in H.G. Wells’s The Time Machine, in that they are mental weaklings utterly dependent on technologies that they don’t understand and that they could never rebuild from scratch were they to break down. Earlier I spoke somewhat derisively of lapdogs. We might ask ourselves who is really the lapdog in a world full of powerful AIs.
For the past twenty years, governments and career advisors have tried to persuade young people to learn software development, which was seen as “the future”. Perhaps now that software development can be invoked in natural language – so that potentially anybody can do it – we can start to focus on the many important human skills and forms of critical thinking that will help us navigate this shift without losing our humanity.