Two weeks ago, Microsoft’s AI chief Mustafa Suleyman told the Financial Times that most white-collar work will be fully automated within 12 to 18 months.
Lawyers, accountants, project managers, marketing professionals – everyone who “sits at a computer.”
Anthropic’s CEO Dario Amodei has warned that AI could eliminate half of all entry-level office jobs. Ford’s Jim Farley predicts a halving of white-collar jobs in the US.
In early February, Anthropic launched Claude Cowork – an AI agent capable of performing legal work, among other things.
Thomson Reuters fell 16% in a single day, LegalZoom dropped 20%, and Atlassian lost 35% in a week. According to JP Morgan, it was the largest non-recession-driven decline in software equities in over 30 years.
These are dramatic claims. They deserve a serious economic response.
I should say upfront: I am not an AI sceptic. I use AI extensively in my own work – for research, drafting and editing, and for building a whole range of macroeconomic, financial and geopolitical simulation models.
This very article was produced with substantial AI assistance. It raises the quality and saves time. But the claim that all office work will disappear within 18 months rests on a fundamental misunderstanding of what large language models are, what they can do, and what they cost.
In short: AI is a tool, not a thinker. It can automate the routine part of office work, but not the thinking part. And even as an automation tool, it is not free: it requires capital and energy on a scale that makes it far from certain that it is cheaper than human beings.
Solow’s paradox, revisited
“You can see the computer age everywhere but in the productivity statistics.”
So wrote Nobel laureate Robert Solow in the New York Times in 1987.
Nearly 40 years later, his observation is once again strikingly relevant. In February 2026, a team of prominent economists – including Nicholas Bloom and Steven J. Davis of Stanford – published the first representative international study of AI’s actual effects at the firm level.
The data come from central bank-sponsored surveys in four countries – the US (Atlanta Fed), the UK (Bank of England), Germany (Bundesbank) and Australia – covering nearly 6,000 senior executives, recruited by telephone rather than through paid online panels (which are riddled with fraud).
Over 90 per cent report no effect on employment over the past three years. 89 per cent report no effect on productivity. The average employment effect is literally zero – 0.00 per cent – and the productivity effect a modest 0.29 per cent.
The Solow paradox is alive and well.
This should not surprise anyone who understands what a large language model actually is.
Language models are econometrics in disguise
A large language model such as ChatGPT or Claude is built on the transformer architecture from Vaswani et al.’s “Attention Is All You Need” (2017).
At its core, the technology is next token prediction: the model predicts the next word in a sequence based on statistical patterns in the training data. When ChatGPT produces an analysis of the causes of inflation, it does not do so because it understands inflation.
It does so because it has seen millions of texts where certain words appear together in certain patterns.
This is – and this is my central point – fundamentally no different from econometrics. A regression model says: given these variables, what is the most likely value of Y?
A large language model does the same: given these words, what is the most likely next word? The mathematics is more complex, the dimensionality higher, but the principle is identical.
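The analogy can be made concrete in a few lines of Python. Everything here is invented for illustration – toy regression coefficients, a three-word vocabulary and toy token scores, not anything a real model computes – but the operation is the same in both halves: pick the most likely outcome from a conditional distribution.

```python
import numpy as np

# Econometrics: given X, what is the most likely value of Y?
X = np.array([1.0, 2.5, 0.3])       # explanatory variables (toy values)
beta = np.array([0.4, -0.1, 2.0])   # "fitted" coefficients (toy values)
y_hat = X @ beta                     # conditional point prediction

# Language model: given the context, what is the most likely next word?
# Toy scores for continuations of "inflation ..." – pure illustration.
vocab = ["rises", "falls", "banana"]
logits = np.array([2.1, 1.3, -4.0])
probs = np.exp(logits) / np.exp(logits).sum()   # softmax over the vocabulary
next_token = vocab[int(np.argmax(probs))]        # greedy decoding

print(round(float(y_hat), 2))   # the regression's point prediction
print(next_token)               # the model's most likely next word
```

The second half is greedy decoding; real models usually sample from `probs` rather than always taking the argmax, but the object being computed – a conditional distribution over the next token – is the same.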
The financial sector has been using quantitative models for decades without calling it “intelligence.” VAR models, credit risk algorithms, algorithmic pricing – enormously useful tools that help trained analysts make better decisions. The AI revolution consists of democratisation and broader application, not a qualitative leap.
Gary Marcus, professor emeritus at NYU, has argued for over 25 years that neural networks lack the capacity for abstract reasoning.
His central claim – that language models operate on pattern recognition rather than genuine understanding, and that scaling does not solve this problem – was powerfully confirmed by Apple’s research paper “The Illusion of Thinking” (June 2025), which systematically tested reasoning models on controllable puzzles. At high complexity, all models – including the so-called “thinking” models – collapsed to zero per cent accuracy.
Even when given the correct algorithm as an explicit instruction, performance did not improve. Marcus’s conclusion stands: large language models do not build abstract models of the world. They are built to complete sentences.
The models have not fundamentally improved
The transformer architecture is the same in 2026 as it was in 2017.
The advances since ChatGPT 3.5 in November 2022 are primarily more training data, more parameters and better fine-tuning (RLHF). The underlying mechanism – next word prediction via statistical pattern recognition – is unchanged.
The hallucination problem is the best evidence. Three years and hundreds of billions of dollars later, the models still fabricate facts. They have become better at sounding authoritative, but the underlying tendency to invent things is intact. They lie more convincingly than a teenager about unfinished homework – politely but firmly insisting that their fabricated answers are correct.
And the fundamental failures are banal. Ask ChatGPT: “I need to wash my car. The car wash is 150 metres away. Should I walk or drive?”
ChatGPT advises you to walk – it’s only 150 metres. But the whole point is to wash the car: you need to bring it with you. This kind of basic logical understanding is still missing. It is difficult to see this technology replacing every bank clerk within 18 months.
The GPT-5 launch in August 2025 illustrated the point. Reddit threads titled “GPT-5 is horrible” received thousands of upvotes. 3,000 users signed a petition to regain access to the older GPT-4o.
On the prediction market Polymarket, the probability that OpenAI would have the best model dropped from 75% to 14% in a single hour. The event was dubbed “Gary Marcus Day” on social media.
Three mechanisms that undermine scaling
Any econometrics student learns about overfitting in the first semester. When you add more variables to a statistical model, R² always rises – but it is a deception. The model has learnt the noise along with the structure.
When you test it on new data, precision often falls. Precisely the same happens with language models. Scaling – more data, more parameters, more compute – yields better reproduction of what the model has seen, but not necessarily better handling of what it has not.
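The overfitting point can be demonstrated with nothing but NumPy. All the numbers below – the linear process, the noise level, the polynomial degrees, the number of trials – are arbitrary choices for illustration: we fit a parsimonious and a heavily overparameterised model to data whose true structure is linear, and compare average out-of-sample errors.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)

def avg_test_mse(deg, trials=300):
    """Average out-of-sample error of a degree-`deg` polynomial
    fitted to noisy draws from a process that is truly linear."""
    total = 0.0
    for _ in range(trials):
        y_train = 2.0 * x + rng.normal(0, 0.3, x.size)  # seen data
        y_test = 2.0 * x + rng.normal(0, 0.3, x.size)   # unseen data
        coefs = np.polyfit(x, y_train, deg)
        total += np.mean((np.polyval(coefs, x) - y_test) ** 2)
    return total / trials

simple = avg_test_mse(1)    # two parameters: matches the true structure
flexible = avg_test_mse(9)  # ten parameters: fits the noise as well

print(f"avg test MSE: simple={simple:.3f}, flexible={flexible:.3f}")
```

The bigger model always fits the training data better; on fresh draws it predicts worse, because much of what it learnt was noise.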
Then there is context rot. A Microsoft/Salesforce study from May 2025 documented that all tested language models performed 30-40% worse in multi-turn conversations than in single-turn queries.
Chroma Research demonstrated that performance degrades systematically with increasing context length — even when the model can identify all relevant information with 100% accuracy.
This is the opposite of genuine understanding. An economist who receives more information typically delivers a sharper analysis. A language model does the opposite. This is a fundamental limitation, not an engineering problem to be solved with more hardware.
And finally, model collapse – a problem Hans Christian Andersen described better than any AI researcher.
In his story “It’s Quite True!” a hen loses a single feather. The story is retold from perch to perch, and with each retelling it grows: one feather becomes five hens that have plucked each other to death from unrequited love. Andersen captured something fundamental: when information passes through successive stages, it is not just signal that is lost – new patterns are invented.
A 2024 Nature paper (Shumailov et al.) documented that when language models are trained on text generated by previous models, they degenerate through successive iterations. The rare phenomena in the tails of the distribution disappear first – the unexpected formulations, the unique perspectives, the nuances.
The dominant patterns are amplified and distorted. An ICLR 2025 Spotlight paper sharpened the conclusion: even the tiniest fraction of synthetic data – as little as 1 in 1,000 – can trigger collapse. And larger models amplify the effect rather than dampening it.
As of April 2025, 74% of newly created web pages contained AI-generated text. The internet – the primary training source – is increasingly filled with text produced by previous models.
For an economist, this is a classic endogeneity problem: the dependent variable feeds back into the independent variables. The result is biased and inconsistent estimates that diverge from reality over time. Andersen’s henhouse on an industrial scale.
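The Shumailov et al. mechanism can be reproduced as a deliberately crude toy simulation – a Gaussian standing in for a language model, 20 samples per “generation”, 500 generations, all numbers invented for illustration. Each generation is “trained” only on the previous generation’s output, and the spread of the distribution – the tails, where the rare phenomena live – collapses.

```python
import numpy as np

rng = np.random.default_rng(1)

# Generation 0: "human" data – draws from a standard normal distribution.
data = rng.normal(0.0, 1.0, 20)
initial_std = data.std()

# Each generation trains only on the previous generation's output:
# fit a Gaussian to the samples, then draw the next "corpus" from the fit.
for _ in range(500):
    data = rng.normal(data.mean(), data.std(), 20)

final_std = data.std()
print(f"std of generation 0: {initial_std:.3f}, "
      f"generation 500: {final_std:.6f}")
```

The estimated spread shrinks generation by generation – the finite sample systematically underestimates the tails, and each round of refitting locks that loss in. Andersen’s henhouse, in twelve lines.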
Theory is a modelling strategy
Here is what the entire AI debate is missing: an understanding of the role of theory.
Theory is not something abstract floating above empirics. Theory is the deliberate decision about what to include, what to exclude, and how to structure the relationship between variables.
It is a modelling strategy. Language models have none. They take everything in and search for statistical regularities. This is the opposite of theory – it is inductivism on an industrial scale.
Two of the 20th century’s greatest economists – who often disagreed on methodological matters – agreed on precisely this.
Milton Friedman argued in “The Methodology of Positive Economics” (1953) that a scientific theory must be judged on its predictions, not on the realism of its assumptions – but that one starts with a theory, not with data.
Ludwig von Mises, in “Human Action” (1949), went further: economic theory can be derived logically from fundamental axioms about human action, and empirics alone can never generate theory. They disagreed about method but agreed on the conclusion: pure inductivism – data without theory – is a dead end. That is precisely what language models do.
Try it yourself. Ask ChatGPT to analyse why food prices are rising. The result is not an analysis. It resembles an average article in the Financial Times.
All the right words are there. The structure is tidy. The arguments are recognisable. But there is no original thought, no unexpected angle, no theoretical framework organising the argument in a new way.
It is the average of everything written on the subject – because that is precisely what a language model produces: the statistical mean of its training data.
And average can often be good enough. Many people need a competent first draft or a quick summary. But nobody wants to listen to an average economist. Nobody wants to hear an average song – unless it is background music in a supermarket.
What creates value is precisely what deviates from the average: the sharp analysis that sees something others miss. Language models cannot produce that by definition, because they are built to find and reproduce the average.
The exceptional lies furthest from the centre of the training data distribution – and is therefore precisely what the model is worst at.
No, Suleyman – AI will not take all office jobs
Suleyman’s claim deserves a serious economic response. Think of office work as an aggregate of human labour and AI agents.
Economists use CES (constant elasticity of substitution) production functions to model how easily one factor of production can replace another. Even if we assume high substitutability – that AI can perform many office tasks – it does not follow that humans will be replaced.
Because AI agents are not free. Their “wage” is capital and energy.
Every token a language model generates consumes electricity. Every model requires data centres, GPUs, cooling, maintenance. Office workers are “powered” by relatively cheap energy – food, housing, a salary. AI agents are powered by semiconductors, data centres and electricity. It is far from given that the robot is cheaper than the office worker.
The economic logic is clear: even a perfect substitute only replaces human labour if it is cheaper per effective unit of output. And here two independent constraints apply: rising marginal costs (the more we replace human labour with AI, the more the energy market is squeezed) and the inability to think abstractly (even a free AI agent cannot replace the part of office work that requires theory, causal understanding and strategic judgement). Even perfect substitution on the routine dimension combined with zero substitution on the abstract dimension yields a far more modest effect than Suleyman imagines.
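The logic can be written out in textbook CES notation (the symbols here are the standard ones, not taken from any particular source):

```latex
Y = \left[\alpha L^{\frac{\sigma-1}{\sigma}}
        + (1-\alpha) A^{\frac{\sigma-1}{\sigma}}\right]^{\frac{\sigma}{\sigma-1}},
\qquad
\frac{A}{L} = \left(\frac{1-\alpha}{\alpha}\right)^{\sigma}
              \left(\frac{w}{p_A}\right)^{\sigma}
```

where $L$ is human labour, $A$ is AI agent services, $\sigma$ the elasticity of substitution, $w$ the wage and $p_A$ the price per effective unit of AI output – capital plus energy. Even in the limit $\sigma \to \infty$ (perfect substitutes), firms switch to AI only where $p_A < w$ per effective unit; and if $p_A$ rises with aggregate AI use – the energy squeeze – the substitution is self-limiting.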
I have experienced the productivity paradox personally.
I sat down to write an article about AI and productivity. I chatted with Claude, iterated back and forth, fine-tuned – and suddenly I had spent five hours on something I would previously have written in one. AI felt faster. It was not. The personal productivity paradox mirrors the macroeconomic one.
Moreover, current prices are artificially low. It costs OpenAI more to run many ChatGPT subscriptions than they bring in. The company is running at a loss, funded by venture capital – a legacy of the extremely easy US monetary policy of 2020-21. Sooner or later the business must become profitable.
When the subscription price goes from $20 to perhaps $200 a month, many applications that currently feel productive will prove not to be economically viable. According to the IEA, global data centre electricity consumption will double to 945 TWh by 2030 – equivalent to Japan’s entire annual electricity consumption.
The evidence: AI makes experienced programmers slower
For those who believe “vibe coding” and AI agents are already replacing programmers, the METR study from July 2025 is a sobering corrective.
In a randomised controlled trial, researchers followed 16 experienced open-source developers across 246 tasks on mature codebases. Developers with AI tools were 19% slower than those without. But – and this is the key – they believed they were 20% faster.
Before the study, they predicted a 24% speed gain. External experts predicted nearly 40%. Everyone was wrong.
69% of developers continued using the AI tools after the study. Not because they were faster, but because it felt easier. The experience of AI productivity and actual AI productivity can be entirely different things. And remember: programming is arguably the domain where AI performs best, because code can be tested externally – you can run it and see whether it works.
If AI makes experienced programmers slower there, what happens in domains without external verification? Law, strategy, analysis, management?
What the data actually show
There is not a single example in economic history of large and sustained technological unemployment. Not one.
The steam engine replaced hand weavers but created factory workers. The car killed the horse and carriage but created mechanics, road builders, suburbs and the service economy.
The computer automated bookkeeping but created an entire IT industry and thousands of new job functions nobody had imagined. The pattern is always the same: the structure changes gradually. Some jobs disappear, new ones emerge, and the transition can be painful for those affected. But mass unemployment as a consequence of technological progress? It has never happened. And there is no reason to believe AI breaks this pattern.
John Maynard Keynes – arguably the most influential economist of the 20th century – predicted in his famous 1930 essay “Economic Possibilities for Our Grandchildren” that his grandchildren’s generation would work just 15 hours a week.
He got most of it right. Living standards in the advanced economies have risen six to eightfold since 1930, precisely as he predicted. But we work far more than 15 hours a week.
Think of it this way: when productivity rises, we receive an extra pound. We can spend that pound on material goods – higher wages, more consumption.
Or we can buy leisure with it – work less for the same pay. You might expect roughly a 50-50 split. But we have consistently chosen consumption over leisure. We would rather have a larger house and a new car than a three-day working week.
Keynes, who literally worked himself to death in the negotiations over the post-war monetary order, dying in 1946, underestimated this drive. AI will contribute to a gradual reduction in working hours as we become richer and choose more leisure. But it will be a prosperity-driven choice – not involuntary mass unemployment.
Bloom et al.’s study confirms the broader picture. 69% of firms use at least one AI technology, but the average senior executive spends just one and a half hours per week on AI. 28% do not use it at all. Adoption and genuine integration are two very different things.
The software meltdown of February 2026 is itself revealing.
If the market truly believed AI would broadly replace office work, we should have seen a sharp rise in, say, banking equities – banks would stand to save enormously from cheaper software. That did not happen.
The decline in software equities reflects a general nervousness about overvalued technology stocks rather than a genuine expectation of mass automation. AI profit margins remain concentrated in Big Tech, but the broader stock market has barely moved.
Suleyman is a salesman, not an economist
One ought to ask what Suleyman’s prediction actually is: an economic analysis or a sales pitch?
He is not an economist. He is the head of the AI division at a company investing hundreds of billions of dollars in AI infrastructure, with a clear interest in convincing investors and customers that their product will revolutionise the world.
And one should look at what Microsoft is actually doing – not what Suleyman says.
While he predicts that office work will vanish within 18 months, his company is investing heavily in developing precisely the office tools that AI supposedly renders obsolete.
Throughout 2026, new features and deeper Copilot integration are being rolled out across Word, Excel, PowerPoint and Outlook.
On 1 July 2026, Microsoft will raise prices on Microsoft 365 subscriptions. The company maintains long-term product roadmaps and support programmes that still require human IT staff to handle bugs and compatibility issues.
If Microsoft genuinely believed in full automation by 2027, why raise prices on subscriptions that presuppose human users? Why invest in product updates that AI supposedly makes unnecessary? The answer is obvious: because Microsoft is betting that Office and Azure will continue generating enormous revenues for years to come, with AI as a supplement – not a replacement. The bold predictions are for investors. The business model is for the balance sheet.
The same applies to Anthropic’s Amodei and all the other technology CEOs competing to outdo each other in dramatic predictions.
Their business models depend on investors and markets believing in the AI revolution. That does not make them liars, but it makes them interested parties – and their predictions should be treated accordingly. One would not uncritically accept an oil company’s assessment of climate policy. One should not uncritically accept an AI company’s assessment of AI’s economic consequences.
Implementation is the real bottleneck
I recently spoke with a large international hedge fund. They wanted to use AI internally, but rules around IT security, data protection and client information prevented them from simply opening up the systems. Banks cannot hand their trading systems to something that hallucinates. Hospital records cannot run on a model that invents diagnoses. These are not irrational barriers – they are the reality firms face.
I encounter far too many companies that believe they have “adopted AI” because they have rolled out Copilot in their Microsoft systems.
If that is the strategy, the firm is in trouble. But the opposite fear – of being overtaken – is also exaggerated. Three years ago, I myself believed it would move much faster. I spoke with business leaders who were eager to get started. A year later, I asked the same leaders how it was going. The typical answer: “Well, there were some other things, and it’s a bit difficult, and we actually don’t want to throw away the company culture.” All perfectly rational choices. But it means implementation takes years, not months.
The parallel to the internet is striking. In the mid-1990s, every company needed a website. But nobody knew what to use it for. It became a glorified address book with a phone number and a postal address. Then people began saying you could perhaps sell things through it. Today you can buy everything online. But it took 15 to 20 years. E-commerce only truly broke through in the 2010s – far later than any of us believed in the mid-1990s. AI will likely follow the same pattern.
This is Solow’s paradox once more: the computer’s productivity gains came only decades later, when firms learnt to reorganise their work processes. Implementation, not faster processors, drove the growth. For AI investors and decision-makers, the implication is clear: if the value is in implementation, people with domain knowledge become more valuable, not less.
The last 5% is everything that matters
The first 90–95% of many tasks is routine. Writing a standard letter, producing background music for an advertisement, drafting a market analysis, generating boilerplate code, summarising a document. This is assembly-line work in the knowledge economy. AI can do it more cheaply and quickly. Those gains should be captured.
But the last 5-10% is everything that matters. The original insight in an analysis, the unexpected move in a composition, the theoretical framework that gives data meaning, the strategic judgement that correctly weighs risks. Abstract thought. Precisely what language models cannot do.
And here we hit something civilisational.
The capacity for abstract thought does not arise in a vacuum. It develops through practice – through wrestling with problems, making mistakes, revising one’s understanding, building intuition over time.
Think of the assistant analyst in a bank’s research department who is asked to photocopy and bind presentations. It is routine work – but in the process she reads the material, walks over to the economist who assigned the task, and they discuss what it actually means. That is on-the-job training. That is how one learns.
If AI takes over the routine, that learning process vanishes. A young economist who has never struggled to specify a model, understand residuals and think about what the data are telling her will never develop the deep intuition that enables her to deliver the crucial 10%.
Just as in music: one cannot compose anything original without having practised scales, played other people’s music, understood harmony and structure from the inside. The repetitive exercises are the precondition for the creative leap. If AI takes over the exercises, the foundation for the leap disappears.
The real risk is not that AI replaces thought now – it cannot. It is that it is just intelligent enough to tempt us to stop thinking for ourselves. Slow atrophy, not sudden disruption. Far harder to measure and counteract than job losses.
AI’s greatest threat is not that it is too intelligent. It is that it is just intelligent enough to fool us into believing it is intelligent.
Conclusion
Mustafa Suleyman is wrong. Not about everything – AI is changing office work, and it is doing so already. But the claim that most office work will vanish within two years is economically untenable. And one should remember that Suleyman is not an economist – he is a salesman for the product he predicts will revolutionise the world.
AI cannot think. It is pattern recognition, not abstraction. It lacks theory, causal understanding and the ability to weigh relevance. It delivers average analyses, never exceptional ones.
It is not free. It requires capital and energy on a scale that makes it far from certain that it is cheaper than humans – especially for the complex tasks where value is created.
The empirical evidence shows it. The METR study shows that AI makes experienced programmers slower.
And there is not a single example in economic history of large and sustained technological unemployment. Keynes predicted in 1930 that we would work 15 hours a week. We chose to work more and consume more. AI will not change that dynamic.
AI will deliver real productivity gains – primarily concentrated in domains that are already ripe for automation. Valuable, but not a new industrial revolution. For economists, strategists and decision-makers, the message is clear: AI is a tool, not a replacement for thinking. The most important asset in the knowledge economy remains the human one: the ability to formulate a theory, weigh relevance and understand abstract quantities. No language model can do that.
Data without theory is noise — whether the tool is a simple regression or the world’s largest language model.

Harry Chernoff / February 24, 2026

Methinks thou doth protest too much. The list of extreme right-tail AI accomplishments is growing: AlphaFold, Halicin, GNoME, for starters. Yes, this version of the industrial revolution and creative destruction will take time but it will also be faster than ever before and at an unimaginable scale.