Spending is up. Returns aren't. Inside the 2026 AI p…

Are You Using AI, or Just Feeling Productive?

You finish a task with AI and it feels fast.

The blank page disappears quickly. The first draft arrives before you have fully organised your own thoughts. The tool suggests the next line, fills the gap, explains the unfamiliar bit, writes the boilerplate, and keeps the work moving.

It feels productive.

That is the dangerous part.

In July 2025, the nonprofit research group METR published the results of a small but revealing experiment. Sixteen experienced developers were asked to complete 246 real-world tasks taken from their own open-source projects. Each task was randomly assigned to either allow or forbid AI tools, so the two conditions could be compared directly. These were not toy problems. They were mature codebases, averaging more than a million lines of code, and the developers already knew the projects well.

Before the trial, the developers expected AI to make them about 24% faster. After the work was done, they still believed it had made them about 20% faster.

The stopwatch disagreed.

With AI, they were 19% slower.

That gap is the most important part of the story. Not because it proves AI does not work. It does not prove that. AI can be useful, and in many contexts it clearly is. The more uncomfortable lesson is that people are not always good at knowing when AI is helping.
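To make the size of that gap concrete, here is a minimal sketch in Python. It assumes that "20% faster" means a task takes 20% less time and that the measured slowdown means it takes 19% more time. The 60-minute task is an illustrative number, not a figure from the study, and the function name is made up for this example.

```python
def minutes_with_ai(baseline_minutes: float, change_in_time: float) -> float:
    """Scale a baseline duration by a fractional change in completion time.

    change_in_time = -0.20 means 20% less time; +0.19 means 19% more time.
    """
    return baseline_minutes * (1 + change_in_time)


# Hypothetical task that takes 60 minutes without AI (illustrative only).
baseline = 60.0

believed = minutes_with_ai(baseline, -0.20)  # what developers believed afterwards
measured = minutes_with_ai(baseline, +0.19)  # what the stopwatch recorded

# The distance between the felt result and the timed result.
perception_gap = measured - believed

print(f"believed: {believed:.1f} min")
print(f"measured: {measured:.1f} min")
print(f"gap:      {perception_gap:.1f} min")
```

Under those assumptions, an hour-long task that felt like 48 minutes actually took a little over 71.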

We can feel faster without being faster.
We can report productivity gains without producing financial return.
We can adopt AI widely without using it well.

That is becoming one of the most expensive judgement calls in enterprise software.

The spending is real. The proof is harder.

The defining feature of AI in 2026 is not simply its capability. It is the difficulty of telling whether that capability is actually paying off.

The largest technology companies are spending hundreds of billions of dollars on AI infrastructure. Enterprises are buying tools, running pilots, training teams, and rewriting strategy decks around generative AI and agentic systems.

The activity is obvious.

The return is less obvious.

Across enterprise surveys, the pattern keeps repeating. Many organisations report productivity gains from AI. Far fewer can point to meaningful bottom-line impact once the cost of tooling, integration, training, governance, and operational change is included.

That distinction matters.

A team can use AI more and still not save money. A department can produce more drafts, more summaries, more tickets, more prototypes, and still not improve the economics of the business. A company can look modern, busy and AI-enabled while quietly adding cost, complexity and risk.

This is the AI productivity trap.

The work feels faster. The organisation feels more advanced. The dashboards show adoption. But nobody has done the harder measurement: did the work become cheaper, better, faster or more valuable in a way that actually matters?
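One rough way to do that harder measurement is to net the value of the time saved against the full cost of the initiative. The sketch below is an assumption-laden illustration, not a standard formula: the class and function names, the `realised_fraction` parameter and every figure are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AiInitiativeCosts:
    """Annual costs of an AI initiative. All figures are hypothetical inputs."""
    tooling: float
    integration: float
    training: float
    governance: float
    operational_change: float

    def total(self) -> float:
        return (self.tooling + self.integration + self.training
                + self.governance + self.operational_change)


def net_annual_return(hours_saved: float, hourly_cost: float,
                      realised_fraction: float,
                      costs: AiInitiativeCosts) -> float:
    """Value of saved hours that becomes real savings, minus total cost.

    realised_fraction (0.0 to 1.0) is the assumed share of saved hours that
    turns into lower spend or extra output rather than just a lighter day.
    """
    if not 0.0 <= realised_fraction <= 1.0:
        raise ValueError("realised_fraction must be between 0 and 1")
    gross_value = hours_saved * hourly_cost * realised_fraction
    return gross_value - costs.total()


# Illustrative numbers only.
costs = AiInitiativeCosts(tooling=40_000, integration=30_000, training=15_000,
                          governance=10_000, operational_change=5_000)
net = net_annual_return(hours_saved=4_000, hourly_cost=50.0,
                        realised_fraction=0.4, costs=costs)
print(f"net annual return: {net:,.0f}")
```

In this made-up case the team really does save 4,000 hours a year, and the initiative still loses 20,000, because only part of the saved time becomes financial benefit and the costs are counted in full.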

Why some teams get value and others do not

It would be easy to read this as a bubble story. That is too simple.

The better reading is that AI value is real, but uneven. A small group of organisations are getting measurable returns. The majority are still confusing activity with progress.

The difference is not usually enthusiasm. Almost everyone is enthusiastic.

The difference is discipline.

The teams getting better results tend to do four things differently.

First, they start with a specific business problem rather than a general AI ambition. They are not asking, “Where can we use AI?” They are asking, “Which costly, slow or repetitive part of our operation should improve, and how will we know?”

That question changes everything. It forces the team to define the baseline before choosing the tool. It also prevents AI from becoming a vague innovation exercise with no commercial owner.

Second, they put AI inside the workflow rather than next to it. Tools that sit in a separate window are easy to try and easy to abandon. Tools embedded into the document, ticket, approval flow, CRM, codebase or reporting process are much harder to ignore because they meet the user where the work already happens.

Adoption follows the path of least friction.

If the tool requires people to stop, switch context, explain the task, copy the output back, check it, reformat it and then continue, the novelty has to do too much work. In most companies, novelty fades faster than process changes.

Third, the better teams give AI initiatives a named owner with financial accountability. Not a committee. Not a vague innovation group. Not “IT and the business”.

A person.

Someone has to own the outcome. Not usage. Not adoption. Not the number of prompts submitted or licences activated. The outcome.

Did support resolution time fall?
Did quote turnaround improve?
Did manual review costs come down?
Did conversion increase?
Did error rates drop?
Did revenue or margin move?

Without that ownership, AI becomes another place where companies can generate impressive activity without changing the underlying result.

Fourth, they measure before they scale. This is the least glamorous habit and probably the most important.

The organisations that struggle usually deploy first and try to justify later. The stronger ones define the use case, baseline, target outcome and failure criteria before rollout. They prove value in one narrow area, then expand.

That sounds obvious. It is not how many companies behave.

A lot of AI adoption is still driven by pressure. Pressure from competitors. Pressure from investors. Pressure from boards. Pressure from employees who want better tools. Pressure from vendors selling the future.

That pressure creates motion. It does not automatically create value.
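The measure-before-you-scale habit can be as simple as writing the plan down before rollout and letting the numbers decide afterwards. The sketch below is one hypothetical shape for that record. The field names, thresholds and example values are assumptions for illustration, and it assumes a metric where lower is better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PilotPlan:
    """Agreed before rollout. Assumes a lower metric value is better."""
    use_case: str
    owner: str                # a named person, not a committee
    metric: str
    baseline: float           # measured before the tool is introduced
    target: float             # result that justifies scaling
    failure_threshold: float  # result at or above which the pilot stops

    def __post_init__(self) -> None:
        # The target must be a real improvement and sit below the stop line.
        if not (self.target < self.baseline
                and self.target < self.failure_threshold):
            raise ValueError(
                "target must be below baseline and failure_threshold")

    def decide(self, measured: float) -> str:
        """Turn a measured result into a decision fixed in advance."""
        if measured <= self.target:
            return "scale"
        if measured >= self.failure_threshold:
            return "stop"
        return "iterate"


# Hypothetical pilot; every value here is illustrative.
plan = PilotPlan(
    use_case="first-line support replies",
    owner="Priya (Head of Support)",
    metric="median resolution time (minutes)",
    baseline=42.0,
    target=35.0,
    failure_threshold=42.0,
)
decisions = [plan.decide(m) for m in (33.0, 38.0, 44.0)]
print(decisions)
```

The point of the design is that `decide` has no room for enthusiasm: the criteria are frozen before anyone has seen a demo.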

The individual version is the same problem

The same pattern shows up at your desk.

AI often feels useful because it removes friction from the beginning of a task. It gives you a first version, a starting point, an explanation, a direction. That is valuable, especially when the alternative is staring at a blank page or trying to understand something unfamiliar from scratch.

But the beginning of the task is not the whole task.

The cost often moves elsewhere.

You save time writing the first draft, then spend time checking whether it is true. You generate code quickly, then spend time finding the subtle issue. You ask for a summary, then realise it missed the one point that mattered. You get a confident answer, then have to verify the source. You move faster at first, but slower by the end.

That is where AI becomes difficult to judge. It makes the visible part of work faster. It can make the hidden part longer.

The METR study is useful because it captured that hidden cost. The developers were not careless. They were experienced, familiar with their own projects, and working in good faith. They still misjudged the effect of the tool.

That should make everyone using AI a little more cautious.

Not pessimistic. Cautious.

There are still plenty of places where AI is genuinely useful. It can be very strong for drafting, summarising, boilerplate, research preparation, idea generation, code scaffolding, test suggestions, explanation, translation, formatting and repetitive administrative work.

The key distinction is verification.

AI is most useful where verification is fast. If you can quickly tell whether the output is right, useful or close enough, the risk is manageable. If verification is slow, expensive or requires deep expertise, the productivity gain can disappear quickly.

That is why AI can be helpful for drafting a meeting summary, but risky for production logic. Helpful for generating options, but risky for final judgement. Helpful for speeding up familiar administrative work, but dangerous when nobody knows enough to check the answer properly.

The question is not whether to use AI more or less.

The question is where the review cost sits.
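A back-of-the-envelope model makes the review cost visible. The sketch below assumes the time for an AI-assisted task is generation time, plus review time, plus the expected rework when the output turns out to be wrong. The function name and all the numbers are hypothetical.

```python
def net_minutes_saved(manual_minutes: float, generate_minutes: float,
                      review_minutes: float, error_rate: float,
                      rework_minutes: float) -> float:
    """Expected minutes saved per task; negative means AI made it slower.

    error_rate is the assumed probability (0.0 to 1.0) that the output
    needs rework after review.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    ai_minutes = generate_minutes + review_minutes + error_rate * rework_minutes
    return manual_minutes - ai_minutes


# Fast to verify: a meeting summary (illustrative numbers).
summary = net_minutes_saved(manual_minutes=20, generate_minutes=1,
                            review_minutes=3, error_rate=0.1,
                            rework_minutes=5)

# Slow to verify: a change to production logic (illustrative numbers).
production = net_minutes_saved(manual_minutes=60, generate_minutes=5,
                               review_minutes=40, error_rate=0.3,
                               rework_minutes=90)

print(f"meeting summary:  {summary:+.1f} min")
print(f"production logic: {production:+.1f} min")
```

With these made-up inputs the summary saves about a quarter of an hour, while the production change costs time even though the code itself appeared in five minutes.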

Feeling productive is not the same as being productive

This is the part most AI conversations avoid.

AI makes work feel different. It reduces silence. It reduces blank space. It reduces the feeling of being stuck. It gives you something to react to, which is often easier than creating from nothing.

That has psychological value.

But psychological value and business value are not the same thing.

A tool can make a person feel more capable while producing no measurable improvement. A team can feel more modern while adding operational complexity. A company can appear more innovative while making its processes harder to govern.

None of that means AI is fake. It means AI needs measurement.

And measurement is boring.

It means timing repeated tasks. Checking error rates. Comparing before and after. Looking at cost per outcome, not just usage. Asking whether the work got better or merely more comfortable. Deciding in advance what success would need to look like.

This is the discipline most companies skip because it slows down the story.

The story they want is simple: we adopted AI, so we are more efficient.

The reality is more conditional: we adopted AI in this specific workflow, measured this baseline, changed this process, trained these people, removed this manual step, reduced this cost, and verified the result.

That version is less exciting. It is also more likely to be true.
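A boring measurement of that kind might look like the sketch below: time the same type of task before and after the change, record whether each output had an error, and compare cost per good outcome rather than speed alone. The function, the sample data and the cost figures are all hypothetical.

```python
from statistics import median


def summarise(durations_min: list[float], had_error: list[bool],
              cost_per_minute: float, fixed_cost: float = 0.0) -> dict:
    """Summarise a batch of timed tasks.

    had_error[i] is True when task i produced an output that failed review.
    fixed_cost covers things like licences for the period (an assumption).
    """
    if not durations_min or len(durations_min) != len(had_error):
        raise ValueError("need equal-length, non-empty inputs")
    good_outcomes = had_error.count(False)
    total_cost = sum(durations_min) * cost_per_minute + fixed_cost
    return {
        "median_minutes": median(durations_min),
        "error_rate": had_error.count(True) / len(had_error),
        "cost_per_good_outcome": (total_cost / good_outcomes
                                  if good_outcomes else float("inf")),
    }


# Illustrative data: five tasks before the tool, five after.
before = summarise([30, 34, 28, 32, 36],
                   [False, False, True, False, False],
                   cost_per_minute=1.0)
after = summarise([22, 25, 40, 24, 39],
                  [False, True, False, True, False],
                  cost_per_minute=1.0, fixed_cost=20.0)

for label, stats in (("before", before), ("after", after)):
    print(label, stats)
```

In this invented sample the typical task gets faster after the change, yet the cost per good outcome goes up, because more outputs fail review and the licence has to be paid for. That is the difference between work that got better and work that merely got more comfortable.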

The tools will improve. The measurement problem will not disappear.

The models used in the METR experiment are already old by AI standards. Newer systems are more capable. Agentic workflows are different from autocomplete. Tools that can run in the background, propose patches, write tests, inspect documents or complete multi-step tasks may change the economics again.

That is exactly why measurement matters more, not less.

As the tools become more capable, the feeling of progress will become even more persuasive. Outputs will look better. Agents will do more. Interfaces will become smoother. The line between useful assistance and expensive theatre may become harder to see from the inside.

Every major technology wave creates the same split.

One group treats the technology as a question. They test it, measure it, narrow it, improve it and scale what works.

Another group treats the technology as an answer. They buy it, announce it, spread it everywhere and look for proof afterwards.

The first group usually writes the case studies.

The second group usually appears in them.

AI is not different, except in scale. The capital is larger. The pace is faster. The pressure is higher. The perception gap is wider. And the cost of being wrong grows with every new licence, workflow, integration and strategic promise.

The most expensive thing in technology right now may not be AI itself.

It may be confidence without measurement.

The lesson from METR is not that AI makes developers slower. The lesson is more important than that. Even skilled people, doing real work they understand, can be wrong about whether AI helped them.

That is the warning worth carrying through 2026.

The technology will keep improving. The discipline will still matter. And every team, executive and individual using AI seriously will need to answer the same question.

Are you using AI?

Or are you just feeling productive?

