Ten UK SME projects shipped in 2025 and early 2026. Here is what the spreadsheet says.
ROI
Source: My own client outcomes, anonymised
The wins were narrow and concrete. The losses were vague.
The single best ROI was receipt scanning for a chain of three coffee shops[1]. It cost £0.02 per receipt at OpenAI pricing, scanned 4,200 receipts/month, and replaced 8 hours/week of bookkeeping at £25/hour. Annualised ROI: 410 percent. Total cost £84/month, total saved £833/month.
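The monthly figures for the receipt project can be checked with a few lines of arithmetic. This is a minimal sketch, not the spreadsheet itself: the function names are hypothetical, and the 50 working weeks per year is my assumption, chosen because it reproduces the £833/month figure. The post does not state how the 410 percent annualised ROI was derived (for example, whether build cost is included), so the sketch stops at running cost and time saved.

```python
# Monthly running cost and monthly saving for a per-transaction AI task.
# WORKING_WEEKS_PER_YEAR is an assumption, not a figure from the post.

WORKING_WEEKS_PER_YEAR = 50


def monthly_cost(cost_per_item_gbp: float, items_per_month: int) -> float:
    """Running cost in GBP per month, rounded to pence."""
    return round(cost_per_item_gbp * items_per_month, 2)


def monthly_saving(hours_per_week: float, hourly_rate_gbp: float) -> float:
    """Labour saved in GBP per month, averaged over a working year."""
    yearly = hours_per_week * hourly_rate_gbp * WORKING_WEEKS_PER_YEAR
    return round(yearly / 12, 2)


cost = monthly_cost(0.02, 4200)   # £0.02 per receipt, 4,200 receipts/month
saved = monthly_saving(8, 25)     # 8 hours/week at £25/hour
net = round(saved - cost, 2)      # what is left after paying for the model

print(f"cost £{cost:.2f}/month, saved £{saved:.2f}/month, net £{net:.2f}/month")
```

Running the same two functions on a candidate project before building it is a quick way to see whether the saving clears the running cost by a comfortable margin.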
The worst ROI was a "generic AI copilot" added to an internal tool. It cost £400/month in tokens, and employees used it five times a week each. Time saved: ~2 hours/week across 8 staff. ROI: 35 percent against the licence cost, before counting onboarding.
Pattern: works versus theatre
| Criterion | Worked | Theatre |
|---|---|---|
| Replaces 30+ hrs/month manual work | Yes | No |
| Has a measurable output | Receipt → spreadsheet row | Generic "AI insights" |
| Cost transparent | £0.02/receipt | "AI is included" |
| User actually uses it | Daily | Tried once |
| Owner can debug | Yes (logs visible) | No (black box vendor) |
The "Worked" column comes down to: AI replaces a specific manual step that costs measurable time.
The "Theatre" column comes down to: AI sprinkled on an existing product without changing the workflow.
Recipe for what works
- Find a manual task someone does for 30+ hours per month
- Define the input and output precisely (receipt → spreadsheet row)
- Use the cheapest model that works (Gemini Flash or GPT-4o-mini is almost always enough)
- Add a confidence threshold below which a human reviews
- Show the cost per transaction in the UI
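The last two steps of the recipe can be sketched in a few lines. This is an illustration under stated assumptions, not the code from any client project: the `Extraction` type, the `route` and `cost_label` helpers, and the 0.85 threshold are all hypothetical, and how the confidence score is produced depends on the model and pipeline in use. Only the £0.02 per receipt figure comes from the post.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.85       # hypothetical cut-off; tune it against real errors
COST_PER_RECEIPT_GBP = 0.02   # per-receipt cost quoted in the post


@dataclass
class Extraction:
    """One receipt turned into a candidate spreadsheet row."""
    merchant: str
    total_gbp: float
    confidence: float  # 0.0 to 1.0, however the pipeline estimates it


def route(extraction: Extraction, threshold: float = REVIEW_THRESHOLD) -> str:
    """Send low-confidence rows to a human instead of writing them directly."""
    return "auto" if extraction.confidence >= threshold else "review"


def cost_label(n_transactions: int, unit_cost: float = COST_PER_RECEIPT_GBP) -> str:
    """Text for the UI so the owner sees what each transaction costs."""
    total = n_transactions * unit_cost
    return f"£{unit_cost:.2f} per receipt, £{total:.2f} this month"


rows = [
    Extraction("Corner Bakery", 12.40, 0.97),
    Extraction("Unreadable thermal print", 3.10, 0.41),
]
for row in rows:
    print(row.merchant, "->", route(row))
print(cost_label(4200))
```

Keeping the threshold as a single named constant makes it easy for the owner to tighten or loosen review volume once there is data on how often the automatic rows are wrong.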
Recipe for theatre
- Add a chat interface
- Connect it to an LLM
- Hope users figure it out
The chat interface is the killer. It says "AI" but does not change the workflow. People do not type questions to internal tools; they want answers in the existing flow.
What I no longer recommend
Generic Copilot deployments at small companies. The "we will use it" never materialises beyond the first month. The bigger the company, the better this works; under 50 staff, it almost never does.
Sentiment analysis dashboards. The output never gets actioned because the owner already knew the answer.
What I still recommend
- Document classification (invoices, receipts, contracts → categorised, indexed)
- KB-grounded FAQ bots (cuts support tickets)
- Lead-form-to-CRM routing
- Email triage with rule extraction
About the data
A note on what the numbers in this post represent, so you can read them with the right confidence:
- "My own bench" rows are personal measurements on my own hardware. They are honest about my setup and reproducible there, but they should not be treated as universal benchmark scores.
- Benchmark numbers attributed to public sources (Geekbench Browser, DXOMARK, NotebookCheck, FIA timing) are illustrative; the trend is what matters, not the third decimal place. Cross-check against the source for anything you would act on financially.
- Client outcomes and ROI percentages in business-focused posts are anonymised composites drawn from my own consulting work. Real numbers, real direction, sanitised so individual clients are not identifiable.
- Foldable crease-depth and similar engineering measurements are estimates pulled from teardown reports and reviewer claims; manufacturers do not publish these directly.
- Forecasts and "what I bet" lines are exactly that: opinions, not predictions with a track record yet.
If you spot a number that contradicts a source you trust, tell me. I would rather correct it than be the chart that was off by 6 percent and pretended otherwise.