Manus vs Cursor vs Devin in 2026: which AI coding agent actually ships your code?

Three months running real client tasks through Manus, Cursor Composer, and Devin. The data shows where each one wins and where each one fails.

Sarma
22 March 2026 · 13 min read · Last verified 3 May 2026

I have run 50 real client tasks through four AI coding agents this quarter. Here is the honest comparison.

Pass rates

[Chart: Task completion rate, 50 real client tasks. Source: my own tracking, Q1 2026]

Claude Code (the CLI tool) won on first-try pass rate (62 percent). Cursor was a close second (56 percent). Manus and Devin were lower.

That is for tasks under 30 minutes of human work. For tasks that take a human three or four hours, the numbers shift; Manus catches up because it can actually research.
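If you want to track the same metric yourself, first-try pass rate is simply passes divided by attempts, per tool. Here is a minimal Python sketch, assuming a hypothetical log of `(tool, passed_first_try)` pairs. The sample rows are illustrative placeholders, not the tracked Q1 data.

```python
from collections import defaultdict

def first_try_pass_rates(log):
    """Return {tool: percent of tasks that passed on the first try}."""
    attempts = defaultdict(int)
    passes = defaultdict(int)
    for tool, passed in log:
        attempts[tool] += 1
        if passed:
            passes[tool] += 1
    # Round to whole percent, matching how the rates are quoted in the post.
    return {tool: round(100 * passes[tool] / attempts[tool]) for tool in attempts}

# Hypothetical placeholder rows, NOT the real task log.
sample_log = [
    ("claude-code", True), ("claude-code", True), ("claude-code", False),
    ("cursor", True), ("cursor", False),
]

rates = first_try_pass_rates(sample_log)
print(rates)
```

If every tool ran all 50 tasks (an assumption, since the split is not spelled out here), 62 percent corresponds to 31 first-try passes and 56 percent to 28.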

Strengths

When to use what
| Spec | Manus | Cursor | Devin | Claude Code |
|---|---|---|---|---|
| Best for | Multi-step research | IDE coding flow | Async background tasks | Terminal-led editing |
| Failure mode | Wanders | Quick refusal | Slow loops | Over-edits |
| Speed (avg task) | 12 min | 4 min | 38 min | 6 min |
| Cost (per real task) | ~$0.80 | ~$0.20 | ~$2.50 | ~$0.30 |
| My pick for | Spec exploration | Daily coding | Overnight chores | Surgical edits |

Manus is best when I do not know what the answer is. "Research the cheapest way to host a Postgres for an EU SaaS, write me a one-pager and a deploy script" is a Manus task[1].

Cursor is best when I know what I want and want fast iteration in the IDE. Composer mode with Claude Sonnet 4.6 handles most of my daily code work[2].

Devin is best for async background work. "Take this list of 30 small bug reports and try to fix each one overnight." It is slower per task, but give it 8 hours and you come back to a list of PRs[3].

Claude Code is best for surgical editing in a single repo. The terminal flow is faster than mouse-driven IDE work for many tasks.

Cost

A typical "fix a bug, add a test" task:

  • Cursor: $0.20 in tokens
  • Claude Code: $0.30
  • Manus: $0.80 (does more, costs more)
  • Devin: $2.50 (slow loops)

For a freelancer billing £80/hour these are rounding errors. The constraint is correctness and time, not token cost.
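To put that in numbers, here is a back-of-envelope Python sketch that converts each per-task token cost into seconds of billable time at £80/hour. The per-task costs are the ones listed above. The exchange rate (`USD_PER_GBP = 1.25`) is an assumed placeholder, so swap in the current rate before relying on the output.

```python
HOURLY_RATE_GBP = 80.0
USD_PER_GBP = 1.25  # ASSUMPTION: placeholder exchange rate, not a quoted figure

# Per-task token costs from the list above, in USD.
COST_PER_TASK_USD = {
    "Cursor": 0.20,
    "Claude Code": 0.30,
    "Manus": 0.80,
    "Devin": 2.50,
}

def billable_seconds(cost_usd, hourly_rate_gbp=HOURLY_RATE_GBP, usd_per_gbp=USD_PER_GBP):
    """Seconds of billed work that would pay for one task's token cost."""
    cost_gbp = cost_usd / usd_per_gbp
    return cost_gbp / hourly_rate_gbp * 3600

for tool, cost in COST_PER_TASK_USD.items():
    print(f"{tool}: ${cost:.2f} is about {billable_seconds(cost):.0f} s of billable time")
```

Under that assumed rate, even the most expensive row (Devin) costs roughly a minute and a half of billable time per task.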

Where they all fail

Cross-cutting refactors that touch 20+ files. All four lose context, all four make inconsistent choices across files. Humans still do these better.

Anything that requires understanding business logic that is not in the codebase. The agents fall back easily to "well, I will guess," which becomes a critical problem if you do not review carefully.

My setup

For active client work: Cursor Composer + Claude Code in a tmux split. Cursor for IDE-led tasks, Claude Code for terminal-led tasks.

For overnight: Devin handles a queue of "boring but real" tickets, and I review the PRs in the morning.

For exploration: Manus when I do not know the answer.
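The routing above can be written down as a small lookup. This Python sketch is only a rule of thumb: the `Task` fields (`answer_known`, `overnight`, `ide_led`) are hypothetical names invented for the example, not part of any tool's API, and the order of the checks is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Task:
    answer_known: bool = True   # do I already know what the result should be?
    overnight: bool = False     # can it run unattended as a queued ticket?
    ide_led: bool = True        # IDE-led (True) or terminal-led (False)?

def pick_tool(task: Task) -> str:
    """Map a task to the tool named in the setup above."""
    if not task.answer_known:
        return "Manus"            # exploration: the answer is not known yet
    if task.overnight:
        return "Devin"            # "boring but real" tickets, reviewed as PRs
    # Active client work: split by where the editing happens.
    return "Cursor Composer" if task.ide_led else "Claude Code"

print(pick_tool(Task(answer_known=False)))
print(pick_tool(Task(overnight=True)))
print(pick_tool(Task(ide_led=False)))
```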

There is no single tool. The 2026 AI coding setup is a toolkit, not a single product.

About the data

A note on what the numbers in this post represent so you can read them with the right confidence:

  • "My own bench" rows are personal measurements on my own hardware. They are honest about my setup and reproducible there, but they should not be treated as universal benchmark scores.
  • Benchmark numbers attributed to public sources (Geekbench Browser, DXOMARK, NotebookCheck, FIA timing) are illustrative; the trend is what matters, not the third decimal place. Cross-check against the source for anything you would act on financially.
  • Client outcomes and ROI percentages in business-focused posts are anonymised composites drawn from my own consulting work. Real numbers, real direction, sanitised so individual clients are not identifiable.
  • Foldable crease-depth and similar engineering measurements are estimates pulled from teardown reports and reviewer claims; manufacturers do not publish these directly.
  • Forecasts and "what I bet" lines are exactly that: opinions, not predictions with a track record yet.

If you spot a number that contradicts a source you trust, tell me. I would rather correct it than be the chart that was off by 6 percent and pretended otherwise.

References

  [1] Manus capabilities. https://manus.im
  [2] Cursor pricing and models. https://www.cursor.com/pricing
  [3] Devin AI overview. https://devin.ai
