Azure OpenAI vs AWS Bedrock vs GCP Vertex in 2026: where to actually run Claude, GPT, and Gemini

Three clouds. Three model marketplaces. Different pricing curves and very different lock-in stories. The data shows which one fits which workload.

Sarma
30 January 2026 · 14 min read · Last verified 3 May 2026

I run production AI workloads on all three clouds in 2026. Each has a position. None is universally best.

Cost

Chart: per-million-token cost (USD), GPT-4o-class model

Source: Public price lists, May 2026. Gemini 2.5 Pro: $1.25 in / $10 out per Google Cloud docs.

Azure OpenAI matches OpenAI's direct API pricing for both input and output tokens[1]. Bedrock's Claude 3.5 Sonnet is roughly 20 percent more expensive than GPT-4o on input and 50 percent more expensive on output[2]. Vertex Gemini 2.5 Pro is the cheapest GPT-4o-class option, by a meaningful margin[3].

If your workload is "lots of small inference calls", Vertex Gemini will save you money. If it is "large prompts with few outputs", any of them works.
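To see how the input/output mix moves the bill, here is a minimal sketch of the arithmetic. The Gemini 2.5 Pro prices come from the chart source line; the baseline prices (`BASE_IN`, `BASE_OUT`) are placeholders I am assuming for illustration, and the 1.2x / 1.5x multipliers are the rough Bedrock premiums quoted above. Swap in the current price lists before acting on any of it.

```python
def cost_usd(input_tokens: int, output_tokens: int,
             price_in_per_m: float, price_out_per_m: float) -> float:
    """Cost in USD for a token volume, given per-million-token prices."""
    return (input_tokens * price_in_per_m
            + output_tokens * price_out_per_m) / 1_000_000


def premium_over_baseline(input_tokens: int, output_tokens: int,
                          base_in: float, base_out: float,
                          in_mult: float = 1.2, out_mult: float = 1.5) -> float:
    """Fractional extra cost when input costs in_mult x and output
    costs out_mult x the baseline prices, for a given token mix."""
    base = cost_usd(input_tokens, output_tokens, base_in, base_out)
    other = cost_usd(input_tokens, output_tokens,
                     base_in * in_mult, base_out * out_mult)
    return other / base - 1.0


# Gemini 2.5 Pro prices as quoted in the chart source line.
GEMINI_IN, GEMINI_OUT = 1.25, 10.0
# ASSUMPTION: placeholder baseline prices, not taken from this post.
BASE_IN, BASE_OUT = 2.50, 10.0

if __name__ == "__main__":
    # Hypothetical monthly volume: prompt-heavy, few output tokens.
    tokens_in, tokens_out = 900_000_000, 20_000_000
    print("Gemini 2.5 Pro:", cost_usd(tokens_in, tokens_out, GEMINI_IN, GEMINI_OUT))
    print("Premium at 1.2x/1.5x:",
          premium_over_baseline(tokens_in, tokens_out, BASE_IN, BASE_OUT))
```

The premium sits near 20 percent when the workload is almost all input and climbs toward 50 percent as output tokens dominate, so the mix matters as much as the headline price.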

Latency

Chart: p50 first-token latency (ms), London region

Source: My own bench, 100 req/min

Azure OpenAI EU is the fastest for users in London. Vertex europe-west2 is second. Bedrock eu-west is the slowest, partly because Anthropic's models are routed through US backends in some regions even when you select EU.
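If you want to reproduce this kind of measurement, here is a rough sketch of a first-token latency bench. It is not my actual harness: `open_stream` is a hypothetical callable that you would wrap around whichever provider SDK's streaming call you use, and `fake_stream` stands in for it so the example runs without credentials. The 0.6 second interval corresponds to 100 requests per minute.

```python
import statistics
import time
from typing import Callable, Iterable, List


def first_token_latency_ms(open_stream: Callable[[], Iterable[str]]) -> float:
    """Time from issuing the request until the first streamed chunk arrives."""
    start = time.perf_counter()
    stream = iter(open_stream())
    next(stream)  # blocks until the first chunk; raises StopIteration if empty
    return (time.perf_counter() - start) * 1000.0


def bench_p50(open_stream: Callable[[], Iterable[str]],
              n: int = 20, interval_s: float = 0.6) -> float:
    """Run n sequential requests, paced by interval_s, and return the median."""
    samples: List[float] = []
    for _ in range(n):
        samples.append(first_token_latency_ms(open_stream))
        time.sleep(interval_s)
    return statistics.median(samples)


def fake_stream():
    """Stand-in for a real streaming call (hypothetical, for illustration)."""
    time.sleep(0.01)  # simulated time to first token
    yield "Hello"
    yield " world"


if __name__ == "__main__":
    print(f"p50 first-token latency: {bench_p50(fake_stream, n=5, interval_s=0.0):.1f} ms")
```

Measuring from the client includes network time from wherever the bench runs, so run it from the region your users are actually in.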

Pick by workload

| Spec | Azure OpenAI | AWS Bedrock | GCP Vertex |
| --- | --- | --- | --- |
| Headline model | GPT-4o, o1 | Claude 3.5 / 3.7 | Gemini 2.5 Pro |
| Compliance / SOC 2 | Excellent | Excellent | Excellent |
| Private endpoint | Yes (Private Link) | Yes (PrivateLink) | Yes (PSC) |
| Token caching | Yes | Yes | Yes (best) |
| Throughput tier | PTU (reserved) | On-demand + Provisioned Throughput | On-demand + commitments |
| Best for | GPT-exclusive workloads | Claude + multi-model | Cheapest Gemini at scale |
| Vendor lock-in | High (Azure SDK) | Medium (Bedrock SDK) | Medium (Vertex SDK) |

The decision is not really "which is fastest" or "which is cheapest." It is:

  • Are you locked into Azure/AWS/GCP for non-AI reasons (existing footprint)?
  • Do you have a model preference (GPT-4o → Azure, Claude → Bedrock, Gemini → Vertex)?
  • Do you need PTU/reserved capacity (Azure has the best story here)?
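The three questions above can be sketched as a small decision function. The priority order (existing footprint first, then model preference, then reserved capacity) is my assumption for illustration; the post does not rank them, and the function name and string labels are hypothetical.

```python
from typing import Optional

# Mappings taken from the questions above.
FOOTPRINT_HOME = {"azure": "Azure OpenAI", "aws": "AWS Bedrock", "gcp": "GCP Vertex"}
MODEL_HOME = {"gpt": "Azure OpenAI", "claude": "AWS Bedrock", "gemini": "GCP Vertex"}


def pick_cloud(existing_footprint: Optional[str],
               model_preference: Optional[str],
               needs_reserved_capacity: bool) -> str:
    """Return a suggested platform. ASSUMPTION: constraints are checked
    in the order footprint, model preference, reserved capacity."""
    if existing_footprint:
        return FOOTPRINT_HOME[existing_footprint.lower()]
    if model_preference:
        return MODEL_HOME[model_preference.lower()]
    if needs_reserved_capacity:
        return "Azure OpenAI"
    return "No hard constraint: compare price and latency in your region"


if __name__ == "__main__":
    print(pick_cloud(None, "claude", False))
    print(pick_cloud("gcp", "gpt", True))
```

If model choice outranks footprint for your team, swap the first two checks; the point is to make the ordering explicit rather than argue about speed or price first.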

What I actually do

For SarmaLink-AI and other Sarma-side products: Vertex Gemini 2.5 Flash for cheap, fast, large-volume calls. Azure OpenAI for anything that absolutely needs GPT-4o or o1. Bedrock for Claude when I want extended thinking models.
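In code, that split can be as simple as a routing table keyed by task type. This is a sketch rather than the production router: the task labels (`bulk`, `needs_gpt`, `extended_thinking`) are names I am inventing here, and the values are display strings, not real model IDs.

```python
from typing import Dict, Tuple

# Task label -> (platform, model family). Labels are hypothetical.
ROUTES: Dict[str, Tuple[str, str]] = {
    "bulk": ("GCP Vertex", "Gemini 2.5 Flash"),        # cheap, fast, large volume
    "needs_gpt": ("Azure OpenAI", "GPT-4o or o1"),     # must be a GPT model
    "extended_thinking": ("AWS Bedrock", "Claude"),    # extended thinking models
}


def route(task_kind: str) -> Tuple[str, str]:
    """Look up the platform and model family for a task label."""
    try:
        return ROUTES[task_kind]
    except KeyError:
        raise ValueError(f"Unknown task kind: {task_kind!r}") from None


if __name__ == "__main__":
    for kind in ROUTES:
        platform, model = route(kind)
        print(f"{kind}: {model} on {platform}")
```

Keeping the mapping in one table means a price or latency change in your region becomes a one-line edit, not a refactor.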

The right answer in 2026 is "use whichever model fits the task" and pick the cloud that hosts that model with the best pricing in your region.

About the data

A note on what the numbers in this post represent so you can read them with the right confidence:

  • "My own bench" rows are personal measurements on my own hardware. They are honest about my setup and reproducible there, but they should not be treated as universal benchmark scores.
  • Benchmark numbers attributed to public sources (Geekbench Browser, DXOMARK, NotebookCheck, FIA timing) are illustrative: the trend is what matters, not the third decimal place. Cross-check against the source for anything you would act on financially.
  • Client outcomes and ROI percentages in business-focused posts are anonymised composites drawn from my own consulting work. Real numbers, real direction, sanitised so individual clients are not identifiable.
  • Foldable crease-depth and similar engineering measurements are estimates pulled from teardown reports and reviewer claims; manufacturers do not publish these directly.
  • Forecasts and "what I bet" lines are exactly that: opinions, not predictions with a track record yet.

If you spot a number that contradicts a source you trust, tell me. I would rather correct it than be the chart that was off by 6 percent and pretended otherwise.

References

  1. [1]
  2. [2]
  3. [3]

Sarma

Independent software engineer, AI systems, automation platforms, and modern infrastructure.
