I run production AI workloads on all three clouds in 2026. Each has a position. None is universally best.
Cost
Source: Public price lists, May 2026 — Gemini 2.5 Pro: $1.25 in / $10 out per Google Cloud docs
Azure OpenAI matches OpenAI direct on input and output token pricing[1]. Bedrock's Claude 3.5 Sonnet is roughly 20 percent more expensive than GPT-4o on input and 50 percent more expensive on output[2]. Vertex Gemini 2.5 Pro is the cheapest GPT-4o-class option by a meaningful margin[3].
If your workload is "lots of small inference calls", Vertex Gemini will save you money. If it is "large prompts with few outputs", any of them works.
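To check how your own token mix changes the bill, here is a minimal sketch. Only the Gemini 2.5 Pro prices ($1.25 in / $10 out per 1M tokens) come from the chart source above. The GPT-4o baseline of $2.50 in / $10 out is an assumed list price that you should verify against the current price page, and the Claude row simply applies the 20 and 50 percent uplifts quoted above to that baseline. The function names, dictionary keys, and workload shapes are illustrative.

```python
# Prices in USD per 1M tokens. Only the Gemini row is taken from the chart
# source; the GPT-4o baseline is an ASSUMED list price, and the Claude row is
# derived from the "20% / 50% more expensive" figures quoted in the text.
GPT4O_BASELINE = {"input": 2.50, "output": 10.00}  # assumption, verify before use

PRICES = {
    "azure-gpt-4o": GPT4O_BASELINE,
    "bedrock-claude-3.5-sonnet": {
        "input": GPT4O_BASELINE["input"] * 1.20,
        "output": GPT4O_BASELINE["output"] * 1.50,
    },
    "vertex-gemini-2.5-pro": {"input": 1.25, "output": 10.00},
}


def monthly_cost(option, calls, input_tokens, output_tokens):
    """Return the USD cost of `calls` requests with the given tokens per call."""
    price = PRICES[option]
    per_call = (
        input_tokens * price["input"] + output_tokens * price["output"]
    ) / 1_000_000
    return calls * per_call


def compare(calls, input_tokens, output_tokens):
    """Return {option: cost}, ordered from cheapest to most expensive."""
    costs = {
        name: monthly_cost(name, calls, input_tokens, output_tokens)
        for name in PRICES
    }
    return dict(sorted(costs.items(), key=lambda item: item[1]))


if __name__ == "__main__":
    # Two hypothetical workload shapes: (calls, input tokens, output tokens).
    shapes = {
        "many small calls": (1_000_000, 500, 200),
        "large prompts, short outputs": (50_000, 20_000, 300),
    }
    for label, shape in shapes.items():
        print(label)
        for name, cost in compare(*shape).items():
            print(f"  {name}: ${cost:,.2f}")
```

Run it with your own call counts and token sizes before trusting any rule of thumb; the ranking depends on how input-heavy or output-heavy your traffic is.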
Latency
Source: My own bench, 100 req/min
Azure OpenAI EU is the fastest for users in London. Vertex europe-west2 is second. Bedrock eu-west is the slowest, partly because Anthropic's models are routed through US backends in some regions even when you select EU.
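For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a paced latency test. It is not the exact harness behind the numbers above: `run_bench` and its parameters are illustrative names, the loop is sequential and single-threaded, and `call` stands in for whatever single request you send to the endpoint under test.

```python
import statistics
import time


def run_bench(call, requests=100, rate_per_min=100, sleep=time.sleep):
    """Time `call()` `requests` times, paced at `rate_per_min`.

    Returns p50, p95 and max latency in milliseconds. Needs at least two
    requests, because statistics.quantiles requires two data points.
    """
    interval = 60.0 / rate_per_min
    latencies_ms = []
    next_start = time.perf_counter()
    for _ in range(requests):
        # Wait for the scheduled slot. If a call overran its slot, the next
        # one starts immediately instead of drifting the whole schedule.
        delay = next_start - time.perf_counter()
        if delay > 0:
            sleep(delay)
        started = time.perf_counter()
        call()
        latencies_ms.append((time.perf_counter() - started) * 1000)
        next_start += interval
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "p50": statistics.median(latencies_ms),
        "p95": cuts[94],
        "max": max(latencies_ms),
    }
```

Pass the same prompt and output length to every provider, run from the region your users are in, and compare p95 and not just the median, since tail latency is usually what users notice.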
Pick by workload
| Spec | Azure OpenAI | AWS Bedrock | GCP Vertex |
|---|---|---|---|
| Headline model | GPT-4o, o1 | Claude 3.5 / 3.7 | Gemini 2.5 Pro |
| Compliance / SOC2 | Excellent | Excellent | Excellent |
| Private endpoint | Yes (Private Link) | Yes (PrivateLink) | Yes (PSC) |
| Token caching | Yes | Yes | Yes (best) |
| Throughput tier | PTU (reserved) | On-demand + Provisioned Throughput | On-demand + commitments |
| Best for | GPT exclusive workloads | Claude + multi-model | Cheapest Gemini at scale |
| Vendor lock | High (Azure SDK) | Medium (Bedrock SDK) | Medium (Vertex SDK) |
The decision is not really "which is fastest" or "which is cheapest." It is:
- Are you locked into Azure/AWS/GCP for non-AI reasons (existing footprint)?
- Do you have a model preference (GPT-4o → Azure, Claude → Bedrock, Gemini → Vertex)?
- Do you need PTU/reserved capacity (Azure has the best story here)?
What I actually do
For SarmaLink-AI and other Sarma-side products: Vertex Gemini 2.5 Flash for cheap, fast, large-volume calls. Azure OpenAI for anything that absolutely needs GPT-4o or o1. Bedrock for Claude when I want extended thinking models.
The right answer in 2026 is "use whichever model fits the task" and pick the cloud that hosts that model with the best pricing in your region.
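The split described above can be written down as a small routing table. This is a sketch only: the task labels, the `Route` type, the model strings (which are not real API model IDs), and the choice to fall back to the cheap route for unknown tasks are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    cloud: str
    model: str


# Task labels are hypothetical; the cloud/model pairs follow the split
# described in the text. Model strings are placeholders, not API model IDs.
ROUTES = {
    "bulk": Route("vertex", "gemini-2.5-flash"),
    "gpt_required": Route("azure-openai", "gpt-4o"),
    "extended_thinking": Route("bedrock", "claude"),
}

DEFAULT_TASK = "bulk"  # assumption: unknown tasks go to the cheapest route


def pick_route(task: str) -> Route:
    """Return the route for a task label, falling back to the default."""
    return ROUTES.get(task, ROUTES[DEFAULT_TASK])


if __name__ == "__main__":
    for task in ("bulk", "gpt_required", "extended_thinking", "something_else"):
        route = pick_route(task)
        print(f"{task} -> {route.cloud} / {route.model}")
```

Keeping the mapping in one place makes it cheap to move a task to a different cloud when regional pricing changes.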
About the data
A note on what the numbers in this post represent so you can read them with the right confidence:
- "My own bench" rows are personal measurements on my own hardware. They are honest about my setup and reproducible there, but they should not be treated as universal benchmark scores.
- Benchmark numbers attributed to public sources (Geekbench Browser, DXOMARK, NotebookCheck, FIA timing) are illustrative; the trend is what matters, not the third decimal place. Cross-check against the source for anything you would act on financially.
- Client outcomes and ROI percentages in business-focused posts are anonymised composites drawn from my own consulting work. Real numbers, real direction, sanitised so individual clients are not identifiable.
- Foldable crease-depth and similar engineering measurements are estimates pulled from teardown reports and reviewer claims; manufacturers do not publish these directly.
- Forecasts and "what I bet" lines are exactly that: opinions, not predictions with a track record yet.
If you spot a number that contradicts a source you trust, tell me. I would rather correct it than be the chart that was off by 6 percent and pretended otherwise.