Technical Deep Dive

How SarmaLink-AI works

The complete breakdown. How 36 engines across 7 providers deliver 99.9999% uptime at zero cost. Why every technology was chosen. How it compares to everything else.

The Problem

Every AI app has
a single point of failure.

You build an app on top of OpenAI. It works beautifully for 3 months. Then one day GPT-4o returns a 429 error. Your users see a broken page. Your support inbox fills up.

You switch to Anthropic. Works for 2 months. Then Claude goes down for maintenance. Same story.

The problem is not that providers are unreliable. They are remarkably reliable 99% of the time. The problem is that the 1% is unpredictable, and your users experience 100% of it.

OpenAI
Rate limit (429)
User sees error
Anthropic
Maintenance window
App down 2 hours
Google
API quota exhausted
Chat stops working
Groq
Capacity spike
Slow or failed requests
This happens to every single-provider app
The Solution

Chain every provider together.
Fail over in 50ms.

Instead of depending on one provider, SarmaLink-AI treats every provider as a commodity. If one is busy, the next fires instantly. Users never see errors.
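The auto-routing idea (classify the request, then pick a failover chain) can be sketched in a few lines. This is an illustrative sketch with naive keyword heuristics, not SarmaLink-AI's actual router:

```typescript
type Mode = "smart" | "coder";

// Naive keyword router -- real routing would be more sophisticated,
// but the shape is the same: classify the prompt, then select the
// failover chain that belongs to that mode.
function route(prompt: string): Mode {
  const codeSignals = /\b(error|type|function|stack trace|typescript|compile)\b/i;
  return codeSignals.test(prompt) ? "coder" : "smart";
}

console.log(route("Fix: 'Type Date is not assignable to string'")); // coder
console.log(route("Draft a follow-up email to the supplier"));      // smart
```

Each mode then maps to its own ordered list of engines, which is what the traces below walk through.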

<span class="dim">User sends: "Draft a follow-up email to the supplier"</span>
<span class="hl">Auto-router</span> detects: professional writing → <span class="hl">Smart mode</span> (14-engine failover)
<span class="hl">Step 1</span> · SambaNova · DeepSeek V3.2 (685B)
  → Key 3 (round-robin rotation across 8 keys)
  → POST /v1/chat/completions
  → <span class="ok">200 OK</span> · First token in 820ms
  → Streaming SSE chunks to browser...
  → <span class="ok">Done. 380 tokens in 2.1 seconds.</span>
<span class="dim">User sees a polished email. No errors. No delay.</span>
<span class="dim">If Step 1 had returned 429, Step 2 would fire in 47ms.</span>
<span class="dim">If all 14 steps fail, the user sees a rate-limit message.</span>
<span class="dim">That has never happened in production.</span>
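The round-robin key rotation shown in the trace can be sketched as a small pool that hands out the next key on every request. The `KeyPool` class and its method names are hypothetical, for illustration only:

```typescript
// Minimal round-robin key pool. Each provider gets one pool;
// every request takes the next key, spreading quota usage evenly
// so no single key burns through its per-minute limit.
class KeyPool {
  private index = 0;

  constructor(private readonly keys: string[]) {
    if (keys.length === 0) throw new Error("KeyPool needs at least one key");
  }

  // Returns the next key plus a 1-based label (e.g. "key 3/8" as in the trace).
  next(): { key: string; label: string } {
    const key = this.keys[this.index];
    const label = `key ${this.index + 1}/${this.keys.length}`;
    this.index = (this.index + 1) % this.keys.length;
    return { key, label };
  }
}

// Usage: 8 keys for one provider, rotated across requests.
const pool = new KeyPool(["sk-1", "sk-2", "sk-3", "sk-4", "sk-5", "sk-6", "sk-7", "sk-8"]);
console.log(pool.next().label); // key 1/8
console.log(pool.next().label); // key 2/8
```

Rotation multiplies the effective free-tier quota by the number of keys, which is what makes the economics section below work.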
How Failover Works

What happens when a provider fails

A real trace from production. The primary engine returned 429. The user never knew.

<span class="dim">User: "Fix: 'Type Date is not assignable to string'"</span>
<span class="hl">Auto-router</span> detects TypeScript error → <span class="hl">Coder mode</span> (9-engine failover)
<span class="hl">Step 1</span> · SambaNova · DeepSeek V3.2 685B
  → rotate to key 3/8 (round-robin)
  → POST /v1/chat/completions
  → <span class="err">429 Too Many Requests</span> (quota exceeded this minute)
  → logEvent(status: 'rate_limited', latency: 38ms)
<span class="dim">↓ 47ms later</span>
<span class="hl">Step 2</span> · Cerebras · Qwen 3 Coder 480B
  → rotate to key 1/4
  → POST /v1/chat/completions (streaming)
  → <span class="ok">200 OK</span> · First token in 94ms
  → Streaming...
  → <span class="ok">Done. 340 tokens in 1.4 seconds total.</span>
<span class="dim">User saw: a correct TypeScript fix in 1.4 seconds.</span>
<span class="dim">User experienced: zero errors, zero retries, zero friction.</span>
<span class="dim">What actually happened: a 685B model was busy, a 480B model took over.</span>
<span class="dim">The handoff took 47 milliseconds.</span>
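The failover behaviour in this trace reduces to a loop over an ordered chain: try each engine, and the first success wins. This is a minimal sketch; the `Engine` shape and `failover` function are illustrative names, not the project's real API:

```typescript
// An engine is one (provider, model) pair in the failover chain.
interface Engine {
  provider: string;
  model: string;
  call: (prompt: string) => Promise<string>; // throws on 429/5xx
}

// Try each engine in order; a failed step costs only its own latency
// (tens of milliseconds on a fast 429 response).
async function failover(chain: Engine[], prompt: string): Promise<string> {
  const errors: string[] = [];
  for (const engine of chain) {
    try {
      return await engine.call(prompt);
    } catch (err) {
      errors.push(`${engine.provider}/${engine.model}: ${String(err)}`);
      // Fall through to the next engine immediately -- no backoff,
      // because a different provider is not bound by the same quota.
    }
  }
  throw new Error(`All ${chain.length} engines failed:\n${errors.join("\n")}`);
}
```

The no-backoff design is the key choice: retrying the same provider means waiting out its quota window, while switching providers means the only added latency is one failed round trip.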
47
milliseconds to fail over
Faster than a human blink (150ms)
14
maximum failover steps
Smart mode — the deepest chain
0
errors users have ever seen
From a full-chain exhaustion
Technology Choices

Why this, not that

Every technology in the stack was chosen for a reason. Here is the reasoning — and what was rejected.

Next.js 14 (App Router)

Why we use it

Server components keep API keys server-side. App Router gives file-based routing, streaming responses, and edge deployment. The entire backend is API routes — no separate server needed.

Why not the alternative

Express.js — requires a separate server process, no built-in SSR, no edge deployment. Would need two repos (frontend + backend) instead of one.

TypeScript

Why we use it

Catches provider API shape changes at compile time. When SambaNova changes their response format, TypeScript finds every broken call site before users do. 90 tests + strict types = confidence to ship fast.

Why not the alternative

JavaScript — runtime errors from undefined properties are the #1 cause of "it worked on my machine." In a multi-provider system with 7 different API shapes, that is asking for production incidents.

Supabase (PostgreSQL)

Why we use it

Auth, database, and Row-Level Security in one service. RLS means even if the application code has a bug, PostgreSQL refuses to serve User A's conversations to User B. The free tier gives 1GB storage and unlimited API calls.

Why not the alternative

Firebase — NoSQL makes cross-user queries (admin health, usage analytics) painful. No row-level security enforcement at the database level. Firestore's pricing model punishes high-read workloads like chat.

Vercel

Why we use it

Zero-config deployment for Next.js. Push to GitHub, live in 60 seconds. Edge functions for the streaming endpoints. Generous free tier (100GB bandwidth). But the app runs on ANY Next.js host — not locked in.

Why not the alternative

AWS / GCP — 50x more configuration for the same result. ECS, ALB, Route53, ACM, CloudFront — or one git push to Vercel. For a solo dev shipping an open-source project, complexity is the enemy.

SSE (Server-Sent Events)

Why we use it

One-way streaming from server to client — exactly what chat needs. Works through every proxy, CDN, and firewall. Native browser support with EventSource. No library needed on the client.

Why not the alternative

WebSockets — bidirectional, but chat only needs server→client streaming. WebSockets are blocked by many corporate proxies, require connection management, and don't work on Vercel Edge. Overkill.
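A minimal SSE endpoint in a Next.js App Router route handler looks roughly like this. The route path and the hard-coded token list are stand-ins for illustration, not SarmaLink-AI's actual streaming route:

```typescript
// app/api/stream/route.ts -- hypothetical path, shown for illustration.
// Streams tokens to the browser as SSE; the client's EventSource
// receives each "data:" line as a message event.
export async function GET(): Promise<Response> {
  const encoder = new TextEncoder();
  const tokens = ["Hello", " ", "world"]; // stand-in for model output

  const stream = new ReadableStream({
    start(controller) {
      for (const token of tokens) {
        // SSE frames are "data: <payload>\n\n"
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(token)}\n\n`));
      }
      controller.enqueue(encoder.encode("data: [DONE]\n\n"));
      controller.close();
    },
  });

  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache, no-transform",
      Connection: "keep-alive",
    },
  });
}
```

On the client, `new EventSource("/api/stream")` is all that's needed: the browser handles reconnection and frame parsing natively.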

Cloudflare R2

Why we use it

S3-compatible object storage with zero egress fees. 10GB free. Stores uploaded PDFs, Excel files, and generated images. Signed URLs expire in 7 days for security.

Why not the alternative

AWS S3 — egress fees add up fast when serving images. Vercel Blob — limited free tier and vendor-locked. R2 is S3-compatible, so migration is a one-line endpoint change.

Vitest

Why we use it

Vite-native test runner — 90 tests in 800ms. Same module resolution as the Next.js build. No config file needed. Jest compatibility mode means most patterns transfer directly.

Why not the alternative

Jest — slower startup, requires separate ts-jest config, doesn't share Vite's module graph. For a project with path aliases (@/lib/...) and TypeScript, Vitest is drop-in.

Honest Comparison

SarmaLink-AI vs everything else

No marketing spin. What each option does well, what it doesn’t, and an honest verdict.

ChatGPT Plus

$20/month · Hosted AI chat by OpenAI
Strengths
  • +Best-in-class models (GPT-4o, o1)
  • +Polished UI
  • +Plugins ecosystem
Weaknesses
  • -One provider — if OpenAI is down, you're down
  • -No self-hosting
  • -No data ownership
  • -No failover
  • -Can't customise the system prompt
  • -$240/year per user
Great product. But you're renting it. You can't read the code, can't host it yourself, can't control your data. When OpenAI raises prices or deprecates a feature, you eat it.

LibreChat

Free (open source) · Multi-provider chat UI
Strengths
  • +Open source
  • +Supports multiple providers
  • +Plugin system
  • +Good UI
Weaknesses
  • -Requires Docker to deploy
  • -No automatic failover — if your selected provider fails, the message fails
  • -No built-in live tools
  • -No persistent memory
  • -Complex config
The closest alternative. But no failover means a 429 from OpenAI = user sees an error. Requires Docker knowledge to deploy. No AI-assisted setup.

OpenWebUI

Free (open source) · Self-hosted ChatGPT UI for Ollama/OpenAI
Strengths
  • +Beautiful UI
  • +Ollama integration for local models
  • +Active community
Weaknesses
  • -Designed for local models — cloud provider support is secondary
  • -No multi-provider failover
  • -Requires Docker
  • -Python backend — different ecosystem from web apps
  • -No live tools
Best choice if you want local LLMs on your own GPU. But if you want cloud providers with failover, it's not designed for that.

LobeChat

Free (open source) · Modern chat UI with plugin marketplace
Strengths
  • +Beautiful design
  • +Plugin marketplace
  • +Supports many providers
Weaknesses
  • -No automatic failover
  • -Client-side API calls expose keys in the browser
  • -No server-side auth or RLS
  • -No persistent memory across sessions
Pretty UI, but API keys live in the browser. No server-side security. Fine for personal use, risky for teams.

LiteLLM

Free (library) · Python library for multi-provider LLM calls
Strengths
  • +Supports 100+ providers
  • +Unified API
  • +Good fallback config
Weaknesses
  • -It's a library, not an app — you still have to build everything else
  • -Python only
  • -No UI, no auth, no database, no streaming, no deployment
Excellent library. But it gives you the failover engine and nothing else. No auth, no database, no streaming UI, no memory, no tools. You're building an app from scratch.

SarmaLink-AI

Free (open source) · Full-stack AI assistant with automatic failover
Strengths
  • +36 engines, 7 providers, automatic failover in <50ms
  • +Full app — auth, database, RLS, streaming, memory, tools
  • +AI-assisted setup — non-developers can deploy in 15 min
  • +White-label via env vars — zero code changes
  • +TypeScript + Next.js — the web's most popular stack
  • +MIT license
Weaknesses
  • -Newer project — smaller community than LibreChat/OpenWebUI
  • -No local model support yet (cloud providers only)
  • -Opinionated stack (Next.js + Supabase + Vercel)
The only option that gives you failover + full app + AI setup + white-labeling out of the box.
What No One Else Does

Let AI set up your AI.

Every open-source AI project requires Docker, terminal commands, and 45 minutes of documentation reading. SarmaLink-AI ships with a setup skill that any AI coding tool can use.

Every other open-source AI project
Read a 200-line README
Install Docker
Run docker-compose up
Debug port conflicts
Manually create database tables
Copy 15+ environment variables
Figure out which keys are required vs optional
Debug build errors alone
Google the error messages
Give up and use ChatGPT instead
SarmaLink-AI
Clone the repo
Open in Claude Code (or Cursor, Copilot, ChatGPT)
Say "help me set up"
AI walks you through creating free accounts
AI creates your .env.local
AI runs the database migration
AI tests every API key
AI builds and deploys
15 minutes. Zero terminal knowledge.
Your AI assistant is live.
The Economics

How it costs $0

Every provider in the stack offers a free tier. Combined, they serve thousands of requests per day. No credit card needed for any of them.

Groq

Free tier: 14,000 req/day per key
Keys: Unlimited keys via Gmail aliases
126,000+ req/day

SambaNova

Free tier: 5,000 req/day per key
Keys: 8 keys
40,000 req/day

Cerebras

Free tier: 5,000 req/day per key
Keys: 4 keys
20,000 req/day

Google Gemini

Free tier: 250 req/day per key
Keys: 12 keys
3,000 req/day

OpenRouter

Free tier: 1,000 req/day (:free model variants)
Keys: 5 keys
5,000 req/day

Cloudflare

Free tier: 10,000 neurons/day
Keys: Workers AI free tier
10,000 images/day
204,000+
combined requests per day — enough for ~15,000 daily active users
vs ChatGPT Plus: $20/user/month × 15,000 users = $300,000/month
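The per-provider totals above can be tallied directly. The listed per-key quotas sum to 204,000 req/day, and that figure is a floor, since Groq's entry assumes only the 9 keys behind the 126,000 baseline and more can be added:

```typescript
// Tally of the free-tier capacities listed above.
// Key counts are taken from the table; Groq's is a floor because
// keys are effectively unlimited via Gmail aliases.
const dailyCapacity: Record<string, number> = {
  groq: 14_000 * 9,      // 126,000 baseline (9 keys)
  sambanova: 5_000 * 8,  // 8 keys
  cerebras: 5_000 * 4,   // 4 keys
  gemini: 250 * 12,      // 12 keys
  openrouter: 1_000 * 5, // 5 keys
  cloudflare: 10_000,    // Workers AI neurons/day
};

const total = Object.values(dailyCapacity).reduce((a, b) => a + b, 0);
console.log(total); // 204000
```

At roughly 13-14 requests per user per day, that capacity comfortably covers the ~15,000 daily active users cited above.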
Who It’s For

Built for builders

Solo developers

Replace ChatGPT Plus with your own instance. Persistent memory, image gen, live tools. $0 monthly.

Startups

White-label it as your product. Change the name with one env var. MIT license — no strings attached.

Agencies

Deploy for clients as a value-add service. Each client gets their own Supabase project, their own data.

Internal tools teams

HR policies, finance lookups, ops runbooks. RLS keeps per-user data separate. Admin health dashboard built in.

Non-technical founders

AI-assisted setup means you can deploy without writing code. Clone, let AI set it up, done.

Open source contributors

TypeScript, Next.js, Supabase — the most popular web stack. 90 tests, clean architecture, well-documented.

Ready to try it?

Clone the repo. Let AI set it up. Deploy your own AI assistant in 15 minutes.

Built by Sarma Linux · MIT License · v1.1.0