Technical Deep Dive

How SarmaLink-AI works

The complete breakdown. How 36 engines across 7 providers deliver 99.9999% uptime at zero cost. Why every technology was chosen. How it compares to everything else.


The Problem

Every AI app has a single point of failure.

You build an app on top of OpenAI. It works beautifully for 3 months. Then one day GPT-4o returns a 429 error. Your users see a broken page. Your support inbox fills up.

You switch to Anthropic. Works for 2 months. Then Claude goes down for maintenance. Same story.

The problem is not that providers are unreliable. They are remarkably reliable 99% of the time. The problem is that the other 1% is unpredictable, and your users experience 100% of it.

OpenAI: Rate limit (429) → User sees error

Anthropic: Maintenance window → App down 2 hours

Google: API quota exhausted → Chat stops working

Groq: Capacity spike → Slow or failed requests

This happens to every single-provider app

The Solution

Chain every provider together. Fail over in 50ms.

Instead of depending on one provider, SarmaLink-AI treats every provider as a commodity. If one is busy, the next fires instantly. Users never see errors.

User sends: "Draft a follow-up email to the supplier"
Auto-router detects: professional writing → Smart mode (36-engine failover)

Step 1 · SambaNova · DeepSeek V3.2 (685B)
  → Key 3 (round-robin rotation across 8 keys)
  → POST /v1/chat/completions
  → 200 OK · First token in 820ms
  → Streaming SSE chunks to browser...
  → Done. 380 tokens in 2.1 seconds.

User sees a polished email. No errors. No delay.

If Step 1 had returned 429, Step 2 would fire in 47ms.
If all 14 steps fail, the user sees a rate-limit message.
That has never happened in production.
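The chain walk in the trace above can be sketched as a simple loop. This is a minimal illustration, not the project's actual implementation: the `Step`, `CallFn` and `runChain` names, and the set of status codes treated as retryable, are assumptions made for the example.

```typescript
// Hypothetical types for illustration; not taken from the SarmaLink-AI source.
type Step = { provider: string; model: string };
type CallResult = { status: number; text?: string };
type CallFn = (step: Step) => Promise<CallResult>;

type FailoverOutcome =
  | { ok: true; step: Step; text: string; attempts: number }
  | { ok: false; attempts: number; message: string };

// Assumption: these statuses mean "this engine is busy or broken, try the next one".
const RETRYABLE = new Set([429, 500, 502, 503, 504]);

async function runChain(steps: Step[], call: CallFn): Promise<FailoverOutcome> {
  let attempts = 0;
  for (const step of steps) {
    attempts++;
    try {
      const res = await call(step);
      if (res.status === 200 && res.text !== undefined) {
        return { ok: true, step, text: res.text, attempts };
      }
      if (!RETRYABLE.has(res.status)) {
        // For example a 400: the request itself is bad, so another engine will not help.
        return { ok: false, attempts, message: `Request rejected (${res.status})` };
      }
    } catch {
      // Network error or timeout: fall through to the next step.
    }
  }
  // Every step was tried and none succeeded: surface a rate-limit style message.
  return { ok: false, attempts, message: "All engines are busy. Please retry shortly." };
}
```

Passing the provider call in as a function keeps the loop testable without network access; a real deployment would wrap the HTTP request and streaming in `call`.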

How Failover Works

What happens when a provider fails

A real trace from production. The primary engine returned 429. The user never knew.

User: "Fix: 'Type Date is not assignable to string'"
Auto-router detects TypeScript error → Coder mode (9-engine failover)

Step 1 · SambaNova · DeepSeek V3.2 685B
  → rotate to key 3/8 (round-robin)
  → POST /v1/chat/completions
  → 429 Too Many Requests (quota exceeded this minute)
  → logEvent(status: 'rate_limited', latency: 38ms)

  ↓ 47ms later

Step 2 · Cerebras · Qwen 3 Coder 480B
  → rotate to key 1/4
  → POST /v1/chat/completions (streaming)
  → 200 OK · First token in 94ms
  → Streaming...
  → Done. 340 tokens in 1.4 seconds total.

User saw: a correct TypeScript fix in 1.4 seconds.
User experienced: zero errors, zero retries, zero friction.

What actually happened: a 685B model was busy, a 480B model took over.
The handoff took 47 milliseconds.
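The "rotate to key 3/8 (round-robin)" lines in the trace suggest a per-provider key ring. A minimal sketch, assuming in-memory state; the `KeyRing` class is a hypothetical name, and on a serverless host each instance would keep its own counter unless the position is stored somewhere shared.

```typescript
// Hypothetical helper: hands out API keys in round-robin order.
class KeyRing {
  private next = 0;
  private readonly keys: string[];

  constructor(keys: string[]) {
    if (keys.length === 0) throw new Error("KeyRing needs at least one key");
    this.keys = [...keys];
  }

  // Returns the next key plus its 1-based position, e.g. "key 3/8" in a log line.
  take(): { key: string; position: number; total: number } {
    const index = this.next;
    this.next = (this.next + 1) % this.keys.length;
    return { key: this.keys[index], position: index + 1, total: this.keys.length };
  }
}

// Usage: one ring per provider, consulted before every request.
const ring = new KeyRing(["key-a", "key-b", "key-c"]);
const first = ring.take();
console.log(`rotate to key ${first.position}/${first.total}`);
```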

<50 milliseconds to fail over
Faster than a human blink (150ms)

14 maximum failover steps
Smart mode, the deepest chain

0 errors users have ever seen
From a full-chain exhaustion

v2 release · ten new capabilities

From a failover gateway to a full runtime

v2 widens the gateway from a chat completions failover into an agent runtime, a voice stack, a live-data layer and a quota-aware tool catalog. Every piece ships on the same Next.js app, same Supabase project, same MIT licence.

Intent auto-router

A regex sweep plus a tiny LLM classifier picks the mode per message: Smart, Reasoner, Coder, Fast, Live or Vision. All are auto-selected, with no UI toggle required.
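The regex stage of such a router might look like the sketch below. The patterns and the `pickMode` function are illustrative assumptions, not the contents of lib/intent.ts; a `null` result stands for "no regex hit, ask the small classifier".

```typescript
type Mode = "Smart" | "Reasoner" | "Coder" | "Fast" | "Live" | "Vision";

// Hypothetical rules; the real pattern table is not shown in this document.
const MODE_RULES: Array<{ mode: Mode; pattern: RegExp }> = [
  { mode: "Coder", pattern: /\b(typescript|javascript|python|stack trace|is not assignable)\b/i },
  { mode: "Live", pattern: /\b(weather|exchange rate|headlines)\b/i },
  { mode: "Reasoner", pattern: /\b(prove|derive|step by step)\b/i },
];

function pickMode(message: string, hasImage = false): Mode | null {
  if (hasImage) return "Vision"; // an attached image settles the question early
  for (const rule of MODE_RULES) {
    if (rule.pattern.test(message)) return rule.mode;
  }
  return null; // no match: the caller would fall back to the LLM classifier
}

console.log(pickMode("Fix: 'Type Date is not assignable to string'")); // Coder
```

Running the cheap regex pass first means the classifier model is only called for messages the patterns cannot place.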

Multi-step agent runner

POST /api/v1/agent runs planner, workers and synthesiser server-side, streams every step over SSE.
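One way to picture the planner, workers and synthesiser streaming over SSE is an async generator that yields one frame per step. This is a sketch under assumptions: the `Agent` interface, event names and `runAgent` are hypothetical, and only the `event:` / `data:` framing follows the Server-Sent Events format.

```typescript
type AgentEvent =
  | { type: "plan"; tasks: string[] }
  | { type: "step"; task: string; output: string }
  | { type: "final"; answer: string };

// One SSE frame: an event name, a single-line JSON payload, then a blank line.
function toSseFrame(event: AgentEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}

// Hypothetical agent interface; each function would call a model in practice.
type Agent = {
  plan: (goal: string) => Promise<string[]>;
  work: (task: string) => Promise<string>;
  synthesise: (goal: string, outputs: string[]) => Promise<string>;
};

async function* runAgent(goal: string, agent: Agent): AsyncGenerator<string> {
  const tasks = await agent.plan(goal);
  yield toSseFrame({ type: "plan", tasks });

  const outputs: string[] = [];
  for (const task of tasks) {
    const output = await agent.work(task);
    outputs.push(output);
    yield toSseFrame({ type: "step", task, output }); // client sees each step as it lands
  }

  const answer = await agent.synthesise(goal, outputs);
  yield toSseFrame({ type: "final", answer });
}
```

A route handler could pipe these frames into a streaming response with the `text/event-stream` content type.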

MCP-shaped catalog

Bearer-protected /api/v1/mcp exposes list_tools and plugins in the Model Context Protocol shape. Any MCP-aware client mounts the gateway with one URL.

TTS cascade

MeloTTS on Cloudflare Workers AI first, Gemini TTS as fallback. Streamable opus or mp3 audio.

STT route

Groq Whisper primary, Cloudflare Workers AI fallback. Sub-second transcription on a clean clip.

Zero-key live tools

Open-Meteo for weather, Frankfurter (ECB) for FX, HN Algolia for news. No tokens, no per-call cost.
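As a rough illustration of what "zero-key" means, the three services can be queried with plain URLs and no auth header. The endpoint paths and parameters below reflect the public APIs as commonly documented and should be checked against each provider's current docs; the function names are made up for this example.

```typescript
// Open-Meteo: current conditions for a coordinate pair.
function weatherUrl(lat: number, lon: number): string {
  const url = new URL("https://api.open-meteo.com/v1/forecast");
  url.searchParams.set("latitude", String(lat));
  url.searchParams.set("longitude", String(lon));
  url.searchParams.set("current", "temperature_2m,wind_speed_10m");
  return url.toString();
}

// Frankfurter: latest ECB reference rate between two currencies.
function fxUrl(from: string, to: string): string {
  const url = new URL("https://api.frankfurter.app/latest");
  url.searchParams.set("from", from);
  url.searchParams.set("to", to);
  return url.toString();
}

// HN Algolia: story search.
function newsUrl(query: string): string {
  const url = new URL("https://hn.algolia.com/api/v1/search");
  url.searchParams.set("query", query);
  url.searchParams.set("tags", "story");
  return url.toString();
}

// No Authorization header anywhere: a plain GET is enough.
async function getJson(url: string): Promise<unknown> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Live tool failed: ${res.status}`);
  return res.json();
}
```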

FLUX with key rotation

Image generation across paired Cloudflare account and token pairs. When one cap hits, the next pair fires.

Quota tracker

GET /api/v1/quota returns per-user and company-wide counters from a Supabase view.

Smart suggestions

After each reply, the gateway returns three grounded follow-up prompts, ready to render as chips in the UI.

Reasoning-leak stripper

Chain-of-thought wrappers and internal commentary scrubbed from the stream before the client sees them.
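Stripping a wrapper from a stream is trickier than from a finished string, because a tag can be split across chunks. The sketch below assumes the wrapper is a `<think>...</think>` pair, which some reasoning models emit; the actual wrappers the project scrubs are not listed here, and `createLeakStripper` is a hypothetical name.

```typescript
const OPEN = "<think>";
const CLOSE = "</think>";

function createLeakStripper() {
  let buffer = "";
  let inside = false; // true while we are between OPEN and CLOSE

  // Length of the longest suffix of `text` that could be the start of `tag`.
  function partialTagLength(text: string, tag: string): number {
    const max = Math.min(text.length, tag.length - 1);
    for (let n = max; n > 0; n--) {
      if (text.endsWith(tag.slice(0, n))) return n;
    }
    return 0;
  }

  // Feed one chunk in, get the safe-to-show text out.
  function push(chunk: string): string {
    buffer += chunk;
    let out = "";
    while (true) {
      const tag = inside ? CLOSE : OPEN;
      const at = buffer.indexOf(tag);
      if (at === -1) {
        // Hold back a possible half-received tag until the next chunk arrives.
        const keep = partialTagLength(buffer, tag);
        if (!inside) out += buffer.slice(0, buffer.length - keep);
        buffer = buffer.slice(buffer.length - keep);
        return out;
      }
      if (!inside) out += buffer.slice(0, at);
      buffer = buffer.slice(at + tag.length);
      inside = !inside;
    }
  }

  // Call once the upstream stream ends to release any held-back visible text.
  function flush(): string {
    const rest = inside ? "" : buffer;
    buffer = "";
    return rest;
  }

  return { push, flush };
}
```

Holding back only the characters that could still turn into a tag keeps added latency to a few bytes per chunk.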

Markdown to PDF, JSON to XLSX

PDFKit and ExcelJS endpoints turn any answer into a print-ready PDF or a structured spreadsheet.


v2 architecture. Auto-router → agent or mode failover → cascades and tools on the side → leak stripper, quota tracker, suggestions, export endpoints.

v1.2, Released 2026-05-03

Now it routes to other open-source repos too

The v1.1 router picked an LLM. The v1.2 router can pick an entire sibling repo first: research goes to rag-over-pdf, voice goes to voice-agent-starter, evals go to ai-eval-runner. The LLM is the fallback, not the default.

Cross-repo plugins

lib/plugins/index.ts registers ten sibling repos as routable tools: voice-agent-starter, agent-orchestrator, ai-eval-runner, local-llm-router, mcp-server-toolkit, rag-over-pdf, receipt-scanner, webhook-to-email, k8s-ops-toolkit, terraform-stack.

Each plugin has one env var that holds its endpoint. If you set RAG_OVER_PDF_URL on your deployment, the rag plugin lights up. If you don’t, it stays dormant. List everything live at GET /api/v1/plugins; dispatch via POST /api/v1/plugins/invoke.

Intent auto-routing

lib/services/plugin-autorouter.ts runs before the LLM. It scans the user’s message for intent keywords and, when one matches an enabled plugin, dispatches there directly. No model call, no token cost, no failover chain, just the right tool for the job.

This is distinct from lib/intent.ts, which picks a mode (Smart / Reasoner / Coder / Live / Fast / Vision). The new router picks a plugin. Both can fire: plugin first, mode second. Gated by ENABLE_PLUGIN_AUTOROUTE (off by default).

User: "find me research on retrieval-augmented generation latency"

Plugin auto-router scans message
  → intent: research
  → match table: /\b(research|cite|literature)\b/i → slug: rag-over-pdf
  → registry lookup: RAG_OVER_PDF_URL is set → enabled
  → dispatch POST {RAG_OVER_PDF_URL}/query
  → 200 OK · streamed citations + chunks
  → result streamed straight back to browser

No LLM call. No failover chain. The right tool fired directly.

If RAG_OVER_PDF_URL was unset, the message would fall through
to the standard mode router and hit Smart mode like in v1.1.
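The match-then-lookup flow in the trace can be sketched as follows. The research pattern, `RAG_OVER_PDF_URL` and `ENABLE_PLUGIN_AUTOROUTE` come from the text above; everything else (the `routePlugin` name, the flag value `"true"`, passing the environment in as an argument) is an assumption for illustration and not the contents of lib/services/plugin-autorouter.ts.

```typescript
type PluginRule = { slug: string; envVar: string; pattern: RegExp };
type Env = Record<string, string | undefined>;

// Only the first rule is taken from the trace; a real table would hold one per plugin.
const PLUGIN_RULES: PluginRule[] = [
  { slug: "rag-over-pdf", envVar: "RAG_OVER_PDF_URL", pattern: /\b(research|cite|literature)\b/i },
];

function routePlugin(message: string, env: Env): { slug: string; endpoint: string } | null {
  // Assumption: the feature flag is enabled with the literal string "true".
  if (env["ENABLE_PLUGIN_AUTOROUTE"] !== "true") return null;

  for (const rule of PLUGIN_RULES) {
    const endpoint = env[rule.envVar];
    // A plugin is live only when its endpoint env var is set AND the intent matches.
    if (endpoint && rule.pattern.test(message)) {
      return { slug: rule.slug, endpoint };
    }
  }
  return null; // fall through to the mode router
}

// Usage: a null result means "no plugin claimed this message".
const hit = routePlugin("find me research on retrieval-augmented generation latency", {
  ENABLE_PLUGIN_AUTOROUTE: "true",
  RAG_OVER_PDF_URL: "https://rag.example.com",
});
console.log(hit?.slug);
```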


Two-stage routing in v1.2. The plugin router gets first refusal; modes still pick up everything it doesn't claim.

Manus integration with persistent state

The typed Manus client (createTask, getTask, cancelTask, awaitTask) wraps the upstream task API. The webhook receiver verifies HMAC-SHA256 signatures before accepting any payload; unsigned requests get a 401 and never reach the database. Every state transition is upserted into manus_tasks via the migration at supabase/migrations/002_manus_tasks.sql. Clients poll GET /api/v1/manus/tasks/[id] to read the latest state without burning upstream quota.

// Client kicks off a long-running Manus task
POST /api/v1/manus/tasks
  → { id: "task_abc", status: "queued" }
  → upsert into manus_tasks

// Time passes. Manus pings the webhook with progress updates.
POST /api/v1/manus/webhook (X-Signature: HMAC-SHA256)
  → verify signature against MANUS_WEBHOOK_SECRET
  → valid → upsert(task_abc, status: "running")
  → invalid → 401, drop payload

// Dashboard polls without round-tripping to Manus.
GET /api/v1/manus/tasks/task_abc
  → SELECT latest FROM manus_tasks WHERE id = 'task_abc'
  → { status: "completed", output: {...} }
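The signature check in the webhook step can be sketched with Node's built-in crypto module. This assumes the `X-Signature` header carries a hex-encoded HMAC-SHA256 of the raw request body; the encoding and the `verifySignature` name are assumptions, so check the upstream webhook documentation for the exact scheme.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true only when the signature matches the body under the shared secret.
function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

// Usage inside a handler: reject before touching the database.
function statusFor(rawBody: string, header: string | null, secret: string): 200 | 401 {
  if (!header || !verifySignature(rawBody, header, secret)) return 401;
  return 200; // safe to parse the payload and upsert the task state
}
```

The comparison runs over the raw body bytes, so the body has to be read as text before any JSON parsing; re-serialising parsed JSON can change whitespace and break the match.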

/docs page

A server-rendered route lists all ten plugins with live env-var status badges (enabled / not configured), repo links, and a Manus invite call-to-action. The invite code is read from NEXT_PUBLIC_MANUS_INVITE_CODE, so every fork can drop in its own without touching source.

MAKE-IT-YOURS guide

Why this, not that

Vitest

Why we use it

Vite-native test runner, 90 tests in 800ms. Same module resolution as the Next.js build. No config file needed. Jest compatibility mode means most patterns transfer directly.

Why not the alternative

Jest: slower startup, requires a separate ts-jest config, and doesn't share Vite's module graph. For a project with path aliases (@/lib/...) and TypeScript, Vitest is drop-in.

Honest Comparison

SarmaLink-AI vs everything else

No marketing spin. What each option does well, what it doesn’t, and an honest verdict.

ChatGPT Plus

$20/month · Hosted AI chat by OpenAI

Strengths

+ Best-in-class models (GPT-4o, o1)
+ Polished UI
+ Plugins ecosystem

Weaknesses

- One provider: if OpenAI is down, you're down
- No self-hosting
- No data ownership
- No failover
- Can't customise the system prompt
- $240/year per user

Great product. But you're renting it. You can't read the code, can't host it yourself, can't control your data. When OpenAI raises prices or deprecates a feature, you eat it.

LibreChat

Free (open source) · Multi-provider chat UI

Strengths

+ Open source
+ Supports multiple providers
+ Plugin system
+ Good UI

Weaknesses

- Requires Docker to deploy
- No automatic failover: if your selected provider fails, the message fails
- No built-in live tools
- No persistent memory
- Complex config

The closest alternative. But without failover, a 429 from OpenAI means the user sees an error. Requires Docker knowledge to deploy. No AI-assisted setup.

OpenWebUI

Free (open source) · Self-hosted ChatGPT UI for Ollama/OpenAI

Strengths

+ Beautiful UI
+ Ollama integration for local models
+ Active community

Weaknesses

- Designed for local models; cloud provider support is secondary
- No multi-provider failover
- Requires Docker
- Python backend, a different ecosystem from most web apps
- No live tools

Best choice if you want local LLMs on your own GPU. But if you want cloud providers with failover, it's not designed for that.

LobeChat

Free (open source) · Modern chat UI with plugin marketplace

Strengths

+ Beautiful design
+ Plugin marketplace
+ Supports many providers

Weaknesses

- No automatic failover
- Client-side API calls expose keys in the browser
- No server-side auth or RLS
- No persistent memory across sessions

Pretty UI, but API keys live in the browser. No server-side security. Fine for personal use, risky for teams.

LiteLLM

Free (library) · Python library for multi-provider LLM calls

Strengths

+ Supports 100+ providers
+ Unified API
+ Good fallback config

Weaknesses

- It's a library, not an app: you still have to build everything else
- Python only
- No UI, no auth, no database, no streaming UI, no deployment

Excellent library. But it gives you the failover engine and nothing else. No auth, no database, no streaming UI, no memory, no tools. You're building an app from scratch.

SarmaLink-AI

Free (open source) · Full-stack AI assistant with automatic failover

Strengths

+ 36 engines, 7 providers, automatic failover in <50ms
+ Full app: auth, database, RLS, streaming, memory, tools
+ AI-assisted setup: non-developers can deploy in 15 min
+ White-label via env vars, zero code changes
+ TypeScript + Next.js, the web's most popular stack
+ MIT license

Weaknesses

- Newer project, smaller community than LibreChat/OpenWebUI
- No local model support yet (cloud providers only)
- Opinionated stack (Next.js + Supabase + Vercel)

The only option that gives you failover + full app + AI setup + white-labeling out of the box.

What No One Else Does

Let AI set up your AI.

Every open-source AI project requires Docker, terminal commands, and 45 minutes of documentation reading. SarmaLink-AI ships with a setup skill that any AI coding tool can use.

Every other open-source AI project

x Read a 200-line README
x Install Docker
x Run docker-compose up
x Debug port conflicts
x Manually create database tables
x Copy 15+ environment variables
x Figure out which keys are required vs optional
x Debug build errors alone
x Google the error messages
x Give up and use ChatGPT instead

SarmaLink-AI

+ Clone the repo
+ Open in Claude Code (or Cursor, Copilot, ChatGPT)
+ Say "help me set up"
+ AI walks you through creating free accounts
+ AI creates your .env.local
+ AI runs the database migration
+ AI tests every API key
+ AI builds and deploys
+ 15 minutes. Zero terminal knowledge.
+ Your AI assistant is live.

The Economics

How it costs $0

Every provider in the stack offers a free tier. Combined, they serve thousands of requests per day. No credit card needed for any of them.

Groq

Free tier: 14,000 req/day per key

Keys: Unlimited keys via Gmail aliases

126,000+ req/day

SambaNova

Free tier: 5,000 req/day per key

Keys: 8 keys

40,000 req/day

Cerebras

Free tier: 5,000 req/day per key

Keys: 4 keys

20,000 req/day

Google Gemini

Free tier: 250 req/day per key

Keys: 12 keys

3,000 req/day

OpenRouter

Free tier: 1,000 req/day (:free)

Keys: 5 keys

5,000 req/day

Cloudflare

Free tier: 10,000 neurons/day

Keys: Workers AI free tier

10,000 neurons/day (image generation; a single image consumes multiple neurons)

194,000+

combined requests per day, enough for ~15,000 daily active users

vs ChatGPT Plus: $20/user/month × 15,000 users = $300,000/month

Who It’s For

Built for builders

Solo developers

Replace ChatGPT Plus with your own instance. Persistent memory, image gen, live tools. $0 monthly.

Startups

White-label it as your product. Change the name with one env var. MIT license, no strings attached.

Agencies

Deploy for clients as a value-add service. Each client gets their own Supabase project, their own data.

Internal tools teams

HR policies, finance lookups, ops runbooks. RLS keeps per-user data separate. Admin health dashboard built in.

Non-technical founders

AI-assisted setup means you can deploy without writing code. Clone, let AI set it up, done.

Open source contributors

TypeScript, Next.js, Supabase, the most popular web stack. 90 tests, clean architecture, well-documented.

Ready to try it?

Clone the repo. Let AI set it up. Deploy your own AI assistant in 15 minutes.


Built by Sarma Linux · MIT License · v2.0.0
