How SarmaLink-AI works
The complete breakdown. How 36 engines across 7 providers deliver 99.9999% uptime at zero cost. Why every technology was chosen. How it compares to everything else.
Every AI app has a single point of failure.
You build an app on top of OpenAI. It works beautifully for 3 months. Then one day GPT-4o returns a 429 error. Your users see a broken page. Your support inbox fills up.
You switch to Anthropic. Works for 2 months. Then Claude goes down for maintenance. Same story.
The problem is not that providers are unreliable. They are remarkably reliable 99% of the time. The problem is that the other 1% is unpredictable, and your users experience 100% of it.
Chain every provider together.
Fail over in 50ms.
Instead of depending on one provider, SarmaLink-AI treats every provider as a commodity. If one is busy, the next fires instantly. Users never see errors.
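The chaining idea can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the `Provider` type, the `completeWithFailover` function and the result shape are all hypothetical names made up for this example.

```typescript
// Hypothetical shapes for illustration; not taken from the SarmaLink-AI source.
type Provider = {
  name: string;
  complete: (prompt: string) => Promise<string>;
};

type FailoverResult = { text: string; servedBy: string; skipped: string[] };

// Try each provider in order; move on as soon as one throws (for example on a 429).
async function completeWithFailover(
  providers: Provider[],
  prompt: string,
): Promise<FailoverResult> {
  const skipped: string[] = [];
  for (const provider of providers) {
    try {
      const text = await provider.complete(prompt);
      return { text, servedBy: provider.name, skipped };
    } catch {
      // Remember the failure, then fall through to the next engine in the chain.
      skipped.push(provider.name);
    }
  }
  throw new Error(`All providers failed: ${skipped.join(", ")}`);
}
```

The caller only sees an error when every engine in the chain has failed, which is the property the design relies on.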
What happens when a provider fails
A real trace from production. The primary engine returned 429. The user never knew.
From a failover gateway to a full runtime
v2 widens the gateway from a chat-completions failover into an agent runtime, a voice stack, a live-data layer and a quota-aware tool catalog. Every piece ships on the same Next.js app, same Supabase project, same MIT license.
Intent auto-router
A regex sweep plus a tiny LLM classifier picks the mode per message. Smart, Reasoner, Coder, Fast, Live and Vision are all auto-selected, with no UI toggle required.
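A rough sketch of the two-stage idea: cheap regex rules first, a classifier only when nothing matches. The patterns, the `pickMode` name and the synchronous `classify` callback are assumptions for illustration; in practice the classifier would be an asynchronous model call, and the real rules are not shown on this page.

```typescript
type Mode = "Smart" | "Reasoner" | "Coder" | "Fast" | "Live" | "Vision";

// Illustrative patterns only; the project's real rules are not reproduced here.
const RULES: Array<[RegExp, Mode]> = [
  [/\b(function|stack trace|refactor|typescript|python)\b/i, "Coder"],
  [/\b(weather|exchange rate|news|latest)\b/i, "Live"],
  [/\b(prove|step by step|derive)\b/i, "Reasoner"],
];

// `classify` stands in for the tiny LLM classifier and is only consulted
// when no regex rule matches.
function pickMode(
  message: string,
  hasImage: boolean,
  classify: (message: string) => Mode = () => "Smart",
): Mode {
  if (hasImage) return "Vision";
  for (const [pattern, mode] of RULES) {
    if (pattern.test(message)) return mode;
  }
  return classify(message);
}
```

Running the regex sweep first means most messages never pay for a classifier call.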
Multi-step agent runner
POST /api/v1/agent runs the planner, workers and synthesiser server-side and streams every step over SSE.
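For readers new to SSE, here is roughly what "streams every step" means on the wire. The `AgentStep` shape and the `step` event name are assumptions; only the frame format (an `event:` line, a `data:` line, a blank line) comes from the SSE standard.

```typescript
// Hypothetical step shape; the real payload of /api/v1/agent is not documented here.
type AgentStep = { role: "planner" | "worker" | "synthesiser"; content: string };

// Serialise one payload as an SSE frame. JSON.stringify escapes newlines,
// so the payload always fits on a single data line.
function toSseFrame(event: string, payload: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Turn a finished run into the sequence of frames a client would receive.
function framesFor(steps: AgentStep[]): string[] {
  return steps.map((step) => toSseFrame("step", step));
}
```

A route handler would write these frames to a response served with `Content-Type: text/event-stream`.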
MCP-shaped catalog
Bearer-protected /api/v1/mcp exposes list_tools and plugins in the Model Context Protocol shape. Any MCP-aware client mounts the gateway with one URL.
TTS cascade
MeloTTS on Cloudflare Workers AI first, Gemini TTS as fallback. Streamable opus or mp3 audio.
STT route
Groq Whisper primary, Cloudflare Workers AI fallback. Sub-second transcription on a clean clip.
Zero-key live tools
Open-Meteo for weather, Frankfurter (ECB) for FX, HN Algolia for news. No tokens, no per-call cost.
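As an illustration of why no keys are needed, the three lookups reduce to plain GET URLs. The helper names are hypothetical, and the endpoints and query parameters below reflect those services' public APIs as commonly documented; check the current docs before relying on them.

```typescript
// Open-Meteo: current temperature for a coordinate pair.
function weatherUrl(lat: number, lon: number): string {
  return `https://api.open-meteo.com/v1/forecast?latitude=${lat}&longitude=${lon}&current=temperature_2m`;
}

// Frankfurter: latest ECB reference rate between two currency codes.
function fxUrl(from: string, to: string): string {
  return `https://api.frankfurter.app/latest?from=${encodeURIComponent(from)}&to=${encodeURIComponent(to)}`;
}

// HN Algolia: story search by free-text query.
function newsUrl(query: string): string {
  return `https://hn.algolia.com/api/v1/search?query=${encodeURIComponent(query)}&tags=story`;
}
```

Each URL can be passed straight to `fetch`; there is no Authorization header to manage.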
FLUX with key rotation
Image generation across paired Cloudflare account and token pairs. When one cap hits, the next pair fires.
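One way to picture the rotation, as a sketch under assumptions: `CloudflarePair` and `createRotator` are invented names, and how the real code detects a cap or resets it is not described on this page.

```typescript
// Hypothetical credential shape: one Cloudflare account ID with its API token.
type CloudflarePair = { accountId: string; token: string };

// Keeps a cursor into the list of pairs and moves forward each time a cap is hit.
function createRotator(pairs: CloudflarePair[]) {
  let cursor = 0;
  return {
    // Pair to use for the next request, or undefined once every pair is capped.
    current: (): CloudflarePair | undefined => pairs[cursor],
    // Call when the current pair reports that its quota is used up.
    markCapped: (): void => {
      if (cursor < pairs.length) cursor += 1;
    },
  };
}
```

The image route would call `current()` before each request and `markCapped()` when the upstream signals that the quota is exhausted.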
Quota tracker
GET /api/v1/quota returns per-user and company-wide counters from a Supabase view.
Smart suggestions
After each reply, three grounded follow-up prompts ready to chip into the UI.
Reasoning-leak stripper
Chain-of-thought wrappers and internal commentary scrubbed from the stream before the client sees them.
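The tricky part of scrubbing a stream is that a wrapper tag can be split across two chunks. The sketch below assumes the wrapper is a `<think>…</think>` pair, which is a common convention but not confirmed by this page; `createThinkStripper` is a hypothetical name.

```typescript
const OPEN = "<think>";
const CLOSE = "</think>";

// Length of the longest suffix of `text` that is also a proper prefix of `tag`.
function partialTagLength(text: string, tag: string): number {
  const max = Math.min(text.length, tag.length - 1);
  for (let n = max; n > 0; n--) {
    if (text.endsWith(tag.slice(0, n))) return n;
  }
  return 0;
}

function createThinkStripper() {
  let buffer = "";
  let inside = false; // true while we are between OPEN and CLOSE

  // Feed one streamed chunk; returns the text that is safe to forward.
  function push(chunk: string): string {
    buffer += chunk;
    let out = "";
    while (true) {
      const tag = inside ? CLOSE : OPEN;
      const at = buffer.indexOf(tag);
      if (at === -1) {
        // Hold back any tail that could be the start of a tag split across chunks.
        const keep = partialTagLength(buffer, tag);
        if (!inside) out += buffer.slice(0, buffer.length - keep);
        buffer = buffer.slice(buffer.length - keep);
        return out;
      }
      if (!inside) out += buffer.slice(0, at);
      buffer = buffer.slice(at + tag.length);
      inside = !inside;
    }
  }

  // Call at end of stream: flush held-back text unless a block was left open.
  function end(): string {
    const rest = inside ? "" : buffer;
    buffer = "";
    return rest;
  }

  return { push, end };
}
```

Holding back only the few characters that might begin a tag keeps added latency to a handful of bytes per chunk.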
Markdown to PDF, JSON to XLSX
PDFKit and ExcelJS endpoints turn any answer into a print-ready PDF or a structured spreadsheet.
Now it routes to other open-source repos too
The v1.1 router picked an LLM. The v1.2 router can pick an entire sibling repo first: research goes to rag-over-pdf, voice goes to voice-agent-starter, and evals go to ai-eval-runner. The LLM is the fallback, not the default.
lib/plugins/index.ts registers ten sibling repos as routable tools: voice-agent-starter, agent-orchestrator, ai-eval-runner, local-llm-router, mcp-server-toolkit, rag-over-pdf, receipt-scanner, webhook-to-email, k8s-ops-toolkit, terraform-stack.
Each plugin has one env var that holds its endpoint. If you set RAG_OVER_PDF_URL on your deployment, the rag plugin lights up. If you don’t, it stays dormant. List everything live at GET /api/v1/plugins; dispatch via POST /api/v1/plugins/invoke.
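The env-var gating can be pictured like this. Only `RAG_OVER_PDF_URL` is named in the text; the other two variable names, the `Plugin` type and `listPlugins` are guesses for illustration and may not match lib/plugins/index.ts.

```typescript
type Plugin = { id: string; envVar: string };
type Env = Record<string, string | undefined>;

// Three of the ten sibling repos. Env var names other than RAG_OVER_PDF_URL are assumed.
const PLUGINS: Plugin[] = [
  { id: "rag-over-pdf", envVar: "RAG_OVER_PDF_URL" },
  { id: "voice-agent-starter", envVar: "VOICE_AGENT_STARTER_URL" },
  { id: "ai-eval-runner", envVar: "AI_EVAL_RUNNER_URL" },
];

// A plugin is live only when its endpoint variable is set to a non-empty value.
function listPlugins(env: Env) {
  return PLUGINS.map((plugin) => {
    const endpoint = env[plugin.envVar]?.trim();
    return { id: plugin.id, enabled: Boolean(endpoint), endpoint: endpoint || null };
  });
}
```

In a deployment, `listPlugins(process.env)` would produce the kind of data that GET /api/v1/plugins reports.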
lib/services/plugin-autorouter.ts runs before the LLM. It scans the user’s message for intent keywords and, when one matches an enabled plugin, dispatches there directly. No model call, no token cost, no failover chain, just the right tool for the job.
This is distinct from lib/intent.ts, which picks a mode (Smart / Reasoner / Coder / Live / Fast / Vision). The new router picks a plugin. Both can fire: plugin first, mode second. The plugin router is gated by ENABLE_PLUGIN_AUTOROUTE (off by default).
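Put together, the plugin-first decision looks something like the sketch below. The keyword patterns, the `Route` type and `routeMessage` are illustrative assumptions rather than the contents of lib/services/plugin-autorouter.ts.

```typescript
type Route = { kind: "plugin"; plugin: string } | { kind: "llm" };

// Illustrative keyword rules; the real intent keywords are not listed on this page.
const KEYWORDS: Array<[RegExp, string]> = [
  [/\b(pdf|research|cite)\b/i, "rag-over-pdf"],
  [/\b(voice|transcribe)\b/i, "voice-agent-starter"],
  [/\b(eval|evals|benchmark)\b/i, "ai-eval-runner"],
];

// Dispatch to a plugin only if the flag is on, the plugin is enabled and a keyword matches.
// Otherwise fall back to the LLM path, where mode selection happens.
function routeMessage(
  message: string,
  enabledPlugins: Set<string>,
  autorouteOn: boolean,
): Route {
  if (autorouteOn) {
    for (const [pattern, plugin] of KEYWORDS) {
      if (enabledPlugins.has(plugin) && pattern.test(message)) {
        return { kind: "plugin", plugin };
      }
    }
  }
  return { kind: "llm" };
}
```

Here `autorouteOn` would be derived from ENABLE_PLUGIN_AUTOROUTE, so a fresh deployment always takes the LLM path until the flag is set.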
The typed Manus client (createTask, getTask, cancelTask, awaitTask) wraps the upstream task API. The webhook receiver verifies HMAC-SHA256 signatures before accepting any payload; unsigned requests get a 401 and never reach the database. Every state transition is upserted into manus_tasks via the migration at supabase/migrations/002_manus_tasks.sql. Clients poll GET /api/v1/manus/tasks/[id] to read the latest state without burning upstream quota.
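Signature checking of this kind usually follows the pattern below, using Node's built-in crypto module. The sketch assumes the signature arrives as a hex string computed over the raw request body; the header name and encoding that the real receiver expects are not stated here, and `verifySignature` is a hypothetical name.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Recompute the HMAC over the raw body and compare it to the received signature.
// Assumption: the sender transmits the signature hex-encoded.
function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws if lengths differ, so check the length first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

A receiver would call this on the unparsed body before doing anything else and respond with 401 when it returns false. The constant-time comparison avoids leaking how many leading bytes matched.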
A server-rendered route lists all ten plugins with live env-var status badges (enabled / not configured), repo links, and a Manus invite call-to-action. The invite code is read from NEXT_PUBLIC_MANUS_INVITE_CODE, so every fork can drop in its own without touching source.
docs/MAKE-IT-YOURS.md is the white-label rebrand guide. Includes a copy-paste v0 prompt that generates a full branded front end, the list of files where logo / colour tokens / copy live, and the full Supabase + Vercel deploy path. A non-engineer founder can clone, run the prompt, change five env vars, and ship.
Why this, not that
Every technology in the stack was chosen for a reason. Here is the reasoning, and what was rejected.
Next.js 14 (App Router)
Server components keep API keys server-side. App Router gives file-based routing, streaming responses, and edge deployment. The entire backend is API routes, no separate server needed.
Rejected: Express.js. It requires a separate server process and has no built-in SSR or edge deployment. It would need two repos (frontend + backend) instead of one.
TypeScript
Catches provider API shape changes at compile time. When SambaNova changes their response format, TypeScript finds every broken call site before users do. 90 tests + strict types = confidence to ship fast.
Rejected: JavaScript. Runtime errors from undefined properties are the #1 cause of "it worked on my machine." In a multi-provider system with 7 different API shapes, that is suicide.
Supabase (PostgreSQL)
Auth, database, and Row-Level Security in one service. RLS means even if my code has a bug, PostgreSQL refuses to serve User A's conversations to User B. The free tier gives 1GB storage and unlimited API calls.
Rejected: Firebase. NoSQL makes cross-user queries (admin health, usage analytics) painful. No row-level security enforcement at the database level. Firestore's pricing model punishes high-read workloads like chat.
Vercel
Zero-config deployment for Next.js. Push to GitHub, live in 60 seconds. Edge functions for the streaming endpoints. Generous free tier (100GB bandwidth). The app still runs on any Next.js host, so there is no lock-in.
Rejected: AWS / GCP. 50x more configuration for the same result: ECS, ALB, Route53, ACM and CloudFront on one side, one git push to Vercel on the other. For a solo dev shipping an open-source project, complexity is the enemy.
SSE (Server-Sent Events)
One-way streaming from server to client, exactly what chat needs. Works through every proxy, CDN, and firewall. Native browser support with EventSource. No library needed on the client.
Rejected: WebSockets. They are bidirectional, but chat only needs server→client streaming. WebSockets are blocked by many corporate proxies, require connection management, and don't work on Vercel Edge. Overkill.
Cloudflare R2
S3-compatible object storage with zero egress fees. 10GB free. Stores uploaded PDFs, Excel files, and generated images. Signed URLs expire in 7 days for security.
Rejected: AWS S3 (egress fees add up fast when serving images) and Vercel Blob (limited free tier, vendor-locked). R2 is S3-compatible, so migration is a one-line endpoint change.
Vitest
Vite-native test runner, 90 tests in 800ms. Same module resolution as the Next.js build. No config file needed. Jest compatibility mode means most patterns transfer directly.
Rejected: Jest. Slower startup, requires a separate ts-jest config, and doesn't share Vite's module graph. For a project with path aliases (@/lib/...) and TypeScript, Vitest is drop-in.
SarmaLink-AI vs everything else
No marketing spin. What each option does well, what it doesn’t, and an honest verdict.
ChatGPT Plus
$20/month · Hosted AI chat by OpenAI
+ Best-in-class models (GPT-4o, o1)
+ Polished UI
+ Plugins ecosystem
- One provider: if OpenAI is down, you're down
- No self-hosting
- No data ownership
- No failover
- Can't customise the system prompt
- $240/year per user
LibreChat
Free (open source) · Multi-provider chat UI
+ Open source
+ Supports multiple providers
+ Plugin system
+ Good UI
- Requires Docker to deploy
- No automatic failover: if your selected provider fails, the message fails
- No built-in live tools
- No persistent memory
- Complex config
OpenWebUI
Free (open source) · Self-hosted ChatGPT UI for Ollama/OpenAI
+ Beautiful UI
+ Ollama integration for local models
+ Active community
- Designed for local models; cloud provider support is secondary
- No multi-provider failover
- Requires Docker
- Python backend, a different ecosystem from web apps
- No live tools
LobeChat
Free (open source) · Modern chat UI with plugin marketplace
+ Beautiful design
+ Plugin marketplace
+ Supports many providers
- No automatic failover
- Client-side API calls expose keys in the browser
- No server-side auth or RLS
- No persistent memory across sessions
LiteLLM
Free (library) · Python library for multi-provider LLM calls
+ Supports 100+ providers
+ Unified API
+ Good fallback config
- It's a library, not an app: you still have to build everything else
- Python only
- No UI, no auth, no database, no streaming, no deployment
SarmaLink-AI
Free (open source) · Full-stack AI assistant with automatic failover
+ 36 engines, 7 providers, automatic failover in <50ms
+ Full app: auth, database, RLS, streaming, memory, tools
+ AI-assisted setup: non-developers can deploy in 15 min
+ White-label via env vars, zero code changes
+ TypeScript + Next.js, the web's most popular stack
+ MIT license
- Newer project, smaller community than LibreChat/OpenWebUI
- No local model support yet (cloud providers only)
- Opinionated stack (Next.js + Supabase + Vercel)
Let AI set up your AI.
Most open-source AI projects require Docker, terminal commands, and 45 minutes of documentation reading. SarmaLink-AI ships with a setup skill that any AI coding tool can use.
How it costs $0
Every provider in the stack offers a free tier. Combined, they serve thousands of requests per day. No credit card needed for any of them.
Groq
SambaNova
Cerebras
Google Gemini
OpenRouter
Cloudflare
Built for builders
Solo developers
Replace ChatGPT Plus with your own instance. Persistent memory, image gen, live tools. $0 monthly.
Startups
White-label it as your product. Change the name with one env var. MIT license, no strings attached.
Agencies
Deploy for clients as a value-add service. Each client gets their own Supabase project, their own data.
Internal tools teams
HR policies, finance lookups, ops runbooks. RLS keeps per-user data separate. Admin health dashboard built in.
Non-technical founders
AI-assisted setup means you can deploy without writing code. Clone, let AI set it up, done.
Open source contributors
TypeScript, Next.js, Supabase, the most popular web stack. 90 tests, clean architecture, well-documented.
Ready to try it?
Clone the repo. Let AI set it up. Deploy your own AI assistant in 15 minutes.
Built by Sarma Linux · MIT License · v2.0.0