Open Source · MIT License · v1.1.0

SarmaLink-AI

An open-source AI assistant that routes every message through up to 14 engines across 7 providers. If one is at capacity, the next fires in under 50 milliseconds. Powered by DeepSeek V3.2 (685 billion parameters), Google Gemini 3, GPT-OSS 120B, and 33 more engines. Built by Sarma Linux.

36 AI Engines · 7 Providers · 14 Max Failover · 41ms Fastest · 685B Primary Model

Why this exists

Every major AI provider offers a free tier. Groq hosts GPT-OSS 120B. SambaNova runs DeepSeek V3.2 (685B parameters). Cerebras does 2,000 tokens per second on its WSE-3 chip. Google Gemini has grounded Google Search built in. Each is individually generous. Each on its own still hits rate limits.

The moment a single provider returns a 429, the app breaks. Users see an error. They lose trust. The common workaround — paying for an upgrade — defeats the point of using free tiers in the first place.

SarmaLink-AI chains every free tier together. If Groq is busy, SambaNova fires. If SambaNova is busy, Cerebras. Then Gemini. Then OpenRouter's free model pool as the final safety net. Users never see errors — they always get an answer, from whichever engine is available.
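The try-next-on-failure loop at the heart of this is small. A minimal sketch in TypeScript, with illustrative names (`Engine`, `callWithFailover`, and the injected `call` function are assumptions for this example, not the SarmaLink-AI source):

```typescript
// Sequential failover sketch: try each engine in order; treat a
// non-200 response (429 rate limit, 5xx outage) as "move on".
type Engine = { provider: string; model: string };

async function callWithFailover(
  engines: Engine[],
  call: (e: Engine) => Promise<{ status: number; text?: string }>,
): Promise<{ engine: Engine; text: string }> {
  for (const engine of engines) {
    const res = await call(engine);
    if (res.status === 200 && res.text !== undefined) {
      return { engine, text: res.text }; // first healthy engine wins
    }
    // 429 / 5xx: fall through to the next engine in the chain
  }
  throw new Error("all engines exhausted");
}
```

Because `call` is injected, the chain itself never blocks on a single provider: the under-50ms handoff is just the cost of one failed HTTP round-trip plus this loop iteration.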

How it routes your message

Six real questions. The auto-router picks the mode. The failover picks the engine. The green check shows which model actually answered — live as of today.

“Draft a polite rejection email for a late supplier delivery.”
✨ Smart · auto-routed
✓ SambaNova · DeepSeek V3.2 685B — 200 OK
820ms first token · 380 tokens out
Primary engine — 685B MoE frontier model
“Does GDPR Article 17 apply to database backups?”
🧠 Reasoner · auto-routed
✓ SambaNova · DeepSeek V3.2 (reasoning) — 200 OK
4.2s total · thinking trace shown · 1,240 tokens out
Collapsible chain-of-thought panel
“What's the weather in Singapore right now?”
🔴 Live · auto-routed
🔧 Auto-router → weather tool detected → Open-Meteo API
✓ Gemini 2.5 Flash Lite (formatter) — 200 OK
680ms end-to-end · 95 tokens out
Tool runs before model — no LLM round-trip needed for data
“What's the synonym for 'utilise'?”
⚡ Fast · auto-routed
✓ Groq · GPT-OSS 20B (LPU) — 200 OK
41ms first token · 12 tokens out
Fastest route — Groq LPU chip
“Fix: 'Type Date is not assignable to string'”
💻 Coder · auto-routed
✗ SambaNova · DeepSeek V3.2 — 429 rate-limited
✓ Cerebras · Qwen 3 Coder 480B — 200 OK
1.4s total (47ms failover) · 340 tokens out
Failover kicked in — user saw zero error
“[receipt.jpg uploaded] What was the total?”
👁️ Vision · auto-routed
✓ Groq · Llama-4 Scout 17B (vision) — 200 OK
1.1s end-to-end · 85 tokens out
Auto-activates on image upload

Deep-dive: a request that needed the failover

The Coder example above — a full trace of what happens when the primary engine is rate-limited and the system hands off to the next.

User: "Fix: 'Type Date is not assignable to string'"
   ↓
Auto-router: detects TypeScript error pattern
   → mode = Coder (9-engine failover)
   ↓
Step 1 · SambaNova · DeepSeek V3.2 685B
   → rotate to key 3 (round-robin, 8 keys total)
   → POST /v1/chat/completions
   → 429 Too Many Requests (quota exceeded this minute)
   → logEvent(status: 'rate_limited', latency: 38ms)
   ↓ 47ms later
Step 2 · Cerebras · Qwen 3 Coder 480B
   → rotate to key 1
   → POST /v1/chat/completions (streaming)
   → 200 OK · first token in 94ms
   → streaming SSE chunks to client
   ↓
Response streamed in 1.4 seconds total
   → Backend label: "Cerebras Qwen 3 Coder 480B"
   → logEvent(status: 'success', latency_ms: 1403, tokens_out: 340)
   ↓
Memory extractor (fire-and-forget)
   → runs Llama 3.1 8B on Groq after session save
   → no new facts extracted (code-only context)
   ↓
Session persisted to Supabase · RLS scoped to auth.uid()
Every step written to ai_events · queryable via /api/admin/health
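The "rotate to key N" steps in the trace come from round-robin key rotation. A hedged sketch of such a rotor (the `KeyRing` class is illustrative, not the project's actual implementation):

```typescript
// Round-robin key rotation: each provider keeps a cursor that
// advances on every request, spreading load evenly across its keys.
class KeyRing {
  private cursor = 0;
  constructor(private keys: string[]) {
    if (keys.length === 0) throw new Error("need at least one key");
  }
  next(): string {
    const key = this.keys[this.cursor];
    this.cursor = (this.cursor + 1) % this.keys.length; // wrap around
    return key;
  }
}
```

With 8 keys per provider, as in the trace, each individual key only sees one request in eight, which is what keeps per-key rate limits from triggering early.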

Full walkthrough with Mermaid diagrams: How Failover Works · Architecture Diagrams

Six specialised modes

Each mode is backed by a different failover chain of engines, optimised for a specific type of task. The auto-router picks the right one — or users choose manually.

✨ Smart
1,000/day · 14-engine failover
DeepSeek V3.2 (685B MoE)

Professional emails, summaries, deep analysis, brainstorming. The primary engine outscores GPT-4o on MATH-500 (90.2% vs 76.6%) and HumanEval (92.7% vs 90.2%). A 14-engine failover chain spans 4 providers.

🧠 Reasoner
500/day · 10-engine failover
DeepSeek V3.2 + V3.1

Complex logic, multi-step maths, legal reasoning, strategy. Shows its thinking process in a collapsible panel — click to follow the chain of thought. Backed by a 10-engine failover chain.

🔴 Live
1,000/day · 4-engine failover
Gemini 2.5 Flash + Google Search

Real-time web search grounded in Google. Current news, weather, exchange rates, container tracking, sports scores. Sources cited at the bottom of every answer.

⚡ Fast
5,000/day · 9-engine failover
Groq GPT-OSS 20B (41ms)

First token in 41 milliseconds. Quick lookups, one-liner rewrites, simple questions. 9-engine failover means practically unlimited capacity.

💻 Coder
800/day · 9-engine failover
DeepSeek V3.2

TypeScript, Python, SQL, HTML/CSS. Spots bugs, writes tests, refactors legacy code. DeepSeek V3.2 topped the SWE-bench coding leaderboard.

👁️ Vision
500/day · 6-engine failover
Llama-4 Scout 17B

Reads photos, screenshots, receipts, diagrams. Auto-activates on image upload. FLUX.2 klein edits images with natural language instructions.

Deep dive on every mode: The 6 Modes →

Built-in features

Everything below works out of the box. Clone, add your keys, deploy.

🔄 Smart Failover

Up to 14 engines per mode. If one returns 429 or 5xx, the next fires in under 50 milliseconds. Round-robin key rotation spreads load so no single API key is exhausted first.

🧠 Persistent Memory

After each conversation, a cheap model (Llama 3.1 8B) extracts key facts — name, role, preferences, projects. Those facts are injected into every future chat. Works across all modes and model switches.

🎯 Auto-Router

A regex-based intent classifier detects code, web search, quick questions, deep reasoning, and vision from the message text. It routes to the right mode instantly — zero API calls, zero added latency.
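A regex router of this kind is just an ordered list of pattern → mode rules, with first match winning. A minimal sketch (the patterns and `Mode` names here are illustrative, not the shipped classifier):

```typescript
// First-match-wins intent routing: no API call, no model, no latency.
type Mode = "coder" | "live" | "fast" | "reasoner" | "smart";

const rules: Array<[RegExp, Mode]> = [
  [/\b(error|stack trace|typescript|python|sql|refactor|bug)\b/i, "coder"],
  [/\b(weather|news|price|exchange rate|right now|today)\b/i, "live"],
  [/\b(prove|step[- ]by[- ]step|why does|legal|strategy)\b/i, "reasoner"],
  [/^.{0,40}\?$/, "fast"], // very short question → fast lane
];

function routeMessage(text: string): Mode {
  for (const [pattern, mode] of rules) {
    if (pattern.test(text)) return mode;
  }
  return "smart"; // safe default for everything else
}
```

The order of the rules is the whole policy: more specific intents (code, live data) are checked before the short-question fallback, and anything unmatched lands in the general-purpose default.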

🎨 Image Gen & Editing

FLUX.2 klein 9B generates images from text in ~1.5 seconds. Upload an image and say "change to emerald green" — it actually changes the colour (verified by a second AI model). Failover: 9B → 4B → FLUX.1-schnell.

💱 Live Exchange Rates

Powered by frankfurter.app (European Central Bank data). 13+ currencies, instant conversion. "Convert 5000 GBP to EUR" → real-time answer. No API key required.
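A frankfurter.app call is a single keyless GET. A sketch of how such a conversion could be wired up (the URL parameters follow frankfurter's public `/latest` endpoint; treat the exact response shape as an assumption to verify):

```typescript
// Build the conversion URL for frankfurter.app's /latest endpoint.
function rateUrl(amount: number, from: string, to: string): string {
  return `https://api.frankfurter.app/latest?amount=${amount}&from=${from}&to=${to}`;
}

// The part of frankfurter's JSON response this sketch relies on.
interface RateResponse { rates: Record<string, number>; }

// Pull the converted amount for the target currency out of the response.
function readConverted(res: RateResponse, to: string): number {
  const value = res.rates[to];
  if (value === undefined) throw new Error(`no rate for ${to}`);
  return value;
}

async function convert(amount: number, from: string, to: string): Promise<number> {
  const res = await fetch(rateUrl(amount, from, to));
  return readConverted((await res.json()) as RateResponse, to);
}
```

Splitting URL building and response parsing out of the fetch keeps both halves testable without touching the network.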

🌤️ Weather Anywhere

Open-Meteo — global coverage, 3-day forecast, auto-geocoding. "Weather in Milan" → current temp, humidity, wind, UV, and forecast. No API key required.
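"Weather in Milan" needs two keyless calls: Open-Meteo's geocoding endpoint to resolve the city, then its forecast endpoint. A sketch under the assumption that the parameter names match Open-Meteo's public API docs:

```typescript
// Resolve a city name to coordinates via Open-Meteo's geocoding API.
function geocodeUrl(city: string): string {
  return `https://geocoding-api.open-meteo.com/v1/search?name=${encodeURIComponent(city)}&count=1`;
}

// Request current conditions plus a 3-day forecast for a coordinate.
function forecastUrl(lat: number, lon: number): string {
  return `https://api.open-meteo.com/v1/forecast?latitude=${lat}&longitude=${lon}` +
    `&current=temperature_2m,relative_humidity_2m,wind_speed_10m&forecast_days=3`;
}

async function weather(city: string) {
  const geo = await (await fetch(geocodeUrl(city))).json();
  const { latitude, longitude } = geo.results[0]; // take the best geocoding hit
  return (await fetch(forecastUrl(latitude, longitude))).json();
}
```

The geocoding step is what makes "Weather in Milan" work without the user ever supplying coordinates.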

📦 Container Tracking

Auto-detects the shipping carrier from the container prefix (ISO 6346 database — 25+ prefixes). Searches Tavily for live status. Generates direct tracking links for Maersk, MSC, CMA CGM, Hapag-Lloyd, COSCO, Evergreen, ONE, Yang Ming, ZIM.
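Carrier detection falls out of the ISO 6346 format: four letters (three-letter owner code plus a category letter, usually U), six serial digits, and a check digit. A sketch with a small illustrative subset of the 25+ prefix table mentioned above (the prefix-to-carrier pairs here are examples, not the shipped database):

```typescript
// Illustrative subset of an ISO 6346 owner-prefix → carrier table.
const carriers: Record<string, string> = {
  MSKU: "Maersk",
  MSCU: "MSC",
  CMAU: "CMA CGM",
  HLCU: "Hapag-Lloyd",
  COSU: "COSCO",
  EGHU: "Evergreen",
};

function detectCarrier(containerNo: string): string | null {
  // ISO 6346 layout: 4 letters + 6 serial digits + 1 check digit.
  const normalised = containerNo.toUpperCase().replace(/\s/g, "");
  const match = /^([A-Z]{4})\d{7}$/.exec(normalised);
  if (!match) return null;
  return carriers[match[1]] ?? null;
}
```

Because the prefix is part of the container number itself, no external lookup is needed to pick the right tracking link.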

📎 Document Analysis

Upload PDFs, Excel spreadsheets, Word documents — up to 10 per conversation. Text extracted via Gemini Vision (PDF) or server-side libraries (xlsx, mammoth). Files persist in Cloudflare R2 across messages.

💬 50 Saved Conversations

Each user gets 50 conversation slots; the oldest is auto-deleted when the limit is reached. Thinking traces and backend model labels are saved, so you can see which engine answered each message.
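The 50-slot cap described above amounts to a bounded list that evicts from the oldest end. A minimal sketch, assuming conversations are stored oldest-first (the function name is illustrative):

```typescript
const MAX_CONVERSATIONS = 50;

// Append a new conversation and drop the oldest if over the cap.
function pushConversation<T>(existing: T[], incoming: T): T[] {
  const next = [...existing, incoming];
  return next.length > MAX_CONVERSATIONS
    ? next.slice(next.length - MAX_CONVERSATIONS) // trim from the oldest end
    : next;
}
```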

🌓 Dark & Light Mode

Full theme support with CSS variables. Markdown rendering includes syntax-highlighted code blocks, tables, lists, and images — all theme-aware.

🔐 Prompt Injection Defence

Every external input — user messages, tool results, saved memories — is wrapped in explicit untrusted markers. Known jailbreak patterns are stripped before reaching the model. Tool outputs never execute as instructions.
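A hedged sketch of the untrusted-marker wrapping described above. The delimiters and the jailbreak patterns here are illustrative, not the shipped ones; the real list would be longer and maintained separately:

```typescript
// Example jailbreak patterns to strip before the model sees the text.
const JAILBREAK_PATTERNS = [
  /ignore (all )?previous instructions/gi,
  /you are now (in )?developer mode/gi,
];

function sanitise(input: string): string {
  let clean = input;
  for (const p of JAILBREAK_PATTERNS) clean = clean.replace(p, "[removed]");
  return clean;
}

// Wrap any external content (user message, tool result, saved memory)
// in markers the system prompt instructs the model to treat as data,
// never as instructions.
function wrapUntrusted(source: string, content: string): string {
  return `<untrusted source="${source}">\n${sanitise(content)}\n</untrusted>`;
}
```

The marker only helps if the system prompt explicitly tells the model that content inside it is data; the wrapping and the instruction are two halves of the same defence.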

📊 Observability Endpoint

The /api/admin/health route exposes per-provider success rates, p50/p95 latency, dead-model detection, and 24-hour volume. Built from the ai_events audit log that records every failover step.
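The p50/p95 figures such an endpoint reports reduce to a percentile over the latencies logged in ai_events. A sketch using the nearest-rank method (one common choice; the actual implementation may differ):

```typescript
// Nearest-rank percentile: the smallest value with at least p% of
// the samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

p50 answers "what does a typical request look like"; p95 surfaces the slow tail that a plain average would hide — which is exactly what dead-model detection needs.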

Use cases

What people actually build with this.

Personal Assistant

Replace ChatGPT Plus. Unlimited practical capacity across 6 modes. Persistent memory across sessions. £0 monthly cost.

Team Internal Tools

HR policies, finance lookups, ops runbooks. Every request logged to your own database. RLS keeps per-user data separate.

Customer Support Backbone

Plug the SSE streaming API into any frontend. Auto-router surfaces the right mode without user selection. Sources cited for regulated-industry compliance.
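Consuming the SSE stream from a frontend can look like the sketch below. EventSource cannot POST, so a fetch reader parses the `data:` lines instead; the `/api/chat` route and `[DONE]` sentinel are assumptions for this example, not documented parts of the API:

```typescript
// Extract payloads from a raw SSE chunk: keep "data:" lines,
// drop blanks and the end-of-stream sentinel.
function parseSseChunk(chunk: string): string[] {
  return chunk
    .split("\n")
    .filter((line) => line.startsWith("data:"))
    .map((line) => line.slice(5).trim())
    .filter((data) => data !== "" && data !== "[DONE]");
}

async function streamChat(message: string, onToken: (t: string) => void) {
  const res = await fetch("/api/chat", { // hypothetical route
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    for (const data of parseSseChunk(decoder.decode(value))) onToken(data);
  }
}
```

Keeping the chunk parser separate from the network loop means the same function works whether tokens arrive one per chunk or batched.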

Research & Reasoning

DeepSeek V3.2 on heavy maths, GPQA, PhD-level questions. Reasoner mode exposes chain-of-thought traces you can audit.

Code Generation & Review

Coder mode with SWE-bench 42%. Paste a diff, ask for bugs or refactors. TypeScript, Python, SQL, Go, Rust.

Document Intelligence

Upload contracts, invoices, spreadsheets, PDFs. Ask questions in natural language. Text extraction runs server-side before the model sees it.