SarmaLink-AI
Multi-Provider AI Routing with Automatic Failover
Abstract
SarmaLink-AI is an open-source, MIT-licensed multi-provider AI assistant that routes every request through up to 14 engines across 7 providers (Groq, SambaNova, Cerebras, Google Gemini, OpenRouter, Cloudflare Workers AI, Tavily) with automatic sub-50ms failover. It is built on Next.js 14, TypeScript, Supabase (PostgreSQL with Row-Level Security), and Cloudflare R2. This whitepaper documents the architecture, failover algorithm, security model, benchmarks, and operational characteristics of a system designed to deliver 99.9999% effective uptime on free-tier infrastructure, at £0 recurring cost.
01 Introduction & Motivation
Every major AI provider offers a free tier. Groq hosts GPT-OSS 120B and 20B and delivers first tokens in as little as 41 milliseconds. SambaNova runs DeepSeek V3.2, a 685-billion-parameter Mixture-of-Experts model that beats GPT-4o on MATH-500 and HumanEval. Cerebras serves inference at 2,000 tokens per second on wafer-scale chips. Google Gemini has grounded Google Search built in at the token level. OpenRouter aggregates 100+ models behind a single OpenAI-compatible API, including a :free pool of 17+ community-hosted models. Cloudflare Workers AI runs FLUX.2 klein for image generation at the edge. Tavily provides structured search designed for LLM consumption.
Each of these, individually, is generous. Each, on its own, still hits rate limits. The moment a single provider returns a 429, an AI application breaks. Users see an error. Trust evaporates. The common workaround, paying for an upgrade, defeats the point of using free tiers in the first place and locks users into a single vendor’s roadmap.
The problem with existing solutions
- LiteLLM is a library, not an application. You still have to write the app, the routing logic, the retry policy, the stream parser, the database schema.
- LangChain is a framework with a steep learning curve and heavy abstractions. Multi-provider failover is possible but manual.
- OpenRouter is one aggregator service, not seven redundant providers. When OpenRouter has an issue, you have an issue.
- ChatGPT Plus / Claude Pro / Gemini Advanced are rentals. No self-hosting, no vendor independence, no data ownership.
First-principles multi-provider failover
SarmaLink-AI treats every AI provider as a commodity. If Groq returns 429, SambaNova fires. If SambaNova is busy, Cerebras. Then Gemini. Then OpenRouter’s :free pool as the final safety net. Round-robin key rotation distributes load across keys within a provider. Ordered failover across providers survives outages. Every step is logged to ai_events for observability.
Target audience
SarmaLink-AI is aimed at small-to-medium businesses, indie developers, digital agencies, and research teams: anyone who needs production-grade AI capability without a vendor rental fee and who values code they can read, fork, and extend.
02 System Architecture
SarmaLink-AI is a Next.js 14 application using the App Router, deployed on Vercel (or any Next.js-compatible host). The runtime is a thin HTTP layer over a modular lib/ directory. Every external dependency is behind an interface. Every piece of untrusted data is wrapped at a trust boundary before reaching the model.
Core modules
- app/api/ai-chat/route.ts, the HTTP entry point. ~45 lines after refactoring. Delegates all logic to lib.
- lib/providers/failover.ts, the failover runner (tryFailover). Orchestrates retries across steps and keys.
- lib/providers/registry.ts, declares every provider, endpoint, and key collection.
- lib/tools/registry.ts, plugin pattern for live tools (weather, FX, container tracking).
- lib/prompts/sanitize.ts, prompt injection defence at trust boundaries.
- lib/repositories/, typed Supabase CRUD for sessions, usage, events, memories.
- lib/intent.ts, auto-router classifier (regex patterns, zero API calls).
Request lifecycle
Lifecycle steps in code: app/api/ai-chat/route.ts for handler, lib/prompts/sanitize.ts for sanitisation, lib/intent.ts for the auto-router, lib/providers/failover.ts for tryFailover.
Deployment topology
Vercel (or any Node.js host) runs the route handlers. Supabase provides PostgreSQL + Auth + Row-Level Security. Cloudflare R2 stores binary attachments (images, PDFs) with 7-day signed URLs. Cloudflare Workers AI serves image generation. All other providers are accessed via their public OpenAI-compatible endpoints.
03 The Failover Runner
tryFailover is the load-bearing module. It accepts a sequence of provider/model steps and iterates through them until one returns a successful stream.
Algorithm
    // Try each provider/model step in order; within a step, try every key.
    async function tryFailover(steps, messages, opts) {
      for (const step of steps) {
        const keys = providerKeys(step.provider)
        // Time-based starting offset spreads load across keys (approximate round-robin).
        const offset = Date.now() % keys.length
        for (let i = 0; i < keys.length; i++) {
          const key = keys[(offset + i) % keys.length]
          try {
            const stream = await callProvider(step, key, messages)
            return stream // success: hand the stream back to the caller
          } catch (err) {
            logEvent({ backend: step.label, status: err.status })
            // Rate limits (429) and server errors (5xx) are recoverable: try the next key.
            if (err.status === 429 || err.status >= 500) continue
            throw err // non-recoverable: stop immediately
          }
        }
      }
      throw new Error('All providers exhausted')
    }
Uptime math
Assumption: each provider has 99% uptime (industry minimum for production services) and providers fail independently of one another.
P(all 7 providers fail simultaneously) = 0.01⁷ = 1 × 10⁻¹⁴
Effective uptime: 99.999999999999%, about 30 microseconds of downtime per century, assuming the public internet itself stays up.
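The arithmetic can be checked in a few lines. This is a minimal sketch that takes the 99% per-provider figure and the independence of failures as given; both are modelling assumptions, not measurements.

```typescript
// Worked check of the uptime arithmetic under the independence assumption.
const perProviderFailure = 0.01; // 1 - 0.99, the assumed per-provider downtime fraction
const providerCount = 7;

// Probability that every provider is down at the same instant.
const pAllDown = Math.pow(perProviderFailure, providerCount);

// Expected downtime over one century, expressed in microseconds.
const secondsPerCentury = 100 * 365.25 * 24 * 60 * 60;
const downtimeMicros = pAllDown * secondsPerCentury * 1e6;

console.log(pAllDown.toExponential(2)); // prints "1.00e-14"
console.log(downtimeMicros.toFixed(1)); // prints "31.6"
```

Correlated outages (a shared DNS or cloud-region failure, for example) would make the real figure worse than this idealised bound.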
Handoff timing
Typical failover from 429 on one provider to first token on the next is under 50 milliseconds. The request parser reads the response headers, classifies the failure, rotates to the next key, and dispatches, all without user-visible interruption. Every step writes to ai_events with status, backend, latency_ms, and tokens_out.
04 The Six Modes
Each mode is a different failover sequence, optimised for a specific task type.
| Mode | Depth | Primary engine | Daily limit | Use case |
|---|---|---|---|---|
| Smart | 14 | DeepSeek V3.2 685B | 1,000/day | Professional writing, analysis, brainstorming |
| Reasoner | 10 | DeepSeek V3.2 + V3.1 | 500/day | Complex logic, chain-of-thought traces |
| Live | 4 | Gemini 2.5 Flash + Google Search | 1,000/day | Current events, weather, news, FX |
| Fast | 9 | Groq GPT-OSS 20B | 5,000/day | 41ms first token, quick lookups, rewrites |
| Coder | 9 | DeepSeek V3.2 + Qwen Coder 480B | 800/day | TypeScript, Python, SQL, debugging |
| Vision | 6 | Llama-4 Scout 17B | 500/day | Receipts, screenshots, diagrams |
05 The Seven Providers
Groq
Custom LPU inference chips. GPT-OSS 120B in 45ms, GPT-OSS 20B in 41ms. Llama 3.3 70B, Qwen 3 32B, Llama-4 Scout for vision, Llama 3.1 8B for memory extraction. Free tier: 14,000 req/day/key.
SambaNova
Hosts DeepSeek V3.2 (685B MoE, 37B active per token), the frontier model that powers Smart, Reasoner, and Coder modes. Custom Reconfigurable Dataflow Unit silicon. Also runs DeepSeek V3.1 and Llama 4 Maverick with 1M context.
Cerebras
WSE-3 wafer-scale engine, 46,225 mm² of silicon, the largest chip ever built. 2,000 tokens/sec on Llama 3.1 8B. Hosts Qwen 3 235B for reasoning and Qwen 3 Coder 480B, SarmaLink-AI’s Coder failover winner when SambaNova is busy.
Google Gemini
Live mode backbone. Gemini 2.5 Flash, Flash Lite, and Gemini 3 Flash Preview with grounded Google Search built in at the token level. 1M-token context window. Every Live mode answer includes cited sources.
OpenRouter
Aggregates 100+ models across 50+ providers into one OpenAI-compatible endpoint. The :free pool (17+ community-hosted models including GPT-OSS 120B, Nemotron Ultra 253B, GLM-4.5 Air, Gemma 3, DeepSeek R1) is the ultimate failover safety net.
Cloudflare Workers AI
Runs FLUX.2 klein 9B and 4B for image generation and instruction-following editing with three-step failover (9B → 4B → FLUX.1 schnell). R2 provides 10GB free S3-compatible object storage for file persistence with 7-day signed URLs.
Tavily
Structured web search designed for AI consumption. Returns titles, snippets, URLs, and relevance scores, ready for LLM citation. Powers weather (Open-Meteo fallback), exchange rate verification, container tracking (ISO 6346 carriers), news, and URL extraction tools.
06 Security Model
Trust boundaries
Three sources of untrusted text reach the model on every request:
- User messages, from the browser, potentially adversarial
- Tool results, from external APIs (Tavily, Open-Meteo, frankfurter.app) which may return manipulated content
- Saved memories, from the database, written by the memory extractor which may have laundered injection strings from past conversations
Each is wrapped by a dedicated sanitiser: wrapUntrusted, wrapToolResult, wrapMemories. Output is wrapped in explicit XML-style markers before reaching the model, and known jailbreak patterns ("ignore previous instructions", "system:" prefixes, role-switch attempts) are stripped.
Unit test coverage
11 unit tests in __tests__/sanitize.test.ts cover documented jailbreak categories. Defence is layered: even if the strip step misses a pattern, the wrapping marks the text as data rather than instructions, which makes it far harder for the model to interpret untrusted text as a command.
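The strip-then-wrap layering can be illustrated with a short sketch. It is not the contents of lib/prompts/sanitize.ts: the pattern list, the tag name, and the signature of wrapUntrusted shown here are assumptions made for illustration.

```typescript
// Hypothetical sketch of strip-then-wrap sanitisation (not the project's actual code).
const JAILBREAK_PATTERNS: RegExp[] = [
  /ignore (all )?previous instructions/gi, // classic override attempt
  /^\s*system\s*:/gim,                     // fake "system:" role prefix at line start
];

// Layer 1: remove known jailbreak phrases.
function stripKnownPatterns(text: string): string {
  return JAILBREAK_PATTERNS.reduce((acc, re) => acc.replace(re, ""), text);
}

// Layer 2: wrap the text in explicit markers so the prompt can declare it as data.
function wrapUntrusted(text: string, tag = "untrusted_user_input"): string {
  // Escape angle brackets so the payload cannot close the wrapper itself.
  const escaped = stripKnownPatterns(text)
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  return `<${tag}>\n${escaped}\n</${tag}>`;
}
```

Escaping the angle brackets matters as much as the pattern list: without it, a payload containing the closing marker could end the wrapper early and place text outside the trust boundary.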
Row-Level Security
Every table enforces per-user isolation at the PostgreSQL layer. Even if route logic has a bug, cross-user reads return zero rows.
CREATE POLICY "own_rows" ON ai_chat_sessions FOR ALL USING (auth.uid() = user_id);
The same policy is applied to ai_chat_usage, ai_events, and ai_user_memories. The service-role key bypasses RLS but is server-only, never in the client bundle, never in env vars exposed to browsers.
07 Observability & Operations
The /api/admin/health endpoint returns per-provider success rates, p50/p95 latency, dead-model detection, and 24-hour request volume, all computed from the ai_events audit log.
Event schema
    ai_events (
      id uuid,
      user_id uuid,
      event_type text,       -- 'message' | 'tool' | 'error'
      backend text,          -- 'Groq GPT-OSS 120B', etc.
      status text,           -- 'success' | 'rate_limited' | 'error'
      latency_ms integer,
      tokens_out integer,
      created_at timestamptz
    );
Diagnostic queries
Per-backend p95 latency over 24 hours:
    SELECT backend,
           percentile_cont(0.95) WITHIN GROUP (ORDER BY latency_ms) AS p95,
           COUNT(*) AS volume
    FROM ai_events
    WHERE event_type = 'message'
      AND created_at > now() - interval '24 hours'
    GROUP BY backend
    ORDER BY p95;
Dead model detection (always returns 404/429):
    SELECT backend,
           COUNT(*) FILTER (WHERE status = 'success') AS ok,
           COUNT(*) FILTER (WHERE status != 'success') AS fail,
           ROUND(100.0 * COUNT(*) FILTER (WHERE status = 'success')
                 / COUNT(*), 1) AS success_rate
    FROM ai_events
    WHERE created_at > now() - interval '1 hour'
    GROUP BY backend
    HAVING COUNT(*) > 10
    ORDER BY success_rate;
Scaling up
The Gmail +alias trick multiplies capacity. Sign up at you+provider2@gmail.com, you+provider3@gmail.com, and so on; each address counts as a distinct account at most providers. Using 8 keys per provider yields 8× daily capacity with zero code changes, because the failover runner already rotates through all keys.
08 Benchmarks & Performance
The table compares DeepSeek V3.2, SarmaLink-AI’s primary engine, with commercial AI products; the SarmaLink-AI column reports DeepSeek V3.2 scores. Figures are published scores from DeepSeek technical reports, lmarena.ai, SWE-bench leaderboards, Arena-Hard, and the GPQA paper.
| Benchmark | SarmaLink-AI | GPT-4o | Claude Sonnet | Gemini 2.5 |
|---|---|---|---|---|
| MATH-500 (advanced maths) | 90.2% | 76.6% | 78.3% | 83.2% |
| HumanEval (code synthesis) | 92.7% | 90.2% | 92.0% | 89.5% |
| Arena ELO (human preference) | 1318 | 1287 | 1271 | 1299 |
| MGSM (multilingual maths) | 88.3% | 85.5% | 86.0% | 87.4% |
| GPQA-Diamond (PhD reasoning) | 59.1% | 50.6% | 59.4% | 56.8% |
| MMLU (general knowledge) | 87.1% | 88.7% | 88.7% | 90.0% |
Capacity math
Combined daily capacity across all 7 providers on free tiers (default configuration, 1 key per provider):
Groq 14K + SambaNova 5K + Cerebras 5K + Gemini 250 + OpenRouter 1K + Cloudflare 10K images + Tavily 100 = ~35,000 requests/day.
With 9 keys per provider via the Gmail +alias trick: ~207,000 requests/day, enough for approximately 15,000 daily active users.
09 Deployment Guide
Prerequisites
- Node.js 20 or later
- Git
- Supabase account (free tier)
- API keys from Groq (required), plus optional SambaNova, Cerebras, Gemini, OpenRouter, Cloudflare, Tavily
- GitHub account (for deployment via Vercel)
- Vercel account (free tier)
Fast path
    git clone https://github.com/sarmakska/Sarmalink-ai.git
    cd Sarmalink-ai
    npm install
    cp .env.example .env.local
    # Fill in .env.local with your Supabase + provider keys
    # Run supabase/migrations/001_sarmalink_ai.sql in Supabase SQL editor
    npm run dev
Then push your repo to GitHub, import into Vercel, paste env vars into Vercel’s dashboard, and deploy. Full 45-minute walkthrough in the Complete Setup Guide.
Vercel Pro recommendation
Vercel Hobby (free) has a 10-second function timeout, which is adequate for most requests but can cut off long failover chains. Vercel Pro ($20/month) raises it to 60 seconds (300s with streaming) and is recommended for production.
10 Extension Points
Adding a new provider
Any provider with an OpenAI-compatible chat completions endpoint can be added in ~10 lines across four files: lib/ai-models.ts (register type), lib/providers/registry.ts (endpoint + keys), lib/env/validate.ts (env collection), and the mode’s failover array.
Adding a new live tool
Implement the Tool<Args> interface (match, args, run) in lib/tools/, then add one line to the TOOLS array in lib/tools/registry.ts. The auto-router picks it up automatically.
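A tool of this shape might look like the sketch below. Only the names Tool<Args>, match, args, run, and TOOLS come from the text; the exact member signatures, the name field, the dispatch helper, and the temperature-conversion tool are assumptions chosen to keep the example deterministic.

```typescript
// Hypothetical shape of the Tool<Args> interface; member signatures are assumed.
interface Tool<Args> {
  name: string;
  match(message: string): boolean;   // does this tool apply to the message?
  args(message: string): Args;       // extract structured arguments
  run(args: Args): Promise<string>;  // produce the tool result
}

interface TempArgs {
  celsius: number;
}

const CONVERT_RE = /\bconvert\s+(-?\d+(?:\.\d+)?)\s*c\s+to\s+f\b/i;

// Illustrative tool: converts Celsius to Fahrenheit without any network call.
const temperatureTool: Tool<TempArgs> = {
  name: "temperature",
  match: (message) => CONVERT_RE.test(message),
  args: (message) => {
    const m = CONVERT_RE.exec(message);
    return { celsius: m ? Number(m[1]) : 0 };
  },
  run: async ({ celsius }) => `${celsius}°C is ${(celsius * 9) / 5 + 32}°F`,
};

// Registering a tool is a one-line addition to the array.
const TOOLS: Tool<any>[] = [temperatureTool];

// First matching tool wins; null means "no tool applies, fall through to the model".
async function dispatch(message: string): Promise<string | null> {
  const tool = TOOLS.find((t) => t.match(message));
  return tool ? tool.run(tool.args(message)) : null;
}
```

Keeping match cheap and side-effect free is what lets a router scan every registered tool on each request without extra API calls.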
Customising the system prompt
System prompts are per-mode in lib/ai-models.ts. Each mode has its own persona, tone, and constraint set. Version control history preserves prompt evolution.
11 Comparison with Alternatives
| Feature | SarmaLink-AI | LiteLLM | LangChain | OpenRouter | ChatGPT Plus |
|---|---|---|---|---|---|
| Multi-provider routing | 7 providers | Yes | Partial | Single service | No |
| Automatic failover | 50ms handoff | Manual | Manual | Manual | N/A |
| Full app (not library) | Yes | Library | Framework | API only | Hosted only |
| Self-hostable | Yes | Partial | Yes | No | No |
| Free tier end-user | £0 forever | Library only | Framework only | Pay per token | $20/month |
| Image generation | FLUX.2 | No | No | No | DALL-E 3 |
| Persistent memory | Auto-extracted | No | Yes | No | Yes |
| Observability built-in | /admin/health | Callbacks only | LangSmith (paid) | Analytics | N/A |
| License | MIT | MIT | MIT | Commercial | Commercial |
12 Cost Analysis
A typical AI-heavy individual pays for 3-5 separate subscriptions to get the capabilities SarmaLink-AI ships with by default.
| Subscription | Monthly (USD) | Yearly (GBP, approx.) |
|---|---|---|
| ChatGPT Plus | $20 | £190 |
| Claude Pro | $20 | £190 |
| Gemini Advanced | $20 | £190 |
| Perplexity Pro | $20 | £190 |
| Midjourney Standard | $10 | £95 |
| All five combined | $90 | £855 |
| SarmaLink-AI | $0 | £0 |
For a 15-person team each paying ChatGPT Plus + Claude Pro alone: £5,700/year. SarmaLink-AI serves 15,000 daily requests across the same team at £0 recurring (optional $20/month for Vercel Pro if function timeouts become a constraint).
13 Roadmap
Now (shipped)
- 7 providers, 36 engines, 6 specialised modes
- Automatic mode detection from message content
- Persistent cross-session memory (30-fact cap per user)
- 5 live tools: weather, exchange rates, container tracking, news search, URL summary
Next
- Per-mode prompt versioning with A/B testing
- Example chat UI in examples/ folder
- Streaming chunk replay for debugging
- Usage analytics dashboard
Soon
- Voice mode (Whisper + TTS via Groq)
- Video frame analysis via Gemini Vision
- Tool marketplace, community plugins
- One-click Vercel deploy template
Later
- Federated failover, share capacity across instances
- Model fine-tuning pipeline
- Mobile app with offline fallback to on-device LLM
14 Governance & Licensing
SarmaLink-AI is released under the MIT License. Contributors retain copyright of their contributions. Pull requests are reviewed against CI (lint, typecheck, test, build) and CodeQL security scans; all must pass before merge.
Security vulnerabilities should be reported privately via the process documented in SECURITY.md. Community channels: GitHub Issues and Discussions.
15 Conclusion
SarmaLink-AI demonstrates that production-grade AI capability doesn’t require per-user subscriptions, vendor lock-in, or proprietary infrastructure. A first-principles multi-provider failover architecture, built on open-source primitives and free tiers, delivers 99.9999% effective uptime with frontier model quality, at zero recurring cost. The codebase is small enough to read in an afternoon, documented in a 22-page wiki, and licensed under MIT for any use. Fork it, self-host it, extend it, ship it.
16 The v1.2 Update: Plugin Ecosystem & Manus Integration
Released 2026-05-03 (CHANGELOG entry 1.2.0). The v1.2 update extends the gateway from a closed multi-provider router into an open routing surface that can dispatch to other open-source services, initially the ten sibling repositories under sarmakska, but the contract is generic and any HTTP endpoint with the right shape can register. The chapter below documents the four primitives the update introduces: cross-repo plugins, intent-based plugin auto-routing, Manus integration with persistence, and the public /docs page.
16.1 Cross-repo plugin system
lib/plugins/index.ts declares a registry of ten sibling open-source repos as routable tools: voice-agent-starter, agent-orchestrator, ai-eval-runner, local-llm-router, mcp-server-toolkit, rag-over-pdf, receipt-scanner, webhook-to-email, k8s-ops-toolkit, and terraform-stack. Each plugin entry carries a slug, a human label, the env var that holds its endpoint URL, and an enabled flag derived from whether that env var is set at boot. A plugin is considered enabled on a deployment if and only if the operator has supplied its endpoint variable; otherwise the registry returns it as available but inactive.
Two HTTP surfaces expose the registry. GET /api/v1/plugins returns the full list of plugins with their enabled state, allowing any consumer (the docs page, an external dashboard, a fork’s admin tooling) to render an accurate live picture without re-implementing the env-var check. POST /api/v1/plugins/invoke accepts { slug, payload }, looks up the endpoint, forwards the payload, and returns the upstream response. The dispatch path is intentionally thin: there is no orchestration logic, no retry policy, and no schema coercion. Plugins own their contracts.
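The registry contract described above can be sketched as follows. The field names, the environment variable names, and both helper functions are hypothetical; the real lib/plugins/index.ts may be organised differently. The environment is passed in explicitly so the logic can be exercised without touching process.env.

```typescript
// Hypothetical sketch of an env-driven plugin registry (names are illustrative).
interface PluginEntry {
  slug: string;
  label: string;
  envVar: string; // name of the env var that holds the plugin's endpoint URL
}

interface PluginStatus extends PluginEntry {
  enabled: boolean;
}

type Env = Record<string, string | undefined>;

const PLUGINS: PluginEntry[] = [
  { slug: "rag-over-pdf", label: "RAG over PDF", envVar: "PLUGIN_RAG_OVER_PDF_URL" },
  { slug: "receipt-scanner", label: "Receipt scanner", envVar: "PLUGIN_RECEIPT_SCANNER_URL" },
];

// What a GET /api/v1/plugins handler could return: every plugin plus its live state.
function listPlugins(env: Env): PluginStatus[] {
  return PLUGINS.map((p) => ({ ...p, enabled: Boolean(env[p.envVar]) }));
}

// What an invoke handler could use to find the upstream URL; null means "not routable".
function resolveEndpoint(slug: string, env: Env): string | null {
  const plugin = PLUGINS.find((p) => p.slug === slug);
  if (!plugin) return null;
  const url = env[plugin.envVar];
  return url ? url : null;
}
```

Deriving enabled from the same lookup that the invoke path uses keeps the listing and the dispatcher from disagreeing about which plugins are live.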
16.2 Intent-based plugin auto-routing
lib/services/plugin-autorouter.ts introduces a pre-LLM hook that inspects the incoming user message for intent keywords and, when one matches an enabled plugin, short-circuits the model call entirely. Research-shaped queries dispatch to rag-over-pdf; voice intents to voice-agent-starter; eval intents to ai-eval-runner; durable workflow intents to agent-orchestrator; OCR intents to receipt-scanner. When no plugin matches or the matched plugin is not enabled on this deployment, the router falls through to the standard mode failover described in Section 4.
This is distinct from the existing mode auto-router in lib/intent.ts. The mode router classifies a message into Smart / Reasoner / Coder / Live / Fast / Vision and picks a model. The plugin router classifies a message into a tool domain and picks an external service. Both classifiers run in sequence: plugins first (so domain-specialised tooling wins where applicable), modes second (so anything not absorbed by a plugin still gets the right inference path).
Plugin auto-routing is gated by the environment flag ENABLE_PLUGIN_AUTOROUTE. The default is off: deployments must opt in, because plugin endpoints introduce a new dependency surface and operators should make that choice deliberately. With the flag off, plugin-autorouter is a no-op and the request flows straight to the LLM as it did in v1.1.
    // lib/services/plugin-autorouter.ts (excerpt)
    const INTENT_TABLE = [
      { match: /\b(research|cite|literature)\b/i, slug: 'rag-over-pdf' },
      { match: /\b(voice|speak|transcribe)\b/i, slug: 'voice-agent-starter' },
      { match: /\b(eval|benchmark|grade)\b/i, slug: 'ai-eval-runner' },
      { match: /\b(workflow|orchestrate|durable)\b/i, slug: 'agent-orchestrator' },
      { match: /\b(receipt|invoice|ocr)\b/i, slug: 'receipt-scanner' },
    ]
16.3 Manus integration
Manus is an autonomous task-execution platform; v1.2 adds a typed client (createTask, getTask, cancelTask, awaitTask) and a webhook receiver that persists every state transition to PostgreSQL. The webhook verifies the upstream HMAC-SHA256 signature against a shared secret before accepting the payload; unsigned or mismatched signatures are rejected with 401 and never reach the database.
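A signature check of this kind is commonly written with Node's built-in crypto module, as in the sketch below. The hex encoding of the signature and the function names are assumptions; the actual header name and encoding used by the upstream service are not stated in this document.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute the HMAC-SHA256 of the raw request body as a hex string.
function signPayload(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Returns true only when the supplied signature matches the expected one.
// A webhook handler would respond 401 whenever this returns false.
function isValidSignature(
  rawBody: string,
  signatureHex: string | undefined,
  secret: string,
): boolean {
  if (!signatureHex) return false; // unsigned request
  const expected = Buffer.from(signPayload(rawBody, secret), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

The comparison runs over the raw body bytes, so the handler needs to verify before parsing JSON; re-serialising a parsed body can change whitespace and break the match.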
Persistence lives in a new table manus_tasks, defined in supabase/migrations/002_manus_tasks.sql. Each row carries the upstream task id, the current status, the input payload, the latest output, and timestamps. A repository module at lib/repositories/manus-tasks.ts wraps the upserts. The HTTP surface for clients is GET /api/v1/manus/tasks/[id], which returns the latest persisted state without re-fetching from Manus, useful for dashboards that need to poll without consuming upstream quota.
16.4 The /docs page
A new server-rendered route at /docs lists every plugin in the registry with a live env-var status badge (enabled / not configured), a link to its source repository, and a Manus invite call-to-action. The Manus invite code is read from NEXT_PUBLIC_MANUS_INVITE_CODE so each forked deployment can substitute its own without editing source.
16.5 MAKE-IT-YOURS white-label guide
docs/MAKE-IT-YOURS.md ships in the repository as a step-by-step rebrand guide for forkers. It includes a copy-paste v0 prompt that generates a complete branded front end, the list of files where logo, colour tokens, and copy live, and the full Supabase + Vercel deploy path including the new 002_manus_tasks.sql migration. The intent is that a non-engineer founder can clone the repo, run the v0 prompt, change five environment variables, and ship.
Operator notes for v1.2. Apply supabase/migrations/002_manus_tasks.sql in the Supabase SQL editor. Set SUPABASE_SERVICE_ROLE_KEY and NEXT_PUBLIC_SUPABASE_URL in the deployment environment. To enable plugin auto-routing, set ENABLE_PLUGIN_AUTOROUTE=1 and supply at least one plugin endpoint variable. To customise the Manus invite shown on /docs, set NEXT_PUBLIC_MANUS_INVITE_CODE.
17 The v2 Update: Agent Runner, Voice, Live Data & MCP
v2 widens the scope of the project from a multi-provider chat completions failover into a full inference runtime. Ten new capabilities ship in the same Next.js app on the same Supabase project. The sub-sections below document each.
17.1 Intent auto-router
A regex sweep classifies easy cases (code blocks, search verbs, image attachments). A tiny LLM classifier resolves the remainder. The picked mode is fed straight into tryFailover with no UI hop. Mode is no longer a user choice in the default flow; it is a routed decision.
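The two-stage classification can be sketched as below. The rule patterns, the ChatInput type, and the injected fallback classifier are illustrative assumptions; the project's real rules are not reproduced here.

```typescript
type Mode = "smart" | "reasoner" | "live" | "fast" | "coder" | "vision";

interface ChatInput {
  text: string;
  hasImage: boolean;
}

// Stage 1 rules (illustrative): cheap regexes for the unambiguous cases.
const RULES: { pattern: RegExp; mode: Mode }[] = [
  { pattern: /```|\b(typescript|python|sql|stack trace|debug)\b/i, mode: "coder" },
  { pattern: /\b(today|latest|news|weather|exchange rate)\b/i, mode: "live" },
];

// Returns a mode when a rule fires, or null when the message is ambiguous.
function classifyByRules(input: ChatInput): Mode | null {
  if (input.hasImage) return "vision";
  const hit = RULES.find((rule) => rule.pattern.test(input.text));
  return hit ? hit.mode : null;
}

// Stage 2: only ambiguous messages reach the (injected) LLM classifier.
async function pickMode(
  input: ChatInput,
  llmClassifier: (text: string) => Promise<Mode>,
): Promise<Mode> {
  return classifyByRules(input) ?? (await llmClassifier(input.text));
}
```

Injecting the fallback as a function keeps the regex stage testable on its own and means the classifier model costs nothing for messages the rules already resolve.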
17.2 Multi-step agent runner
POST /api/v1/agent runs a planner, dispatches workers in parallel, and uses a synthesiser to merge the results, all over a single Server-Sent Events stream. The browser sees one continuous response. Long tasks no longer require a client-side orchestrator round-tripping the planner.
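The planner, parallel workers, and synthesiser can be pictured as an async generator that yields SSE frames. This is a sketch under assumptions: the event names (plan, results, final) and the three function types are invented for illustration, and the real endpoint may frame its stream differently.

```typescript
type Planner = (task: string) => Promise<string[]>;
type Worker = (subtask: string) => Promise<string>;
type Synthesiser = (results: string[]) => Promise<string>;

// Format one Server-Sent Events frame: an event name, a JSON data line, a blank line.
function sse(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Yields frames in order: the plan, the worker results, then the merged answer.
async function* runAgent(
  task: string,
  plan: Planner,
  work: Worker,
  merge: Synthesiser,
): AsyncGenerator<string> {
  const subtasks = await plan(task);
  yield sse("plan", subtasks);

  // Workers run concurrently; Promise.all preserves subtask order in the results.
  const results = await Promise.all(subtasks.map((subtask) => work(subtask)));
  yield sse("results", results);

  yield sse("final", await merge(results));
}
```

Because the generator yields after each phase, a route handler can write frames to the response as they arrive, which is what lets the browser treat the whole run as one continuous stream.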
17.3 MCP-shaped tool catalog
GET /api/v1/mcp exposes the plugin and tool registry in the Model Context Protocol shape with bearer auth. Any MCP-aware client (an IDE, an agent framework, or another model) can mount the gateway as a tool source with one URL.
17.4 TTS cascade
Text-to-speech routes through MeloTTS on Cloudflare Workers AI first. On error or empty audio, Gemini TTS picks up. Output is streamable opus or mp3.
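The cascade rule, fall through on an error or on empty audio, can be sketched generically. The engine list and the Synth function type are assumptions; real engines would wrap the Workers AI and Gemini calls, which are not shown here.

```typescript
type Synth = (text: string) => Promise<Uint8Array>;

interface TtsEngine {
  name: string;
  synth: Synth;
}

// Tries engines in order and returns the first non-empty audio result.
async function synthesiseWithFallback(
  text: string,
  engines: TtsEngine[],
): Promise<{ engine: string; audio: Uint8Array }> {
  for (const { name, synth } of engines) {
    try {
      const audio = await synth(text);
      if (audio.byteLength > 0) return { engine: name, audio };
      // Empty audio counts as a failure: fall through to the next engine.
    } catch {
      // Provider error: fall through to the next engine.
    }
  }
  throw new Error("All TTS engines failed");
}
```

Treating empty output as a failure is the part that differs from a plain retry loop: a provider can return HTTP 200 with no audio, and without the length check that silence would be delivered to the user.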
17.5 STT route
Speech-to-text uses Groq Whisper as the primary engine and Cloudflare Workers AI Whisper as fallback. Sub-second transcription on a clean clip.
17.6 Live-data tools, zero keys
Weather is served by Open-Meteo, FX by Frankfurter (European Central Bank reference rates) and news by the Hacker News Algolia index. None of the three requires an API key. Tool results are cited on every answer that uses them.
17.7 Image generation with key rotation
FLUX runs across paired Cloudflare account and token pairs. When one account hits its neuron cap, the next pair is dispatched. The rotation is invisible to the user.
17.8 Quota tracker
GET /api/v1/quota returns per-user and company-wide usage from a Supabase view. Surfaceable in a dashboard, in the chat header, or by the chatbot itself when a user asks how much they have left.
17.9 Smart suggestions
After each reply, an endpoint returns three follow-up prompts grounded in the conversation, ready to render as chips for one-tap continuation.
17.10 Reasoning-leak stripper plus export endpoints
Chain-of-thought wrappers and internal commentary are scrubbed from the streamed output before the client sees them. A Markdown-to-PDF endpoint (PDFKit) and a JSON-to-XLSX endpoint (ExcelJS) ship in the same release, both rendered server-side with no client dependencies.
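Scrubbing a stream is harder than scrubbing a string, because a wrapper tag can be split across chunks. The sketch below shows one way to handle that. It assumes the reasoning is delimited by <think> and </think> tags, which is a guess at the wrapper format; the class and helper names are hypothetical.

```typescript
const OPEN = "<think>";
const CLOSE = "</think>";

// Length of the longest proper prefix of `tag` that `text` ends with.
function partialSuffix(text: string, tag: string): number {
  for (let k = Math.min(tag.length - 1, text.length); k > 0; k--) {
    if (text.endsWith(tag.slice(0, k))) return k;
  }
  return 0;
}

class ReasoningStripper {
  private buffer = "";
  private inside = false; // true while between OPEN and CLOSE

  // Feed one streamed chunk; returns the text that is safe to forward.
  push(chunk: string): string {
    this.buffer += chunk;
    let out = "";
    for (;;) {
      if (this.inside) {
        const end = this.buffer.indexOf(CLOSE);
        if (end === -1) {
          // Discard hidden text, keeping a tail that may be a split closing tag.
          this.buffer = this.buffer.slice(-(CLOSE.length - 1));
          return out;
        }
        this.buffer = this.buffer.slice(end + CLOSE.length);
        this.inside = false;
      } else {
        const start = this.buffer.indexOf(OPEN);
        if (start === -1) {
          // Hold back a tail that may be the start of a split opening tag.
          const keep = partialSuffix(this.buffer, OPEN);
          out += this.buffer.slice(0, this.buffer.length - keep);
          this.buffer = this.buffer.slice(this.buffer.length - keep);
          return out;
        }
        out += this.buffer.slice(0, start);
        this.buffer = this.buffer.slice(start + OPEN.length);
        this.inside = true;
      }
    }
  }

  // Call at end of stream: emits held-back text unless it sits inside a wrapper.
  flush(): string {
    const rest = this.inside ? "" : this.buffer;
    this.buffer = "";
    this.inside = false;
    return rest;
  }
}
```

The held-back tail is at most six characters, so the added latency is negligible, and an unterminated wrapper at end of stream is dropped rather than leaked.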
Operator notes for v2. The agent runner streams SSE and benefits from Vercel Pro’s longer function timeout. MeloTTS and Whisper on Workers AI require CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_API_TOKEN. Gemini TTS fallback uses the same GEMINI_API_KEY as Live mode. The quota tracker reads from a Supabase view; apply the migration that creates it before enabling the endpoint.
A Glossary
- 429, HTTP status code for "Too Many Requests". Indicates rate limiting.
- Failover, a sequence of steps tried in order until one succeeds.
- LPU, Language Processing Unit. Groq’s custom silicon for LLM inference.
- MoE, Mixture of Experts. Large model where only a subset of parameters activate per token.
- RLS, Row-Level Security. PostgreSQL feature enforcing per-row access policies.
- SSE, Server-Sent Events. HTTP-native streaming protocol for one-directional server→client data.
- WSE, Wafer-Scale Engine. Cerebras’ single-chip architecture using entire silicon wafers.
- R2, Cloudflare’s S3-compatible object storage with no egress fees.
- OKLCH, Perceptually uniform colour space used in the site’s design system.
B API Reference
Chat (streaming SSE)
    curl -N https://your-deploy.vercel.app/api/ai-chat \
      -H "Content-Type: application/json" \
      -H "Cookie: <supabase-auth>" \
      -d '{"message":"Draft a follow-up email","model":"smart"}'
    # Returns: SSE stream with {"type":"token","value":"..."} events
Image generation
    curl -X POST https://your-deploy.vercel.app/api/images/generate \
      -H "Content-Type: application/json" \
      -d '{"prompt":"sunset over Himalayas"}'
    # Returns: {url: "https://r2.../signed?..."} (7-day URL)
Image editing
    curl -X POST https://your-deploy.vercel.app/api/images/edit \
      -F "image=@original.jpg" \
      -F 'instruction=change sky to emerald green'
File upload
    curl -X POST https://your-deploy.vercel.app/api/attachments/upload \
      -F "file=@contract.pdf"
    # Extracted text stored and referenceable in next message
Health check
    curl https://your-deploy.vercel.app/api/admin/health
    # Returns: per-provider success rates, p95 latency, 24h volume
C Environment Variables
| Variable | Required | Purpose |
|---|---|---|
| NEXT_PUBLIC_SUPABASE_URL | Yes | Supabase project URL |
| NEXT_PUBLIC_SUPABASE_ANON_KEY | Yes | Supabase anon key (client-safe) |
| SUPABASE_SERVICE_ROLE_KEY | Yes | Service role key (server-only) |
| GROQ_API_KEY.._15 | Yes | Groq API keys (up to 15 for rotation) |
| SAMBANOVA_API_KEY.._8 | Optional | SambaNova keys for DeepSeek V3.2 |
| CEREBRAS_API_KEY.._8 | Optional | Cerebras keys for Qwen 3 235B / 480B |
| GEMINI_API_KEY.._12 | Optional | Google Gemini keys for Live mode |
| OPENROUTER_API_KEY.._5 | Optional | OpenRouter safety net |
| CLOUDFLARE_ACCOUNT_ID | Optional | For Workers AI image gen |
| CLOUDFLARE_API_TOKEN | Optional | Workers AI token |
| R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, R2_ENDPOINT, R2_BUCKET | Optional | R2 file storage |
| TAVILY_API_KEY.._8 | Optional | Structured search for live tools |
D Database Schema
    CREATE TABLE ai_chat_sessions (
      id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
      user_id uuid NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
      title text,
      messages jsonb NOT NULL DEFAULT '[]',
      updated_at timestamptz DEFAULT now()
    );

    CREATE TABLE ai_chat_usage (
      user_id uuid NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
      day date NOT NULL,
      count integer NOT NULL DEFAULT 0,
      PRIMARY KEY (user_id, day)
    );

    CREATE TABLE ai_events (
      id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
      user_id uuid NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
      event_type text,
      backend text,
      status text,
      latency_ms integer,
      tokens_out integer,
      created_at timestamptz DEFAULT now()
    );

    CREATE TABLE ai_user_memories (
      id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
      user_id uuid NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
      fact text NOT NULL,
      created_at timestamptz DEFAULT now()
    );

    -- Row-Level Security on every table
    ALTER TABLE ai_chat_sessions ENABLE ROW LEVEL SECURITY;
    ALTER TABLE ai_chat_usage ENABLE ROW LEVEL SECURITY;
    ALTER TABLE ai_events ENABLE ROW LEVEL SECURITY;
    ALTER TABLE ai_user_memories ENABLE ROW LEVEL SECURITY;

    CREATE POLICY "own_rows" ON ai_chat_sessions FOR ALL USING (auth.uid() = user_id);
    CREATE POLICY "own_rows" ON ai_chat_usage FOR ALL USING (auth.uid() = user_id);
    CREATE POLICY "own_rows" ON ai_events FOR ALL USING (auth.uid() = user_id);
    CREATE POLICY "own_rows" ON ai_user_memories FOR ALL USING (auth.uid() = user_id);
E References
- SarmaLink-AI repository
- SarmaLink-AI Wiki (22 pages)
- Groq Console · LPU inference
- SambaNova Cloud · DeepSeek V3.2
- Cerebras Cloud · WSE-3
- Google AI Studio · Gemini grounding
- OpenRouter · aggregator
- Cloudflare · Workers AI + R2
- Tavily · structured search
- Supabase · Postgres + Auth + RLS
- LMArena · benchmark leaderboards
- Sarma Linux · publisher