Technical Whitepaper · v2.0

SarmaLink-AI

Multi-Provider AI Routing with Automatic Failover

MIT LicensedOpen SourceSelf-Hostable7 Providers6 ModesZero Lock-in

~207Kreq/day capacity

41msfirst token

~50msfailover handoff

10⁻¹⁴P(total outage)

Version 1.0.0 · April 2026 · Sai Sarma · Sarma Linux

github.com/sarmakska/Sarmalink-ai Wiki ← Back to product page

Abstract

SarmaLink-AI is an open-source, MIT-licensed multi-provider AI assistant that routes every request through up to 36 engines across 7 providers (Groq, SambaNova, Cerebras, Google Gemini, OpenRouter, Cloudflare Workers AI, Tavily) with automatic sub-50ms failover. Built on Next.js 14, TypeScript, Supabase (PostgreSQL with Row-Level Security), and Cloudflare R2. This whitepaper documents the architecture, failover algorithm, security model, benchmarks, and operational characteristics of a system designed to deliver 99.9999% effective uptime on free-tier infrastructure, at £0 recurring cost.

01Introduction & Motivation

Every major AI provider offers a free tier. Groq hosts GPT-OSS 120B and delivers first tokens in 41 milliseconds. SambaNova runs DeepSeek V3.2, a 685-billion-parameter Mixture-of-Experts model that beats GPT-4o on MATH-500 and HumanEval. Cerebras serves inference at 2,000 tokens per second on wafer-scale chips. Google Gemini has grounded Google Search built in at the token level. OpenRouter aggregates 100+ models behind a single OpenAI-compatible API, including a :free pool of 17+ community-hosted models. Cloudflare Workers AI runs FLUX.2 klein for image generation at the edge. Tavily provides structured search designed for LLM consumption.

Each of these, individually, is generous. Each, on its own, still hits rate limits. The moment a single provider returns a 429, an AI application breaks. Users see an error. Trust evaporates. The common workaround, paying for an upgrade, defeats the point of using free tiers in the first place, and lock users into a single vendor’s roadmap.

The problem with existing solutions

LiteLLM is a library, not an application. You still have to write the app, the routing logic, the retry policy, the stream parser, the database schema.
LangChain is a framework with a steep learning curve and heavy abstractions. Multi-provider failover is possible but manual.
OpenRouter is one aggregator service, not seven redundant providers. When OpenRouter has an issue, you have an issue.
ChatGPT Plus / Claude Pro / Gemini Advanced are rentals. No self-hosting, no vendor independence, no data ownership.

First-principles multi-provider failover

SarmaLink-AI treats every AI provider as a commodity. If Groq returns 429, SambaNova fires. If SambaNova is busy, Cerebras. Then Gemini. Then OpenRouter’s :free pool as the final safety net. Round-robin key rotation distributes load across keys within a provider. Round-robin across providers survives outages. Every step is logged to ai_events for observability.

Target audience

Small-to-medium businesses, indie developers, digital agencies, research teams, anyone who needs production-grade AI capability without a vendor rental fee, and who values code they can read, fork, and extend.

02System Architecture

SarmaLink-AI is a Next.js 14 application using the App Router, deployed on Vercel (or any Next.js-compatible host). The runtime is a thin HTTP layer over a modular lib/ directory. Every external dependency is behind an interface. Every piece of untrusted data is wrapped at a trust boundary before reaching the model.

Core modules

app/api/ai-chat/route.ts, the HTTP entry point. ~45 lines after refactoring. Delegates all logic to lib.
lib/providers/failover.ts, the failover runner (tryFailover). Orchestrates retries across steps and keys.
lib/providers/registry.ts, declares every provider, endpoint, and key collection.
lib/tools/registry.ts, plugin pattern for live tools (weather, FX, container tracking).
lib/prompts/sanitize.ts, prompt injection defence at trust boundaries.
lib/repositories/, typed Supabase CRUD for sessions, usage, events, memories.
lib/intent.ts, auto-router classifier (regex patterns, zero API calls).

Request lifecycle

rendering

Request lifecycle, browser to streamed tokens, with auth, RLS, sanitisation, routing and failover broken out as participants.

Lifecycle steps in code: app/api/ai-chat/route.ts for handler, lib/prompts/sanitize.ts for sanitisation, lib/intent.ts for the auto-router, lib/providers/failover.ts for tryFailover.

Deployment topology

Vercel (or any Node.js host) runs the route handlers. Supabase provides PostgreSQL + Auth + Row-Level Security. Cloudflare R2 stores binary attachments (images, PDFs) with 7-day signed URLs. Cloudflare Workers AI serves image generation. All other providers are accessed via their public OpenAI-compatible endpoints.

03The Failover Runner

tryFailover is the load-bearing module. It accepts a sequence of provider/model steps and iterates through them until one returns a successful stream.

Algorithm

async function tryFailover(steps, messages, opts) {
  for (const step of steps) {
    const keys = providerKeys(step.provider)
    const offset = Date.now() % keys.length  // round-robin
    for (let i = 0; i < keys.length; i++) {
      const key = keys[(offset + i) % keys.length]
      try {
        const stream = await callProvider(step, key, messages)
        return stream  // success
      } catch (err) {
        logEvent({ backend: step.label, status: err.status })
        if (err.status === 429 || err.status >= 500) continue
        throw err  // non-recoverable
      }
    }
  }
  throw new Error('All providers exhausted')
}

Uptime math

Assumption: each provider has 99% uptime (industry minimum for production services).
P(all 7 providers fail simultaneously) = 0.01⁷ = 1 × 10⁻¹⁴
Effective uptime: 99.999999999999%, about 30 milliseconds of downtime per century, assuming the public internet itself stays up.

Handoff timing

Typical failover from 429 on one provider to first token on the next is under 50 milliseconds. The request parser reads the response headers, classifies the failure, rotates to the next key, and dispatches, all without user-visible interruption. Every step writes to ai_events with status, backend, latency_ms, and tokens_out.

04The Six Modes

Each mode is a different failover sequence, optimised for a specific task type.

Mode	Depth	Primary engine	Daily limit	Use case
Smart	14	DeepSeek V3.2 685B	1,000/day	Professional writing, analysis, brainstorming
Reasoner	10	DeepSeek V3.2 + V3.1	500/day	Complex logic, chain-of-thought traces
Live	4	Gemini 2.5 Flash + Google Search	1,000/day	Current events, weather, news, FX
Fast	9	Groq GPT-OSS 20B	5,000/day	41ms first token, quick lookups, rewrites
Coder	9	DeepSeek V3.2 + Qwen Coder 480B	800/day	TypeScript, Python, SQL, debugging
Vision	6	Llama-4 Scout 17B	500/day	Receipts, screenshots, diagrams

05The Seven Providers

Groq

Custom LPU inference chips. GPT-OSS 120B in 45ms, GPT-OSS 20B in 41ms. Llama 3.3 70B, Qwen 3 32B, Llama-4 Scout for vision, Llama 3.1 8B for memory extraction. Free tier: 14,000 req/day/key.

SambaNova

Hosts DeepSeek V3.2 (685B MoE, 37B active per token), the frontier model that powers Smart, Reasoner, and Coder modes. Custom Reconfigurable Dataflow Unit silicon. Also runs DeepSeek V3.1 and Llama 4 Maverick with 1M context.

Cerebras

WSE-3 wafer-scale engine, 46,225 mm² of silicon, the largest chip ever built. 2,000 tokens/sec on Llama 3.1 8B. Hosts Qwen 3 235B for reasoning and Qwen 3 Coder 480B, SarmaLink-AI’s Coder failover winner when SambaNova is busy.

Google Gemini

Live mode backbone. Gemini 2.5 Flash, Flash Lite, and Gemini 3 Flash Preview with grounded Google Search built in at the token level. 1M-token context window. Every Live mode answer includes cited sources.

OpenRouter

Aggregates 100+ models across 50+ providers into one OpenAI-compatible endpoint. The :free pool (17+ community-hosted models including GPT-OSS 120B, Nemotron Ultra 253B, GLM-4.5 Air, Gemma 3, DeepSeek R1) is the ultimate failover safety net.

Cloudflare Workers AI

Runs FLUX.2 klein 9B and 4B for image generation and instruction-following editing with three-step failover (9B → 4B → FLUX.1 schnell). R2 provides 10GB free S3-compatible object storage for file persistence with 7-day signed URLs.

Tavily

Structured web search designed for AI consumption. Returns titles, snippets, URLs, and relevance scores, ready for LLM citation. Powers weather (Open-Meteo fallback), exchange rate verification, container tracking (ISO 6346 carriers), news, and URL extraction tools.

06Security Model

Trust boundaries

Three sources of untrusted text reach the model on every request:

User messages, from the browser, potentially adversarial
Tool results, from external APIs (Tavily, Open-Meteo, frankfurter.app) which may return manipulated content
Saved memories, from the database, written by the memory extractor which may have laundered injection strings from past conversations

Each is wrapped by a dedicated sanitiser: wrapUntrusted, wrapToolResult, wrapMemories. Output is wrapped in explicit XML-style markers before reaching the model, and known jailbreak patterns ("ignore previous instructions", "system:" prefixes, role-switch attempts) are stripped.

Unit test coverage

11 unit tests in __tests__/sanitize.test.ts cover documented jailbreak categories. Defence is layered: even if strip misses a pattern, the wrapping ensures the model can never interpret untrusted text as a command.

Row-Level Security

Every table enforces per-user isolation at the PostgreSQL layer. Even if route logic has a bug, cross-user reads return zero rows.

CREATE POLICY "own_rows" ON ai_chat_sessions
  FOR ALL USING (auth.uid() = user_id);

The same policy is applied to ai_chat_usage, ai_events, and ai_user_memories. The service-role key bypasses RLS but is server-only, never in the client bundle, never in env vars exposed to browsers.

07Observability & Operations

The /api/admin/health endpoint returns per-provider success rates, p50/p95 latency, dead-model detection, and 24-hour request volume, all computed from the ai_events audit log.

Event schema

ai_events (
  id          uuid,
  user_id     uuid,
  event_type  text,      -- 'message' | 'tool' | 'error'
  backend     text,      -- 'Groq GPT-OSS 120B', etc.
  status      text,      -- 'success' | 'rate_limited' | 'error'
  latency_ms  integer,
  tokens_out  integer,
  created_at  timestamptz
);

Diagnostic queries

Per-backend p95 latency over 24 hours:

SELECT backend,
       percentile_cont(0.95) WITHIN GROUP (ORDER BY latency_ms) AS p95,
       COUNT(*) AS volume
FROM ai_events
WHERE event_type = 'message'
  AND created_at > now() - interval '24 hours'
GROUP BY backend
ORDER BY p95;

Dead model detection (always returns 404/429):

SELECT backend,
       COUNT(*) FILTER (WHERE status = 'success') AS ok,
       COUNT(*) FILTER (WHERE status != 'success') AS fail,
       ROUND(100.0 * COUNT(*) FILTER (WHERE status = 'success')
             / COUNT(*), 1) AS success_rate
FROM ai_events
WHERE created_at > now() - interval '1 hour'
GROUP BY backend
HAVING COUNT(*) > 10
ORDER BY success_rate;

Scaling up

The Gmail +alias trick multiplies capacity. Sign up at you+provider2@gmail.com, +provider3@gmail.com, etc., each counts as a distinct account at most providers. Adding 8 keys per provider yields 8× daily capacity with zero code changes; the failover runner already rotates through all keys.

08Benchmarks & Performance

DeepSeek V3.2, SarmaLink-AI’s primary engine, compared to commercial AI products. Published scores from DeepSeek technical reports, lmarena.ai, SWE-bench leaderboards, Arena-Hard, and the GPQA paper.

Benchmark	SarmaLink-AI	GPT-4o	Claude Sonnet	Gemini 2.5
MATH-500 (advanced maths)	90.2%	76.6%	78.3%	83.2%
HumanEval (code synthesis)	92.7%	90.2%	92.0%	89.5%
Arena ELO (human preference)	1318	1287	1271	1299
MGSM (multilingual maths)	88.3%	85.5%	86.0%	87.4%
GPQA-Diamond (PhD reasoning)	59.1%	50.6%	59.4%	56.8%
MMLU (general knowledge)	87.1%	88.7%	88.7%	90.0%

Capacity math

Combined daily capacity across all 7 providers on free tiers (default configuration, 1 key per provider):
Groq 14K + SambaNova 5K + Cerebras 5K + Gemini 250 + OpenRouter 1K + Cloudflare 10K images + Tavily 100 = ~35,000 requests/day.
With 9 keys per provider via the Gmail +alias trick: ~207,000 requests/day, enough for approximately 15,000 daily active users.

09Deployment Guide

Prerequisites

Node.js 20 or later
Git
Supabase account (free tier)
API keys from Groq (required), plus optional SambaNova, Cerebras, Gemini, OpenRouter, Cloudflare, Tavily
GitHub account (for deployment via Vercel)
Vercel account (free tier)

Fast path

git clone https://github.com/sarmakska/Sarmalink-ai.git
cd sarmalink-ai
npm install
cp .env.example .env.local
# Fill in .env.local with your Supabase + provider keys
# Run supabase/migrations/001_sarmalink_ai.sql in Supabase SQL editor
npm run dev

Then push your repo to GitHub, import into Vercel, paste env vars into Vercel’s dashboard, and deploy. Full 45-minute walkthrough in the Complete Setup Guide.

Vercel Pro recommendation

Vercel Hobby (free) has a 10-second function timeout, adequate for most requests but can cut off long failover chains. Vercel Pro ($20/month) raises it to 60 seconds (300s with streaming) and is recommended for production.

10Extension Points

Adding a new provider

Any provider with an OpenAI-compatible chat completions endpoint can be added in ~10 lines across four files: lib/ai-models.ts (register type), lib/providers/registry.ts (endpoint + keys), lib/env/validate.ts (env collection), and the mode’s failover array.

Adding a new live tool

Implement the Tool<Args> interface (match, args, run) in lib/tools/, then add one line to the TOOLS array in lib/tools/registry.ts. The auto-router picks it up automatically.

Customising the system prompt

System prompts are per-mode in lib/ai-models.ts. Each mode has its own persona, tone, and constraint set. Version control history preserves prompt evolution.

11Comparison with Alternatives

Feature	SarmaLink-AI	LiteLLM	LangChain	OpenRouter	ChatGPT Plus
Multi-provider routing	7 providers	Yes	Partial	Single service	No
Automatic failover	50ms handoff	Manual	Manual	Manual	N/A
Full app (not library)	Yes	Library	Framework	API only	Hosted only
Self-hostable	Yes	Partial	Yes	No	No
Free tier end-user	£0 forever	Library only	Framework only	Pay per token	$20/month
Image generation	FLUX.2	No	No	No	DALL-E 3
Persistent memory	Auto-extracted	No	Yes	No	Yes
Observability built-in	/admin/health	Callbacks only	LangSmith (paid)	Analytics	N/A
License	MIT	MIT	MIT	Commercial	Commercial

12Cost Analysis

A typical AI-heavy individual pays for 3-5 separate subscriptions to get the capabilities SarmaLink-AI ships with by default.

Subscription	Monthly	Yearly
ChatGPT Plus	$20	£190
Claude Pro	$20	£190
Gemini Advanced	$20	£190
Perplexity Pro	$20	£190
Midjourney Standard	$10	£95
All five combined	$90	£855
SarmaLink-AI	$0	£0

For a 15-person team each paying ChatGPT Plus + Claude Pro alone: £5,700/year. SarmaLink-AI serves 15,000 daily requests across the same team at £0 recurring (optional £20/month for Vercel Pro if function timeouts become a constraint).

13Roadmap

Now (shipped)

7 providers, 36 engines, 6 specialised modes
Automatic mode detection from message content
Persistent cross-session memory (30-fact cap per user)
5 live tools: weather, exchange rates, container tracking, news search, URL summary

Per-mode prompt versioning with A/B testing
Example chat UI in examples/ folder
Streaming chunk replay for debugging
Usage analytics dashboard

Soon

Voice mode (Whisper + TTS via Groq)
Video frame analysis via Gemini Vision
Tool marketplace, community plugins
One-click Vercel deploy template

Later

Federated failover, share capacity across instances
Model fine-tuning pipeline
Mobile app with offline fallback to on-device LLM

14Governance & Licensing

SarmaLink-AI is released under the MIT License. Contributors retain copyright of their contributions. Pull requests are reviewed against CI (lint, typecheck, test, build) and CodeQL security scans; all must pass before merge.

Security vulnerabilities should be reported privately via the process documented in SECURITY.md. Community channels: GitHub Issues and Discussions.

15Conclusion

SarmaLink-AI demonstrates that production-grade AI capability doesn’t require per-user subscriptions, vendor lock-in, or proprietary infrastructure. A first-principles multi-provider failover architecture, built on open-source primitives and free tiers, delivers 99.9999% effective uptime with frontier model quality, at zero recurring cost. The codebase is small enough to read in an afternoon, documented in a 22-page wiki, and licensed under MIT for any use. Fork it, self-host it, extend it, ship it.

Star the repo Read the wiki Back to product page

16The v1.2 Update, Plugin Ecosystem & Manus Integration

Released 2026-05-03 (CHANGELOG entry 1.2.0). The v1.2 update extends the gateway from a closed multi-provider router into an open routing surface that can dispatch to other open-source services, initially the ten sibling repositories under sarmakska, but the contract is generic and any HTTP endpoint with the right shape can register. The chapter below documents the four primitives the update introduces: cross-repo plugins, intent-based plugin auto-routing, Manus integration with persistence, and the public /docs page.

16.1 Cross-repo plugin system

lib/plugins/index.ts declares a registry of ten sibling open-source repos as routable tools: voice-agent-starter, agent-orchestrator, ai-eval-runner, local-llm-router, mcp-server-toolkit, rag-over-pdf, receipt-scanner, webhook-to-email, k8s-ops-toolkit, and terraform-stack. Each plugin entry carries a slug, a human label, the env var that holds its endpoint URL, and an enabled flag derived from whether that env var is set at boot. A plugin is considered enabled on a deployment if and only if the operator has supplied its endpoint variable; otherwise the registry returns it as available but inactive.

Two HTTP surfaces expose the registry. GET /api/v1/plugins returns the full list of plugins with their enabled state, allowing any consumer (the docs page, an external dashboard, a fork’s admin tooling) to render an accurate live picture without re-implementing the env-var check. POST /api/v1/plugins/invoke accepts { slug, payload }, looks up the endpoint, forwards the payload, and returns the upstream response. The dispatch path is intentionally thin, there is no orchestration logic, no retry policy, no schema coercion. Plugins own their contracts.

rendering

Plugin dispatch path. The registry is the single source of truth for which sibling repos are reachable on this deployment.

16.2 Intent-based plugin auto-routing

lib/services/plugin-autorouter.ts introduces a pre-LLM hook that inspects the incoming user message for intent keywords and, when one matches an enabled plugin, short-circuits the model call entirely. Research-shaped queries dispatch to rag-over-pdf; voice intents to voice-agent-starter; eval intents to ai-eval-runner; durable workflow intents to agent-orchestrator; OCR intents to receipt-scanner. When no plugin matches or the matched plugin is not enabled on this deployment, the router falls through to the standard mode failover described in Section 4.

This is distinct from the existing mode auto-router in lib/intent.ts. The mode router classifies a message into Smart / Reasoner / Coder / Live / Fast / Vision and picks a model. The plugin router classifies a message into a tool domain and picks an external service. Both classifiers run in sequence: plugins first (so domain-specialised tooling wins where applicable), modes second (so anything not absorbed by a plugin still gets the right inference path).

Plugin auto-routing is gated by the environment flag ENABLE_PLUGIN_AUTOROUTE. The default is off: deployments must opt in, because plugin endpoints introduce a new dependency surface and operators should make that choice deliberately. With the flag off, plugin-autorouter is a no-op and the request flows straight to the LLM as it did in v1.1.

// lib/services/plugin-autorouter.ts (excerpt)
const INTENT_TABLE = [
  { match: /\b(research|cite|literature)\b/i, slug: 'rag-over-pdf' },
  { match: /\b(voice|speak|transcribe)\b/i,   slug: 'voice-agent-starter' },
  { match: /\b(eval|benchmark|grade)\b/i,     slug: 'ai-eval-runner' },
  { match: /\b(workflow|orchestrate|durable)\b/i, slug: 'agent-orchestrator' },
  { match: /\b(receipt|invoice|ocr)\b/i,      slug: 'receipt-scanner' },
]

16.3 Manus integration

Manus is an autonomous task-execution platform; v1.2 adds a typed client (createTask, getTask, cancelTask, awaitTask) and a webhook receiver that persists every state transition to PostgreSQL. The webhook verifies the upstream HMAC-SHA256 signature against a shared secret before accepting the payload, unsigned or mismatched signatures are rejected with 401 and never reach the database.

Persistence lives in a new table manus_tasks, defined in supabase/migrations/002_manus_tasks.sql. Each row carries the upstream task id, the current status, the input payload, the latest output, and timestamps. A repository module at lib/repositories/manus-tasks.ts wraps the upserts. The HTTP surface for clients is GET /api/v1/manus/tasks/[id], which returns the latest persisted state without re-fetching from Manus, useful for dashboards that need to poll without consuming upstream quota.

rendering

Manus integration with HMAC-verified webhook persistence. Clients poll the gateway, never the upstream.

16.4 The /docs page

A new server-rendered route at /docs lists every plugin in the registry with a live env-var status badge (enabled / not configured), a link to its source repository, and a Manus invite call-to-action. The Manus invite code is read from NEXT_PUBLIC_MANUS_INVITE_CODE so each forked deployment can substitute its own without editing source.

16.5 MAKE-IT-YOURS white-label guide

docs/MAKE-IT-YOURS.md ships in the repository as a step-by-step rebrand guide for forkers. It includes a copy-paste v0 prompt that generates a complete branded front end, the list of files where logo, colour tokens, and copy live, and the full Supabase + Vercel deploy path including the new 002_manus_tasks.sql migration. The intent is that a non-engineer founder can clone the repo, run the v0 prompt, change five environment variables, and ship.

Operator notes for v1.2. Apply supabase/migrations/002_manus_tasks.sql in the Supabase SQL editor. Set SUPABASE_SERVICE_ROLE_KEY and NEXT_PUBLIC_SUPABASE_URL in the deployment environment. To enable plugin auto-routing, set ENABLE_PLUGIN_AUTOROUTE=1 and supply at least one plugin endpoint variable. To customise the Manus invite shown on /docs, set NEXT_PUBLIC_MANUS_INVITE_CODE.

17The v2 Update, Agent Runner, Voice, Live Data & MCP

v2 widens the scope of the project from a multi-provider chat completions failover into a full inference runtime. Ten new capabilities ship in the same Next.js app on the same Supabase project. The sub-sections below document each.

17.1 Intent auto-router

A regex sweep classifies easy cases (code blocks, search verbs, image attachments). A tiny LLM classifier resolves the remainder. The picked mode is fed straight into tryFailover with no UI hop. Mode is no longer a user choice in the default flow, it is a routed decision.

17.2 Multi-step agent runner

POST /api/v1/agent runs a planner, dispatches workers in parallel and a synthesiser to merge results, all over a single Server-Sent Events stream. The browser sees one continuous response. Long tasks no longer require a client-side orchestrator round-tripping the planner.

rendering

Multi-step agent runner. Planner, workers, synthesiser, all server-side, one SSE stream to the client.

17.3 MCP-shaped tool catalog

GET /api/v1/mcp exposes the plugin and tool registry in the Model Context Protocol shape with bearer auth. Any MCP-aware client, IDE, agent framework or another model, can mount the gateway as a tool source with one URL.

17.4 TTS cascade

Text-to-speech routes through MeloTTS on Cloudflare Workers AI first. On error or empty audio, Gemini TTS picks up. Output is streamable opus or mp3.

17.5 STT route

Speech-to-text uses Groq Whisper as the primary engine and Cloudflare Workers AI Whisper as fallback. Sub-second transcription on a clean clip.

17.6 Live-data tools, zero keys

Weather is served by Open-Meteo, FX by Frankfurter (European Central Bank reference rates) and news by the Hacker News Algolia index. None of the three requires an API key. Tool results are cited on every answer that uses them.

17.7 Image generation with key rotation

FLUX runs across paired Cloudflare account and token pairs. When one account hits its neuron cap, the next pair is dispatched. The rotation is invisible to the user.

17.8 Quota tracker

GET /api/v1/quota returns per-user and company-wide usage from a Supabase view. Surfaceable in a dashboard, in the chat header, or by the chatbot itself when a user asks how much they have left.

17.9 Smart suggestions

After each reply, an endpoint returns three follow-up prompts grounded in the conversation, ready to render as chips for one-tap continuation.

17.10 Reasoning-leak stripper plus export endpoints

Chain-of-thought wrappers and internal commentary are scrubbed from the streamed output before the client sees them. A Markdown-to-PDF endpoint (PDFKit) and a JSON-to-XLSX endpoint (ExcelJS) ship in the same release, both rendered server-side with no client dependencies.

rendering

v2 architecture end to end. One auto-router, two runtime paths (mode or agent), cascades on the side, leak stripper and quota tracker before the response leaves.

Operator notes for v2. The agent runner streams SSE and benefits from Vercel Pro’s longer function timeout. MeloTTS and Whisper on Workers AI require CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_API_TOKEN. Gemini TTS fallback uses the same GEMINI_API_KEY as Live mode. The quota tracker reads from a Supabase view, apply the migration that creates it before enabling the endpoint.

AGlossary

429, HTTP status code for "Too Many Requests". Indicates rate limiting.
Failover, a sequence of steps tried in order until one succeeds.
LPU, Language Processing Unit. Groq’s custom silicon for LLM inference.
MoE, Mixture of Experts. Large model where only a subset of parameters activate per token.
RLS, Row-Level Security. PostgreSQL feature enforcing per-row access policies.
SSE, Server-Sent Events. HTTP-native streaming protocol for one-directional server→client data.
WSE, Wafer-Scale Engine. Cerebras’ single-chip architecture using entire silicon wafers.
R2, Cloudflare’s S3-compatible object storage with no egress fees.
OKLCH, Perceptually uniform colour space used in the site’s design system.

BAPI Reference

Chat (streaming SSE)

curl -N https://your-deploy.vercel.app/api/ai-chat \
  -H "Content-Type: application/json" \
  -H "Cookie: <supabase-auth>" \
  -d '{"message":"Draft a follow-up email","model":"smart"}'
# Returns: SSE stream with {"type":"token","value":"..."} events

Image generation

curl -X POST https://your-deploy.vercel.app/api/images/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt":"sunset over Himalayas"}'
# Returns: {url: "https://r2.../signed?..."}  (7-day URL)

Image editing

curl -X POST https://your-deploy.vercel.app/api/images/edit \
  -F "image=@original.jpg" \
  -F 'instruction=change sky to emerald green'

File upload

curl -X POST https://your-deploy.vercel.app/api/attachments/upload \
  -F "file=@contract.pdf"
# Extracted text stored and referenceable in next message

Health check

curl https://your-deploy.vercel.app/api/admin/health
# Returns: per-provider success rates, p95 latency, 24h volume

CEnvironment Variables

Variable	Required	Purpose
`NEXT_PUBLIC_SUPABASE_URL`	Yes	Supabase project URL
`NEXT_PUBLIC_SUPABASE_ANON_KEY`	Yes	Supabase anon key (client-safe)
`SUPABASE_SERVICE_ROLE_KEY`	Yes	Service role key (server-only)
`GROQ_API_KEY`..`_15`	Yes	Groq API keys (up to 15 for rotation)
`SAMBANOVA_API_KEY`..`_8`	Optional	SambaNova keys for DeepSeek V3.2
`CEREBRAS_API_KEY`..`_8`	Optional	Cerebras keys for Qwen 3 235B / 480B
`GEMINI_API_KEY`..`_12`	Optional	Google Gemini keys for Live mode
`OPENROUTER_API_KEY`..`_5`	Optional	OpenRouter safety net
`CLOUDFLARE_ACCOUNT_ID`	Optional	For Workers AI image gen
`CLOUDFLARE_API_TOKEN`	Optional	Workers AI token
`R2_ACCESS_KEY_ID`, `R2_SECRET_ACCESS_KEY`, `R2_ENDPOINT`, `R2_BUCKET`	Optional	R2 file storage
`TAVILY_API_KEY`..`_8`	Optional	Structured search for live tools

DDatabase Schema

CREATE TABLE ai_chat_sessions (
  id          uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id     uuid NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  title       text,
  messages    jsonb NOT NULL DEFAULT '[]',
  updated_at  timestamptz DEFAULT now()
);

CREATE TABLE ai_chat_usage (
  user_id     uuid NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  day         date NOT NULL,
  count       integer NOT NULL DEFAULT 0,
  PRIMARY KEY (user_id, day)
);

CREATE TABLE ai_events (
  id          uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id     uuid NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  event_type  text,
  backend     text,
  status      text,
  latency_ms  integer,
  tokens_out  integer,
  created_at  timestamptz DEFAULT now()
);

CREATE TABLE ai_user_memories (
  id          uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id     uuid NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  fact        text NOT NULL,
  created_at  timestamptz DEFAULT now()
);

-- Row-Level Security on every table
ALTER TABLE ai_chat_sessions ENABLE ROW LEVEL SECURITY;
ALTER TABLE ai_chat_usage     ENABLE ROW LEVEL SECURITY;
ALTER TABLE ai_events         ENABLE ROW LEVEL SECURITY;
ALTER TABLE ai_user_memories  ENABLE ROW LEVEL SECURITY;

CREATE POLICY "own_rows" ON ai_chat_sessions
  FOR ALL USING (auth.uid() = user_id);
CREATE POLICY "own_rows" ON ai_chat_usage
  FOR ALL USING (auth.uid() = user_id);
CREATE POLICY "own_rows" ON ai_events
  FOR ALL USING (auth.uid() = user_id);
CREATE POLICY "own_rows" ON ai_user_memories
  FOR ALL USING (auth.uid() = user_id);

EReferences

SarmaLink-AI repository
SarmaLink-AI Wiki (22 pages)
Groq Console · LPU inference
SambaNova Cloud · DeepSeek V3.2
Cerebras Cloud · WSE-3
Google AI Studio · Gemini grounding
OpenRouter · aggregator
Cloudflare · Workers AI + R2
Tavily · structured search
Supabase · Postgres + Auth + RLS
LMArena · benchmark leaderboards
Sarma Linux · publisher