AI Engineering
I build AI systems that run in production: multi-provider LLM gateways with automatic failover, RAG pipelines grounded in your data, eval harnesses that measure quality, and voice agents with sub-second latency.
At a glance
- Backed by public open-source code, not just a description on a page.
- Long-form essays on the same topics, with sources cited.
- Production patterns a hiring team can lift straight into its own stack.
About Sarma
Sarma is a UK-based software engineer running Sarmalinux as a one-person studio. He ships nineteen open-source repositories spanning LLM gateways, coding agents, inference, storage engines and consensus, and writes long-form engineering essays at sarmalinux.com/blog. Senior IC, end to end.
Most AI integrations break in one of three ways: they depend on a single provider that goes down, they answer confidently with hallucinated facts, or they can't be measured, so nobody knows whether they are getting worse. I have shipped SarmaLink-AI, a 14-engine, 7-provider gateway with sub-50ms failover, along with a production MCP server toolkit, a local-LLM router, a WebRTC voice agent, and a DuckDB-backed eval runner. When I take on AI work for a client I bring the same rigour: provider redundancy, grounding, evals, and production observability.
What this covers in practice
Multi-provider LLM gateway
Design and deploy an OpenAI-compatible gateway that routes across providers (OpenAI, Groq, Anthropic, Mistral, Ollama, and more) with automatic failover. SarmaLink-AI is the production reference.
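A minimal sketch of priority-order failover, assuming each provider is wrapped in an adapter that raises a common error type on failure. `Provider`, `ProviderError` and `complete_with_failover` are hypothetical names, and the adapters below are local stubs rather than real SDK calls; this is an illustration of the pattern, not SarmaLink-AI's code.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderError(Exception):
    """Raised by a provider adapter when a call fails (timeout, 5xx, rate limit)."""


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


@dataclass
class Provider:
    name: str
    complete: Callable[[str], str]  # hypothetical adapter: prompt -> completion text


def complete_with_failover(providers: list[Provider], prompt: str) -> tuple[str, str]:
    """Try providers in priority order and return (provider name, text) from the first success."""
    errors: list[str] = []
    for provider in providers:
        try:
            return provider.name, provider.complete(prompt)
        except ProviderError as exc:
            # Record the failure and fall through to the next provider in the chain.
            errors.append(f"{provider.name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Usage with stub adapters: the primary is down, so the request fails over.
def primary_down(prompt: str) -> str:
    raise ProviderError("503 Service Unavailable")


def secondary_up(prompt: str) -> str:
    return f"echo: {prompt}"


chain = [Provider("primary", primary_down), Provider("secondary", secondary_up)]
print(complete_with_failover(chain, "hello"))  # ('secondary', 'echo: hello')
```

A production gateway would also enforce per-call timeouts and track provider health so later requests can skip a provider already known to be down, which is one way to keep failover latency low; that bookkeeping is left out here.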
RAG pipelines
Retrieval-augmented generation grounded in your documents or database. Chunking strategy, embedding model selection, retrieval quality evaluation, re-ranking, and answer generation in one tested pipeline.
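A toy sketch of three of those stages: overlapping word-window chunking, similarity retrieval, and recall@k as a retrieval-quality metric. It assumes a bag-of-words vector as a stand-in for a real embedding model and omits re-ranking and answer generation; all function names are illustrative.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter[str]:
    """Stand-in for an embedding model: lower-cased alphanumeric term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter[str], b: Counter[str]) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[int]:
    """Return the indices of the k chunks most similar to the query, best first."""
    q = embed(query)
    scores = [cosine(q, embed(c)) for c in chunks]
    return sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:k]


def recall_at_k(retrieved: list[int], relevant: set[int]) -> float:
    """Fraction of the known-relevant chunks that appear in the retrieved set."""
    return len(relevant & set(retrieved)) / len(relevant) if relevant else 0.0


docs = [
    "Refunds are issued within 14 days of a return.",
    "Shipping to the EU takes 3 to 5 working days.",
    "Gift cards cannot be refunded or exchanged.",
]
top = retrieve("how many days for refunds", docs, k=1)
print(top, recall_at_k(top, {0}))  # [0] 1.0
```

In practice recall@k is averaged over a labelled set of query and relevant-chunk pairs, which turns chunk size and overlap into parameters you can tune against a number rather than guess.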
Eval harnesses
Evals-as-code with DuckDB persistence and a FastAPI/HTMX viewer. Run regression suites on prompts, compare model outputs across versions, and catch quality drops before they reach production.
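A minimal evals-as-code sketch: test cases live in code, each run's pass/fail results are persisted per version, and a query lists the cases that passed on a baseline but fail on a candidate. It uses the standard-library `sqlite3` in place of DuckDB so it runs without extra dependencies, and a substring check as a deliberately simple scorer. `EvalCase`, `run_evals` and `regressions` are hypothetical names, not ai-eval-runner's API.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    prompt: str
    expected: str  # substring a passing answer must contain (deliberately simple scorer)


def run_evals(con: sqlite3.Connection, version: str,
              model: Callable[[str], str], cases: list[EvalCase]) -> None:
    """Score every case against `model` and persist one pass/fail row per case for this version."""
    con.execute("CREATE TABLE IF NOT EXISTS results (version TEXT, case_id TEXT, passed INTEGER)")
    for case in cases:
        passed = int(case.expected.lower() in model(case.prompt).lower())
        con.execute("INSERT INTO results VALUES (?, ?, ?)", (version, case.case_id, passed))
    con.commit()


def regressions(con: sqlite3.Connection, baseline: str, candidate: str) -> list[str]:
    """Case IDs that pass on `baseline` but fail on `candidate`."""
    rows = con.execute(
        """
        SELECT b.case_id FROM results AS b
        JOIN results AS c ON b.case_id = c.case_id
        WHERE b.version = ? AND c.version = ? AND b.passed = 1 AND c.passed = 0
        ORDER BY b.case_id
        """,
        (baseline, candidate),
    ).fetchall()
    return [row[0] for row in rows]


cases = [
    EvalCase("capital-fr", "Capital of France?", "Paris"),
    EvalCase("capital-jp", "Capital of Japan?", "Tokyo"),
]


def model_v1(prompt: str) -> str:
    return "Paris" if "France" in prompt else "Tokyo"


def model_v2(prompt: str) -> str:  # a "new version" that regresses on one case
    return "Paris" if "France" in prompt else "Kyoto"


con = sqlite3.connect(":memory:")
run_evals(con, "v1", model_v1, cases)
run_evals(con, "v2", model_v2, cases)
print(regressions(con, "v1", "v2"))  # ['capital-jp']
```

DuckDB's Python client offers a similar `connect`/`execute` interface, so the same shape carries over; a real harness would swap the substring check for task-specific scorers.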
Voice agents
Real-time voice loops with WebRTC + mediasoup, Deepgram STT, an LLM turn, and TTS, targeting sub-second end-to-end latency. Based on voice-agent-starter.
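A sketch of per-stage latency accounting for one voice turn against a one-second budget. The STT, LLM and TTS stages are local stubs standing in for Deepgram, a model call and a TTS engine, and `run_turn` is a hypothetical name.

```python
import time
from typing import Callable

Stage = tuple[str, Callable[[object], object]]


def run_turn(stages: list[Stage], audio: object,
             budget_s: float = 1.0) -> tuple[object, dict[str, float], bool]:
    """Run one turn through sequential stages, timing each with a high-resolution clock.

    Returns (final output, seconds per stage, whether the total fits the budget).
    """
    timings: dict[str, float] = {}
    data = audio
    for name, fn in stages:
        start = time.perf_counter()
        data = fn(data)
        timings[name] = time.perf_counter() - start
    return data, timings, sum(timings.values()) <= budget_s


def fake_stt(audio: object) -> str:
    return "what time is it"  # stub transcript; a real stage would call an STT service


def fake_llm(text: object) -> str:
    return f"You asked: {text}"  # stub reply; a real stage would call a model


def fake_tts(text: object) -> bytes:
    return str(text).encode()  # stub: bytes stand in for synthesised audio


reply, timings, ok = run_turn([("stt", fake_stt), ("llm", fake_llm), ("tts", fake_tts)], b"\x00\x01")
for stage, seconds in timings.items():
    print(f"{stage}: {seconds * 1000:.3f} ms")
print("within budget:", ok)
```

Real voice pipelines stream partial transcripts and tokens so the stages overlap, which means the latency that matters is time to the first audible response rather than this sequential sum; the sketch only shows where per-stage numbers would be recorded.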
MCP servers
Production-grade Model Context Protocol servers built with FastAPI and Python 3.12: typed tool schemas, resource endpoints, auth, and full test coverage.
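A standard-library sketch of the typed-tool-schema idea: each tool's input schema is derived from its Python type hints, in the `name` / `description` / `inputSchema` shape MCP uses for tool listings, and calls are checked for missing or unknown arguments before dispatch. `ToolRegistry` is a hypothetical class, not the official MCP SDK or mcp-server-toolkit's API, and only annotated scalar parameters are handled.

```python
import inspect
from typing import Any, Callable, get_type_hints

# Scalar Python types mapped to JSON Schema type names; other types are not handled here.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., Any]] = {}

    def tool(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        """Decorator that registers a function as a tool under its own name."""
        self._tools[fn.__name__] = fn
        return fn

    def _params(self, fn: Callable[..., Any]) -> tuple[dict[str, type], list[str]]:
        hints = get_type_hints(fn)
        hints.pop("return", None)  # the return annotation is not an input
        signature = inspect.signature(fn).parameters
        required = [p for p in hints if signature[p].default is inspect.Parameter.empty]
        return hints, required

    def list_tools(self) -> list[dict[str, Any]]:
        tools = []
        for name, fn in self._tools.items():
            hints, required = self._params(fn)
            tools.append({
                "name": name,
                "description": inspect.getdoc(fn) or "",
                "inputSchema": {
                    "type": "object",
                    "properties": {p: {"type": JSON_TYPES[t]} for p, t in hints.items()},
                    "required": required,
                },
            })
        return tools

    def call(self, name: str, arguments: dict[str, Any]) -> Any:
        fn = self._tools[name]
        hints, required = self._params(fn)
        missing = [p for p in required if p not in arguments]
        unknown = [k for k in arguments if k not in hints]
        if missing or unknown:
            raise ValueError(f"missing={missing} unknown={unknown}")
        return fn(**arguments)


registry = ToolRegistry()


@registry.tool
def add(a: int, b: int = 0) -> int:
    """Add two integers."""
    return a + b


print(registry.list_tools()[0]["inputSchema"]["required"])  # ['a']
print(registry.call("add", {"a": 2, "b": 3}))  # 5
```

A production server would also validate argument types and values against the schema (for example with pydantic models) and sit behind the transport and auth layers.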
Agent orchestration
Durable multi-agent workflows with deterministic replay, built on Postgres + Drizzle + BullMQ. Handles long-running tasks and retries, with a visual Inspector UI.
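A minimal sketch of deterministic replay in a durable workflow: each step's result is journaled under a step key, so re-running the workflow after a crash returns recorded results for finished steps and executes only the rest. A plain dict stands in for the Postgres journal table, `Workflow`, `step` and `research_task` are hypothetical names, and queueing and retries are not shown.

```python
from typing import Any, Callable


class Workflow:
    """Journals step results by key so a re-run replays finished steps instead of re-executing them."""

    def __init__(self, journal: dict[str, Any]) -> None:
        self.journal = journal  # stands in for a durable table keyed by (run_id, step)
        self.executed: list[str] = []  # steps actually executed in this run

    def step(self, key: str, fn: Callable[[], Any]) -> Any:
        if key in self.journal:
            return self.journal[key]  # replay: return the recorded result
        result = fn()
        self.journal[key] = result  # record before moving on to the next step
        self.executed.append(key)
        return result


def research_task(wf: Workflow, topic: str, fail_at_summary: bool = False) -> str:
    notes = wf.step("search", lambda: f"notes on {topic}")
    if fail_at_summary:
        raise RuntimeError("worker crashed")  # simulated crash between steps
    return wf.step("summarise", lambda: notes.upper())


journal: dict[str, Any] = {}
try:
    research_task(Workflow(journal), "raft", fail_at_summary=True)
except RuntimeError:
    pass  # first run dies after the search step was journaled

wf = Workflow(journal)
result = research_task(wf, "raft")
print(result, wf.executed)  # NOTES ON RAFT ['summarise']
```

Replay only works if the workflow code takes the same path with the same step keys on every run, which is why side effects and non-deterministic calls such as clocks or random numbers belong inside steps.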
Recent work in this lane
Open-source repositories
- slipstream: token-efficient coding agent runner
- SarmaLink-AI: 14-engine LLM gateway
- forge-infer: minimal LLM inference server in Rust
- voice-agent-starter: sub-second WebRTC voice loop
- mcp-server-toolkit: production MCP server
- ai-eval-runner: evals as code with DuckDB
- agent-orchestrator: durable multi-agent workflows
Read the evidence
Open the public repositories, browse past work, then check the hiring page if a PAYE employment arrangement suits your team.