Hi, I'm Sarma
I design and ship LLM systems, systems software, and full-stack platforms, end to end. Eighteen public open-source repositories spanning a coding agent runner, a multi-provider gateway, a Rust inference server, storage engines, consensus, and WebAssembly sandboxes. Eighty-plus long-form engineering essays, and a portfolio built in the open. Open to permanent full-time roles in the UK.

Five years building things. Most of it now public.
I am Sarma, a software engineer based in the UK. Around five years of production experience across AI systems, agent orchestration, voice pipelines, evaluation harnesses, RAG, OCR, storage engines, consensus, WebAssembly sandboxes, and the infrastructure that holds all of it together.
What pulls me back to the desk every weekend is the same thing that pulled me into the industry in the first place: the quiet thrill of building something from scratch. A blank repository, a problem worth solving, a system that did not exist yesterday and ships today. The hours go past unnoticed. I have always built this way; for a long time the work sat in private repositories, the kind nobody else could see.
Earlier this year I made a deliberate decision: take the months and years of private work, sharpen the rough edges, document the trade-offs, write the architecture diagrams I always wished had been there, and put all of it into the open. Not as a portfolio prop, but as a contribution. The community gave me the tools, the libraries, the writeups, the Stack Overflow answers. Open-sourcing my work is how I pay that bill back.
The result is a constellation of nineteen MIT-licensed repositories under github.com/sarmakska with two flagships (a multi-provider AI gateway and a token-efficient coding agent with persistent memory), plus a durable multi-agent orchestrator, a real-time voice agent loop, evals-as-code, a Model Context Protocol starter, RAG, vision OCR, an OpenAI-compatible local LLM router, a webhook bridge, a Helm chart for shipping Next.js to Kubernetes, a Terraform stack, a multi-tenant SaaS starter, an LSM-tree storage engine, a Raft KV store, a WebAssembly sandbox, and a minimal LLM inference server in Rust. All shipped, all production-shaped, all free to fork.
Around the code I write long-form: eighty-plus essays on AI infrastructure, platform engineering, observability, and the indie-SaaS stack. The blog is a working notebook with real numbers, real charts, and real citations, not a content-marketing surface.
I am happy to contribute time to non-profit and open-source initiatives where I can be useful. Beyond that, what I am really looking for is the next chapter: a permanent, full-time role on a team that is shipping something serious.
Code that survives the six-month test
Code that works in a demo and dies in production is technical theatre. I optimise for the moment six months after launch, when somebody else has to read it, change it, and own the on-call page when something breaks.
Operating principles
Boring tech, surgical complexity. Postgres before Mongo. Server-rendered HTML before another SPA framework. Reach for the exotic only when the boring option genuinely runs out.
Open source by default. Eighteen public repositories under MIT, covering coding agents, gateways, inference, storage engines, consensus, and sandboxes. If a piece of work is generally useful, I publish it.
Numbers over narratives. Every blog post that claims a benchmark cites the source. Every chart marks whether the row is from a public benchmark or my own. A transparency footer on every post invites readers to flag bad numbers.
Ship the smallest thing that proves the next thing. Small commits, frequent deploys, observability before features. Big-bang releases are how products get cancelled mid-flight.
Defaults that respect the user. No silent analytics. No cookie banners I would hate. Real auth on day one, real row-level security on every table. The defaults you would want if you cloned my code at midnight.
Twelve lanes, nineteen repositories
Not abstract capability lists. Each card below maps to repositories you can clone, read, and run today.
Multi-provider AI gateways
SarmaLink-AI: multi-engine failover across fourteen providers, OpenAI-compatible proxy, persistent memory, image generation, live tools. Zero-cost frontier tier as the default route.
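To make the failover idea concrete, here is a minimal sketch of an ordered provider chain. It is an illustration only, not code from SarmaLink-AI: the function names, the two stand-in providers, and the error type are all hypothetical, and a real gateway would add timeouts, retries, and per-provider health tracking.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def complete_with_failover(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (provider_name, reply)."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real gateway would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for real provider clients.
def flaky_provider(prompt: str) -> str:
    raise TimeoutError("upstream timed out")


def backup_provider(prompt: str) -> str:
    return f"echo: {prompt}"


used, reply = complete_with_failover(
    "hello", [("primary", flaky_provider), ("backup", backup_provider)]
)
```

The caller only sees which provider answered; the failures along the way are collected so they can be logged if the whole chain goes down.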
Agent orchestration
Durable multi-agent workflows with deterministic replay, journaled state in Postgres, hard tool and token budgets, BullMQ queue, Inspector UI. Workflows that survive restarts and pass audit.
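The core of journaled replay can be sketched in a few lines. This is an assumption-laden illustration, not the orchestrator's code: the journal is an in-memory dict standing in for a Postgres table, and `Journal`, `run_step`, and the two steps are hypothetical names. The point it shows is that a completed step is never re-executed after a crash; its recorded result is returned instead.

```python
from typing import Any, Callable


class Journal:
    """In-memory stand-in for a journal table, keyed by step id."""

    def __init__(self) -> None:
        self.entries: dict[str, Any] = {}

    def run_step(self, step_id: str, fn: Callable[[], Any]) -> Any:
        # On replay, return the recorded result instead of re-executing.
        if step_id in self.entries:
            return self.entries[step_id]
        result = fn()
        self.entries[step_id] = result  # only successful steps are journaled
        return result


calls = {"fetch": 0, "summarise": 0}


def fetch() -> str:
    calls["fetch"] += 1
    return "raw document"


def summarise(text: str) -> str:
    calls["summarise"] += 1
    if calls["summarise"] == 1:
        raise RuntimeError("simulated crash")  # first attempt dies mid-workflow
    return text.upper()


def workflow(journal: Journal) -> str:
    doc = journal.run_step("fetch", fetch)
    return journal.run_step("summarise", lambda: summarise(doc))


journal = Journal()
try:
    workflow(journal)  # crashes after "fetch" has been journaled
except RuntimeError:
    pass
result = workflow(journal)  # restart: "fetch" is replayed, "summarise" reruns
```

Replay is only deterministic if every side effect goes through `run_step`; anything done outside it would run again on restart.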
Real-time voice loops
Sub-second WebRTC voice agent with mediasoup SFU, pluggable STT, LLM and TTS adapters, explicit turn-state machine, barge-in cancellation tested across the awkward cases.
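An explicit turn-state machine with barge-in might look like the sketch below. The state and event names are assumptions chosen for illustration, not the ones used in the voice agent repository, and the cancellation counter marks the spot where in-flight LLM and TTS work would actually be cancelled.

```python
from enum import Enum, auto


class Turn(Enum):
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()


class Event(Enum):
    USER_STARTED = auto()
    USER_STOPPED = auto()
    REPLY_READY = auto()
    PLAYBACK_DONE = auto()


TRANSITIONS = {
    (Turn.LISTENING, Event.USER_STOPPED): Turn.THINKING,
    (Turn.THINKING, Event.REPLY_READY): Turn.SPEAKING,
    (Turn.SPEAKING, Event.PLAYBACK_DONE): Turn.LISTENING,
    # Barge-in: the user talks over the agent, so hand the turn back.
    (Turn.THINKING, Event.USER_STARTED): Turn.LISTENING,
    (Turn.SPEAKING, Event.USER_STARTED): Turn.LISTENING,
}


class TurnMachine:
    def __init__(self) -> None:
        self.state = Turn.LISTENING
        self.cancellations = 0

    def handle(self, event: Event) -> Turn:
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is None:
            return self.state  # ignore events that are invalid in this state
        if event is Event.USER_STARTED:
            self.cancellations += 1  # where LLM/TTS work would be cancelled
        self.state = next_state
        return self.state
```

Keeping the transitions in one table means the awkward cases, such as a late `PLAYBACK_DONE` arriving after a barge-in, are handled by omission rather than by scattered conditionals.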
Evals as code
Datasets as files, scorers as functions, traces in DuckDB, viewer in HTMX. Six built-in scorers including LLM-as-judge. Regression mode fails CI when a release loses ground against the baseline.
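As a rough illustration of "scorers as functions" and regression mode, the sketch below scores two toy models against an inline dataset and compares the candidate with the baseline. Everything here is hypothetical (the dataset, the models, `mean_score`, `regression_check`); the real harness reads datasets from files and stores traces in DuckDB, which this sketch leaves out.

```python
from typing import Callable

Scorer = Callable[[str, str], float]
Model = Callable[[str], str]


def exact_match(output: str, expected: str) -> float:
    """A scorer is just a function from (output, expected) to a score."""
    return 1.0 if output.strip() == expected.strip() else 0.0


def mean_score(rows: list[dict[str, str]], model: Model, scorer: Scorer) -> float:
    scores = [scorer(model(row["input"]), row["expected"]) for row in rows]
    return sum(scores) / len(scores)


def regression_check(candidate: float, baseline: float, tolerance: float = 0.0) -> bool:
    """True when the candidate has not lost ground against the baseline."""
    return candidate >= baseline - tolerance


# Inline stand-in for a dataset file.
DATASET = [
    {"input": "2+2", "expected": "4"},
    {"input": "3+3", "expected": "6"},
]


def baseline_model(question: str) -> str:
    return {"2+2": "4", "3+3": "6"}[question]


def candidate_model(question: str) -> str:
    return {"2+2": "4", "3+3": "7"}[question]  # regresses on one row


baseline = mean_score(DATASET, baseline_model, exact_match)
candidate = mean_score(DATASET, candidate_model, exact_match)
passed = regression_check(candidate, baseline)
```

In a CI job, a false `passed` would translate into a non-zero exit code so the release is blocked.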
Production infrastructure
Helm charts for Next.js with the full observability stack (Prometheus, Grafana, Loki, Alertmanager) preconfigured. Terraform stack composing Vercel, Supabase, Cloudflare, DigitalOcean.
RAG and document intelligence
A clean end-to-end RAG starter you can clone, run, and ship in ten minutes. PDF chunking, embeddings, cosine retrieval, streaming answers with citations. Receipt OCR with Zod-validated JSON output.
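The retrieval step reduces to ranking chunks by cosine similarity against the query embedding. The sketch below uses hand-written three-dimensional vectors in place of real embeddings, and `cosine` and `top_k` are hypothetical helper names rather than the starter's API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: list[float],
    chunks: list[tuple[str, list[float]]],
    k: int = 2,
) -> list[tuple[str, float]]:
    """Return the k chunks most similar to the query, best first."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in chunks]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors standing in for real embeddings.
CHUNKS = [
    ("invoice total", [1.0, 0.0, 0.0]),
    ("delivery address", [0.0, 1.0, 0.0]),
    ("invoice due date", [0.9, 0.1, 0.0]),
]

results = top_k([1.0, 0.0, 0.0], CHUNKS, k=2)
```

The returned chunk texts are what would be passed to the model as context, and their identifiers are what the streamed answer would cite.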
Inference internals
forge-infer: a minimal LLM inference server in Rust with paged KV-cache, continuous batching, and speculative decoding. Built to understand the layer that the SDKs hide.
Storage and consensus
lsmdb: a log-structured merge-tree engine in Go with WAL, SSTables, bloom filters, MVCC snapshots. raftkv: a Raft key-value store with a fault-injection harness proving linearizability under partitions.
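For readers unfamiliar with why an LSM engine carries bloom filters: each SSTable can keep one so that a read skips tables that definitely do not hold the key. The sketch below is a generic bloom filter written for illustration, not lsmdb's implementation; the sizing and the SHA-256-based hashing are assumptions made for simplicity.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Answers "definitely absent" or "possibly present" for a key."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means the key was never added; True may be a false positive.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


table_filter = BloomFilter()
table_filter.add(b"user:1")
```

A false positive only costs one wasted SSTable lookup, whereas a false negative would lose data, which is why the structure guarantees the latter never happens.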
Sandboxing and isolation
sandboxd: a WebAssembly sandbox in Rust with a deny-by-default host ABI and strict CPU, wall-clock, and memory limits. Built for the moment somebody hands an agent untrusted code.
Observability that pays its rent
Structured logs, RED metrics, exemplar-linked traces, dashboards that diagnose rather than decorate. If a graph never gets opened during an incident, it does not deserve to exist.
Multi-tenant SaaS plumbing
shipyard: tenant isolation by row and by schema, RBAC, audit log, billing hooks, rate limits. The boring foundation under every B2B product, ready to clone.
Developer-grade tooling
slipstream: a token-efficient coding agent with persistent memory and a live local dashboard. The tool I use on myself before I ship it to anybody else.
I publish the plumbing, on purpose
Most production AI systems share the same shape underneath. A multi-provider gateway. A webhook receiver. A RAG starter. A voice loop. An evaluation harness. A Helm chart for the Next.js workload. A Terraform module for the cloud accounts. The domain logic on top is genuinely bespoke. The plumbing underneath is plumbing.
So I publish the plumbing. Eighteen MIT-licensed repositories and counting.
It is not a marketing move. It is how I keep my own work honest. If a piece of infrastructure has to survive being read by strangers, run on somebody else's laptop, and accept pull requests, it cannot hide behind the kind of cleverness that only the original author understands. Open source is the harshest possible code review. I would rather take that review every day than ship a private repo that looks fine because nobody ever opened it.
The substantive view on AI underneath all of it is this. Failover across providers is not optional. Single-provider risk in production is a pager waiting to happen. "Powered by GPT" is not a moat. The interesting work sits at the layer where your business logic meets the model: the layer of context, memory, guardrails, evaluation, and observability. That layer is bespoke. Everything underneath should be commodity, well-understood, and ideally already running in somebody else's production by the time it reaches yours.
Eighteen repositories is what that belief looks like written out in code.
Three things you get that you would not get from a CV alone
The receipts are public
You can read the code before the interview
Every repository is on GitHub under MIT. The architecture decisions, the test harnesses, the failure modes, the commit history. No portfolio screenshots, no NDA-shaped silhouettes. Read it before we speak.
The plumbing is already paid for
I have already solved the boring half
Gateways, voice loops, RAG, evals, Helm charts, Terraform modules, storage engines, consensus. The work I bring into a team starts above the layer most projects waste their first quarter rebuilding.
The reasoning is on the page
You see how I think, not just what I ship
Eighty-plus long-form essays on the studio site. Trade-offs argued in public, benchmarks cited, mistakes named. You hire the writer with the code, and the code with the writer.
How I work as an engineer
Eight principles I have arrived at after a few years of shipping production systems. Set once, lived consistently across every project on my desk.
Boring tech, surgical complexity
Postgres before Mongo. Server-rendered HTML before another SPA framework. Reach for the exotic only when the boring option genuinely runs out, then commit to it fully.
Open source by default
Generic infrastructure goes public, MIT-licensed. Eighteen repositories and counting. If a problem is general, the answer should be shared, not paywalled.
Numbers over narratives
Every benchmark cites the source. Every chart marks whether it is from a public bench or my own. The transparency footer on every post invites readers to flag bad numbers.
Ship small, deploy often
Tight commits, frequent deploys, observability before features. Big-bang releases are how products get cancelled mid-flight. Steady cadence beats heroics every quarter.
Safe defaults, day one
Real auth, real row-level security on every table, real audit logs, real observability. The defaults you would actually want if you cloned my code at midnight.
Read the source first
Before reaching for the SDK, read what the SDK calls. Most "magic" is a thin wrapper around a documented endpoint. Knowing the layer below is the unfair advantage.
Write the diagram first
An ADR, a Mermaid sketch, a one-pager of trade-offs. If I cannot explain the system in a paragraph, I do not understand it well enough to ship it.
Senior IC, end to end
I do not hand off to juniors. I do not hide behind layers. From schema to deploy script, the person on the email is the person at the keyboard.
Let's build something good.
You've got a problem. I solve problems with software for a living.
The fastest way to find out if we can work together is to talk.
Stack I build with