Systems I have shipped

What I build, when I sit down at the keyboard

The categories of system below are not a menu of services. They are the shapes of software I have already shipped in the open-source repositories under my name, and the shapes I am happy to ship again from inside a team that hires me as an employee.

  • 18 open-source repositories
  • 12+ system categories shipped
  • MIT licensed end to end
  • UK based, PAYE only

Twelve categories

Twelve shapes of system

Each card is a category of system I have already built and pushed to a public repository. The evidence beneath each card points to the exact code, the whitepaper, or the long-form essay that explains how it works.

Multi-provider LLM gateways

OpenAI-compatible HTTP surface in front of a ladder of inference engines, with per-engine circuit breakers, dynamic model discovery, and a 24h cache for live model lists. The kind of layer that turns a single provider outage into a non-event.

  • SarmaLink-AI ships a 14-engine failover ladder and survived a real provider outage in production
  • Intent-based plugin auto-routing across ten open-source plugins behind a single env var
  • HMAC-SHA256 verified webhook persistence with idempotent upserts
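
A minimal sketch of the failover idea, written in Python for brevity and not taken from SarmaLink-AI. The names Breaker and Ladder, the three-failure threshold, and the 30-second cooldown are assumptions for illustration; each engine is modelled as a plain callable.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Breaker:
    """Per-engine circuit breaker: opens after `threshold` consecutive failures."""
    threshold: int = 3          # assumed value, for illustration
    cooldown_s: float = 30.0    # assumed value, for illustration
    failures: int = 0
    opened_at: Optional[float] = None

    def allows(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        # Once the cooldown has passed, let trial requests through again.
        return now - self.opened_at >= self.cooldown_s

    def record(self, ok: bool, now: float) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now


class Ladder:
    """Try engines in priority order, skipping any whose breaker is open."""

    def __init__(self, engines: list[tuple[str, Callable[[str], str]]],
                 clock: Callable[[], float] = time.monotonic):
        self.engines = engines
        self.breakers = {name: Breaker() for name, _ in engines}
        self.clock = clock

    def complete(self, prompt: str) -> tuple[str, str]:
        for name, call in self.engines:
            breaker = self.breakers[name]
            if not breaker.allows(self.clock()):
                continue
            try:
                reply = call(prompt)
            except Exception:
                breaker.record(False, self.clock())
                continue
            breaker.record(True, self.clock())
            return name, reply
        raise RuntimeError("every engine failed or is cooling down")
```

With this shape, an outage on the first engine costs a few failed attempts before its breaker opens; after that, requests go straight to the next rung until the cooldown expires.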

Durable agent orchestration

A workflow engine on Postgres and BullMQ with idempotent step handlers, a deterministic replay log, per-step tool budgets, and a tRPC Inspector UI for live graph state. The pattern that turns ad-hoc agent scripts into systems you can debug a week later.

  • agent-orchestrator: TypeScript, Postgres journal, BullMQ queue, replay from any offset
  • slipstream: the same idea applied to local coding agents with persistent project memory
  • Per-step tool budget enforcement so a runaway agent cannot drain the wallet
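
The journal-and-replay pattern can be sketched in a few lines. This is an illustration under stated assumptions, not the agent-orchestrator code: it is in Python rather than TypeScript, sqlite3 stands in for Postgres, there is no queue, and Journal, ToolBudget, and BudgetExceeded are hypothetical names.

```python
import json
import sqlite3
from typing import Callable


class BudgetExceeded(Exception):
    pass


class ToolBudget:
    """Caps how many tool calls one step may make."""

    def __init__(self, max_calls: int):
        self.max_calls, self.used = max_calls, 0

    def spend(self) -> None:
        if self.used >= self.max_calls:
            raise BudgetExceeded(f"step used all {self.max_calls} tool calls")
        self.used += 1


class Journal:
    """Step journal keyed on (run_id, step); sqlite3 stands in for Postgres."""

    def __init__(self, db: sqlite3.Connection):
        self.db = db
        db.execute(
            "CREATE TABLE IF NOT EXISTS steps ("
            "run_id TEXT, step TEXT, output TEXT, PRIMARY KEY (run_id, step))"
        )

    def run_step(self, run_id: str, step: str, fn: Callable[[], object]) -> object:
        row = self.db.execute(
            "SELECT output FROM steps WHERE run_id = ? AND step = ?",
            (run_id, step),
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # replay: reuse the recorded result
        result = fn()
        with self.db:  # commit the journal entry in one transaction
            self.db.execute(
                "INSERT INTO steps (run_id, step, output) VALUES (?, ?, ?)",
                (run_id, step, json.dumps(result)),
            )
        return result
```

Re-running a workflow after a crash then skips every step that already has a journal row, which is what makes the handlers safe to retry.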

Low-level inference servers

Paged KV-cache, continuous batching, speculative decoding, OpenAI-shaped HTTP. Written small enough that the request path can be read in a single sitting and every allocation is visible.

  • forge-infer in Rust: paged attention KV cache and continuous batching, no vLLM
  • OpenAI-compatible /v1/chat/completions surface
  • Designed to teach the internals as much as to serve traffic
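
The bookkeeping behind a paged KV cache is small enough to show directly. The sketch below is Python, not the Rust in forge-infer, and it tracks only which physical block holds which token; PagedKVAllocator and its method names are assumptions, and no tensors are involved.

```python
class PagedKVAllocator:
    """Bookkeeping only: maps each sequence to fixed-size physical blocks."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free = list(range(num_blocks))      # physical block ids not in use
        self.tables: dict[str, list[int]] = {}   # sequence id -> block table
        self.lengths: dict[str, int] = {}        # sequence id -> tokens stored

    def append_token(self, seq: str) -> tuple[int, int]:
        """Reserve a slot for one token; returns (physical block, offset)."""
        n = self.lengths.get(seq, 0)
        table = self.tables.setdefault(seq, [])
        if n % self.block_size == 0:             # last block is full, or none yet
            if not self.free:
                raise MemoryError("no free KV blocks; caller should preempt")
            table.append(self.free.pop())
        self.lengths[seq] = n + 1
        return table[n // self.block_size], n % self.block_size

    def release(self, seq: str) -> None:
        """Return a finished sequence's blocks to the pool."""
        self.free.extend(self.tables.pop(seq, []))
        self.lengths.pop(seq, None)
```

Because a sequence only ever claims one block at a time, memory wasted per sequence is bounded by one partially filled block, which is the property continuous batching relies on.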

Real-time voice loops

Sub-second round-trip on WebRTC with pluggable STT, LLM, and TTS adapters, correct barge-in, and per-stage telemetry so the latency budget is visible from the first call. The primitive behind every serious voice-agent product.

  • voice-agent-starter: mediasoup + Fastify + Next.js client in a pnpm workspace
  • Sentence-by-sentence streaming TTS so the first word lands before the model is finished
  • Site chatbot Martha runs on the same primitives
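
Sentence-by-sentence streaming comes down to cutting the model's token stream at sentence boundaries and handing each piece to TTS as soon as it is complete. A rough sketch, assuming a deliberately naive boundary rule; the function name and the regular expression are illustrative and not taken from voice-agent-starter.

```python
import re
from typing import Iterable, Iterator

# A sentence ends at ., ! or ? followed by whitespace. Deliberately naive:
# abbreviations such as "e.g. " will split early.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each sentence as soon as the token stream completes it."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _BOUNDARY.split(buffer)
        # Everything but the last part is a finished sentence.
        for done in parts[:-1]:
            yield done
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left at end of stream
```

A sentence is released only once the whitespace after its terminator arrives, so the first audio can start while the model is still generating the rest of the reply.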

Retrieval over private documents

Chunked ingestion, vector search, and citation spans that survive into the generated answer so the reader can audit the source. Built for the case where a wrong answer with a confident tone is worse than no answer.

  • rag-over-pdf: a minimal retrieval stack that keeps spans through generation
  • Citation-first answer shape, not citation-last
  • Cached embeddings and source hashes so reindexing is cheap
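
Two of the ideas above fit in a short sketch: chunks that remember their source offsets, and an embedding cache keyed by a content hash. This is an illustration, not the rag-over-pdf code; Chunk, EmbeddingCache, cite, and the citation format are assumptions, and the embedding function is supplied by the caller.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Chunk:
    doc: str
    start: int   # character offsets into the source text
    end: int
    text: str


def chunk(doc: str, text: str, size: int = 200) -> list[Chunk]:
    """Fixed-size chunks that remember where they came from."""
    return [Chunk(doc, i, min(i + size, len(text)), text[i:i + size])
            for i in range(0, len(text), size)]


class EmbeddingCache:
    """Embeds each distinct chunk text once, keyed by its SHA-256."""

    def __init__(self, embed: Callable[[str], list[float]]):
        self.embed = embed
        self.store: dict[str, list[float]] = {}

    def get(self, c: Chunk) -> list[float]:
        key = hashlib.sha256(c.text.encode("utf-8")).hexdigest()
        if key not in self.store:
            self.store[key] = self.embed(c.text)
        return self.store[key]


def cite(c: Chunk) -> str:
    """Citation marker that lets a reader locate the exact source span."""
    return f"[{c.doc}:{c.start}-{c.end}]"
```

Reindexing an unchanged document hits the cache for every chunk, and any answer built from a chunk can carry its marker through to the reader.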

Evals as code

A Python CLI plus a FastAPI + HTMX viewer that turns model evaluation into something you run on every push, store in DuckDB, and trend over time. The evals live next to the prompts in the same repository, not in a closed dashboard.

  • ai-eval-runner: Python 3.12, uv, Typer, DuckDB result store
  • FastAPI + HTMX viewer for trend lines and regression catches
  • Per-suite cost and latency budgets enforced in CI
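
Budget enforcement in CI can be as simple as a function that returns a list of violations. The sketch below is an assumption about the shape, not the ai-eval-runner implementation: CaseResult, Budget, and the nearest-rank p95 are illustrative choices, and the DuckDB store is left out.

```python
import math
from dataclasses import dataclass


@dataclass
class CaseResult:
    name: str
    passed: bool
    cost_usd: float
    latency_ms: float


@dataclass
class Budget:
    max_cost_usd: float
    max_p95_latency_ms: float
    min_pass_rate: float


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def violations(results: list[CaseResult], budget: Budget) -> list[str]:
    """One message per broken budget; an empty list means the suite passes."""
    if not results:
        return ["suite produced no results"]
    found = []
    cost = sum(r.cost_usd for r in results)
    if cost > budget.max_cost_usd:
        found.append(f"cost {cost:.4f} USD exceeds {budget.max_cost_usd:.4f}")
    latency = p95([r.latency_ms for r in results])
    if latency > budget.max_p95_latency_ms:
        found.append(f"p95 latency {latency:.0f} ms exceeds "
                     f"{budget.max_p95_latency_ms:.0f}")
    pass_rate = sum(r.passed for r in results) / len(results)
    if pass_rate < budget.min_pass_rate:
        found.append(f"pass rate {pass_rate:.2f} below {budget.min_pass_rate:.2f}")
    return found
```

A CI job would print the messages and exit non-zero whenever the list is non-empty, so a cost or latency regression fails the push the same way a failing test does.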

Multi-tenant SaaS scaffolds

Tenant isolation, RBAC, billing, audit log, rate limits, and a magic-link auth surface, wired together so a fresh studio can be online in a single command. The plumbing every B2B product reinvents in its first month.

  • shipyard: TypeScript, Next.js 16, Supabase, Tailwind v4
  • Stripe billing and role-based admin plus tenant surfaces
  • Reproducible end to end from terraform-stack
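
Per-tenant rate limiting is one piece of that plumbing that is easy to show in isolation. The sketch is a generic token bucket keyed by tenant, in Python rather than shipyard's TypeScript; TenantRateLimiter and its parameters are hypothetical, and the state lives in memory rather than in a shared store.

```python
import time
from typing import Callable


class TenantRateLimiter:
    """Token bucket per tenant, so one noisy tenant cannot starve the others."""

    def __init__(self, rate_per_s: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.burst, self.clock = rate_per_s, float(burst), clock
        # tenant id -> (tokens remaining, time of last refill)
        self.buckets: dict[str, tuple[float, float]] = {}

    def allow(self, tenant: str) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(tenant, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1:
            self.buckets[tenant] = (tokens, now)
            return False
        self.buckets[tenant] = (tokens - 1, now)
        return True
```

Keying the bucket on the tenant id rather than the user or IP is the design choice that matters here: the limit follows the billing boundary.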

Storage and consensus internals

A log-structured merge-tree storage engine in Go, with WAL, SSTables, bloom filters, MVCC snapshots, and a Raft KV store proving linearizability under partitions. Written to learn the layer that the SDK is hiding from you.

  • lsmdb: Go, compaction, bloom filters, MVCC snapshots, wire test against RocksDB
  • raftkv: hand-rolled leader election, log replication, snapshots, membership
  • Fault-injection harness that randomises partitions, slow links, and dropped messages
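
Of the pieces listed above, the bloom filter is the quickest to sketch. This version is Python rather than lsmdb's Go, and the double-hashing scheme over a SHA-256 digest is my assumption for illustration, not a description of what lsmdb does.

```python
import hashlib


class BloomFilter:
    """Answers "definitely absent" or "possibly present" for a key."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.m, self.k = num_bits, num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> list[int]:
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so never zero
        # Double hashing: derive k bit positions from two base hashes.
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))
```

In an LSM tree, one such filter per SSTable lets a read skip any table that definitely does not hold the key, at the cost of an occasional false positive that falls through to a real lookup.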

Sandboxes and capability boundaries

A WebAssembly sandbox with a deny-by-default host ABI, capability handles passed in from the host, and fuel metering so a runaway guest cannot tie up a worker. The shape every plug-in surface needs and almost no SaaS gets right.

  • sandboxd: Rust, deny-by-default host ABI, capability passing, fuel metering
  • Strict CPU, wall-clock, and memory limits per guest
  • Designed to be embedded into agent runners and SaaS plug-in surfaces
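
The two boundaries, deny-by-default capabilities and fuel metering, can be shown with a toy model. This is not WebAssembly and not sandboxd: the guest is a list of host-call requests, each call costs one unit of fuel, and Sandbox, OutOfFuel, and CapabilityDenied are hypothetical names.

```python
from typing import Callable


class OutOfFuel(Exception):
    pass


class CapabilityDenied(Exception):
    pass


class Sandbox:
    """Toy model: the guest can only call what the host explicitly passed in."""

    def __init__(self, fuel: int, capabilities: dict[str, Callable[[str], str]]):
        self.fuel = fuel
        self.capabilities = dict(capabilities)  # nothing is granted implicitly

    def run(self, program: list[tuple[str, str]]) -> list[str]:
        out = []
        for name, arg in program:
            if self.fuel <= 0:
                raise OutOfFuel("guest exhausted its fuel allowance")
            self.fuel -= 1  # every host call is metered, even a denied one
            handler = self.capabilities.get(name)
            if handler is None:
                raise CapabilityDenied(name)
            out.append(handler(arg))
        return out
```

The host decides up front which handlers exist for a given guest, so adding a plug-in never widens what any other plug-in can reach.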

Platform engineering, end to end

Reproducible Vercel, Supabase, Cloudflare, and DigitalOcean stacks from a single Terraform repository, plus a Helm chart for the Kubernetes side with observability bootstrapped on day one. The kind of platform a team should not have to assemble by hand.

  • terraform-stack: one repo for Vercel, Supabase, Cloudflare, DigitalOcean modules
  • k8s-ops-toolkit: ingress-nginx, cert-manager, kube-prometheus-stack, Loki
  • Runs my own production today, not a theoretical sketch

MCP servers and tool surfaces

Model Context Protocol servers that expose typed tools over stdio JSON-RPC, with a small audited surface and tests that drive the request handler directly. The bridge between an agent and the rest of your platform.

  • mcp-server-toolkit: production MCP server starter in Python and FastAPI
  • slipstream ships a hand-rolled MCP server with nine sp_* tools over stdio
  • Zero runtime deps on the MCP path so the server can be audited in one file
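
A stripped-down sketch of the request handler shape, using only the standard library. It covers the tools/list and tools/call methods over newline-delimited JSON-RPC and deliberately omits the initialize handshake and notification handling that a real MCP server needs; the echo tool and the function names are assumptions, not code from mcp-server-toolkit or slipstream.

```python
import json
import sys

# Hypothetical single-tool registry for illustration.
TOOLS = {
    "echo": {
        "description": "Return the input text unchanged.",
        "inputSchema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
        "handler": lambda args: args["text"],
    }
}


def _error(rid, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": rid,
            "error": {"code": code, "message": message}}


def handle(request: dict) -> dict:
    """Pure function from request to response, so tests can call it directly."""
    rid, method = request.get("id"), request.get("method")
    if method == "tools/list":
        tools = [{"name": name, "description": t["description"],
                  "inputSchema": t["inputSchema"]} for name, t in TOOLS.items()]
        return {"jsonrpc": "2.0", "id": rid, "result": {"tools": tools}}
    if method == "tools/call":
        params = request.get("params", {})
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return _error(rid, -32602, "unknown tool")
        text = tool["handler"](params.get("arguments", {}))
        return {"jsonrpc": "2.0", "id": rid,
                "result": {"content": [{"type": "text", "text": text}]}}
    return _error(rid, -32601, "method not found")


def serve(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read one JSON-RPC message per line and write one response per line."""
    for line in stdin:
        if line.strip():
            stdout.write(json.dumps(handle(json.loads(line))) + "\n")
            stdout.flush()
```

Keeping handle free of I/O is what lets the tests drive it with plain dictionaries instead of spawning a process.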

Operational glue and webhooks

Webhook receivers, transactional email pipelines, and OCR-to-JSON ingestors that turn the messy edge of a product into structured rows in the database. Small services, written to be boring and rerunnable.

  • webhook-to-email: a Resend-backed receiver for payment, order, and incident events
  • receipt-scanner: vision OCR for receipts and shipping documents into typed JSON
  • Idempotent by construction, replayable from the log
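
"Idempotent by construction" is easiest to see in a receiver that verifies an HMAC-SHA256 signature and then upserts on the event id. The sketch is a minimal illustration, not the webhook-to-email code: sqlite3 stands in for the real database, the signature is assumed to be a hex digest of the raw body, and the function and table names are hypothetical.

```python
import hashlib
import hmac
import sqlite3


def init(db: sqlite3.Connection) -> None:
    db.execute("CREATE TABLE IF NOT EXISTS events (id TEXT PRIMARY KEY, body TEXT)")


def verify(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Constant-time check of an HMAC-SHA256 hex signature over the raw body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def receive(db: sqlite3.Connection, secret: bytes, event_id: str,
            body: bytes, signature_hex: str) -> bool:
    """Store the event once; a redelivery overwrites the same row."""
    if not verify(secret, body, signature_hex):
        return False
    with db:  # single transaction per delivery
        db.execute(
            "INSERT INTO events (id, body) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET body = excluded.body",
            (event_id, body.decode("utf-8")),
        )
    return True
```

Because the primary key is the provider's event id, a retry or a replay from the log lands on the same row instead of creating a duplicate.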

If your company is hiring

These categories travel well into a full-time seat

I am open to permanent, full-time PAYE employment in the United Kingdom only, and that constraint holds until at least February 2030. Employee roles only. If a category above maps to something your team is building, the hire-me page has the role shapes, the capability matrix, and the email that reaches me directly.

Read the hire-me page