Whitepaper · slipstream

slipstream

A token-efficient coding-agent runner with persistent per-project memory, lossless compaction, a guardrailed skill library, and a live local agent dashboard.

MIT Licensed · Open Source · Self-Hostable · Local-only · MCP over stdio · No telemetry
9 MCP tools
5 pillars
88 tests / 11 files
59 guardrailed skills

v1.0 · May 2026 · Sai Sarma · Sarma Linux

Abstract

slipstream is an MIT-licensed plugin for the popular coding-agent IDE in VS Code. It replaces whole-file reads with a bundled MCP server that exposes precise tools (sp_map, sp_symbol, sp_lines, sp_search). It persists durable facts in a file-based memory store, writes a structured digest before the context window is compacted, and reloads only the signal-ranked, relevant subset in the next session. It also runs a local server-sent-events dashboard that observes the live session and replays finished ones. The MCP tools, which are the token-saving core, work in any MCP-capable editor. Everything else is one plugin, one helper and one append-only event log per session, all on the developer's own machine, with no telemetry.

01 Executive Summary

A long agent session usually dies one of two ways. Either it reads whole files until the context window is full and starts forgetting the start of its own plan, or it does good work and then the session ends and every decision evaporates. slipstream is built around two enforced habits that fix both: read a compact project map and pull a single slice instead of opening whole files, and write durable facts to a structured store that survives a compaction.

The five pillars are token efficiency, persistent memory with lossless compaction, a guardrailed skill library, a mind map and statusline in chat, and a live local agent dashboard. The dashboard is the headline: a 127.0.0.1 SSE server with four panels (agents, activity, token budget, plan and mind map) that tails an append-only JSONL event log produced by lifecycle hooks. Because the log is the source of truth, replay is the same fold over a finished log.

On a real example from this repository, sp_symbol pulls 1,381 bytes for the one declaration the agent needs, where the whole-file read pulls 4,841 bytes: a 71% reduction for the work that prompted the read. Orienting in the whole src/ tree by reading every file costs roughly 40,597 tokens; reading the sp_map index instead costs roughly 2,173, which is 5.4% of the naive cost. Numbers are produced by the helper using slipstream's own conservative 3.6 bytes-per-token estimate.

02 Background and Motivation

I ship small production sites on Cloudflare, Supabase, Vercel and Resend. I lean on the coding agent in my IDE to do the boring parts: scaffold a route, write a migration, draft a test, ship a fix. The default loop works for short tasks. It strains on long ones.

The pattern that kept biting me had two halves. First, the read shape: the agent would open a 1,200-line component to change one prop. The budget would bleed for an hour, and three prompts later the convention we agreed on at the top had paged out. Second, the compaction: the IDE summarises and trims the conversation to keep within the window, and the moment that happened the durable facts went with the noise. I tried writing everything to a notes file by hand. The notes file rotted within a day, because writing it was a manual chore the agent did not own.

What I wanted was not better discipline. It was a tool that enforced two habits without my having to remember them, and that gave me a window into the session so I could trust it long enough to walk away from the keyboard.

03 The Problem

Concretely, the default coding-agent loop has four failure modes I wanted to eliminate.

  • Whole-file reads dominate the budget. The default Read tool returns the entire file. The agent rarely needs all of it. Most reads cost an order of magnitude more than the work they enable.
  • Compaction is lossy. When the IDE compacts, structured facts (the open task, the schema you agreed on, the file you decided was the right one to edit) blur into prose. Resumed sessions feel like talking to a stranger holding your notes.
  • Memory is either nothing or everything. A notes file is nothing until you write to it. A naive memory layer that reloads the full store on every session gets more expensive the more useful the store becomes.
  • You cannot watch the agent work. Without an outside view, the only signal you have for "is this going well" is whether the chat looks confident. Confidence is not progress.

slipstream attacks all four directly. Precise tools replace whole-file reads. A PreCompact hook writes a structured digest before compaction. Recall is signal-ranked, bounded by a token budget, and loads nothing without a signal. A live local dashboard makes every agent step visible.

04 Goals and Non-goals

The goals are narrow and load-bearing.

  • Cut tokens per read by an order of magnitude on the common case. Without making the agent dumber.
  • Preserve durable facts across compactions and across sessions. Reviewable, diffable, drift-proof.
  • Show the operator what is happening. Live, locally, with replay.
  • Stay local-only. No telemetry, no accounts, no cloud, no inbound network. If it phones home, it is not slipstream.
  • Be auditable in one sitting. The helper compiles to a single dist tree, the MCP server is one file, the dashboard server is one file.

The non-goals are equally deliberate.

  • Universal scaffolding. The skill library targets the stack I actually ship on (Cloudflare, Supabase, Vercel, Resend). It is not trying to be a framework-of-frameworks.
  • Replacing the agent loop. slipstream observes and supports the agent. It does not drive it; it cannot pause a tool call or steer a subagent.
  • Precise token metering. The helper cannot read the IDE's internal counter; the budget is a conservative byte-count estimate, guidance not gospel.
  • A hosted product. No sign-up, no SaaS layer, no managed dashboard. The whole point of being local is being yours.

05 Architecture

The repository is both the published plugin and the helper the plugin calls. The plugin surface (manifest, slash commands, hooks, skills, subagents, output style, statusline) is what the IDE loads. The helper (compiled TypeScript under dist/) is what the hooks and commands invoke for the heavy lifting. The bundled MCP server is part of the helper.

There are eleven helper modules:

  • src/mcp: the stdio JSON-RPC server and the nine sp_ tools.
  • src/map: the scan, generate and retrieve path.
  • src/memory: the file-based store, signal-ranked recall and the PreCompact digest.
  • src/context: the byte-count budget and the read guard.
  • src/dashboard, first half: the mind map and artifact.
  • src/dashboard, second half: the live event log, server, state-fold and launcher.
  • src/engine: the skill contract and loader.
  • src/statusline: the pure statusline formatter.
  • src/doctor: the install check.
  • src/plugin-validate: the manifest validator.
  • src/cli: the dispatcher.

All data the helper writes lives under .claude/slipstream/ in the project: map.md and map.json; one Markdown fact per file under memory/ with a regenerated MEMORY.md index; one append-only <session>.jsonl event log under dashboard/; a server.json recording the running server; and an optional dashboard.json for settings. The whole tree is git-ignored by default and is intended to be local per developer unless you choose to commit the memory.

The data path runs in one direction: hooks append events to the JSONL log, and the dashboard server tails the log and pushes folded state to the browser over SSE. Browser-to-server traffic is minimal (a session-id query string). Replay is the same fold applied to a finished log file.
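
The "same fold" idea can be sketched as a pure reducer over log lines. This is an illustration under stated assumptions, not slipstream's actual code: the event fields (seq, kind, agent, detail, bytes) and the names applyEvent and replay are hypothetical. It also shows a subagent being inferred from the first event that names it and flipped to stopped later.

```typescript
// Hypothetical event shape: field names are assumptions for illustration,
// not slipstream's real schema.
type LogEvent = {
  seq: number; // monotonically increasing sequence number
  kind: "tool" | "subagent_stop" | "budget";
  agent: string; // "main" or a subagent name
  detail?: string; // e.g. which tool ran
  bytes?: number; // bytes added to context, for "budget" events
};

type AgentView = { status: "running" | "stopped"; lastActivity?: string };

type DashboardState = {
  lastSeq: number;
  agents: Record<string, AgentView>;
  bytesInContext: number;
};

const emptyState = (): DashboardState => ({ lastSeq: 0, agents: {}, bytesInContext: 0 });

// Pure reducer: one event in, next state out. No I/O, so live tailing and
// replay can share it.
function applyEvent(state: DashboardState, ev: LogEvent): DashboardState {
  // An agent first appears when an event names it (no explicit start event).
  const prev: AgentView = state.agents[ev.agent] ?? { status: "running" };
  const next: AgentView =
    ev.kind === "subagent_stop"
      ? { ...prev, status: "stopped" }
      : ev.kind === "tool"
        ? { ...prev, lastActivity: ev.detail }
        : prev;
  const addedBytes = ev.kind === "budget" ? (ev.bytes ?? 0) : 0;
  return {
    lastSeq: ev.seq,
    agents: { ...state.agents, [ev.agent]: next },
    bytesInContext: state.bytesInContext + addedBytes,
  };
}

// Replay: parse each non-empty JSONL line and fold from the empty state.
function replay(jsonl: string): DashboardState {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as LogEvent)
    .reduce(applyEvent, emptyState());
}
```

A live server would call applyEvent once per newly tailed line and push the result over SSE; replay calls it over a finished file. Both paths produce the same state for the same lines, which is the property the design relies on.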

06 Key Technical Decisions

Six choices carry most of the design weight.

Hand-rolled MCP server, not the SDK. The slice of the protocol in play (initialize, tools/list, tools/call) is small and stable. A plugin that bundles a server should add as little as possible to a user's install. I implement the newline-delimited JSON-RPC framing in one file. The benefit is zero runtime dependencies on the MCP path and a server I can audit in a sitting. The request handler is a pure exported function, so tests drive it without a process; a separate suite spawns the real binary over stdio.
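
To make the size of that protocol slice concrete, here is a minimal sketch of a pure handler for initialize, tools/list and tools/call with newline-delimited framing. It is not slipstream's source. The tool table holds a single stubbed sp_lines entry, the server name and version are placeholders, and the protocolVersion string is an assumed value; a real server would answer with whatever version it negotiates.

```typescript
// Minimal JSON-RPC 2.0 types for the three methods named above.
type RpcRequest = { jsonrpc: "2.0"; id?: number | string; method: string; params?: unknown };
type RpcResponse =
  | { jsonrpc: "2.0"; id: number | string | null; result: unknown }
  | { jsonrpc: "2.0"; id: number | string | null; error: { code: number; message: string } };

// Hypothetical tool table: one stub stands in for the nine sp_ tools.
type ToolDef = { description: string; run: (args: Record<string, unknown>) => string };
const demoTools = new Map<string, ToolDef>([
  ["sp_lines", {
    description: "Return a line range from a file (stubbed here).",
    run: (args) => `lines ${String(args.start)}-${String(args.end)} of ${String(args.path)}`,
  }],
]);

// Pure handler: request in, response out (null for notifications).
function handleRequest(req: RpcRequest): RpcResponse | null {
  if (req.id === undefined) return null; // notifications get no reply
  const id = req.id;
  switch (req.method) {
    case "initialize":
      return { jsonrpc: "2.0", id, result: {
        protocolVersion: "2024-11-05", // assumed value for the sketch
        capabilities: { tools: {} },
        serverInfo: { name: "demo-server", version: "0.0.0" },
      } };
    case "tools/list":
      return { jsonrpc: "2.0", id, result: {
        tools: Array.from(demoTools.entries()).map(([toolName, t]) => ({
          name: toolName, description: t.description, inputSchema: { type: "object" },
        })),
      } };
    case "tools/call": {
      const p = (req.params ?? {}) as { name?: string; arguments?: Record<string, unknown> };
      const tool = demoTools.get(p.name ?? "");
      if (!tool) return { jsonrpc: "2.0", id, error: { code: -32602, message: "unknown tool" } };
      const text = tool.run(p.arguments ?? {});
      return { jsonrpc: "2.0", id, result: { content: [{ type: "text", text }] } };
    }
    default:
      return { jsonrpc: "2.0", id, error: { code: -32601, message: "method not found" } };
  }
}

// Newline-delimited framing: one JSON document per line, in both directions.
function handleLine(line: string): string | null {
  let req: RpcRequest;
  try {
    req = JSON.parse(line) as RpcRequest;
  } catch {
    return JSON.stringify({ jsonrpc: "2.0", id: null, error: { code: -32700, message: "parse error" } });
  }
  const res = handleRequest(req);
  return res === null ? null : JSON.stringify(res);
}
```

Because handleRequest touches no streams, a unit test can call it directly, while a stdio wrapper only needs to split stdin on newlines and write each non-null result followed by a newline.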

Signal-ranked recall, not load-everything. The obvious memory design reloads the whole store on every session. I rejected it because it gets more expensive the more useful the store becomes. Recall instead ranks against a cheap task signal (git branch, files changed in the working tree, last prompt) and reloads only the subset that fits a token budget (~1,200 tokens). With no signal it loads nothing and defers to the index, because loading arbitrary facts with no signal is the very thing I was trying to avoid.
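
One way to picture signal-ranked recall is a score-then-pack loop. The sketch below is an assumption-laden illustration, not the helper's algorithm: the Fact and TaskSignal shapes, the tag-overlap score and the greedy packing are all hypothetical. Only the 3.6 bytes-per-token estimate, the roughly 1,200-token budget and the "no signal loads nothing" rule come from the text.

```typescript
// Hypothetical shapes; slipstream's real fact and signal formats may differ.
type Fact = { id: string; text: string; tags: string[] };
type TaskSignal = { branch?: string; changedFiles: string[]; lastPrompt?: string };

const BYTES_PER_TOKEN = 3.6; // the conservative estimate quoted in the text

function estimateTokens(text: string): number {
  return Math.ceil(new TextEncoder().encode(text).length / BYTES_PER_TOKEN);
}

// Turn the signal into a set of lowercase terms (short fragments are dropped).
function signalTerms(signal: TaskSignal): Set<string> {
  const raw = [signal.branch ?? "", ...signal.changedFiles, signal.lastPrompt ?? ""].join(" ");
  return new Set(raw.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 2));
}

function recall(facts: Fact[], signal: TaskSignal, budgetTokens = 1200): Fact[] {
  const terms = signalTerms(signal);
  if (terms.size === 0) return []; // no signal: load nothing, defer to the index

  // Assumed scoring: count of fact tags that appear in the signal terms.
  const ranked = facts
    .map((fact) => ({
      fact,
      score: fact.tags.filter((tag) => terms.has(tag.toLowerCase())).length,
    }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score);

  // Greedy packing: take the best-ranked facts that still fit the budget.
  const picked: Fact[] = [];
  let used = 0;
  for (const { fact } of ranked) {
    const cost = estimateTokens(fact.text);
    if (used + cost > budgetTokens) continue; // skip what does not fit
    picked.push(fact);
    used += cost;
  }
  return picked;
}
```

The property worth noticing is that cost is bounded by budgetTokens regardless of how large the store grows, which is the opposite of the load-everything design.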

Server-sent events on node:http, not WebSocket on Express. Dashboard traffic is one-directional, server to browser. SSE is a handful of lines over plain HTTP and the browser reconnects on its own. A WebSocket on Express would buy me a duplex channel I do not need and a dependency tree that could break the plugin build. The cost is writing the tiny router by hand.
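
For a sense of how little code this takes, here is a hedged sketch of the two pieces a hand-written SSE endpoint needs: event framing and a tiny router. The paths "/" and "/events", the "session" query parameter and the function names are hypothetical, chosen for illustration. The frame layout itself (event, id and data fields, terminated by a blank line) follows the standard event-stream format.

```typescript
// Headers a text/event-stream response needs.
const SSE_HEADERS: Record<string, string> = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache",
  Connection: "keep-alive",
};

// Encode one event. Each line of the payload becomes its own "data:" field,
// and a blank line terminates the event. The id lets a reconnecting browser
// report the last event it saw via the Last-Event-ID header.
function formatSseEvent(eventName: string, id: number, data: string): string {
  const dataLines = data.split("\n").map((line) => `data: ${line}`);
  return [`event: ${eventName}`, `id: ${id}`, ...dataLines, "", ""].join("\n");
}

// Hypothetical routes for the sketch: a page and one event stream.
type Route =
  | { kind: "page" }
  | { kind: "stream"; session: string | null }
  | { kind: "not_found" };

function route(method: string, rawUrl: string): Route {
  // Request URLs are path-only, so resolve them against a dummy local base.
  const url = new URL(rawUrl, "http://127.0.0.1");
  if (method !== "GET") return { kind: "not_found" };
  if (url.pathname === "/") return { kind: "page" };
  if (url.pathname === "/events") {
    return { kind: "stream", session: url.searchParams.get("session") };
  }
  return { kind: "not_found" };
}

// With node:http, a stream handler would then be roughly:
//   res.writeHead(200, SSE_HEADERS);
//   res.write(formatSseEvent("state", seq, JSON.stringify(state)));
```

On the browser side an EventSource pointed at the stream URL reconnects automatically, which is the behaviour the paragraph above leans on.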

Append-only JSONL, not SQLite. A line-per-event file is append-only by construction, tailable, and human-readable when something goes wrong, and replaying it is a pure fold. I rejected SQLite because a native module complicates packaging a plugin meant to install cleanly everywhere. The trade-off is hand-rolled concurrency control: a small advisory lock ensures two racing hook processes never collide on a sequence number, and that path is tested under 25 parallel writers.
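
The part of the write path that the lock has to protect can be sketched without any file I/O. The functions below are hypothetical and operate on the log as a string, purely to show the read-tail, assign-sequence, append-line step. In a real writer that step would run while holding the advisory lock, for example a lock file created with the exclusive "wx" open flag (which is an assumption about the mechanism, not a description of slipstream's code).

```typescript
// Hypothetical event shapes for illustration.
type NewEvent = { kind: string; agent: string; detail?: string };
type StoredEvent = { seq: number; ts: string } & NewEvent;

// Highest sequence number in the existing log text; 0 for an empty log.
function lastSeq(logText: string): number {
  const lines = logText.split("\n").filter((l) => l.trim() !== "");
  const last = lines[lines.length - 1];
  if (last === undefined) return 0;
  return (JSON.parse(last) as { seq: number }).seq;
}

// Encode one event as a single line. JSON.stringify escapes embedded
// newlines, so one event can never span two lines of the log.
function encodeLine(ev: NewEvent, seq: number, now: Date): string {
  const stored: StoredEvent = { seq, ts: now.toISOString(), ...ev };
  return JSON.stringify(stored) + "\n";
}

// The critical section: read the tail, assign the next seq, append.
// Two processes running this concurrently without a lock could both
// read the same tail and emit the same seq.
function appendEvent(logText: string, ev: NewEvent, now: Date): string {
  return logText + encodeLine(ev, lastSeq(logText) + 1, now);
}
```

Keeping the sequence assignment this small is what makes a simple advisory lock sufficient: the lock is held only for one read of the tail and one append.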

Files for memory, not a database. Reviewable, diffable, survives without a running service. The MEMORY.md index is regenerated from the files, so it cannot silently drift. The same packaging argument that ruled out SQLite for the event log rules it out here.
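
The "cannot silently drift" property follows from treating the index as a pure function of the fact files. A minimal sketch, assuming a hypothetical FactFile summary and an invented Markdown layout (the real MEMORY.md format is not specified in this paper):

```typescript
// Hypothetical summary of one fact file; the fields are assumptions.
type FactFile = { file: string; title: string; updated: string }; // updated: ISO date

// Build the index purely from the fact files. Nothing is edited in place,
// so the index can always be thrown away and rebuilt.
function renderIndex(facts: FactFile[]): string {
  const rows = [...facts]
    // Sort by file name with a plain comparison for a stable, diff-friendly order.
    .sort((a, b) => (a.file < b.file ? -1 : a.file > b.file ? 1 : 0))
    .map((f) => `- [${f.title}](${f.file}) (updated ${f.updated})`);
  return ["# MEMORY", "", ...rows, ""].join("\n");
}
```

Because the output depends only on the set of files and not on the order they were discovered, regenerating the index twice produces byte-identical text, which keeps diffs quiet when the index is committed.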

A byte-count budget estimate, not a real token meter. The helper cannot read the IDE's internal counter. It estimates from bytes-into-context at a cautious 3.6 bytes per token. This is guidance, not a guarantee, and the wording everywhere says so. I would rather be honestly approximate and conservative than precise-looking and wrong.
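
As a rough illustration of what a byte-count estimate looks like in code, the sketch below converts bytes to estimated tokens at 3.6 bytes per token and derives a warning flag. Rounding up, the 200,000-token window and the 0.7 warning threshold are assumptions made for the example; only the 3.6 ratio comes from the text.

```typescript
const BYTES_PER_TOKEN = 3.6; // the cautious ratio quoted in the text

// Rounding up is an assumption; it keeps the estimate on the cautious side.
function estimateTokens(bytes: number): number {
  return Math.ceil(bytes / BYTES_PER_TOKEN);
}

// Hypothetical budget check: window size and threshold are illustrative.
function budgetStatus(bytesInContext: number, windowTokens = 200_000, warnAt = 0.7) {
  const used = estimateTokens(bytesInContext);
  const fraction = used / windowTokens;
  return { used, fraction, warn: fraction >= warnAt };
}

// The two reads from the worked example, in estimated tokens.
const sliceTokens = estimateTokens(1381); // 384
const wholeFileTokens = estimateTokens(4841); // 1345
```

A lower bytes-per-token ratio produces a higher token estimate for the same input, so erring low on the ratio makes the warning fire early rather than late.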

07 Comparison with the Default Loop

The fairest comparison is the default coding-agent loop in the same IDE, without slipstream installed.

  • Reading a single declaration in a 1,200-line file. Default loop: whole-file Read, full file into context. slipstream: sp_map orients, sp_symbol returns one slice. On the worked example in this repository, the saving is 71% for the prompted read. The longer the file or the deeper the project, the larger the gap.
  • Orienting in an unfamiliar source tree. Default loop: tends to read files one by one. slipstream: sp_map returns files, symbols and one-line purpose in a single tool call. Reading every file in src/ here costs ~40,597 tokens; sp_map costs ~2,173, a factor of about 19.
  • Surviving a compaction. Default loop: a summary replaces the conversation; structured facts blur into prose. slipstream: a PreCompact hook builds a structured digest (open task, decisions, files touched, next step) and writes it as a durable memory; the next session reloads it first.
  • Long-running sessions with subagents. Default loop: the operator watches the chat. slipstream: the dashboard shows every agent and subagent, status, current task, the budget bar and the plan, with the activity stream grouped per-agent so a subagent's work does not tangle with the main thread.
  • Verifiable steps. Default loop: the agent claims it shipped. slipstream: shipping skills carry a verification gate the agent must pass, with subagents (sp-shipper, sp-schema, sp-reviewer) that refuse to advance past a red gate.
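
The structured digest in the compaction comparison above is the piece that keeps facts from blurring into prose. A minimal sketch of the idea, assuming a hypothetical Digest type with the four fields named in that bullet and an invented Markdown layout; it also assumes single-line values with no blank lines or "## " prefixes inside them:

```typescript
// The four fields come from the text; the layout below is an assumption.
type Digest = { openTask: string; decisions: string[]; filesTouched: string[]; nextStep: string };

function renderDigest(d: Digest): string {
  const list = (items: string[]): string[] =>
    items.length > 0 ? items.map((i) => `- ${i}`) : ["- (none)"];
  return [
    "## Open task", d.openTask, "",
    "## Decisions", ...list(d.decisions), "",
    "## Files touched", ...list(d.filesTouched), "",
    "## Next step", d.nextStep, "",
  ].join("\n");
}

// Parse the same layout back, so the digest survives a write and a reload.
function parseDigest(md: string): Digest {
  const sections: Record<string, string[]> = {};
  let bucket: string[] | undefined;
  for (const line of md.split("\n")) {
    if (line.startsWith("## ")) {
      bucket = [];
      sections[line.slice(3)] = bucket;
    } else if (line !== "" && bucket) {
      bucket.push(line);
    }
  }
  const items = (key: string): string[] =>
    (sections[key] ?? []).filter((l) => l !== "- (none)").map((l) => l.replace(/^- /, ""));
  return {
    openTask: (sections["Open task"] ?? []).join("\n"),
    decisions: items("Decisions"),
    filesTouched: items("Files touched"),
    nextStep: (sections["Next step"] ?? []).join("\n"),
  };
}
```

The round trip (render, then parse, then compare) is the kind of check that distinguishes a structured digest from a free-form summary: every field that went in can be read back out unchanged.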

In a generic MCP-only setup in any MCP-capable editor (Cursor, Windsurf, Antigravity), the MCP tools alone give you the read-shape win. The full plugin layer (hooks, skills, dashboard, statusline, output style, doctor) requires plugin support in the IDE.

08 Results and Performance

Numbers below are produced by the helper on a developer machine (Apple Silicon, Node 25), using the conservative 3.6 bytes-per-token estimate in src/context/budget.ts.

  • 71% fewer tokens. sp_symbol(retrieve.ts, retrieveSymbol) returns 1,381 bytes against 4,841 for a whole-file Read of src/map/retrieve.ts, for the same work.
  • 5.4% of the orienting cost. sp_map at ~2,173 tokens against ~40,597 tokens to read every file in src/ one by one.
  • Bounded recall. Signal-ranked recall is hard-capped at ~1,200 tokens per session start. With no task signal, recall returns nothing.
  • 88 tests across 11 files. Pure handler tests for the MCP server, a real-binary stdio test, concurrency-safe event log tests under 25 parallel writers, a real SSE server end-to-end test, idempotent start, replay, the PreCompact digest round-trip, signal-ranked recall under budget, a snapshot of the statusline format, and doctor against both the real tree and a deliberately broken one. The full suite runs in roughly 2.1 seconds on Apple Silicon.
  • Local-only network footprint. One bound port on 127.0.0.1, no outbound calls, no telemetry. Verify with lsof or the dashboard's server.json.

The dashboard's token-budget bar makes the saving visible while it happens. With the tools on, the bar crawls; with whole-file reads, it lurches.
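
The headline ratios can be rechecked from the reported figures alone. The snippet below is only arithmetic on the numbers quoted in this section; the helper names are made up for the example.

```typescript
// Figures as reported in the text above.
const symbolBytes = 1381; // sp_symbol slice
const fileBytes = 4841; // whole-file Read of src/map/retrieve.ts
const mapTokens = 2173; // sp_map index
const treeTokens = 40597; // every file in src/, one by one

// Percentage saved by the smaller read, rounded to a whole percent.
const percentSaved = (small: number, large: number): number =>
  Math.round((1 - small / large) * 100);

// The smaller cost as a percentage of the larger, to one decimal place.
const percentOf = (small: number, large: number): number =>
  Math.round((small / large) * 1000) / 10;

const readSaving = percentSaved(symbolBytes, fileBytes); // 71
const orientShare = percentOf(mapTokens, treeTokens); // 5.4
const orientFactor = Math.round(treeTokens / mapTokens); // 19
```

Swapping in byte counts from your own repository gives a quick sense of whether the read-shape win holds for your file sizes.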

09 Trade-offs and Limitations

These are the honest costs. They are documented in the wiki under Roadmap and Limitations and surfaced here so operators can plan around them.

  • The token budget is an estimate. It is tuned to warn early. Treat the percentages as a strong hint, not gospel.
  • The dashboard observes, it does not control. It cannot pause a tool call or steer a subagent. That is by design.
  • Subagent visibility depends on what the IDE exposes. There is a reliable SubagentStop but no matching SubagentStart, so the dashboard infers a subagent from the first event that names it and flips its status on stop. If a future IDE adds a real SubagentStart, I will wire it.
  • The skill library targets one stack. Cloudflare, Supabase, Vercel and Resend. It is not trying to be a universal scaffolder for every framework.
  • Secret redaction is blunt. Pattern-based, will mask things that are not secrets before it lets a real one through, which is the safe direction. Do not treat it as a vault.
  • The plugin layer is IDE-specific. The MCP tools work anywhere; the skills, hooks, slash commands and dashboard need a plugin-capable IDE.
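
To show what "blunt" pattern-based redaction means in practice, here is a small sketch. The two patterns are hypothetical examples chosen for illustration; the paper does not list slipstream's real patterns. The second one deliberately over-matches, masking any long opaque string, including a harmless commit hash.

```typescript
// Hypothetical patterns for illustration only.
const SECRET_PATTERNS: RegExp[] = [
  // key=value or key: value where the key name looks sensitive
  /\b(?:api[_-]?key|token|secret|password)\b\s*[:=]\s*\S+/gi,
  // any run of 32 or more identifier-like characters
  /[A-Za-z0-9_-]{32,}/g,
];

// Apply every pattern in turn, replacing each match with a fixed marker.
function redact(text: string): string {
  return SECRET_PATTERNS.reduce((out, re) => out.replace(re, "[REDACTED]"), text);
}

// A false positive by design: a 40-character commit hash is masked too.
const masked = redact("commit 0123456789abcdef0123456789abcdef01234567 done");
```

Erring toward false positives costs some readability in the log, which is a cheaper mistake than writing a real credential to disk.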

10 Conclusion

slipstream is the runner I ship with because it removes two recurring failures of the default coding-agent loop on long sessions: budget bleed from whole-file reads, and the loss of durable facts through compaction. The dashboard makes the work visible, which is the thing that lets me trust the agent long enough to walk away from the keyboard.

The design choices are conservative on purpose. A hand-rolled MCP server with zero runtime dependencies. An append-only JSONL log instead of a database. Files for memory. SSE on node:http instead of a framework. A byte-count budget that is honest about being approximate. Local-only by construction. MIT-licensed. If it phones home, it is not slipstream.

Roadmap items are explicit and small: a compaction timeline on the dashboard, an optional per-agent diff view, a shareable HTML session artifact in the same shape as the mind map artifact. Non-roadmap items are equally explicit: no hosted version, no telemetry, no accounts.

Appendix A · Configuration

The full reference lives in the wiki under Configuration and Tuning. The minimum useful set:

  • .claude/slipstream/dashboard.json → { "enabled": true, "autoOpen": false }. Per-project dashboard toggle.
  • SLIPSTREAM_DASHBOARD=0 → per-session disable.
  • SLIPSTREAM_DASHBOARD_OPEN=0 → per-session keep the browser shut.
  • .claude/slipstream/memory/MEMORY.md → regenerated index; commit if you want to share durable facts across the team.
  • output-styles/slipstream.md → enable with /output-style slipstream to spend fewer tokens per turn.
  • /slipstream:doctor → PASS / FAIL per check across the whole install.
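
How the per-project file and the per-session variables might combine can be sketched as a small resolver. The precedence (environment variable over dashboard.json over defaults) and the default values are assumptions made for this example; the Configuration and Tuning page in the wiki is the authority on actual behaviour. The function name is hypothetical.

```typescript
type DashboardSettings = { enabled: boolean; autoOpen: boolean };

// Assumed precedence: environment variable, then dashboard.json, then defaults.
function resolveDashboardSettings(
  fileJson: string | null, // contents of dashboard.json, or null if absent
  env: Record<string, string | undefined>,
): DashboardSettings {
  // Assumed defaults for the sketch.
  const settings: DashboardSettings = { enabled: true, autoOpen: true };

  if (fileJson !== null) {
    try {
      const parsed = JSON.parse(fileJson) as Record<string, unknown>;
      // Only accept well-typed values; ignore anything else.
      if (typeof parsed.enabled === "boolean") settings.enabled = parsed.enabled;
      if (typeof parsed.autoOpen === "boolean") settings.autoOpen = parsed.autoOpen;
    } catch {
      // Unreadable settings fall back to the defaults above.
    }
  }

  // Per-session switches: "0" turns the feature off for this session.
  if (env.SLIPSTREAM_DASHBOARD === "0") settings.enabled = false;
  if (env.SLIPSTREAM_DASHBOARD_OPEN === "0") settings.autoOpen = false;
  return settings;
}
```

With the file contents shown in the first bullet and no variables set, this resolver yields an enabled dashboard that does not open a browser tab on its own.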

Appendix B · Operator Checklist

Run this once after install, and again whenever the plugin updates.

  • Add the marketplace and install the plugin.
  • Open the project. Build the map once with /slipstream:map.
  • Run /slipstream:doctor and confirm every line reports PASS.
  • Open the dashboard URL printed in chat. Confirm the four panels render.
  • Trigger a tool call. Confirm an event lands in .claude/slipstream/dashboard/<session>.jsonl and the activity stream updates.
  • Write a durable fact with /slipstream:remember. Confirm a file appears under .claude/slipstream/memory/ and the index is regenerated.
  • Force a compaction. Confirm a digest is written to memory and reloaded on the next session.
  • Decide what to commit. By default the whole .claude/slipstream/ tree is git-ignored and per-developer.

For the full operator guide, see the wiki: Install in VS Code, Run it in any IDE, Troubleshooting, Roadmap and Limitations.

Slip into the stream

MIT-licensed, local-only, no telemetry. Five minutes from install to your first slice.