slipstream

A token-efficient coding-agent runner with persistent per-project memory, lossless compaction, a guardrailed skill library, and a live local agent dashboard.

MIT Licensed · Open Source · Self-Hostable · Local-only · MCP over stdio · No telemetry

14 sp_ tools · ~95% per-read savings · 321 tests / 47 files · 75 guardrailed skills

v1.0.0 · June 2026 · Sarma · SarmaLinux

github.com/sarmakska/slipstream

Abstract

slipstream is an MIT-licensed Claude Code plugin and cross-IDE MCP toolkit (Cursor, Windsurf, Antigravity, VS Code with MCP, JetBrains with MCP). It replaces whole-file reads with a bundled MCP server that exposes fourteen sp_ tools (sp_map, sp_symbol, sp_lines, sp_search, plus memory, budget, compaction and dashboard tools). It persists durable facts in a file-based memory store with three-layer search, writes a structured digest before the context window is compacted, reloads only the signal-ranked relevant subset on the next session, and builds a knowledge feed that primes every new session. It also serves a React dashboard whose home is a live pixel office, in which every open tab is a character at a desk, animated by what it is doing now. The dashboard adds digest-first sessions, a what-is-learned view and an interactive d3 code dependency graph. Multiple Claude Code tabs on one project coordinate through a shared local bus, written at turn start and on every file tool, so an agent appears the instant it starts working. Everything sits on an append-only JSONL event log per session, on the developer's own machine, with no telemetry. v1.0 shipped 6 June 2026; the dashboard was rebuilt around the pixel office shortly after.

01 Executive Summary

A long agent session usually dies one of two ways. Either it reads whole files until the context window is full and starts forgetting the start of its own plan, or it does good work and then the session ends and every decision evaporates. slipstream is built around two enforced habits that fix both: read a compact project map and pull a single slice instead of opening whole files, and write durable facts to a structured store that survives a compaction.

The seven pillars are: token efficiency; persistent observation memory with lossless compaction; a 75-skill methodology library; a mind map and statusline in chat; a React local dashboard whose home is a live pixel office of every working tab, with digest-first sessions, a what-is-learned view and an interactive code dependency graph; a cross-tab agent bus that lets multiple Claude Code tabs on one project see each other's live work; and a cold-start knowledge feed injected by SessionStart so no session starts ignorant of the project. Everything sits on an append-only JSONL event log per session; replay is the same fold as live.

On real files from this repository, scoped reads average around 95% smaller than whole-file reads. pnpm benchmark regenerates the table from a clean clone, so the figure is reproducible. On src/dashboard/server.ts an sp_symbol call averages 612 bytes where the whole file is 18,241 bytes; on src/mcp/tools.ts it averages 980 bytes per call versus 28,704 bytes for the whole file. The headline is per-read efficiency, not end-to-end session efficiency; the script states this plainly.
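The arithmetic behind a per-read saving is simple enough to check by hand. The sketch below recomputes it from the two byte counts quoted above; the type and function names are illustrative and are not taken from the benchmark script.

```typescript
// Per-read saving: the share of bytes avoided by a scoped read.
// Names here are hypothetical; only the byte counts come from the text.
interface ReadSample {
  file: string;
  wholeBytes: number; // size of the whole file
  scopedBytes: number; // average size of one sp_symbol result
}

const samples: ReadSample[] = [
  { file: "src/dashboard/server.ts", wholeBytes: 18241, scopedBytes: 612 },
  { file: "src/mcp/tools.ts", wholeBytes: 28704, scopedBytes: 980 },
];

// Percentage of bytes kept out of the context by reading a slice.
function savingPercent(s: ReadSample): number {
  return (1 - s.scopedBytes / s.wholeBytes) * 100;
}

// Unweighted mean of the per-file savings.
function averageSaving(list: ReadSample[]): number {
  const total = list.reduce((sum, s) => sum + savingPercent(s), 0);
  return total / list.length;
}

const perFile = samples.map((s) => `${s.file}: ${savingPercent(s).toFixed(1)}%`);
const average = averageSaving(samples);
```

Both files come out in the mid-90s, in line with the headline. This is a per-read figure only; it says nothing about how many reads a session makes.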

02 Background and Motivation

I ship small production sites on Cloudflare, Supabase, Vercel and Resend. I lean on the coding agent in my IDE to do the boring parts: scaffold a route, write a migration, draft a test, ship a fix. The default loop works for short tasks. It strains on long ones.

The pattern that kept biting me had two halves. First, the read shape: the agent would open a 1,200-line component to change one prop. The budget would bleed for an hour, and three prompts later the convention we agreed on at the top had paged out. Second, the compaction: the IDE summarises and trims the conversation to keep within the window, and the moment that happened the durable facts went with the noise. I tried writing everything to a notes file by hand. The notes file rotted within a day, because writing it was a manual chore the agent did not own.

What I wanted was not better discipline. It was a tool that enforced two habits without my having to remember them, and that gave me a window into the session so I could trust it long enough to walk away from the keyboard.

03 The Problem

Concretely, the default coding-agent loop has four failure modes I wanted to eliminate.

Whole-file reads dominate the budget. The default Read tool returns the entire file. The agent rarely needs all of it. Most reads cost an order of magnitude more than the work they enable.
Compaction is lossy. When the IDE compacts, structured facts (the open task, the schema you agreed on, the file you decided was the right one to edit) blur into prose. Resumed sessions feel like talking to a stranger holding your notes.
Memory is either nothing or everything. A notes file is nothing until you write to it. A naive memory layer that reloads the full store on every session gets more expensive the more useful the store becomes.
You cannot watch the agent work. Without an outside view, the only signal you have for "is this going well" is whether the chat looks confident. Confidence is not progress.

slipstream attacks all four directly. Precise tools replace whole-file reads. A PreCompact hook writes a structured digest before compaction. Recall is signal-ranked, bounded by a token budget, and loads nothing without a signal. A live local dashboard makes every agent step visible.

04 Goals and Non-goals

The goals are narrow and load-bearing.

Cut tokens per read by an order of magnitude on the common case. Without making the agent dumber.
Preserve durable facts across compactions and across sessions. Reviewable, diffable, drift-proof.
Show the operator what is happening. Live, locally, with replay.
Stay local-only. No telemetry, no accounts, no cloud, no inbound network. If it phones home, it is not slipstream.
Be auditable in one sitting. The helper compiles to a single dist tree, the MCP server is one file, the dashboard server is one file.

The non-goals are equally deliberate.

Universal scaffolding. The skill library targets the stack I actually ship on (Cloudflare, Supabase, Vercel, Resend). It is not trying to be a framework-of-frameworks.
Replacing the agent loop. slipstream observes and supports the agent. It does not drive it; it cannot pause a tool call or steer a subagent.
Precise token metering. The helper cannot read the IDE's internal counter; the budget is a conservative byte-count estimate, guidance not gospel.
A hosted product. No sign-up, no SaaS layer, no managed dashboard. The whole point of being local is being yours.

05 Architecture

The repository is both the published plugin and the helper the plugin calls. The plugin surface (manifest, slash commands, hooks, skills, subagents, output style, statusline) is what the IDE loads. The helper (compiled TypeScript under dist/) is what the hooks and commands invoke for the heavy lifting. The bundled MCP server is part of the helper.

There are fourteen helper modules in v1.0: src/mcp (stdio JSON-RPC server and the fourteen sp_ tools), src/map (the scan, generate, retrieve and code dependency graph path, plus the benchmark), src/memory (file-based store, three-layer search, signal-ranked recall, PreCompact digest, observation memory, lessons distillation, the cross-tab bus), src/context (byte-count budget and the dollar-cost-of-tokens-saved estimator), src/dashboard in three parts (the mind map and artifact; the live event log, server, state-fold and launcher; the brief, graph, insights, and code dependency graph generators), src/engine (the skill contract and loader for the 75 skills), src/statusline (the pure statusline formatter), src/doctor (the install and memory checks), src/plugin-validate (the manifest validator), src/cli (the dispatcher) and a new web/ tree (Vite + React + TypeScript SPA that builds to dist/dashboard/web).

All data the helper writes lives under .claude/slipstream/ in the project: map.md and map.json; one Markdown fact per file under memory/ with a regenerated MEMORY.md index; observations/ for the per-turn fold; bus.jsonl for the cross-tab status posts; one append-only <session>.jsonl event log under dashboard/; a server.json recording the running server; dashboard.url for stable discovery; and an optional dashboard.json for settings. The whole tree is git-ignored by default and is intended to be local per developer unless you choose to commit the memory.

The data path is one direction: hooks write events to the JSONL log; the dashboard server tails the log and pushes folded state to the browser over SSE. Browser to server traffic is minimal (a session-id query string). Replay is the same fold applied to a finished log file.
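To make "replay is the same fold" concrete, here is a minimal sketch of a state fold over JSONL lines. The event shape and state fields are hypothetical, since this paper does not specify the real log schema; the point is only that one pure reducer can serve both the live tail and a finished file.

```typescript
// Hypothetical event shape; the real log's fields are not given in this paper.
type AgentEvent =
  | { seq: number; kind: "tool"; agent: string; tool: string }
  | { seq: number; kind: "stop"; agent: string };

interface AgentState {
  status: "working" | "stopped";
  lastTool?: string;
  toolCalls: number;
}
type DashboardState = Record<string, AgentState>;

// Pure reducer: takes the previous state and one event, returns the next state.
function applyEvent(state: DashboardState, ev: AgentEvent): DashboardState {
  const prev: AgentState = state[ev.agent] ?? { status: "working", toolCalls: 0 };
  const next: AgentState =
    ev.kind === "tool"
      ? { status: "working", lastTool: ev.tool, toolCalls: prev.toolCalls + 1 }
      : { ...prev, status: "stopped" };
  return { ...state, [ev.agent]: next };
}

// Replay: parse every non-blank line and fold it with the same reducer.
function foldLog(jsonl: string, initial: DashboardState = {}): DashboardState {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as AgentEvent)
    .reduce(applyEvent, initial);
}

const log =
  [
    '{"seq":1,"kind":"tool","agent":"main","tool":"sp_map"}',
    '{"seq":2,"kind":"tool","agent":"main","tool":"sp_symbol"}',
    '{"seq":3,"kind":"stop","agent":"main"}',
  ].join("\n") + "\n";

const replayed = foldLog(log);
```

A live server would call applyEvent once per tailed line and push the result; a replay calls foldLog on the whole file. Because both go through the same reducer, they cannot disagree about the final state.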

06 Key Technical Decisions

Nine choices carry most of the design weight in v1.0.

Hand-rolled MCP server, not the SDK. The slice of the protocol in play (initialize, tools/list, tools/call) is small and stable. A plugin that bundles a server should add as little as possible to a user's install. I implement the newline-delimited JSON-RPC framing in one file. The benefit is zero runtime dependencies on the MCP path and a server I can audit in a sitting. The request handler is a pure exported function, so tests drive it without a process; a separate suite spawns the real binary over stdio.
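As an illustration of how small that protocol slice is, the sketch below shows a pure request handler plus newline-delimited framing. It is not slipstream's server: the single tool is a stub, the server name and version are placeholders, and the protocol version string is an assumption chosen for the example.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id?: number | string;
  method: string;
  params?: unknown;
}
interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string | null;
  result?: unknown;
  error?: { code: number; message: string };
}

// One stub tool standing in for the real sp_ tools (schema is illustrative).
const tools = [
  {
    name: "sp_lines",
    description: "Return a line range from a file (illustrative stub).",
    inputSchema: {
      type: "object",
      properties: { path: { type: "string" }, start: { type: "number" }, end: { type: "number" } },
      required: ["path", "start", "end"],
    },
  },
];

// Pure handler: no I/O, so tests can call it without spawning a process.
function handleRequest(req: JsonRpcRequest): JsonRpcResponse | null {
  if (req.id === undefined) return null; // notifications get no response
  const id = req.id;
  const ok = (result: unknown): JsonRpcResponse => ({ jsonrpc: "2.0", id, result });
  const fail = (code: number, message: string): JsonRpcResponse => ({
    jsonrpc: "2.0",
    id,
    error: { code, message },
  });
  switch (req.method) {
    case "initialize":
      return ok({
        protocolVersion: "2024-11-05", // assumed version for this sketch
        capabilities: { tools: {} },
        serverInfo: { name: "example-server", version: "0.0.0" },
      });
    case "tools/list":
      return ok({ tools });
    case "tools/call": {
      const params = req.params as { name?: string } | undefined;
      if (params?.name !== "sp_lines") return fail(-32602, "Unknown tool");
      return ok({ content: [{ type: "text", text: "(stub) lines would go here" }] });
    }
    default:
      return fail(-32601, "Method not found");
  }
}

// Framing: one JSON message per line in, one per line out.
function handleChunk(buffer: string): { output: string; rest: string } {
  const lines = buffer.split("\n");
  const rest = lines.pop() ?? ""; // an incomplete trailing line stays buffered
  const output = lines
    .filter((l) => l.trim() !== "")
    .map((l) => handleRequest(JSON.parse(l) as JsonRpcRequest))
    .filter((r): r is JsonRpcResponse => r !== null)
    .map((r) => JSON.stringify(r) + "\n")
    .join("");
  return { output, rest };
}
```

A real server would feed stdin chunks through handleChunk, carry rest into the next chunk, and write output to stdout. Keeping handleRequest free of I/O is what allows the two-tier testing described above.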

Signal-ranked recall, not load-everything. The obvious memory design reloads the whole store on every session. I rejected it because it gets more expensive the more useful the store becomes. Recall instead ranks against a cheap task signal (git branch, files changed in the working tree, last prompt) and reloads only the subset that fits a token budget (~1,200 tokens). With no signal it loads nothing and defers to the index, because loading arbitrary facts with no signal is the very thing I was trying to avoid.
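The recall rule can be sketched in a few lines. Everything below is an assumption made for illustration: the fact shape, the keyword-overlap scoring and the helper names are not slipstream's actual ranking, which this paper does not spell out. Only the ~1,200-token budget, the three signal sources and the "no signal, load nothing" rule come from the text.

```typescript
// Hypothetical shapes; the real store keeps one Markdown fact per file.
interface Fact {
  id: string;
  text: string;
  tags: string[];
}
interface TaskSignal {
  branch?: string;
  changedFiles: string[];
  lastPrompt?: string;
}

// Rough token estimate from text length (treats characters as bytes, 3.6 per token).
function estimateTokens(text: string): number {
  return Math.ceil((text.length * 10) / 36);
}

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 2);
}

// Collapse branch, changed files and last prompt into a set of search terms.
function signalTerms(signal: TaskSignal): string[] {
  const raw = [signal.branch ?? "", ...signal.changedFiles, signal.lastPrompt ?? ""].join(" ");
  return Array.from(new Set(tokenize(raw)));
}

// Score: how many signal terms appear in the fact's text or tags.
function score(fact: Fact, terms: string[]): number {
  const words = new Set(tokenize(`${fact.text} ${fact.tags.join(" ")}`));
  return terms.filter((t) => words.has(t)).length;
}

function recall(facts: Fact[], signal: TaskSignal, budgetTokens = 1200): Fact[] {
  const terms = signalTerms(signal);
  if (terms.length === 0) return []; // no signal: load nothing
  const ranked = facts
    .map((fact) => ({ fact, s: score(fact, terms) }))
    .filter((r) => r.s > 0)
    .sort((a, b) => b.s - a.s);
  const picked: Fact[] = [];
  let used = 0;
  for (const { fact } of ranked) {
    const cost = estimateTokens(fact.text);
    if (used + cost > budgetTokens) continue; // skip facts that do not fit
    picked.push(fact);
    used += cost;
  }
  return picked;
}
```

The property worth noticing is that the cost of recall is bounded by the budget, not by the size of the store, so a larger store makes ranking matter more without making session start more expensive.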

Server-sent events on node:http, not WebSocket on Express. Dashboard traffic is one-directional, server to browser. SSE is a handful of lines over plain HTTP and the browser reconnects on its own. A WebSocket on Express would buy me a duplex channel I do not need and a dependency tree that could break the plugin build. The cost is writing the tiny router by hand.
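For readers who have not written SSE by hand, the wire format really is a handful of lines. The sketch below formats one event frame as a pure function; the function and event names are illustrative, and a real server would write the returned string to a response opened with the headers shown.

```typescript
// Response headers an SSE endpoint typically sends before the first frame.
const SSE_HEADERS = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache",
};

// Format one server-sent event. Each frame is a set of "field: value" lines
// terminated by a blank line; multi-line data gets one "data:" line per line.
function formatSseEvent(event: string, data: unknown, id?: number): string {
  const payload = JSON.stringify(data ?? null);
  const lines: string[] = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  lines.push(`event: ${event}`);
  for (const line of payload.split("\n")) lines.push(`data: ${line}`);
  return lines.join("\n") + "\n\n";
}

// Example frame for a hypothetical "state" event carrying folded dashboard state.
const frame = formatSseEvent("state", { agents: 2 }, 7);
```

On the browser side, the standard EventSource API reconnects on its own and reports the last id it saw, so a server that numbers its frames can resume where the client left off.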

Append-only JSONL, not SQLite. A line-per-event file is append-only by construction, tailable, human-readable when something goes wrong, and replay is a pure fold. I rejected SQLite because a native module complicates packaging a plugin meant to install cleanly everywhere. The trade-off is hand-rolled concurrency control via a small advisory lock so two racing hook processes never collide on a sequence number; that path is tested under 25 parallel writers.

Files for memory, not a database. Reviewable, diffable, survives without a running service. The MEMORY.md index is regenerated from the files, so it cannot silently drift. The same packaging argument that ruled out SQLite for the event log rules it out here.

A byte-count budget estimate, not a real token meter. The helper cannot read the IDE's internal counter. It estimates from bytes-into-context at a cautious 3.6 bytes per token. This is guidance, not a guarantee, and the wording everywhere says so. I would rather be honestly approximate and conservative than precise-looking and wrong.
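As a worked example of that estimate, the sketch below converts bytes to tokens at 3.6 bytes per token. Rounding up and the 200,000-token window are assumptions made for the example, not documented behaviour of the helper; the byte counts are the ones quoted for src/dashboard/server.ts.

```typescript
// 3.6 bytes per token, written as 36/10 so the arithmetic stays in integers.
// Rounding up is an assumption: it errs on the side of warning early.
function estimateTokens(bytes: number): number {
  return Math.ceil((bytes * 10) / 36);
}

// Hypothetical window size; the real value depends on the model and the IDE.
const ASSUMED_WINDOW_TOKENS = 200000;

// Share of the assumed window consumed by the bytes read so far.
function budgetUsedPercent(bytesIntoContext: number, windowTokens = ASSUMED_WINDOW_TOKENS): number {
  return (estimateTokens(bytesIntoContext) / windowTokens) * 100;
}

const wholeFileTokens = estimateTokens(18241); // whole-file read: 5067 tokens
const scopedTokens = estimateTokens(612); // one sp_symbol result: 170 tokens
```

Using fewer bytes per token than a typical tokenizer yields means the estimate tends to run high, which is the cautious direction for a budget warning.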

React dashboard, not Mermaid + plain HTML. The v1.0 dashboard rebuilds around Vite + React + TypeScript with a design-token system and a typed JSON client. Nine routed views replace the four-panel single page. React, Vite and d3 are devDependencies bundled to static assets, so the plugin's runtime dependency story is unchanged: the same node:http server serves the built SPA. The trade-off is a build step where there used to be a render. The payoff is the interactive code dependency graph, the per-day journal, the clickable Sessions tab and the insights band.

Cross-tab agent bus, not real-time messaging. Multiple Claude Code tabs on one project used to be blind to each other. The bus posts each tab's open thread, files in flight and last tool to bus.jsonl at turn start and on every file tool, and only tabs active in the last few minutes count as live; every other session sees the others at its next prompt. Turn-boundary coordination is what the platform allows, so I ship turn-boundary coordination, not a mid-turn messaging protocol that would need a daemon.
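The liveness rule can be sketched as a filter over bus posts. The post shape and the five-minute window below are assumptions: the text says only "the last few minutes" and does not give the schema of bus.jsonl.

```typescript
// Hypothetical shape of one line in bus.jsonl.
interface BusPost {
  tab: string; // session or tab identifier
  at: number; // epoch milliseconds
  thread: string; // the tab's open thread
  filesInFlight: string[];
  lastTool?: string;
}

// "Last few minutes": five is an assumed value for this sketch.
const LIVE_WINDOW_MS = 5 * 60 * 1000;

// Keep the latest post per tab, drop the caller's own tab, drop stale tabs.
function liveTabs(posts: BusPost[], self: string, now: number, windowMs = LIVE_WINDOW_MS): BusPost[] {
  const latest = new Map<string, BusPost>();
  for (const post of posts) {
    const seen = latest.get(post.tab);
    if (!seen || post.at >= seen.at) latest.set(post.tab, post);
  }
  return Array.from(latest.values()).filter((p) => p.tab !== self && now - p.at <= windowMs);
}
```

Because the bus is append-only, a reader never needs a lock: it takes the newest post per tab and ignores anything older than the window.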

Cold-start knowledge feed at SessionStart, not on first tool call. The freshly built brief (project description, organisation, most-connected files, recent prompts, durable memory) is injected before the user types anything. The agent walks in oriented. Bounded length so it does not blow the context budget; cap is configurable. The alternative (build the brief lazily on first call) made first prompts read as ignorant.

07 Comparison with the Default Loop

The fairest comparison is the default coding-agent loop in the same IDE, without slipstream installed.

Reading a single declaration in a 1,200-line file. Default loop: whole-file Read, full file into context. slipstream: sp_map orients, sp_symbol returns one slice. On the worked example in this repository, the saving is 71% for the prompted read. The wider the file or the deeper the project, the larger the gap.
Orienting in an unfamiliar source tree. Default loop: tends to read files one by one. slipstream: sp_map returns files, symbols and one-line purpose in a single tool call. Reading every file in src/ here costs ~40,597 tokens; sp_map costs ~2,173, a factor of about 19.
Surviving a compaction. Default loop: a summary replaces the conversation; structured facts blur into prose. slipstream: a PreCompact hook builds a structured digest (open task, decisions, files touched, next step) and writes it as a durable memory; the next session reloads it first.
Long-running sessions with subagents. Default loop: the operator watches the chat. slipstream: the dashboard shows every agent and subagent, status, current task, the budget bar and the plan, with the activity stream grouped per-agent so a subagent's work does not tangle with the main thread.
Verifiable steps. Default loop: the agent claims it shipped. slipstream: shipping skills carry a verification gate the agent must pass, with subagents (sp-shipper, sp-schema, sp-reviewer) that refuse to advance past a red gate.
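The structured digest mentioned in the compaction comparison above has four named parts: open task, decisions, files touched and next step. A minimal sketch of such a digest, with a Markdown rendering that parses back to the same structure, might look like this. The field names, headings and parser are assumptions for illustration, not the format slipstream writes.

```typescript
// Hypothetical digest shape built from the four parts named in the text.
interface CompactionDigest {
  openTask: string;
  decisions: string[];
  filesTouched: string[];
  nextStep: string;
}

// Render the digest as Markdown sections so it stays reviewable and diffable.
function renderDigest(d: CompactionDigest): string {
  const bullets = (items: string[]) => items.map((i) => `- ${i}`).join("\n");
  return [
    "## Open task", d.openTask, "",
    "## Decisions", bullets(d.decisions), "",
    "## Files touched", bullets(d.filesTouched), "",
    "## Next step", d.nextStep, "",
  ].join("\n");
}

// Parse the same layout back. Blank lines are ignored, so this sketch does
// not preserve paragraph breaks inside a section.
function parseDigest(markdown: string): CompactionDigest {
  const sections = new Map<string, string[]>();
  let current = "";
  for (const line of markdown.split("\n")) {
    if (line.startsWith("## ")) {
      current = line.slice(3).trim();
      sections.set(current, []);
    } else if (current !== "" && line.trim() !== "") {
      sections.get(current)?.push(line);
    }
  }
  const text = (name: string) => (sections.get(name) ?? []).join("\n");
  const items = (name: string) =>
    (sections.get(name) ?? []).filter((l) => l.startsWith("- ")).map((l) => l.slice(2));
  return {
    openTask: text("Open task"),
    decisions: items("Decisions"),
    filesTouched: items("Files touched"),
    nextStep: text("Next step"),
  };
}

const digest: CompactionDigest = {
  openTask: "Add retry to the webhook handler",
  decisions: ["Retry at most three times", "Keep the handler idempotent"],
  filesTouched: ["src/webhooks/handler.ts"],
  nextStep: "Write the failing test first",
};
const roundTripped = parseDigest(renderDigest(digest));
```

The value of a fixed structure is that the resumed session gets the open task and the next step as fields it can act on, instead of having to dig them out of a prose summary.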

Compared to a generic MCP-only setup in any MCP-capable editor (Cursor, Windsurf, Antigravity), the MCP tools alone give you the read-shape win. The full plugin layer (hooks, skills, dashboard, statusline, output style, doctor) requires plugin support in the IDE.

08 Results and Performance

Numbers below are produced by pnpm benchmark on a developer machine (Apple Silicon, Node 25) against real files in this repository. The script is checked in at scripts/benchmark-token-savings.mjs and reproducible from a clean clone in under a minute.

~95% per-read reduction (averaged). Three representative files: src/dashboard/server.ts 18,241 bytes whole vs 612 bytes scoped, src/memory/observe.ts 21,008 bytes whole vs 824 bytes scoped, src/mcp/tools.ts 28,704 bytes whole vs 980 bytes scoped. Per-read efficiency, not end-to-end.
~5% of the orienting cost. sp_map indexes the project in roughly 2,000 tokens against roughly 40,000 tokens to read every file in src/ one by one.
Bounded recall. Signal-ranked recall is hard-capped at ~1,200 tokens per session start. With no task signal, recall returns nothing.
Cold-start knowledge feed. Bounded brief (project description, organisation, most-connected files, recent prompts, durable memory) injected at SessionStart so no session begins blank. Cap is configurable.
321 tests across 47 files. Pure handler tests for the MCP server, real-binary stdio test, concurrency-safe event log tests under 25 parallel writers, a real SSE server end-to-end test, idempotent start, the React dashboard typed client, the cross-tab bus, the d3 code graph layout, the cold-start knowledge feed, the dollar-cost math, the reproducible benchmark, the PreCompact digest round-trip, signal-ranked recall under budget, a snapshot of the statusline format, and both doctors against a real tree and a deliberately broken one. The full suite runs in roughly 2.1 seconds on Apple Silicon.
Local-only network footprint. One bound port on 127.0.0.1, no outbound calls, no telemetry. Verify with lsof or the dashboard's server.json.

The dashboard's token-budget bar and the dollar-cost-of-tokens-saved figure make the saving visible while it happens. With the tools on, the bar crawls; with whole-file reads, it lurches.

09 Trade-offs and Limitations

These are the honest costs. They are documented in the wiki under Roadmap and Limitations and surfaced here so operators can plan around them.

The token budget is an estimate. It is tuned to warn early. Treat the percentages as a strong hint, not gospel.
The dashboard observes, it does not control. It cannot pause a tool call or steer a subagent. That is by design.
Subagent visibility depends on what the IDE exposes. There is a reliable SubagentStop but no SubagentStart, so the dashboard infers a subagent from the first event that names it and flips its status on stop. If a future IDE adds a real SubagentStart, I will wire it.
The skill library targets one stack. Cloudflare, Supabase, Vercel and Resend. It is not trying to be a universal scaffolder for every framework.
Secret redaction is blunt. Pattern-based, will mask things that are not secrets before it lets a real one through, which is the safe direction. Do not treat it as a vault.
The plugin layer is IDE-specific. The MCP tools work anywhere; the skills, hooks, slash commands and dashboard need a plugin-capable IDE.

10 Conclusion

slipstream is the runner I ship with because it removes two recurring failures of the default coding-agent loop on long sessions: budget bleed from whole-file reads, and the loss of durable facts through compaction. The dashboard makes the work visible, which is the thing that lets me trust the agent long enough to walk away from the keyboard.

The design choices are conservative on purpose. A hand-rolled MCP server with zero runtime dependencies. An append-only JSONL log instead of a database. Files for memory. SSE on node:http instead of a framework. A byte-count budget that is honest about being approximate. Local-only by construction. MIT-licensed. If it phones home, it is not slipstream.

Roadmap items are explicit and small: a compaction timeline on the dashboard, an optional per-agent diff view, a shareable HTML session artifact in the same shape as the mind map artifact. Non-roadmap items are equally explicit: no hosted version, no telemetry, no accounts.

Appendix A · Configuration

The full reference lives in the wiki under Configuration and Tuning. The minimum useful set:

.claude/slipstream/dashboard.json → { "enabled": true, "autoOpen": false }. Per-project dashboard toggle.
SLIPSTREAM_DASHBOARD=0 → per-session disable.
SLIPSTREAM_DASHBOARD_OPEN=0 → per-session keep the browser shut.
.claude/slipstream/memory/MEMORY.md → regenerated index; commit if you want to share durable facts across the team.
output-styles/slipstream.md → enable with /output-style slipstream to spend fewer tokens per turn.
/slipstream:doctor → PASS / FAIL per check across the whole install.
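One way to read the dashboard settings above is as a project file with per-session environment overrides. The sketch below resolves them under that reading. The precedence (environment over file) and the defaults are assumptions, since the text gives only the keys, the variable names and the meaning of the value 0.

```typescript
interface DashboardSettings {
  enabled: boolean;
  autoOpen: boolean;
}

// Assumed defaults for the sketch; the real defaults are documented in the wiki.
const DEFAULTS: DashboardSettings = { enabled: true, autoOpen: true };

// fileConfig: parsed contents of .claude/slipstream/dashboard.json (may be partial).
// env: the session's environment variables.
function resolveDashboardSettings(
  fileConfig: Partial<DashboardSettings>,
  env: Record<string, string | undefined>,
): DashboardSettings {
  const merged = { ...DEFAULTS, ...fileConfig };
  return {
    // SLIPSTREAM_DASHBOARD=0 disables the dashboard for this session.
    enabled: env["SLIPSTREAM_DASHBOARD"] === "0" ? false : merged.enabled,
    // SLIPSTREAM_DASHBOARD_OPEN=0 keeps the browser shut for this session.
    autoOpen: env["SLIPSTREAM_DASHBOARD_OPEN"] === "0" ? false : merged.autoOpen,
  };
}

const fromFile = resolveDashboardSettings({ enabled: true, autoOpen: false }, {});
const sessionOff = resolveDashboardSettings({ enabled: true, autoOpen: true }, { SLIPSTREAM_DASHBOARD: "0" });
```

Under this reading the environment variables can only switch things off for one session; they never re-enable something the project file has disabled.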

Appendix B · Operator Checklist

Run this once after install, and again whenever the plugin updates.

Add the marketplace and install the plugin.
Open the project. Build the map once with /slipstream:map.
Run /slipstream:doctor and confirm every line reports PASS.
Open the dashboard URL printed in chat. Confirm the dashboard views render.
Trigger a tool call. Confirm an event lands in .claude/slipstream/dashboard/<session>.jsonl and the activity stream updates.
Write a durable fact with /slipstream:remember. Confirm a file appears under .claude/slipstream/memory/ and the index is regenerated.
Force a compaction. Confirm a digest is written to memory and reloaded on the next session.
Decide what to commit. By default the whole .claude/slipstream/ tree is git-ignored and per-developer.

For the full operator guide, see the wiki: Install in VS Code, Run it in any IDE, Troubleshooting, Roadmap and Limitations.

What ships in v1.0

v1.0 was tagged on 6 June 2026 after eleven smaller releases in two months. The shape of the runner is unchanged; what is new is the dashboard, the cross-tab coordination and the steering that makes the per-read savings observable in practice.

React dashboard with nine routed views. A Vite + React + TypeScript single-page app in web/ replaces the server-rendered page. Sidebar grouped into Now (Live activity, Said and done, Full conversation), History (Daily journal, Sessions) and Knowledge (Project stats, Memory, Memory graph, Code map). A design-token system, a typed client over the existing JSON API, and an interactive click-through knowledge graph. Builds to dist/dashboard/web and serves from the same node:http server.
Interactive code dependency graph. A d3 force-directed view: files as nodes, imports as edges, with zoom, pan, drag, search, area colouring and god nodes ringed in white. Click any node to read its imports and importers. Also available as slipstream graph on the CLI and as /api/codegraph over HTTP, so Claude can read the structure too.
Cross-tab agent bus. Multiple Claude Code tabs open on one project now coordinate. Each posts its open thread, files in flight and last tool to a shared local bus at turn start and on every file tool; every session sees the others at start and on each prompt, scoped to tabs active in the last few minutes. Turn-boundary coordination, not live mid-turn messaging (the platform does not allow that).
Cold-start knowledge feed. Every session opens with a freshly built bounded brief injected by the SessionStart hook: what the project is, how it is organised, the most-connected files to read first, what was recently asked and what is remembered. No session starts ignorant of the app.
Reproducible token-savings benchmark. pnpm benchmark runs scripts/benchmark-token-savings.mjs against real files and emits a Markdown table comparing whole-file reads with scoped symbol reads. The averages produce the ~95% per-read figure; anyone can regenerate it.
Dollar cost of tokens saved. Scoped-read savings now show as a money figure on the Overview and Live tabs, with the assumed per-million rate stated so the number is honest.
Memory doctor. A terminal health check for the memory store: total, duplicates, stale and a by-type breakdown. Exits non-zero when the store needs attention so a script can gate on it.
Downloadable session reports. A session can be exported as a Markdown document: the said-to-did story plus a summary. The honest version of team sharing.
Insights band. Every data tab opens with a natural-language paragraph plus three to five bullets describing the view, not just tabulating it. Deterministic templates, zero LLM, fully reproducible.
Project knowledge brief. slipstream brief dumps everything slipstream knows about a project into one Markdown document, available as a CLI, an Overview download button and the /api/brief endpoint, so a fresh session can pick the project up cold.
Forceful steering. A using-slipstream skill plus a per-turn hook reminder insist on scoped reads and turn-end memory writes, so the optimisation tally actually climbs and memory grows constantly. The discipline that makes the per-read savings real.
75-skill methodology library. A growing methodology and design library: planning patterns, testing, debugging, design tokens and ops, exposed as MCP prompts so they show up in Cursor and Windsurf alongside Claude Code.

Six editor install paths: Claude Code, Cursor, Windsurf, Antigravity, VS Code with MCP and JetBrains via MCP. 321 tests across 47 files. CI green. 127.0.0.1 only, no telemetry, no cloud.

Slip into the stream

MIT-licensed, local-only, no telemetry. Five minutes from install to your first slice.