On 6 June 2026 I tagged slipstream v1.0.0. It is the toolkit that grew out of nearly two years of using Claude as my daily coding partner, distilled into an open-source MCP plugin that works across six editors. This is the long post on what it is, why it exists, how every part of it actually works, and the honest read on where it sits in a market full of AI coding tools.
If you only want the headline: slipstream is the observability and memory layer that sits underneath whichever AI coding agent you already use. It does not have its own chat. It does not lock you into one editor. It runs as an MCP server inside Claude Code, Cursor, Windsurf, Antigravity, VS Code with MCP, and JetBrains with MCP, and it makes whatever agent you have running a bit better at remembering, a lot cheaper to run, and visible to you on a local dashboard.
The repo is at github.com/sarmakska/slipstream[1]. MIT licensed, 127.0.0.1 only, no telemetry, no account, no cloud. 321 tests, 47 test files, CI green.
Why slipstream exists
A long agent session usually dies one of two ways. Either it reads whole files until the context window is full and starts forgetting the start of its own plan, or it does good work and then the session ends and every decision it made evaporates. The first kind is token waste. The second is state loss. Slipstream is built around two enforced habits that fix both: read a compact project map and pull a single slice instead of opening whole files, and write durable facts to a structured store that survives a compaction.
There is a third, quieter failure mode I noticed in my own work: the agent reads files it has already seen. It re-orients on a codebase it just navigated. It plans against a file structure it just memorised. Most of this re-reading is invisible because nobody measures it. Slipstream measures it. The dashboard tells you when the agent's last three sessions all read the same six files. The instrumentation is the discipline.
Before any code, the design constraints:
- Local-first. Every byte stays on the developer machine. The dashboard binds 127.0.0.1, never an external interface. No telemetry, no analytics endpoint, no phone home. Trust the code or run it on an air-gapped laptop, your choice.
- Editor-neutral. One MCP server, six editor install paths. Whatever editor you actually use, slipstream is the same tool.
- Observable. Everything is an append-only JSONL event log per session. Replay is the same fold as live. The dashboard is a view over the log, not a separate database.
- No new runtime dependencies. The MCP path uses node:http and the standard library. No Express, no ws, no transitive surface that can break a plugin install.
Figure: per-read token cost, scoped read versus whole-file read. Source: pnpm benchmark on three real files in this repository.
That chart shows the headline. On three real files in this repository, the average scoped read costs around 5 percent of the whole-file read. That is what pnpm benchmark produces today, and it is reproducible from a clean clone in under a minute.
The architecture, end to end
Slipstream is five components that share one event log. None of them know about each other. The log is the seam.
One. The MCP server. A bundled node:http JSON-RPC server that exposes sixteen sp_ tools the agent calls instead of reading whole files. sp_map returns a compact index of files, exported symbols and purpose. sp_symbol returns the body of one named declaration. sp_lines returns a slice. sp_search returns ranked locations. sp_remember and sp_recall are the durable memory store. sp_search_memory, sp_timeline and sp_observations are the three layers of recall. sp_budget reports the token gauge. sp_digest and sp_resume write and restore a lossless session digest for editors without PreCompact hooks. sp_savings reports the optimisation tally. sp_dashboard opens the live URL. sp_mindmap and sp_lessons round out the surface.
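To make the server description concrete, here is a minimal sketch of a JSON-RPC 2.0 dispatcher on node:http, bound to 127.0.0.1. It assumes an MCP-style tools/call request and a hypothetical sp_lines that takes the text directly rather than a file path, so the sketch stays self-contained; slipstream's real handlers, argument names and transport details are not shown in this post and will differ.

```typescript
import { createServer } from "node:http";

// Only the JSON-RPC 2.0 fields this sketch needs.
interface RpcRequest {
  jsonrpc: "2.0";
  id: number | string | null;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

interface RpcResponse {
  jsonrpc: "2.0";
  id: number | string | null;
  result?: { content: { type: "text"; text: string }[] };
  error?: { code: number; message: string };
}

// Hypothetical sp_lines: lines start..end of a text, 1-indexed and inclusive.
function spLines(args: Record<string, unknown>): string {
  const text = String(args.text ?? "");
  const start = Number(args.start ?? 1);
  const end = Number(args.end ?? start);
  return text.split("\n").slice(start - 1, end).join("\n");
}

// A Map avoids accidentally resolving inherited names such as "toString".
const tools = new Map<string, (args: Record<string, unknown>) => string>([["sp_lines", spLines]]);

// Pure dispatcher, testable without opening a socket.
export function handleRpc(req: RpcRequest): RpcResponse {
  if (req.method !== "tools/call") {
    return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: "Method not found" } };
  }
  const name = req.params?.name ?? "";
  const tool = tools.get(name);
  if (!tool) {
    return { jsonrpc: "2.0", id: req.id, error: { code: -32602, message: `Unknown tool: ${name}` } };
  }
  const text = tool(req.params?.arguments ?? {});
  // MCP-style tool result: an array of typed content parts.
  return { jsonrpc: "2.0", id: req.id, result: { content: [{ type: "text", text }] } };
}

// Loopback only, never an external interface.
export function startServer(port: number) {
  const server = createServer((req, res) => {
    let body = "";
    req.on("data", (chunk) => (body += chunk));
    req.on("end", () => {
      let reply: RpcResponse;
      try {
        reply = handleRpc(JSON.parse(body) as RpcRequest);
      } catch {
        reply = { jsonrpc: "2.0", id: null, error: { code: -32700, message: "Parse error" } };
      }
      res.writeHead(200, { "content-type": "application/json" });
      res.end(JSON.stringify(reply));
    });
  });
  server.listen(port, "127.0.0.1");
  return server;
}
```

Keeping the dispatcher a pure function, separate from the socket, is what lets it be unit-tested without binding a port.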
Two. The event log. A JSONL file per session at . Every lifecycle event (session-start, user-prompt, pre-tool, post-tool, stop) appends one line. Concurrency-safe under 25 parallel writers (there is a test for that). The log is the source of truth.
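A sketch of the append-and-read cycle for a per-session JSONL log, under stated assumptions: the payload fields (ts, tool, file, text) and the temp-directory path are hypothetical, and the sketch makes no attempt to reproduce whatever slipstream does to stay safe under 25 parallel writers.

```typescript
import { appendFileSync, mkdtempSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Lifecycle event kinds named in the post; the other fields are assumptions.
type EventKind = "session-start" | "user-prompt" | "pre-tool" | "post-tool" | "stop";

interface LogEvent {
  ts: number;
  kind: EventKind;
  tool?: string;
  file?: string;
  text?: string;
}

// One JSON object per line. appendFileSync opens the file with the "a" flag,
// so every write is positioned at the current end of the file.
export function appendEvent(logPath: string, event: LogEvent): void {
  appendFileSync(logPath, JSON.stringify(event) + "\n", "utf8");
}

// Read the whole log back, skipping blank lines such as the trailing newline.
export function readEvents(logPath: string): LogEvent[] {
  return readFileSync(logPath, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as LogEvent);
}

// Usage: a hypothetical per-session log file in a fresh temp directory.
const dir = mkdtempSync(join(tmpdir(), "sp-log-"));
export const demoLog = join(dir, "session-demo.jsonl");
appendEvent(demoLog, { ts: 1, kind: "session-start" });
appendEvent(demoLog, { ts: 2, kind: "pre-tool", tool: "sp_symbol", file: "src/a.ts" });
appendEvent(demoLog, { ts: 3, kind: "stop" });
```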
Three. The observation memory. A pure function foldObservations reads the event log and materialises observations: one entry per turn boundary, with the tools used, files touched, summary and an embedding. The store is . Search is three layers: a local 256-float term-frequency embedding for semantic recall, exact tag match, and chronological fallback. No external model, no network call.
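The post does not spell out how the 256-float embedding is built, so the following is one plausible construction, not slipstream's: hash each lowercase term into one of 256 buckets with FNV-1a, count, L2-normalise, and compare with a dot product. The layer order (semantic, then exact tag, then most recent), the 0.2 similarity threshold and the result limit are all assumptions.

```typescript
const DIM = 256;

// FNV-1a 32-bit hash, used to bucket each term into one of 256 slots.
function fnv1a(term: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < term.length; i++) {
    h ^= term.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Term-frequency vector: count hashed terms, then L2-normalise.
export function embed(text: string): Float32Array {
  const v = new Float32Array(DIM);
  for (const term of text.toLowerCase().match(/[a-z0-9_]+/g) ?? []) {
    v[fnv1a(term) % DIM] += 1;
  }
  let norm = 0;
  for (let i = 0; i < DIM; i++) norm += v[i] * v[i];
  norm = Math.sqrt(norm);
  if (norm > 0) for (let i = 0; i < DIM; i++) v[i] /= norm;
  return v;
}

// Vectors are unit length (or all zero), so the dot product is the cosine similarity.
export function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  for (let i = 0; i < DIM; i++) dot += a[i] * b[i];
  return dot;
}

interface Observation {
  ts: number;
  summary: string;
  tags: string[];
  vec: Float32Array;
}

// Three layers, falling through when a layer finds nothing.
export function recall(query: string, obs: Observation[], limit = 3): Observation[] {
  const q = embed(query);
  const semantic = obs
    .map((o) => ({ o, score: cosine(q, o.vec) }))
    .filter((s) => s.score > 0.2)
    .sort((x, y) => y.score - x.score)
    .map((s) => s.o);
  if (semantic.length > 0) return semantic.slice(0, limit);
  const tagged = obs.filter((o) => o.tags.includes(query));
  if (tagged.length > 0) return tagged.slice(0, limit);
  // Chronological fallback: most recent first.
  return [...obs].sort((x, y) => y.ts - x.ts).slice(0, limit);
}
```

Hash collisions make this a lossy bag of words, which is the trade you accept for needing no model and no network call.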
Four. The dashboard. As of v1.0 a Vite + React + TypeScript single-page app with nine routed views. Sidebar grouped into Now (Live activity, Said and done, Full conversation), History (Daily journal, Sessions) and Knowledge (Project stats, Memory, Memory graph, Code map). A design-token system, a typed client over the existing JSON API, an interactive d3 code dependency graph. Builds to dist/dashboard/web and serves from the same node:http server. React, Vite and d3 are devDependencies bundled to static assets, so the runtime dependency story is unchanged.
Five. The skills and hooks. Seventy-five guardrailed skills, including six discipline skills (think-before-coding, write-plan, systematic-debugging, scoped-read, context-budget, compact-and-offload) that are exposed as MCP prompts in any editor. Lifecycle hooks emit the events the log absorbs. The PreCompact hook writes a lossless digest before the context window is compacted; SessionStart loads the cold-start knowledge feed.
The five components communicate only through the event log. That is what makes replay free: the dashboard rendering a live session and the same dashboard rendering yesterday's session are the same fold over the same log shape.
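The "replay is free" claim follows from making the fold a pure reducer. A minimal sketch with hypothetical event and observation fields: a live tail calls step once per new line, a replay calls reduce over the stored lines, and both end in the same state. Embeddings and summaries are left out.

```typescript
type Kind = "session-start" | "user-prompt" | "pre-tool" | "post-tool" | "stop";

interface LogEvent {
  kind: Kind;
  tool?: string;
  file?: string;
  text?: string;
}

interface Observation {
  prompt: string;
  tools: string[];
  files: string[];
}

interface State {
  current: Observation | null;
  observations: Observation[];
}

export const initial: State = { current: null, observations: [] };

// One pure step; the same reducer serves a live tail and a replay.
export function step(state: State, e: LogEvent): State {
  switch (e.kind) {
    case "user-prompt":
      return { ...state, current: { prompt: e.text ?? "", tools: [], files: [] } };
    case "pre-tool": {
      if (!state.current) return state;
      const cur = state.current;
      return {
        ...state,
        current: {
          ...cur,
          tools: e.tool && !cur.tools.includes(e.tool) ? [...cur.tools, e.tool] : cur.tools,
          files: e.file && !cur.files.includes(e.file) ? [...cur.files, e.file] : cur.files,
        },
      };
    }
    case "stop":
      // Turn boundary: close the open turn into an observation.
      return state.current
        ? { current: null, observations: [...state.observations, state.current] }
        : state;
    default:
      return state;
  }
}

// Replay is just reduce over the stored events.
export const replay = (events: LogEvent[]): State => events.reduce(step, initial);
```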
The token math, and a benchmark anyone can run
Here is the part I get most pushback on, so it is in code.
When the agent calls sp_symbol on a 28 KB file to pull one 980-byte function, the cost is the function plus a small header. When the agent calls Read on the same file, the cost is 28 KB. The math is not subtle. The honest question is whether agents actually use sp_symbol instead of Read when both are available.
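The arithmetic, written out. This sketch assumes roughly four bytes per token, a common rule of thumb rather than a real tokenizer, and a 120-byte header as a placeholder for the "small header"; neither number comes from slipstream.

```typescript
// Rule-of-thumb conversion, not a real tokenizer: roughly 4 bytes per token.
const BYTES_PER_TOKEN = 4;

export const approxTokens = (bytes: number): number => Math.ceil(bytes / BYTES_PER_TOKEN);

// Scoped read (symbol body plus header) as a fraction of a whole-file read.
export function scopedFraction(symbolBytes: number, headerBytes: number, fileBytes: number): number {
  return approxTokens(symbolBytes + headerBytes) / approxTokens(fileBytes);
}

// The example from the text: one 980-byte function in a 28 KB file.
// The 120-byte header is a placeholder, not a measured value.
export const scopedTokens = approxTokens(980 + 120); // 275
export const wholeTokens = approxTokens(28 * 1024); // 7168
export const percent = (scopedFraction(980, 120, 28 * 1024) * 100).toFixed(1); // "3.8"
```

Because both sides are divided by the same bytes-per-token figure, the ratio barely depends on that heuristic: about 275 tokens against 7,168, or 3.8 percent, in line with the per-read range this post reports.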
In practice they need to be told. The using-slipstream skill that ships in v1.0 plus a per-turn hook reminder are the discipline that makes this real. Without that discipline the agent reaches for the familiar tool. With it the optimisation tally actually climbs.
The benchmark script is at scripts/benchmark-token-savings.mjs, accessible via pnpm benchmark from a clean clone. It picks three representative files, runs both scenarios on every exported symbol, and emits a Markdown table. The averages it produced today are what feed the chart above.
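The script itself is not reproduced here. As a hedged sketch of its output stage only, this takes already-measured pairs (a symbol's bytes versus its whole file's bytes) and prints a Markdown table with an unweighted average row; the column names and the Row type are assumptions, and the real script's extraction of exported symbols is not shown.

```typescript
// One measured pair: a symbol's bytes versus its whole file's bytes.
interface Row {
  file: string;
  symbol: string;
  scopedBytes: number;
  wholeBytes: number;
}

const pct = (r: Row): number => (100 * r.scopedBytes) / r.wholeBytes;

// Markdown table with one row per symbol and an unweighted average row.
export function toMarkdown(rows: Row[]): string {
  const lines = [
    "| File | Symbol | Scoped (bytes) | Whole file (bytes) | Scoped / whole |",
    "|---|---|---|---|---|",
  ];
  for (const r of rows) {
    lines.push(`| ${r.file} | ${r.symbol} | ${r.scopedBytes} | ${r.wholeBytes} | ${pct(r).toFixed(1)}% |`);
  }
  if (rows.length > 0) {
    const avg = rows.reduce((sum, r) => sum + pct(r), 0) / rows.length;
    lines.push(`| **Average** | | | | ${avg.toFixed(1)}% |`);
  }
  return lines.join("\n");
}

// Usage with the single example from the text; the file and symbol names are placeholders.
export const demoTable = toMarkdown([
  { file: "example.ts", symbol: "exampleFn", scopedBytes: 980, wholeBytes: 28 * 1024 },
]);
```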
The number you should trust is the per-read figure. Per-read, a scoped read costs around 3 to 5 percent of a whole-file read. The end-to-end session figure depends on how often the agent re-reads, how often it pulls files it does not need, and how often it uses the recall path instead of re-reading from disk. Slipstream's discipline tries to push all three the right way (fewer re-reads, fewer unneeded files, more recall); the benchmark only proves the floor.
The dashboard, view by view
The nine routed views in v1.0:
Overview. The plain-English answer to "what is this project?": the output of slipstream brief, made downloadable, plus the headline savings, the dollar cost of tokens saved and the memory health line.
Live activity. The current session in real time: agents list, filterable activity timeline, plan, inline mind map, per-skill activity panel, in-place token budget editor. Each tab opens with the new insights band: one natural-language paragraph plus three to five bullets describing the view rather than only tabulating it. The band is generated deterministically from existing observation data; no LLM, no hallucination.
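"Deterministic, no LLM" is easiest to see in code. A sketch under assumptions: the observation shape and the sentence templates are hypothetical; the point is that the band is a pure function of counts, so the same observations always yield the same paragraph and bullets.

```typescript
interface Obs {
  kind: string; // e.g. "edit", "search", "run"
  tool: string;
  file?: string;
}

interface Insight {
  paragraph: string;
  bullets: string[];
}

const plural = (n: number, word: string): string => `${n} ${word}${n === 1 ? "" : "s"}`;

// [key, count] pairs, highest count first, ties broken by plain string order.
function ranked(keys: string[]): [string, number][] {
  const counts = new Map<string, number>();
  for (const k of keys) counts.set(k, (counts.get(k) ?? 0) + 1);
  return Array.from(counts.entries()).sort(
    (a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0),
  );
}

// No model: fixed templates filled from counts.
export function insights(obs: Obs[]): Insight {
  if (obs.length === 0) {
    return { paragraph: "Nothing has been recorded in this view yet.", bullets: [] };
  }
  const files = ranked(obs.filter((o) => o.file !== undefined).map((o) => o.file as string));
  const tools = ranked(obs.map((o) => o.tool));
  const edits = obs.filter((o) => o.kind === "edit").length;
  const paragraph =
    `The agent recorded ${plural(obs.length, "step")} across ${plural(files.length, "file")}, ` +
    `${edits} of them edits. Most of the work went through ${tools[0][0]}.`;
  const bullets = files
    .slice(0, 3)
    .map(([f, n]) => `${f} was touched ${plural(n, "time")}.`)
    .concat(`${plural(tools.length, "distinct tool")} in use.`);
  return { paragraph, bullets };
}
```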
Said and done. The said-to-did timeline of every prompt and every tool call, grouped by turn. The honest record of what the agent actually did versus what you asked.
Full conversation. Prompts, replies, the lot, scrollable.
Daily journal. Per-day digest with the new detailed summary: observation counts, session counts, file counts, tool counts, top files, tools used, the sessions involved, with prev / today / next navigation.
Project stats. Six project-wide KPIs (sessions, observations, unique files, opt %, memories, drift) plus the 365-day activity heatmap, file leaderboard, kinds donut and distilled lessons grid.
Sessions. A clickable table of every recorded session. A click opens the session: prompts, tool calls, files touched, exchanges and failures, the full timeline, a downloadable Markdown report. The dashboard stops feeling empty.
Memory. Full-project observation search with kind filter chips (edit, plan, decision, search, map, error, run), colour-coded result badges, click to expand the full detail.
Memory graph. The previous knowledge graph: files on an outer ring sized by how often they were touched, sessions on an inner ring, an edge wherever a session changed a file. Navigate by relationship, not by list.
Code map. The new interactive d3 code dependency graph: files as nodes, imports as edges, force-directed with zoom, pan, drag, search, area colouring and god nodes ringed in white. Click any node to read its imports and importers in a side panel. The same graph is available as slipstream graph on the CLI and /api/codegraph over HTTP, so Claude itself can read the structure.
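One way to find "god nodes" is to rank files by total degree in the import graph. The sketch below assumes a toy in-memory repo of .ts files and uses a deliberately naive regex that only sees relative `from "./x"` specifiers; slipstream's actual parser, resolution rules and god-node threshold are not described in this post.

```typescript
// File path -> source text: a toy in-memory repo for the sketch.
type Sources = Record<string, string>;

// Deliberately naive: only relative specifiers in `from "./x"` or `from "../x"` form.
const IMPORT_RE = /from\s+["'](\.{1,2}\/[^"']+)["']/g;

// Resolve "../b" against "src/ui/c.ts" to "src/b.ts"; assumes .ts files and no index files.
function resolve(fromFile: string, spec: string): string {
  const parts = fromFile.split("/").slice(0, -1);
  for (const seg of spec.split("/")) {
    if (seg === "..") parts.pop();
    else if (seg !== ".") parts.push(seg);
  }
  const path = parts.join("/");
  return path.endsWith(".ts") ? path : `${path}.ts`;
}

// Directed edges [importer, imported] between files that exist in the repo.
export function importEdges(src: Sources): [string, string][] {
  const edges: [string, string][] = [];
  for (const file of Object.keys(src)) {
    const re = new RegExp(IMPORT_RE.source, "g");
    let m: RegExpExecArray | null;
    while ((m = re.exec(src[file])) !== null) {
      const target = resolve(file, m[1]);
      if (Object.prototype.hasOwnProperty.call(src, target)) edges.push([file, target]);
    }
  }
  return edges;
}

// God nodes: files with the highest degree (imports plus importers).
export function godNodes(src: Sources, top = 3): string[] {
  const degree = new Map<string, number>();
  for (const f of Object.keys(src)) degree.set(f, 0);
  for (const [a, b] of importEdges(src)) {
    degree.set(a, (degree.get(a) ?? 0) + 1);
    degree.set(b, (degree.get(b) ?? 0) + 1);
  }
  return Array.from(degree.entries())
    .sort((x, y) => y[1] - x[1] || (x[0] < y[0] ? -1 : x[0] > y[0] ? 1 : 0))
    .slice(0, top)
    .map(([f]) => f);
}
```

Degree is the simplest centrality measure; it rewards files that many others import, which matches the "read these first" use in the cold-start feed.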
Cross-tab agent bus
One of the more useful additions in v1.0 is the cross-tab agent bus. If you, like me, end up with three Claude Code tabs open on the same project (one drafting the spec, one writing the code, one writing tests), the tabs used to be blind to each other. Each one duplicated work the others had already done.
The bus changes that. At turn end, each session posts its open thread and files in flight to a shared local bus at . At session start, and on each prompt, every session sees the others' state. The platform does not allow live mid-turn messaging across processes, so the bus uses turn-boundary coordination. That is enough to stop the duplication without complicating the lifecycle.
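A sketch of turn-boundary coordination through one local file. It assumes an append-only JSONL bus (mirroring the event log) where each session posts its open thread and files in flight at turn end, and readers fold to the latest post per other session; the field names and the temp-file path are hypothetical.

```typescript
import { appendFileSync, existsSync, mkdtempSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// What one tab announces at the end of a turn.
interface BusPost {
  session: string;
  ts: number;
  thread: string; // the open thread, in one line
  files: string[]; // files in flight
}

// Turn end: append this session's latest state.
export function post(busPath: string, p: BusPost): void {
  appendFileSync(busPath, JSON.stringify(p) + "\n", "utf8");
}

// Session start or new prompt: latest post from every session except our own.
export function peers(busPath: string, self: string): BusPost[] {
  if (!existsSync(busPath)) return [];
  const latest = new Map<string, BusPost>();
  for (const line of readFileSync(busPath, "utf8").split("\n")) {
    if (!line.trim()) continue;
    const p = JSON.parse(line) as BusPost;
    const prev = latest.get(p.session);
    if (p.session !== self && (!prev || p.ts >= prev.ts)) latest.set(p.session, p);
  }
  return Array.from(latest.values());
}

// Usage with a hypothetical bus file in a temp directory.
const bus = join(mkdtempSync(join(tmpdir(), "sp-bus-")), "bus.jsonl");
post(bus, { session: "spec", ts: 1, thread: "drafting API spec", files: ["docs/api.md"] });
post(bus, { session: "code", ts: 2, thread: "implementing handler", files: ["src/api.ts"] });
post(bus, { session: "spec", ts: 3, thread: "spec done, reviewing", files: [] });
export const seenByTests = peers(bus, "tests");
```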
The bus is a local file. No network, no message queue, no daemon. Five tests cover src/memory/bus.ts. The simplest thing that solves the actual problem.
Cold-start knowledge feed
Every new session used to start ignorant of the project. You would prompt "carry on from where we left off" and the agent would re-orient by reading files it had read last time. The cold-start knowledge feed fixes that.
The SessionStart hook now injects a freshly built, bounded knowledge feed at the start of every session: what the project is (from the brief), how it is organised (from the code dependency graph), the most-connected files to read first (god nodes in the graph), what was recently asked (recent observations), and what is remembered (the durable memory store). The feed is capped in length so it does not blow the context budget; the cap is configurable.
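A sketch of the bounded feed, assuming the cap is counted in characters (slipstream may well count tokens) and that sections arrive in the priority order the paragraph lists: whole sections are kept while they fit, the first one that does not fit is truncated with an ellipsis, and the rest are dropped. The Section type and the heading format are hypothetical.

```typescript
interface Section {
  title: string;
  body: string;
}

// Keep whole sections while they fit; truncate the first one that does not
// (marking the cut with an ellipsis) and drop everything after it.
export function buildFeed(sections: Section[], cap: number): string {
  let out = "";
  for (const s of sections) {
    const block = `## ${s.title}\n${s.body.trim()}\n\n`;
    if (out.length + block.length <= cap) {
      out += block;
      continue;
    }
    const room = cap - out.length;
    if (room > 0) out += block.slice(0, room - 1) + "\u2026";
    break;
  }
  return out;
}
```

Ordering by priority means a tight cap sacrifices recent observations and memory before it touches the brief.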
The result is that no session starts cold. The agent walks in with the brief already in mind.
A 75-skill methodology library
Skills started as my discipline (think-before-coding, write-plan, systematic-debugging) and grew into a methodology and design library: planning patterns, testing patterns, design tokens, ops checklists, debugging playbooks. Seventy-five skills as of v1.0, exposed as MCP prompts in any editor so they show up in Cursor and Windsurf as well as Claude Code.
The using-slipstream skill is the one that makes the savings real. It tells the agent how to use the rest of slipstream: scoped reads over whole-file reads, recall before re-read, write durable memory at turn end, surface the budget gauge before a big operation. That skill plus a per-turn hook reminder is the discipline that closes the gap between the theoretical token savings and the savings you actually observe.
Honest positioning, the competitive matrix
The AI coding tools market is crowded and most of the oxygen sits in two buckets: native agents (Cursor, Cline, Continue, Aider) and observability platforms (LangSmith, Helicone, Langfuse). Slipstream is neither, which is both the opportunity and the marketing problem.
| Tool | Editor coverage | Local-only | Cross-session memory | Cost tracking | Code graph |
|---|---|---|---|---|---|
| slipstream | Six editors (MCP) | Yes | Yes | Yes (dollars) | Yes (d3 SPA) |
| Cline | VS Code only | No | Per-session only | No | No |
| Continue | VS Code, JetBrains | No | Optional | No | No |
| Aider | Terminal only | Yes | No | No | No |
| LangSmith | N/A (cloud) | No | Trace-level | Yes | No |
| Helicone | N/A (cloud proxy) | No | No | Yes | No |
The bets in this matrix are visible: cross-editor wins versus single-editor; local-only wins versus cloud; cross-session memory and code graph win against tools that only see the current trace. The trade-offs are equally visible: slipstream has no chat sidebar (deliberately), no hosted view (deliberately), no team-shareable rollup (deliberately, for now), and lower star adoption than Cline or Continue (because it is two months old).
The thesis under the matrix: agents will become commodities, and the layer below them is where durable value sits. If that turns out to be wrong, slipstream is a useful hobby project. If it turns out to be right, the work compounds for years.
The journey from v0.1 to v1.0
A short narrative on each release, because some of them changed how I thought about the project:
- v0.1 to v0.3 (April 2026). The token efficiency story. Two scoped tools (sp_map, sp_lines), then PreCompact compaction with a memory store, then a statusline with a budget gauge. Eighteen tests grew to seventy-one. The bet was that scoped reads were worth shipping by themselves.
- v0.4 (May 2026). The first local dashboard. A 127.0.0.1 server with SSE, a session selector, a Mermaid mind map. The visuals were utilitarian. It was enough to know that I wanted to live in a dashboard.
- v0.5 (May 2026). Self-building observation memory. Tool calls became observations automatically. A local 256-float embedding made semantic recall work without an external model. Three-layer search (sp_search_memory, sp_timeline, sp_observations) shipped.
- v0.5.1 (late May). A bug-fix release that ended up mattering more than I expected: the Windows path-separator fix in the MCP entry guard, the true context tokens read from the host transcript, and the universal opt percentage. That release was the foundation for the cross-IDE story.
- v0.6.0 to v0.6.1 (4 June). Cross-IDE parity in one shipping day. sp_digest and sp_resume brought lossless compaction to Cursor, Windsurf and Antigravity. slipstream-setup wired every editor idempotently. The dashboard was redesigned around a KPI strip, sparklines, an inline-SVG mind map, a pause control and a per-skill activity panel. Then /api/health and the version-aware restart followed in the patch.
- v0.7.0 to v0.7.2 (4 to 5 June). A tabbed dashboard (Live, Project, Journal, Sessions, Memory), the Windows hook telemetry fix, and the MCP-only observation memory fix that finally made the memory populate in editors without hooks.
- v0.8.0 to v0.31.0 (6 June, this morning). The insights band turned tabs from data viewers into sentence generators. The reproducible benchmark turned "70 percent token savings" into a verifiable number. The project knowledge brief, the memory doctor, the downloadable session reports, the dollar cost of tokens saved, the React dashboard, the interactive code dependency graph, the cold-start knowledge feed, the grouped sidebar with plainer names, the clickable Sessions tab. Eleven minor releases in one Saturday.
- v1.0.0 (6 June, this evening). First major release. Cross-tab agent bus. Forceful steering. Why call it 1.0: every piece of the system finally agrees that the user should be inside the dashboard and the agent should be inside the discipline. 321 tests, CI green.
Where it does not go (deliberately)
Things people ask for that I am not going to ship.
- A chat sidebar inside the editor. Cline, Continue and Cursor have this market. Slipstream is the layer below.
- A hosted cloud version of the dashboard. Local-first is the trust story. A hosted version would dilute it.
- Production LLM-call observability. LangSmith and Helicone have this lane. Slipstream is optimised for the coding agent loop, not generic LLM tracing.
- A skill marketplace. Real value, six-week build, wait until the user base supports a marketplace with both supply and demand.
- An OpenTelemetry exporter. Useful for some users, not the headline gap.
How to try it in 90 seconds
```bash
git clone https://github.com/sarmakska/slipstream
cd slipstream
pnpm install && pnpm build
npx slipstream-setup --editor=auto
```
The setup command detects whether you have Claude Code, Cursor, Windsurf, Antigravity, VS Code or JetBrains in the project and writes the correct config. Idempotent. Refuses to double-wire. Open the dashboard with slipstream dashboard start and visit the URL it prints.
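What "idempotent, refuses to double-wire" means can be sketched as a pure merge over an editor's JSON config. This assumes the mcpServers object shape that several MCP clients read; the server name, command and path below are placeholders, not the values slipstream-setup actually writes.

```typescript
// The mcpServers object shape that several MCP clients read from JSON config.
interface ServerEntry {
  command: string;
  args?: string[];
}

interface McpConfig {
  mcpServers?: Record<string, ServerEntry>;
  [key: string]: unknown;
}

// Add the entry only when the name is absent; never overwrite or duplicate.
// Running this twice leaves the config unchanged the second time.
export function wireServer(
  config: McpConfig,
  name: string,
  entry: ServerEntry,
): { config: McpConfig; changed: boolean } {
  const servers = config.mcpServers ?? {};
  if (Object.prototype.hasOwnProperty.call(servers, name)) {
    return { config, changed: false };
  }
  return { config: { ...config, mcpServers: { ...servers, [name]: entry } }, changed: true };
}

// Usage: the command and path are placeholders, not what slipstream-setup writes.
const entry: ServerEntry = { command: "node", args: ["/path/to/slipstream/mcp-server.js"] };
export const first = wireServer({ theme: "dark" }, "slipstream", entry);
export const second = wireServer(first.config, "slipstream", entry);
```

Returning a changed flag instead of writing blindly is what lets a setup tool report "already wired" and leave unrelated keys in the user's config untouched.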
If you prefer the terminal:
```bash
slipstream observe # start the live dashboard and tail the event log
slipstream graph # print the code dependency graph
slipstream brief # dump everything slipstream knows about this project
slipstream doctor # 17 install checks with one-line fixes
slipstream memory doctor # health check for the memory store
pnpm benchmark # regenerate the token-savings figure
```
What I want from this post
Three things.
- Feedback on the dashboard. Open it on your own project. The insights band should read as English; if any sentence feels like data not prose, that is a template bug. Tell me which one.
- A pull request on the benchmark. The current methodology measures per-read efficiency on three files. If you have a better methodology for end-to-end session efficiency, I want it.
- A star on the repo if it is useful. That is the only marketing this project will ever do. It works.
The repo, the wiki and the full whitepaper are linked at sarmalinux.com/products/slipstream and on github.com/sarmakska/slipstream[1]. The competitive positioning document is at STRATEGY.md[2]. The wiki has installation guides for every editor[3]. The MCP specification slipstream implements is at modelcontextprotocol.io[4].
Closest peers worth comparing against on your own work: Cline[5], Continue[6], Aider[7], LangSmith[8] and Helicone[9]. They are all good. They solve adjacent problems. Slipstream chose a different lane on purpose.
Two months from first commit to v1.0. Eleven tagged releases in one Saturday. 321 tests, CI green. The repo is open.
---
A note on this post
The numbers, code excerpts, and commit hashes referenced are real and verifiable in the linked repository. The token-savings figures are produced by the checked-in benchmark script described above and can be regenerated on any clone. Where this post draws on third-party tools or competitors, citations link to the primary source.