Technical Whitepaper · v1.0

MCP Server Toolkit

A plugin-first Model Context Protocol server with OAuth, OpenTelemetry, and stdio and streamable HTTP transports: the production starter.

MIT Licence · Python 3.12 · OAuth 2.1 · OpenTelemetry · Plugin Architecture · Two Transports

4 built-in plugins · 2 transports · < 60 s cold-start · 100 % type-checked

§ Abstract

The Model Context Protocol (MCP) is the most carefully considered tool-use standard the AI ecosystem has produced to date. The official Python SDK exposes the wire format cleanly, and the reference servers do exactly what reference servers should do: demonstrate one concept at a time. They are not, and were never meant to be, production servers.

MCP Server Toolkit fills that gap. It is the consolidation of the boilerplate that every team building MCP-backed assistants writes: OAuth 2.1 with PKCE for hosted clients, sandboxed plugin loading with lifecycle hooks, both stdio and streamable HTTP transports under a single plugin contract, OpenTelemetry tracing for every tool call, and Pydantic-typed configuration. It ships with four production-shaped plugins (filesystem, Postgres, GitHub, SarmaLink-AI) so you can verify end-to-end before you start writing your own.

This whitepaper documents the architectural decisions, the plugin contract, the OAuth flow, the observability pipeline, the implementation milestones we travelled through, and the limits we know about today. The intended audience is engineers planning to host an MCP server in production and wanting a candid description of what they are about to build.

1 · Executive Summary

MCP Server Toolkit is a Python 3.12 server that speaks the Model Context Protocol over either stdio or streamable HTTP. It is built on FastAPI and the official mcp Python SDK. The unique value is the plugin contract: every capability you expose to an MCP client is a small Python package implementing a single Plugin object. The server discovers plugins at boot, validates them, registers their tools, prompts and resources with the runtime, and threads OAuth and OpenTelemetry through the request path.

Four plugins ship in the repository. The filesystem plugin offers sandboxed read, write, list and search over an allow-listed root, with traversal prevention and size caps. The Postgres plugin offers schema introspection and either read-only or read-write query execution, with dangerous keyword detection in read-only mode. The GitHub plugin authenticates with a fine-grained PAT and exposes issues, pull requests, file reads and comments. The SarmaLink plugin calls into SarmaLink-AI as a sub-tool so a calling agent can borrow that project’s thirty-six-engine routing without adopting the rest of the stack.
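The read-only guard mentioned for the Postgres plugin can be sketched as a keyword check over the tokenised statement. This is an illustrative stand-in, not the plugin's actual code; the function name and keyword list are assumptions.

```python
import re

# Hypothetical sketch of a dangerous-keyword guard for read-only mode.
# The keyword set and function name are illustrative.
DANGEROUS_KEYWORDS = {
    "insert", "update", "delete", "drop", "alter",
    "truncate", "grant", "revoke", "create",
}

def check_read_only(sql: str) -> None:
    """Raise if the statement contains a mutating keyword."""
    # Tokenise on word boundaries so identifiers like 'updated_at'
    # do not trip the check.
    tokens = {t.lower() for t in re.findall(r"[a-zA-Z_]+", sql)}
    bad = tokens & DANGEROUS_KEYWORDS
    if bad:
        raise PermissionError(f"read-only mode forbids: {sorted(bad)}")

check_read_only("SELECT id, updated_at FROM users")   # passes
try:
    check_read_only("DELETE FROM users WHERE id = 1")  # rejected
except PermissionError as e:
    print(e)
```

A keyword filter like this is a coarse safety net, not a substitute for a read-only database role; running the read-only pool under a restricted Postgres user is the stronger guarantee.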

The result is that a team can fork the repository, write a single new plugin module, add it to the discovered plugins list, and have a production MCP server in an afternoon. OAuth, telemetry, healthchecks, and CI are already wired.

2 · Background

The Model Context Protocol emerged from the observation that every chatbot, every assistant, every agent ends up reinventing the same three things: a structured way to declare tools, a structured way to fetch context, and a structured way to surface prompts. Each vendor had a private dialect. Tool calls written for one product did not transfer to another. Context windows were filled by ad-hoc string concatenation, with no shared notion of what a “resource” even meant.

MCP standardises three primitives. Tools are functions a model can invoke, with a JSON Schema for arguments and a typed return value. Resources are addressable pieces of context (files, database rows, web pages), retrievable by URI. Prompts are server-defined templates that clients can offer to users. The wire format is JSON-RPC 2.0, transported over stdio for local clients or over streamable HTTP with SSE for hosted ones. The official SDKs handle the wire format. They do not handle authentication, telemetry, plugin lifecycles, or anything else a production server needs.
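The JSON-RPC 2.0 framing described above can be illustrated with a minimal `tools/call` request; the tool name and arguments below are invented for illustration, not taken from the spec's examples.

```python
import json

# A minimal JSON-RPC 2.0 "tools/call" request as an MCP client would send it.
# The tool name and arguments are illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "read_file",
        "arguments": {"path": "README.md"},
    },
}

# Over stdio the message travels as newline-delimited JSON; over streamable
# HTTP the same payload goes in the request body, with responses on SSE.
wire = json.dumps(request)
print(wire)
```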

That gap is where most teams lose two weeks. They build a FastAPI app around the SDK, hard-code their tools as Python functions, hand-roll a config loader, sprinkle a few logger calls, and ship. Then they need OAuth. Then they need a second transport for desktop clients. Then they need to know which tool call is taking a thousand milliseconds. Each addition reshapes the code. The architecture drifts.

3 · Problem in detail

Three concrete pain points motivate the toolkit.

Plugin lifecycle is non-trivial

Tools are not just Python functions. They have configuration that must come from the environment. They have resources that must be acquired at server start (database pools, HTTP clients, cached schemas) and released at shutdown. They have version metadata that needs to surface to clients. Hand-rolling this for each new tool is the kind of work that looks small until you have eight tools.
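The acquire-at-start, release-at-shutdown pattern above is the async context manager lifecycle. The sketch below is illustrative, not the toolkit's actual hook: the pool is a stand-in object where a real plugin would open, say, an asyncpg pool.

```python
import asyncio
from contextlib import asynccontextmanager

# Stand-in resource; a real plugin would acquire a database pool or
# HTTP client here.
class FakePool:
    def __init__(self) -> None:
        self.open = True

    async def close(self) -> None:
        self.open = False

@asynccontextmanager
async def lifecycle(settings: dict):
    pool = FakePool()          # acquired at server start
    try:
        yield {"pool": pool}   # state handed to the plugin's tools
    finally:
        await pool.close()     # released at shutdown, even on error

async def main() -> None:
    async with lifecycle({}) as state:
        assert state["pool"].open
    # after exit the pool is closed

asyncio.run(main())
```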

OAuth must be optional

Desktop MCP clients communicate over stdio and trust the local user. Hosted clients communicate over HTTP and must authenticate. The same tool code must serve both. Wrapping every tool in an auth check is repetitive; routing all tools through middleware loses scope-level granularity. The toolkit splits the difference: a transport-level authenticator runs once per request, then a per-tool scope decorator gates writes from reads.
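The split described above — authenticate once per request, gate per tool — can be sketched as a scope decorator. The names and calling convention here are illustrative, not the toolkit's actual API; in the real server the granted scopes come from the verified token, not a positional argument.

```python
from functools import wraps

# Illustrative per-tool scope gate. The transport layer would resolve
# granted_scopes once per request from the Bearer token (or grant all
# scopes for stdio's local-trust case).
def tool(scope: str):
    def deco(fn):
        @wraps(fn)
        def wrapper(granted_scopes: set[str], *args, **kwargs):
            if scope not in granted_scopes:
                raise PermissionError(f"tool requires scope {scope!r}")
            return fn(*args, **kwargs)
        wrapper.scope = scope
        return wrapper
    return deco

@tool(scope="read")
def list_rows(table: str) -> str:
    return f"SELECT * FROM {table}"

@tool(scope="write")
def drop_table(table: str) -> str:
    return f"DROP TABLE {table}"

print(list_rows({"read"}, "users"))   # allowed: read tool, read scope
try:
    drop_table({"read"}, "users")     # rejected: write tool, read-only token
except PermissionError as e:
    print(e)
```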

Observability is not optional

Tool calls are the load-bearing element of an AI assistant’s reliability. When a user asks “why was my answer wrong”, the answer is almost always traceable to a tool call. Without OpenTelemetry across the server you are debugging by print statement. With it, every span is labelled with tool name, plugin, scope, latency, and outcome, and every log line carries the trace ID that ties them to a specific user prompt.
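The attribute set listed above can be shown with a minimal stand-in recorder. A real deployment uses opentelemetry-sdk and OTLP export; this sketch only demonstrates which labels each tool-call span carries.

```python
import time

# Illustrative span recorder; the real server emits OpenTelemetry spans.
spans: list[dict] = []

def traced_tool_call(plugin: str, tool: str, scope: str, fn, *args):
    start = time.perf_counter()
    outcome = "ok"
    try:
        return fn(*args)
    except Exception:
        outcome = "error"
        raise
    finally:
        # Every call records the attributes named above: tool name,
        # plugin, scope, latency, and outcome.
        spans.append({
            "plugin": plugin,
            "tool": tool,
            "scope": scope,
            "latency_ms": (time.perf_counter() - start) * 1000,
            "outcome": outcome,
        })

traced_tool_call("filesystem", "read_file", "read", lambda p: p.upper(), "readme")
print(spans[0]["outcome"])
```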

4 · Goals + non-goals

Goals

  • One Plugin contract that works for both stdio and streamable HTTP transports.
  • OAuth 2.1 with PKCE, JWKS caching, and scope-based tool gating, optional but built in.
  • OpenTelemetry spans across the request path, with OTLP export configurable via environment variables.
  • Four production-shaped plugins as worked examples. Forking them is the documented path.
  • Cold start under sixty seconds in a 256 MB container.
  • Strict type checking. No untyped function in the public API.

Non-goals

  • A general-purpose plugin marketplace. Plugins are Python packages; choose your packaging.
  • A web UI. The server is headless. Use it from MCP clients.
  • An LLM. The toolkit calls models via the SarmaLink plugin or any client of your choice.
  • Multi-tenant token isolation. This is a single-tenant server. Run multiple instances if needed.

5 · Architecture

The server is a single FastAPI process. On boot, it loads its Pydantic settings, discovers plugins from the configured paths, instantiates each plugin’s lifecycle context, and registers their tools, prompts and resources with the official mcp runtime. From that point onward, request handling proceeds through three layers.

Request                Layer 1                Layer 2              Layer 3
  │                  Transport             OAuth + scope         Plugin tool
  │                stdio | HTTP+SSE        decoder + check         invocation
  │                  │                       │                       │
  ├──── stdio ──────▶│ jsonrpc framer      ▶│ skip if local trust  ▶│ run + emit span
  │                  │                       │                       │
  └──── HTTP ───────▶│ SSE writer          ▶│ verify Bearer + scope▶│ run + emit span

The transport layer normalises both protocols into the same internal Request object. The auth layer is a no-op for stdio (local user is trusted) and an OAuth verifier for HTTP. The plugin layer dispatches the call to the plugin’s tool function, wrapped in an OpenTelemetry span and a structured log. The response is then re-serialised by the transport layer.

The plugin contract is small. A Plugin is a Python class with five attributes: a name, a list of tools, a list of resources, a list of prompts, and an optional async lifecycle context manager. Tools are functions decorated with @tool(scope="read") or @tool(scope="write"). The decorator extracts the type hints into a JSON Schema and registers the function with the runtime.
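The schema extraction performed by the decorator can be sketched from type hints alone. The mapping table and decorator body below are illustrative, not the toolkit's implementation.

```python
import inspect

# Illustrative mapping from Python annotations to JSON Schema types.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool(scope: str = "read"):
    def deco(fn):
        params = inspect.signature(fn).parameters
        # Derive a JSON Schema for the arguments from the type hints,
        # as the contract described above requires.
        fn.mcp_schema = {
            "type": "object",
            "properties": {
                name: {"type": PY_TO_JSON[p.annotation]}
                for name, p in params.items()
            },
            "required": list(params),
        }
        fn.scope = scope
        return fn
    return deco

@tool(scope="read")
def search(path: str, limit: int) -> list[str]:
    return []

print(search.mcp_schema)
```

Because the schema is derived from annotations, the strict-typing goal and the wire contract reinforce each other: an untyped argument simply cannot be registered.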

6 · Key technical decisions

Python 3.12, not 3.11

3.12 brings the per-interpreter GIL groundwork and meaningfully better asyncio performance. The toolkit has no Python 3.11 features that we miss. The mcp SDK supports both. We use 3.12 because it is what we will be running in production for the next two years.

FastAPI, not Starlette directly

The OAuth middleware, the OTLP propagator, and the OpenAPI schema for the management endpoints all benefit from FastAPI’s ecosystem. The few microseconds of overhead per request are invisible next to the LLM round trip the result will eventually feed.

Pydantic Settings for configuration

Twelve-factor config via environment variables, with type checking, defaults, and validation. The settings object is injected into every plugin’s lifecycle context. Secrets are typed as SecretStr so they cannot leak into a log line by accident.
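The secret-masking behaviour described above can be shown with a standard-library stand-in, so the sketch runs without pydantic installed. The real toolkit uses pydantic-settings; the field names here are illustrative.

```python
import os
from dataclasses import dataclass, field

# Minimal stand-in for pydantic's SecretStr: the repr masks the value,
# so an accidental log line cannot leak it.
class SecretStr:
    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        return self._value

    def __repr__(self) -> str:
        return "SecretStr('**********')"

# Stand-in for the Pydantic Settings object: values come from the
# environment with typed defaults. Field names are illustrative.
@dataclass
class Settings:
    otlp_endpoint: str = field(
        default_factory=lambda: os.environ.get("OTLP_ENDPOINT", "http://localhost:4317")
    )
    github_token: SecretStr = field(
        default_factory=lambda: SecretStr(os.environ.get("GITHUB_TOKEN", ""))
    )

s = Settings()
print(s.github_token)   # prints the masked repr, never the secret
```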

uv for packaging

uv builds the venv, resolves dependencies, runs scripts, and locks reproducibly. It is the fastest reasonable option in 2026 and replaces pip, virtualenv, and pip-tools in one tool.

OpenTelemetry, not custom telemetry

The right answer to “how should I instrument this” in 2026 is OpenTelemetry. We export OTLP. Whatever back-end you point it at, the data will land cleanly. There is no proprietary format anywhere in the toolkit.

7 · Implementation milestones

The project moved through four sequential milestones. Each landed before the next began.

Milestone 1 · transport parity

The first thing built was the dual-transport plumbing. A trivial “echo” tool was registered against the official SDK and exercised over both stdio and HTTP. The contract was that any future plugin would work over both with no changes. Once that held, the rest of the work could proceed against a stable transport layer.

Milestone 2 · plugin contract

The Plugin class, the tool decorator, the lifecycle context manager, and the discovery loop all landed together. The first “real” plugin was the filesystem plugin, because filesystem operations are the cleanest test of sandboxing. Path traversal tests came in this milestone.
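The traversal check this milestone tested can be sketched with pathlib: every requested path must resolve inside the allow-listed root. The function name is illustrative, not the plugin's actual API.

```python
from pathlib import Path

# Illustrative sandbox check: resolve the candidate path and require it
# to sit under the allow-listed root, rejecting "../" escapes.
def resolve_sandboxed(root: Path, requested: str) -> Path:
    candidate = (root / requested).resolve()
    try:
        candidate.relative_to(root.resolve())
    except ValueError:
        raise PermissionError(f"path escapes sandbox: {requested}") from None
    return candidate

root = Path("/tmp/sandbox")
print(resolve_sandboxed(root, "notes/a.txt"))     # stays inside the root
try:
    resolve_sandboxed(root, "../../etc/passwd")   # traversal attempt
except PermissionError as e:
    print(e)
```

Resolving before comparing is the important ordering: a prefix check on the raw string would pass `../../etc/passwd` glued onto the root.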

Milestone 3 · OAuth and scopes

The OAuth verifier landed with JWKS caching and a scope decorator. The Postgres plugin was added at this point so that we had a plugin where read and write scope mattered. End-to-end tests using a real OAuth provider (Auth0 in CI) prove the flow.

Milestone 4 · OpenTelemetry and CI

Span instrumentation was added across the transport, auth, and plugin layers. The OTLP exporter was wired to a Tempo container in tests. CI now runs ruff, mypy strict, pytest, and a smoke test against both transports.

8 · Lessons / honest limits

Lessons

  • The plugin contract has to be small. Every field you add is a field every future plugin must understand. The first draft had eleven fields. It is now five.
  • OAuth is not optional in any hosted MCP deployment. Stdio gives you trust by transport. HTTP gives you nothing. Build it in from the start.
  • OpenTelemetry pays for itself within a week. The first time a slow tool is identified by trace data alone, the cost of the instrumentation is repaid.

Honest limits

  • No multi-tenant token isolation today. The OAuth check confirms a token; it does not partition data per tenant. Run separate instances if you need that.
  • The Postgres plugin is generic. If you need row-level security via session variables, you will extend it. Documented in the wiki.
  • Plugin discovery is path-based. No marketplace, no signing. You ship the plugins you trust.
  • Cold start in serverless. The toolkit boots in under sixty seconds in a 256 MB container. Sub-second cold start in Lambda is not in scope; run it as a long-lived service.

9 · Conclusion

MCP Server Toolkit is what we wished existed when we first started building MCP-backed assistants. It is small enough that you can read every line in an afternoon, opinionated enough that you do not have to make twenty decisions before your first tool runs, and complete enough that the boring work of OAuth, telemetry, plugin lifecycles, and dual-transport plumbing is already done.

If you are about to start an MCP project, fork the toolkit. If you are already running an MCP server and your TODO list contains the words “OAuth”, “OpenTelemetry”, or “plugin loader”, replace what you have. The Plugin contract is portable; existing tool functions move across in minutes.

The repository is MIT licensed. Contributions are welcome; the labelled good-first-issue tickets are real, not theatre. The wiki contains a complete plugin authoring guide, an OAuth setup walkthrough for Auth0, Clerk, and Logto, and a migration guide for teams moving off bespoke MCP servers.

MCP Server Toolkit · Built by Sarma Linux · MIT licensed