Architecture deep dive · Shipyard

How Shipyard works

One repository chokepoint for tenant data. One context resolver. One permission guard. One audit writer. One bucket per key. One state machine. The spine is small enough to read in an hour and pinned down by 29 tests.

~230 lines of repo
4 roles
10 permissions
29 tests
TL;DR

Tenant isolation,
by construction.
Not by good intentions.

Every tenant-scoped read and write goes through one Repository. It injects organisationId = @organisationId into every WHERE clause and stamps it onto every insert. The caller cannot remove or override it.

The active tenant comes from the session, resolved server-side. A user pointing a session at a tenant they have no membership in resolves with no role and is refused. Authorisation runs on the server, every time.

The whole spine fits in roughly the same code footprint as a typical OpenAPI client. pnpm test runs 29 tests in about 470ms on an M3 Pro, with a fresh in-memory database per suite.

repo.updateScoped(
  b.organisationId,        // from session
  "memberships",
  { role: "viewer" },      // set
  { userId: acmeUserId },  // where
)
→ SQL
UPDATE "memberships"
SET "role" = @set_role
WHERE organisationId = @organisationId
  AND "userId" = @where_userId
→ @organisationId = Globex
→ row belongs to Acme
→ changed = 0
Tenant isolation

The chokepoint pattern

One narrow path to tenant data. Three properties hold for every scoped operation.

// src/db/schema.ts
export const TENANT_SCOPED_TABLES = new Set([
  "memberships",
  "audit_log",
  "subscriptions",
  "usage_counters",
]);

// src/db/repository.ts
// 1. The tenant id is injected, never trusted.
const scoped = { ...(row as Row), organisationId };
//    Smuggled organisationId in the payload is overwritten by the spread.

// 2. The tenant predicate cannot be removed.
const conditions = ["organisationId = @organisationId"];
for (const [key, value] of Object.entries(where)) {
  if (key === "organisationId") continue; // never overridable
  conditions.push(`"${key}" = @where_${key}`); // binds as @where_userId, matching the generated SQL
}

// 3. Values are bound, never interpolated.
//    Every value is a named parameter to a prepared statement.
// src/lib/context.ts  - where the active tenant comes from
const session = auth.resolveSession(token);
const user = auth.userById(session.userId);
const role = auth.roleOf(user.id, session.organisationId);
if (!role) throw new TenantResolutionError(); // fail closed
RBAC schema

Four roles, ten permissions

Routes assert a permission, not a role. Roles are bundles of permissions in one file.

Permission         owner  admin  member  viewer
org:read           yes    yes    yes     yes
org:manage         yes    yes    no      no
members:read       yes    yes    yes     yes
members:invite     yes    yes    no      no
members:remove     yes    yes    no      no
members:set_role   yes    yes    no      no
billing:read       yes    yes    yes     yes
billing:manage     yes    no     no      no
audit:read         yes    yes    no      no
usage:write        yes    yes    yes     no
// src/lib/rbac.ts
export function requirePermission(role: Role, permission: Permission): void {
  if (!roleHasPermission(role, permission)) {
    throw new ForbiddenError(permission);  // mapped to 403
  }
}

// In a route, one line:
withGuard({ permission: "members:invite" }, handler, req)

// In a service:
guard(ctx, "members:set_role")
Audit log shape

Who did what, where, when

Field           Type           Meaning
id              text, pk       Random identifier for the entry
organisationId  text, scoped   Tenant the action happened in. Never overridable
actorUserId     text or null   Who performed the action. null for system actions (webhooks)
action          text           Dotted name, for example members.invite, billing.webhook
metadata        text (JSON)    Context for the action. Encoded on the way in, decoded on the way out
createdAt       integer        Millisecond timestamp
// src/lib/audit.ts  - the single writer
recordAudit(repo, {
  organisationId: ctx.organisationId,
  actorUserId:    ctx.user.id,
  action:         "members.invite",
  metadata:       { email, role },
});

// Actions recorded out of the box:
//   auth.signup          new user and organisation created
//   members.invite       a user invited to a tenant
//   members.set_role     a member's role changed
//   billing.subscribe    a plan subscribed
//   billing.webhook      a provider event applied (actor null)
//   billing.cancel       a subscription canceled

// Reads go through selectScoped so a request for tenant B
// can only ever return tenant B's entries.
Request flow

From cookie to scoped row

Edge middleware is a cheap gate. The authoritative authorisation decision is server-side, every time.

Diagram: edge gate → resolved context → rate limit → RBAC → service → scoped repository → database.
Subsystems

Each piece, deep-dived

The repository chokepoint

Why it exists

Application-level isolation has to fail loudly in a unit test on any database. Row-level security is good, but a misconfigured policy fails silently and can go unnoticed until production. A single narrow path to tenant data is the one guarantee you can test from the first commit.

How it actually works

src/db/repository.ts exposes two families. insertScoped, selectScoped, selectOneScoped, updateScoped require an organisationId and refuse global tables. insertGlobal, selectGlobal, updateGlobal, deleteGlobal refuse tenant-scoped tables. assertScoped throws TenantScopeError if you pick the wrong family. About 230 lines, readable top to bottom.
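The family split can be sketched as a pair of table checks. This is a minimal illustration, not the contents of src/db/repository.ts: the table set, assertScoped, and TenantScopeError are named in the text, while assertGlobal and its message wording are assumptions made for the example.

```typescript
// Table names as listed in src/db/schema.ts above.
const TENANT_SCOPED_TABLES = new Set<string>([
  "memberships",
  "audit_log",
  "subscriptions",
  "usage_counters",
]);

class TenantScopeError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "TenantScopeError";
  }
}

// Run first by every *Scoped method: refuse tables that are not tenant-scoped.
function assertScoped(table: string): void {
  if (!TENANT_SCOPED_TABLES.has(table)) {
    throw new TenantScopeError(`table "${table}" is not tenant-scoped`);
  }
}

// Hypothetical counterpart run first by every *Global method.
function assertGlobal(table: string): void {
  if (TENANT_SCOPED_TABLES.has(table)) {
    throw new TenantScopeError(`table "${table}" is tenant-scoped`);
  }
}
```

Because the check is a set lookup on the table name, picking the wrong family fails at the first call in a unit test rather than at query time.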

The tenant predicate cannot be removed

Why it exists

A caller who can drop the organisationId from a WHERE clause has the whole system. The guarantee has to survive a caller who knows another tenant's ids and tries to slip them through the filter.

How it actually works

On read and update, the repository writes the predicate first and skips any organisationId key in the caller's where. The SQL it builds is: WHERE organisationId = @organisationId AND "userId" = @where_userId. The @organisationId binding comes from the first method argument, which is resolved from the session, not from the request body.
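A minimal sketch of that clause builder, assuming a hypothetical helper named buildScopedWhere and made-up tenant ids. It only shows the shape of the SQL and bindings described above, with the smuggled organisationId dropped.

```typescript
type Params = Record<string, unknown>;

// Hypothetical helper: builds the WHERE clause and its named bindings.
// Column names are interpolated, so this sketch assumes they come from
// application code, never from request input. Values are always bound.
function buildScopedWhere(
  organisationId: string,
  where: Params,
): { sql: string; params: Params } {
  // The tenant predicate is always written first.
  const conditions = ["organisationId = @organisationId"];
  const params: Params = { organisationId };

  for (const [key, value] of Object.entries(where)) {
    if (key === "organisationId") continue; // the caller cannot override the scope
    conditions.push(`"${key}" = @where_${key}`);
    params[`where_${key}`] = value;
  }
  return { sql: `WHERE ${conditions.join(" AND ")}`, params };
}

// A caller scoped to one tenant tries to slip in another tenant's id.
const attempt = buildScopedWhere("org_globex", {
  organisationId: "org_acme", // smuggled, ignored
  userId: "user_1",
});
// attempt.sql is:
//   WHERE organisationId = @organisationId AND "userId" = @where_userId
// attempt.params.organisationId is "org_globex"
```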

The tenant id is injected, never trusted

Why it exists

On insert, a payload from the client may carry an organisationId field. If the repository accepted it, a tenant could plant rows in another tenant's table.

How it actually works

const scoped = { ...(row as Row), organisationId }. The spread overwrites any smuggled id, so the row lands under the scope the caller passed as the first argument, not whatever was in the body. Asserted by the smuggled-id case in tests/tenant-isolation.test.ts.
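The overwrite depends on property order in the object literal, which a short example makes concrete. The helper name scopeRow and the ids are assumptions for illustration only.

```typescript
type Row = Record<string, unknown>;

// Hypothetical helper wrapping the one-line spread from the repository.
function scopeRow(organisationId: string, row: Row): Row {
  // Later properties win, so the trusted id placed after the spread
  // replaces any organisationId that arrived in the payload.
  return { ...row, organisationId };
}

const stored = scopeRow("org_globex", {
  userId: "user_1",
  role: "admin",
  organisationId: "org_acme", // smuggled by the client
});
// stored.organisationId is "org_globex"
```

Writing the literal the other way round, `{ organisationId, ...row }`, would let the payload win, so the order is the guarantee.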

Permission-based RBAC

Why it exists

Asserting roles at the call site (if role === "admin") rots. Every new capability forces you to revisit every role comparison and the meaning drifts. Permissions named at the call site keep the route readable and let the role table grow without touching call sites.

How it actually works

src/lib/rbac.ts declares the PERMISSIONS list (org:read, org:manage, members:read, members:invite, members:remove, members:set_role, billing:read, billing:manage, audit:read, usage:write) and ROLE_PERMISSIONS maps each role to its bundle. requirePermission(role, permission) throws ForbiddenError (mapped to 403) when a role lacks the permission. A forgotten check fails closed.
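As a sketch of how the role table above could be encoded, the bundles below are transcribed from the permission matrix. The exact data structures in src/lib/rbac.ts are not shown in this page, so the use of sets and the error message text are assumptions.

```typescript
const PERMISSIONS = [
  "org:read", "org:manage",
  "members:read", "members:invite", "members:remove", "members:set_role",
  "billing:read", "billing:manage",
  "audit:read", "usage:write",
] as const;

type Permission = (typeof PERMISSIONS)[number];
type Role = "owner" | "admin" | "member" | "viewer";

// One bundle per role, matching the matrix row by row.
const ROLE_PERMISSIONS: Record<Role, ReadonlySet<Permission>> = {
  owner: new Set(PERMISSIONS),
  admin: new Set(PERMISSIONS.filter((p) => p !== "billing:manage")),
  member: new Set<Permission>(["org:read", "members:read", "billing:read", "usage:write"]),
  viewer: new Set<Permission>(["org:read", "members:read", "billing:read"]),
};

class ForbiddenError extends Error {
  readonly permission: Permission;
  constructor(permission: Permission) {
    super(`missing permission: ${permission}`); // message wording is illustrative
    this.name = "ForbiddenError";
    this.permission = permission;
  }
}

function roleHasPermission(role: Role, permission: Permission): boolean {
  return ROLE_PERMISSIONS[role].has(permission);
}

function requirePermission(role: Role, permission: Permission): void {
  if (!roleHasPermission(role, permission)) {
    throw new ForbiddenError(permission); // mapped to 403 by the route layer
  }
}
```

Adding a capability then means one new entry in PERMISSIONS and one edit per bundle, with no call site touched.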

Append-only audit log

Why it exists

An incident review always asks who did what, in which tenant, and when. The answer has to be trustworthy, which means immutable, attributable, and tenant-scoped like everything else.

How it actually works

audit_log holds (id, organisationId, actorUserId, action, metadata JSON, createdAt). recordAudit is the single writer, called inside privileged operations. Reads go through selectScoped so a request for tenant B can only ever return tenant B's entries. No update or delete path is exposed. Convention for action names is domain.verb (members.invite, billing.subscribe, billing.webhook, billing.cancel).
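The append-only shape can be illustrated with an in-memory stand-in for the audit_log table. The class InMemoryAuditLog and its method names are hypothetical; the real writer is recordAudit and goes through the repository.

```typescript
interface AuditInput {
  organisationId: string;
  actorUserId: string | null; // null for system actions such as webhooks
  action: string;             // domain.verb, for example "members.invite"
  metadata: Record<string, unknown>;
}

interface AuditEntry extends AuditInput {
  id: string;
  createdAt: number; // millisecond timestamp
}

type AuditRow = Omit<AuditEntry, "metadata"> & { metadata: string };

// Hypothetical in-memory stand-in for the audit_log table.
class InMemoryAuditLog {
  private readonly rows: AuditRow[] = [];

  // The single writer: encodes metadata and appends.
  // No update or delete method exists on the class.
  record(input: AuditInput, now: number = Date.now()): void {
    this.rows.push({
      id: Math.random().toString(36).slice(2), // placeholder for a random id
      organisationId: input.organisationId,
      actorUserId: input.actorUserId,
      action: input.action,
      metadata: JSON.stringify(input.metadata),
      createdAt: now,
    });
  }

  // Scoped read: only the requested tenant's entries, metadata decoded.
  listFor(organisationId: string): AuditEntry[] {
    return this.rows
      .filter((row) => row.organisationId === organisationId)
      .map((row) => ({
        ...row,
        metadata: JSON.parse(row.metadata) as Record<string, unknown>,
      }));
  }
}
```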

Token-bucket rate limiter

Why it exists

A fixed window can let through up to twice the limit around a window boundary. A sliding-window log needs a timestamp list per key. A bucket is two numbers and ports unchanged to a Redis Lua script. Smooth throttling with controlled bursts is what an API wants.

How it actually works

src/lib/rate-limit.ts. State per key is (tokens, lastRefill). refill computes elapsed time, adds elapsedSeconds * refillPerSecond, caps at capacity. consume refills, then takes one token if available. Injectable now() so tests use a hand-advanced clock and skip real sleeps. BucketStore interface so Redis swaps in. auth is tighter (capacity 5, refill 0.2/s) than api (capacity 60, refill 10/s). Blocked requests surface retryAfterMs as Retry-After.
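A self-contained sketch of the refill-then-take step, using the auth limits quoted above and a hand-advanced clock. The class name, the result shape, and the retryAfterMs formula are assumptions. A plain Map stands in for the BucketStore interface.

```typescript
interface Bucket { tokens: number; lastRefill: number }
interface BucketConfig { capacity: number; refillPerSecond: number }
interface ConsumeResult { allowed: boolean; retryAfterMs: number }

class TokenBucketLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly config: BucketConfig;
  private readonly now: () => number;

  constructor(config: BucketConfig, now: () => number = () => Date.now()) {
    this.config = config;
    this.now = now;
  }

  consume(key: string): ConsumeResult {
    const current = this.now();
    // A key seen for the first time starts with a full bucket.
    const previous =
      this.buckets.get(key) ?? { tokens: this.config.capacity, lastRefill: current };

    // Refill: add tokens for the elapsed time, capped at capacity.
    const elapsedSeconds = (current - previous.lastRefill) / 1000;
    const tokens = Math.min(
      this.config.capacity,
      previous.tokens + elapsedSeconds * this.config.refillPerSecond,
    );

    if (tokens >= 1) {
      this.buckets.set(key, { tokens: tokens - 1, lastRefill: current });
      return { allowed: true, retryAfterMs: 0 };
    }

    // Blocked: report how long until one whole token is available.
    this.buckets.set(key, { tokens, lastRefill: current });
    const retryAfterMs = Math.ceil(((1 - tokens) / this.config.refillPerSecond) * 1000);
    return { allowed: false, retryAfterMs };
  }
}

// Hand-advanced clock, so tests never sleep.
let fakeNow = 0;
const authLimiter = new TokenBucketLimiter(
  { capacity: 5, refillPerSecond: 0.2 },
  () => fakeNow,
);
```

With these numbers a key can burst five requests, then earns one more every five seconds.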

Subscription state machine

Why it exists

A webhook that arrives out of order, replayed, or for a different subscription must not be allowed to reactivate a cancelled plan or mutate a tenant's record from another tenant's event.

How it actually works

src/lib/billing/service.ts validates every transition against an allow list: trialing → active, trialing → past_due, trialing → canceled, active → past_due, active → canceled, past_due → active, past_due → canceled. canceled is the terminal state, so nothing leaves it. Illegal moves throw BillingError. Webhook events are also checked against the stored providerSubscriptionId so an event for a different subscription cannot mutate this tenant.
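The allow list can be written as a lookup table. The sketch below assumes hypothetical names (ALLOWED_TRANSITIONS, assertTransition, applyWebhookEvent) and simplified record shapes. The states, the transitions, and the error message format are the ones given in this page.

```typescript
type SubscriptionStatus = "trialing" | "active" | "past_due" | "canceled";

const ALLOWED_TRANSITIONS: Record<SubscriptionStatus, readonly SubscriptionStatus[]> = {
  trialing: ["active", "past_due", "canceled"],
  active: ["past_due", "canceled"],
  past_due: ["active", "canceled"],
  canceled: [], // terminal: nothing leaves it
};

class BillingError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "BillingError";
  }
}

function assertTransition(from: SubscriptionStatus, to: SubscriptionStatus): void {
  if (!ALLOWED_TRANSITIONS[from].includes(to)) {
    throw new BillingError(`illegal transition ${from} to ${to}`);
  }
}

interface Subscription { providerSubscriptionId: string; status: SubscriptionStatus }
interface WebhookEvent { providerSubscriptionId: string; status: SubscriptionStatus }

// Applies a provider event only if it targets the stored subscription
// and the move is on the allow list. Returns a new record.
function applyWebhookEvent(stored: Subscription, event: WebhookEvent): Subscription {
  if (event.providerSubscriptionId !== stored.providerSubscriptionId) {
    throw new BillingError("event is for a different subscription");
  }
  assertTransition(stored.status, event.status);
  return { ...stored, status: event.status };
}
```

In this sketch a same-state event such as active → active is not on the list either, so a replayed event is rejected rather than silently reapplied.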

Stripe-shaped webhook verification

Why it exists

Bundling a payment SDK into a starter is the wrong choice. Doing webhook signature verification with hand-rolled string compare is also the wrong choice. The signature scheme is the one piece you genuinely cannot fake.

How it actually works

provider-stripe.ts implements the real Stripe signing scheme: HMAC-SHA256 over `${timestamp}.${payload}` with the webhook secret, compared via timingSafeEqual. A tampered or unsigned payload is rejected before any state changes. The customer and subscription methods throw with a pointer to the wiki until you pnpm add stripe and fill them in. The provider is selected by env in the billing route, so no app code changes when you flip BILLING_PROVIDER=stripe.
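A minimal sketch of the signing and comparison step using Node's built-in crypto module. The function names are hypothetical. Parsing the signature header and rejecting stale timestamps are left out, so this covers only the HMAC and constant-time compare described above.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// HMAC-SHA256 over `${timestamp}.${payload}`, hex encoded.
function signPayload(secret: string, timestamp: number, payload: string): string {
  return createHmac("sha256", secret)
    .update(`${timestamp}.${payload}`, "utf8")
    .digest("hex");
}

// payload must be the raw request body, byte for byte as it was signed.
function verifySignature(
  secret: string,
  timestamp: number,
  payload: string,
  signatureHex: string,
): boolean {
  const expected = Buffer.from(signPayload(secret, timestamp, payload), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

Verification runs before any parsing or state change, so a tampered or unsigned body is rejected without touching the subscription.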

Rate limiter

Two numbers, per key

// src/lib/rate-limit.ts
private refill(bucket: Bucket): Bucket {
  const now = this.now();
  const elapsedSeconds = (now - bucket.lastRefill) / 1000;
  const refilled = Math.min(
    this.config.capacity,
    bucket.tokens + elapsedSeconds * this.config.refillPerSecond,
  );
  return { tokens: refilled, lastRefill: now };
}

export const RATE_LIMITS = {
  auth: { capacity: 5,  refillPerSecond: 0.2 },   // tightest
  api:  { capacity: 60, refillPerSecond: 10  },
};

// Applied per (organisationId, routeGroup).
// Unauthenticated routes (signup, login) key by IP instead.
// MemoryBucketStore by default; the BucketStore interface ports
// the same algorithm to a Redis Lua script unchanged.
Operating it

Failure modes you should expect

selectScoped returns nothing for data you can see in the database
Cause: Tenant mismatch. The organisationId you passed is not the one the rows belong to
Fix: The id comes from the session in resolveContext. Check the session's active organisation. This is the guarantee working, not a bug

TenantScopeError: table "x" is not tenant-scoped
Cause: A *Scoped method was called on a global table (or vice versa)
Fix: The split is in TENANT_SCOPED_TABLES in src/db/schema.ts. Pick the right family or move the table

A new table holds tenant data but isolation does not apply
Cause: You added the table without adding its name to TENANT_SCOPED_TABLES
Fix: Add the name and route access through the scoped methods. Without that entry the repository treats it as global

A route returns 401 where you expected 403
Cause: No valid session (AuthError). The cookie is missing or expired
Fix: 401 means not authenticated, 403 means authenticated but not allowed. Sign in or refresh the session

BillingError: illegal transition canceled to active
Cause: A webhook tried to move a terminal subscription
Fix: This is the state machine refusing a replayed or out-of-order event. A genuine resubscribe is a new subscribe call, not a webhook back into active

webhook signature verification failed
Cause: HMAC over timestamp.payload did not match
Fix: Usually the wrong STRIPE_WEBHOOK_SECRET, or a proxy that re-serialised the body so the signed bytes changed. Verify against the raw request body

Measured on my machine

What you can measure

Apple M3 Pro, Node v25.9.0. Real numbers, not estimates.

29 tests passing: six suites, fresh in-memory DB each
~470ms whole test run: ~1.04s including process start
0 external services: node:sqlite, scrypt, HMAC, all built in
Roadmap

What I will add

Email-token invitations

Invite by email with a signed token, accept on first sign-in. Today only direct membership creation is exposed.

Postgres repository

A second implementation of Repository against Postgres, alongside the SQLite one. Same interface, same tests, RLS as defence in depth.

Redis BucketStore

Atomic refill-and-take in a small Lua script so the limiter is correct multi-instance. The algorithm does not change, only where (tokens, lastRefill) lives.

Organisation switching

Move the active organisationId on the session without a new login. Server-side state, cookie unchanged.

Per-tenant feature flags

Flags keyed by (organisationId, flag). Same scoped repository, same audit log entry on toggle.

What I will not add

An ORM, a bundled payment SDK, or a UI kit. Those are decisions your product should make and bolting them on would undo the reason the spine is small enough to trust.

Ready to read the source?

Clone, pnpm install, pnpm test, pnpm dev. No services to install, no API keys to set, no checkout to do before the guarantees prove themselves.
