RAG-over-PDF
A minimal, production-shaped RAG starter. Upload a PDF, ask questions, get cited streaming answers. The cleanest end-to-end RAG you can clone, run, and ship in 10 minutes. No vector DB to provision, no Pinecone account, no LangChain dependency. Just the moving parts.
Why this exists
Every product team eventually needs to "chat with our docs". The default response is to reach for LangChain, spin up Pinecone, write 400 lines of glue code, and ship something nobody understands six months later.
Most of that complexity is not load-bearing. The actual moving parts of a working RAG system are: chunk the document, embed the chunks, embed the question, take the most similar chunks, stuff them in the prompt, and generate the answer. That is roughly 600 lines of TypeScript.
RAG-over-PDF is that 600 lines, written cleanly, with no framework hiding the moving parts. Clone it. Read it. Ship it. Swap the in-memory store for pgvector when you outgrow it. Add re-ranking when you measure that you need it. Do not pay framework tax up front.
Built-in features
Everything below works out of the box. Clone, add an OpenAI key, deploy.
PDF parsing in pure JS
pdf-parse handles 95% of real-world PDFs with no native bindings, no Docker, no OCR setup. Drop a PDF, extract text in milliseconds.
Fixed-size chunking with overlap
1,000-char chunks with 200-char overlap. Sentences that span boundaries are still findable in either chunk. Tunable via env vars.
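The chunking step described above can be sketched in a few lines. This is an illustrative implementation, not the repo's exact code; the defaults mirror the 1,000/200 figures quoted here.

```typescript
// Fixed-size chunking with overlap. Each chunk starts (size - overlap)
// characters after the previous one, so text near a boundary appears
// in both neighbouring chunks.
function chunk(text: string, size = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
    start += size - overlap;                // step forward, keeping the overlap
  }
  return chunks;
}
```

A 2,500-character document yields three chunks (1,000 + 1,000 + 900 characters), with the last 200 characters of each chunk repeated at the start of the next.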
OpenAI text-embedding-3-small
1536-dimensional vectors at £0.000016 per 1k tokens. Indexing a 500-page PDF costs about 2p. Override the model with one env var.
In-memory cosine retrieval
Zero infrastructure. Top-5 chunks in 2-12ms for sub-1k chunk corpora. The whole vector store is a 30-line module you can swap in an afternoon.
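The whole retrieval step is brute-force cosine similarity over an array. A minimal sketch (illustrative names, not the repo's exact module):

```typescript
type Doc = { text: string; vec: number[] };

// Cosine similarity: dot product over the product of vector norms.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every stored chunk against the query vector, return the top k.
function search(store: Doc[], qVec: number[], k = 5): Doc[] {
  return [...store]
    .sort((x, y) => cosine(y.vec, qVec) - cosine(x.vec, qVec))
    .slice(0, k);
}
```

For a corpus under a thousand chunks, a full linear scan like this is fast enough that an index would be premature.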
Streaming answers via SSE
gpt-4o-mini generates token by token through the App Router stream API. Time-to-first-token: 600-900ms. Users perceive responsiveness, not latency.
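The streaming pattern boils down to wrapping a token iterator in a `ReadableStream` and returning it with SSE headers. A hedged sketch, assuming the Web-standard `Response`/`ReadableStream` APIs available in App Router route handlers (names here are illustrative, not the repo's exact handler):

```typescript
// Turn an async stream of model tokens into a text/event-stream Response.
function sseResponse(tokens: AsyncIterable<string>): Response {
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      for await (const token of tokens) {
        // Each SSE event is "data: <payload>\n\n".
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(token)}\n\n`));
      }
      controller.enqueue(encoder.encode("data: [DONE]\n\n"));
      controller.close();
    },
  });
  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
    },
  });
}
```

The browser side reads the same stream incrementally, which is why users see the first tokens well before the full answer is generated.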
pgvector when you outgrow memory
The vector store interface has three methods: add, search, clear. Replace the body with Postgres calls — the retrieval pipeline does not care.
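The three-method contract might look like this (hypothetical shape; the repo's interface is described only as add/search/clear). A pgvector-backed class would keep the same signatures and replace the bodies with Postgres queries:

```typescript
type Vector = { text: string; embedding: number[] };

interface VectorStore {
  add(vectors: Vector[]): void;
  search(query: number[], k: number): Vector[];
  clear(): void;
}

// In-memory reference implementation of the contract above.
class MemoryStore implements VectorStore {
  private items: Vector[] = [];

  add(vectors: Vector[]): void {
    this.items.push(...vectors);
  }

  search(query: number[], k: number): Vector[] {
    const cos = (a: number[], b: number[]) => {
      let dot = 0, na = 0, nb = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na += a[i] ** 2;
        nb += b[i] ** 2;
      }
      return dot / (Math.sqrt(na * nb) || 1);
    };
    return [...this.items]
      .sort((x, y) => cos(y.embedding, query) - cos(x.embedding, query))
      .slice(0, k);
  }

  clear(): void {
    this.items = [];
  }
}
```

Because the rest of the pipeline only sees the interface, swapping implementations is a local change.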
Grounded answers, no hallucination prompts
System prompt pins the model to the retrieved chunks only. If the answer is not there, the model says so plainly. Pin and test, do not trust.
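A grounding prompt of this kind can be sketched as follows. The wording and function name are illustrative, not the repo's exact text:

```typescript
type Message = { role: string; content: string };

// Build a chat payload that pins the model to the retrieved chunks.
function buildPrompt(chunks: string[], question: string): Message[] {
  const system = [
    "Answer using ONLY the context below.",
    'If the context does not contain the answer, say "I don\'t know based on this document."',
  ].join(" ");
  // Number the chunks so answers can cite the passage they came from.
  const context = chunks.map((c, i) => `[${i + 1}] ${c}`).join("\n\n");
  return [
    { role: "system", content: system },
    { role: "user", content: `Context:\n${context}\n\nQuestion: ${question}` },
  ];
}
```

"Pin and test" means exactly that: a prompt like this reduces hallucination but does not eliminate it, so verify with questions the document cannot answer.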
TypeScript end to end
Strict mode. Every chunk, embedding, and message is typed. Schema-first means provider API changes break the build, not your users.
Wiki with the full theory
How RAG works, architecture diagrams, cost and performance, swap-to-pgvector walkthrough. Read the whole thing in 25 minutes.
One-click Vercel deploy
Vercel ships sharp natively, App Router supports streaming on the free tier, and the only secret is your OpenAI key. From clone to live in 60 seconds.
Tech stack
Next.js App Router route handlers, TypeScript in strict mode, pdf-parse for extraction, OpenAI text-embedding-3-small for embeddings, gpt-4o-mini for generation, SSE for streaming, and Vercel for deployment.
Architecture sketch
Two API routes. One in-memory store. One UI page. That is the whole thing.
┌─────────────────────────────────────────────────────────────┐
│ Indexing (POST /api/upload) │
│ Browser ──FormData(file)──▶ Route handler │
│ │ │
│ ▼ pdf-parse(buffer) // pure JS, no deps │
│ ▼ chunk(text, 1000, 200) // overlap window │
│ ▼ openai.embeddings(chunks) // 1536 dims │
│ ▼ vectorStore.add(vectors) // in-memory cosine │
│ ▼ 200 OK { chunks: 47 } │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Question (POST /api/chat) │
│ Browser ──{ question }──▶ Route handler │
│ │ │
│ ▼ openai.embeddings([question]) │
│ ▼ vectorStore.search(qVec, k=5) │
│ ▼ buildPrompt(system, chunks, question) │
│ ▼ openai.chat.stream(prompt) // gpt-4o-mini │
│ ▼ SSE tokens ──▶ Browser │
└─────────────────────────────────────────────────────────────┘Quick start
From clone to running locally in four commands. From running to deployed in another two.
git clone https://github.com/sarmakska/rag-over-pdf.git
cd rag-over-pdf
pnpm install
cp .env.example .env.local
# Add OPENAI_API_KEY to .env.local
pnpm dev
Open http://localhost:3000, upload a PDF, ask a question. That is the loop.
Use cases
What people actually build with this.
Internal docs chat
"Make our 200 internal PDFs searchable." Index policies, runbooks, contracts. Cite the exact passage.
Customer support copilot
Ground answers in your real product docs, not the model's training data. Update docs, re-index, done.
Research assistant
Skim 50-page papers in seconds. Top-5 chunk retrieval is precise enough for academic prose without re-ranking.
Learning RAG end-to-end
Read 600 lines of TypeScript and understand every moving part. No framework hiding the indexing or retrieval steps.
Use it. Fork it. Ship it.
MIT licensed. No strings attached. Attribution appreciated, not required. Pull requests welcome: chunking strategies, re-rankers, citation rendering, and local-embedding adapters are all wanted.