Open Source · MIT · Helm + Bash + ArgoCD

Next.js on Kubernetes, production-grade in five commands.

A Helm chart for the app and a version-pinned bootstrap for the platform. Ingress, TLS, autoscaling, metrics, logs, alerts, spend tracking, and an optional GitOps path, with no YAML to write on your side.

View on GitHub Read the wiki Whitepaper How it works Hire me

5 cmds to live · ~8 min platform install · pinned upstream charts · grafana dashboards · MIT licence

Why this exists

Most teams reach for Kubernetes when they outgrow Vercel or want to cut a hosting bill that has stopped making sense. Then they spend two weeks configuring the same things every other team configures: ingress, cert-manager, monitoring, logging, autoscaling, secrets. The end result is fine; the path to it is a waste.

The big ecosystems solve much larger problems. Argo and Crossplane bring serious machinery for serious orgs. Backstage brings a developer portal. The lighter starters often skip observability entirely and leave the next person to wire metrics by hand.

The toolkit sits in the middle. A small Helm chart you can read in twenty minutes, a single version-pinned bash installer for the platform, an ArgoCD app-of-apps if you prefer GitOps, and dashboards and alert rules already tuned for Next.js workloads. The week you would have spent, given back.

Why this matters

The first week on any new cluster is identical across teams: ingress, TLS, autoscaling, metrics, logs, alerts. Burning that week on every project is a tax. The toolkit pays that tax once, in public, and pins the answers so every cluster after this one inherits them. The time you keep is the entire point.

What is in the box

Everything below is pinned, tested, and wired together by the installer. Nothing here is aspirational.

Helm chart for Next.js

Deployment with tuned rolling update strategy, hardened pod and container security context, ClusterIP service, ingress with cert-manager TLS, HorizontalPodAutoscaler, PodDisruptionBudget, liveness and readiness probes on /api/health, and a Prometheus ServiceMonitor scraping /api/metrics.

cert-manager with Let's Encrypt

The installer creates the Let's Encrypt production ClusterIssuer and wires every ingress to request a real certificate. Automatic renewal. A bundled alert fires when a certificate is within fourteen days of expiry.

kube-prometheus-stack

Prometheus, Grafana, Alertmanager and node-exporter installed from the upstream community chart at a pinned version. ServiceMonitor on the chart picks up app metrics without extra config.

Loki 3.x with Promtail

Logs go from stdout to Promtail to Loki, queryable from Grafana with the same Explore UI as metrics. Labels keep the index small; the bulk lives on object storage.

Alertmanager rules

Bundled PrometheusRule covers crash-looping pods, ingress-nginx 5xx spikes above five percent, p99 latency above two seconds, certificate expiry inside fourteen days, and PV space predicted to exhaust within six hours. Slack webhook wired by the installer.

HPA on CPU by default

Default min 2, max 10, target 70 percent CPU. A documented pattern in the wiki swaps in custom-metrics HPA via the ServiceMonitor for requests-per-second autoscaling when you need it.
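
To make those defaults concrete, here is a minimal sketch of the replica formula documented for the Kubernetes HorizontalPodAutoscaler: desired = ceil(current × observed / target), clamped to min and max. The helper name hpa_desired_replicas is invented for this example and is not part of the toolkit. The real controller also applies a tolerance band and stabilisation windows, which are ignored here.

```shell
# Hypothetical helper: approximates the HPA replica calculation using the
# chart defaults (target 70 percent CPU, min 2, max 10).
hpa_desired_replicas() {
  local current="$1" observed="$2" target="${3:-70}" min="${4:-2}" max="${5:-10}"
  # Integer ceiling of current * observed / target.
  local desired=$(( (current * observed + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}
```

For example, hpa_desired_replicas 3 90 prints 4: three pods averaging 90 percent CPU against a 70 percent target scale out to four.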

ingress-nginx, documented

The ingress everyone runs. The chart documents annotations for body-size limits, websocket support, redirects, and per-host TLS. ingress-nginx is the LoadBalancer-typed entry point.
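
As an illustration of the annotation pattern, the sketch below prints a values override for the chart's ingress.host and ingress.annotations keys. The annotation names are standard ingress-nginx annotations. The helper name, the timeout values and the output file name are assumptions made for this example, not something the chart prescribes.

```shell
# Hypothetical helper: prints a values override with common ingress-nginx
# annotations (body-size limit, long timeouts for websockets, HTTPS redirect).
ingress_values() {
  local host="$1" body_size="${2:-20m}"
  cat <<EOF
ingress:
  host: ${host}
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "${body_size}"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
EOF
}

# Possible usage (not run here):
#   ingress_values app.example.com 50m > my-values.yaml
#   helm upgrade --install my-app ./charts/nextjs-app -f my-values.yaml
```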

PodDisruptionBudget on by default

minAvailable: 1 keeps at least one replica running during voluntary disruptions such as node drains and cluster upgrades. Disable it per release when the budget would block drains, for example on a single-replica workload.
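
A quick way to reason about the budget: the number of pods a drain may evict at once is the healthy replica count minus minAvailable, never below zero. The helper below is a hypothetical sketch of that arithmetic, not toolkit code.

```shell
# Hypothetical helper: how many pods a voluntary disruption may evict at once
# under a minAvailable budget.
pdb_allowed_disruptions() {
  local healthy="$1" min_available="${2:-1}"
  local allowed=$(( healthy - min_available ))
  if [ "$allowed" -lt 0 ]; then allowed=0; fi
  echo "$allowed"
}
```

With the default two replicas, pdb_allowed_disruptions 2 1 prints 1, so a drain can proceed one pod at a time. With a single replica it prints 0, and the drain waits until the budget is relaxed or the release is scaled up.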

No service mesh by design

For a small fleet of Next.js workloads, mesh complexity is rarely worth the operational cost. The toolkit deliberately does not ship one. You add Linkerd or Istio when you have a reason.

Plain Helm, no operator

You can read every template, copy it, fork it. No CRDs to learn beyond what cert-manager and Prometheus already require. No hidden state.

OpenCost spend dashboard

OpenCost is pinned and installed with the rest. A bundled Grafana dashboard breaks down cluster spend by namespace and workload so you can see where the money goes.

ArgoCD app-of-apps for GitOps

Prefer reconciliation from git over a bash installer? The same components are described as a set of ArgoCD Applications under gitops/argocd. Apply the root once and git becomes the source of truth.

End-to-end pytest suite

A real Helm render is fed into pytest fixtures that assert pod-selector match, service target-port wiring, TLS wiring, gating of optional objects, and version parity between the installer and the GitOps Applications.

Version-pinned everything

Every upstream chart version is declared in scripts/install.sh and mirrored in the Argo Applications. The same command produces the same platform every time.
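
The parity idea can be sketched in a few lines of shell. This is not the toolkit's own check (that lives in the pytest suite). It assumes pins are written as NAME_VERSION=x.y.z at the start of a line in the installer, and that each Argo Application carries an unquoted targetRevision with the same value. The function name is hypothetical.

```shell
# Hypothetical helper: report installer pins that no Argo Application mirrors.
check_parity() {
  local installer="$1" gitops_dir="$2" missing
  missing=$(
    grep -E '^[A-Z_]+_VERSION=' "$installer" |
    while IFS='=' read -r name version; do
      # -w rejects partial matches such as 4.11.3 inside 4.11.30.
      grep -rqwF "targetRevision: ${version}" "$gitops_dir" ||
        echo "${name}=${version}"
    done
  )
  if [ -n "$missing" ]; then
    echo "pins missing from gitops:" >&2
    echo "$missing" >&2
    return 1
  fi
}

# Possible usage (not run here):
#   check_parity scripts/install.sh gitops/argocd
```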

Cluster topology

Internet traffic enters through ingress-nginx, lands on the Next.js pods, and produces metrics and logs that Prometheus, Loki and OpenCost feed back into Grafana and Alertmanager.

Cluster topology: ingress-nginx fronts the Next.js pods; Prometheus, Loki and OpenCost feed Grafana; Alertmanager routes to Slack; ArgoCD optionally reconciles the platform from git.

Request path

What a user request actually touches, from the LoadBalancer to the streamed response.

Request path: ingress-nginx terminates TLS via cert-manager, routes by host header, and the pod emits metrics and logs out of band.

Quick start

Five commands. About eight minutes from an empty cluster to ingress, TLS, metrics, logs, alerts, spend tracking, and your first app running.

01 Clone the repo

git clone https://github.com/sarmakska/k8s-ops-toolkit.git
cd k8s-ops-toolkit
export KUBECONFIG=~/.kube/your-cluster.yaml

02 Bootstrap the platform

./scripts/install.sh \
  --domain example.com \
  --email you@example.com \
  --slack-webhook https://hooks.slack.com/services/...

03 Load the bundled dashboards

./scripts/load-dashboards.sh
# Cluster Overview, Next.js app, OpenCost spend installed as sidecar ConfigMaps

04 Deploy your Next.js app

helm install my-app ./charts/nextjs-app \
  --set image.repository=ghcr.io/you/my-app \
  --set image.tag=v1.0.0 \
  --set ingress.host=app.example.com \
  --set replicas=3

05 Open Grafana

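# Service and secret are named <release>-grafana; the installer's release is "monitoring".
kubectl -n monitoring port-forward svc/monitoring-grafana 3000:80
# In a second terminal, read the admin password (user: admin):
kubectl -n monitoring get secret monitoring-grafana -o jsonpath='{.data.admin-password}' | base64 -d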

A real values.yaml

The actual default values that ship with charts/nextjs-app. Sensible defaults, then override only what your service needs.

replicas: 2

image:
  repository: ghcr.io/your-org/your-app
  tag: latest
  pullPolicy: IfNotPresent
  pullSecrets: []

rollingUpdate:
  maxSurge: 1
  maxUnavailable: 0

service:
  port: 3000

ingress:
  enabled: true
  className: nginx
  host: app.example.com
  annotations: {}
  tls:
    enabled: true
    issuer: letsencrypt-prod

resources:
  requests: { cpu: 100m, memory: 256Mi }
  limits:   { cpu: 1000m, memory: 1Gi }

autoscaling:
  enabled: true
  min: 2
  max: 10
  targetCPU: 70

pdb:
  enabled: true
  minAvailable: 1

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
  seccompProfile: { type: RuntimeDefault }

containerSecurityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop: [ALL]

probes:
  liveness:
    path: /api/health
    initialDelaySeconds: 30
    periodSeconds: 10
  readiness:
    path: /api/health
    initialDelaySeconds: 5
    periodSeconds: 5

monitoring:
  enabled: true
  prometheusServiceMonitor: true
  metricsPath: /api/metrics
  metricsPort: 3000
  interval: 30s
  serviceMonitorLabels:
    release: monitoring

Full reference: Helm-Chart wiki page

Platform components

Every upstream chart is pinned in scripts/install.sh and mirrored in gitops/argocd, so both paths install the same versions.

Component	Purpose
ingress-nginx	Layer-7 ingress controller exposed as a LoadBalancer service. The cluster's only public endpoint.
cert-manager	Issues and renews TLS certificates via Let's Encrypt. ClusterIssuer is created by the installer.
kube-prometheus-stack	Prometheus, Alertmanager, Grafana, node-exporter, kube-state-metrics. The metrics backbone.
Loki 3.x	Log aggregation. Cheap to run because it indexes labels, not log content.
Promtail	Sidecar-less log shipper. Reads container stdout from the node, ships to Loki.
OpenCost	Spend attribution. Queries Prometheus for utilisation, emits cost-per-namespace and cost-per-workload.
Alertmanager	Routes alerts to Slack via the installer-supplied webhook. PagerDuty or Opsgenie one variable away.

Bundled alert rules

The PrometheusRule lives in manifests/prometheus-rules/app-rules.yaml. Prometheus loads it automatically when the rule's release label matches the kube-prometheus-stack release.

Alert	Severity	Fires when
KubePodCrashLooping	critical	A container restarts more than five times in ten minutes
KubePersistentVolumeFillingUp	warning	A PV is predicted to run out of space within six hours
IngressNginxHigh5xxRate	critical	Ingress 5xx ratio above five percent for five minutes
IngressNginxHighLatency	warning	Ingress p99 latency above two seconds for five minutes
CertManagerCertificateExpirySoon	warning	A certificate is within fourteen days of expiry and has not been renewed

Slack webhook wired by the installer through manifests/values-alertmanager.yaml.
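
For a sense of what one of these rules might look like, the sketch below prints a PrometheusRule for the 5xx ratio. It is not a copy of the bundled app-rules.yaml. The metric nginx_ingress_controller_requests is the request counter exported by ingress-nginx. The object name, group name and helper name are placeholders, and the release label assumes the kube-prometheus-stack release is called monitoring.

```shell
# Hypothetical helper: prints an example PrometheusRule for the ingress 5xx
# ratio (above five percent for five minutes, severity critical).
ingress_5xx_rule() {
  cat <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ingress-5xx-example
  namespace: monitoring
  labels:
    release: monitoring
spec:
  groups:
    - name: ingress-nginx.example
      rules:
        - alert: IngressNginxHigh5xxRate
          expr: |
            sum(rate(nginx_ingress_controller_requests{status=~"5.."}[5m]))
              /
            sum(rate(nginx_ingress_controller_requests[5m])) > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Ingress 5xx ratio above five percent for five minutes
EOF
}

# Possible usage (not run here):
#   ingress_5xx_rule | kubectl apply -f -
```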

The install script

The platform install is a single bash script. Every chart version is pinned. Idempotent.

#!/usr/bin/env bash
set -euo pipefail

INGRESS_VERSION=4.11.3
CERT_MANAGER_VERSION=v1.15.3
KPS_VERSION=65.1.0
LOKI_VERSION=6.16.0
PROMTAIL_VERSION=6.16.5
OPENCOST_VERSION=2.4.6

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo add jetstack https://charts.jetstack.io
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update

helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  --version "$INGRESS_VERSION" -n ingress-nginx --create-namespace

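# installCRDs still works on cert-manager v1.15 but is deprecated in favour of crds.enabled.
helm upgrade --install cert-manager jetstack/cert-manager \
  --version "$CERT_MANAGER_VERSION" -n cert-manager --create-namespace \
  --set installCRDs=true
# EMAIL comes from the --email flag; argument parsing is omitted from this excerpt.
kubectl apply -f - <<EOF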
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata: { name: letsencrypt-prod }
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ${EMAIL}
    privateKeySecretRef: { name: letsencrypt-prod }
    solvers: [ { http01: { ingress: { class: nginx } } } ]
EOF

helm upgrade --install monitoring prometheus-community/kube-prometheus-stack \
  --version "$KPS_VERSION" -n monitoring --create-namespace \
  -f manifests/values-alertmanager.yaml

helm upgrade --install loki grafana/loki \
  --version "$LOKI_VERSION" -n monitoring -f manifests/values-loki.yaml
helm upgrade --install promtail grafana/promtail \
  --version "$PROMTAIL_VERSION" -n monitoring

helm upgrade --install opencost opencost/opencost \
  --version "$OPENCOST_VERSION" -n monitoring

Truncated for the page. The real script also wires the Slack webhook, waits for cert-manager webhooks, and prints a readiness summary.
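
The wait step is the part most likely to bite on a fresh cluster: applying a ClusterIssuer before the cert-manager webhook is ready fails. One way such a wait could look is sketched below. It is an assumption about the approach, not an excerpt from the real script. The deployment names are the ones the upstream chart creates for a release called cert-manager, and the KUBECTL variable exists only so the function can be exercised without a cluster.

```shell
# Hypothetical sketch: block until the cert-manager deployments have rolled out.
KUBECTL="${KUBECTL:-kubectl}"

wait_for_cert_manager() {
  local timeout="${1:-180s}" deploy
  for deploy in cert-manager cert-manager-webhook cert-manager-cainjector; do
    "$KUBECTL" rollout status "deployment/${deploy}" \
      --namespace cert-manager --timeout="$timeout" || return 1
  done
}

# Possible usage (not run here):
#   wait_for_cert_manager 120s && kubectl apply -f cluster-issuer.yaml
```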

GitOps with ArgoCD

Prefer the platform reconciled from git instead of installed by hand? Apply the app-of-apps root once.

ArgoCD app-of-apps: a single root Application reconciles every platform component from gitops/argocd.
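
For readers new to the pattern, each child in an app-of-apps is an ordinary Argo Application that points at an upstream Helm repository and a pinned chart version. The sketch below prints one. The field names follow the Argo CD Application spec. The helper name, the argocd namespace, the default project and the sync policy are assumptions for illustration, and the files under gitops/argocd may differ.

```shell
# Hypothetical helper: prints an Argo Application for one pinned upstream chart.
argo_app() {
  local name="$1" repo="$2" chart="$3" version="$4" namespace="$5"
  cat <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ${name}
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ${repo}
    chart: ${chart}
    targetRevision: ${version}
  destination:
    server: https://kubernetes.default.svc
    namespace: ${namespace}
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
EOF
}

# Possible usage (not run here), with the ingress-nginx pin from the installer:
#   argo_app ingress-nginx https://kubernetes.github.io/ingress-nginx \
#     ingress-nginx 4.11.3 ingress-nginx | kubectl apply -f -
```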

Full GitOps walkthrough: GitOps wiki page

Use cases

What teams actually run this for.

First production cluster

Greenfield team going from "we deploy to Vercel" to "we run our own k8s." Skip the week of yak-shaving on ingress, TLS, autoscaling and metrics.

Adding observability later

Apps already running but no metrics or logs. The installer drops in Prometheus, Loki and Grafana in an afternoon without touching workloads.

Standardising deploys

Pin every Next.js service in your org to the same chart. Consistent probes, consistent autoscaling, consistent alerts, consistent labels.

Cost-controlled SaaS infrastructure

One DigitalOcean cluster hosting an arbitrary number of services. OpenCost surfaces where the spend lives. Predictable bill.

Platform team with multiple Next.js services

Each service installs the chart with its own values file. Helm release name is the unit of isolation. ArgoCD reconciles the platform from git.

Staging environments that look like prod

Same install script, smaller node pool. Real TLS, real metrics, real alerts, a fraction of the spend.

k8s-ops-toolkit vs alternatives

An honest, scope-by-scope look at how the toolkit compares to other ways to put a Next.js app on Kubernetes.

Feature	k8s-ops-toolkit	Stock Helm + bash	Backstage	Pure ArgoCD	Vercel
Helm chart for Next.js	Yes, opinionated	Build yourself	Via plugin	Bring your own	N/A
TLS via cert-manager	Pinned + wired	Manual install	Out of scope	Manual install	Managed
Prometheus + Grafana	Pinned + dashboards	Manual install	Out of scope	Manual install	Managed
Loki for logs	Pinned	Manual install	Out of scope	Manual install	Managed
OpenCost spend	Pinned + dashboard	Manual install	Out of scope	Manual install	Limited
GitOps reconcile	ArgoCD app-of-apps	Bring your own	Out of scope	Yes, native	N/A
E2E test suite	pytest renders chart	No	No	No	N/A
Licence	MIT	MIT components	Apache 2.0	Apache 2.0	Commercial
Total time to live	About 8 minutes	Days	Days	Hours	Minutes (managed)

Tech stack

Every piece pinned. No surprise minor-version drift.

Kubernetes 1.31+ · Helm 3.17 · ingress-nginx · cert-manager · Prometheus · Grafana · Loki 3.x · Promtail · Alertmanager · kube-prometheus-stack · OpenCost · ArgoCD · pytest + pyyaml · ShellCheck · bash bootstrap

Documentation & guides

The wiki is the deep reference. Architecture, dashboards, alert rules, and the GitOps path are written down.

Architecture

How ingress, app, Prometheus, Loki, OpenCost and Alertmanager fit together. Mermaid diagrams included.

Quick-Start

Install on a fresh cluster, deploy the example app, see metrics in Grafana inside ten minutes.

Helm-Chart

Every value in charts/nextjs-app/values.yaml explained, with secure defaults and override patterns.

Observability

Dashboards, alert rules, log queries. How to extend without forking the chart.

GitOps

The ArgoCD app-of-apps pattern in detail. When to prefer it over the imperative installer.

Roadmap

Velero disaster-recovery, HPA on custom metrics, ingress-nginx canary traffic split.

Frequently asked

The questions that come up most often before adoption.

Why not just use Vercel?

For some teams Vercel is the right answer forever. For others, three or four services on Vercel cost more than a single $70-a-month DigitalOcean cluster that hosts an arbitrary number of apps. This toolkit is for the day you cross that line.

Does it lock me into ingress-nginx?

No. ingress-nginx is the default because it is the controller most teams already run and the one the bundled rules and dashboards target. Swap to Traefik or Contour and the chart still works; you would re-author the ingress-specific alerts and dashboards.

How is this different from Argo, Crossplane, Backstage?

Those solve much larger problems and bring much heavier machinery. This toolkit is the small platform layer most teams need. The ArgoCD app-of-apps is an opt-in path, not a replacement for the imperative installer.

Can I run the installer twice?

Yes. Every step uses helm upgrade --install. The script is idempotent: re-running it converges on the same pinned versions and the same values.

How do I add a custom Grafana dashboard?

Drop a JSON file into manifests/grafana-dashboards/ and re-run scripts/load-dashboards.sh. The sidecar discovers ConfigMaps with the grafana_dashboard label and loads them on the next reconcile.
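
Under the hood, a sidecar-loaded dashboard is just a ConfigMap carrying the dashboard JSON and the grafana_dashboard label. The sketch below prints one from a JSON file. It illustrates the mechanism and is not the contents of load-dashboards.sh. The helper name, the dashboard- name prefix, the monitoring namespace and the label value "1" are assumptions, and the file name is expected to be lowercase so it forms a valid object name.

```shell
# Hypothetical helper: wraps a dashboard JSON file in a ConfigMap that the
# Grafana sidecar can discover through the grafana_dashboard label.
dashboard_configmap() {
  local file="$1" name
  name=$(basename "$file" .json)
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: dashboard-${name}
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  ${name}.json: |
EOF
  # Indent the JSON so it sits inside the YAML block scalar above.
  sed 's/^/    /' "$file"
}

# Possible usage (not run here):
#   dashboard_configmap manifests/grafana-dashboards/my-board.json | kubectl apply -f -
```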

What about secrets management?

The chart supports inline env, individual secret-backed env, and whole-Secret envFrom mounting. Sealed Secrets or External Secrets Operator are documented patterns; neither is pinned by default because the right choice is team-specific.

Does autoscaling on CPU cover real-world Next.js?

For most workloads, yes. For request-bound services with long-tail latency, the wiki includes a pattern for HPA on requests-per-second sourced from the ServiceMonitor via Prometheus Adapter.
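
A requests-per-second autoscaler of that kind could look roughly like the sketch below, which prints an autoscaling/v2 HorizontalPodAutoscaler using a Pods metric. It is not the wiki's pattern verbatim. The metric name http_requests_per_second is hypothetical and has to match whatever Prometheus Adapter is configured to expose, and the helper name and the 50 requests-per-second target are placeholders.

```shell
# Hypothetical helper: prints an HPA that scales a Deployment on a per-pod
# requests-per-second metric served through the custom metrics API.
rps_hpa() {
  local app="$1" target_rps="${2:-50}"
  cat <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ${app}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${app}
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "${target_rps}"
EOF
}

# Possible usage (not run here):
#   rps_hpa my-app 50 | kubectl apply -f -
```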

How are upgrades managed?

Upstream chart versions are pinned in scripts/install.sh and gitops/argocd. Bumping a version is a single edit, a re-run of the installer (or an Argo sync), and the e2e pytest suite to verify the chart still renders cleanly.
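
The single edit can be scripted. The sketch below rewrites one NAME_VERSION= line in place and refuses to run if the pin does not exist. The helper name is hypothetical, the version in the usage line is a placeholder rather than a recommendation, and the matching targetRevision under gitops/argocd still needs the same bump.

```shell
# Hypothetical helper: bump one pinned version in an installer-style script.
bump_pin() {
  local file="$1" name="$2" version="$3" tmp
  if ! grep -q "^${name}=" "$file"; then
    echo "no pin named ${name} in ${file}" >&2
    return 1
  fi
  tmp=$(mktemp)
  sed "s|^${name}=.*|${name}=${version}|" "$file" > "$tmp"
  # Write back through cat so the script keeps its executable bit.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Possible usage (not run here; the version is a placeholder):
#   bump_pin scripts/install.sh KPS_VERSION 65.2.0
```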

Stop yak-shaving the platform

Clone the repo, run the installer, deploy the chart. The same five commands every time, on every cluster.

Related projects

Part of a portfolio of production-shaped open-source repos.

SarmaLink-AI

MCP Server Toolkit

Voice Agent Starter

Agent Orchestrator

AI Eval Runner

Local LLM Router