O11Y on Kubernetes

I’ll show you how to set up your own monitoring stack on Kubernetes.

Most Kubernetes teams start with third-party observability platforms like Datadog or New Relic. They’re fast to set up and cover most day-to-day needs. But over time, you hit limits: opaque billing, vendor outages, or retention windows too short to keep raw logs and traces as long as you’d like. That’s when teams start looking at self-hosted monitoring.

The core open-source stack I’ve used is:

  • Prometheus for metrics
  • Loki for logs
  • Tempo for traces
  • Grafana to visualize everything

Together, they give you a vendor-neutral, fully customizable monitoring solution.

Deploying the stack

Helm makes installation straightforward:

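helm repo add grafana https://grafana.github.io/helm-charts
# Prometheus charts are published in the prometheus-community repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install loki grafana/loki-stack
# kube-prometheus-stack is the chart that uses the prometheusSpec values shown below
helm install prometheus prometheus-community/kube-prometheus-stack
helm install tempo grafana/tempo
# kube-prometheus-stack bundles its own Grafana by default; add
# --set grafana.enabled=false to the install above if you keep this separate release
helm install grafana grafana/grafana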

But real-world setups go beyond defaults. For example, you’ll almost always need persistence for Prometheus and Loki:

# values.yaml
# loki-stack chart values
loki:
  persistence:
    enabled: true
    size: 50Gi
# Prometheus Operator-style values (kube-prometheus-stack chart)
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 100Gi
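
Persistent volumes only help if growth is bounded, so I usually set retention next to the volume sizes. Here is a minimal sketch in the same values style, assuming the loki-stack chart for Loki and the kube-prometheus-stack chart for Prometheus. The durations and sizes are placeholder assumptions, and the exact Loki keys depend on the Loki version your chart ships, so check them against your chart's default values before applying.

```yaml
# values.yaml (continued): illustrative retention settings, not tuned numbers
loki:
  config:
    compactor:
      retention_enabled: true    # let the compactor delete expired chunks
    limits_config:
      retention_period: 168h     # assumed: keep 7 days of logs
prometheus:
  prometheusSpec:
    retention: 15d               # assumed time-based cap
    retentionSize: 90GB          # assumed size-based cap, kept below the 100Gi volume
```

Whichever limit Prometheus hits first wins, so the size cap acts as a safety net if cardinality grows faster than expected.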

Wiring it together

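  • Prometheus scrapes metrics from pods and services, discovered either through annotations such as prometheus.io/scrape or, with the Prometheus Operator, through ServiceMonitor and PodMonitor resources.
  • Loki collects logs through Promtail, usually run as a DaemonSet on every node or, less commonly, as a sidecar.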
  • Tempo ingests spans directly from instrumented apps (e.g. OpenTelemetry).
  • Grafana connects to all three, letting you pivot between metrics, logs, and traces in a single dashboard.
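
How you declare a scrape target depends on how Prometheus was installed. The sketch below shows both common styles for a hypothetical workload called my-app. The names, image, labels, and port are all assumptions for illustration. The annotations in option A are only a convention: they work when the scrape config is written to honor them, as the default config of the standalone prometheus chart is. Option B applies when the Prometheus Operator manages Prometheus, as the prometheusSpec values above suggest.

```yaml
# Option A: annotation-based discovery (convention, honored by the scrape config)
apiVersion: v1
kind: Pod
metadata:
  name: my-app                        # hypothetical workload
  labels:
    app: my-app
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
spec:
  containers:
    - name: my-app
      image: example.com/my-app:1.0   # hypothetical image
      ports:
        - name: http-metrics
          containerPort: 8080
---
# Option B: Prometheus Operator, via a ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  labels:
    release: prometheus               # assumed Helm release name; the chart's default selector matches on it
spec:
  selector:
    matchLabels:
      app: my-app                     # selects the Service (not the Pod) carrying this label
  endpoints:
    - port: http-metrics              # must match a named port on that Service
      path: /metrics
      interval: 30s
```

Option B assumes a Service labeled app: my-app already exposes the http-metrics port. Without it, the ServiceMonitor matches nothing and no target shows up.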

This triad makes debugging powerful: you see a spike in latency (Prometheus), jump into the logs (Loki), and correlate with a distributed trace (Tempo).
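
To make that pivot work, Grafana needs all three data sources, plus a link from log lines to traces. Below is a sketch of data source provisioning through the grafana/grafana chart's values. The Service URLs assume the release names used earlier and a single namespace, and the regex assumes your apps log a traceID=<id> field. Adjust both to your setup.

```yaml
# grafana-values.yaml: data source provisioning for the grafana/grafana chart
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        uid: prometheus
        access: proxy
        url: http://prometheus-operated:9090   # assumed Service name (created by the Prometheus Operator)
        isDefault: true
      - name: Loki
        type: loki
        uid: loki
        access: proxy
        url: http://loki:3100                  # assumed Service name and port
        jsonData:
          derivedFields:
            - name: TraceID
              matcherRegex: "traceID=(\\w+)"   # assumes log lines contain traceID=<id>
              url: "$${__value.raw}"           # $$ escapes variable expansion during provisioning
              datasourceUid: tempo             # links the extracted ID to the Tempo data source
      - name: Tempo
        type: tempo
        uid: tempo
        access: proxy
        url: http://tempo:3100                 # assumed Service name and port
```

Passing this file with -f to the Grafana release should give you a clickable trace link on any Loki log line that carries a trace ID.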

Lessons learned in practice

  • Raw logs can fill disks quickly. Use retention policies and S3-compatible backends for Loki/Tempo.
  • Exposing Prometheus endpoints without auth is a common security mistake. Lock them down.
  • The stack gives you raw data, but meaningful alerts require tuning (e.g. 99th percentile latency, error budgets).
  • Helm makes it easy to install, but version mismatches between components can break integrations. Test upgrades in staging.
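
As an illustration of the alert-tuning point, here is a sketch of a PrometheusRule with a p99 latency alert and a simple error-ratio alert. It assumes the Prometheus Operator is installed and that your apps export http_request_duration_seconds (a histogram) and http_requests_total with service and code labels. Those metric names, labels, and thresholds are assumptions, not something the stack provides out of the box.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: latency-and-errors            # hypothetical name
  labels:
    release: prometheus               # assumed Helm release name, so the operator picks the rule up
spec:
  groups:
    - name: slo-sketch
      rules:
        - alert: HighP99Latency
          # p99 over 5 minutes, per service, from histogram buckets
          expr: |
            histogram_quantile(
              0.99,
              sum by (le, service) (rate(http_request_duration_seconds_bucket[5m]))
            ) > 0.5
          for: 10m                    # must hold for 10 minutes before firing
          labels:
            severity: page
          annotations:
            summary: "p99 latency above 500ms for {{ $labels.service }}"
        - alert: HighErrorRatio
          # share of 5xx responses; 0.01 corresponds to a 99% availability target
          expr: |
            sum by (service) (rate(http_requests_total{code=~"5.."}[5m]))
              /
            sum by (service) (rate(http_requests_total[5m])) > 0.01
          for: 10m
          labels:
            severity: page
          annotations:
            summary: "5xx ratio above 1% for {{ $labels.service }}"
```

This is written as a raw manifest for kubectl apply. If you ship it inside a Helm chart template, the {{ $labels.service }} expressions need escaping so Helm does not try to render them.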

Takeaway

Running your own observability stack is not free — you trade vendor convenience for control. But if you care about cost transparency, long-term data retention, or avoiding lock-in, Loki + Prometheus + Tempo + Grafana is a proven combination. Once configured, it gives your team deep visibility into Kubernetes workloads, and you’re not at the mercy of a third-party SaaS when things go wrong.