21 Feb 2018

O11Y on Kubernetes

I’ll show you how to set up your own monitoring stack on Kubernetes.

Most Kubernetes teams start with third-party observability platforms like Datadog or New Relic. They're fast to set up and cover most needs out of the box. Over time, though, you hit limits: opaque billing, vendor outages, or retention caps that keep you from storing raw logs and traces as long as you'd like. That's when teams start looking at self-hosted monitoring.

The core open-source stack I’ve used is:

Prometheus for metrics
Loki for logs
Tempo for traces
Grafana to visualize everything

Together, they give you a vendor-neutral, fully customizable monitoring solution.

Deploying the stack

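Helm makes installation straightforward. Note that the Prometheus chart lives in the prometheus-community repository, not the grafana one:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install loki grafana/loki-stack -f loki-values.yaml
helm install prometheus prometheus-community/kube-prometheus-stack -f prometheus-values.yaml
helm install tempo grafana/tempo
helm install grafana grafana/grafana

But real-world setups go beyond defaults. For example, you'll almost always need persistent storage for Prometheus and Loki. Each Helm release reads its own values file, so the settings are split in two:

# loki-values.yaml (grafana/loki-stack)
loki:
  persistence:
    enabled: true
    size: 50Gi

# prometheus-values.yaml (prometheus-community/kube-prometheus-stack)
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi
grafana:
  enabled: false  # kube-prometheus-stack bundles Grafana; it is installed as its own release above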

Wiring it together

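Prometheus discovers pods and services through the Kubernetes API and scrapes their metrics endpoints. Which targets it scrapes is decided by ServiceMonitor/PodMonitor resources (with the Prometheus Operator) or by prometheus.io/* annotations (with annotation-based scrape configs).
Promtail, usually deployed as a DaemonSet (sometimes as a sidecar), tails container logs and pushes them to Loki.
Tempo receives spans from instrumented apps, e.g. over OTLP from the OpenTelemetry SDKs.
Grafana connects to all three as data sources, letting you pivot between metrics, logs, and traces in a single dashboard.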
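To make the wiring concrete, here is a minimal sketch of an application opting into both metrics and traces. It assumes a kube-prometheus-stack release named prometheus (whose default selector only picks up ServiceMonitors labeled release: prometheus), a Tempo service named tempo in the same namespace with its OTLP gRPC receiver enabled on port 4317, and a hypothetical app called checkout that serves /metrics on port 8080. The image name is a placeholder.

```yaml
# checkout.yaml: hypothetical app "checkout" that serves Prometheus metrics on
# port 8080 (/metrics) and exports traces to Tempo over OTLP/gRPC.
apiVersion: v1
kind: Service
metadata:
  name: checkout
  labels:
    app: checkout            # matched by the ServiceMonitor selector below
spec:
  selector:
    app: checkout
  ports:
    - name: http-metrics     # ServiceMonitor endpoints refer to this port name
      port: 8080
      targetPort: 8080
---
# ServiceMonitor is a Prometheus Operator CRD installed by kube-prometheus-stack.
# The release label must match the Helm release name ("prometheus" here) for the
# chart's default ServiceMonitor selector to pick it up.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: checkout
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: checkout
  endpoints:
    - port: http-metrics
      path: /metrics
      interval: 30s
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout
spec:
  replicas: 1
  selector:
    matchLabels:
      app: checkout
  template:
    metadata:
      labels:
        app: checkout
    spec:
      containers:
        - name: checkout
          image: registry.example.com/checkout:1.0   # placeholder image
          ports:
            - containerPort: 8080
          env:
            # Standard OpenTelemetry SDK environment variables.
            - name: OTEL_SERVICE_NAME
              value: checkout
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: http://tempo:4317              # assumed Tempo OTLP gRPC endpoint
            - name: OTEL_EXPORTER_OTLP_PROTOCOL
              value: grpc
```

Apply it with kubectl apply -f checkout.yaml. A ServiceMonitor keeps scrape settings next to the app's own manifests, while the Operator owns the generated Prometheus configuration.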

This triad makes debugging much faster: you spot a latency spike in Prometheus, jump to the Loki logs for the same time window, and follow the request's distributed trace in Tempo to see which span was slow.
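Grafana only offers that pivot once the three backends are registered as data sources. Below is a sketch of a values file for the grafana/grafana chart. The service URLs assume the release names loki, prometheus (kube-prometheus-stack) and tempo, all in one namespace, so check them with kubectl get svc. The traceID=<id> log format is an assumption about how your apps log. The derivedFields entry turns trace IDs found in Loki log lines into links to the matching Tempo trace.

```yaml
# grafana-values.yaml (grafana/grafana chart): provisions the three data sources.
# Service names and ports below are assumptions; adjust to your cluster.
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        uid: prometheus
        access: proxy
        url: http://prometheus-kube-prometheus-prometheus:9090
        isDefault: true
      - name: Loki
        type: loki
        uid: loki
        access: proxy
        url: http://loki:3100
        jsonData:
          derivedFields:
            # Extract "traceID=<id>" from log lines and link it to Tempo.
            - name: TraceID
              datasourceUid: tempo
              matcherRegex: "traceID=(\\w+)"
              url: "$${__value.raw}"   # $$ escapes Grafana's $-variable expansion
      - name: Tempo
        type: tempo
        uid: tempo
        access: proxy
        url: http://tempo:3100
```

Pass it with helm install grafana grafana/grafana -f grafana-values.yaml (or helm upgrade if Grafana is already running). The doubled $$ is needed because Grafana expands $-style environment variables in provisioning files.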

Lessons learned in practice

Raw logs can fill disks quickly. Set retention policies and use S3-compatible object storage as the backend for Loki and Tempo.
Exposing Prometheus endpoints without authentication is a common security mistake. Lock them down.
The stack gives you raw data, but meaningful alerts take tuning (e.g. 99th percentile latency, error budgets).
Helm makes installation easy, but version mismatches between components can break integrations. Test upgrades in staging.
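Two of these lessons translate directly into manifests. The sketch below assumes a kube-prometheus-stack release named prometheus and a hypothetical service named checkout that exposes a Prometheus histogram called http_request_duration_seconds; the 500ms threshold is a placeholder to tune. The first document is a PrometheusRule that alerts on p99 latency. The second is a NetworkPolicy, which restricts who can reach Prometheus at the network level rather than adding authentication, and only takes effect if your CNI plugin enforces NetworkPolicies. The pod labels are assumptions; verify them with kubectl get pods --show-labels.

```yaml
# PrometheusRule (Prometheus Operator CRD): alert when p99 latency of the
# hypothetical "checkout" job stays above 500ms for 10 minutes.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: checkout-latency
  labels:
    release: prometheus   # matches kube-prometheus-stack's default rule selector
spec:
  groups:
    - name: checkout.latency
      rules:
        - alert: CheckoutHighP99Latency
          expr: |
            histogram_quantile(0.99,
              sum by (le) (rate(http_request_duration_seconds_bucket{job="checkout"}[5m]))
            ) > 0.5
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "checkout p99 latency above 500ms for 10 minutes"
---
# NetworkPolicy: only Grafana (and Prometheus itself) may reach port 9090 on
# the Prometheus pods; all other ingress to those pods is denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: prometheus-ingress
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: prometheus
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: grafana
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: prometheus   # lets Prometheus scrape itself
      ports:
        - protocol: TCP
          port: 9090
```

Add further from entries for any other client that needs to query Prometheus. For actual authentication, put a proxy or Prometheus's own web configuration in front of the API as well.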

Takeaway

Running your own observability stack is not free — you trade vendor convenience for control. But if you care about cost transparency, long-term data retention, or avoiding lock-in, Loki + Prometheus + Tempo + Grafana is a proven combination. Once configured, it gives your team deep visibility into Kubernetes workloads, and you’re not at the mercy of a third-party SaaS when things go wrong.