
Build incident timelines with logs, metrics, and traces without making up the story

A practical workflow for reconstructing incidents when the dashboard says one thing, the logs say another, and the on-call engineer trusts none of them.

Recent ecosystem signals point in the same direction: more telemetry does not automatically produce better diagnosis. This guide turns that observation into a repeatable method for building trustworthy timelines with Prometheus, Loki, and OpenTelemetry, while avoiding current traps such as misleading memory metrics, noisy labels, and traces that lack useful request context.

Created: April 19, 2026

Published: April 19, 2026

Estimated time: 42 min
Level: Intermediate
Before you start: Read-only access to Prometheus, Loki, and your tracing backend
Platforms: Linux / Docker

Docker

Use this in a lab or self-contained Docker Compose stack to repeat the analysis without depending on the host shell.

  • docker compose
  • Running prometheus, loki, and tempo containers or equivalents
  • Permission to run internal queries
Run the error query from the Prometheus container
docker compose exec prometheus wget -qO- 'http://localhost:9090/api/v1/query?query=sum(rate(http_server_requests_seconds_count%7Bservice%3D%22checkout%22%2Cstatus%3D~%225..%22%7D%5B5m%5D))'
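Query Loki for timeout logs from the local stack (log queries go through query_range; the instant query endpoint accepts only metric queries)
docker compose exec loki wget -qO- 'http://localhost:3100/loki/api/v1/query_range?query=%7Bservice_name%3D%22checkout%22%7D%20%7C%3D%20%22timeout%22'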
Search Tempo for failing traces (HTTP status 500 or higher) in Docker
docker compose exec tempo wget -qO- 'http://localhost:3200/api/search?q=%7B%20resource.service.name%20%3D%20%22checkout%22%20%26%26%20span.http.status_code%20%3E%3D%20500%20%7D'
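
The percent-encoded URLs above are hard to read and easy to break when editing by hand. As a minimal sketch, assuming a POSIX shell and ASCII-only queries, the helper below keeps each query in readable form and encodes it just before the request. The function name urlencode and the variable names are hypothetical, not part of Prometheus, Loki, or Tempo.

```sh
#!/bin/sh
# urlencode: percent-encode a string for use as a URL query value.
# Hypothetical helper; assumes ASCII input (multi-byte characters are not handled).
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character itself
    s=$rest
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;            # unreserved characters stay as-is
      *) out=$out$(printf '%%%02X' "'$c") ;;    # everything else becomes %HH
    esac
  done
  printf '%s\n' "$out"
}

# Readable forms of the three queries used above.
promql='sum(rate(http_server_requests_seconds_count{service="checkout",status=~"5.."}[5m]))'
logql='{service_name="checkout"} |= "timeout"'
traceql='{ resource.service.name = "checkout" && span.http.status_code >= 500 }'

# Example: print the full Prometheus URL for the error-rate query.
printf 'http://localhost:9090/api/v1/query?query=%s\n' "$(urlencode "$promql")"
```

This helper also encodes parentheses as %28 and %29, which the commands above leave literal; both forms decode to the same query. The printed URL can be pasted into the docker compose exec commands in place of the hand-encoded one.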
