All guides

Filter by category, difficulty, or free text to find the right material for your team.

Logs · ~40 min

Build incident timelines from logs, metrics, and traces without making up the missing parts

Created: May 2, 2026 · Published: May 2, 2026

Learn how to reconstruct a real incident from Prometheus, Loki, and distributed traces without getting lost in noise, clock skew, or confident storytelling.

Linux · Docker
Intermediate
Logs · ~40 min

Debug Vector pipelines before retries and buffers hide the real bottleneck

Created: May 1, 2026 · Published: May 1, 2026

A practical guide to using Vector internal metrics, config validation, and isolation tests when logs arrive late, get retried too often, or vanish.

Linux · Docker
Intermediate
Logs · ~40 min

Fix Loki label explosion without breaking the searches that actually matter

Created: April 29, 2026 · Published: April 29, 2026

A practical guide to spotting high-cardinality labels in Loki, taking them out of the hot path, and proving your searches still work.

Linux · Docker
Advanced
Logs · ~40 min

Diagnose hot shards in OpenSearch before latency and indexing queues spiral

Created: April 27, 2026 · Published: April 27, 2026

Learn how to confirm an OpenSearch hot shard, find the affected index and node, fix the actual cause, and validate that recovery is real.

Linux · Docker
Advanced
Logs · ~42 min

Build incident timelines with logs, metrics, and traces without making up the story

Created: April 19, 2026 · Published: April 19, 2026

More telemetry does not automatically produce better diagnosis. This guide offers a repeatable method for building trustworthy timelines with Prometheus, Loki, and OpenTelemetry while avoiding current traps such as misleading memory metrics, noisy labels, and traces that lack useful request context.

Linux · Docker
Intermediate
Logs · ~55 min

Resolve hot shards in OpenSearch before the cluster starts melting

Created: April 19, 2026 · Published: April 19, 2026

An advanced guide to isolating hot shards in OpenSearch with node, shard, and ingest signals, then applying reversible mitigations before queues, timeouts, and backlogs take over.

Linux · Docker
Advanced
Logs · ~24 min

Debug Vector pipelines when logs arrive late, broken, or not at all

Created: April 17, 2026 · Published: April 17, 2026

When a Vector pipeline starts delaying, duplicating, or dropping events, random tuning is usually the expensive path. This guide shows how to use internal metrics, config validation, and sink-side signals to find the real bottleneck and fix it with reversible changes.

Docker · Linux
Intermediate
Logs · ~60 min

What to do when Loki buckles under a label cardinality explosion

Created: April 13, 2026 · Published: April 13, 2026

An actionable guide to detecting and fixing high-cardinality labels that degrade or crash Loki: symptoms, metrics and logs to inspect, safe Promtail and ingest changes, and validation steps.

Docker · Linux
Advanced
Logs · ~28 min

Size OpenSearch shards from real ingestion

Created: April 9, 2026 · Published: April 9, 2026

An advanced guide to choosing `number_of_shards` and `max_size` from an index's real ingestion rate.

Advanced
Logs · ~18 min

OpenSearch for centralized logs (cross-platform)

Created: April 6, 2026 · Published: April 6, 2026

Configure OpenSearch and Dashboards, load initial documents, and validate operational log-search workflows on any platform.

Docker
Beginner