An operational knowledge base for self-hosted Temporal clusters. It contains Grafana dashboards, Grafana alert provisioning YAMLs, operational playbooks, and a dynamic config reference, covering both the Temporal server and all Temporal SDKs.
Everything here is designed to be used directly: drop dashboards into Grafana, drop alert YAMLs into provisioning/alerting/, and follow playbooks against a real cluster.
Community feedback and contributions are always welcome — if something doesn't work in your environment, a threshold feels off, or you have operational knowledge worth sharing, open an issue or PR.
- Server Dashboards — Grafana dashboards for monitoring a self-hosted Temporal Server cluster, including:
  - Server Overview Dashboard — cluster health, throughput, persistence, and service metrics
  - Standby Cluster Dashboard — replication health, lag, and failover readiness for standby clusters
  - History Host Health Dashboard — per-pod `host_health` gauge, NOT_SERVING detection, and fleet-level aggregation
  - Shard IO Concurrency Dashboard — shard IO semaphore health, DB prerequisite check, and decision guide for tuning `history.shardIOConcurrency` (SQL backends only)
- SDK Dashboards — Grafana dashboards for monitoring Temporal SDK clients and workers (Java, Go, TypeScript, Python, .NET, Ruby).
- Troubleshooting Dashboards — Grafana dashboards focused on troubleshooting specific Temporal operational issues.
- Server Alerts — Grafana alerting provisioning rules for a self-hosted Temporal Server cluster. Covers the essential alert set plus dual visibility store alerts. Each alert links to a runbook with diagnosis and recovery steps.
- SDK Alerts — Grafana alerting provisioning rules for Temporal SDK clients and workers. One YAML per SDK reporter (Java Micrometer, Java OTel, Go, Core). Each alert links to a runbook with diagnosis and recovery steps.
- Metrics References — per-metric reference docs for the Temporal server and all SDKs (Go, Java, and the Core-based SDKs).
Production-ready operational playbooks for self-hosted Temporal clusters. Each playbook has been tested against a real cluster and cross-references the specific dashboard panels and alert rules that surface its signals.
Dynamic config reference for the OSS Temporal server, with sample dynamic config YAMLs and troubleshooting information.
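As a minimal sketch of the file format such samples use, the snippet below overrides `history.shardIOConcurrency`, the setting the Shard IO Concurrency Dashboard helps tune. It assumes the server uses the file-based dynamic config client, with `dynamicConfigClient.filepath` in the static server config pointing at this file. The value `4` is illustrative only, not a recommendation; check the dashboard's decision guide first, and note the setting applies to SQL backends only.

```yaml
# Illustrative file-based dynamic config entry.
# Each key maps to a list of values; an entry with empty
# constraints applies cluster-wide.
history.shardIOConcurrency:
  - value: 4 # illustrative value only; validate with the Shard IO Concurrency Dashboard first
    constraints: {}
```

The file-based client re-reads this file on its poll interval (`dynamicConfigClient.pollInterval`), so edits are picked up without redeploying the config file mount.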
- temporal-etcd-dynconfig — etcd-backed dynamic config client for Temporal
- temporal-configmap-dynconfig — Kubernetes ConfigMap-backed dynamic config client for Temporal
- temporal-helm-superchart — Helm super-chart wrapping the upstream Temporal chart with a full observability stack