Service Health & Deployment Diagnostics

Automate health checks, resource diagnostics, and deployment visibility for your services. Get real-time insights and on-call notifications via Slack.
TL;DR
This runbook automates a full diagnostic sweep for your services: health checks, deployment visibility, resource monitoring, and latency and error-rate analysis. After execution, it posts a detailed summary to Slack for on-call awareness.
Who is this for?
Site Reliability Engineers (SREs), platform engineers, and DevOps professionals responsible for infrastructure uptime, performance monitoring, and incident response.
What problem does this solve?
Without automation, checking service health, deployment activity, and performance metrics across multiple platforms (Datadog, Kubernetes, GitHub) is slow and error-prone.
This runbook addresses:
- Manual health verification across tools
- Lag in detecting performance issues
- Delays in team notification during failures or anomalies
What this workflow accomplishes
- Verifies API health of your services
- Checks for new deployments within the past 20 minutes
- Audits CPU and memory usage using Datadog and Kubernetes
- Identifies latency and error rate anomalies
- Scans recent logs for application errors
- Sends structured summary reports to a Slack channel with on-call alerts
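The 20-minute deployment check above can be sketched as a small filter. This is a minimal illustration, not the runbook's actual implementation: it assumes each deployment record carries an ISO 8601 `created_at` timestamp (as GitHub's deployments API returns), and the function names `parse_timestamp` and `recent_deployments` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2025-05-30T12:00:00Z'."""
    # Older Python versions reject a trailing 'Z', so normalize it to an offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def recent_deployments(deployments, now, window_minutes=20):
    """Return the deployments created within the last `window_minutes`.

    `deployments` is assumed to be a list of dicts with a `created_at` key;
    `now` must be a timezone-aware datetime.
    """
    cutoff = now - timedelta(minutes=window_minutes)
    return [
        d for d in deployments
        if cutoff <= parse_timestamp(d["created_at"]) <= now
    ]


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    now = datetime(2025, 5, 30, 12, 0, tzinfo=timezone.utc)
    sample = [
        {"id": 1, "created_at": "2025-05-30T11:45:00Z"},
        {"id": 2, "created_at": "2025-05-30T11:30:00Z"},
    ]
    print(recent_deployments(sample, now))
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the window check easy to test.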
Integrations
This runbook uses the following integrations:
- GitHub Agent: Checks for recent deployments
- Datadog Agent: Collects resource usage, latency, and error metrics
- Kubernetes Agent: Retrieves pod-level resource usage
- Slack Agent: Sends detailed runbook results and alerts
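As a rough sketch of the final Slack step, the results from the other agents could be folded into one message body. The shape of `findings` and the function name `build_slack_summary` are assumptions made for illustration; the returned dict uses the `channel` and `text` fields that Slack's `chat.postMessage` method accepts, but nothing is sent here.

```python
def build_slack_summary(channel, service, findings):
    """Build a chat.postMessage-style payload from diagnostic findings.

    `findings` is assumed to map a check name to an (ok, detail) tuple,
    for example {"API health": (True, "200 OK")}.
    """
    lines = [f"Diagnostics summary for {service}"]
    for name, (ok, detail) in findings.items():
        status = "OK" if ok else "ALERT"
        lines.append(f"- {name}: {status} ({detail})")

    # Call out failing checks on a final line so on-call engineers see them.
    failing = [name for name, (ok, _) in findings.items() if not ok]
    if failing:
        lines.append("On-call attention needed: " + ", ".join(failing))

    return {"channel": channel, "text": "\n".join(lines)}


if __name__ == "__main__":
    # Hypothetical results for illustration only.
    payload = build_slack_summary(
        "#random",
        "checkout-service",
        {
            "API health": (True, "200 OK"),
            "Error rate": (False, "above threshold"),
        },
    )
    print(payload["text"])
```

Keeping payload construction separate from the network call makes the message format easy to review and test before wiring it to a real bot token.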
Setup
- GitHub:
  - Repo access token with deployment read permissions
- Datadog:
  - Valid API key and App key
  - Required scopes: Metrics read, Logs read
- Kubernetes:
  - Bearer Token: a long-lived ServiceAccount token
  - Cluster CA Certificate: the cluster's root certificate for TLS verification
  - API Server URL: the Kubernetes cluster endpoint URL
  - Access to query pod metrics in the workspaces namespace
- Slack:
  - Bot token with chat:write permission
  - Channel ID: #random
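Before running the sweep, it can help to confirm that every credential listed above is present. The sketch below assumes the values are supplied as environment variables; all variable names (`GITHUB_TOKEN`, `DD_API_KEY`, `K8S_BEARER_TOKEN`, and so on) are hypothetical placeholders, not names the runbook requires.

```python
import os

# Hypothetical environment variable names, grouped by integration.
REQUIRED_SETTINGS = {
    "github": ["GITHUB_TOKEN"],
    "datadog": ["DD_API_KEY", "DD_APP_KEY"],
    "kubernetes": ["K8S_BEARER_TOKEN", "K8S_CA_CERT_PATH", "K8S_API_SERVER_URL"],
    "slack": ["SLACK_BOT_TOKEN", "SLACK_CHANNEL_ID"],
}


def missing_settings(env):
    """Return {integration: [missing names]} for unset or empty settings."""
    missing = {}
    for integration, names in REQUIRED_SETTINGS.items():
        absent = [name for name in names if not env.get(name)]
        if absent:
            missing[integration] = absent
    return missing


if __name__ == "__main__":
    problems = missing_settings(os.environ)
    if problems:
        for integration, names in problems.items():
            print(f"{integration}: missing {', '.join(names)}")
    else:
        print("All integration settings are present.")
```

A check like this only confirms that values exist; it does not verify that the tokens carry the scopes and permissions listed in the setup steps.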
Runbook Template
Alexis Warner, Marketing
May 30, 2025 • 5 min read
Categories: devops, observability, diagnostics, alerts, slack, github, datadog
2025 © Bearify All Rights Reserved