How to Automate an Uptime Display in Statuspage via Synthetic Monitors in Datadog in 4 Steps

I was given the task of converting a 1/0 metric on our Statuspage page, pulled from Datadog Metrics using the Route53 health check, into an actual percentage uptime display in Statuspage, or at least something similarly meaningful to the end user.

First stop: Service Level Objectives

While browsing our current monitors and dashboards, one thing that stood out was "service level objectives" (SLOs). Combined with synthetics, they provide an uptime percentage over a period of time that can be embedded on a dashboard. (We'll come back to synthetics in a different approach.)

SLO Synthetics Uptime Display in Datadog

Next stop: Trying to embed those SLOs

The System Metrics integration on the Statuspage side seems to be built only for flat, point-in-time queries, not for values aggregated over days or weeks. An aws.route53.health_check_status query that produced either a 1 or a 0 at any given point in time was fine, but "querying" for 24-hour or 90-day uptime was a different story (impossible via the direct integration between the two apps?).
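To see why a point-in-time 1/0 value is not enough, note that uptime over a window is just the share of health-check samples that were 1 inside that window. The sketch below assumes evenly spaced samples (for example, one per minute), so each sample stands for an equal slice of time; the function name and sample data are hypothetical, and this is not how Datadog or Statuspage compute uptime internally.

```python
from datetime import datetime, timedelta


def uptime_percent(samples, window_end, window):
    """Percentage of 1/0 health-check samples equal to 1 within (window_end - window, window_end].

    Assumes samples are evenly spaced, so every sample represents the same amount of time.
    """
    window_start = window_end - window
    in_window = [value for ts, value in samples if window_start < ts <= window_end]
    if not in_window:
        raise ValueError("no samples in the requested window")
    return 100.0 * sum(in_window) / len(in_window)


if __name__ == "__main__":
    # Hypothetical data: one sample per minute for a day, down (0) for the most recent 36 minutes.
    end = datetime(2024, 1, 2)
    samples = [(end - timedelta(minutes=i), 0 if i < 36 else 1) for i in range(1440)]
    print(uptime_percent(samples, end, timedelta(hours=24)))  # 97.5
    print(uptime_percent(samples, end, timedelta(hours=1)))   # 40.0
```

The same 36-minute outage reads as 97.5% over a day but 40% over the last hour, which is why the window has to be part of the query rather than a single 1 or 0.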

Third stop: UptimeRobot and Similar

Jyll over @ suggested experimenting with UptimeRobot and similar services on my own free instance of Statuspage. Stripping away the extra configuration and feeding a simple up/down email or webhook to Statuspage brought me back to an idea: could I email or webhook synthetic alerts from Datadog to Statuspage? (Spoiler: You can!)

Final stop (and the actual steps needed!): Automating Datadog to Send Status to Statuspage for an Uptime Display

  1. Add a component in your Statuspage account.
  2. Click the "Automation" button to get the component's automation email, and copy that email.
Click the Automation button to reveal your automation email

  3. Create a synthetic monitor that checks a heartbeat route, if you don't already have one.

  4. Go to your synthetic monitor in Datadog. Under Step 6, "Notify your team", your monitor name needs to use the template variables {{#is_alert}}DOWN{{/is_alert}}{{#is_recovery}}UP{{/is_recovery}} so that Statuspage automation can understand the message. The rest of the monitor name is irrelevant, as long as DOWN or UP isn't a fixed part of it!

The automation email needs to be mentioned in the message body with an @ in front of it.

Monitor alert settings
No, that’s not a valid automation email.
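To make step 4 concrete, here is a simplified stand-in for how the conditional blocks resolve, so you can see which word Statuspage would receive in each state. It is a hypothetical emulation, not Datadog's template engine: it only handles is_alert and is_recovery, the monitor name suffix "API heartbeat" and the address component-automation@example.com are placeholders, and the standalone-word check is an assumption about what "containing UP or DOWN" means.

```python
import re

# Hypothetical monitor name; only the conditional blocks matter for the UP/DOWN keyword.
MONITOR_NAME = "{{#is_alert}}DOWN{{/is_alert}}{{#is_recovery}}UP{{/is_recovery}} API heartbeat"
# Hypothetical placeholder; use the automation email copied from your component, prefixed with @.
MESSAGE = "Heartbeat check changed state. @component-automation@example.com"

# Matches {{#is_alert}}...{{/is_alert}} or {{#is_recovery}}...{{/is_recovery}}.
_BLOCK = re.compile(r"\{\{#(is_alert|is_recovery)\}\}(.*?)\{\{/\1\}\}", re.DOTALL)


def render(template, state):
    """Keep a conditional block's body only if it matches state ("alert" or "recovery")."""
    def keep_if_active(match):
        return match.group(2) if match.group(1) == f"is_{state}" else ""
    return _BLOCK.sub(keep_if_active, template)


def keywords_in(subject):
    """Return which of UP / DOWN appear as standalone words in the rendered name."""
    return {kw for kw in ("UP", "DOWN") if re.search(rf"\b{kw}\b", subject)}


if __name__ == "__main__":
    for state in ("alert", "recovery"):
        name = render(MONITOR_NAME, state)
        print(state, "->", name, keywords_in(name))
    # A name with a fixed "UP" renders with both words when alerting, which is ambiguous.
    bad = "UP check {{#is_alert}}DOWN{{/is_alert}}{{#is_recovery}}UP{{/is_recovery}}"
    print(keywords_in(render(bad, "alert")))
```

The last line shows why DOWN or UP should never be a fixed part of the name: in the alert state both words end up in the subject.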
