I was given the task of converting a 1/0 metric on our Statuspage.io page, fed from Datadog Metrics by the Route53 health check, into an actual uptime percentage display, or at least something similarly meaningful to the end user.
First stop: Service Level Objectives
When browsing around our current monitors and dashboards, one thing that stood out was “service level objectives.” In combination with synthetics, they provide an uptime percentage over a period of time that can be embedded on the dashboard. (We’ll come back to synthetics with a different approach.)
Next stop: Trying to embed those SLOs
The System Metrics integration on the statuspage.io side seems to be built only for flat, point-in-time queries, not for values aggregated over days or weeks. An aws.route53.health_check_status query that produced either a 1 or a 0 at any given point in time was fine, but coming up with a way to “query” for a 24-hour or 90-day uptime was a different story (impossible to do via direct integration between the two apps?).
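To make the gap concrete, here is a minimal sketch of the aggregation the direct integration did not seem to offer: turning a window of 1/0 health check samples into an uptime percentage. The function name, the 5-minute interval, and the sample data are all made up for illustration; nothing here calls Datadog or Statuspage.

```python
def uptime_percentage(samples):
    """Return the share of samples equal to 1 (up), as a percentage."""
    if not samples:
        raise ValueError("need at least one sample")
    up = sum(1 for s in samples if s == 1)
    return 100.0 * up / len(samples)


# Hypothetical day of 5-minute checks: 288 samples, 3 of them down.
day = [1] * 285 + [0] * 3
print(f"{uptime_percentage(day):.2f}%")  # prints 98.96%
```

A point-in-time query only ever sees one of those samples, which is why it can show 1 or 0 but not a 24-hour or 90-day figure.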
Third stop: UptimeRobot and Similar
Jyll over @ Veracity.net suggested experimenting with Uptime Robot and similar services against my own free instance of StatusPage. Stripping away the extra configuration and feeding a simple up/down email or webhook to statuspage.io brought me back to the idea of checking whether I could email or webhook synthetic alerts from Datadog to Statuspage. (Spoiler: You can!)
Final stop (and the actual steps needed!): Automating Datadog to Send Status to get Uptime Display in StatusPage.io
1. Add a component in your statuspage.io account
2. Click on the “Automation” button to get the automation email. Copy that email address.
3. (Create a synthetic monitor that checks a heartbeat route if you don’t already have one)
4. Go to your synthetic monitor in Datadog. Under Step 6, “Notify your team”, your monitor name needs to use the template variables {{#is_alert}}DOWN{{/is_alert}}{{#is_recovery}}UP{{/is_recovery}} for statuspage automation to understand the message. The rest of the monitor name is irrelevant (as long as DOWN or UP isn’t a fixed part of that name!).
The automation email needs to be mentioned in the message body with an @ in front of it.
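As a rough illustration of the steps above, the sketch below builds a monitor name and message body and shows what the name resolves to in each state. The email address, the "heartbeat check" suffix, and the `render` helper are placeholders of mine; `render` is only a toy stand-in for Datadog's conditional template blocks, not Datadog's actual rendering code.

```python
import re

# Placeholder: paste the automation address copied from your component.
AUTOMATION_EMAIL = "your-automation-address@example.com"

# Monitor name: the conditional blocks emit DOWN on alert and UP on recovery.
# The trailing text is arbitrary, as long as it never contains DOWN or UP.
MONITOR_NAME = (
    "{{#is_alert}}DOWN{{/is_alert}}"
    "{{#is_recovery}}UP{{/is_recovery}} heartbeat check"
)

# Message body: mention the automation address with an @ in front of it.
MONITOR_MESSAGE = f"Heartbeat route check. @{AUTOMATION_EMAIL}"


def render(template, state):
    """Toy renderer: keep the body of the block matching `state`, drop the rest."""
    def keep_or_drop(match):
        return match.group(2) if match.group(1) == state else ""
    return re.sub(r"\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}", keep_or_drop, template)


print(render(MONITOR_NAME, "is_alert"))     # DOWN heartbeat check
print(render(MONITOR_NAME, "is_recovery"))  # UP heartbeat check
print(MONITOR_MESSAGE)
```

The point of the rendering step is just to check that exactly one of DOWN or UP ends up in the name for each state, which is what the statuspage automation keys on.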