Docs site outage response playbook

Hillary Fraley edited this page Nov 22, 2022 · 2 revisions

Sensu docs are monitored using the Sensu CR Monitoring Caviar project. See also the live demo.

When you receive a PagerDuty notification that the Sensu docs site is down, first visit docs.sensu.io to confirm the site is not working.
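If you prefer to confirm the outage from a terminal, the following Python sketch runs the same check using only the standard library. It treats any 2xx or 3xx status as healthy. It counts 4xx/5xx responses, DNS failures, refused connections, and timeouts as down. That threshold and the 10-second timeout are assumptions. They are not taken from the PagerDuty monitor's configuration.

```python
import urllib.error
import urllib.request

DOCS_URL = "https://docs.sensu.io"  # the site named in this playbook


def is_healthy_status(status: int) -> bool:
    """Treat any 2xx or 3xx HTTP status as "site is up" (assumed threshold)."""
    return 200 <= status < 400


def is_site_up(url: str = DOCS_URL, timeout: float = 10.0) -> bool:
    """Return True if a GET request to `url` gets a healthy response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_healthy_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses, e.g. a 503 from Heroku.
        return is_healthy_status(err.code)
    except OSError:
        # URLError, socket timeouts, and connection errors are all OSError
        # subclasses; any of them means the site is not reachable.
        return False
```

Calling `is_site_up()` sends one request to docs.sensu.io and returns True or False.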

If the site is in fact down, post in the #sensu-alerts Slack channel to confirm that you are responding to the PagerDuty notification. Then, redeploy the site with Heroku:

  1. Log in to the Sensu docs site Heroku app.
  2. Click the Deploy tab.
  3. Scroll down to "Manual Deploy" and click Deploy Branch.
  4. Wait for the docs site to build.
  5. Visit docs.sensu.io to confirm that the manual deployment fixed the problem.
  6. Post to the #sensu-alerts Slack channel to confirm that you've resolved the outage.
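Steps 4 and 5 amount to waiting for the build and then checking the site again. Below is a minimal polling sketch in Python. It assumes you pass in any zero-argument function that returns True when docs.sensu.io responds normally. The attempt count and 30-second delay are arbitrary defaults, not values from Heroku.

```python
import time
from typing import Callable


def wait_until_up(
    check: Callable[[], bool],
    attempts: int = 10,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` up to `attempts` times, pausing between tries.

    Returns True as soon as `check()` reports the site is up, or False if
    every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            # Give the Heroku build and release time to finish before retrying.
            sleep(delay_seconds)
    return False
```

The `sleep` parameter is injectable so you can exercise the loop without actually waiting.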

If the manual deployment did not fix the problem:

  1. In the Sensu docs site Heroku app, click the Resources tab.
  2. Scroll down to the "Add-ons" list and click Papertrail.
  3. In the Papertrail log window, search for the word "error".
  4. Read through the search results and find the error that matches the date and time of the docs site failure.
  5. Find the error code in the list of Heroku error codes.
  6. Post the Papertrail log entry for the error, along with a link to the relevant error code information, in the #alerts and #reliability Slack channels.
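If you export the Papertrail results or copy them into a file, you can approximate steps 3 through 5 offline. This Python sketch keeps only lines that contain "error" and fall within a time window around the outage. It also pulls out any Heroku error code written as `code=H10`, which is the form Heroku router log lines use (for example `at=error code=H12 desc="Request timeout"`). Two details are assumptions about your export, not Papertrail's display format: the ISO 8601 timestamp at the start of each line and the 15-minute window. Adjust the parsing to match the data you actually have.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

# Heroku error codes look like H10, R14, or L10: a letter for the error class
# (HTTP, runtime, logging) followed by two digits.
HEROKU_CODE = re.compile(r"\bcode=([HRL]\d{2})\b")


def find_errors_near(
    lines: Iterable[str],
    outage_time: datetime,
    window: timedelta = timedelta(minutes=15),
) -> List[Tuple[datetime, Optional[str], str]]:
    """Return (timestamp, heroku_code, line) for "error" lines near the outage.

    Assumes each line starts with an ISO 8601 timestamp followed by a space
    (a hypothetical export format).
    """
    matches = []
    for line in lines:
        if "error" not in line.lower():
            continue
        stamp, _, _rest = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip lines without a parseable timestamp
        if abs(when - outage_time) <= window:
            code_match = HEROKU_CODE.search(line)
            code = code_match.group(1) if code_match else None
            matches.append((when, code, line))
    return matches
```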

If you do not find an error in the Papertrail log or the error code indicates that Heroku is having issues:

  1. Check the Heroku status page to confirm the problem.
  2. Post a link to the Heroku status page information in the #alerts and #reliability Slack channels.
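If the team has Slack incoming webhooks set up for #alerts and #reliability, you can script the status update. This is an assumption: the playbook itself only describes posting by hand. The sketch builds a payload with a summary and a link using Slack's `<url|label>` link syntax, then POSTs it as JSON. Incoming webhooks are typically bound to a single channel, so you would call `post_to_slack` once per channel, each time with that channel's (hypothetical) webhook URL.

```python
import json
import urllib.request
from typing import Dict


def build_slack_payload(summary: str, link: str) -> Dict[str, str]:
    """Build an incoming-webhook payload with a summary line and a link."""
    # Slack renders <url|label> in message text as a hyperlink.
    return {"text": f"Docs site outage: {summary}\n<{link}|Details>"}


def post_to_slack(webhook_url: str, payload: Dict[str, str], timeout: float = 10.0) -> int:
    """POST the payload as JSON to a Slack incoming webhook; return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```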

If the problem is a Heroku platform issue, the site may not come back up until Heroku resolves it. In the meantime, you can try restarting all dynos:

  1. Open the Sensu docs site Heroku app.
  2. Click the More button in the upper-right corner of the page.
  3. Click Restart all dynos.
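You can run the same restart from the Heroku CLI with `heroku ps:restart --app <app-name>`. When you give no dyno name, this command restarts all of the app's dynos. The Python wrapper below assumes the CLI is installed and you are logged in (`heroku login`). The app name `sensu-docs-example` is a hypothetical placeholder, not the docs site's real app name.

```python
import subprocess
from typing import List

# Hypothetical app name: replace with the docs site's actual Heroku app name.
DOCS_APP = "sensu-docs-example"


def restart_command(app: str) -> List[str]:
    """Heroku CLI command that restarts every dyno of `app`."""
    # `heroku ps:restart` with no dyno name restarts all dynos.
    return ["heroku", "ps:restart", "--app", app]


def restart_all_dynos(app: str = DOCS_APP) -> None:
    """Run the restart; raises CalledProcessError if the CLI exits non-zero."""
    subprocess.run(restart_command(app), check=True)
```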