Three basic questions need to be addressed:
- How will we detect an issue?
- How will we redirect traffic to the failover solution?
- What is the failover destination?
How to monitor and detect an issue?
Monitoring is fairly simple to solve. Any number of stand-alone services will check the health of your website. Services like Pingdom and Uptime request a URL at frequent intervals and look for a consistent response that you define. They typically have servers around the world, so if several of them start reporting unexpected results, you can assume something is wrong with your website. These services can send email and SMS alerts, or you can integrate them with incident-response tools like PagerDuty. Application monitoring services like New Relic go further: they integrate with your production backend systems and can send alerts when specific usage thresholds are exceeded. A well-monitored system gives engineers a chance to start debugging a problem before it takes the live website down.
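As a rough illustration of what these services do, here is a minimal Python sketch of a single probe plus a quorum rule that only declares an outage when several probe locations fail. The function names, the `ProbeResult` type, the location labels, and the two-failure threshold are all hypothetical; real services like Pingdom run the probes from their own servers worldwide and handle scheduling and alerting for you, so this only shows the decision logic.

```python
import http.client
import urllib.request
from dataclasses import dataclass


@dataclass
class ProbeResult:
    location: str  # hypothetical label for where the probe ran, e.g. "eu-west"
    ok: bool


def probe(url: str, expected_text: str, timeout: float = 10.0) -> bool:
    """Request url once; succeed only on HTTP 200 with expected_text in the body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and expected_text in body
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, timeouts and refused connections are all OSError
        # subclasses; HTTPException covers malformed or truncated responses.
        return False


def site_is_down(results: list[ProbeResult], min_failures: int = 2) -> bool:
    """Declare an outage only when several locations fail, so one flaky probe doesn't page anyone."""
    failures = sum(1 for r in results if not r.ok)
    return failures >= min_failures


if __name__ == "__main__":
    # Hypothetical results gathered from three probe locations.
    results = [
        ProbeResult("us-east", False),
        ProbeResult("eu-west", False),
        ProbeResult("ap-south", True),
    ]
    print(site_is_down(results))  # True: two of three locations failed
```

Requiring more than one failing location trades a little detection speed for far fewer false alarms caused by a single probe's network trouble.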
How to redirect traffic?
So the site is broken, and it isn't coming back up on its own. Fixing the site is the top priority, but while the engineers work out what is wrong, you want visitors to see something better than an error page. The default answer is manual failover: an engineer responds to the situation as it unfolds and, depending on where the problem lies in your technology stack or hosting, redirects traffic to some type of failover solution.
Automatic failover is a better answer. Some (not all) DNS providers offer it: the provider runs basic health checks against your server and, when the server appears to be down, routes visitors to a failover destination instead. DNS Made Easy's DNS Failover service works this way, and Amazon Route 53 health checks can be used to configure both active-active and active-passive failover. Because your IT department will likely want to manage DNS itself (rather than leave it to your web agency), make sure its chosen DNS provider offers a failover option if you want failover to happen automatically.
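For Route 53, active-passive failover is expressed as two record sets with the same name: one with `Failover` set to `PRIMARY` and attached to a health check, and one with `Failover` set to `SECONDARY`. The sketch below only builds the `ChangeBatch` dictionary in Python; the domain, the IP addresses (taken from documentation ranges), the health-check ID, and the hosted-zone ID are placeholders, and the 60-second TTL is an assumption you would tune.

```python
from typing import Optional


def failover_change_batch(name: str, primary_ip: str, secondary_ip: str,
                          health_check_id: str, ttl: int = 60) -> dict:
    """Build a Route 53 ChangeBatch for an active-passive pair of A records."""

    def record(role: str, ip: str, health_check: Optional[str]) -> dict:
        rrset = {
            "Name": name,
            "Type": "A",
            "SetIdentifier": f"{role.lower()}-site",  # must be unique per record set
            "Failover": role,                          # "PRIMARY" or "SECONDARY"
            "TTL": ttl,  # short TTL so resolvers pick up the switch sooner
            "ResourceRecords": [{"Value": ip}],
        }
        if health_check is not None:
            rrset["HealthCheckId"] = health_check
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Comment": "Active-passive failover for the public website",
        "Changes": [
            # Route 53 answers with the primary while its health check passes...
            record("PRIMARY", primary_ip, health_check_id),
            # ...and falls back to the secondary when it fails.
            record("SECONDARY", secondary_ip, None),
        ],
    }


if __name__ == "__main__":
    batch = failover_change_batch(
        "www.example.com.", "192.0.2.10", "198.51.100.20", "HYPOTHETICAL-HEALTH-CHECK-ID"
    )
    # To apply it (requires boto3 and AWS credentials; the zone ID is a placeholder):
    # import boto3
    # boto3.client("route53").change_resource_record_sets(
    #     HostedZoneId="YOUR_HOSTED_ZONE_ID", ChangeBatch=batch)
    print(batch["Changes"][0]["ResourceRecordSet"]["Failover"])
```

Keeping the TTL short matters because resolvers cache answers; a long TTL can keep visitors pointed at the broken server well after Route 53 has switched to the secondary.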
Your site's infrastructure may also include a load balancer. In that case, the load balancer could redirect traffic elsewhere in an emergency, provided the load balancer itself is still working. The catch is that every layer in these systems is a potential point of failure, and a redirect only helps if it happens upstream of whatever broke. The outermost point you control is your DNS server.
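The load-balancer variant can be sketched as a toy, in-process Python model. The `Backend` and `FailoverBalancer` names are hypothetical and not any product's API: the balancer round-robins across healthy primary servers and only sends traffic to a failover pool when every primary is down. Real load balancers usually express this as configuration instead (NGINX, for example, has a `backup` parameter for upstream servers).

```python
import itertools
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool = True  # in practice set by the balancer's own health checks


class FailoverBalancer:
    """Round-robin over healthy primaries; use the failover pool only when none are healthy."""

    def __init__(self, primaries: list[Backend], failovers: list[Backend]):
        self.primaries = primaries
        self.failovers = failovers
        self._counter = itertools.count()

    def pick(self) -> Backend:
        pool = [b for b in self.primaries if b.healthy]
        if not pool:
            pool = [b for b in self.failovers if b.healthy]
        if not pool:
            raise RuntimeError("no healthy backend available")
        return pool[next(self._counter) % len(pool)]


if __name__ == "__main__":
    web1, web2 = Backend("web1"), Backend("web2")
    mirror = Backend("static-mirror")  # hypothetical failover destination
    lb = FailoverBalancer([web1, web2], [mirror])
    print(lb.pick().name, lb.pick().name)  # web1 web2
    web1.healthy = web2.healthy = False
    print(lb.pick().name)  # static-mirror
```

Because `pick()` runs inside the balancer, this only helps while the balancer itself is up, which is exactly the limitation that makes DNS the outermost point of control.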
What is the failover destination?
Websites today are generally much more than the simple HTML pages of the 1990s; many function like fully interactive applications that require a working database and a server-side programming language. Because of the complexity of digital marketing tools and content management systems, "just take a copy of the website" isn't as straightforward as it may seem. There are a few options for how complex your failover destination can be: