By and large, this means recovering from a failure as quickly as possible when it occurs. To keep Down Time to a minimum, this recovery should be automated, and the automation is often provided by a High Availability (HA) Harness.
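As a rough illustration of what "automated recovery" means, the sketch below is a minimal watchdog loop: it repeatedly runs a health check and, on failure, triggers a restart without waiting for an operator. The function `supervise` and its `is_healthy`/`restart` callbacks are hypothetical names for this example; a real HA Harness adds much more (fencing, failover to another node, restart limits).

```python
import time
from typing import Callable, Optional


def supervise(
    is_healthy: Callable[[], bool],
    restart: Callable[[], None],
    poll_interval: float = 1.0,
    max_polls: Optional[int] = None,
) -> int:
    """Hypothetical watchdog: poll a health check and restart on failure.

    Runs forever when max_polls is None; otherwise stops after max_polls
    checks. Returns the number of restarts performed.
    """
    restarts = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        if not is_healthy():
            restart()  # automated recovery: no human in the loop
            restarts += 1
        time.sleep(poll_interval)
    return restarts


if __name__ == "__main__":
    # Simulated application that fails on the third health check.
    state = {"up": True, "checks": 0}

    def check() -> bool:
        state["checks"] += 1
        if state["checks"] == 3:
            state["up"] = False
        return state["up"]

    def do_restart() -> None:
        state["up"] = True

    print(supervise(check, do_restart, poll_interval=0.0, max_polls=5))
```

Here the simulated failure is caught on the third poll and the application is restarted once, so the script prints `1`.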
The cardinal thing to consider is the time it takes to restore the application to full functionality, which is given by:
$$T_{\text{restore}} = T_d + T_r \qquad (1)$$
The detection time, $T_d$, is driven entirely by the HA Harness (and should be easily tunable). The application recovery time, $T_r$, is usually less amenable to tuning, although it can be minimised, for example by keeping the necessary data on a journaling file-system so that no lengthy consistency check is needed after a crash.
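To make equation (1) concrete, the sketch below assumes the HA Harness detects failures with heartbeats: a node is declared failed after a fixed number of consecutive heartbeats are missed. Under that assumption the worst-case $T_d$ is roughly the heartbeat interval times the miss limit (ignoring network delay and the detector's own polling granularity). The class `HeartbeatConfig`, its fields, and the figures used are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatConfig:
    """Hypothetical heartbeat-based failure detector settings."""

    interval_s: float  # time between heartbeats, in seconds
    missed_limit: int  # consecutive missed heartbeats before declaring failure

    def __post_init__(self) -> None:
        if self.interval_s <= 0:
            raise ValueError("interval_s must be positive")
        if self.missed_limit < 1:
            raise ValueError("missed_limit must be at least 1")

    def worst_case_detection_s(self) -> float:
        # Worst case: the failure happens just after a heartbeat was sent,
        # so the detector waits the full interval * missed_limit.
        return self.interval_s * self.missed_limit


def restore_time_s(cfg: HeartbeatConfig, app_recovery_s: float) -> float:
    """Equation (1): T_restore = T_d + T_r."""
    return cfg.worst_case_detection_s() + app_recovery_s


if __name__ == "__main__":
    app_recovery_s = 30.0  # assumed T_r, e.g. restart plus data recovery
    for cfg in (HeartbeatConfig(2.0, 3), HeartbeatConfig(1.0, 3)):
        print(cfg, restore_time_s(cfg, app_recovery_s))
```

With an assumed $T_r$ of 30 s, halving the heartbeat interval cuts the worst-case restore time only from 36 s to 33 s, which illustrates why $T_r$ often dominates. Tightening detection further also raises the risk of declaring a healthy but briefly slow node failed.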