
Is it Availability you want?

The standard service level agreement is usually phrased in terms of availability. However, as we've seen, availability can be tricky to determine and very hard to manage, since it depends on uptime, which lies outside the control of any clustering product.
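To make that dependence concrete, here is the conventional steady-state definition of availability. It is stated as a standard assumption rather than the paper's own notation: $T_{\mathrm{up}}$ is the mean time between failures and $T_{\mathrm{down}}$ is the mean time to recover.

$$A = \frac{T_{\mathrm{up}}}{T_{\mathrm{up}} + T_{\mathrm{down}}}$$

A clustering product can shrink $T_{\mathrm{down}}$ by failing over quickly. $T_{\mathrm{up}}$, however, is fixed by the hardware and software underneath it, so $A$ is never fully within the cluster's control.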

Now consider the nature of most modern Internet-delivered services, the best exemplar being the simple web server. Most users who receive an error reply after clicking a URL will try again at least once. The Internet has made most web users tolerant of any failure they can put down to latency or routing errors. Thus, for maintaining the appearance of an operational website, uptime, and with it availability, is completely irrelevant. The only parameter that plays any role in the user's experience is downtime. As long as the web server is recovered within the time for which the user will tolerate a retry (ascribing the failure to the Internet), there will be no discernible outage, and the service level will have met the user's expectation of a fully available service.
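The argument above can be illustrated with a small model. This is a minimal sketch: the `Outage` class, the two helper functions, the one-month uptime, the outage counts and durations, and the 30-second retry tolerance are all hypothetical values chosen for illustration, not measurements from the paper. It compares a cheap server that fails often but fails over quickly with a reliable server that fails once but takes half an hour to repair.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One service interruption, from failure to recovery, in seconds."""
    duration_s: float


def availability(uptime_s: float, outages: list[Outage]) -> float:
    """Classic availability: fraction of total time the service was up."""
    downtime_s = sum(o.duration_s for o in outages)
    return uptime_s / (uptime_s + downtime_s)


def discernible_outages(outages: list[Outage], retry_tolerance_s: float) -> int:
    """Count outages a user would actually notice.

    Assumption: an outage that ends within the user's retry window is
    blamed on Internet latency or routing and goes unnoticed; only
    outages lasting longer than the window count against the service.
    """
    return sum(1 for o in outages if o.duration_s > retry_tolerance_s)


# Hypothetical figures: one month of uptime for both configurations.
UPTIME_S = 30 * 24 * 3600
RETRY_TOLERANCE_S = 30.0  # assumed user retry window

# Cheap hardware: 200 failures, each masked by a 20-second failover.
cheap = [Outage(20.0) for _ in range(200)]
# Reliable hardware: a single failure with a 30-minute repair.
reliable = [Outage(1800.0)]

for name, outages in (("cheap", cheap), ("reliable", reliable)):
    print(f"{name}: availability={availability(UPTIME_S, outages):.4%}, "
          f"noticed={discernible_outages(outages, RETRY_TOLERANCE_S)}")
```

Under these assumed numbers the reliable server scores higher on classic availability (1800 s versus 4000 s of total downtime), yet it is the only one whose outage a user would notice; every failure of the cheap server completes inside the retry window.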

In the example above, which covers most user requirements, it is important to note that since uptime turns out to be largely irrelevant, any money spent on uptime features is wasted. As long as the investment goes into an HA harness that fails over fast enough, the cheapest possible hardware may be deployed.


James Bottomley 2004-05-12