
Linux-Specific Problems

One of the major problems with Linux distributions is the sheer number of kernels available (usually carrying distribution-specific proprietary patches), so any HA package that depends on kernel modifications will have a hard time playing ``catch up''. Thus, although kernel support may eventually be standardised by the CGL specification [5], for now it is a good idea to find an HA package that requires no kernel modifications at all (except possibly to fix kernel bugs detected by the HA vendor). Unfortunately, protecting certain services (like NFS) may be extremely difficult without such modifications. If your vendor does supply kernels or modules, make sure it has a good record of timely updates for your chosen distribution.

The greatest (and currently unaddressed) problem within the Linux kernel is the so-called ``Oops'' issue, where a fault inside the kernel may kill only the process in whose context the kernel happened to be running, rather than taking down the entire machine. This is bad because the fault may have ramifications beyond that process (for instance, any kernel locks held at the time of the fault may never be released); the usual consequence is that the machine hangs. Such hangs are inimical to HA software if they leave the machine responding normally to heartbeats while failing (in a locally undetectable manner) to export the service.
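
One way to guard against a node that still answers heartbeats but no longer exports its service is to probe the service itself from a peer. The following is only a minimal sketch of that idea, not the mechanism of any particular HA package: the function names `service_responds` and `node_is_healthy`, the two-second timeout, and the choice of a plain TCP connect as the probe are all assumptions made for illustration.

```python
import socket


def service_responds(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout.

    Hypothetical helper: a refused connection, an unreachable host, or a
    timeout all count as "service not exported".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError and timeouts
        return False


def node_is_healthy(heartbeat_ok: bool, host: str, port: int) -> bool:
    """Treat a node as healthy only if both checks pass.

    The heartbeat alone is not trusted, since a partially hung machine
    may keep answering heartbeats while the service is dead.
    """
    return heartbeat_ok and service_responds(host, port)
```

A bare TCP connect is a weak probe, because the kernel may still accept connections on behalf of a hung process; a more convincing check would issue a real protocol-level request (for NFS, say, an actual file operation on the exported file system) and verify the reply.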


James Bottomley 2004-05-12