Chapter 11 · High Availability: Reset to Good

Introduction

One of the most frequent causes of downtime is a failure of the underlying server hardware. One of the early promises of virtualization is the ability to keep virtualized systems online and operational regardless of problems with the underlying hardware by allowing the virtual machine to run on any host in the virtual environment. If an individual host fails, the virtual machine can be run on another host with little to no downtime.

High availability is a design methodology used to ensure the uptime and availability of virtual machines. Generally speaking, the high availability provided by virtualization technologies mitigates two types of downtime: planned downtime and unplanned downtime.

This chapter discusses different methods of providing high availability in a virtual environment, as well as how to maintain and operate the virtualization hosts. It also covers some of the pitfalls of building a high availability infrastructure.

Understanding High Availability

Before discussing how to provide high availability, you must first understand what high availability is and distinguish between planned and unplanned downtime.

Planned downtime is downtime that has been scheduled and is expected in the environment. It is typically caused by system maintenance that, while disruptive to the overall system, usually can't be avoided. Reasons for planned downtime range from applying patches or configuration changes that may require a reboot to upgrading or replacing hardware. Because it is scheduled, planned downtime can be managed more easily in order to minimize disruption. In many cases, as we will examine, virtualization can actually provide for zero downtime.

Unplanned downtime, on the other hand, is the opposite: downtime that is neither scheduled nor expected.
Unplanned downtime typically results from events such as power outages, hardware failures, software crashes, network connectivity failures, security breaches, and operating system failures. While unplanned downtime cannot be easily predicted, recovering from it is usually easier in a virtual environment than in a physical one.
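
To make the idea of recovering from a host failure concrete, the following is a minimal sketch in Python of restarting a failed host's virtual machines on the surviving hosts. It is a toy model only and does not represent how any particular virtualization product implements high availability. The `Host` class, the `fail_over` function, the host and virtual machine names, and the "most free capacity" placement rule are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical model of a virtualization host."""
    name: str
    capacity: int                 # number of VMs the host can run (illustrative unit)
    healthy: bool = True
    vms: list[str] = field(default_factory=list)


def fail_over(hosts: list[Host], failed_name: str) -> dict[str, str]:
    """Mark one host as failed and restart its VMs on surviving hosts.

    Returns a mapping of VM name -> new host name. A VM that cannot be
    placed is left out of the mapping, meaning it stays offline.
    """
    failed = next(h for h in hosts if h.name == failed_name)
    failed.healthy = False
    stranded, failed.vms = failed.vms, []

    placements: dict[str, str] = {}
    for vm in stranded:
        # Only healthy hosts with spare capacity are candidates.
        candidates = [h for h in hosts if h.healthy and len(h.vms) < h.capacity]
        if not candidates:
            continue  # no room anywhere, so this VM remains down
        # Assumed placement rule: choose the host with the most free capacity.
        target = max(candidates, key=lambda h: h.capacity - len(h.vms))
        target.vms.append(vm)
        placements[vm] = target.name
    return placements


# Simulate an unplanned failure of host-a in a three-host environment.
hosts = [
    Host("host-a", capacity=4, vms=["web01", "db01"]),
    Host("host-b", capacity=4, vms=["app01"]),
    Host("host-c", capacity=4),
]
placements = fail_over(hosts, "host-a")
print(placements)
```

The sketch shows why spare capacity matters: a virtual machine can be restarted elsewhere only if a healthy host still has room for it. In a physical environment there is no equivalent step, because the workload is tied to the failed hardware.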