When Time Began: The Mainframe Era
The Mainframe Era marked the beginning of commercial computing, and looking back at how computing evolved after it suggests the problems we might encounter in returning to a mainframe-like model. The age after the mainframe was defined by the networked minicomputer (or server). The generally perceived advantage of this shift was that it moved computing resources out of the glass house and closer to their users. Another significant advantage was enhanced fault resilience: if the mainframe went down for some reason, not everyone in the organization was affected, because the local network of minicomputers could still provide any local services that did not depend on it. Conversely, the crash of any single server in this network affected only its local users, not the entire organization. The ultimate end point of this expansion into networks was the total decentralization of services, resulting in the decommissioning of many central mainframes and a nearly complete reliance instead on a distributed network of servers. However, the essential problem of a distributed network of servers, which virtualization promises to solve, is that the servers are hard to find (they are not centrally located) and hard and costly to manage (most run operating systems that are not amenable to easy remote management). Worse still, if something goes wrong with the hardware or the operating system, there is limited remote diagnostic ability, so the machine has to be located at the remote site and the problem analyzed and corrected by hand. While newer remote management technologies alleviate some of the administration burden, the issues of server proliferation and remote accessibility persist.
Computing Grows Up: The Server Age
The server age, therefore, heralded unparalleled management headaches. So great were these that, after the initial heady decentralization which saw servers running in any available, network-connected space, most business-critical servers were tracked down and forcibly repatriated to the old glass houses (now the modern server room), where at least they could be corralled and the remote management nightmare considerably lessened. However, the management problem still isn't eradicated: just because you have twenty-odd servers physically located in the same place doesn't mean you have the expertise to cope with every failure that can still occur. This aspect of the problem arises because the servers that replaced the mainframe were likely purchased over a considerable span of time, often from different manufacturers. Differences in internal components, Basic Input/Output System (BIOS) configuration and software configuration make diagnosing and fixing problems in an aging server very difficult and mandate the building of a large body of in-house expertise. In many large organizations, server management has become the single largest concern of the IT department. Even in small and medium-sized businesses, concern is growing about the multiplicity of server types in the environment and how they can be managed and repaired effectively without affecting business-critical operations.
The Future: The Promise of the Virtualization Age
The promise of the virtualization age is that of server consolidation: all those individual servers in the machine room can become "virtual" servers running on a single (very powerful) physical machine. This solves the management problem because now there is only a single physical machine to understand. Well, that's the theory: in practice one also needs to understand the virtualization environment; however, that's still only two pieces of knowledge as opposed to the much broader knowledge set required to understand the original multi-server environment being replaced by the virtualization setup. To understand exactly what this replacement entails, we must examine the nature of a virtualized environment.
Understanding the Virtualization Environment
The first thing you need to understand when choosing a virtualization environment (VE) is that VEs come in two flavors:
Standard Virtualization: This presents the operating system running inside the VE with a set of well-known devices for which it already has standard drivers. Note that the devices presented by the VE are often not the actual devices present on the platform, but are emulated by the virtualization system to look like real devices. The advantage of doing this is that the operating system uses its standard device drivers to drive these pseudo-devices, so no modifications to the operating system are required; any standard operating system may run in this type of environment. The disadvantage, obviously, is that two levels of device drivers are involved: the one the operating system uses to drive the pseudo-device and the one the virtualization environment uses to emulate it. This increases the complexity of the I/O path, and very often slows it down.
Para-Virtualization: This presents a set of para-virtual devices to the operating system which require special drivers to operate. These drivers aren't ordinarily found in the operating system, so the operating system itself requires modifications to talk to the devices. Since the operating system is being modified anyway, additional changes are often made to let it run more efficiently within the VE. Although the necessity of modifying the operating system appears at first sight to be a significant drawback, the resulting efficiency of the virtualized operating system is often an overriding justification for taking this route. Additionally, because the para-virtual device drivers are crafted exactly for the VE, they are often as efficient as the operating system driving the hardware natively. The sketch following this list illustrates the contrast between an emulated device and a para-virtual one.
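To make the distinction concrete, here is a minimal sketch, assuming (purely for illustration) a libvirt/QEMU-based VE: the same guest disk can be exposed either as an emulated IDE device, which the guest drives with its stock driver, or as a para-virtual virtio device, which requires a virtio driver inside the guest. The image path is a placeholder.

    # Illustrative sketch only: assumes a libvirt/QEMU virtualization environment.
    # Standard virtualization: the VE emulates a well-known IDE disk, so the guest's
    # ordinary IDE driver works unmodified (at the cost of a longer, emulated I/O path).
    EMULATED_DISK_XML = """
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/guest.qcow2'/>
      <target dev='hda' bus='ide'/>
    </disk>
    """

    # Para-virtualization: the VE presents a virtio disk, usable only by a guest that
    # carries a virtio (para-virtual) driver, but without the device-emulation overhead.
    PARAVIRT_DISK_XML = """
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/guest.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    """

    if __name__ == "__main__":
        # In a real deployment these fragments sit inside a full libvirt domain
        # definition; here they are simply printed side by side for comparison.
        print("Emulated (standard) device:", EMULATED_DISK_XML)
        print("Para-virtual device:", PARAVIRT_DISK_XML)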
A third class of virtualization is also emerging: virtualized hardware. In this scenario, the hardware card itself is designed to be driven simultaneously by multiple virtualized operating systems. The virtualization software merely presents the virtualized device instances to each operating system, which drives them with its own device driver (although the native driver usually has to be enhanced to understand the hardware virtualization). This type of hardware virtualization promises to blur the distinction between Standard and Para virtualization in the field. Additionally, even for hardware that is not natively virtualized, the major processor manufacturers are adding virtualization technologies to their chipsets (Intel with its VT architecture and AMD with Pacifica) which promise to erase the Standard vs Para distinction altogether.
A Comparison of a Virtualization Environment and a Mainframe
A Virtualization Environment and a Mainframe are very similar from the point of view of being "just a large machine", and that's not all they have in common: In an effort to make mainframes relevant to the modern world, mainframe manufacturers became the first true pioneers of virtualization (and the first business groups to tout the benefits of server consolidation). The current generation of virtualization technology is really "second wave": moving virtualization from the province of highly specialized (and expensive) mainframes to running on commodity systems. However, one of the chief disadvantages comes from the very fact that the virtualization environment is now running on commodity hardware. Although this might be cheaper by factors of ten to one hundred over the old mainframes, the flip side is less individual tailoring and burn-in testing. Thus, the failure potential of commodity hardware is far higher than that of the mainframe.
There is also an additional disadvantage inherent in the commodity environment: diversity. Although diversity is often a good thing, in hardware terms the extreme diversity of so-called commodity hardware results in a plethora of device drivers for that hardware (and, indeed, in Open Source operating systems, the risk that some of the hardware will not even have device drivers available). Whether you regard this diversity of hardware as a good thing or a bad thing, it is certain that device drivers, in both open and closed source operating systems, are the single most significant source of operating system faults. Since in both standard and para-virtualization the virtualization software itself contains the "real" device driver, this type of fault can still potentially bring down the virtualization layer, and thus potentially every virtual machine running on the box.
So what are the lessons we can learn?
The lessons of virtualization are several: First, the very act of virtualizing servers increases the vulnerability of your application environment both to hardware failure and to driver faults. Second, the consequences of these faults when they occur will be much more catastrophic than when the environment was distributed among a larger pool of servers, since all of the virtualized servers will be taken down with a single machine or driver failure. Therefore, while virtualization may solve the management problem for servers, the cost of doing so is to increase the potential and scope of failures in the enterprise, thus causing an availability crisis.
Solving the Availability Crisis
The beauty of this problem is that the solution is the same as it was in the many-server environment: High Availability Clustering. High availability clustering software is designed to take a group of servers and ensure that a set of services (applications, databases, file shares) is always available across them. The same paradigm applies in a virtualized environment, with the caveat that you must still have at least two physical machines to guard against failures of the hardware or the virtualization environment. Since high availability software is designed to run on servers, it will generally run unmodified on virtualized servers, so if you used high availability software in your original environment, it is perfectly possible to use the same software in your virtualized one. A further caveat is that the high availability software should be configured so that every service has its backup on a separate physical machine. Thus, the virtualization setup that achieves the benefits of server consolidation without sacrificing protection against unplanned outages is two physical machines, each initially running about half of the virtual machines and each acting as a failover target for the services it doesn't run, as the sketch below illustrates.
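The following is a minimal sketch of that placement rule, not any real clustering product's configuration; the host and virtual machine names are hypothetical, and actual high availability software expresses the same constraints declaratively. Each virtual machine gets a preferred host, and its failover target is always the other physical machine.

    # Illustrative sketch: two physical hosts, each the failover target for the
    # virtual machines preferred on the other. All names are hypothetical.
    HOSTS = ["host-a", "host-b"]

    def plan_placement(virtual_machines):
        """Spread VMs roughly half-and-half and pair each with a failover target."""
        plan = {}
        for i, vm in enumerate(virtual_machines):
            primary = HOSTS[i % 2]        # about half of the VMs on each host
            backup = HOSTS[(i + 1) % 2]   # the backup is always the *other* host
            plan[vm] = {"primary": primary, "failover_target": backup}
        return plan

    def fail_over(plan, failed_host):
        """On a host (or VE) failure, restart its VMs on their failover targets."""
        return {vm: p["failover_target"]
                for vm, p in plan.items() if p["primary"] == failed_host}

    if __name__ == "__main__":
        plan = plan_placement(["web", "db", "mail", "file"])
        print(plan)
        print("host-a fails ->", fail_over(plan, "host-a"))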
Choosing a high availability clustering software solution which monitors the entire application stack (application services, database, client and network connections as well as the OS, virtualization layer and underlying hardware) provides the highest levels of protection against crippling downtime.
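As a rough illustration of what monitoring the whole stack means, the sketch below probes a service at several layers; the hostname, port numbers and choice of checks are assumptions made for the example, and a real clustering product performs far richer checks at each layer.

    # Illustrative sketch of layered health checks; hosts and ports are hypothetical.
    import socket

    def tcp_check(host, port, timeout=2.0):
        """Return True if a TCP connection to host:port succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def stack_healthy(app_host):
        """Probe several layers; a failure at any one should trigger failover."""
        checks = {
            "application": tcp_check(app_host, 8080),  # application service port
            "database":    tcp_check(app_host, 5432),  # database listener
            "os/network":  tcp_check(app_host, 22),    # host reachability via SSH
        }
        return all(checks.values()), checks

    if __name__ == "__main__":
        ok, detail = stack_healthy("vm-web-01")
        print("healthy" if ok else "unhealthy", detail)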
In studying the impact of migrations to virtualized environments, we can find lessons from previous cycles in the computer industry. However, the primary points to bear in mind are:
1. Virtualization is not High Availability. It is a solution for the server management problem, not a solution for the service availability problem.
2. If carried too far, virtualization can in fact lead to a decrease in the availability of your services, not an increase.
Therefore, the deployment of a High Availability solution becomes much more critical in a virtualized environment. Since deploying a high availability solution will likely require a modification of the virtualized configuration (i.e. you need two virtualization servers, not one), plans for implementing virtualization should include High Availability planning from the outset of the design stage.
By combining server virtualization with high availability clustering, IT organizations can realize the benefits of increased manageability and savings from server consolidation without risking increased downtime for business-critical applications.