LISA 2001 Paper   
Solaris Bare-Metal Recovery from a Specialized CD-ROM and Your Enterprise Backup Solution

Abstract
The bane of all system administrators is the crash of a mission critical system. Further depression sets in when the crash has corrupted the boot disk so that it can no longer boot the system. A crashed mission critical system requires immediate attention, because every moment of downtime equates to lost revenue. The goal is to return the mission critical system to its normal functioning state as quickly as possible. Rebuilding a computer system that has lost its capability to boot is known as a bare-metal recovery. The system must be built up from a bare disk drive to a bootable and functioning operating system. Not until the release of Solaris 8 did the Solaris operating system contain a utility providing bare-metal recovery functionality. Such functionality can be found in other UNIX variants, such as IBM AIX, HP HP-UX, and Linux. However, bare-metal recovery can be achieved by making use of the following Solaris utilities: ufsdump, ufsrestore, dd, cpio, tar, format, prtvtoc, fmthard, installboot, and JumpStart. Furthermore, it can be achieved by using a custom-built bootable CD-ROM (or DVD-ROM) and your environment's networked enterprise backup solution, such as Legato NetWorker or Veritas NetBackup.

Introduction
There is an ever-increasing demand to have mission critical computer systems run 24 hours a day, 365 days a year. When a mission critical system suffers from corrupted disk data, and the disk drive in question happens to be the boot drive, the situation becomes more involved. One cannot simply run the backup application to restore the system, because a functional operating system no longer exists on the system. This type of disaster recovery is known as bare-metal recovery.
(The term "bare-metal" comes from the early days of computing, when storage media was a metal core. Nowadays, storage media is commonly magnetic or optical.) Under such dire circumstances, rebuilding a boot disk can take what seems like an inordinate amount of time. The operating system needs to be installed, standard packages (clusters) installed, customized packages installed, patches applied, the kernel tuned, and finally any special or custom configurations set. Performing all of these tasks could take up to a couple of hours. Even Sun Microsystems' automated installation utility, JumpStart, can still take what might be considered too much time, because following the installation of the operating system and any customized packages, the JumpStart process still needs to apply the desired patches. The application of a large set of patches can take up to a couple of hours. Even then the process is not complete, because following the installation of the patches, any variable data still needs to be restored to the system using the environment's backup solution.

Until Web Flash in Solaris 8, Sun Microsystems did not provide a complete disaster recovery tool as an integral part of the Solaris product. This contrasts with the IBM AIX environment, which has mksysb, and the Hewlett-Packard HP-UX environment, which has Ignite-UX and make_recovery. While not providing a formal disaster recovery (cloning) tool, Sun Microsystems does provide some very useful utilities to aid in the process of disaster recovery. Specifically, these utilities are ufsdump, ufsrestore, dd, cpio, tar, format, prtvtoc, fmthard, installboot, and JumpStart. When combined, these utilities can produce a very powerful disaster recovery tool.

This paper will discuss the steps needed to perform a bare-metal recovery on a Solaris system.
In so doing, this paper describes a tool that enables the system administrator to perform a timely rebuild of a crashed system to the state of its last successful backup by using a custom-built CD-ROM and the environment's networked backup solution. Currently, the tool discussed in this paper is built to handle Solaris systems within an environment that uses Legato NetWorker or Veritas NetBackup. This tool can be extended to handle other shrink-wrapped or homegrown backup products.

Background Information
This paper is a by-product of the paper, Unleashing the Power of JumpStart: A New Technique for Disaster Recovery, Cloning, or Snapshotting a Solaris System, [1] presented at the 14th Annual LISA Conference (LISA-2000). The LISA-2000 paper and associated presentation discussed in detail how to create the ``Capture and Restore Tool'' (CART), a tool for capturing the image of a Solaris system onto a set of CD-ROMs, with the first CD-ROM volume in the set being bootable. Using this set of CD-ROMs, a server could be readily rebuilt onto the same hardware or cloned onto like hardware with disk drives of the same or larger size as the original source system. The CART was achieved by using the same technique Sun Microsystems uses for installing the Solaris operating system from its installation CD-ROM. The technique incorporates a specialized JumpStart mechanism built into the CD-ROM. By using the CART technique, it is possible to build a bootable CD-ROM (or DVD-ROM) that will automatically invoke a customized script whose function is to perform a bare-metal recovery using the latest filesystem backups captured for that system by the environment's backup solution. The CART restored the system's data from a set of CD-ROMs created during the capture phase. This new tool performs the bare-metal recovery by restoring data over the network from the environment's backup solution.
This new ancillary tool for performing the bare-metal recovery of a Solaris system is referred to as the Bare-metal Ancillary Recovery Tool, or BART. The concept of the BART was first announced at LISA-2000. At that time, the tool was under development and showed promising results. Since then, several requests regarding the availability of this tool have come in from System Administrators and Backup & Recovery professionals. We are now prepared to fully disclose the details of the BART, how it was built, and how it can be customized to address the idiosyncrasies of a particular environment.

Basic Bare-metal Recovery on a Solaris System
The use of any disaster recovery or bare-metal recovery tool does not replace the need for a complete Disaster Recovery Plan (DRP). Devising a good disaster recovery plan is hard work. It needs to be built from the ground up and it can take years to perfect. Since computer environments change constantly, the DRP must continually be tested to ensure it still works in the changed environment. W. Curtis Preston's O'Reilly book [4] is a great resource on backup, recovery, disaster recovery, and bare-metal recovery. He walks the reader through what data on a system should be captured prior to the need for restoring it, as well as when, why, how, how many copies, and how often. Since the topic of bare-metal recovery has been covered by other sources, specifically [3] and [4] listed in the references section, this paper will not discuss the topic in great detail. Instead, a list of pertinent steps for accomplishing bare-metal recovery on a Solaris system is provided.

Prior to the Disaster
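One preparatory step that the utilities named in the introduction make possible is recording the boot disk's partition table (VTOC) with prtvtoc, so that the same layout can later be written back to a blank disk. The following is a minimal sketch rather than the authors' procedure. It is written in Python purely for illustration, it only assembles the command string and does not run it, and the device path and output file shown are hypothetical.

```python
import shlex


def vtoc_capture_command(raw_disk, save_path):
    """Build a shell command that saves the label printed by prtvtoc.

    raw_disk:  raw device for the whole disk (slice 2), hypothetical here.
    save_path: file that will hold the saved partition table; it should
               live somewhere other than the disk being protected.
    """
    return "prtvtoc {} > {}".format(shlex.quote(raw_disk),
                                    shlex.quote(save_path))


# Hypothetical boot disk and destination file, for illustration only.
command = vtoc_capture_command("/dev/rdsk/c0t0d0s2", "/var/tmp/c0t0d0.vtoc")
print(command)
```

The saved file is only useful if it is stored off the boot disk, for example alongside the system's regular backups.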
After the Disaster
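As a rough illustration of how the utilities named in the introduction fit together once a replacement disk is in place, the sketch below assembles one commonly used command sequence: write back a saved partition table with fmthard, create and mount a filesystem, restore a ufsdump image with ufsrestore, and install the boot block with installboot. This ordering, the newfs and mount steps, the /a mount point, and every path and device name are assumptions made for illustration; they are not taken from the authors' list, and the code only builds the commands without running them.

```python
def restore_plan(disk, vtoc_file, dump_file, platform):
    """Return an ordered list of shell commands for rebuilding a root slice.

    disk:      disk name without a slice suffix, e.g. "c0t0d0" (hypothetical).
    vtoc_file: partition table previously saved with prtvtoc.
    dump_file: ufsdump image of the root filesystem.
    platform:  directory name under /usr/platform holding the boot block.
    """
    raw = "/dev/rdsk/" + disk   # raw (character) device prefix
    blk = "/dev/dsk/" + disk    # block device prefix
    return [
        # Recreate the original slice layout from the saved label.
        "fmthard -s {} {}s2".format(vtoc_file, raw),
        # Build an empty UFS filesystem on the root slice.
        "newfs {}s0".format(raw),
        # Mount it where the restore will be written.
        "mount {}s0 /a".format(blk),
        # Restore the entire dump into the mounted filesystem.
        "cd /a && ufsrestore rf {}".format(dump_file),
        # Make the slice bootable again.
        "installboot /usr/platform/{}/lib/fs/ufs/bootblk {}s0".format(
            platform, raw),
    ]


# All arguments below are placeholders.
plan = restore_plan("c0t0d0", "/var/tmp/c0t0d0.vtoc", "/mnt/root.dump", "sun4u")
for step in plan:
    print(step)
```

Keeping the plan as data makes it easy to review the exact commands before anything touches the disk.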
The Bare-metal Ancillary Recovery Tool - The ``BART''
The Bare-metal Ancillary Recovery Tool, BART, consists of a single bootable CD-ROM. Like the CART, the BART contains a trimmed down version of the Solaris Software Installation CD-ROM. To create the BART, selected files were copied from the Solaris Installation CD-ROM to a read-writable hard disk drive. Scripts were borrowed from the CART project to accomplish the task of copying files to this disk drive. The drive is known as the ``image disk'' because once it is put into its desired configuration, it becomes the image that gets written to a recordable CD (CD-R or CD-RW). Since the maximum capacity of a CD-ROM is about 650 MB, a 1 GB disk drive is adequate for the ``image disk.'' The full layout, sizes, and description of the slices for the BART ``image disk'' are displayed in Diagram 1.

Description of the BART ``Image Disk'' slices
Slice s0 contains a trimmed down version of the Solaris Installation CD-ROM slice s0 and is present solely to enable the JumpStart mechanism. Slice s1 contains the mini-root along with an adequate set of UNIX utilities. Upon booting the BART CD-ROM, slice s1 gets placed into memory and provides the mini operating system, so that the system can function even though nothing is on the disk drive(s) yet. The customized portion of the boot process, accomplished through the custom JumpStart BEGIN script, will eventually load the backup software package(s) into slice s1. At that point, the system will be able to access the functioning backup software needed to restore the entire boot disk using the environment's backup solution. Slices s2-s5 contain the boot information (bootblock) for the various hardware architectures of Sun Microsystems' products that run Solaris.
The following file is also contained in these slices: .SUNW-boot-redirect, which simply contains a single byte, the character `1', to direct the firmware boot PROM program to look for the kernel on slice 1 of the boot device.
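The redirect file is small enough to show in full. The sketch below, written in Python for illustration only, creates a file with exactly the content described above, a single byte holding the character `1'. The temporary directory stands in for the root of one of the boot slices and is an assumption of the example, as is the helper name.

```python
import os
import tempfile

REDIRECT_NAME = ".SUNW-boot-redirect"


def write_boot_redirect(slice_root):
    """Create the redirect file under slice_root and return its path.

    The file holds one byte, the ASCII character '1', with no trailing
    newline, matching the description in the text.
    """
    path = os.path.join(slice_root, REDIRECT_NAME)
    with open(path, "wb") as handle:
        handle.write(b"1")
    return path


# A temporary directory stands in for the mounted boot slice.
demo_root = tempfile.mkdtemp()
redirect_path = write_boot_redirect(demo_root)
print(redirect_path, os.path.getsize(redirect_path))
```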
Slice s6 is for future enhancements and could contain environment specific configuration and profile files to minimize or eliminate user interaction with the BART during a bare-metal recovery. Lastly, slice s7 contains compressed tar files of the backup software packages, such as Legato NetWorker and Veritas NetBackup. These files get uncompressed, untarred, and placed into the appropriate directories resident in memory by the customized JumpStart BEGIN script.

Location of the Pertinent JumpStart files on the ``Image Disk''
The pertinent JumpStart files that allow for the Solaris Install-like boot process to take place are located on the ``Solaris Installation CD-ROM,'' and now the BART ``image disk,'' in the following location (Solaris 2.6 is used in this example): /s0/Solaris_2.6/Tools/Boot/usr/sbin/install.d/install_config

The pertinent standard issue JumpStart files found in this location are the following:
The BART will replace the rules.ok and the install_begin files with its own versions. It is important that the BEGIN script be given the same name as the BEGIN script called out in the customized version of the rules.ok file. Similar to the CART, the BART does not need to call out a FINISH script, and thus the devsyn_finish is removed from the ``image disk.'' Correspondingly, the rules.ok file does not make reference to a FINISH script. Since the customized BEGIN script ends with a reboot of the system, even if a JumpStart profile or JumpStart FINISH script were placed in the rules.ok file, they would never get invoked.

Customized JumpStart BEGIN script actions
Upon booting from the BART CD-ROM, a mini-root operating system gets placed on the target system. The BART then proceeds through the customized JumpStart BEGIN script, ``bart_begin,'' to perform the bare-metal recovery. Some of the more salient actions performed by this BEGIN script are outlined below:
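One action already described is the unpacking of the backup software archives held on slice s7 into memory-resident directories. The following minimal sketch shows that step in Python for illustration only; it is not the bart_begin script. It assumes gzip-compressed archives (the paper says only that the tar files are compressed), and the directory layout, archive name, and file names are hypothetical stand-ins created in a temporary directory.

```python
import glob
import os
import tarfile
import tempfile


def unpack_backup_packages(archive_dir, dest_root):
    """Extract every *.tar.gz archive in archive_dir beneath dest_root.

    Returns the archive file names in the order they were unpacked.
    """
    unpacked = []
    for archive in sorted(glob.glob(os.path.join(archive_dir, "*.tar.gz"))):
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(dest_root)
        unpacked.append(os.path.basename(archive))
    return unpacked


# Demonstration with stand-in directories; nothing here is a real package.
work = tempfile.mkdtemp()
s7_dir = os.path.join(work, "s7")          # stands in for slice s7
miniroot = os.path.join(work, "miniroot")  # stands in for the in-memory root
os.makedirs(s7_dir)
os.makedirs(miniroot)

placeholder = os.path.join(work, "client.txt")
with open(placeholder, "w") as handle:
    handle.write("placeholder for backup client files\n")

with tarfile.open(os.path.join(s7_dir, "backup_client.tar.gz"), "w:gz") as tar:
    tar.add(placeholder, arcname="opt/backup/client.txt")

unpacked = unpack_backup_packages(s7_dir, miniroot)
print(unpacked)
```

Storing the packages as archives and expanding them only at boot keeps the CD-ROM image small while leaving the choice of backup product to the recovery run.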
Requirements of the BART
How the BART Functions in the Networked Environment
Diagram 2 depicts how the BART works in a networked environment. The steps to rebuild or clone a target system involve the following:
Testing the BART
The authors, W. Curtis Preston in particular, have used and customized the BART at various Fortune 500 clients, where he has been involved in designing and implementing their Enterprise Backup & Recovery and Enterprise Disaster Recovery solutions. The BART has been a proven success at the clients where it has been employed.

Limitations of the BART
Conclusion
For large enterprise sites with elaborate disaster recovery plans that include data mirrored to remote locations, the BART may not prove to be of value or of need. However, for the small to medium sized enterprise where budget constraints have not allowed for the desirable disaster recovery plan, or for the large site that has not implemented elaborate disaster recovery plans, this tool may very well prove to be a life saver. Similar to the CART, the BART can also be implemented on a networked JumpStart Server. However, the BART was specifically developed for the consultant who specializes in Backup and Recovery administration. By building the BART on a bootable CD-ROM, this professional can use it at any client site without having to set up or become familiar with an existing JumpStart server in the client's environment.

Resources
The following freeware products from Joerg Schilling [9] were used in the development and implementation of the BART:
A ``Smart and Friendly'' CD-RW 426 Deluxe CD-Recorder was used in the development and final implementation. No issues were encountered with the installation or use of the cdrecord products or with the use of the ``Smart and Friendly'' CD-RW 426 Deluxe CD-Recorder. Both of these products receive a high endorsement from the authors. Other CD-R recording hardware and software products (e.g., Young Minds, Inc., HyCD, and Gear, to name a few) could have been integrated into the CART as well. However, the price of cdrecord and its associated products could not be beat. Also of great value to the development of the BART is the following freeware product from Matthew R. Green:
Acknowledgements
We thank The Storage Group, Inc. for proposing the initial concept of the BART and for providing the equipment necessary to design, build, and test it. We thank the ``Publications Group'' of Collective Technologies for providing editing expertise for this paper. We thank Adelaida Esquivel for also providing editing expertise for this paper. We thank Joerg Schilling, whose freeware products were indispensable in the creation and final product of the BART due to their ease of installation, ease of use, and unbeatable cost.

Author Information
Lee ``Leonardo'' Amatangelo graduated from the University of California, Irvine in 1983 with a B.S. in Molecular Biology and in 1985 with a B.A. in Anthropology. He has been working in the computer industry since 1981. Currently, he is a systems management consultant specializing in Solaris and disaster recovery for Collective Technologies. He can be reached via email at leonardo@colltech.com or lamat@earthlink.net and by physical mail at Collective Technologies, 9433 Bee Caves Road, Building III, Austin, TX 78733.

W. Curtis Preston is the President of The Storage Group, Inc. (https://www.thestoragegroup.com). He has been specializing in storage for over seven years and has designed, implemented, and audited enterprise-wide backup and recovery systems for many Fortune 500 and e-commerce companies. His O'Reilly & Associates book, UNIX Backup & Recovery, has sold over 20,000 copies, and he writes a regular column for UnixReview online and SysAdmin magazine. Curtis is also the webmaster for backupcentral.com, and can be reached at curtis@thestoragegroup.com.

References
[1] Amatangelo, Lee ``Leonardo,'' ``Unleashing the Power of JumpStart: A New Technique for Disaster Recovery, Cloning, or Snapshotting a Solaris System,'' LISA XIV Conference Proceedings, 2000.
[2] Kasper, P. A. and A. I. McClellan, Automating Solaris Installations - Custom JumpStart Guide, SunSoft, Prentice Hall, 1995.
[3] Nemeth, E., G. Snyder, S. Seebass, and T. Hein, UNIX System Administration Handbook, Second Edition, Chapter 9, Prentice Hall, 1995.
[4] Preston, W. Curtis, Unix Backup & Recovery, O'Reilly and Associates, Inc., 1999.
[5] Sun Microsystems, Solaris 2.6 - Solaris Advanced Installation Guide, Revision A, Mountain View, CA, Part No. 802-5740-10, August 1997.
[6] Sun Microsystems, Solaris 8 - Advanced Installation Guide, Mountain View, CA, Part No. 806-0957-10, February 2000.
[7] Zuberi, A., ``JumpStart in a Nutshell,'' Inside Solaris, Chapter 1, February 1999.
[8] https://www.fadden.com/cdrfaq/faq00.html#[0-1].
[9] https://www.fokus.gmd.de/research/cc/glone/employees/joerg_schilling/private/.
[10] https://www.smartandfriendly.com/.
[11] https://www.ymi.com/.
This paper was originally published in the
Proceedings of the LISA 2001 15th System Administration Conference, December 2-7, 2001, San Diego, California, USA.