The Evolution of the CMD Computing Environment: A Case Study in Rapid Growth
Lloyd Cha, Chris Motta, Syed Babar, and Mukul Agarwal - Advanced Micro Devices, Inc.
Jack Ma and Waseem Shaikh - Taos Mountain, Inc.
Istvan Marko - Volt Services Group
Abstract
Rapid growth of a computing environment presents a recurring theme
of running out of resources. Meeting the challenges of building and
maintaining such a system requires adapting to the ever changing needs
brought on by rampant expansion. This paper discusses the evolution of
our computer network from its origins in the startup company NexGen,
Inc. to the current AMD California Microprocessor Division (CMD)
network that we support today. We provide highlights of some of the
problems we have encountered along the way, some of which were solved
efficiently and others that provided lessons to be learned. The
reengineering of computer networks and system environments has been
the subject of numerous papers including [Harrison92, Evard94b,
Limoncelli97]. Like the others, we discuss topics related to
modernization of our systems and the implementation of new
technologies. However, our focus here is on the problems caused by
rapid growth. With increasing requirements for more compute power and
the availability of less expensive and more powerful computers, we
believe that other environments are poised for rapid growth similar to
ours. We hope that the lessons learned from our experience will better
prepare other system administrators who find themselves in similar situations.
Introduction
The California Microprocessor Division of AMD was formed
from the merger of the Battery Powered Processors group of AMD and the
newly acquired NexGen, Inc. at the end of 1995. The NexGen computing
environment circa early 1995 was based primarily on Solbourne
workstations and Thicknet (10base5) cabling. Over the past three years
we have grown and modernized our network into our current network of
nearly 1000 nodes. We have added over three terabytes of disk space
and over 500 UNIX compute servers. Our network has been through
two major revolutions, incorporating technologies such as 10base5,
10baseT, ATM, CDDI, FDDI, 100baseT, and Fast EtherChannel at various
points along the way. Our passwd file has grown from 433 entries at
the beginning of 1996 to over 700 entries today. That number may be
small by standards set by today's internet service providers (ISPs);
however, the resource requirements of a typical user in our
environment are far greater. The account that follows is roughly
in chronological order. We provide the reader with extended
discussions on topics related to the growing pains of our compute
environment.
Out of Money
Startup companies frequently have severe spending
limitations, and NexGen prior to the merger with AMD was no exception.
Spending authorization was handled by upper management, which had
little interest or experience in dealing with large scale UNIX system
environments. The early systems administration team had little contact
with upper management, and was hard pressed to get approval to spend
money for changes to the environment or downtime to make much needed
adjustments to the network or systems. Rectifying this situation
required gathering copious amounts of information on systems issues,
coming up with solutions for the problems discovered, and quantifying
the loss and potential loss of revenue or schedule delays to the
business as a whole. To do this analysis properly, we recruited
finance and engineering management into the project, and accompanied
middle managers to budget meetings to answer questions and to
reinforce the business case for system improvements. The increased
contact between the systems administration team and upper management
led to a perception of the team as business aware, and developed a
trust relationship that eased future project approval. However,
the available capital was still in short supply. Verification of
x86-based microprocessors requires a vast number of simulation cycles to
ensure complete functionality and compatibility. The NexGen engineers
were hungry for as many CPU cycles as we could provide. Our solution
was to build up a farm of machines based on NexGen's own Nx586 CPU
running Linux. The CPUs were obtainable with very low overhead and the
peripheral hardware required was inexpensive. Demonstrating a
commercial large-scale Linux installation also provided public
relations benefits. The Linux solution turned out to be short-lived. With the introduction of the UltraSparc processor, Sun was able
to tilt the price-performance ratio back in their favor. In addition,
the Sparc based processors were able to run almost all of our
preferred CAD applications, while the Linux based machines were
capable of running only a limited number of simulation programs.
Fortunately, the merger of AMD and NexGen brought an immediate influx
of cash which was used to purchase new machines rapidly in large
quantities. Recent developments in the microprocessor market,
including the contributions of our own division, are swinging the
price/performance pendulum back in the other direction. Our sister
division in Austin, TX currently runs a compute farm based on AMD-K6
CPUs running Solaris x86 [AMD97]. Whether our compute farm of x86
computers was an idea ahead of its time is a subject still open to
debate and is beyond the scope of this paper.
Out of Control
The early NexGen computing environment suffered from
inexperienced system administration and a lack of centralized control.
There were a dozen different NIS domains, partially because of an
intention to separate groups of users, but also as a result of a
misunderstanding of how NIS works. Since most users ended up needing
access to almost all of the domains, a conglomeration of scripts
was used to manually synchronize the various NIS domains from one
master domain. One of our first tasks was to merge all the domains
into one. During this period, NexGen also lacked a department
responsible for deployment of CAD applications software. Such tasks
were left to the whims of the individual design groups. Design
engineers were forced to sacrifice valuable time installing vendor
software themselves, with varying degrees of success. Applications were often
installed in user home directories. Poorly written .cshrc files
circulated among different design groups, the use of incorrect
versions of tools was epidemic, and the presence of multiple copies of
identical software was fairly common. The implementation of
standard "setup scripts" for CAD applications was key to improving
this situation. We devised a scheme in which a single setup script
would be used for each vendor's tool suite. The setup script would
handle any path and environment variable configuration needed. While
crude, these scripts achieved our immediate goal of having centralized
control over application use. The users' .cshrc files source the
setup scripts for the desired tools. An example of such a script is
given in Figure 1a. A snippet of a user's .cshrc using this script is
given in Figure 1b. /usr/local/bin/get_arch is a short script which
returns the operating system being used - sunos, solaris, hpux9, hpux10,
or linux in our example. The optional amdpostpath and amdprepath
variables are used to eliminate excessive path rehashes when multiple
setup files are used. If these variables are set, the user's .cshrc is
expected to use them to build the appropriate path variable after all
the desired setup scripts have been sourced. If amdpostpath and
amdprepath are not set, the setup scripts will append or prepend
directly to the path variable.
# $Id: k6system.figures,v 1.4 1998/09/28 21:02:00 lccha Exp $
# setup_cadvendor_1.0 -
# A setup file for cadvendor

# Check for conflicts:
if ($?SETUP_CADVENDOR) then
    if ($?prompt) then
        echo "WARNING: setup_cadvendor_1.0 conflicts with"
        echo "setup_cadvendor_$SETUP_CADVENDOR already sourced in this shell"
    endif
endif

# Find architecture of platform
if (-x /usr/local/bin/get_arch) then
    set ARCH_FOR_SETUP = `/usr/local/bin/get_arch`
else
    set ARCH_FOR_SETUP = "unknown"
endif

switch ($ARCH_FOR_SETUP)
case 'sunos':
    if ( $?amdpostpath ) then
        set amdpostpath = ($amdpostpath /tools/cadvendor/1.0/bin)
    else
        set path = ($path /tools/cadvendor/1.0/bin)
    endif
    breaksw
case 'solaris':
    if ( $?amdpostpath ) then
        set amdpostpath = ($amdpostpath /tools/cadvendor/1.0/bin)
    else
        set path = ($path /tools/cadvendor/1.0/bin)
    endif
    breaksw
case 'hpux9':
case 'hpux10':
    if ( $?amdpostpath ) then
        set amdpostpath = ($amdpostpath /tools/cadvendor/1.0/bin)
    else
        set path = ($path /tools/cadvendor/1.0/bin)
    endif
    breaksw
case 'linux':
    exit
    breaksw
default:
    exit
    breaksw
endsw

# Setup version variable
setenv SETUP_CADVENDOR 1.0

# Setup license files
if ( $?LM_LICENSE_FILE ) then
    if ( "$LM_LICENSE_FILE" !~ *"1700@key"* ) then
        setenv LM_LICENSE_FILE \
            ${LM_LICENSE_FILE}:1700@keya,1700@keyb,1700@keyc
    endif
else
    setenv LM_LICENSE_FILE 1700@keya,1700@keyb,1700@keyc
endif
Figure 1a: Sample setup file for fictitious cadvendor (software version
1.0).
set amdpostpath
set amdprepath
foreach vendor (cadvendor mentor cadence avanti)
    if (-e /usr/local/setup/setup_$vendor) then
        source /usr/local/setup/setup_$vendor
    else
        if ($?prompt) then
            echo "WARNING: setup_$vendor does not exist ... skipping"
        endif
    endif
end
set path = ($amdprepath $path $amdpostpath)
Figure 1b: Snippet of code from user's .cshrc.
Some general guidelines governed
the use and creation of our setup scripts:
- Path and environment variables that are not vendor specific will be appended to and never overwritten.
- No assumptions should be made with regard to the order in which the setup scripts are sourced by users.
- Sourcing multiple setup scripts belonging to one vendor (e.g., setup_cadvendor_1.0 and setup_cadvendor_1.1) is not supported; checks are put in place to prevent such misuse.
- The setup scripts are named by vendor and version number (e.g., setup_cadvendor_1.0). A symbolic link points to the default version (setup_cadvendor --> setup_cadvendor_1.0).
We could now eliminate obsolete and duplicate
installations without leaving users in limbo. More importantly, the
scheme facilitated moves and changes to the application trees
without requiring users to alter their own login files. The setup
scripts allowed centralized setups for vendor tools while still
allowing users freedom to customize their individual login
environments. Once users had been converted to the setup script
paradigm, we eliminated redundant installation of tools and performed
proper reinstallation of haphazardly installed software. There
are several known limitations of our method. We currently support only
csh-based shells. There was very little demand in our user community
for anything other than csh or tcsh, so we haven't been motivated to
spend much effort in supporting other shells. Changes to setup scripts
or tool versions are only reflected when the user's .cshrc file is
executed. The login process is also relatively slow even with
amdpostpath and amdprepath variables set. We have received a few
complaints about this, but none compelling enough to make improving it
a priority. In most cases, our users were so relieved to have a
simple and relatively foolproof tool setup environment that they were willing to overlook these shortcomings. We had also
considered using wrapper scripts written in C-shell or Perl. In practice, we
found that they take more effort to maintain when new versions of software are
installed. Many of our CAD packages consist of numerous binaries which would
all require individual wrappers or a link to a common wrapper. The composition
of executables in the package can change frequently from version to version,
forcing the system administrator installing the software to carefully inspect
each release to ensure that all necessary wrappers are in place.
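For illustration, a wrapper of the sort we considered might look like the following Perl sketch; the tool name, install path, and license servers are hypothetical, and each binary in a package would be replaced by (or symbolically linked to) such a wrapper.

#!/usr/local/bin/perl
# Hypothetical wrapper for one cadvendor binary.  The real executable lives
# under a version-specific tree; the wrapper sets up the environment and
# then replaces itself with the real program.
use strict;

my $version  = "1.0";
my $toolroot = "/tools/cadvendor/$version";
my $binary   = "$toolroot/bin.real/" . (split(m{/}, $0))[-1];

# Append our license servers unless one of them is already listed.
my @keys = qw(1700@keya 1700@keyb 1700@keyc);
my $lic  = $ENV{LM_LICENSE_FILE} || "";
$lic = join(":", grep { length } $lic, @keys) unless $lic =~ /1700\@key/;
$ENV{LM_LICENSE_FILE} = $lic;
$ENV{CADVENDOR_HOME}  = $toolroot;

# Replace this process with the real binary, passing all arguments through.
exec($binary, @ARGV) or die "cannot exec $binary: $!\n";

Even in this small sketch the maintenance cost is visible: every new release changes the version-specific path, and every executable added to the package needs another wrapper or link.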
Some alternative methods have been presented in [Rath94], [Evard94a], and
[Furlani91]. We may look at implementing some of the ideas presented in those
papers in the future. However, feedback from our users has indicated that our
method is currently adequate and therefore improvements have not been a high
priority.
Out of Disk Space
Prior to 1995, NexGen's data was distributed on a variety of Sun,
Solbourne, and HP700 servers which were all cross mounted via NFS hard mounts
to each other. This type of design results in a network with multiple points
of failure, each of which can affect performance and availability of the
entire network. In addition, backup of many small servers tends to be
cumbersome, requiring backups running over the network or the installation of
numerous local tape drives. Early 1995 marked the arrival of NexGen's
first large centralized fileserver, an Auspex NS5000. Bristling with a dozen
network interfaces, it was able to provide reliable file service to all
machines on the network without requiring any network router hops. Data from
the various "servers" was migrated to the central fileserver and network
reliability improved accordingly. We made some decisions in the
implementation of this first Auspex that we would later regret. We opted to
use automount "direct" maps rather than indirect maps based on field reports
from other Auspex customers that a large quantity of indirect map entries
could cause overloads on the Auspex host processor. This denied us the
flexibility and scalability that indirect maps would have given us. We limited
the partition size to 5 GB due to limitations of the backup
technology we were using at the time: Exabyte 8500 8mm tape drives driven by
shell scripts using dump. Each partition contained a mixture of project data,
user home directories, applications, and temporary data. The mixture of
different types of directories on shared partitions combined with the
relatively small size of the partitions was a nightmare to administer. Large
amounts of scratch data often filled up partitions causing critical design
data to be lost or corrupted. Large vendor application installations had to be
awkwardly distributed among several disks. The merger with AMD in 1996
brought a second Auspex server to our network. This gave us an immediate
opportunity to revise our disk usage strategy based on our previous
experience. We opted to dedicate individual partitions based on their use. We
designated four general categories of disk use:
- Applications
- User home directories
- Project directories
- Critical project directories
Partitions were individually sized based on needs. Critical project
directories were allocated a sufficient amount of free buffer space to
minimize disk full situations. At the opposite extreme, application
directories were permitted to operate with very little free space in order to
minimize waste. Access to these directories was provided by indirect maps. As
of this writing we have scaled this plan to eight fileservers, each serving
roughly 500 gigabytes of disk. To monitor disk usage and to give advance
warnings of partitions getting full, we wrote a disk monitoring script in Perl
to be run hourly by cron. We started by using a simple script that would parse
the output of the df command and generate e-mail when any disk fell below a
given threshold. This script did not keep any historical data and therefore
could not distinguish a partition that was rapidly nearing capacity from one
that had merely inched across the threshold. As a result, the script
generated so many nuisance e-mails that the important warnings were rendered
useless. To solve these problems, we wrote a highly configurable Perl script
(a simplified sketch follows the feature list) with the following features:
- Rules-based notification - rules are expressed in Perl syntax and are based on a comprehensive set of conditions tracked by the script.
- Comprehensive set of conditions - rules for notification can be based on any combination of the following conditions:
  - amount of free disk available
  - percentage of free disk available
  - time of day
  - time of last notification
  - various calendar data (month, day, year, day of week)
  - change in free disk available since last notification
  - change in free disk available since the last run of the script
- History database - disk usage information and a record of previous notifications sent are kept. This allows the frequency of warning messages to be tuned to how quickly the disk is filling up.
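The following Perl sketch illustrates the general approach rather than our production script: it parses df output, keeps a small history file from the previous run, and applies one example rule. The paths, thresholds, and mail command are hypothetical.

#!/usr/local/bin/perl
# Simplified disk monitor: parse df output, compare against a history file,
# and apply a rule per partition.  Paths, thresholds, and the mail command
# are illustrative only.
use strict;

my $histfile = "/var/adm/diskmon.hist";   # hypothetical history database
my %last;                                 # free MB at the previous run
if (open(HIST, $histfile)) {
    while (<HIST>) {
        my ($fs, $free) = split;
        $last{$fs} = $free;
    }
    close(HIST);
}

open(DF, "df -k |") or die "df: $!\n";
open(HIST, ">$histfile") or die "$histfile: $!\n";
while (<DF>) {
    my ($dev, $kb, $used, $avail, $pct, $mount) = split;
    next unless defined $mount && $mount =~ m{^/};
    my $free_mb = $avail / 1024;
    my $delta   = exists $last{$mount} ? $last{$mount} - $free_mb : 0;

    # Example rule: warn when a partition is under 100 MB free, or has lost
    # more than 500 MB since the previous run (i.e., it is filling rapidly).
    if ($free_mb < 100 || $delta > 500) {
        my $msg = sprintf "%s: %.0f MB free, down %.0f MB since last run",
                          $mount, $free_mb, $delta;
        system("echo \"$msg\" | mailx -s 'disk space warning' root");
    }
    printf HIST "%s %.0f\n", $mount, $free_mb;
}
close(DF);
close(HIST);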
Out of CPU
As product schedules became tighter and tighter, our need for faster
turnaround times on our simulation and verification runs became even more
critical. The brute-force approach of purchasing more computers was a key part
of our solution to this problem, but buying machines alone would not allow us
to reach our goal. We needed a way of using the available CPUs more efficiently. We
elected to use Platform Computing's LSF (Load Sharing Facility) product to
help us reach our goal of having every available CPU in use at all times.
Our model was based on having the most powerful servers located in the
computer room. We deployed less powerful (i.e., less expensive) workstations
or x-terminals on user desktops and encouraged users to submit all jobs,
including interactive ones, to the server farm. Keeping the powerful machines
off people's desktops helped prevent large jobs from causing performance
problems for interactive users and reduced problems caused by "possessive"
users that would complain about any background jobs running on their machines.
Our simulation jobs were typically CPU bound with minimal disk and memory
requirements. Achieving maximum CPU utilization was therefore the key
objective. We deployed LSF on nearly every available machine in the division
as well as many machines "borrowed" from outside our division in order to
maximize the number of CPUs available. Every additional CPU we were able to
utilize contributed to an improved time to market for the products being
developed by our division. LSF allows job scheduling based on several
factors, including available memory, load average, and the number of LSF jobs
already running on the machine. Server room machines were configured to run
one job per CPU at all times. Desktop machines were configured to run batch
jobs only when the load average and the idle time fell within specific thresholds.
The actual thresholds used were tuned based on user feedback. This allowed us
to get maximal use out of idle desktop CPUs while keeping the console users
happy. Further details of our configuration, which are beyond the scope of
this paper, can be found in [Platform97].
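As an illustration of the submission side, a thin wrapper such as the following Perl sketch can steer users' simulation jobs into the batch system rather than onto their desktops; the queue name and log directory are made up for this example, while bsub and its -q and -o options are standard LSF.

#!/usr/local/bin/perl
# Hypothetical submission wrapper: users type "simq <simulator command>" and
# the job is handed to the LSF batch system instead of running on the local
# workstation.  The queue name and log directory are made up for this sketch.
use strict;

die "usage: simq <command> [args...]\n" unless @ARGV;

my $queue  = "verif";                    # hypothetical batch queue
my $logdir = "$ENV{HOME}/lsf_logs";
mkdir($logdir, 0755) unless -d $logdir;

# bsub queues the job; %J in the -o argument expands to the LSF job ID.
exec("bsub", "-q", $queue, "-o", "$logdir/%J.out", @ARGV)
    or die "cannot exec bsub: $!\n";

Interactive sessions can be routed to the farm in the same way with bsub -I, which helps keep even interactive work off the desktop machines.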
Out of Space
Computers require space. Lots of computers require lots of space.
Workstations lined up on tables and ordinary utility shelves will
work for smaller installations, but for large installations there is no
substitute for well-constructed racks. For flexibility, we opted for an
"open-shelf" type of rack. We used these racks to stack servers up to limits
allowed by local fire codes. Our primary goals were high density, easy
accessibility, and reasonable cost. Since appearance was only a minor
consideration, and access to our server room is well-controlled, we were
uninterested in enclosed cabinet style racks with doors and locks. Instead, we
opted for basic 23" wide racks with 11" deep ventilated shelves bolted to
them front and back for an effective depth of 22". This arrangement proved to
be highly versatile, accommodating PCs, Sun Sparc machines from the Sparc20s
through the UltraServer2 machines, and HP 735s and J-class servers. With most
of the "pizza-box" style chassis, we were able to stack up to 14 machines on
7 foot high racks. Ladder racks across the top and bolts in the floor provided
stability in the event of earthquakes.
Out of Power
In August 1996, during a critical part of CMD's product development
schedule, we began to notice a high number of memory failures from our compute
server ranch. This led us to suspect some sort of environmental problem. We
first suspected that the machines were not being adequately cooled and
ventilated. After determining that this was not the cause of our maladies, we
then focused on possible fluctuations in the power supply. Bingo! Our
facilities department discovered that the transformer outside of our building
was operating at about double its rated capacity. The news from our
utility company was not good. We could either take eight hours of downtime to
get the transformer replaced immediately at the utility's expense, or risk
blowing the transformer and taking several days downtime to have it replaced
at our expense. We opted to plan an orderly Saturday downtime to get the
transformer replaced. In the meantime, to lessen the chance of a catastrophic
transformer failure, we shut down any equipment that was not absolutely
necessary, including unused monitors, obsolete yet functional computer
systems, and hallway lights. After about a week of working in a very dark
building, we spent an entire weekend powering machines down and back up.
Moral of this story: check your power requirements carefully with your
facilities department before it's too late. We had to pay the price with a
hastily planned shutdown at a critical point in the project, but at least we
were lucky enough not to have blown the transformer unexpectedly. A related
requirement is to ensure that the cooling capacity of your air-conditioning
system is sufficient to maintain server room temperature even on the hottest
summer days.
Out of Network Bandwidth
Reaching capacity limits of our network was a persistent problem
throughout our growth experience. Fortunately, as our network grew, better and
faster network technology also became available. Our early network of bridges,
hubs, and routers with shared segments and multiport collision domains gave
way to a completely switched network composed primarily of Xylan Omniswitches
by early 1996. In subsequent years, we were able to migrate our backbone from
FDDI to ATM OC-3 technology, and our end stations from 10baseT to CDDI and
100baseT interfaces.
Figure 2a: Early network topology.
The early network topology (circa 1995) is shown in
Figure 2a. Our first major upgrade was to eliminate the hubs of shared
ethernet segments and replace them with ethernet switches, creating dual-port
collision domains (i.e., one machine per shared ethernet). The resulting
switch based network of Figure 2b served us well for approximately three
years. It suffered somewhat from the irregular growth that characterized that
period of our expansion. On several occasions we were required to add hundreds of
machines to our network with no downtime permitted, which left little leeway
for major topological changes. As a result, the loads across the various
subnets were poorly balanced.
Figure 2b: Switch based network (1996).
As the network evolved, the main bottlenecks
that limited our scalability were the numerous routers and the fileserver
interfaces. Our new design attempted to eliminate as many of the router
bottlenecks as possible. In order to do this, we attempted to flatten the
network as much as possible. We had considered completely flattening the
network into a single subnet, but we were not able to come up with a suitable
topology that had sufficient bandwidth at the core of the network without
using unproven technology. Providing enough throughput to the fileservers
would also prove to be a limiting factor in a completely flat network. We
opted to use Cisco's Fast EtherChannel technology [Cisco97, Cisco98, Auspex98]
to combine several full-duplex 100bT links into a single logical pipe or
"channel." Currently a maximum of four links can be combined into one
channel, which limits each logical interface to the fileserver to 800Mbps. In
order to provide the desired throughput to each fileserver, we calculated a
minimum requirement of three 800Mbps channels per fileserver. This implied a
minimum of three subnets, since we wanted to avoid the added complexity of
managing a server with multiple interfaces on any one subnet.
The final design as implemented is shown in Figure 2c. The access (bottom)
layer of switches provides connections to all of the client compute and
desktop machines. The distribution (middle) layer consists of four Cisco
Catalyst 5500 switches, each with a route-switch module to provide fast
routing between subnets. These switches are responsible for traffic between
the various access layer switches and all the routing between the subnets of
our local network. Two Catalyst 5500 switches with router cards make up the
core (top) layer, which ties together the various distribution layer switches.
In addition, the core layer provides access to the routers that handle our
external network traffic. All interswitch links use Fast EtherChannel, and
every switch is connected to at least two devices in the layer above it for
redundancy. Our analysis indicated that roughly 85 percent of our network
traffic was NFS related. NFS traffic is also particularly sensitive to
latency, so special accommodations had to be made in the topology for the NFS
fileservers. In our design, the fileservers were connected via 800Mbps Fast
EtherChannel to the various distribution layer switches to minimize the number
of switch hops required by the end stations. Each fileserver had an interface
on each of the three subnets, ensuring that every NFS client workstation had
access to a fileserver interface without needing a router hop.
Figure 2c: Current network.
Out of Time, Part I
Size does matter [Godzilla98]. Perhaps the most valuable resource in
any computing environment is the system administrator's time. Large scale
computing environments highlight the need for automation. As we added more and
more machines to our environment, it became obvious that many of the methods
then in place were no longer acceptable. Manual procedures had to be
scripted or automated in some way. Semi-automated procedures needed to be
fully-automated. The sporadic growth of our network and the constant
flurry of moves, adds and changes within our network resulted in a chaotic
state of network wiring. The task of tracing a machine to its network port
often required two people to trace through the rat's nest of cables. A week's
worth of cleaning up and untangling cables was often undone in less than an
hour by an emergency machine relocation or cubicle swap. To solve this problem
we developed a package of scripts that collected MAC address information from
the arp tables of our network switches and correlated those addresses with the
information from our ethers table to accurately report where machines were
connected in the network.
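A stripped-down version of that idea is sketched below in Perl: given a file of switch/port/MAC records (however they were gathered from the switches) and the NIS ethers map, it reports which port each known host is connected to. The file name and record format are hypothetical.

#!/usr/local/bin/perl
# Correlate MAC addresses seen on switch ports with the NIS ethers map to
# report which switch port each known host is plugged into.
use strict;

# ethers format: "8:0:20:ab:cd:ef hostname"
my %host_by_mac;
open(ETHERS, "ypcat ethers |") or die "ypcat ethers: $!\n";
while (<ETHERS>) {
    my ($mac, $host) = split;
    $host_by_mac{canon($mac)} = $host;
}
close(ETHERS);

# Switch data gathered separately, one record per line: "switch port mac"
# (the file name and format are hypothetical).
open(PORTS, "/usr/local/netdb/switch_macs") or die "switch_macs: $!\n";
while (<PORTS>) {
    my ($switch, $port, $mac) = split;
    my $host = $host_by_mac{canon($mac)} || "UNKNOWN";
    printf "%-15s %-12s %s\n", $host, $switch, $port;
}
close(PORTS);

# Normalize a colon-separated MAC address; ethers entries are not
# zero-padded, while switch output usually is.
sub canon {
    return join(":", map { sprintf "%02x", hex } split(/:/, lc shift));
}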
We realized early into our bulk purchases of workstations that we would
benefit greatly by keeping machine configurations as consistent as possible.
The idea was to reduce the "entropy" as described in [Evard97] thereby
minimizing debug time. We had loaded our first twenty workstations by cloning
hard drives using a simple script that performed a dd from one disk to
another. Once the disk copy completed, a post-install script was run to take
care of the machine-specific configuration. While this technique is one
of the fastest methods of loading the operating system software, it was also
expensive in administrator time. Pulling the hardware off the rack, swapping
in the master disk drive, starting the cloning process, and then reinstalling
the system would take a minimum of 15 minutes even in best case scenarios.
Updating all the machines with this method would theoretically require several
man-days and would probably require several weeks in practice. We now
employ a variety of network loading methods: Jumpstart [Sun98] for Solaris
products, netdist [HP94] and Ignite-UX [HP98] for HP-UX 9.X and 10.X
respectively, and Kickstart for Linux. The basic approach in each of these is
similar:
- Perform a diskless boot using bootpd or a similar service. A miniature version of the operating system is loaded via tftp into the swap partition.
- Load the operating system onto the local disk.
- Run a customization script to load patches and handle local configuration information (a sketch of such a script follows this list).
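As an example of the last step, a post-install customization script might look something like the following Perl sketch, which applies any Solaris patches staged on an install server and copies in local configuration files; the paths shown are hypothetical and the sketch is illustrative rather than one of our actual finish scripts.

#!/usr/local/bin/perl
# Hypothetical post-install customization: apply staged Solaris patches and
# copy in local configuration files after the operating system is loaded.
use strict;

my $patchdir = "/net/installsrv/export/patches";   # hypothetical staging area

# Apply every patch staged for this release (directories named nnnnnn-nn).
opendir(DIR, $patchdir) or die "$patchdir: $!\n";
foreach my $patch (sort grep { /^\d{6}-\d{2}$/ } readdir(DIR)) {
    print "applying $patch\n";
    system("patchadd", "-d", "$patchdir/$patch") == 0
        or warn "patchadd $patch failed: $?\n";
}
closedir(DIR);

# Local configuration files kept on the install server.
foreach my $file ("resolv.conf", "nsswitch.conf") {
    system("cp", "/net/installsrv/export/config/$file", "/etc/$file");
}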
With these methods, administrator time is now reduced to assuming
control of the machine and rebooting with the designated command to force a
boot from the install server rather than the local disk. The machine then
takes care of the rest. The typical time to load the operating system has
increased to a few hours, depending mostly on the number of operating system
patches involved. The penalty in load time is far outweighed by the benefit of
savings in administrator time.
Out of Time, Part II
Our original server room configuration had keyboards attached to
every machine and a terminal that was wheeled around on a cart to attach to
the console port of any machine that required attention. This was both messy
and inefficient. We were able to solve this problem by using terminal
servers with "reverse telnet" capability. This feature allows one to telnet
from a remote host to any of the terminal server's serial ports.
Connecting the terminal server's serial ports to the serial
console ports of the compute servers enables one to telnet directly to the
console of the machine. The default configuration of terminal servers
provides access to the network from the server's serial ports. In our case, we
used the terminal servers in the opposite direction, to provide access to the
serial ports from the network. Consequently, many changes to the default
settings of the terminal servers were required to achieve the desired
functionality. We list below the most significant changes we made to our
Xyplex terminal servers. Other manufacturers probably have similar options:
- Port access mode: we set the port access mode to "remote" to allow connections to be initiated from the network side. This setting also instructs the terminal server not to output any data to the serial port other than the data being passed from the network port.
- Default session mode/telnet binary session mode: these should both be set to passall, which directs the terminal server to pass the data from the network port to the serial port without attempting to process it.
- Telnet echo mode: this mode should be set to "character" to prevent the terminal server from buffering the data flow.
Our current setup uses twenty 40-port terminal servers. In order to
manage the terminal servers and ensure consistent configuration we opted to
boot the terminal servers from a single network boot server. This allows us to
perform updates to the terminal server configurations by simply loading a new
image file on the boot server and rebooting each of the terminal servers. In
addition, each terminal server has a local flash card that can be used for booting in
the event that the boot server is unavailable. We developed a package of
shell and Expect scripts to map connections between the compute server console
ports and the terminal server serial ports. Without these scripts, we would
have been forced to rely on manually generated documentation. Given the
frequency of server moves and changes in our environment, this documentation
would have soon become outdated and useless without automatic updates.
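The following Perl sketch (using Net::Telnet) conveys the flavor of such a mapping script: it probes each reverse-telnet port on a terminal server, nudges the console, and tries to guess the attached host from its login banner. The assumption that TCP port 2000 plus the serial port number reaches that port is ours for this sketch and varies by terminal server model.

#!/usr/local/bin/perl
# Probe each reverse-telnet port on a terminal server and guess which host's
# console is attached from its login banner.  The mapping of TCP port
# 2000 + serial port number to a serial port is an assumption.
use strict;
use Net::Telnet;

my $termserver = shift || die "usage: conmap <terminal-server>\n";

foreach my $serial (1 .. 40) {
    my $t = Net::Telnet->new(Timeout => 5,
                             Port    => 2000 + $serial,   # assumed mapping
                             Errmode => "return");
    unless ($t->open($termserver)) {
        printf "%s port %2d: no connection\n", $termserver, $serial;
        next;
    }
    # Nudge the console and look for something like "hostname console login:".
    $t->print("");
    my $banner = $t->getline(Timeout => 3) || "";
    my ($host) = $banner =~ /^(\S+)\s+console/;
    printf "%s port %2d: %s\n", $termserver, $serial, $host || "unidentified";
    $t->close;
}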
Our Sun servers automatically use the serial port if a keyboard is not
detected upon bootup. If a Sun machine receives a "break" signal on its
serial console port, it halts execution and drops into ROM monitor mode.
Unfortunately, power cycling of a terminal server attached to a Sun console
port is often perceived by the host as a "break" signal. [Cisco98b]
suggests working around this problem by preventing the Sun computers from
halting when receiving a "break." However, this would also prevent us
from halting an unresponsive machine remotely by intentionally sending a
"break" signal. Since our terminal servers are protected by the same UPS
system that protects our compute servers, we decided that the benefit of being
able to remotely debug machines was worth the risk of leaving the servers
susceptible to a global "halt" due to a power glitch. We also ran into
problems with garbage characters appearing when older telnet clients, such as
those provided with SunOS 4.1.X and HP-UX 9.X, were used. We suspect that
those binaries were not able to cope with some option that the Xyplex was
attempting to negotiate. Compiling new telnet binaries on these machines
helped eliminate some, but not all of the problems. This only affected a
minority of our machines that were still running older operating systems, so
we opted to ignore the problem and require that telnet connections to the
consoles via the terminal servers be launched from our Solaris-based machines.
The Future
We are still in a state where improvements to our computing
environments are bounded by a lack of time or manpower rather than a lack of
ideas. We recognize that some of our solutions presented here, while adequate
for our immediate needs, still leave room for improvement. Some of the
projects we are currently working on are listed below:
- Canonical hostlist project: We have several databases that require
hostname information, including the corporate DNS file, our local NIS hosts
file, our local netgroup file, and various location databases. Each of these
lists is currently independently maintained. Adding a host to the networks
requires manually updating each of these.
We are in the process of
implementing a new methodology. A single file will contain a minimum amount of
data which is manually entered by the system administrator. All the rest of
the information will be generated by scripts which will collect MAC addresses,
network port connections, console port collections, and other data to build a
master database of all host information. This database will be used to build
DNS and NIS host tables, netgroup files, lists for update scripts, LSF
configuration files, and documentation. The principle is to keep data entry to
an absolute minimum and to have only a single source for the manually entered
data. By automating updates of the various derived data files, we avoid
inconsistency problems. A minimal sketch of the generation step is given after this list.
- Cluster monitor: We currently monitor our systems by using scripts
which process raw data produced by syslog, LSF, and HP OpenView. LSF gives us
a good overall analysis of the total throughput of our cluster. Syslog
reports hardware problems, and HP OpenView supports a variety of
monitoring options.
Our objective is to develop a system that will give
us more detailed reporting and debugging information when there are problems.
In addition, we are developing methods to allow the cluster to automatically
isolate problems and perform automated fixes without human intervention.
- Documentation: We sometimes forget that computing environments are
set up for the benefit of our users rather than for the entertainment of the
system administrators. Documentation is an essential ingredient for ensuring
that a computing environment can be efficiently used and is essential to
prevent system administrators from being bogged down by an endless barrage of
frequently asked questions. We will be making a major effort to improve our
on-line documentation currently maintained on our internal web site.
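Returning to the canonical hostlist project, the generation step might look something like the following Perl sketch, which reads a single hand-maintained master file and emits derived tables; the file format and paths are hypothetical, and the real implementation will generate netgroup, DNS, and LSF files in the same manner.

#!/usr/local/bin/perl
# Hypothetical generation step for the canonical hostlist: read one
# hand-maintained master file and emit derived tables.  Only hosts and
# ethers are shown; netgroup, DNS, and LSF files would follow the same pattern.
use strict;

# Master format, one hand-entered line per host:
#   hostname  ip-address  mac-address  location  owner
my $master = "/usr/local/netdb/hosts.master";   # hypothetical path

open(MASTER, $master)             or die "$master: $!\n";
open(HOSTS,  ">hosts.generated")  or die "hosts.generated: $!\n";
open(ETHERS, ">ethers.generated") or die "ethers.generated: $!\n";

while (<MASTER>) {
    next if /^\s*#/ || /^\s*$/;                 # skip comments and blank lines
    my ($host, $ip, $mac, $location, $owner) = split;
    print HOSTS  "$ip\t$host\t# $location $owner\n";
    print ETHERS "$mac\t$host\n";
}
close(MASTER);
close(HOSTS);
close(ETHERS);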
Final Thoughts
We hope our experiences will be valuable in helping others plan
ahead for growth in their computing installations. We have learned that it is
important to solve small problems on a small scale before they expand to large
problems on a large scale. Careful planning and a vision of the future are
necessary to design systems that will scale easily without major renovations.
Availability
Please contact the authors at <lisa98@cmdmail.amd.com>
regarding availability of scripts referenced in this paper.
Author Information
Lloyd Cha is a MTS CAD Design Engineer at Advanced Micro Devices in
Sunnyvale, California. Prior to joining AMD, he was employed by Rockwell
International in Newport Beach, California. He holds a BSEE from the
California Institute of Technology and a MSEE from UCLA. He can be contacted
by USPS mail at AMD, M/S 361, PO Box 3453, Sunnyvale, CA 94088 or by
electronic mail at <lloyd.cha@amd.com>
or <lloyd.cha@pobox.com>.
Chris Motta is the manager of the CMD Systems and Network Administration
department. He holds a BSME from the University of California at Berkeley. He
has held a variety of systems administration positions including UNIX and
networking consulting. Electronic mail address is <Chris.Motta@amd.com>,
and USPS mail address is M/S 366, PO Box 3453, Sunnyvale, CA 94088.
Syed Babar received his master's degree in computer engineering from Wayne
State University in Detroit, Michigan. He works at Advanced Micro Devices in
Sunnyvale, California as a Senior CAD Systems Engineer. He can be contacted
via e-mail at <Syed.Babar@amd.com>
or <Syed_Babar@hotmail.com>.
Mukul Agarwal received his MSCS from Santa Clara University. He joined
NexGen, Inc. in Milpitas, California as a CAD Engineer in 1993. He switched
to systems and network administration in 1995 and has been a System/Network
Administrator ever since. Reach him via e-mail at
<mukul.agarwal@amd.com>.
Jack Ma holds a BSCS from Tsinghua University and a MSCS from Computer
Systems Engineering Institute. He was a UNIX software developer at Sun
Microsystems before joining Taos Mountain in 1995, where he now works as a
networking/UNIX system consultant. He can be reached electronically at
<ylma@netcom.com>.
Waseem Shaikh holds a master's degree in computer engineering from the
University of Southern California and received his bachelor's degree in
electrical engineering from University of Engineering and Technology in
Lahore, Pakistan. He was a System/Network Engineer at Steven Spielberg's
Holocaust Shoah Foundation, a System Consultant at Stanford Research
Institute, and is now working as a System/Network Consultant with Taos
Mountain. He can be reached at <shaikh@netcom.com>.
Istvan Marko is a self-educated Computer Specialist currently working as a
System Administrator employed through Volt Services Group. He can be contacted
via e-mail at <imarko@pacificnet.net>.
References
[AMD97] Unpublished internal e-mail correspondence, AMD Austin, TX, 1997.
[Auspex98] "Auspex Support For Cisco Fast EtherChannel," Auspex
Technical Report #21, Document 300-TC049, March 1998.
[Cisco97] Cisco Systems, Inc. "Fast EtherChannel," Cisco Systems
Whitepaper, 1997.
[Cisco98] Cisco Systems, Inc. "Understanding and Designing Networks
Using Fast EtherChannel," Cisco Systems Application Note, 1998.
[Evard94a] Remy Evard, "Soft: A Software Environment Abstraction
Mechanism," LISA VIII Proceedings, San Diego, CA, September 1994.
[Evard94b] Remy Evard, "Tenwen: The Re-engineering Of A Computing
Environment," LISA VIII Proceedings, San Diego, CA, September 1994.
[Evard97] Remy Evard, "An Analysis of UNIX System Configuration,"
LISA XI Proceedings, San Diego, CA, October 1997.
[Furlani91] John L. Furlani, "Modules: Providing a Flexible User
Environment," LISA V Proceedings, San Diego, CA, September 1991.
[Godzilla98] "Godzilla," Columbia TriStar Pictures, 1998.
[Harrison92] Helen E. Harrison, "So Many Workstations, So Little Time,"
LISA VI Proceedings, Long Beach, CA, October 1992.
[HP94] Hewlett-Packard Support Services, "Cold Network Installs -
Configuring/Troubleshooting Guide," Engineering Notes - Document
CWA941020000, December 12, 1994.
[HP98] Hewlett-Packard Company, "Ignite-UX Startup Guide for System
Administrators," 1998.
[Limoncelli97] Tom Limoncelli, Tom Reingold, Ravi Narayan, and Ralph
Loura, "Creating a Network for Lucent Bell Labs Research South," LISA XI
Proceedings, San Diego, CA, October 1997.
[Platform97] Platform Computing, "AMD's K6 Microprocessor Design
Experience with LSF," LSF News, Platform Computing, August 1997.
https://www.platform.com/content/industry_solutions/success_stories/eda_solutions/eda_stories/AMD.htm.
[Rath94] Christopher Rath, "The BNR Standard Login (A Login
Configuration Manager)," LISA VIII Proceedings, San Diego, CA,
September 1994.
[Sun98] Sun Microsystems, Inc. "SPARC: Installing Solaris Software,"
1998.