On Reliability
Security and Reliability
by John Sellens
John Sellens is part of the GNAC Canada team in Toronto, and
proud to be husband to one and father to two.
Security is a wide-ranging and sometimes poorly defined topic. "Computer people" often (incorrectly) think of security as being related only to things that you can do with a keyboard, a computer, random bits and bytes, and someone else's password. Accordingly, I will attempt to summarize what security is, or at least what it is in the context of this article. The relevant security-related elements I will cover are:

- Access control: passwords and other mechanisms that attempt to require some level of authentication and authorization for access to your networks and systems (i.e., how to know when to open the barn door).
- Physical security: protection against physical attacks and "acts of God."
- Intrusion detection: how to detect when someone unexpected has entered through an open or insufficiently closed barn door.
- Correction: fixing things when they break, which includes "remotivating" individuals when they act inconsistently with what is expected.
- Change management: to ensure that changes are appropriate and have been subjected to the appropriate review and approval.

Security is not just "prevention": it's prevention, detection, control, and correction. And when you have all those, you have (of course) a more reliable system. Let's review each of the five elements in turn.

Access Control: Authentication and Authorization

Please allow me to be ridiculous for a moment: If you have no access control, anyone can do anything to your systems, and so they are almost by definition unreliable. And you're similarly exposed even if you have good access control but no authorization mechanism that limits what different users can do. I'm going to discuss access control in two parts: authentication and authorization. I'll further subdivide the discussion of authorization into logical access restrictions, physical access restrictions, and activity restrictions.
Authentication

Authentication is what we are all (I hope) familiar with: some form of userid and password pair that "proves" that the user is who he or she claims to be. In theory, both the userid and password could change with time, but the most common implementations involve a publicly known userid and a static password. (I'll define a static password as one that stays the same until it is explicitly changed, typically by the user.) Static passwords are most commonly stored on the destination machine (or network of machines), typically in an encrypted or obscured form to prevent the casual browsing of passwords. The most commonly used "more secure" static-password mechanism is Kerberos, in which the passwords are stored on a "secure" server, and the protocol protects the passwords as they are passed back and forth between various clients and servers to authenticate users. One problem with this is the weakest-link problem: you need to have Kerberos locally available on every device, or you risk sending your password over some connection in cleartext, which limits the effectiveness of Kerberos in those environments. (You can sometimes secure otherwise cleartext links with ssh encryption, but then you need to have ssh on the local box, which is the same problem with a different piece of software.)
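The article doesn't prescribe a particular storage scheme; as one illustration of keeping static passwords "in an encrypted or obscured form" rather than in the clear, here is a minimal sketch in Python using a salted, iterated hash. The function names are mine, and a modern system would reach for a dedicated password-hashing library rather than rolling its own:

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Derive a slow, salted hash so the stored form is not the password itself."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, stored):
    """Recompute the hash and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, stored)

salt, stored = hash_password("correct horse")
assert verify_password("correct horse", salt, stored)
assert not verify_password("wrong guess", salt, stored)
```

The point of the salt and the iteration count is precisely the "casual browsing" problem above: a stolen copy of the stored digests cannot simply be read, and each guess against it is made expensive.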
More advanced (i.e., obfuscated or annoying) systems use some form of one-time password (OTP), such as S/KEY[1].

The reason for an authentication mechanism is to identify the user, and the more reliable the authentication mechanism, the more reliable your overall system is going to be, because you have a better "front-door" defense to protect you from the unreliable among us. I'm a big fan of one-time passwords: only the most rudimentary of systems would not benefit from the use of OTPs, even if used only for authenticating the more privileged users or for granting root (or equivalent) access.

Authorization

The next element in an access-control system is what I will refer to as "logical access restrictions": those that are based on such things as the originating network address of a connection, the time of day, or current usage rules. The most common way to implement originating-network restrictions (in the UNIX world at least) is through the use of the "TCP wrapper"[2] package, which makes it easy to "wrap" certain services (such as telnet) with an access-control program that can restrict on the basis of network address, etc. The other logical restrictions are more commonly implemented with certain operating-system configurations, custom shells, or commercial software. Restrictions that you might want or need to implement include:
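As a toy illustration of the idea behind address-based restrictions (this is my own sketch of the concept, not the TCP wrapper[2] package's actual configuration language), a service might check the originating address against a list of trusted networks before accepting a connection:

```python
import ipaddress

# Hypothetical trusted networks; a real deployment would read these from
# configuration (in TCP wrapper's case, /etc/hosts.allow and /etc/hosts.deny).
ALLOWED_NETS = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "192.168.1.0/24")]

def connection_allowed(source_ip):
    """Logical access restriction: permit only connections from trusted networks."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOWED_NETS)

assert connection_allowed("10.1.2.3")        # inside 10.0.0.0/8
assert not connection_allowed("203.0.113.9") # outside both networks
```

Time-of-day or usage-based restrictions follow the same pattern: a small predicate consulted before the service does any real work.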
It's probably worth mentioning that mechanisms and policies like this have a long history in the mainframe world.

Complementary to logical access restrictions are (of course) physical access restrictions. Sometimes you wish to allow access to a particular system, application, or function only if the user is (thought to be) in a "secure" location. For UNIX systems the most common example of this kind of restriction is to allow direct root logins only from the system console. Other examples include allowing connections only from within your building, enforced either through the use of hardwired connections (almost unheard of in these days of networked workstations), subnets and firewalls, or by simply not allowing any connections (network or dialup) to and from the outside.

The final component of access control that I am going to cover is what I'll call "activity restrictions": restrictions or limits on the commands and functions that a user can invoke. These come into effect once your authentication system has identified a particular user, and the user (or the user's connection) has passed any logical or physical access restrictions that have been implemented. One of the most common (UNIX) examples of an activity restriction is the requirement that a user be a member of a certain group (often "wheel" or group 0) in order to "su" to root. Lots of other examples of group- or ACL-based restrictions exist. Other restrictions can be implemented by applications, using compiled-in information (bad), or files or database entries with restriction or permission information. I suggest dividing activity restrictions into three types:
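The wheel-group check mentioned above can be sketched as follows. This is a simplified illustration using Python's standard grp and pwd modules; the real su program consults the system's own configuration and does considerably more:

```python
import grp
import pwd

def may_su_to_root(username, required_group="wheel"):
    """Activity restriction: only members of the required group may su to root."""
    try:
        group = grp.getgrnam(required_group)
    except KeyError:
        return False  # no such group on this system: deny by default
    if username in group.gr_mem:
        return True   # listed as a supplementary member
    try:
        # The user's primary group counts as membership too.
        return pwd.getpwnam(username).pw_gid == group.gr_gid
    except KeyError:
        return False  # unknown user: deny
```

Note the deny-by-default behavior when the group or user is missing; an activity restriction that fails open is no restriction at all.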
I don't claim to have covered all situations here. One obvious situation that's not covered is multiple authentication, where two or more people must agree and authenticate before a task is executed. (Recall those action movies featuring nuclear missile silos, where two people have to turn two different keys on opposite sides of the room at the same time, and they're both carrying guns.) And I haven't mentioned the use of biometrics for authentication.

Some of these access-control mechanisms can be quite inconvenient and/or obtrusive. As in most other discussions of reliability, there's a tradeoff between reliability and control on the one hand and cost and inconvenience on the other, and each organization must strike the most appropriate balance for its needs. And I haven't mentioned the need for proper logging, which is a necessity for tracking, troubleshooting, and change control.

To tie this discussion back to reliability: good access control means that you limit, control, or track who did (or could do) what, when, and under what circumstances. This means that when you determine that certain controls or limits are required to help your systems, networks, and business processes function reliably, you've got (at least part of) the mechanism to help you implement them.

Physical Security

The preceding discussion has focused primarily on electronic access to systems and networks, which is the traditional area of concern for computer-oriented people. But it's just as important to consider the physical security aspects, and again to balance the costs (monetary and otherwise) against the expected risks and/or advantages. Note that I'm not talking here about disaster-recovery planning or high-availability hardware; I'm talking about preventing people (or things) from getting physical access to your premises or equipment. Why is physical security important? In most cases, physical access to a machine is tantamount to administrator access.
In the most extreme cases, a machine (or parts of it) is stolen and attacked at the thief's leisure, whim, or screwdriver. Physical security can also help to guard against so-called "acts of God": a more secure building is likely to be stronger and more appropriately located. What kinds of things should physical security guard against, and how do they contribute to reliability?
Intrusion Detection

The best security system in the world is reduced in its effectiveness if it's not properly monitored. You must have some mechanisms and processes that are designed to detect any intrusions that do take place and, optimally, any attempted intrusions that were blocked by the systems. Proper intrusion-detection systems will alert you when you're under attack and will give you time to increase your awareness or monitoring to fend off any further attacks. For example, if you can detect when a copy of your encrypted passwords has been stolen, you have a better chance of changing all the passwords before they get cracked and exploited, and of blocking the access used to intrude. Quite simply, if you can't detect when something has gone awry, you've got much less chance of protecting yourself and your systems. And if you can't protect the systems, it's going to be harder to keep them working reliably. Techniques and mechanisms for intrusion detection include:
Correction

Once you've detected an intrusion or attack (or attempted attack), you need a mechanism and process by which you can put things right again and, optimally, a way to prevent it from happening again. Keep good backups, know where your distribution media are, and have documented procedures and mechanisms to get in touch with the necessary people. Keep up to date on vendor updates, notices, and security alerts in the community at large. Be ready to disconnect machines or networks that are under attack or need repair while you investigate and undertake repairs. The impact on reliability should be clear: a modified machine or system is at risk, and the sooner you can get things back together, the sooner normal, reliable operation can resume.

The other side of correction is "remotivating" individuals who are acting contrary to policy and reasonable standards of behavior. A user or system administrator who behaves incorrectly (let's say by choosing trivial passwords and writing them on notepaper stuck to their monitor) can be putting other users, systems, and information at risk. If you're expecting people to act appropriately, you had better define and publish what the standards of behavior are and be prepared to enforce or explain them.

Change Management

The best-laid plans can be all for naught if there are no controls around them, and one of the most important controls is change management. The primary components of a proper change-management system are:
Security is a wide-ranging topic and has an impact on many areas of an organization's activities. Proper security systems, mechanisms, policies, and practices sometimes augment reliability, but in many ways their primary reliability benefit is in preventing the intentional or unintentional reduction of current reliability levels.

This is the ninth article in the On Reliability series published in ;login: over the past two years, and it concludes the list of topics that I had planned to cover. Thanks very much for reading, and I hope you've found this series useful.

References

[1] The S/KEY One-Time Password System, from Bellcore. <ftp://ftp.bellcore.com/pub/nmh/>

[2] TCP Wrapper, by Wietse Venema (<wietse@porcupine.org>). <ftp://ftp.porcupine.org/pub/security/index.html>

[3] Tripwire, originally by Gene Kim and Gene Spafford, provides tools to track when system files change unexpectedly. <ftp://info.cert.org/pub/tools/tripwire/>
Last changed: 16 Nov. 1999 mc