LISA '04 Paper   
A Machine-Oriented Integrated Vulnerability Database for Automated Vulnerability Detection and Processing
(Atlanta, GA: USENIX Association, November, 2004).

Abstract
The number of security vulnerabilities discovered in computer systems has increased explosively. Currently, in order to keep track of security alerts, system administrators rely on vulnerability databases such as CERT Coordination Centre, SecurityFocus BugTraq and Sans Vulnerabilities Notes Database. Such databases are designed primarily to be read and understood by humans. Given the speed at which an exploit becomes available once a vulnerability is known, and the frequency of occurrence of such vulnerabilities, manual human intervention is too slow and time-consuming, and may not be effective. We propose the design of a new vulnerability database which is oriented to be machine readable and processable rather than human oriented. This allows automated response to a vulnerability alert rather than relying on manual intervention of system administrators. With this approach, many kinds of automatic processing of alerts become feasible. We show the value of such a database by constructing a prototype sample scanner for Unix systems tailored for Red Hat Linux and FreeBSD. We envisage that our work can help spur the development of far more effective vulnerability databases to benefit a wide-ranging user community.

Introduction
A worrying trend in the age of the Internet is the increasing incidence of cyber attacks. CERT statistics [1] quote 114,855 reported incidents (an incident may involve an arbitrary number of sites, even thousands) in the first nine months of 2003 alone. This is a large jump from 21,756 incidents in 2000. One of the objectives of computer security emergency centers like CERT is to help disseminate vulnerability alerts and relevant advisory notes to the user community in a timely fashion.
However, the speed of cyber attacks, together with the complexity of administering computer and network infrastructures today, makes it difficult for many system administrators to cope with such attacks. While automatic tools may be available, there is still a need to routinely inspect any security/vulnerability alerts in order to take the necessary corrective measures. Current sources of such alerts are designed primarily for human consumption and contain large amounts of information in natural language format. In this paper, we call such sources vulnerability databases because they are collections of data, regardless of whether they are actually kept in database form. While a human oriented format is useful for disseminating the full details of an alert, it also requires a human in the chain to make use of it. This problem is acknowledged in a CERT document [2]. Given the 5500 vulnerabilities reported in 2002, it is estimated that a system administrator would need 229 days just to digest the information. Furthermore, multiple vulnerability databases usually need to be consulted to fully deal with a vulnerability, i.e., the CERT entry alone is not sufficient. Thus, the deck is stacked in favor of the hackers rather than the system administrators.
Clearly, the solution would be to move away from direct human processing towards automatic security alert response processing. This paper proposes an initiative to redesign vulnerability databases to be machine oriented and amenable to automatic processing. In practice, such a database would also need to integrate vulnerabilities disclosed from multiple sources. The dissemination of machine processable alerts allows automated tools to operate on an alert immediately without requiring humans in the loop. This would cut down the long time interval between the release of a vulnerability/advisory note and corrective action being taken.
Other automated tools do exist, e.g., Microsoft Windows systems have Software Update Services; however, there is little which is general purpose, publicly accessible, and open to public or third party scrutiny and verification. We have developed a proof-of-concept machine oriented database schema using a vulnerability expression language for describing the targets and effects of vulnerabilities. To illustrate the use of this database, we have developed a prototype vulnerability scanning robot which can determine existing and potential vulnerabilities based on the database. The creation of an effective machine oriented vulnerability database would require the cooperation of many parties such as CERT, BugTraq, vendors, software developers, etc. As such, this paper is not meant to be a standalone definitive solution. Rather, the prototype database and scanner are intended to spur the development of machine oriented databases by the parties concerned. We believe that our proof of concept presents the key elements for further development of machine oriented vulnerability databases. The use of a simple vulnerability expression abstraction also simplifies the integration of data from multiple sources.

Motivation and Design Goals
Figure 1, reproduced from a CERT report [3], describes the vulnerability exploit cycle. The Y-axis represents the number of incidents for a given vulnerability. The graph illustrates the time lag between the release of a vulnerability/advisory report and the decrease of incidents following corrective measures by users. We argue that current vulnerability databases, such as CERT, BugTraq and CVE, in their present format are not designed to facilitate a speedy user response because they suffer from the following limitations:
Our philosophy is that vulnerabilities should be expressible in an explicit form in terms of data (or a description) rather than an implicit form like code to process a vulnerability. Hence the data can be stored in a database (or in any data description language, e.g., XML). Our database is designed with the following goals:
We also want to have an automatic scanner which can use the database to do the following:
Related Work
There have been a number of popular tools that scan for the presence of vulnerabilities or configuration weaknesses in a system. Some notable examples are COPS [4], SATAN [5] and Nessus [6]. These tools are code-based scanning applications where the logic of vulnerability checking is embedded tightly in the scanner's code. This means that including a new check for a vulnerability entry requires one to update the scanner's code, its sub-component(s), or its configuration file. In contrast, our system uses a generic scanner which makes use of vulnerability descriptions stored separately in a vulnerability database. While a code-based solution is generally more powerful, it requires that code/plug-ins be written. There are trust and verification issues which we discuss later in this section.
There is some existing work which reorganizes and integrates information in existing vulnerability databases into one that is more of a ``real database.'' NIST has developed ICAT [7], a searchable index of vulnerability entries leading the users to various vulnerability resources and patch information. Similarly, Purdue University maintains a web-based search system called ``Public Cooperative Vulnerability Database'' [8]. These databases are, however, designed mainly for vulnerability search based on categorized attribute values, and not for automated applications. Krsul [9] proposes a comprehensive taxonomy of vulnerabilities for possible further processing or automated manipulation. A database is also proposed. It is hard to compare the database since no specific applications were co-designed with it.
Windows Update [10] is a Microsoft online tool for automatically updating Windows operating systems and Microsoft applications with recent patches. It illustrates some important issues with automatic tools. Windows Update (and its more automatic cousin, Windows Software Update Services [11]) are closed systems.
We propose an open system which can cater for heterogeneous environments. Windows Update has a ``black-box update model'' which allows easy and seemingly automatic patch update, yet the non-transparency of the system leads to the following issues:
Some related concerns about the Windows Update mechanism are discussed in an article by Berlind [13]. We argue that any automatic update or alert processing mechanism should be based on an open model which can be independently verified. In addition, it should be possible for the user to bypass the automatic system in cases where the security policy may not allow the execution of foreign code or connection to external hosts. Moreover, the administrator/user should be able to determine the consequences of a patch or alert on his system.

Movtraq: A New Vulnerability Database
The integrated vulnerability database, which we have called the Movtraq (Machine Oriented Vulnerability and Tracking) database, is designed to be compiled from multiple source vulnerability databases and is usable directly by an automatic scanner (see Figure 2).

Design Considerations
The main challenge in designing the new database is to determine what the actual contents of each vulnerability entry should be. For our proof-of-concept, we have focused on what the database should contain rather than on a general database schema. The data fields corresponding to a vulnerability fall into three general categories: general information and references; vulnerability factors and their environmental requirements; and impact of the vulnerability.

General Information Fields
The general information portion mostly contains references to several public vulnerability databases such as CERT, BugTraq, etc. The purpose of these fields is to give the user a reference to the original source of information to obtain additional information. This is mainly for human consumption.

Vulnerability Factors and Environmental Requirements
The second category, vulnerability factors and environment data, provides the main content of machine processable vulnerability information. A vulnerability has to exist within a context, hence it is described in terms of its original source factor and associated environmental factors.
By ``original source factor,'' we mean the system component(s) (application or operating system) where the vulnerability originates. ``Environmental factors'' refers to settings/configuration or services in the local system which make the system subject to the vulnerability. We distinguish between two kinds of vulnerabilities:
There are a number of different combinations of original source and environment factors:

Case 1: Vulnerability factors: match & Environment factors: match
We get this result when a particular vulnerability's original source exists on the local system and the settings of the local system match all the environment factors. In this case, we conclude that the vulnerability exists on the system.

Case 2: Vulnerability factors: match & Environment factors: no match
This occurs when we can detect the origin of the vulnerability on the local system; however, the settings of the local system do not match the environment factors. So the vulnerability is not applicable but it has the potential to affect the system if the environment changes. For example, consider the case of ``Apache Web Server Chunk Handling Vulnerability'' [14]. Even if Apache is installed, we will not be affected by the vulnerability as long as we do not provide http services. Although this second case appears to be an exception, it is actually not uncommon as a full installation of the operating system and application programs may have been done. Hence, many installed components in the system may not usually be in use.

Case 3: Vulnerability factors: no match & Environment factors: match
In this case, the vulnerability would appear to be not applicable. However, there is a subtle issue. Consider the case of OpenSSL (an open source implementation of the SSL protocol) which had several stack overflow vulnerabilities which are exploitable [15]. OpenSSL may not be installed as an individual component, so even if there is a database entry for the OpenSSL vulnerability, this would return a negative result in terms of vulnerability data factors. However, OpenSSL is commonly included in applications such as Apache, Sendmail, Bind, Linux and Unix based systems.
Thus, it is necessary to check for the existence of such applications, which may indicate that such an OpenSSL vulnerability exists even if OpenSSL itself is not detected. This highlights that one may need several database entries corresponding to a vulnerability, given some of these indirect potential factors.

Case 4: Vulnerability factors: no match & Environment factors: no match
The vulnerability does not exist on the local system.

Vulnerability Impact (Consequences)
The third category of data concerns the impact of the vulnerability, which describes the possible consequences of a vulnerability if it is successfully exploited. In our database, this is stored as a vulnerability description expression which is machine processable and describes the vulnerability impact in a precise and concise form. There is no need to use any taxonomy or qualitative impact factor (e.g., critical, high, medium, low) which is not precise and may not make sense in the context of a particular system. It also enables checking of the relationship between different vulnerabilities and whether they can affect one another.

Database Structure
As we have argued, the exact structure of the database is not so important. Rather, what matters is the content and having it in a more precise, machine processable format. In our proof-of-concept design, the database has seven main entities, namely:
An entity relationship diagram which gives an overview of the relationship between these data items is given in Figure 3. We will briefly mention some of the key fields from an integration and machine processable perspective. We have mainly omitted fields in the general information category which are present in the database for human consumption.
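The four combinations of vulnerability factors and environment factors (Cases 1 to 4 above) lend themselves to a small decision function. The following Python fragment is a minimal sketch of that decision only; it is not the Movtraq scanner's actual logic. The names `ScanOutcome`, `classify` and `scan_entry`, and the dictionary layout of an entry, are hypothetical and chosen purely for illustration.

```python
from enum import Enum


class ScanOutcome(Enum):
    """The four outcomes of Cases 1 to 4 (names are illustrative)."""
    VULNERABLE = 1      # Case 1: source matches, environment matches
    POTENTIAL = 2       # Case 2: source matches, environment does not
    CHECK_INDIRECT = 3  # Case 3: source not found, environment matches
    NOT_PRESENT = 4     # Case 4: neither matches


def classify(source_matches, environment_matches):
    """Map the two match results onto one of the four cases."""
    if source_matches and environment_matches:
        return ScanOutcome.VULNERABLE
    if source_matches:
        return ScanOutcome.POTENTIAL
    if environment_matches:
        return ScanOutcome.CHECK_INDIRECT
    return ScanOutcome.NOT_PRESENT


def scan_entry(entry, installed, active_environment):
    """Classify one database entry against a local system snapshot.

    `entry`, `installed` and `active_environment` are hypothetical
    structures: a dict holding a component name and a set of required
    environment factors, and two sets describing the local system.
    """
    source_matches = entry["component"] in installed
    # Every required environment factor must be present locally.
    environment_matches = entry["environment"] <= active_environment
    return classify(source_matches, environment_matches)


# Apache is installed but no HTTP service is provided (the Case 2 example).
apache_entry = {"component": "apache", "environment": {"http service"}}
outcome = scan_entry(apache_entry, installed={"apache"}, active_environment=set())
print(outcome.name)
```

In this sketch the Apache example falls under Case 2 (`POTENTIAL`), since the component is installed but the required service is absent. A Case 3 result (`CHECK_INDIRECT`) would be the cue to look for applications that embed the component, as in the OpenSSL discussion.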
The fields which have been labeled by (*) make use of the vulnerability description expressions or vulnerability target objects from the next section. Note that some fields which have a similar function occur a few times in a different context, e.g., hardware requirements may be different for the application and environment, and there are two different consequences - one from the vulnerability and one from using a specific exploit.

Integrating the Data
One of the difficulties with dealing with security/vulnerability alerts is the need to integrate the information from multiple sources. Our prototype database is no exception and was built by integrating data from multiple vulnerability sources such as CERT, BugTraq, CVE, vendors and software developer sites. Ideally, one would prefer a single source for the vulnerability information (even if it is only in text form). However, the reality is that due to the distributed handling and speed of dealing with vulnerabilities, one has to accept that integration may be required. The following example, which is the ``OpenSSL SSLv2 Malformed Client Key Remote Buffer Overflow Vulnerability,'' illustrates the need for integration. It has a CVE ID of CAN-2002-0656 [15].

BugTraq from SecurityFocus provides:
  BugTraq ID: 5363
  Application environment: Apache v1.0 - 1.3.26
  OS environment: Linux, Microsoft Windows
  Proof of concept exploit: available
  Minimum user rights for exploit: u#R [Note 1]
CERT vulnerability advisory provides:
  CERT ID: CA-2002-23
  Vulnerable application version: OpenSSL prior to 0.9.6
  Vulnerability impact: @G u#S
Vendor/software information:
  From OpenSSL (www.openssl.org) we get the vulnerable application range as: 0.9.1c - 0.9.5a.
  From Apache documentation we know that usually the user is root.

In general, determining the complete environmental requirements and the consequences of the vulnerability from the textual descriptions can be a tedious and time consuming process.
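To make the integration step concrete, the Python fragment below sketches how the per-source facts for CAN-2002-0656 might be merged into one record and how the vendor's version range could then be checked. It is an illustration under stated assumptions, not the Movtraq implementation: the field names, the ``first source to supply a field wins'' merge policy, and the helpers `parse_version`, `integrate` and `in_vulnerable_range` are all hypothetical. Only the data values are taken from the example above.

```python
import re

# Per-source facts from the OpenSSL example; the field names are hypothetical.
SOURCES = {
    "bugtraq": {
        "bugtraq_id": "5363",
        "app_environment": "Apache v1.0 - 1.3.26",
        "os_environment": ["Linux", "Microsoft Windows"],
        "exploit_available": True,
        "min_rights": "u#R",
    },
    "cert": {"cert_id": "CA-2002-23", "impact": "@G u#S"},
    "vendor": {"vulnerable_range": ("0.9.1c", "0.9.5a")},
}


def parse_version(text):
    """Turn a version such as '0.9.5a' into a comparable ((0, 9, 5), 'a')."""
    match = re.fullmatch(r"(\d+(?:\.\d+)*)([a-z]?)", text)
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    numbers = tuple(int(part) for part in match.group(1).split("."))
    return (numbers, match.group(2))


def integrate(cve_id, sources):
    """Merge per-source fields into one entry keyed by the CVE ID.

    Assumed policy: the first source that supplies a field wins.
    """
    entry = {"cve_id": cve_id}
    for fields in sources.values():
        for key, value in fields.items():
            entry.setdefault(key, value)
    return entry


def in_vulnerable_range(version, entry):
    """Check whether a version lies inside the entry's inclusive range."""
    low, high = entry["vulnerable_range"]
    return parse_version(low) <= parse_version(version) <= parse_version(high)


entry = integrate("CAN-2002-0656", SOURCES)
print(entry["cert_id"], in_vulnerable_range("0.9.4", entry))
```

A real integration would also have to reconcile conflicting statements between sources, such as CERT's ``prior to 0.9.6'' versus the vendor's explicit range; this sketch simply keeps the vendor range.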
The effort of this manual integration is one rationale for a better system such as the one described here.

Vulnerability Description Expressions
The main machine oriented data fields in the database belong to three categories: system components of the vulnerability; environment factors of the vulnerability; and consequences of the vulnerability. The first category, for various system components, is usually specified as versions of the operating systems and applications. This can be straightforwardly encoded in the database. The other two categories require a machine friendly specification. After studying 943 vulnerability notes from the CERT advisory database, we found that most of the information for these two categories can be described effectively using the vulnerability description expression described below. These expressions are inspired by the rule language in the KuangPlus system [16]. An expression is written with the syntax:
Rather than giving a formal definition of target objects, we have listed examples of target objects in Table 6. In Table 6 the following prefixes are used: `%' is used to denote an actual value; `#' is used to denote a symbolic value; and `&' is used for expressing users/groups associated with an application/service. As our proof-of-concept implementation is for Unix systems, the examples and objects are also Unix based. Vulnerabilities for other operating systems may require extension to the types of target objects and actions.

Examples using Vulnerability Expressions
The following examples use the expressions to describe various vulnerability consequences. [Note 2]
The following are examples of portions of the machine oriented
fields in the database for several vulnerabilities:
Translation Issues
From our experiments in translating text-based vulnerabilities into vulnerability expressions, we encountered the following issues:
Movtraq Scanning Robot
To demonstrate the use of the Movtraq database, we have implemented a prototype automatic vulnerability scanner (called the Movtraq scanning robot). The robot runs on two different versions of Unix: Red Hat Linux and FreeBSD. This is to demonstrate a degree of platform independence. The overall structure of the robot together with the database is depicted in Figure 2. The integrated Movtraq database is stored in MySQL. The scanner consists of a local system configuration collector which collects information about applications, operating system (which processes are running, which ports are open, hardware details, etc.) and services on the system. Software versions are obtained by using the rpm utility on Red Hat and the pkg_info utility on FreeBSD. The scanner is written in Perl and queries the MySQL Movtraq database using SQL. The robot has three basic scanning options:
An abbreviated sample log from running the scanner illustrates how application, version and environmental checking is performed; see Listing 1. Only some of the pertinent checks from the log are shown to illustrate the following points:
Vulnerability Chaining Analysis
An interesting use of the scanner is that it can be used to test if existing vulnerabilities can be combined together (chaining) to create more vulnerabilities. This mimics what a hacker might do to take advantage of indirect weaknesses on the system. Consider the following example which is typical of a privilege escalation attack. Suppose the system has the following two vulnerabilities:
Name: Buffer Management Vulnerability

In this example, the scanner discovers that both vulnerabilities 48 and 57 are present. From Vul_ID: 57 a remote user (u#R) can gain local rights (@G u#L), and this chains onto Vul_ID: 48 which has a local environment requirement (local user: u#L and setuid executable file: f#*(4111)). Thus it discovers that a remote user may be able to exploit the two vulnerabilities to gain local root access. Chaining analysis illustrates the benefit of a machine oriented approach and the use of vulnerability expressions to analyse relationships between vulnerabilities.

Operating System and Local Configuration Mapping
Because environmental and application vulnerability data are expressed as vulnerability expressions, these abstractions may need to be further refined. In the context of a particular local system configuration, operating system distribution, etc., additional localization may be needed to map the abstractions to concrete objects. One may choose to have additional databases to do this mapping from vulnerability target objects to the actual objects on the system. Our robot prototype does not do this since it has been tested only on Red Hat and FreeBSD.

Deployment Strategies for Movtraq
The prototype Movtraq system is sufficiently useful to be deployed in a number of ways. Some of the potential scenarios depicted in Figure 4 are:
These strategies are suitable for our Movtraq proof-of-concept system but one could have more general systems. For example, one could have a scanner which is partially local and partially remote. This may be useful in an organizational context where any system configuration changes are registered with a separate non-local configuration database. Any security alerts are then checked externally against this configuration database.

Discussion
We believe that there is a real need for vulnerability databases which integrate the necessary pieces of information for evaluating the impact of any new vulnerability and allow the appropriate action to be taken automatically. Furthermore, in order to be timely, we argue that the vulnerability evaluation process should not be dependent on having humans process alerts. This does not mean that we advocate not having humans at all in the loop but rather that the loop should not be dependent on the speed of a human response. Thus, it is important that there be not only a human readable vulnerability database but also one which is geared for automatic processing by machines. As far as we are aware, the existing systems for disseminating alerts are still primarily human oriented, as are the key source databases. We have demonstrated a proof-of-concept database which allows effective integration of data from multiple sources and can be used directly by an automatic vulnerability scanner.
In the workshop report on security vulnerability databases [17], it was remarked that some of the difficult issues concern terminology and the schema of the database. Our database design uses both abstraction and separation of exploits from vulnerabilities - both of which are highlighted in the report. In particular, the use of abstraction, which for us is how the database caters for automated analysis and machine processing, simplifies the issue of terminology and taxonomy.
This is a plus point since these are often controversial from a textual description viewpoint. The database described here is meant to be a proof-of-concept system and is not necessarily comprehensive. However, the prototype scanner demonstrates that we capture the essential elements of a machine-oriented database. As this prototype was designed for Unix systems, for other operating systems, such as Microsoft Windows, both the database and vulnerability expressions may need to be enhanced. However, the fundamental concepts in the design should still be applicable. Finally, our proposal also addresses a number of important practical issues:
Further work would involve convenient GUIs, a fully featured implementation, Windows compatibility, and a more sophisticated vulnerability model.

Acknowledgments
We acknowledge the support of the ``Defence Science and Technology Agency'' and ``Temasek Laboratories.''

Author Information
Sufatrio holds a B.Sc. from University of Indonesia and an M.Sc. from National University of Singapore. He is currently a Ph.D. student in the School of Computing and an associate scientist in Temasek Laboratories, National University of Singapore. His interests include intrusion detection systems and infrastructure for secure program execution. He can be reached electronically at tslsufat@nus.edu.sg.
Roland H. C. Yap obtained his Ph.D. from Monash University. He is currently an associate professor in the School of Computing, National University of Singapore. His interests include systems security, operating systems, programming languages and distributed systems. He can be reached electronically at ryap@comp.nus.edu.sg.
Liming Zhong graduated from National University of Singapore in 2004. Currently he is working as an IT security specialist in Quantiq International Singapore. His interests cover intrusion detection systems, network and system forensic analysis. Reach him electronically at rick@Quantiqint.com.

Bibliography
[1] CERT Coordination Center, CERT/CC Statistics 1988-2003, https://www.cert.org/stats/cert_stats.html, 2003.
[2] CERT Coordination Center, CERT/CC Overview Incident and Vulnerability Trends, https://www.cert.org/present/cert-overview-trends/module-2.pdf, 2003.
[3] Lipson, H. F., Tracking and Tracing Cyber-Attacks: Technical Challenges and Global Policy Issues, CERT Coordination Center, available at https://www.cert.org/archive/pdf/02sr009.pdf, 2002.
[4] Farmer, D. and E. H. Spafford, ``The COPS Security Checker System,'' Summer USENIX Conference, 1990.
[5] https://www.fish.com/satan.
[6] https://www.nessus.org.
[7] https://icat.nist.gov/icat.cfm.
[8] https://cirdb.cerias.purdue.edu/coopvdb/public.
[9] Krsul, I., Software Vulnerability Analysis, Ph.D. Thesis, Purdue University, COAST technical report 98-09, 1998.
[10] https://windowsupdate.microsoft.com.
[11] https://www.microsoft.com/windowsserversystem/sus/default.mspx.
[12] Keizer, G., ``Trojan Horse Poses as Windows XP Update,'' TechWeb News.
[13] Berlind, D., ``Why Windows Update Desperately Needs an Update,'' ZDNet Technical Update, https://techupdate.zdnet.com/techupdate/stories/main/0,14179,2914519,00.html, 2003.
[14] https://www.cert.org/advisories/CA-2002-17.html.
[15] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2002-0656i.
[16] Howard, J., Kuangplus: A General Computer Vulnerability Checker, M.IS. Thesis, Australian Defence Force Academy, 1999.
[17] Meunier P. C. and E. H. Spafford, Final Report of the Second Workshop on Research with Security Vulnerability Databases, CERIAS TR 99/06, 1999.

Footnotes:
Note 1: This is a vulnerability description expression to describe the impact, see the next section.
Note 2: For simplicity, multiple expressions are separated by semicolon.
This paper was originally published in the
Proceedings of the 18th Large Installation System Administration Conference,
November 14-19, 2004, Atlanta, GA
Last changed: 7 Sept. 2004