SUEZ: A Distributed Safe Execution Environment for System Administration Trials
Doo San Sim and V. N. Venkatakrishnan - University of
Illinois, Chicago
Pp. 161-173 of the Proceedings of LISA '06:
20th Large Installation System Administration Conference (Washington, DC:
USENIX Association, December 3-8, 2006).
Abstract
In this paper, we address the problem of safely and conveniently
performing ``trial'' experiments in system administration tasks.
System administrators often perform such trial executions that involve
installing new software or experimenting with features of existing
software. Often such trials require testing software that runs on
multiple hosts. For instance, experimenting with a typical client-server
application requires understanding the effect of the actions of
the client program on the server. We propose a distributed safe
execution environment (DSEE) where such tasks can be performed safely
and conveniently. A DSEE performs one-way isolation of the tasks run
inside it: the effects of the client and the server are prevented from
escaping outside the DSEE, and therefore are prevented from
interfering with the processes running outside the DSEE. At the end of
the trial execution, a DSEE allows clear inspection of the effects of
running the task on all the hosts that are involved in the task
execution. Also, a DSEE allows the changes to be ``committed,'' in
which case the actions become visible outside the DSEE. Otherwise,
they can be ``aborted'' without affecting the system in any way. A
DSEE is an ideal platform through which a system administrator can
perform such trials without the fear of damaging the system in any
manner. In this paper, we present the design and implementation of a
tool called SUEZ that allows a system administrator to create and use
distributed safe execution environments. We have experimented with
several client-server applications using our tool. By performing these
trials in a DSEE, we have found configuration vulnerabilities involving some commonly used client-server applications.
Introduction
System administrators and desktop users encounter various
situations in their day-to-day activity that require them to download,
install and run applications on their machines. One of the most common
tasks is that of a ``trial'' run of a piece of software that the administrator intends to use. Such a trial is typically done if a system
administrator has no prior experience in using that piece of software,
but there are several other reasons for such a trial execution.
-
Understanding actions of a program. Often, system
administrators would like to study the impact of executing a
particular command on their system. More specifically, they often would like to exercise a particular option in a program and observe the direct and indirect effects of using that option. The abundance of binary programs and programs equipped with graphical user interfaces (as opposed to script-based installations) often compounds this difficulty, as many critical system changes happen ``behind
the scenes.''
-
Testing compatibility with existing configurations. Often a
system administrator wonders whether installation of an application
will work (co-operatively) with existing packages and configurations.
Another issue she is concerned about is the security of user data handled by the application, and whether that data is adequately protected through file permissions.
-
Experimenting with new software. Often users download
freeware/shareware from various Internet sources. Such software may be untrusted or faulty, and hence it is important to understand its effects. System administrators may therefore wish to
perform several walk-throughs of these tools to ensure that they do
not create any new problems related to security and/or
interoperability.
-
Patch testing. Applying software patches and updates too early may leave the system with interoperability issues introduced by the updates. Another concern is possible bugs in the patches/updates themselves. (This is usually the reason updates are often delayed.)
In performing the above tasks, the user[Note 1] faces the need for an environment that allows convenient study of their impact. By impact on the system, we refer to issues related to general operation, interoperability with existing applications, and security.
While the goal is to understand the actions of a program in a
networked system, we note that users are not interested in every
action of a program, but only those actions whose effects they
perceive as relevant to system interoperability and security. This
requires that we abstract away from the internal actions of a program (such as function calls and assignments to local variables) and focus
on observable actions of a program. Some examples of such
actions are a) addition of new users, b) modification of user files, c) changes to local configuration files, and d) changes to boot-time scripts.
To understand the impact of such changes on a host, Safe
Execution Environments (SEE) were proposed in [15, 23] and for the
Windows platform in [25]. A SEE uses one-way isolation to
effect containment of the tasks run inside the SEE. Processes running
outside the SEE do not see the changes made by the tasks run inside
the SEE. At the end of execution, one can examine the changes made to
the SEE environment, and decide whether to keep them or discard them
and return to the original state.
A SEE is a highly effective environment to perform system
administration trials that involve a single host. However, for tasks
that are distributed over a set of networked hosts, it is not directly
suitable. A typical example is a client-server application where any
action triggered by a client may change the system state on the server
host. In this case, to understand the changes on the server that were
triggered by the actions of the client, we need a distributed
environment. This paper presents the design and implementation of SUEZ, a tool for creating distributed safe execution environments (DSEEs) to assist in system administration trials.
Let us consider a simple system administration example that
involves remote administration of printer software. The Common UNIX
Printing System (CUPS) [1] allows remote administration of printers
using a specialized port on the printer server (TCP port 631). Now, on
loading the printer web interface page from the print server, the user
is presented with several options related to adding and managing
printers and jobs. Each of these options triggers a specific change in
the printer server. For instance, adding a printer requires changes on the server to the printer driver (PPD) file under /etc/cups/ppd, and to /etc/printcap, which lists the
printers. All these changes take effect when the user executes the
command to add a printer. In order to know the specific changes made
by a command, the system administrator is left with two options. The
first one is to read manuals and other forms of documentation. The
second one is the use of low level system tools. While some may argue
that these are viable options in the case of a well-known
application such as CUPS, they are unsuitable in the case of
new/experimental software, software updates and patches.
Thus, understanding the key impacts of installing a software
package/patch requires the following abilities:
1)
To make the observable effects of an action on a host transparent to
the user: In the above example, the action is the choice selection
(through the menu displayed by the browser) to add a printer.
2)
To make transparent the observable effects of these actions on other
hosts in the network: This corresponds to changes to the files
/etc/cups/ppd and /etc/printcap in the
printer server.
3)
To see the ``difference'' between the state of the system in all
affected hosts before the action and after it: For the above example,
this requires us to identify the above-mentioned files before and
after the add-printer action, and any changes to system objects in the
client's filesystem (in this case there are none).
4)
To correlate the above three to arrive at a complete understanding of
the actions of the software under scrutiny: This requires us to log
the temporal sequence of actions performed on both the client and
server in a unified view.
5)
To restore, in the event the system administrator is not satisfied with the results, the state of the system to what it was before the start of the installation (i.e., to undo the effect of observable actions). This undo capability is needed for any experimentation on real systems.
Related Work
In this section, we discuss related work that is available as
options to the system administrator. We first state the requirements
of any system that would satisfy our objectives 1) to 5) given above.
a)
Allow the task to execute to completion. In order to study the
effects of a trial execution, we must allow the application to execute
to completion. This will ensure that the results of a trial execution
match the results of a real execution when the application is actually
installed and deployed.
b)
Track the effect of the task on multiple hosts. During
execution, a task may further trigger changes to system objects in
other hosts, as given in the CUPS example above. This suggests that
any approach that addresses this problem must have support for
distributed monitoring, thereby tracking and correlating the
actions of a program on other hosts on the network.
c)
Support customizable unified logging. The temporal sequence of
operations that result in changes to objects on various hosts needs to be logged in a central location, where it can be analyzed. In addition, to focus on events of interest to the system administrator, the logging system must support customizable filters that reduce the size and complexity of the logs to be evaluated.
d)
Ability to undo the effects of actions of a program. This is
required to ensure that the system can be restored to its original
state before the program was executed.
Below, we discuss the related work by grouping it into various categories. At the end of this section, we discuss the suitability of each category in matching the above requirements.
Logging based approaches A typical way to
understand the effects of executing a particular software is through
the use of logging [2]. The system administrator can enable the
logging options present in the software, and then inspect the logs
after the operation to have an understanding of the actions of the
software. The problem with this approach is that it is completely
dependent on the developer of the software system/patch to log its
actions. Thus this is not a very dependable option, as many software
systems are written without logging features. Of course, with
experimental software this approach clearly will not work. Also, an
approach purely based on logging will make the job of reverting the
system back to original state quite tedious, error-prone and in some
cases, impossible.
Use of program tracing tools A second approach is to use
tools such as ltrace [10] and strace [4] to study the
actions of a piece of software. While this approach may reveal the
effects of running or upgrading an application, one sees the effects
of the software after it has finished execution, when the
application's actions have already affected the system. It may be too
late, as recovery actions may involve clean-up actions such as
restoring files from backups, or removing user-ids created by the
application. Approaches such as sandboxing [14, 13, 18, 22, 7, 19] do
not work too, as they simply restrict the execution of the software,
rather than allowing it to run completely in order to study its
actions. Use of package managers such as RPM and
dpkg may simplify the problem of uninstallation; but
they do not offer any help in understanding the effects of software
that are already installed. Furthermore, package managers are
inapplicable if the software is distributed in plain binary or source form rather than as a package.
VM based approaches A third approach is to use special
machines [16, 12] or even virtual machines [5, 11, 24] for studying
the effects of a particular piece of software. In order to correctly track the effects on the system, dedicated machines and virtual machines suffer from the problem of accurate environment reproduction: the system configuration on the test environment needs to accurately reflect the one on the production environment. Such accurate
environment reproduction is crucial to ensure that the system behavior
on the VM is the same as that on the production system. Another possibility is to make use of snapshot features in modern virtual
machines such as VMware. However, these snapshots tend to give the
difference of the actions of the entire set of processes running on
the system and not the programs the user wishes to focus on.
Recovery-oriented approaches Although recovery from
failures is not the primary goal of our approach, we do provide
facilities for recovery in case of a task failure. The Recovery-Oriented
Computing (ROC) project [20] is developing techniques for
fast recovery from failures, focusing on failures due to operator
errors. [8] presents an approach that assists recovery from operator
errors in administering a network server, with the specific example of
an email server. The recovery capabilities provided by their approach
are more general than those provided by ours. The price to be paid for
achieving more general recovery capabilities is that their approach is
application specific. In contrast, through a DSEE we provide a general
task-independent framework for troubleshooting and recovery.
Discussion Note that sandboxing based approaches do not
fully support the objective of allowing a task to run to completion
(point a) above), as they block actions of a program based on the
policy. So using sandboxing, we have no way of learning the complete
effects of a piece of software. Logging based systems allow the
applications to run with complete freedom, but do not support undoing
of actions (point d) above). File versioning systems [17, 21] and
virtual machine based snapshot approaches may satisfy undoing at a
more general level, but not based on a program or specific actions of
a program and therefore do not satisfy point d) above. Furthermore,
they do not directly support points b) and c) above. On the other hand,
executing a task in a DSEE will address all the objectives a) to d)
above.
Paper Organization This paper is organized as follows. In
the next section, we discuss the concept of one-way isolation that
serves as the basis for our approach for building DSEEs. We then
discuss the design details of our framework for building DSEEs
followed by the routing enhancements to automatically provide the
redirection facility for network operations. We explicate a message
handling subsystem that we implemented for communication between
various DSEEs. We present a system evaluation by performing various
trials using our system and discuss the performance costs, followed by a conclusion.
One-way Isolation
Our approach builds on the one-way isolation approach presented in
[15, 23]. We briefly review the one-way isolation approach that we
employ to create distributed safe execution environments (DSEE).
Isolation of a set of tasks refers to the property that disallows
the effects of such tasks from being made available until their completion. In database systems, isolation is one of the ACID
properties. The main objective in using isolation in our approach is
to effect containment of the trial execution task performed inside the
isolated environment. Any operation that only ``reads'' the system
(i.e., one that reads the system state but does not write/modify it)
may be performed by SEE processes. It also means that ``write''
operations should not be permitted to change the state of the system.
There are two options to implement the environment such that isolation
is achieved: one is to restrict the operation, i.e., disallow
its execution. The second option is to redirect the operation
to a different resource that is invisible outside the safe execution
environment. To maintain the correctness of the resource access
operations, it is important to maintain the redirection for subsequent
operations (such as writes) from the program. Below, we discuss both
restriction and redirection for performing system administration
trials.
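Conceptually, the per-operation decision can be summarized by the following sketch; the predicates here are hypothetical abstractions for exposition, not actual SUEZ functions.
/* Conceptual per-operation decision made by a SEE; the predicates are
   hypothetical abstractions, not actual SUEZ functions. */
enum action { ALLOW, REDIRECT, RESTRICT };

enum action isolate(int op_is_read_only, int op_is_redirectable) {
    if (op_is_read_only)    return ALLOW;      /* reads pass through */
    if (op_is_redirectable) return REDIRECT;   /* writes go to a private resource */
    return RESTRICT;                           /* deny everything else */
}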
Through restriction, an operation initiated by a process is
prevented from completion. When this happens, an exception may be
returned to the process. To implement restriction, we need to know the
set of operations that may affect the state of the system. However, in
the context of performing trial executions, an approach purely based
on restriction is not likely to be very successful as it will prevent
applications from running successfully to completion. For instance, a
program may intend to perform a network operation by opening a
socket and listening to messages on that socket. If this operation is
restricted, this program will not be able to successfully receive
messages. Most non-trivial client-server applications will fail for
similar reasons. Hence, in our approach we resort to restriction only
if the other redirection option is not likely to provide successful
results.
The other choice for implementing isolation is through
redirection. In redirection, any operation that accesses a
resource is redirected to another resource that is unavailable to the
rest of the system. For instance, when a file modification operation
is performed by a SEE process, a copy of the original file may be
created in a ``private'' area of the filesystem, and the modification
operation is performed on this copy. Redirection does not suffer from
the same problem as restriction and the SEE process is likely to
successfully run to completion under redirection.
Two forms of redirection are possible: static or
dynamic. Static redirection requires the source and target
objects to be specified in advance of the operation, in fact before
the SEE process is executed. For instance, one may statically specify
that operations to bind a socket to a port p should be
redirected to an alternate port p'. Similarly, one may specify
that operations to connect to a port p on host h should
be redirected to host h' (which may be the same as h)
and port p'. However, such static redirection becomes hard to
implement when the number of possible targets is too large to be
specified in advance or if a SEE process performs a large number of
such operations that are distinct. For instance, it may be hard to
predict the number and location of files on a server that may be
accessed or modified by a client operation. Moreover, such
modification operations have indirect side effects that involve dependencies between such objects; e.g., the file operations on the server involve changes to the directories these files reside in. A redirection operation that ignores the effect on these directories simply will not work. In such cases, we use dynamic redirection, where the target for redirection is determined dynamically during execution.
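To illustrate, a static redirection specification of the kind described above can be as simple as a fixed table consulted when a connect operation is intercepted. The entries below are illustrative only, and the discussion above explains why such a table does not scale to a large or unpredictable set of targets.
#include <string.h>

/* A static redirection table mapping (host, port) to an alternate
   (host', port'); entries are illustrative. */
struct redirect { const char *host; int port; const char *new_host; int new_port; };

static const struct redirect table[] = {
    { "192.168.1.10", 80,  "192.168.2.10", 8080 },  /* web service */
    { "192.168.1.10", 631, "192.168.2.10", 631  },  /* CUPS */
};

/* Return the entry for (host, port), or NULL if no rule applies. */
const struct redirect *lookup(const char *host, int port) {
    for (unsigned i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (strcmp(table[i].host, host) == 0 && table[i].port == port)
            return &table[i];
    return 0;
}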
In this paper, by using such redirection, we show how to build
distributed SEEs (DSEE), where processes executing within SEEs
on multiple hosts can communicate with each other. Such distributed
SEEs are particularly useful for safe execution of a network server
application, whose testing would typically require accesses by
nonlocal client applications. (Note, however, that this approach for
distributed SEEs works only when all cross-SEE communications take
place directly between the SEE processes, and not through other means,
e.g., indirect communication through a shared NFS directory.)
In our current implementation, system call interposition is used
to implement restriction and static redirection. We restrict all
modification operations other than those that involve the file system
and the network. In the case of file operations, all accesses to
normal files are permitted, but accesses to raw devices and special
purpose operations such as mounting file systems are disallowed.
In terms of network operations, we permit any network access that
can be dynamically redirected. This entails any local network
operation such as a service request from a host in the network.
Dynamic redirection is currently supported in our implementation for a
number of commonly used network services.
After the trial execution is over, the system administrator can
examine the results of the trial execution. If the results are
satisfactory, she can commit the results back to the file systems on
the respective hosts that run the DSEE. Commit criteria for such
executions have been developed in [23]. In this paper, we do not
discuss criteria for committing. Instead, our focus is solely on
construction of DSEEs and performing system administration experiments
with them.
Our Approach
Figure 1 shows a network-level overview of SUEZ. There are two
main components in SUEZ that are responsible for creating a DSEE. They
are a) a host level monitor that runs on each SUEZ host and b) a
network redirector that runs on the main router. Each host under SUEZ
has a host monitor component. This host monitor is responsible
for isolating any local or remote operation. A similar host-level isolation component resides on all the other hosts in the network, and the isolation environments on all these hosts collectively form a DSEE isolation context. The host monitor also runs
a messaging service that it uses to communicate with other DSEEs.
Figure 1: A network view of SUEZ.
The router has a component of SUEZ that performs transparent
network level host and service redirection. The use of
transparent host and service redirection allows the user of the system
to run experiments without having to know the network and service
requirements of the task to be performed in advance. Each host monitor
logs its actions, and these logs are integrated in a log server. The
log server presents the temporal sequence of operations performed
during the trial execution.
Host Monitor
Figure 2 presents a detailed view of the host monitor. Each host-level
monitor is built on top of the isolation module present in [15].
These monitors are used for tracking observable behaviors of programs
running on their hosts and tracking changes to file-system state. As
shown in Figure 1, similar monitors run on every host used in our
system, and communicate with each other for the purposes of logging
software actions.
Figure 2: A host view of a DSEE.
In a typical client-server interaction, an action from a client
triggers an action in the server. Hence these monitors communicate
with each other to precisely track the commands executed in the server
in response to the actions of the client. We therefore have two broad
components in a monitor. The first one addresses isolation of
processes running locally under the monitor, corresponding to host-level
isolation. The other component is for communicating with
similar monitors running on other hosts such that network level
isolation is achieved. This is shown in Figure 2 by the division of
host and network level components.
In the remainder of this section, we describe the host monitor.
The objective of our monitoring system is to identify observable
events that are triggered by the execution of a program across the
entire system administrative boundary. At the level of a host system,
this requires us to monitor the observable actions of a set of
processes. These actions are ultimately effected through system calls,
and hence, system call interposition is our primary monitoring
approach. Each host level monitor intercepts the system calls of the
applications that are running under its purview.
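To make the mechanism concrete, here is a minimal sketch of ptrace-based system call interception of the kind the host monitor performs. It is written for Linux on x86-64 and merely reports each open() attempted by the traced program; the actual SUEZ monitor additionally rewrites arguments to implement restriction and redirection.
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    if (argc < 2) return 1;
    pid_t child = fork();
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);   /* allow the parent to trace us */
        execvp(argv[1], &argv[1]);
        return 1;
    }
    int status, entering = 0;
    waitpid(child, &status, 0);                  /* child stops at exec */
    while (!WIFEXITED(status)) {
        ptrace(PTRACE_SYSCALL, child, NULL, NULL);  /* run to next syscall boundary */
        waitpid(child, &status, 0);
        if (WIFSTOPPED(status)) {
            struct user_regs_struct regs;
            ptrace(PTRACE_GETREGS, child, NULL, &regs);
            entering = !entering;                /* boundaries alternate entry/exit */
            if (entering && regs.orig_rax == SYS_open)  /* modern libcs may use openat */
                fprintf(stderr, "pid %d: open() attempted\n", child);
        }
    }
    return 0;
}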
The file system module tracks changes made by the software that is
run under the DSEE. The file system module is based on our past work
on one-way isolation [15, 23]. Isolation is achieved by intercepting
and redirecting file modification operations made by the process
running on the host so that they access a ``modification cache.'' This
modification cache is invisible to other processes in the system.
(This ensures that in the event the system administrator does not like
the changes made by the software, it can be safely removed from the
system without any side effects.) To ensure a consistent view of
system state, the results of file read operations made by the process
are modified to incorporate the contents of the modification cache. On termination of the process, the system log contains entries from the modification cache; the user can inspect these files to determine whether the modifications are acceptable. Otherwise, the changes made during the trial execution can be completely undone.
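As an illustration of the redirection idea (and not the actual SUEZ code), the sketch below shows the copy-on-write decision a monitor could make when a traced process opens a file for writing: on the first write access the original file is copied into the modification cache, and the operation is redirected to the copy. The cache location and helper names here are assumptions.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define CACHE_ROOT "/var/see/cache"   /* assumed private area, invisible outside the SEE */

/* Map an original path to its location inside the modification cache. */
static void cache_path(const char *orig, char *out, size_t len) {
    snprintf(out, len, "%s%s", CACHE_ROOT, orig);
}

/* Called when a monitored process opens 'path' for writing; returns the
   path the operation should actually use. */
const char *redirect_write(const char *path, char *buf, size_t len) {
    cache_path(path, buf, len);
    if (access(buf, F_OK) != 0) {     /* first write access to this file? */
        char cmd[4096];
        /* Copy the original (if it exists) into the cache, creating parent
           directories; shelling out keeps the sketch short, a real monitor
           would do this with mkdir()/read()/write(). */
        snprintf(cmd, sizeof(cmd),
                 "mkdir -p \"$(dirname '%s')\" && cp -p '%s' '%s' 2>/dev/null",
                 buf, path, buf);
        system(cmd);
    }
    return buf;   /* subsequent reads and writes use the private copy */
}
Reads by SEE processes consult the cache first, so a process sees its own modifications; committing then amounts to copying cache contents back to their original locations, and aborting to discarding the cache.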
Managing network connections When a process is being
monitored, it may make connections to other hosts on the network. Once
such a connection is initiated, the Control Channel Module
(CCM) initiates the monitoring required at the other end of the
connection. Based on the nature of network connectivity
(client/server), this module will communicate with its counterpart on
the other end of the connection. If the program tries to connect to
the network, the CCM informs the router of this event, which results in the creation of a new routing path to the other hosts. There is no global
network state stored at a single point for network actions since all
other distributed monitors handle them co-operatively. CCM just passes
appropriate control messages to the relevant components. We describe
the routing module in more detail in the next section.
Dynamic service start/monitoring Recall the CUPS example,
where the actions from a browser affect the configuration settings on
the print server. In this case, the monitor on the remote host needs
to be alerted to monitor the service that receives this request. When a SERVICE_UP message is received, the system allows the service to be started on demand; this is accomplished using the database of services available in the system. If the service is already running (e.g., started through a previous request), the monitor detects this and dynamically attaches itself to the service process, as sketched below. Otherwise, it starts the service process under the monitor.
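A minimal sketch of the dynamic attach step follows, assuming the service's pid has already been found (e.g., from the service database); this is an illustration, not the actual SUEZ code.
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Attach the monitor to an already-running service process. */
int attach_to_service(pid_t pid) {
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {
        perror("PTRACE_ATTACH");
        return -1;
    }
    waitpid(pid, NULL, 0);                    /* wait for the attach stop */
    ptrace(PTRACE_SYSCALL, pid, NULL, NULL);  /* intercept its calls from now on */
    return 0;
}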
Log Module The log module generates logs depending on
various configuration options and filters. These logs reside on the
individual hosts. Using the system call output information itself as
the log is not very useful as it may contain excessive information.
The log module transforms the system call log information to a more
user-friendly form. Since the log output can be quite long, customizable filters can be written to inspect specific actions. For instance, the log can be customized to retain information only about filesystem and network operations: for filesystem operations, the file object name is retained; for network operations, the service type and address of the connection are retained. (A filter of this form is sketched below.) A log generator can be used to merge logs from various hosts to produce a unified view.
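A filter of this kind might look as follows; the "FS "/"NET " record tags are assumed for illustration, since the log format is not specified here.
#include <stdio.h>
#include <string.h>

/* Keep only filesystem and network records from a line-based log. */
int main(void) {
    char line[4096];
    while (fgets(line, sizeof(line), stdin))
        if (strncmp(line, "FS ", 3) == 0 || strncmp(line, "NET ", 4) == 0)
            fputs(line, stdout);
    return 0;
}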
Routing Module
A process that is run may connect to a network service on the
local network. Isolation of this operation can be done statically or
dynamically. Performing network-level isolation using static
redirection requires that the system administrator knows the
requirements of the software system that she is experimenting with.
Guessing the requirements can soon become tedious or can impact the
usability of the approach. Instead, our approach involves dynamic
redirection of network service requests. Such dynamic redirection is
configurable for specified network services. One question that arises
in the same context is that of an application contacting an Internet
host. In this case, providing complete isolation while allowing the
application to run is not possible, as it is hard to emulate the
functioning of an arbitrary network service. Here, there are
two options. One is to disable such requests, for the sake of
security. Since our approach is built using system-call interposition,
this is feasible. The other option is to only isolate the actions of
the client at the host level. Of course, the disadvantage of this
option is that reproduction of the entire behavior of the application
is not possible, as the server side behavior is not reproduced
accurately. This is acceptable as there is generally no easy solution
to the problem of studying experimental/untrusted software that
tries to connect to an outside host.
Dynamically setting up routes and services requires redirection of
network service requests, which is established using dynamic
generation and dynamic service redirection. We will
describe the route generation in this section, and service redirection
in the next subsection.
Dynamic route generation is established using a specialized
route handler module that dynamically establishes a routing
path between the host running the program and the target host. Such
dynamic redirection has several possible options - the use of
forwarders that do IP masquerading, such as iptables and ipchains.
However, if the application specific functionality (such as any
internal tables) is dependent on the target IP address, then such
forwarding mechanisms may break programs. Open source redirectors are
available, however, they do not support every kind of TCP/UDP
connection. Also, using a redirector requires the same to be installed
on all the target hosts. The approach we have taken is to modify
the routing table dynamically on the router to forward the connections
to the target network/host.
To enable redirection of connections, the host needs to configure
the IP address of the target host (that runs the network service)
dynamically. In our implementation, this is accomplished by
establishing a virtual network interface on the target host. This
virtual network interface is enabled using IP aliasing.
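As a sketch of what enabling such an alias amounts to programmatically, the following C function (the helper name is ours) is roughly equivalent to running "ifconfig eth0:1 192.168.2.10 up"; the actual SUEZ module may differ.
#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bring up an IP alias such as "eth0:1" with the given address. */
int alias_up(const char *alias, const char *addr) {
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct ifreq ifr;
    struct sockaddr_in sin;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, alias, IFNAMSIZ - 1);

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    inet_pton(AF_INET, addr, &sin.sin_addr);
    memcpy(&ifr.ifr_addr, &sin, sizeof(sin));
    if (ioctl(s, SIOCSIFADDR, &ifr) == -1) {   /* assign the address */
        perror("SIOCSIFADDR"); close(s); return -1;
    }
    ifr.ifr_flags = IFF_UP | IFF_RUNNING;
    if (ioctl(s, SIOCSIFFLAGS, &ifr) == -1) {  /* bring the alias up */
        perror("SIOCSIFFLAGS"); close(s); return -1;
    }
    close(s);
    return 0;
}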
For a minimal setup for testing client-server implementations,
our system needs one router and at least two machines, one that
initiates a service request and the other that accepts such requests.
(These can be set up in an inexpensive fashion using virtual machines,
a topic we will discuss below.) If each of the machines needs to be on
a different subnet, then the router should have a network interface on
each subnet. Furthermore, IP forwarding needs to be enabled in the
kernel state of the router. Our router module is required to be
running on the router and on the host accepting service requests. This
is needed to change routing tables dynamically.
Let us look at a typical client-server interaction between a
client and a web server on our system.
1)
A client invokes a connection request to a service that either is already running on the network or is not yet available.
2)
The Control channel module on the client intercepts this event and
notifies the routing module (running on the router) of the address for
this connection request.
3)
Upon receiving this request, the router checks whether a web server is already running on the network. If so, it returns, and the CCM passes this service-related information to the service handler on the machine running the server. If the service is already running on the server, the client can start exchanging messages; if not, the service handler starts the service. If the network path is not established, the router proceeds to the next step.
4)
The routing module on the machine receiving the message from the
router boots up a new virtual network interface with the address.
5)
The router chooses an appropriate address for a new routing path and
boots this new interface.
6)
From this point onwards, all communication is transparently redirected
through this newly established path between client and server.
The routing module is explained in Figure 3. The state maintained
in the router consists of the available Ethernet devices and the addresses of running hosts. During the initialization of the router module, the device names need to be given to the module as parameters. When a new host comes up on the network, it registers its address with the router module. The router module maintains a vector of such addresses. Whenever a task is complete, the network interfaces allocated for the routing path are brought down and returned to the pool of resources for future use.
network-op-isolation-module() {
    switch (new-route) {
    case ROUTE-UP:
        client-addr = get-address-of-client();
        target-addr = get-requested-address();
        if (target-addr already on network) break;
        else {
            new-host = find-available-host();
            map new-host to client-addr;
            send new-routing-up message to new-host;
            get network portion of the requested address;
            new-device = get-available-device();
            boot new-device;
        }
        break;
    case DEL-ROUTE:
        client-addr = get-address-of-host();
        find list of hosts assigned for client-addr;
        send del-routing-path message to the host;
        new-device = get-device-name(routing-path);
        release host resources;
        release network device;
        shutdown device;
        break;
    }
}
Figure 3: Algorithmic sketch of the routing
module.
Message Handler
Often, the focus of attention on a particular trial execution is
in executing one or more features of an application. In this case, a
user may want to only focus on this operation and ignore other
operations of the system. A message handler is made available on each
client to start and stop tracing the operations made by the trial
execution. A typical use scenario is as follows: When the user would
like to focus on exercising a feature in the application, before
exercising this feature, she can instruct the client DSEE to send a
START_TRACING message. All the DSEEs will record the
subsequent operations made by the task. After the user is done, she
can send a STOP_TRACING message that will stop
recording the operations of the task. When tracing is stopped, the set
of actions that were recorded between the
START_TRACING and STOP_TRACING messages captures the observable effects of the operations in this window.
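The wire format of these control messages is not specified here; a hypothetical sketch of the sending side might look as follows, with the control-channel port and newline-terminated text format our own assumptions.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define CCM_PORT 7500   /* assumed control-channel port */

/* Send a control message (e.g., "START-TRACING\n") to a host monitor. */
int send_control(const char *host, const char *msg) {
    int s = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(CCM_PORT);
    inet_pton(AF_INET, host, &sa.sin_addr);
    if (connect(s, (struct sockaddr *)&sa, sizeof(sa)) == -1) {
        perror("connect"); close(s); return -1;
    }
    write(s, msg, strlen(msg));
    close(s);
    return 0;
}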
Additionally, the message handler also deals with messages from
other DSEE components. These messages are about routing information
and services registration. On receiving these messages, the message
handler invokes the appropriate handlers. The responses to the messages received are shown in Figure 4. For example, when it gets a NEW-ROUTE-UP or DEL-ROUTE message, it invokes the routing module to bring up or shut down routing paths, respectively.
message-loop() {
    while (true) {
        waitfor-command();
        dispatch-command();
    }
}

dispatch-command() {
    switch (command) {
    case NEW-ROUTE-UP:
        /* set up new route */
        break;
    case DEL-ROUTE:
        /* delete route and release resources */
        break;
    case SERVICE-UP:
        /* bring up the network service */
        break;
    case SERVICE-DOWN:
        /* shut down the network service */
        break;
    case NEW-HOST-UP:
        /* add host info to host list */
        break;
    case QUERY-HOST:
        /* query the host list */
        break;
    case START-TRACING:
        /* start recording operations */
        break;
    case STOP-TRACING:
        /* stop recording operations */
        break;
    }
}
Figure 4: Various messages received by the message
handler.
If the application running in the DSEE is untrusted, it may send
false messages to the message handlers on the other hosts. To prevent this, the default policy enforced by the system call interceptor is
to disallow any such messages on the control channel that is
maintained by the host monitors.
Support for virtual machine hosts Virtual machines can
be used to create inexpensive hosts on demand, and our approach is
designed to take advantage of the use of virtual machines. Our
prototype implementation uses VMware virtual machines [5], where
virtual network interfaces and virtual network groups can easily be created and loaded on demand.
Experimental Evaluation
Before describing the experiments performed with SUEZ, we describe
our experimental set up. We also describe the configuration options
available to the user.
Setup
-
Virtual network setup The network setup has one router and two
subnets. Since we used VMware to create hosts on the network, this
required creation of three virtual machines.
-
Router setup To act as a router, the kernel value for
IP_FORWARD should be set to 1 (on Linux, /proc/sys/net/ipv4/ip_forward). This router has three network interfaces, one
on the physical network, and the other two for subnets A (192.168.1.X)
and B (192.168.2.X).
-
Message handler setup The message handler on the router is set
up with the available device names and addresses. In the above case, the available device on subnet B is bound to 192.168.2.1.
-
Server Host monitor setup A SUEZ host monitor (with its
associated message handler) is launched on a machine on subnet B to
act as a host available for service. For this, the HOSTMODE value needs to be set in the config file. At startup, this host's information is sent to the message handler.
-
Client Host setup A SUEZ monitor is launched, with the ROUTEMODE value set in the appropriate config file, on a machine on subnet A. The ROUTEMODE config variable is explained below.
From this point onwards, if the client program tries to connect to a service under SUEZ with ROUTEMODE set, the connection will be transparently forwarded to the host on subnet B.
Configuration Parameters
The following configuration flags need to be set on the hosts in
the network.
-
ROUTEMODE - If this value is set, SUEZ will intercept
all network connections before the client program gets connected to its
original destination. Eventually, the connection will be transparently
forwarded to a machine that hosts the corresponding service.
-
HOSTMODE - To automatically configure an IP address and start the required service dynamically on a host, one sets this value in SUEZ. If this flag is set, the host monitor in SUEZ will send the host address to the appropriate message handler.
-
REMOTELOG (ULOG) - In order to produce a unified log, each host monitor traces and collects the events of interest. If this value is set in the config file, each event of interest will be logged. When these events are merged into one log, only the events of interest will be viewable in the unified log.
In addition, a list of the devices available for setting up new routing paths on the router is provided as input to the router module through a config file; a hypothetical sketch of such a configuration appears below.
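The actual file syntax used by SUEZ is not specified in this paper; the following sketch is purely illustrative, and the DEVICES entry is our own invention.
# Hypothetical SUEZ configuration sketch.
# On the client host:
ROUTEMODE=1
# On the server host:
HOSTMODE=1
REMOTELOG=1
# On the router: devices available for new routing paths
DEVICES=eth1:1,eth1:2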
In the following section, we present an experimental evaluation of
using our approach. Our evaluation consists of two parts: the first is
a system evaluation, in which we applied the system to
study the execution of several system and application software tools.
The second part is a performance evaluation of our system.
We analyzed the installation and execution of several applications
in DSEEs created using our system. Below we describe four candidates
from our experiments.
Address Book leak in SquirrelMail Squirrelmail is a Mail
User Agent (MUA) package written in PHP4. Being a web based user
agent, it interacts with a web-server in addition to a mail server.
The functionality of the program is triggered through many links and
buttons on the web page interface. For SquirrelMail, since the
interface is web-based, we tried to understand the functionality that
interacts solely with the web server, as opposed to that which also
interacts with the mail server. Understanding the nature of
information stored in a web server is critical as the protection of
data stored in a web server is an important issue. So we installed
SquirrelMail with its default configuration in a DSEE, and observed the
actions during installation.
After the installation, we tested SquirrelMail by trying out the various options in the web interface.
One such interface is the address book interface that allows a user to
add or remove entries from his address book. Once that interface was
tried, the results of the system pointed to file modifications on the
web server. We observed that the default configuration resulted in
placing the data subdirectory that holds the address
book information under the top-level Squirrelmail directory. If this
URL is known, an arbitrary user can access the (private) address book
information of any other user. The URL is normally known to any user
of the system, and is easily guessable if one knows the presence of a
Squirrelmail installation on a server. This directory needs to be
protected from being directly accessible in order to protect the privacy of the address book information.
Our tool enabled us to correlate the action of creating an address
book entry on the client to the location that it was stored in the
server and therefore uncover the vulnerability of address book
information leak. Changing the access permissions for the directory
subsequently solved this problem.
Remote web server upgrading Several systems exist that
perform upgrades/installation from a remote machine. For instance,
Webmin [9] is one such tool. The primary purpose of such tools is to
simplify desktop administration. Although this purpose is achieved by
such tools, they do not provide a way to recover from any problems
during installation. For instance, if the installation of a package
using a remote administration tool is not successful it is difficult
to recover from such errors. Configuration files that are overwritten
may be lost during the installation process. (Using a backup
procedure, the system administrator can save these files, but this
requires knowing in advance which files are being overwritten). Using
our approach, we can perform the installation without the fear of
damaging the system in any way, and then finally inspect the system to
see if the changes made by the installation are desirable, and then
commit these changes. If the installation does not proceed as
expected, the administrator can go back to the original state of the system.
To study the use of the Webmin administration tool, we upgraded
the apache web server program from a remote client machine (in a
DSEE). We upgraded the httpd package (which contains the Apache web server) from version 2.0.55 to 2.0.58 through the Webmin tool
remotely. After the installation, we tested whether the installation
process worked fine by testing the new version of the web server.
Also, we checked for any modifications to existing files. In both
cases, there were no problems resulting from the installation. Hence
these changes were successfully committed into our system.
Debugging Mgboard Configuration Mgboard [6] is a web-based message board that runs on Apache with PHP. Mgboard uses an internal flat-file
database rather than an external SQL database such as mySQL. Using
Mgboard, a user can post articles and upload files to the website. In
interactive programs such as Mgboard, it would be a tedious task to
identify misconfigurations. For example, the files that store system
configuration data for Mgboard need to be group-writable, and world-writable when the database file is created (for public posting). As server-side scripts
are hard to debug, any misconfigurations in Mgboard (which is
indirectly powered by the Phpadmin program) are hard to detect. Also
it is difficult to know which actions (such as addition/update of
posts) are affected by this misconfiguration. However with the use of
a DSEE, it is easy to know which actions were triggered (specifically,
which files were executed) and thereby reason through the CGI script
operations.
To check this, we performed two experiments. In the first
experiment, we intentionally planted certain misconfigurations in the
remote client, by introducing file permission errors. In the second
experiment, using a web client, we created a new page on the web
server. While posting an article from the client on to the server,
using our tool, we observed creation of temporary files in the
/tmp directory of the server. This helped us to
investigate the possibility of a local race condition vulnerability
that could result through the creation of this temporary file. Such a
race condition could happen if arbitrary users can overwrite this
file. Such reasoning is possible with our system as the unified log
presents the temporal sequence of such actions, and custom filters can identify and `zoom in' on the error.
Configuration errors in ProFTPD ProFTPD [3] is an FTP server written for use on UNIX-like operating systems. An FTP server allows
the remote user to perform operations on remote file systems, and even
send site-specific commands. It is therefore important to test the
software installation and check for the potential actions of the
server when it interacts with a client. Occasionally the settings may
allow account users or even anonymous users to inspect or change the file system explicitly through remotely issued commands.
We tried this as a candidate example for testing the ProFTPD
server. We observed that on installation, the system executes an init
script, which results in two main actions: the creation of user and group IDs for the server. A configuration file, /usr/local/etc/proftpd.conf, was also created. During our installation, the server was configured to restrict users' access with DefaultRoot in proftpd.conf, but an accidental configuration error left the option commented out as #DefaultRoot. When the service was started, we
exercised the gftp client to change the working
directory to the root directory and store a file. The usual expectation on the part of the server administrator was to have the file stored under the user's home directory, but due to this misconfiguration it triggered a permission error. In this case, the unified view log shows the actual action sequence, with operations on the file system starting from the client through to the server. It pinpointed the source of the error to the DefaultRoot option. Such
configuration errors can be debugged effectively with our approach,
which allows one to focus specifically on the results of a particular
action in a program.
Performance Evaluation
The second part of our evaluation measures the performance of our system with several client-server applications.
We describe the experimental setup first. We used several virtual
machines hosted on a single machine for all our experiments. The machine has an AMD Sempron 2400+ CPU with 2 GBytes of memory, running the Red Hat Enterprise Linux distribution. The virtual machines also run Red Hat Enterprise Linux 4. Each virtual machine instance runs SUEZ with
router and service handler, and any associated client or server
programs.
We classify the performance measurement experiments into four categories:
-
Client-side overheads. These overheads in the client side may
result from the monitoring overhead through SUEZ.
-
Server side overheads. These overheads result from the
monitoring overhead at the server.
-
Network delays. These are overheads introduced due to the
routing and service isolation in SUEZ.
-
Service and program launching overheads. Since the service
redirection and service program startup are done on demand, this may
introduce additional overheads during the start of a session.
We discuss all the overheads in detail below.
Client-side overheads We recall that our system uses system call
interposition at the client to track the actions of the client, and
any possible communication messages to the server. Such interception
facilities are implemented using ptrace mechanisms available in
the Linux operating system. We have measured the performance as a
ratio of the combined system and user times, compared for the following situations: a) without any monitoring and b) with the use of our monitoring mechanism. Figure 5 shows the performance overheads at the
client. The performance numbers show overall execution times with and
without the monitor.
Figure 5: Client-side overheads.
In the figure, we have measured the performance of four desktop
clients while performing the experiments mentioned in the previous
section. The results show that the overhead due to system call
interposition varies across clients, ranging from 68% to 325%. The
difference in overhead is due to the frequency of system calls invoked
by these different clients. In addition, an entirely user-level mechanism such as ptrace suffers from moderate to high overheads [15]. These high overheads are due to the context switching associated with the process that performs the monitoring. Kernel-level mechanisms typically have overheads in the range of 10-15%, as
evidenced by earlier work on kernel level mechanisms for one-way
isolation [23].
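For reference, the combined user and system time compared above can be obtained with a generic harness such as the following sketch (not our actual measurement code).
#include <stdio.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    if (argc < 2) return 1;
    if (fork() == 0) {
        execvp(argv[1], &argv[1]);   /* run the workload, monitored or not */
        return 1;
    }
    wait(NULL);
    struct rusage ru;
    getrusage(RUSAGE_CHILDREN, &ru); /* CPU time consumed by the child */
    double secs = ru.ru_utime.tv_sec + ru.ru_stime.tv_sec
                + (ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1e6;
    printf("user+system: %.3f s\n", secs);
    return 0;
}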
Server-side overheads For servers, we measure the overheads
differently. Since most servers continue to run even after servicing
client requests, it is not possible to measure the overheads in a
manner similar to that of the clients. To measure the overheads on the
server, we have measured the mean response time of the server at each
client. Recall that our system monitors the system calls made from the
client, and on a connect system call, sets up routing
paths and starts the corresponding services. Therefore, the response
time is measured (at the client) as the difference between the first connect system call and the first or last recv call in each client's log.
Figure 6 shows the mean response times at the client. This
response time indicates the steady state overhead, i.e., the overhead
without the following two causes: 1) any (one-time) routing overhead in the computation of the virtual routing path, and 2) overheads from the automatic start of the corresponding network service. As shown in the
figure, the response times for various server programs are within a
factor of two. For trial installation purposes we consider such
overheads acceptable. Moreover, a kernel level patch to the isolation
mechanism will reduce the response time overhead.
Figure 6: Server response times.
Route Computation Overhead
In order to create a dynamic routing path, new network interfaces need to be initialized on the router and on the server host. Delays are introduced at the router (in the control channel module implementation) to find an available host for the assignment of a new IP address.
We compute the overhead for CUPS, Webmin, ProFTPD, Sendmail and CVS as the difference between the mean server response time at the client with and without route computation. In order to measure
this overhead introduced by the dynamic route computation and the
service handler, we obtained the following measurements of the
programs used in our experiment. These are a) the relative mean
response time without the system, b) with the use of isolation but without the routing and service functions on servers, and c) with host-level isolation and the use of dynamic routing and service handling. The
performance for all the seven applications that were used in the
server-side overhead experiments (described above) was measured. The
time stamps on the client were recorded in the log for each network
and file related operation. For web based programs, the mean time
difference from first connect to first recv system call was measured.
For sendmail, the difference between the first connect and the last socket write was measured. For proftpd, the difference between the first and second connect system calls was measured (the first call is made for getting the data channel for the file transfer). For CVS, the mean time to create the .cvspass file was measured.
We measured the delays introduced due to the routing process. This
delay does not depend on the specific application that was used. We
measured this delay as an average as perceived by the client. The mean delay introduced by the router was measured to be 0.125 seconds.
Routing and Service Launching Overhead
Once a host receives a request for a service, it needs to start
the service and subsequently the server responds to the request. We
measured the service launching time for each of the server
applications tested. We measured this as an average delay perceived by
each client. Also, to avoid routing delays from entering the picture,
we set static routes from the client to the server. These are the
delays for the server applications: sendmail (3.8 s), apache (3.8 s),
Proftp (2.2 s), and Webmin (1.6 s).
We note that services can be started using inetd,
and may not result in overheads when the client contacts the server
host for the first time. Such service programs can be traced
dynamically (i.e., attached to the monitoring process). This will
result in much lower overheads. We also note that at the time of
interception of the original network operation, the route is created
and the service is launched transparently before the actual connection
request from the client is sent.
Conclusion
A set of questions that the system administrator typically has
when performing any trial execution are:
-
During the trial experiment, does this piece of software cause
conflicts with other packages such as overwriting configuration files?
-
After installation of a package, does it work co-operatively well with
existing software within the entire network?
-
Do any of the features of a piece of software malfunction, even though it may apparently seem to work well?
-
Is it safely deployable in the network? Does it violate the network
privacy and security policies? Does it install files in hidden
locations?
The system we describe in this paper, called SUEZ, is designed to
assist a system administrator in answering these questions. To
achieve this our system employs one-way isolation of local and remote
operations inside a distributed safe execution environment. In
addition to satisfying the main goal of providing support for study and
experimentation with software, our approach has numerous other
benefits. It requires no access to source code of the applications
that need to be studied; it is cost-effective in being able to utilize
virtual machine technology for dynamically configuring hosts and
routes, and customizable to various situations that one may encounter
in system administration practice. We believe that our approach has
the potential to be applicable in several day to operations involving
system trials, reverse engineering and troubleshooting.
Acknowledgments
We thank Zhenkai Liang, Weiqin Sun and R. Sekar for many
discussions about the implementation of distributed isolation
operations that formed the basis for writing this paper. We also thank
our shepherd Narayan Desai and the anonymous referees for reading our
text and providing many useful suggestions that have improved the
contents of this paper. Finally, we acknowledge Rob Kolstad's help with
typesetting of the manuscript.
Author Biographies
Doo San Sim is a graduate student in Computer Science at
University of Illinois at Chicago. His research interests are in
computer security, mainly in addressing security in software
installations. He can be reached by email at .
Dr. V. N. Venkatakrishnan is an Assistant Professor of Computer
Science at the University of Illinois at Chicago. He is currently co-director
of the Center for Research and Instruction in Technologies
for Electronic Security (rites.uic.edu). His main research area is
computer and network security. Specific research areas include malware
detection, software security and personal information privacy. He is
available by email at .
Bibliography
[1] Common UNIX printing system, https://www.cups.org.
[2] Controls the system log, Man pages.
[3] Professional FTP, https://www.proftpd.org.
[4] Strace, https://www.liacs.nl/~wichert/strace.
[5] Vmware, https://www.vmware.com.
[6] A web board not using an SQL database, https://www.phpschool.com.
[7] Acharya, A., and M. Raje, ``Mapbox: Using parameterized
behavior classes to confine applications,'' USENIX Security
Symposium, 2000.
[8] Brown, A. and D. Patterson, ``Undo for operators: Building an
undoable e-mail store,'' USENIX Annual Technical Conference,
2003.
[9] Cameron, J., A web-based interface for system
administration for UNIX, https://www.webmin.com.
[10] Cespedes, J., A library call tracer, Man pages.
[11] Chen, P. M. and B. D. Noble, ``When virtual is better than
real,'' Proceedings of Workshop on Hot Topics in Operating
Systems, 2001.
[12] Chiueh, T., H. Sankaran, and A. Neogi, ``Spout: A
transparent distributed execution engine for java applets,''
International Conference on Distributed Computing Systems
(ICDCS), 2000.
[13] Dan, A., A. Mohindra, R. Ramaswami, and D. Sitaram, ``Chakravyuha: A sandbox operating system for the controlled execution of alien code,'' Technical report, IBM T. J. Watson Research Center, 1997.
[14] Goldberg, I., D. Wagner, R. Thomas, and E. A. Brewer, ``A
secure environment for untrusted helper applications: confining the
wily hacker,'' USENIX Security Symposium, 1996.
[15] Liang, Z., V. Venkatakrishnan, and R. Sekar, ``Isolated
program execution: An application transparent approach for execution
of untrusted programs,'' Annual Computer Security Applications Conference (ACSAC), Las Vegas, December, 2003.
[16] Malkhi, D. and M. K. Reiter, ``Secure execution of java
applets using a remote playground,'' Software Engineering, Vol.
26, Num. 12, 2000.
[17] Muniswamy-Reddy, K.-K., C. P. Wright, A. P. Himmer, and E.
Zadok, ``A versatile and user-oriented versioning file system,''
Proceedings of USENIX Conference on File and Storage
Technologies, 2004.
[18] Prevelakis, V. and D. Spinellis, ``Sandboxing
applications,'' Proceedings of Usenix Annual Technical Conference:
FREENIX Track, 2001.
[19] Provos, N., ``Improving host security with system call
policies,'' 2002.
[20] Recovery-oriented computing,
https://roc.cs.berkeley.edu.
[21] Santry, D. J., M. J. Feeley, N. C. Hutchinson, and A. C.
Veitch, ``Elephant: The file system that never forgets,''
Proceedings of Workshop on Hot Topics in Operating Systems,
1999.
[22] Scott, K. and J. Davidson, ``Safe virtual execution using
software dynamic translation,'' Proceedings of Annual Computer
Security Applications Conference, 2002.
[23] Sun, W., Z. Liang, V. N. Venkatakrishnan, and R. Sekar,
``One-way isolation: An effective approach for realizing safe
execution environments,'' NDSS, 2005.
[24] Whitaker, A., M. Shaw, and S. Gribble, ``Denali: Lightweight
virtual machines for distributed and networked applications,''
Proceedings of USENIX Annual Technical Conference, 2002.
[25] Yu, Y., F. Guo, S. Nanda, L. Lam, and T. Chiueh, ``A
feather-weight virtual machine for windows applications,''
Proceedings of the 2nd ACM/ USENIX Conference on Virtual Execution
Environments (VEE'06), 2006.
Footnotes:
Note 1: In this paper, we use the
terms user and system administrator
interchangeably, unless otherwise mentioned.