pH performs two important functions: It monitors individual processes at the system-call level, and it automatically responds to anomalous behavior by either slowing down or aborting system calls. Normal behavior is determined by the currently running binary program; response, however, is determined on a per-process basis.
To minimize I/O requirements and maximize efficiency, stability, and security, we have implemented most of pH in kernel space. We considered several alternative approaches, including audit packages, system-call tracing utilities (such as strace), and instrumented libraries. However, each of these approaches has serious drawbacks. Audit packages generate voluminous logfiles, which are expensive to create and even more expensive to analyze; additionally, they do not routinely record every system call. User-space tracing utilities are too slow for our application, and in some cases they interfere with privileged daemons to the extent that the daemons behave incorrectly. Instrumented libraries cannot detect every system call, because not every system call goes through a library function (e.g., code injected by a buffer overflow attack can invoke system calls directly). In addition, a kernel implementation allows us to put our monitoring and response mechanisms exactly where they are needed, in the system call dispatcher, and allows the implementation to be as secure as the kernel.
For each running executable, pH maintains two arrays of pair data: A training array and a testing array. The training array is continuously updated with new pairs as they appear; the testing array is used to detect anomalies, and is never modified except by replacing it with a copy of the training array. Put another way, the testing array is the current normal profile for a program, while the training array is a candidate future normal profile.
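The two-array scheme can be illustrated with a minimal sketch. It is not pH's kernel code: the class name Profile, its methods, and the use of Python sets of tuples to stand in for the pair data are all assumptions made for illustration.

```python
class Profile:
    """Hypothetical per-executable profile holding training and testing pair data."""

    def __init__(self):
        self.training = set()  # candidate future normal profile
        self.testing = None    # current normal profile; None until one is installed

    def observe(self, pair):
        """Record a pair in training; return True if it is anomalous under testing."""
        anomalous = self.testing is not None and pair not in self.testing
        self.training.add(pair)  # the training data is updated on every observation
        return anomalous

    def install_normal(self):
        """Replace the testing data with a copy of the training data."""
        self.testing = set(self.training)


# Illustrative pairs only; the tuples here are placeholders for real pair data.
profile = Profile()
for pair in [("open", "read"), ("read", "close")]:
    profile.observe(pair)
profile.install_normal()

seen_before = profile.observe(("open", "read"))
novel = profile.observe(("open", "execve"))
```

In this sketch the novel pair is flagged as anomalous yet still enters the training data, so it becomes part of normal only if the training data is later copied over the testing data.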
A new ``normal'' is installed by replacing the testing array with the
current state of the training array. The replacement occurs under
three conditions: (1) the user explicitly signals via a special system
call (sys_pH) that a profile's training data is valid; (2) the
profile's anomaly count exceeds a user-defined anomaly limit; (3) the
training formula is satisfied. When an anomaly is detected, the
current system call is delayed according to a simple formula. Details
of these conditions and actions are given in the next several
paragraphs.
The training-to-testing copy can occur automatically based on the
state of the following training statistics:
When the training array meets all of the following conditions, it is copied onto the testing array (note: this is the normal mechanism for initiating anomaly detection in the system):
As we mentioned earlier, pH responds to anomalies by delaying system
call execution. The amount of delay is an exponential function of the
current LFC, regardless of whether the current call is anomalous or
not. The effective delay for a system call is this exponential value
scaled by another user-defined parameter. Note that delays may be
disabled by setting that parameter to 0. If the LFC ever exceeds a
separate limit parameter (which is 12 for the experiments
described below), the training array is reset, preventing truly
anomalous behavior from being incorporated into the testing array.
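The delay rule can be sketched as follows. This is a hedged illustration, not the pH implementation: the function and parameter names are hypothetical, the exponential base of 2 is an assumption, and the scaling parameter is assumed to multiply the unscaled delay (which is consistent with a value of 0 disabling delays). Only the limit of 12 comes from the text.

```python
TRAINING_RESET_LIMIT = 12  # value used in the experiments; the name is hypothetical


def unscaled_delay(lfc, base=2):
    """Exponential function of the LFC; base=2 is an assumed value."""
    return base ** lfc


def effective_delay(lfc, delay_factor, base=2):
    """Unscaled delay multiplied by a user-defined factor (0 disables delays)."""
    return delay_factor * unscaled_delay(lfc, base)


def should_reset_training(lfc, limit=TRAINING_RESET_LIMIT):
    """True when the LFC exceeds the limit, so the training data is discarded."""
    return lfc > limit


delays = [effective_delay(lfc, delay_factor=2) for lfc in range(4)]
```

Because the delay depends only on the LFC, a process with a high LFC is slowed on every system call, anomalous or not, until its LFC falls again.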
Because pH monitors process behavior based on the executable that is
currently running, the execve system call causes a new profile to be
loaded. Thus, if an attacker were able to subvert a process and cause
it to make an execve call, pH might be tricked into treating the
current process as normal, based on the data for the newly-loaded
executable. To avoid this possibility, pH records the maximum LFC
(maxLFC) reached by each process. If maxLFC exceeds a preset
threshold, then all execve calls are aborted for the anomalous process.
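A minimal sketch of this execve guard follows, under stated assumptions: ProcessState, update_lfc, allow_execve, and the threshold values used are hypothetical names and numbers chosen for illustration, not taken from pH.

```python
class ProcessState:
    """Hypothetical per-process state tracking the current and maximum LFC."""

    def __init__(self):
        self.lfc = 0
        self.max_lfc = 0

    def update_lfc(self, lfc):
        """Store the current LFC and remember the largest value seen so far."""
        self.lfc = lfc
        self.max_lfc = max(self.max_lfc, lfc)


def allow_execve(proc, execve_abort_threshold):
    """Permit execve only while maxLFC has not exceeded the threshold."""
    return proc.max_lfc <= execve_abort_threshold


proc = ProcessState()
for lfc in [1, 4, 2]:  # the LFC rises to 4, then falls back to 2
    proc.update_lfc(lfc)

blocked = not allow_execve(proc, execve_abort_threshold=3)
permitted = allow_execve(proc, execve_abort_threshold=10)
```

Tracking the maximum rather than the current LFC means a process cannot regain the ability to call execve simply by waiting for its LFC to fall.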
pH also keeps a count of the raw number of anomalies each profile has
seen. This count can be seen as a measure of ongoing, non-clustered
abnormal behavior. If this number exceeds a user-defined anomaly limit,
pH automatically copies the training array to the
testing array, causing pH to treat similar future behavior as normal.
Borrowing from immunology, we refer to this process as tolerization. Low values of
the anomaly limit allow pH to
automatically tolerize most novel behavior, while higher values
inhibit tolerization. When a system is initially set up,
automatically created normal profiles may contain too little normal
behavior. To reduce the number of reported anomalies, the anomaly limit
should be set to a small value (less than 10). Then,
once the system has stabilized, it should be raised to at
least 20 to prevent pH from automatically learning the behavior of
attacks.
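Tolerization by anomaly count can be sketched as below. The names TolerizingProfile, anomaly_limit, and record_anomaly are hypothetical, sets stand in for the pair arrays, and resetting the count after tolerization is an assumption that the text does not confirm.

```python
class TolerizingProfile:
    """Hypothetical profile that tolerizes once its anomaly count passes a limit."""

    def __init__(self, anomaly_limit):
        self.anomaly_limit = anomaly_limit
        self.anomaly_count = 0
        self.training = set()
        self.testing = set()

    def record_anomaly(self):
        """Count one anomaly; return True if this anomaly triggers tolerization."""
        self.anomaly_count += 1
        if self.anomaly_count > self.anomaly_limit:
            self.testing = set(self.training)  # training data becomes the new normal
            self.anomaly_count = 0             # assumption: counting restarts here
            return True
        return False


profile = TolerizingProfile(anomaly_limit=2)
profile.training = {("open", "read"), ("read", "mmap")}
profile.testing = {("open", "read")}

results = [profile.record_anomaly() for _ in range(3)]
```

With a limit of 2, the third anomaly triggers the copy; a larger limit would leave the testing data unchanged for longer, which is the inhibiting effect described above.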