In this paper we have proposed a new algorithm, based on the k-Nearest Neighbor (kNN) classifier, for modeling program behavior in intrusion detection. Our preliminary experiments with the 1998 DARPA BSM audit data show that this approach can effectively detect intrusive program behavior. Unlike other methods that use short system call sequences, the kNN classifier does not need to learn a separate profile for each program, so the computation required to classify new program behavior is greatly reduced. Our results also show that a low false positive rate can be achieved. Although this result may not hold on a more sophisticated data set, the kNN classifier appears well suited to the domain of intrusion detection.
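To make the classification step concrete, the following is a minimal sketch of a kNN classifier over system-call frequency vectors. It assumes cosine similarity as the distance measure and flags a process as intrusive when the average similarity to its k nearest normal training processes falls below a threshold. It also treats any system call never seen in training as suspicious. The function names, `k=3`, and `threshold=0.8` are hypothetical illustration choices, not tuned values from our experiments.

```python
import math
from collections import Counter


def to_vector(calls, vocabulary):
    """Frequency-weighted vector: entry i counts occurrences of vocabulary[i]."""
    counts = Counter(calls)
    return [counts[c] for c in vocabulary]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(new_calls, training, k=3, threshold=0.8):
    """Label one process 'normal' or 'intrusive'.

    training: list of system-call lists from known-normal processes.
    k and threshold are hypothetical example values.
    """
    vocabulary = sorted({c for proc in training for c in proc})
    known = set(vocabulary)
    # A system call never observed in normal training data is suspicious outright.
    if any(c not in known for c in new_calls):
        return "intrusive"
    x = to_vector(new_calls, vocabulary)
    sims = sorted(
        (cosine(x, to_vector(t, vocabulary)) for t in training), reverse=True
    )
    # An exact match with a known-normal process is accepted immediately.
    if math.isclose(sims[0], 1.0):
        return "normal"
    nearest = sims[:k]
    avg = sum(nearest) / len(nearest)
    return "normal" if avg >= threshold else "intrusive"


training = [
    ["open", "read", "close"],
    ["open", "read", "read", "close"],
    ["open", "write", "close"],
    ["open", "read", "write", "close"],
]
print(classify(["open", "read", "read", "read", "close"], training))  # similar to training
print(classify(["open", "execve", "close"], training))  # unseen system call
```

Note that no per-program model is built: every new process is compared directly against the pooled training vectors, which is the property the paragraph above relies on.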
We adopted the tf·idf weighting technique from text categorization to transform each process into a vector. Because the idf factor depends on how many training processes contain each system call, adding a new process under tf·idf can change the weights of existing vectors. With the simpler frequency-weighting method, where each entry equals the number of occurrences of a system call during the process execution, each process vector carries no information about other processes. A new training process can therefore be added to the training data set without changing the weights of the existing training samples. This could make the kNN classifier method more suitable for dynamic environments that require frequent updates of the training data.
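The difference between the two weighting schemes can be checked with a small sketch. It assumes one common idf form, log(N / n_i), where N is the number of training processes and n_i is the number of processes containing system call i; the helper names are hypothetical. Adding a training process changes the tf·idf vector of an existing process, while its frequency vector stays the same.

```python
import math
from collections import Counter


def frequency_vector(calls, vocabulary):
    """Each entry is the raw count of one system call in this process."""
    counts = Counter(calls)
    return [counts[c] for c in vocabulary]


def tfidf_vectors(processes, vocabulary):
    """tf*idf vectors with idf = log(N / n_i) (an assumed, common idf form)."""
    n = len(processes)
    # n_i: number of processes in which system call i appears at least once.
    df = {c: sum(1 for p in processes if c in p) for c in vocabulary}
    vectors = []
    for p in processes:
        counts = Counter(p)
        vectors.append(
            [counts[c] * math.log(n / df[c]) if df[c] else 0.0 for c in vocabulary]
        )
    return vectors


vocab = ["close", "open", "read", "write"]
train = [["open", "read", "close"], ["open", "write", "close"]]
grown = train + [["open", "read", "read", "close"]]  # one new training process

before = tfidf_vectors(train, vocab)[0]
after = tfidf_vectors(grown, vocab)[0]
print(before, after)  # the "read" weight of the first process changes
print(frequency_vector(train[0], vocab))  # unaffected by the new process
```

The frequency vector of the first process depends only on that process, so appending to the training set never forces existing vectors to be recomputed.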
In our current implementation, we used all the system calls to represent program behavior. The dimension of process vectors, and hence the classification cost, can be further reduced by using only the most relevant system calls.