Next: System Performance
Up: Methodology for Building Data
Previous: Performance on New Executables
We also evaluated the performance of the models in detecting known
executables. For this task, the algorithms generated detection models from the
entire data set, and their performance was then evaluated by testing the
models on that same training set.
Table 2:
Results of the algorithms when tested on known
programs. In this task both algorithms detected over 99% of known malicious
programs. We explain later how the data mining algorithm can be boosted to
100% accuracy by incorporating some signatures.
Profile Type       | Detection Rate | False Positive Rate | Overall Accuracy
Signature Method   | 100%           | 0%                  | 100%
Data Mining Method | 99.87%         | 2%                  | 99.44%
As shown in Table 2, both methods detected over 99% of
known malicious executables. The data mining algorithm detected 99.87% of the
malicious examples but misclassified 2% of the benign binaries as malicious.
However, we have the signatures for the binaries that the data mining
algorithm misclassified, and those signatures can be included in the detection
model without lowering its accuracy in detecting unknown binaries. After the
signatures for the executables misclassified during training had been
generated and added to the detection model, the data mining model achieved a
100% accuracy rate when tested on known executables.
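The hybrid scheme described above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the signature sets, the hash choice, and the stand-in model `classify_with_model` are all assumptions. The key idea is that exact signatures for training binaries the model got wrong take precedence, so known executables are always classified correctly, while unknown binaries still fall through to the learned model.

```python
import hashlib

# Signatures for training binaries the learned model misclassified
# (hypothetical example values, not from the paper).
malicious_signatures = {hashlib.md5(b"known-bad-binary").hexdigest()}
benign_signatures = {hashlib.md5(b"known-good-binary").hexdigest()}

def classify_with_model(binary: bytes) -> str:
    # Stand-in for the learned detection model; the paper's actual model
    # is trained over features extracted from the binary.
    return "malicious" if b"bad" in binary else "benign"

def classify(binary: bytes) -> str:
    sig = hashlib.md5(binary).hexdigest()
    if sig in malicious_signatures:
        return "malicious"          # exact signature overrides the model
    if sig in benign_signatures:
        return "benign"             # corrects a training-time false positive
    return classify_with_model(binary)  # unknown binary: use the model
```

Because the signature lookup only fires on exact matches from the training set, it cannot change the model's behavior on unseen binaries, which is why accuracy on unknowns is unaffected.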
Figure 2:
ROC: This figure shows how the data mining method can be configured to
achieve different detection and false positive rates by adjusting the
threshold parameter. More secure settings should choose a point on the data
mining curve towards the right, while applications needing fewer false alarms
should choose a point towards the left.
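The threshold trade-off behind the ROC curve can be sketched as below. The scores, labels, and thresholds are invented for illustration; the point is only that lowering the decision threshold raises the detection rate at the cost of a higher false positive rate, tracing out different points on the curve.

```python
def roc_points(scores, labels, thresholds):
    """For each threshold, return (false positive rate, detection rate)."""
    pos = sum(labels)               # number of malicious examples
    neg = len(labels) - pos         # number of benign examples
    points = []
    for t in thresholds:
        preds = [s >= t for s in scores]   # flag as malicious if score >= t
        tp = sum(p and y for p, y in zip(preds, labels))
        fp = sum(p and not y for p, y in zip(preds, labels))
        points.append((fp / neg, tp / pos))
    return points

# Hypothetical model scores (confidence that a binary is malicious) and
# ground-truth labels (True = malicious).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, False]

# A lower threshold detects more malware but raises the false alarm rate.
print(roc_points(scores, labels, [0.5, 0.2]))
```

Here the threshold 0.5 gives a lower false positive rate but misses a malicious example, while 0.2 detects all malicious examples at the cost of more false alarms, matching the left/right trade-off described in the caption.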
Matthew G. Schultz
2001-05-01