Next: System Performance
Up: Methodology for Building Data
Previous: Performance on New Executables
We also evaluated the performance of the models in detecting known
executables. For this task, the algorithms generated detection models from the
entire data set, and their performance was then evaluated by testing the
models on that same training set.
Table 2:
Results of the algorithms when tested on known
programs. In this task both algorithms detected over 99% of known malicious
programs. We explain later how the data mining algorithm can be boosted to
100% accuracy by incorporating some signatures.
Profile Type       | Detection Rate | False Positive Rate | Overall Accuracy
Signature Method   | 100%           | 0%                  | 100%
Data Mining Method | 99.87%         | 2%                  | 99.44%
As shown in Table 2, both methods detected over 99% of
known malicious executables. The data mining algorithm detected 99.87% of the
malicious examples but misclassified 2% of the benign binaries as malicious.
However, we have the signatures for the binaries that the data mining
algorithm misclassified, and those signatures can be included in the detection
model without lowering its accuracy in detecting unknown binaries. After the
signatures for the executables misclassified during training had been
generated and added to the detection model, the data mining model achieved a
100% accuracy rate when tested on known executables.
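The hybrid scheme described above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the signature sets, the hash choice, and the stand-in model `classify_with_model` are all assumptions. The key idea is that exact signatures for training binaries the model got wrong take precedence, so known executables are always classified correctly, while unknown binaries still fall through to the learned model.

```python
import hashlib

# Signatures for training binaries the learned model misclassified
# (hypothetical example values, not from the paper).
malicious_signatures = {hashlib.md5(b"known-bad-binary").hexdigest()}
benign_signatures = {hashlib.md5(b"known-good-binary").hexdigest()}

def classify_with_model(binary: bytes) -> str:
    # Stand-in for the learned detection model; the paper's actual model
    # is trained over features extracted from the binary.
    return "malicious" if b"bad" in binary else "benign"

def classify(binary: bytes) -> str:
    sig = hashlib.md5(binary).hexdigest()
    if sig in malicious_signatures:
        return "malicious"          # exact signature overrides the model
    if sig in benign_signatures:
        return "benign"             # corrects a training-time false positive
    return classify_with_model(binary)  # unknown binary: use the model
```

Because the signature lookup only fires on exact matches from the training set, it cannot change the model's behavior on unseen binaries, which is why accuracy on unknowns is unaffected.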
Figure 2:
ROC: This figure shows how the data mining method can be configured to
achieve different detection and false positive rates by adjusting the
threshold parameter. More secure settings should choose a point on the data
mining curve towards the right, while applications needing fewer false alarms
should choose a point towards the left.
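The threshold trade-off behind the ROC curve can be sketched as below. The scores, labels, and thresholds are invented for illustration; the point is only that lowering the decision threshold raises the detection rate at the cost of a higher false positive rate, tracing out different points on the curve.

```python
def roc_points(scores, labels, thresholds):
    """For each threshold, return (false positive rate, detection rate)."""
    pos = sum(labels)               # number of malicious examples
    neg = len(labels) - pos         # number of benign examples
    points = []
    for t in thresholds:
        preds = [s >= t for s in scores]   # flag as malicious if score >= t
        tp = sum(p and y for p, y in zip(preds, labels))
        fp = sum(p and not y for p, y in zip(preds, labels))
        points.append((fp / neg, tp / pos))
    return points

# Hypothetical model scores (confidence that a binary is malicious) and
# ground-truth labels (True = malicious).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, False]

# A lower threshold detects more malware but raises the false alarm rate.
print(roc_points(scores, labels, [0.5, 0.2]))
```

Here the threshold 0.5 gives a lower false positive rate but misses a malicious example, while 0.2 detects all malicious examples at the cost of more false alarms, matching the left/right trade-off described in the caption.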
Matthew G. Schultz
2001-05-01