Over the last decade, there has been a significant increase in the number and sophistication of malware-related attacks and infections. Many detection techniques have been proposed to mitigate the malware threat. A running theme among existing detection techniques is that they all promise similarly high detection rates, in spite of the wildly different models (or specification classes) of malicious activity they use. In addition, the lack of a common testing methodology and the limited datasets used in the experiments make it difficult to compare these models and determine which ones yield the best detection accuracy.
In this paper, we present a systematic approach to measure how the choice of behavioral models influences the quality of a malware detector. We tackle this problem by executing a large number of testing experiments, in which we explored the parameter space of over 200 different models, corresponding to more than 220 million signatures. Our results suggest that commonly held assumptions about how changes in model complexity relate to changes in detection accuracy are incorrect. This implies that accuracy varies non-linearly across the model space, and that analytical reasoning alone is insufficient for finding an optimal model; it must be supplemented by testing and empirical measurements.