Large scale malware collection: lessons learned

Canto, Julio; Dacier, Marc; Kirda, Engin; Leita, Corrado

SRDS 2008, 27th International Symposium on Reliable Distributed Systems, October 6-8, 2008, Napoli, Italy

To ensure the accuracy and realism of resilience assessment methods and tools, it is essential to have access to field data that are unbiased and representative. Several initiatives offer access to malware samples for research purposes, and papers have been published in which techniques are assessed using these samples. Defining benchmarking datasets is the next step. In this paper, we report on the lessons learned while collecting and analysing malware samples in a large scale collaborative effort. Three different environments are described, and their integration is used to highlight the open issues that remain with such data collection. Three main lessons are offered to the reader. First, creating representative datasets of malware samples is probably an impossible task. Second, false negative alerts are not what we think they are. Third, false positive alerts exist where we were not used to seeing them. These three lessons have to be taken into account by those who want to assess the resilience of techniques with respect to malicious faults.

Type: Conference
City: Napoli
Date: 2008-10-08
Department: Digital Security
Eurecom Ref: 2648

© 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.