Humans vs. machines in malware classification

Aonzo, Simone; Han, Yufei; Mantovani, Alessandro; Balzarotti, Davide

USENIX 2023, 32nd Usenix Security Symposium, 9-11 August 2023, Anaheim, CA, USA

Today, the classification of a file as either benign or malicious is performed by a combination of deterministic indicators (such as antivirus rules), Machine Learning classifiers, and, more importantly, the judgment of human experts. However, to compare human and machine intelligence in malware analysis, it is first necessary to understand how human subjects approach malware classification.

In this direction, our work presents the first experimental study designed to capture which ‘features’ of a suspicious program (e.g., static properties or runtime behaviors) are prioritized for malware classification by human and machine intelligence. For this purpose, we created a malware classification game in which 110 human players worldwide, with different seniority levels (72 novices and 38 experts), competed to classify the highest number of unknown samples based on detailed sandbox reports. Surprisingly, we discovered that both experts and novices base their decisions on approximately the same features, even if there are clear differences between the two expertise classes. Furthermore, we implemented two state-of-the-art Machine Learning models for malware classification and evaluated their performance on the same set of samples. The comparative analysis of the results unveiled a common set of features preferred by both Machine Learning models and helped us better understand the differences in feature extraction. This work reflects the difference in the decision-making process of humans and computer algorithms and the different ways they extract information from the same data. Its findings serve multiple purposes, from training better malware analysts to improving feature encoding.

Type: Conference

City: Anaheim

Date: 2023-08-09

Department: Digital Security

Eurecom Ref: 7048

Copyright Usenix. Personal use of this material is permitted. The definitive version of this paper was published in USENIX 2023, 32nd Usenix Security Symposium, 9-11 August 2023, Anaheim, CA, USA and is available at :