Binary analysis for Linux and IoT malware

Cozzi, Emanuele
Thesis

For the past two decades, the security community has been fighting malicious
programs for Windows-based operating systems. However, the increasing
number of interconnected embedded devices and the IoT revolution
are rapidly changing the malware landscape. Malicious actors did not stand
by and watch, but quickly reacted to create “Linux malware”, showing an
increasing interest in Linux-based operating systems and platforms running
architectures different from the typical Intel CPU. As a result, researchers
must react accordingly, in order to adapt the techniques and toolchains that
they initially designed to analyze Windows malware. While Linux malware
can reuse well-known patterns and behaviors, the binary analysis of Linux
and IoT binaries requires tackling specific new challenges.
Through this thesis, we navigate the world of Linux-based malicious
software and highlight the problems we need to overcome for its correct
analysis. After a systematic exploration of the challenges involved in the
analysis of Linux malware, we present the design and implementation of the
first malware analysis pipeline specifically tailored to study this emerging
phenomenon. We use our platform to analyze over 322K samples and collect
detailed statistics and insights that can help direct future work.
We then apply binary code similarity techniques to systematically reconstruct
the lineage of IoT malware families, and track their relationships,
evolution, and variants. We apply our approach to a dataset collected over
a period of 5.7 years, and we show how the free availability of source code
resulted in a very large number of variants, often affecting the labels
assigned by antivirus systems.
Last but not least, we address a major problem we encountered in the
analysis of statically linked executables. In particular, we present a new
approach to identify the boundary between user code and third-party libraries,
such that the burden of libraries can be safely removed from binary
analysis tasks.

Type: Thesis
Date: 2020-12-14
Department: Digital Security
Eurecom Ref: 6364
Copyright:
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Thesis and is available at :

PERMALINK : https://www.eurecom.fr/publication/6364