An analysis of human-in-the-loop approaches for binary analysis automation

Mantovani, Alessandro

Thesis

In system and software security, one of the first steps in choosing an analysis methodology is to determine whether the source code is available.
When the software under investigation is available only in binary form, the only option is to extract information by examining its machine code, a practice commonly referred to as Binary Analysis (BA).
Practitioners in this field combine personal experience with an arsenal of tools and methodologies to understand intrinsic and hidden aspects of the target binary, for instance, to discover new vulnerabilities or to detect malicious behavior.

Although this human-in-the-loop configuration has been well established for years, the current explosion of threats and attack vectors, such as malware and weaponized exploits, puts this binary analysis model under strain: it demands both high analysis accuracy and the scalability needed to counter adversarial actors across large numbers of binaries.
Therefore, despite the many advances in the BA field over the past years, we are still obliged to seek novel solutions.

In this thesis, we take a step forward on this problem and try to show what current paradigms lack in order to increase the level of automation.
To accomplish this, we isolated three classical binary analysis use cases and demonstrated how the analysis pipeline benefits from human intervention.
In other words, we considered three human-in-the-loop systems and described the human role inside the pipeline, with a focus on the types of feedback that the analyst "exchanges" with her toolchain.
These three examples provide a full view of the gap between current binary analysis solutions and ideally more automated ones, suggesting that the main feature underlying human feedback is the human ability to comprehend portions of binary code.

This attempt to systematize the human role in modern binary analysis approaches aims to raise the bar towards more automated systems by leveraging the human component, which so far remains unavoidable in the majority of scenarios.
Although our analysis shows that machines cannot replace humans at the current stage, we cannot exclude that future approaches will fill this gap and take tools and methodologies to the next level.
Therefore, we hope that this work will inspire future research in the field towards ever more sophisticated and automated binary analysis techniques.

Type: Thesis

Date: 2022-03-25

Department: Digital Security

Eurecom Ref: 6839