RAID 2019, 22nd International Symposium on Research in Attacks, Intrusions and Defenses, September 23-25, 2019, Beijing, China
Within the realm of program analysis, dynamic analysis approaches are at the foundation of many frameworks. In the context of Android security, the vast majority of existing frameworks perform API-level tracing (i.e., they aim to obtain the trace of the APIs invoked by a given app) and use this information to determine whether the app under analysis contains unwanted or malicious functionality. However, previous works have shown that these API-level tracing and instrumentation mechanisms can be easily evaded, regardless of their specific implementation details. An alternative to API-level tracing is syscall-level tracing. This approach works at a lower level and extracts the sequence of syscalls invoked by a given app. Its advantage is that it can be implemented in kernel space: it therefore cannot be evaded, and it is very challenging, if not outright impossible, for code running in user space to detect it. However, while this approach offers stronger security guarantees, it is affected by a significant limitation: most of the semantics of the app’s behavior is lost. Syscalls are in fact low-level and do not carry as much information as the highly semantics-rich Android APIs. In other words, there is a significant semantic gap between the two levels. This paper presents the first exploration of how much it would take to bridge this gap and how challenging this endeavor would be. We propose an approach, an analysis framework, and a pipeline to gain insights into the peculiarities of this problem, and we show that it is much more challenging than previously thought.
Copyright USENIX. Personal use of this material is permitted. The definitive version of this paper was published in RAID 2019, 22nd International Symposium on Research in Attacks, Intrusions and Defenses, September 23-25, 2019, Beijing, China, and is available at: