Paper Summary: Obfuscation Revealed — Leveraging Electromagnetic Signals for Obfuscated Malware Classification

This is a paper that appeared in ACSAC 2021.

TL;DR: The paper proposes a methodology that uses side-channel information, namely electromagnetic (EM) signals, to identify malware activity, malware type and identity, even in the presence of obfuscation.

Internet of Things (IoT) devices have become ubiquitous in the last decade. From home security cameras to our ovens and fridges, everything you can imagine is becoming smarter with the help of IoT. One of the hindrances to wider adoption of IoT, as you might expect, is security. The core problem is that security is usually seen as an afterthought rather than a core component of a device's design and software implementation.

There are two major ways of detecting malware in software today: static and dynamic analysis. Static analysis is great, and possibly more reliable, but it fails to detect malware in the presence of obfuscation. As our security tools become more sophisticated, hackers become more sophisticated as well, in a vicious cycle. One might therefore think that dynamic analysis is the more resilient approach. However, malicious actors have also become more dynamic and try to detect the "detection". Once it succeeds, the malware ceases its operation, which makes it more difficult for security tools to reveal it.

This paper proposes a different way of detecting malware. Instead of installing software on your device to monitor its activities, the paper proposes to use side-channel information, i.e. EM signals. The idea of using side-channel information to detect malware, or anomalous activities in general, is not a new one. Beyond EM, people in the field have also tried "heat" as another source for detecting such activities. The paper claims the following differences from the state of the art:

Detection of in-the-wild malware instead of proof-of-concept samples.
Examination of scenarios that involve a benign dataset and obfuscated malware to test for robustness.
A move away from anomaly detection on small sample sizes, which most existing works use; that approach has the advantage of flagging any new, different activity as anomalous, but it is also prone to generating lots of false positives.
Experiments with wide-ranging classification models in real-world malware analysis: type, obfuscation or variant.

Malware dataset: 4,790 32-bit ELF ARM malware samples collected from Virusign. In total, 19 different families were detected; Mirai (43.5%) and Bashlite (35.8%) make up the majority. A combination of static and dynamic code obfuscation is used to enrich the dataset. Finally, a benign dataset is added by monitoring and logging under normal running conditions.

Methodology: Raw EM data is processed and a spectrogram of the signal is computed to reduce the noise. Feature extraction using NICV (Normalized Inter-Class Variance) is applied to reduce the amount of time and resources needed for classification. Two neural network architectures, MLP and CNN, are used for the experiments, along with two traditional machine learning algorithms, Naive Bayes and SVM.
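To make the NICV step a bit more concrete, here is a minimal sketch in Python. It assumes the commonly used definition of NICV (the variance of the per-class means divided by the total variance of a feature) and assumes each spectrogram has already been flattened into a row of features. The function names nicv and select_top_k, and the choice of keeping a fixed number k of features, are illustrative assumptions and not taken from the paper.

```python
import numpy as np


def nicv(features, labels):
    """Score each feature with Var(E[X | class]) / Var(X).

    features: array of shape (n_samples, n_features), e.g. flattened
              spectrogram bins (an assumption made for this sketch).
    labels:   array of shape (n_samples,), one class label per sample.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)

    overall_mean = features.mean(axis=0)
    total_var = features.var(axis=0)

    # Variance of the class-conditional means, weighted by class frequency.
    between = np.zeros(features.shape[1])
    for cls in np.unique(labels):
        rows = features[labels == cls]
        weight = rows.shape[0] / features.shape[0]
        between += weight * (rows.mean(axis=0) - overall_mean) ** 2

    # Constant features carry no information: score them 0 and avoid
    # dividing by zero.
    safe_var = np.where(total_var > 0, total_var, 1.0)
    return np.where(total_var > 0, between / safe_var, 0.0)


def select_top_k(features, labels, k):
    """Return the (sorted) indices of the k highest-scoring features."""
    scores = nicv(features, labels)
    keep = np.sort(np.argsort(scores)[::-1][:k])
    return keep, scores


if __name__ == "__main__":
    # Synthetic stand-in data: only features 2 and 5 depend on the class.
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=200)
    X = rng.normal(size=(200, 6))
    X[:, 2] += 3.0 * y
    X[:, 5] -= 3.0 * y
    keep, scores = select_top_k(X, y, k=2)
    print("selected features:", keep)
    print("scores:", np.round(scores, 3))
```

A score close to 1 means that almost all of a feature's variance is explained by the class label, while a score close to 0 means the feature looks the same across classes. Only the selected columns would then be passed on to the classifiers, which is where the savings in time and resources come from.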

Results: The paper presents promising results: it identifies the type of malware with >98% accuracy, with the CNN outperforming the other models. It is also shown to be effective in detecting novel malware, with 99.38% accuracy. Obfuscation techniques are classified with 82.70% accuracy, where the CNN again outperforms the others.

The main use case of the paper is for analysts to have an environment where they can experiment with malware. When it comes to IoT security, would such an approach work? One can imagine that scalability and cost would become big challenges, as we cannot put an additional device next to each and every IoT device to sense EM. Even though the approach is not intrusive and is definitely helpful for analysts, we still need an approach that can work at scale to protect IoT devices from attacks.