Invoice #31415 attached

Automated analysis of malicious Microsoft Office documents

Journal article (2022)

Authors

Vasilios Koutsokostas University of Piraeus

Nikolaos Lykousas University of Piraeus

Theodoros Apostolopoulos University of Piraeus

Gabriele Orazi Università degli Studi di Padova

Amrita Ghosal Mary Immaculate College

Fran Casino University of Piraeus, Communication and Knowledge Technologies

M. Conti Università degli Studi di Padova

Constantinos Patsakis Communication and Knowledge Technologies, University of Piraeus

Affiliation

External organisation

Malware, LOLBAS, Macro malware, Office documents, PowerShell

To reference this document use:

http://resolver.tudelft.nl/uuid:ba31b89e-46fc-44f8-a0b3-21e5f70feb87

Published Date

2022

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Microsoft Office is arguably the most widely used suite for processing documents, spreadsheets, and presentations. Due to its popularity, it is continuously exploited in malicious campaigns: threat actors abuse the platform's dynamic features to launch attacks and penetrate millions of hosts. This work explores the modern landscape of malicious Microsoft Office documents, exposing the means that malware authors use. We leverage a taxonomy of the tools used to weaponise Microsoft Office documents and explore the modus operandi of malicious actors. Moreover, we generated and publicly shared a specially crafted dataset of benign and malicious documents containing many dynamic features, such as VBA macros and DDE. This composition is crucial for a fair and realistic analysis, an open issue in the current state of the art, and it allows us to draw sound conclusions about malicious features and behaviour. More precisely, we extract the necessary features with an automated analysis pipeline and use machine learning to classify a document as benign or malicious efficiently and accurately, achieving an F1 score above 0.98 and outperforming current state-of-the-art detection algorithms.
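As a rough illustration of the two ideas named in the abstract, feature extraction and the F1 metric, the following minimal Python sketch counts a few keywords in macro source text and computes an F1 score from labels and predictions. The keyword list and the function names `extract_features` and `f1_score` are assumptions made for this example; they are not the feature set or the pipeline used by the authors.

```python
# Hypothetical keyword list chosen for illustration only;
# it is NOT the feature set described in the article.
SUSPICIOUS_KEYWORDS = (
    "autoopen",
    "document_open",
    "createobject",
    "powershell",
    "ddeauto",
)


def extract_features(macro_text: str) -> dict[str, int]:
    """Count case-insensitive occurrences of each keyword in the text."""
    lowered = macro_text.lower()
    return {kw: lowered.count(kw) for kw in SUSPICIOUS_KEYWORDS}


def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), where label 1 marks the malicious class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Return 0.0 when there are no positive labels or predictions at all.
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    sample = 'Sub AutoOpen()\n  x = CreateObject("WScript.Shell")\nEnd Sub'
    print(extract_features(sample))
    print(f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

In practice such counts would be only one input among many to a trained classifier; the sketch is meant to show the shape of the computation, not a usable detector.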