T. Crull

Dynamic malware analysis produces large amounts of behavioural evidence, which can be difficult to interpret manually and too large to process directly with small Large Language Models (LLMs). This paper evaluates to what extent Qwen3-4B can distinguish between benign and malicious Windows executables using reduced CAPEv2 dynamic-analysis reports. To test this, we built a sandbox pipeline which executes samples in a Windows 10 Pro detonation VM, collects CAPEv2 reports, filters them down to the most relevant dynamic-analysis information, and feeds them to Qwen for classification. The reduced reports retain the process tree, domains and DNS activity, behavioural signatures, and their ATT&CK TTP and Malware Behavior Catalog mappings. The dataset consisted of 1082 malware samples from MalwareBazaar and 762 benign samples collected from PortableApps, PortableApps installers, the Sysinternals Suite, and Benign-NET. Two prompts were tested: one with only benign and malware as possible verdicts, and one which also allowed an inconclusive verdict. The first prompt achieved 73.01% recall and 60.91% precision, showing that Qwen could detect many malware samples but also misclassified many benign samples as malicious. The second prompt did not solve this issue, since more correct classifications became inconclusive than incorrect ones. Overall, Qwen3-4B shows some potential for dynamic malware analysis, but its high false positive rate makes it unsuitable as a standalone classifier without further improvements or fine-tuning. ...
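
The report reduction and the scoring of verdicts described above can be illustrated with a minimal sketch. It is not the pipeline used in the paper. The key names in KEPT_PATHS are assumptions about how a CAPEv2 JSON report is laid out, the function names reduce_report and score are hypothetical, and counting an inconclusive verdict on a malware sample as a miss is an assumed convention that the abstract does not specify.

```python
from collections import Counter

# Assumed locations of the retained sections inside a CAPEv2 JSON report.
# These key names are illustrative and should be checked against real reports.
KEPT_PATHS = [
    ("behavior", "processtree"),
    ("network", "domains"),
    ("network", "dns"),
    ("signatures",),
    ("ttps",),
]

_MISSING = object()


def reduce_report(report):
    """Return a smaller dict holding only the KEPT_PATHS sections that exist."""
    reduced = {}
    for path in KEPT_PATHS:
        node = report
        for key in path:
            if isinstance(node, dict) and key in node:
                node = node[key]
            else:
                node = _MISSING
                break
        if node is _MISSING:
            continue
        # Recreate the parent dicts so the reduced report keeps its nesting.
        target = reduced
        for key in path[:-1]:
            target = target.setdefault(key, {})
        target[path[-1]] = node
    return reduced


def score(pairs):
    """Compute (precision, recall) with malware as the positive class.

    pairs is an iterable of (true_label, verdict) tuples. Any verdict other
    than "malware" on a malware sample, including "inconclusive", is counted
    as a miss (an assumption made for this sketch).
    """
    counts = Counter(pairs)
    true_pos = counts[("malware", "malware")]
    false_pos = counts[("benign", "malware")]
    total_malware = sum(n for (truth, _), n in counts.items() if truth == "malware")
    flagged = true_pos + false_pos
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / total_malware if total_malware else 0.0
    return precision, recall
```

Under these assumptions, reduce_report would be applied to each parsed report before it is placed in the prompt, and score would be applied to the collected (label, verdict) pairs for each of the two prompts. A precision well below recall, as reported for the first prompt, corresponds to a large false_pos count relative to true_pos.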