A. Amalan

2 records found

Unraveling Malware Genomics: Synergistic Approach using Deep Learning and Phylogenetic Analysis for Evolutionary Insights

The rapid advancement of artificial intelligence technologies has significantly increased the complexity of polymorphic and metamorphic malware, presenting new challenges to cybersecurity defenses. Our study introduces a novel bioinformatics-inspired approach, leveraging deep learning and phylogenetic analysis to understand the evolutionary dynamics of such malware. Working with a dataset of 103,883 malware samples, we extracted features through pseudo-static, dynamic, and image analyses, transformed them into embeddings with deep learning techniques, and combined them into what we refer to as the "genome" of malware. These combined embeddings were used to construct phylogenetic trees employing the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) and the Neighbor-Joining (NJ) method. We were the first to utilize OpenAI's state-of-the-art embeddings for converting pseudo-static and dynamic features into embeddings. In addition, we discovered that transfer learning with ResNet-50 is highly effective compared to traditional CNNs, producing image embeddings that achieve higher classification accuracy.

We also introduced new validation techniques for phylogenetic trees, making use of VirusTotal timestamps and embedding drift analysis. These methods confirmed that the NJ method was more accurate. Furthermore, we developed techniques to simplify the analysis of these extensive phylogenetic trees, enabling efficient derivation of relationships within and between malware families. The insights from our NJ-built phylogenetic trees closely align with public data and lay a foundation for generating evolutionary-informed signatures that enhance tailored detection strategies. Our method reduces the time needed to identify connections among 538 malware families from months or years to just weeks, much faster than traditional reverse engineering approaches for tracing malware evolution. ...
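
As a rough illustration of the tree-building step described in the abstract above, the following minimal sketch concatenates three per-sample embeddings into a single "genome" vector and clusters the samples with UPGMA, which SciPy exposes as average linkage. The embedding sizes, the random placeholder data, the cosine distance, and the helper name `build_upgma_tree` are all assumptions made for illustration; the abstract does not state which distance metric or tooling the authors used, and the NJ method is not shown here.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import pdist


def build_upgma_tree(static_emb, dynamic_emb, image_emb):
    """Concatenate per-sample embeddings and cluster them with UPGMA.

    Hypothetical helper: each argument is an (n_samples, dim) array.
    """
    # The "genome" is the concatenation of all embedding types per sample.
    genome = np.hstack([static_emb, dynamic_emb, image_emb])
    # Condensed pairwise distance matrix (cosine distance is an assumption).
    distances = pdist(genome, metric="cosine")
    # Average linkage on a distance matrix is the UPGMA algorithm.
    merges = linkage(distances, method="average")
    return genome, merges


# Placeholder data: 6 samples with 4-dimensional embeddings of each type.
rng = np.random.default_rng(0)
n_samples, dim = 6, 4
static_emb = rng.normal(size=(n_samples, dim))
dynamic_emb = rng.normal(size=(n_samples, dim))
image_emb = rng.normal(size=(n_samples, dim))

genome, merges = build_upgma_tree(static_emb, dynamic_emb, image_emb)
root = to_tree(merges)  # tree object whose leaves are the samples
print(genome.shape, merges.shape, root.get_count())
```

Each row of `merges` records one join of two clusters and the distance at which it happened, so the array can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the tree.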
Bachelor thesis (2022) - A. Amalan, K. Liang, R. Wang, J. Urbano Merino
Federated learning is an emerging concept in the domain of distributed machine learning. This concept has enabled GANs to benefit from rich distributed training data while preserving privacy. However, in a non-iid setting, current federated GAN architectures are unstable, struggle to learn distinct features, and are vulnerable to mode collapse. In this paper, we propose a novel architecture, MULTI-FLGAN, to solve the problems of low-quality images, mode collapse, and instability for non-iid datasets. Our results show that MULTI-FLGAN is four times as stable and performant (i.e. high inception score) on average over 20 clients compared to baseline FLGAN. ...