Enhanced phishing detection using multimodal data

Journal Article (2025)

Author(s)

Lázaro Bustio-Martínez (Universidad Iberoamericana)

Vitali Herrera-Semenets (Advanced Technologies Application Center)

Jorge Ángel González-Ordiano (Universidad Iberoamericana Ciudad de México)

Yamel Pérez-Guadarramas (Universidad Iberoamericana Ciudad de México)

Luis Zúñiga-Morales (Universidad Iberoamericana Ciudad de México)

Daniela Montoya-Godínez (Universidad Iberoamericana Ciudad de México)

Miguel Ángel Álvarez-Carmona (Centro de Investigación en Matemáticas, CIMAT)

Jan van den Berg (TU Delft - Cyber Security)

Research Group

Cyber Security

DOI related publication

https://doi.org/10.1016/j.knosys.2025.115105

To reference this document use:

https://resolver.tudelft.nl/uuid:464ccc05-2ab8-49cc-af8e-b097e2c6c7ec

Publication Year

2025

Language

English

Volume number

334

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Phishing remains one of the most persistent cybersecurity threats, increasingly exploiting not only technical vulnerabilities but also human cognitive biases. Existing detection systems often rely on single-modality features and black-box models, which restrict both generalization and interpretability. This study presents an explainable multimodal framework that combines textual and technical cues, including message content, URL structure, and Principles of Persuasion, to capture both objective and subjective aspects of phishing. Several classifiers were evaluated using 10-fold stratified cross-validation, with Random Forest achieving the best balance between performance and transparency (ROC-AUC = 0.9840), supported by SHAP explanations that identify the most influential linguistic and structural features. Comparative analysis shows that the proposed framework outperforms unimodal baselines while preserving interpretability, enabling a clear rationale for classification outcomes. These results indicate that integrating multimodal representation with explainable learning strengthens phishing detection accuracy, improves user trust, and supports reliable deployment in real-world environments.
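
The evaluation protocol named in the abstract (10-fold stratified cross-validation scored by ROC-AUC, with a Random Forest classifier) can be illustrated with a minimal sketch. This is not the authors' pipeline: the feature matrix below is synthetic data from scikit-learn's `make_classification`, standing in for fused textual, URL, and persuasion features, and the hyperparameters are arbitrary assumptions. The SHAP explanation step is omitted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for a fused multimodal feature matrix (NOT the paper's data).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=8, random_state=0
)

# Stratified folds keep the phishing/legitimate ratio similar in every split.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Hyperparameters are illustrative assumptions, not values from the article.
model = RandomForestClassifier(n_estimators=100, random_state=0)

# One ROC-AUC value per held-out fold.
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"mean ROC-AUC over {len(scores)} folds: {scores.mean():.4f}")
```

Because the data here is synthetic, the printed score says nothing about the reported ROC-AUC of 0.9840; the sketch only shows how per-fold scores are obtained and averaged.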

Files

1-s2.0-S0167844225005580-main.... (pdf)

(pdf | 9.89 MB)