Assessing Robustness of ML-Based Program Analysis Tools using Metamorphic Program Transformations

Conference Paper (2021)

Author(s)

L.H. Applis (TU Delft - Software Engineering)

Annibale Panichella (TU Delft - Software Engineering)

Arie van Deursen (TU Delft - Software Technology)

Research Group

Software Engineering

Keywords

Machine Learning, Deep learning, Metamorphic Testing, Documentation Generation, Code-To-Text

To reference this document use:

https://resolver.tudelft.nl/uuid:250720e7-34f8-4718-adf4-7191d3c4b48d

More Info

Publication Year

2021

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Metamorphic testing is a well-established testing technique that has been successfully applied in various domains, including testing deep learning models to assess their robustness against data noise or malicious input. To date, metamorphic testing approaches for machine learning (ML) models have focused on image processing and object recognition tasks. Hence, these approaches cannot be applied to ML targeting program analysis tasks. In this paper, we extend metamorphic testing approaches for ML models targeting software programs. We present Lampion, a novel testing framework that applies (semantics-preserving) metamorphic transformations on the test datasets. Lampion produces new code snippets equivalent to the original test set but different in their identifiers or syntactic structure. We evaluate Lampion against CodeBERT, a state-of-the-art ML model for Code-To-Text tasks that creates Javadoc summaries for given Java methods. Our results show that simple transformations significantly impact the target model behavior, providing additional information on the model's reasoning apart from the classic performance metric.
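To make the idea of a semantics-preserving transformation concrete, the following is a minimal sketch and not Lampion's implementation. It assumes a regex-based rename and a brace-based insertion applied to a Java method held as a string; the function names, the sample method, and the `toy_summarizer` stand-in for a Code-To-Text model are all hypothetical. A real tool would work on the syntax tree and would compare model outputs with a similarity metric rather than strict equality.

```python
import re

# Hypothetical Java method used as the "original" test input.
ORIGINAL = "public int add(int a, int b) { int sum = a + b; return sum; }"


def rename_identifier(java_source: str, old: str, new: str) -> str:
    """Rename every whole-word occurrence of `old` to `new`.

    Simplification: plain text matching ignores scoping, strings and
    comments; it is only safe for small snippets like the one above.
    """
    return re.sub(rf"\b{re.escape(old)}\b", new, java_source)


def add_unused_variable(java_source: str, declaration: str = "int unused = 0;") -> str:
    """Insert a dead local declaration right after the first opening brace."""
    head, brace, tail = java_source.partition("{")
    if not brace:  # no method body found, leave the input untouched
        return java_source
    return f"{head}{brace} {declaration}{tail}"


def toy_summarizer(java_source: str) -> str:
    """Hypothetical stand-in for a model; it keys on one identifier name."""
    if re.search(r"\bsum\b", java_source):
        return "Returns the sum."
    return "Performs a computation."


def same_output(model, original: str, transformed: str) -> bool:
    """Metamorphic relation (assumed here): equivalent code, equal summary."""
    return model(original) == model(transformed)


renamed = rename_identifier(ORIGINAL, "sum", "v1")
padded = add_unused_variable(ORIGINAL)

print(renamed)
print(padded)
print("stable under rename:", same_output(toy_summarizer, ORIGINAL, renamed))
print("stable under dead code:", same_output(toy_summarizer, ORIGINAL, padded))
```

Both transformed methods compute the same result as the original, so a difference in the generated summary points to the model relying on surface features such as identifier names. In this sketch the toy model changes its output after the rename but not after the dead-code insertion.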

Files

Paper.pdf

(pdf | 0.353 Mb)