M. Izadi | TU Delft Repository

HyperSeq

A Hyper-Adaptive Representation for Predictive Sequencing of States

Conference paper (2025) - Roham Koohestani (author) , M. Izadi (author)

In the rapidly evolving world of software development, the surge in developers’ reliance on AI-driven tools has transformed Integrated Development Environments into powerhouses of advanced features. This transformation, while boosting developers’ productivity to unprecedented lev ...

The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models

Conference paper (2025) - J.B. Katzy (author) , R.M. Popescu (author) , A. van Deursen (author) , M. Izadi (author)

The recent rise in the popularity of large language models has spurred the development of extensive code datasets needed to train them. This has left limited code available for collection and use in the downstream investigation of specific behaviors, or evaluation of large langua ...

A Qualitative Investigation into LLM-Generated Multilingual Code Comments and Automatic Evaluation Metrics

Conference paper (2025) - J.B. Katzy (author) , Yongcheng Huang (author) , Gopal Raj Panchu (author) , Maksym Ziemlewski (author) , Paris Loizides (author) , Sander Vermeulen (author) , A. van Deursen (author) , M. Izadi (author)

Large Language Models are essential coding assistants, yet their training is predominantly English-centric. In this study, we evaluate the performance of code language models in non-English contexts, identifying challenges in their adoption and integration into multilingual workf ...

The Impact of Generative AI on Creativity in Software Development

A Research Agenda

Journal article (2025) - Victoria Jackson (author) , B.V. Mr. Vasilescu (author) , Daniel Russo (author) , Paul Ralph (author) , Rafael Prikladnicki (author) , M. Izadi (author) , Sarah D'Angelo (author) , Sarah Inman (author) , Anielle Andrade (author) , André van der Hoek (author)

As GenAI becomes embedded in developer toolchains and practices, and routine code is increasingly generated, human creativity will be increasingly important for generating competitive advantage. This article uses the McLuhan tetrad alongside scenarios of how GenAI may disrupt sof ...

When People Come First

A Human-Centered Approach to Computer Science Education

Conference paper (2025) - Ilya Zakharov (author) , Liudmila Piatnitckaia (author) , A.B. Birillo (author) , Agnia Sergeyuk (author) , M. Izadi (author)

The rise of AI tools is reshaping computer science education, shifting the focus from coding skills to teaching students how to effectively use these technologies. Understanding students' mental models and fostering computational and metacognitive skills are now essential, as ove ...

In-IDE Human-AI Experience in the Era of Large Language Models

A Literature Review

Conference paper (2024) - Agnia Sergeyuk (author) , Sergey Titov (author) , M. Izadi (author)

Integrated Development Environments (IDEs) have become central to modern software development, especially with the integration of Artificial Intelligence (AI) to enhance programming efficiency and decision-making. The study of in-IDE Human-AI Experience is critical in understandi ...

Investigating the Performance of Language Models for Completing Code in Functional Programming Languages

A Haskell Case Study

Conference paper (2024) - Tim van Dam (author) , Frank van der Heijden (author) , Philippe de Bekker (author) , Berend Nieuwschepen (author) , Marc Otten (author) , M. Izadi (author)

Language model-based code completion models have quickly grown in use, helping thousands of developers write code in many different programming languages. However, research on code completion models typically focuses on imperative languages such as Python and JavaScript, which re ...

Correction to

The potential of an adaptive computerized dynamic assessment tutor in diagnosing and assessing learners’ listening comprehension (Education and Information Technologies, (2024), 29, 3, (3637-3661), 10.1007/s10639-023-11871-w)

Journal article (2024) - Mehri Izadi (author) , M. Izadi (author) , Farrokhlagha Heidari (author)

In the PDF of this article, the pages were incorrectly numbered as ‘2303–2327’; the correct range is ‘3637–3661’. The page range was already correct in the HTML version of the article. The original article has been corrected.

Generative AI in Software Engineering Must Be Human-Centered

The Copenhagen Manifesto

Journal article (2024) - Daniel Russo (author) , Sebastian Baltes (author) , Niels Van Berkel (author) , Paris Avgeriou (author) , Fabio Calefato (author) , Beatriz Cabrero-Daniel (author) , G. Catolino (author) , M. Izadi (author) , Bogdan Vasilescu (author) , More authors (author)

A Transformer-Based Approach for Smart Invocation of Automatic Code Completion

Conference paper (2024) - A.D. de Moor (author) , A. van Deursen (author) , M. Izadi (author)

Transformer-based language models are highly effective for code completion, with much research dedicated to enhancing the content of these completions. Despite their effectiveness, these models come with high operational costs and can be intrusive, especially when they suggest to ...

The potential of an adaptive computerized dynamic assessment tutor in diagnosing and assessing learners’ listening comprehension

Journal article (2023) - Mehri Izadi (author) , M. Izadi (author) , Farrokhlagha Heidari (author)

In today’s environment of growing class sizes due to the prevalence of online and e-learning systems, providing one-to-one instruction and feedback has become a challenging task for teachers. However, the dialectical integration of instruction and assessment into a seamless and dynamic activity can provide a continuous flow of assessment information for teachers to boost and individualize learning. In this regard, adaptive learning technology is one way to facilitate teacher-supported learning and personalize curriculum and learning experiences. This study aimed to investigate the potential of an adaptive Computerized Dynamic Assessment (C-DA) tool applicable as a language diagnostician and assistant. The study sought insight into 75 Iranian EFL learners’ listening development by focusing on the learning potential exhibited through learners’ assessment and the degree of internalization of mediation. To this end, a C-DA tutor was developed comprising two dynamic listening comprehension tests, each with 20 items arranged in order of difficulty. Test takers unable to answer an item correctly were provided with graduated hints for different comprehension- and production-type items, and the overall difficulty level of the test was adapted to the test takers’ proficiency level. To provide a full diagnosis of each individual’s listening development, the adaptive C-DA automatically generated five test scores on each learner’s performance: actual (unmediated) score, mediated score, gain score, Learning Potential Score (LPS), and transfer score. The results of paired-sample t-tests revealed a significant development from the actual to the mediated scores. Furthermore, the LPSs indicated that the tutor was capable of revealing learners’ potential for learning. Moreover, learners with high LPSs obtained the highest mean transfer scores, followed by learners with medium and low LPSs. The results of Mann-Whitney tests revealed a significant difference in the degree of internalization of mediation between learners with mid- and low-range LPSs on the easy test, and between learners with high- and low-range LPSs on the difficult test. The findings of this research can have important theoretical and practical implications for researchers and educationalists. The instructional value of this adaptive C-DA tool lies in its unique opportunities for individualizing learning and developing individual learning plans in accordance with learners’ needs.
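
The paired-sample t-test comparison of actual and mediated scores mentioned in this abstract can be illustrated with a minimal sketch. The scores below are hypothetical, the function name `paired_t_statistic` is introduced only for illustration, and treating the per-learner difference (mediated minus actual) as the gain is an assumption, since the abstract does not define the gain score.

```python
import math
import statistics

def paired_t_statistic(actual, mediated):
    """Return (t, degrees_of_freedom) for a paired-sample t-test."""
    if len(actual) != len(mediated) or len(actual) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    # Per-learner difference; treated here as the "gain" (an assumption).
    gains = [m - a for a, m in zip(actual, mediated)]
    mean_gain = statistics.mean(gains)
    std_gain = statistics.stdev(gains)  # sample standard deviation (n - 1)
    # t = mean difference divided by its standard error.
    t = mean_gain / (std_gain / math.sqrt(len(gains)))
    return t, len(gains) - 1

# Hypothetical scores for five learners (not data from the study).
actual_scores = [10, 12, 9, 14, 11]
mediated_scores = [13, 15, 11, 16, 14]
t_value, dof = paired_t_statistic(actual_scores, mediated_scores)
print(f"t = {t_value:.2f}, df = {dof}")
```

The resulting statistic would then be compared against a t distribution with the returned degrees of freedom to obtain a p-value.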

Semantically-enhanced topic recommendation systems for software projects

Journal article (2023) - M. Izadi (author) , Mahtab Nejati (author) , Abbas Heydarnoori (author)

Software-related platforms such as GitHub and Stack Overflow have enabled their users to collaboratively label software entities with a form of metadata called topics. Tagging software repositories with relevant topics can be exploited for facilitating various downstream tasks. For instance, a correct and complete set of topics assigned to a repository can increase its visibility. Consequently, this improves the outcome of tasks such as browsing, searching, navigation, and organization of repositories. Unfortunately, assigned topics are usually highly noisy, and some repositories do not have well-assigned topics. Thus, there have been efforts to recommend topics for software projects; however, the semantic relationships among these topics have not been exploited so far. In this work, we propose two recommender models for tagging software projects that incorporate the semantic relationships among topics. Our approach has two main phases. (1) We first take a collaborative approach to curate a dataset of quality topics specifically for the domain of software engineering and development. We also enrich this data with the semantic relationships among these topics and encapsulate them in a knowledge graph we call SED-KGraph. (2) We then build two recommender systems. The first one operates only on the list of original topics assigned to a repository and the relationships specified in our knowledge graph. The second predictive model, however, assumes there are no topics available for a repository; hence, it predicts the relevant topics based on both the textual information of a software project (such as its README file) and SED-KGraph. We built SED-KGraph in a crowd-sourced project with 170 contributors from both academia and industry. Through their contributions, we constructed SED-KGraph with 2,234 carefully evaluated relationships among 863 community-curated topics. Regarding the recommenders’ performance, the experiment results indicate that our solutions outperform baselines that neglect the semantic relationships among topics by at least 25% and 23% in terms of Average Success Rate and Mean Average Precision metrics, respectively. We share SED-KGraph as a rich form of knowledge for the community to re-use and build upon. We also release the source code of our two recommender models, KGRec and KGRec+ (https://github.com/mahtab-nejati/KGRec).
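
The two evaluation metrics named in this abstract can be sketched as follows. This is not the KGRec evaluation code: the function names, the toy topic lists, and the exact definitions (success at k as "at least one relevant topic in the top k", average precision normalized by the number of relevant topics) are assumptions based on common usage, and the paper may define them differently.

```python
def success_at_k(recommended, relevant, k):
    """1.0 if any of the top-k recommended topics is relevant, else 0.0."""
    return 1.0 if any(topic in relevant for topic in recommended[:k]) else 0.0

def average_precision(recommended, relevant):
    """Mean of precision@rank over the ranks where a relevant topic appears."""
    hits = 0
    precision_sum = 0.0
    for rank, topic in enumerate(recommended, start=1):
        if topic in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_over_repos(metric, runs):
    """Average a per-repository metric over (recommended, relevant) pairs."""
    return sum(metric(rec, rel) for rec, rel in runs) / len(runs)

# Hypothetical recommendations and ground-truth topics for two repositories.
runs = [
    (["python", "web", "ml", "api"], {"python", "ml"}),
    (["java", "db", "cli"], {"cli"}),
]
mean_ap = mean_over_repos(average_precision, runs)
success_at_1 = mean_over_repos(lambda rec, rel: success_at_k(rec, rel, 1), runs)
print(f"MAP = {mean_ap:.3f}, S@1 = {success_at_1:.2f}")
```

Averaging the success rate over several cutoffs k would give one plausible reading of "Average Success Rate", though that interpretation is also an assumption.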

Correction to

The potential of an adaptive computerized dynamic assessment tutor in diagnosing and assessing learners’ listening comprehension (Education and Information Technologies, (2024), 29, 3, (3637-3661), 10.1007/s10639-023-11871-w)

Journal article (2023) - Mehri Izadi (author) , M. Izadi (author) , Farrokhlagha Heidari (author)

The copyright holder in the original publication of this article was incorrect. The original article has been corrected.

Targeted Attack on GPT-Neo for the SATML Language Model Data Extraction Challenge [PRESENTATION]

Other (2023) - A. Al-Kaswan (author) , M. Izadi (author) , A. van Deursen (author)

Previous work has shown that Large Language Models are susceptible to so-called data extraction attacks. This allows an attacker to extract a sample that was contained in the training data, which has massive privacy implications. The construction of data extraction attacks is cha ...

The (ab)use of Open Source Code to Train Large Language Models

Conference paper (2023) - A. Al-Kaswan (author) , M. Izadi (author)

In recent years, Large Language Models (LLMs) have gained significant popularity due to their ability to generate human-like text and their potential applications in various fields, such as Software Engineering. LLMs for Code are commonly trained on large unsanitized corpora of s ...

STACC: Code Comment Classification using SentenceTransformers

Conference paper (2023) - A. Al-Kaswan (author) , M. Izadi (author) , A. van Deursen (author)

Code comments are a key resource for information about software artefacts. Depending on the use case, only some types of comments are useful. Thus, automatic approaches to classify these comments have been proposed. In this work, we address this need by proposing STACC, a set o ...

Enriching Source Code with Contextual Data for Code Completion Models

An Empirical Study

Conference paper (2023) - Tim van Dam (author) , M. Izadi (author) , A. van Deursen (author)

Transformer-based pre-trained models have recently achieved great results in solving many software engineering tasks including automatic code completion which is a staple in a developer’s toolkit. While many have striven to improve the code-understanding abilities of such models, ...

The NLBSE'23 Tool Competition

Conference paper (2023) - Rafael Kallis (author) , M. Izadi (author) , Luca Pascarella (author) , Oscar Chaparro (author) , Pooja Rani (author)

We report on the organization and results of the second edition of the tool competition from the International Workshop on Natural Language-based Software Engineering (NLBSE'23). As in the prior edition, we organized the competition on automated issue report classification, with ...

Extending Source Code Pre-Trained Language Models to Summarise Decompiled Binaries

Conference paper (2023) - A. Al-Kaswan (author) , Toufique Ahmed (author) , M. Izadi (author) , Anand Ashok Sawant (author) , Premkumar Devanbu (author) , A. van Deursen (author)

Binary reverse engineering is used to understand and analyse programs for which the source code is unavailable. Decompilers can help, transforming opaque binaries into a more readable source code-like representation. Still, reverse engineering is difficult and costly, involving c ...

Predicting the objective and priority of issue reports in software repositories

Journal article (2022) - M. Izadi (author) , Kiana Akbari (author) , Abbas Heydarnoori (author)

Software repositories such as GitHub host a large number of software entities. Developers collaboratively discuss, implement, use, and share these entities. Proper documentation plays an important role in successful software management and maintenance. Users exploit Issue Tracking Systems, a facility of software repositories, to keep track of issue reports, to manage the workload and processes, and finally, to document the highlights of their team’s effort. An issue report is a rich source of collaboratively-curated software knowledge, and can contain a reported problem, a request for new features, or merely a question about the software product. As the number of these issues increases, it becomes harder to manage them manually. GitHub provides labels for tagging issues as a means of issue management. However, about half of the issues in GitHub’s top 1000 repositories do not have any labels. In this work, we aim to automate the process of managing issue reports for software teams. We propose a two-stage approach to predict both the objective behind opening an issue and its priority level using feature engineering methods and state-of-the-art text classifiers. To the best of our knowledge, we are the first to fine-tune a Transformer for issue classification. We train and evaluate our models in both project-based and cross-project settings. The latter approach provides a generic prediction model applicable to any unseen software project or projects with little historical data. Our proposed approach can successfully predict the objective and priority level of issue reports with 82% (fine-tuned RoBERTa) and 75% (Random Forest) accuracy, respectively. Moreover, we conducted human labeling and evaluation on unlabeled issues from six unseen GitHub projects to assess the performance of the cross-project model on new data. The model achieves 90% accuracy on the sample set. We measure inter-rater reliability and obtain an average Percent Agreement of 85.3% and Randolph’s free-marginal Kappa of 0.71, which translate to substantial agreement among labelers.
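
The two inter-rater reliability measures reported in this abstract can be sketched as follows. Randolph's free-marginal kappa fixes the chance-agreement term at 1/k for k label categories, so it follows directly from the observed agreement. The labels, the number of raters, and the function names below are hypothetical and are not taken from the study; computing Percent Agreement as mean pairwise agreement per item is also an assumption.

```python
from itertools import combinations

def percent_agreement(ratings):
    """Mean pairwise agreement; `ratings` holds one list of labels per item."""
    per_item = []
    for labels in ratings:
        pairs = list(combinations(labels, 2))
        per_item.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(per_item) / len(per_item)

def randolph_kappa(ratings, num_categories):
    """Free-marginal kappa: chance agreement is fixed at 1 / num_categories."""
    p_observed = percent_agreement(ratings)
    p_chance = 1.0 / num_categories
    return (p_observed - p_chance) / (1.0 - p_chance)

# Hypothetical labels from three raters on four issues (three categories).
ratings = [
    ["bug", "bug", "bug"],
    ["bug", "feature", "bug"],
    ["question", "question", "question"],
    ["feature", "feature", "bug"],
]
p_o = percent_agreement(ratings)
kappa = randolph_kappa(ratings, num_categories=3)
print(f"percent agreement = {p_o:.3f}, free-marginal kappa = {kappa:.3f}")
```

Because the chance term depends only on the number of categories, this kappa does not penalize raters for skewed label distributions, unlike fixed-marginal variants such as Fleiss' kappa.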