J. Yang | TU Delft Repository

Reducing Carbon Emissions of Code Generation in Large Language Models with Line-level Completions

Master thesis (2025) - T.J. Nulle (author) , Arie Van Deursen (mentor) , Luís Cruz (mentor) , Jie Yang (graduation committee member)

This thesis investigates reducing carbon emissions in code generation using large language models (LLMs) by comparing function-level and line-level code completions across models of different sizes (1.5B and 9B parameters). The study utilises the BigCodeBench dataset, comprising ...

Dataset Development for LLMs4Code: Licensing, Contamination, and Reproducibility Challenges

Master thesis (2025) - R. Popescu (author) , Arie Deursen (mentor) , M. Izadi (mentor) , Jie Yang (graduation committee member)

The rapid rise in the popularity of large language models has highlighted the need for extensive datasets, especially for training on code. However, this growth has also raised important questions about the legal implications of using code in large language model training, particularly regarding the potential infringement of code licenses. At the same time, the availability of clean datasets for evaluating these models is becoming increasingly limited, due to a high risk of contamination which restricts the capacity for reliable research. On top of that, this requires researchers to repeatedly perform data curation steps in order to evaluate their models on downstream tasks, based on previously unseen data. This process is not only time- and resource-intensive but also introduces potential inconsistencies across studies, which can impact their reproducibility.
We address these challenges through a comprehensive licensing analysis and by developing robust datasets to support accurate and reproducible large language model evaluations. We compiled a list of 53 large language models trained on file-level code and analyzed their datasets, discovering pervasive license inconsistencies despite careful selection based on repository licenses. Our analysis, covering 514M code files, reveals 38M exact duplicates of strong copyleft code and 171M file-leading comments, 16M of which are under copyleft licenses and another 11M of which discourage unauthorized copying. To further understand the depth of non-permissive code in public training datasets, we developed StackLessV2, a strong copyleft Java dataset decontaminated against The Stack V2 to facilitate accurate model evaluations. Our results revealed that non-permissive code is also present at the near-duplication level, although this represents a gray area in terms of legal interpretation: the boundary between acceptable reuse and license violation is still unclear, which emphasizes the need for further legal clarification. Finally, we extend this work and introduce The Heap, a large multilingual copyleft dataset covering 57 programming languages, specifically deduplicated to avoid contamination from existing open training datasets. The Heap offers a solution for conducting fair, reproducible evaluations of large language models without the significant overhead of the data curation process.
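As a rough illustration of what exact-duplicate decontamination can look like, the sketch below hashes each file after stripping whitespace and drops any evaluation candidate whose hash already appears in a training corpus. It is a minimal sketch under stated assumptions, not the thesis's pipeline: the function names, the whitespace normalisation, and the choice of SHA-256 are all hypothetical, and near-duplicate detection is not covered.

```python
import hashlib


def content_hash(source: str) -> str:
    """Hash a file's text with all whitespace removed (an assumed normalisation)."""
    normalised = "".join(source.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def decontaminate(candidates: dict[str, str], training_files: list[str]) -> dict[str, str]:
    """Keep only candidate files that are not exact duplicates of training files.

    `candidates` maps a (hypothetical) file path to its text. Duplicates
    within the candidate set itself are also dropped, keeping the first one.
    """
    seen = {content_hash(text) for text in training_files}
    kept: dict[str, str] = {}
    for path, text in candidates.items():
        digest = content_hash(text)
        if digest not in seen:
            kept[path] = text
            seen.add(digest)
    return kept


if __name__ == "__main__":
    training = ["class B {}"]
    candidates = {
        "a.java": "class A {}",
        "b.java": "class  A { }",  # same as a.java up to whitespace
        "c.java": "class B {}",    # already present in the training data
    }
    print(sorted(decontaminate(candidates, training)))
```

In this toy run only `a.java` survives: `b.java` differs from it only in whitespace, and `c.java` already occurs in the training set.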

Uncovering Sequential Social Dilemmas in Multi-Agent Reinforcement Learning

Challenges and Strategies for Local Energy Communities

Master thesis (2025) - M.T. Okoń (author) , L. Siebert (mentor) , Jochen Cremer (mentor) , J. Yang (graduation committee member)

This thesis investigates the occurrence and mitigation of Sequential Social Dilemmas (SSDs) in Local Energy Communities (LECs) managed through Multi-agent Reinforcement Learning (MARL). LECs have great potential as pivotal elements in the green energy transition, yet the inherent ...

Collaborative and Confidential Junction Trees for Hybrid Bayesian Networks

Master thesis (2025) - R. Gheda (author) , Y. Chen (mentor) , Thiago Guzella (mentor) , Carlo Lancia (mentor) , J. Yang (graduation committee member)

Bayesian Networks (BNs) are widely utilized across various industrial sectors to optimize processes, with an emerging focus on the collaboration across multiple parties. While most realistic scenarios require handling a mixture of categorical and continuous data simultaneously, t ...

Collaborative reflection on personal data: An approach for investigating context-related user experiences in recommender systems

Master thesis (2024) - Z. Wang (author) , Q. Wang (graduation committee member) , Jie Yang (graduation committee member) , Di Yan (graduation committee member)

Recommender systems are widely used in modern lives and contribute to many industries. Therefore, methods to evaluate and improve them are important. Nowadays, much research has been done to improve the system aspects such as algorithms. However, user experiences are not only aff ...

Deep Reinforcement Learning for Ride-hailing Systems

An experimental study on optimizing matching radius for ride-hailing systems using Deep Reinforcement Learning

Master thesis (2024) - H. Zhao (author) , J. Gao (mentor) , M. Mai (mentor) , O. Cats (graduation committee member) , J. Yang (graduation committee member)

In the field of public transportation, environmentally friendly and convenient transportation modes are the future trend, and ride-hailing services are an important component of them. However, current ride-hailing systems, particularly their matching systems, still suffer from low system efficiency and poor user experience. Although existing rider-driver matching systems can allocate travel demands and drivers to a certain extent, they still have deficiencies in certain scenarios. For example, they cannot ensure effective rider-driver matching during peak hours, or they cannot find a good balance between pick-up distance and matching rate. Reinforcement Learning (RL) has been proven in many studies to be applicable and effective in solving complex and dynamic optimization problems. This study therefore aims to explore how RL can be adapted to the ride-hailing matching system to optimize system efficiency and user experience through a dynamic matching radius policy. The research objective of this study is to simulate an actual ride-hailing system and use RL to train a policy. This policy outputs an optimized dynamic matching radius in real time based on the current rider-driver demand-supply relationship, thereby achieving a higher matching rate, a shorter average pick-up distance, and a higher driver utilization rate for the ride-hailing system.
Adapting RL to optimize the ride-hailing system's matching radius poses several challenges due to the uncertainties in the real-world ride-hailing market. Traditional approaches are normally static, solving the matching problem at specific times through mathematical models. However, these methods often perform inconsistently when dealing with fluctuating ride-hailing supply-demand relationships, particularly during peak hours. In addition, the dynamics and complexity of the ride-hailing market and environment make the ride-hailing system difficult to model. The ride-hailing market is easily affected by many variables, such as weather conditions and local traffic conditions. When quantitatively optimizing the matching radius of the ride-hailing matching system, it is critical to reasonably control irrelevant variables. To address these challenges, this study models the ride-hailing matching problem as a Markov Decision Process (MDP). Based on the defined MDP, a ride-hailing matching simulator is developed. Some assumptions and simplifications are also made to ensure high realism while reasonably controlling irrelevant variables and uncertainties. The Multi-replay-buffer Deep Deterministic Policy Gradient (MDDPG) algorithm is then applied to handle the optimization problem of the ride-hailing matching radius. Through the interactions between the MDDPG agent and the developed simulator, the agent receives feedback rewards that it uses to improve the policy. The proposed method is then validated in a case study showcasing the application of the developed simulator and the RL algorithm in a real-world scenario in Austin, Texas. The case study includes an analysis of the current ride-hailing market in Austin, how to apply the simulator based on it, the implementation details of the RL algorithm, and the resulting performance improvements.
The results of the case study show that the actions obtained from the proposed method outperform all the baselines in multiple scenarios, highlighting the benefits of using Reinforcement Learning to improve ride-hailing efficiency and user experience.
To conclude, the optimization method proposed in this study applies an advanced Reinforcement Learning approach to the ride-hailing system, successfully improving overall efficiency and user experience. The results of this research demonstrate the potential of Reinforcement Learning in optimizing ride-hailing matching systems, offering a promising direction for further exploration. This study lays a solid foundation for future research to build upon, encouraging the development of more RL-based optimization methods that can enhance the effectiveness and adaptability of ride-hailing systems in increasingly complex and dynamic environments.
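The trade-off between matching rate and pick-up distance that motivates a dynamic matching radius can be seen in a toy example. The sketch below is a deliberately simplified assumption, not the thesis's simulator or its MDDPG agent: it places riders and drivers on a plane, greedily assigns each rider the nearest free driver within a fixed radius, and reports the two metrics. The function name and the greedy rule are hypothetical.

```python
import math


def match_within_radius(riders, drivers, radius):
    """Greedily match each rider to the nearest free driver within `radius`.

    Returns (matching_rate, average_pickup_distance). Positions are (x, y)
    tuples; straight-line distance stands in for real travel distance.
    """
    free = list(drivers)
    pickup_distances = []
    for rider in riders:
        best = None  # (distance, driver) of the closest eligible driver
        for driver in free:
            d = math.dist(rider, driver)
            if d <= radius and (best is None or d < best[0]):
                best = (d, driver)
        if best is not None:
            pickup_distances.append(best[0])
            free.remove(best[1])  # a driver serves at most one rider
    rate = len(pickup_distances) / len(riders) if riders else 0.0
    avg = sum(pickup_distances) / len(pickup_distances) if pickup_distances else 0.0
    return rate, avg


if __name__ == "__main__":
    riders = [(0, 0), (10, 0)]
    drivers = [(1, 0), (14, 0)]
    for radius in (2, 5):
        print(radius, match_within_radius(riders, drivers, radius))
```

With a radius of 2 only the first rider is matched (rate 0.5, average pick-up distance 1.0); widening the radius to 5 matches both riders (rate 1.0) but raises the average pick-up distance to 2.5. A learned policy would choose the radius from the observed supply and demand rather than fixing it.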

Capturing Power: Feminist Considerations about Machine Learning Fairness

Bachelor thesis (2024) - A.N. Postu (author) , S.E. Carter (mentor) , Jie Yang (graduation committee member) , S.N.R. Buijsman (graduation committee member)

Machine Learning (ML) algorithms have the potential to reproduce biases that already exist in society, a fact that has led to scholarly work trying to quantify algorithmic discrimination through fairness metrics. Although there is now a plethora of metrics, some of them contradict each other, turning fairness into a problem of knowing which measurement to choose over another. Consequently, scholars began considering that fairness should be discussed by placing algorithms in their social contexts. Since (1) these social aspects are related to structures of discrimination and (2) feminism aims to criticise discrimination against the marginalised, I introduce the possibility of analysing the social context of ML algorithms through a feminist lens. By doing this, I highlight social and political aspects that are equally important to consider for a faithful discussion on fairness: corporate lobbying, the lack of diverse hiring which leads to fairness discussions that do not consider the experiences of marginalised groups and, lastly, the broader context that an algorithm is used in. Moreover, I emphasise how feminist ethics of care constitute an essential framework for a conversation about actually implementable fairness solutions, since it shows the need to listen to both the marginalised community and to the developers who might want to build fairer ML but currently cannot. Having built a bridge between the hegemony and the feminist camp, I highlight how Northpointe’s (now Equivant’s) Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) algorithm can be considered biased against black people. Through this, I illustrate how feminist considerations bring clarity to fairness debates by helping choose a fairness metric or by claiming that an algorithm is unfair by nature and should be abolished. To follow, I use the same feminist critiques to draw attention to the possible weak points of current sociotechnical solutions.
For instance, the EU AI Act risks being too susceptible to company lobbying, leading to regulations that are not strict enough. Furthermore, the AI committee should ensure that they hire a diverse group of people in order to develop regulations that positively consider all marginalised groups. Lastly, I highlight how ethics education is essential for creating a new generation of responsible engineers. Considering this, I emphasise the urgency of making ethics courses, at TU Delft and elsewhere, more interdisciplinary by engaging more with critiques coming from the social sciences. This will open up possibilities for more research tackling fairness from a multitude of perspectives.

Natural Language Counterfactual Explanations in Financial Text Classification

Master thesis (2024) - K.T. Dobiczek (author) , Cynthia C. S. Liem (mentor) , P. Altmeyer (mentor) , Jie Yang (graduation committee member)

Central banks communicate their monetary policy plans to the public through meeting minutes or transcripts. These communications can have immense effects on markets and are often the subjects of studies in the financial literature. The recent advancements in Natural Language Proc ...

Robot-Assisted Music Making to Foster Creativity in Older Adults

Master thesis (2024) - A. Magadi Rajeev (author) , Mark Neerincx (mentor) , M.L. Tielman (mentor) , F. Broz (mentor) , J Yang (graduation committee member)

This project aims to harness the potential of music-making and robotic interaction to enhance creative expression and cognitive function among individuals with cognitive impairment and dementia. With the aging population, there is a growing demand for innovative interventions that support cognitive health and active engagement. Music therapy has demonstrated effectiveness in stimulating cognitive function and emotional expression in individuals with dementia. Creative expression through music serves as a unique outlet, fostering cognitive functions and emotional well-being. This study explores the synergy between music therapy and Socially Assistive Robots (SARs) to develop a more immersive therapeutic intervention. It represents an exploratory investigation into an end-to-end robotic intervention, proposing various interaction elements and examining their functionalities. Each element is designed to foster engagement, enhance perceptions of collaboration, and promote feelings of creativity.
In this thesis, we propose an end-to-end interactive music-making experience designed for use with the Pepper robot, an SAR. The system features a user-friendly interface with eight color-coded boxes, each corresponding to a musical note. Users simply tap the boxes to create melodies. The Pepper robot acts as a guide, assisting users in interacting with the interface. It additionally implements an engagement tracking system by monitoring user interaction through the screen taps on the interface and provides real-time feedback and encouragement. If a period of inactivity is detected, Pepper gently nudges the user to re-engage. Furthermore, the robot functions as a collaborative musical partner, providing rhythmic accompaniment if the user desires. The system also records user-created music and provides playback functionality, allowing users to revisit their compositions.
Methodologically, the study involves an end-to-end system comprising an intelligent music-making interface and an interactive robot providing real-time feedback and rhythmic accompaniment. Insights from the exploratory study highlight the benefits of real-time feedback in enhancing engagement, particularly among participants with musical backgrounds. However, rhythmic accompaniment shows mixed results in fostering collaboration, indicating a preference among participants for emotional connection in collaborative settings. Because individuals with dementia are a vulnerable group and this is an exploratory study, the empirical study focuses on healthy older adults, a population with an increased risk of cognitive decline. Music interventions have shown promise in improving cognitive function and engagement in individuals with dementia. Therefore, this study informs the design of future interventions for people with early-stage dementia.
Key findings underscore the potential of real-time feedback and interaction in promoting engagement in the activity. The intelligent music interface also shows potential to support creative exploration, albeit with improvements needed for advanced musical participants. Participants appreciate the playback feature, enhancing their sense of creative ownership and motivation. Despite promising outcomes, the study acknowledges limitations in sample size and participant demographics, primarily recruiting from music-engaged older adults rather than the target demographic of individuals with cognitive impairments.
Future research directions include expanding participant diversity, refining robot interaction capabilities, and addressing technical challenges to improve system usability and accessibility. Integrating findings from ongoing research on music and memory could further enhance personalized interventions. Ultimately, this study lays the groundwork for future developments in robotic interventions that promote well-being through music therapy for individuals with cognitive impairments.

Black-box context-aware code completion

Enhancing consumer-facing code completion with low-cost general enhancements

Master thesis (2024) - T.O. van Dam (author) , Maliheh Izadi (mentor) , Arie Deursen (mentor) , Egor Bogomolov (mentor) , Jie Yang (graduation committee member)

Algorithmic Solutions for Improved Carrier-Shipper Matching in a Competitive Transport Marketplace

Master thesis (2024) - T. Huisman (author) , N. Yorke-Smith (mentor) , A. Giudici (mentor) , J. Yang (graduation committee member)

UTURN aims to maximize the matching rate on its freight transport platform by efficiently connecting shippers with suitable carriers. To support this matching process, UTURN required a solution that was additive rather than restrictive on the platform. To achieve this, our r ...

Instruction Tuning for Domain Adaptation of Large Language Models

A Case Study in the Field of Education

Master thesis (2024) - J. Zhang (author) , Maria S. Pera (mentor) , Jie Yang (graduation committee member) , Maliheh Izadi (graduation committee member) , Gaole He (graduation committee member)

While most large language models (LLMs) are powerful, they are primarily designed for general purposes. Consequently, many enterprises and institutions have now focused on developing domain-specific models. In the realm of education, an expert LLM can significantly enhance students' ability to find information more effectively and reach their learning goals. Nevertheless, the training of such expert models in education remains largely unexplored. This study explores this research gap by developing a framework to transform semi-structured educational web data into structured datasets and perform instruction tuning on foundation models. Additionally, we conduct a comprehensive performance analysis to determine how various training factors affect model performance.

We first employed a systematic and cost-effective approach involving web data extraction, data cleaning, validation, task design based on student surveys, and automated instruction instance generation using LLMs. Human evaluations confirmed the quality, especially the relevance and accuracy of these datasets.

This study then investigates the impact of various training techniques on the performance of domain-specific educational LLMs. Our experiments reveal that further pre-training enhances model performance, especially with domain-specific terminology, although the performance gains decrease as the dataset size increases. Furthermore, multi-task training also improves model relevance, accuracy, and clarity, but less correlated tasks and datasets can present challenges. These challenges include increased complexity and potential degradation in performance due to the model having to switch between diverse tasks. Lastly, this study conducts a comparative analysis of different models and highlights trade-offs between computational resources and performance.

The findings demonstrate that a structured approach to dataset generation and strategic training can effectively develop domain-specific LLMs in education. This research benefits the development of educational LLMs and provides a foundation for future researchers to build more specialized models in various domains.

Multimodal Context Informed Machine Translation of Manga Using LLMs

Master thesis (2024) - K.P. Skublicki (author) , Christoph Lofi (mentor) , Jie Yang (graduation committee member) , Cynthia Liem (coach)

Large language models have achieved breakthroughs in many natural language processing tasks. One of their main appeals is the ability to tackle problems that lack sufficient training data to create a dedicated solution. Manga translation is one such task, a still budding and un ...

Algorithmic Fairness: Encouraging Exclusionary Diversity

(instead of Inclusionary Pluriversality)

Bachelor thesis (2024) - K.S. Caldera (author) , Sarah E. Carter (mentor) , S.N.R. Buijsman (graduation committee member) , J. Yang (graduation committee member) , Marcus M. Specht (graduation committee member)

AI is becoming significantly more impactful in society, especially with regard to decision-making. Algorithmic fairness is the field wherein the fairness of an AI algorithm is defined, subsequently evaluated, and ideally improved. This paper uses a fairness decision tree to crit ...

Leveraging Large Language Models for Classifying Subjective Arguments in Public Discourse

Bachelor thesis (2024) - A. Dobrinoiu (author) , Luciano C. Siebert (mentor) , Amir Homayounirad (mentor) , E. Liscio (mentor) , J. Yang (graduation committee member)

This study investigates the effectiveness of Large Language Models (LLMs) in identifying and classifying subjective arguments within deliberative discourse. Using data from a Participatory Value Evaluation (PVE) conducted in the Netherlands, this research introduces an annotation ...

Using Large Language Models to Detect Deliberative Elements in Public Discourse

Detecting Subjective Emotions in Public Discourse

Bachelor thesis (2024) - B.C.P. Zuurbier (author) , Luciano C. Siebert (mentor) , Amir Homayounirad (mentor) , E. Liscio (mentor) , J. Yang (graduation committee member)

In order to tackle topics such as climate change together with the population, public discourse should be scaled up. This discourse should be mediated as it makes it more likely that people understand each other and change their point of view. To help the mediator with this task, ...

Leveraging LLMs for Classifying Subjective Topics Behind Public Discourse

Bachelor thesis (2024) - A. Marcu (author) , Luciano C. Siebert (mentor) , Amir Homayounirad (mentor) , E. Liscio (mentor) , J. Yang (graduation committee member)

Public deliberations play a crucial role in democratic systems. However, the unstructured nature of deliberations leads to challenges for moderators to analyze the large volume of data produced. This paper aims to solve this challenge by automatically identifying subjective topic ...

Decoding Sentiment with Large Language Models

Comparing Prompting Strategies Across Hard, Soft, and Subjective Label Scenarios

Bachelor thesis (2024) - T. Oberhuber (author) , Luciano C. Siebert (mentor) , A. Homayounirad (mentor) , E. Liscio (mentor) , J. Yang (graduation committee member)

This study evaluates the performance of different sentiment analysis methods in the context of public deliberation, focusing on hard-, soft-, and subjective-label scenarios to answer the research question: "can a Large Language Model detect subjective sentiment of statements wit ...

Offensive AI for Directory Enumeration

Master thesis (2024) - A. Castagnaro (author) , M. Conti (mentor) , Luca Pajola (mentor) , G. Smaragdakis (graduation committee member) , Jie Yang (graduation committee member)

Web Vulnerability Assessment and Penetration Testing (Web VAPT) is an important cybersecurity practice that thoroughly examines web applications to uncover possible vulnerabilities. These vulnerabilities represent potential security gaps that could severely compromise the web app ...

New Advancements in Head-Worn Sensors

Master thesis (2023) - C.A. Peppelman (author) , Przemysław Pawełczak (mentor) , Jie Yang (graduation committee member)

Non-invasive head-worn sensors are an upcoming field of interest. Most commercial sensors and sensors presented in research papers are limited in their capabilities and require frequent user interaction. In this Thesis, we research the possibilities of overcoming some of these li ...