J. Yang

55 records found

With the increasing demand for artificial intelligence (AI), intelligent systems have become deeply integrated into various aspects of modern life, including autonomous driving, smart assistants on mobile devices, and powerful online language models such as ChatGPT. In additio ...
There has been a steady increase in technologies that leverage Deep Learning (DL) techniques on resource-constrained devices for real-time processing. While DL techniques are adept at recognition tasks, their performance depends on the training process. Training data is seldom fu ...

Annotation Practices in Societally Impactful Machine Learning Applications

What are these automated systems actually trained on?

This study examines dataset annotation practices in influential NeurIPS research. Datasets employed in highly cited NeurIPS papers were assessed based on criteria concerning their item population, labelling schema, and annotation process. While high-level information, such as the ...
Online databases contain extensive collections of (bio)chemical reactions serving as valuable resources for a variety of applications. However, these large datasets often suffer from incomplete reaction data that lack, for example, co-reactants and by-products. Machine learning can ...

Predicting CMV Serostatus using Donor Data

Leveraging a Global Database of 8 Million Donors to Predict CMV Serostatus and Investigate HLA Associations

The Cytomegalovirus (CMV) serostatus, of both donor and patient, plays an important role in allogeneic hematopoietic stem cell transplantation, yet it is only known for 19% of donors in the global database. In this research, the other available data in the global database of the ...

Annotation Practices in Societally Impactful Machine Learning Applications

What are these automated systems actually trained on?

The output of machine learning (ML) models can be only as good as the data that is fed into them. Because of this, when making datasets for creating ML models, it is important to ensure the quality of the data. This is especially true of human labeled data, which can be hard to s ...

Dataset quality within a societally impactful machine learning domain

An overview of data collection and annotation practices of the datasets used by papers published by the ACL

This study gives an overview of the data collection and annotation practices of the datasets used by the most impactful papers published by the Association for Computational Linguistics (ACL). This was achieved by selecting the most highly cited papers published within the ACL ant ...

Behind the Labels: Transparency Pitfalls in Annotation Practices for Societally Impactful ML

A deep dive into annotation transparency and consistency in CVPR corpus

This study investigates annotation and reporting practices in machine learning (ML) research, focusing on societally impactful applications presented at the IEEE/CVF Computer Vision and Pattern Recognition (CVPR) conferences. By structurally analyzing the 75 most-cited CVPR paper ...

High-impact vision research still rests on datasets whose labels arrive via opaque, rarely documented pipelines. To understand how serious the problem is inside a large venue, we audited 75 TPAMI papers (2009-2024) that rely on or introduce datasets. Each datase ...

This thesis investigates reducing carbon emissions in code generation using large language models (LLMs) by comparing function-level and line-level code completions across models of different sizes (1.5B and 9B parameters). The study utilises the BigCodeBench dataset, comprising ...
The rapid rise in the popularity of large language models has highlighted the need for extensive datasets, especially for training on code. However, this growth has also raised important questions about the legal implications of using code in large language model training, partic ...

Uncovering Sequential Social Dilemmas in Multi-Agent Reinforcement Learning

Challenges and Strategies for Local Energy Communities

This thesis investigates the occurrence and mitigation of Sequential Social Dilemmas (SSDs) in Local Energy Communities (LECs) managed through Multi-agent Reinforcement Learning (MARL). LECs have great potential as pivotal elements in the green energy transition, yet the inherent ...
Bayesian Networks (BNs) are widely utilized across various industrial sectors to optimize processes, with an emerging focus on the collaboration across multiple parties. While most realistic scenarios require handling a mixture of categorical and continuous data simultaneously, t ...
Recommender systems are widely used in modern life and contribute to many industries. Therefore, methods to evaluate and improve them are important. Much research has been done to improve system aspects such as algorithms. However, user experiences are not only aff ...

Deep Reinforcement Learning for Ride-hailing Systems

An experimental study on optimizing matching radius for ride-hailing systems using Deep Reinforcement Learning

In the field of public transportation, environmentally friendly and convenient transportation modes are the future trend. Ride-hailing services are an important component of this trend. However, current ride-hailing systems, particularly the matching systems, still have issues rela ...
Machine Learning (ML) algorithms have the potential to reproduce biases that already exist in society, a fact that leads to scholarly work trying to quantify algorithmic discrimination through fairness metrics. Although there are now a plethora of metrics, some of them are even c ...
Central banks communicate their monetary policy plans to the public through meeting minutes or transcripts. These communications can have immense effects on markets and are often the subjects of studies in the financial literature. The recent advancements in Natural Language Proc ...
This project aims to harness the potential of music-making and robotic interaction to enhance creative expression and cognitive function among individuals with cognitive impairment and dementia. With the aging population, there is a growing demand for innovative interventions tha ...

UTURN aims to maximize the matching rate on its freight transport platform by efficiently connecting shippers with suitable carriers. To support this matching process, UTURN required a solution that was additive rather than restrictive on the platform. To achieve this, our r ...