W. Toussaint | TU Delft Repository

A Data Perspective on Ethical Challenges in Voice Biometrics Research

Journal article (2025) - Anna Leschanowsky, Casandra Rusti, Carolyn Quinlan, Michaela Pnacek, Lauriane Gorce, Wiebke Hutiri

Speaker recognition technology, deployed in sectors like banking, education, recruitment, immigration, law enforcement, and healthcare, relies heavily on biometric data. However, the ethical implications and biases inherent in the datasets driving this technology have not been fully explored. Through a longitudinal study of close to 700 papers published at the ISCA Interspeech Conference in the years 2012 to 2021, we investigate how dataset use has evolved alongside the widespread adoption of deep neural networks. Our study identifies the most commonly used datasets in the field and examines their usage patterns. The analysis reveals significant shifts in data practices since the advent of deep learning: a small number of datasets dominate speaker recognition training and evaluation, and the majority of studies evaluate their systems on a single dataset. For four key datasets–Switchboard, Mixer, VoxCeleb, and ASVspoof–we conduct a detailed analysis of metadata and collection methods to assess ethical concerns and privacy risks. Our study highlights numerous challenges related to sampling bias, re-identification, consent, disclosure of sensitive information and security risks in speaker recognition datasets, and emphasizes the need for more representative, fair, and privacy-aware data collection in this domain. ...

Tiny, Always-on, and Fragile

Bias Propagation through Design Choices in On-device Machine Learning Workflows

Journal article (2023) - Wiebke (Toussaint) Hutiri, Aaron Yi Ding, Fahim Kawsar, Akhil Mathur

Billions of distributed, heterogeneous, and resource-constrained IoT devices deploy on-device machine learning (ML) for private, fast, and offline inference on personal data. On-device ML is highly context dependent and sensitive to user, usage, hardware, and environment attributes. This sensitivity and the propensity toward bias in ML make it important to study bias in on-device settings. Our study is one of the first investigations of bias in this emerging domain and lays important foundations for building fairer on-device ML. We apply a software engineering lens, investigating the propagation of bias through design choices in on-device ML workflows. We first identify reliability bias as a source of unfairness and propose a measure to quantify it. We then conduct empirical experiments for a keyword spotting task to show how complex and interacting technical design choices amplify and propagate reliability bias. Our results validate that design choices made during model training, like the sample rate and input feature type, and choices made to optimize models, like light-weight architectures, the pruning learning rate, and pruning sparsity, can result in disparate predictive performance across male and female groups. Based on our findings, we suggest low-effort strategies for engineers to mitigate bias in on-device ML. ...
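The abstract above does not define its reliability bias measure, so the following is only a minimal sketch of how disparate predictive performance across groups might be quantified. The function names and the log-ratio formulation are assumptions made for illustration, not the paper's definition: the sketch sums the absolute log-ratio between each group's accuracy and the overall accuracy, so a value of 0 means every group performs exactly like the aggregate.

```python
import math
from collections import defaultdict


def group_accuracy(labels, predictions, groups):
    """Return accuracy per group (e.g. per speaker gender)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, g in zip(labels, predictions, groups):
        total[g] += 1
        correct[g] += int(y == p)
    return {g: correct[g] / total[g] for g in total}


def log_ratio_disparity(labels, predictions, groups):
    """Hypothetical disparity score: sum over groups of |ln(acc_group / acc_overall)|.

    This is an assumed stand-in, not the measure proposed in the paper.
    """
    if not labels:
        raise ValueError("at least one sample is required")
    overall = sum(int(y == p) for y, p in zip(labels, predictions)) / len(labels)
    per_group = group_accuracy(labels, predictions, groups)
    if overall == 0 or any(acc == 0 for acc in per_group.values()):
        raise ValueError("log-ratio is undefined when an accuracy is zero")
    return sum(abs(math.log(acc / overall)) for acc in per_group.values())
```

In a keyword spotting experiment, one would compute such a score for each design choice (for example each sample rate or pruning sparsity) and compare the scores; the choice of accuracy as the underlying metric is likewise an assumption here.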

Design Patterns for Detecting and Mitigating Bias in Edge AI

Doctoral thesis (2023) - Wiebke Hutiri, Marijn Janssen, Aaron Ding

From smart phones to speakers and watches, Edge AI is deployed on billions of devices to process large volumes of personal data efficiently, privately and in real-time. While Edge AI applications are promising, many recent incidents of bias in AI systems caution that Edge AI, too, may systematically discriminate against groups of people based on their gender, race, age, accent, nationality and other personal attributes. This is all the more so because the physical restrictions of Edge AI, together with the complexity of its heterogeneous and decentralised operating environment, pose trade-offs when deploying AI to the edge.

This thesis is motivated by the societal demand for trustworthy AI, by the propensity of AI systems to be biased, and consequently by the need to detect and mitigate bias in diverse Edge AI applications. To address this need, this thesis develops design patterns for detecting and mitigating bias in the development of Edge AI systems. The design patterns present a generalisable approach for capturing established practices to detect and mitigate bias in machine learning. They make this knowledge readily accessible to researchers and practitioners who develop Edge AI but have limited prior experience with detecting and mitigating bias. ...

Beyond data transactions

A framework for meaningfully informed data donation

Journal article (2023) - Alejandra Gomez Ortega, Jacky Bourgeois, Wiebke Toussaint Hutiri, Gerd Kortuem

As we navigate physical (e.g., supermarket) and digital (e.g., social media) systems, we generate personal data about our behavior. Researchers and designers increasingly rely on this data and appeal to several approaches to collect it. One of these is data donation, which encourages people to voluntarily transfer their (personal) data collected by external parties to a specific cause. One of the central pillars of data donation is informed consent, meaning people should be adequately informed about which of their data will be used and how. However, can we be adequately informed about donating our data when we often don’t even know that it is being collected, let alone what exactly is being collected? In this paper, we investigate how to foster (personal) data literacy and increase donors’ understanding of their data. We introduce a Research through Design approach where we define a data donation journey in the context of speech records, data collected by Google Assistant. Based on the data donation experiences of 22 donors, we propose a data donation framework that understands and approaches data donation as an encompassing process with mutual benefit for donors and researchers. Our framework supports a donation process that dynamically and iteratively engages donors in exploring and understanding their data and invites them to (re)evaluate and (re)assess their participation. Through this process, donors increase their data literacy and are empowered to give meaningfully informed consent. ...

Bias in Automated Speaker Recognition

Conference paper (2022) - Wiebke Toussaint Hutiri, Aaron Yi Ding

Automated speaker recognition uses data processing to identify speakers by their voice. Today, automated speaker recognition is deployed on billions of smart devices and in services such as call centres. Despite their wide-scale deployment and known sources of bias in related domains like face recognition and natural language processing, bias in automated speaker recognition has not been studied systematically. We present an in-depth empirical and analytical study of bias in the machine learning development workflow of speaker verification, a voice biometric and core task in automated speaker recognition. Drawing on an established framework for understanding sources of harm in machine learning, we show that bias exists at every development stage in the well-known VoxCeleb Speaker Recognition Challenge, including data generation, model building, and implementation. Most affected are female speakers and non-US nationalities, who experience significant performance degradation. Leveraging the insights from our findings, we make practical recommendations for mitigating bias in automated speaker recognition, and outline future research directions. ...

Towards Trustworthy Edge Intelligence: Insights from Voice-Activated Services

Conference paper (2022) - W. Toussaint, Aaron Yi Ding

In an age of surveillance capitalism, anchoring the design of emerging smart services in trustworthiness is urgent and important. Edge Intelligence, which brings together the fields of AI and Edge computing, is a key enabling technology for smart services. Trustworthy Edge Intelligence should thus be a priority research concern. However, determining what makes Edge Intelligence trustworthy is not straightforward. This paper examines requirements for trustworthy Edge Intelligence in a concrete application scenario of voice-activated services. We contribute to deepening the understanding of trustworthiness in the emerging Edge Intelligence domain in three ways: firstly, we propose a unified framing for trustworthy Edge Intelligence that jointly considers trustworthiness attributes of AI and the IoT. Secondly, we present research outputs of a tangible case study in voice-activated services that demonstrates interdependencies between three important trustworthiness attributes: privacy, security and fairness. Thirdly, based on the empirical and analytical findings, we highlight challenges and open questions that present important future research areas for trustworthy Edge Intelligence. ...

Design Guidelines for Inclusive Speaker Verification Evaluation Datasets

Journal article (2022) - Wiebke Hutiri, Lauriane Gorce, Aaron Yi Ding

Speaker verification (SV) provides billions of voice-enabled devices with access control, and ensures the security of voice-driven technologies. As a type of biometric, SV must be unbiased, with consistent and reliable performance across speakers irrespective of their demographic, social and economic attributes. Current SV evaluation practices are insufficient for evaluating bias: they are over-simplified and aggregate users, they are not representative of usage scenarios encountered in deployment, and they do not account for the consequences of errors. This paper proposes design guidelines for constructing SV evaluation datasets that address these shortcomings. We propose a schema for grading the difficulty of utterance pairs, and present an algorithm for generating inclusive SV datasets. We empirically validate our proposed method in a set of experiments on the VoxCeleb1 dataset. Our results confirm that the count of utterance pairs per speaker and the difficulty grading of utterance pairs have a significant effect on evaluation performance and variability. Our work contributes to the development of SV evaluation practices that are inclusive and fair. ...

Characterising the Role of Pre-Processing Parameters in Audio-based Embedded Machine Learning

Conference paper (2021) - Wiebke Toussaint, Akhil Mathur, Aaron Yi Ding, Fahim Kawsar

When deploying machine learning (ML) models on embedded and IoT devices, performance encompasses more than an accuracy metric: inference latency, energy consumption, and model fairness are necessary to ensure reliable performance under heterogeneous and resource-constrained operating conditions. To this end, prior research has studied model-centric approaches, such as tuning the hyperparameters of the model during training and later applying model compression techniques to tailor the model to the resource needs of an embedded device. In this paper, we take a data-centric view of embedded ML and study the role that pre-processing parameters in the data pipeline can play in balancing the various performance metrics of an embedded ML system. Through an in-depth case study with audio-based keyword spotting (KWS) models, we show that pre-processing parameter tuning is a remarkable tool that model developers can adopt to trade off a model's accuracy, fairness, and system efficiency, as well as to make an embedded ML model resilient to unseen deployment conditions. ...

Machine learning systems in the IoT

Trustworthiness trade-offs for edge intelligence

Conference paper (2020) - Wiebke Toussaint, Aaron Yi Ding

Machine learning systems (MLSys) are emerging in the Internet of Things (IoT) to provision edge intelligence, which is paving our way towards the vision of ubiquitous intelligence. However, despite the maturity of machine learning systems and the IoT, we are facing severe challenges when integrating MLSys and IoT in practical contexts. For instance, many machine learning systems have been developed for large-scale production (e.g., cloud environments), but IoT introduces additional demands due to heterogeneous and resource-constrained devices and a decentralized operating environment. To shed light on this convergence of MLSys and IoT, this paper analyzes the trade-offs by covering the latest developments (up to 2020) on scaling and distributing ML across cloud, edge, and IoT devices. We position machine learning systems as a component of the IoT, and edge intelligence as a socio-technical system. On the challenges of designing trustworthy edge intelligence, we advocate a holistic design approach that takes multi-stakeholder concerns, design requirements and trade-offs into consideration, and highlight future research opportunities in edge intelligence. ...

Identifying optimal clustering structures for residential energy consumption patterns using competency questions

Conference paper (2020) - Wiebke Toussaint, Deshendran Moodley

Traditional cluster analysis metrics rank clustering structures in terms of compactness and distinctness of clusters. However, in real-world applications this is usually insufficient for selecting the optimal clustering structure. Domain experts and visual analysis are often relied on during evaluation, which results in a selection process that tends to be ad hoc, subjective and difficult to reproduce. This work proposes the use of competency questions and a cluster scoring matrix to formalise expert knowledge and application requirements for qualitative evaluation of clustering structures. We show how a qualitative ranking of clustering structures can be integrated with traditional metrics to guide cluster evaluation and selection for generating representative energy consumption profiles that characterise residential electricity demand in South Africa. The approach is shown to be highly effective for identifying usable and expressive consumption profiles within this specific application context, and certainly has wider potential for efficient, transparent and repeatable cluster selection in real-world applications. ...
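To make the idea of a cluster scoring matrix concrete, here is a minimal sketch of how qualitative scores per competency question might be combined with a traditional metric to rank clustering structures. Everything in it is an assumption made for illustration: the abstract does not specify the scoring scale, the weighting, or how the two rankings are integrated, so the weighted average, the `alpha` blending parameter, and all names are hypothetical. Scores and metric values are assumed to be normalised to the range 0 to 1, with higher being better.

```python
def qualitative_score(scores, weights):
    """Weighted average of one structure's scores across competency questions."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[q] * scores[q] for q in weights) / total_weight


def rank_structures(matrix, weights, quantitative, alpha=0.5):
    """Rank clustering structures, best first.

    matrix:       {structure: {competency_question: score in [0, 1]}}
    weights:      {competency_question: importance}
    quantitative: {structure: normalised traditional metric in [0, 1]}
    alpha:        assumed blending factor; 1.0 uses only the qualitative score
    """
    combined = {
        name: alpha * qualitative_score(scores, weights)
        + (1 - alpha) * quantitative[name]
        for name, scores in matrix.items()
    }
    return sorted(combined, key=combined.get, reverse=True)
```

Writing the scores and weights down in this form is what makes the selection repeatable: a different analyst with the same matrix obtains the same ranking, and changing a weight leaves an explicit record of why the ranking changed.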