M.A. Larson | TU Delft Repository

Towards Purpose-aware Privacy-Preserving Techniques for Predictive Applications

Doctoral thesis (2024) - M. Slokom, M.A. Larson, A. Hanjalic

In the field of machine learning (ML), the goal is to leverage algorithmic models to generate predictions, transforming raw input data into valuable insights. However, the ML pipeline, consisting of input data, models, and output data, is susceptible to various vulnerabilities and attacks. These attacks include re-identification, attribute inference, membership inference, and model inversion attacks, all posing threats to individual privacy. This thesis specifically targets attribute inference attacks, wherein adversaries seek to infer sensitive information about target individuals.

The literature on privacy-preserving techniques explores various perturbative approaches, including obfuscation, randomization, and differential privacy, to mitigate privacy attacks. While these methods have shown effectiveness, conventional perturbation-based techniques often offer generic protection, lacking the nuance needed to preserve specific utility and accuracy. These conventional techniques are typically purpose-unaware, meaning they modify data to protect privacy while maintaining general data usefulness. Recently, there has been a growing interest in purpose-aware techniques.
The thesis introduces purpose-aware privacy preservation in the form of a conceptual framework. This approach involves tailoring data modifications to serve specific purposes and implementing changes orthogonal to relevant features. We aim to protect user privacy without compromising utility. We focus on two key applications within the ML spectrum: recommender systems and machine learning classifiers. The objective is to protect these applications against potential privacy attacks, addressing vulnerabilities in both input data and output data (i.e., predictions).

We structure the thesis into two parts, each addressing distinct challenges in the ML pipeline.
Part 1 tackles attacks on input data, exploring methods to protect sensitive information while maintaining the accuracy of ML models, specifically in recommender systems. Firstly, we explore an attack scenario in which an adversary can acquire the user-item matrix and aims to infer privacy-sensitive information. We assume that the adversary has a gender classifier that is pre-trained on unprotected data. The objective of the adversary is to infer the gender of target individuals. We propose personalized blurring (PerBlur), a personalization-based approach to gender obfuscation that aims to protect user privacy while maintaining the recommendation quality. We demonstrate that recommender system algorithms trained on obfuscated data perform comparably to those trained on the original user-item matrix.
Furthermore, our approach not only prevents classifiers from predicting users' gender based on the obfuscated data but also achieves diversity through the recommendation of (non-stereotypical) diverse items. Secondly, we investigate an attack scenario in which an adversary has access to a user-item matrix and aims to exploit the user preference values that it contains. The objective of the adversary is to infer the preferences of individual users. We propose Shuffle-NNN, a data masking-based approach that aims to hide the preferences of users for individual items while maintaining the relative performance of recommendation algorithms. We demonstrate that Shuffle-NNN provides evidence of what information should be retained and what can be removed from the user-item matrix. Shuffle-NNN has great potential for data release, such as in data science challenges.

Part 2 investigates attacks on output data, focusing on model inversion attacks aimed at predictions from machine learning classifiers and examining potential privacy risks associated with recommender system outputs. Firstly, we explore a scenario where an adversary attempts to infer individuals' sensitive information by querying a machine learning model and receiving output predictions. We investigate various attack models and identify a potential risk of sensitive information leakage when the target model is trained on original data. To mitigate this risk, we propose to replace the original training data with protected data, namely synthetic training data combined with privacy-preserving techniques. We show that the target model trained on protected data achieves performance comparable to the target model trained on original data. We demonstrate that applying privacy-preserving techniques to synthetic training data yields a small reduction in the success of certain model inversion attacks measured over a group of target individuals. Secondly, we explore an attack scenario in which the adversary seeks to infer users' sensitive information by intercepting recommendations provided by a recommender system to a set of users. Our goal is to gain insight into possible unintended consequences of using user attributes as side information in context-aware recommender systems. We study the extent to which personal attributes of a user can be inferred from a list of recommendations to that user. We find that both standard recommenders and context-aware recommenders leak personal user information into the recommendation lists.
We demonstrate that using user attributes in context-aware recommendations yields a small gain in accuracy. However, the benefit of this gain is distributed unevenly among users and it sacrifices coverage and diversity. This leads us to question the actual value of side information and the need to ensure that there are no hidden 'side effects'.

The final chapter of the thesis summarizes our findings. It provides recommendations for future research directions which we think are promising for further exploring and promoting the use of purpose-aware privacy-preserving data for ML predictions.

BRISTLE: Decentralized Federated Learning in Byzantine, Non-i.i.d. Environments

Master thesis (2021) - J. Verbraeken, J.A. Pouwelse, M.A. Larson

Federated learning (FL) is a type of machine learning where devices locally train a model on their private data.
The devices iteratively communicate this model to a central server which combines the models and sends the updated model back to all devices.
Because the data stays on the devices and only the model is transmitted, federated learning is considered a privacy-friendly alternative to regular machine learning, where all data is transmitted over the internet.

However, the central server used in typical FL systems not only poses a single point of failure susceptible to crashes or hacks, but may also become a performance bottleneck. These issues are alleviated by decentralized FL (DFL), where the peers communicate model updates with each other instead of with a single server.

Unfortunately, DFL is challenging since (1) the training data possessed by different peers is often non-i.i.d. (i.e., distributed differently between the peers) and (2) malicious, or Byzantine, attackers can share arbitrary model updates with other peers to subvert the training process.

We address these two challenges and present Bristle, middleware between the learning application and the decentralized network layer.
Bristle leverages transfer learning to predetermine and freeze the non-output layers of a neural network, significantly speeding up model training and lowering communication costs.
To securely update the output layer with model updates from other peers, we design a fast distance-based prioritizer and a novel performance-based integrator.
The prioritizer ranks the model updates based on their distance to the peer's own model and an explore-exploit trade-off, and the integrator integrates each class of each model update separately based on its performance on a small set of i.i.d. test samples.
Their combined effect results in high resilience to Byzantine attackers and the ability to handle non-i.i.d. classes.

We empirically show that Bristle converges to a consistent 95% accuracy in Byzantine environments, outperforming all evaluated baselines. In non-Byzantine environments, Bristle requires 83% fewer iterations to achieve 90% accuracy compared to state-of-the-art methods. We show that when the training classes are non-i.i.d., Bristle exceeds the accuracy of the most Byzantine-resilient baselines by 2.3x while reducing communication costs by 90%.
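
The distance-based prioritization with an explore-exploit trade-off described above can be illustrated with a minimal sketch. This is not Bristle's actual prioritizer: the function name `prioritize`, the parameters `n_exploit` and `n_explore`, the use of Euclidean distance, and the random choice of exploration candidates are all assumptions made for illustration.

```python
import math
import random


def prioritize(own_model, updates, n_exploit, n_explore, rng=None):
    """Select model updates for integration (hypothetical sketch).

    own_model: flat list of floats (the peer's own output-layer weights).
    updates:   list of flat lists of floats received from other peers.
    Returns the n_exploit closest updates, followed by up to n_explore
    updates drawn at random from the remaining, more distant ones.
    """
    rng = rng or random.Random(0)
    # Rank updates by Euclidean distance to the peer's own model (assumed metric).
    ranked = sorted(updates, key=lambda u: math.dist(own_model, u))
    exploit = ranked[:n_exploit]
    remaining = ranked[n_exploit:]
    # Exploration: also look at a few distant updates, chosen at random.
    explore = rng.sample(remaining, min(n_explore, len(remaining)))
    return exploit + explore


own = [0.0, 0.0]
received = [[5.0, 5.0], [0.1, 0.0], [1.0, 1.0], [9.0, 9.0]]
selected = prioritize(own, received, n_exploit=2, n_explore=1)
```

In this sketch, the close updates play the role of "exploit" and the randomly drawn distant ones the role of "explore"; how the thesis balances the two is not specified in the abstract.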

The impact of image filters on machine classification and human perception of weather conditions

Master thesis (2019) - Charalampos Michail Valsamos, Martha Larson, Hayley Hung, Michael Riegler

Nowadays, with the growth of social media, users upload millions of photos to different online platforms. Researchers in the field of computer vision devote their time and effort to analyzing images in order to gain valuable insight. Data analysis and classification can be impeded by different factors. One of these is image filters, which are studied in this work. People greatly change the appearance of their photos by adding filters in order to make them more appealing. Instagram is arguably one of the most popular social media platforms online. With the platform’s growth, filtering images has also become more popular. In this thesis, a subset of Instagram filters has been selected in order to study their impact with a series of experiments. To our knowledge, the impact of image filters on machine classification and human perception has not been addressed in prior work. Image filters can create many challenges depending on the application they are used in. In this thesis, the focus is on the classification of weather conditions. Systems have been designed to receive images and accurately identify the weather conditions that exist in them solely using visual features and no prior knowledge. In weather forecasting, a lot of resources are spent in order to study past and current weather conditions so as to predict the state of the weather in the future. Gathering and documenting weather-related information can be aided by these systems. However, if researchers would like to use social images to extract insight, they need to change their approach accordingly. As documented in the following chapters, dealing with these photos can be problematic and can cause a huge decline in performance. For this reason, the algorithmic design has been changed by using different techniques inspired by the domain of Adversarial Machine Learning to measure their effect. In addition to machine classification, filters can influence human perception as well. A study is conducted that measures the impact filters have on the ability of humans to identify the weather conditions in images. From the quantitative and qualitative analysis of the results, several key findings are extracted regarding the effect of filters and the visual cues that are used by people. People have identified certain visual cues that have not been encoded in the classifier, such as the type of clothing people are wearing. Instead, much simpler features have been engineered, and the performance of the classifier is still quite high.

Comparative Analysis of Techniques for Data Minimization for Recommender System algorithms

Master thesis (2019) - Manoj Krishnaraj, Martha Larson

Recommender systems (RS) often use a large amount of data for a marginal gain in performance. This thesis investigates data minimization in recommender systems, which is not well studied in the literature. It extends the data minimization principles advocated in the GDPR and studies their effects on recommender systems. Minimizing data not only reduces storage and transmission requirements but also has the potential to improve privacy and increase training and prediction speeds. This thesis investigates the effects of reducing the amount of data used to model a recommender system. It evaluates the accuracy of the Biased Matrix Factorization (BMF) algorithm by varying the training data on the MovieLens 10 million ratings (ML-10M) dataset. In this thesis, four data minimization techniques were used. We reproduced one previous work and proposed three new data minimization techniques. In the first technique, we confirmed previous work concerning training data analysis, where the data outside the selected temporal window were dropped. The second data minimization technique, user profile truncation, retained the most recent N ratings for each user while truncating the historical ratings. The third technique improved user profile truncation by selectively truncating a percentage of users' historical ratings. In the fourth technique, a long user profile was split into smaller pseudo-user profiles. Analysis of the results is conducted. The most interesting results come from the third data minimization technique. Here, we show that truncating a percentage of the least recently active long user profiles does not damage the performance and may slightly help. The profiles of 60% of the long users can be truncated to 20 ratings with minimal impact on the performance. Based on the results, we conclude that a substantial amount of data can be dropped without a large impact on performance. The results hold for the ML-10M dataset and are expected to hold for other datasets.
The privacy implications of data minimization warrant future work. The proposed techniques serve as a guide for future research in data minimization of recommender systems.
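
The second technique, user profile truncation, can be illustrated with a minimal sketch that keeps only the N most recent ratings per user. The function name `truncate_profiles` and the `(user_id, item_id, rating, timestamp)` tuple layout are assumptions for illustration and are not taken from the thesis.

```python
from collections import defaultdict


def truncate_profiles(ratings, n):
    """Keep only the n most recent ratings of each user (hypothetical sketch).

    ratings: iterable of (user_id, item_id, rating, timestamp) tuples.
    Returns a list of the retained tuples.
    """
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)

    kept = []
    for rows in by_user.values():
        # Sort each profile from newest to oldest and drop the historical tail.
        rows.sort(key=lambda r: r[3], reverse=True)
        kept.extend(rows[:n])
    return kept


ratings = [
    ("u1", "a", 4.0, 1),
    ("u1", "b", 3.0, 2),
    ("u1", "c", 5.0, 3),
    ("u2", "d", 2.0, 1),
]
truncated = truncate_profiles(ratings, n=2)
```

Here user `u1` loses the oldest rating (item `a`), while `u2`, whose profile is already shorter than N, is left unchanged.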

Deep visual genre-aware descriptors for movie recommendation

Master thesis (2019) - Athanasios Dritsas, Martha Larson, Alessandro Bozzon, Mateo Gutierrez Granada

In recent years, the popularity of video-on-demand services has been constantly increasing, especially among young audiences, who are more adept at using new technologies. Through those platforms, viewers have access to a huge volume of movies at any moment, which makes the viewing decision a very challenging task for most of them. Recommender systems are employed by video-on-demand providers to address this challenge. We propose a novel movie recommender system that filters movies based on the genre-related visual elements of their trailers. The proposed system utilizes a 3D pre-trained deep ConvNet to extract spatio-temporal deep features from the trailers, which are then combined, through a Deep Bag of Segments (DBoS) pooling network, with the genre information of the movie to provide a single movie representation. The 3D deep visual genre-aware representation is exploited by a pure content-based filtering system to provide personalized recommendations to users. We conduct offline experiments with two datasets to evaluate the performance of our approach with respect to accuracy and beyond-accuracy metrics. We also conduct an online experiment in a real-world streaming platform to evaluate the user-perceived utility of the recommendations produced by a pure content-based recommender system using our proposed genre-aware movie descriptor against the same system using genre and visual 3D deep features. We conclude that a continuous genre representation, which reflects genre-specific visual elements of the movie, provides interesting results in the content-based movie recommendation task. Exploring its potential further could bring important benefits to various tasks in the movie domain. ...

Improving Recommender Systems Algorithms for Personalized Music Video Television by Incorporating User Consumption Behaviour and Multiple Types of User Feedback

Master thesis (2018) - Reza Reza Aditya Permadi, Martha Larson, Bouke Huurnink

This thesis explores the effects of incorporating user consumption behavior and multiple types of user feedback to improve recommender systems for personalized music video television. An industrial use case is made possible by the availability of anonymized user interaction data from a curation-based personalized music television system provided by XITE, a music video television broadcasting company in Amsterdam. The characteristics of the curation-based system motivate us to explore the effects of user behavior and feedback on two tasks: session reranking and 'like' prediction. For the session reranking task, an improvement in terms of Mean Average Precision (MAP) is achieved by leveraging behavior toward playback of repeated item consumption, together with the implicit user preference inferred from the personalized average playback ratio for each video. Three types of feedback are used for the 'like' prediction task: explicit feedback when a user presses like on a video, and implicit feedback in the form of skipping a video and watching a video completely. A multi-level sampler within the Bayesian Personalized Ranking algorithm is used to exploit those types of feedback, and an improvement is obtained compared to using only one type of explicit feedback. Finally, considering the common behavior that people often turn on the television while not actively paying attention to it, we show that performing a heuristic cut-off, by only considering as positive implicit feedback a few music videos watched completely after an active action is taken by the user on the system, could improve the MAP compared to assuming positive implicit feedback for all videos watched completely by the user. ...
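
For readers unfamiliar with the metric, Mean Average Precision (MAP) over a set of ranked sessions can be computed as in the following minimal sketch. The function names and the normalization by the number of relevant items are assumptions (one common convention); the abstract does not state which MAP variant the thesis uses.

```python
def average_precision(ranked_items, relevant):
    """Average precision of one ranked list against a set of relevant items."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item occurs.
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(sessions):
    """Mean of the average precision over (ranked_items, relevant) pairs."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in sessions]
    return sum(scores) / len(scores) if scores else 0.0


sessions = [
    (["a", "b", "c"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),            # AP = (1/2) / 1
]
map_score = mean_average_precision(sessions)
```

A reranking that moves relevant videos toward the top of a session raises the per-session average precision and therefore the MAP.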

Intent-Aware Diverse Social Image Retrieval

Master thesis (2018) - Bo Wang, Martha Larson

Behind each photographic act is a rationale that impacts the visual appearance of the resulting photo. Better understanding of this rationale has great potential to support image retrieval systems in serving user needs. However, at present, surprisingly little is known about the connection between what a picture shows (the literally depicted conceptual content) and why that picture was taken (the photographer intent). In the thesis, we investigate photographer intent in a large Flickr data set. First, an expert annotator carries out a large number of iterative intent judgments to create a taxonomy of intent classes. Next, analysis of the distribution of concepts and intent classes reveals patterns of independence both at a global and user level. Finally, we report the results of experiments showing that a deep neural network classifier is capable of learning to differentiate between these intent classes, and that these classes support the diversification of image search results. ...

An Ensemble Approach for News Recommendation Based on Contextual Bandit Algorithms

Master thesis (2017) - Yu Liang, Martha Larson

News recommendation differs from traditional recommendation fields. News articles are created and deleted continuously and have a very short life cycle. Users' preferences are also hard to model, since users can easily be attracted by things happening around them. Given these challenges, traditional recommendation approaches, such as content-based filtering and collaborative filtering, do not work well in the field. Simple recency-based or popularity-based recommenders do work well. However, even the recommender with the highest performance has its limitations. In this work, we build an ensemble model to combine the power of different recommenders. We build a delegation model on top of several news recommenders based on various contextual bandit algorithms (a combination of multi-armed bandit algorithms and context information). The delegation model is responsible for delegating recommendation requests to the appropriate recommender with the aim of maximizing the Click Through Rate (CTR) of the recommendations, and it can be updated continuously with users' feedback. We evaluate the performance of our delegation-model-based recommender in both online and offline scenarios with the evaluation methods provided by the CLEF-NEWSREEL Challenge 2017. Furthermore, we evaluate the response time of our delegation model to see whether it is feasible to run online. The results show that our proposed delegation model can choose the appropriate recommender to serve the incoming requests each time, improves its performance regarding CTR, and is feasible to run in real-world settings. Additionally, we evaluate our delegation-model-based recommender with another evaluation metric, catalog coverage. In our future work, we would like to combine more recommenders and explore more context features to further improve CTR. ...
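
The idea of a delegation model that routes each request to one of several recommenders and learns from click feedback can be illustrated with a minimal sketch. It uses a plain epsilon-greedy strategy with per-context CTR estimates, which is a simplification swapped in for illustration and not one of the contextual bandit algorithms evaluated in the thesis. The class name `EpsilonGreedyDelegator`, the recommender names, and the context labels are hypothetical.

```python
import random
from collections import defaultdict


class EpsilonGreedyDelegator:
    """Delegates requests to the recommender with the best observed CTR
    for the given context, exploring at random with probability epsilon."""

    def __init__(self, recommenders, epsilon=0.1, seed=0):
        self.recommenders = list(recommenders)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = defaultdict(int)   # (context, recommender) -> requests served
        self.clicks = defaultdict(int)  # (context, recommender) -> clicks received

    def ctr(self, context, name):
        shown = self.shows[(context, name)]
        return self.clicks[(context, name)] / shown if shown else 0.0

    def choose(self, context):
        # Explore: occasionally try a random recommender.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.recommenders)
        # Exploit: pick the recommender with the highest estimated CTR.
        return max(self.recommenders, key=lambda name: self.ctr(context, name))

    def update(self, context, name, clicked):
        # Continuous update from user feedback.
        self.shows[(context, name)] += 1
        self.clicks[(context, name)] += int(clicked)


delegator = EpsilonGreedyDelegator(["recency", "popularity"], epsilon=0.0)
delegator.update("morning", "popularity", clicked=True)
delegator.update("morning", "recency", clicked=False)
chosen = delegator.choose("morning")
```

With `epsilon=0.0` the sketch always exploits, so after the two feedback events above it delegates "morning" requests to the recommender that received the click.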

Question classification according to Bloom's Revised Taxonomy

Bachelor thesis (2017) - Joe Harrison, Olivier Dikken, Dennis van Peer, Claudia Hauff, Huijuan Wang, Martha Larson

FeedbackFruits is a company that provides tools for educators to organize their courses. The company is currently working on aiding teachers in aligning course material and assessment. Aligning the two provides students with clear expectations and can lead to an increase in learning [1]. Aligning course material and assessment is usually done by comparing what the students are taught to how students are assessed. When a student is assessed by an exam consisting of questions, the alignment process involves classifying these questions according to the cognitive process categories needed to answer them [2]. This process can be time-consuming if an exam contains many questions, and it can be easy to lose sight of whether the questions in the assessment are representative of what is taught in the course material. The task of classifying questions into categories that represent the cognitive processes needed to answer them can be facilitated by providing a classification tool. This tool also gives educators insight by displaying a summary of the different question categories present in a set of questions. As part of the solution to the problem of course alignment, FeedbackFruits requested the development of a question classifier that classifies questions according to the cognitive process required to answer them. Bloom’s revised taxonomy (subsection 2.1.1) is a taxonomy that categorizes questions and learning objectives into six distinct classes in the cognitive process domain. We propose a software solution that uses machine learning techniques to classify a course’s questions and provides a clear overview of the classes in Bloom’s revised taxonomy present in these courses. To achieve this, we built a training set and a test set by combining a preexisting labeled dataset from Anwar Ali Yahya, Addin Osama, et al. [5] and a self-labeled dataset of over 1500 samples. We engineered a set of features specific to short text samples and questions. We adopted an experimental approach in selecting the classifier model: we tested several different models throughout the project and picked the best-performing models as the final step. Bloom’s taxonomy is often presented with lists of class-specific keywords. To set a baseline for comparison with our model, we replicated a study [3] that makes use of keywords indicative of the classes in Bloom’s taxonomy. We ran our model and the baseline model on the same test set. Our model achieved an accuracy of 75%, compared to 40% for the baseline model.
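A keyword-based baseline of the kind mentioned in this abstract might look roughly like the sketch below. The six class names are those of Bloom's revised taxonomy. The keyword lists, the function names, and the tie-breaking rule are illustrative assumptions and are not taken from the thesis or from the replicated study.

```python
import re

# Assumption: small illustrative keyword lists, not the lists used in study [3].
BLOOM_KEYWORDS = {
    "Remember": {"define", "list", "name", "recall", "state"},
    "Understand": {"explain", "summarize", "describe", "paraphrase"},
    "Apply": {"apply", "calculate", "solve", "use", "demonstrate"},
    "Analyze": {"analyze", "compare", "contrast", "differentiate"},
    "Evaluate": {"evaluate", "justify", "critique", "assess", "argue"},
    "Create": {"design", "create", "compose", "construct", "propose"},
}


def classify_question(question):
    """Return the class with the most keyword hits, or None if nothing matches."""
    tokens = re.findall(r"[a-z]+", question.lower())
    scores = {
        label: sum(token in keywords for token in tokens)
        for label, keywords in BLOOM_KEYWORDS.items()
    }
    # Ties are broken by dictionary order (an arbitrary choice for this sketch).
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def accuracy(questions, labels):
    """Fraction of questions whose predicted class equals the gold label."""
    correct = sum(classify_question(q) == y for q, y in zip(questions, labels))
    return correct / len(labels)
```

A baseline like this cannot label questions that contain none of the listed keywords, which is one plausible reason a learned classifier with engineered features could outperform it.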

Limits on Modeling Compensation in Multimodal DNNs for Audio Visual Speech Recognition

Master thesis (2017) - Sreejith Chandrasekharan Nair, Alessio Bazzica, Martha Larson, Cynthia Liem, Jan van Gemert

Speech is a natural way of communicating that does not require us to develop any new skills in order to interact with electronic devices. With the evolution of technology, speech has become one of the primary means of communication. Speech recognition is a form of multimedia content analysis, in which the information carried in a speech signal is transcribed into a character string. Information in the real world is perceived via several input channels, and each modality conveys additional information about a real-world concept. Likewise, the perception of speech in the human brain is bimodal in nature: we combine information from both visual and audio modalities to disambiguate speech. The system studied here is a multimodal speech recognition system in which the features are generated by correlating visual and speech modalities using a multimodal Deep Belief Network. This thesis reproduces this system and explores several aspects of its performance related to the real-life conditions under which speech must be recognized. Since the limitations of multimodal deep learning approaches are not well understood, we would like to gain insight into how closely such systems resemble humans in their ability to leverage multimodality. The experiments carried out in our study demonstrate that the visual modality complements the speech modality, providing information such as place of articulation. Further studies are performed on the system to shed light on the limits of such a multimodal Deep Neural Network for Audio-Visual speech recognition. In real life, Audio-Visual speech recognition systems will encounter perturbations such as reverberation and visual occlusion. The behavior of this system is analysed in a simulated environment replicating such real-life surroundings. Further, a study is performed to see the effect of the visual modality on the recognition of phonemes, which are the basic building blocks of speech.
The study conducted in this thesis supports the conclusion that the multimodal Deep Neural Network is far from achieving human-like performance in the presence of perturbations. This demonstrates the need for more research on the robustness of multimodal Deep Neural Networks in real-life scenarios.

Optimizing Content-Based Image Retrieval for Geolocation Estimation

Master thesis (2017) - Yiran Liu, Martha Larson

Incorporating Crowd Perspectives into Multimedia Retrieval Systems

Doctoral thesis (2017) - Raynor Vliegendhart, Alan Hanjalic, Martha Larson

The twenty-first century has brought plentiful computational power and bandwidth to the masses and has opened up access to multimedia recording devices for everyone. With these developments, a shift in the landscape of multimedia took place: from traditional one-to-many programming (the paradigm of traditional television) to many-to-many creation of diverse content. Nowadays, everyone can become a content creator and connect with new audiences, which has resulted in an explosion of diverse and available multimedia content. In tandem with this change, user needs have evolved as well. Yet, existing multimedia retrieval systems have been struggling to keep up with what users are looking for.

In this thesis, we argue that a multi-perspective approach is needed in order to cater to a diverse range of user needs. In order to know which perspectives should be taken, we turn to the crowd as a source of information on which perspectives would actually be helpful for serving users of multimedia retrieval systems. The central question underlying the research presented in this thesis is: How can we incorporate these perspectives of the crowd into multimedia retrieval systems?

The first major part of the thesis consists of the development of methodologies for effectively addressing the crowd in crowdsourcing studies. It first introduces the concept of framing. Framing allows people to picture a particular scenario that helps them to understand the task at hand and thus leads to high-quality answers. Following the framing methodology, the focus shifts to the refinement of elicitation techniques in order to effectively model the common understanding of a particular topic. The methodologies presented in this first part are shown to be useful in informing the design of new features for a multimedia retrieval system.

The second major part of the thesis builds upon the methodologies developed in the first part and uses them to advance research on non-linear video access, i.e., supporting users in consuming relevant parts of a video, in two ways. First, in a carefully designed crowdsourcing experiment, user comments referring to specifically mentioned time-points in a video are analyzed to build a crowd-informed typology that captures new dimensions of relevance at the time-code level. The usefulness of this typology is tested through a crowdsourced user study on a simulated search scenario. Second, a methodology is developed for obtaining realistic viewing behaviors through crowdsourcing experiments, which can be used in designing and testing new non-linear video access methods. This methodology stresses the importance of not only properly framing the crowdsourcing task, but also jointly choosing the crowd and the multimedia domain, in order to observe behavior that resembles the behavior participants would normally exhibit outside of the experiment. The methodology is shown to capture implicit viewing behavior that can be used to support users in non-linearly accessing videos.

The final contributions of the thesis consist of practical pointers for future work and a set of open research questions pertaining to crowdsourcing tasks of an interpretive nature. The practical pointers for future work are informed by experience gained through the various crowdsourcing campaigns carried out throughout the thesis. Addressing these pointers will help make crowdsourcing research more effective and reduce the effort needed to carry out experiments. The set of open research questions is formulated by positioning this thesis in relation to prior related work. These questions serve as a starting point for future research on interpretive crowdsourcing tasks, and pursuing them could aid the development of retrieval systems with multiple perspectives on multimedia.
