M.A. Larson
53 records found
Machine Learning Meets Data Modification
The Potential of Pre-processing for Privacy Enhancement
We explore how data modification can enhance privacy by examining the connection between data modification and machine learning. Specifically, machine learning “meets” data modification in two ways. First, data modification can protect the data that is used to train machine learning models, focusing it on the intended use and inhibiting unwanted inference. Second, machine learning can provide new ways of creating modified data. In this chapter, we discuss data modification approaches, applied during data pre-processing, that are suited for online data sharing scenarios. Specifically, we define two scenarios, “User data sharing” and “Data set sharing”, and describe the threat models associated with each scenario and related privacy threats. We then survey the landscape of privacy-enhancing data modification techniques that can be used to counter these threats. The picture that emerges is that data modification approaches hold promise to enhance privacy, and can be used alongside conventional cryptographic approaches. We close with an outlook on future directions focusing on new types of data, the relationship among privacy, and the importance of taking an interdisciplinary approach to data modification for privacy enhancement.
When Machine Learning Models Leak
An Exploration of Synthetic Training Data
We investigate an attack on a machine learning classifier that predicts the propensity of a person or household to move (i.e., relocate) in the next two years. The attack assumes that the classifier has been made publicly available and that the attacker has access to information about a certain number of target individuals. The attacker might also have information about another set of people to train an auxiliary classifier. We show that the attack is possible for target individuals independently of whether they were contained in the original training set of the classifier. However, the attack is somewhat less successful for individuals who were not contained in the original data. Based on this observation, we investigate whether training the classifier on a data set that is synthesized from the original training data, rather than using the original training data directly, would help to mitigate the effectiveness of the attack. Our experimental results show that it does not, leading us to conclude that new approaches to data synthesis must be developed if synthesized data is to resemble “unseen” individuals to an extent great enough to help to block machine learning model attacks.
Social Signals and Multimedia
Past, Present, Future
The rising popularity of Artificial Intelligence (AI) has brought considerable public interest as well as faster and more direct transfer of research ideas into practice. One of the aspects of AI that still trails behind considerably is the role of machines in interpreting, enhancing, modeling, generating, and influencing social behavior. Such behavior is captured as social signals, usually by sensors recording multiple modalities, making it classic multimedia data. Such behavior can also be generated by an AI system when interacting with humans. AI techniques in combination with multimedia data can be used to pursue multiple goals, two of which are highlighted here. First, supporting people during social interactions and helping them to fulfil their social needs either actively or passively. Second, improving our understanding of how people collaborate, build relationships, and process self identity. Despite the rise of fields such as Social Signal Processing, a similar panel organised at ACM Multimedia 2014, and an area on social and emotional signals at ACM MM since 2014, we argue that we have yet to truly fulfil the potential of combining social signals and multimedia. This panel asks where we have come far enough and what remaining challenges there are in light of recent global events.
Towards user-oriented privacy for recommender system data
A personalization-based approach to gender obfuscation for user profiles
In this paper, we propose a new privacy solution for the data used to train a recommender system, i.e., the user–item matrix. The user–item matrix contains implicit information, which can be inferred using a classifier, leading to potential privacy violations. Our solution, called Personalized Blurring (PerBlur), is a simple, yet effective, approach to adding and removing items from users’ profiles in order to generate an obfuscated user–item matrix. The novelty of PerBlur is personalization of the choice of items used for obfuscation to the individual user profiles. PerBlur is formulated within a user-oriented paradigm of recommender system data privacy that aims at making privacy solutions understandable, unobtrusive, and useful for the user. When obfuscated data is used for training, a recommender system algorithm is able to reach performance comparable to what is attained when it is trained on the original, unobfuscated data. At the same time, a classifier can no longer reliably use the obfuscated data to predict the gender of users, indicating that implicit gender information has been removed. In addition to introducing PerBlur, we make several key contributions. First, we propose an evaluation protocol that creates a fair environment to compare between different obfuscation conditions. Second, we carry out experiments that show that gender obfuscation impacts the fairness and diversity of recommender system results. In sum, our work establishes that a simple, transparent approach to gender obfuscation can protect user privacy while at the same time improving recommendation results for users by maintaining fairness and enhancing diversity.
From intra-modal to inter-modal space
Multi-task learning of shared representations for cross-modal retrieval
Learning a robust shared representation space is critical for effective multimedia retrieval, and is increasingly important as multimodal data grows in volume and diversity. The labeled datasets necessary for learning such a space are limited in size and also in coverage of semantic concepts. These limitations constrain performance: a shared representation learned on one dataset may not generalize well to another. We address this issue by building on the insight that, given limited data, it is easier to optimize the semantic structure of a space within a modality, than across modalities. We propose a two-stage shared representation learning framework with intra-modal optimization and subsequent cross-modal transfer learning of semantic structure that produces a robust shared representation space. We integrate multi-task learning into each step, making it possible to leverage multiple datasets, annotated with different concepts, as if they were one large dataset. Large-scale systematic experiments demonstrate improvements over previously reported state-of-the-art methods on cross-modal retrieval tasks.
Up close, but not too personal
Hypotargeting for recommender systems
Hypotargeting for recommender systems (hyporec) is the idea of controlling the number of unique lists of items that a recommender system can recommend to users during a given time period. The main advantage of hyporec is oversight. If a recommender system offers only a finite number of unique lists, then it becomes feasible for a person without technological knowledge to audit the recommender system. Oversight makes it possible to spot filter bubbles or cases in which users are being bombarded with divisive content. We argue that hyporec is actually not so far from many existing recommender system ideas, and that with further research hyporec systems could be capable of making good tradeoffs between the number of unique lists, rate of list renewal (which controls coverage), and conventional evaluation metrics for user satisfaction.
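As a rough illustration of the oversight argument, the following sketch counts the distinct recommendation lists served in a period and caps that number by assigning each user to one of a fixed set of candidate lists. The function names, the overlap-based assignment rule, and the toy data are assumptions made for illustration only; the abstract does not specify how a hyporec system would choose its lists.

```python
from typing import Dict, List, Sequence, Tuple


def count_unique_lists(served: Dict[str, Sequence[str]]) -> int:
    """Return the number of distinct ordered item lists served to users."""
    return len({tuple(items) for items in served.values()})


def hypotarget(
    personalized: Dict[str, Sequence[str]],
    candidate_lists: List[Tuple[str, ...]],
) -> Dict[str, Tuple[str, ...]]:
    """Assign each user the candidate list that shares the most items with
    their fully personalized list (hypothetical rule, not from the paper).
    Ties go to the earliest candidate."""
    assigned = {}
    for user, items in personalized.items():
        wanted = set(items)
        assigned[user] = max(candidate_lists, key=lambda c: len(wanted & set(c)))
    return assigned


# Toy data: four users, each with a different personalized list.
personalized = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "c", "d"],
    "u3": ["x", "y", "z"],
    "u4": ["y", "z", "w"],
}
# An auditor only has to inspect these two lists.
candidates = [("a", "b", "c"), ("x", "y", "z")]
served = hypotarget(personalized, candidates)

print(count_unique_lists(personalized), "unique lists before hypotargeting")
print(count_unique_lists(served), "unique lists after hypotargeting")
```

The number of candidate lists is the quantity an auditor would need to inspect; how to trade it off against user satisfaction is the open question the abstract raises.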
In this paper, we investigate the connection between how people understand speech and how speech is understood by a deep neural network. A naïve, general feed-forward deep neural network was trained for the task of vowel/consonant classification. Subsequently, the representations of the speech signal in the different hidden layers of the DNN were visualized. The visualizations allow us to study the distance between the representations of different types of input frames and observe the clustering structures formed by these representations. In the different visualizations, the input frames were labeled with different linguistic categories: sounds in the same phoneme class, sounds with the same manner of articulation, and sounds with the same place of articulation. We investigate whether the DNN clusters speech representations in a way that corresponds to these linguistic categories and observe evidence that the DNN does indeed appear to learn structures that humans use to understand speech without being explicitly trained to do so.
Remembering winter was coming
Character-oriented video summaries of TV series
Today’s popular TV series tend to develop continuous, complex plots spanning several seasons, but are often viewed in controlled and discontinuous conditions. Consequently, most viewers need to be re-immersed in the story before watching a new season. Although discussions with friends and family can help, we observe that most viewers make extensive use of summaries to re-engage with the plot. Automatic generation of video summaries of TV series’ complex stories requires, first, modeling the dynamics of the plot and, second, extracting relevant sequences. In this paper, we tackle plot modeling by considering the social network of interactions between the characters involved in the narrative: substantial, durable changes in a major character’s social environment suggest a new development relevant for the summary. Once identified, these major stages in each character’s storyline can be used as a basis for completing the summary with related sequences. Our algorithm combines such social network analysis with filmmaking grammar to automatically generate character-oriented video summaries of TV series from partially annotated data. We carry out evaluation with a user study in a real-world scenario: a large sample of viewers were asked to rank video summaries centered on five characters of the popular TV series Game of Thrones, a few weeks before the new, sixth season was released. Our results reveal the ability of character-oriented summaries to re-engage viewers in television series and confirm the contributions of modeling the plot content and exploiting stylistic patterns to identify salient sequences.
BlUrM(or)e
Revisiting gender obfuscation in the user-item matrix
Past research has demonstrated that removing implicit gender information from the user-item matrix does not result in substantial performance losses. Such results point towards promising solutions for protecting users’ privacy without compromising prediction performance, which are of particular interest in multistakeholder environments. Here, we investigate BlurMe, a gender obfuscation technique that has been shown to block classifiers from inferring binary gender from users’ profiles. We first point out a serious shortcoming of BlurMe: Simple data visualizations can reveal that BlurMe has been applied to a data set, including which items have been impacted. We then propose an extension to BlurMe, called BlurM(or)e, that addresses this issue. We reproduce the original BlurMe experiments with the MovieLens data set, and point out the relative advantages of BlurM(or)e.
Progress in the autonomous analysis of human behavior from multimodal information has led to very effective methods able to deal with problems like action/gesture/activity recognition, pose estimation, opinion mining, user tailored retrieval, etc. However, it is only recently that the community has started to look into related problems associated with more complex behavior, including personality analysis and deception detection, among others. We organized an academic contest co-located with ICPR2018 running two tasks in this direction. On the one hand, we organized an information fusion task in the context of multimodal image retrieval in social media. On the other hand, we ran another task in which we aim to infer personality traits from written essays, including textual and handwritten information. This paper describes both tasks, detailing for each of them the associated problem, data sets, evaluation metrics and protocol, as well as an analysis of the performance of simple baselines.
Privacy and Audiovisual Content
Protecting Users as Big Multimedia Data Grows Bigger
This chapter discusses the relationship between privacy and algorithms that make use of large amounts of multimedia data. As users continue to post their audiovisual content online, and as companies continue to collect user profiles and interaction data, concerns about privacy are becoming increasingly urgent. The chapter focuses on multimedia algorithms, but looks beyond a purely technical approach to privacy. It explains what must be done to protect users’ privacy. The chapter explores the particular privacy challenges raised by multimedia, and specifically by big multimedia data. It presents example techniques and algorithms. The chapter provides an outlook for the next steps for multimedia privacy research. It shows cybercasing as a motivating example in order to illustrate the importance of privacy. The chapter then focuses on personal information. Personal information that must be protected is referred to as sensitive information.
Data masking for recommender systems
Prediction performance and rating hiding
Data science challenges allow companies, and other data holders, to collaborate with the wider research community. In the area of recommender systems, the potential of such challenges to move forward the state of the art is limited due to concerns about releasing user interaction data. This paper investigates the potential of privacy-preserving data publishing for supporting recommender system challenges. We propose a data masking algorithm, Shuffle-NNN, with two steps: Neighborhood selection and value swapping. Neighborhood selection preserves valuable item similarity information. The data shuffling technique hides (i.e., changes) ratings of users for individual items. Our experimental results demonstrate that the relative performance of algorithms, which is the key property that a data science challenge must measure, is comparable between the original data and the data masked with Shuffle-NNN.
The Conversation Continues
The Effect of Lyrics and Music Complexity of Background Music on Spoken-Word Recognition
Background music in social interaction settings can hinder conversation. Yet, little is known of how specific properties of music impact speech processing. This paper addresses this knowledge gap by investigating the effect of 1) the complexity of the background music, and 2) the presence versus absence of sung lyrics on spoken-word recognition in background music. To answer these questions, a word identification experiment was run in which Dutch participants listened to Dutch CVC words embedded in stretches of background music in four conditions: low/high complexity and with lyrics/music-only, and at three SNRs. Music stretches with and without lyrics were sampled from the same song in order to control for factors beyond the complexity of the music and the presence of lyrics. The results showed a clear negative impact of more complex music and the presence of lyrics in background music on spoken-word recognition. The results open a path for future work, and suggest that social spaces (e.g., restaurants, cafés and bars) should make careful choices of music to promote conversation.
In this paper, we focus on event detection over the timeline of a music track. Such technology is motivated by the need for innovative applications such as searching, non-linear access and recommendation. Event detection over the timeline requires time-code level labels in order to train machine learning models. We use timed comments from SoundCloud, a modern social music sharing platform, to obtain these labels. While in this way the need for tedious and time-consuming manual labeling can be reduced, the challenge is that timed comments are subject to additive temporal noise, as they are in the temporal neighborhood of the actual events. We investigate the utility of such noisy timed comments as training labels through a case study, in which we investigate three types of events in Electronic Dance Music (EDM): drop, build and break. These socially significant events play a key role in an EDM track's unfolding and are popular in social media circles. They are therefore not only interesting for detection, but also typically accompanied by timed comments resulting from the online social activity around them. We propose a two-stage learning method that relies on noisy timed comments and, given a music track, marks the events on the timeline. In the experiments, we focus in particular on investigating to what extent noisy timed comments can replace manually added expert labels. The conclusions we draw during this study provide useful insights that motivate further research in the field of event detection.
We propose an image representation and matching approach that substantially improves visual-based location estimation for images. The main novelty of the approach, called distinctive visual element matching (DVEM), is its use of representations that are specific to the query image whose location is being predicted. These representations are based on visual element clouds, which robustly capture the connection between the query and visual evidence from candidate locations. We then maximize the influence of visual elements that are geo-distinctive because they do not occur in images taken at many other locations. We carry out experiments and analysis for both geo-constrained and geo-unconstrained location estimation cases using two large-scale, publicly available datasets: the San Francisco Landmark dataset with 1.06 million street-view images and the MediaEval'15 Placing Task dataset with 5.6 million geo-tagged images from Flickr. We present examples that illustrate the highly transparent mechanics of the approach, which are based on commonsense observations about the visual patterns in image collections. Our results show that the proposed method delivers a considerable performance improvement compared to the state of the art.
CitRec 2017
International Workshop on Recommender Systems for Citizens
The "International Workshop on Recommender Systems for Citizens" (CitRec) is focused on a novel type of recommender system, both in terms of ownership and purpose: recommender systems run by citizens and serving society as a whole.
Multimodal Video-to-Video Linking
Turning to the Crowd for Insight and Evaluation