Aesthetics in Visual Training Datasets

Abstract

Correctly processing accumulated information is beneficial for our survival. Berghman and Hekkert (2017) argue that this is why we humans derive pleasure from having a sense of aesthetics. These aesthetic experiences can be seen as our brain’s reward system for correctly perceiving and interpreting the world around us. While our senses have evolved to perceive and organise the physical world, these very mechanisms also come into play when we interact with the digital realm. Aesthetics in visual training datasets matter because they allow us to derive a sense of aesthetic pleasure from digital media. Integrating aesthetics into artificial intelligence, especially in text-to-image generators, is therefore important for catering to humans’ psychological reward systems and engaging them at a deeper level.

This thesis investigates the annotation method used in the development of the LAION-Aesthetics V2 datasets and compares it to other annotation methods for measuring aesthetics. In the current method, people are asked to annotate images with the instruction “how much do you like this image on a scale from 1 to 10?” (Schuhmann, 2022), an instruction that is not backed by literature as actually measuring aesthetics. The purpose is to explore whether there are more suitable alternatives to this method, and to evaluate the alignment between the LAION Aesthetics Predictor scores and human ratings.

This thesis explores two distinct lines of inquiry: one focuses on the design of instructions for image annotation tasks (alternative task design), while the other centers on measuring aesthetics during the annotation process (alternative metrics). Both lines of inquiry are supported by relevant literature, indicating their potential capacity to capture aesthetics. In addition to comparing alternative annotation methods, this thesis investigates three hypotheses related to the annotation of aesthetics within the project’s context.

Four experiments are conducted using crowdsourcing to compare alternative task designs and alternative metrics. The experiments include semantic concept activation, different phrasings of the annotation instruction, and alternative modalities (such as ranking and two-alternative forced choice). In addition to these four experiments, a separate fifth experiment examines the evaluation of image content versus overall image liking. Two post hoc analyses are performed: one comparing the scores that the LAION Aesthetics Predictor assigns to the stimulus set with human image liking ratings, and one examining the influence of region on image liking ratings.

The LAION-Aesthetics approach performed on par with the alternatives that have scientific backing, while the ranking treatment performed worse. For this data, region did not impact image liking ratings. No significant difference was found between participants’ overall image liking and content liking. The LAION Aesthetics Predictor scores partially aligned with human liking ratings but showed some disparities, particularly in extreme ratings. Qualitative analysis suggests that more research is necessary to judge whether “liking” is a relevant and appropriate approach for capturing aesthetics.

The limitations of the experiments include small sample sizes and the focus on a specific image class (buildings). Recommendations for future research include exploring different image classes, investigating other ranking modalities, and considering n-alternative forced choice experiments. Further suggestions are to examine the influence of region on aesthetic experiences in more detail, to explore Gibbs Sampling with People for measuring image aesthetics, and to study different demographic groups and contexts.