Y. Zhao

11 records found

Comparing physically based renderings and generative AI images through material perception

Journal article (2026) - Yuguang Zhao, Jeroen Stumpel, Huib de Ridder, Jan Jaap R. van Assen, Maarten W.A. Wijntjes
Generative artificial intelligence (AI) models unlock new ways to create images, emerging as a new medium alongside paintings, photographs, physically based renderings (PBR), etc. Generative AI images can be perceptually convincing without being physically plausible, which allows us to investigate the boundaries of visual perception. This study examines whether generative AI images adhere to a medium-independent perceptual space on which previous studies have converged. We compared the perceptual similarity of images from three generative AI models against a PBR image dataset based on bidirectional reflectance distribution functions (BRDFs), using human similarity judgments. In experiment 1, we used the text descriptions of 32 materials (e.g., blue acrylic) from the Mitsubishi Electric Research Laboratories (MERL) BRDF dataset, prompting two text-to-image models, DALL-E 2 and Midjourney v2, to generate 32 sphere-shaped stimuli per model. Perceptual spaces derived from similarity judgments revealed that both AI models resulted in two-dimensional spaces whereas the MERL space was confined to one dimension, probably owing to a lack of surface texture. These unrelated perceptual spaces suggest the AI models generated unique and different images from identical text prompts. In experiment 2, we used the text-to-image model Stable Diffusion v1.5 with ControlNet for additional depth-map constraints. Using the same 32 descriptions, we generated three sets using three different depth maps. The three resulting perceptual spaces are all two-dimensional and highly similar to one another, indicating a robust and non-random structure. They also show a similar structure to the MERL space and perceptual spaces from other material studies using photographs, PBR, and depictions, suggesting AI-generated imagery may indeed be used as a new medium to explore material perception. ...

Material perception and depiction across different styles and media

Doctoral thesis (2025) - Y. Zhao, M.W.A. Wijntjes, H. de Ridder
In contemporary society, we are surrounded not only by physical materials, but also by images of them. We are capable of judging materials and their properties from visual information alone. For instance, we can tell whether an object looks solid and glossy or soft and fluffy. This ability is called material perception. As for images, there are various ways of image making, such as photography, painting, computer rendering, etc. A new method has also emerged recently: generative AI. All these image generation methods can produce different appearances of the same object or material. In this thesis, we studied human visual perception of two types of appearance, appearance as material property and appearance as pictorial style, as well as the interaction between them.

In Chapter 2, we investigated depiction style by zooming in on a single motif, an apple. By using the fragments instead of the whole painting, we were able to keep the subject matter relatively constant, and isolate style from composition as well as other contextual information. We first constructed a perceptual space of style using similarity judgements from online participants. Then we fitted perceived attributes to this space to understand its dimensions. The data resulted in a three-dimensional space. Dimension 1 is associated with smoothness and brushstroke coarseness. Dimensions 2 and 3 are related to hue and chroma. Surprisingly, we also found a rotational relation between creation year and the first two dimensions, revealing a certain cyclic, repetitive pattern of style. The results suggest style can already be perceived in fragments of paintings.

In Chapter 3, we studied the influence of medium on appearance. For example, imagine an oil-painted apple and a pencil-sketched apple: they can have different appearances. The comparison between different media has rarely been studied. One possible reason is the difficulty of isolating medium from its confounding factor, subject matter. We found a solution by comparing oil paintings and their engraved reproductions. The identical content gave us a perfect opportunity to compare material perception from two distinct media. We collected 15 pairs, consisting of 88 fragments depicting different materials like fabric, skin, wood and metal. We also created three manipulations to understand the effect of color (a grayscale version) and contrast (equalized histograms towards both painting and engraving). We collected ratings on five attributes: three-dimensionality, glossiness, convincingness, smoothness and softness. Paintings showed a broader perceived range than engravings, with contrast equalization having a greater impact on perception than color removal. Possibly engravers used local contrast to compensate for the absence of color.

In Chapter 4, we analyzed an emerging medium from a non-human creator, generative AI. In two experiments, we explored human material perception using generative AI stimuli and compared the perceptual spaces of three generative AI models, as well as a computer-generated BRDF stimulus set, the MERL dataset. In Experiment 1, we used text descriptions of 32 materials from MERL (e.g. ‘green fabric’) as prompts for DALL-E 2 and Midjourney v2. Both AI models resulted in a 2D space while MERL resulted in a 1D one. The three spaces showed low similarity, suggesting the AI models generated unique and different images of materials from identical text prompts. In Experiment 2, we explored another text-to-image model, Stable Diffusion v1.5, with an add-on, ControlNet. ControlNet allowed us to add graphical constraints besides text input. In this way we could inspect more complex shapes. We kept the same 32 descriptions and generated material blobs in three shapes, from simple to more complex geometry. The three perceptual spaces from the three shapes showed high similarity, indicating both a robust structure and a minor influence of object shape on material perception. Interestingly, the perceptual spaces from Experiment 2 also shared a similar structure with perceptual spaces from other material studies using real-world photos, computer renderings and depictions. In sum, we investigated visual perception through the lens of art by examining appearances rendered by painters, engravers and generative AIs. ...
Abstract (2025) - M.W.A. Wijntjes, Y. Zhao
Large Multimodal Models can be subjected to similar psychophysical paradigms as human observers, affording comparison between human and machine vision. In this context, we explored material perception. We created 32 stimuli of a constant 3D shape but with various material properties. Then we presented them in 1193 triplets in an odd-one-out task for both humans (N=18) and machine. The machine judgements were performed with gpt-4o, which has vision capabilities. Triplet data was analysed directly and also used to create perceptual embeddings using Soft Ordinal Embedding (SOE). The raw triplet data revealed an interesting commonality between human and machine judgements when we compared the ‘popularity scores’ of odd-ones-out: a group of 6 stimuli was substantially more different from the remaining 26 stimuli. Furthermore, we found that 47% of the triplet judgements were the same for the human and gpt-4o data, which is well above chance level (33%). The SOE analysis revealed that the accuracy (agreement between raw triplet data and multidimensional embeddings) was substantially higher for machine than human vision, indicating a higher degree of internal consistency. Also, we found a full saturation at 6 dimensions for the machine data: all triplets could be accounted for by the embedding. Besides various commonalities, the embeddings themselves revealed some peculiar differences. Firstly, translucent stimuli were close for humans but distant for the machine. Secondly, the machine embedding showed a clear cluster of achromatic stimuli, while this was entirely absent in the human data. This suggests that computers use colour for material perception, while humans do not. With some imagination, one could argue that human material perception partly prepares for physical interaction where colour is irrelevant, while the algorithm does not (yet) have a body to interact with the outside world. ...
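As an illustration of the agreement figure reported in the abstract above (47% against a 33% chance level), the following is a minimal sketch of how an agreement rate between two sets of odd-one-out choices could be computed. It is not the authors' analysis code. The function name `agreement_rate`, the encoding of each choice as a position 0, 1 or 2 within the triplet, and the toy data are all assumptions made for illustration; in particular, how the 18 human observers were aggregated into one choice per triplet is not stated in the abstract.

```python
import random


def agreement_rate(choices_a, choices_b):
    """Return the fraction of triplets on which both sources picked the same odd-one-out.

    Each list holds one choice per triplet, encoded as the position (0, 1 or 2)
    of the item chosen as the odd one out. This encoding is an assumption.
    """
    if len(choices_a) != len(choices_b) or not choices_a:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(a == b for a, b in zip(choices_a, choices_b))
    return matches / len(choices_a)


# With three options per triplet, two independent random choosers agree 1/3 of the time.
CHANCE_LEVEL = 1 / 3

# Toy data only: 1193 triplets, as in the abstract, but the choices are simulated.
rng = random.Random(0)
n_triplets = 1193
human = [rng.randrange(3) for _ in range(n_triplets)]
# Simulated "machine" that copies the human choice some of the time, otherwise guesses.
machine = [h if rng.random() < 0.3 else rng.randrange(3) for h in human]

rate = agreement_rate(human, machine)
print(f"agreement: {rate:.2f} (chance level: {CHANCE_LEVEL:.2f})")
```

An agreement rate is only meaningful relative to the chance level, which is why the sketch prints both side by side.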
We investigated the influence of the medium on the perception of depicted objects and materials. Oil paintings and their reproductions in engravings were chosen because they are vastly distinctive media while having completely identical content. A total of 15 pairs were collected, consisting of 88 fragments depicting different materials, including fabric, skin, wood and metal. Besides the original condition, we created three manipulations to understand the effect of colour (a greyscale version) and contrast (equalised histograms towards both painting and engraving). We performed rating experiments on five attributes: three-dimensionality, glossiness, convincingness, smoothness and softness. An average of 25 participants finished each of the 20 online experimental sessions (five attributes X four conditions). Besides clear correlations between the two media, the differences mainly show in their means (different levels of perceived attributes) and standard deviations (perceived range). In most sessions, paintings depict a wider range than engravings. In addition, it was the histogram equalisation (global contrast) that made the most impact on perceived attributes, rather than colour removal. This suggests that engravers compensated for the lack of colour by exploiting the possibilities of local contrast. ...
Journal article (2023) - Y. Zhao, H. de Ridder, J.F.H.J. Stumpel, M.W.A. Wijntjes
If two painters paint the same scene, the appearance difference can be referred to as style difference. The distinguishing features result from artists’ use of composition, color, brushstroke etc. We are interested in how people perceive different depiction styles when they are presented with different levels of information. Whole paintings contain mid-level information (depicted scenes, etc.) and low-level information (brushstroke, colors, etc.). Square cut-outs of single objects contain only low-level information. The same cut-outs in grayscale contain low-level information but without colors. We collected 42 digitized oil paintings as stimuli; their creation years varied from the 15th to the 21st century, and their location of production varied from southern Spain to the northern Netherlands. All paintings contain at least one apple. We gathered similarity judgement data using a triplet comparison method from three online experiments, where observers were presented with the whole paintings (condition 1), square cut-outs of painted apples (condition 2) and the same cut-outs in grayscale (condition 3). 20 observers completed each experiment (60 observers in total). We applied soft ordinal embedding to obtain multidimensional embeddings. We reached a 3D space for conditions 1 and 3, and a 4D space for condition 2. Condition 2 has less information than condition 1, but has one more dimension, suggesting that different criteria might be involved. Condition 3 has one less dimension than condition 2, suggesting that color is one of the attributes for style perception judgement. In addition, although conditions 1 and 3 have the same dimensionality, around 64% of the raw data was in line with the 3D embedding in condition 1 and 58% in condition 3. 
This difference suggests that although the whole scene and a grayscale cut-out both need three dimensions to describe their style differences, the implicit style criteria for grayscale cut-outs are apparently more ambiguous than those used to judge the whole paintings. ...

Exploring style perception using details of paintings

Journal article (2023) - Yuguang Zhao, Jeroen Stumpel, Huib de Ridder, Maarten W.A. Wijntjes
Most studies on the perception of style have used whole scenes/entire paintings; in our study, we isolated a single motif (an apple) to reduce or even eliminate the influence of composition, iconography, and other contextual information. In this article, we empirically address two fundamental questions of the existence (Experiment 1) and description (Experiment 2) of style. We chose 48 cut-outs of mostly Western European paintings (15th to 21st century) that showed apples. In Experiment 1, 415 unique participants completed online triplet similarity tasks. Multidimensional scaling (MDS) reached a nonrandom three-dimensional (3D) embedding, showing that participants are able to judge stylistic differences in a systematic way. We also found a strong correlation between creation year and embedding, both a linear correlation with Dimension 2, and a rotational correlation in the first two dimensions. To interpret the embedding further, in Experiment 2, we fitted three color statistics and nine attribute ratings (glossiness, three-dimensionality, convincingness, brush coarseness, etc.) to the 3D perceptual style space. Results showed that Dimension 1 is associated with spatial attributes (Smoothness, Brushstroke coarseness) and Convincingness, Dimension 2 is related to Hue, and Dimension 3 is related to Chroma. The results suggest that texture and color are two important variables for style perception. By isolating the motifs, we could exclude higher levels of information such as composition and context. Interestingly, the results reinforce previous findings using whole scenes, suggesting that style can already be perceived in sometimes very small fragments of paintings. ...
Journal article (2022) - Y. Zhao, H. de Ridder, J.F.H.J. Stumpel, M.W.A. Wijntjes
Before the invention of photography, paintings were reproduced in a graphic and linear medium, engravings. To compare material perception across two modalities, paintings and engravings, we conducted two online experiments. We collected 15 pairs of color oil paintings and their engraving reproductions. Then we selected 40 elements from these 15 pairs, including fabric and skin, which resulted in 80 stimuli in total. In experiment 1, we used original (colored) versions for both paintings and engravings. In experiment 2, we used the same stimulus set, but achromatic (luminance only). Two attributes were rated in both experiments: glossiness and softness. 30 observers completed online rating tasks for each attribute in each experiment (120 observers in total). For glossiness, independent of color or black and white versions, engravings scored higher than paintings. In experiment 1, engravings were rated as glossier in 28 out of 40 pairs of variations, with 11 out of these 28 pairs showing significant differences. In experiment 2 (achromatic), engravings scored higher in glossiness than paintings in 33 pairs with 21 pairs showing significant differences. Both numbers increased when colors were removed. For softness, 21 elements in experiment 1 and 29 elements in experiment 2 were judged softer in engravings than in paintings. Surprisingly, engravings performed well in presenting both gloss and softness. Moreover, when colors were removed, the performance of engravings in conveying glossiness and softness got even better. The increased number of significant cases underscores the robustness of this trend. ...

Stylistic features as perceived by non-experts

Poster (2021) - Y. Zhao, H. de Ridder, J.F.H.J. Stumpel, M.W.A. Wijntjes

The neglected role of visual material perception in packaging design

Food appearance sets intentions and expectations. When designing packaged food, much attention is devoted to packaging elements like color and shape, but less to the characteristics of the images used. To our knowledge, no study has yet investigated how the appearance of the food shown on the package affects consumers’ preferences. Often, orange juice packages depict an orange. Since juiciness is one of the most important parameters for assessing the quality of oranges, we hypothesized that an orange with a juicier appearance on the package would improve the overall evaluation of the juice. Using image cues found to trigger juiciness perception of oranges depicted in 17th century paintings, we designed four orange juice packages by manipulating the highlights on the pulp (present vs. absent) and the state of the orange (unpeeled vs. peeled). In an online experiment, 400 participants, each assigned to one condition, rated expected naturalness, healthiness, quality, sweetness and tastiness of the juice, package attractiveness and willingness to buy. Finally, they rated juiciness of the orange for all four images. A one-way ANOVA showed a significant effect of the highlights on juiciness. A MANOVA showed that the presence of highlights, both in isolation and in interaction with the peeled side, also significantly increased expected quality and tastiness of the juice. The present study shows the importance of material perception and food texture appearance in the imagery of food packaging. We suggest that knowledge from vision science on image features and material perception should be integrated into the process of packaging design. ...
Journal article (2020) - Y. Zhao, H. de Ridder, M.W.A. Wijntjes
Similar objects can appear different because of natural or man-made variations. Depictions of objects also exhibit appearance differences. If two painters paint the same object, the appearance difference can be called style. Artists use colors, shading, brushstroke etc., to give their work a unique signature. However, style is implicit and difficult to quantify. In this study, we investigated how humans perceive different depiction styles. In an online experiment, we used (fragments of) paintings as stimuli. The creation years of the paintings varied from the 17th to the 20th century. There were four sets of stimuli: 10 flower paintings, 10 flower fragments, 16 apple fragments, and 16 peach fragments. In each trial, two stimuli were presented side by side. After five practice trials, participants were asked to rate depiction style differences on a 0-100 scale, from “not so different” to “very different”. 80 participants completed the rating task (20 for each set). To quantify inter-observer agreement, we computed correlations between individual and mean data. We found that on average, observers agreed most on peaches (r=0.75) and least on flower fragments (r=0.51). Multidimensional scaling analysis was then performed to position the stimuli in a perceptual space. After calculating stress values, 2D spaces were the best fit, except for peaches (1D). In the 2D perceptual space of apples, a clear gradient of creation years was present. This confirms that style changes with time. Furthermore, for the flower fragments, two clusters emerged from a single cluster in the whole-painting condition, suggesting that participants were using different criteria to judge style differences. We showed that people are capable of distinguishing different depiction styles. We found that one of the underlying criteria is creation year. Furthermore, the scale difference for the flower paintings suggests that brush strokes contribute to these perceptions. ...
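The inter-observer agreement measure mentioned in the abstract above (correlations between individual and mean data) could be sketched as follows. This is an illustrative reading, not the authors' code: the helper names `pearson` and `observer_agreement` are hypothetical, the mean is assumed to be taken over all observers including the one being compared, and the ratings are made-up numbers on the 0-100 scale.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def observer_agreement(ratings):
    """Average correlation between each observer's ratings and the group mean.

    `ratings` is a list with one entry per observer; each entry lists that
    observer's ratings for the same stimulus pairs in the same order.
    Assumption: the group mean includes the observer being compared.
    """
    n_items = len(ratings[0])
    group_mean = [sum(obs[i] for obs in ratings) / len(ratings) for i in range(n_items)]
    per_observer = [pearson(obs, group_mean) for obs in ratings]
    return sum(per_observer) / len(per_observer)


# Made-up style-difference ratings (0-100) from three observers for three stimulus pairs.
toy_ratings = [
    [10, 50, 90],
    [20, 60, 80],
    [0, 40, 100],
]
agreement = observer_agreement(toy_ratings)
print(f"mean individual-to-group correlation: {agreement:.2f}")
```

Including each observer in the group mean slightly inflates the correlation; a leave-one-out mean is a common alternative, and the abstract does not say which variant was used.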