Generative artificial intelligence (AI) models unlock new ways to create images, emerging as a new medium alongside paintings, photographs, and physically based renderings (PBR). Generative AI images can be perceptually convincing without being physically plausible, which makes it possible to investigate the boundaries of visual perception. This study examines whether generative AI images adhere to the medium-independent perceptual space on which previous studies have converged. We compared the perceptual similarity of images from three generative AI models against a PBR image dataset based on bidirectional reflectance distribution functions (BRDFs), using human similarity judgments. In experiment 1, we used the text descriptions of 32 materials (e.g., blue acrylic) from the Mitsubishi Electric Research Laboratories (MERL) BRDF dataset to prompt two text-to-image models, DALL-E 2 and Midjourney v2, to generate 32 sphere-shaped stimuli per model. Perceptual spaces derived from the similarity judgments revealed that both AI models resulted in two-dimensional spaces, whereas the MERL space was confined to one dimension, probably owing to a lack of surface texture. These unrelated perceptual spaces suggest that the AI models generated unique and different images from identical text prompts. In experiment 2, we used the text-to-image model Stable Diffusion v1.5 with ControlNet for additional depth-map constraints. Using the same 32 descriptions, we generated three sets using three different depth maps. The three resulting perceptual spaces are all two-dimensional and highly similar to one another, indicating a robust and non-random structure. They also show a similar structure to the MERL space and to perceptual spaces from other material studies using photographs, PBR, and depictions, suggesting that AI-generated imagery may indeed be used as a new medium to explore material perception.
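The abstract above does not say how the perceptual spaces were derived from the similarity judgments. One common approach is multidimensional scaling (MDS), and the sketch below shows classical (Torgerson) MDS on a toy dissimilarity matrix as an illustration only. The function name `classical_mds`, the toy data, and the choice of classical MDS are assumptions, not the study's method. The eigenvalue spectrum gives a rough indication of how many dimensions a space needs.

```python
import numpy as np

def classical_mds(dissimilarity, n_components=2):
    """Classical (Torgerson) MDS: embed items so that Euclidean distances
    between the returned coordinates approximate `dissimilarity`."""
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    # Double-centre the squared dissimilarities to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # Eigen-decompose and sort eigenvalues from largest to smallest.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep the leading components; clip tiny negative eigenvalues to zero.
    top = np.clip(eigvals[:n_components], 0.0, None)
    coords = eigvecs[:, :n_components] * np.sqrt(top)
    return coords, eigvals

# Toy data (hypothetical): 32 items that truly live in a 2D space.
rng = np.random.default_rng(0)
points = rng.normal(size=(32, 2))
toy_dissimilarity = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords, eigvals = classical_mds(toy_dissimilarity, n_components=2)
# Only the first two eigenvalues are clearly non-zero for this toy data,
# which is what a two-dimensional space looks like in the spectrum.
```

With real judgment data the dissimilarities are noisy, so the spectrum falls off gradually and the number of dimensions is usually chosen with a stress or cross-validation criterion rather than by looking for exact zeros.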
Journal article(2025)
-
Mitchell J.P. van Zuijlen, Yung Hao Yang, Jan Jaap R. van Assen, Shin'ya Nishida
Although rigid three-dimensional (3D) motion perception has been extensively studied, the visual detection of non-rigid 3D motion remains underexplored, particularly with regard to its interactions with material perception. In natural environments with various materials, image movements produced by geometry-dependent optical effects, such as diffuse shadings, specular highlights, and transparent glitters, impose computational challenges for accurately perceiving object deformation. This study examines how optical material properties influence human perception of non-rigid deformations. In a two-interval forced choice task, observers were shown a pair of rigid and non-rigid objects and asked to select the one that appeared more deformed. The object deformation varied across six intensity levels, and the stimuli included four materials (dotted matte, glossy, mirror, and transparent). We found that the material has only a small effect on deformation detection, with the threshold being slightly higher for transparent than other materials. The results remained the same regardless of the viewing angles, light field conditions (Experiment 1), and the deformation type (Experiment 2). These results show the robust capacity of the human visual system to perceive non-rigid object motion in complex natural visual environments.
Abstract(2022)
-
J.J.R. van Assen, M.J.P. van Zuijlen, Shin'ya Nishida
Visual motion computation is challenging under real-world conditions due to continuous contextual changes such as varying lighting conditions and a large range of optical material properties. Due to these changes, the retinal optical flow can vary drastically while the physical motion of an object remains constant. Materials with strong reflective and refractive interactions in particular can cause complex motion patterns. Here we investigate object motion constancy across various optical contexts and whether the human visual system compensates for other causal sources of motion.
We performed two experiments. In the first experiment, observers had to estimate which of two stimuli was rotating faster around the vertical axis. The stimuli were displayed for 500 ms in a 2-IFC staircase design. For the Match stimulus, the illumination, material properties, and shape were constant. The stimulus was rendered at a high temporal resolution, allowing for small rotational speed changes in the staircase design. The Test stimuli varied in ten optical properties (e.g., matte, glossy, anisotropic, translucent), three illumination maps (sunny, cloudy, indoor), and three shapes (knot, cubic, blobby), while the rotational speed remained constant. There were three different conditions in the second experiment: 1. unmasked Match and Test stimulus (same as experiment one); 2. masked Test stimulus (circular Gaussian mask, masking outer shape contours); 3. masked Test stimulus and masked Match stimulus, where the Match stimulus was replaced by horizontally moving 2D pink noise. In this experiment, a subset of the optical conditions was used.
Expanding on our previously presented work [1], we applied three image-based motion-capturing models (Figure 1) to gain deeper insight into the motion cues that are predictive of human judgements. The models are Lucas-Kanade (optical flow), RAFT (optical flow DNN), and FFV1MT (motion energy). First, we found clear illusory differences in perceived rotational speed, with even bigger effects when the circular mask was applied. The transparent material with the refractive index of water is systematically perceived to be rotating faster than other materials across all conditions. We performed a representational similarity analysis (RSA) to compare a range of different metrics across conditions and flow models. We find that the gradient of the optical flow is a particularly good predictor of human performance. The gradient emphasizes local speed changes in the optical flow, for example with moving highlights. Another observation is that Lucas-Kanade is most predictive of human performance under most conditions, while RAFT is most stable across materials and closest to the ground truth. Our results further suggest that the human visual system does partially compensate for motion flow effects across optical contexts in object motion.
[1] Van Assen, J. J. R., Kawabe, T., & Nishida, S. Y. (2020). Object motion and flow variance across optical contexts. Journal of Vision, 20(11), 458-458.
This work has been supported by a Marie-Skłodowska-Curie Actions Individual Fellowship (H2020-MSCA-IF-2019-FLOW) and by JSPS Kakenhi JP20H05957.
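The abstract above reports that the gradient of the optical flow predicts human speed judgements well, but it does not define the metric. The sketch below shows one plausible reading, under the assumption that "gradient" means the spatial gradient of flow speed averaged over the image. The function name `flow_gradient_magnitude` and the synthetic flow fields are hypothetical and are not taken from the study.

```python
import numpy as np

def flow_gradient_magnitude(u, v):
    """Mean magnitude of the spatial gradient of flow speed.

    `u` and `v` are 2D arrays holding the horizontal and vertical flow
    components per pixel (e.g. the output of any optical flow estimator).
    """
    speed = np.hypot(u, v)
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    d_dy, d_dx = np.gradient(speed)
    return float(np.mean(np.hypot(d_dx, d_dy)))

# Synthetic flow 1: uniform translation, so speed is the same everywhere.
height, width = 64, 64
uniform_u = np.full((height, width), 2.0)
uniform_v = np.zeros((height, width))

# Synthetic flow 2: same background plus a faster-moving patch, loosely
# mimicking a highlight that slides across the surface.
patch_u = uniform_u.copy()
patch_u[24:40, 24:40] = 6.0

uniform_score = flow_gradient_magnitude(uniform_u, uniform_v)
patch_score = flow_gradient_magnitude(patch_u, uniform_v)
```

The uniform field scores zero because it contains no local speed changes, while the field with the fast patch scores higher. That is the property the abstract attributes to the gradient: it emphasizes local speed changes such as moving highlights.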
In everyday life our visual system is continuously exposed to a wide range of motion flow patterns. Our knowledge of low-level motion processing is substantial, while higher-level processes such as future state prediction, motion constancy, and behavioural property estimation remain poorly understood. Here, we concentrate on behavioural cues of collective flow. Collective flow consists of a body of individual agents that show both collective and individual behaviours following a coordinated set of rules. In nature there are many occurrences of collective flow on various scales, various levels of complexity, across both animate and inanimate systems (e.g., swarms of insects, cars on highways). Using a real-time browser-based simulator of a relatively simple six-dimensional parametric model we displayed a range of collective behaviours. In a variety of experiments with free naming, name selection, similarity judgements, and rating tasks we started exploring the parametric space and its perceived behavioural dimensions. We find that observers can name a wide range of behaviours despite the abstraction of the simulations. Observers found the words expressing the spacing between agents to be the most descriptive. However, in the rating experiment it was found to be a challenge to differentiate between more distinct definitions of this spacing such as grouping or dispersal. Moreover, the six-dimensional parametric space contained multiple instances of the same perceived behaviour, making direct mappings between the parametric space and perceptual space even more complex. The challenge will be to clearly tease apart the perceived behaviours with their non-linear interactions across the explored parametric space. [This work was supported by a Marie-Skłodowska-Curie Actions Individual Fellowship (H2020-MSCA-IF-2019-FLOW, Project ID: 896434) and a Marie-Skłodowska-Curie Actions Innovative Training Network (MSCA-ITN-ETN, grant number 765121, 2017) DyViTo.]
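The abstract above does not list the six parameters of its model or say which simulator was used. To make "collective flow from a small parametric rule set" concrete, the sketch below uses a generic boids-style update with six illustrative parameters (`cohesion`, `alignment`, `separation`, `radius`, `max_speed`, `noise`). All names and default values are hypothetical and chosen for illustration only; this is not the model from the study.

```python
import numpy as np

def step(pos, vel, cohesion=0.01, alignment=0.05, separation=0.1,
         radius=1.0, max_speed=0.5, noise=0.01, rng=None):
    """Advance a boids-style flock by one time step (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    offsets = pos[None, :, :] - pos[:, None, :]   # offsets[i, j] = pos[j] - pos[i]
    dist = np.linalg.norm(offsets, axis=-1)
    neighbours = (dist < radius) & (dist > 0)     # agents within range, excluding self
    mask = neighbours[:, :, None]
    count = np.maximum(neighbours.sum(axis=1, keepdims=True), 1)

    # Cohesion: steer toward the mean position of the neighbours.
    to_centre = (offsets * mask).sum(axis=1) / count
    # Alignment: steer toward the neighbours' mean velocity.
    mean_vel = (vel[None, :, :] * mask).sum(axis=1) / count
    has_neighbours = neighbours.any(axis=1, keepdims=True)
    align = np.where(has_neighbours, mean_vel - vel, 0.0)
    # Separation: push away from neighbours, more strongly when they are close.
    safe_dist = np.maximum(dist, 1e-9)[:, :, None]
    away = -((offsets / (safe_dist ** 2)) * mask).sum(axis=1)

    vel = vel + cohesion * to_centre + alignment * align + separation * away
    vel = vel + noise * rng.normal(size=vel.shape)
    # Clamp each agent's speed to max_speed.
    speed = np.linalg.norm(vel, axis=1, keepdims=True)
    vel = vel * np.minimum(1.0, max_speed / np.maximum(speed, 1e-9))
    return pos + vel, vel

# Run a small flock of 50 agents for 100 steps from a random start.
rng = np.random.default_rng(1)
pos = rng.uniform(0.0, 5.0, size=(50, 2))
vel = rng.normal(scale=0.1, size=(50, 2))
for _ in range(100):
    pos, vel = step(pos, vel, rng=rng)
```

Even in a toy model like this, different parameter combinations can produce visually similar flocks, which is consistent with the abstract's remark that the parametric space contains multiple instances of the same perceived behaviour.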
The effective illumination incident on an object in a three-dimensional scene is a geometrically-weighted sum of direct and indirect light. The luminous and chromatic properties of the light field vary spatially and directionally, inducing luminance and chromatic gradients - smooth color variations over objects. When a color combination of a step or gradient produces a pleasing effect, it is said to be harmonious. Previous studies have shown that perception of color harmony is dependent on a complex interplay between hue, chroma and lightness (Ou and Luo, 2006). Further, the visual cues from luminance and chromatic gradients might assist three-dimensional shape recovery (Ruppertsberg et al., 2008). The aim of this research is to investigate the influence of chromatic furnishing materials on the perception of object color harmony and shape, through inter-reflections. Box spaces were rendered with uni-chromatic surfaces and a colored sphere, acting as a probe, in its center, illuminated by a planar white illuminant. 24 room surface colors were sampled systematically in RGB space. The sphere’s color was sampled from the 15 CIE CRI color checker samples. Participants had to rate perceived three-dimensionality (flat disk vs. sphere) and color harmony (disharmonious vs. harmonious) of the rendered sphere under carefully calibrated conditions. Before each session and between trials, participants adapted to an animated noisy mask. A short training session introduced randomly selected stimuli after which the main experiment took place. Of the tested furnishing hues, the bluish rooms resulted in the highest mean color harmony and three-dimensionality scores. Decreasing the furnishing brightness resulted in a major three-dimensionality enhancement, as expected. Reducing the saturation and even more so the brightness of the chromatic furnishing colors enhanced the perceived color harmony of the probe. 
These effects show the importance of using 3D versions of color checkers (here, spheres) for color testing.