Y. Wang | TU Delft Repository

Compositional generative models

For generalizable scene generation and understanding

Doctoral thesis (2026) - Y. Wang, G.J.T. Leus, J.H.G. Dauwels

Human intelligence is fundamentally compositional: it constructs new ideas by flexibly recombining known concepts, enabling generalization to entirely new tasks. We aim to develop intelligent systems with similar robust generalization capabilities. To that end, we develop compositional generative modeling frameworks and present three research thrusts that advance scene generation, decomposition, and understanding.

First, we introduce a hierarchical object-centric generative model that integrates latent-variable modeling with object-centric representation learning, enabling coherent multi-object scene generation and fine-grained object-level editing. This approach overcomes limitations of prior object-aware models by supporting flexible object morphology and significantly improving in-distribution generalization.

Second, we propose an unsupervised compositional image decomposition method that represents images as compositions of energy landscapes encoded by diffusion models. This enables the extraction of reusable global and local visual factors, such as shadows, expressions, and objects, and supports zero-shot compositional image generation by recombining these factors into novel configurations far outside the training distribution.
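The idea of composing energy landscapes can be illustrated with a toy sketch. It is not the thesis method: it assumes each factor is a simple quadratic energy rather than a diffusion model, and the helper names (`make_energy_grad`, `composed_grad`, `descend`) are hypothetical. Summing energies corresponds to multiplying the underlying densities, so the gradient of the composition is the sum of the per-factor gradients.

```python
import numpy as np

def make_energy_grad(mu, sigma):
    """Return the gradient of a quadratic energy E(x) = ||x - mu||^2 / (2 sigma^2).

    Toy stand-in for a learned factor; an assumption, not the thesis model.
    """
    mu = np.asarray(mu, dtype=float)
    return lambda x: (x - mu) / sigma**2

def composed_grad(x, grads):
    """Gradient of the summed energy: the sum of the per-factor gradients."""
    return sum(g(x) for g in grads)

def descend(x0, grads, step=0.05, n_steps=500):
    """Plain gradient descent on the composed energy landscape."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - step * composed_grad(x, grads)
    return x

# Two factors with equal widths: the composed minimum lies midway between them.
factors = [make_energy_grad([0.0, 0.0], 1.0), make_energy_grad([2.0, 4.0], 1.0)]
x_star = descend([5.0, -3.0], factors)
```

Recombining a different subset of factors only changes the list passed to `descend`, which is the property that makes this kind of composition reusable.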

Third, we develop a compositional inverse generative modeling framework for scene understanding. By formulating inference as likelihood maximization over conditional generative model parameters, we show how composable diffusion models enable object discovery and multi-label classification in scenes substantially more complex than those seen during training, including generalization to images with more objects or new configurations. The framework also supports zero-shot category inference using pretrained generative models without additional training.
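Inference by likelihood maximization can be sketched in a few lines. This is a toy illustration under strong assumptions: each concept's conditional generative model is replaced by an isotropic Gaussian with a known mean, and `log_likelihood` and `infer_concept` are hypothetical names, not part of the thesis. The inferred concept is the one under which the observation is most likely.

```python
import numpy as np

def log_likelihood(x, mean, sigma=1.0):
    """Log-density of x under an isotropic Gaussian N(mean, sigma^2 I)."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    d = x.size
    return (-0.5 * np.sum((x - mean) ** 2) / sigma**2
            - 0.5 * d * np.log(2 * np.pi * sigma**2))

def infer_concept(x, concept_means):
    """Return the concept whose conditional model gives x the highest likelihood."""
    scores = {name: log_likelihood(x, m) for name, m in concept_means.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy concept library (assumed values, for illustration only).
concepts = {"cube": [0.0, 0.0], "sphere": [3.0, 3.0], "cone": [-3.0, 2.0]}
label, scores = infer_concept([2.6, 3.3], concepts)
```

With a diffusion model, the exact log-likelihood would typically be replaced by a tractable surrogate such as a denoising error; that substitution is outside the scope of this sketch.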

Overall, these contributions demonstrate that the incorporation of compositional structure into generative modeling yields interpretable, controllable, and significantly more generalizable intelligent systems. This thesis offers a step toward building intelligent agents with the flexible, systematic compositional imagination characteristic of human cognition.

A Multidimensional Design for Dual Active Bridge Converters in Low-Voltage DC Systems

Journal article (2026) - Hang Ren, Hanwen Zhang, Yanbo Wang, Haoyuan Yu, Pingyang Sun, Zhe Chen

The dual-active-bridge (DAB) converter serves as a crucial galvanic isolating solution to provide dc grid-forming for dc elements in low-voltage direct-current (LVdc) systems. Key performance metrics of the DAB converter, such as efficiency, current stress, power density, and cost, depend chiefly on the design of the magnetic components and the modulation strategy. However, existing DAB converter designs yield compromised solutions that optimize a limited subset of these metrics. This article develops a comprehensive analytical framework to characterize DAB converter operation across three key dimensions: 1) zero-voltage switching (ZVS) range; 2) power rating utilization; and 3) reactive power. To achieve a well-balanced design, a holistic optimization methodology is proposed, integrating multiobjective particle swarm optimization (MOPSO) with triple phase-shift control. By optimally selecting the transformer turns ratio and the product of switching frequency and series inductance, the proposed MOPSO approach can collectively or selectively improve these performance aspects, enabling tailored DAB converter designs to meet diverse performance objectives. Experimental validation on a 1-kW DAB converter prototype demonstrates enhanced ZVS capability, improved utilization of converter rating, and reduced reactive power, with a peak efficiency over 95.9%. ...
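As background on why the turns ratio and the product of switching frequency and series inductance are natural design variables, the textbook power-transfer relation for an ideal, lossless DAB converter can be written out. Note the assumptions: this is the simpler single phase-shift case, not the triple phase-shift control used in the article, and the symbols are introduced here for illustration ($V_1$, $V_2$ are the dc port voltages, $n$ the primary-to-secondary turns ratio, $f_s$ the switching frequency, $L$ the series inductance referred to the primary, and $\varphi$ the phase shift in radians).

```latex
P = \frac{n V_1 V_2}{2 \pi^2 f_s L}\, \varphi \left( \pi - |\varphi| \right),
\qquad -\pi \le \varphi \le \pi,
\qquad
P_{\max} = \frac{n V_1 V_2}{8 f_s L} \quad \text{at } \varphi = \frac{\pi}{2}.
```

In this idealized relation, $f_s$ and $L$ enter only through their product, so the achievable power range is set by $n$ and $f_s L$ together.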

Precipitation Nowcasting Using Physics Informed Discriminator Generative Models

Conference paper (2024) - Junzhe Yin, Cristian Meo, Ankush Roy, Zeineh Bou Cher, Mircea Lică, Yanbo Wang, Ruben Imhoff, Remko Uijlenhoet, Justin Dauwels

Nowcasting leverages real-time atmospheric conditions to forecast weather over short periods. State-of-the-art models, including PySTEPS, encounter difficulties in accurately forecasting extreme weather events because of their unpredictable distribution patterns. In this study, we design a physics-informed neural network to perform precipitation nowcasting using the precipitation and meteorological data from the Royal Netherlands Meteorological Institute (KNMI). This model draws inspiration from the novel Physics-Informed Discriminator GAN (PID-GAN) formulation, directly integrating physics-based supervision within the adversarial learning framework. The proposed model adopts a GAN structure, featuring a Vector Quantization Generative Adversarial Network (VQ-GAN) and a Transformer as the generator, with a temporal discriminator serving as the discriminator. Our findings demonstrate that the PID-GAN model outperforms numerical and SOTA deep generative models in terms of precipitation nowcasting downstream metrics. ...

Nowcasting of Extreme Precipitation Using Deep Generative Models

Conference paper (2023) - Haoran Bi, Maksym Kyryliuk, Zhiyi Wang, Cristian Meo, Yanbo Wang, Ruben Imhoff, Remko Uijlenhoet, Justin Dauwels

Nowcasting is an observation-based method that uses the current state of the atmosphere to forecast future weather conditions over several hours. Recent studies have shown the promising potential of using deep learning models for precipitation nowcasting. In this paper, novel deep generative models are proposed for precipitation nowcasting. These models are equipped with extreme-value losses to more reliably predict extreme precipitation events. The proposed deep generative model contains a Vector Quantization Generative Adversarial Network and a Transformer ("VQGAN + Transformer"). For enhanced modeling and forecasting of extreme events, Extreme Value Loss (EVL) is incorporated in the autoregressive Transformer. The numerical results show that the proposed model achieves performance comparable to the state-of-the-art conventional nowcasting method PySTEPS for predicting nominal values. By incorporating an EVL, the proposed model yields more accurate nowcasting of extreme precipitation. ...

Slot-VAE

Object-Centric Scene Generation with Slot Attention

Journal article (2023) - Yanbo Wang, Letao Liu, Justin Dauwels

Slot attention has shown remarkable object-centric representation learning performance in computer vision tasks without requiring any supervision. Despite its object-centric binding ability brought by compositional modelling, as a deterministic module, slot attention lacks the ability to generate novel scenes. In this paper, we propose the Slot-VAE, a generative model that integrates slot attention with the hierarchical VAE framework for object-centric structured scene generation. For each image, the model simultaneously infers a global scene representation to capture high-level scene structure and object-centric slot representations to embed individual object components. During generation, slot representations are generated from the global scene representation to ensure coherent scene structures. Our extensive evaluation of the scene generation ability indicates that Slot-VAE outperforms slot representation-based generative baselines in terms of sample quality and scene structure accuracy. ...

Image Search Engine by Deep Neural Networks

Conference paper (2022) - Y. Yao, Q. Zhang, Y. Hu, C. Meo, Y. Wang, Andrea Nanetti, J.H.G. Dauwels

We typically search for images by keywords, e.g., when looking for images of apples, we would enter the word “apple” as query. However, this approach has limitations. For example, if users input keywords in a specific language, then they may miss results labeled in other languages. Moreover, users may have an image of the object they want to obtain more information about, e.g., a landmark, but they may not know its name. In such a scenario, word-based search is not adequate, while image-based search would be ideally suited. These needs drive us to develop a purely content-based image search engine, meaning that users can search images with an image as query. Motivated by this use case with numerous applications, in this paper we propose and validate an image-query-based search engine... ...
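Searching images with an image as query is commonly implemented as nearest-neighbour retrieval over feature vectors. The sketch below assumes that setup; the abstract does not state how the paper's engine ranks results, and `cosine_rank` as well as the toy vectors are hypothetical. In practice the vectors would come from a neural network encoder applied to each image.

```python
import numpy as np

def cosine_rank(query_vec, index_vecs, top_k=3):
    """Rank indexed images by cosine similarity to the query embedding.

    Returns the indices of the top_k most similar rows and their similarities.
    """
    q = np.asarray(query_vec, dtype=float)
    X = np.asarray(index_vecs, dtype=float)
    q = q / np.linalg.norm(q)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ q                       # cosine similarity per indexed image
    order = np.argsort(-sims)[:top_k]  # highest similarity first
    return order, sims[order]

# Toy 2-D "embeddings" standing in for encoder outputs (assumed values).
index = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
order, sims = cosine_rank([2.0, 0.1], index, top_k=3)
```

Because the ranking depends only on the embeddings, a query needs no keywords or language-specific labels, which matches the motivation given above.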

Object Detection and Person Tracking in CathLab with Automatically Calibrated Cameras

Conference paper (2022) - Y. Jiang, R. Dai, J. Zeng, R.M. Butler, T.S. Vijfvinkel, Y. Wang, J.J. van den Dobbelsteen, M. van der Elst, J.H.G. Dauwels

Workflow analysis is a young research field that has been gaining traction in recent years. Work in this field aims to improve efficiency and safety in operating rooms by analysing surgical processes and providing feedback or support, where observations are made and evaluated by algorithms rather than human experts. For our study, we mount five cameras at different angles in a Catheterization Laboratory (CathLab) to observe and analyse Cardiac Angiogram procedures. To automate the classification of workflow and personnel activities, we propose a pipeline that first automates the camera calibration of the 5-camera network, then detects the locations of medical equipment and tracks personnel activities... ...

Extreme Precipitation Nowcasting using Deep Generative Models

Conference paper (2022) - H. Bi, M.S. Kyryliuk, Z. Wang, C. Meo, Y. Wang, Ruben Imhoff, R. Uijlenhoet, J.H.G. Dauwels

Extreme precipitation usually leads to substantial impacts. Floods in the Netherlands, Belgium and Germany in the summer of 2021 caused loss of life, destruction of infrastructure, and long-term effects on the economy. To avoid such disasters, it is important to develop a reliable and accurate method to predict heavy rain. ...