High-dimensional data imputation is a critical challenge in semiconductor metrology, where secondary measurements are often deliberately omitted to increase throughput. This thesis examines the Missing By Design (MBD) framework, an industrially motivated scenario in which data are systematically left unmeasured to reduce measurement overhead, and investigates imputation methods tailored to the specific complexities of wafer reflectivity and overlay data. After establishing, through singular value decomposition (SVD) and principal component analysis (PCA), that wafer metrology data are physically structured and rank-deficient, we explore several classes of methods: linear regression and matrix completion as baselines; deep neural-network regression with multilayer perceptrons (MLPs) to capture nonlinearities; a contrastive-learning adaptation of CLIP for pairwise matching of primary–secondary measurements; and novel Bridge Models that refine coarse CLIP estimates with localized residual translations. Additionally, we integrate overlay-based domain constraints into CLIP through domain-guided (DG) neural network regularization to ensure physically coherent tool-to-tool (T2T) predictions. Comprehensive experiments on proprietary wafer datasets show that linear approaches, including regression and matrix completion, capture the low-rank structure of the data yet underperform in downstream overlay and T2T prediction because they cannot model subtle nonlinear relationships. Deep neural networks achieve strong reconstruction accuracy but require extensive hyperparameter tuning and deeper architectures than contrastive alternatives. CLIP-like approaches, by contrast, provide architecturally efficient, instance-based retrieval but can lack the precision needed for rigorous overlay alignment. DG regularization, as an extension of the CLIP framework, considerably improves T2T consistency and reduces raw reconstruction error.
Meanwhile, the Bridge Model combines a CLIP-derived coarse imputation with a smaller learnable residual map between encoder domains. It bridges global pairwise alignment and localized correction to improve both reconstruction and downstream task performance. Overall, this thesis presents a flexible suite of tools that advances high-dimensional MBD imputation in wafer metrology and provides a methodological foundation for future industrial applications.