K. Batselier | TU Delft Repository

A Novel Multivariate Hidden Markov Model Learning method using Coupled Canonical Polyadic Decomposition

Applied on the Sleep Physionet dataset to uncover sleep stages

Master thesis (2025) - J.P.K. Brinkmann (author) , Kim Batselier (mentor) , Borbála Hunyadi (mentor) , Nitin Jonathan Myers (graduation committee member)

Hidden Markov Models (HMMs) are probabilistic models that are widely used in various fields, including machine learning, economics, information theory, neuroimaging, and more. In particular, they are frequently employed in functional Magnetic Resonance Imaging (fMRI) studies, which involve large datasets, to analyze the behavior of brain networks or states. For example, HMMs can help determine whether a patient is healthy.

HMMs represent the probability of transitions between different states of a system, which operates as a discrete Markov chain and is not directly observable. Additionally, they describe the probability of observing specific measurements based on the current state of the model. These probabilities are characterized by a state transition matrix (T), an emission matrix (O), and an initial state distribution (π).
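As a minimal sketch of how the three quantities (T, O, π) define a distribution over observation sequences, the forward recursion below computes the probability of a short sequence. The numbers are made up, and the row-stochastic convention (T[i][j] is the probability of moving from state i to state j, O[i][k] the probability of observing symbol k in state i) is an assumption, not necessarily the convention used in the thesis.

```python
from itertools import product

# Hypothetical 2-state, 2-symbol HMM; all numbers are illustrative only.
# Assumed convention: T[i][j] = P(next state j | current state i),
# O[i][k] = P(observation k | state i), pi[i] = P(initial state i).
T = [[0.9, 0.1],
     [0.2, 0.8]]
O = [[0.7, 0.3],
     [0.1, 0.9]]
pi = [0.6, 0.4]


def sequence_probability(obs, T, O, pi):
    """Probability of an observation sequence via the forward recursion."""
    n = len(pi)
    # alpha[i] = P(observations so far, current hidden state = i)
    alpha = [pi[i] * O[i][obs[0]] for i in range(n)]
    for y in obs[1:]:
        alpha = [sum(alpha[i] * T[i][j] for i in range(n)) * O[j][y]
                 for j in range(n)]
    return sum(alpha)


# Sanity check: the probabilities of all length-3 sequences sum to one.
total = sum(sequence_probability(seq, T, O, pi)
            for seq in product(range(2), repeat=3))
print(round(total, 12))
```

The hidden state never appears in the output: it is summed out inside `alpha`, which is what makes the chain "hidden".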

Current methods for fitting HMMs to data primarily use the Baum-Welch algorithm, which is a specific type of Expectation Maximization (EM). One advantage of Baum-Welch is that it can be extended to continuous observations or multivariate data. However, the algorithm involves a forward-backward pass during each iteration, which makes it increasingly slow as the size of the dataset grows. Moreover, it can easily become trapped in local optima.

An alternative approach is to use Canonical Polyadic Decomposition (CPD) to decompose a Joint Probability Tensor (JPT). This decomposition allows for the extraction of factor matrices that can be used to calculate the HMM matrices (T, O, π). Compared to the Baum-Welch algorithm, CPD and other JPT decomposition methods tend to be faster. However, they are often sensitive and may suffer from instability if the data does not fully capture the statistical behavior of the underlying HMM. Additionally, methods for JPT decomposition in HMM learning have not yet been extended to multivariate datasets or continuous settings.
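The abstract does not state how the JPT is built. A common construction, assumed here purely for illustration, is the joint distribution of three consecutive observations. Conditioned on the middle hidden state, the three observations are independent, so the tensor is a sum of one rank-one term per state, which is exactly a CPD. The sketch below (hypothetical numbers, row-stochastic convention assumed) checks this identity against brute-force enumeration of hidden paths.

```python
from itertools import product

# Hypothetical HMM with R = 2 states and M = 3 symbols (illustrative numbers).
T = [[0.9, 0.1], [0.2, 0.8]]            # T[i][j] = P(next j | current i)
O = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]  # O[i][k] = P(symbol k | state i)
pi = [0.5, 0.5]
R, M = 2, 3


def brute_force_jpt():
    """P[y1][y2][y3] by summing over every hidden path (x1, x2, x3)."""
    P = [[[0.0] * M for _ in range(M)] for _ in range(M)]
    for x1, x2, x3 in product(range(R), repeat=3):
        w = pi[x1] * T[x1][x2] * T[x2][x3]
        for y1, y2, y3 in product(range(M), repeat=3):
            P[y1][y2][y3] += w * O[x1][y1] * O[x2][y2] * O[x3][y3]
    return P


def cpd_factors():
    """Weights and factor matrices, conditioning on the middle state x2."""
    weights = [sum(pi[x1] * T[x1][s] for x1 in range(R)) for s in range(R)]
    # A[y][s] = P(y1 = y | x2 = s), obtained with Bayes' rule
    A = [[sum(O[x1][y] * pi[x1] * T[x1][s] for x1 in range(R)) / weights[s]
          for s in range(R)] for y in range(M)]
    # B[y][s] = P(y2 = y | x2 = s), the emission probabilities
    B = [[O[s][y] for s in range(R)] for y in range(M)]
    # C[y][s] = P(y3 = y | x2 = s), one transition followed by an emission
    C = [[sum(T[s][x3] * O[x3][y] for x3 in range(R))
          for s in range(R)] for y in range(M)]
    return weights, A, B, C


weights, A, B, C = cpd_factors()
P = brute_force_jpt()
max_diff = max(
    abs(P[y1][y2][y3]
        - sum(weights[s] * A[y1][s] * B[y2][s] * C[y3][s] for s in range(R)))
    for y1, y2, y3 in product(range(M), repeat=3))
print(max_diff < 1e-12)
```

Learning runs in the opposite direction: the tensor is estimated from data, decomposed, and the HMM matrices are recovered from the factor matrices. That is why a poorly estimated tensor can destabilize the result.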

To enhance stability and accommodate multivariate data, we propose a novel method that extends JPT decomposition HMM learning to a multivariate setting. This involves using a coupled CPD problem, where each observational sequence has a separate emission matrix (O) but shares a common state transition matrix (T). By transitioning from Baum-Welch to CPD, the process of learning HMMs can be significantly accelerated. Coupled CPD makes HMM learning more robust than Uncoupled CPD by relating the multivariate data through a common transition matrix T and initial distribution π. This improvement should enhance the iterability of HMM methods and make them more suitable for rapid prototyping in scientific research and the aforementioned fields.

The results show that Coupled CPD outperforms both Uncoupled CPD and the industry-standard Baum-Welch in most cases. It is more stable than both methods, significantly reduces calculation time, and estimates the HMM matrices more accurately. Finally, further improvements concerning calculation time and extensions are suggested.

Exploring the effects of memory usage on CP/TT decomposed CNNs

Master thesis (2025) - J.A. Klip (author) , K. Batselier (mentor) , Dimitris Boskos (graduation committee member) , J.F.P. Kooij (graduation committee member)

Artificial Intelligence (AI) puts an increasing amount of strain on our total energy consumption and CO2 production. Not only is AI becoming increasingly popular, but AI models also keep growing and thus need an increasing amount of computational resources. Recent research tr ...

Point Cloud Compression for Automotive LiDAR using Tensor Decomposition Methods

Master thesis (2024) - C.V.M.M. Vorage (author) , K. Batselier (mentor) , Julian Francisco Pieter Kooij (mentor) , N.J. Myers (graduation committee member) , Holger Caesar (graduation committee member)

The training process of machine learning models for self-driving applications suffers from bottlenecks during loading and processing of LiDAR point clouds with large storage complexity.
Many studies aim to remedy this problem from an implementation perspective by developing ...

Towards Sustainable CNNs: Tensor decompositions for Green AI solutions

Exploring Energy Consumption of Large CNNs

Master thesis (2024) - D. Breen (author) , K. Batselier (mentor) , J.F.P. Kooij (mentor) , Holger Caesar (graduation committee member) , E.M. Memmel (graduation committee member)

The ever-increasing complexity of Artificial Intelligence (AI) models has led to environmental challenges due to high computation and energy demands. This thesis explores the application of tensor decomposition methods—CP, Tucker, and TT—to improve the energy ...

Canonical Polyadic Decomposition in Autoencoders for ECG Analysis

Exploring the effect of the CPD in unsupervised transfer learning methods for cardiac arrhythmia detection

Master thesis (2024) - F. Hogenbosch (author) , K. Batselier (mentor)

This thesis studies the application of the Canonical Polyadic Decomposition (CPD) in unsupervised transfer learning methods for cardiac arrhythmia detection. Unsupervised learning methods have become more prevalent in the healthcare sector due to the abundance of unlabeled data. Labeling of medical data is often non-trivial as it is labor-intensive and requires expert knowledge. Transfer learning can utilize the large number of unlabeled data by extracting relevant features, which can in turn be used for a smaller supervised learning part. Furthermore, in the medical field, AI models are often deployed on embedded systems, requiring efficient model architectures while maintaining high diagnostic performance.

The unsupervised transfer learning method was designed with an autoencoder for the unsupervised part and a linear network for the supervised part. The first experiment explored four models to select the most optimal architecture to apply the CPD to. These models include an autoencoder adaptation of the ResNet and ConvNeXt models, the U-Net autoencoder and a basic implementation of a convolutional autoencoder. In the second experiment, the autoencoder model is decomposed using the CPD and evaluated at various compression ratios on its reconstruction capabilities, classification accuracy and computational performance. The CPD implementation is also tested on its convergence speed and data efficiency as compared to its uncompressed counterpart.

The first experiment found that the basic implementation of a convolutional autoencoder performed best overall. The U-Net model had high reconstruction quality but lacked predictive accuracy. The ResNet model was found to have slightly worse reconstruction and prediction capabilities while having a larger parameter count. The ConvNeXt model failed to accurately reconstruct the images.

The second experiment showed that the CP-decomposed model approached the uncompressed model in terms of predictive capabilities, while having lower reconstruction quality. This is likely due to the regularization effect of the CPD, suggesting significant redundancy in the uncompressed model. Despite the reduction in forward pass FLOPs for the CP-decomposed model, it was found that the computational complexity of the backpropagation process was higher than for the uncompressed model at the lower compression ratios, and that the memory allocation increased significantly. This resulted in longer and less efficient training of the CP-decomposed models. It was furthermore found that the CP-decomposed models converged faster and had higher data efficiency than the uncompressed model.

Epileptic seizure classification using scalp EEG data

A support tensor machine approach

Master thesis (2024) - M.P. van Dijk (author) , Kim Batselier (mentor) , Seline J.S. De Rooij (graduation committee member)

Algorithms which can effectively detect epileptic seizures have the potential to improve current treatment methods for people who suffer from epilepsy. The current state-of-the-art methods use neural networks, which are able to learn directly from the electroencephalogram (EEG) d ...

Sparse reconstruction for High Dimensional Tensors

Low complexity methods for large scale sensing

Master thesis (2024) - A.A. Bevelander (author) , Nitin Jonathan Myers (mentor) , Kim Batselier (mentor)

Compressed sensing is a framework in signal processing that enables the efficient acquisition and reconstruction of sparse signals. A widely used class of algorithms for this reconstruction, called greedy algorithms, depends on non-convex optimization. With increasing signal size, these problems become computationally very hard. Block compressed sensing is a framework in compressed sensing that divides the compressed sensing problem into sub-problems, gaining a better storage complexity. However, block compressed sensing has not yet been studied from a computational complexity perspective.

This thesis focuses on the application of block compressed sensing to signals of high dimension to gain insight into the relation between reconstruction performance and computational complexity. This is done by first investigating how theoretical reconstruction guarantees change when the problem is divided into smaller sub-problems, and by performing a complexity analysis of the reconstruction itself. Each sub-problem solves for a portion of a signal, defined as a block. Next, experiments are conducted to gain insight into the trade-off between computational complexity and quality of the reconstruction. It is found that, with this block-wise approach, the computational complexity of the reconstruction problem decreases, but at the same time the quality of the reconstruction deteriorates. In addition, a method to compensate for this performance loss is proposed. The key idea of this method is that, by propagating prior information among the different blocks, the reconstructions of the blocks can be improved. Finally, block compressed sensing and prior-aware block compressed sensing are analysed in a higher-order tensor compressed sensing setting. However, this setting was found to exhibit a less favourable complexity-performance trade-off than the standard one, as it resulted in both a more complex and a less accurate reconstruction.

Uncertainty quantification for tensor network constrained kernel machines

A frequentist and Bayesian approach

Master thesis (2023) - R.H.W. Smeenk (author) , K. Batselier (mentor) , D. Boskos (graduation committee member) , F. Wesel (mentor)

This research aims at quantifying the uncertainty in the predictions of tensor network constrained kernel machines, focusing on the Canonical Polyadic Decomposition (CPD) constrained kernel machine. Constraining the parameters in the kernel machine optimization problem to be a CPD results in a linear computational complexity in the number of features, whereas the original problem suffers heavily from the curse of dimensionality as the number of parameters scales exponentially. By employing a product feature map with polynomial features, the original data input is transformed to a higher-dimensional space.

Three different methods are investigated for quantifying the uncertainty of the predictions of the CPD constrained kernel machine. Firstly, the delta method is proposed, which is a frequentist approach that linearizes a nonlinear parametric model around the estimated model. By estimating the covariance of the model parameters, the delta method can estimate the uncertainty in the model predictions based on the estimated parameter uncertainties. The delta method is compared to two other methods that are able to reflect the prediction uncertainty: the Bayesian method and the Single Bayesian Core (SBC) method. The Bayesian method treats the parameters in the factor matrices of the CPD as probability distributions rather than single values, and the SBC method incorporates both frequentist and Bayesian aspects. The three methods are compared based on an assessment of the correctness and informativeness of the uncertainty measures of prediction intervals and confidence intervals.

Regression and classification experiments showed that all three methods can provide valuable uncertainty quantification measures in terms of correctness and informativeness for the CPD constrained kernel machine. However, the Bayesian method generally provides more conservative uncertainty intervals than the delta and SBC methods. A major drawback of the Bayesian method is its lack of scalability, as the size of the mean and covariance, constructed by the unscented transform in the Bayesian method, scales exponentially. Furthermore, the delta and SBC methods produce high-quality uncertainty intervals and provide remarkably similar uncertainty quantification on the prediction error variance.

Improving the Computational Speed of a Tensor-Networked Kalman Filter for Streaming Video Completion

Master thesis (2022) - A.P. van Koppen (author) , K. Batselier (mentor) , C.M. Menzen (graduation committee member)

Streaming video completion aims to fill in missing or corrupted pixels in a video stream by using past uncorrupted data. A recently introduced method to tackle this problem is the Tensor Networked Kalman Filter (TNKF). It shows promising results in terms ...

Symmetric Canonical Polyadic Decomposition And Gauss-Newton Optimizer For Nonlinear Volterra System Identification

Master thesis (2022) - Z. LI (author) , K. Batselier (mentor)

This thesis applies the Gauss-Newton optimizer to estimate the parameter values of the Volterra-PARAFAC model by minimizing a nonlinear least squares (NLS) cost function given the input and output measurements of the MISO Volterra system.

Tensor decomposition for Independent Component Analysis

Through implicit cumulant tensor manipulation

Master thesis (2022) - P. Denarié (author) , K. Batselier (mentor) , Borbala Hunyadi (graduation committee member)

Blind Source Separation (BSS), the separation of latent source components from observed mixtures, is relevant to many fields of expertise such as neuro-imaging, economics and machine learning. Reliable estimates of the sources can be obtained through diagonalization of the cumulant tensor, i.e., a fourth-order symmetric multi-linear array containing the cross-kurtosis values of observed mixtures. The downside of such diagonalization methods is that they scale quartically with the increase of the amount of source components to estimate due to the tensor’s quartic size. Tensor decomposition can simultaneously diagonalize the cumulant tensor and address its size. However, it does not resolve the scalability issue due to the restriction of having to first explicitly compute the tensor.

This thesis studies how decomposing the cumulant tensor implicitly can be used to solve the BSS problem while simultaneously addressing its scalability issue. A class of implicit cumulant tensor decomposition algorithms is derived which scale more favorably than their explicit counterparts in terms of computational cost, storage cost, or both. Firstly, a novel QR-Tensor algorithm (QRT) is introduced which allows for the simultaneous diagonalization of a tensor’s outer-slices. It is theoretically shown how an implicit version of the QRT algorithm can be used to solve the BSS problem at a linearly scaling computational cost. Secondly, a fixed-point Canonical Polyadic Decomposition (CPD) iteration method is presented. It reduces the computational complexity from a quartic dependence to a linear dependence on the number of signals to estimate. The source estimation performance of the devised implicit decomposition methods is compared to that of the state-of-the-art FastICA for an artificial linear BSS problem.

Results show that both fixed-point CPD and QRT are superior to FastICA in terms of the computation time needed to reach convergence, while producing estimated sources of similar quality. When the number of sources to estimate is increased, both QRT and FastICA struggle to converge. In contrast, the fixed-point CPD method converges within a consistent number of iterations, suggesting that it is more suitable for estimating a large number of sources.

Compression of the embedding layer in an LSTM model using tensor train decomposition for NLP

Master thesis (2022) - S.A. Jonnalagadda (author) , Kim Batselier (mentor) , F. Wesel (graduation committee member) , Peyman Mohajerin Esfahani (graduation committee member)

Natural Language Processing (NLP) deals with understanding and processing human text by computer software. There are several network architectures in the fields of deep learning and artificial intelligence that are used for NLP. Deep learning techniques like recurrent neural networks and feed-forward neural networks are used to develop language models that perform several NLP tasks. Over the years, researchers have worked on developing state-of-the-art language models that achieve high accuracy and performance for NLP applications. With the development of deep neural network language models, the computational resource requirements and the energy costs for training and running language models increased. This led to research on compressing language models, thereby reducing their computational complexity. One of the methods used for this is tensor decomposition, such as the tensor-train (TT) decomposition.

In this thesis, the application of the TT-decomposition method for compressing the embedding layer in a long short-term memory model was investigated. Specifically, the thesis investigated how the factorization and the order of factors in the embedding layer, when it is represented in the TT-matrix format, affect the maximum test accuracy of the long short-term memory model for the NLP task of sentiment analysis. This was done by considering three different factorizations of the embedding layer in the model. Further, the effect of a change in TT-ranks (hyperparameters of the model when the embedding layer is represented in the TT-matrix format) on the maximum test accuracy was also investigated.

Based on the investigation and the empirical results obtained, this thesis concludes that a larger number of factors in the factorization of the embedding layer increases the maximum test accuracy of the model. Further, in a particular factorization, when the factors were arranged in such a way that the maximum values of the TT-ranks had a smaller gap, the maximum test accuracy of the model improved. In one particular configuration of the model, the number of parameters was reduced by 24.5 times compared to that of the original uncompressed model, and a maximum test accuracy of 77.10% was achieved compared to a maximum test accuracy of 78.05% in the case of the original model.
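To make the roles of the factorization, the factor order and the TT-ranks concrete, the sketch below counts the parameters of an embedding matrix stored in TT-matrix format. The vocabulary size, embedding dimension, factorizations and ranks are all hypothetical and are not the configurations used in the thesis.

```python
def tt_matrix_params(row_factors, col_factors, ranks):
    """Number of parameters of a TT-matrix.

    Core k has shape ranks[k] x row_factors[k] x col_factors[k] x ranks[k+1],
    with boundary ranks ranks[0] = ranks[-1] = 1.
    """
    assert len(row_factors) == len(col_factors) == len(ranks) - 1
    assert ranks[0] == 1 and ranks[-1] == 1
    return sum(ranks[k] * row_factors[k] * col_factors[k] * ranks[k + 1]
               for k in range(len(row_factors)))


# Hypothetical embedding layer: 25000 words, 256-dimensional embeddings.
vocab, dim = 25000, 256
full = vocab * dim

# Assumed factorizations: 25000 = 25 * 25 * 40 and 256 = 4 * 8 * 8.
compressed = tt_matrix_params([25, 25, 40], [4, 8, 8], [1, 16, 16, 1])
print(full, compressed, round(full / compressed, 1))

# Reordering the same factors changes the parameter count.
reordered = tt_matrix_params([40, 25, 25], [8, 8, 4], [1, 16, 16, 1])
print(reordered)
```

The parameter count depends on how the factors are paired and ordered, and not only on the ranks. The factor order is one of the design choices the abstract says was investigated.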

Graph Regularized Tensor Decomposition for Recommender Systems

Master thesis (2022) - R. Chandrashekar (author) , K. Batselier (mentor) , Elvin Isufi (graduation committee member) , Eva Memmel (coach)

Humans make decisions when presented with choices based on influences. The Internet today presents people with abundant choices. Recommending choices with an emphasis on people's preferences has become increasingly sought after. Grundy (1979), the first computer librarian Recommender System (RS), provided users with book recommendations. Growing volumes of user data in the '90s saw increased usage of commercially available RS for e-commerce, music, movies, books, and social networking services. Due to their effectiveness in providing recommendations, Collaborative Filtering (CF) algorithms are predominantly used to build these RS. However, traditional CF algorithms adopting Matrix Factorization (MF) and Nearest Neighbor (NN) methods struggle with sparse data or model scalability. With exponentially increasing sparse data, the focus is on building scalable and accurate RS models.

This thesis uses tensors and graphs to represent available data. Emphasis is given to capturing higher-order interactions present in the data. The use of tensors is motivated by the fact that matrices cannot capture data with higher-order relations, such as the variation of user ratings of items over time. The transition to using tensors has been promising with the development of efficient tensor decomposition methods and powerful machines. Graphs can capture the correlation between different entities, providing additional information intrinsic to the underlying graph structure. A Graph Regularized CANDECOMP/PARAFAC (GRCP) tensor decomposition model framework is proposed in this thesis. The thesis highlights how graph Laplacian regularizers (GLRs) benefit CP tensor decomposition methods for building RS. The model framework is evaluated with the MovieLens data set. The model records lower Normalized Mean Squared Error (NMSE) values than those reported in the literature. The combination of varied data sources notably aids in overcoming the drawbacks of current RS models, offering scalability with computational efficiency in linear time.

All-at-once optimization for kernel machines with canonical polyadic decompositions

Enabling large scale learning for kernel machines

Master thesis (2022) - E.A. van Mourik (author) , Kim Batselier (mentor) , F. Wesel (mentor) , S. Wahls (graduation committee member) , J.H.G. Dauwels (graduation committee member)

This thesis studies the Canonical Polyadic Decomposition (CPD) constrained kernel machine for large scale learning, i.e. learning with a large number of samples. The kernel machine optimization problem is solved in the primal space, such that the complexity of the problem scales linearly in the number of samples as opposed to scaling cubically in the dual space. Product feature maps are applied to transform the input data. The weights are constrained to be a CPD, so the number of weights scales linearly in the number of features. The CPD introduces a nonlinearity, so nonlinear optimization must be applied.
This thesis studies in which situations it is more advantageous to apply iterative all-at-once optimization than Alternating Least Squares (ALS) to solve the CPD constrained kernel machine problem. Specifically, all-at-once gradient descent methods are studied. An efficient analytical algorithm for the all-at-once gradient is derived. Furthermore, it is shown that automatic differentiation (AD) can also be applied, but it is found to be slower than the analytical method.
The selection of a step size is found to be challenging. It is shown that the magnitude of the gradient of the mean squared error (MSE) term decreases for an increasing number of features. As a result, selecting the step size becomes more difficult for more features. To overcome this, the Line search and the Adam method are studied. A general expression for the exact line search solution is derived. It can be applied to compute the optimal step size for any step direction and any number of features. However, the Adam method performs better in terms of loss after training, convergence and the training run time. The mini-batch Adam method is used to evaluate the performance of all-at-once optimization for large scale learning.
It is found that the Adam method no longer performs well for data sets with around 16 features or more, likely due to the decrease in the magnitude of the gradient of the MSE term. On large scale data sets with fewer features, the Adam method outperforms ALS in terms of run time until convergence while achieving similar training and validation losses. The Adam method reached convergence on a data set with 11 million samples within ten minutes. Furthermore, it is shown that the scaling of the run time of the Adam method in terms of the feature map order and the CP-rank is more than an order lower than the scaling of ALS when the methods are run on a GPU. This makes the Adam method more suitable for more complex models.
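The linear scaling in the number of features mentioned in this abstract can be illustrated with a small sketch. It assumes a polynomial product feature map and a model of the form f(x) = Σ_r Π_d ⟨w_r^(d), φ(x_d)⟩. The names `poly_features`, `cpd_predict`, `full_predict` and all weights are hypothetical; the thesis may use a different feature map or parameterization.

```python
import math
from itertools import product


def poly_features(x, order):
    """Polynomial features [1, x, x^2, ...] of a single input feature."""
    return [x ** p for p in range(order)]


def cpd_predict(x, factors):
    """Prediction with CPD-constrained weights.

    factors[d][r] is the weight vector for feature d and rank-one term r.
    The cost is linear in the number of features D.
    """
    D, R, order = len(factors), len(factors[0]), len(factors[0][0])
    total = 0.0
    for r in range(R):
        term = 1.0
        for d in range(D):
            phi = poly_features(x[d], order)
            term *= sum(w * p for w, p in zip(factors[d][r], phi))
        total += term
    return total


def full_predict(x, factors):
    """Same prediction using the explicit weight tensor (order**D entries)."""
    D, R, order = len(factors), len(factors[0]), len(factors[0][0])
    phis = [poly_features(x[d], order) for d in range(D)]
    total = 0.0
    for idx in product(range(order), repeat=D):
        weight = sum(math.prod(factors[d][r][idx[d]] for d in range(D))
                     for r in range(R))
        total += weight * math.prod(phis[d][idx[d]] for d in range(D))
    return total


# Hypothetical model: D = 2 features, R = 2 rank-one terms, order 2.
factors = [[[1.0, 2.0], [0.0, 1.0]],
           [[0.0, 1.0], [1.0, 0.0]]]
x = (3.0, 4.0)
print(cpd_predict(x, factors), full_predict(x, factors))
```

Both functions return the same value, but `cpd_predict` touches D · R · order weights, whereas `full_predict` loops over all order^D entries of the weight tensor.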

Tensor-Networked Square-Root Kalman Filter for Online Video Completion

Master thesis (2021) - P. van Klaveren (author) , Kim Batselier (mentor) , R. M.G. Ferrari (graduation committee member) , C.M. Menzen (graduation committee member)

Online video completion aims to complete corrupted frames of a video in an online fashion. Consider a surveillance camera that suddenly outputs corrupted data, where up to 95% of the pixels per frame are corrupted. Real time video completion and correction is often desirable in s ...

Batch Bayesian Learning of Large-Scale LS-SVMs Based on Low-rank Tensor Networks

Master thesis (2021) - C. WANG (author) , Kim Batselier (mentor) , S. Wahls (graduation committee member) , F. Wesel (graduation committee member) , J. F. P. Kooij (graduation committee member)

Least Squares Support Vector Machines (LS-SVMs) are state-of-the-art learning algorithms that have been widely used for pattern recognition. The solution for an LS-SVM is found by solving a system of linear equations, which involves the computational complexity of O(N^3). When da ...

Nonnegative Robust PCA for Background and Foreground Image Decomposition

Master thesis (2020) - Cuicui Ling (author) , Kim Batselier (mentor) , J.W. van Wingerden (coach) , R Van de Plas (coach)

Nowadays, video surveillance and motion detection systems are widely used in various environments. With relatively low-priced cameras and highly automated monitoring systems, video and image analysis on roads, highways and skies becomes realistic. The key process in the analysis i ...

Recursive Tensor Network Bayesian Learning of Large-Scale LS-SVMs

Master thesis (2020) - M.J. Lucassen (author) , K. Batselier (mentor)

Least-squares support-vector-machines are a frequently used supervised learning method for nonlinear regression and classification. The method can be implemented by solving either its primal problem or dual problem. In the dual problem a linear system needs to be solved, yet for ...

Development of a Computational Model of Respiratory Mechanics in Mechanical Ventilation

Master thesis (2020) - A. Mousa (author) , K. Batselier (mentor) , A. Schoe (mentor) , P. Somhorst (mentor)

Streaming Video Completion using a Tensor-Networked Kalman Filter

Master thesis (2020) - S.J.S. de Rooij (author) , Kim Batselier (mentor) , Jan Willem Van Wingerden (graduation committee member) , J. F. P. Kooij (graduation committee member)

In streaming video completion one aims to fill in missing pixels in streaming video data. This is a problem that naturally arises in the context of surveillance videos. Since these are streaming videos, they must be completed online and in real-time. This makes the streaming vide ...