KB

K. Batselier

info

Please Note

32 records found

A study of tensor train representation of the semi-explicit MPC operators

Master thesis (2026) - S.A. Maljers, K. Batselier, Laura Astola, R.D. McAllister
Model predictive control (MPC) computes optimal control actions while enforcing the physical and safety limits of a system, a combination that has led to its wide adoption. Yet each action requires solving an optimisation problem within one sampling interval, a computational burden that restricts where it can be deployed. To reduce this burden, semi-explicit MPC precomputes state-independent operators. At scale, however, storage becomes the dominant cost, as the largest operator grows quadratically with the number of constraints stacked over the horizon. Because the same dynamics repeat at every step, these operators carry a regular block structure well suited to tensor train compression. This thesis presents the first tensor train formulation of semi-explicit MPC. Rewriting the semi-explicit pipeline to separate the horizon from the per-step dynamics makes it tractable to construct the operators directly in this format and to evaluate the online active-set law without ever decompressing the operators. On a benchmark that is scaled in system size and horizon, the compressed controller remains closed-loop admissible while reducing storage by up to a factor of 36. These gains come at a price. Construction in compressed format is orders of magnitude slower than its matrix counterpart. Online evaluation is 2 to 54 times slower, since every entry used by the active-set loop must be computed on demand rather than read from memory. The approach is therefore advantageous where storage, rather than computation, is the limiting resource. At long horizons the matrix-form controller no longer fits in fast memory, whereas the compressed controller, once built, does. The reported experiments stop short of that regime because, although the final controller is small, the intermediate rank growth during construction exhausts memory. Closing this gap rests on a single open problem: evaluating the inverse action that arises during operator construction while containing rank growth. Solving it would extend semi-explicit MPC to horizons beyond the reach of current methods. ...
Kernel machines are a class of machine learning models which are powerful in terms of predictive power, but are limited to low dimensional data, as their computational complexity scales according to O(N 3). Kernel methods are able to reduce this computational scaling to be linear in D by assuming a low rank structure on the weights vector. Existing tensor network kernel machines operate mostly in a deterministic setting and lack uncertainty quantification.
On top of this, hyperparameter tuning for these types of models can be costly and complex. A fully Bayesian kernel machine, employing a CP decomposition has been shown to perform well on regression and classification tasks without additional computational cost. hyperparameter tuning was aided for this model through the introduction of ARD priors. This model is still limited to a relatively small data dimension D, due to numerical instabilities caused by repeated Hadamard products in the CPD formulation. Alternative tensor decompositions such as the tensor train decomposition exist that do not rely on Hadamard products in their formulation and may therefore be more numerically stable.
This thesis proposes a Bayesian Tensor Train Kernel Machine, a fully Bayesian tensor train decomposed kernel machine. The proposed model will include ARD hyperpriors to aid in model selection, these priors allow for automatic inference of model complexity, as well as increasing model interpretability. Mean-field variational inference is used to approximate posterior distributions of all model parameters and hyperparameters. Experiments on synthetic datasets show the functioning of the ARD hyperpriors, and the increased numerical stability compared to the CP based model. Experiments on real world datasets showcases the performance of the proposed model in terms of prediction accuracy and uncertainty quantification compared to currently available models. ...
Master thesis (2025) - T. Fujioka, K. Batselier, S.E. Kotti, B. Hunyadi
Functional Ultrasound (fUS) is an emerging neuroimaging technique capable of capturing brain activity similarly to functional magnetic resonance imaging (fMRI) but with higher spatiotemporal resolution and lower operational cost. This thesis investigates the extension of the joint detection-estimation (JDE) framework to jointly estimate both the hemodynamic response function (HRF) and neural response function (NRF) from fUS imaging data. In the proposed model, the two response functions are represented as a cascade linear time-invariant (LTI) convolution systems, enabling indirect estimation of neural activity signals from fUS measurements. Since direct recordings of neural activity are often unavailable, this Bayesian approach offers a data-driven means of probing the brain’s functional organization.
Inference is performed within a coordinate ascent variational inference (CAVI) framework. The proposed algorithm was applied to fUS datasets and validated against simultaneous recordings of neural firing rates. Results demonstrate that the model successfully captures neural activity, achieving a Pearson correlation coefficient (PCC) of approximately 0.24-0.30, and provides a modest improvement over the conventional boxcar input stimulus model. Additionally, the jointly estimated HRFs were consistent with existing literature, and regional HRF estimates revealed differences in response dynamics between the visual cortex and hippocampus, highlighting region-specific hemodynamic properties. ...
Hidden Markov Models (HMMs) are probabilistic models that are widely used in various fields, including machine learning, economics, information theory, neuroimaging, and more. In particular, they are frequently employed in functional Magnetic Resonance Imaging (fMRI) studies, which involve large datasets, to analyze the behavior of brain networks or states. For example, HMMs can help determine whether a patient is healthy.

HMMs represent the probability of transitions between different states of a system, which operates as a discrete Markov chain and is not directly observable. Additionally, they describe the probability of observing specific measurements based on the current state of the model. These probabilities are characterized by a state transition matrix (T), an emission matrix (O), and an initial state distribution (π).

The current methods for fitting HMMs to data primarily utilize the Baum-Welch algorithm, which is a specific type of Expectation Maximization (EM). One of the advantages of Baum-Welch is its ability to be extended to continuous observations or multivariate data. However, the algorithm involves a forward-backward pass during each iteration, which makes it increasingly slow as the size of the dataset grows. Moreover, it can easily become trapped in local minima.

An alternative approach is to use Canonical Polyadic Decomposition (CPD) to decompose a Joint Probability Tensor (JPT). This decomposition allows for the extraction of factor matrices that can be used to calculate the HMM matrices (T, O, π). Compared to the Baum-Welch algorithm, CPD and other JPT decomposition methods tend to be faster. However, they are often sensitive and may suffer from instability if the data does not fully capture the statistical behavior of the underlying HMM. Additionally, methods for JPT decomposition in HMM learning have not yet been extended to multivariate datasets or continuous settings.

To enhance stability and accommodate multivariate data, we propose a novel method that extends JPT decomposition HMM learning to a multivariate setting. This involves using a coupled CPD problem, where each observational sequence has a separate emission matrix (O) but shares a common state transition matrix (T). By transitioning from Baum-Welch to CPD, the process of learning HMMs can be significantly accelerated. Coupled CPD makes HMM learning more robust than Uncoupled CPD by relating the multivariate data through a common transition matrix T and initial distribution π. This improvement should enhance the iterability of HMM methods and make them more suitable for rapid prototyping in scientific research and the aforementioned fields.

We show in the results that Coupled CPD indeed outperforms both Uncoupled CPD and the industry-standard Baum-Welch, in most cases. It has improved stability over both methods, as well as significantly improving the calculation time, and results in higher accuracy when it comes to estimating the HMM matrices. Finally, more future improvements are suggested concerning calculation time and extensions.
...
Master thesis (2025) - J.A. Klip, K. Batselier, D. Boskos, J.F.P. Kooij
Artificial Intelligence (AI) put an increasing amount of strain on our total energy consumption and CO2 production. Not only is AI becoming increasingly more popular, but also AI models keep growing and thus need an increasing amount of computational resources. Recent research tries to mitigate this effect by creating more efficient hardware and by using Green AI, which is research in AI with additional focus on the computational resources a model requires. During this research one such Green AI method will be studied. This research focusses on the effects of memory usage on Canoncial Polyadic Decomposition (CPD) and Tensor Train (TT) decomposed Convolutional Neural Networks (CNN)s. These decomposed kinds of CNNs reduce the amount of parameters in the model, but may increase the amount of memory that is required to run the model. Therefore, first a theoretical analysis will be done on the memory used by these models. This analysis will then be validated by doing a real life scenario test and the effects of memory usage on inference time will be explored. Finally, regressions models will be made to see whether it is possible to predict the inference time. The results of these tests show that decomposed CNNs require more memory and the memory required in the real-life scenarios is higher than that was expected in the theoretical analysis. For the tested systems, it is shown that memory does influence the inference time negatively. Additionally, it was found that for very small kernels some initialization bias seems to be present, which makes the inference time larger,despite the CNN having less parameters and requiring less memory. Finally, it is shown that despite this inference bias, it is possible to predict whether the use of decomposed CNNs is beneficial to use compared to a regular CNN in terms of inference time. ...

Using the primal dual KMLSVD framework to describe self-attention in 3D

The self-attention mechanisms play a crucial role in multiple applications, for example modern large language models (LLMs), but their growing adoption has led to rapidly increasing energy, water, economic, and hardware demands. This thesis examines the application of the Primal-Dual Kernel Multi-linear Singular Value Decomposition (KMLSVD) framework as introduced by Wesel and Batselier on the self-attention mechanism. The Primal-Dual KMLSVD attention framework makes three-dimensional self-attention possible enabling a more information rich representation, possibly increasing accuracy, computation time and/or a decrease in energy and hardware requirements. Furthermore, the Primal formulation does not compute the attention tensor, significantly decreasing the computational and time complexity. Therefore, Primal-Dual KMLSVD attention could play a major role in green AI applications. Three tests are performed on 10 different timeseries datasets in order to: i) find the most accurate Primal KMLSVD attention variant, ii) compare Primal to Dual KMLSVD attention and iii) compare the Primal-Dual KMLSVD attention framework to primal attention and canonical attention. The results of these tests prove that Primal-Dual KMLSVD can define self-attention in 3D but, as of writing this thesis, the current used formulations are too inefficient time wise to be a valid improvement or alternative to self-attention. Furthermore, small scale tests suggest that Primal-Dual KMLSVD might not even be required to define self attention in 3D. However, as no experiments were performed on higher-order (3D) datasets, the potential of this framework for such problems remains an open question. ...
Doctoral thesis (2025) - F. Wesel, J.W. van Wingerden, B. Hunyadi, K. Batselier
In today's data-driven landscape, the capacity to efficiently process and analyze vast datasets is crucial across various domains, including healthcare, climate modeling, and finance. Despite the growing need for scalable and interpretable machine learning models, traditional approaches, particularly kernel machines, face significant challenges due to the curse of dimensionality. As data complexity increases, the computational and memory demands of kernel machines often become prohibitive, limiting their applicability in high-dimensional applications. This thesis addresses these challenges by investigating the integration of tensor networks (TNs) with kernel machines, aiming to enhance scalability, efficiency without sacrificing predictive power and interpretability.

We propose that TNs, with their ability to represent high-dimensional data through low-rank structures, can effectively alleviate the limitations of kernel machines. Our research is structured around three central inquiries: first, we examine how TNs can accelerate kernel machine scalability while accurately approximating kernel functions; second, we elucidate the theoretical links between TN-constrained kernel machines and Gaussian processes, providing insights into convergence and generalization; finally, we introduce a novel optimization framework characterizing a specific TN, the multi-linear singular value decomposition (MLSVD), in terms of primal and dual problems. ...
Doctoral thesis (2025) - C.M. Menzen, J.W. van Wingerden, M. Kok, K. Batselier
Probabilistic or Bayesian modeling plays a fundamental role in engineering and science, providing a framework for integrating noisy measurements with predictive models through probability distributions.
While probabilistic methods have many benefits, such as recursive estimation and uncertainty quantification, they often come with substantial memory and compute requirements.
Computational challenges are particularly pronounced in large-scale settings, where data sets contain a high number of measurements, and for high-dimensional problems, which require exponentially many parameters to describe probability distributions.
These scenarios can suffer from the curse of dimensionality, which requires exponentially growing computing resources, making conventional approaches computationally intractable.

This dissertation addresses computational challenges by leveraging tensor networks (TNs) to develop computationally efficient probabilistic algorithms.
TNs, also known as tensor decompositions, extend matrix decomposition to higher dimensions by representing large multidimensional arrays, i.e., tensors, in a compact, decomposed format, defined by TN components and TN ranks.
Under the assumption of low-rank structure, TNs enable efficient storage and computation, making large-scale and high-dimensional problems more tractable, even on resource-constrained hardware such as conventional laptops.
The focus of this work is on scalable solutions for Bayesian estimation problems involving Gaussian distributions and exact inference, including recursive filtering and Gaussian process (GP) regression. ...
The training process of machine learning models for self-driving applications suffers from bottlenecks during loading and processing of LiDAR point clouds with large storage complexity.
Many studies aim to remedy this problem from an implementation perspective by developing efficient data loading and processing pipelines.
This study, on the other hand, explores an alternative approach by augmenting data representations to achieve lower storage complexity known as point cloud compression.

A broad analysis is presented on novel point cloud compression codecs using tensor decomposition methods.
Several point cloud representations and tensor decomposition methods are considered over a range of hyperparameter choices and compression values.
In order to assess the performance of the presented codecs: the compression rate, quality of the reconstruction, and time complexity is compared to the octree-based baseline model: TMC13.
Compared to the baseline model, the performance of the presented tensor decomposition-based codecs falls short. One of the presented codecs does notably outperform the others. This codec uses synthetic tensorization followed by sorting using z-location and decomposition using the TT-SVD algorithm.
Sorting by z-value isolates the ground plane, which is a dominant low-rank feature, which can effectively be decomposed using the TT-SVD algorithm yielding adequate results.

Several limitations of the presented tensor decomposition-based codecs are: the omission of bitwise compression on the factor matrices, and the trade-off between bitwise precision and truncation due to tensor decomposition.
Future work could improve in these areas along with considering the use of different heuristics and optimizing the tensor network topology. ...

The ever-increasing complexity of Artificial Intelligence (AI) models has led to environmental challenges due to high computation and energy demands. This thesis explores the application of tensor decomposition methods—CP, Tucker, and TT—to improve the energy efficiency of large Convolutional Neural Networks (CNNs) during inference by reducing energy consumption. The energy consumption of several convolution layers was measured using a watt meter across various CNN configurations and different hardware architectures (Central Processing Unit(CPU) and Graphics Processing Unit (GPU)). In addition, several regression models were fitted to estimate energy savings, incorporating memory usage. It was found that TT decomposition consistently provided the most significant energy savings across various compression ratios, influenced by CNN hyperparameters such as input/output channels, feature sizes, and kernel sizes, whereas CP decomposition was the least effective in reducing energy. The GPU implementations generally resulted in additional energy consumption, and the GPU regression models suggested a need for more complex relationships. The thesis also revealed that the efficiency of tensor decompositions might be highly dependent on the implementation details of software libraries, such as TensorlyTorch, which can significantly impact the computation and memory complexities. These findings underscore the importance of both hardware specific considerations and careful software implementation in achieving energy efficient CNNs, providing a foundation for further research in energy-constrained environments. ...

Exploring the effect of the CPD in unsupervised transfer learning methods for cardiac arrhythmia detection

Master thesis (2024) - F. Hogenbosch, K. Batselier
This thesis studies the application of the Canonical Polyadic Decomposition (CPD) in unsupervised transfer learning methods for cardiac arrhythmia detection. Unsupervised learning methods have become more prevalent in the healthcare sector due to the abundance of unlabeled data. Labeling of medical data is often non-trivial as it is labor-intensive and requires expert knowledge. Transfer learning can utilize the large number of unlabeled data by extracting relevant features, which can in turn be used for a smaller supervised learning part. Furthermore, in the medical field, AI models are often deployed on embedded systems, requiring efficient model architectures while maintaining high diagnostic performance.

The unsupervised transfer learning method was designed with an autoencoder for the unsupervised part and a linear network for the supervised part. The first experiment explored four models to select the most optimal architecture to apply the CPD to. These models include an autoencoder adaptation of the ResNet and ConvNeXt models, the U-Net autoencoder and a basic implementation of a convolutional autoencoder. In the second experiment, the autoencoder model is decomposed using the CPD and evaluated at various compression ratios on its reconstruction capabilities, classification accuracy and computational performance. The CPD implementation is also tested on its convergence speed and data efficiency as compared to its uncompressed counterpart.

The first experiment found that the basic implementation of a convolutional autoencoder performed best overall. The U-Net model had high reconstruction quality, however lacked the predictive accuracy. The ResNet model was found to have slightly worse reconstruction and prediction capabilities while having a larger parameter count. The ConvNeXt model failed to accurately reconstruct the images.

The second experiment showed the CP-decomposed model approached the uncompressed model in terms of predictive capabilities, while having lower reconstruction qualities. This is likely due to the regularization effect of the CPD, suggesting significant redundancy in the uncompressed model. Despite the reduction in forward pass FLOPs for the CP-decomposed model, it was found that both the computational complexity of the backpropagation process was higher than the uncompressed model at the lower compression ratios and that the memory allocation suffered a significant increase. This resulted in longer and less efficient training of the CP-decomposed models. It was furthermore found that the CP-decomposed models converged faster and had higher data efficiency as compared to the uncompressed model. ...
Master thesis (2024) - M.P. van Dijk, K. Batselier, S.J.S. de Rooij
Algorithms which can effectively detect epileptic seizures have the potential to improve current treatment methods for people who suffer from epilepsy. The current state-of-the-art methods use neural networks, which are able to learn directly from the electroencephalogram (EEG) data without feature extraction. However, neural networks have drawbacks—they are time and data intensive to train and require a high number of parameters for efficient classification. This thesis proposes a novel approach to the epileptic seizure detection problem using Support Tensor Machines (STMs). The final models learn directly from EEG data and use far fewer model parameters compared to a state-of-the-art model using a convolutional neural network. Three types of experiments have been conducted using different representations for the EEG data. The STMs used in the experiments are the Support Higher-Order Tensor Machine, Dual Structure-preserving Kernel (DuSK), and Tensor Train Multi-way Multi-level Kernel (TT-MMK). The results show that by using TT-MMK (with a tensorized data representation) for leave-one-seizure-out validation and DuSK (with the original data representation) for leave-one-patient-out validation, the models are able to rival a state-of-the-art epileptic seizure detector. ...

Low complexity methods for large scale sensing

Master thesis (2024) - A.A. Bevelander, N.J. Myers, K. Batselier
Compressed sensing is a framework in signal processing that enables the efficient acquisition and reconstruction of sparse signals. A widely-used class of algorithms that are used for this reconstruction, called greedy-algorithms, depend on non-convex optimization. With increasing signal size, these problems become computationally very hard. Block compressed sensing is a framework in compressed sensing that divides the compressed sensing problem into sub-problems, gaining a better storage complexity. However, block compressed sensing has not yet been studied form a computational complexity perspective.

This thesis focuses on the application of block compressed sensing to signals of high dimension to gain insight into the relation between reconstruction performance and computational complexity. This is done by, first investigating how theoretical reconstruction guarantees change, when the problem is divided into smaller sub-problems and by doing a complexity analysis of the reconstruction itself. Each sub-problem solves for a portion of a signal, defined as a block. Next, experiments are conducted in order to get insight into the trade-off between computational complexity and quality of the reconstruction. It can be found that, by using this block-wise approach, the computational complexity of the reconstruction problem decreases, but at the same time, quality of the reconstruction deteriorates. Besides, a method to compensate for this performance loss is proposed. The key idea of this method is that, by propagating prior information among the different blocks, the reconstructions of the blocks can be improved. Finally, block compressed sensing and prior-aware block compressed sensing are analysed in a higher order tensor compressed sensing setting. Nevertheless, this setting was found to exhibit a less favourable complexity-performance trade-off than the normal one, as this setting resulted in both a more complex and a less accurate reconstruction than the normal one. ...
Master thesis (2023) - R.H.W. Smeenk, K. Batselier, D. Boskos, F. Wesel
This research aims at quantifying the uncertainty in the predictions of tensor network constrained kernel machines, focusing on the Canonical Polyadic Decomposition (CPD) constrained kernel machine. Constraining the parameters in the kernel machine optimization problem to be a CPD results in a linear computational complexity in the number of features, whereas the original problem suffers heavily from the curse of dimensionality as the number of parameters scale exponentially. By employing a product feature map with polynomial features, the original data input is transformed to a higher-dimensional space.

Three different methods are investigated for quantifying the uncertainty of the predictions of the CPD constrained kernel machine. Firstly, the delta method is proposed which is a frequentist approach that linearizes a nonlinear parametric model around the estimated model. By estimating the covariance of the model parameters, the delta method can estimate the uncertainty in the model predictions based on the estimated parameter uncertainties. The delta method is compared to two other methods that are able to reflect the prediction uncertainty: the Bayesian method and Single Bayesian Core (SBC) method. The Bayesian method treats the parameters in the factor matrices of the CPD as probability distributions rather than single values and the SBC method incorporates both frequentist and Bayesian aspects. A comparison between the three different methods is performed based on an assessment on the correctness and informativeness of the uncertainty measures of prediction intervals and confidence Intervals.

It was found by regression and classification experiments that all three methods can provide valuable uncertainty quantification measures in terms of correctness and informativeness for the CPD constrained kernel machine. However, the Bayesian method provides in general more conservative uncertainty intervals compared to the delta and SBC method. A major drawback of the Bayesian method is its lack of scalability as the size of the mean and covariance, constructed by the unscented transform in the Bayesian method, scale exponentially. Furthermore, the delta and SBC method produce high quality uncertainty intervals and the methods provide remarkably similar uncertainty quantification on the prediction error variance. ...
Master thesis (2022) - A.P. van Koppen, K. Batselier, C.M. Menzen
Streaming video completion is the practice that aims to fill in missing or corrupted pixels in a video stream by using past uncorrupted data. A method to tackle this problem is recently introduced called a Tensor Networked Kalman Filter (TNKF). It shows promising results in terms of performance compared to state-of-the-art methods for high percentages of missing pixels (≥ 95%). The main drawback of using a TNKF is the computational speed, which needs to be improved to compete with other existing methods and to be carried out in real-time by a regular computer. This work discusses three methods that reduce the computational load of the algorithm, which speeds up computations. The first method is replacing the existing algorithm with a Block Update TNKF. Secondly, the use of randomized rounding instead of deterministic rounding is investigated. The last method is the simplification of the TNKF update. Results that are presented in this report show that significant speedups of up to +132% can be achieved. In most situations, the considered speedup methods compromise the reconstruction’s accuracy. This thesis discusses the effects this has on the quality of the reconstruction. ...
Master thesis (2022) - Z. LI, K. Batselier
This thesis applies the Gauss-Newton optimizer to estimate the parameter values of the Volterra-PARAFAC model by minimizing a nonlinear least square cost (NLS) function given the input and output measurements of the MISO Volterra system. ...

Through implicit cumulant tensor manipulation

Master thesis (2022) - P. Denarié, K. Batselier, B. Hunyadi
Blind Source Separation (BSS), the separation of latent source components from observed mixtures, is relevant to many fields of expertise such as neuro-imaging, economics and machine learning. Reliable estimates of the sources can be obtained through diagonalization of the cumulant tensor, i.e., a fourth-order symmetric multi-linear array containing the cross-kurtosis values of observed mixtures. The downside of such diagonalization methods is that they scale quartically with the increase of the amount of source components to estimate due to the tensor’s quartic size. Tensor decomposition can simultaneously diagonalize the cumulant tensor and address its size. However, it does not resolve the scalability issue due to the restriction of having to first explicitly compute the tensor.

It is studied how decomposing the cumulant tensor in implicit fashion can be used to solve the BSS problem while simultaneously addressing its scalablity issue. A class of implicit cumulant tensor decomposition algorithms is derived which scale more favorably than their explicit counterparts in terms of either computational cost, storage cost or both. Firstly, a novel QR-Tensor algorithm (QRT) is introduced which allows for the simultaneous diagonalization of a tensor’s outer-slices. It is theoretically shown how an implicit version of the QRT algorithm can be used to solve the BSS problem at a linearly scaling computational cost. Secondly, a fixed-point Canonical Polyadic Decomposition (CPD) iteration method is presented. It reduces the computational complexity from a quartic dependence to a linear dependence on the amount of signals to estimate. The source estimation performance of the devised implicit decomposition methods is compared to that of the state-of-the-art FastICA for an artificial linear BSS problem.

Results show that both fixed-point CPD and QRT are superior to FastICA when it comes to the computation time needed to reach convergence, while producing estimated sources of similar quality. It is shown that when the amount of sources to estimate is increased both QRT and FastICA struggle to converge. In contrast, the fixed-point CPD method converges within a consistent amount of iterations, suggesting a method more suitable for the estimation of a large amount of sources. ...
Natural Language Processing (NLP) deals with understanding and processing human text by any computer software. There are several network architectures in the fields of deep learning and artificial intelligence that are used for NLP. Deep learning techniques like recurrent neural networks and feed-forward neural networks are used to develop language models that perform several NLP tasks. Over the years, researchers have worked on developing state-of-the-art language models that achieve high accuracy and performance for NLP applications. With the development of deep neural network language models, the computational resources requirements and the energy costs for training and running language models increased. This led to research to compress the language models, thereby reducing the computational complexity of the language models. One of the methods used for this is tensor decomposition, like the tensor-train (TT) decomposition. During this thesis work, the application of the TT-decomposition method for compressing the embedding layer in a long-short-term memory model was investigated. Specifically, the effect of factorization and the order of factors in the embedding layer when it is represented in the TT-matrix format on the maximum test accuracy of the long- short term memory model for the NLP task of sentiment analysis was investigated. This was done by considering three different factorizations of the embedding layer in the model. Further, the effect of change in TT-ranks (hyperparameters of the model when the embedding layer is represented in the TT-matrix format) on the maximum test accuracy was also investigated. Based on the investigation and empirical results obtained, this thesis concludes that by having a larger number of factors in the factorization of the embedding layer, the maximum test accuracy of the model increases. Further, in a particular factorization, when the factors were arranged in such a way that the maximum values of the TT-ranks had a smaller gap, the maximum test accuracy of the model improved. In one particular configuration of the model, the number of parameters was reduced by 24.5 times compared to that of the original uncompressed model, and a maximum test accuracy of 77.10% was achieved compared to a maximum test accuracy of 78.05% in the case of the original model. ...
Master thesis (2022) - R. Chandrashekar, K. Batselier, E. Isufi, E.M. Memmel
Humans make decisions when presented with choices based on influences. The Internet today presents people with abundant choices to choose from. Recommending choices with an emphasis on people's preferences has become increasingly sought. Grundy (1979), the first computer librarian Recommender System (RS), provided users with book recommendations. Growing volumes of user data in the '90s saw increased usage of commercially available RS for e-commerce, music, movies, books, and social networking services. Due to their effectiveness in providing recommendations, Collaborative Filtering (CF) algorithms are predominantly used to build these RS. However, traditional CF algorithms adopting Matrix Factorization (MF) and Nearest Neighbor (NN) methods suffer from handling sparse data or model scalability. With exponentially increasing sparse data, building scalable and accurate RS models is of focus.

This thesis uses tensors and graphs to represent available data. Emphasis is given to capturing higher-order interactions present between the data. The use of tensors is motivated as matrices cannot capture data with higher-order relations, such as variation of user ratings to items with time. The transition to using tensors has been promising with the development of efficient tensor decomposition methods and powerful machines. Graphs can capture the correlation between different entities, providing additional information intrinsic to the underlying graph structure. A Graph Regularized CANDECOMP/PARAFAC (GRCP) tensor decomposition model framework is proposed in this thesis. The thesis highlights how to graph Laplacian regularizers (GLRs) benefit CP tensor decomposition methods to build RS. The model framework is evaluated with the MovieLens data set. The model records lesser Normalized Mean Squared Error (NMSE) values than those reported in the literature. The combination of varied data sources notably aids in overcoming the drawbacks of current RS models, offering scalability with computational efficiency in linear time. ...
This thesis studies the Canonical Polyadic Decomposition (CPD) constrained kernel machine for large scale learning, i.e. learning with a large number of samples. The kernel machine optimization problem is solved in the primal space, such that the complexity of the problem scales linearly in the number of samples as opposed to scaling cubically in the dual space. Product feature maps are applied to transform the input data. The weights are constrained to be a CPD, so the number of weights scales linearly in the number of features. The CPD introduces a nonlinearity, so nonlinear optimization must be applied.
It is studied in which situation it is more advantageous to apply iterative all-at-once opti- mization compared to Alternating Least Squares (ALS) to solve the CPD constrained kernel machine problem. Specifically, all-at-once gradient descent methods are studied. An efficient analytical algorithm for the all-at-once gradient is derived. Furthermore, it is shown that automatic differentiation (AD) can also be applied, but it is found to be slower than the analytical method.
The selection of a step size is found to be challenging. It is shown that the magnitude of the gradient of the mean squared error (MSE) term decreases for an increasing number of features. As a result, selecting the step size becomes more difficult for more features. To overcome this, the Line search and the Adam method are studied. A general expression for the exact line search solution is derived. It can be applied to compute the optimal step size for any step direction and any number of features. However, the Adam method performs better in terms of loss after training, convergence and the training run time. The mini-batch Adam method is used to evaluate the performance of all-at-once optimization for large scale learning.
It is found that the Adam method no longer performs well for data sets with around 16 features or more, likely due to the decrease in the magnitude of the gradient of the MSE term. On large scale data sets with fewer features, the Adam method outperforms ALS in terms of run time until convergence while achieving similar training and validation losses. The Adam method reached convergence on a data set with 11 million samples within ten minutes. Furthermore, it is shown that the scaling of the run time of the Adam method in terms of the feature map order and the CP-rank is more than an order lower than the scaling of ALS when the methods are run on a GPU. This makes to Adam method more suitable for more complex models. ...