K. Batselier | TU Delft Repository

Tensor network square root Kalman filter for online Gaussian process regression

Journal article (2026) - Clara Menzen, Manon Kok, Kim Batselier

The state-of-the-art tensor network Kalman filter lifts the curse of dimensionality for high-dimensional recursive estimation problems. However, the required rounding operation can cause filter divergence due to the loss of positive definiteness of covariance matrices. We solve this issue by developing, for the first time, a tensor network square root Kalman filter, and apply it to high-dimensional online Gaussian process regression. In our experiments, we demonstrate that our method is equivalent to the conventional Kalman filter when choosing a full-rank tensor network. Furthermore, we apply our method to a real-life system identification problem where we estimate 4¹⁴ parameters on a standard laptop. The estimated model outperforms the state-of-the-art tensor network Kalman filter in terms of prediction accuracy and uncertainty quantification. ...

A Divide-and-conquer Approach for Sparse Recovery in High Dimensions

Conference paper (2025) - Aron Bevelander, Kim Batselier, Nitin Jonathan Myers

Block compressed sensing (BCS) alleviates the high storage and memory complexity with standard CS by dividing the sparse recovery problem into sub-problems. This paper presents a Welch bound-based guarantee on the reconstruction error with BCS, revealing that sparse recovery deteriorates with more partitions. To address this performance loss, we propose a data-driven BCS technique that leverages correlation across signal partitions. Our method surpasses classical BCS in moderate SNR regimes, with a modest increase in storage and computational complexities. ...

Tensor Network-Constrained Kernel Machines as Gaussian Processes

Journal article (2025) - Frederiek Wesel, Kim Batselier

In this paper we establish a new connection between Tensor Network (tn)-constrained kernel machines and Gaussian Processes (gps). We prove that the outputs of Canonical Polyadic Decomposition (cpd) and Tensor Train (tt)-constrained kernel machines converge in the limit of large ranks to the same product kernel gp which we fully characterize, when specifying appropriate i.i.d. priors across their components. We show that tt-constrained models convergence faster to the gp compared to their cpd counterparts for the same number of model parameters. The convergence to the gp occurs as the ranks tend to infinity, as opposed to the standard approach which introduces tns as an additional constraint on the posterior. This implies that the newly established priors allow the models to learn features more freely as they necessitate infinitely more parameters to converge to a gp, which is characterized by a fixed learning representation and thus no feature learning. As a consequence, the newly derived priors yield more flexible models which can better fit the data, albeit at increased risk of overfitting. We demonstrate these considerations by means of two numerical experiments. ...

Tensor Network Based Feature Learning Model

Journal article (2025) - Albert Saiapin, Kim Batselier

Many approximations were suggested to circumvent the cubic complexity of kernel-based algorithms, allowing their application to large-scale datasets. One strategy is to consider the primal formulation of the learning problem by mapping the data to a higher-dimensional space using tensor-product structured polynomial and Fourier features. The curse of dimensionality due to these tensor-product features was effectively solved by a tensor network reparameterization of the model parameters. However, another important aspect of model training - identifying optimal feature hyperparameters - has not been addressed and is typically handled using the standard cross-validation approach. In this paper, we introduce the Feature Learning (FL) model, which addresses this issue by representing tensor-product features as a learnable Canonical Polyadic Decomposition (CPD). By leveraging this CPD structure, we efficiently learn the hyperparameters associated with different features alongside the model parameters using an Alternating Least Squares (ALS) optimization method. We prove the effectiveness of the FL model through experiments on real data of various dimensionality and scale. The results show that the FL model can be consistently trained 3-5 times faster than and have the prediction quality on par with a standard cross-validated model. ...

Low-dimensional Subspace Regularization through Structured Tensor Priors

Journal article (2025) - Kim Batselier

Specifying a prior distribution is an essential part of solving Bayesian inverse problems. The prior encodes a belief on the nature of the solution and this regularizes the problem. In this article we completely characterize a Gaussian prior that encodes the belief that the solution is a structured tensor that lies in a low-dimensional subspace. We define the notion of (A, b)-constrained tensors and show that they describe a large variety of different structures such as Hankel, circulant, triangular, symmetric, and so on. We prove that the low-dimensional subspace defined by this prior is the right nullspace of the matrix A that defines the tensor structure. We completely characterize the Gaussian probability distribution of such tensors by specifying its mean vector and covariance matrix in terms of A and b. Furthermore, explicit expressions are proved for the covariance matrix of tensors whose entries are invariant under a permutation. These results unlock a whole new class of priors for Bayesian inverse problems. We illustrate how new kernel functions can be designed and efficiently computed and apply our results on two particular Bayesian inverse problems: completing a Hankel matrix from a few noisy measurements and learning an image classifier of handwritten digits. The effectiveness of the proposed priors is demonstrated for both problems. All applications have been implemented as reactive Pluto notebooks in Julia. ...

Quantized Fourier and Polynomial Features for more Expressive Tensor Network Models

Journal article (2024) - F. Wesel, K. Batselier

In the context of kernel machines, polynomial and Fourier features are commonly used to provide a nonlinear extension to linear models by mapping the data to a higher-dimensional space. Unless one considers the dual formulation of the learning problem, which renders exact large-scale learning unfeasible, the exponential increase of model parameters in the dimensionality of the data caused by their tensor-product structure prohibits to tackle high-dimensional problems. One of the possible approaches to circumvent this exponential scaling is to exploit the tensor structure present in the features by constraining the model weights to be an underparametrized tensor network. In this paper we quantize, i.e. further tensorize, polynomial and Fourier features. Based on this feature quantization we propose to quantize the associated model weights, yielding quantized models. We show that, for the same number of model parameters, the resulting quantized models have a higher bound on the VC-dimension as opposed to their non-quantized counterparts, at no additional computational cost while learning from identical features. We verify experimentally how this additional tensorization regularizes the learning problem by prioritizing the most salient features in the data and how it provides models with increased generalization capabilities. We finally benchmark our approach on large regression task, achieving state-of-the-art results on a laptop computer. ...

Position: Tensor Networks are a Valuable Asset for Green AI

Journal article (2024) - Eva Memmel, Clara Menzen, Jetze Schuurmans, Frederiek Wesel, Kim Batselier

For the first time, this position paper introduces a fundamental link between tensor networks (TNs) and Green AI, highlighting their synergistic potential to enhance both the inclusivity and sustainability of AI research. We argue that TNs are valuable for Green AI due to their strong mathematical backbone and inherent logarithmic compression potential. We undertake a comprehensive review of the ongoing discussions on Green AI, emphasizing the importance of sustainability and inclusivity in AI research to demonstrate the significance of establishing the link between Green AI and TNs. To support our position, we first provide a comprehensive overview of efficiency metrics proposed in Green AI literature and then evaluate examples of TNs in the fields of kernel machines and deep learning using the proposed efficiency metrics. This position paper aims to incentivize meaningful, constructive discussions by bridging fundamental principles of Green AI and TNs. We advocate for researchers to seriously evaluate the integration of TNs into their research projects, and in alignment with the link established in this paper, we support prior calls encouraging researchers to treat Green AI principles as a research priority. ...

Enabling Large-Scale Probabilistic Seizure Detection with a Tensor-Network Kalman Filter for LS-SVM

Conference paper (2023) - S. J.S. De Rooij, K. Batselier, B. Hunyadi

Recent advancements in wearable EEG devices have highlighted the importance of accurate seizure detection algorithms, yet the ever-increasing size of the generated datasets poses a significant challenge to existing seizure detection methods based on kernel machines. Typically, this problem is mitigated by significantly undersampling the majority class, but in practice, these methods tend to suffer from too many false alarms. Recent works have proposed tensor networks to enable large-scale classification with kernel machines. In this paper, we explore the use of a probabilistic tensor method, the tensor-network Kalman filter for LS-SVMs (TNKF-LSSVM), for seizure detection, as we hypothesize that using more data will improve the detection performance. We show that the TNKF-LSSVM performs comparably to a regular LSSVM in detecting seizures when both are trained on the same dataset. Additionally, the TNKF-LSSVM can provide meaningful uncertainty quantification, and it is able to handle large-scale datasets beyond the capabilities of the LS-SVM (i.e., $N \gt 10 ^{5})$. However, for the presented model configuration detection performance does not seem to improve with more input data. ...

Bayesian Framework for a MIMO Volterra Tensor Network

Journal article (2023) - Eva Memmel, Clara Menzen, Kim Batselier

This paper proposes a Bayesian Volterra tensor network (TN) to solve high-order discrete nonlinear multiple-input multiple-output (MIMO) Volterra system identification problems. Using a low-rank tensor network to compress all Volterra kernels at once, we avoid the exponential growth of monomials with respect to the order of the Volterra kernel. Our contribution is to introduce a Bayesian framework for the low-rank Volterra TN. Compared to the least squares solution for Volterra TNs, we include prior assumptions explicitly in the model. In particular, we show for the first time how a zero-mean prior with diagonal covariance matrix corresponds to implementing a Tikhonov regularization for the MIMO Volterra TN. Furthermore, adopting a Bayesian viewpoint enables simulations with Bayesian uncertainty bounds based on noise and prior assumptions. In addition, we demonstrate via numerical experiments how Tikhonov regularization prevents overfitting in the case of higher-rank TNs. ...

Projecting basis functions with tensor networks for Gaussian process regression

Journal article (2023) - Clara Menzen, Eva Memmel, Kim Batselier, Manon Kok

This paper presents a method for approximate Gaussian process (GP) regression with tensor networks (TNs). A parametric approximation of a GP uses a linear combination of basis functions, where the accuracy of the approximation depends on the total number of basis functions M. We develop an approach that allows us to use an exponential amount of basis functions without the corresponding exponential computational complexity. The key idea to enable this is using low-rank TNs. We first find a suitable low-dimensional subspace from the data, described by a low-rank TN. In this low-dimensional subspace, we then infer the weights of our model by solving a Bayesian inference problem. Finally, we project the resulting weights back to the original space to make GP predictions. The benefit of our approach comes from the projection to a smaller subspace: It modifies the shape of the basis functions in a way that it sees fit based on the given data, and it allows for efficient computations in the smaller subspace. In an experiment with an 18-dimensional benchmark data set, we show the applicability of our method to an inverse dynamics problem. ...

A Khatri-Rao product tensor network for efficient symmetric MIMO Volterra identification

Journal article (2023) - Kim Batselier

The identification of symmetric tensor network MIMO Volterra models has been studied earlier via the computation of a Moore-Penrose pseudoinverse in tensor network form. The current state of the art requires the construction of a tensor network of a repeated Khatri-Rao product of a matrix with itself. This construction has a computational complexity that is dominated by one singular value decomposition (SVD) of an RI × IN matrix, where N is the number of measurements, I depends linearly on the number of inputs and input lags and R is the maximal tensor network rank. In this article, we prove an alternative method for constructing this tensor network without any computation whatsoever. The pseudoinverse can then be computed through an orthogonalization of the newly proposed tensor network. Furthermore, the proposed algorithm allows for the recursive identification of symmetric Volterra models of increasing degree D, which reduces the computation to one SVD of a RI × N matrix per step. Through numerical experiments we demonstrate how the proposed algorithm enables up to ten times faster identification of symmetric tensor network MIMO Volterra systems. ...

Tensor-based Kernel Machines with Structured Inducing Points for Large and High-Dimensional Data

Journal article (2023) - Frederiek Wesel, Kim Batselier

Kernel machines are one of the most studied family of methods in machine learning. In the exact setting, training requires to instantiate the kernel matrix, thereby prohibiting their application to large-sampled data. One popular kernel approximation strategy which allows to tackle large-sampled data consists in interpolating product kernels on a set of grid-structured inducing points. However, since the number of model parameters increases exponentially with the dimensionality of the data, these methods are limited to small-dimensional datasets. In this work we lift this limitation entirely by placing inducing points on a grid and constraining the primal weights to be a low-rank Canonical Polyadic Decomposition. We derive a block coordinate descent algorithm that efficiently exploits grid-structured inducing points. The computational complexity of the algorithm scales linearly both in the number of samples and in the dimensionality of the data for any product kernel. We demonstrate the performance of our algorithm on large-scale and high-dimensional data, achieving state-of-the art results on a laptop computer. Our results show that grid-structured approaches can work in higher-dimensional problems. ...

Alternating linear scheme in a bayesian framework for low-rank tensor approximation

Journal article (2022) - Clara Menzen, Manon Kok, Kim Batselier

Multiway data often naturally occurs in a tensorial format which can be approximately represented by a low-rank tensor decomposition. This is useful because complexity can be significantly reduced and the treatment of large-scale data sets can be facilitated. In this paper, we find a low-rank representation for a given tensor by solving a Bayesian inference problem. This is achieved by dividing the overall inference problem into subproblems where we sequentially infer the posterior distribution of one tensor decomposition component at a time. This leads to a probabilistic interpretation of the well-known iterative algorithm alternating linear scheme (ALS). In this way, the consideration of measurement noise is enabled, as well as the incorporation of application-specific prior knowledge and the uncertainty quantification of the low-rank tensor estimate. To compute the low-rank tensor estimate from the posterior distributions of the tensor decomposition components, we present an algorithm that performs the unscented transform in tensor train format. ...

Faster tensor train decomposition for sparse data

Journal article (2022) - Lingjie Li, Wenjian Yu, Kim Batselier

In recent years, the application of tensors has become more widespread in fields that involve data analytics and numerical computation. Due to the explosive growth of data, low-rank tensor decompositions have become a powerful tool to harness the notorious curse of dimensionality. The main forms of tensor decomposition include CP decomposition, Tucker decomposition, tensor train (TT) decomposition, etc. Each of the existing TT decomposition algorithms, including the TT-SVD and randomized TT-SVD, is successful in the field, but neither can both accurately and efficiently decompose large-scale sparse tensors. Based on previous research, this paper proposes a new quasi-optimal fast TT decomposition algorithm for large-scale sparse tensors with proven correctness and the upper bound of computational complexity derived. It can also efficiently produce sparse TT with no numerical error and slightly larger TT-ranks on demand. In numerical experiments, we verify that the proposed algorithm can decompose sparse tensors in a much faster speed than the TT-SVD, and have advantages on speed, precision and versatility over the randomized TT-SVD and TT-cross. And, with it we can realize large-scale sparse matrix TT decomposition that was previously unachievable, enabling the tensor decomposition based algorithms to be applied in more scenarios. ...

Tensor network algorithms for image classification

Book chapter (2022) - Cong Chen, Kim Batselier, Ngai Wong

For many real-world image classification tasks, collecting high-quality labeled image data is challenging. Therefore, a complicated convolutional neural network might not be able to get well trained and traditional machine learning methods would be a better choice. However, traditional vector-based machine learning algorithms cannot achieve a satisfactory performance when dealing with high-dimensional tensorial data. There are mainly two reasons. First, vectorizing tensor data loses useful structural information in the original data, which might be helpful in the classification task. Second, traditional vector-based methods commonly contain a similar number of model parameters as the data size. In this case, when the data dimension is relatively high and the number of training samples is small, an overfitted model would be derived. To address these issues, researchers extend the vector-based classifiers into their tensorial formats, which accept tensorial data as input directly, and at the same time employ much fewer model parameters. In this chapter, two traditional vector-based machine learning algorithms, namely, support vector machine and logistic regression, are generalized to their tensorial counterparts to facilitate the tensor-based classification tasks. ...

Kernelized support tensor train machines

Journal article (2022) - Cong Chen, Kim Batselier, Wenjian Yu, Ngai Wong

Tensor, a multi-dimensional data structure, has been exploited recently in the machine learning community. Traditional machine learning approaches are vector- or matrix-based, and cannot handle tensorial data directly. In this paper, we propose a tensor train (TT)-based kernel technique for the first time, and apply it to the conventional support vector machine (SVM) for high-dimensional image classification with very small number of training samples. Specifically, we propose a kernelized support tensor train machine that accepts tensorial input and preserves the intrinsic kernel property. The main contributions are threefold. First, we propose a TT-based feature mapping procedure that maintains the TT structure in the feature space. Second, we demonstrate two ways to construct the TT-based kernel function while considering consistency with the TT inner product and preservation of information. Third, we show that it is possible to apply different kernel functions on different data modes. In principle, our method tensorizes the standard SVM on its input structure and kernel mapping scheme. This reduces the storage and computation complexity of kernel matrix construction from exponential to polynomial. The validity proof and computation complexity of the proposed TT-based kernel functions are provided elaborately. Extensive experiments are performed on high-dimensional fMRI and color images datasets, which demonstrates the superiority of the proposed scheme compared with the state-of-the-art techniques. ...

Low-Rank Tensor Decompositions for Nonlinear System Identification

A Tutorial with Examples

Journal article (2022) - Kim Batselier

Nonlinear parametric system identification is the estimation of nonlinear models of dynamical systems from measured data. Nonlinear models are parameterized, and it is exactly these parameters that must be estimated. Extending familiar linear models to their nonlinear counterparts quickly leads to practical problems. For example, the generalization of a multivariate linear function to a multivariate polynomial implies that the number of parameters grows exponentially with the total degree of the polynomial. This exponential explosion of model parameters is an instance of the so-called curse of dimensionality. Both the storage and computational complexities are limiting factors in the development of system identification methods for such models. ...

MERACLE

Constructive Layer-Wise Conversion of a Tensor Train into a MERA

Journal article (2021) - Kim Batselier, Andrzej Cichocki, Ngai Wong

In this article, two new algorithms are presented that convert a given data tensor train into either a Tucker decomposition with orthogonal matrix factors or a multi-scale entanglement renormalization ansatz (MERA). The Tucker core tensor is never explicitly computed but stored as a tensor train instead, resulting in both computationally and storage efficient algorithms. Both the multilinear Tucker-ranks as well as the MERA-ranks are automatically determined by the algorithm for a given upper bound on the relative approximation error. In addition, an iterative algorithm with low computational complexity based on solving an orthogonal Procrustes problem is proposed for the first time to retrieve optimal rank-lowering disentangler tensors, which are a crucial component in the construction of a low-rank MERA. Numerical experiments demonstrate the effectiveness of the proposed algorithms together with the potential storage benefit of a low-rank MERA over a tensor train. ...

Enforcing symmetry in tensor network MIMO Volterra identification

Journal article (2021) - Kim Batselier

The estimation of an exponential number of model parameters in a truncated Volterra model can be circumvented by using a low-rank tensor decomposition approach. This low-rank property of the tensor decomposition can be interpreted as the assumption that all Volterra parameters are structured. In this article, we investigate whether it is possible to explicitly enforce symmetry of the Volterra kernels to the low-rank tensor decomposition. We show that low-rank symmetric Volterra identification is an ill-conditioned problem as the low-rank property of the exact symmetric kernels cannot be upheld in the presence of measurement noise. Furthermore, an algorithm is derived to compute the symmetric Volterra kernels directly in tensor network form. ...

Fast and Accurate Tensor Completion with Total Variation Regularized Tensor Trains

Journal article (2020) - Ching Yun Ko, Kim Batselier, Luca Daniel, Wenjian Yu, Ngai Wong

We propose a new tensor completion method based on tensor trains. The to-be-completed tensor is modeled as a low-rank tensor train, where we use the known tensor entries and their coordinates to update the tensor train. A novel tensor train initialization procedure is proposed specifically for image and video completion, which is demonstrated to ensure fast convergence of the completion algorithm. The tensor train framework is also shown to easily accommodate Total Variation and Tikhonov regularization due to their low-rank tensor train representations. Image and video inpainting experiments verify the superiority of the proposed scheme in terms of both speed and scalability, where a speedup of up to 155\times is observed compared to state-of-the-art tensor completion methods at a similar accuracy. Moreover, we demonstrate the proposed scheme is especially advantageous over existing algorithms when only tiny portions (say, 1%) of the to-be-completed images/videos are known. ...