Hidden Markov Models (HMMs) are probabilistic models that are widely used in various fields, including machine learning, economics, information theory, neuroimaging, and more. In particular, they are frequently employed in functional Magnetic Resonance Imaging (fMRI) studies, which involve large datasets, to analyze the behavior of brain networks or states. For example, the brain-state dynamics inferred by an HMM can help distinguish healthy subjects from patients.
HMMs represent the probability of transitions between the states of a system that evolves as a discrete Markov chain and is not directly observable. They also describe the probability of observing specific measurements given the current state of the system. These probabilities are characterized by a state transition matrix (T), an emission matrix (O), and an initial state distribution (π).
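As a concrete illustration of these three objects, the sketch below builds a toy discrete HMM in NumPy (the parameter values are invented for illustration, not taken from any dataset) and samples a hidden state path together with its observation sequence:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 2-state, 3-symbol HMM (illustrative values only).
T = np.array([[0.9, 0.1],       # T[i, j] = P(next state j | current state i)
              [0.2, 0.8]])
O = np.array([[0.7, 0.2, 0.1],  # O[i, k] = P(observe symbol k | state i)
              [0.1, 0.3, 0.6]])
pi = np.array([0.5, 0.5])       # initial state distribution

def sample_hmm(T, O, pi, length, rng):
    """Draw a hidden state path and the matching observation sequence."""
    n_states, n_symbols = O.shape
    states, obs = [], []
    s = rng.choice(n_states, p=pi)           # draw the initial state from pi
    for _ in range(length):
        states.append(s)
        obs.append(rng.choice(n_symbols, p=O[s]))  # emit a symbol from row s of O
        s = rng.choice(n_states, p=T[s])           # step the Markov chain via T
    return np.array(states), np.array(obs)

states, obs = sample_hmm(T, O, pi, 1000, rng)
```

Only `obs` would be available to a learning algorithm; `states` is the hidden path that T, O, and π must be recovered without.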
The current methods for fitting HMMs to data primarily utilize the Baum-Welch algorithm, which is a specific type of Expectation Maximization (EM). One of the advantages of Baum-Welch is its ability to be extended to continuous observations or multivariate data. However, the algorithm requires a full forward-backward pass over the data on every iteration, which makes it increasingly slow as the size of the dataset grows. Moreover, like other EM methods, it can easily become trapped in local optima of the likelihood.
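To see where the per-iteration cost comes from, here is a minimal scaled forward pass in NumPy (a generic textbook sketch, not any particular implementation); Baum-Welch runs this forward pass plus a matching backward pass on every EM iteration:

```python
import numpy as np

def forward(T, O, pi, obs):
    """Scaled forward pass over a discrete observation sequence.

    Returns the log-likelihood of `obs`. Each call costs O(L * K^2) for L
    observations and K states, and Baum-Welch repeats a forward *and* a
    backward pass like this on every EM iteration -- which is why it slows
    down on long sequences."""
    alpha = pi * O[:, obs[0]]           # unnormalized filtering distribution
    c = alpha.sum()
    alpha = alpha / c                   # rescale to avoid numerical underflow
    loglik = np.log(c)
    for x in obs[1:]:
        alpha = (alpha @ T) * O[:, x]   # propagate through T, weight by emission
        c = alpha.sum()
        alpha = alpha / c
        loglik += np.log(c)
    return loglik

# Toy 2-state model (illustrative values only).
T = np.array([[0.9, 0.1], [0.2, 0.8]])
O = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.5, 0.5])
loglik = forward(T, O, pi, np.array([0, 2, 1]))
```

The scaling constants `c` both prevent underflow on long sequences and accumulate into the exact log-likelihood.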
An alternative approach is to use Canonical Polyadic Decomposition (CPD) to decompose a Joint Probability Tensor (JPT). This decomposition extracts factor matrices from which the HMM matrices (T, O, π) can be calculated. Compared to the Baum-Welch algorithm, CPD and other JPT decomposition methods tend to be faster. However, they are sensitive to sampling noise and can become unstable when the data do not fully capture the statistical behavior of the underlying HMM. Additionally, methods for JPT decomposition in HMM learning have not yet been extended to multivariate datasets or continuous settings.
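The following is a generic alternating-least-squares (ALS) sketch of CPD for a 3-way tensor in NumPy. It is illustrative only: in the HMM setting the tensor being decomposed would be the empirical JPT of consecutive observations (a nonnegative probability tensor whose factors relate to O, T, and π), whereas this sketch uses generic synthetic data.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Khatri-Rao product of A (I x R) and B (J x R) -> (I*J x R)."""
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def cpd_als(X, rank, n_iter=1000, seed=0):
    """Rank-R CPD of a 3-way tensor X via plain alternating least squares,
    so that X[i, j, k] ~= sum_r A[i, r] * B[j, r] * C[k, r]."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    X1 = X.reshape(I, J * K)                      # mode-1 unfolding
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)   # mode-2 unfolding
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)   # mode-3 unfolding
    for _ in range(n_iter):
        # Each update is a linear least-squares fit with the other two fixed.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

# Recover the factors of a synthetic exact rank-2 tensor.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((4, 2)) for _ in range(3))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cpd_als(X, rank=2)
```

Note there is no iteration over the data sequence itself here: the sequence enters only once, through the construction of the tensor, which is one source of the speed advantage over Baum-Welch.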
To enhance stability and accommodate multivariate data, we propose a novel method that extends JPT decomposition HMM learning to a multivariate setting. This involves a coupled CPD problem, in which each observational sequence has a separate emission matrix (O) but shares a common state transition matrix (T). By moving from Baum-Welch to CPD, the process of learning HMMs can be significantly accelerated. Coupled CPD makes HMM learning more robust than Uncoupled CPD by relating the multivariate data through a common transition matrix T and initial distribution π. This improvement should make HMM analyses faster to iterate on and more suitable for rapid prototyping in scientific research and the aforementioned fields.
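A minimal way to express this kind of coupling is to share one factor matrix across several CPD problems. The sketch below is an illustrative coupled-ALS toy, not the exact formulation proposed here: each channel m gets private factors A_m and C_m, while a single middle factor S (standing in for the common transition structure) is updated by one joint least-squares step over all channels.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Khatri-Rao product of A (I x R) and B (J x R) -> (I*J x R)."""
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def coupled_cpd_als(tensors, rank, n_iter=1000, seed=0):
    """Coupled ALS sketch: each 3-way tensor X_m in `tensors` is modeled as
    X_m[i, j, k] ~= sum_r A_m[i, r] * S[j, r] * C_m[k, r], with the middle
    factor S shared across all tensors while A_m, C_m are private."""
    rng = np.random.default_rng(seed)
    As = [rng.standard_normal((X.shape[0], rank)) for X in tensors]
    Cs = [rng.standard_normal((X.shape[2], rank)) for X in tensors]
    S = rng.standard_normal((tensors[0].shape[1], rank))
    for _ in range(n_iter):
        # Private factors: one least-squares update per channel.
        for m, X in enumerate(tensors):
            I, J, K = X.shape
            X1 = X.reshape(I, J * K)
            X3 = X.transpose(2, 0, 1).reshape(K, I * J)
            As[m] = X1 @ np.linalg.pinv(khatri_rao(S, Cs[m]).T)
            Cs[m] = X3 @ np.linalg.pinv(khatri_rao(As[m], S).T)
        # Shared factor: a single least-squares fit against all channels jointly.
        lhs = np.hstack([X.transpose(1, 0, 2).reshape(X.shape[1], -1)
                         for X in tensors])
        rhs = np.vstack([khatri_rao(As[m], Cs[m]) for m in range(len(tensors))])
        S = lhs @ np.linalg.pinv(rhs.T)
    return As, S, Cs

# Two synthetic channels that genuinely share the middle factor S0.
rng = np.random.default_rng(2)
S0 = rng.standard_normal((3, 2))
tensors = []
for _ in range(2):
    A0 = rng.standard_normal((4, 2))
    C0 = rng.standard_normal((5, 2))
    tensors.append(np.einsum('ir,jr,kr->ijk', A0, S0, C0))
As, S, Cs = coupled_cpd_als(tensors, rank=2)
```

The joint update of S is what ties the channels together: every channel's data constrains the shared factor, which is the mechanism behind the robustness gain over fitting each channel's CPD independently.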
Our results show that Coupled CPD outperforms both Uncoupled CPD and the industry-standard Baum-Welch in most cases: it is more stable than either method, significantly faster, and more accurate in estimating the HMM matrices. Finally, we suggest future improvements concerning calculation time and further extensions.