Stock market prediction using social media data and finding the covariance of the LASSO
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Stock market prediction has been a research topic for decades; recently, efforts to increase prediction accuracy by including data from social media such as Google and Twitter have received considerable attention. Social media can be regarded as an indicator of sentiment, and sentiment is known to influence the stock market. Current models, however, lack interpretability: because social media data is so abundant, it is difficult to determine which data is relevant for stock market prediction. A regression method that induces sparsity is therefore required, so that data that is not useful is discarded automatically. The LASSO induces sparsity via L1-regularization; however, the covariance and confidence intervals of the resulting regression coefficients cannot easily be derived, even though they are important for interpretation. This thesis therefore reviews the known methods for approximating the covariance and confidence intervals of the LASSO coefficients and assesses their accuracy using numerical simulations. A new method based on the Unscented Transform is proposed, which outperforms all reviewed methods in the underdetermined scenario, where there are more features than data points. Unfortunately, linear regression via the LASSO is of limited use for stock market prediction, as the achieved prediction accuracy is low. Nonlinear models are often applied to stock market prediction to achieve higher accuracy. Therefore, a new feature selection method is proposed for nonlinear Support Vector Regression (SVR), to select the relevant data for stock market prediction with the SVR. This method yields accurate feature selection when the number of candidate features is small.
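To make the two LASSO ideas in the abstract concrete (sparsity through L1-regularization, and the need to approximate the covariance of the coefficients), here is a minimal sketch using only NumPy. It assumes the objective (1/(2n))||y - Xb||^2 + lam*||b||_1, solved by cyclic coordinate descent with soft-thresholding. The covariance is approximated with a pairs bootstrap, one generic baseline approach; it is NOT the Unscented Transform method proposed in the thesis, whose details are not given here. The helper names (soft_threshold, lasso_cd, bootstrap_cov), the value lam = 0.3 and the synthetic data are all hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical helpers: a generic sketch, not the thesis's own method.


def soft_threshold(z: float, t: float) -> float:
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return float(np.sign(z) * max(abs(z) - t, 0.0))


def lasso_cd(X: np.ndarray, y: np.ndarray, lam: float,
             max_iter: int = 1000, tol: float = 1e-8) -> np.ndarray:
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes X has no all-zero columns (otherwise col_sq[j] would be zero).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # (1/n) * x_j^T x_j for every column j
    r = np.array(y, dtype=float)        # residual y - X b (b starts at zero)
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(p):
            # correlation of feature j with the residual, adding back its own contribution
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]  # exactly 0 when |rho| <= lam
            r += X[:, j] * (b[j] - new_bj)                 # keep r equal to y - X b
            max_step = max(max_step, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_step < tol:  # stop once a full sweep barely changes b
            break
    return b


def bootstrap_cov(X: np.ndarray, y: np.ndarray, lam: float,
                  n_boot: int = 100, seed: int = 0) -> np.ndarray:
    """Pairs-bootstrap approximation of the covariance of the LASSO coefficients."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    draws = np.empty((n_boot, p))
    for k in range(n_boot):
        idx = rng.integers(0, n, size=n)    # resample (x_i, y_i) pairs with replacement
        draws[k] = lasso_cd(X[idx], y[idx], lam)
    return np.cov(draws, rowvar=False)      # p x p sample covariance of the draws


# Synthetic example (hypothetical data): only the first 3 of 10 features matter.
rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.5 * rng.standard_normal(n)

b = lasso_cd(X, y, lam=0.3)
cov = bootstrap_cov(X, y, lam=0.3)
se = np.sqrt(np.diag(cov))  # bootstrap standard errors of the coefficients
print("coefficients:", np.round(b, 2))
print("bootstrap std. errors:", np.round(se, 3))
```

The irrelevant features typically end up exactly zero, which is the automatic discarding of data described above, while the relevant ones are shrunk towards zero by roughly lam. The bootstrap is used here only because it is simple; it is known to be unreliable for coefficients whose true value is zero, which is one reason why approximating the covariance of the LASSO is a non-trivial problem.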