For an investor, being able to foresee the future of a given financial asset may lead to significant economic profits. Stock market prediction has therefore been developed extensively by numerous researchers and companies. Recently, however, a new class of financial assets has emerged, namely cryptocurrencies. As a representative of these tokens, we chose the largest and most popular cryptocurrency, Bitcoin. Its value is characterized by non-stationary behaviour and the occurrence of speculative bubbles, which cause a rapid explosion of the price, followed by a major crisis and market panic.
Currently, most of the research community does not take these issues into account when predicting the price of Bitcoin, which may lead to wrong conclusions or unstable results. Therefore, in this thesis, we take a step back and reconsider how the environment influences a model's performance and how this knowledge can be used to produce more accurate forecasts in the future. Moreover, by designing an appropriate methodology and employing semantic features from online text sources, such as Twitter, Reddit and online news portals, we attempt to build a robust prediction system that offers stable performance regardless of market fluctuations.
The experiments we conducted show that non-stationarity negatively influences the results, causing the model's performance to deteriorate over time. Furthermore, it appears that certain properties of economic bubbles may facilitate more efficient prediction, and that some predictors are able to forecast the beginning of a market crisis. However, these findings are based on individual observations, which need to be confirmed by further research. In addition, by designing an appropriate methodology, we prevented the performance deterioration caused by the non-stationarity of the price signal. Although the semantic features based on online sources did not significantly boost the robustness of the system, combined with a suitable system design they lead to an improvement in the overall performance of the predictor.