Cluster management system design for big data infrastructures

Abstract

In recent years, we have seen a major shift in computing systems: data volumes are growing very fast, but the hardware capabilities to store, process, and transfer these massive data are not improving at the same rate. Today, data are generated from a variety of sources, such as social networking websites, business transactions, and the banking sector. These data are valuable and contain vital information, provided they are analyzed efficiently. The processing capabilities of single machines, however, are not sufficient for data analysis at this scale. As a result, most web companies, as well as traditional business organizations, research labs, and universities, are scaling out their major computational frameworks to clusters of thousands of machines. To uncover hidden and interesting insights in the data, complex machine learning algorithms and graph processing, in addition to simple queries, are becoming a common choice in many areas. The problem of collecting, storing, and analyzing these data is now called the Big Data problem.