Relating big data and data quality in financial service organizations

Abstract

Today’s financial service organizations face a data deluge. Big data is often characterized by a number of V’s, whereas traditional data quality is characterized by a number of dimensions. Our objective is to investigate the complex relationship between big data and data quality. We do this by comparing big data characteristics with data quality dimensions. Data quality has been researched for decades, and we adopted its well-defined dimensions; big data was characterized by eleven V’s. A literature review and ten cases in financial service organizations were investigated to analyze the relationship between data quality and big data. Whereas big data characteristics and data quality have been viewed as separate domains, our findings show that these domains are intertwined and closely related. Findings from this study suggest that variety is the most dominant big data characteristic, relating to most data quality dimensions: accuracy, objectivity, believability, understandability, interpretability, consistent representation, accessibility, ease of operations, relevance, completeness, timeliness, and value-added. Not surprisingly, the most dominant data quality dimension is value-added, which relates to variety, validity, visibility, and vast resources. The most frequently mentioned pair of big data characteristic and data quality dimension is Velocity-Timeliness. Our findings suggest that the term ‘big data’ is misleading, as volume (‘big’) was mostly not an issue, whereas variety, validity and veracity were found to be more important.