As big data is expected to contribute substantially to economic growth, the scalability of big data solutions becomes a key concern for organisations deploying them. Such solutions require the automatic collection and processing of large, heterogeneous data sets from a variety of sources, addressing aspects such as quality improvement, fusion, and linking of data sets in order to provide homogeneous data to analytics algorithms that detect patterns or anomalies. This paper introduces two components of a data sharing architecture for big data and presents the state of the art as a basis for a research agenda. Aspects such as transport, storage, and processing are important for big data but are not addressed in this paper. Copyright © 2015 by the paper's authors. Copying permitted only for private and academic purposes.