Managing large multidimensional hydrologic datasets

A case study comparing NetCDF and SciDB

Journal Article (2018)
Author(s)

H. Liu (TU Delft - OLD Department of GIS Technology)

P.J.M. Oosterom (TU Delft - OLD Department of GIS Technology)

Theo Tijssen (TU Delft - OLD Department of GIS Technology)

T.J.F. Commandeur (TU Delft - Urban Data Science)

Wen Wang (Hohai University)

Research Group
OLD Department of GIS Technology
DOI related publication
https://doi.org/10.2166/hydro.2018.136
Publication Year
2018
Language
English
Issue number
5
Volume number
20
Pages (from-to)
1058-1070

Abstract

Management of large hydrologic datasets, including storage, structuring, clustering, indexing, and querying, is one of the crucial challenges in the era of big data. This research originates from a specific problem: time series extraction at specific locations takes a long time when a large multidimensional (MD) dataset is stored in the NetCDF classic or the 64-bit offset format. The essence of this issue lies in the contiguous storage structure adopted by NetCDF. In this research, NetCDF file-based solutions and an MD array database management system (DBMS) applying a chunked storage structure are benchmarked to determine the best solution for storing and querying large MD hydrologic datasets. Experts were consulted to establish the benchmark sets, and the HydroNET-4 system provided the benchmark environment. The final benchmark tests explore the effect of data storage configurations, namely chunk size, dimension order (spatio-temporal clustering), and compression, on query performance. Results indicate that for big hydrologic MD data management, the properly chunked NetCDF-4 solution without compression is, in general, more efficient than the SciDB DBMS. However, the benefits of a DBMS should not be neglected, for example, integration with other data types, smart caching strategies, transaction support, scalability, and out-of-the-box support for parallelization.
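
The storage-layout problem described in the abstract can be illustrated with a small sketch that counts how many chunks a query touches under different chunk shapes. This is only a rough proxy for I/O cost, not a reproduction of the paper's benchmark: the dataset dimensions, the chunk shapes, and the helper name `chunks_touched` are all hypothetical and chosen for illustration. The first layout (one time step per chunk) is used as an approximation of time-major contiguous storage.

```python
from math import prod


def chunks_touched(start, count, chunk_shape):
    """Count the chunks intersected by a hyperslab.

    start, count and chunk_shape are per-dimension tuples; the hyperslab
    covers indices start[d] .. start[d] + count[d] - 1 in each dimension d.
    """
    per_dim = []
    for s, c, k in zip(start, count, chunk_shape):
        first_chunk = s // k
        last_chunk = (s + c - 1) // k
        per_dim.append(last_chunk - first_chunk + 1)
    return prod(per_dim)


# Hypothetical (time, y, x) grid: one year of hourly data on a 500 x 500 raster.
N_TIME, N_Y, N_X = 8760, 500, 500

# Query 1: full time series at a single grid cell.
ts_start, ts_count = (0, 250, 250), (N_TIME, 1, 1)
# Query 2: full spatial map at a single time step.
map_start, map_count = (100, 0, 0), (1, N_Y, N_X)

# Hypothetical chunk shapes (assumptions, not the configurations from the paper).
layouts = {
    "one time step per chunk": (1, 500, 500),
    "balanced": (365, 50, 50),
    "full time axis per chunk": (8760, 10, 10),
}

for name, chunk in layouts.items():
    n_ts = chunks_touched(ts_start, ts_count, chunk)
    n_map = chunks_touched(map_start, map_count, chunk)
    print(f"{name:26s} time series: {n_ts:5d} chunks, map: {n_map:5d} chunks")
```

Under these assumptions, the layout that favors spatial maps forces a time series query to visit every time step separately, while the layout that favors time series makes a single map touch thousands of chunks. This is the trade-off between chunk size and dimension order that the benchmark explores.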

Metadata only record. There are no files for this record.