H. Liu | TU Delft Repository

nD-PointCloud Data Management

Continuous levels, adaptive histograms, and diverse query geometries

Doctoral thesis (2022) - H. Liu

In the Geomatics domain, a point cloud refers to a data set which records the coordinates and other attributes of a huge number of points. Conceptually, each of these attributes can be regarded as a dimension, representing a specific type of information. Apart from the spatio-temporal dimensions routinely used for coordinates, other dimensions such as intensity and classification are also widely used in spatial applications. In fact, more dimensions can be involved. For instance, a point in a hydraulic modelling grid also records the flow direction, speed, sediment concentration, and other related attributes. As these point cloud data can be directly collected, computed, stored and analyzed, this thesis proposes the term nD-PointCloud as a general spatial data representation to cover them.

At present, drastically increasing production of nD-PointCloud data raises an essential demand for smart and highly efficient data management and querying solutions. However, we lack effective tools. Prevalent software for nD-PointCloud processing, analysis and rendering is built on file-based systems, requiring substantial development of data structures and algorithms. To make things worse, when other data types are involved, multiple formats, libraries and systems need enormous effort to be integrated. Aimed at generic support for diverse applications, DataBase Management Systems (DBMSs), on the other hand, avoid these issues to a large extent. However, since they were initially developed to resolve 2D or 3D issues, they do not provide native support for nD data indexing and operations, and the 2D and 3D operators cannot be easily extended to nD.

This thesis aims at developing a generic yet efficient solution for managing and querying nD-PointCloud data. The work is based on an existing solution called PlainSFC, which maps nD data into 1D space. PlainSFC is implemented in the DBMS, adopting space filling curve based clustering and B+-tree indexing strategies. Besides, PlainSFC applies an advanced querying mechanism which recursively refines hypercubic nD spaces to 1D ranges to approach the query geometry for primary filtering. This achieves high querying efficiency. However, the solution still has drawbacks, and this research focuses on resolving them by developing and using novel methods:

• A continuous Level of Importance (cLoI) method for data organization to eliminate visual artifacts of density shocks in point rendering, which are introduced by conventional tree structures such as Quadtree or Octree. The cLoI method computes an importance value for every point according to an ideal distribution generalized from the discrete distributions of those tree structures. This forms an additional cLoI dimension, and each point actually represents a level. By integrating the cLoI dimension into PlainSFC, smooth and efficient rendering is realized.

• An nD-histogram approach to improve querying efficiency on non-uniformly distributed data. PlainSFC decomposes the nD space into sub-spaces recursively to approach the query geometry without considering point distribution. This is not optimal when the distribution of points is severely skewed. To improve this, an nD-histogram which records the number of points inside each nD sub-space is established as a representation of data distribution. The developed solution, called HistSFC, decomposes and refines the nD space more smartly, which improves the accuracy and efficiency of primary filtering.

• A convex polytope querying function. Besides orthogonal window queries, the polytope query, which is the extension of the widely adopted polygonal query in 2D, also plays a critical role in many nD spatial applications. To address this type of query, an easy-to-use polytope formulation for querying is first proposed. Then, based on PlainSFC and HistSFC, efficient intersection algorithms are developed for convex polytope querying on nD point clouds. These algorithms are tested through experiments with up to 10D point data. Using this newly developed function, applications including perspective view selections and flood risk queries are resolved more efficiently, achieving sub-second performance.

Additionally, other optimization techniques such as parallelization are developed and experimented with, which also bring performance gains. To verify the whole framework, several benchmark tests devised by considering real applications are conducted, and comparisons with different state-of-the-art solutions are performed. The results show that the newly developed solution outperforms the others overall. In certain cases, the solution can be applied without further optimizations. However, this will not be the end. Rapidly emerging technologies such as cloud computing platforms can boost the solution further to incorporate more data and users. Potential nD-PointCloud based applications still need to be explored, prototyped and tested to serve society in practice.
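The mapping of nD data into 1D space that PlainSFC relies on can be illustrated with a Morton (Z-order) curve. This is a minimal sketch and not the thesis implementation: the abstract does not state which space filling curve is used, and the function names `morton_encode`, `morton_decode` and `cell_key_range`, as well as the fixed bit depth, are assumptions made for illustration. It also shows why a hypercubic cell can be turned into a single 1D range: an aligned cell with side 2^k covers a contiguous block of keys.

```python
def morton_encode(coords, bits):
    """Interleave the low `bits` bits of each integer coordinate into one key."""
    n = len(coords)
    key = 0
    for dim, value in enumerate(coords):
        if value < 0 or value >> bits:
            raise ValueError("coordinate outside [0, 2**bits)")
        for bit in range(bits):
            # Bit `bit` of dimension `dim` lands at position bit * n + dim.
            key |= ((value >> bit) & 1) << (bit * n + dim)
    return key


def morton_decode(key, n, bits):
    """Inverse of morton_encode: recover the n coordinates from a key."""
    coords = [0] * n
    for bit in range(bits):
        for dim in range(n):
            coords[dim] |= ((key >> (bit * n + dim)) & 1) << bit
    return tuple(coords)


def cell_key_range(origin, cell_bits, bits):
    """1D key range covered by an aligned hypercube with side 2**cell_bits."""
    n = len(origin)
    if any(c % (1 << cell_bits) for c in origin):
        raise ValueError("origin must be aligned to the cell size")
    first = morton_encode(origin, bits)
    return first, first + (1 << (cell_bits * n)) - 1


# A 2D point and the 4 x 4 cell whose lower corner is (4, 4) on an 8 x 8 grid.
key = morton_encode((3, 5), bits=3)
low, high = cell_key_range((4, 4), cell_bits=2, bits=3)
print(key, low, high)
```

Because every cell becomes one key interval, a one-dimensional index such as a B+-tree can answer the primary filter with range scans; the same functions work unchanged for more than two dimensions.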

Point clouds and Hydroinformatics

Conference paper (2022) - Vitali Diaz, Haicheng Liu, Peter van Oosterom, Martijn Meijers, Edward Verbree, Fedor Baart, Maarten Pronk, Thijs van Lankveld

A point cloud is made up of a multitude of three-dimensional (3D) points with one or more attributes attached. The point cloud is the third data paradigm in addition to the well-established object (vector) and gridded (raster) representations, since point cloud data can be directly collected, computed, stored, and analyzed without converting to other types. Modern ways of data acquisition, including laser scanning from airborne, mobile, or static platforms, multi-beam echo-sounding, and dense image matching from photos, generate millions to trillions of 3D points with attached attributes. If the collection is carried out in different periods, one of the essential attributes is time, allowing spatiotemporal analysis to be performed. Point clouds are widely used in fields such as metrology and quality inspection, virtual reality, indoor/outdoor navigation, object detection, vegetation monitoring, building modeling, cultural heritage, and diverse visualization applications. There are some examples in fields related to hydroinformatics, mainly related to terrain modeling. Because point clouds are big data by nature, a series of developments has been carried out over the past decades in the different processing chains for their optimal use. This research seeks to introduce the various point cloud developments from which the hydroinformatics community and research could benefit. A review of recent advances is made, mainly including the analysis and visualization of point clouds for dealing with water-related problems. Potential areas of application and development in hydroinformatics are identified. These include, for example, the topics of coastal monitoring, coastal erosion, shallow water assessment, ice sheet change analysis, sea-level rise assessment, monitoring of levels in water bodies, crop and vegetation monitoring, analysis of the effects of groundwater depletion, detailed tracing of basins and channels, analysis of floods with detailed terrain models, and drought monitoring in crops and forests. The challenges to overcome and ongoing developments regarding point cloud application in hydroinformatics are also discussed.

An efficient nD-point data structure for querying flood risk

Journal article (2021) - H. Liu, P. Van Oosterom, B. Mao, M. Meijers, R. Thompson

Governments use flood maps for city planning and disaster management to protect people and assets. Flood risk mapping projects carried out for these purposes generate a huge amount of modelling results. Previously, data submitted are highly condensed products such as typical flood inundation maps and tables for loss analysis. Original modelling results recording critical flood evolution processes are overlooked due to cumbersome management and analysis. This certainly has drawbacks: the 'static' maps impart few details about the flood; also, the data fails to address new requirements. This significantly confines the use of flood maps. Recent development of point cloud databases provides an opportunity to manage the whole set of modelling results. The databases can efficiently support all kinds of flood risk queries at finer scales. Using a case study from China, this paper demonstrates how a novel nD-PointCloud structure, HistSFC, improves flood risk querying. The result indicates that compared with conventional database solutions, HistSFC holds superior performance and better scalability. Besides, the specific optimizations made on HistSFC can facilitate the process further. All these indicate a promising solution for the next generation of flood maps.

Executing convex polytope queries on nD point clouds

Journal article (2021) - Haicheng Liu, Rodney Thompson, Peter van Oosterom, Martijn Meijers

Efficient spatial queries are frequently needed to extract useful information from massive nD point clouds. Most previous studies focus on developing solutions for orthogonal window queries, while rarely considering the polytope query. The latter query, which includes the widely adopted polygonal query in 2D, also plays a critical role in many nD spatial applications such as perspective view selection. Aiming for an nD solution, this paper first formulates a convex nD-polytope for querying. Then, the paper integrates three approximate geometric algorithms (SWEEP, SPHERE and VERTEX) and a linear programming method, CPLEX, developing a solution based on an Index-Organized Table (IOT) approach. The IOT is applied with space filling curve based clustering and an advanced querying mechanism which recursively refines hypercubic nD spaces to approach the query geometry for primary filtering. Results from experiments based on both synthetic and real data have confirmed the superior performance of SWEEP. However, the algorithm may lag behind CPLEX due to pessimistic intersection computation in high dimensional spaces. In a real application, by properly transforming a perspective view selection into a polytope query, the solution achieves sub-second querying performance using SWEEP. In another flood risk query, SWEEP also leads the others. In general, the robust and efficient solution can be immediately used to address different polytope queries, including abstract ones whose constraints on combinations of different dimensions are formed into a polytope model. Besides, the knowledge of high-dimensional computations acquired also provides significant guidance for handling more nD GIS issues.
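One common way to formulate a convex polytope for querying is as an intersection of half-spaces, each written as a·x ≤ b. The sketch below assumes that formulation and shows a simple conservative test of an axis-aligned hypercube against it. It is not SWEEP, SPHERE, VERTEX or CPLEX, whose details the abstract does not give; the names `classify_box` and `contains` are hypothetical.

```python
def classify_box(halfspaces, lo, hi):
    """Classify the box [lo, hi] against a polytope given as (a, b) pairs.

    Each pair means sum(a[j] * x[j]) <= b. Returns "inside", "outside" or
    "partial". The test checks one half-space at a time, so "partial" is
    conservative: the box may still miss the polytope.
    """
    inside = True
    for a, b in halfspaces:
        # Smallest and largest value of a.x over the box corners.
        least = sum(c * (l if c >= 0 else h) for c, l, h in zip(a, lo, hi))
        most = sum(c * (h if c >= 0 else l) for c, l, h in zip(a, lo, hi))
        if least > b:
            return "outside"  # the whole box violates this half-space
        if most > b:
            inside = False    # part of the box violates this half-space
    return "inside" if inside else "partial"


def contains(halfspaces, point):
    """Exact point-in-polytope test used for secondary filtering."""
    return all(sum(c * x for c, x in zip(a, point)) <= b for a, b in halfspaces)


# Triangle x >= 0, y >= 0, x + y <= 4, written as a.x <= b.
triangle = [((-1, 0), 0), ((0, -1), 0), ((1, 1), 4)]
print(classify_box(triangle, (0, 0), (1, 1)))
print(classify_box(triangle, (3, 3), (4, 4)))
print(classify_box(triangle, (1, 1), (3, 3)))
```

In a recursive refinement, "inside" cells can be accepted whole, "outside" cells dropped, and only "partial" cells split further or passed to the exact `contains` test. The code is dimension-agnostic: adding coefficients per half-space extends it to nD.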

Fundamentals, implementations and experimental benchmarks of nD-polytope queries on point cloud data sets

Report (2021) - Haicheng Liu, Rodney Thompson, Peter van Oosterom, Martijn Meijers

As an extension to 2D polygonal queries, nD-polytope queries on point clouds also play a crucial role in nD GIS applications such as perspective view selection. This report first defines the nD-polytope mathematically, and then develops an efficient nD-polytope querying solution by extending an index-organized table (IOT) approach. The solution integrates four novel intersection algorithms, CPLEX, SWEEP, SPHERE and VERTEX, each of which can be used to realize the primary filtering for polytope querying. The performance of these algorithms is then measured and compared using a representative nD-simplex and an nD-prism query region, respectively. It turns out that SWEEP performs the best overall, but it may degrade significantly as dimensionality goes up. On the other hand, the linear programming algorithm CPLEX, although it takes more time on intersection computation, performs more stably. Besides, the experiments also reveal that the properties of the same geometry can change significantly across different dimensionalities, and thus optimal strategies developed in 2D/3D may not be applicable in high dimensional spaces.

HistSFC: Optimization for nD massive spatial points querying

Journal article (2020) - Haicheng Liu, Peter van Oosterom, Martijn Meijers, Xuefeng Guan, Edward Verbree, Mike Horhammer

Space Filling Curve (SFC) mapping-based clustering and indexing work effectively for point cloud management and querying. The approach maps both points and queries into a one-dimensional SFC space so that a B+-tree can be utilized. Based on this basic structure, this paper develops a generic HistSFC approach which utilizes a histogram tree recording point distribution for efficient querying. The goal is to resolve the issue of skewed data querying. Besides, the paper proposes an agile method to compute a continuous Level of Detail (cLoD), and integrates it into HistSFC to support smooth rendering of massive points. Results indicate that for range queries, HistSFC decreases the False Positive Rate (FPR) of selection by up to 80%, compared to previous approaches. It also performs significantly faster than the state-of-the-art Oracle SDO_PC solution. With improved performance on visualization and k Nearest Neighbour (kNN) search, HistSFC can therefore be used as a new standard solution.
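How a histogram tree can steer refinement on skewed data can be sketched on a small 2D quadtree. This is an illustrative simplification, not the published HistSFC algorithm: it assumes a fixed budget of cell splits, lets the plain variant split cells in level order, and lets the histogram variant skip empty cells and split the most populated boundary cell first. The function names, the budget mechanism, and the definition of FPR as false positives divided by all points returned by the primary filter are assumptions.

```python
import heapq
from collections import Counter


def build_histogram(points, bits, depth):
    """Count the points in every quadtree cell (level, cx, cy) up to `depth`."""
    counts = Counter()
    for x, y in points:
        for level in range(depth + 1):
            shift = bits - level
            counts[(level, x >> shift, y >> shift)] += 1
    return counts


def cell_box(cell, bits):
    level, cx, cy = cell
    size = 1 << (bits - level)
    return cx * size, cy * size, cx * size + size - 1, cy * size + size - 1


def relation(cell, bits, query):
    x0, y0, x1, y1 = cell_box(cell, bits)
    qx0, qy0, qx1, qy1 = query
    if x1 < qx0 or x0 > qx1 or y1 < qy0 or y0 > qy1:
        return "outside"
    if qx0 <= x0 and x1 <= qx1 and qy0 <= y0 and y1 <= qy1:
        return "inside"
    return "partial"


def refine(counts, bits, depth, query, budget, use_histogram):
    """Return disjoint cells covering the query after at most `budget` splits."""
    done, heap, order = [], [], 0

    def push(cell):
        nonlocal order
        rel = relation(cell, bits, query)
        n = counts.get(cell, 0)
        if rel == "outside" or (use_histogram and n == 0):
            return  # nothing to fetch from this cell
        if rel == "inside" or cell[0] == depth:
            done.append(cell)
            return
        # Histogram variant: most populated cell first; plain: level order.
        priority = -n if use_histogram else cell[0]
        heapq.heappush(heap, (priority, order, cell))
        order += 1

    push((0, 0, 0))
    splits = 0
    while heap and splits < budget:
        _, _, (level, cx, cy) = heapq.heappop(heap)
        splits += 1
        for dx in (0, 1):
            for dy in (0, 1):
                push((level + 1, 2 * cx + dx, 2 * cy + dy))
    done.extend(cell for _, _, cell in heap)  # unsplit boundary cells
    return done


def false_positive_rate(points, cells, bits, query):
    qx0, qy0, qx1, qy1 = query
    boxes = [cell_box(c, bits) for c in cells]
    selected = false_pos = 0
    for x, y in points:
        if any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in boxes):
            selected += 1
            if not (qx0 <= x <= qx1 and qy0 <= y <= qy1):
                false_pos += 1
    return false_pos / selected if selected else 0.0


# Skewed data: a dense cluster just outside the query window.
points = [(7, 7)] * 10 + [(2, 2), (5, 5)]
query = (1, 1, 5, 5)
counts = build_histogram(points, bits=3, depth=3)
for flag in (False, True):
    cells = refine(counts, 3, 3, query, budget=2, use_histogram=flag)
    print(flag, false_positive_rate(points, cells, 3, query))
```

With the same split budget, the plain variant leaves the quadrant holding the dense cluster unrefined, while the histogram variant spends its second split there and discards the cluster.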

An optimized SFC approach for nD window querying on point clouds

Journal article (2020) - H. Liu, P. Van Oosterom, M. Meijers, E. Verbree

The dramatically increasing collection of point clouds raises an essential demand for highly efficient data management. Such management can also facilitate modern applications such as robotics and virtual reality. Extensive studies have been performed on point data management and querying, but most of them concentrate on low dimensional spaces. High dimensional data management solutions from computer science have not considered the special features of spatial data, so they may not be optimal. A Space Filling Curve (SFC) based approach, PlainSFC, which is capable of nD point querying, has been proposed and tested in low dimensional spaces. However, its efficiency in nD space is still unknown. Besides that, PlainSFC performs poorly on skewed data querying. This paper develops HistSFC, which utilizes point distribution information to improve the querying efficiency on skewed data. Then, the paper presents a statistical analysis of how PlainSFC and HistSFC perform when dimensionality increases. By experimenting on simulated nD data and real data, we confirmed the patterns deduced: for inhomogeneous data querying, the false positive rate (FPR) of PlainSFC increases drastically as dimensionality goes up. HistSFC alleviates such deterioration to a large extent. Despite performance degeneration in ultra high dimensional spaces, HistSFC can be applied with high efficiency for most spatial applications. The generic theoretical framework developed also allows us to study related topics such as visualization and data transmission in the future.

Visualization of point cloud models in mobile augmented reality using continuous level of detail method

Journal article (2020) - L. Zhang, P. Van Oosterom, H. Liu

Point clouds have become one of the most popular sources of data in geospatial fields due to their availability and flexibility. However, because of the large amount of data and the limited resources of mobile devices, the use of point clouds in mobile Augmented Reality (AR) applications is still quite limited. Many current mobile AR applications of point clouds lack fluent interactions with users. In our paper, a cLoD (continuous level-of-detail) method is introduced to considerably reduce the number of points to be rendered, together with an adaptive point size rendering strategy, thus improving the rendering performance and removing visual artifacts of mobile AR point cloud applications. Our method uses a cLoD model that has an ideal distribution over LoDs, with which unnecessary points can be removed without the sudden changes in density that are present in the commonly used discrete level-of-detail approaches. Besides, the camera position, orientation and distance from the camera to the point cloud model are taken into consideration as well. With our method, good interactive visualization of point clouds can be realized in the mobile AR environment, with both nice visual quality and proper resource consumption.
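The idea of an ideal distribution over LoDs can be sketched as follows. This is an illustration under stated assumptions rather than the paper's formula: it assumes that a discrete tree in n dimensions holds about 2^(n·l) points at level l, extends that growth to a continuous density on [0, L], and assigns each point a real-valued level by inverting the cumulative distribution. The names `continuous_level`, `assign_levels` and `visible`, and the linear distance cut-off, are hypothetical; the paper additionally considers camera position and orientation.

```python
import math
import random


def continuous_level(u, n_dims, max_level):
    """Map u in [0, 1] to a level in [0, max_level].

    Inverts the CDF F(l) = (2**(n*l) - 1) / (2**(n*L) - 1), whose density
    grows like 2**(n*l), mirroring how tree levels multiply in size.
    """
    top = 2.0 ** (n_dims * max_level) - 1.0
    return math.log2(u * top + 1.0) / n_dims


def assign_levels(count, n_dims, max_level, seed=0):
    """Draw one continuous level per point from the assumed distribution."""
    rng = random.Random(seed)
    return [continuous_level(rng.random(), n_dims, max_level) for _ in range(count)]


def visible(level, distance, max_level, far):
    """Hypothetical cut-off: the level threshold shrinks linearly with distance."""
    threshold = max_level * max(0.0, 1.0 - distance / far)
    return level <= threshold


levels = assign_levels(10000, n_dims=2, max_level=8)
near = sum(visible(l, 10.0, 8, 100.0) for l in levels)
far_away = sum(visible(l, 90.0, 8, 100.0) for l in levels)
print(near, far_away)
```

Because the threshold varies continuously with distance, the number of rendered points changes gradually instead of jumping at discrete level boundaries.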

The design and application of histogram trees for querying massive LiDAR point clouds

Conference paper (2019) - Haicheng Liu, Xuefeng Guan, Martijn Meijers, Peter van Oosterom

Towards a relational database Space Filling Curve (SFC) interface specification for managing nD-PointClouds

Conference paper (2019) - Peter van Oosterom, Martijn Meijers, Edward Verbree, Haicheng Liu, Theo Tijssen

In this paper we propose to treat point clouds as a first-class representation (similar to vector or raster representations), with the nD-PointCloud as the solution for this, offering deep integration of space, time and scale. For efficiency reasons, spatial indexing and clustering of these large point clouds is extremely important, and this is obtained based on a Space Filling Curve (SFC). In order to get beyond the current state of the art of storing/managing point clouds in files, a DBMS solution is presented (with all benefits: integration with other data types, scalability, multi-user, transaction support, etc.). Finally, a DBMS SFC interface specification for point clouds is proposed.

Towards 10^15-level point clouds management - a nD PointCloud structure

Conference paper (2018) - Haicheng Liu, Peter van Oosterom, Martijn Meijers, Edward Verbree

Drastically increasing production of point clouds as well as modern application fields like robotics and virtual reality raise an essential demand for smart and highly efficient data management. Effective tools for the management and direct use of large point clouds are missing. Current state-of-the-art database management systems (DBMS) present critical problems such as inefficient loading/indexing, lack of support for continuous Level of Detail (cLoD) and limited functionalities. Previous research has suggested and demonstrated the importance of converting property dimensions such as time and classification to organizing dimensions for efficient data management at the storage level. However, a thorough validation and theory are still missing. Besides, how new computational platforms such as cloud technology may support data management also needs further exploration. These problems motivate the PhD research, with the focus on a new data structure (nD PointCloud) which is dedicated to smartly and flexibly organizing information of large point clouds for different use cases.

Management of large indoor point clouds

An initial exploration

Journal article (2018) - H. Liu, P. Van Oosterom, M. Meijers, E. Verbree

Indoor navigation and visualization are becoming increasingly important. Meanwhile, the proliferation of new sensors as well as the advancement of data processing provide massive point clouds to model the indoor environment with high accuracy. However, current state-of-the-art solutions fail to manage such large datasets efficiently. File based solutions often require substantial development work, while database solutions are still faced with issues such as inefficient data loading and indexing. In this research, through a case study which aims to solve the problem of intermittent rendering of massive points in the context of indoor navigation, we devised and implemented an algorithm to compute the continuous Level of Detail (cLoD) where geometric and classification information are considered. Benchmarks are developed and different approaches in Oracle are tested to learn the pros and cons. Surprisingly, the flat table approach could be very efficient compared with other schemes. The crucial point lies in how to address the priority of different dimensions including cLoD, classification and spatial dimensions, and avoid unnecessary scanning of the table. Writing results either to the memory or the disk constitutes a major part of the time cost when large output is concerned. Conventional solutions based on spatial data objects present poor performance due to cumbersome indexing structure, inaccurate selection and additional decoding process. Besides, approximate selection in the unit of physical object is proposed, and the performance is satisfactory when a large amount of data is requested. The knowledge acquired could prompt the development of a novel data management of high dimensional point clouds where the classification information is involved.

An Artificial Stream Network and Its Application on Exploring the Effect of DEM Resolution on Hydrological Parameters

Conference paper (2018) - Haicheng Liu

Digital elevation models (DEMs) are widely used in various distributed hydrological models. The stream network can be extracted from a DEM so that runoff routing can be calculated. With the advent of remote sensing and computing technologies, computation based on high-resolution DEMs becomes possible. However, there still exist regions with poor resolution, particularly in developing countries. Previous work only compared results obtained by implementing hydrological models for specific basins in the real world, and resolutions were only assigned to several fixed values, such as 30 and 90 m. The results derived were therefore not general. To roughly understand how DEM resolution influences the hydrologic response, in this paper, first an artificial stream network whose principle originates from fractal theory is constructed. Then, by implementing calculation on such artificial networks in an iterative way and performing aggregation, the influence of DEM resolution on several hydrological parameters can be acquired, namely the number of basins, drainage density of all basins, total stream length, average stream slope and average topographic index used to assess the spatial distribution of soil saturation of the largest basin. It is found that DEMs of low resolution would reduce drainage density, total stream length and average stream slope, but would increase topographic index. However, the effect on the number of basins is insignificant. In the end, the results of the simulation as well as the quality of the fractal terrain are validated by referencing field data.

Managing large multidimensional hydrologic datasets

A case study comparing NetCDF and SciDB

Journal article (2018) - Haicheng Liu, Peter Van Oosterom, Theo Tijssen, Tom Commandeur, Wen Wang

Management of large hydrologic datasets including storage, structuring, clustering, indexing, and query is one of the crucial challenges in the era of big data. This research originates from a specific problem: time series extraction at specific locations takes a long time when a large multidimensional (MD) dataset is stored in the NetCDF classic or the 64-bit offset format. The essence of this issue lies in the contiguous storage structure adopted by NetCDF. In this research, NetCDF file-based solutions and an MD array database management system applying a chunked storage structure are benchmarked to determine the best solution for storing and querying large MD hydrologic datasets. Expert consultancy was conducted to establish benchmark sets, with the HydroNET-4 system being utilized to provide the benchmark environment. In the final benchmark tests, the effect of data storage configurations, elaborating chunk size, dimension order (spatio-temporal clustering) and compression on the query performance, is explored. Results indicate that for big hydrologic MD data management, the properly chunked NetCDF-4 solution without compression is, in general, more efficient than the SciDB DBMS. However, benefits of a DBMS should not be neglected, for example, the integration with other data types, smart caching strategies, transaction support, scalability, and out-of-the-box support for parallelization.
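Why contiguous storage slows down time series extraction can be seen with simple arithmetic. The sketch below assumes a (time, y, x) array stored in row-major order with time as the slowest-varying dimension, a hypothetical disk block of 1024 elements, and an illustrative chunk shape; none of these figures come from the paper, and edge chunks are ignored.

```python
import math


def contiguous_offsets(shape, y, x):
    """Element offsets of cell (y, x) at every time step, row-major (time, y, x)."""
    t_size, y_size, x_size = shape
    return [t * y_size * x_size + y * x_size + x for t in range(t_size)]


def blocks_touched(offsets, block_elems):
    """Number of distinct fixed-size blocks that contain the given offsets."""
    return len({off // block_elems for off in offsets})


def chunks_touched(shape, chunk):
    """Chunks along the time axis that hold the series of a single cell."""
    return math.ceil(shape[0] / chunk[0])


def elements_read_chunked(shape, chunk):
    """Elements read when every touched chunk is loaded in full."""
    return chunks_touched(shape, chunk) * chunk[0] * chunk[1] * chunk[2]


shape = (1000, 100, 100)   # 1000 time steps on a 100 x 100 grid
offsets = contiguous_offsets(shape, y=5, x=7)
print(blocks_touched(offsets, block_elems=1024))          # one block per time step
print(chunks_touched(shape, (250, 10, 10)))               # a handful of chunks
print(elements_read_chunked(shape, (250, 10, 10)))
```

In the contiguous layout, consecutive values of one location are a full spatial slice apart, so each time step lands in a different block. A chunk that is long in time and small in space groups them together, which is the effect the chunk size and dimension order benchmarks explore.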