The volume and density of geospatial data are constantly increasing as newer acquisition techniques produce higher-resolution datasets and more countries create country-wide datasets such as point clouds. However, methods to deal with this growing amount of data are lagging behind, and developments in computer hardware are no longer fast enough to compensate. In particular, loading and processing entire datasets in the main memory of a computer is becoming increasingly difficult. Working with these massive datasets therefore requires different techniques. In this thesis, one such method is explored: streaming simplification of geometries.
Streaming of geometries relies on processing small portions of a geometric dataset at a time, keeping only as much information in memory as is necessary instead of the whole dataset. This prevents memory bottlenecks and allows ever-larger datasets to be processed. Processing of a LIDAR dataset can range from creating rasterized surface or terrain models to creating a triangulation. Triangulations are an important starting point for other analysis techniques such as watershed calculations, line-of-sight analysis, or noise propagation. However, large, high-density datasets lead to large triangulations which are still too complex for these techniques to work with.
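As an illustration of the streaming idea only, the sketch below reads points lazily and hands them to a processing step one chunk at a time, so that only a small part of the dataset is ever held in memory. The ASCII xyz input file, the chunk size, and the placeholder processing step are assumptions made for this example and are not taken from the thesis pipeline.

\begin{verbatim}
# Minimal sketch of chunked, streaming-style processing (illustrative only).
from itertools import islice
from typing import Iterator, List, Tuple

Point = Tuple[float, float, float]

def read_points(path: str) -> Iterator[Point]:
    """Yield points one at a time instead of loading the whole file."""
    with open(path) as f:
        for line in f:
            x, y, z = map(float, line.split()[:3])
            yield (x, y, z)

def chunks(points: Iterator[Point], size: int) -> Iterator[List[Point]]:
    """Group the lazy point stream into fixed-size chunks."""
    while True:
        chunk = list(islice(points, size))
        if not chunk:
            return
        yield chunk

def process_chunk(chunk: List[Point]) -> None:
    # Placeholder for the real work, e.g. incrementally triangulating the chunk.
    print(f"processed {len(chunk)} points")

if __name__ == "__main__":
    # "tile.xyz" is a hypothetical input file with one "x y z" record per line.
    for chunk in chunks(read_points("tile.xyz"), size=100_000):
        process_chunk(chunk)
\end{verbatim}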
To create more manageable triangulations it is necessary to simplify them, which means finding a subset of the vertices that best approximates the original surface. This can be done randomly, or in a feature-aware manner that takes into account the shape of the dataset being simplified and keeps data where it is most relevant. The AHN3 dataset is so large that it cannot be simplified using traditional techniques. Therefore, to achieve simplification of the AHN3 dataset, this thesis modifies an existing methodology for creating streaming triangulations by incorporating a simplification step.
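A minimal sketch of the random variant is given below for illustration; the keep probability and the generator-style interface are assumptions, and the feature-aware variants would replace the coin flip with an importance measure derived from the local shape of the surface.

\begin{verbatim}
# Minimal sketch of random simplification (illustrative only).
import random
from typing import Iterable, Iterator, Tuple

Point = Tuple[float, float, float]

def random_simplify(points: Iterable[Point], keep_ratio: float = 0.1,
                    seed: int = 42) -> Iterator[Point]:
    """Keep each incoming point with probability keep_ratio."""
    rng = random.Random(seed)
    for p in points:
        if rng.random() < keep_ratio:
            yield p

# Usage (hypothetical): kept = list(random_simplify(some_point_stream, 0.05))
\end{verbatim}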
The methodology implemented in this thesis consists of determining where a simplification algorithm can be introduced within the existing streaming pipeline, as well as determining which existing simplification techniques are applicable. After defining possible locations for a simplification module within the streaming pipeline, various methods are created and tested on datasets of different sizes. These methods are: random simplification, decimation, refinement, and a novel approach based on the medial axis transform. For the refinement method, several different implementations are investigated.
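To give an impression of what a refinement step involves, the sketch below inserts a streamed point into the triangulation only when its vertical distance to the current TIN exceeds a threshold. It is a deliberately naive, in-memory illustration built on SciPy rather than the streaming module from the thesis; the seed points, the 0.2 m threshold, and the per-point rebuild of the interpolant are assumptions made to keep the example short.

\begin{verbatim}
# Minimal sketch of error-based refinement (illustrative only, not the thesis module).
import numpy as np
from scipy.spatial import Delaunay
from scipy.interpolate import LinearNDInterpolator

def refine(points: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """points is an (n, 3) array of x, y, z; returns the kept subset."""
    pts = np.asarray(points, dtype=float)
    kept = list(pts[:4])  # seed TIN with the first points (assumed not collinear)
    for x, y, z in pts[4:]:
        xy = np.array([p[:2] for p in kept])
        zs = np.array([p[2] for p in kept])
        # Rebuild a linear interpolant over the current TIN (naive but simple).
        tin = LinearNDInterpolator(Delaunay(xy), zs)
        z_tin = tin(x, y)
        # Insert the point if the TIN is undefined here or the error is too large.
        if np.isnan(z_tin) or abs(z - z_tin) > threshold:
            kept.append(np.array([x, y, z]))
    return np.array(kept)

# Usage (hypothetical): simplified = refine(np.loadtxt("tile.xyz"), threshold=0.2)
\end{verbatim}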
After the initial results are analyzed, the decimation and medial axis transform algorithms are rejected, due to extremely long processing times and a lack of accurate results, respectively. The refinement implementations produce accurate simplified triangulations in a reasonable amount of time, and the best performing one hardly slows down processing at all. This best performing algorithm is a novel method named First-Come-First-Serve (FCFS), which offers the best ratio between vertex error and computation time: using FCFS, 4 billion points are processed in less than 11 hours, compared to 13 hours without simplification. The accuracy of the methodology is comparable to that of existing methods for all of the approaches, although the number of points processed per second is lower than in existing research.
The results of this research show that the creation and simplification of a massive Delaunay TIN is possible and produces acceptable results, consistently achieving RMSEs below 0.2 m. Despite simplification, the resulting datasets are often still too large to fit in the main memory of a computer, and performing further calculations on larger areas will still require streaming processing. This suggests that, with the current ever-growing datasets, it is relevant to explore how other types of calculations can be performed in a streaming pipeline, possibly allowing them to be chained one after another.
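For reference, the sketch below shows one way such a vertical RMSE can be computed, by comparing every original elevation against a linear interpolation over the simplified TIN. The SciPy-based interpolation and the in-memory arrays are assumptions for illustration and do not reproduce the evaluation setup of the thesis.

\begin{verbatim}
# Minimal sketch of a vertical RMSE computation (illustrative only).
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def vertical_rmse(original: np.ndarray, simplified: np.ndarray) -> float:
    """Both arguments are (n, 3) arrays of x, y, z; returns the RMSE in metres."""
    tin = LinearNDInterpolator(simplified[:, :2], simplified[:, 2])
    z_tin = tin(original[:, 0], original[:, 1])
    diff = original[:, 2] - z_tin
    diff = diff[~np.isnan(diff)]  # skip points outside the simplified TIN
    return float(np.sqrt(np.mean(diff ** 2)))

# Usage (hypothetical): print(vertical_rmse(original_pts, simplified_pts))
\end{verbatim}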
All methods created for this thesis are available at \url{https://github.com/mdjong1/simpliPy}