| 1 |
|
An OpenCL-based Solution for Portable Bodyscan SAR Processing on Multicore Platforms
Multicore systems have become an indispensable part of our everyday life. They represent a viable alternative for increasing processor performance without hitting the memory and power walls. However, the shift from traditional programming to multicore programming has a critical influence in three dimensions: the applications, the software tools, and the hardware platforms.
In this thesis, we focus on the application dimension, and we investigate the performance potential of a 3D bodyscan SAR processing algorithm running on several multi-core processors. To allow for the processor variety, the solutions are based on OpenCL, a language that enables inter-platform portability for a large set of multicores. To allow for performance, we choose to mainly evaluate and optimize for NVIDIA GPUs, but also for Intel GPPs and ATI GPU.
Our solution design and implementation follow a step-by-step strategy.
First, we analyse the application to determine its functionality, data characteristics, and performance requirements. Next, we design and implement a sequential reference solution, and we evaluate its performance. We use this particular solution to design a set of possible parallel solutions, which are also evaluated in terms of performance and platform utilization, using a generic prototyping platform. Further, we select the most promising parallel solution for a first portable OpenCL implementation. All platform-agnostic basic optimizations are then applied, yielding a new version of the application with increased performance and no decrease in portability. After these basic optimizations, we show how several platform-specific optimizations can be applied; in this step, we trade portability for performance. For this work, we specifically target NVIDIA GPUs, and we show the potential performance impact of data and memory-related optimizations on the case-study application.
Finally, we gather all these steps and findings into a generic, empirical strategy that enables programmers to reason about OpenCL applications in a systematic manner, letting them decide the level of the trade-off between the application portability and performance.
Our main conclusions are twofold. First, we conclude that OpenCL is a promising standard (and language) for enforcing the implementation of portable multicore applications. Second, we conclude that the bodyscan application is a good fit for running on multicore platforms, and we recommend the GPUs as the target platform for such an application.
For future work, we propose to advance in two directions. First, as generic research, we propose to focus on the strategy validation and refining, as well as on a more abstract way to derive the parallel OpenCL versions. Second, on the application/technical side, we plan to focus on finding and implementing hardware-dependent optimizations of the OpenCL solution on various hardware platforms.
|
[PDF]
[Abstract]
|
| 2 |
|
A Runtime Profiler for Polymorphic Computing Platforms
Reconfigurable systems map the computationally intensive parts of the code to hardware, while less computationally intensive parts are executed on general-purpose processor(s), thus achieving faster execution than systems with only general-purpose processor(s). If multitasking operating systems are used on such systems, a runtime system is required to perform resource management of the reconfigurable hardware, as multiple threads might be competing for the reconfigurable hardware at the same time. Such a runtime system needs to know the current configuration and load of the system to properly allocate the reconfigurable hardware. A runtime profiler is an important tool in this regard, as it can assist the runtime system by providing vital statistics about programs running on the system. Since the runtime profiler has to run in parallel with the actual code, it must be very lightweight, and therefore very efficient data structures must be used to store the collected statistics. In this thesis, we present the design and implementation of such a runtime profiler. Empirical evaluation has shown that for most applications, our profiler has an overhead of less than 1.5% of the total execution time. Moreover, the information generated by the profiler is almost as accurate as that of the popular design-time profiler GProf.
|
[PDF]
[Abstract]
|
| 3 |
|
Performance Analysis and Cost-Performance Tradeoffs of a High Performance Partially Buffered Crossbar Switch
Why is it hard to build high-speed routers? Because high-speed routers are like marriages; they are unpredictable, provide no guarantees, and become vulnerable in adversity. High-speed networks, including the Internet backbone, suffer from a well-known problem: packets arrive at high-speed routers much faster than commodity memory can support. On a 10 Gb/s link, packets can arrive every 32 ns, while memory can only be accessed once every 50 ns. If we are unable to bridge this performance gap, then (1) we cannot create Internet routers that reliably support links >10 Gb/s, and (2) routers cannot support the needs of real-time applications such as voice, video conferencing, multimedia, gaming, etc., that require guaranteed performance. Network operators expect certain performance characteristics; for example, if the arrival rate is less than the router’s advertised capacity, they can reasonably assume the router can handle the traffic. Somewhat surprisingly, no commercial router can do this today!
The emphasis is put on the switching architecture of a router. This thesis lays down a theoretical foundation for Partially Buffered Crossbar switches and is about managing and resolving the preferences and contention for memory between packets from participating inputs and outputs in a switch. By combining the theory of fluid models, Lyapunov functions and the pigeonhole principle, the requirements are drawn up for devising practical algorithms that can provide guarantees, emulate the performance of the ideal Output Queued switch, and approximate the optimal Maximum Weight Matching scheduler. The solutions described in this thesis relax the memory access and bandwidth constraints; in fact, no switching architecture described so far is better in terms of memory requirements and practicality relative to its achieved performance. Moreover, this thesis presents the first study of scheduling unicast and multicast traffic simultaneously in a Partially Buffered Crossbar switch.
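The 32 ns and 50 ns figures quoted in this abstract can be related with a little arithmetic. The sketch below assumes minimum-size 40-byte packets arriving back to back; the abstract does not state a packet size, so that value (and the helper name `serialization_time_ns`) is an assumption chosen because it reproduces the 32 ns figure.

```python
def serialization_time_ns(packet_bytes: int, link_gbps: float) -> float:
    """Time for one packet to arrive on the link, in nanoseconds.

    bits / (Gbit/s) gives nanoseconds directly.
    """
    return packet_bytes * 8 / link_gbps


# Memory access time quoted in the abstract.
MEMORY_ACCESS_NS = 50.0

# Assumption: 40-byte minimum-size packets on a 10 Gb/s link.
arrival_ns = serialization_time_ns(40, 10.0)

print(arrival_ns)                     # 32.0
print(MEMORY_ACCESS_NS / arrival_ns)  # 1.5625
```

Under this assumption, a single memory access takes about 1.56 packet times, which is the gap the abstract describes.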
|
[PDF]
[Abstract]
|
| 4 |
|
Design and Implementation of Real-Time High-Definition Stereo Matching SoC on FPGA
Stereo matching has been widely used in many fields, such as viewpoint interpolation, feature detection systems and free-view TV. However, the long processing time of stereo matching algorithms has been the major bottleneck that limits their real-time applications. During the past decades, many implementation platforms and corresponding algorithm adaptations have been proposed to solve the processing time problem. Although notable real-time performance has been achieved, these works rarely satisfy both real-time processing and high stereo matching quality requirements.
In this thesis, we propose a stereo matching algorithm suitable for hardware implementation based on the VariableCross and the MiniCensus algorithm. Furthermore, we provide parallel computing hardware design and implementation of the proposed algorithm. The developed stereo matching hardware modules are instantiated in an SoC environment and implemented on a single EP3SL150 FPGA chip. The experimental results suggest that our work has achieved high speed real-time processing with programmable video resolutions, while preserving high stereo matching accuracy. The online benchmarks also prove that this work delivers leading matching accuracy among declared real-time implementations, with only 8.2% averaged benchmark error rate. We have also achieved 60 frames per second for 1024 × 768 high-definition stereo matching, which is the fastest high-definition stereo matching to the best of our knowledge.
|
[PDF]
[Abstract]
|
| 5 |
|
Stereoscopic Remote Vision System: A Delay Minimizing Approach for Telepresence
This thesis presents a remote vision system designed to be used for telepresence applications. Telepresence is essentially being able to assert one's presence at a remote location. Being able to see and talk to people in another corner of the world is one method of telepresence we have come to know as video conferencing. As technologies evolve, the sense of 'presence' allowed by them also grows. Present-day telepresence systems limit our most prominent sense, sight, to monoscopic video where we only see one view of the remote location. In order to take telepresence to the next level by being able to manipulate remote objects, it is important that our sight be stimulated to perceive depth to the fullest. Therefore, this thesis focuses on creating a stereoscopic remote vision system designed for teleoperation. Delay plays a major part in the ease of teleoperation. While most teleoperation systems focus on alleviating the effects of delay by means of control methodologies and environment modeling, this system attempts to directly minimize delay and other factors of human discomfort. This was done by analyzing the technologies involved in the capture, compression, transport and display of 3D video, along with an exploration of the factors of human comfort and performance in teleoperation. The most appropriate technologies were then selected while keeping these factors in mind. The stereoscopic remote vision system was then designed and implemented with a focus on minimizing delay. The resulting system was then used as a testbench to further explore the same factors.
|
[PDF]
[Abstract]
|
| 6 |
|
Implementation of Nexus: Dynamic Hardware Management Support for Multicore Platforms
Current trends in computer architecture focus on multicore platforms. The target of these new platforms is to scale the performance of the system with the number of cores. However, the performance of current architectures is limited due to thread-level parallelism overhead and programmability. StarSS is a task-based programming model that eases the programmability of multicores and tries to exploit functional parallelism within applications. However, the performance of StarSS does not scale efficiently for fine-grained tasks, as for such tasks the task management overhead becomes significant in comparison to the execution of the tasks. Nexus is a dynamic hardware support system that aims to alleviate the current overhead of StarSS by offloading the dependency resolution process and the synchronization with the cores to hardware. In this work, we implement Nexus by defining and connecting the new hardware in a Cell architecture simulator. The scalability, performance, and throughput of the implementation are evaluated for different task sizes and numbers of cores, using several dependency patterns. Furthermore, different configuration parameters are evaluated, such as the dimension of the new hardware inserted in the existing architecture.
Results show a large improvement of the scalability offered by Nexus in comparison with StarSS, especially for fine-grained tasks. Nexus succeeds at alleviating the overhead of StarSS by accelerating the dependency resolution process and the synchronization with the worker cores. Furthermore, the evaluation of the Nexus system dimensions has shown that its scalability decreases slightly with its area.
|
[PDF]
[Abstract]
|
| 7 |
|
Software Infrastructure for Communications in Distributed Robotics Systems
Communication libraries that are created for Multi Agent Systems and Distributed Robotics Systems are generally application specific and do not address many systems with different capabilities. The purpose of this project is to design and implement a software communication library for Distributed Robotics Systems, not only to meet the needs of the Distributed Robotics Group, but also to address more general Multi Agent Systems. The Distributed Robotics Library is modeled on the standard seven-layer OSI Reference Model. The physical transmission and medium access are standardized by WLAN standards. IPv4 is used for networking, and the TCP and UDP protocols are used for transport control. To identify the robots in the network, an automatic advertisement broadcasting technique is used. A connection request/accept technique is used to start server/client communication between the robots.
To test the Distributed Robotics Library, five Asus Eee PCs are used to represent the robots. To test several functions, an application is implemented to address the clock synchronization problem. During the tests, a message relaying function is implemented and added as an extension to the library. The Distributed Robotics Library is also tested on Unix and its portability is verified.
|
[PDF]
[Abstract]
|
| 8 |
|
Biological Sequence Alignment Using Graphics Processing Units
Alignment algorithms are used to find similarity between biological sequences, such as DNA and proteins. By aligning a sequence with a database, similar sequences can be found. These can be used to identify the source of a query sequence, to find commonalities between organisms, or to infer an ancestral relation. Various methods of performing biological sequence alignment exist, including dynamic programming and heuristic methods. Dynamic programming methods are guaranteed to find all optimal alignments, but are relatively slow; heuristic methods are faster but less precise.
This thesis investigates the acceleration of one such optimal algorithm, the Smith-Waterman local sequence alignment algorithm, by using graphics processing units (GPUs). A fully functioning GPU-based protein database search tool was designed, implemented and optimized. The optimizations mostly concern the elimination of memory bottlenecks and the conversion of the database to a format well suited for GPU use. The final implementation offers the same features its CPU-based counterparts do, such as user configurable scoring and substitution matrix settings, and includes a web interface for convenient and remote usage.
The performance of the GPU-accelerated implementation was evaluated and compared to other solutions. It was found to attain a performance of more than 21 GCUPS (giga cell-updates of the Smith-Waterman score matrix per second) when searching the October 2010 release of Swiss-Prot on an NVIDIA GeForce GTX 275 GPU. With this performance, it is the fastest known GPU implementation on comparable hardware. It is also faster than the BLAST heuristic. However, the cost of purchasing a GPU, its power consumption, and the relative difficulty of maintaining a GPU software product are disadvantages of GPU acceleration.
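To make the GCUPS metric concrete, here is a minimal CPU-side sketch of Smith-Waterman scoring. It is not the thesis's GPU implementation: it assumes a simple match/mismatch score and a linear gap penalty instead of a substitution matrix, and the function names and default parameters are illustrative only.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, number of score-matrix cells updated).

    Assumptions: simple match/mismatch scoring and a linear gap penalty.
    Only the previous row is kept, so memory use is O(len(target)).
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]  # column 0 of every row is 0
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap
            left = curr[j - 1] + gap
            score = max(0, diag, up, left)  # local alignment never drops below 0
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best, len(query) * len(target)


def gcups(cells, seconds):
    """Giga cell-updates per second for `cells` updates done in `seconds`."""
    return cells / seconds / 1e9


score, cells = smith_waterman_score("ACGT", "ACGT")
print(score, cells)  # 8 16
```

Each cell depends only on its upper, left and upper-left neighbours, so cells on the same anti-diagonal are independent; that independence is what parallel implementations typically exploit.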
|
[PDF]
[Abstract]
|
| 9 |
|
Radio frequency detection and mitigation for LOFAR telescope system
Radio frequency interference detection and mitigation are bottleneck problems in radio astronomy. Telescopes are becoming more sensitive, so thresholds for detecting unwanted signals are becoming lower and telescopes have to deal with a large number of interference sources. When an unwanted signal is detected, it is important for radio astronomers to know the probability of false alarm (Pfa) with which the signal was detected, and to be able to set thresholds for a desired Pfa level. LOFAR (LOw Frequency ARray) is an array of radio telescopes located in the Netherlands and in other European countries. For the LOFAR telescope, we look at a detector based on an eigenvalue test of covariance matrices. A simple threshold for this test was derived in the work of Edelman. Currently there is no suitable solution that allows this threshold to be set for a desired Pfa level. The aim of this thesis is to derive an empirical solution for detecting a signal with a desired Pfa and to test it on a real LOFAR dataset.
|
[PDF]
[Abstract]
|
| 10 |
|
Interrupt support on the ρ-VEX processor
In this thesis, we present the design of an interrupt system for an extensible and reconfigurable VLIW softcore processor: ρ-VEX. This interrupt system is designed and implemented using four mechanisms to match different application requirements in terms of hardware consumption and performance (interrupt latency). In addition, because the VEX compiler is not open source, extra requirements on the assembler are also considered to make our work feasible. Our interrupt system itself can also be parameterized to fit different applications. These parameters include the number of interrupt vectors, the interrupt priority of each vector, and the location of the Interrupt Service Routines (ISRs) in the instruction memory. We implemented our interrupt system on a Virtex-6 FPGA. The test results show that each version of our interrupt system uses a reasonable amount of hardware. Moreover, the interrupt latency can be reduced to only 2 clock cycles, which is even better than some RISC-based softcore processors such as MicroBlaze. This project creates a prototype interrupt system for a VLIW softcore processor, which extends the functionality and capability of the processor, for example for running operating systems and establishing a multi-core system.
|
[PDF]
[Abstract]
|
| 11 |
|
Genetic sequence alignment on a supercomputing platform
Genetic sequence alignment is an important tool for researchers. It lets them see the differences and similarities between two genetic sequences. This is used in several fields, like homology research, autoimmune disease research and protein shape estimation. There are various algorithms that can perform this task and several hardware platforms suitable to deliver the necessary computation power. Given the large volume of the datasets used, throughput is nowadays the major bottleneck in sequence alignment. In this thesis we discuss some of the existing solutions for high-throughput genetic sequence alignment and present a new one.
Our solution implements the well-known Smith-Waterman optimal local alignment algorithm on the HC-1 hybrid supercomputer from Convey Computer. This platform features four FPGAs which can be used to accelerate the problem in question. The FPGAs, and the CPU that controls them, live in the same virtual memory space and share one large memory. We developed a hardware description for the FPGAs and a software program for the CPU. Some focus points were: a sustainable peak performance, the ability to align sequences of any length, FPGA-area-efficient computation, and the cancellation of unnecessary workload.
The result is a Smith-Waterman FPGA core that can run at 100% utilization across many consecutive alignments. Six such cores are packed on each FPGA, running at 150 MHz, which results in a full-system performance of 460 GCUPS (billion elementary operations per second). Our elementary processing element can deliver twice the work per clock cycle of a naive implementation, resulting in a better throughput-per-area ratio. At the system level, a notable amount of workload is cancelled. It is the most flexible implementation we are aware of. We re-evaluate the use of FPGAs for accelerating Smith-Waterman and conclude that they will continue to be a good choice per dollar and per watt, as long as we narrow the problem space.
|
[PDF]
[Abstract]
|
| 12 |
|
Scheduling in Partially Buffered Crossbar Switches
Intensive studies have been conducted to identify the most suitable architecture for high-performance packet switches. These architectures can be classified by queuing schemes, scheduling algorithms and switching fabric structures. The crossbar-based switching fabric has been widely agreed to be the most suitable one, for its low cost, scalability and native multicast support. A large number of commercial implementations and literature studies have addressed the unbuffered crossbar switching architecture. Due to the requirement for a centralized scheduler, scheduling algorithms in the unbuffered crossbar generally have high complexity. This leads to time-consuming scheduling processes that prevent the unbuffered architecture from scaling up with modern optical links operating in the Gb/s range. The buffered crossbar architecture has been proposed to overcome the scheduling complexity bottleneck faced by the unbuffered crossbar. The introduction of crosspoint buffers decouples the centralized scheduling process and lowers the scheduling complexity. However, the drawback of the buffered crossbar lies in the fact that it requires $N^2$ expensive on-chip memories, $N$ being the size of the switch, which limits the scalability of the buffered crossbar architecture. To provide the scheduling simplicity brought by the buffered crossbar while having a cost close to the unbuffered one, the partially buffered crossbar architecture has been proposed. By combining the advantages of the previous two architectures, the Partially Buffered Crossbar (PBC) is deemed one of the competitive candidates for next-generation switching architectures. However, the previously proposed algorithms did not fully exploit its potential. In this thesis, we: i) propose a unicast scheduling algorithm that further pushes the performance of the PBC switch under various non-uniform traffic settings, while using as few as 2 internal buffers per output; and ii) study multicast traffic support in the partially buffered crossbar switch and propose an effective multicast scheduling algorithm.
|
[PDF]
[Abstract]
|
| 13 |
|
Intelligibility Based Automatic Volume Control for Public Address Systems
To convey messages to the public, public address (PA) systems are installed in buildings and at venues. These messages generally contain important information for the listener. This information has to come across well, i.e. the message should be intelligible. Because the environment, and mainly the background noise, can change over time, it is important for a public address system to adapt accordingly, so that the intelligibility of the messages is maintained. To maintain intelligibility, automatic volume control algorithms are used. In current solutions these algorithms adapt the volume to keep the signal-to-noise ratio at a constant level. Such approaches require acquiring information about the noise from a sensing microphone. The difficulty is that the sensing microphone captures not only the noise, but also the signal coming from the PA itself, including its echoes and reverberations.
To avoid the signal separation problem, the proposed solution directly analyses the intelligibility of the message using the signal from the sensing microphone. For this, an objective intelligibility measure is used that analyses correlations between the original clean message and the distorted message from the microphone. Using the measured intelligibility, the volume is controlled to maintain intelligibility. However, because maximum intelligibility occurs at the maximum volume of the PA system, just before the signal starts deforming, maintaining intelligibility alone is not enough. Loud PA systems are perceived as annoying, especially if the background noise is low. That is why the proposed solution limits the loudness of the PA system while maintaining intelligibility.
|
[PDF]
[Abstract]
|
| 14 |
|
Optimization of Texture Feature Extraction Algorithm
Texture, the pattern of information or arrangement of the structure found in an image, is an important feature of many image types. In a general sense, texture refers to the surface characteristics and appearance of an object given by the size, shape, density, arrangement and proportion of its elementary parts. Due to the significance of texture information, texture feature extraction is a key function in various image processing applications, remote sensing and content-based image retrieval. Texture features can be extracted by several methods, using statistical, structural, model-based and transform information, of which the most common uses the Gray Level Co-occurrence Matrix (GLCM). The GLCM contains second-order statistical information about the spatial relationship of the pixels of an image. From the GLCM, many useful textural properties can be calculated to expose details about the image content. However, the calculation of the GLCM is very computationally intensive and time consuming. In this thesis, optimizations in the calculation of the GLCM and texture features are considered, and different approaches to the structure of the GLCM are compared. We also propose parallel computation of the GLCM and texture features using the Cell Broadband Engine Architecture (Cell Processor). Experimental results show that our parallel approach greatly reduces the execution time of the GLCM texture feature extraction algorithm.
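As an illustration of what a GLCM holds, the sketch below counts grey-level co-occurrences for a single pixel offset and derives one texture feature (contrast) from the counts. It is a plain sequential version, not the optimized or Cell-based implementation of the thesis; the function names, the non-symmetric counting and the tiny 3-level test image are assumptions made for the example.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of grey levels (i, j) at pixel offset (dx, dy).

    `image` is a list of rows of integer grey levels in range(levels).
    The matrix is not symmetrised: (i, j) and (j, i) are counted separately.
    """
    rows, cols = len(image), len(image[0])
    matrix = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """Contrast feature: sum of (i - j)^2 weighted by normalised counts."""
    total = sum(sum(row) for row in matrix)
    return sum((i - j) ** 2 * count / total
               for i, row in enumerate(matrix)
               for j, count in enumerate(row))


image = [
    [0, 0, 1],
    [1, 2, 2],
    [0, 1, 2],
]
horizontal = glcm(image, levels=3)
print(horizontal)  # [[1, 2, 0], [0, 0, 2], [0, 0, 1]]
```

Every pixel pair contributes one independent increment, which is why the counting loop is a natural candidate for data-parallel execution, at the cost of handling concurrent updates to the same matrix cell.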
|
[PDF]
[Abstract]
|
| 15 |
|
A solution to misaligned data access in a vectorizing compiler framework
Vectorizing code for short vector architectures, as employed by today’s multimedia extensions, comes with a number of issues. Responsibility for these issues is moved to the compiler in order to keep the hardware simple. One of those issues is memory alignment, which requires the compiler to guarantee that vectors are loaded and stored at aligned addresses.
Previous work that covered this issue proposed a mechanism to reorder vectors at runtime to ensure proper alignments, while other work has focussed on finding a minimal number of reorderings. We combined these subjects into an in-depth research and implemented the optimization for the retargetable CoSy(R) compiler framework. Instead of solely focussing on the minimal number of reorderings, we also considered dynamic (runtime) properties which may enable latency-hiding of reordering operations. Furthermore, we performed a comparison of the presented reordering techniques and researched the impact of other compiler optimizations on the proposed transformation. Finally, we placed our results into perspective with unaligned load/store operations supplied by our target architecture.
With our implementation, we were able to vectorize a number of applications for the SSE and SSE2 vector extensions where alignment issues were involved. For randomly generated loops we were able to achieve between 50% and 80% of the speedup obtained by unaligned memory instructions. (Our target architecture is less strict on memory alignment and supplies instructions that handle misalignment in hardware.) As for the benchmarks, we were able to achieve speedup factors of about 2.25x for a block-matching algorithm (combined with loop versioning to avoid runtime alignment), 1.6x for the SPEC95 Swim benchmark and 4x for a Sobel FIR filter.
|
[PDF]
[Abstract]
|
| 16 |
|
Multi-input Embedded Real-time Software Defined Radio
This master thesis work is inspired by the practical aspects of the Analytical Constant Modulus Algorithm (ACMA) proposed by Alle-Jan van der Veen and Arogyaswami Paulraj. The ACMA deals with the beamforming problem associated with constant modulus co-channel signal interference in wireless communication. Co-channel interference occurs when multiple signals are transmitted simultaneously at the same frequency from different sources. Beamforming is a technique applied in spatial signal processing to separate out individual signals using an antenna array. ACMA provides an efficient analytical approach to solve beamforming. It is a blind beamforming algorithm, as it does not require any knowledge of the signals and the channels. The scope of this thesis work is to implement a low-cost, low-power, embedded Multi-input receiver application. The Multi-input receiver system shall handle partially and fully overlapping constant modulus signals from distinct sources. The signals are modulated using a generic modulation scheme. The Multi-input receiver shall separate out individual signals using the ACMA (blind beamforming) and demodulate each signal. The system shall be software defined such that the beamforming and demodulator are implemented in software on an embedded Digital Signal Processor (DSP) platform. The Multi-input receiver system shall be optimized for the DSP platform in order to achieve real-time performance in terms of speed and stay within the power budget of the system. The receiver shall work efficiently in the presence of interference and noise, such as thermal noise generated by the receiver.
|
 file embargo until: 2014-08-10
[Abstract]
|
| 17 |
|
Real-Time Gesture Recognition with a 2D camera
There has been a vast improvement in Human-Computer Interaction over the last decade. Yet there are only very few systems with natural interfaces such as speech and gestures. This thesis addresses the topic of gesture recognition using a 2D camera and how gestures can be used as natural interfaces to control applications. The gesture recognition algorithm can identify six different gestures and was first developed on a PC and later moved to an embedded platform. A robust background subtraction technique is designed to obtain the hand segment. Two gesture recognition methods are implemented, their performance is measured, and the angle-based recognition approach is chosen for its accuracy. The application is moved to the embedded platform i.MX515EVK, based on the ARM Cortex-A8 processor. To obtain a frame rate suitable for real-time applications, optimizations such as camera capture time reduction, algorithmic optimizations and use of the SIMD unit of the Cortex-A8 processor, known as NEON, for data parallelism are performed. As an experiment, the optimized version of the algorithm is used to build a real-time application that recognizes gestures from images to control applications. The performance of the application is studied and a frame rate of 4 to 4.5 frames per second is achieved.
|
 file embargo until: 2015-09-11
[Abstract]
|
| 18 |
|
Feasibility Study and Design for Wireless Sensor Networks in a Space Environment
Wireless sensor networks (WSNs) are a technology that has been used successfully in a vast number of applications and environments. As a result, the number of users of this type of device, and of new applications for it, is increasing. In this thesis we worked hand-in-hand with the company ISIS B.V. to provide a WSN design that can be deployed inside a spacecraft structure and satisfies a set of defined requirements. ISIS (Innovative Solutions in Space) is a private company, founded in January 2006, that focuses on the development of nanosatellites and several space-related services.
|
[PDF]
[Abstract]
|
| 19 |
|
A Framework for Cooperative 3D Mapping of Unstructured Environments
Cooperative behaviour in robots is a much sought-after feature. Weaknesses of individual entities in the system can be overcome by cooperation, which also brings reliability and speed. The application of Computer Vision in robotics has brought many new path-breaking techniques, especially to aerial robotics. It has also managed to upstage the use of traditional inertial sensing methods for control and stabilization. With more computation power being packed onboard robotic platforms, it is now possible to run some of the state-of-the-art Computer Vision and Control algorithms on the platform itself. We present a hybrid solution involving a vision-based markerless Simultaneous Localization and Mapping algorithm and fiducial markers in a framework to achieve cooperative 3D mapping of unstructured environments.
|
[PDF]
[Abstract]
|
| 20 |
|
GDE: A Distributed Gradient-Based Algorithm for Distance Estimation in Large-Scale Networks
Today, wireless networks are connecting more and more devices around us. The scale of these systems demands novel techniques to maintain the availability of various services such as routing, localization, context detection, etc. Distance estimation is one of their most important building blocks. The majority of current algorithms presume knowledge about node position via systems such as GPS. While this approach is feasible for some application scenarios, in many cases it suffers from frequent unavailability and high costs in terms of energy consumption. The main contribution of the thesis is the introduction of a novel distributed algorithm called GDE for the estimation of distances in large-scale wireless networks. It is based on a gossiping mechanism that estimates distances between nodes solely through local interaction. We analyze the parameters that should be considered by real applications, and present mathematical models to compensate for their influence on distance estimation. Three kinds of applications using the GDE algorithm are shown in the thesis: cluster center detection, overlay shape construction, and routing. Finally, we introduce several improvements to the GDE algorithm that increase the distance estimation accuracy. The evaluations by means of simulation show that GDE succeeds in estimating the distance between nodes in both static and mobile scenarios with considerably high accuracy for various parameter setups, such as varying node density, node speed, spatial node distribution, etc.
|
[PDF]
[Abstract]
|