M.D. Berkers
2 records found
The use of accelerators in HPC applications has grown enormously in the last decade, while demands on throughput in the field keep rising.
Not every algorithm has a hardware architecture that clearly performs best. Our work explores the performance of different hardware architectures in solving a convex optimization problem. Such algorithms consist of a sequence of dependent operations, which makes them an interesting use case because parallelism is not easily found. Focusing on an on-machine computational model used at ASML, we explore the acceleration of a quadratic programming Active-Set algorithm on dedicated hardware. Libraries are available for this on both the CPU and the GPU, but nothing is available for the FPGA. Our work fills this gap by implementing the algorithm in a high-level parallel programming language in order to ease development for FPGA accelerators. We use the Intel FPGA SDK for OpenCL framework to evaluate the performance trade-offs involved in FPGA acceleration and compare the performance to that of library functions on both the CPU and the GPU. To fit the FPGA architecture, the algorithm is converted to a dataflow algorithm so that data can be streamed between kernels. The implementation leverages the features introduced in the Intel FPGA SDK for OpenCL framework to stream data between kernels over low-latency on-chip communication. We demonstrate that such a complicated algorithm can be implemented efficiently using the OpenCL framework. Our implementation achieves competitive performance compared to optimized library functions on both the CPU and the GPU. The OpenCL framework allows for easy design space exploration, and we have explored different optimization strategies. The execution time of the final FPGA implementation is 3.5x and 1.2x longer than that of the CPU and GPU, respectively, in double-precision floating point. If the FPGA implementation is reduced to single precision, execution is 2.2x faster than with the double-precision variant. Higher throughput can be achieved by duplicating the implementation.
With the current size of the algorithm, two additional copies are possible. A handcrafted implementation could further improve the FPGA performance by manually managing local memory structures and reusing processing elements. However, using the OpenCL framework requires significantly fewer lines of code and significantly less development time than traditional hardware description languages.
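To make the quoted ratios easier to compare, the short Python sketch below normalises them to a CPU time of 1.0. Only the three ratios (3.5x, 1.2x, 2.2x) come from the abstract. The sketch reads "3.5x longer" as "3.5 times as long", which is an assumption. The function names are hypothetical, and the replication estimate assumes ideal linear scaling, which the abstract does not claim.

```python
# Ratios quoted in the abstract (double precision unless stated otherwise).
# Assumption: "N x longer" is read as "N times as long".
FPGA_OVER_CPU = 3.5     # FPGA execution time / CPU execution time
FPGA_OVER_GPU = 1.2     # FPGA execution time / GPU execution time
SINGLE_SPEEDUP = 2.2    # FPGA double-precision time / FPGA single-precision time


def relative_times():
    """Execution times normalised so that the CPU takes 1.0 time unit."""
    cpu = 1.0
    fpga_double = cpu * FPGA_OVER_CPU
    return {
        "cpu": cpu,
        "gpu": fpga_double / FPGA_OVER_GPU,
        "fpga_double": fpga_double,
        "fpga_single": fpga_double / SINGLE_SPEEDUP,
    }


def time_per_problem(single_copy_time, copies):
    """Average time per problem with `copies` identical pipelines.

    ASSUMPTION: ideal linear scaling, i.e. no contention for memory or I/O.
    """
    return single_copy_time / copies


if __name__ == "__main__":
    times = relative_times()
    for name, value in times.items():
        print(f"{name:12s} {value:.2f}")
    # One original pipeline plus the two additional copies mentioned above.
    print(f"fpga_double with 3 copies: {time_per_problem(times['fpga_double'], 3):.2f}")
```

Note that the single-precision figure is not directly comparable to the CPU and GPU figures, which the abstract reports in double precision.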
Smart sensors and communication using IoT in supermarkets
Shelf monitor system
Bachelor thesis
(2017)
Erik Hagenaars, Martijn Berkers, Jaap Hoekstra, Andre Bossche, Bart Frens, Ioan Lager, Paul Marcelis
This thesis seeks a solution to the problem of managing and monitoring the banana shelf in a supermarket using IoT.
The research focuses on a wireless, non-intrusive sensor that detects several features of the banana shelf.
The three main features examined are the quality of the bananas, the quantity of the bananas, and the quality of the shelf.
First, a study was conducted to find the best sensor for these measurements.
The chosen sensor is a color image sensor, and the platform for the IoT device is a Raspberry Pi.
The features are detected by image processing, implemented in the Python programming language with the OpenCV library.
The image is first smoothed using a Gaussian filter, after which the foreground is segmented.
Different segmentation methods are investigated, and adaptive thresholding is used.
To determine the quantity of the bananas and the quality of the shelf, the stickers on the bananas are detected.
This detection is implemented using different filtering methods, ranging from spectral filtering to color thresholding.
With the segmented foreground, the quality of the bananas is assessed using a color histogram.
This information is then sent to a communication module that is connected to an IoT dashboard for user interpretation.
With the proposed design, the status of the shelf, including the percentage of the shelf filled, the quality of the bananas on the shelf, and the neatness of the shelf, is available to a supermarket manager to better organize the supermarket.
This sensor makes it possible to better organize the banana shelf and to act preemptively instead of reactively.
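The processing chain described above (Gaussian smoothing, adaptive thresholding, color histogram) can be illustrated with a minimal, dependency-free Python sketch. This is not the thesis code, which used OpenCV. The function names, the 3x3 kernel, the block size, the offset, and the use of a hue histogram with 8 bins are all assumptions chosen for illustration.

```python
import colorsys

# 3x3 Gaussian kernel (weights sum to 16); an assumed, commonly used size.
KERNEL = [(1, 2, 1), (2, 4, 2), (1, 2, 1)]


def gaussian_blur3(gray):
    """Smooth a 2-D list of grey values with a 3x3 Gaussian kernel.

    Border pixels reuse the nearest valid pixel (edge replication).
    """
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * gray[yy][xx]
            out[y][x] = acc / 16
    return out


def adaptive_threshold(gray, block=3, offset=2):
    """Return a mask with 1 where a pixel exceeds its local mean minus `offset`.

    The local mean is taken over a block x block window clipped at the border.
    """
    h, w = len(gray), len(gray[0])
    r = block // 2
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [gray[yy][xx]
                      for yy in range(max(y - r, 0), min(y + r, h - 1) + 1)
                      for xx in range(max(x - r, 0), min(x + r, w - 1) + 1)]
            local_mean = sum(window) / len(window)
            mask[y][x] = 1 if gray[y][x] > local_mean - offset else 0
    return mask


def hue_histogram(rgb, mask, bins=8):
    """Count the hues of the pixels selected by `mask` in `bins` equal bins."""
    hist = [0] * bins
    for pixel_row, mask_row in zip(rgb, mask):
        for (r, g, b), keep in zip(pixel_row, mask_row):
            if keep:
                hue, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
                hist[min(int(hue * bins), bins - 1)] += 1
    return hist
```

In such a sketch, a grey-scale version of the image would be passed through `gaussian_blur3` and `adaptive_threshold`, and the resulting mask would select the pixels whose hues are counted. A histogram weighted towards yellow or towards brown hues could then serve as a rough quality indicator, although the thesis abstract does not state which histogram features it uses.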