"uuid","repository link","title","author","contributor","publication year","abstract","subject topic","language","publication type","publisher","isbn","issn","patent","patent status","bibliographic note","access restriction","embargo date","faculty","department","research group","programme","project","coordinates"
"uuid:f5fa2307-fe28-4570-a679-b4ac64d007e6","http://resolver.tudelft.nl/uuid:f5fa2307-fe28-4570-a679-b4ac64d007e6","MEP-MAS: A Message Passing Multiprocessor Array for Streaming Applications","Tjin A Djie, M.E.","Van Leuken, R. (mentor); Kumar, S. (mentor)","2012","This thesis presents the design and implementation of a Chip-Multiprocessor (CMP) targeted at streaming applications(e.g. MPEG, MP3). Streaming applications are applications which can be split into several distinct stages working on data elements in a pipelined fashion. We propose a distributed-memory array (MEP- MAS), where the cores communicate via message-passing, optimizing the throughput. Application tasks are dynamically scheduled by a hardware scheduler taking the consumer-producer locality into ac- count, thereby minimizing the communication overhead. The array is evaluated in terms of performance, scalability and predictability as a function of varied input stream sizes, multiple pipelines, number of pipeline stages and traffic volume. The array is configured as a 4 by 5 mesh and has reached speedups as high as 3.6x for a 4-stage pipeline and 13.4x for a 16-stage pipeline. Our experiments have highlighted the need for a balanced workload in order to optimize the performance. Furthermore, it is shown that MEP-MAS is scalable as the speedup and throughput almost linearly increases with the number of added pipelines. The speedup has increased from 3.6x to 13.5x and the throughput from 17k data elements per second to 65k data elements per second. Increasing the traffic volume in the network marginally affects the speedup (-1.9%). Finally, increasing the traffic volume can cause a high deviation in arrival times between two subsequent data blocks in the pipeline of up to 8%.","Multiprocessors; Streaming; Hardware scheduler","en","master thesis","","","","","","","","","Electrical Engineering, Mathematics and Computer Science","Microelectronics & Computer Engineering","","","",""
"uuid:97e95a0e-a9d1-4ecd-b4c2-717217ab8a0e","http://resolver.tudelft.nl/uuid:97e95a0e-a9d1-4ecd-b4c2-717217ab8a0e","Streaming FPGA Based Multiprocessor Architecture for Low latency Medical Image Processing","Heij, R.W.","Al-Ars, Z. (mentor)","2016","In this work a fast and efficient implementation of a Field Programmable Gate Array (FPGA) based, fixed hardware, streaming multiprocessor architecture for low latency medical image processing is introduced. The design of this computation fabric is based on the ρ-VEX Very Long Instruction Word (VLIW) softcore processor and is in influenced by architectures of modern Graphics Processing Unit (GPU) implementations. The computation fabric is capable of exploiting several types of parallelism, including pipelining, Instruction-level Parallelism (ILP) and Data-level parallelism (DLP). The multiprocessor in the fabric is implemented by a chain of ρ-VEX processors that function as a processor pipeline. A memory architecture to support the high throughput of this processor pipeline has been created, making the computation fabric capable of stream processing. The basic building blocks of this memory architecture are single cycle accessible, dual port scratchpad memories. A total of 16 instances of the computation fabric are implemented on a Virtex-7 FPGA, creating an array of multiprocessors that is capable of processing 43.52 images per second when running a typical medical image processing algorithm workload on an operating frequency of 193 MHz. This makes the implementation suitable for real-time medical image processing. The processor pipeline depth of the computation fabric is generic, and can be changed according to the requirements posed by the algorithm workload. 
This makes the architecture flexible and general enough to handle changes and updates to the algorithm workload.","FPGA; r-VEX; Streaming; Processor; Medical image processing","en","master thesis","","","","","","","","2017-12-01","Electrical Engineering, Mathematics and Computer Science","Computer Engineering","","Computer Engineering","CE-MS-2016-15",""
"uuid:81232e85-5c72-4b02-8cfc-462b4996b633","http://resolver.tudelft.nl/uuid:81232e85-5c72-4b02-8cfc-462b4996b633","Accelerating Software Pipelines Using Streaming Caches","Yanik, K.I.M.","Wong, S. (mentor)","2016","The trend of increasing performance by parallelism is followed by the adoption of heterogeneous systems. In order to allow more fine-tuned balancing between used thread- and instruction level parallelism, the heterogeneous ρ-VEX platform was developed. Pipelining has been a part of microprocessor development for decades to increase throughput of a data-path, where a task is split in stages which are distributed over several functional units who work in parallel. In software the concept of pipelines does exist, but mostly speaks about data-flows as here stages do not operate in parallel. This thesis proposes a step towards making this a possibility by mapping software pipelining on heterogeneous multi-core systems. This work documents the design, implementation and verification of a hybrid write-back and streaming cache scheme that aims to cut down overhead of inter-context and inter-core data communication, with the idea of allowing software pipelines to map stages over cores in the same microprocessor with different functional units, in order to fine-tune this mapping. A prototype design is first implemented in a high level behavioral simulator, after which it is implemented in VHDL, tested functionally to conform to a test-suite and a set of testing pipelines developed for this project separately. The VHDL design is implemented on the ML605 Virtex-6 platform, and in its current state conforms to all test-cases but not yet the pipelines, and a slight slow-down is measured in practice. 
Even though the prototype currently increased the run-time of a custom-developed benchmarking pipeline from 3.3928 * 10^-4 seconds to 3.7858 * 10^-4 seconds, there is room for improvement and it enables more research in a new direction of transparently core-to-stage mapped software pipelines, which we define as horizontal software pipelining, as opposed to traditional software pipelines that still execute code sequentially, hence vertically.","ρ-VEX; FPGA; Streaming; Processor; Cache; Pipeline; VLIW","en","master thesis","","","","","","","","","Electrical Engineering, Mathematics and Computer Science","Computer Engineering","","","",""
"uuid:75dd920a-0e50-49c9-9982-70ef7dab7a92","http://resolver.tudelft.nl/uuid:75dd920a-0e50-49c9-9982-70ef7dab7a92","Feeding High-Bandwidth Streaming-Based FPGA Accelerators","Mulder, Y.T.B. (TU Delft Electrical Engineering, Mathematics and Computer Science)","Hofstee, Peter (mentor); Delft University of Technology (degree granting institution)","2018","A new class of accelerator interfaces has signi cant implications on system architecture. An order of magnitude more bandwidth forces us to reconsider FPGA design. OpenCAPI is a new interconnect standard that enables attaching FPGAs coherently to a high-bandwidth, low- latency interface. Keeping up with this bandwidth poses new challenges for the design of accelerators, and the logic feeding them.
This thesis is conducted as part of a group project, where three other master students investigate database operator accelerators. This thesis focuses on the logic to feed the accelerators, by designing a reconfigurable multi-stream buffer architecture. By generalizing across multiple common streaming-like accelerator access patterns, an interface consisting of multiple read ports with a smaller than cache line granularity is desired. At the same time, multiple read ports are allowed to request any stream, including reading across a cache line boundary.
The proposed architecture exploits different memory primitives available on the latest generation of Xilinx FPGAs. By combining a traditional multi-read port approach for data duplication with a second level of buffering, a hierarchy typically found in caches, an architecture is proposed which can supply data from 64 streams to eight read ports without any access pattern restrictions.
A correct-by-construction design methodology was used to simplify the validation of the design and to speed up the implementation phase. At the same time, the design methodology is documented and examples are provided for ease of adoption. With the design methodology, the proposed architecture has been implemented and is accompanied by a validation framework.
Various configurations of the multi-stream buffer have been tested. Configurations up to 64 streams with four read ports meet timing with an AFU request-to-response latency of five cycles. The largest configuration with 64 streams and eight read ports fails timing. Limiting factors are the inherent architecture of FPGAs, where memories are physically located in specific columns. This makes extracting data complex, especially at the target frequencies of 200 MHz and 400 MHz. Wires are scattered across the FPGA and wire delay becomes dominant.
FPGA design at increasing bandwidths requires new design approaches. Synthesis results are no guarantee for the implemented design, and depending on the design size, could indicate a very optimistic operating frequency. Therefore, designing accelerators to keep up with an order of magnitude more bandwidth compared to the current state-of-the-art is complex, and requires carefully thought out accelerator cores, combined with an interface capable of feeding them.","OpenCAPI; FPGA; Streaming; HPC; Heterogeneous; Low-latency; High-bandwidth; Streaming-based","en","master thesis","","","","","","ISBN 978-94-6186-886-2","","","","","","Computer Engineering","",""
"uuid:39b1653b-cde7-419b-bcd0-8549b6e34db5","http://resolver.tudelft.nl/uuid:39b1653b-cde7-419b-bcd0-8549b6e34db5","Exploring Convolutional Neural Networks on the ρ-VEX architecture","Tetteroo, Jonathan (TU Delft Electrical Engineering, Mathematics and Computer Science; TU Delft Computer Engineering)","Wong, Stephan (mentor); van Genderen, Arjan (graduation committee); van Gemert, Jan (graduation committee); Delft University of Technology (degree granting institution)","2018","As machine learning algorithms play an ever increasing role in today's technology, more demands are placed on computational hardware to run these algorithms efficiently. In recent years, Convolutional Neural Networks (CNNs) have become an important part of machine learning applications in areas such as object recognition and detection. In this thesis we will explore how we can implement CNNs on the ρ-VEX processor and what can be done to optimize the performance.
The ρ-VEX processor is a VLIW processor that was developed at the Delft University of Technology and that can be reconfigured during runtime to take advantage of Instruction Level Parallelism (ILP) and Thread Level Parallelism (TLP) in an application. In this work we have developed a streaming pipeline in a simulator consisting of multiple 8-issue ρ-VEX cores connected with memory buffers. This pipeline was designed to execute CNN inference and take advantage of the overlapped execution to increase throughput. Furthermore, as each ρ-VEX core can be configured to operate in a one core, two core or four core mode based on available ILP and TLP, we can adapt the processor based on the current operation being executed and the amount of parallelism that is available.
By generating the required code from a high-level description of the CNN, it becomes straightforward to test multiple configurations of the pipeline and determine which creates the best performance. The implementation was subsequently tested using a simple network trained on the MNIST dataset.
By dividing the workload of the convolutional layers over multiple contexts to take advantage of data-level parallelism, we improved the latency by 3.03x and the throughput by 3.14x in simulation. By creating a pipeline of six cores in a single context configuration in the simulator, we achieved a throughput increase of 1.77x. A hardware implementation of the pipeline was also synthesized for a Virtex-6 FPGA, consisting of four 2-issue ρ-VEX cores at a clock speed of 200 MHz. We subsequently propose several optimizations to increase performance of CNN inference on the ρ-VEX architecture.","Convolutional Neural Networks; rVEX; Streaming","en","master thesis","","","","","","","","","","","","Computer Engineering","",""
"uuid:409f6cbf-7e8a-42dd-9973-d3719506dec1","http://resolver.tudelft.nl/uuid:409f6cbf-7e8a-42dd-9973-d3719506dec1","Live streaming via WiFi: Monitoring premature babies","Athmer, Casper (TU Delft Electrical Engineering, Mathematics and Computer Science); Chen, Qu (TU Delft Electrical Engineering, Mathematics and Computer Science)","Hanjalic, Alan (mentor); Rassels, Kianoush (graduation committee); Visser, Otto (graduation committee); Delft University of Technology (degree granting institution)","2018","This report describes the implementation of a custom streaming solution from an IP camera to a web browser. The system aims to both provide live video and Video on Demand. This will be used to monitor premature babies in incubators.","Streaming; Livestreaming; Video on Demand; Surveillance","en","bachelor thesis","","","","","","","","2020-01-02","","","","Computer Science","",""
"uuid:d71bf1c7-a35f-4de2-a5c8-90920b23e19d","http://resolver.tudelft.nl/uuid:d71bf1c7-a35f-4de2-a5c8-90920b23e19d","Minimizing bandwidth utilization for streaming noisy Monte-Carlo renders: For the 2018 individual research pilot","Lopes Cunha, Max (TU Delft Electrical Engineering, Mathematics and Computer Science)","Eisemann, Elmar (mentor); van Gemert, Jan (graduation committee); Delft University of Technology (degree granting institution)","2018","br/>This study focuses on the question how the bandwidth utilization of a high-quality video-stream from the Exposure Rendering framework can be minimized. Exposure Render uses a Monte-Carlo based rendering system to render volumetric data. The earliest estimation of lighting show high degrees of noise, leading to grainy images before convergence is complete. Exposure Render is planned to be turned into a web-service, where clients can upload volumetric data to view and interact with it. This necessitates a streaming service, which encountered difficulties regarding efficient compression. Using only JPEG compression to send still frames showed poor compression performance.
To answer this question, it was first established what the noise characteristics of the frames produced by Exposure Render are. In addition, a survey was done on recent advances in screen-space de-noising techniques to see which image-filtering techniques would be effective. This survey concluded that most of the state of the art could not readily be applied to Exposure Render, either because the methods are not designed for real-time Monte-Carlo rendering, or because they rely on additional rendering data, such as surface normals, which are not available in Exposure.
Three experiments were executed. The first was a region experiment on single image regions, where the best filtering methods were selected for local regions only; these were integrated into proposed enhancements to Exposure Render. The second experiment tested the similarity of a converging image sequence, before and after filtering. It was concluded that the delta encoder and Median Blur performed best in terms of speeding up the convergence in similarity over time.
The third experiment tested the bandwidth consumption of the methods and concluded that the Adaptive Gaussian Pyramid methods performed the best.
The best combination of algorithms to minimize the bandwidth utilization was found to be a macro-block based bandwidth limiter in combination with an Adaptive Gaussian Pyramid resolution scaler, which increased the compression ratio to 18.7 in comparison with the reference solution.","Image reconstruction; Monte-Carlo denoising; Image denoising; Image compression; Streaming","en","bachelor thesis","","","","","","","","","","","","","TI3806 Individual research pilot (2017/18 Q4)",""
"uuid:5788379a-66df-478f-8395-8567ff6b9aab","http://resolver.tudelft.nl/uuid:5788379a-66df-478f-8395-8567ff6b9aab","Well-being Driven Design: Creating a Meaningful Streaming-platform","Huijbregts, Matthijs (TU Delft Industrial Design Engineering)","Hekkert, Paul (mentor); Lomas, Derek (graduation committee); Delft University of Technology (degree granting institution)","2019","People are actively looking to pursue happiness by spending their precious time on meaningful experiences, which are often sought for in media entertainment as it is people’s most engaged in leisure activity today. However, current services fail to support them or even counter them, with their manipulative media-platforms in favor of their goals often at the cost of people’s goals. This creates a world of regret instead of happiness. The student of this graduation project claims that media entertainment designers should feel responsible to design services that respect and allow people to pursue their goals in finding happiness through it. This project can be considered an example, or even proof that there is indeed a manner to realize that and that, as company, it is crucial to do so as people are starting to reject services that are threatening their happiness.
APPROACH
This graduation project first investigated what happiness consists of, and how media entertainment could contribute to it. The conclusion was that people need to engage in mindful, intentional, intrinsically motivated media experiences that balance short-term pleasure (hedonia) with long-term happiness (eudaimonia). Then, current platforms were analysed to understand why they exist, why they are currently designed the way they are, and how this affects well-being. The conclusion was that current streaming platforms are being created to maximize media consumption, which has resulted in an over-focus on hedonia at the cost of eudaimonia and therefore people’s long-term happiness. As ViP focuses on re-framing and reinvention by creating future opportunities, instead of solving present-day problems, a future context of 2020 was outlined through an extensive analysis using academic literature, trend reports and sites, and interviews. The result was three meta-factors that described the world of 2020 as “embracing the mindful pursuit of meaningfulness”.
SOLUTION
ViP states that as a designer, you should take a position in this future vision. The goal of this project is to design a new streaming platform that improves people’s well-being through media entertainment, which resulted in the following statement:
I want to empower people to experience media-entertainment meaningfully, by guiding them in articulating their intentions through trusted others.
DESIGN
The new concept considers media entertainment as meaningful experience packages and facilitates people in finding these packages through the suggestions of trusted others, such as people close to the user (like family and friends) but also famous people/accounts (like inspirational influencers or design blogs).
It also empowers people to experience these entertainment packages meaningfully by increasing their mindfulness through 6 steps.
The concept contributes to well-being by focusing on improving both the hedonic and the eudaimonic experience: increasing people’s autonomy, competence and relatedness, focusing on personal growth, achieving goals, finding meaning in life, and improving people’s vitality by better self-regulating their media behavior.
CityJSON has advantages in web applications over the CityGML data format, and was therefore chosen as the main data format used in this research. Inspired by OGC API – Features, a RESTful API for fast access to geospatial features in CityJSON has been developed. The second part of my research is related to data streaming. CityJSONFeature has been proposed to enable CityJSON to be parsed incrementally on the web. To improve the overall data streaming performance, I proposed two methods to stream the data from the database to the RESTful API so that the RESTful API can immediately start constructing the first CityJSONFeature and sending it to the user. The third part of my research is to explore a proper database solution for CityJSON datasets to best support fast data access in the RESTful API. In addition, some auxiliary data is added into the database to improve the RESTful API’s capabilities with efficient data filtering and streaming.
To evaluate the efficiency of the implemented methodology, a systematic benchmarking has been performed on three aspects: 1) the database schema, 2) the two streaming methods, and 3) the performance difference between the DBMS and the file system. The results show that in most use cases the DBMS can better support the RESTful API than the file system with the help of the built-in query and index mechanism. In addition, the overall streaming performance is largely improved when adding data streaming from the DBMS to the RESTful API. Lastly, based on the benchmarks, an optimal database schema has been chosen to support the RESTful API with fast data access, efficient data filtering and streaming.","3D city model; RESTful API; Database; Streaming","en","master thesis","","","","","","","","","","","","Geomatics","",""
"uuid:f694a414-0c92-492f-a493-d604d88ce4e0","http://resolver.tudelft.nl/uuid:f694a414-0c92-492f-a493-d604d88ce4e0","Neural Radiance Field (NeRF) as a Rendering Primitive: StreamNeRF - Adapting a NeRF Model for Progressive Decoding","Găleşanu, Matei (TU Delft Electrical Engineering, Mathematics and Computer Science)","Eisemann, E. (mentor); Kellnhofer, P. (mentor); Weinmann, M. (mentor); van Gemert, J.C. (graduation committee); Delft University of Technology (degree granting institution)","2023","Neural Radiance Fields (NeRF) and their adaptations are known to be computationally intensive during both the training and the evaluating stages. Despite being the end goal, directly rendering a full-resolution representation of the scene is not necessary and not very practical for scenarios like streamed applications. Our goal is to design a streamable adaptation for a model that can produce fast, rough estimates of 3D scenes, by only using a shallow part of the network. The quality is subsequently improved as more parts of the network are available, such that it can be used in online applications where the model needs to be transferred. Separate models can be trained at different resolutions, but this approach results in a large space overhead and also increases the evaluation time. This can be mitigated by reducing the depth of low-resolution models, but redundancy will still be high as each new model needs to re-evaluate the input data, rendering previous calculations obsolete. Our method combines key concepts from previous approaches to create a progressively trained model that is able to produce intermediate outputs of increasing quality while attempting to optimize the trade-off between overhead and quality. Our model is able to produce a recognizable representation of the scene with as little as one hidden layer from the original model. 
It also allows for division into streamable chunks which can be sent individually and, upon reconstruction, provide intermediate outputs that bring consistent improvement in quality. The newly streamed data uses the residual output from previous computations in order to reduce redundancy. We show that the final quality of our adaptation is within 2% of the original in terms of previously used quantitative metrics.","Neural Radiance Fields; Progressive Decoding; Streaming","en","bachelor thesis","","","","","","","","","","","","Computer Science and Engineering","CSE3000 Research Project",""