| 1 |
|
Modeling SRAM Start-up Characteristics For Physical Unclonable Functions
|
[PDF]
|
| 2 |
|
RFID Guardian Prototype
The RFID Guardian is an embedded device that protects the privacy of people who use RFID (Radio Frequency IDentification) enabled products.
This BSc project describes how version 2 of the RFID Guardian hardware was designed. Besides describing the hardware, it covers the obligatory documentation for the design flow that was used during the hardware design.
|
[PDF]
[Abstract]
|
| 3 |
|
A power-aware fault-tolerant hardware system for a custom reconfigurable platform
Electronic systems in the deep-submicron era face many new challenges. The increased intricacy of the manufacturing process is likely to increase the number of manufacturing defects, while testing for those defects will be very challenging. Smaller feature sizes will also bring new reliability issues due to phenomena such as Joule heating and electromigration. Furthermore, the chance of temporal defects, namely soft errors or Single Event Upsets (SEUs), will increase because the critical charge, the charge required to flip a logical value in a flip-flop, decreases with technology scaling.
Power consumption is another ever-growing issue in electronics design. Technical, financial, and ecological concerns all require devices that consume as little energy as possible.
The Ubichip, a bio-inspired reconfigurable VLSI circuit developed in the PERPLEXUS European project, has attributes such as dynamic self-replication and dynamic routing capability, both of which may help develop a system that increases reliability while addressing power consumption issues.
The author has designed a power-aware fault-tolerant system based on the Triple Modular Redundancy (TMR) fault-tolerance strategy, to be implemented on the Ubichip. The system also controls the number of functional units to regulate the overall system power consumption. This report describes the design, implementation, and simulation of the fault-tolerant system.
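As a rough illustration of the TMR strategy named above, the following Python sketch shows majority voting over three replica outputs. It is not the report's Ubichip design; the function names `tmr_vote` and `bitwise_tmr_vote` are hypothetical and chosen only for this example.

```python
from collections import Counter


def tmr_vote(a, b, c):
    """Return (value, fault_detected) for three replica outputs.

    The majority value wins; fault_detected is True when exactly one
    replica disagrees with the other two.
    """
    value, votes = Counter([a, b, c]).most_common(1)[0]
    if votes < 2:
        # All three replicas disagree, so no majority exists.
        raise ValueError("no majority: all three replicas disagree")
    return value, votes < 3


def bitwise_tmr_vote(a, b, c):
    """Bit-level majority of three integers, as a hardware voter would compute it."""
    return (a & b) | (a & c) | (b & c)
```

A single faulty replica is outvoted by the other two, which is why TMR masks one fault at a time; the fault flag could in principle drive decisions about which units stay active, although how the report's system does this is not stated in the abstract.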
|
[PDF]
[Abstract]
|
| 4 |
|
MEP-MAS: A Message Passing Multiprocessor Array for Streaming Applications
This thesis presents the design and implementation of a Chip-Multiprocessor (CMP) targeted at streaming applications (e.g. MPEG, MP3). Streaming applications are applications which can be split into several distinct stages working on data elements in a pipelined fashion. We propose a distributed-memory array (MEP-MAS), where the cores communicate via message-passing, optimizing the throughput. Application tasks are dynamically scheduled by a hardware scheduler taking the consumer-producer locality into account, thereby minimizing the communication overhead. The array is evaluated in terms of performance, scalability and predictability as a function of varied input stream sizes, multiple pipelines, number of pipeline stages and traffic volume. The array is configured as a 4 by 5 mesh and has reached speedups as high as 3.6x for a 4-stage pipeline and 13.4x for a 16-stage pipeline. Our experiments have highlighted the need for a balanced workload in order to optimize the performance. Furthermore, it is shown that MEP-MAS is scalable, as the speedup and throughput increase almost linearly with the number of added pipelines. The speedup has increased from 3.6x to 13.5x and the throughput from 17k data elements per second to 65k data elements per second. Increasing the traffic volume in the network marginally affects the speedup (-1.9%). Finally, increasing the traffic volume can cause a high deviation in arrival times between two subsequent data blocks in the pipeline of up to 8%.
|
[PDF]
[Abstract]
|
| 5 |
|
Integration of existing optimisation techniques with the DWARV C-to-VHDL compiler
Hardware acceleration using reconfigurable devices is a hot research topic. To facilitate this acceleration technique, also called reconfigurable computing, tools are being developed that translate High-Level Languages into Hardware Description Languages. DWARV is such a tool, developed at Delft University of Technology, that translates C into VHDL.
The way C code is written significantly affects the performance of DWARV. Manual modifications to a kernel can improve its speedup significantly. This thesis works towards closing the performance gap between modified and unmodified sources. By integrating different optimisation techniques, fewer manual modifications need to be made for improved performance. The techniques covered by this thesis are loop unrolling, loop-invariant code motion, software-sided caching and algebraic simplifications.
These techniques were integrated into DWARV, taking into account possible dependencies between the different optimisation techniques. When all these techniques are combined, speedups of 2.8 were achieved for individual kernels. On average, a kernel is 1.45 times faster when the optimisation techniques from this thesis are used.
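Two of the techniques listed above, loop-invariant code motion and loop unrolling, can be sketched as source-level transformations. The example below is in Python purely for illustration; DWARV itself operates on C, and none of these function names come from the thesis.

```python
def scale_naive(values, gain, offset):
    """Recomputes a loop-invariant expression on every iteration."""
    out = []
    for v in values:
        k = gain * gain + offset  # does not depend on v
        out.append(v * k)
    return out


def scale_hoisted(values, gain, offset):
    """Same result after loop-invariant code motion: k is computed once."""
    k = gain * gain + offset
    out = []
    for v in values:
        out.append(v * k)
    return out


def sum_unrolled(values):
    """Sum with the loop body unrolled by a factor of 4, plus a remainder loop."""
    total = 0
    n = len(values)
    limit = n - n % 4
    i = 0
    while i < limit:
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    while i < n:  # handle the leftover 0 to 3 elements
        total += values[i]
        i += 1
    return total
```

In a hardware context, hoisting removes redundant operators from the loop body and unrolling exposes independent operations that can be scheduled in parallel; the actual gains depend on the kernel, as the reported figures indicate.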
|
[PDF]
[Abstract]
|
| 6 |
|
MEP-MAS: A message passing multiprocessor array for streaming applications
This thesis presents the design and implementation of a Chip-Multiprocessor (CMP) targeted at streaming applications (e.g. MPEG, MP3). Streaming applications are applications which can be split into several distinct stages working on data elements in a pipelined fashion. We propose a distributed-memory array (MEP-MAS), where the cores communicate via message-passing, optimizing the throughput. Application tasks are dynamically scheduled by a hardware scheduler taking the consumer-producer locality into account, thereby minimizing the communication overhead. The array is evaluated in terms of performance, scalability and predictability as a function of varied input stream sizes, multiple pipelines, number of pipeline stages and traffic volume. The array is configured as a 4 by 5 mesh and has reached speedups as high as 3.6x for a 4-stage pipeline and 13.4x for a 16-stage pipeline. Our experiments have highlighted the need for a balanced workload in order to optimize the performance. Furthermore, it is shown that MEP-MAS is scalable, as the speedup and throughput increase almost linearly with the number of added pipelines. The speedup has increased from 3.6x to 13.5x and the throughput from 17k data elements per second to 65k data elements per second. Increasing the traffic volume in the network marginally affects the speedup (-1.9%). Finally, increasing the traffic volume can cause a high deviation in arrival times between two subsequent data blocks in the pipeline of up to 8%.
|
[PDF]
[Abstract]
|
| 7 |
|
Compile time analysis for hardware transactional memory architectures
Transactional Memory is a parallel programming paradigm in which tasks are executed concurrently, in the form of transactions, by different resources in a system, and conflicts between them are resolved at run-time. Conflicts, caused by data dependencies, result in aborts and restarts of transactions, thus degrading the performance of the system. If these data dependencies are known at compile time, the transactions can be scheduled in a way that avoids conflicts, thereby reducing the number of aborts and significantly improving the system’s performance. This thesis presents the Compiler insights to Transactional memory (CiT) tool, an architecture-independent static analyzer for parallel programs, which detects all potential data dependencies between parallel sections of a program. It provides feedback about load-store instructions in a transaction, dependencies inside loops and branches, and several warnings related to system calls which can affect the performance. The efficiency of the tool was tested on an application including different types of induced data dependencies, as well as several applications in the STAMP benchmark suite. In the first experiment, a 20% performance improvement was observed when the two versions of the application were executed on the TMFv2 HTM simulator.
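The idea of using known data dependencies to schedule transactions so that conflicts are avoided can be sketched as follows. This is a minimal illustration under assumed data structures (each transaction is described by hypothetical `reads` and `writes` address sets); it is not the CiT tool's analysis, which works statically on program code.

```python
def conflicts(tx_a, tx_b):
    """True if two transactions have a write-write, write-read or read-write overlap."""
    return bool(
        tx_a["writes"] & (tx_b["reads"] | tx_b["writes"])
        or tx_b["writes"] & tx_a["reads"]
    )


def schedule(transactions):
    """Greedily group transactions into batches whose members do not conflict.

    Transactions in the same batch could run concurrently without aborts;
    batches are meant to run one after another.
    """
    batches = []
    for name, tx in transactions.items():
        for batch in batches:
            if not any(conflicts(tx, transactions[other]) for other in batch):
                batch.append(name)
                break
        else:
            batches.append([name])  # conflicts with every existing batch
    return batches


example = {
    "t1": {"reads": {"x"}, "writes": {"y"}},
    "t2": {"reads": {"y"}, "writes": {"z"}},  # reads what t1 writes
    "t3": {"reads": {"a"}, "writes": {"b"}},  # independent of both
}
batches = schedule(example)
```

Here `t1` and `t3` end up in one batch while `t2` is deferred, because `t2` reads an address that `t1` writes.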
|
[PDF]
[Abstract]
|
| 8 |
|
Microcoded Reconfigurable Embedded Processors
|
[PDF]
|
| 9 |
|
Feasibility Analysis for Hardware Acceleration of Pattern Recognition Algorithms
This thesis presents a feasibility analysis for hardware acceleration of the pattern recognition algorithms used by the Media Knowledge Engineering department at the Delft University of Technology. The feasibility analysis is conducted on a number of different algorithm classes. The Parzen Window algorithm appeared to be the most suitable option for acceleration when reconfigurable hardware is considered. The reason for this is that the Parzen Window consists of independent calculations that can be computed in parallel. It can be computed by executing Custom Configured Hardware Units (CCU) in Field Programmable Gate Arrays (FPGAs). The feasibility analysis gave insight into whether it is useful to implement these kinds of algorithms in hardware. Our results showed that algorithms that have independent calculations, and thus can be executed in parallel, are strong candidates for hardware implementation, especially when the design can be executed with integer calculations. Integer calculations reduce the complexity of the hardware implementation, require less area on the FPGA, reduce the bandwidth of the calculations and can be computed faster than their floating-point counterparts. In the future our methodology can be reused for other algorithms that have a parallel structure.
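To make the independence of the Parzen Window calculations concrete, here is a minimal one-dimensional sketch in Python. The Gaussian kernel and the function names are assumptions for this example; the abstract does not state which kernel or data layout the thesis uses.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the window function (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def parzen_density(x, samples, h):
    """Parzen window density estimate at x with window width h.

    Each sample contributes one kernel evaluation that depends on no other
    sample, so the contributions could be computed in parallel and then summed.
    """
    contributions = [gaussian_kernel((x - s) / h) for s in samples]
    return sum(contributions) / (len(samples) * h)
```

The list of contributions is the part that maps naturally onto parallel hardware units; only the final sum requires combining results.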
|
[PDF]
[Abstract]
|
| 10 |
|
A Novel Concurrent Validation Scheme for Hardware Transactional Memory
Transactional memory is a lock-free parallel programming model, which aims at replacing conventional lock-based threaded programming techniques, currently used by multi-core systems. These techniques are difficult to implement and impose unnecessary overheads caused by conservative programming practices. In this thesis, the scalability potential of a transactional memory system, called TMFab, was explored for different numbers of processors and it was concluded that for more than 4 processors the system presents reduced scalability, due to an increase in the validation overhead. In response to this observation, a novel validation scheme was proposed which reduces this overhead, first by allowing multiple transactions to perform their validations and commit operations concurrently, and second by removing the need for broadcasting messages between the active transactions. A distributed shared memory scheme was used to increase the validation and memory access throughput, as well as allow for transactions to commit concurrently on different memory partitions. The two architectures were compared by means of SystemC simulation, and a maximum of 2.5x validation speedup was observed for the modified design, together with a 2.7x reduction in memory access latency. In total, the modified design achieved a maximum execution speedup of 30% over the original, for the benchmarks that were used. Furthermore, the modified system guarantees sequential consistency even in corner case scenarios.
|
[PDF]
[Abstract]
|
| 11 |
|
Greening Information Technology (IT) Infrastructures: Designing a green IT assessment methodology that supports IT decision-makers contribute to corporate responsibility strategy
Introduction
There are several issues and opportunities with regard to information technology (IT). On the one hand, the IT industry is responsible for a large amount of global GHG emissions, water pollution, depletion of scarce materials, growing volumes of e-waste and the largest release of hazardous waste worldwide. On the other hand, IT is an important source of cost efficiency and competitive advantage. Examples of economic opportunities of making IT greener are cost saving, risk reduction, innovation and prevention of resource restriction. Moreover, the social impact of the IT industry is immense. IT might be manufactured from minerals from military conflict zones or produced under poor working conditions. These examples of environmental, economic and social implications of IT illustrate that IT should constitute a significant part of an organisation’s sustainability policy and corporate responsibility (CR) strategy. But how can an IT decision-maker contribute to this?
It is expected that a framework providing insights into the greenness of the hardware IT infrastructure of an organisation could support IT decision-makers in contributing to the overall CR strategy of organisations. Consequently, the research question of this thesis project has been formulated as follows:
What generic framework based upon environmental and economic life cycle assessment criteria could be developed to assess the relative greenness of the hardware IT infrastructure of an organization as a step towards a comprehensive corporate responsibility strategy?
In essence, this research addresses environmental and economic aspects of IT, often referred to as green IT. Emphasis is put on environmental sustainability and costs associated with the physical IT infrastructure supporting business applications through processing, transferring or storing computer programs or data. This is referred to as greening the hardware IT infrastructure. The purpose of this research is to understand how an IT decision-maker can contribute to CR strategy by addressing several environmental issues efficiently. These environmental issues are related to water use, energy use and raw material use, greenhouse gas (GHG) emissions and generation of waste electrical and electronic equipment (WEEE).
Research methodology
To structure and guide the explorative research of designing a new framework, the design science research approach by Hevner et al. (2004) is applied. Design science research is an outcome-based research methodology that focuses on designing artefacts. Basically, this methodology consists of three types of iterations: relevance, rigor and design. To establish rigor and relevance, literature was reviewed from the knowledge base (rigor) and design requirements were analysed from stakeholder interviews (relevance). This information constitutes the academic and practical grounding of the new artefact. To design a new framework, three design iterations were carried out: two formative validations by expert panels and one operational validation through a case study.
Results
The outcome of the design process was a new framework that can be used to assess the greenness of an organisation’s hardware IT infrastructure. The framework consists of several viable performance indicators related to energy use, water use, GHG emission and generation of raw material waste at organisational level (see Chapter 7). The operationalization of these can be found in the functional design in Chapter 8. The functional design describes how the performance indicator scores can be estimated and aggregated into assessment criteria scores and an index. The assessment criteria were defined as follows:
1. Water use over the life cycle of hardware IT (m³)
2. Energy use over the life cycle of hardware IT (MJ)
3. Generation of waste over the life cycle of hardware IT (kg)
4. Greenhouse gas emissions over the life cycle of hardware IT (ton CO₂)
5. Costs over the life cycle of hardware IT (euro)
The index is entitled the Hardware IT infrastructure Greenness (HITIG) index. The HITIG index can be determined by applying the weighted sum method. This requires normalization and weighting of the assessment criteria. At the moment, normalization is not possible as an unbiased reference score cannot be established.
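The weighted sum method mentioned above can be sketched in a few lines of Python. This is only an illustration of the general technique: the criterion names, weights and reference scores below are hypothetical, and the abstract itself notes that an unbiased reference score for normalization cannot currently be established.

```python
def weighted_sum_index(scores, references, weights):
    """Combine criterion scores into one index via the weighted sum method.

    Each score is normalized by dividing it by a reference score for the
    same criterion (hypothetical here), then multiplied by its weight.
    Weights are expected to sum to 1.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[c] * scores[c] / references[c] for c in scores)


# Hypothetical values for two of the five criteria, for illustration only.
scores = {"water_m3": 50.0, "energy_MJ": 200.0}
references = {"water_m3": 100.0, "energy_MJ": 400.0}
weights = {"water_m3": 0.5, "energy_MJ": 0.5}
index = weighted_sum_index(scores, references, weights)
```

Normalization is what makes criteria with different units (m³, MJ, kg, ton CO₂, euro) comparable before they are weighted, which is why the missing reference score blocks the calculation in practice.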
Evaluation, reflection and recommendations
Although three design iterations were carried out to design the new framework, the framework design process and the artefact have several limitations. First, the expert panel reviews have several limitations. The panels were small, and expert opinions about which design requirements constituted “core requirements” differed between the panels. Second, the case study research had several limitations. The data used to estimate some of the performance indicators in the case study was of limited quality. Moreover, the external validity of the framework is limited, as only one case study was executed with a limited number of hardware IT infrastructure units. Third, a limited number of aspects related to sustainability have been incorporated in the new framework. Environmental issues have been limited to resource usage (water, energy and raw materials), GHG emission and waste generation, and economic aspects have been limited to costs. Social implications of sustainability have not been included at all. Fourth, measuring performance is challenging, and using indicators to assess the greenness of hardware IT is a reductionist tool that possibly cannot encapsulate the complexity of sustainability and greenness of IT. Lastly, several experts from KPMG have been involved in the definition of design requirements and the expert panel reviews. This could have biased the framework, but this cannot be completely proven.
To deal with the shortcomings of the new framework, several things could be done. The external validity of the framework could be enhanced by carrying out additional case studies in which different units of analysis are investigated. The functional design could be improved by incorporating more accurate and up-to-date data. The framework could be further expanded to incorporate additional economic and environmental aspects and social implications of IT. Land use, hazardous waste, quality and working conditions are four examples of aspects that could be incorporated in the framework. Furthermore, the framework could be accompanied by a management process to ensure an organisation’s progress is measured over time. The management process could be based upon the plan-do-check-act (PDCA) cycle. Implementing a new management process or integrating the framework in an already existing environmental management process could require awareness of green and sustainable IT within an organisation as well as a clear governance structure.
Conclusion and further research
The new framework can be used to determine the greenness of the hardware IT infrastructure of an organisation as a step towards a comprehensive CR strategy when incorporated in a measurement process. The framework supports achieving the desired green IT assessment criteria scores. Organisations can use the outcome of periodical measurements from the framework to adjust their policies, if required, in order to achieve the CR goals that are part of their CR strategy. The framework can be used to assess green IT progression related to energy use, water use, generation of raw material waste, GHG emission and costs over the life cycle of hardware IT. Assessing the relative greenness of the hardware IT infrastructure of an organisation would require implementing the framework in a continuous management process. When measuring hardware IT infrastructure greenness for the purpose of benchmarking results, organisations are recommended to apply the same calculation methodology to ensure consistency and comparability of results.
For further research it is recommended to investigate how social aspects can be incorporated in the framework to ensure a more balanced contribution to CR strategy. It is also recommended to improve the quality of certain data used in the functional design and to extend the scope of environmental and economic sustainability aspects in the framework. Furthermore, research should focus on improving the framework through additional refinement cycles. Particularly important are additional case studies to test the general applicability of the framework, further refine the functional design and evaluate the use of the framework over time as part of a continuous management process.
|
[PDF]
[Abstract]
|
| 12 |
|
The rho-trimedia processor
|
[PDF]
|
| 13 |
|
Secure computing on reconfigurable systems
This thesis proposes a Secure Computing Module (SCM) for reconfigurable computing systems. Secure Computing (SC) provides a protected and reliable computational environment, where data security and protection against malicious attacks on the system are assured. SC is strongly based on encryption algorithms and on the attestation of the executed functions. The use of SC on reconfigurable devices has the advantage of being highly adaptable to the application and the user requirements, while providing high performance.
Moreover, it is adaptable to new algorithms, protocols, and threats. In this dissertation, high-performance cryptographic units for symmetric encryption and hash functions were designed in order to achieve a high-performance SCM. Implementation results, in particular for the AES algorithm, suggest improvements of more than 500% in terms of Throughput per Slice compared to related art, with absolute throughputs of up to 34Gbit/s on a Virtex II Pro FPGA. A method to attest dynamically reconfigured hardware structures is also proposed.
In addition, this method does not penalize the performance of the SCM. The presented attestation mechanism allows the configuration bitstreams to be stored in unsecured locations, for example on an external memory or even on the internet, without posing a security threat. Experimental results obtained by implementing the proposed SCM on a Virtex II Pro FPGA suggest speedups of up to 750 times compared with software-implemented algorithms, achieving throughputs above 1Gbit/s at low area cost. Overall, this dissertation demonstrates the applicability and identifies the main advantages of implementing SC on reconfigurable systems.
|
[PDF]
[Abstract]
|
| 14 |
|
Design Trade-offs in Customized On-chip Crossbar Schedulers
In this paper, we present a design and an analysis of customized crossbar schedulers for reconfigurable on-chip crossbar networks. In order to alleviate the scalability problem in a conventional crossbar network, we propose adaptive schedulers on customized crossbar ports. Specifically, we present a scheduler with a weighted round robin arbitration scheme that takes into account the bandwidth requirements of specific applications. In addition, we propose the sharing of schedulers among multiple ports in order to reduce the implementation cost. The proposed schedulers arbitrate on-demand (at design time) interconnects and adhere to the link bandwidth requirements, where physical topologies are identical to logical topologies for given applications. Considering conventional crossbar schedulers as reference designs, a comparative performance analysis is conducted. The hardware scheduler modules are parameterized. Experiments with practical applications show that our custom schedulers occupy up to 83% less area, and maintain better performance compared to the reference schedulers.
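Weighted round robin arbitration, as named in the abstract above, can be illustrated with a small Python sketch. This is a generic textbook-style variant, not the paper's scheduler; the port names, integer weights and function names are hypothetical.

```python
def wrr_round(weights):
    """Build one arbitration round in which each port appears `weight` times.

    Grants are interleaved pass by pass so that a heavily weighted port
    does not receive all of its slots back to back.
    """
    order = []
    for i in range(max(weights.values())):
        for port, weight in weights.items():
            if weight > i:
                order.append(port)
    return order


def grant_sequence(weights, requesting, n_grants):
    """Return the first n_grants grants, skipping ports that are not requesting."""
    round_order = [p for p in wrr_round(weights) if p in requesting]
    if not round_order:
        return []
    return [round_order[i % len(round_order)] for i in range(n_grants)]


weights = {"A": 3, "B": 1, "C": 2}  # hypothetical bandwidth shares 3:1:2
one_round = wrr_round(weights)
```

Over one full round, port `A` receives three of the six slots, `C` two and `B` one, which is how the weights translate into bandwidth shares.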
|
[PDF]
[Abstract]
|
| 15 |
|
Final Report Flock of Birds System
Report describing the development process of the Flock of Birds Visualizer software, used to acquire and visualize measurements of shoulder movements for medical research. The first major result of the process is a system able to communicate with the sensor-based tracking hardware, the Flock of Birds. Secondly, development has resulted in a fully configurable application, allowing users to specify the sensors and bony landmarks according to personal preferences. Furthermore, the basis for the center of rotation estimation method was implemented. Development of this final bachelor project was done in collaboration with the Leiden University Medical Center.
|
[PDF]
[Abstract]
|
| 16 |
|
A Quantitative Model for Hardware/Software Partitioning
Heterogeneous System Development needs Hardware/Software Partitioning to be performed early in the development process. To do this, early predictions of hardware resource usage and delay are necessary. In this thesis, a Quantitative Model is presented that can make early predictions to support the partitioning process. The model is based on Software Complexity Metrics, which capture important aspects of functions like control intensity, data intensity, code size, etc. In order to remedy the interdependence of the software metrics, a Principal Component Analysis was performed. The hardware characteristics were determined by automatically generating VHDL from C using the DWARV C-to-VHDL compiler. Using the results from the principal component analysis, the quantitative model was generated using linear regression. The error of the model differs per hardware characteristic. We show that for flip-flops the mean error for the predictions is 69%. In conclusion, our quantitative model can make fast and sufficiently accurate area predictions to support Hardware/Software Partitioning. In the future, the model can be extended by introducing extra software metrics, using more advanced modeling techniques, and using a larger collection of functions and algorithms.
|
[PDF]
[Abstract]
|
| 17 |
|
Evaluation Methodology and Systematic Selection of Microcontrollers for Delfi-n3Xt Nanosatellite
|
[PDF]
|
| 18 |
|
Generic and Orthogonal March Element based Memory BIST Engine
A Memory BIST architecture and implementation based on the novel concept of Generic and Orthogonal March Element
|
[PDF]
[Abstract]
|
| 19 |
|
TMFab: A Transactional Memory Fabric for Chip Multiprocessors
With the performance of single-core processors approaching its limits, an increased amount of research effort is focused on chip multiprocessors (CMP). However, existing lock-based synchronization methods that are critical to performing parallel computation possess limited scalability and are inherently complex to use while programming. This thesis uses the concept of transactional memory implemented within a synthesizable fabric named TMFab, containing all the requisite hardware components needed to prototype a scalable chip multiprocessor. Its processor-independent nature enables the instantiation and use of any suitable soft-processor core inside the fabric without significant modifications to the fabric hardware. Additionally, the fabric offers scalability on account of its 3D interconnect architecture, which supports die-stacking to add processor cores to the CMP without increasing its area footprint. The hardware transactional memory system of the fabric reduces performance overheads of transactional operations, allowing transactions to complete execution faster. TMFab is shown to provide speedups as high as 3.44x for correctly partitioned independent transactions and can be used to analyze the points of contention for conflicting transactions. The fabric was synthesized for both Field Programmable Gate Array (FPGA) and 90nm semi-custom targets.
|
[PDF]
[Abstract]
|
| 20 |
|
Hardware Support for Dynamic Partial Reconfiguration
Dynamically reconfigurable architectures have demonstrated superior performance in comparison to general-purpose processors. This thesis describes a generic approach for Dynamic Partial Reconfiguration (DPR) of a reconfigurable platform, connected to a general-purpose system through a high-speed interconnect. Thus, the system can dynamically install and execute hardware instances of software functions (bitstreams) on demand. Furthermore, the thesis also serves as a starting point for accelerating multiple software functions concurrently on the hardware. To achieve DPR, system calls are inserted into the original program. The host processor of the system thus manages the hardware reconfiguration and execution through a Linux device driver. Accelerating multiple software functions is achieved by implementing multiple accelerators on the hardware, managing their interfaces for communication, and providing memory consistency for read and write requests. To do so, the driver provides support for non-blocking execution of multiple software functions. The above system is implemented on a general-purpose machine providing a HyperTransport bus to connect a Xilinx Virtex4-100 FPGA, an AMD Opteron-244, and 1 GB of DDR main memory. The efficiency of the system is evaluated using audio processing and encryption workloads. The proposed system achieves a 12x speedup over software with the audio processing workload and a 13x speedup when the workloads were accelerated concurrently.
|
[PDF]
[Abstract]
|