<p>This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.</p>
Journal article (2018)
-
Anderson L. Sartor, Pedro H. E. Becker, Joost Hoozemans, Stephan Wong, Antonio C.S. Beck
In the design of modern-day processors, energy consumption and fault tolerance have gained significant importance next to performance. This is caused by battery constraints, thermal design limits, and higher susceptibility to errors as transistor feature sizes decrease. However, achieving the ideal balance among the three is challenging due to their conflicting nature (e.g., fault-tolerance techniques usually lengthen execution time or increase energy consumption), which is why current processor designs target at most two of these axes. We therefore propose a new VLIW-based processor design capable of adapting the execution of the application at run-time in a fully transparent fashion, considering performance, fault tolerance, and energy consumption together, in which the weight (priority) of each one can be defined a priori. This is achieved by a novel decision module that dynamically controls the application's ILP to increase the possibility of replicating instructions or applying power gating. For an energy-oriented configuration, it is possible, on average, to reduce energy consumption by 37.2% with a performance overhead of only 8.2%, while maintaining a low failure rate, when compared to a fault-tolerant design.
Conference paper (2018)
-
Pedro H. Exenberger Becker, Anderson L. Sartor, Marcelo Brandalero, Tiago Trevisan Jost, Stephan Wong, Luigi Carro, Antonio C. Beck
Many modern FPGA-based soft-processor designs must include dedicated hardware modules to satisfy the requirements of a wide range of applications. Often, not all of these modules fit in the target FPGA, so their functionality must be mapped to the much slower software domain. However, many complex soft-core processors underuse the available Block RAMs (BRAMs) compared to LUTs and registers. Taking advantage of this fact, we propose a generic low-cost BRAM-based function reuse mechanism (the BRAM-FR) that can be easily configured for precise or approximate modes to accelerate execution. The BRAM-FR was implemented in HDL and coupled to a configurable 4-issue VLIW processor. It was used to optimize different applications that use a soft-float library to emulate a Floating-Point Unit (FPU), and an image processing filter that tolerates a certain level of error. We show that our technique can accelerate the former by 1.23x and the latter by 1.52x, with a Reuse Table that fits in the BRAMs (that would otherwise be idle) of five tested FPGA targets, with a marginal increase in the number of slice registers and LUTs.
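The abstract does not describe the internals of the BRAM-FR, so the following is only an illustrative software sketch of function reuse in general: results are stored in a fixed-size table keyed by the inputs, and an approximate mode lets nearby inputs share an entry. The names `ReuseTable`, `entries`, `drop_bits`, and `slow_multiply` are hypothetical, and dropping low-order input bits is just one assumed way to approximate; it is not claimed to be the authors' hardware design.

```python
class ReuseTable:
    """Direct-mapped reuse table (illustrative; sizes and policy are assumptions)."""

    def __init__(self, entries=1024, drop_bits=0):
        self.entries = entries
        self.drop_bits = drop_bits        # 0 = precise mode, >0 = approximate mode
        self.table = [None] * entries     # each slot holds (key, result) or None
        self.hits = 0
        self.misses = 0

    def _key(self, args):
        # Approximate mode: discard low-order bits so nearby inputs map to one key.
        return tuple(a >> self.drop_bits for a in args)

    def call(self, func, *args):
        key = self._key(args)
        index = hash(key) % self.entries  # pick a slot from the (reduced) inputs
        slot = self.table[index]
        if slot is not None and slot[0] == key:
            self.hits += 1                # reuse: skip the computation entirely
            return slot[1]
        self.misses += 1
        result = func(*args)              # compute, then overwrite the slot
        self.table[index] = (key, result)
        return result


def slow_multiply(a, b):
    # Stand-in for an expensive routine such as a soft-float operation.
    return a * b
```

In precise mode a second call with identical integer inputs is served from the table. With `drop_bits=2`, the inputs `(101, 201)` reuse the result stored for `(100, 200)`, trading a small error for a higher hit rate.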