A. Gil
The Interlock Collapsing ALU (ICALU), introduced by IBM in the 1990s, aimed to mitigate execution interlocks, which occur when an instruction cannot be executed because it depends on the result of a previous one, causing pipeline stalls. ICALU collapsed dependent instruction pairs into a single three-operand operation executed in parallel through a 3-1 ALU. Its design extended the traditional ALU delay by only one Carry-Save-Adder stage, preserving overall cycle time.
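To make the collapsing idea concrete, the following is a minimal sketch of a three-operand addition built from one carry-save-adder stage followed by an ordinary carry-propagate add. It covers only the add-add case; the 64-bit width, the function names, and the example register values are assumptions chosen for illustration and are not taken from the IBM design or from this project.

```python
MASK = (1 << 64) - 1  # assumed 64-bit data path (illustrative choice)


def carry_save_add(a, b, c):
    """3:2 compressor: reduce three operands to a sum word and a carry word.

    No carry propagates across bit positions here, which is why this stage
    adds only a small, width-independent delay in front of the normal adder.
    """
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum & MASK, carry & MASK


def add3(a, b, c):
    """Three-input add: one carry-save stage, then one carry-propagate add."""
    partial_sum, carry = carry_save_add(a, b, c)
    return (partial_sum + carry) & MASK


# Hypothetical dependent pair:  add x5, x1, x2  ;  add x6, x5, x3
x1, x2, x3 = 7, 35, 100

# Without collapsing, the second add waits for x5.
x5 = (x1 + x2) & MASK
x6_sequential = (x5 + x3) & MASK

# With collapsing, x6 is computed directly from the three original operands.
x6_collapsed = add3(x1, x2, x3)
```

Both paths produce the same value for `x6`; the collapsed form simply no longer depends on `x5` having been written first.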
This project revisits ICALU in the context of modern out-of-order RISC-V processors and evaluates its practicality and impact on performance. A feasibility study first reassessed IBM’s claims and mapped the subset of RISC-V instructions suitable for collapsing. Trace analyses of CoreMark, Embench, and SPEC CPU2017 integer benchmarks revealed that collapsible dependencies occur approximately once every eleven instructions (7.7%), with half being adjacent and more than 92% within a distance of three instructions.
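The kind of trace analysis described above can be sketched as a single pass that records, for each collapsible instruction, the distance to the nearest collapsible instruction producing one of its source registers. The trace format, the `COLLAPSIBLE` opcode set, and the sample trace below are hypothetical; the actual subset of RISC-V instructions identified in the study is not reproduced here.

```python
from collections import Counter

# Hypothetical set of collapsible opcodes, for illustration only.
COLLAPSIBLE = {"add", "sub", "and", "or", "xor"}


def collapse_distances(trace):
    """Histogram of producer-to-consumer distances for collapsible pairs.

    `trace` is a list of (opcode, dest_reg, source_regs) tuples in program
    order. A distance of 1 means the two instructions are adjacent.
    """
    last_writer = {}  # register -> (trace index, opcode) of most recent write
    histogram = Counter()
    for index, (opcode, dest, sources) in enumerate(trace):
        if opcode in COLLAPSIBLE:
            distances = [
                index - last_writer[reg][0]
                for reg in sources
                if reg in last_writer and last_writer[reg][1] in COLLAPSIBLE
            ]
            if distances:
                histogram[min(distances)] += 1
        if dest is not None and dest != "x0":  # x0 is hard-wired to zero
            last_writer[dest] = (index, opcode)
    return histogram


# Small made-up trace.
trace = [
    ("add", "x5", ("x1", "x2")),
    ("add", "x6", ("x5", "x3")),   # depends on previous add: distance 1
    ("lw",  "x7", ("x10",)),
    ("sub", "x8", ("x6", "x4")),   # depends on the add two back: distance 2
    ("xor", "x9", ("x7", "x8")),   # x7 comes from a load; x8 from sub: distance 1
]

histogram = collapse_distances(trace)
collapsible_rate = sum(histogram.values()) / len(trace)
```

On a real trace, the histogram would show how many dependencies are adjacent versus further apart, and `collapsible_rate` would give the fraction of instructions with a collapsible dependency.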
ICALU was implemented in the open-source BOOM core using dedicated collapse-detection logic and a three-input ALU, while minimizing additional path delay. Cycle-accurate RTL simulations showed negligible benefits in 2-wide configurations (around 1%) but speed-ups of up to 7% in 3- and 4-wide cores, with collapsing reaching 70–90% of the effectiveness predicted by the theoretical analysis.
Although performance gains grow with core width, integration requires data-path widening from issue to commit, accommodating one extra micro-operation. These modifications lengthen several timing-critical back-end paths. Overall, ICALU provides a 3–7% performance uplift on integer workloads when sufficient timing slack exists, representing a promising but design-dependent trade-off for modern RISC-V processors.