- document
-
Li, H. (author), Mentens, Nele (author), Picek, S. (author)
SHA-3 is considered to be one of the most secure standardized hash functions. It relies on the Keccak-f[1600] permutation, which operates on an internal state of 1600 bits, mostly represented as a 5 x 5 x 64-bit matrix. While existing implementations process the state sequentially in chunks of typically 32 or 64 bits, the Keccak-f[1600]...
conference paper 2023
- document
-
Kühn, Martin J. (author), Holke, Johannes (author), Lutz, Annette (author), Thies, J. (author), Röhrig-Zöllner, Melven (author), Bleh, Alexander (author), Backhaus, Jan (author), Basermann, Achim (author)
Developments in numerical simulation of flows and high-performance computing influence one another. More detailed simulation methods create a permanent need for more computational power, while new hardware developments often require changes to the software to exploit new hardware features. This dependency is very pronounced in the case of...
journal article 2023
- document
-
Li, H. (author), Mentens, Nele (author), Picek, S. (author)
This paper uses RISC-V vector extensions to speed up lattice-based operations in architectures based on HW/SW co-design. We analyze the structure of the number-theoretic transform (NTT), inverse NTT (INTT), and coefficient-wise multiplication (CWM) in CRYSTALS-Kyber, a lattice-based key encapsulation mechanism. We propose 12 vector extensions...
conference paper 2022
- document
-
Koene, Davy (author)
With the increase in the amount of data being gathered, the need for data processing is also rising. Furthermore, in addition to the proprietary ISAs that have been prevalent, the free and open RISC-V ISA has seen major interest. The modularity of the RISC-V ISA allows it to be extended with many instruction set extensions. One such extension...
master thesis 2021
- document
-
Bulavintsev, V. (author), Zhdanov, Dmitry D. (author)
We propose a generalized method for adapting and optimizing algorithms for efficient execution on modern graphics processing units (GPU). The method consists of several steps. First, build a control flow graph (CFG) of the algorithm. Next, transform the CFG into a tree of loops and merge non-parallelizable loops into parallelizable ones....
conference paper 2021
- document
-
Machidon, Alina L. (author), Machidon, Octavian M. (author), Ciobanu, C.B. (author), Ogrutan, Petre L. (author)
Remote sensing data has seen explosive growth in the past decade. This has led to the need for efficient dimensionality reduction techniques, mathematical procedures that transform the high-dimensional data into a meaningful, reduced representation. Projection Pursuit (PP) based algorithms were shown to be efficient solutions for...
journal article 2020
- document
-
Westen, H.P. (author)
The Single Instruction Multiple Data (SIMD) paradigm promises speedup at relatively low silicon area cost for software that exposes a large amount of loop-level parallelism. Automatic simdization, the act of exploiting loop-level parallelism by issuing SIMD instructions that operate on multiple data elements at once, remains a daunting task for...
master thesis 2012
- document
-
Okmen, Y. (author)
In the last decade, graphics capabilities have become very important in the mobile market. As a result, low-power embedded solutions for mobile devices have been developed to run computationally intensive graphics applications, which extensively use floating-point calculations. The work proposed in this thesis targets the...
master thesis 2011
- document
-
Pereira de Azevedo Filho, A. (author)
In this dissertation we present methodologies and evaluations aiming at increasing the efficiency of video coding applications for heterogeneous many-core processors composed of SIMD-only, scratchpad-memory-based cores. Our contributions are spread across three different fronts: thread-level parallelism strategies for many-cores, identification of...
doctoral thesis 2011
- document
-
De Smalen, S. (author)
Vectorizing code for short vector architectures as employed by today’s multimedia extensions comes with a number of issues. Responsibility for these issues is moved to the compiler in order to keep hardware simple. One of those issues is memory alignment, which requires the compiler to guarantee loading and storing vectors at aligned...
master thesis 2009
- document
-
Shahbahrami, A. (author)
In this dissertation, a novel SIMD extension called Modified MMX (MMMX) for multimedia computing is presented. Specifically, the MMX architecture is enhanced with the extended subwords and the matrix register file techniques. The extended subwords technique uses SIMD registers that are wider than the packed format used to store the data. The...
doctoral thesis 2008
- document
-
Shahbahrami, A. (author), Juurlink, B. (author), Vassiliadis, S. (author)
The 2-D Discrete Wavelet Transform (DWT) consumes up to 68% of the JPEG2000 encoding time. In this paper, we develop efficient implementations of this important kernel on general-purpose processors (GPPs), in particular the Pentium 4 (P4). Efficient implementations of the 2-D DWT on the P4 must address three issues. First, the P4 suffers from a...
journal article 2008