It is possible to reduce the computation time of data parallel programs by dividing the computation work among multiple threads and executing them on multiple compute cores. Some data parallel programs are functionally correct only when they are executed with a specific number of threads. For these programs, the question of whether parallelization reduces computation time can be answered with a simple “yes” or “no”. Other data parallel programs function correctly with any number of threads; this study focuses on those programs. For them, there is no single answer to the question of whether parallelization reduces computation time: the answer depends on the number of threads used.
Many factors impact the computation time of a data parallel program, for example scheduling overhead, cache size, and data size. Therefore, the relation between the number of threads and the computation time of a data parallel program is not linear. A typical time-thread curve first decreases towards a certain minimum time and then increases again. In the first part, there is sufficient data to make efficient use of the extra threads. In the second part of the curve, additional threads can no longer be used efficiently, so no further decrease in execution time is possible; since the overhead still grows with the number of threads, the computation time increases again. In order to make the computation time of data parallel programs as short as possible, it is therefore essential to determine the optimal number of threads.
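The shape of such a time-thread curve can be illustrated with a toy model: the parallelizable work shrinks with the number of threads, while a per-thread overhead grows with it. The work size, overhead constant, and the functions `modeled_time` and `optimal_threads` below are illustrative assumptions, not measurements from the smart decision tool.

```python
def modeled_time(threads, work=1000.0, overhead_per_thread=5.0):
    """Toy model (assumed numbers): parallelizable work is divided
    over the threads, while synchronization overhead grows linearly
    with the thread count."""
    return work / threads + overhead_per_thread * threads

def optimal_threads(max_threads=64):
    """Pick the thread count with the smallest modeled time by
    evaluating the curve at every candidate thread count."""
    return min(range(1, max_threads + 1), key=modeled_time)

# With these assumed constants the modeled curve reaches its minimum
# near sqrt(1000 / 5) ~ 14 threads, not at the maximum of 64.
print(optimal_threads())
```

The sketch shows why simply using all available threads is not optimal: beyond the minimum of the curve, the growing overhead term dominates the shrinking work term.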
There is another reason not to simply use the maximum number of threads. Executing multiple threads in parallel requires extra hardware, so the energy consumption increases when more threads are used. By using fewer than the maximum number of threads, energy can be saved by deactivating the threads that are not used for computations.
We describe a method for the automated determination of the optimal number of threads. We have developed a technique that we call the smart decision tool. With the smart decision tool, the number of threads needed for maximum speedup can be determined for every parallel computation task in a program.
The programming language Single Assignment C (SaC) was used as a host for our research. SaC is a functional programming language with a syntax similar to that of C. The language has a special syntactic construct (the array comprehension) with which data parallel operations can be described. The SaC compiler turns these array comprehensions into program parts that can be executed on any number of threads. With the addition of the smart decision tool, the number of threads that are actually involved in a parallel computation can be optimized.
Single Assignment C puts threads that are not used for computations in a spinlock. These spinning threads consume energy, so no energy is saved when part of the threads is held in a spinlock. We have developed barriers that deactivate threads that are not used for computations; in this way, energy can be saved.