Multicore systems have become an indispensable part of our everyday life. They represent a viable way to keep increasing processor performance without hitting the memory and power walls. However, the shift from traditional programming to multicore programming has a critical influence on three dimensions: the applications, the software tools, and the hardware platforms. In this thesis, we focus on the application dimension and investigate the performance potential of a 3D bodyscan SAR processing algorithm running on several multicore processors. To support this variety of processors, our solutions are based on OpenCL, a language that enables portability across a large set of multicore platforms. To achieve performance, we mainly evaluate and optimize for NVIDIA GPUs, but also for Intel GPPs and an ATI GPU. The design and implementation of our solutions follow a step-by-step strategy. First, we analyze the application to determine its functionality, data characteristics, and performance requirements. Next, we design and implement a sequential reference solution and evaluate its performance. We use this solution to design a set of possible parallel solutions, which we also evaluate in terms of performance and platform utilization using a generic prototyping platform. We then select the most promising parallel solution for a first portable OpenCL implementation. At this stage, we apply all platform-agnostic basic optimizations, obtaining a new version of the application with increased performance and no loss of portability. After these basic optimizations, we show how several platform-specific optimizations can be applied; in this step, we essentially trade portability for performance. For this work, we specifically target NVIDIA GPUs and show the potential performance impact of data- and memory-related optimizations on the case-study application.
Finally, we gather all these steps and findings into a generic, empirical strategy that enables programmers to reason about OpenCL applications systematically and to decide on the trade-off between application portability and performance. Our main conclusions are twofold. First, OpenCL is a promising standard (and language) for implementing portable multicore applications. Second, the bodyscan application is a good fit for multicore platforms, and we recommend GPUs as the target platform for such an application. For future work, we propose two directions. First, on the generic research side, we propose to focus on validating and refining the strategy, as well as on a more abstract way to derive the parallel OpenCL versions. Second, on the application/technical side, we plan to find and implement hardware-dependent optimizations of the OpenCL solution for various hardware platforms.