
General coding principles

This chapter covers
- Determining values for global size and local size
- Implementing the reduction algorithm in OpenCL
- Synchronizing work-items in different work-groups

In the preceding chapters, the example host applications have executed kernels using a single work-item. This is fine when you're learning OpenCL or testing a new application, but for production code, this is unacceptable. OpenCL's great strength is that you can execute kernels using millions or even billions of work-items, and if you're not going to put them to use, you might as well program in regular C.

Making use of all this processing power isn't easy. You need a clear understanding of how work-items and work-groups access memory, and how synchronization can be used to coordinate their operation. To reach this understanding, it helps to look at a fully optimized example application.

Most of this chapter will be concerned with the process of reduction, or adding together elements of an array. Specifically, we're going to compute the sum of 2^20 floating-point values using 2^20 work-items. We'll spend some time examining the reduction algorithm, but remember that it's the method that's important. This example will illuminate the issues that arise when processing large amounts of data, such as memory bandwidth, memory
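To make the idea of reduction concrete before any kernel code appears, here is a minimal sketch in plain C that models how a reduction might be split across work-groups. It is a CPU-side simulation, not the chapter's kernel: the function names `reduce_group` and `reduce_sum` are hypothetical, the halving-stride scheme is one common approach rather than necessarily the one developed later, and the input array is overwritten as a stand-in for local memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU model of a work-group reduction; not the book's kernel.
 * Reduces local_size floats in place and returns their sum.
 * Assumption: local_size is a power of two. */
static float reduce_group(float *local, size_t local_size)
{
    assert(local_size > 0 && (local_size & (local_size - 1)) == 0);

    /* Halve the number of active "work-items" on every pass. */
    for (size_t stride = local_size / 2; stride > 0; stride /= 2) {
        /* Each iteration of this inner loop stands in for one work-item. */
        for (size_t lid = 0; lid < stride; ++lid) {
            local[lid] += local[lid + stride];
        }
        /* In an OpenCL kernel, barrier(CLK_LOCAL_MEM_FENCE) would go here
         * so that every work-item finishes a pass before the next begins. */
    }
    return local[0];
}

/* Sum global_size floats: one partial sum per work-group, then combine
 * the partial sums. Overwrites `data`.
 * Assumption: global_size is a multiple of local_size. */
float reduce_sum(float *data, size_t global_size, size_t local_size)
{
    assert(global_size % local_size == 0);

    float total = 0.0f;
    size_t num_groups = global_size / local_size;
    for (size_t group = 0; group < num_groups; ++group) {
        total += reduce_group(data + group * local_size, local_size);
    }
    return total;
}
```

Calling `reduce_sum(data, 1 << 20, 256)` would model 2^20 work-items arranged in 4,096 work-groups of 256 work-items each; the group size of 256 is an illustrative assumption, not a value taken from the text. On a real device the groups run concurrently and cannot see each other's partial sums directly, which is why combining results across work-groups needs separate treatment.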