| Listing | Title | Page |
| --- | --- | --- |
| Listing 2.1 | HelloWorld OpenCL Kernel and Main Function | 46 |
| Listing 2.2 | Choosing a Platform and Creating a Context | 49 |
| Listing 2.3 | Choosing the First Available Device and Creating a Command-Queue | 51 |
| Listing 2.4 | Loading a Kernel Source File from Disk and Creating and Building a Program Object | 53 |
| Listing 2.5 | Creating a Kernel | 54 |
| Listing 2.6 | Creating Memory Objects | 55 |
| Listing 2.7 | Setting the Kernel Arguments, Executing the Kernel, and Reading Back the Results | 56 |
| Listing 3.1 | Enumerating the List of Platforms | 66 |
| Listing 3.2 | Querying and Displaying Platform-Specific Information | 67 |
| Listing 3.3 | Example of Querying and Displaying Platform-Specific Information | 79 |
| Listing 3.4 | Using Platform, Devices, and Contexts—Simple Convolution Kernel | 90 |
| Listing 3.5 | Example of Using Platform, Devices, and Contexts—Simple Convolution | 91 |
| Listing 6.1 | Creating and Building a Program Object | 221 |
| Listing 6.2 | Caching the Program Binary on First Run | 229 |
| Listing 6.3 | Querying for and Storing the Program Binary | 230 |
| Listing 6.4 | Example Program Binary for HelloWorld.cl (NVIDIA) | 233 |
| Listing 6.5 | Creating a Program from Binary | 235 |
| Listing 7.1 | Creating, Writing, and Reading Buffers and Sub-Buffers Example Kernel Code | 262 |
| Listing 7.2 | Creating, Writing, and Reading Buffers and Sub-Buffers Example Host Code | 262 |
| Listing 8.1 | Creating a 2D Image Object from a File | 284 |
| Listing 8.2 | Creating a 2D Image Object for Output | 285 |
| Listing 8.3 | Query for Device Image Support | 291 |
| Listing 8.4 | Creating a Sampler Object | 293 |
| Listing 8.5 | Gaussian Filter Kernel | 295 |
| Listing 8.6 | Queue Gaussian Kernel for Execution | 297 |
| Listing 8.7 | Read Image Back to Host Memory | 300 |
| Listing 8.8 | Mapping Image Results to a Host Memory Pointer | 307 |
| Listing 12.1 | Vector Add Example Program Using the C++ Wrapper API | 379 |
| Listing 13.1 | Querying Platform and Device Profiles | 384 |
| Listing 14.1 | Sequential Implementation of RGB Histogram | 393 |
| Listing 14.2 | A Parallel Version of the RGB Histogram—Compute Partial Histograms | 395 |
| Listing 14.3 | A Parallel Version of the RGB Histogram—Sum Partial Histograms | 397 |
| Listing 14.4 | Host Code of CL API Calls to Enqueue Histogram Kernels | 398 |
| Listing 14.5 | A Parallel Version of the RGB Histogram—Optimized Version | 400 |
| Listing 14.6 | A Parallel Version of the RGB Histogram for Half-Float and Float Channels | 403 |
| Listing 15.1 | An OpenCL Sobel Filter | 408 |
| Listing 15.2 | An OpenCL Sobel Filter Producing a Grayscale Image | 410 |
| Listing 16.1 | Data Structure and Interface for Dijkstra’s Algorithm | 413 |
| Listing 16.2 | Pseudo Code for High-Level Loop That Executes Dijkstra’s Algorithm | 414 |
| Listing 16.3 | Kernel to Initialize Buffers before Each Run of Dijkstra’s Algorithm | 415 |
| Listing 16.4 | Two Kernel Phases That Compute Dijkstra’s Algorithm | 416 |
| Listing 20.1 | ImageFilter2D.py | 489 |
| Listing 20.2 | Creating a Context | 492 |
| Listing 20.3 | Loading an Image | 494 |
| Listing 20.4 | Creating and Building a Program | 495 |
| Listing 20.5 | Executing the Kernel | 496 |
| Listing 20.6 | Reading the Image into a Numpy Array | 496 |
| Listing 21.1 | A C Function Implementing Sequential Matrix Multiplication | 500 |
| Listing 21.2 | A kernel to compute the matrix product of A and B, summing the result into a third matrix, C. Each work-item is responsible for a single element of the C matrix. The matrices are stored in global memory. | 501 |
| Listing 21.3 | The Host Program for the Matrix Multiplication Program | 503 |
| Listing 21.4 | Each work-item updates a full row of C. The kernel code is shown as well as changes to the host code from the base host program in Listing 21.3. The only change required in the host code was to the dimensions of the NDRange. | 507 |
| Listing 21.5 | Each work-item manages the update to a full row of C, but before doing so the relevant row of the A matrix is copied into private memory from global memory. | 508 |
| Listing 21.6 | Each work-item manages the update to a full row of C. Private memory is used for the row of A and local memory (Bwrk) is used by all work-items in a work-group to hold a column of B. The host code is the same as before other than the addition of a new argument for the B-column local memory. | 510 |
| Listing 21.7 | Different Versions of the Matrix Multiplication Functions Showing the Permutations of the Loop Orderings | 513 |
| Listing 22.1 | Sparse Matrix-Vector Multiplication OpenCL Kernels | 530 |