

to products for books and mobile phones. For this task, you define a split element with the identifier readWrite. This split defines two flows, one per product type, each containing a single step. Once these two steps end, you call the step moveProcessedFiles.

As mentioned previously, using parallel steps implies multithreading. By default, parallel step execution uses a SyncTaskExecutor, which runs the flows one after another in the calling thread. To run them concurrently, specify your own asynchronous executor using the task-executor attribute on the split element, as described in the following listing.

Listing 13.9 Configuring a task executor

    <batch:job id="importProductsJob">
      (...)
      <!-- Sets task executor -->
      <batch:split id="readWrite"
                   task-executor="taskExecutor"
                   next="moveHandledFiles">
        (...)
      </batch:split>
    </batch:job>

    <bean id="taskExecutor" (...)/>

Our first two scaling techniques use multithreading to parallelize the processing of chunks and steps, with all processing executing on the same machine. For this reason, performance is bounded by that machine's capabilities. In the next section, we use techniques that process jobs remotely, providing a higher level of scalability. Let's start with the remote chunking pattern, which executes chunks on several slave computers.

13.4 Remote chunking (multiple machines)

The previously described techniques integrate concurrent and parallel processing into batch processing. This improves performance, but it may not be sufficient: a single machine will eventually hit a performance limit. If performance still isn't suitable, you can consider using multiple machines to handle processing.

In this section, we describe remote chunking, our first Spring Batch scaling technique for batch processing on multiple machines.

13.4.1 What is remote chunking?

Remote chunking separates data reading and processing between a master and multiple slave machines. The master machine reads and dispatches data to slave machines.
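As a rough illustration of this division of labor, the following plain-Java sketch has one master thread read items, group them into chunks, and hand them to several slave (worker) threads. It is not Spring Batch code: the class RemoteChunkingSketch, its run method, and the in-memory BlockingQueue that stands in for the remote channel between machines are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from Spring Batch.
class RemoteChunkingSketch {

    // Sentinel chunk telling a worker that no more chunks will arrive.
    private static final List<String> END_OF_DATA = new ArrayList<>();

    static List<String> run(List<String> items, int chunkSize, int workerCount) {
        if (chunkSize < 1 || workerCount < 1) {
            throw new IllegalArgumentException("chunkSize and workerCount must be >= 1");
        }
        // Assumption: an in-memory queue stands in for the remote channel.
        BlockingQueue<List<String>> channel = new LinkedBlockingQueue<>();
        ConcurrentLinkedQueue<String> written = new ConcurrentLinkedQueue<>();
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);

        // Slave side: each worker takes chunks off the channel and handles them.
        for (int w = 0; w < workerCount; w++) {
            workers.execute(() -> {
                try {
                    while (true) {
                        List<String> chunk = channel.take();
                        if (chunk == END_OF_DATA) {
                            return;
                        }
                        for (String item : chunk) {
                            // Stand-in for item processing and writing.
                            written.add(item.toUpperCase());
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            // Master side: read sequentially, cut into chunks, dispatch.
            for (int i = 0; i < items.size(); i += chunkSize) {
                int end = Math.min(i + chunkSize, items.size());
                channel.put(new ArrayList<>(items.subList(i, end)));
            }
            // One end marker per worker so every worker stops.
            for (int w = 0; w < workerCount; w++) {
                channel.put(END_OF_DATA);
            }
            workers.shutdown();
            if (!workers.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("master interrupted", e);
        }

        List<String> result = new ArrayList<>(written);
        Collections.sort(result); // workers finish in no fixed order
        return result;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("book", "phone", "tablet", "laptop", "camera");
        System.out.println(run(items, 2, 3));
    }
}
```

The result is sorted only to make the output deterministic: chunks are handled by whichever worker is free, so completion order is not guaranteed.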
The master machine reads data in a step and delegates chunk processing to the slave machines through a remote communication mechanism such as JMS. Figure 13.9 provides an overview of remote chunking, the actors involved, and where processing takes place. Because the master is responsible for reading data, remote chunking is relevant only if reading isn't a bottleneck. As you can see in figure 13.9, Spring Batch implements remote chunking through two core interfaces, implemented on the master and slave machines respectively: