Streaming analysis should typically come before, or run in parallel with, a database or Hadoop infrastructure. It should also be used in conjunction with that infrastructure, which supplies the models and reference data the streaming application needs. Streaming analysis should keep as much information as possible in memory. Memory capacities are now large enough to keep large collections of data active across a distributed cluster, and Streams provides windowing functions to help manage that data in memory.

Do not use Streams for transactional workloads. You can make a Streams application highly reliable, but if you need 100% transactional guarantees, you should use a database. When a database is too slow, you can use Streams instead, but you have to compromise by giving up some level of guarantees or by adding more application logic. Do not use Streams for multipass algorithms that are highly complex or that need to process more data than will fit in memory; use a database or Hadoop instead.

Use Streams when there are a variety of data types to analyze, such as voice, video, network packets, audio, and waveforms.
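The idea of using a window to bound how much stream data stays in memory can be illustrated with a small sketch. The following is plain Python, not the Streams windowing API; the function name `sliding_average`, the `(timestamp, value)` tuple layout, and the time-based eviction rule are all assumptions made for illustration. It keeps only the tuples from the most recent `window_seconds` and emits a running mean as each tuple arrives.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Reading = Tuple[float, float]  # (timestamp in seconds, value); hypothetical layout


def sliding_average(readings: Iterable[Reading],
                    window_seconds: float) -> Iterator[Reading]:
    """Yield (timestamp, mean of the values inside the trailing time window)."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")

    window: Deque[Reading] = deque()  # only the tuples still inside the window
    total = 0.0                       # running sum of the values in the window

    for timestamp, value in readings:
        window.append((timestamp, value))
        total += value

        # Evict tuples older than the window so memory use stays bounded.
        while window[0][0] <= timestamp - window_seconds:
            _, old_value = window.popleft()
            total -= old_value

        yield timestamp, total / len(window)


if __name__ == "__main__":
    sample = [(0.0, 10.0), (1.0, 20.0), (2.0, 30.0), (5.0, 40.0)]
    for ts, mean in sliding_average(sample, window_seconds=3.0):
        print(ts, mean)
```

Because old tuples are evicted as new ones arrive, memory use depends on the window size and arrival rate rather than on the total length of the stream, which is the property that makes in-memory analysis practical.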