Chapter 10. Tika and the Lucene search stack

Now that we've covered the steel frame of the Lucene search ecosystem, it's time to talk about some of the advanced applications that sit on top of that frame. You probably won't be surprised to hear that Tika is used heavily in each of the applications and software systems we're about to discuss.

10.3 The finishing touches

With a strong foundation and core, it's no wonder that higher-level applications and frameworks have blossomed in the Lucene ecosystem. The oldest of these frameworks, and the original home of Apache Hadoop, is the Apache Nutch project. Nutch's goal is to leverage Lucene, Solr, and various content-loading and extraction technologies to provide web-scale search (tens of billions of web pages) in an efficient and effective manner. Apache Droids is an Incubator podling focused on developing a lightweight, extensible crawler that can integrate into projects such as Nutch, Lucene, and Solr, without all the complex features and functions that those technologies provide. Finally, though we discussed Mahout earlier (in section 3.3), we'll revisit it in the context of the Lucene ecosystem, along with the applications that sit on top of the core and load-bearing walls of Lucene.

The best thing about our upcoming foray into these technologies? They all leverage Tika!