Free Trial

Safari Books Online is a digital library providing on-demand subscription access to thousands of learning resources.


  • Create BookmarkCreate Bookmark
  • Create Note or TagCreate Note or Tag
  • DownloadDownload
  • PrintPrint
Share this Page URL
Help

About this Book - Pg. xxi

about this book We wrote Tika in Action to be a hands-on guide for developers working with search engines, content management systems, and other similar applications who want to exploit the information locked in digital documents. The book introduces you to the world of mining text and binary documents and other information sources like inter- net media types and Dublin Core metadata. Then it shows where Tika fits within this landscape and how you can use Tika to build and extend applications. Case studies present real-world experience from domains ranging from search engines to digital asset management and scientific data processing. In addition to the architectural overviews, you will find more detailed information in the later chapters that focus on advanced features like XMP metadata processing, automatic language detection, and custom parser extensions. The book also describes common file formats like MS Word, PDF , HTML , and Zip , and open source libraries used to process files in these formats. The included code examples are designed to support hands-on experimentation. No previous knowledge of Tika or text mining techniques is required. The book will be most valuable to readers with a working knowledge of Java. Roadmap Chapter 1 gives the reader a contextual overview of Tika, including its history, its core capabilities, and some basic use cases where Tika is most helpful. Tika includes abili- ties for file type identification, text extraction, integration of existing parsing libraries, and language identification. xxi