In this section, we’ll jump into Tika’s code-level support for managing instances of metadata: the actual information captured for a file, as informed by the metadata models. Specifically, we’ll explore Tika’s org.apache.tika.metadata package, its Metadata and Property classes, and the relationships between them. These classes will become your friends: transforming metadata and making it viewable by your end users is something you’ll have to get used to. Never fear! Tika’s here to help.
We’ve talked a lot so far about metadata models, but we’ve done little to show what instances of those models look like. A metadata instance is the set of attributes prescribed by a model, together with the values captured for those attributes from a particular file. In other words, instances are the actual metadata you get back for each file that you run through Tika. Let’s get ourselves some metadata to work with in the following listing. Given a URL, the program obtains the metadata for the content available at that URL.
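Before turning to that listing, it may help to picture the model/instance distinction in plain Java. The sketch below is only a stand-in that uses nothing but the standard library; it is not Tika’s Metadata or Property API. The class names, the three attribute names, and the sample values are all made up for illustration. It assumes a model is just a list of permitted attribute names and an instance is a map from those names to the values captured for one file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetadataInstanceSketch {

    // Hypothetical model: the attribute names an instance may use.
    // (Illustrative only; a real model also defines types, cardinality, and so on.)
    public static final List<String> MODEL = List.of("title", "creator", "format");

    // Hypothetical instance: the values captured for one file, keyed by attribute.
    public static final class Instance {
        private final Map<String, List<String>> values = new LinkedHashMap<>();

        // Record a value, rejecting attributes the model does not prescribe.
        public void add(String attribute, String value) {
            if (!MODEL.contains(attribute)) {
                throw new IllegalArgumentException("Not in model: " + attribute);
            }
            values.computeIfAbsent(attribute, k -> new ArrayList<>()).add(value);
        }

        // Return every value captured for an attribute (empty if none).
        public List<String> get(String attribute) {
            return values.getOrDefault(attribute, List.of());
        }
    }

    public static void main(String[] args) {
        // Made-up values standing in for what a parser might capture from one file.
        Instance doc = new Instance();
        doc.add("title", "Example Document");
        doc.add("creator", "First Author");
        doc.add("creator", "Second Author"); // an attribute can carry several values
        doc.add("format", "application/pdf");

        for (String attribute : MODEL) {
            System.out.println(attribute + " = " + doc.get(attribute));
        }
    }
}
```

The point of the sketch is the split in responsibilities: the model says which attributes are legal, while each instance holds the values found in one particular file. The same model can therefore back any number of instances.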