Pdftk can extract a document’s metadata (author, title, etc.) to a text file,
either in ASCII (with non-ASCII characters encoded as XML-style numeric
entities) or in UTF-8. This is done with the dump_data or dump_data_utf8
keyword, respectively. For example:
pdftk input.pdf dump_data output data.txt
writes the data in Example 9-1 to data.txt.
Example 9-1. Example output of pdftk dump_data operation (ellipses indicate where we have truncated the output for brevity)
InfoKey: Creator
InfoValue: XSL Formatter V4.3 R1 (4,3,2008,0424) for Linux
InfoKey: Title
InfoValue: PDF Explained
InfoKey: Producer
InfoValue: Antenna House PDF Output Library 2.6.0 (Linux)
InfoKey: ModDate
InfoValue: D:20110713115225-05'00'
InfoKey: CreationDate
InfoValue: D:20110713115225-05'00'
PdfID0: 57f4673abea4ca58a27e19bf1871dfa
PdfID1: 57f4673abea4ca58a27e19bf1871dfa
NumberOfPages: 90
...
BookmarkTitle: Table of Contents
BookmarkLevel: 1
BookmarkPageNumber: 5
BookmarkTitle: Preface
BookmarkLevel: 1
BookmarkPageNumber: 9
BookmarkTitle: Why Read This Book?
BookmarkLevel: 2
BookmarkPageNumber: 9
BookmarkTitle: Audience
BookmarkLevel: 2
BookmarkPageNumber: 9
...
PageLabelNewIndex: 1
PageLabelStart: 1
PageLabelNumStyle: DecimalArabic
...
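Because the dump is a plain list of "Key: value" lines, it is easy to read back into a program. The following is a minimal Python sketch, not part of pdftk itself: the function name parse_dump_data and the returned structure are assumptions made for illustration. It pairs each InfoKey with the InfoValue that follows it, groups the bookmark fields, and uses the standard library's html.unescape to turn numeric entities from the ASCII form of the dump back into characters.

```python
import html


def parse_dump_data(text):
    """Parse pdftk dump_data text into (info, bookmarks, other).

    Hypothetical helper for illustration; the layout of the returned
    values is an assumption, not something pdftk defines.
    """
    info = {}        # InfoKey -> InfoValue
    bookmarks = []   # one dict per BookmarkTitle, in document order
    other = {}       # remaining fields; repeated keys keep the last value
    pending_key = None

    for line in text.splitlines():
        # Split on the first colon only, so values such as
        # "D:20110713115225-05'00'" keep their own colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank lines, "..." and other lines without a colon
        key = key.strip()
        # Decode XML-style numeric entities such as "&#233;".
        value = html.unescape(value.strip())

        if key == "InfoKey":
            pending_key = value
        elif key == "InfoValue" and pending_key is not None:
            info[pending_key] = value
            pending_key = None
        elif key == "BookmarkTitle":
            bookmarks.append({"title": value})
        elif key in ("BookmarkLevel", "BookmarkPageNumber") and bookmarks:
            bookmarks[-1][key] = int(value)
        else:
            other[key] = value

    return info, bookmarks, other


# A few lines taken from Example 9-1.
sample = "\n".join([
    "InfoKey: Title",
    "InfoValue: PDF Explained",
    "InfoKey: ModDate",
    "InfoValue: D:20110713115225-05'00'",
    "NumberOfPages: 90",
    "BookmarkTitle: Preface",
    "BookmarkLevel: 1",
    "BookmarkPageNumber: 9",
])

info, bookmarks, other = parse_dump_data(sample)
print(info["Title"], other["NumberOfPages"], bookmarks[0]["title"])
```

In practice the text would come from the file written by the command above, for example by reading data.txt and passing its contents to the function. For a dump produced with dump_data_utf8, open the file with encoding="utf-8"; the entity-decoding step then has little to do.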