Safari Books Online is a digital library providing on-demand subscription access to thousands of learning resources.
If you can't make the verification go any faster, the only remaining option is not to do it, or at least not do so much of it. This approach was suggested by Wolfgang Hoschek. He noticed that in an XML document the same names keep coming up over and over. For instance, in an XHTML document, there are only about 100 different element names, and a few dozen of those account for most elements (p, table, div, span, strong, and so on). Once you've verified that a name is legal, you can store it in a collection somewhere. The next time you see a name, you first check to see whether it's one of the names you've seen before; if it is, you just accept it and don't check it again.
However, you do have to be very careful here. It may take longer to find some names (especially shorter ones) in a collection such as a hash map than it would take to check them all over again. The only way to tell is to benchmark and profile the caching scheme very carefully on several different VMs using different kinds of documents. You may need to tune parameters such as the size of the collection to fit different kinds of documents, and what works well for one document type may not work well for another. Furthermore, if the cache is shared between threads, thread contention can become a serious problem.