Now that you know the composition of the various types of tokens, let's see how to use HTML::TokeParser to write useful programs. Many problems are quite simple and require only one token at a time. Programs to solve these problems consist of a loop over all the tokens, with an if statement in the body of the loop identifying the interesting parts of the HTML:
use HTML::TokeParser;
my $stream = HTML::TokeParser->new($filename)
  || die "Couldn't read HTML file $filename: $!";
# For a string: HTML::TokeParser->new( \$string_of_html );
while (my $token = $stream->get_token) {
  if ($token->[0] eq 'T') { # text
    # process the text in $token->[1]
  } elsif ($token->[0] eq 'S') { # start-tag
    my($tagname, $attr) = @$token[1,2];
    # consider this start-tag...
  } elsif ($token->[0] eq 'E') { # end-tag
    my $tagname = $token->[1];
    # consider this end-tag
  }
  # ignoring comments, declarations, and PIs
}
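As one concrete instance of this pattern, here is a minimal sketch that collects the href value of every a start-tag in a string of HTML. The function name link_targets and the sample markup are made up for illustration. The sketch relies on the parser's default behavior of lowercasing tag and attribute names, so it compares against 'a' and 'href' only.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not part of HTML::TokeParser): return the href
# of each <a> start-tag, in document order.
sub link_targets {
  my ($html) = @_;
  my $stream = HTML::TokeParser->new( \$html )
    || die "Couldn't parse the HTML string";
  my @hrefs;
  while (my $token = $stream->get_token) {
    next unless $token->[0] eq 'S';          # only start-tags matter here
    my($tagname, $attr) = @$token[1,2];
    next unless $tagname eq 'a';             # tag names arrive lowercased
    push @hrefs, $attr->{'href'} if defined $attr->{'href'};
  }
  return @hrefs;
}

# Made-up sample input: two links and one named anchor with no href
my $sample = '<p>See <a href="a.html">one</a> and '
           . '<A HREF="b.html">two</A>.</p><a name="x">anchor</a>';
print "$_\n" for link_targets($sample);
```

Because only one kind of token is of interest, the if/elsif chain collapses to a single next unless test. The named anchor is skipped because it has no href attribute.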