Chapter 13. Tips and tricks

We've examined how to make scripts easy to run and easy to schedule, but we've said little about the kinds of things you might want such a script to do. Our next section gives a few examples to whet your appetite.

13.5. Example automation tasks

We couldn't possibly tell you what your automation needs are. However, many tasks have similar flavors. By giving you a few examples, we hope to set some sparks going in your imagination. You may have a moment where you spot that a repetitive task that has been getting under your skin could easily be automated in Groovy. If that's the case, feel free to rush straight to your nearest computer before you lose inspiration. We'll wait until you've finished.

Still here? Let's roll up our sleeves and get groovy.

13.5.1. Scraping HTML pages

The web is full not only of endless information but also of interesting new and updated information. Regularly visiting your favorite pages for updated content is one of the plates you need to keep spinning. It's easy to delegate this task to a Groovy script. The script needs to

1. Connect to a URL.
2. Read the HTML content.
3. Find the interesting information in the HTML.

Finding the information of interest is the tricky part, because HTML source code can be complex. Also, our script should be forgiving about whitespace, attribute order, quoting of attribute values, and so on. In other words, we cannot reliably use regular expressions to cut the information out of the source code. If we could work with XML rather than HTML, we could use an XML parser and GPath or XPath expressions to scrape off the interesting parts reliably.

By the Way
The term scraping stems from olden times when users were faced with a 25x80 character terminal screen. New automation features could be added by reading characters off this screen. This technique was called screen scraping.
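The three steps above, together with the idea of treating the page as XML, can be sketched in plain Java, which a Groovy script can call directly. This is a minimal sketch under a strong assumption: the page must be well-formed XHTML, which most real pages are not. `fetch` covers steps 1 and 2 (assuming UTF-8 content), and `boldLinkTexts` covers step 3 using the JDK's `javax.xml.xpath` API. The class name, the method names, and the XPath expression `//b//a` (links nested inside bold elements) are our own illustrative choices.

```java
import java.io.InputStream;
import java.io.StringReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class illustrating the "if only it were XML" approach.
final class XPathScrape {

    // Steps 1 and 2: connect to the URL and read the raw content as text.
    // Assumes UTF-8; a real scraper would honor the charset the server declares.
    static String fetch(String url) throws Exception {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Step 3: parse the markup as XML and select link texts with XPath.
    // Fails with IllegalArgumentException if the markup is not well-formed XML.
    static List<String> boldLinkTexts(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Every <a> element nested anywhere inside a <b> element.
            NodeList links = (NodeList) xpath.evaluate("//b//a", doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < links.getLength(); i++) {
                texts.add(links.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot process markup as XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // With a URL argument, fetch that page; otherwise use a tiny inline sample.
        String page = args.length > 0 ? fetch(args[0])
                : "<html><body><p><b><a href='/a'>First item</a></b></p>"
                + "<p><a href='/b'>Not bold</a></p></body></html>";
        boldLinkTexts(page).forEach(System.out::println);
    }
}
```

Fed ordinary HTML with, say, an unclosed tag, the XML parser gives up with an error instead of returning partial results. That brittleness is exactly what an HTML-tolerant parser is meant to remove.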
The good news is that there are free open-source parsers that read HTML and expose the content as SAX events such that Groovy's XML parsers can work with it. The popular NekoHTML parser can be found at http://people.apache.org/~andyc/neko/doc/index.html. Download it, and copy its jar file to the classpath.

As an example, consider analyzing the HTML page of http://java.sun.com as captured in figure 13.2. Let's assume we're interested in the news items, or everything that appears as links in bold type. For the screen shown in figure 13.2, our script should print

Developing Web Services Using JAX-WS
More Enhancements in Java SE 6 (Mustang)
"Get Java" Software Button Now Available
Gosling T-Shirt Hurling Contest
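Consuming SAX events, whichever parser produces them, comes down to writing a content handler. The sketch below is our own (the class and method names are hypothetical, not part of NekoHTML or Groovy). It counts how deeply it is nested inside bold elements, treating both `<b>` and `<strong>` as bold (an assumption about the page's markup), and collects the text of every `<a>` found there. To stay runnable without extra jars, `scrape` drives the handler with the JDK's built-in SAX parser, which accepts only well-formed input; the point of NekoHTML is that it can deliver the same kind of events from messy real-world HTML. Element names are compared case-insensitively because HTML parsers may report them in upper case.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of links that appear inside bold elements.
final class BoldLinkHandler extends DefaultHandler {
    private final List<String> found = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private int boldDepth = 0;      // how many bold elements we are currently inside
    private boolean inLink = false; // true while inside an <a> within bold text

    private static boolean isBold(String name) {
        return name.equalsIgnoreCase("b") || name.equalsIgnoreCase("strong");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (isBold(qName)) {
            boldDepth++;
        } else if (qName.equalsIgnoreCase("a") && boldDepth > 0) {
            inLink = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks, so accumulate it.
        if (inLink) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (isBold(qName)) {
            boldDepth--;
        } else if (qName.equalsIgnoreCase("a") && inLink) {
            // Collapse runs of whitespace, as a browser would when rendering.
            found.add(text.toString().trim().replaceAll("\\s+", " "));
            inLink = false;
        }
    }

    List<String> results() {
        return found;
    }

    // Drives the handler with the JDK's SAX parser (well-formed input only).
    static List<String> scrape(String markup) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            BoldLinkHandler handler = new BoldLinkHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(markup)));
            return handler.results();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse markup", e);
        }
    }

    public static void main(String[] args) {
        // A tiny hand-written sample, not the real java.sun.com page.
        String sample = "<html><body>"
                + "<b><a href='/1'>Developing Web Services Using JAX-WS</a></b>"
                + "<p><a href='/2'>Ordinary link</a></p>"
                + "<strong><a href='/3'>Gosling T-Shirt\n  Hurling Contest</a></strong>"
                + "</body></html>";
        scrape(sample).forEach(System.out::println);
    }
}
```

The handler only catches links nested inside bold elements. If a page puts the bold element inside the link instead (`<a><b>text</b></a>`), you would need to track that nesting as well.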