Chapter 9. Image-Capturing Webbots

In this chapter, I’ll describe a webbot that identifies and downloads all of the images on a web page. This webbot also stores images in a directory structure similar to the directory structure on the target website. This project will show how a seemingly simple webbot can be made more complex by addressing these common problems:

  • Finding the page base, the address against which every relative address on the page is resolved

  • Dealing with changes to the page base, caused by page redirection

  • Converting relative addresses into fully resolved URLs

  • Replicating complex directory structures

  • Properly downloading image files with binary formats
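The five problems above can be sketched as a short pipeline. This is a minimal Python illustration under stated assumptions, not the chapter's own implementation. It assumes the page HTML has already been downloaded and that `final_url` is the address the page was actually served from after any redirects, which is why it, and not the originally requested URL, is the fallback page base. All function names (`find_page_base`, `resolve_image_urls`, `local_path_for`, `save_image`) are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path, PurePosixPath
from urllib.parse import urljoin, urlsplit, unquote


class ImageAndBaseParser(HTMLParser):
    """Collects the first <base href> and every <img src> on a page."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.img_srcs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and self.base_href is None and attrs.get("href"):
            self.base_href = attrs["href"]
        elif tag == "img" and attrs.get("src"):
            self.img_srcs.append(attrs["src"])


def find_page_base(html, final_url):
    """Return (page_base, img_srcs). The page base is the <base href> if
    present, else the URL the page was served from after redirects
    (final_url is an assumption: the caller supplies it)."""
    parser = ImageAndBaseParser()
    parser.feed(html)
    if parser.base_href:
        # A relative <base href> is itself resolved against the page URL.
        return urljoin(final_url, parser.base_href), parser.img_srcs
    return final_url, parser.img_srcs


def resolve_image_urls(html, final_url):
    """Convert every <img src> on the page into a fully resolved URL."""
    base, srcs = find_page_base(html, final_url)
    return [urljoin(base, src) for src in srcs]


def local_path_for(image_url, root):
    """Mirror the target site's directory structure under `root`, e.g.
    http://www.example.com/pics/a.jpg -> root/www.example.com/pics/a.jpg"""
    parts = urlsplit(image_url)
    path = PurePosixPath(unquote(parts.path))
    # Drop "/", "." and ".." segments so a URL cannot escape `root`.
    safe = [p for p in path.parts if p not in ("/", ".", "..")]
    return Path(root, parts.netloc, *safe)


def save_image(data, image_url, root):
    """Create any missing directories, then write the image bytes."""
    target = local_path_for(image_url, root)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)  # binary write: bytes are stored unchanged
    return target
```

Writing with `write_bytes` keeps the file byte-for-byte intact; handling image data as text (character decoding or newline translation) is a common way to corrupt binary formats. On Windows, a host name that includes a port, such as `:8080`, would need further sanitizing before it can be used as a directory name.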

In Chapter 17, you’ll expand on these concepts to develop a spider that downloads images from an entire website, not just one page.