Safari Books Online is a digital library providing on-demand subscription access to thousands of learning resources.
Most scalable web applications use separate systems for handling web requests and for storing data. The request handling system routes each request to one of many machines, each of which handles the request without knowledge of other requests going to other machines. Each request handler behaves as if it is stateless, acting solely on the content of the request to produce the response. But most web applications need to maintain state, whether it’s remembering that a customer ordered a product, or just remembering that the user who made the current request is the same user who made an earlier request handled by another machine. For this, request handlers must interact with a central database to fetch and update the latest information about the state of the application.
Just as the request handling system distributes web requests across many machines for scaling and robustness, so does the database. But unlike the request handlers, databases are by definition stateful, and this poses a variety of questions. Which machine remembers which piece of data? How does the system route a data query to the machine—or machines—that can answer the query? When a client updates data, how long does it take for all machines that know that data to get the latest version, and what does the system return for queries about that data in the meantime? What happens when two clients try to update the same data at the same time? What happens when a machine goes down?
As with request handling, Google App Engine manages the scaling and maintenance of data storage automatically. Your application interacts with an abstract model that hides the details of managing and growing a pool of data servers. This model and the service behind it provide answers to the questions of scalable data storage specifically designed for web applications.
App Engine’s abstraction for data is easy to understand, but it is not obvious how to best take advantage of its features. In particular, it is surprisingly different from the kind of database with which most of us are most familiar, the relational database. It’s different enough, in fact, that Google doesn’t call it a “database,” but a “datastore.”
We will dedicate the next several chapters to this important subject.