Free Trial

Safari Books Online is a digital library providing on-demand subscription access to thousands of learning resources.

  • Create BookmarkCreate Bookmark
  • Create Note or TagCreate Note or Tag
  • PrintPrint
Share this Page URL
Help

Chapter 4. The WHERE Clause > Correlated Subqueries

4.6. Correlated Subqueries

Since this chapter is all about the WHERE clause, this is the appropriate context in which to discuss the concept of correlation. In this context, a subquery correlates (co-relates) to its parent query if the subquery refers to—and is therefore dependent on—the parent to be valid.

A correlated subquery can’t be run by itself, because it makes reference—via a correlation variable— to the outer or main query. To demonstrate, let’s work through an example based on the entries table in the Content Management System application that we saw in Chapter 3. This is shown in Figure 4-5.

Figure 4-5. The CMS entries table


The example will use this table in the outer query, and have a correlated subquery that obtains the latest entry in each category based on the created date:

listing 4-7. CMS_13_Correlated_subquery.sql (excerpt)

SELECT
  category
, title
, created
FROM
  entries AS t
WHERE
  created = (
    SELECT
      MAX(created)
    FROM
      entries
    WHERE
      category = t.category
  )

Let’s start looking at this by reviewing the subquery first. There are two features to note here:

  • The subquery has a WHERE condition of category = t.category. The “t” is the correlation variable, and it’s defined in the outer or main query as a table alias for the entries table.

  • You’ll also notice the MAX keyword in the subquery’s SELECT clause. We haven’t covered aggregate functions yet, of which MAX is one, although we did see another one, COUNT, in Chapter 2. In this case, MAX simply returns the highest value in the named column—the latest created date.


Note:

AS is a versatile keyword. It allows you to create an alias for almost any database object you can reference in a SELECT statement. In the example above, it creates an alias for a table. It can also alias a column, a view, and a subquery.


In essence, what this query does can be paraphrased as: “return the category, title, and created date of all entries, but only if the created date for the entry being returned is the latest created date for all the entries in that particular category.” Or, in brief, return the most recent entry in each category. The correlation ensures that the particular category is taken into consideration to determine the latest date, which is then used to compare to the date on each entry, as shown in Figure 4-6.

Figure 4-6. How correlation works


In this example, a comparison is made between each entry’s created value, and the maximum created value of all rows in that category, as produced by the subquery. If that entry contains the same date for its category as found by the subquery, it’s returned in the result set. If it’s not the same date, it’s discarded.

Because this is a very simple example, only one category actually has more than one entry: humor. The subquery determines that “Hello Statue” has the most recently created date, and thus discards "Uncle Karl and the Gasoline."

If Figure 4-6 reminds you of Figure 3-11 (which demonstrated how an inner join worked), remember that the distinguishing characteristic of a correlated subquery is that it’s tied to an object in the outer or main query, and can’t be run on its own. Joins, on the other hand, are part of the main query.

Aside from that, the inner join and the correlated subquery are quite similar. In the join, the rows of the categories and entries tables were joined, based on the comparison of their category columns in the join condition. In the correlated subquery, the rows of the entries table are compared to the rows of the tabular result set produced by the correlated subquery, and if this somehow reminds you of a join, full marks. In fact, correlated subqueries can usually be rewritten as joins.

Here’s the equivalent query written using a join instead of a correlated subquery:

listing 4-8. CMS_13_Correlated_subquery.sql (excerpt)

SELECT
  t.category
, t.title
, t.created
FROM
  entries AS t
    INNER JOIN (
      SELECT
        category
      , MAX(created) AS maxdate
      FROM
        entries 
      GROUP BY
        category
    ) AS m
      ON m.category = t.category AND m.maxdate = t.created

The join version employs a subquery as a derived table, containing a GROUP BY clause. We’ll cover the GROUP BY clause in detail in Chapter 5, but for now, please just note that the purpose of the GROUP BY here is to produce one row per category. So the subquery produces a tabular result set consisting of one row per category, and each row will have that category’s latest date, which is given the column alias maxdate. Then the derived table, called m, is joined to the entries table, which uses the table alias t. Notice that there are two join conditions. You can see both of these conditions in the correlated subquery version, too—one inside the subquery (the category correlation), and the other in the WHERE clause (where maxdate in the subquery should equal the created date in the outer query).

  • Safari Books Online
  • Create BookmarkCreate Bookmark
  • Create Note or TagCreate Note or Tag
  • PrintPrint