In 2004, Ian Hickson, the editor of the HTML5 spec, mined one billion web pages via the Google index, looking to see what the “real” Web is made of. One of the analyses he subsequently published (http://code.google.com/webstats/2005-12/classes.html) was a list of the most popular class names in those HTML documents.
More recently, in 2009, the Opera MAMA crawler looked again at class attributes, this time in 2,148,723 randomly chosen URLs, and also at the ids given to elements (which the Google dataset didn’t include) in 1,806,424 URLs. See Table 1.1 and Table 1.2.
As you can see, once we remove obviously presentational classes, we’re left with a good idea of the structures that authors are trying to use on their pages.
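To make the idea concrete, here is a minimal sketch of this kind of tally in Python, using only the standard library. It is not the method used by either survey: the function names, the sample documents, and the short list of "presentational" class names are all assumptions made up for illustration, and it counts every occurrence of a name rather than the number of pages that use it.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassAndIdCounter(HTMLParser):
    """Tally class names and id values found on start tags."""

    def __init__(self):
        super().__init__()
        self.classes = Counter()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue  # attribute present but empty, e.g. <div class>
            if name == "class":
                # A class attribute is a space-separated list of names.
                self.classes.update(value.split())
            elif name == "id":
                self.ids[value.strip()] += 1


def tally(documents):
    """Return (class_counts, id_counts) summed over a list of HTML strings."""
    classes, ids = Counter(), Counter()
    for html in documents:
        parser = ClassAndIdCounter()  # fresh parser state per document
        parser.feed(html)
        parser.close()
        classes.update(parser.classes)
        ids.update(parser.ids)
    return classes, ids


def structural_only(counts, presentational):
    """Drop names that the caller considers purely presentational."""
    return Counter({k: v for k, v in counts.items() if k not in presentational})


# Hypothetical sample data and filter list, for illustration only.
SAMPLE_DOCS = [
    '<div id="header" class="nav left"></div><p class="left">x</p>',
    '<div id="footer" class="footer"></div>',
]
PRESENTATIONAL = {"left", "right", "small", "bold"}

if __name__ == "__main__":
    class_counts, id_counts = tally(SAMPLE_DOCS)
    print(structural_only(class_counts, PRESENTATIONAL).most_common())
    print(id_counts.most_common())
```

Deciding which names count as presentational is a judgment call, which is why the filter list is passed in rather than built into the counter.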