Free Trial

Safari Books Online is a digital library providing on-demand subscription access to thousands of learning resources.


  • Create BookmarkCreate Bookmark
  • Create Note or TagCreate Note or Tag
  • DownloadDownload
  • PrintPrint
Share this Page URL
Help

Chapter 3. Associative Containers > Item 25: Familiarize yourself with the nons...

Item 25: Familiarize yourself with the nonstandard hashed containers

It generally doesn’t take much time for STL programmers to begin to wonder, “Vectors, lists, maps, sure, but where are the hash tables?” Alas, there aren’t any hash tables in the standard C++ library. Everyone agrees that this is unfortunate, but the Standardization Committee felt that the work needed to add them would have unduly delayed completion of the Standard. It’s a foregone conclusion that the next version of the Standard will include hash tables, but for the time being, the STL doesn’t do hashing.

If you like hash tables, however, take heart. You need not do without or roll your own. STL-compatible hashed associative containers are available from multiple sources, and they even have de facto standard names: hash_set, hash_multiset, hash_map, and hash_multimap.

Behind these common names, different implementations, er, differ. They differ in their interfaces, their capabilities, their underlying data structures, and the relative efficiency of the operations they support. It’s still possible to write reasonably portable code using hash tables, but it’s not as easy as it would be had the hashed containers been standardized. (Now you know why standards are important.)

Of the several available implementations for hashed containers, the two most common are from SGI (see Item 50) and Dinkumware (see Appendix B), so in what follows, I restrict myself to the designs of the hashed containers from these vendors. STLport (again, see Item 50) also offers hashed containers, but the STLport hashed containers are based on those from SGI. For purposes of this Item, assume that whatever I write about the SGI hashed containers applies to the STLport hashed containers, too.

Hashed containers are associative containers, so it should not surprise you that, like all associative containers, they need to know the type of objects stored in the container, the comparison function for such objects, and the allocator for these objects. In addition, hashed containers require specification of a hashing function. That suggests the following declaration for hashed containers:

template<typename T,
         typename HashFunction,
         typename CompareFunction,
         typename Allocator = allocator<T> >
class hash_container;

This is quite close to the SGI declaration for hashed containers, the primary difference being that SGI provides default types for HashFunction and CompareFunction. The SGI declaration for hash_set looks essentially like this (I’ve tweaked it a bit for presentation purposes):

template<typename T,
         typename HashFunction = hash<T>,
         typename CompareFunction = equal_to<T>,
         typename Allocator = allocator<T> >
class hash_set;

A noteworthy aspect of the SGI design is the use of equal_to as the default comparison function. This is a departure from the conventions of the standard associative containers, where the default comparison function is less. This design decision signifies more than a simple change in default comparison functions. SGI’s hashed containers determine whether two objects in a hashed container have the same value by testing for equality, not equivalence (see Item 19). For hashed containers, this is not an unreasonable decision, because hashed associative containers, unlike their standard (typically tree-based) counterparts, are not kept in sorted order.

The Dinkumware design for hashed containers adopts some different strategies. It still allows you to specify object types, hash function types, comparison function types, and allocator types, but it moves the default hash and comparison functions into a separate traits-like class called hash_compare, and it makes hash_compare the default argument for the HashingInfo parameter of the container templates. (If you’re unfamiliar with the notion of a “traits” class, open a good STL reference like Josuttis’ The C++ Standard Library [3] and study the motivation and implementation of the char_traits and iterator_traits templates.)

For example, here’s the Dinkumware hash_set declaration (again, tweaked for presentation):

template<typename T, typename CompareFunction>
class hash_compare;

template<typename T,
         typename HashingInfo = hash_compare<T, less<T> >,
         typename Allocator = allocator<T> >
class hash_set;

The interesting part of this interface design is the use of HashingInfo. The container’s hashing and comparison functions are stored there, but the HashingInfo type also holds enums controlling the minimum number of buckets in the table as well as the maximum allowable ratio of container elements to buckets. When this ratio is exceeded, the number of buckets in the table is increased, and some elements in the table are rehashed. (SGI provides member functions that afford similar control over the number of table buckets and, hence, the ratio of table elements to buckets.)

After some tweaks for presentation, hash_compare (the default value for HashingInfo) looks more or less like this:

template<typename T, typename CompareFunction = less<T> >
class hash_compare {
public:

  enum {
    bucket_size = 4,                     // max ratio of elements to buckets
    min_buckets = 8                      // minimum number of buckets

  };
  size_t operator()(const T&) const;     // hash function

  bool operator()(const T&,              // comparison function
                  const T&) const;

  ...                                    // a few things omitted, including
};                                       // the use of CompareFunction

					  

The overloading of operator() (in this case, to implement both the hashing and comparison functions) is a strategy that crops up more frequently than you might imagine. For another application of the same idea, take a look at Item 23.

The Dinkumware design allows you to write your own hash_compare-like class (possibly by deriving from hash_compare itself), and as long as your class provides definitions for bucket_size, min_buckets, two operator() functions (one taking one argument, one taking two), plus a few things I’ve left out, you can use it to control the configuration and behavior of a Dinkumware hash_set or hash_multiset. Configuration control for hash_map and hash_multimap is similar.

Notice that with both the SGI and the Dinkumware designs, you can leave all the decision-making to the implementations and simply write something like this:

hash_set<int> intTable;          // create a hashed set of ints

For this to compile, the hash table must hold an integral type (such as int), because the default hashing functions are generally limited to integral types. (SGI’s default hashing function is slightly more flexible. Item 50 will tell you where to find all the details.)

Under the hood, the SGI and Dinkumware implementations go their very separate ways. SGI employs a conventional open hashing scheme composed of an array (the buckets) of pointers to singly linked lists of elements. Dinkumware also employs open hashing, but its design is based on a novel data structure consisting of an array of iterators (essentially the buckets) into a doubly linked list of elements, where adjacent pairs of iterators identify the extent of the elements in each bucket. (For details, consult Plauger’s aptly titled column, “Hash Tables” [16].)

As a user of these implementations, it’s likely you’ll be interested in the fact that the SGI implementation stores the table elements in singly linked lists, while the Dinkumware implementation uses a doubly linked list. The difference is noteworthy, because it affects the iterator categories for the two implementations. SGI’s hashed containers offer forward iterators, so you forgo the ability to perform reverse iterations; there are no rbegin or rend member functions in SGI’s hashed containers. The iterators for Dinkumware’s hashed containers are bidirectional, so they offer both forward and reverse traversals. In terms of memory usage, the SGI design is a bit more parsimonious than that from Dinkumware.

Which design is best for you and your applications? I can’t possibly know. Only you can determine that, and this Item hasn’t tried to give you enough information to come to a reasonable conclusion. Instead, the goal of this Item is to make sure you know that though the STL itself lacks hashed containers, STL-compatible hashed containers (with varying interfaces, capabilities, and behavioral trade-offs) are not difficult to come by. In the case of the SGI and STLport implementations, you can even come by them for free, because they’re available for free download.