Hashtables

A hashtable makes a good representation for a set or dictionary. If set up well, you get Θ(1) performance for add, remove, and lookup!

The Basic Idea

You need to define a hash function, h, that maps your keys into integers in the range 0..N-1. Each key k then goes into bucket h(k).

Example

Let

Then if we put in the values {"hashtables", "will", "usually", "execute", "constant", "time"} the slots get computed like this:

h("hashtables") = ('h' * 's') % 11 = 3
h("will") = ('w' * 'l') % 11 = 4
h("usually") = ('u' * 'y') % 11 = 0
h("execute") = ('e' * 'e') % 11 = 4
h("constant") = ('c' * 't') % 11 = 0
h("time") = ('t' * 'e') % 11 = 1

Note that there are collisions, which need to be resolved. There are many collision resolution strategies; one of the most common is just to "chain" the elements in a linked list. If we did that our hashtable would look like this:

hashtableexample.png

Criteria for Good Hashtables

If every element hashed to the same slot, then the hashtable would have linear, not constant performance. So you need to be smart about

Choosing a Good Hash Function

This is hard in general, and is application dependent. It's also beyond the scope of these notes. Start with Wikipedia's Hash table article and then do your own research.

Table Resizing

Hashtables perform pretty well until they get about 70% full (i.e. a load factor of 0.7). Then you'll probably want to make the table bigger and rehash everything. This takes time, but is worth it when you do more lookups then inserts.

Ditto for deletions, in case you want to shrink the hashtable if it has been taking up too much memory...

Collision Resolution

Many strategies are known

Details in Wikipedia's Hash table article.

Hashtables vs. Other Dictionary Implementations

Hashtables are sometimes good because

Hashtables aren't always the best choice because

More

Please read Wikipedia's Hash table article.