Sets and Dictionaries
Sets
A set is an unordered collection of (unique) items. Examples:

Operations
- s.add(e)
- add element e to the set s
- s.remove(e)
- remove element e from the set s
- s.contains(e)
- returns whether e is a member of s
- s.size()
- returns the number of elements in s
- s.isEmpty()
- returns whether there are no elements in s
- s.addAll(t)
- adds all the elements of set t to set s
- s.removeAll(t)
- removes all the elements of t from s
- s.retainAll(t)
- removes all the elements from s that are not in t
- s.union(t)
- returns a new set which contains all the elements of
set s and all the elements of set t, and no others
- s.intersect(t)
- returns a new set which contains all and only those
elements in both s and t
- s.minus(t)
- returns a new set which contains all and only those
elements in s but not in t
- s.symmetricMinus(t)
- returns a new set which contains all and only those
elements in either set s or set t but not both
Dictionaries
A dictionary (also called a map, associative array, hash,
or lookup table) is a collection of key-value pairs where every
key is unique. We don't care as much about the pairs as we do
about looking up the value associated with a given key.
Example:
| www.nato.int | 152.152.96.1 |
| cr.yp.to | 131.193.178.175 |
| losangeles.citysearch.com | 209.104.35.15 |
| www.cl.cam.ac.uk | 128.232.0.20 |
| mit.edu | 18.7.22.69 |
| www.altan.ie | 212.108.64.74 |
| securitysearch.net | 64.94.136.5 |
| linuxassembly.org | 64.202.189.170 |
| regular-expressions.info | 66.39.67.31 |
| jpl.nasa.gov | 137.78.160.180 |
| groups.google.com | 216.239.57.147 |
| script.aculo.us | 86.59.5.67 |
Say: www.nato.int maps to 152.152.96.1, or
www.nato.int is bound to 152.152.96.1.
Note that a set is just a kind of dictionary in which the
value part is ignored.
Operations
- m.put(k,v)
- adds new key value pair to map m
- m.get(k)
- returns the value associated with key k in map m
- m.containsKey(k)
- returns whether there's a key-value pair in map m with key k
- m.containsValue(v)
- returns whether there's a key-value pair in map m with value v
- m.putAll(otherMap)
- adds all the key-value pairs of otherMap to m
- m.keySet()
- returns the set of keys of m
- m.values()
- returns the values of m in some collection (not a set, since they
don't have to be unique)
- m.entrySet()
- returns the set of key-value pairs in m
- m.remove(k)
- removes the key-value pair with key k from map m
- m.clear()
- removes all the key-value pairs from m
- m.size()
- returns the number of key-value pairs in m
- m.isEmpty()
- returns whether there are zero pairs in map m
Representations of Sets and Dictionaries
There are dozens of ways to implement these.
- Association Lists
- Unsorted association lists
- Sorted association lists
- Search Trees
- General Search Trees
- (a,b) Trees
- B Trees and B+ Trees
- Binary Search Trees
- Unbalanced Binary Search Trees
- Self-Balancing Binary Search Trees
- AVL Trees
- Scapegoat Trees
- Red-Black Trees
- AA Trees
- Splay Trees
- Skip Lists
- Tries and Radix (Patricia) Trees
- Hash Tables
- Bitsets
- Judy Arrays
- Bloom Filters
No one representation is best for all situations. You need to
take into account:
- Will the collection be loaded once and only read and
never written?
- Will insertions and deletions be frequent or uncommon,
compared to lookups?
- Will the collection contain only a very small number
of elements or can it be huge?
- Are there any restrictions on the keys? For example are
keys only strings? Or only integers?
- Will the keys come from an type that can be ordered?
- Is the collection subject to algorithmic complexity
attacks?
But remember
If your keys are integers in the range 0..N,
use a PLAIN OLD ARRAY
As you go over each data structure, make a little fact sheet
such as the following:
| Unsorted Associative Array or List |
Basic idea: Key-value pairs are stored in a simple
unsorted array or linked list
Insertion: Θ(n) if you check for duplicate keys, Θ(1) otherwise
Deletion: Θ(n) since you have to find the item first
Lookup: Θ(n) since the whole list must be scanned
Key type restrictions: None at all!
Good for: Small collections only |
| Sorted Associative Array |
Basic idea: Key-value pairs are stored in an array
sorted by key
Insertion: Θ(n) in general due to sifting; Θ(1) if adding at the end
Deletion: Θ(n) in general due to sifting; Θ(1) if from the end
Lookup: Θ(log n) using binary search
Key type restrictions: Must be ordered
Good for: Memory-constrained environments; read-only collections
or those filled only in ascending order |