Sets, Maps and Hash Tables
RHS – SOC 2
Sets
• We have learned that different data structures have different advantages – and drawbacks
• Choosing the proper data structure depends on typical usage patterns
• Array- and list-oriented data structures are appropriate when the order of elements matters – but that is not always the case
Sets
• A Set is a data structure which can hold an unordered collection of elements
• Not having to worry about ordering can improve performance of other operations
• On a Set, we want to be able to
  – Insert an element
  – Delete an element
  – Check if a given element is in the Set
Sets
public interface Set<T>
{
void add(T element);
void remove(T element);
boolean contains(T element);
Iterator<T> iterator();
}
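The three operations can be tried out directly with the Java library. The following is a minimal sketch, assuming the standard `java.util.Set` and `java.util.HashSet`; the class name `SetBasics` and the sample strings are made up for illustration. Note that the library versions of `add` and `remove` return a `boolean`, whereas the simplified interface above uses `void`.

```java
import java.util.HashSet;
import java.util.Set;

class SetBasics {
    // Builds a small set and exercises insert, delete and containment check.
    static Set<String> buildColours() {
        Set<String> colours = new HashSet<>();
        colours.add("red");
        colours.add("green");
        colours.add("red");        // already present, so the set is unchanged
        colours.remove("green");   // delete an element
        return colours;
    }

    // Containment check on the set built above.
    static boolean hasRed() {
        return buildColours().contains("red");
    }
}
```

After `buildColours()` the set holds a single element, because the second `add("red")` has no effect and `"green"` has been removed.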
Sets
• It turns out that insertion, deletion and check for containment can be done in O(log(n)), or even faster!
• Depends on the underlying implementation of the interface
• In Java, the implementation is typically either
  – HashSet (based on Hash Tables)
  – TreeSet (based on Trees)
Sets
• A Set iterator is ”simpler” than e.g. a List iterator
  – Elements will occur in ”random” order
  – No add method – we just call add on the Set itself
  – No previous method – does not make sense
• The Set iterator does however have a remove method (why?)
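One situation where the iterator's own removal method is useful is deleting elements while a traversal is in progress. The sketch below assumes the standard `java.util.Iterator`, where the method is named `remove`; the class name `IteratorRemoval` is hypothetical. Calling `remove` on the Set itself in the middle of such a loop is likely to fail with a `ConcurrentModificationException` for a `HashSet`.

```java
import java.util.Iterator;
import java.util.Set;

class IteratorRemoval {
    // Removes every even number from the set while iterating over it.
    static void removeEvens(Set<Integer> numbers) {
        Iterator<Integer> it = numbers.iterator();
        while (it.hasNext()) {
            int n = it.next();
            if (n % 2 == 0) {
                it.remove();   // removes the element last returned by next()
            }
        }
    }
}
```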
Sets – Quality tip
• When using a Set, we must choose a specific implementation (HashSet or TreeSet)
• However, the definition should look like:
Set<Car> cars = new HashSet<Car>();
Sets – Quality tip
Set<Car> cars = new HashSet<Car>();
• Why…? We should in general only refer to the interface, not the implementation
• Easy to switch implementation!
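A small sketch of why this matters: code written against the `Set` interface works unchanged with either implementation. The class and method names here (`SwitchImplementation`, `countDistinct`) are invented for the example.

```java
import java.util.Set;

class SwitchImplementation {
    // Depends only on the Set interface, so any implementation can be passed in.
    static int countDistinct(Set<String> target, String[] words) {
        for (String w : words) {
            target.add(w);
        }
        return target.size();
    }
}
```

Calling `countDistinct(new HashSet<>(), words)` and `countDistinct(new TreeSet<>(), words)` gives the same count; only the iteration order and the running time of the operations differ.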
Maps
• A Map is a data structure which stores associations between
  – A collection of keys
  – A collection of values
• Each key maps to exactly one value
• Keys are unique (values are not)
Maps
[Figure: keys K1, K2, K3 and K4, each mapped to one of the values V1, V2 and V3]
Map
public interface Map<K,V>
{
void put(K key,V value);
V get(K key);
void remove(K key);
Set<K> keySet();
}
Map
• The keySet method returns a Set containing all keys in the Map
• To get all values stored in the Map, iterate through this Set and call get for each key
Map
Map<String,Car> carMap = new HashMap<String,Car>();
...
Set<String> regNumbers = carMap.keySet();
for (String regNo : regNumbers)
{
Car aCar = carMap.get(regNo);
... // Do something with the Car object
}
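The same pattern in a self-contained form, as a sketch only: to avoid needing a Car class, the map here goes from registration number to mileage, and the registration numbers are made up. It also shows that keys are unique, since a second `put` with the same key replaces the earlier value. The real `java.util.Map` additionally offers `values()` and `entrySet()`, which are not part of the simplified interface above.

```java
import java.util.HashMap;
import java.util.Map;

class MapBasics {
    // Builds a map from (hypothetical) registration numbers to mileage.
    static Map<String, Integer> sample() {
        Map<String, Integer> mileageByRegNo = new HashMap<>();
        mileageByRegNo.put("AB12345", 1000);
        mileageByRegNo.put("CD67890", 2500);
        mileageByRegNo.put("AB12345", 1200);   // same key: replaces the value 1000
        return mileageByRegNo;
    }

    // Sums all values by iterating over the key set and calling get per key.
    static int totalMileage(Map<String, Integer> mileageByRegNo) {
        int total = 0;
        for (String regNo : mileageByRegNo.keySet()) {
            total += mileageByRegNo.get(regNo);
        }
        return total;
    }
}
```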
Exercises
• Review: R16.1, R16.4, R16.6
• Programming: P16.4, P16.12
Hash Tables
• A Set and a Map are both abstract data types – we need a concrete implementation in order to use them
• In the Java library, two commonly used implementations of each are available:
  – Sets: HashSet, TreeSet
  – Maps: HashMap, TreeMap
Hash Tables
• The implementations HashSet and HashMap are based on a Hash Table
• A Hash Table is based on the following ideas:
  – Create an array of length N, which can store objects of some type T
  – Find a mapping from T to the interval [0; N-1] (a Hash Function f)
  – Store an object t of type T in the position f(t)
Hash Tables
[Figure: an array with positions 0 to 4. f(Car1) = 3, f(Car2) = 0 and f(Car3) = 2, so Car2 is stored in position 0, Car3 in position 2 and Car1 in position 3]
Hash Tables
• A Hash Table is thus ”almost” an array
• Instead of having an index directly available, we must calculate it
• If calculation can be done in constant time, then all basic operations (insert, delete, lookup) can be done in constant time!
• Better than tree-based implementations, which have O(log(N))
Hash Tables
• However, there are some issues:
  – How do we define a good mapping from the objects to [0; N-1]?
  – What happens if we try to store two objects at the same position?
Hash Functions
• Before finding a good mapping – i.e. a good hash function – we must consider the size of the array
• For good performance, the array should be at least as large as the maximal number of objects stored
• Rule of thumb is about 30 % larger
• Size should be a prime number (???)
Hash Functions
• What if the expected number of objects is unknown in advance?
• We can expand a hash table dynamically
• If the hash table is running out of space, double the capacity (and re-insert the existing elements, since their positions depend on the array size)
• Start out with a reasonably large array (space is cheap…)
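Doubling the capacity could look roughly like the sketch below. It is an illustration under stated assumptions, not how `HashSet` is implemented: the names `GrowTable`, `emptyTable`, `position` and `doubled` are hypothetical, elements are Strings, and each position holds a small list so that elements sharing a position are not lost (the slides return to this under "Handling collisions"). The key point is that every element must be placed again, because its position depends on the capacity.

```java
import java.util.ArrayList;
import java.util.List;

class GrowTable {
    // Creates a table with the given number of empty positions.
    static List<List<String>> emptyTable(int capacity) {
        List<List<String>> table = new ArrayList<>();
        for (int i = 0; i < capacity; i++) {
            table.add(new ArrayList<>());
        }
        return table;
    }

    // Position of s in a table of the given capacity, always in [0, capacity-1].
    static int position(String s, int capacity) {
        return Math.floorMod(s.hashCode(), capacity);
    }

    // Returns a table with twice the capacity, with every element re-inserted.
    static List<List<String>> doubled(List<List<String>> old) {
        List<List<String>> bigger = emptyTable(old.size() * 2);
        for (List<String> slot : old) {
            for (String s : slot) {
                bigger.get(position(s, bigger.size())).add(s);
            }
        }
        return bigger;
    }
}
```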
Hash Functions
• Having handled the choice of N, how do we define a proper hash function?
• Properties of a hash function:
  – Must map all objects of type T to the interval [0; N-1]
  – Should map objects as uniformly as possible to the interval [0; N-1]
Hash Functions
• We can enforce the mapping to [0;N-1] by using the modulo operator:
f(t) = g(t) % N
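• g(t) can then produce any non-negative integer value (in Java, % yields a negative result when g(t) is negative, so negative values need extra care)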
• How do we achieve a uniform distribution?
• Theory for this is complicated, but there are some general rules to follow
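One detail worth checking in Java: the `%` operator keeps the sign of its left operand, so a negative g(t) would give a negative position. A minimal sketch of one way around this uses the standard `Math.floorMod`; the class and method names (`Positions`, `position`) are made up for the example.

```java
class Positions {
    // Maps any int value g to a position in the interval [0, n-1].
    static int position(int g, int n) {
        return Math.floorMod(g, n);   // unlike g % n, never negative for n > 0
    }
}
```

For example, `-7 % 5` evaluates to `-2`, while `Positions.position(-7, 5)` evaluates to `3`.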
Hash Functions
• A good hash function should be ”almost random”, but deterministic
  – ”Almost random” – values are well distributed in the interval
  – Deterministic – always produce the same output for the same input
Hash Functions
• In Java, all objects have a hashCode method
  – Defined in Object class
  – Can be overridden
  – Returns an integer (the Hash Code)
  – We must use modulo on the value ourselves
Hash Functions
• Hash function for integers:
  – The number itself…
• Hash function for strings:
final int HASH_MULTIPLIER = 31;
int h = 0;
for (int i = 0; i < s.length(); i++)
    h = (HASH_MULTIPLIER * h) + s.charAt(i);
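Wrapped in a method, the string algorithm can be compared with the library. The sketch below uses a hypothetical class `StringHash`; with the multiplier 31 it computes the same polynomial that the Java documentation gives for `String.hashCode()`, so the two should agree for any string.

```java
class StringHash {
    static final int HASH_MULTIPLIER = 31;

    // Computes the hash code of s one character at a time.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (HASH_MULTIPLIER * h) + s.charAt(i);
        }
        return h;
    }
}
```

As a worked case, `hash("ab")` is 31 · 97 + 98 = 3105, since 'a' has character code 97 and 'b' has 98.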
Hash Functions
• Hash code for an object can be calculated by combining hash codes for instance fields
• Combine values in a way similar to the algorithm used to find string hash codes
Hash Functions
public int hashCode()
{
final int MULTIPLIER = 31;
int h1 = regNo.hashCode();
int h2 = mileage;
int h3 = model.hashCode();
int h = h1*MULTIPLIER + h2;
h = h*MULTIPLIER + h3;
return h;
}
Hash Functions
• But wait…what about numeric overflow?
• We multiply a ”random” integer value by a number…?
• Does not really matter…
• As long as the algorithm is deterministic, overflow is not a problem
• Just helps ”scrambling” the value
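A quick way to convince yourself: Java `int` arithmetic wraps around silently instead of failing, so the same input always goes through the same sequence of wrapped values. The sketch below (hypothetical class `OverflowDemo`) hashes an input long enough to overflow many times.

```java
class OverflowDemo {
    // Same loop as the string hash; int overflow simply wraps around.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Builds a string of 1000 'z' characters, far more than an int can
    // accumulate without overflowing.
    static String longInput() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append('z');
        }
        return sb.toString();
    }
}
```

Calling `hash(longInput())` twice gives the same value both times. The result may well be negative, which is one more reason to take care when reducing it to an array position.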
Hash Functions
• Common pitfalls:
  – Remember to define a hashCode method
  – If you forget, the hashCode implementation in Object is used
  – That implementation is based on the identity of the object (typically derived from its memory location), not on its instance fields
  – Two objects with the same value of instance fields will then, in general, produce different hash codes…
Hash Functions
• Common pitfalls:
  – The hashCode method must be ”compatible” with your equals method
  – If a.equals(b), it must hold that a.hashCode() == b.hashCode()
  – If not, a Set may end up containing duplicates!
  – The reverse condition is not required; two different objects may have the same hash code
Hash Functions
• In general, you must remember to:
  – Either define both the hashCode and the equals method
  – Or not define either of them!
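The difference shows up as soon as objects are put into a HashSet. The sketch below is illustrative only: `Car` reuses the field names from the hashCode example earlier (regNo, mileage, model) with an assumed constructor, and `PlainCar` is a hypothetical class that overrides neither method.

```java
class Car {
    private final String regNo;
    private final int mileage;
    private final String model;

    Car(String regNo, int mileage, String model) {
        this.regNo = regNo;
        this.mileage = mileage;
        this.model = model;
    }

    // Two cars are equal when all three fields are equal.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Car)) return false;
        Car c = (Car) other;
        return mileage == c.mileage
            && regNo.equals(c.regNo)
            && model.equals(c.model);
    }

    // Combines exactly the fields that equals compares.
    @Override
    public int hashCode() {
        final int MULTIPLIER = 31;
        int h = regNo.hashCode() * MULTIPLIER + mileage;
        return h * MULTIPLIER + model.hashCode();
    }
}

// Overrides neither equals nor hashCode, so Object's versions are used.
class PlainCar {
    final String regNo;

    PlainCar(String regNo) {
        this.regNo = regNo;
    }
}
```

Adding two `Car` objects with identical fields to a `HashSet<Car>` leaves one element in the set. Adding two `PlainCar` objects with the same regNo leaves two, because Object's equals only considers an object equal to itself.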
Handling collisions
• Even with a good hash function, we will still experience collisions
• Collision: two different objects t1 and t2 have the same hash code
• We will then try to store both objects in the same position in the array
• Now what…?
Handling collisions
• What we store in each position in the array is not the objects themselves, but a linked list of objects
• Objects that map to the same position h are stored in the linked list in position h
• With a good hash function and a sufficiently large array, the average length of non-empty lists is less than 2
Handling collisions
[Figure: an array with positions 0 to 4, each holding a linked list; the objects Car1 to Car6 are spread over these lists]
Handling collisions
• Basic operations (insert, delete, lookup) follow this structure:
  – Calculate hash code for the object
  – Find the corresponding position in the array
    • Insert: Insert element at the end of list
    • Delete/Lookup: Iterate through list until element is found, or end of list is reached
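The structure above can be sketched as a small set class. This is a teaching sketch with a fixed capacity and hypothetical names (`ChainedHashSet`, `listFor`), not the implementation used by `java.util.HashSet`. Since a Set holds no duplicates, `add` first looks through the list and only appends when the element is absent.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashSet<T> {
    // One linked list per array position.
    private final List<LinkedList<T>> table = new ArrayList<>();
    private int size = 0;

    ChainedHashSet(int capacity) {
        for (int i = 0; i < capacity; i++) {
            table.add(new LinkedList<>());
        }
    }

    // Calculates the hash code, then finds the corresponding position.
    private LinkedList<T> listFor(T element) {
        int position = Math.floorMod(element.hashCode(), table.size());
        return table.get(position);
    }

    // Lookup: walks the list until the element is found or the list ends.
    boolean contains(T element) {
        return listFor(element).contains(element);
    }

    // Insert: appends at the end of the list unless already present.
    void add(T element) {
        LinkedList<T> list = listFor(element);
        if (!list.contains(element)) {
            list.addLast(element);
            size++;
        }
    }

    // Delete: removes the first matching element from its list, if any.
    void remove(T element) {
        if (listFor(element).remove(element)) {
            size--;
        }
    }

    int size() {
        return size;
    }
}
```

With a capacity of 5, the integers 1, 6 and 11 all land in position 1 and share one list, yet each can still be found and removed individually.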
Handling collisions
• Basic operations are thus not done in truly constant time
• However, if a proper hash function is used, running time is constant in practice
• Use hash-based implementations unless special circumstances apply
  – Hard to define hash/equals function
  – More functionality required
Exercises
• Review: R16.8, R16.10
• Programming: P16.6