Commit 59cdb50

Merge pull request #2 from learn-co-curriculum/hash-table
Updated Hash Table readme
2 parents 16476a7 + 65d13fc commit 59cdb50

1 file changed

+100
-38
lines changed
  • 07-week-6--foundational-data-structures/02-day-4--underneath-hashes

@@ -1,28 +1,44 @@
1-
# Hash Table
1+
# Day 4: Underneath Hashes
22

3-
## Objectives
3+
## Learning Goals
44

5-
- Learn the components of a hash table.
6-
- Learn about collisions and how to resolve them.
7-
- Learn the role of a hash function and the attributes of a good hash function.
5+
- Explain how programming languages implement hashes
6+
- Identify the runtime complexity of common hash methods in Big O notation
87

9-
## Hash Tables
8+
## Introduction
109

11-
![](https://s3.amazonaws.com/learn-verified/reintroduce-415x400.png)
10+
Now it's time to formally introduce you to the hash. When we talk about hashes
11+
in this lesson, we're referring to the general data structure known as a `Hash`
12+
in Ruby, as an `Object` in JavaScript, a `dict` (dictionary) in Python, and so on.
13+
Just about every language has an implementation of this data structure!
1214

13-
Now it's time to formally introduce you to the hash. A hash table is where information related to a key is assigned to a specific index.
15+
Hashes are used for storing key-value pairs. This allows for quick retrieval of
16+
data: the Big O for accessing a value in a hash is constant time, O(1). But how
17+
does it work under the hood?
1418

15-
For a hash to work, we use a **hash function** to determine where exactly to store a information related to that key. Later, use the same hash function to determine where to search for a given key.
19+
## Hash Functions
1620

17-
## A library as an analogy
21+
For a hash to work, we use a **hash function** to determine where in memory to
22+
store information related to a given key. Later, we use the same hash function to
23+
determine where to search for a given key.
1824

19-
One way to think about how hashes relate to hash functions is thinking about how we find a book in a library. We do this by telling a librarian the title and author of a book, and the librarian tells us precisely where to find the book.
25+
One way to understand how hashes relate to hash functions is to think about how
26+
we find a book in a library. We do this by telling a librarian the title and
27+
author of a book, and the librarian tells us precisely where to find the book.
2028

21-
![](https://s3-us-west-2.amazonaws.com/curriculum-content/algorithms/dewey-decimal-arrangement.jpg)
29+
![library book example](https://s3-us-west-2.amazonaws.com/curriculum-content/algorithms/dewey-decimal-arrangement.jpg)
2230

23-
So here our key is the title and author of the book, which then responds with a card catalogue id. The cart catalogue id (which comes from the Dewey Decimal System above) tells us exactly where to find the book. If the book is there, we have our book and all of the information inside. If nothing is there, there is no book.
31+
In this analogy, our **key** is the title and author of the book, which we can
32+
use to determine the appropriate card catalog id. The card catalog id (which
33+
comes from the Dewey Decimal System above — think of that as our **hash
34+
function**) tells us exactly where to find the book. If the book is there, we
35+
have our book and all of the information inside. If nothing is there, there is
36+
no book.
2437

25-
So let's start with inserting some books. We have the following books: _The Bible_, _Alexander Hamilton_, _Introduction to Physics_, and _War and Peace_. Based on our hash function, we store the books in the following locations:
38+
Let's start by inserting some books into a hash table structure. We have the
39+
following books: _The Bible_, _Alexander Hamilton_, _Introduction to Physics_,
40+
and _War and Peace_. Based on our hash function, we store the books in the
41+
following locations:
2642

2743
| Index | Book |
2844
| ----- | :-----------------------: |
@@ -37,21 +53,33 @@ So let's start with inserting some books. We have the following books: _The Bibl
3753
| 800 | _War and Peace_ |
3854
| 900 | _Alexander Hamilton_ |
3955

40-
You will see that while the Dewey Decimal System assigns us one of a range of numbers, we adapt its formula to store each book at the lowest number possible for each section. So based on that, The Bible is assigned 200, because it falls under religion. Accordingly, we also assign Introduction to Physics number 500, War and Peace 800 and Alexander Hamilton 900.
56+
You will see that while the Dewey Decimal System assigns us one of a range of
57+
numbers, we adapt its formula to store each book at the lowest number possible
58+
for each section. So based on that, The Bible is assigned 200, because it falls
59+
under religion. Accordingly, we also assign Introduction to Physics number 500,
60+
War and Peace 800 and Alexander Hamilton 900.
4161

42-
Because we assigned each of our books according to this formula, when we retrieve a book, we do not need to look through every index to find our books, instead we just look at the place of the book based on the Dewey Decimal System.
62+
Because we assigned each of our books according to this formula, when we
63+
retrieve a book, we do not need to look through every index to find our books.
64+
Instead, we just look at the place of the book based on the Dewey Decimal
65+
System.
4366

44-
![](https://s3.amazonaws.com/learn-verified/geroge-peabody-library-horizontal-large-gallery.jpg)
67+
![A massive library](https://s3.amazonaws.com/learn-verified/geroge-peabody-library-horizontal-large-gallery.jpg)
4568

46-
> A massive library
69+
We can _also_ use our formula to decide where to insert a book and to check
70+
whether a book exists in our collection. If someone asks us if _Eloquent
71+
JavaScript_ is in our hash table, we simply visit our index at location 600, see
72+
that nothing is there, and can confidently reply that the book is not in our
73+
collection.
4774

48-
So we use our formula to tell us both where to insert a book.
75+
With a hash table, we take the data in our key and run it through our hash
76+
function to determine where to place the element and associated data. Later, we
77+
run the same key through our hash function again, which tells us
78+
where to retrieve this data. With this process, we achieve our goal of **O(1)**
79+
(constant time) for inserting and retrieving elements, irrespective of the
80+
number of elements in our collection.
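
The insert-then-retrieve process described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular language implements its hashes: the names `TABLE_SIZE`, `toy_hash`, `insert`, and `lookup` are hypothetical, the hash function is deliberately simplistic, and collisions are not handled yet (a colliding insert simply overwrites the slot).

```python
# Hypothetical sketch: a fixed-size table with no collision handling.
TABLE_SIZE = 16

table = [None] * TABLE_SIZE


def toy_hash(key: str) -> int:
    """Toy hash function: add up the character codes, then wrap to a slot."""
    return sum(ord(char) for char in key) % TABLE_SIZE


def insert(key: str, value) -> None:
    """Store the pair in the slot chosen by the hash function."""
    # Assumption: no collision handling, so an existing entry is overwritten.
    table[toy_hash(key)] = (key, value)


def lookup(key: str):
    """Recompute the slot from the key and look only there."""
    entry = table[toy_hash(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None  # nothing stored under this key


insert("War and Peace", 800)
print(lookup("War and Peace"))
print(lookup("Eloquent JavaScript"))
```

Neither `insert` nor `lookup` loops over the table: each computes one slot number and goes straight to it, which is why the work does not grow with the number of stored elements.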
4981

50-
And we also use our formula to know if a book exists in our collection. If someone asks us if _Eloquent Javascript_ is in our hash table, we simply visit our index at location 600, see that nothing is there, and can confidently reply that the book is not located there. Because our formula tells us where to retrieve a book we are able to retrieve and insert an element in constant time.
51-
52-
So with a hash table, we look at the data in our key, run it through our hash function to determine where to place the element and associated data. Later, we also use the information in the key, run it through our hash function to tell us where to retrieve this data. With this process we achieve our goal of constant time for inserting and retrieving elements irrespective of the number of elements in our collection.
53-
54-
### The Problem: Collision
82+
### Hash Table Collisions
5583

5684
Our hash table currently looks like the following:
5785

@@ -68,9 +96,17 @@ Our hash table currently looks like the following:
6896
| 800 | _War and Peace_ |
6997
| 900 | _Alexander Hamilton_ |
7098

71-
Now what happens if we need to store another book, this time _Introduction to Biology_. Well, our adapted Dewey Decimal System tells us to store the key at precisely index 500. The only problem is that the slot is already filled. We have just encountered a **collision**. A collision is where our hash function outputs an index that already is assigned to another key in our hash table.
99+
What happens if we need to store another book, this time _Introduction to
100+
Biology_? Well, our adapted Dewey Decimal System tells us to store the key at
101+
precisely index 500. The only problem is that the slot is already filled. We
102+
have just encountered a **collision**. A collision is where our hash function
103+
outputs an index that is already assigned to another key in our hash table.
72104

73-
To handle our collision we apply a technique called _separate chaining_. With separate chaining, each index points to a linked list. So in our example above we could place both _Introduction to Physics_ and _Introduction to Biology_ in the place linked list is located at index 500. Applying the separate chaining technique, our hash table looks like the following:
105+
To handle our collision, we apply a technique called _separate chaining_. With
106+
separate chaining, each index points to a linked list. So in our example above
107+
we could place both _Introduction to Physics_ and _Introduction to Biology_ in
108+
the linked list located at index 500. Applying the separate chaining
109+
technique, our hash table looks like the following:
74110

75111
| Index | Book |
76112
| ----- | :----------------------------------------------------------: |
@@ -85,20 +121,46 @@ To handle our collision we apply a technique called _separate chaining_. With se
85121
| 800 | [ "*War and Peace*" ] |
86122
| 900 | [ "*Alexander Hamilton*" ] |
87123

88-
Note that in the worse case scenario, all of our inserted elements collide and we have to traverse a linked list of length n to retrieve an element, so we have O(n). However, on average collisions do not occur, so we retrieve constant time for lookup, insertion and deletion _on average_.
124+
In the worst-case scenario, all of our inserted elements collide, and we have to
125+
traverse a linked list of length `n` to retrieve an element, so we have **O(n)**
126+
runtime. However, with a good hash function collisions are rare, so we achieve constant
127+
time for lookup, insertion and deletion _on average_.
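
Separate chaining can be sketched as follows. This is an illustrative version under a few assumptions: the class name `ChainedHashTable` and its methods are hypothetical, Python's built-in `hash()` stands in for the hash function, and a plain Python list stands in for each linked list.

```python
class ChainedHashTable:
    """Hypothetical sketch of a hash table that resolves collisions by chaining."""

    def __init__(self, size: int = 8):
        # One bucket per index; a Python list stands in for the linked list.
        self.buckets = [[] for _ in range(size)]

    def _index(self, key) -> int:
        # Assumption: the built-in hash() plays the role of the hash function.
        return hash(key) % len(self.buckets)

    def set(self, key, value) -> None:
        bucket = self.buckets[self._index(key)]
        for position, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[position] = (key, value)  # key already present: update it
                return
        bucket.append((key, value))  # new key: attach it to the chain

    def get(self, key, default=None):
        # Only the one bucket chosen by the hash function is traversed.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


# With a single bucket every key collides, which forces the worst case:
# all entries end up in one chain of length n.
shelf = ChainedHashTable(size=1)
shelf.set("Introduction to Physics", 500)
shelf.set("Introduction to Biology", 500)
print(shelf.get("Introduction to Biology"))
print(len(shelf.buckets[0]))
```

Setting `size=1` is only a way to make the collision deterministic for the demonstration. With more buckets and a hash function that spreads keys evenly, each chain stays short and lookups stay close to constant time.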
89128

90-
## Choosing a good hash function
129+
### Identifying Good Hash Functions
91130

92-
Going forward, we should choose a hash function that minimizes the chance of a collision occurring. Some properties of a good hash function.
131+
Programming languages that implement hashes use a hash function that minimizes
132+
the chance of a collision occurring. Some properties of a good hash function are:
93133

94-
1. Makes use of all information provided by a given key to maximize the number of possible hash values. Note that the real Dewey Decimal System does a better job at this: different titles by different authors map to different values.
134+
1. Makes use of all information provided by a given key to maximize the number
135+
of possible hash values. Note that the real Dewey Decimal System does a
136+
better job at this: different titles by different authors map to different
137+
values.
95138
2. Maps similar keys to very different values, making collisions much less likely.
96-
3. Also hash function called frequently so should employ simple and quick introductions.
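
To make the first two properties concrete, here is a small comparison of two toy hash functions. Both are hypothetical illustrations (the function names and the multiplier 31 are choices made for this sketch, not taken from any specific language's implementation).

```python
def first_letter_hash(key: str) -> int:
    """Weak: ignores everything after the first character."""
    return ord(key[0])


def all_characters_hash(key: str) -> int:
    """Stronger: every character, and its position, affects the result."""
    total = 0
    for char in key:
        total = total * 31 + ord(char)  # 31 is an arbitrary illustrative multiplier
    return total


physics = "Introduction to Physics"
biology = "Introduction to Biology"

# Same first letter, so the weak function is guaranteed to collide.
print(first_letter_hash(physics) == first_letter_hash(biology))      # True

# The titles differ in later characters, so the raw values differ.
print(all_characters_hash(physics) == all_characters_hash(biology))  # False
```

The raw values still have to be reduced to a slot number (for example with `% table_size`), and two different raw values can land in the same slot, so a better hash function makes collisions less likely rather than impossible.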
97-
98-
## Summary
99-
100-
In this function we learned about hash tables. Hash tables place the value of an element into a hash function which outputs a hash value. The hash value determines where to place the element. Because a hash function produces the same hash value for a given element, it also gives us fast lookup time to retrieve an element.
101-
102-
When a hash function outputs the same hash value for two different elements we have a collision. We can resolve a collision by employing separate chaining where each hash value points to a linked list, and when there is a collision we attach the element to the linked list.
103139

104-
Because retrieving elements from a linked list is O(n), we try to choose a hash function that avoids collisions. Because we must use our hash function to insert, delete, and retrieve elements we also choose a fast hash function.
140+
## Conclusion
141+
142+
In this lesson, we learned about **hash tables**. Hash tables use a **hash
143+
function** to output a **hash value**. The hash value determines where to place
144+
the element in memory. Because a hash function produces the same hash value for
145+
a given element, it also gives us fast lookup time to retrieve an element.
146+
147+
When a hash function outputs the same hash value for two different elements we
148+
have a collision. We can resolve a collision by employing separate chaining
149+
where each hash value points to a linked list, and when there is a collision we
150+
attach the element to the linked list.
151+
152+
Because retrieving elements from a linked list is O(n), programming languages
153+
use hash functions that avoid collisions as much as possible.
154+
155+
When you use a hash to solve an algorithm problem, it's useful to know how
156+
hashes work under the hood in order to understand their runtime. Here's a
157+
summary of the Big O of common hash methods. While collisions can occur that may
158+
result in worse performance than listed below, we can generalize the runtime as
159+
follows:
160+
161+
| Method | Big O |
162+
| ------------------------------------------------ | ----- |
163+
| Access (looking for a value with a known key) | O(1) |
164+
| Search (looking for a value without a known key) | O(n) |
165+
| Insertion | O(1) |
166+
| Deletion | O(1) |
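
As a rough illustration of the table above, here is how the four operations look with Python's built-in `dict`. The `find_key` helper is hypothetical and written only for this example; the index values come from the library analogy earlier in the lesson.

```python
books = {
    "The Bible": 200,
    "Introduction to Physics": 500,
    "War and Peace": 800,
    "Alexander Hamilton": 900,
}

# Access: the key is known, so the hash function leads straight to the value.
print(books["War and Peace"])


def find_key(mapping: dict, target):
    """Search: without a key, every pair may need to be checked, which is O(n)."""
    for key, value in mapping.items():
        if value == target:
            return key
    return None


print(find_key(books, 800))

# Insertion and deletion also go through the key, so they are O(1) on average.
books["Eloquent JavaScript"] = 600
del books["The Bible"]
print(len(books))
```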
