DSA
brian | Published: March 13, 2024, 12:25 p.m. | Updated: May 25, 2025, 10:13 p.m.
*Hash tables: most programming languages include a data structure called hash table .. in python its implemented as dictionaries
*a hash table is a list of paired values
*mainly used for key value look up
*the first item in each pair is called the (key), and the second item is the (value)
*Looking up a value in a hash table usually takes O (1)
*inserting a value in a hashtable usually takes O (1)
*Deleting a value in a hashtable usually takes O (1)
*Hashing: the process of taking characters and converting them, into numbers
*Hash function: the code that is used to convert those letters into particular numbers
* a hash table stores its data in a bunch of cells in a row, similar to an array
*How hash tables work
1)computer applies the hash function to the key
def get_hash(self, key):
h = 0
for char in key:
h+= ord(char)
return h % 100 ---> the number of elements
2) lets say our hash function returns a value of 8
[] [] [] [] [] [] [] [8] ----> computer places the value into cell 8
person = {}
person['name'] = 'brian'
*To find the value, associated with 'name' the computer executes 2 simple steps
1) hashes the key you're looking for, in my case 'name'
2) if the hash is 8, then computer looks in cell 8 and returns the value
*in a hash table, the placement of each value is determined ,AFTER the key has been hashed
*__setitem__ method is a special method that allows you to define the behavior of assignment (=) for objects
of your custom classes. This method is used to implement container-like behavior for your objects, making them
compatible with square bracket notation (obj[key] = value) commonly used with dictionaries, lists, and other collections.
When you define the __setitem__ method in your class, you're essentially defining how the object should behave when elements are assigned to it using square bracket notation.
**The __getitem__ method in Python is used to define the behavior of accessing elements using square
bracket notation ([]) on objects of your custom classes. This method allows you to implement the functionality
for getting values from your objects in a way that's similar to how you access items from dictionaries, lists, and other collections.
**__delitem__ method in Python is used to define the behavior of deleting elements using the del
statement and square bracket notation ([]) on objects of your custom classes. This method allows
you to implement the functionality for removing items from your objects in a way that's similar to how you delete items
from dictionaries, lists, and other collections.
Example:
class HashTable:
def __init__(self):
self.MAX = 100
self.arr = [None for i in range(self.MAX)]
def get_hash(self, key):
h = 0
for char in key:
h+= ord(char)
return h % 100
def __setitem__(self, key, val):
h = self.get_hash(key)
self.arr[h] = val
def __getitem__(self, key):
h = self.get_hash(key)
return self.arr[h]
def __delitem__(self, key):
h = self.get_hash(key)
self.arr[h] = None
h = HashTable()
h['march 6'] = 'rambo'
h['june 17'] = 'brian'
print(h['march 6'])
print(h['june 17'])
del h['march 6']
del h['june 17']
print(h['march 6'])
print(h['june 17'])
---output---
rambo
brian
None
None
*in a hash table, each key can exist only once, but there can be multiple instances of the same value
*example: only 1: BBQ Burger, but many burgers can cost the same 9.99 which would be the value
**a good hash function is one that distributes its data across all available cells
*Collisions: Trying to add data to a cell that is already filled
*Seperate Chaining: when a collisions occurs, instead of placing a single value in the cell , it places it in a reference to an array
*This array contains subarrays where the first value is the word and the second value is what ever
1) ex: hashes the key: DAB = 8
2) it looks up cell 8, the computer sees that cell 8 contains an array of arrays rather than a single value
3) it searches through the array linearly, looking at index 0 of each sub array, until it finds our key('DAB'). it then returns the value at index 1 of the correct subarray
*The great balancing act
* a good hash table strikes a balance of avoiding collisions while not consuming lots of memory
* computer scientist says a good rule of thumb for every 7 data elements it should have 10 cells or 0.7
* 7 / 10 = 0.7 14/20 = 0.7
*Hash tables are indespensible when it comes to building efficient software with their O(1) read and insertions
Exercise:
'''
1. Write a function that accepts an array of strings and returns the first
duplicate value it finds. For example, if the array is ["a", "b", "c", "d", "c", "e",
"f"], the function should return "c", since it’s duplicated within the array.
(You can assume that there’s one pair of duplicates within the array.)
Make sure the function has an efficiency of O(N).
'''
arr2 = [1, 2, 3, 4, 5, 6, 1]
def is_dup(arr):
hash_table = {}
for value in arr:
if value in hash_table:
return [value]
hash_table[value] = None
print(is_dup(arr2))
---output----
>>1