Hash Tables
Why Hash Tables Exist
Concept pioneered by Hans Peter Luhn at IBM (1953), formalized as “hash tables” in the 1960s
Provides average O(1) lookups by trading space for time using mathematical hashing. This is the fundamental trade-off that powers most modern software - from database indexes to compiler symbol tables.
When to Use Hash Tables
✅ Use when:
- Need fast lookups by key
- Counting frequencies of items
- Caching computed results
- Implementing sets (unique values)
- Building lookup tables or indexes
❌ Don’t use when:
- Need to maintain ordering of elements
- Memory is extremely constrained
- Keys don’t hash well (cause many collisions)
- Need to iterate in sorted order
🏢 Modern reality: Your go-to data structure. Use Dictionary<K,V>
in C#. Hash tables power most modern software - databases, caches, compilers, etc.
Time Complexity
- Average case: O(1) for all operations (lookup, insert, delete)
- Worst case: O(n) when many collisions occur
- Space: O(n) for storing key-value pairs
How Hash Tables Work
- Hash Function: Converts key to array index
- Compression: Map hash code to array size using modulo
- Collision Handling: Deal with multiple keys mapping to same index
Collision Resolution Strategies
The Problem: Multiple keys may hash to the same index. With n items and m slots, by the pigeonhole principle, collisions are inevitable when n > m.
Separate Chaining (Most Common)
- Each array slot holds a linked list (or dynamic array) of key-value pairs
- Pros: Simple, never “full”, handles high load factors (even > 1.0)
- Cons: Extra memory for pointers, poor cache locality
- Used by: Java’s HashMap, Python’s dict (historically)
Open Addressing (Linear Probing)
- When collision occurs, probe next slot(s) until finding empty space
- Pros: Better cache performance, no pointer overhead, all data in one array
- Cons: Clustering issues, table can become full, deletions tricky
- Variants: Linear probing (check i+1, i+2…), Quadratic probing (check i+1², i+2²…), Double hashing
Modern Approach
Many modern implementations use hybrid strategies or sophisticated open addressing:
- Robin Hood Hashing: Minimizes variance in probe lengths
- Cuckoo Hashing: Uses multiple hash functions and tables for guaranteed O(1) worst-case lookup
C# Implementation
public class HashMap<TKey, TValue>
{
private int arraySize;
private (TKey key, TValue value)?[] array;
public HashMap(int arraySize)
{
this.arraySize = arraySize;
array = new (TKey key, TValue value)?[arraySize];
}
private int Hash(TKey key, int countCollisions = 0)
{
int hashCode = key.GetHashCode();
return Math.Abs(hashCode + countCollisions);
}
private int Compressor(int hashCode)
{
return hashCode % arraySize;
}
public void Assign(TKey key, TValue value)
{
int arrayIndex = Compressor(Hash(key));
var currentArrayValue = array[arrayIndex];
// Empty slot - insert directly
if (!currentArrayValue.HasValue)
{
array[arrayIndex] = (key, value);
return;
}
// Key already exists - update value
if (EqualityComparer<TKey>.Default.Equals(currentArrayValue.Value.key, key))
{
array[arrayIndex] = (key, value);
return;
}
// Collision - use linear probing
int numberCollisions = 1;
while (!EqualityComparer<TKey>.Default.Equals(currentArrayValue.Value.key, key))
{
int newHashCode = Hash(key, numberCollisions);
int newArrayIndex = Compressor(newHashCode);
currentArrayValue = array[newArrayIndex];
if (!currentArrayValue.HasValue)
{
array[newArrayIndex] = (key, value);
return;
}
if (EqualityComparer<TKey>.Default.Equals(currentArrayValue.Value.key, key))
{
array[newArrayIndex] = (key, value);
return;
}
numberCollisions++;
// Prevent infinite loop if table is full
if (numberCollisions >= arraySize)
{
throw new InvalidOperationException("HashMap is full");
}
}
}
public TValue Retrieve(TKey key)
{
int arrayIndex = Compressor(Hash(key));
var possibleReturnValue = array[arrayIndex];
if (!possibleReturnValue.HasValue)
{
return default(TValue);
}
if (EqualityComparer<TKey>.Default.Equals(possibleReturnValue.Value.key, key))
{
return possibleReturnValue.Value.value;
}
// Handle collision case
int retrievalCollisions = 1;
while (possibleReturnValue.HasValue &&
!EqualityComparer<TKey>.Default.Equals(possibleReturnValue.Value.key, key))
{
int newHashCode = Hash(key, retrievalCollisions);
int retrievingArrayIndex = Compressor(newHashCode);
possibleReturnValue = array[retrievingArrayIndex];
if (!possibleReturnValue.HasValue)
{
return default(TValue);
}
if (EqualityComparer<TKey>.Default.Equals(possibleReturnValue.Value.key, key))
{
return possibleReturnValue.Value.value;
}
retrievalCollisions++;
// Prevent infinite loop
if (retrievalCollisions >= arraySize)
{
break;
}
}
return default(TValue);
}
}
// Example usage
var hashMap = new HashMap<string, string>(16);
hashMap.Assign("name", "Alice");
hashMap.Assign("age", "25");
hashMap.Assign("city", "NYC");
Console.WriteLine(hashMap.Retrieve("name")); // "Alice"
Console.WriteLine(hashMap.Retrieve("age")); // "25"
Console.WriteLine(hashMap.Retrieve("missing")); // null or default
Common Hash Table Applications
Frequency Counting
Caching/Memoization
Two-Sum Problem (Interview Classic)
Performance Considerations
Load Factor: Ratio of filled slots to total slots
- Keep load factor < 0.7 for good performance
- Resize table when load factor gets too high
Hash Function Quality:
- Good distribution reduces collisions
- C# has optimized hash functions for built-in types via
GetHashCode()
Collision Strategy Impact:
- Linear probing: Good cache performance but clustering issues
- Separate chaining: More memory overhead but handles high load factors better
Interview Tips
Common Questions:
- Implement basic hash table operations
- Handle collisions gracefully
- Explain time complexity trade-offs
- Solve problems using hash tables (two-sum, anagrams, etc.)
Quick Reference
Hash Table Operations
| Operation | Average | Worst Case | Notes | |———–|———|————|——-| | Insert | O(1) | O(n) | Amortized with resizing | | Search | O(1) | O(n) | Depends on collisions | | Delete | O(1) | O(n) | Same as search |
Collision Handling
- Chaining: Use linked lists at each bucket (flexible, handles any load)
- Open Addressing: Find next empty slot (better cache, fixed size)
C# Built-ins
Dictionary<K,V>
- Standard hash tableHashSet<T>
- Set operations, unique valuesConcurrentDictionary<K,V>
- Thread-safe
Key Trade-off: Memory for speed - O(1) operations at cost of O(n) space
Found this guide helpful? Share it with your team:
Share on LinkedIn