Implementing a C++ Hash Table Container and Wrapping It STL-Style, Explained

李昦

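1. Building a C++ Hash Table Container Wrapper from Scratch

As C++ developers, we regularly reach for unordered_map and unordered_set, the two hash-based associative containers in the STL. Today I'll take you down to the underlying layer and build a similar set of hash table containers by hand. Doing so deepens our understanding of how hash tables work and gives us practice with C++ template programming and iterator design.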

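2. Hash Table Container Design Basics

2.1 Core Structure of the Hash Table

A hash table is defined by its hash function and its collision resolution strategy. We use separate chaining (open hashing), which is also the approach mainstream STL implementations take. Each bucket is a singly linked list, and on a collision the new node is simply linked into that bucket's list.

The basic structure of the hash table consists of:

A vector of list head pointers (the bucket array)
A hash function object
A key comparison function object (the code in this article simply compares keys with ==)
A node counter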

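template<class T>
struct HashNode {
    T _data;             // stored element: K for the set, pair<const K, V> for the map
    HashNode<T>* _next;  // next node in the same bucket
    HashNode(const T& data) : _data(data), _next(nullptr) {}
};

// K: key type, T: stored element type,
// KeyOfT: functor that extracts the key from a T, Hash: hash functor
template<class K, class T, class KeyOfT, class Hash>
class HashTable {
    vector<HashNode<T>*> _tables;  // bucket array of list head pointers
    size_t _n = 0;                 // number of stored elements
    // ...
};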

2.2 Designing the Hash Function

We implement a generic hash function template that supports integer and string keys:

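template<class K>
struct HashFunc {
    // Default: the key is converted directly to size_t (suits integer-like keys)
    size_t operator()(const K& key) const {
        return (size_t)key;
    }
};

// Specialization for strings
template<>
struct HashFunc<string> {
    size_t operator()(const string& key) const {
        size_t hashi = 0;
        for (auto& ch : key) {
            // Multiply before adding so that character order affects the result
            hashi *= 131;
            hashi += ch;
        }
        return hashi;
    }
};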

For custom types (such as a date class), users need to provide their own specialization of HashFunc or pass their own hash functor.
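To see why the string hash multiplies by 131 before adding each character, it helps to compare it with a plain character sum. The sketch below is only an illustration: the free functions times131_hash and sum_hash are hypothetical names that do not appear in the original code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical free-function version of the string hash shown above.
inline std::size_t times131_hash(const std::string& key) {
    std::size_t hashi = 0;
    for (unsigned char ch : key) {
        hashi = hashi * 131 + ch;  // multiply first so position affects the result
    }
    return hashi;
}

// For comparison: a plain sum of characters ignores their order.
inline std::size_t sum_hash(const std::string& key) {
    std::size_t hashi = 0;
    for (unsigned char ch : key) {
        hashi += ch;
    }
    return hashi;
}
```

With these definitions, "ab" and "ba" get the same value from sum_hash but different values from times131_hash, so anagrams no longer land in the same bucket by construction.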

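3. Core Functionality

3.1 Implementing Insertion

Insertion has to handle the following key points:

Check whether the key already exists
Grow the table dynamically (when the load factor reaches 1)
Compute the hash value and insert into the corresponding bucket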

pair<Iterator, bool> Insert(const T& data) {
    KeyOfT kot;
    // Reject duplicates: return the existing element together with false
    Iterator it = Find(kot(data));
    if (it != End()) {
        return {it, false};
    }

    Hash hs;
    // Grow when the load factor reaches 1
    if (_n == _tables.size()) {
        vector<Node*> newtables(GetNextPrime(_tables.size()), nullptr);
        // Rehash all elements into newtables
        // ...
    }

    size_t hashi = hs(kot(data)) % _tables.size();
    // Insert the new node at the head of the bucket's list
    Node* newnode = new Node(data);
    newnode->_next = _tables[hashi];
    _tables[hashi] = newnode;
    ++_n;

    return {Iterator(newnode, this), true};
}
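The rehash step is elided in the Insert code above. One common way to do it is to relink the existing nodes into the new bucket array instead of allocating new ones. The following is a minimal sketch under simplifying assumptions: the node stores a plain int, the key hashes to itself, and the free function Rehash is a hypothetical helper rather than part of the original class.

```cpp
#include <cstddef>
#include <vector>

struct Node {  // simplified node holding an int key
    int _data;
    Node* _next;
    explicit Node(int data) : _data(data), _next(nullptr) {}
};

// Move every node from `tables` into a new bucket array with `newsize` buckets.
// Nodes are relinked rather than copied, so no per-element allocation happens.
inline void Rehash(std::vector<Node*>& tables, std::size_t newsize) {
    std::vector<Node*> newtables(newsize, nullptr);
    for (std::size_t i = 0; i < tables.size(); ++i) {
        Node* cur = tables[i];
        while (cur) {
            Node* next = cur->_next;  // save the successor before relinking
            std::size_t hashi = static_cast<std::size_t>(cur->_data) % newsize;
            cur->_next = newtables[hashi];  // head insertion into the new bucket
            newtables[hashi] = cur;
            cur = next;
        }
        tables[i] = nullptr;
    }
    tables.swap(newtables);  // the old, now empty array is released on scope exit
}
```

Relinking keeps node addresses stable and avoids a second round of allocations; in the real class the same loop would use hs(kot(cur->_data)) to compute the bucket index.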

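When growing, we pick the new capacity from a table of primes, which helps keep hash values from clustering in a few buckets: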

static const int num_primes = 28;
static const unsigned long prime_list[num_primes] = {
    53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593,
    49157, 98317, 196613, 393241, 786433, 1572869, 3145739,
    6291469, 12582917, 25165843, 50331653, 100663319,
    201326611, 402653189, 805306457, 1610612741, 3221225473,
    4294967291
};

unsigned long GetNextPrime(unsigned long n) {
    // Return the smallest prime in prime_list that is greater than n
    // ...
}
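The body of GetNextPrime is elided above. Since prime_list is sorted, one way to fill it in is a binary search with std::upper_bound. This is a sketch, not necessarily the author's implementation; clamping to the largest entry when n is already at or past the end of the list is an assumption made here. The prime table is repeated so that the block is self-contained.

```cpp
#include <algorithm>

static const int num_primes = 28;
static const unsigned long prime_list[num_primes] = {
    53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593,
    49157, 98317, 196613, 393241, 786433, 1572869, 3145739,
    6291469, 12582917, 25165843, 50331653, 100663319,
    201326611, 402653189, 805306457, 1610612741, 3221225473,
    4294967291
};

inline unsigned long GetNextPrime(unsigned long n) {
    const unsigned long* first = prime_list;
    const unsigned long* last = prime_list + num_primes;
    // upper_bound returns the first entry strictly greater than n
    const unsigned long* pos = std::upper_bound(first, last, n);
    // Assumption: past the largest entry, stay at the largest prime in the list
    return pos == last ? *(last - 1) : *pos;
}
```

Using upper_bound rather than lower_bound matters because Insert passes the current bucket count: the result has to be strictly larger, otherwise the table would never grow.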

3.2 Find and Erase

Find is relatively simple: compute the hash value and search the corresponding bucket linearly:

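Iterator Find(const K& key) {
    // An empty bucket array holds nothing; this also avoids a modulo by zero
    if (_tables.empty())
        return End();
    Hash hs;
    KeyOfT kot;
    size_t hashi = hs(key) % _tables.size();
    Node* cur = _tables[hashi];
    // Walk the bucket's list looking for a matching key
    while (cur) {
        if (kot(cur->_data) == key)
            return Iterator(cur, this);
        cur = cur->_next;
    }
    return End();
}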

Erase has to handle its edge cases carefully, in particular removing the head node of a bucket:

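bool Erase(const K& key) {
    // Nothing to erase in an empty bucket array (also avoids a modulo by zero)
    if (_tables.empty())
        return false;
    Hash hs;
    KeyOfT kot;
    size_t hashi = hs(key) % _tables.size();
    Node* cur = _tables[hashi];
    Node* prev = nullptr;

    while (cur) {
        if (kot(cur->_data) == key) {
            if (prev == nullptr) {
                // Removing the head: the bucket now starts at the next node
                _tables[hashi] = cur->_next;
            } else {
                // Removing a middle or tail node: bypass it
                prev->_next = cur->_next;
            }
            delete cur;
            --_n;
            return true;
        }
        prev = cur;
        cur = cur->_next;
    }
    return false;
}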

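4. Iterator Design and Implementation

4.1 Core Structure of the Iterator

The hash table iterator has to keep track of two pieces of information:

A pointer to the current node
A pointer to the hash table (for moving across buckets)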

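// HashTable must be forward-declared before this struct, and it must declare
// HTIterator a friend (or expose _tables) because operator++ reads _ht->_tables
template<class K, class T, class Ref, class Ptr, class KeyOfT, class Hash>
struct HTIterator {
    typedef HashNode<T> Node;
    typedef HashTable<K, T, KeyOfT, Hash> HT;
    typedef HTIterator<K, T, Ref, Ptr, KeyOfT, Hash> Self;

    Node* _node;    // current node
    const HT* _ht;  // owning table, needed to move to the next bucket

    HTIterator(Node* node, const HT* ht) : _node(node), _ht(ht) {}

    // Operator overloads
    Ref operator*() { return _node->_data; }
    Ptr operator->() { return &_node->_data; }
    // ...
};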

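4.2 Incrementing the Iterator

operator++ has to handle two cases:

The current bucket still has a next node
The iterator has to move on to the next non-empty bucket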

Self& operator++() {
    if (_node->_next) {
        // Case 1: stay in the current bucket
        _node = _node->_next;
    } else {
        // Case 2: recompute the current bucket index, then scan forward
        KeyOfT kot;
        Hash hs;
        size_t hashi = hs(kot(_node->_data)) % _ht->_tables.size();
        ++hashi;
        while (hashi < _ht->_tables.size()) {
            if (_ht->_tables[hashi]) {
                _node = _ht->_tables[hashi];
                break;
            }
            ++hashi;
        }
        // No non-empty bucket left: become the end iterator
        if (hashi == _ht->_tables.size())
            _node = nullptr;
    }
    return *this;
}
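Begin() is not shown in the article, but it has to perform the same kind of scan: locate the first non-empty bucket. The sketch below illustrates both steps on a bare bucket array, assuming int keys that hash to themselves. FirstNode and NextNode are hypothetical helper names, not members of the original classes.

```cpp
#include <cstddef>
#include <vector>

struct Node {  // simplified node holding an int key
    int _data;
    Node* _next;
};

// What a Begin() would look for: the head of the first non-empty bucket,
// or nullptr when every bucket is empty.
inline Node* FirstNode(const std::vector<Node*>& tables) {
    for (std::size_t i = 0; i < tables.size(); ++i) {
        if (tables[i]) return tables[i];
    }
    return nullptr;
}

// Mirrors operator++ above: stay in the bucket if possible, otherwise scan forward.
inline Node* NextNode(const std::vector<Node*>& tables, Node* node) {
    if (node->_next) return node->_next;
    std::size_t hashi = static_cast<std::size_t>(node->_data) % tables.size();
    for (++hashi; hashi < tables.size(); ++hashi) {
        if (tables[hashi]) return tables[hashi];
    }
    return nullptr;  // no non-empty bucket left: end of traversal
}
```

Starting from FirstNode and calling NextNode until it returns nullptr visits every node exactly once, bucket by bucket.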

4.3 const Iterator Support

With the Ref and Ptr template parameters, a single template implements both the regular iterator and the const iterator:

typedef HTIterator<K, T, T&, T*, KeyOfT, Hash> Iterator;
typedef HTIterator<K, T, const T&, const T*, KeyOfT, Hash> Const_Iterator;
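The effect of the Ref and Ptr parameters is easiest to see on a stripped-down iterator. The following sketch uses a hypothetical NodeIter template, unrelated to the hash table, purely to show that the two typedefs produce different return types from the same code.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical minimal iterator: Ref and Ptr decide what * and -> hand out.
template<class T, class Ref, class Ptr>
struct NodeIter {
    T* _p;
    Ref operator*() const { return *_p; }
    Ptr operator->() const { return _p; }
};

typedef NodeIter<int, int&, int*> IntIter;
typedef NodeIter<int, const int&, const int*> ConstIntIter;

// The same operator* yields a mutable or a read-only reference.
static_assert(std::is_same<decltype(*std::declval<IntIter>()), int&>::value,
              "regular iterator yields int&");
static_assert(std::is_same<decltype(*std::declval<ConstIntIter>()), const int&>::value,
              "const iterator yields const int&");
```

Writing through a ConstIntIter fails to compile, which is exactly the protection a const iterator is meant to give, without duplicating the iterator class.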

5. Wrapping unordered_map and unordered_set

5.1 Implementing unordered_set

unordered_set is the simpler of the two: it wraps the hash table and exposes the necessary interface:

template<class K, class Hash = HashFunc<K>>
class unordered_set {
public:
    struct SetKeyOfT {
        // For a set, the stored element is the key itself
        const K& operator()(const K& key) const {
            return key;
        }
    };

    typedef typename HashTable<K, const K, SetKeyOfT, Hash>::Iterator iterator;

    iterator begin() { return _ht.Begin(); }
    iterator end() { return _ht.End(); }

    // Simplified: reports only whether the key was newly inserted
    // (std::unordered_set::insert returns pair<iterator, bool>)
    bool insert(const K& key) {
        return _ht.Insert(key).second;
    }

private:
    HashTable<K, const K, SetKeyOfT, Hash> _ht;
};

5.2 Implementing unordered_map

unordered_map has to support operator[], which is its signature feature:

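template<class K, class V, class Hash = HashFunc<K>>
class unordered_map {
public:
    struct MapKeyOfT {
        // Extract the key from a stored key/value pair
        const K& operator()(const pair<const K, V>& kv) const {
            return kv.first;
        }
    };

    typedef typename HashTable<K, pair<const K, V>, MapKeyOfT, Hash>::Iterator iterator;

    // Needed for the range-based for loop used in the test section
    iterator begin() { return _ht.Begin(); }
    iterator end() { return _ht.End(); }

    // Insert {key, V()} if the key is absent, then return a reference to the
    // mapped value whether or not the insertion took place
    V& operator[](const K& key) {
        auto ret = _ht.Insert({key, V()});
        return ret.first->second;
    }

private:
    HashTable<K, pair<const K, V>, MapKeyOfT, Hash> _ht;
};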
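The insert-then-return-reference behavior of operator[] is what makes the familiar counting idiom work. The sketch below demonstrates it with std::unordered_map as a stand-in for the hand-written class, on the assumption that the two behave the same way here; CountWords is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Counts occurrences with operator[]: a missing key is first inserted with
// int() == 0, then the returned reference is incremented in place.
inline std::unordered_map<std::string, int> CountWords(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> counts;
    for (std::size_t i = 0; i < words.size(); ++i) {
        ++counts[words[i]];
    }
    return counts;
}
```

One consequence worth remembering is that merely reading counts["missing"] inserts an entry, so lookups that must not modify the container should go through find instead.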

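6. Key Issues and Solutions

6.1 Keeping Keys Immutable

To keep the hash table consistent, keys must not be modified by accident, because a modified key would no longer match the bucket it is stored in:

unordered_set stores const K directly
unordered_map stores pair<const K, V>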

// unordered_set
HashTable<K, const K, SetKeyOfT, Hash> _ht;

// unordered_map
HashTable<K, pair<const K, V>, MapKeyOfT, Hash> _ht;
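The protection comes entirely from the element type, so it can be checked at compile time. The following sketch uses a hypothetical typedef MapElem and helper Rename to show that the key half of the pair is const while the value half stays writable.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// The element type an unordered_map<int, std::string> would store.
typedef std::pair<const int, std::string> MapElem;

static_assert(std::is_const<decltype(MapElem::first)>::value,
              "the key is const and cannot be assigned through an iterator");
static_assert(!std::is_const<decltype(MapElem::second)>::value,
              "the mapped value stays mutable");

inline void Rename(MapElem& kv, const std::string& v) {
    kv.second = v;     // fine: the mapped value can be changed
    // kv.first = 42;  // would not compile: first is const int
}
```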

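6.2 Supporting Custom Types

For types that cannot be used directly in a modulo operation (such as a date class), users need to provide a custom hash function (the type also needs operator==, which Find and Erase use to compare keys):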

struct DateHash {
    size_t operator()(const Date& date) const {
        // Custom hash: pack year, month and day into one number (YYYYMMDD)
        return date.year * 10000 + date.month * 100 + date.day;
    }
};

unordered_set<Date, DateHash> dateSet;
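The article does not show the Date class itself. The sketch below assumes a plain struct with public year, month and day members plus an operator==, and uses std::unordered_set as a stand-in for the hand-written container so that the block compiles on its own.

```cpp
#include <cstddef>
#include <unordered_set>

// Assumed layout: the original Date class is not shown in the article.
struct Date {
    int year;
    int month;
    int day;
};

// Equality is required as well: the table compares keys with == after hashing.
inline bool operator==(const Date& a, const Date& b) {
    return a.year == b.year && a.month == b.month && a.day == b.day;
}

struct DateHash {
    std::size_t operator()(const Date& d) const {
        // Pack the fields into one number of the form YYYYMMDD
        return static_cast<std::size_t>(d.year) * 10000 + d.month * 100 + d.day;
    }
};

typedef std::unordered_set<Date, DateHash> DateSet;
```

Two dates that compare equal must produce the same hash value; the YYYYMMDD packing satisfies that because it depends only on the three fields that operator== compares.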

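7. Performance and Caveats

Load factor control: keeping the load factor reasonable (we cap it at 1) is essential for performance. Too high a load factor means more collisions; too low a load factor wastes memory.
Prime capacities: using a prime number of buckets can reduce collisions when hash values follow a regular pattern (for example, all multiples of the same number), which is why we maintain the prime table.
Memory management: make sure the destructor frees every node to avoid memory leaks.
Iterator invalidation: an insertion may trigger a rehash, which invalidates all iterators. This is inherent to hash table iterators and should be stated clearly in the documentation.
Exception safety: make sure a failed insertion does not leak memory that has already been allocated.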
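The memory management point can be sketched as a destructor that walks every bucket and deletes each node. The example below is a simplified stand-in, not the original HashTable: the class name Buckets, the PushFront helper, and the live counter used to observe leaks are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

struct Node {
    static int live;  // number of nodes currently allocated, for leak checking
    int _data;
    Node* _next;
    explicit Node(int data) : _data(data), _next(nullptr) { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;

class Buckets {
public:
    explicit Buckets(std::size_t n) : _tables(n, nullptr) {}
    // Copying would make two tables own the same nodes, so it is disabled here
    Buckets(const Buckets&) = delete;
    Buckets& operator=(const Buckets&) = delete;

    ~Buckets() {
        for (std::size_t i = 0; i < _tables.size(); ++i) {
            Node* cur = _tables[i];
            while (cur) {
                Node* next = cur->_next;  // read the successor before deleting
                delete cur;
                cur = next;
            }
            _tables[i] = nullptr;
        }
    }

    // Head insertion without duplicate checks, enough to populate the table
    void PushFront(int key) {
        std::size_t hashi = static_cast<std::size_t>(key) % _tables.size();
        Node* n = new Node(key);
        n->_next = _tables[hashi];
        _tables[hashi] = n;
    }

private:
    std::vector<Node*> _tables;
};
```

The vector only owns the array of head pointers, not the nodes they point to, which is why the destructor has to free each chain explicitly.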

8. Full Implementation and Testing

The complete code is available in my GitHub repository: project link

Here is a simple test example:

void test_map() {
    unordered_map<string, string> dict;
    dict["insert"] = "插入";
    dict["erase"] = "删除";
    dict["find"] = "查找";
    
    for (auto& kv : dict) {
        cout << kv.first << ": " << kv.second << endl;
    }
    
    dict["erase"] = "移除";  // overwrite the existing value
    cout << "after modify: " << dict["erase"] << endl;
}

void test_set() {
    unordered_set<int> nums;
    int arr[] = {5, 12, 7, 19, 3, 8};
    
    for (int n : arr) {
        nums.insert(n);
    }
    
    for (auto it = nums.begin(); it != nums.end(); ++it) {
        cout << *it << " ";
    }
    cout << endl;
}