引子
问题:给定一串数字{1,2,5,7,15,24,33,52},如何在时间复杂度为O(1)下,对数据进行CURD?
数组:我创建一个Length为53的数组,将元素插入相同下标处,是不是就可以实现查找复杂度O(1)了?但是添加修改元素时间复杂度为O(n)了。
链表:添加删除复杂度为O(1),但是查找时间复杂度为O(n)了。
身为.NETer肯定熟练使用Dictionary和HashSet,这两个容器的底层就是HashTable,所以带着对技术浓重的兴趣(面试),所以就从头到尾梳理一下!
理论
链地址法(拉链法)
回到问题本身,我们用数组可以实现查找复杂度为O(1),链表实现添加删除复杂度为O(1),如果我们将两个合起来,不就可以实现增删查都为O(1)了么?如何结合呢?
我们先定义一个数组,长度为7(敲黑板,思考下为什么选7?),将所有元素对7取余,这样所有元素都可以放在数组上了,如下图所示:
如上图,如果我们将数组中每个下标位置都放成一个链条,这样,复杂度不久降下去了么?
有问题么?没问题。真没问题么?有问题......
注意
-
插入元素是{0,7,14,21,28}怎么办?这样都落在下标为0的链条里,时间复杂度不又上去了?针对这种情况,隔壁Java将链表优化成了红黑书,我们.NET呢?往下看。
-
如果我的数组长度不是7,是2怎么办?所有数对2取余,不是1就是0,时间复杂度不又上去了?所以我们对数组长度应该取素数。
-
如果元素超级多或者特别少,我们的数组长度要固定么?就要动态长度
上边这种方法学名就叫拉链法!
开放地址法
上边我们聊过拉链法(为什么老想着裤子拉链......),拉链法是向下开辟新的空间,如果我们横向开辟空间呢?还是刚才的例子,我们这样搞一下试试。
线性探测法
我们插完7以后,在插24时,发现下标为2的地方有元素了,于是向后移动一位,发现有空位,于是就插进去了。
上边这种方法就是线性探测法!
二次聚集(堆积)
聪明的老鸟们,肯定疑惑啦,如果我们继续添加元素{x%11=4},{y%11=5},此时x,y元素都要往下标6插数据。这样就导致了原始哈希地址不同的元素要插入同一个地址。即添加同义词的冲突过程中又添加了非同义词的冲突。这就是二次聚集。
二次探测法
如果在线性探测法中,我们不依次寻找下一个呢?我们针对"下一个"采取{1 ^ 2,-1 ^ 2,2 ^ 2,-2 ^ 2....}(垃圾编辑器,次方样式乱了)这样的步长呢?真聪明!你已经知道二次探测法了!
这......这还能用么?不都乱了么?下标和元素对不上了呀!怎么去查找元素呢?
别急呀,家人们呐,我们按照这个思路查询就好了:
查找算法步骤
1. 给定待查找的关键字key,获取原始应该插入的下标index 2. 如果原始下标index处,元素为空,则所查找的元素不存在 3. 如果index处的元素等于key,则查找成功 4. 否则重复下述解决冲突的过程 * 按照处理冲突的方法,计算下一个地址nextIndex * 若nextIndex为空,则查找元素不存在 * 若nextIndex等于关键词key,则查找成功
还有要注意的点么?必须有!
注意(敲重点啦)
- 数组长度必须大于给定元素的长度!
- 当数组元素快装满时,时间复杂度也是O(n)!
- 如果都装满了,就会一直循环找空位,我们应该进行扩容!
理论小结
接口设计
干活啦,干活啦,领导嫌查询效率太低,让设计一种CURD时间复杂度都为O(n)的数据结构。给了接口。接口如下:
internal interface IDictionary<TK, TV> : IEnumerable<KeyValuePair<TK, TV>> { TV this[TK key] { get; set; } int Count { get; } /// <summary> /// 根据key判断元素是否存在 /// </summary> /// <param name="key"></param> /// <returns></returns> bool ContainsKey(TK key); /// <summary> /// 添加元素 /// </summary> /// <param name="key"></param> /// <param name="value"></param> void Add(TK key, TV value); /// <summary> /// 根据key移除元素 /// </summary> /// <param name="key"></param> void Remove(TK key); /// <summary> /// 清除 /// </summary> void Clear(); }
.NET实现线性探测法
实现过程
1. 先来个对象,存储key和value
对象:KeyValuePair
internal class DictionaryKeyValuePair<TK, TV> { internal TK Key; internal TV Value; internal DictionaryKeyValuePair(TK key, TV value) { Key = key; Value = value; } }
2. 来个类OpenAddressDictionary,继承IDictionary接口,就是我们的实现类
实现类:OpenAddressDictionary
/// <summary> /// 使用线性探测法实现哈希表 /// </summary> /// <typeparam name="TK"></typeparam> /// <typeparam name="TV"></typeparam> public class OpenAddressDictionary<TK, TV> : IDictionary<TK, TV> { //创建一个数组,用来存储元素 private DictionaryKeyValuePair<TK, TV>[] hashArray; //记录已插入元素的数量 public int Count { get; private set; } public OpenAddressDictionary(int capacity) { if (capacity < 0) throw new ArgumentOutOfRangeException("初始值容量不能小于0"); hashArray = new DictionaryKeyValuePair<TK, TV>[capacity]; } public TV this[TK key] { get => throw new System.NotImplementedException(); set => throw new System.NotImplementedException(); } public void Add(TK key, TV value) { throw new System.NotImplementedException(); } public void Clear() { throw new System.NotImplementedException(); } public System.Boolean ContainsKey(TK key) { throw new System.NotImplementedException(); } public IEnumerator<KeyValuePair<TK, TV>> GetEnumerator() { throw new System.NotImplementedException(); } public void Remove(TK key) { throw new System.NotImplementedException(); } IEnumerator IEnumerable.GetEnumerator() { throw new System.NotImplementedException(); } }
3.如何实现查找?跟着上文查找步骤就行
线性探测:查找
/// <summary> /// 查找,按照上文线性探测查找步骤 /// </summary> /// <param name="key"></param> /// <returns></returns> public bool ContainsKey(TK key) { //1.给定待查找的关键字key,获取原始应该插入的下标index var hashCode = GetHash(key); var index = hashCode % hashArray.Length; //2.如果原始下标index处,元素为空,则所查找的元素不存在 if (hashArray[index] == null) return false; var current = hashArray[index];//当前元素 /*这个点用来判断是否走了一整圈*/ var hitKey = current.Key; //4.否则重复下述解决冲突的过程 while (current != null) { //3.如果index处的元素等于key,则查找成功 if (current.Key.Equals(key)) return true; /*这个地方来修改获取下一个元素位置*/ index++; /*到尾了,但是没有走完一圈*/ if (index == hashArray.Length) index = 0; current = hashArray[index]; /*走完一圈了,没找到*/ if (current != null && current.Key.Equals(hitKey)) break; } return false; }
4. 添加
线性探测:添加
/// <summary> /// 添加元素 /// </summary> /// <param name="key"></param> /// <param name="value"></param> /// <exception cref="Exception"></exception> public void Add(TK key, TV value) { Grow(); //1.获取原始插入位置 var hashCode = GetHash(key); var index = hashCode % hashArray.Length; //2.此位置为空,直接插入 if (hashArray[index] == null) { hashArray[index] = new DictionaryKeyValuePair<TK, TV>(key, value); } //3.坑被占了,去看看下一个 else { var current = hashArray[index]; /*这个点用来判断是否走了一整圈*/ var hitKey = current.Key; while (current != null) { if (current.Key.Equals(key)) throw new Exception("重复key"); /*这个地方来修改获取下一个元素位置*/ index++; /*到尾了,但是没有走完一圈*/ if (index == hashArray.Length) index = 0; current = hashArray[index]; /*走完一圈了,没找到空位*/ if (current != null && current.Key.Equals(hitKey)) throw new Exception("容器满了"); } hashArray[index] = new DictionaryKeyValuePair<TK, TV>(key, value); } Count++; } /// <summary> /// 扩容 /// </summary> private void Grow() { /*这个地方判断使用多少扩容*/ if (hashArray.Length * 0.7 <= Count) { var orghashArray = hashArray.Length; var currentArray = hashArray; /*这个地方改变扩容大小的规则*/ hashArray = new DictionaryKeyValuePair<TK, TV>[hashArray.Length * 2]; for (var i = 0; i < orghashArray; i++) { var current = currentArray[i]; /*旧数组中存在元素,添加到新数组中,Add方法会对Count++,所以加入后要Count--*/ if (current != null) { Add(current.Key, current.Value); Count--; } } currentArray = null; } }
5. 删除
线性探测:删除
/// <summary> /// 删除元素key /// </summary> /// <param name="key"></param> /// <exception cref="Exception"></exception> public void Remove(TK key) { //1.获取原始插入位置 var hashCode = GetHash(key); var curIndex = hashCode % hashArray.Length; //2.此位置为空,无法删除 if (hashArray[curIndex] == null) throw new Exception("未找到元素key"); var current = hashArray[curIndex]; /*这个点用来判断是否走了一整圈*/ var hitKey = current.Key; #region 找到待删除元素 DictionaryKeyValuePair<TK, TV> target = null; while (current != null) { if (current.Key.Equals(key)) { target = current; break; } /*这个地方来修改获取下一个元素位置*/ curIndex++; /*到尾了,但是没有走完一圈*/ if (curIndex == hashArray.Length) curIndex = 0; current = hashArray[curIndex]; /*走完一圈了,没找到空位*/ if (current != null && current.Key.Equals(hitKey)) throw new Exception("No such item for given key"); } if (target == null) { throw new Exception("未找到元素key"); } #endregion //删除,将当前位置置空 hashArray[curIndex] = null; #region 之前讲过删除,造成元素丢失,所以在此处处理 curIndex++; /*到尾了,但是没有走完一圈*/ if (curIndex == hashArray.Length) curIndex = 0; current = hashArray[curIndex]; //直到下一个为空的点,到空说明后边的还没有被线性探测插入污染 while (current != null) { //先删除 hashArray[curIndex] = null; //重新插入 Add(current.Key, current.Value); Count--; curIndex++; /*到尾了,但是没有走完一圈*/ if (curIndex == hashArray.Length) curIndex = 0; current = hashArray[curIndex]; } #endregion Count--; Shrink(); } /// <summary> /// 减容 /// </summary> private void Shrink() { /*这个地方判断元素在什么程度算少*/ if (Count <= hashArray.Length * 0.3 && hashArray.Length / 2 > 0) { var orghashArray = hashArray.Length; var currentArray = hashArray; /*这个地方改变扩容大小的规则*/ hashArray = new DictionaryKeyValuePair<TK, TV>[hashArray.Length / 2]; for (var i = 0; i < orghashArray; i++) { var current = currentArray[i]; /*旧数组中存在元素,添加到新数组中,Add方法会对Count++,所以加入后要Count--*/ if (current != null) { Add(current.Key, current.Value); Count--; } } currentArray = null; } }
最终代码
线性探测:最终代码
/// <summary> /// 使用线性探测法实现哈希表 /// </summary> /// <typeparam name="TK"></typeparam> /// <typeparam name="TV"></typeparam> public class OpenAddressDictionary<TK, TV> : IDictionary<TK, TV> { //创建一个数组,用来存储元素 private DictionaryKeyValuePair<TK, TV>[] hashArray; //记录已插入元素的数量 public int Count { get; private set; } public TV this[TK key] { get => GetValue(key); set => SetValue(key, value); } public OpenAddressDictionary(int capacity) { if (capacity < 0) throw new ArgumentOutOfRangeException("初始值容量不能小于0"); hashArray = new DictionaryKeyValuePair<TK, TV>[capacity]; } /// <summary> /// 清除最简单 /// </summary> public void Clear() { if (Count > 0) Array.Clear(hashArray, 0, hashArray.Length); } /// <summary> /// 查找,按照上文线性探测查找步骤 /// </summary> /// <param name="key"></param> /// <returns></returns> public bool ContainsKey(TK key) { //1.给定待查找的关键字key,获取原始应该插入的下标index var hashCode = GetHash(key); var index = hashCode % hashArray.Length; //2.如果原始下标index处,元素为空,则所查找的元素不存在 if (hashArray[index] == null) return false; var current = hashArray[index];//当前元素 /*这个点用来判断是否走了一整圈*/ var hitKey = current.Key; //4.否则重复下述解决冲突的过程 while (current != null) { //3.如果index处的元素等于key,则查找成功 if (current.Key.Equals(key)) return true; /*这个地方来修改获取下一个元素位置*/ index++; /*到尾了,但是没有走完一圈*/ if (index == hashArray.Length) index = 0; current = hashArray[index]; /*走完一圈了,没找到*/ if (current != null && current.Key.Equals(hitKey)) break; } return false; } /// <summary> /// 添加元素 /// </summary> /// <param name="key"></param> /// <param name="value"></param> /// <exception cref="Exception"></exception> public void Add(TK key, TV value) { Grow(); //1.获取原始插入位置 var hashCode = GetHash(key); var index = hashCode % hashArray.Length; //2.此位置为空,直接插入 if (hashArray[index] == null) { hashArray[index] = new DictionaryKeyValuePair<TK, TV>(key, value); } //3.坑被占了,去看看下一个 else { var current = hashArray[index]; /*这个点用来判断是否走了一整圈*/ var hitKey = current.Key; while (current != null) { if (current.Key.Equals(key)) throw new Exception("重复key"); /*这个地方来修改获取下一个元素位置*/ index++; /*到尾了,但是没有走完一圈*/ if (index == hashArray.Length) index = 0; current = hashArray[index]; /*走完一圈了,没找到空位*/ if (current != null && current.Key.Equals(hitKey)) throw new Exception("容器满了"); } hashArray[index] = new DictionaryKeyValuePair<TK, TV>(key, value); } Count++; } /// <summary> /// 删除元素key /// </summary> /// <param name="key"></param> /// <exception cref="Exception"></exception> public void Remove(TK key) { //1.获取原始插入位置 var hashCode = GetHash(key); var curIndex = hashCode % hashArray.Length; //2.此位置为空,无法删除 if (hashArray[curIndex] == null) throw new Exception("未找到元素key"); var current = hashArray[curIndex]; /*这个点用来判断是否走了一整圈*/ var hitKey = current.Key; #region 找到待删除元素 DictionaryKeyValuePair<TK, TV> target = null; while (current != null) { if (current.Key.Equals(key)) { target = current; break; } /*这个地方来修改获取下一个元素位置*/ curIndex++; /*到尾了,但是没有走完一圈*/ if (curIndex == hashArray.Length) curIndex = 0; current = hashArray[curIndex]; /*走完一圈了,没找到空位*/ if (current != null && current.Key.Equals(hitKey)) throw new Exception("No such item for given key"); } if (target == null) { throw new Exception("未找到元素key"); } #endregion //删除,将当前位置置空 hashArray[curIndex] = null; #region 之前讲过删除,造成元素丢失,所以在此处处理 curIndex++; /*到尾了,但是没有走完一圈*/ if (curIndex == hashArray.Length) curIndex = 0; current = hashArray[curIndex]; //直到下一个为空的点,到空说明后边的还没有被线性探测插入污染 while (current != null) { //先删除 hashArray[curIndex] = null; //重新插入 Add(current.Key, current.Value); Count--; curIndex++; /*到尾了,但是没有走完一圈*/ if (curIndex == hashArray.Length) curIndex = 0; current = hashArray[curIndex]; } #endregion Count--; Shrink(); } /// <summary> /// 扩容 /// </summary> private void Grow() { /*这个地方判断使用多少扩容*/ if (hashArray.Length * 0.7 <= Count) { var orghashArray = hashArray.Length; var currentArray = hashArray; /*这个地方改变扩容大小的规则*/ hashArray = new DictionaryKeyValuePair<TK, TV>[hashArray.Length * 2]; for (var i = 0; i < orghashArray; i++) { var current = currentArray[i]; /*旧数组中存在元素,添加到新数组中,Add方法会对Count++,所以加入后要Count--*/ if (current != null) { Add(current.Key, current.Value); Count--; } } currentArray = null; } } /// <summary> /// 减容 /// </summary> private void Shrink() { /*这个地方判断元素在什么程度算少*/ if (Count <= hashArray.Length * 0.3 && hashArray.Length / 2 > 0) { var orghashArray = hashArray.Length; var currentArray = hashArray; /*这个地方改变扩容大小的规则*/ hashArray = new DictionaryKeyValuePair<TK, TV>[hashArray.Length / 2]; for (var i = 0; i < orghashArray; i++) { var current = currentArray[i]; /*旧数组中存在元素,添加到新数组中,Add方法会对Count++,所以加入后要Count--*/ if (current != null) { Add(current.Key, current.Value); Count--; } } currentArray = null; } } private void SetValue(TK key, TV value) { var index = GetHash(key) % hashArray.Length; if (hashArray[index] == null) { Add(key, value); } else { var current = hashArray[index]; var hitKey = current.Key; while (current != null) { if (current.Key.Equals(key)) { Remove(key); Add(key, value); return; } index++; //wrap around if (index == hashArray.Length) index = 0; current = hashArray[index]; //reached original hit again if (current != null && current.Key.Equals(hitKey)) throw new Exception("Item not found"); } } throw new Exception("Item not found"); } private TV GetValue(TK key) { var index = GetHash(key) % hashArray.Length; if (hashArray[index] == null) throw new Exception("Item not found"); var current = hashArray[index]; var hitKey = current.Key; while (current != null) { if (current.Key.Equals(key)) return current.Value; index++; //wrap around if (index == hashArray.Length) index = 0; current = hashArray[index]; //reached original hit again if (current != null && current.Key.Equals(hitKey)) throw new Exception("Item not found"); } throw new Exception("Item not found"); } private int GetHash(TK key) { return Math.Abs(key.GetHashCode()); } IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); } //迭代器就不写了,想了解看我博客容器栏目 public IEnumerator<KeyValuePair<TK, TV>> GetEnumerator() { throw new System.NotImplementedException(); } } internal class DictionaryKeyValuePair<TK, TV> { internal TK Key; internal TV Value; internal DictionaryKeyValuePair(TK key, TV value) { Key = key; Value = value; } }
.NET实现拉链法
实现过程
回想一下,上边的拉链法,每个下标位置放置的是一个链条,所以我们先实现一个双向链表
1. 实现一个双向链表
拉链法:构建双向链表
internal class DLinkedNode<T> { public T Data; public DLinkedNode<T> Next; public DLinkedNode<T> Previous; public DLinkedNode(T data) { Data = data; } }
2. 创建一个拉链法实体类
拉链法:实现类
/// <summary> /// 拉链法:实现类 /// </summary> /// <typeparam name="TK"></typeparam> /// <typeparam name="TV"></typeparam> internal class SeparateChainingDictionary<TK, TV>:IDictionary<TK, TV> { //构建一个数组,数组每个节点都是链表 private DLinkedNode<KeyValuePair<TK, TV>>[] hashArray; //已使用数组下标个数 private int filledBuckets; public SeparateChainingDictionary(int capacity) { if (capacity < 0) throw new ArgumentOutOfRangeException("初始值容量不能小于0"); hashArray = new DLinkedNode<KeyValuePair<TK, TV>>[capacity]; } public TV this[TK key] { get => throw new NotImplementedException(); set => throw new NotImplementedException(); } public int Count => throw new NotImplementedException(); public void Add(TK key, TV value) { throw new NotImplementedException(); } public void Clear() { throw new NotImplementedException(); } public bool ContainsKey(TK key) { throw new NotImplementedException(); } public void Remove(TK key) { throw new NotImplementedException(); } public IEnumerator<KeyValuePair<TK, TV>> GetEnumerator() { throw new NotImplementedException(); } IEnumerator IEnumerable.GetEnumerator() { throw new NotImplementedException(); } }
3. 拉链法:查找
拉链法:查找
/// <summary> /// 查找 /// </summary> /// <param name="key"></param> /// <returns></returns> public bool ContainsKey(TK key) { /*1.获取原始下标*/ var index = Math.Abs(key.GetHashCode()) % hashArray.Length; /*2.为空即无*/ if (hashArray[index] == null) return false; var current = hashArray[index]; /*3.遍历链表*/ while (current != null) { if (current.Data.Key.Equals(key)) return true; current = current.Next; } return false; }
4. 拉链法:添加
拉链法:添加
/// <summary> /// 添加 /// </summary> /// <param name="key"></param> /// <param name="value"></param> /// <exception cref="Exception"></exception> public void Add(TK key, TV value) { Grow(); var index = Math.Abs(key.GetHashCode()) % hashArray.Length; if (hashArray[index] == null) { hashArray[index] = new DLinkedNode<KeyValuePair<TK, TV>>(new KeyValuePair<TK, TV>(key, value)); filledBuckets++; } else { var current = hashArray[index]; while (current != null && current.Next != null) { /*此处可以判断是重复修改,还是抛异常*/ if (current.Data.Key.Equals(key)) throw new Exception("重复key"); current = current.Next; } if (current.Data.Key.Equals(key)) throw new Exception("重复key"); current.Next = new DLinkedNode<KeyValuePair<TK, TV>>(new KeyValuePair<TK, TV>(key, value)); } Count++; } /// <summary> /// 扩容 /// </summary> private void Grow() { if (filledBuckets >= hashArray.Length * 0.7) { filledBuckets = 0; var newBucketSize = hashArray.Length * 2; var biggerArray = new DLinkedNode<KeyValuePair<TK, TV>>[newBucketSize]; for (var i = 0; i < hashArray.Length; i++) { var item = hashArray[i]; if (item != null) { var current = item; while (current != null) { var next = current.Next; var newIndex = Math.Abs(current.Data.Key.GetHashCode()) % newBucketSize; if (biggerArray[newIndex] == null) { filledBuckets++; biggerArray[newIndex] = current; } var bItem = biggerArray[newIndex]; while(bItem.Next != null) bItem = bItem.Next; bItem.Next = current; current = next; } } } hashArray = biggerArray; } }
5. 拉链法:删除
拉链法:删除
public void Remove(TK key) { var index = Math.Abs(key.GetHashCode()) % hashArray.Length; if (hashArray[index] == null) throw new Exception("未找到key"); var current = hashArray[index]; /*查找待删除元素*/ DLinkedNode<KeyValuePair<TK, TV>> item = null; while (current != null) { if (current.Data.Key.Equals(key)) { item = current; break; } current = current.Next; } if (item == null) { throw new Exception("未找到key"); } /*删除*/ if (item.Next == null) item = null; else { item.Previous = item.Next; item.Next.Previous =item.Previous ; item = null; } if (hashArray[index] == null) { filledBuckets--; } Count--; Shrink(); } private void Shrink() { /*是否减容*/ if (Math.Abs(filledBuckets - hashArray.Length * 0.3) < 0.1 && hashArray.Length / 2 > 0) { filledBuckets = 0; var newBucketSize = hashArray.Length / 2; var smallerArray = new DLinkedNode<KeyValuePair<TK, TV>>[newBucketSize]; for (var i = 0; i < hashArray.Length; i++) { var item = hashArray[i]; if (item != null) { var current = item; /*找到新的存储点*/ while (current != null) { var next = current.Next; var newIndex = Math.Abs(current.Data.Key.GetHashCode()) % newBucketSize; if (smallerArray[newIndex] == null) { filledBuckets++; smallerArray[newIndex] = current; } var newItem = smallerArray[newIndex]; while(newItem.Next != null) newItem = newItem.Next; newItem.Next = current; current = next; } } } hashArray = smallerArray; } }
最终代码
拉链法:最终代码
internal class DLinkedNode<T> { public T Data; public DLinkedNode<T> Next; public DLinkedNode<T> Previous; public DLinkedNode(T data) { Data = data; } } internal class SeparateChainingDictionary<TK, TV> : IDictionary<TK, TV> { //构建一个数组,数组每个节点都是链表 private DLinkedNode<KeyValuePair<TK, TV>>[] hashArray; //已使用数组下标个数 private int filledBuckets; public SeparateChainingDictionary(int capacity) { if (capacity < 0) throw new ArgumentOutOfRangeException("初始值容量不能小于0"); hashArray = new DLinkedNode<KeyValuePair<TK, TV>>[capacity]; } public TV this[TK key] { get => throw new NotImplementedException(); set => throw new NotImplementedException(); } public int Count { get; private set; } /// <summary> /// 添加 /// </summary> /// <param name="key"></param> /// <param name="value"></param> /// <exception cref="Exception"></exception> public void Add(TK key, TV value) { Grow(); var index = Math.Abs(key.GetHashCode()) % hashArray.Length; if (hashArray[index] == null) { hashArray[index] = new DLinkedNode<KeyValuePair<TK, TV>>(new KeyValuePair<TK, TV>(key, value)); filledBuckets++; } else { var current = hashArray[index]; while (current != null && current.Next != null) { /*此处可以判断是重复修改,还是抛异常*/ if (current.Data.Key.Equals(key)) throw new Exception("重复key"); current = current.Next; } if (current.Data.Key.Equals(key)) throw new Exception("重复key"); current.Next = new DLinkedNode<KeyValuePair<TK, TV>>(new KeyValuePair<TK, TV>(key, value)); } Count++; } /// <summary> /// 扩容 /// </summary> private void Grow() { if (filledBuckets >= hashArray.Length * 0.7) { filledBuckets = 0; var newBucketSize = hashArray.Length * 2; var biggerArray = new DLinkedNode<KeyValuePair<TK, TV>>[newBucketSize]; for (var i = 0; i < hashArray.Length; i++) { var item = hashArray[i]; if (item != null) { var current = item; while (current != null) { var next = current.Next; var newIndex = Math.Abs(current.Data.Key.GetHashCode()) % newBucketSize; if (biggerArray[newIndex] == null) { filledBuckets++; biggerArray[newIndex] = current; } var bItem = biggerArray[newIndex]; while(bItem.Next != null) bItem = bItem.Next; bItem.Next = current; current = next; } } } hashArray = biggerArray; } } public void Clear() { throw new NotImplementedException(); } /// <summary> /// 查找 /// </summary> /// <param name="key"></param> /// <returns></returns> public bool ContainsKey(TK key) { /*1.获取原始下标*/ var index = Math.Abs(key.GetHashCode()) % hashArray.Length; /*2.为空即无*/ if (hashArray[index] == null) return false; var current = hashArray[index]; /*3.遍历链表*/ while (current != null) { if (current.Data.Key.Equals(key)) return true; current = current.Next; } return false; } public void Remove(TK key) { var index = Math.Abs(key.GetHashCode()) % hashArray.Length; if (hashArray[index] == null) throw new Exception("未找到key"); var current = hashArray[index]; /*查找待删除元素*/ DLinkedNode<KeyValuePair<TK, TV>> item = null; while (current != null) { if (current.Data.Key.Equals(key)) { item = current; break; } current = current.Next; } if (item == null) { throw new Exception("未找到key"); } /*删除*/ if (item.Next == null) item = null; else { item.Previous = item.Next; item.Next.Previous =item.Previous ; item = null; } if (hashArray[index] == null) { filledBuckets--; } Count--; Shrink(); } private void Shrink() { /*是否减容*/ if (Math.Abs(filledBuckets - hashArray.Length * 0.3) < 0.1 && hashArray.Length / 2 > 0) { filledBuckets = 0; var newBucketSize = hashArray.Length / 2; var smallerArray = new DLinkedNode<KeyValuePair<TK, TV>>[newBucketSize]; for (var i = 0; i < hashArray.Length; i++) { var item = hashArray[i]; if (item != null) { var current = item; /*找到新的存储点*/ while (current != null) { var next = current.Next; var newIndex = Math.Abs(current.Data.Key.GetHashCode()) % newBucketSize; if (smallerArray[newIndex] == null) { filledBuckets++; smallerArray[newIndex] = current; } var newItem = smallerArray[newIndex]; while(newItem.Next != null) newItem = newItem.Next; newItem.Next = current; current = next; } } } hashArray = smallerArray; } } public IEnumerator<KeyValuePair<TK, TV>> GetEnumerator() { throw new NotImplementedException(); } IEnumerator IEnumerable.GetEnumerator() { throw new NotImplementedException(); } }
Dictionary源码分析
模拟实现:一个Dictionary,存储数据{1,'a'},{'4','b'},{5,'c'}
1. 创建一个单链表,用来存储K-V
private struct Entry { public uint hashCode; //值为-1,表示是该链条最后一个节点 //值小于-1,表示已经被删除的自由节点 public int next; public TKey key; // Key of entry public TValue value; // Value of entry }
2. 创建一个数组当桶,还有一个链表数组(核心就这两个数组)
private int[]? _buckets; private Entry[]? _entries;
3. 模拟实现插入{1,'a'},{'4','b'},{5,'c'}
初始化
第一次插入{1,'a'}
第二次插入{'4','b'}
第三次插入{5,'c'}
仔细看一下这三个数据的插入,及数据的变化,应该可以理解_buckets和_entries的关系
4.删除
上边再讲哈希表,包括我们自己实现的代码中,删除一个节点后,都要重新计算后边的位置。如何解决这个问题呢?我们可以使用Entry的next,来表示是否已经删除,小于0就表示是自由节点。
关于删除就这样几个变量:
private int _freeList;//最后一个删除的Entry下标 private int _freeCount;//当前已删除,但是还未重新使用的节点数量 private const int StartOfFreeList = -3;//帮助寻找自由节点的一个常量
看一下StartOfFreeList和_freeList和Entry.next如何寻找自由节点
- 删除时:Entry[i].next=上一层中的StartOfFreeList-_freeList
- 添加&&_freeCount>0:_freeList=StartOfFreeList - entries[_freeList].next
请看图理解:
源码:简化版(debug理解)
源码:简化版可直接运行
public static void Main(string[] args) { Dictionary<int, char> dic = new Dictionary<int, char>(); dic.TryInsert(1, 'a'); dic.TryInsert(4, 'b'); dic.TryInsert(5, 'c'); dic.Remove(4); dic.Remove(5); dic.TryInsert(0, 'd'); dic.TryInsert(1, 'e'); }
public class Dictionary<TKey, TValue> { private int[]? _buckets; private Entry[]? _entries; private int _count; private int _freeList; private int _freeCount; private int _version; private const int StartOfFreeList = -3; public Dictionary() { /*初始值为素数,这里就不动态了,获取素数可以使用埃及筛选法*/ Initialize(7); } private int Initialize(int capacity) { int size = capacity; int[] buckets = new int[size]; Entry[] entries = new Entry[size]; _freeList = -1; _buckets = buckets; _entries = entries; return size; } public bool TryInsert(TKey key, TValue value) { Entry[]? entries = _entries; uint hashCode = (uint)key.GetHashCode(); uint collisionCount = 0; ref int bucket = ref GetBucket(hashCode); int i = bucket - 1; // Value in _buckets is 1-based if (typeof(TKey).IsValueType) { while (true) { if ((uint)i >= (uint)entries.Length) { break; } if (entries[i].hashCode == hashCode && EqualityComparer<TKey>.Default.Equals(entries[i].key, key)) { entries[i].value = value; return true; } i = entries[i].next; collisionCount++; if (collisionCount > (uint)entries.Length) { throw new Exception(""); } } } int index; if (_freeCount > 0) { index = _freeList; // Debug.Assert((StartOfFreeList - entries[_freeList].next) >= -1, "shouldn't overflow because `next` cannot underflow"); _freeList = StartOfFreeList - entries[_freeList].next; _freeCount--; } else { int count = _count; if (count == entries.Length) { //Resize(); bucket = ref GetBucket(hashCode); } index = count; _count = count + 1; entries = _entries; } ref Entry entry = ref entries![index]; entry.hashCode = hashCode; entry.next = bucket - 1; // Value in _buckets is 1-based entry.key = key; entry.value = value; // Value in _buckets is 1-based bucket = index + 1; _version++; return true; } public bool Remove(TKey key) { if (key == null) return false; if (_buckets != null) { uint collisionCount = 0; uint hashCode = (uint)key.GetHashCode(); ref int bucket = ref GetBucket(hashCode); Entry[]? entries = _entries; int last = -1; int i = bucket - 1; // Value in buckets is 1-based while (i >= 0) { ref Entry entry = ref entries[i]; if (entry.hashCode == hashCode && EqualityComparer<TKey>.Default.Equals(entry.key, key)) { if (last < 0) { bucket = entry.next + 1; } else { entries[last].next = entry.next; } entry.next = StartOfFreeList - _freeList; entry.key = default!; entry.value = default!; _freeList = i; _freeCount++; return true; } last = i; i = entry.next; collisionCount++; if (collisionCount > (uint)entries.Length) { } } } return false; } private ref int GetBucket(uint hashCode) { int[] buckets = _buckets!; return ref buckets[hashCode % (uint)buckets.Length]; } private struct Entry { public uint hashCode; //值为-1,表示是该链条最后一个节点 public int next; public TKey key; // Key of entry public TValue value; // Value of entry }