
Java HashMap Source Code Analysis and a Simple Implementation

HashMap

HashMap permits null keys and null values; it is roughly equivalent to Hashtable except that it is unsynchronized and permits nulls. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings).
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. Thus, it is important not to set the initial capacity too high (or the load factor too low) if iteration performance matters.

Changes in JDK 1.8: the principal idea is that once the number of items in a hash bucket grows beyond a certain threshold, that bucket switches from using a linked list of entries to a balanced tree. In the case of heavy hash collisions, this improves worst-case performance from O(n) to O(log n). The technique was first implemented in the java.util.concurrent.ConcurrentHashMap class (part of JEP 155), and portions of that code were reused to implement the same idea in the HashMap and LinkedHashMap classes. Only the implementations changed; no interfaces or specifications were modified, although some user-visible behaviors, such as iteration order, changed within the bounds of their existing specifications.
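The effect of this change can be observed from outside the class. In the sketch below, `BadKey` is a hypothetical key type whose constant hashCode forces every entry into a single bucket; in JDK 1.8+ that bucket is treeified once it passes the threshold, and lookups still succeed through equals():

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // Hypothetical key whose hashCode always collides, forcing one bucket.
    static final class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }  // constant hash: worst case
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        // Far more than TREEIFY_THRESHOLD (8) colliding keys: in JDK 1.8+
        // this bucket becomes a red-black tree once the table is large enough,
        // so the lookup below is O(log n) instead of O(n).
        for (int i = 0; i < 100; i++) map.put(new BadKey(i), i);
        System.out.println(map.get(new BadKey(57)));  // prints 57
    }
}
```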

//constructors
HashMap()
//Constructs an empty HashMap with the default initial capacity (16) and the default load factor (0.75).
HashMap(int initialCapacity)
//Constructs an empty HashMap with the specified initial capacity and the default load factor (0.75).
HashMap(int initialCapacity, float loadFactor)
//Constructs an empty HashMap with the specified initial capacity and load factor.
HashMap(Map<? extends K,? extends V> m)
//Constructs a new HashMap with the same mappings as the specified Map.
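The four constructors listed above can be exercised as follows:

```java
import java.util.HashMap;
import java.util.Map;

public class ConstructorDemo {
    public static void main(String[] args) {
        // Default: initial capacity 16, load factor 0.75
        Map<String, Integer> a = new HashMap<>();
        // Custom initial capacity, default load factor (0.75)
        Map<String, Integer> b = new HashMap<>(64);
        // Custom initial capacity and load factor
        Map<String, Integer> c = new HashMap<>(64, 0.5f);

        a.put("x", 1);
        // Copy constructor: same mappings as the given map
        Map<String, Integer> d = new HashMap<>(a);
        System.out.println(d.get("x"));  // prints 1
    }
}
```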

HashMap stores its elements in a hash table, giving O(1) average time for both insertion and lookup. When storing a key-value pair, the int hash returned by the key's hashCode() is first mixed with shift and XOR operations so that the hash bits are spread as evenly as possible; the resulting hash is then ANDed with (table length - 1). Because the table length is always a power of two, the result is guaranteed to fall in the range [0, table length - 1].
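In JDK 1.8 the mixing step is a single XOR of the high 16 bits into the low 16; the bucket index is then computed with a bit mask, which is equivalent to a modulo when the length is a power of two:

```java
public class HashIndexDemo {
    // JDK 1.8's HashMap.hash(): XOR the high half of the hash into the low
    // half, so small power-of-two table sizes still see the high bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;               // table length, always a power of two
        int h = hash("example");
        int index = h & (n - 1);  // same as a non-negative h % n, no division
        System.out.println(index >= 0 && index < n);  // prints true
    }
}
```

Note that `h & (n - 1)` yields a valid index even for negative hashes, which is why HashMap never needs to clear the sign bit.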

If several keys map to the same bucket index, the key's equals() method decides whether an element in the bucket is the same key; distinct elements in the same bucket are chained together in a linked list. Starting with JDK 1.8, if the table's capacity exceeds MIN_TREEIFY_CAPACITY (default 64) and the number of entries in a single bucket exceeds TREEIFY_THRESHOLD (default 8), the entries in that bucket are restructured as a red-black tree.

The two parameters that affect HashMap's performance are the load factor and the capacity. The load factor measures how full the map is allowed to get; it equals the number of key-value pairs divided by the number of buckets. The capacity is the number of buckets. The larger the capacity and the smaller the load factor, the fewer collisions occur when mapping keys into the table and the better HashMap performs, at the cost of wasted space. The defaults are 16 buckets and a load factor of 0.75.

When the number of key-value pairs exceeds the product of the bucket count and the load factor, HashMap doubles its capacity and rebuilds the hash table.
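With the default parameters the arithmetic works out as follows:

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;
        float loadFactor = 0.75f;
        // Resize threshold = capacity * loadFactor
        int threshold = (int) (capacity * loadFactor);          // 16 * 0.75
        System.out.println(threshold);                          // prints 12
        // After a resize the capacity doubles, and the threshold with it.
        System.out.println((int) (capacity * 2 * loadFactor));  // prints 24
    }
}
```

So the 13th mapping inserted into a default HashMap triggers the first resize, growing the table from 16 to 32 buckets.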
Below is a simple HashMap implementation covering the default constructor, a static inner Entry class, get(), put(), and rehash(). It does not implement the constructors taking a capacity/load factor or another Map, nor exceptions, iterators, or the JDK 1.8 red-black tree.

class MyHashMap<K, V> {
    private static final int DEFAULT_CAPACITY = 16;
    private static final float DEFAULT_FACTOR = 0.75F;
    private int capacity;
    private float loadFactor;
    private Entry<K, V>[] table;
    private int size = 0;

    static class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;   // not final: rehash() relinks entries into new buckets

        public Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public K getKey() {
            return key;
        }

        public V getValue() {
            return value;
        }

        public void setValue(V value) {
            this.value = value;
        }
    }

    @SuppressWarnings("unchecked")
    public MyHashMap() {
        capacity = DEFAULT_CAPACITY;
        loadFactor = DEFAULT_FACTOR;
        table = new Entry[capacity];
    }

    // Spread the key's hash bits so the high bits also affect the bucket index.
    private int hash(K key) {
        int h = key.hashCode();
        h ^= (h >>> 12) ^ (h >>> 1);
        return h ^ (h >>> 7);
    }

    // Valid only because the table length is always a power of two.
    private int indexOf(K key, int length) {
        return hash(key) & (length - 1);
    }

    public V get(K key) {
        for (Entry<K, V> e = table[indexOf(key, table.length)]; e != null; e = e.next) {
            if (e.getKey().equals(key))
                return e.getValue();
        }
        return null;
    }

    public void put(K key, V value) {
        int index = indexOf(key, table.length);
        for (Entry<K, V> e = table[index]; e != null; e = e.next) {
            if (e.getKey().equals(key)) {   // existing key: overwrite the value
                e.setValue(value);
                return;
            }
        }
        table[index] = new Entry<>(key, value, table[index]);   // head insertion
        size++;

        if (size >= (int) (capacity * loadFactor))   // threshold = capacity * loadFactor
            rehash();
    }

    @SuppressWarnings("unchecked")
    private void rehash() {
        Entry<K, V>[] newTable = new Entry[table.length << 1];
        for (int i = 0; i < table.length; ++i) {
            Entry<K, V> e = table[i];
            while (e != null) {
                Entry<K, V> next = e.next;   // save before relinking
                int index = indexOf(e.getKey(), newTable.length);
                e.next = newTable[index];    // head-insert into the new bucket
                newTable[index] = e;
                e = next;
            }
        }
        table = newTable;
        capacity = table.length;
    }
}
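A quick exercise of the same put/get/rehash contract (shown here against java.util.HashMap, since MyHashMap deliberately mirrors its basic API):

```java
import java.util.HashMap;
import java.util.Map;

public class MyHashMapUsage {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        // Insert well past capacity * loadFactor (16 * 0.75 = 12) entries,
        // so at least one rehash/doubling occurs along the way.
        for (int i = 0; i < 100; i++) map.put("key" + i, i);
        map.put("key50", -1);                  // same key: value is overwritten
        System.out.println(map.size());        // prints 100 (no duplicate keys)
        System.out.println(map.get("key50"));  // prints -1
    }
}
```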

Because HashMap is not thread-safe, it uses a fail-fast mechanism to detect illegal modification during iteration as early as possible. It works as follows: every structural modification to the HashMap (put, clear, remove, and so on) increments a modCount field; when an iterator is created, it saves the current modCount in an expectedModCount field, and if the two values ever differ during iteration, a ConcurrentModificationException is thrown. When a remove is performed through the iterator itself, the iterator updates expectedModCount to the new modCount. The fail-fast mechanism does not provide synchronization between threads; it only detects errors quickly.
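Both sides of this behavior can be demonstrated directly:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        Iterator<String> it = map.keySet().iterator();
        it.next();
        map.put("c", 3);   // structural modification: modCount++
        try {
            it.next();     // expectedModCount != modCount
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast triggered");
        }

        // Removing through the iterator is safe: it re-syncs expectedModCount.
        Iterator<String> it2 = map.keySet().iterator();
        it2.next();
        it2.remove();
        System.out.println(map.size());  // prints 2
    }
}
```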

put flow

  1. Check whether the table is empty; if so, initialize it (inside the resize method)
  2. Locate the bucket from the key's hash; if the bucket is empty, there is no hash collision, so create a new node at that position directly
  3. If the bucket is occupied (hash collision), compare the bucket's key with the key being written; if they are equal, overwrite the value
  4. If the bucket holds a red-black tree, insert following the red-black tree rules
  5. If it holds a linked list, wrap the key and value in a new node and append it to the list; then, if the list has grown past the treeify threshold, convert it to a red-black tree
  6. If a matching key is found while walking the list, stop immediately (and overwrite the value)
  7. Finally, check whether the table needs to be resized

get flow

  1. Hash the key and locate the bucket
  2. If the bucket is empty, return null; otherwise check whether the key at the bucket's first position (which may head a linked list or a red-black tree) equals the queried key, and if so, return its value
  3. If the first position does not match, search the rest of the bucket as a linked list or as a red-black tree, as appropriate

Differences from Hashtable

Both HashMap and Hashtable are backed by an array-of-buckets structure with linked-list chaining. HashMap permits null keys and values; Hashtable permits neither.

Hashtable achieves thread safety by locking the whole table (its get and put methods are synchronized), which is inefficient; HashMap is not thread-safe, and its concurrent counterpart ConcurrentHashMap is optimized for concurrency and much faster.

HashMap's initial size is 16 and resizing doubles it (newsize = oldsize * 2); Hashtable's initial size is 11 and newsize = oldsize * 2 + 1.

HashMap computes the bucket index as index = hash & (tab.length - 1); Hashtable computes it as index = (hash & 0x7FFFFFFF) % tab.length.
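The two index formulas behave the same way for non-negative hashes but take different routes for negative ones: HashMap's mask discards the sign bit implicitly, while Hashtable must clear it before the modulo:

```java
public class IndexDemo {
    public static void main(String[] args) {
        int hash = -123456789;                       // a negative hash code

        int len = 16;                                // HashMap: power-of-two length
        int hmIndex = hash & (len - 1);              // bit mask, safe for negatives

        int tabLen = 11;                             // Hashtable: default length 11
        int htIndex = (hash & 0x7FFFFFFF) % tabLen;  // clear sign bit, then modulo

        // Both results are valid, non-negative indices into their tables.
        System.out.println(hmIndex >= 0 && hmIndex < len);       // prints true
        System.out.println(htIndex >= 0 && htIndex < tabLen);    // prints true
    }
}
```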

Summary

HashMap performs no synchronization in either 1.7 or 1.8. In 1.7, a rehash could enter an infinite loop:

When two threads put concurrently and both trigger a resize, the head-insertion transfer could turn a bucket's linked list into a cycle.

This was fixed in 1.8:

During a resize, the new lists preserve the same order as the original lists.
