JDK Collections Source Code: ArrayList

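We use the collections the JDK provides all the time, such as ArrayList, LinkedList and HashMap, and we have learned how they differ: ArrayList is fast for lookups but slow for inserts and deletes, while LinkedList is the opposite. Without reading the underlying source, though, it is hard to understand why. ArrayList is backed by an array, a contiguous block of memory, so reading an element by index takes constant time; inserting or deleting in the middle, however, means copying the elements that follow. LinkedList is backed by a linked list (strictly speaking, a doubly linked list), so inserting or deleting a node only rewires a few pointers once the position has been reached, but reaching a given index means walking the list. The two structures therefore suit different workloads. The iterator design in the JDK collections package is elegant: it hides the differences between the underlying data structures and gives callers one uniform interface for traversing elements. Below we walk through the JDK source to see how ArrayList is implemented.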

ArrayList extends AbstractList. AbstractList defines the common add, remove, get and set operations and, notably, also provides default iterator implementations, both a one-way iterator and a bidirectional one.

private transient Object[] elementData;

The array elementData above is the container that holds our elements; it is an array of Object.

First, the constructors:

/**
     * Constructs an empty list with the specified initial capacity.
     *
     * @param   initialCapacity   the initial capacity of the list
     * @exception IllegalArgumentException if the specified initial capacity
     *            is negative
     */
    public ArrayList(int initialCapacity) {
	super();
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal Capacity: "+
                                               initialCapacity);
	this.elementData = new Object[initialCapacity];
    }

    /**
     * Constructs an empty list with an initial capacity of ten.
     */
    public ArrayList() {
	this(10);
    }

    /**
     * Constructs a list containing the elements of the specified
     * collection, in the order they are returned by the collection's
     * iterator.
     *
     * @param c the collection whose elements are to be placed into this list
     * @throws NullPointerException if the specified collection is null
     */
    public ArrayList(Collection<? extends E> c) {
	elementData = c.toArray();
	size = elementData.length;
	// c.toArray might (incorrectly) not return Object[] (see 6260652)
	if (elementData.getClass() != Object[].class)
	    elementData = Arrays.copyOf(elementData, size, Object[].class);
    }

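The most commonly used one is the second, the no-argument constructor. It simply calls the first constructor with 10, which is the capacity of the array, not the number of elements actually stored (the size). So by default we get an array with capacity 10. That part is simple; the more interesting one is the last constructor, which builds an ArrayList from an existing collection. Note this check: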

elementData.getClass() != Object[].class

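This check works around JDK bug 6260652. The bug concerns Arrays.asList: calling toArray on the List it returns does not yield an Object[] but an array of the concrete element type; for example, Arrays.asList("a").toArray() returns a String[]. If an ArrayList were built from such a List and kept that array as its elementData, adding an element of another type (say an Integer, in an ArrayList<Object>) would throw ArrayStoreException. The constructor therefore copies the elements into a genuine Object[] with Arrays.copyOf(elementData, size, Object[].class).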
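The root cause is array covariance: a String[] can be referenced as an Object[], but the JVM still checks every store against the array's real element type. The sketch below reproduces this directly with a String[] rather than through Arrays.asList, because what Arrays.asList(...).toArray() returns differs between JDK versions. The class and method names are hypothetical; only Arrays.copyOf is the real API used by the constructor.

```java
import java.util.Arrays;

// Hypothetical demo class: why ArrayList(Collection) copies a non-Object[] array.
final class CovariantArrayDemo {

    // Same check as the constructor: copy into a genuine Object[] when needed.
    static Object[] asObjectArray(Object[] data) {
        if (data.getClass() != Object[].class) {
            return Arrays.copyOf(data, data.length, Object[].class);
        }
        return data;
    }

    // Returns true if storing an Integer into slot 0 throws ArrayStoreException.
    static boolean storeFails(Object[] data) {
        try {
            data[0] = Integer.valueOf(1);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A String[] viewed as Object[], like the array a buggy toArray() returns.
        Object[] fromToArray = new String[] {"a"};
        System.out.println(storeFails(fromToArray));                // true
        System.out.println(storeFails(asObjectArray(fromToArray))); // false
    }
}
```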

Next, the add method:

public boolean add(E e) {
	ensureCapacity(size + 1);  // Increments modCount!!
	elementData[size++] = e;
	return true;
    }

ensureCapacity is a key method. First, it makes sure the array has room for the new element; second, it increments a very important variable:

public void ensureCapacity(int minCapacity) {
	modCount++;
	int oldCapacity = elementData.length;
	if (minCapacity > oldCapacity) {
	    Object oldData[] = elementData;
	    int newCapacity = (oldCapacity * 3)/2 + 1;
    	    if (newCapacity < minCapacity)
		newCapacity = minCapacity;
            // minCapacity is usually close to size, so this is a win:
            elementData = Arrays.copyOf(elementData, newCapacity);
	}
    }

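modCount is a field inherited from the parent class, AbstractList. It is incremented every time the list is structurally modified, so it records how many times the list's structure has changed. It is used mainly by iterators: it lets an iterator detect concurrent modification, or a structural change (add, remove and so on) made during traversal through methods other than the iterator's own. The mechanism is that the iterator records modCount when it is created and checks on each step whether the value has changed. We will come back to iterators later. Returning to ensureCapacity: if the current capacity is insufficient, the array grows to (oldCapacity * 3) / 2 + 1, roughly 1.5 times the old capacity plus one, and if even that is smaller than minCapacity, minCapacity is used instead. Note that in this version modCount is incremented even when no growth is needed. add then stores the new element at index size and increments size.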
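The growth rule can be traced with a small, hypothetical helper that copies the formula from the quoted ensureCapacity. Later JDKs compute the new capacity differently, so the sequence below applies to this version of the code only.

```java
// Hypothetical helper reproducing the quoted growth rule:
// newCapacity = oldCapacity * 3 / 2 + 1, raised to minCapacity if still too small.
final class GrowthDemo {

    static int newCapacity(int oldCapacity, int minCapacity) {
        if (minCapacity <= oldCapacity) {
            return oldCapacity; // enough room, no reallocation
        }
        int newCapacity = (oldCapacity * 3) / 2 + 1;
        if (newCapacity < minCapacity) {
            newCapacity = minCapacity;
        }
        return newCapacity;
    }

    public static void main(String[] args) {
        // Start from the default capacity of 10 and add 60 elements one at a time.
        int capacity = 10;
        StringBuilder trace = new StringBuilder().append(capacity);
        for (int size = 0; size < 60; size++) {
            int grown = newCapacity(capacity, size + 1);
            if (grown != capacity) {
                capacity = grown;
                trace.append(" -> ").append(capacity);
            }
        }
        System.out.println(trace); // 10 -> 16 -> 25 -> 38 -> 58 -> 88
    }
}
```

Each reallocation copies the whole array, which is why pre-sizing with new ArrayList(int) helps when the final size is known in advance.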

Next, inserting at an arbitrary position:

public void add(int index, E element) {
	if (index > size || index < 0)
	    throw new IndexOutOfBoundsException(
		"Index: "+index+", Size: "+size);

	ensureCapacity(size+1);  // Increments modCount!!
	System.arraycopy(elementData, index, elementData, index + 1,
			 size - index);
	elementData[index] = element;
	size++;
    }

Suppose the current storage layout is as shown in the figure below:

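Inserting d at position 1 takes two steps: shift the elements from position 1 (inclusive) onward back by one slot, then put d at position 1. The shift is done with System.arraycopy.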

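The call copies size - index elements of the array, starting at position index = 1, to the same array starting at position index + 1 = 2; in other words, every element from index onward moves back by (index + 1) - index = 1 slot. Copying within one array is safe because System.arraycopy behaves as if the source range were first copied to a temporary array.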
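Here is the same shift on a plain array, as a minimal sketch. The helper name is hypothetical, and the initial contents a, b, c (size 3, capacity 4) are assumed for illustration.

```java
import java.util.Arrays;

// Hypothetical helper mirroring the shift in add(int index, E element).
final class InsertShiftDemo {

    // Inserts element at index; data must have room for size + 1 elements.
    // Returns the new size.
    static int insert(Object[] data, int size, int index, Object element) {
        if (index > size || index < 0) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        // Shift data[index..size-1] one slot to the right (overlap is handled by arraycopy).
        System.arraycopy(data, index, data, index + 1, size - index);
        data[index] = element;
        return size + 1;
    }

    public static void main(String[] args) {
        Object[] data = {"a", "b", "c", null}; // size 3, capacity 4
        int size = insert(data, 3, 1, "d");
        System.out.println(size + " " + Arrays.toString(data)); // 4 [a, d, b, c]
    }
}
```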

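Having covered single-element insertion, let's analyze the more involved case of inserting several elements at an arbitrary position.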

public boolean addAll(int index, Collection<? extends E> c) {
	if (index > size || index < 0)
	    throw new IndexOutOfBoundsException(
		"Index: " + index + ", Size: " + size);

	Object[] a = c.toArray();
	int numNew = a.length;
	ensureCapacity(size + numNew);  // Increments modCount

	int numMoved = size - index;
	if (numMoved > 0)
	    System.arraycopy(elementData, index, elementData, index + numNew,
			     numMoved);

        System.arraycopy(a, 0, elementData, index, numNew);
	size += numNew;
	return numNew != 0;
    }

Suppose the initial layout is as follows, and the elements to insert are d and e:

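In addAll, numNew is the number of elements in the collection being inserted; it is also the number of slots by which the element at index and everything after it must move. If the insertion point is the end of the ArrayList, a single arraycopy is enough: to insert at position 3 (here equal to size), the elements of a are copied, starting from 0, into elementData starting at index = 3, numNew = 2 elements in total.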

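If the insertion point is not at the end, things are slightly more complicated because the existing elements must be shifted first. With an insertion point of 1, all size - index = 2 elements from position 1 onward move back by numNew = 2 slots.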

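That is, System.arraycopy(elementData, 1, elementData, 3, 2). A second arraycopy, System.arraycopy(a, 0, elementData, 1, 2), then copies the new elements into the gap and completes the insertion.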
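The two arraycopy calls can be sketched on a plain array that already has enough capacity. The helper name is hypothetical, and the initial contents a, b, c (size 3, capacity 5) are assumed.

```java
import java.util.Arrays;

// Hypothetical helper mirroring the two copies in addAll(int index, Collection c).
final class InsertAllDemo {

    // Assumes 0 <= index <= size and that data has room for size + a.length elements.
    static int insertAll(Object[] data, int size, int index, Object[] a) {
        int numNew = a.length;
        int numMoved = size - index;
        if (numMoved > 0) {
            // Step 1: open a gap of numNew slots starting at index.
            System.arraycopy(data, index, data, index + numNew, numMoved);
        }
        // Step 2: copy the new elements into the gap (or onto the end).
        System.arraycopy(a, 0, data, index, numNew);
        return size + numNew;
    }

    public static void main(String[] args) {
        Object[] data = {"a", "b", "c", null, null}; // size 3, capacity 5
        int size = insertAll(data, 3, 1, new Object[] {"d", "e"});
        System.out.println(size + " " + Arrays.toString(data)); // 5 [a, d, e, b, c]
    }
}
```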

That covers insertion, which is done mainly with System.arraycopy. Removal, as you might expect, also uses System.arraycopy, only shifting elements forward instead of back.

public E remove(int index) {
	RangeCheck(index);

	modCount++;
	E oldValue = (E) elementData[index];

	int numMoved = size - index - 1;
	if (numMoved > 0)
	    System.arraycopy(elementData, index+1, elementData, index,
			     numMoved);
	elementData[--size] = null; // Let gc do its work

	return oldValue;
    }

Suppose the initial layout is as follows:

We want to delete e, at index 2. The size - index - 1 = 2 elements starting at index + 1 = 3 move forward by one slot. The layout after the move:

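Finally, size is decremented and the slot at the new size, which still holds a stale reference to the old last element, is set to null so the garbage collector can reclaim the object. That completes insertion and removal; lookup comes next.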
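Before moving on, the removal steps can be sketched on a plain array. The helper name is hypothetical, and the contents a, d, e, b, c (e at index 2, size 5) are assumed for illustration.

```java
import java.util.Arrays;

// Hypothetical helper mirroring remove(int index) on a plain array.
final class RemoveShiftDemo {

    // Removes data[index] from the first size slots and returns it.
    // The caller must treat the list as having size - 1 elements afterwards.
    static Object removeAt(Object[] data, int size, int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        Object oldValue = data[index];
        int numMoved = size - index - 1;
        if (numMoved > 0) {
            // Shift data[index+1..size-1] one slot to the left.
            System.arraycopy(data, index + 1, data, index, numMoved);
        }
        data[size - 1] = null; // drop the stale reference so it can be garbage collected
        return oldValue;
    }

    public static void main(String[] args) {
        Object[] data = {"a", "d", "e", "b", "c"};
        Object removed = removeAt(data, 5, 2);
        System.out.println(removed + " " + Arrays.toString(data)); // e [a, d, b, c, null]
    }
}
```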

public int indexOf(Object o) {
	if (o == null) {
	    for (int i = 0; i < size; i++)
		if (elementData[i]==null)
		    return i;
	} else {
	    for (int i = 0; i < size; i++)
		if (o.equals(elementData[i]))
		    return i;
	}
	return -1;
    }

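ArrayList searches from the front and returns the index of the first matching element. Because an ArrayList may store null, the search has two branches: null is matched with ==, and a non-null argument is matched with equals (calling o.equals on a null o would throw NullPointerException). If nothing matches, it returns -1. Note that searching by value is therefore linear even in an ArrayList; only access by index is constant time.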
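A minimal sketch of the same two-branch search on a plain array, next to the real ArrayList.indexOf. The helper name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper repeating the two-branch search of indexOf on a plain array.
final class IndexOfDemo {

    static int indexOf(Object[] data, int size, Object o) {
        if (o == null) {
            for (int i = 0; i < size; i++) {
                if (data[i] == null) {
                    return i; // null can only be matched with ==
                }
            }
        } else {
            for (int i = 0; i < size; i++) {
                if (o.equals(data[i])) {
                    return i; // o is non-null, so calling o.equals is safe
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList("a", null, "b", null));
        System.out.println(list.indexOf(null)); // 1, the first match
        System.out.println(list.indexOf("b"));  // 2
        System.out.println(list.indexOf("z"));  // -1
        Object[] data = list.toArray();
        System.out.println(indexOf(data, data.length, null)); // 1
    }
}
```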

public void trimToSize() {
	modCount++;
	int oldCapacity = elementData.length;
	if (size < oldCapacity) {
            elementData = Arrays.copyOf(elementData, size);
	}
    }

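The name trimToSize already tells you what the method does; as Clean Code puts it, a method name should reveal its intent. It discards the unused part of the array so that the capacity equals size exactly, using Arrays.copyOf. Because this involves copying the array, it should not be called casually: the copy is fairly expensive.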
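The capacity is not visible through ArrayList's public API, so the sketch below shows the effect on a plain array with a hypothetical helper and then the typical use of the real method: trim once, after the list has stopped growing.

```java
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical helper showing what trimToSize does to the backing array.
final class TrimDemo {

    // Returns an array of exactly size slots, copying only when there is spare capacity.
    static Object[] trim(Object[] data, int size) {
        return size < data.length ? Arrays.copyOf(data, size) : data;
    }

    public static void main(String[] args) {
        Object[] data = new Object[10]; // capacity 10
        data[0] = "a";
        data[1] = "b";                  // size 2
        System.out.println(trim(data, 2).length); // 2

        ArrayList<String> list = new ArrayList<String>(1000);
        list.add("a");
        list.trimToSize(); // size and contents are unchanged; only spare capacity is released
        System.out.println(list.size()); // 1
    }
}
```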

/**
     * Returns a shallow copy of this <tt>ArrayList</tt> instance.  (The
     * elements themselves are not copied.)
     *
     * @return a clone of this <tt>ArrayList</tt> instance
     */
    public Object clone() {
	try {
	    ArrayList<E> v = (ArrayList<E>) super.clone();
	    v.elementData = Arrays.copyOf(elementData, size);
	    v.modCount = 0;
	    return v;
	} catch (CloneNotSupportedException e) {
	    // this shouldn't happen, since we are Cloneable
	    throw new InternalError();
	}
    }

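As its Javadoc says, ArrayList.clone returns a shallow copy: the clone gets its own backing array (Arrays.copyOf(elementData, size)), but if the list stores objects, only the references are copied, not the objects themselves. The clone's modCount is also reset to 0.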
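A short demo of what "shallow" means in practice: the two lists share element objects but not backing arrays. StringBuilder is used only because it is a mutable element type; the class and method names are hypothetical.

```java
import java.util.ArrayList;

// Hypothetical demo of ArrayList.clone() being a shallow copy.
final class ShallowCloneDemo {

    @SuppressWarnings("unchecked")
    static ArrayList<StringBuilder> cloneList(ArrayList<StringBuilder> list) {
        return (ArrayList<StringBuilder>) list.clone(); // clone() is declared to return Object
    }

    public static void main(String[] args) {
        ArrayList<StringBuilder> original = new ArrayList<StringBuilder>();
        original.add(new StringBuilder("x"));
        ArrayList<StringBuilder> copy = cloneList(original);

        // Element objects are shared: a change made through the copy shows up in the original.
        copy.get(0).append("y");
        System.out.println(original.get(0)); // xy

        // Backing arrays are separate: adding to the copy does not change the original's size.
        copy.add(new StringBuilder("z"));
        System.out.println(original.size() + " " + copy.size()); // 1 2
    }
}
```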

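ArrayList extends AbstractList and implements four interfaces: List<E>, RandomAccess, Cloneable and java.io.Serializable. The last three are marker interfaces: they declare no methods and merely state that ArrayList supports fast random access, cloning and serialization, so that code can detect these capabilities with the instanceof operator. If a class does not implement Cloneable, for instance, calling Object's clone method on it throws CloneNotSupportedException.

Now let's look at the iterators AbstractList provides to its subclasses, starting with the one-way iterator, which can only move forward. Itr is an inner class of AbstractList with three main fields: cursor, initially 0, is the index of the element the next call to next will return; lastRet is the index of the element returned by the most recent call to next (or previous), or -1 if there is none; and expectedModCount, initialized to modCount, is used to detect illegal structural modification of the list during iteration.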

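hasNext returns true if cursor != size and false otherwise. cursor is advanced in next: each call reads the element at cursor, assigns cursor's value to lastRet, and then increments cursor. So whenever lastRet is not -1, it holds the index of the element just returned. What lastRet is for is not yet obvious; it becomes clear in the remove method, which allows safe deletion during iteration.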

public void remove() {
	    if (lastRet == -1)
		throw new IllegalStateException();
            checkForComodification();

	    try {
		AbstractList.this.remove(lastRet);
		if (lastRet < cursor)
		    cursor--;
		lastRet = -1;
		expectedModCount = modCount;
	    } catch (IndexOutOfBoundsException e) {
		throw new ConcurrentModificationException();
	    }
	}

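If lastRet == -1, remove throws IllegalStateException. When is lastRet -1? First, right after the iterator is created, if remove is called before next has ever been called; second, if remove is called again after the current element has already been removed. AbstractList.this.remove(lastRet) removes the element most recently returned. If that element lay before cursor (the case after next), the following elements have all shifted forward by one, so cursor is decremented to stay on the same upcoming element. Because removal changes modCount, expectedModCount is then re-synchronized.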
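To see cursor, lastRet and expectedModCount working together, here is a deliberately stripped-down, hypothetical list (fixed capacity of 16, no growth) whose iterator follows the next and remove logic described above. It is a sketch for illustration, not the JDK class.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical mini list: fixed capacity, just enough to show Itr's bookkeeping.
final class MiniList<E> {
    private final Object[] data = new Object[16];
    private int size;
    private int modCount;

    void add(E e) { data[size++] = e; modCount++; }

    int size() { return size; }

    @SuppressWarnings("unchecked")
    E get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("Index: " + i);
        return (E) data[i];
    }

    E removeAt(int index) {
        E old = get(index);
        System.arraycopy(data, index + 1, data, index, size - index - 1);
        data[--size] = null;
        modCount++;
        return old;
    }

    Iterator<E> iterator() {
        return new Iterator<E>() {
            int cursor = 0;                  // index of the element next() will return
            int lastRet = -1;                // index returned by the last next(); -1 if none
            int expectedModCount = modCount; // snapshot for the fail-fast check

            public boolean hasNext() { return cursor != size; }

            public E next() {
                checkForComodification();
                if (cursor >= size) throw new NoSuchElementException();
                E next = get(cursor);
                lastRet = cursor++;
                return next;
            }

            public void remove() {
                if (lastRet == -1) throw new IllegalStateException();
                checkForComodification();
                removeAt(lastRet);
                if (lastRet < cursor) cursor--; // later elements shifted left by one
                lastRet = -1;                   // forbid a second remove() for the same next()
                expectedModCount = modCount;    // our own change is legitimate
            }

            private void checkForComodification() {
                if (modCount != expectedModCount) throw new ConcurrentModificationException();
            }
        };
    }

    public static void main(String[] args) {
        MiniList<Integer> list = new MiniList<Integer>();
        for (int i = 1; i <= 5; i++) list.add(i);
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) it.remove(); // safe removal during iteration
        }
        System.out.println(list.size()); // 3
    }
}
```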

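The plain one-way iterator offers only remove. The bidirectional iterator (ListItr) extends the one-way iterator, additionally provides set and add, and lets the cursor move backward.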

public boolean hasPrevious() {
	    return cursor != 0;
	}

        public E previous() {
            checkForComodification();
            try {
                int i = cursor - 1;
                E previous = get(i);
                lastRet = cursor = i;
                return previous;
            } catch (IndexOutOfBoundsException e) {
                checkForComodification();
                throw new NoSuchElementException();
            }
        }

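hasPrevious returns true while cursor != 0, meaning there are still elements before it. previous returns the element just before cursor: if cursor is 5, it returns the element at index 4, and both cursor and lastRet are set to cursor - 1, so both point at the element this call returned. The modification operations follow.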
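Before looking at them, the backward traversal can be observed through the public ListIterator interface. Newer JDKs give ArrayList its own iterator classes instead of inheriting AbstractList's, but the cursor behavior seen through ListIterator is the same; the class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

// Hypothetical demo of hasPrevious()/previous() and the cursor position.
final class BackwardDemo {

    // Walks a list from the end to the front.
    static List<String> reversed(List<String> list) {
        List<String> out = new ArrayList<String>();
        ListIterator<String> it = list.listIterator(list.size()); // cursor starts at size
        while (it.hasPrevious()) {   // cursor != 0
            out.add(it.previous());  // element at cursor - 1; cursor then moves back by one
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        System.out.println(reversed(list)); // [c, b, a]

        ListIterator<String> it = list.listIterator();
        it.next();                          // returns "a", cursor = 1
        it.next();                          // returns "b", cursor = 2
        System.out.println(it.previous());  // b, the same element again
        System.out.println(it.nextIndex()); // 1
    }
}
```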

public void set(E e) {
	    if (lastRet == -1)
		throw new IllegalStateException();
            checkForComodification();

	    try {
		AbstractList.this.set(lastRet, e);
		expectedModCount = modCount;
	    } catch (IndexOutOfBoundsException ex) {
		throw new ConcurrentModificationException();
	    }
	}

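Because set may be called several times on the same element, it does not reset lastRet to -1. Note that set only works after a call to next or previous, and not after remove or add has reset lastRet.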
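A small demo of these rules on a real ArrayList: set can replace each element in place during iteration, may be repeated on the same element, and is rejected before any next call. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

// Hypothetical demo of ListIterator.set rules.
final class SetDemo {

    // Upper-cases every element in place through the iterator.
    static void upperCaseAll(List<String> list) {
        for (ListIterator<String> it = list.listIterator(); it.hasNext(); ) {
            it.set(it.next().toUpperCase()); // next() sets lastRet, so set() is allowed
        }
    }

    // Returns true if set() before any next()/previous() throws IllegalStateException.
    static boolean setBeforeNextFails(List<String> list) {
        try {
            list.listIterator().set("x");
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList("a", "b"));
        upperCaseAll(list);
        System.out.println(list); // [A, B]

        ListIterator<String> it = list.listIterator();
        it.next();
        it.set("x");
        it.set("y"); // a second set() on the same element is allowed
        System.out.println(list); // [y, B]
        System.out.println(setBeforeNextFails(list)); // true
    }
}
```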

public void add(E e) {
            checkForComodification();

	    try {
		AbstractList.this.add(cursor++, e);
		lastRet = -1;
		expectedModCount = modCount;
	    } catch (IndexOutOfBoundsException ex) {
		throw new ConcurrentModificationException();
	    }
	}
    }

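add inserts the new element at cursor, that is, immediately before the element the next call to next would return, then increments cursor and sets lastRet to -1. A subsequent next is therefore unaffected, while a subsequent previous would return the new element. It also means remove or set cannot be called right after add; either would throw IllegalStateException.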
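A minimal demo on a real ArrayList: inserting a separator after each element while iterating, plus the IllegalStateException from set right after add. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

// Hypothetical demo of ListIterator.add semantics on an ArrayList.
final class IteratorAddDemo {

    // Inserts sep between neighbouring elements, in place.
    static void interleave(List<String> list, String sep) {
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            it.next();
            if (it.hasNext()) {
                it.add(sep); // inserted at cursor; cursor moves past it, so next() is unaffected
            }
        }
    }

    // Returns true if set() right after add() throws IllegalStateException.
    static boolean setAfterAddFails() {
        ListIterator<String> it = new ArrayList<String>(Arrays.asList("a")).listIterator();
        it.add("x"); // lastRet is reset to -1
        try {
            it.set("y");
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        interleave(list, "-");
        System.out.println(list);               // [a, -, b, -, c]
        System.out.println(setAfterAddFails()); // true
    }
}
```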

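Modifications made through the iterator's own methods (remove, set, add) therefore do not disrupt the iteration, because each one re-synchronizes expectedModCount (and, where needed, cursor). Structural changes made directly on the list during iteration, by contrast, are detected by checkForComodification and cause a ConcurrentModificationException.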
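The contrast can be demonstrated on a real ArrayList: removing through the iterator works, while removing through the list inside a for-each loop trips the fail-fast check. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo contrasting iterator removal with direct list removal.
final class FailFastDemo {

    // Correct: remove through the iterator, which re-syncs expectedModCount.
    static void removeEmpty(List<String> list) {
        for (Iterator<String> it = list.iterator(); it.hasNext(); ) {
            if (it.next().isEmpty()) {
                it.remove();
            }
        }
    }

    // Incorrect: removes through the list inside a for-each loop.
    // Returns true if the iterator detected the change.
    static boolean directRemoveFails(List<String> list) {
        try {
            for (String s : list) {
                if (s.isEmpty()) {
                    list.remove(s); // bumps modCount behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<String>(Arrays.asList("", "x", "", "y"));
        removeEmpty(a);
        System.out.println(a); // [x, y]

        List<String> b = new ArrayList<String>(Arrays.asList("", "x", "y"));
        System.out.println(directRemoveFails(b)); // true
    }
}
```

The check is best-effort: it runs inside next, so some changes slip through. Removing the second-to-last element, for example, makes hasNext return false before next runs again. The Javadoc accordingly says fail-fast behavior should only be relied on to detect bugs.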
