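The Collections Utility Class
The Collections utility class comes with its own sort() method. I was curious how it is implemented, so I read the source, and I have to admire it. It stacks up optimizations that let it adapt to inputs with special structure, such as arrays that are already sorted, reversed, or made of sorted stretches. Its best-case time complexity is O(n) and its worst case is O(n log n), which puts it among the best general-purpose comparison sorts in practice. It is also a stable sort.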
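The stability claim is easy to check for yourself. The sketch below is my own illustration; the class names StabilityDemo and Item are hypothetical. It sorts records by key with Collections.sort() and a Comparator, and records with equal keys keep their original relative order. For an ArrayList, Collections.sort() delegates to ArrayList.sort(). That method calls Arrays.sort(), so the work ends up in TimSort unless legacy merge sort is requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class StabilityDemo {
    // Hypothetical record type: only key is compared, tag remembers the input order
    static final class Item {
        final int key;
        final String tag;

        Item(int key, String tag) {
            this.key = key;
            this.tag = tag;
        }

        @Override
        public String toString() {
            return key + tag;
        }
    }

    // Sorts a copy by key only; a stable sort keeps equal keys in input order
    static List<Item> sortByKey(List<Item> items) {
        List<Item> copy = new ArrayList<>(items);
        Collections.sort(copy, Comparator.comparingInt((Item it) -> it.key));
        return copy;
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item(2, "a"), new Item(1, "b"), new Item(2, "c"), new Item(1, "d"));
        System.out.println(sortByKey(items)); // [1b, 1d, 2a, 2c]
    }
}
```

Element "b" still comes before "d", and "a" still comes before "c". An unstable sort would be allowed to swap either pair.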
Collections.sort(list, c) simply calls list.sort(c), which ends up in the sort method of the Arrays class:
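public static <T> void sort(T[] a, Comparator<? super T> c) {
    if (c == null) {
        sort(a);
    } else {
        // Legacy merge sort is used only when explicitly requested
        // (system property java.util.Arrays.useLegacyMergeSort=true)
        if (LegacyMergeSort.userRequested)
            legacyMergeSort(a, c);
        else
            TimSort.sort(a, 0, a.length, c, null, 0, 0);
    }
}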
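The traditional merge sort, legacyMergeSort(), is used only when LegacyMergeSort.userRequested is true. That happens when the system property java.util.Arrays.useLegacyMergeSort is set to true, either with System.setProperty() or with -Djava.util.Arrays.useLegacyMergeSort=true. The JDK marks this path for removal in a future release.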
So let's focus on the sort method in TimSort:
static <T> void sort(T[] a, int lo, int hi, Comparator<? super T> c,
                     T[] work, int workBase, int workLen) {
    assert c != null && a != null && lo >= 0 && lo <= hi && hi <= a.length;
    int nRemaining = hi - lo;
    if (nRemaining < 2)
        return;  // Arrays of size 0 and 1 are always sorted
    // If array is small, do a "mini-TimSort" with no merges
    // (fewer than MIN_MERGE = 32 elements: an optimized binary insertion sort)
    if (nRemaining < MIN_MERGE) {
        int initRunLen = countRunAndMakeAscending(a, lo, hi, c);
        binarySort(a, lo, hi, lo + initRunLen, c);
        return;
    }
    /**
     * March over the array once, left to right, finding natural runs,
     * extending short natural runs to minRun elements, and merging runs
     * to maintain stack invariant.
     */
    TimSort<T> ts = new TimSort<>(a, c, work, workBase, workLen);
    int minRun = minRunLength(nRemaining);
    do {
        // Identify next run
        int runLen = countRunAndMakeAscending(a, lo, hi, c);
        // If run is short, extend to min(minRun, nRemaining)
        if (runLen < minRun) {
            int force = nRemaining <= minRun ? nRemaining : minRun;
            binarySort(a, lo, lo + force, lo + runLen, c);
            runLen = force;
        }
        // Push run onto pending-run stack, and maybe merge
        ts.pushRun(lo, runLen);
        ts.mergeCollapse();
        // Advance to find next run
        lo += runLen;
        nRemaining -= runLen;
    } while (nRemaining != 0);
    // Merge all remaining runs to complete sort
    assert lo == hi;
    ts.mergeForceCollapse();
    assert ts.stackSize == 1;
}
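Case 1, if (nRemaining < MIN_MERGE):
countRunAndMakeAscending() returns the number of elements, starting at lo, that are already in order. A non-descending run is counted as is. A strictly descending run is first reversed in place so that it becomes ascending. The condition is strict because reversing equal elements would break stability. For example, for 1,2,3,8,1,2,3 it returns 4, because the run 1,2,3,8 ends where 1 follows 8. A descending prefix works the same way after the reversal.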
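The post does not list the source of countRunAndMakeAscending(), so here is a sketch of the behavior just described. I wrote it to mirror the JDK logic as I understand it; the class name RunCounter is hypothetical. It reproduces the example above and shows a descending run being reversed.

```java
import java.util.Arrays;
import java.util.Comparator;

final class RunCounter {
    // Returns the length of the run starting at lo (hi is exclusive, lo < hi).
    // A non-descending run is counted as is; a strictly descending run is
    // reversed in place, so a[lo, lo + result) ends up ascending either way.
    static <T> int countRunAndMakeAscending(T[] a, int lo, int hi, Comparator<? super T> c) {
        int runHi = lo + 1;
        if (runHi == hi) {
            return 1;
        }
        if (c.compare(a[runHi++], a[lo]) < 0) {
            // Descending: only strictly smaller elements extend the run,
            // so reversing it never swaps two equal elements (stability)
            while (runHi < hi && c.compare(a[runHi], a[runHi - 1]) < 0) {
                runHi++;
            }
            reverseRange(a, lo, runHi);
        } else {
            // Ascending: equal neighbors are allowed (>= 0)
            while (runHi < hi && c.compare(a[runHi], a[runHi - 1]) >= 0) {
                runHi++;
            }
        }
        return runHi - lo;
    }

    // Reverses a[lo, hi) in place
    static void reverseRange(Object[] a, int lo, int hi) {
        hi--;
        while (lo < hi) {
            Object t = a[lo];
            a[lo++] = a[hi];
            a[hi--] = t;
        }
    }

    public static void main(String[] args) {
        Integer[] asc = {1, 2, 3, 8, 1, 2, 3};
        System.out.println(countRunAndMakeAscending(asc, 0, asc.length, Comparator.<Integer>naturalOrder())); // 4
        Integer[] desc = {9, 5, 2, 7};
        System.out.println(countRunAndMakeAscending(desc, 0, desc.length, Comparator.<Integer>naturalOrder())); // 3
        System.out.println(Arrays.toString(desc)); // [2, 5, 9, 7]
    }
}
```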
After that comes a binary insertion sort. Here is binarySort():
private static <T> void binarySort(T[] a, int lo, int hi, int start,
                                   Comparator<? super T> c) {
    assert lo <= start && start <= hi;
    if (start == lo)
        start++;
    for ( ; start < hi; start++) {
        T pivot = a[start];
        // Set left (and right) to the index where a[start] (pivot) belongs
        int left = lo;
        int right = start;
        assert left <= right;
        /*
         * Invariants:
         *   pivot >= all in [lo, left).
         *   pivot <  all in [right, start).
         */
        while (left < right) {
            int mid = (left + right) >>> 1;
            if (c.compare(pivot, a[mid]) < 0)
                right = mid;
            else
                left = mid + 1;
        }
        assert left == right;
        /*
         * The invariants still hold: pivot >= all in [lo, left) and
         * pivot < all in [left, start), so pivot belongs at left. Note
         * that if there are elements equal to pivot, left points to the
         * first slot after them -- that's why this sort is stable.
         * Slide elements over to make room for pivot.
         */
        int n = start - left;  // The number of elements to move
        // Switch is just an optimization for arraycopy in default case
        switch (n) {
            case 2:  a[left + 2] = a[left + 1];
            case 1:  a[left + 1] = a[left];
                     break;
            default: System.arraycopy(a, left, a, left + 1, n);
        }
        a[left] = pivot;
    }
}
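Here start = lo + initRunLen, which splits the range into two parts. [lo, start) is already in ascending order and [start, hi) is still unordered. Each element of the unordered part is located in the sorted part by binary search. It is then inserted there, and the larger elements shift one slot to the right.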
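To see the two parts at work, the following int-only sketch records the array after each insertion. It is a simplified stand-in for binarySort() with a hypothetical class name, and it uses plain < instead of a Comparator. With {2, 5, 9, 7, 1, 6} the initial run is 2, 5, 9, so start = 3 and only 7, 1 and 6 get inserted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BinaryInsertionTrace {
    // int-only sketch of binarySort(): a[lo, start) is already sorted, and each
    // element of a[start, hi) is inserted at the slot found by binary search.
    // Returns a snapshot of the array after each insertion.
    static List<String> binarySort(int[] a, int lo, int hi, int start) {
        List<String> steps = new ArrayList<>();
        if (start == lo) {
            start++;
        }
        for (; start < hi; start++) {
            int pivot = a[start];
            int left = lo, right = start;
            while (left < right) {
                int mid = (left + right) >>> 1;
                // Move left only when pivot is strictly smaller: equal elements
                // stay in front of pivot, which keeps the sort stable
                if (pivot < a[mid]) right = mid;
                else left = mid + 1;
            }
            System.arraycopy(a, left, a, left + 1, start - left); // shift right by one
            a[left] = pivot;
            steps.add(Arrays.toString(a));
        }
        return steps;
    }

    public static void main(String[] args) {
        int[] a = {2, 5, 9, 7, 1, 6};   // run [2, 5, 9] has length 3, so start = 0 + 3
        for (String s : binarySort(a, 0, a.length, 3)) {
            System.out.println(s);
        }
        // [2, 5, 7, 9, 1, 6]
        // [1, 2, 5, 7, 9, 6]
        // [1, 2, 5, 6, 7, 9]
    }
}
```

Binary search cuts the comparisons per insertion to O(log n), but the shift is still O(n). That is acceptable because this path only handles short ranges.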
Next, minRunLength():
private static int minRunLength(int n) {
    assert n >= 0;
    int r = 0;      // Becomes 1 if any 1 bits are shifted off
    while (n >= MIN_MERGE) {
        r |= (n & 1);
        n >>= 1;
    }
    return n + r;
}
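- If n is an exact power of 2 (and at least 32), it returns 16 (MIN_MERGE / 2).
- Otherwise, it keeps shifting n right (dividing by 2) until n drops below 32. It then adds 1 if any 1 bit was shifted off. The result k lies between 16 and 32, and n/k is a power of 2 or slightly less than one, so the runs can later be merged in balanced pairs.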
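To see the effect, the sketch below copies minRunLength() into a hypothetical MinRunDemo class. For a few array sizes it prints the minRun and the number of runs you would get if every run had exactly minRun elements. That is what happens when the input has no long natural runs. The run counts all come out as powers of 2.

```java
final class MinRunDemo {
    static final int MIN_MERGE = 32;

    // Same logic as TimSort.minRunLength() listed above
    static int minRunLength(int n) {
        int r = 0; // becomes 1 if any 1 bit is shifted off
        while (n >= MIN_MERGE) {
            r |= (n & 1);
            n >>= 1;
        }
        return n + r;
    }

    public static void main(String[] args) {
        int[] sizes = {64, 65, 100, 1000, 1 << 20};
        for (int n : sizes) {
            int k = minRunLength(n);
            // ceil(n / k): the run count if every run had exactly k elements
            int runs = (n + k - 1) / k;
            System.out.println(n + " -> minRun " + k + ", about " + runs + " runs");
        }
        // 64 -> minRun 16, about 4 runs
        // 65 -> minRun 17, about 4 runs
        // 100 -> minRun 25, about 4 runs
        // 1000 -> minRun 32, about 32 runs
        // 1048576 -> minRun 16, about 65536 runs
    }
}
```

Note 65: a naive fixed minRun of 16 would give 5 runs, 4 of 16 and 1 of 1, and the last merge would be badly unbalanced. Rounding up to 17 avoids that.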
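Now for arrays with 32 or more elements (nRemaining >= MIN_MERGE).
We enter the do...while loop. It first gets the length of the next run that is already in order, reversing the run if it was descending: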
do {
    // Identify next run
    int runLen = countRunAndMakeAscending(a, lo, hi, c);
    // If run is short, extend to min(minRun, nRemaining)
    if (runLen < minRun) {
        int force = nRemaining <= minRun ? nRemaining : minRun;
        binarySort(a, lo, lo + force, lo + runLen, c);
        runLen = force;
    }
    // Push run onto pending-run stack, and maybe merge
    ts.pushRun(lo, runLen);
    ts.mergeCollapse();
    // Advance to find next run
    lo += runLen;
    nRemaining -= runLen;
} while (nRemaining != 0);
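It then checks whether the run is shorter than minRun. If so, binarySort() inserts the following elements into it until it holds min(minRun, nRemaining) elements, and runLen records the size of the current run.
The run [lo, lo + runLen) is then pushed onto the pending-run stack. pushRun() stores its start in runBase and its length in runLen, then increments stackSize.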
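The loop so far finds a run, extends it to minRun and pushes it. It can be sketched on an int[] as follows. This is a simplified illustration, not the JDK code:

- Merging is left out.
- A List<int[]> stands in for the runBase/runLen arrays.
- The class name RunSplitter is hypothetical.
- minRun is passed in directly, so the example can use a small value (4) instead of the 16 to 32 that minRunLength() would return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RunSplitter {
    // Find a natural run, extend it to min(minRun, nRemaining) with binary
    // insertion, and record {base, len} as pushRun() would. No merging here.
    static List<int[]> splitIntoRuns(int[] a, int minRun) {
        List<int[]> stack = new ArrayList<>();   // stands in for runBase[]/runLen[]
        int lo = 0, hi = a.length, nRemaining = hi;
        while (nRemaining != 0) {
            int runLen = countRun(a, lo, hi);
            if (runLen < minRun) {
                int force = Math.min(nRemaining, minRun);
                binaryInsert(a, lo, lo + force, lo + runLen);
                runLen = force;
            }
            stack.add(new int[] {lo, runLen});   // pushRun(lo, runLen)
            lo += runLen;
            nRemaining -= runLen;
        }
        return stack;
    }

    // Length of the run at lo; a strictly descending run is reversed in place
    static int countRun(int[] a, int lo, int hi) {
        int runHi = lo + 1;
        if (runHi == hi) return 1;
        if (a[runHi++] < a[lo]) {
            while (runHi < hi && a[runHi] < a[runHi - 1]) runHi++;
            for (int i = lo, j = runHi - 1; i < j; i++, j--) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        } else {
            while (runHi < hi && a[runHi] >= a[runHi - 1]) runHi++;
        }
        return runHi - lo;
    }

    // a[lo, start) is sorted; insert a[start, hi) one element at a time
    static void binaryInsert(int[] a, int lo, int hi, int start) {
        for (; start < hi; start++) {
            int pivot = a[start], left = lo, right = start;
            while (left < right) {
                int mid = (left + right) >>> 1;
                if (pivot < a[mid]) right = mid; else left = mid + 1;
            }
            System.arraycopy(a, left, a, left + 1, start - left);
            a[left] = pivot;
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 6, 7, 8, 9, 4, 3, 2, 1, 7, 0, 3};
        for (int[] run : splitIntoRuns(a, 4)) {
            System.out.println("base=" + run[0] + ", len=" + run[1]);
        }
        // base=0, len=5   (natural ascending run)
        // base=5, len=4   (descending 4,3,2,1 reversed)
        // base=9, len=3   (run 7,0 reversed to 0,7, then extended with 3)
        System.out.println(Arrays.toString(a)); // [5, 6, 7, 8, 9, 1, 2, 3, 4, 0, 3, 7]
    }
}
```

The last run can only be extended to 3 elements, because force is capped by nRemaining.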
Next comes mergeCollapse():
private void mergeCollapse() {
    while (stackSize > 1) {
        int n = stackSize - 2;
        if (n > 0 && runLen[n-1] <= runLen[n] + runLen[n+1]) {
            if (runLen[n - 1] < runLen[n + 1])
                n--;
            mergeAt(n);
        } else if (runLen[n] <= runLen[n + 1]) {
            mergeAt(n);
        } else {
            break; // Invariant is established
        }
    }
}
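The while loop runs as long as the stack holds more than one run. Let A, B and C be the lengths of the third-from-top, second-from-top and top runs (runLen[n-1], runLen[n], runLen[n+1]). Only two adjacent runs are merged at a time:
- If there are at least three runs (n > 0) and A <= B + C: if A < C, n is decremented and A is merged with B; otherwise B is merged with C.
- Otherwise, if B <= C, B and C are merged. With only two runs on the stack, this is the only check.
- Otherwise the invariants (A > B + C and B > C) hold and the loop stops. This keeps the stack short and the merges roughly balanced.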
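The rules are easier to follow on concrete numbers. The sketch below, in a hypothetical CollapseSimulator class, replays the mergeCollapse() logic above on run lengths only. Merging two runs just adds their lengths, and the stack is printed after each push. Pushing 40 onto [30, 20] hits the A < C case, so 30 and 20 are merged instead of 20 and 40. Pushing 4 onto [50, 40, 8, 5] sets off a chain of merges down to a single run.

```java
import java.util.ArrayList;
import java.util.List;

final class CollapseSimulator {
    // Replays mergeCollapse() on run lengths only (no element data):
    // "merging" runs i and i+1 replaces their two lengths with the sum.
    private final List<Integer> runLen = new ArrayList<>();

    void pushRunAndCollapse(int len) {
        runLen.add(len);                                       // pushRun
        while (runLen.size() > 1) {                            // mergeCollapse
            int n = runLen.size() - 2;
            if (n > 0 && runLen.get(n - 1) <= runLen.get(n) + runLen.get(n + 1)) {
                if (runLen.get(n - 1) < runLen.get(n + 1)) n--; // A < C: merge A and B
                mergeAt(n);
            } else if (runLen.get(n) <= runLen.get(n + 1)) {   // B <= C: merge B and C
                mergeAt(n);
            } else {
                break;                                         // invariants hold
            }
        }
    }

    private void mergeAt(int i) {
        runLen.set(i, runLen.get(i) + runLen.get(i + 1));
        runLen.remove(i + 1);
    }

    List<Integer> lengths() {
        return new ArrayList<>(runLen);
    }

    public static void main(String[] args) {
        CollapseSimulator sim = new CollapseSimulator();
        for (int len : new int[] {30, 20, 40, 8, 5, 4}) {
            sim.pushRunAndCollapse(len);
            System.out.println("push " + len + " -> " + sim.lengths());
        }
        // push 30 -> [30]
        // push 20 -> [30, 20]
        // push 40 -> [50, 40]
        // push 8 -> [50, 40, 8]
        // push 5 -> [50, 40, 8, 5]
        // push 4 -> [107]
    }
}
```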
The actual merge is done by mergeAt():
private void mergeAt(int i) {
    assert stackSize >= 2;
    assert i >= 0;
    assert i == stackSize - 2 || i == stackSize - 3;
    int base1 = runBase[i];
    int len1 = runLen[i];
    int base2 = runBase[i + 1];
    int len2 = runLen[i + 1];
    assert len1 > 0 && len2 > 0;
    assert base1 + len1 == base2;
    /*
     * Record the length of the combined runs; if i is the 3rd-last
     * run now, also slide over the last run (which isn't involved
     * in this merge). The current run (i+1) goes away in any case.
     */
    runLen[i] = len1 + len2;
    if (i == stackSize - 3) {
        runBase[i + 1] = runBase[i + 2];
        runLen[i + 1] = runLen[i + 2];
    }
    stackSize--;
    /*
     * Find where the first element of run2 goes in run1. Prior elements
     * in run1 can be ignored (because they're already in place).
     */
    // i.e. find the slot in run1 where run2's first element belongs; the run1
    // elements before that slot are <= it, so they can be skipped
    int k = gallopRight(a[base2], a, base1, len1, 0, c);
    assert k >= 0;
    base1 += k;
    len1 -= k;
    if (len1 == 0)
        return;
    /*
     * Find where the last element of run1 goes in run2. Subsequent elements
     * in run2 can be ignored (because they're already in place).
     */
    len2 = gallopLeft(a[base1 + len1 - 1], a, base2, len2, len2 - 1, c);
    assert len2 >= 0;
    if (len2 == 0)
        return;
    // Merge remaining runs, using tmp array with min(len1, len2) elements
    if (len1 <= len2)
        mergeLo(base1, len1, base2, len2);
    else
        mergeHi(base1, len1, base2, len2);
}
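Here is an example: run1 = [1,4,5,8] and run2 = [2,6,7,9,12,13], with run1 starting at index 0.
gallopRight(): finds where the first element of run2 belongs in run1.
base1: the start of run1 after skipping the elements already in place.
len1: the remaining length of run1.
[base1, base1 + len1) is the part of run1 that still needs merging.
run2[0] = 2 belongs right after run1[0] = 1, so k = 1, base1 = 1 and len1 = len1 - k = 3.
gallopLeft(): finds where the last element of run1 belongs in run2.
len2: that insertion position, i.e. the number of run2 elements smaller than the last element of run1.
[base2, base2 + len2) is the part of run2 that still needs merging. Here 8 belongs just before 9, so len2 = 3, and 9, 12, 13 are already in place.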
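Both trimming steps can be checked on this example with a small int-only sketch in a hypothetical MergeTrimDemo class. Instead of galloping (an exponential search followed by a binary search), it uses plain binary search, which finds the same positions. gallopRight() corresponds to an upper bound, which skips the elements <= key. gallopLeft() corresponds to a lower bound, which keeps the elements < key.

```java
import java.util.Arrays;

final class MergeTrimDemo {
    // Number of elements in a[base, base + len) that are <= key
    // (same position gallopRight() finds, via plain binary search)
    static int upperBound(int[] a, int base, int len, int key) {
        int lo = 0, hi = len;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (key < a[base + mid]) hi = mid; else lo = mid + 1;
        }
        return lo;
    }

    // Number of elements in a[base, base + len) that are < key
    // (same position gallopLeft() finds)
    static int lowerBound(int[] a, int base, int len, int key) {
        int lo = 0, hi = len;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[base + mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Returns {base1, len1, base2, len2} of the part that still needs merging
    static int[] trim(int[] a, int base1, int len1, int base2, int len2) {
        int k = upperBound(a, base1, len1, a[base2]);   // run1 prefix already in place
        base1 += k;
        len1 -= k;
        if (len1 == 0) {
            return new int[] {base1, 0, base2, len2};   // nothing left to merge
        }
        len2 = lowerBound(a, base2, len2, a[base1 + len1 - 1]); // run2 suffix in place
        return new int[] {base1, len1, base2, len2};
    }

    public static void main(String[] args) {
        int[] a = {1, 4, 5, 8, 2, 6, 7, 9, 12, 13};    // run1 = a[0, 4), run2 = a[4, 10)
        System.out.println(Arrays.toString(trim(a, 0, 4, 4, 6))); // [1, 3, 4, 3]
    }
}
```

The result reads base1 = 1, len1 = 3, base2 = 4, len2 = 3. Only [4, 5, 8] and [2, 6, 7] take part in the real merge.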
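Finally, the two remaining lengths decide which merge method is used: mergeLo() when len1 <= len2, otherwise mergeHi(). Either way the temporary array only needs min(len1, len2) elements, which saves space.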
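I will come back to the real mergeLo() and mergeHi() later, but the space-saving idea can already be sketched. The following is a simplified int-only stand-in for mergeLo(). Its class name is hypothetical, and it leaves out the galloping mode the JDK version adds. Because len1 <= len2, only run1 is copied into a temporary array, and the merge fills the array from left to right. mergeHi() is the mirror image: it copies run2 and merges from right to left.

```java
import java.util.Arrays;

final class MergeLoSketch {
    // Merges a[base1, base1 + len1) and a[base2, base2 + len2), where
    // base2 == base1 + len1 and len1 <= len2. Only run1 is copied out.
    static void mergeLo(int[] a, int base1, int len1, int base2, int len2) {
        int[] tmp = Arrays.copyOfRange(a, base1, base1 + len1);
        int cursor1 = 0;            // next element of run1 (in tmp)
        int cursor2 = base2;        // next element of run2 (still in a)
        int dest = base1;           // next slot to fill; never passes cursor2
        int end2 = base2 + len2;
        while (cursor1 < len1 && cursor2 < end2) {
            // Take from run2 only if strictly smaller: on ties run1 goes first (stable)
            if (a[cursor2] < tmp[cursor1]) a[dest++] = a[cursor2++];
            else a[dest++] = tmp[cursor1++];
        }
        // Leftover run1 elements are copied back; leftover run2 elements
        // are already in their final place
        while (cursor1 < len1) a[dest++] = tmp[cursor1++];
    }

    public static void main(String[] args) {
        // The example after trimming: [4, 5, 8] at index 1 and [2, 6, 7] at index 4
        int[] a = {1, 4, 5, 8, 2, 6, 7, 9, 12, 13};
        mergeLo(a, 1, 3, 4, 3);
        System.out.println(Arrays.toString(a)); // [1, 2, 4, 5, 6, 7, 8, 9, 12, 13]
    }
}
```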
/*
Let's finish the overall TimSort flow first and come back to the merge methods afterwards (mainly because they are long).
*/
Now back to the last few statements of sort():
// Merge all remaining runs to complete sort
assert lo == hi;
ts.mergeForceCollapse();
assert ts.stackSize == 1;
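mergeForceCollapse() merges the runs still on the stack, without checking the invariants, until only one run is left. That run is the sorted array.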
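The post does not list mergeForceCollapse(). In the JDK 8 source I have seen, it repeatedly merges near the top of the stack until one run remains, merging A with B when A < C and B with C otherwise. Here is a lengths-only sketch under that assumption; the class name ForceCollapseSimulator is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ForceCollapseSimulator {
    // Replays mergeForceCollapse() on run lengths only, assuming the rule
    // "if A < C, merge A and B, otherwise merge B and C", with no invariant check.
    // Returns the stack after each merge.
    static List<List<Integer>> forceCollapse(List<Integer> stack) {
        List<Integer> runLen = new ArrayList<>(stack);
        List<List<Integer>> states = new ArrayList<>();
        while (runLen.size() > 1) {
            int n = runLen.size() - 2;
            if (n > 0 && runLen.get(n - 1) < runLen.get(n + 1)) n--;
            runLen.set(n, runLen.get(n) + runLen.get(n + 1)); // mergeAt(n): lengths add up
            runLen.remove(n + 1);
            states.add(new ArrayList<>(runLen));
        }
        return states;
    }

    public static void main(String[] args) {
        for (List<Integer> s : forceCollapse(Arrays.asList(50, 40, 8, 5))) {
            System.out.println(s);
        }
        // [50, 40, 13]
        // [50, 53]
        // [103]
    }
}
```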