Swift Beta performance: sorting arrays

Translated from: Swift Beta performance: sorting arrays

I was implementing an algorithm in Swift Beta and noticed that the performance was very poor. After digging deeper I realized that one of the bottlenecks was something as simple as sorting arrays. The relevant part is here:

let n = 1000000
var x =  [Int](repeating: 0, count: n)
for i in 0..<n {
    x[i] = random()
}
// start clock here
let y = sort(x)
// stop clock here

In C++, a similar operation takes 0.06s on my computer.

In Python, it takes 0.6s (no tricks, just y = sorted(x) for a list of integers).

In Swift it takes 6s if I compile it with the following command:

xcrun swift -O3 -sdk `xcrun --show-sdk-path --sdk macosx`

And it takes as much as 88s if I compile it with the following command:

xcrun swift -O0 -sdk `xcrun --show-sdk-path --sdk macosx`

Timings in Xcode with "Release" vs. "Debug" builds are similar.

What is wrong here? I could understand some performance loss in comparison with C++, but not a 10-fold slowdown in comparison with pure Python.


Edit: mweathers noticed that changing -O3 to -Ofast makes this code run almost as fast as the C++ version! However, -Ofast changes the semantics of the language a lot: in my testing, it disabled the checks for integer overflows and array indexing overflows. For example, with -Ofast the following Swift code runs silently without crashing (and prints out some garbage):

let n = 10000000
print(n*n*n*n*n)
let x =  [Int](repeating: 10, count: n)
print(x[n])

So -Ofast is not what we want; the whole point of Swift is that we have the safety nets in place. Of course, the safety nets have some impact on the performance, but they should not make the programs 100 times slower. Remember that Java already checks for array bounds, and in typical cases, the slowdown is by a factor much less than 2. And in Clang and GCC we have got -ftrapv for checking (signed) integer overflows, and it is not that slow, either.
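To make the two safety nets concrete, here is a minimal sketch in current Swift syntax (an assumption, since the question predates it) on a platform where Int is 64 bits wide. It uses the standard library's `multipliedReportingOverflow(by:)` to detect the overflow that a checked build would trap on, the wrapping operator `&*` to opt into wrap-around explicitly, and an explicit index test in place of the trapping subscript. The names `square`, `cube`, `wrapped`, and `index` are illustrative only.

```swift
let n = 10_000_000

// n * n is 10^14 and fits in a 64-bit Int; n * n * n is 10^21 and does not.
let square = n.multipliedReportingOverflow(by: n)
let cube = square.partialValue.multipliedReportingOverflow(by: n)
print("n*n overflows: \(square.overflow)")
print("n*n*n overflows: \(cube.overflow)")

// The &* operator wraps around instead of trapping, so the result is
// well defined but numerically meaningless once an overflow has occurred.
let wrapped = n &* n &* n &* n &* n
print("wrapped n^5: \(wrapped)")

// x[index] with index == x.count traps in a checked build.
// Testing the index first turns the trap into ordinary control flow.
let x = [Int](repeating: 10, count: 4)
let index = x.count
if x.indices.contains(index) {
    print(x[index])
} else {
    print("index \(index) is out of bounds")
}
```

Writing plain `n*n*n*n*n` and `x[index]` instead would stop the program at the first failing check in a build that keeps the safety nets.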

Hence the question: how can we get reasonable performance in Swift without losing the safety nets?


Edit 2: I did some more benchmarking, with very simple loops along the lines of

for i in 0..<n {
    x[i] = x[i] ^ 12345678
}

(Here the xor operation is there just so that I can more easily find the relevant loop in the assembly code. I tried to pick an operation that is easy to spot but also "harmless" in the sense that it should not require any checks related to integer overflows.)

Again, there was a huge difference in the performance between -O3 and -Ofast. So I had a look at the assembly code:

  • With -Ofast I get pretty much what I would expect. The relevant part is a loop with 5 machine language instructions.

  • With -O3 I get something that was beyond my wildest imagination. The inner loop spans 88 lines of assembly code. I did not try to understand all of it, but the most suspicious parts are 13 invocations of "callq _swift_retain" and another 13 invocations of "callq _swift_release". That is, 26 subroutine calls in the inner loop!


Edit 3: In comments, Ferruccio asked for benchmarks that are fair in the sense that they do not rely on built-in functions (e.g. sort). I think the following program is a fairly good example:

let n = 10000
var x = [Int](repeating: 1, count: n)
for i in 0..<n {
    for j in 0..<n {
        x[i] = x[j]
    }
}

There is no arithmetic, so we do not need to worry about integer overflows. The only thing that we do is just lots of array references. The results are below. Swift -O3 loses by a factor of almost 500 in comparison with -Ofast:

  • C++ -O3: 0.05 s
  • C++ -O0: 0.4 s
  • Java: 0.2 s
  • Python with PyPy: 0.5 s
  • Python: 12 s
  • Swift -Ofast: 0.05 s
  • Swift -O3: 23 s
  • Swift -O0: 443 s

(If you are concerned that the compiler might optimize out the pointless loops entirely, you can change it to e.g. x[i] ^= x[j], and add a print statement that outputs x[0]. This does not change anything; the timings will be very similar.)
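The variant described in the parenthetical can be sketched as follows, in current Swift syntax. The array size is reduced from the 10000 used for the timings to 2000 purely to keep a quick run short; that value is an assumption of this sketch, not part of the original benchmark.

```swift
let n = 2_000  // the timings above used n = 10000
var x = [Int](repeating: 1, count: n)

for i in 0..<n {
    for j in 0..<n {
        // The xor makes each store depend on the loaded value,
        // so the loops cannot be dropped as dead code.
        x[i] ^= x[j]
    }
}

// Printing an element keeps the final array contents observable.
print(x[0])
```

Every access still goes through the bounds-checked subscript, so this exercises the same code path as the x[i] = x[j] version.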

And yes, here the Python implementation was a stupid pure Python implementation with a list of ints and nested for loops. It should be much slower than unoptimized Swift. Something seems to be seriously broken with Swift and array indexing.


Edit 4: These issues (as well as some other performance issues) seem to have been fixed in Xcode 6 beta 5.

For sorting, I now have the following timings:

  • clang++ -O3: 0.06 s
  • swiftc -Ofast: 0.1 s
  • swiftc -O: 0.1 s
  • swiftc: 4 s

For nested loops:

  • clang++ -O3: 0.06 s
  • swiftc -Ofast: 0.3 s
  • swiftc -O: 0.4 s
  • swiftc: 540 s

It seems that there is no reason anymore to use the unsafe -Ofast (aka -Ounchecked); plain -O produces equally good code.


#1

Reference: https://stackoom.com/question/1d7xO/Swift-Beta性能-排序數組


#2

From The Swift Programming Language:

The Sort Function
Swift's standard library provides a function called sort, which sorts an array of values of a known type, based on the output of a sorting closure that you provide. Once it completes the sorting process, the sort function returns a new array of the same type and size as the old one, with its elements in the correct sorted order.

The sort function has two declarations.

The default declaration, which allows you to specify a comparison closure:

func sort<T>(array: T[], pred: (T, T) -> Bool) -> T[]

And a second declaration that only takes a single parameter (the array) and is "hardcoded to use the less-than comparator."

func sort<T : Comparable>(array: T[]) -> T[]

Example:
sort( _arrayToSort_ ) { $0 > $1 }

I tested a modified version of your code in a playground with the closure added on so I could monitor the function a little more closely, and I found that with n set to 1000, the closure was being called about 11,000 times.

let n = 1000
var x = Int[](count: n, repeatedValue: 0) // var, because the loop below assigns to x[i]
for i in 0..n {
    x[i] = random()
}
let y = sort(x) { $0 > $1 }
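For readers on a current toolchain, the same experiment can be sketched without a playground by counting closure invocations directly. This assumes today's `sorted(by:)` method in place of the beta-era free function `sort`, and `Int.random(in:)` in place of `random()`; the counter name `comparisons` is illustrative. The count you see will depend on the standard library version, so no particular number is claimed here.

```swift
let n = 1000

// Fill an array with n random integers.
var x = [Int]()
x.reserveCapacity(n)
for _ in 0..<n {
    x.append(Int.random(in: 0..<1_000_000))
}

// Count how often the sort calls the comparison closure.
var comparisons = 0
let y = x.sorted { a, b in
    comparisons += 1
    return a > b  // descending order, as in { $0 > $1 }
}

print("comparisons for n = \(n): \(comparisons)")
```

For reference, n * log2(n) is roughly 10,000 when n is 1000, so a count in that region is what any comparison sort would be expected to produce.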

It is not an efficient function, and I would recommend using a better sorting function implementation.

EDIT:

I took a look at the Quicksort Wikipedia page and wrote a Swift implementation for it. Here is the full program I used (in a playground):

import Foundation

func quickSort(inout array: Int[], begin: Int, end: Int) {
    if (begin < end) {
        let p = partition(&array, begin, end)
        quickSort(&array, begin, p - 1)
        quickSort(&array, p + 1, end)
    }
}

func partition(inout array: Int[], left: Int, right: Int) -> Int {
    let numElements = right - left + 1
    let pivotIndex = left + numElements / 2
    let pivotValue = array[pivotIndex]
    swap(&array[pivotIndex], &array[right])
    var storeIndex = left
    for i in left..right {
        let a = 1 // <- Used to see how many comparisons are made
        if array[i] <= pivotValue {
            swap(&array[i], &array[storeIndex])
            storeIndex++
        }
    }
    swap(&array[storeIndex], &array[right]) // Move pivot to its final place
    return storeIndex
}

let n = 1000
var x = Int[](count: n, repeatedValue: 0)
for i in 0..n {
    x[i] = Int(arc4random())
}

quickSort(&x, 0, x.count - 1) // <- Does the sorting

for i in 0..n {
    x[i] // <- Used by the playground to display the results
}

Using this with n=1000, I found that

  1. quickSort() got called about 650 times,
  2. about 6000 swaps were made,
  3. and there are about 10,000 comparisons
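The program above uses beta-era syntax (`Int[]`, `0..n`, `inout` before the parameter name, `storeIndex++`) and relies on the playground to count operations. A rough equivalent in current Swift, with explicit counters, might look like the sketch below. The `Counters` struct and its field names are hypothetical helpers introduced here; the partition scheme is the same one as in the answer's code.

```swift
// Hypothetical helper that records how much work the sort does.
struct Counters {
    var calls = 0
    var swaps = 0
    var comparisons = 0
}

func partition(_ array: inout [Int], _ left: Int, _ right: Int,
               _ stats: inout Counters) -> Int {
    let pivotIndex = left + (right - left + 1) / 2
    let pivotValue = array[pivotIndex]
    array.swapAt(pivotIndex, right)  // park the pivot at the end
    stats.swaps += 1
    var storeIndex = left
    for i in left..<right {
        stats.comparisons += 1
        if array[i] <= pivotValue {
            array.swapAt(i, storeIndex)
            stats.swaps += 1
            storeIndex += 1
        }
    }
    array.swapAt(storeIndex, right)  // move the pivot to its final place
    stats.swaps += 1
    return storeIndex
}

func quickSort(_ array: inout [Int], _ begin: Int, _ end: Int,
               _ stats: inout Counters) {
    stats.calls += 1
    if begin < end {
        let p = partition(&array, begin, end, &stats)
        quickSort(&array, begin, p - 1, &stats)
        quickSort(&array, p + 1, end, &stats)
    }
}

let n = 1000
var x = (0..<n).map { _ in Int.random(in: 0..<1_000_000) }
var stats = Counters()
quickSort(&x, 0, x.count - 1, &stats)
print(stats)
```

The counts vary from run to run because the input is random, so they are best compared with the figures in the list above only by order of magnitude.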

It seems that the built-in sort method is (or is close to) quick sort, and is really slow...


#3

tl;dr Swift 1.0 is now as fast as C by this benchmark using the default release optimisation level [-O].


Here is an in-place quicksort in Swift Beta:

func quicksort_swift(inout a:CInt[], start:Int, end:Int) {
    if (end - start < 2){
        return
    }
    var p = a[start + (end - start)/2]
    var l = start
    var r = end - 1
    while (l <= r){
        if (a[l] < p){
            l += 1
            continue
        }
        if (a[r] > p){
            r -= 1
            continue
        }
        var t = a[l]
        a[l] = a[r]
        a[r] = t
        l += 1
        r -= 1
    }
    quicksort_swift(&a, start, r + 1)
    quicksort_swift(&a, r + 1, end)
}

And the same in C:

void quicksort_c(int *a, int n) {
    if (n < 2)
        return;
    int p = a[n / 2];
    int *l = a;
    int *r = a + n - 1;
    while (l <= r) {
        if (*l < p) {
            l++;
            continue;
        }
        if (*r > p) {
            r--;
            continue;
        }
        int t = *l;
        *l++ = *r;
        *r-- = t;
    }
    quicksort_c(a, r - a + 1);
    quicksort_c(l, a + n - l);
}

Both work:

var a_swift:CInt[] = [0,5,2,8,1234,-1,2]
var a_c:CInt[] = [0,5,2,8,1234,-1,2]

quicksort_swift(&a_swift, 0, a_swift.count)
quicksort_c(&a_c, CInt(a_c.count))

// [-1, 0, 2, 2, 5, 8, 1234]
// [-1, 0, 2, 2, 5, 8, 1234]

Both are called in the same program, as written here:

var x_swift = CInt[](count: n, repeatedValue: 0)
var x_c = CInt[](count: n, repeatedValue: 0)
for var i = 0; i < n; ++i {
    x_swift[i] = CInt(random())
    x_c[i] = CInt(random())
}

let swift_start:UInt64 = mach_absolute_time();
quicksort_swift(&x_swift, 0, x_swift.count)
let swift_stop:UInt64 = mach_absolute_time();

let c_start:UInt64 = mach_absolute_time();
quicksort_c(&x_c, CInt(x_c.count))
let c_stop:UInt64 = mach_absolute_time();

This converts the absolute times to seconds:

static const uint64_t NANOS_PER_USEC = 1000ULL;
static const uint64_t NANOS_PER_MSEC = 1000ULL * NANOS_PER_USEC;
static const uint64_t NANOS_PER_SEC = 1000ULL * NANOS_PER_MSEC;

mach_timebase_info_data_t timebase_info;

uint64_t abs_to_nanos(uint64_t abs) {
    if ( timebase_info.denom == 0 ) {
        (void)mach_timebase_info(&timebase_info);
    }
    return abs * timebase_info.numer  / timebase_info.denom;
}

double abs_to_seconds(uint64_t abs) {
    return abs_to_nanos(abs) / (double)NANOS_PER_SEC;
}
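The C helpers above scale mach_absolute_time() ticks by numer / denom to get nanoseconds and then divide by 10^9. If you would rather stay in Swift, one possible alternative (an assumption of this sketch, not what the benchmark used) is `DispatchTime`, whose `uptimeNanoseconds` property already reports nanoseconds. The helper name `measureSeconds` is made up for illustration.

```swift
import Dispatch

// Runs `body` once and returns the elapsed wall-clock time in seconds.
func measureSeconds(_ body: () -> Void) -> Double {
    let start = DispatchTime.now().uptimeNanoseconds
    body()
    let stop = DispatchTime.now().uptimeNanoseconds
    return Double(stop - start) / 1_000_000_000
}

// Example: time the standard library sort on random input.
var values = (0..<100_000).map { _ in Int.random(in: 0..<1_000_000) }
let elapsed = measureSeconds { values.sort() }
print("sort took \(elapsed) s")
```

A single run like this is noisy; repeating the measurement and reporting the minimum or median gives steadier numbers.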

Here is a summary of the compiler's optimization levels:

[-Onone] no optimizations, the default for debug.
[-O]     perform optimizations, the default for release.
[-Ofast] perform optimizations and disable runtime overflow checks and runtime type checks.

Time in seconds with [-Onone] for n=10_000:

Swift:            0.895296452
C:                0.001223848

Here is Swift's builtin sort() for n=10_000:

Swift_builtin:    0.77865783

Here is [-O] for n=10_000:

Swift:            0.045478346
C:                0.000784666
Swift_builtin:    0.032513488

As you can see, Swift's performance improved by a factor of 20.

As per mweathers' answer, setting [-Ofast] makes the real difference, resulting in these times for n=10_000:

Swift:            0.000706745
C:                0.000742374
Swift_builtin:    0.000603576

And for n=1_000_000:

Swift:            0.107111846
C:                0.114957179
Swift_sort:       0.092688548

For comparison, this is with [-Onone] for n=1_000_000:

Swift:            142.659763258
C:                0.162065333
Swift_sort:       114.095478272

So Swift with no optimizations was almost 1000x slower than C in this benchmark, at this stage in its development. On the other hand, with both compilers set to [-Ofast], Swift actually performed at least as well as C, if not slightly better.

It has been pointed out that [-Ofast] changes the semantics of the language, making it potentially unsafe. This is what Apple states in the Xcode 5.0 release notes:

A new optimization level -Ofast, available in LLVM, enables aggressive optimizations. -Ofast relaxes some conservative restrictions, mostly for floating-point operations, that are safe for most code. It can yield significant high-performance wins from the compiler.

They all but advocate it. Whether that's wise or not I couldn't say, but from what I can tell it seems reasonable enough to use [-Ofast] in a release if you're not doing high-precision floating point arithmetic and you're confident no integer or array overflows are possible in your program. If you do need high performance and overflow checks / precise arithmetic then choose another language for now.

BETA 3 UPDATE:

n=10_000 with [-O]:

Swift:            0.019697268
C:                0.000718064
Swift_sort:       0.002094721

Swift in general is a bit faster and it looks like Swift's built-in sort has changed quite significantly.

FINAL UPDATE:

[-Onone]:

Swift:   0.678056695
C:       0.000973914

[-O]:

Swift:   0.001158492
C:       0.001192406

[-Ounchecked]:

Swift:   0.000827764
C:       0.001078914


#4

TL;DR: Yes, the only Swift language implementation is slow, right now. If you need fast numeric code (and presumably other kinds of code), just go with another language. In the future, you should re-evaluate your choice. It might be good enough for most application code that is written at a higher level, though.

From what I'm seeing in SIL and LLVM IR, it seems like they need a bunch of optimizations for removing retains and releases, which might be implemented in Clang (for Objective-C), but they haven't ported them yet. That's the theory I'm going with (for now… I still need to confirm that Clang does something about it), since a profiler run on the last test-case of this question yields this “pretty” result:

[Images: time profiling on -O3; time profiling on -Ofast]

As was said by many others, -Ofast is totally unsafe and changes language semantics. For me, it's at the “If you're going to use that, just use another language” stage. I'll re-evaluate that choice later, if it changes.

-O3 gets us a bunch of swift_retain and swift_release calls that, honestly, don't look like they should be there for this example. The optimizer should have elided (most of) them AFAICT, since it knows most of the information about the array, and knows that it has (at least) a strong reference to it.

It shouldn't emit more retains when it's not even calling functions which might release the objects. I don't think an array constructor can return an array which is smaller than what was asked for, which means that a lot of checks that were emitted are useless. It also knows that the integer will never be above 10k, so the overflow checks can be optimized away, not because of -Ofast weirdness but because of the semantics of the language (nothing else is changing that var nor can access it, and adding up to 10k is safe for the type Int).

The compiler might not be able to unbox the array or the array elements, though, since they're getting passed to sort(), which is an external function and has to get the arguments it's expecting. This will make us have to use the Int values indirectly, which would make it go a bit slower. This could change if the sort() generic function (not in the multi-method way) was available to the compiler and got inlined.

This is a very new (publicly) language, and it is going through what I assume are lots of changes, since there are people (heavily) involved with the Swift language asking for feedback and they all say the language isn't finished and will change.

Code used:

import Cocoa

let swift_start = NSDate.timeIntervalSinceReferenceDate();
let n: Int = 10000
var x = Int[](count: n, repeatedValue: 1) // var, because the loops below assign to x[i]
for i in 0..n {
    for j in 0..n {
        let tmp: Int = x[j]
        x[i] = tmp
    }
}
let y: Int[] = sort(x)
let swift_stop = NSDate.timeIntervalSinceReferenceDate();

println("\(swift_stop - swift_start)s")

PS: I'm not an expert on Objective-C nor all the facilities from Cocoa, Objective-C, or the Swift runtimes. I might also be assuming some things that I didn't write.


#5

I decided to take a look at this for fun, and here are the timings that I get:

Swift 4.0.2           :   0.83s (0.74s with `-Ounchecked`)
C++ (Apple LLVM 8.0.0):   0.74s

Swift

// Swift 4.0 code
import Foundation

func doTest() -> Void {
    let arraySize = 10000000
    var randomNumbers = [UInt32]()

    for _ in 0..<arraySize {
        randomNumbers.append(arc4random_uniform(UInt32(arraySize)))
    }

    let start = Date()
    randomNumbers.sort()
    let end = Date()

    print(randomNumbers[0])
    print("Elapsed time: \(end.timeIntervalSince(start))")
}

doTest()

Results:

Swift 1.1

xcrun swiftc --version
Swift version 1.1 (swift-600.0.54.20)
Target: x86_64-apple-darwin14.0.0

xcrun swiftc -O SwiftSort.swift
./SwiftSort     
Elapsed time: 1.02204304933548

Swift 1.2

xcrun swiftc --version
Apple Swift version 1.2 (swiftlang-602.0.49.6 clang-602.0.49)
Target: x86_64-apple-darwin14.3.0

xcrun -sdk macosx swiftc -O SwiftSort.swift
./SwiftSort     
Elapsed time: 0.738763988018036

Swift 2.0

xcrun swiftc --version
Apple Swift version 2.0 (swiftlang-700.0.59 clang-700.0.72)
Target: x86_64-apple-darwin15.0.0

xcrun -sdk macosx swiftc -O SwiftSort.swift
./SwiftSort     
Elapsed time: 0.767306983470917

It seems to be the same performance if I compile with -Ounchecked.

Swift 3.0

xcrun swiftc --version
Apple Swift version 3.0 (swiftlang-800.0.46.2 clang-800.0.38)
Target: x86_64-apple-macosx10.9

xcrun -sdk macosx swiftc -O SwiftSort.swift
./SwiftSort     
Elapsed time: 0.939633965492249

xcrun -sdk macosx swiftc -Ounchecked SwiftSort.swift
./SwiftSort     
Elapsed time: 0.866258025169373

There seems to have been a performance regression from Swift 2.0 to Swift 3.0, and I'm also seeing a difference between -O and -Ounchecked for the first time.

Swift 4.0

xcrun swiftc --version
Apple Swift version 4.0.2 (swiftlang-900.0.69.2 clang-900.0.38)
Target: x86_64-apple-macosx10.9

xcrun -sdk macosx swiftc -O SwiftSort.swift
./SwiftSort     
Elapsed time: 0.834299981594086

xcrun -sdk macosx swiftc -Ounchecked SwiftSort.swift
./SwiftSort     
Elapsed time: 0.742045998573303

Swift 4 improves the performance again, while maintaining a gap between -O and -Ounchecked. -O -whole-module-optimization did not appear to make a difference.

C++

#include <chrono>
#include <iostream>
#include <vector>
#include <cstdint>
#include <stdlib.h>

using namespace std;
using namespace std::chrono;

int main(int argc, const char * argv[]) {
    const auto arraySize = 10000000;
    vector<uint32_t> randomNumbers;

    for (int i = 0; i < arraySize; ++i) {
        randomNumbers.emplace_back(arc4random_uniform(arraySize));
    }

    const auto start = high_resolution_clock::now();
    sort(begin(randomNumbers), end(randomNumbers));
    const auto end = high_resolution_clock::now();

    cout << randomNumbers[0] << "\n";
    cout << "Elapsed time: " << duration_cast<duration<double>>(end - start).count() << "\n";

    return 0;
}

Results:

Apple Clang 6.0

clang++ --version
Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin14.0.0
Thread model: posix

clang++ -O3 -std=c++11 CppSort.cpp -o CppSort
./CppSort     
Elapsed time: 0.688969

Apple Clang 6.1.0

clang++ --version
Apple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn)
Target: x86_64-apple-darwin14.3.0
Thread model: posix

clang++ -O3 -std=c++11 CppSort.cpp -o CppSort
./CppSort     
Elapsed time: 0.670652

Apple Clang 7.0.0

clang++ --version
Apple LLVM version 7.0.0 (clang-700.0.72)
Target: x86_64-apple-darwin15.0.0
Thread model: posix

clang++ -O3 -std=c++11 CppSort.cpp -o CppSort
./CppSort     
Elapsed time: 0.690152

Apple Clang 8.0.0

clang++ --version
Apple LLVM version 8.0.0 (clang-800.0.38)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

clang++ -O3 -std=c++11 CppSort.cpp -o CppSort
./CppSort     
Elapsed time: 0.68253

Apple Clang 9.0.0

clang++ --version
Apple LLVM version 9.0.0 (clang-900.0.38)
Target: x86_64-apple-darwin16.7.0
Thread model: posix

clang++ -O3 -std=c++11 CppSort.cpp -o CppSort
./CppSort     
Elapsed time: 0.736784

Verdict

As of the time of this writing, Swift's sort is fast, but not yet as fast as C++'s sort when compiled with -O, with the above compilers & libraries. With -Ounchecked, it appears to be as fast as C++ in Swift 4.0.2 and Apple LLVM 9.0.0.


#6

As of Xcode 7 you can turn on Fast, Whole Module Optimization. This should increase your performance immediately.
