Swift Beta performance: sorting arrays

Question:

I was implementing an algorithm in Swift Beta and noticed that the performance was very poor. After digging deeper I realized that one of the bottlenecks was something as simple as sorting arrays. The relevant part is here:

import Foundation  // provides random()

let n = 1000000
var x = [Int](repeating: 0, count: n)
for i in 0..<n {
    x[i] = random()
}
// start clock here
let y = x.sorted()  // spelled sort(x) in the Swift Beta toolchain
// stop clock here
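The "start clock" and "stop clock" comments could be filled in along the following lines. This is only a sketch: the `measure` helper and the names `count`, `input`, and `timed` are hypothetical and not part of the original benchmark, and it uses `Int.random(in:)` and `sorted()` from current Swift rather than the beta-era calls.

```swift
import Foundation

/// Hypothetical helper (not from the original post): runs `body` once and
/// returns its result together with the elapsed time in seconds.
func measure<T>(_ body: () -> T) -> (result: T, seconds: TimeInterval) {
    // systemUptime counts seconds since boot, so it is not affected by
    // adjustments to the wall clock while the benchmark runs.
    let start = ProcessInfo.processInfo.systemUptime
    let result = body()
    let end = ProcessInfo.processInfo.systemUptime
    return (result, end - start)
}

let count = 1_000_000
// Build the random input outside the timed region, as in the question.
let input = (0..<count).map { _ in Int.random(in: 0..<Int.max) }

// Only the sort itself is timed.
let timed = measure { input.sorted() }
print("sorted \(timed.result.count) elements in \(timed.seconds) s")
```

A single run like this is noisy; repeating the measurement a few times and comparing the runs gives a more trustworthy number.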

In C++, a similar operation takes 0.06s on my computer.

In Python, it takes 0.6s (no tricks, just y = sorted(x) for a list of integers).

In Swift it takes 6s if I compile it with the following command:

xcrun swift -O3 -sdk `xcrun --show-sdk-path --sdk macosx`

And it takes as much as 88s if I compile it with the following command:

xcrun swift -O0 -sdk `xcrun --show-sdk-path --sdk macosx`

Timings in Xcode with "Release" vs. "Debug" builds are similar.

What is wrong here? I could understand some performance loss in comparison with C++, but not a 10-fold slowdown in comparison with pure Python.


Edit: weather noticed that changing -O3 to -Ofast makes this code run almost as fast as the C++ version! However, -Ofast changes the semantics of the language a lot: in my testing, it disabled the checks for integer overflows and array indexing overflows. For example, with -Ofast the following Swift code runs silently without crashing (and prints out some garbage):

let n = 10000000
print(n*n*n*n*n)
let x =  [Int](repeating: 10, count: n)
print(x[n])

So -Ofast is not what we want; the whole point of Swift is that we have the safety nets in place. Of course, the safety nets have some impact on the performance, but they should not make the programs 100 times slower. Remember that Java already checks for array bounds, and in typical cases, the slowdown is by a factor much less than 2. And in Clang and GCC we have got -ftrapv for checking (signed) integer overflows, and it is not that slow, either.
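To make the two safety nets concrete, the checks can also be written out by hand. The sketch below assumes a 64-bit `Int`; the helpers `checkedPower` and `element(of:at:)` are hypothetical names introduced only for illustration. It uses the standard library's `multipliedReportingOverflow(by:)` and the wrapping operator `&*`.

```swift
/// Hypothetical helper: computes base^exponent with explicit overflow detection.
/// Returns nil instead of trapping when the result does not fit in Int.
func checkedPower(_ base: Int, _ exponent: Int) -> Int? {
    precondition(exponent >= 0, "exponent must be non-negative")
    var result = 1
    for _ in 0..<exponent {
        let (product, overflowed) = result.multipliedReportingOverflow(by: base)
        if overflowed { return nil }
        result = product
    }
    return result
}

/// Hypothetical helper: bounds-checked read that returns nil rather than trapping.
func element(of array: [Int], at index: Int) -> Int? {
    return array.indices.contains(index) ? array[index] : nil
}

let n = 10_000_000
print(String(describing: checkedPower(n, 2)))  // fits in a 64-bit Int
print(String(describing: checkedPower(n, 5)))  // too large for a 64-bit Int, so nil

let x = [Int](repeating: 10, count: 1_000)
print(String(describing: element(of: x, at: x.count)))  // one past the end, so nil

// The wrapping operator &* opts out of the overflow check for one expression
// only; the product silently wraps around instead of trapping.
print(n &* n &* n &* n &* n)
```

With the plain `*` operator and the plain subscript, a checked build traps at the same two points where these helpers return nil.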

Hence the question: how can we get reasonable performance in Swift without losing the safety nets?


Edit 2: I did some more benchmarking, with very simple loops along the lines of

for i in 0..<n {
    x[i] = x[i] ^ 12345678
}

(Here the xor operation is there just so that I can more easily find the relevant loop in the assembly code. I tried to pick an operation that is easy to spot but also "harmless" in the sense that it should not require any checks related to integer overflows.)

Again, there was a huge difference in the performance between -O3 and -Ofast. So I had a look at the assembly code:

  • With -Ofast I get pretty much what I would expect. The relevant part is a loop with 5 machine language instructions.

  • With -O3 I get something that was beyond my wildest imagination. The inner loop spans 88 lines of assembly code. I did not try to understand all of it, but the most suspicious parts are 13 invocations of "callq _swift_retain" and another 13 invocations of "callq _swift_release". That is, 26 subroutine calls in the inner loop!
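For comparison, the same xor pass can be written without any subscripts in user code, which leaves no index check to write or to remove. This is a sketch only: `xorIndexed`, `xorMapped`, and `mask` are hypothetical names, and whether the second form compiles to a tighter loop depends on the compiler version and was not measured here.

```swift
let mask = 12345678

/// The indexed form from the question: every x[i] goes through the
/// bounds-checked array subscript.
func xorIndexed(_ input: [Int]) -> [Int] {
    var x = input
    for i in 0..<x.count {
        x[i] = x[i] ^ mask
    }
    return x
}

/// Index-free form: map visits each element directly, so the calling code
/// contains no subscript at all.
func xorMapped(_ input: [Int]) -> [Int] {
    return input.map { $0 ^ mask }
}

let sample = Array(0..<8)
print(xorIndexed(sample) == xorMapped(sample))  // both forms compute the same array
```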


Edit 3: In comments, Ferruccio asked for benchmarks that are fair in the sense that they do not rely on built-in functions (e.g. sort). I think the following program is a fairly good example:

let n = 10000
var x = [Int](repeating: 1, count: n)
for i in 0..<n {
    for j in 0..<n {
        x[i] = x[j]
    }
}

There is no arithmetic, so we do not need to worry about integer overflows. The only thing that we do is just lots of array references. And the results are here: Swift -O3 loses by a factor of almost 500 in comparison with -Ofast:

  • C++ -O3: 0.05 s
  • C++ -O0: 0.4 s
  • Java: 0.2 s
  • Python with PyPy: 0.5 s
  • Python: 12 s
  • Swift -Ofast: 0.05 s
  • Swift -O3: 23 s
  • Swift -O0: 443 s

(If you are concerned that the compiler might optimize out the pointless loops entirely, you can change it to e.g. x[i] ^= x[j], and add a print statement that outputs x[0]. This does not change anything; the timings will be very similar.)
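The optimizer-resistant variant described in the parenthetical might look like this. The function name `nestedXor` and the smaller problem size are assumptions made so that an unoptimized build finishes quickly; the question itself uses n = 10000 at top level.

```swift
/// Hypothetical wrapper around the nested-loop benchmark, using x[i] ^= x[j]
/// so that every iteration contributes to the final contents of x.
func nestedXor(n: Int) -> [Int] {
    var x = [Int](repeating: 1, count: n)
    for i in 0..<n {
        for j in 0..<n {
            x[i] ^= x[j]
        }
    }
    return x
}

// n = 1_000 gives 10^6 inner iterations; the question uses 10000 (10^8).
let result = nestedXor(n: 1_000)
// Printing an element makes the loops observable, so the compiler cannot
// discard them as dead code.
print(result[0])
```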

And yes, here the Python implementation was a stupid pure Python implementation with a list of ints and nested for loops. It should be much slower than unoptimized Swift. Something seems to be seriously broken with Swift and array indexing.


Edit 4: These issues (as well as some other performance issues) seem to have been fixed in Xcode 6 beta 5.

For sorting, I now have the following timings:

  • clang++ -O3: 0.06 s
  • swiftc -Ofast: 0.1 s
  • swiftc -O: 0.1 s
  • swiftc: 4 s

For nested loops:

  • clang++ -O3: 0.06 s
  • swiftc -Ofast: 0.3 s
  • swiftc -O: 0.4 s
  • swiftc: 540 s

It seems that there is no reason anymore to use the unsafe -Ofast (aka -Ounchecked); plain -O produces equally good code.

