Fastest way to determine if an integer is between two integers (inclusive) with known sets of values

Source: Fastest way to determine if an integer is between two integers (inclusive) with known sets of values

Is there a faster way than x >= start && x <= end in C or C++ to test if an integer is between two integers?

UPDATE: My specific platform is iOS. This is part of a box blur function that restricts pixels to a circle in a given square.

UPDATE: After trying the accepted answer, I got an order-of-magnitude speedup on that one line of code compared with doing it the normal x >= start && x <= end way.

UPDATE: Here is the code after and before the change, with the assembly generated by Xcode:

NEW WAY

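// diff = (end - start) + 1
// The comparison is unsigned (note the "blo" branch below), so when p < range.start
// the subtraction wraps to a huge value and the test fails.
#define POINT_IN_RANGE_AND_INCREMENT(p, range) ((p++ - range.start) < range.diff)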

Ltmp1313:
 ldr    r0, [sp, #176] @ 4-byte Reload
 ldr    r1, [sp, #164] @ 4-byte Reload
 ldr    r0, [r0]
 ldr    r1, [r1]
 sub.w  r0, r9, r0
 cmp    r0, r1
 blo    LBB44_30

OLD WAY

#define POINT_IN_RANGE_AND_INCREMENT(p, range) (p <= range.end && p++ >= range.start)

Ltmp1301:
 ldr    r1, [sp, #172] @ 4-byte Reload
 ldr    r1, [r1]
 cmp    r0, r1
 bls    LBB44_32
 mov    r6, r0
 b      LBB44_33
LBB44_32:
 ldr    r1, [sp, #188] @ 4-byte Reload
 adds   r6, r0, #1
Ltmp1302:
 ldr    r1, [r1]
 cmp    r0, r1
 bhs    LBB44_36
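As a sketch only, here are both macros rewritten as inline C++ functions so they can be type-checked and tested. The PixelRange struct, make_range, and the function names are hypothetical (the question never shows the range type); the unsigned field types are inferred from the unsigned "blo" branch in the new assembly. One behavioral difference is visible in the macros themselves: because && short-circuits, the old version increments p only when p <= range.end, while the new version always increments it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the question's `range` type. The field names match
// the macros; the 32-bit unsigned types are an assumption.
struct PixelRange {
    std::uint32_t start;  // inclusive lower bound
    std::uint32_t end;    // inclusive upper bound
    std::uint32_t diff;   // (end - start) + 1, precomputed once
};

// Assumes start <= end and that (end - start) + 1 does not wrap to 0.
constexpr PixelRange make_range(std::uint32_t start, std::uint32_t end) {
    return PixelRange{start, end, (end - start) + 1};
}

// NEW WAY: one subtraction and one unsigned comparison. If p < start, the
// subtraction wraps to a huge value and the test fails. p is always incremented.
inline bool point_in_range_and_increment_new(std::uint32_t& p, const PixelRange& r) {
    return (p++ - r.start) < r.diff;
}

// OLD WAY: two comparisons. Because && short-circuits, p is incremented only
// when p <= r.end, exactly as in the original macro.
inline bool point_in_range_and_increment_old(std::uint32_t& p, const PixelRange& r) {
    return p <= r.end && p++ >= r.start;
}
```

If code after the test relies on p advancing by exactly one per call, the two macros are not drop-in replacements for each other.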

It's pretty amazing how reducing or eliminating branching can provide such a dramatic speedup.


Post #1

Reference: https://stackoom.com/question/19jH2/確定整數是否在具有已知值集的兩個整數-包括-之間的最快方法


Post #2

It depends on how many times you want to perform the test over the same data.

If you are performing the test a single time, there probably isn't a meaningful way to speed up the algorithm.

If you are doing this for a small, fixed set of values, then you could create a lookup table. Performing the indexing might be more expensive, but if you can fit the entire table in cache, then you can remove all branching from the test, which should speed things up.

For your data the lookup table would have 128^3 = 2,097,152 entries (one for every combination of start, end and value). If you can control one of the three variables so that you consider all instances where start = N at one time, then the size of the working set drops to 128^2 = 16,384 bytes (at one byte per entry), which should fit well in most modern caches.
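A minimal sketch of the fixed-start table described above, assuming every value (start, end and x) lies in 0..127 and each entry takes one byte. FixedStartTable, build_table and in_range_lookup are illustrative names, not from the answer. The table has to be rebuilt whenever start changes; that is the price of keeping the working set at 16 KiB.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: start, end and x are all in [0, 127].
constexpr int kValues = 128;

// Table for ONE fixed start = N: table[end][x] is 1 when N <= x <= end.
// 128 * 128 one-byte entries = 16384 bytes.
struct FixedStartTable {
    std::uint8_t table[kValues][kValues];
};

static_assert(sizeof(FixedStartTable) == 16384, "128^2 one-byte entries");

// Precomputes every answer for the given start (done once per start value).
FixedStartTable build_table(int start) {
    FixedStartTable t{};
    for (int end = 0; end < kValues; ++end)
        for (int x = 0; x < kValues; ++x)
            t.table[end][x] = (start <= x && x <= end) ? 1 : 0;
    return t;
}

// The range test itself is a single indexed load with no comparison branches.
inline bool in_range_lookup(const FixedStartTable& t, int x, int end) {
    return t.table[end][x] != 0;
}
```

Whether one load from a cached table beats two register comparisons is exactly what the answer says has to be benchmarked.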

You would still have to benchmark the actual code to see if a branchless lookup table is sufficiently faster than the obvious comparisons.


Post #3

There's an old trick to do this with only one comparison/branch. Whether it will really improve speed may be open to question, and even if it does, the gain is probably too small to notice or care about; when you're only starting with two comparisons, the chances of a huge improvement are pretty remote. The code looks like:

// use < for an inclusive lower bound and exclusive upper bound
// use <= for an inclusive lower bound and inclusive upper bound
// alternatively, if the upper bound is inclusive and you can pre-calculate
//  upper-lower, simply add 1 to upper-lower and use the < operator.
// casting each operand before subtracting keeps the arithmetic unsigned (no signed overflow)
    if ((unsigned)number - (unsigned)lower <= (unsigned)upper - (unsigned)lower)
        in_range(number);

On a typical modern computer (i.e., anything using two's complement), the conversion to unsigned is really a no-op: just a change in how the same bits are viewed.

Note that in a typical case, you can pre-compute upper-lower outside a (presumed) loop, so that doesn't normally contribute any significant time. Along with reducing the number of branch instructions, this also (generally) improves branch prediction. In this case, the same branch is taken whether the number is below the bottom end or above the top end of the range.
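A sketch of that loop pattern, assuming an array of ints and lower <= upper; count_in_range is an illustrative name. The span upper - lower is computed once before the loop, so each element costs one subtraction and one unsigned comparison.

```cpp
#include <cassert>
#include <cstddef>

// Counts values in [lower, upper], inclusive. Assumes lower <= upper.
std::size_t count_in_range(const int* values, std::size_t n, int lower, int upper) {
    // Hoisted out of the loop: computed once, not per element.
    const unsigned span = static_cast<unsigned>(upper) - static_cast<unsigned>(lower);
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // Values below lower wrap around to huge unsigned numbers and fail the test,
        // just like values above upper: both outcomes take the same branch.
        if (static_cast<unsigned>(values[i]) - static_cast<unsigned>(lower) <= span)
            ++count;
    }
    return count;
}
```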

As to how this works, the basic idea is pretty simple: a negative number, when viewed as an unsigned number, will be larger than anything that started out as a positive number.

In practice, this method translates number and the interval to the origin and checks whether number is in the interval [0, D], where D = upper - lower. If number is below the lower bound, number - lower is negative, which viewed as unsigned is larger than D; if number is above the upper bound, number - lower is positive but larger than D.
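The reasoning above can be checked at compile time. This is a sketch: in_closed_range is a hypothetical helper, and it casts each operand to unsigned before subtracting so the wraparound is well-defined modular arithmetic rather than signed overflow. The static_asserts walk through lower = 10, upper = 20, so D = 10.

```cpp
#include <climits>

// Inclusive test lower <= number <= upper with a single comparison.
// Assumes lower <= upper.
constexpr bool in_closed_range(int number, int lower, int upper) {
    return static_cast<unsigned>(number) - static_cast<unsigned>(lower)
           <= static_cast<unsigned>(upper) - static_cast<unsigned>(lower);
}

// Worked example with lower = 10, upper = 20, D = 10:
//   number = 15 -> 15 - 10 = 5                -> 5 <= 10, in range
//   number =  9 ->  9 - 10 = -1 -> UINT_MAX   -> fails, below lower
//   number = 21 -> 21 - 10 = 11               -> 11 > 10, above upper
static_assert(static_cast<unsigned>(9) - static_cast<unsigned>(10) == UINT_MAX,
              "a 'negative' difference wraps to a huge unsigned value");
static_assert(in_closed_range(15, 10, 20), "inside");
static_assert(!in_closed_range(9, 10, 20), "below lower");
static_assert(!in_closed_range(21, 10, 20), "above upper");
```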


Post #4

It's rare to be able to make significant optimizations to code on such a small scale. Big performance gains come from observing and modifying the code at a higher level. You may be able to eliminate the need for the range test altogether, or do only O(n) of them instead of O(n^2). You may be able to reorder the tests so that one side of the inequality is always implied. Even if the algorithm is ideal, gains are more likely to come when you see that this code does the range test 10 million times and you find a way to batch the tests and use SSE to do many of them in parallel.
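To illustrate the batching idea, here is a sketch that counts in-range values four at a time with SSE2 intrinsics from <emmintrin.h>. The function name and interface are hypothetical. iOS devices use ARM processors, where NEON would be the equivalent SIMD extension; on targets without SSE2 the #if guard simply falls back to the scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Counts values v in data[0..n) with lo <= v && v <= hi.
std::size_t count_in_range_sse2(const std::int32_t* data, std::size_t n,
                                std::int32_t lo, std::int32_t hi) {
    std::size_t count = 0;
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128i vlo = _mm_set1_epi32(lo);
    const __m128i vhi = _mm_set1_epi32(hi);
    const __m128i one = _mm_set1_epi32(1);
    __m128i acc = _mm_setzero_si128();  // four 32-bit per-lane counters (fine while n < 2^33)
    for (; i + 4 <= n; i += 4) {
        const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        // All-ones in lanes where v < lo or v > hi (signed 32-bit compares).
        const __m128i out = _mm_or_si128(_mm_cmplt_epi32(v, vlo), _mm_cmpgt_epi32(v, vhi));
        // (~out) & 1 is 1 in in-range lanes and 0 elsewhere; no branches involved.
        acc = _mm_add_epi32(acc, _mm_andnot_si128(out, one));
    }
    alignas(16) std::int32_t lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), acc);
    count = static_cast<std::size_t>(lanes[0]) + lanes[1] + lanes[2] + lanes[3];
#endif
    // Scalar tail (and the whole job when SSE2 is unavailable).
    for (; i < n; ++i)
        count += (lo <= data[i] && data[i] <= hi) ? 1 : 0;
    return count;
}
```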


Post #5

Is it not possible to just perform a bitwise operation on the integer?

Since it has to be between 0 and 128, if the 8th bit (2^7) or any higher bit is set, it is 128 or more. The edge case will be a pain, though, since you want an inclusive comparison.
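A sketch of the bitwise idea, assuming the value is an int and the range starts at 0; the function names are illustrative. A single mask test covers 0..127 exactly. The inclusive 128 case is where a pure mask stops working, and there the single unsigned comparison trick described earlier (with lower = 0) is the simpler fix.

```cpp
// For 0 <= x <= 127 every bit from 2^7 upward must be clear, so one AND
// against ~0x7F replaces both comparisons. Viewing x as unsigned also
// rejects negative values, whose high bits are set.
constexpr bool in_0_to_127(int x) {
    return (static_cast<unsigned>(x) & ~0x7Fu) == 0;
}

// The inclusive edge case 0 <= x <= 128 no longer maps onto a bit mask;
// with a lower bound of 0 the unsigned trick reduces to one comparison.
constexpr bool in_0_to_128(int x) {
    return static_cast<unsigned>(x) <= 128u;
}
```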


Post #6

This answer reports on testing done with the accepted answer. I performed a closed-range test on a large vector of sorted random integers, and to my surprise the basic method (low <= num && num <= high) is in fact faster than the accepted answer above! The test was done on an HP Pavilion g6 (AMD A6-3400APU with 6 GB RAM). Here's the core code used for testing:

#include <chrono>
#include <cstdlib>
#include <vector>
using namespace std;

// Assumed context: randVec is a sorted vector<int> holding MaxNum random values.
int num = rand();  // num to compare in consecutive ranges.
auto start = chrono::system_clock::now();

int inBetween1{ 0 };
for (int i = 1; i < MaxNum; ++i)
{
    if (randVec[i - 1] <= num && num <= randVec[i])
        ++inBetween1;
}
auto end = chrono::system_clock::now();
chrono::duration<double> elapsed_s1 = end - start;

compared with the following, which is the accepted answer above:

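auto start2 = chrono::system_clock::now();
int inBetween2{ 0 };
for (int i = 1; i < MaxNum; ++i)
{
    if (static_cast<unsigned>(num - randVec[i - 1]) <= static_cast<unsigned>(randVec[i] - randVec[i - 1]))
        ++inBetween2;
}
auto end2 = chrono::system_clock::now();
chrono::duration<double> elapsed_s2 = end2 - start2;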

Note that randVec is a sorted vector. For any size of MaxNum, the first method beats the second one on my machine!
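A self-contained sketch for reproducing this comparison: both loops as functions plus a small timer. The function names are hypothetical, and the one-comparison version casts each operand to unsigned before subtracting, a small deviation from the posted code that avoids signed overflow for arbitrary ints. Timings will depend on compiler, flags, CPU and data; no results are implied here.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Basic method: two comparisons per consecutive pair (v[i-1], v[i]).
int count_two_compares(const std::vector<int>& v, int num) {
    int n = 0;
    for (std::size_t i = 1; i < v.size(); ++i)
        if (v[i - 1] <= num && num <= v[i])
            ++n;
    return n;
}

// One-comparison method. Assumes v is sorted ascending, so v[i] - v[i-1] >= 0.
// Operands are cast to unsigned before subtracting (modular, no signed overflow).
int count_one_compare(const std::vector<int>& v, int num) {
    int n = 0;
    for (std::size_t i = 1; i < v.size(); ++i)
        if (static_cast<unsigned>(num) - static_cast<unsigned>(v[i - 1]) <=
            static_cast<unsigned>(v[i]) - static_cast<unsigned>(v[i - 1]))
            ++n;
    return n;
}

// Runs f once and returns the elapsed time in seconds.
// steady_clock is monotonic, which makes it a safer timing clock than system_clock.
template <typename F>
double time_seconds(F f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

When timing, store each count somewhere observable (for example in a volatile variable) so the optimizer cannot discard the loop, and check that both methods return the same count before comparing their speed.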
