七種bit count快速計算方法比較

轉自：http://blog.chinaunix.net/u/13991/showart_115947.html http://idning.iteye.com/blog/732769

代碼：http://infolab.stanford.edu/~manku/bitcount/bitcount.c

Fast Bit Counting Routines

Compiled from various sources by Gurmeet Singh Manku

A common problem asked in job interviews is to count the number of bits that are on in an unsigned integer. Here are seven solutions to this problem. Source code in C is available.

Iterated Count

   int bitcount (unsigned int n)  
   {
      int count=0;    
      while (n)
      {
         count += n & 0x1u ;
         n >>= 1 ;
      }
      return count ;
   }

Sparse Ones

   int bitcount (unsigned int n)  
   {  
      int count=0 ;
      while (n)
      {
         count++ ;
         n &= (n - 1) ;
      }
      return count ;
   }

Dense Ones

   int bitcount (unsigned int n)     
   {
      int count = 8 * sizeof(int) ;
      n ^= (unsigned int) -1 ;
      while (n)
      {
         count-- ;
         n &= (n - 1) ;
      }
      return count ;
   }

Precompute_8bit

   // static int bits_in_char [256] ;           
   
   int bitcount (unsigned int n)
   {
      // works only for 32-bit ints
    
      return bits_in_char [n         & 0xffu]
          +  bits_in_char [(n >>  8) & 0xffu]
          +  bits_in_char [(n >> 16) & 0xffu]
          +  bits_in_char [(n >> 24) & 0xffu] ;
   }

Iterated Count runs in time proportional to the total number of bits. It simply loops through all the bits, terminating slightly earlier because of the while condition. Useful if 1's are sparse and among the least significant bits.Sparse Ones runs in time proportional to the number of 1 bits. The line n &= (n - 1) simply sets the rightmost 1 bit in n to 0. Dense Ones runs in time proportional to the number of 0 bits. It is the same as Sparse Ones, except that it first toggles all bits (n ~= -1), and continually subtracts the number of 1 bits from sizeof(int).Precompute_8bit assumes an array bits_in_char such that bits_in_char[i] contains the number of 1 bits in the binary representation for i. It repeatedly updates count by masking out the last eight bits in n, and indexing into bits_in_char.

Precompute_16bit

  // static char bits_in_16bits [0x1u << 16] ;      

  int bitcount (unsigned int n)
  {
     // works only for 32-bit ints
    
     return bits_in_16bits [n         & 0xffffu]
         +  bits_in_16bits [(n >> 16) & 0xffffu] ;
  }

Precompute_16bit is a variant of Precompute_8bit in that an array bits_in_16bits[] stores the number of 1 bits in successive 16 bit numbers (shorts).

Parallel Count

  #define TWO(c)     (0x1u << (c))
  #define MASK(c)    (((unsigned int)(-1)) / (TWO(TWO(c)) + 1u))
  #define COUNT(x,c) ((x) & MASK(c)) + (((x) >> (TWO(c))) & MASK(c))
     
  int bitcount (unsigned int n)
  {
     n = COUNT(n, 0) ;
     n = COUNT(n, 1) ;
     n = COUNT(n, 2) ;
     n = COUNT(n, 3) ;
     n = COUNT(n, 4) ;
     /* n = COUNT(n, 5) ;    for 64-bit integers */
     return n ;
  }

Parallel Count carries out bit counting in a parallel fashion. Consider n after the first line has finished executing. Imagine splitting n into pairs of bits. Each pair contains the number of ones in those two bit positions in the original n. After the second line has finished executing, each nibble contains thenumber of ones in those four bits positions in the original n. Continuing this for five iterations, the 64 bits contain the number of ones among these sixty-four bit positions in the original n. That is what we wanted to compute.

Nifty Parallel Count

 #define MASK_01010101 (((unsigned int)(-1))/3)
 #define MASK_00110011 (((unsigned int)(-1))/5)
 #define MASK_00001111 (((unsigned int)(-1))/17)

 int bitcount (unsigned int n)
 {
    n = (n & MASK_01010101) + ((n >> 1) & MASK_01010101) ; 
    n = (n & MASK_00110011) + ((n >> 2) & MASK_00110011) ; 
    n = (n & MASK_00001111) + ((n >> 4) & MASK_00001111) ; 
    return n % 255 ;
 }

Nifty Parallel Count works the same way as Parallel Count for the first three iterations. At the end of the third line (just before the return), each byte of n contains the number of ones in those eight bit positions in the original n. A little thought then explains why the remainder modulo 255 works.

MIT HACKMEM Count

 int bitcount(unsigned int n)                          
 {
    /* works for 32-bit numbers only    */
    /* fix last line for 64-bit numbers */
    
    register unsigned int tmp;
    
    tmp = n - ((n >> 1) & 033333333333)
            - ((n >> 2) & 011111111111);
    return ((tmp + (tmp >> 3)) & 030707070707) % 63;
 }

MIT Hackmem Count is funky. Consider a 3 bit number as being 4a+2b+c. If we shift it right 1 bit, we have 2a+b. Subtracting this from the original gives 2a+b+c. If we right-shift the original 3-bit number by two bits, we get a, and so with another subtraction we have a+b+c, which is the number of bits in the original number. How is this insight employed? The first assignment statement in the routine computestmp. Consider the octal representation of tmp. Each digit in the octal representation is simply the number of 1's in the corresponding three bit positions in n. The last return statement sums these octal digits to produce the final answer. The key idea is to add adjacent pairs of octal digits together and then compute the remainder modulus 63. This is accomplished by right-shifting tmp by three bits, adding it to tmp itself and ANDing with a suitable mask. This yields a number in which groups of six adjacent bits (starting from the LSB) contain the number of 1's among those six positions in n. This number modulo 63 yields the final answer. For 64-bit numbers, we would have to add triples of octal digits and use modulus 1023. This is HACKMEM 169, as used in X11 sources. Source: MIT AI Lab memo, late 1970's.

     No Optimization         Some Optimization       Heavy Optimization

  Precomp_16 52.94 Mcps    Precomp_16 76.22 Mcps    Precomp_16 80.58 Mcps  
   Precomp_8 29.74 Mcps     Precomp_8 49.83 Mcps     Precomp_8 51.65 Mcps
    Parallel 19.30 Mcps      Parallel 36.00 Mcps      Parallel 38.55 Mcps
         MIT 16.93 Mcps           MIT 17.10 Mcps         Nifty 31.82 Mcps
       Nifty 12.78 Mcps         Nifty 16.07 Mcps           MIT 29.71 Mcps
      Sparse  5.70 Mcps        Sparse 15.01 Mcps        Sparse 14.62 Mcps
       Dense  5.30 Mcps        Dense  14.11 Mcps         Dense 14.56 Mcps
    Iterated  3.60 Mcps      Iterated  3.84 Mcps      Iterated  9.24 Mcps

  Mcps = Million counts per second

Which of the several bit counting routines is the fastest? Results of speed trials on an i686 are summarized in the table on left. "No Optimization" was compiled with plain gcc. "Some Optimizations" was gcc -O3. "Heavy Optimizations" corresponds to gcc -O3 -mcpu=i686 -march=i686 -fforce-addr -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -malign-functions=4.
Thanks to Seth Robertson who suggested performing speed trials by extending bitcount.c. Seth also pointed me to MIT_Hackmem routine. Thanks to Denny Gursky who suggested the idea of Precompute_11bit. That would require three sums (11-bit, 11-bit and 10-bit precomputed counts). I then tried Precompute_16bit which turned out to be even faster.

If you have niftier solutions up your sleeves, please send me an e-mail

問題1：

 #define MASK_01010101 (((unsigned int)(-1))/3)
 #define MASK_00110011 (((unsigned int)(-1))/5)
 #define MASK_00001111 (((unsigned int)(-1))/17)

這裏的(unsigned int) (-1) /3爲啥是01010101的樣子？？？

做了下實驗:

>>> bin(int('11111111',2)/3)
'0b1010101'

>>> bin(int('11111111',2)/5)
'0b110011'

這個只是一個小技巧，具體在紙上除一下：

001

101 | 11111111
101

00110

101 | 11111111
101

101

問題2：

n % 255 ;

n%63

這裏的%63 是什麼作用？？

1. 假設最後結果n爲:
000111 001111
   b     a
n = b*64+a
= 63b + (a+b)
所以
n%63 = [63b + (a+b)]%63
     = 63b % 63 + (a+b) % 63   根據模的性質（(a%m + b%m)%m = (a+b)%m）
   = (a+b)
2. 假設結果n爲:
000011 000111 001111
   c     b      a

n = c*64² + b*64 + a
= c*(64²-1+1) + 64b + a
   = c*(64²-1) + c + 64b + a
   = c*(64-1)(64+1) + c + 64b + a
   = c*65*63 + 63b + (a + b + c )
所以 n%63 = a+b+c
3. 現在我們看64⁴, 64⁵ ...
64⁴= (64⁴-1 +1) = (64⁴ -1 ) + 1
而(64⁴ - 1) 一定可以分解爲(64⁴ - 1) *... ，必然能被63整除.
所以n % 63 = n的64進制各個數位上的數字之和.
這也解釋了爲什麼必須是63, 當數字是用64進製表示的時候，就只能選擇64-1 = 63

模的基本性質:
   (a + b) % n = (a % n + b % n) % n            （1）
   (a - b) % n = (a % n - b % n) % n            （2）
   (a * b) % n = (a % n * b % n) % n            （3）

StevenIsSnail

發佈了114 篇原創文章 · 獲贊 15 · 訪問量 29萬+

私信關注

七種bit count快速計算方法比較

Fast Bit Counting Routines

問題1：

問題2：

電子科技大學計算機科學與技術就讀體驗

Golang爬蟲代理接入的技術與實踐

軟件系統結構與開發環境

【總結】JAVA多線程與併發學習總結分析

[multi]set/map/table/hash 及海量數據相關問題

【並行計算】Linus：爲何對象引用計數必須是原子的

網絡協議棧和tcpdump抓包練習

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結