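For background on swap randomization, see: http://blog.csdn.net/lgnlgn/article/details/5936945
This is essentially a direct translation of the author's Perl source, which is available here: http://www.cs.helsinki.fi/hiit_bru/software/swaps/ . The author uses the self-loop variant: an attempted swap that fails still counts as one step, and the chain simply stays in its current state.
I don't know Perl, but I managed to follow the logic. Here is the Python source: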
```python
import sys
import random

# Global state, kept as in the author's Perl script (ported to Python 3).
# Assumes no item id appears twice in the same row.
g = {}      # (row, item) -> edge index: the database viewed as a graph
iref = []   # iref[e]: item id of edge e
jref = []   # jref[e]: row number of edge e
n = 0       # number of edges, i.e. number of 1s in the 0/1 matrix
rowc = 0    # number of rows read


def swap():
    """Attempt one local swap; return 1 on success, 0 on failure."""
    a = random.randint(0, n - 1)
    b = random.randint(0, n - 1)
    aj, ai = jref[a], iref[a]
    bj, bi = jref[b], iref[b]
    # Swap only if neither new edge already exists; this also rejects a == b,
    # two edges in the same row, and two edges with the same item.
    if (aj, bi) not in g and (bj, ai) not in g:
        # delete edges
        del g[(aj, ai)]
        del g[(bj, bi)]
        # add edges
        g[(aj, bi)] = a
        g[(bj, ai)] = b
        # exchange the item ids of edges a and b
        iref[a] = bi
        iref[b] = ai
        return 1
    return 0


def dump(path):
    """Write the current database: one line per row, item ids sorted."""
    rows = [[] for _ in range(rowc)]
    for e in range(n):
        rows[jref[e]].append(iref[e])
    with open(path, "w") as f:
        for row in rows:
            f.write(" ".join(map(str, sorted(row))) + "\n")


def main(dbpath, prefix, iterlen, loops):
    global n, rowc
    with open(dbpath) as f:
        for line in f:
            for item in line.split():
                i = int(item)
                g[(rowc, i)] = n
                jref.append(rowc)
                iref.append(i)
                n += 1
            rowc += 1
    swaps = 0
    for i in range(iterlen * loops + 1):
        if i % iterlen == 0:
            # snapshot number is i // iterlen (see the bug note at the end)
            dump("%s.%d.dat" % (prefix, i // iterlen))
            if i > 0:
                print("%d\t%d\t%.5f\t%.5f" % (i, swaps, swaps / i, swaps / n))
            else:
                print("0 0 0 0")
        swaps += swap()


if __name__ == "__main__":
    if len(sys.argv) != 5:
        print("usage: EXE dbpath prefix iterlen loops")
    else:
        main(sys.argv[1], sys.argv[2], int(sys.argv[3]), int(sys.argv[4]))
```
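I didn't keep all of the original variable names. jref stores the row number of each item occurrence, iref stores the item id, and the database is treated as a graph g: a map keyed by (row number, item id) whose value is the index of that edge.
The program first reads the dataset into memory, into the three structures above.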
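To make the layout concrete, here is a minimal sketch of the loading step on an in-memory list of lines. The function name `load` and the three toy rows are hypothetical, chosen only for illustration; the script itself reads from a file.

```python
def load(lines):
    """Build g, iref, jref from whitespace-separated rows of integer item ids."""
    g, iref, jref = {}, [], []
    for row, line in enumerate(lines):
        for item in line.split():
            i = int(item)
            g[(row, i)] = len(iref)   # edge index of this (row, item) pair
            jref.append(row)          # row number of the edge
            iref.append(i)            # item id of the edge
    return g, iref, jref


# hypothetical toy database with three rows
g, iref, jref = load(["1 3", "2", "1 2 3"])
print(jref)       # [0, 0, 1, 2, 2, 2]
print(iref)       # [1, 3, 2, 1, 2, 3]
print(g[(2, 3)])  # 5
```

Each 1 in the 0/1 matrix becomes one edge index e, and (jref[e], iref[e]) is its position.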
It then repeatedly performs local swap operations; swap() returns 1 if the swap succeeds and 0 otherwise.
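A minimal sketch of a single self-loop swap step on a toy database, assuming the same g/iref/jref layout as the script above. The explicit `rng` parameter, the `margins` helper and the toy rows are my own additions for illustration. It checks the properties that make the swap useful: row counts and item counts stay unchanged, and g still maps every (row, item) pair back to its edge index.

```python
import random
from collections import Counter


def swap(g, iref, jref, rng):
    """One self-loop swap attempt: returns 1 if two edges were exchanged, else 0."""
    n = len(iref)
    a, b = rng.randrange(n), rng.randrange(n)
    aj, ai, bj, bi = jref[a], iref[a], jref[b], iref[b]
    if (aj, bi) in g or (bj, ai) in g:
        return 0  # failed attempt: the chain stays where it is (self-loop)
    del g[(aj, ai)]
    del g[(bj, bi)]
    g[(aj, bi)] = a
    g[(bj, ai)] = b
    iref[a], iref[b] = bi, ai
    return 1


def margins(iref, jref):
    """Row sums and item (column) sums of the 0/1 matrix."""
    return Counter(jref), Counter(iref)


# hypothetical toy database: rows 0..3 over items 1..4
rows = [[1, 2], [2, 3], [3, 4], [1, 4]]
g, iref, jref = {}, [], []
for r, items in enumerate(rows):
    for i in items:
        g[(r, i)] = len(iref)
        jref.append(r)
        iref.append(i)

before = margins(iref, jref)
rng = random.Random(42)
swaps = sum(swap(g, iref, jref, rng) for _ in range(1000))
after = margins(iref, jref)
consistent = len(g) == len(iref) and all(
    g[(jref[e], iref[e])] == e for e in range(len(iref))
)
print(before == after, consistent)  # True True
```

Because only the item ids of two edges in different rows are exchanged, the margins are preserved by construction; the membership test is what prevents duplicate (row, item) pairs.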
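Every iterlen steps, it writes out the swapped database and prints a line with the step count, the number of successful swaps, swaps per step, and swaps per edge.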
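On my laptop, the Python and Perl versions took about the same time, and the Python version used somewhat less memory. I also tried PyPy 1.4, which was a bit faster but used more memory.
The author also has a C implementation, which is surely more efficient.
I plan to try this in JUNG, doing the swaps directly on the graph structure. Swapping directed edges is also worth trying.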
-------------------
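I just noticed a bug in the original version: open("%s.%d.dat" %(prefix, i%iterlen) , 'w')
The % should be / (integer division, written // in Python 3). Otherwise every snapshot is written to file 0.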
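A quick check of why the modulo is wrong, using hypothetical values iterlen = 100 and loops = 3. Output only happens when i % iterlen == 0, so reusing that same expression in the file name always yields 0.

```python
iterlen, loops = 100, 3  # hypothetical values

# steps at which a snapshot is written
checkpoints = [i for i in range(iterlen * loops + 1) if i % iterlen == 0]
buggy = [i % iterlen for i in checkpoints]   # always 0: each snapshot overwrites prefix.0.dat
fixed = [i // iterlen for i in checkpoints]  # one file number per snapshot
print(checkpoints)  # [0, 100, 200, 300]
print(buggy)        # [0, 0, 0, 0]
print(fixed)        # [0, 1, 2, 3]
```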
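I also implemented it in Java. It runs more than twice as fast as the Python version, but when memory is constrained and GC runs frequently, it is not much faster.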