The formula for the harmonic mean [1]:
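For reference, this is the standard textbook definition for n positive values x_1, ..., x_n. The symbols H, n and x_i are notation chosen here, not taken from the original post:

```latex
H = \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}}
  = \frac{n}{\sum_{i=1}^{n} \dfrac{1}{x_i}}
```

In words: take the reciprocal of every value, average the reciprocals, then take the reciprocal of that average.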
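The definition is simple, but what is it actually good for? The blog posts I found online don't say. [2] studies DASH video streaming, that is, how to request a suitable bitrate based on the predicted bandwidth. Predicting bandwidth from historical samples is where the harmonic mean comes in handy. The paper says the method effectively filters out outliers: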
First, the harmonic mean is more appropriate when we want to compute the average of rates which is the case with throughput estimation. Second, it is also more robust to larger outliers.
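Both claims in the quote can be checked with a small sketch. The numbers below are made up for illustration (they are not from [2]), and the code assumes Python 3.6+ so that `statistics.harmonic_mean` is available:

```python
from statistics import harmonic_mean, mean

# Assumed scenario: two equally sized chunks (8 Mb each) fetched at 1 Mbps and 4 Mbps.
chunk_megabits = 8.0
rates_mbps = [1.0, 4.0]

# Ground truth: total data divided by total download time (8 s + 2 s).
total_time_s = sum(chunk_megabits / r for r in rates_mbps)
true_throughput = chunk_megabits * len(rates_mbps) / total_time_s

print("true throughput:", true_throughput)
print("harmonic mean:  ", harmonic_mean(rates_mbps))
print("arithmetic mean:", mean(rates_mbps))

# Assumed samples with one large outlier.
with_outlier = [2.0, 2.0, 2.0, 2.0, 100.0]
print("harmonic with outlier:  ", harmonic_mean(with_outlier))
print("arithmetic with outlier:", mean(with_outlier))
```

For the two chunks, the harmonic mean matches the true throughput of 1.6 Mbps, while the arithmetic mean gives 2.5 Mbps. With the outlier, the harmonic mean stays at about 2.49 and the arithmetic mean jumps to 21.6.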
Python implementation:
from collections import deque


class HarmonicMean(object):
    """Harmonic mean over a sliding window of the most recent positive samples."""

    def __init__(self, window):
        self.w = window
        # Scaled reciprocals (1000 / sample); the deque drops the oldest entry
        # automatically once it holds `window` items.
        self.his = deque(maxlen=window)

    def newSample(self, s):
        sample = float(s)
        # Non-positive samples have no usable reciprocal, so they are skipped.
        if sample > 0:
            self.his.append(1000 / sample)
        if not self.his:
            # No positive sample seen yet: return the input instead of dividing by zero.
            return sample
        return len(self.his) * 1000 / sum(self.his)


h = HarmonicMean(20)
fileName = "data_in.txt"
with open(fileName) as txtData, open("data_out.txt", "w") as f_h:
    for line in txtData:
        lineArr = line.strip().split()
        if len(lineArr) < 3:
            continue  # skip blank or malformed lines
        x = lineArr[0]
        y = float(lineArr[2])  # the third column holds the sample value
        mean = h.newSample(y)
        f_h.write(x + "\t" + str(mean) + "\n")
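To try the sliding-window idea without an input file, here is a minimal self-contained sketch. The helper `sliding_harmonic_mean` and the sample trace are hypothetical and chosen only for illustration. It assumes Python 3.6+ for `statistics.harmonic_mean`:

```python
from collections import deque
from statistics import harmonic_mean


def sliding_harmonic_mean(samples, window):
    """Return the harmonic mean of the last `window` positive samples after each sample."""
    recent = deque(maxlen=window)
    out = []
    for s in samples:
        if s > 0:
            recent.append(float(s))
        # Before any positive sample arrives, fall back to the raw value.
        out.append(harmonic_mean(list(recent)) if recent else float(s))
    return out


# Made-up throughput samples with a single spike.
trace = [5.0, 5.0, 5.0, 50.0, 5.0, 5.0]
smoothed = sliding_harmonic_mean(trace, window=3)
for raw, est in zip(trace, smoothed):
    print(raw, round(est, 3))
```

With a window of 3, the spike of 50 only lifts the estimate to about 7.14. An arithmetic mean over the same window would give 20.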
Let's see how well it works:
[1] Harmonic mean
[2] Improving Fairness, Efficiency, and Stability in HTTP-based Adaptive Video Streaming with FESTIVE