data mining notes

For nominal attributes, the dissimilarity between two objects i and j can be computed from the mismatch ratio:

d(i,j) = (p-m)/p;

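where m is the number of matches (that is, the number of attributes for which i and j are in the same state), and p is the total number of attributes describing the objects.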

Similarity:

sim(i,j)=1-d(i,j)=m/p;
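The mismatch ratio and the corresponding similarity can be sketched in a few lines of Python. This is only a minimal illustration: the function names and the sample objects are made up for the example, and objects are assumed to be equal-length tuples of nominal values.

```python
def nominal_dissimilarity(obj_i, obj_j):
    """Mismatch ratio d(i,j) = (p - m) / p for two tuples of nominal values."""
    if len(obj_i) != len(obj_j) or len(obj_i) == 0:
        raise ValueError("objects must have the same non-zero number of attributes")
    p = len(obj_i)                                      # total number of attributes
    m = sum(1 for a, b in zip(obj_i, obj_j) if a == b)  # number of matches
    return (p - m) / p


def nominal_similarity(obj_i, obj_j):
    """Similarity sim(i,j) = 1 - d(i,j) = m / p."""
    return 1.0 - nominal_dissimilarity(obj_i, obj_j)


# Hypothetical objects with three nominal attributes; they differ in one of them.
i = ("red", "small", "round")
j = ("red", "large", "round")
print(nominal_dissimilarity(i, j))
print(nominal_similarity(i, j))
```

Here two of the three attributes match, so m=2, p=3 and the dissimilarity is 1/3.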

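For symmetric binary attributes, both states are equally important. A dissimilarity based on symmetric binary attributes is called a symmetric binary dissimilarity.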

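d(i,j)=(r+s)/(q+r+s+t);

where q is the number of attributes that equal 1 for both i and j, r is the number that equal 1 for i but 0 for j, s is the number that equal 0 for i but 1 for j, and t is the number that equal 0 for both.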

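For asymmetric binary attributes, the two states are not equally important. In the asymmetric binary dissimilarity, the number of negative matches t is considered unimportant and is left out: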

d(i,j)=(r+s)/(q+r+s);
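A small Python sketch of both binary dissimilarities follows. It assumes the usual contingency counts (q: both 1, r: 1 for i and 0 for j, s: 0 for i and 1 for j, t: both 0) and attributes coded as 0/1. The function names are illustrative, and returning 0.0 when q+r+s is zero is a convention chosen here, not something the formulas above prescribe.

```python
def binary_counts(obj_i, obj_j):
    """Return the contingency counts (q, r, s, t) for two 0/1 vectors."""
    if len(obj_i) != len(obj_j):
        raise ValueError("objects must have the same number of attributes")
    q = r = s = t = 0
    for a, b in zip(obj_i, obj_j):
        if a not in (0, 1) or b not in (0, 1):
            raise ValueError("binary attributes must be coded as 0 or 1")
        if a == 1 and b == 1:
            q += 1      # positive match
        elif a == 1:
            r += 1      # 1 for i, 0 for j
        elif b == 1:
            s += 1      # 0 for i, 1 for j
        else:
            t += 1      # negative match
    return q, r, s, t


def symmetric_binary_dissimilarity(obj_i, obj_j):
    q, r, s, t = binary_counts(obj_i, obj_j)
    return (r + s) / (q + r + s + t)


def asymmetric_binary_dissimilarity(obj_i, obj_j):
    q, r, s, _t = binary_counts(obj_i, obj_j)  # negative matches are ignored
    denom = q + r + s
    # Assumption: if only negative matches exist, treat the objects as identical.
    return (r + s) / denom if denom else 0.0


i = (1, 0, 1, 0, 0, 0)
j = (1, 0, 1, 0, 1, 0)
print(symmetric_binary_dissimilarity(i, j))
print(asymmetric_binary_dissimilarity(i, j))
```

For these two vectors q=2, r=0, s=1 and t=3, so the symmetric value is 1/6 while the asymmetric value is 1/3: dropping the three negative matches makes the single mismatch weigh more.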

Dissimilarity of numeric attributes: Euclidean distance, Manhattan distance, Minkowski distance;

Euclidean distance: d(i,j)=sqrt(power((x1-y1),2)+power((x2-y2),2)+...+power((xn-yn),2));

Manhattan distance: d(i,j)=abs(x1-y1)+abs(x2-y2)+...+abs(xn-yn);

supremum distance (also called Chebyshev distance): the maximum absolute difference over all dimensions of the two objects: d(i,j)=max(abs(x1-y1),abs(x2-y2),...,abs(xn-yn));
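The numeric distances can be sketched with the Python standard library alone. The function names are illustrative, and objects are assumed to be equal-length sequences of numbers. The Minkowski distance of order h is included as the general form: h=1 gives the Manhattan distance and h=2 the Euclidean distance.

```python
import math


def minkowski(x, y, h):
    """Minkowski distance of order h (h >= 1)."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1.0 / h)


def euclidean(x, y):
    """Square root of the summed squared differences (Minkowski with h = 2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of the absolute differences (Minkowski with h = 1)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def supremum(x, y):
    """Largest absolute difference over all dimensions."""
    return max(abs(a - b) for a, b in zip(x, y))


x = (1, 2)
y = (3, 5)
print(euclidean(x, y))   # sqrt(2**2 + 3**2) = sqrt(13)
print(manhattan(x, y))   # 2 + 3
print(supremum(x, y))    # max(2, 3)
```

For x=(1,2) and y=(3,5) the per-dimension differences are 2 and 3, so the three distances are sqrt(13), 5 and 3.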

-------------------------------------------------------

weighted Euclidean distance

Each attribute f gets its own weight wf according to its importance:

d(i,j)=sqrt(w1*power((x1-y1),2)+w2*power((x2-y2),2)+...+wn*power((xn-yn),2))
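A minimal Python sketch of the weighted Euclidean distance, assuming one non-negative weight per attribute. The function name and the sample weights are made up for illustration.

```python
import math


def weighted_euclidean(x, y, weights):
    """Euclidean distance where dimension f contributes weights[f] * (x_f - y_f)**2."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


x = (1, 2)
y = (3, 5)
weights = (0.5, 2.0)  # hypothetical weights: the second attribute counts more
print(weighted_euclidean(x, y, weights))  # sqrt(0.5*4 + 2.0*9) = sqrt(20)
```

With all weights set to 1 this reduces to the ordinary Euclidean distance.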

--------------------------------------------------------

So how can we calculate the dissimilarity between objects that have mixed attribute types?

One method is to group the attributes by type and run a separate data mining analysis for each type. In real applications, however, analyzing each attribute type individually is unlikely to produce compatible results.

A better way is to process all attribute types together in a single analysis. One such technique combines the different attributes into a single dissimilarity matrix and maps every meaningful attribute onto the common interval [0.0,1.0].

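Assume that the dataset contains p attributes of mixed type. The dissimilarity between objects i and j is defined as

d(i,j)=sum(delta(i,j,f)*d(i,j,f))/sum(delta(i,j,f)), with both sums taken over f=1..p;

where d(i,j,f) is the contribution of attribute f to the dissimilarity, scaled to [0.0,1.0], and the indicator delta(i,j,f) is 0 if the value of attribute f is missing for i or j, or if f is an asymmetric binary attribute and both values are 0; otherwise delta(i,j,f) is 1.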
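One common way to combine mixed attributes is to average the per-attribute dissimilarities, each scaled to [0.0,1.0], over the attributes that can actually be compared. The Python sketch below follows that idea under several assumptions: the type labels ("nominal", "numeric", "asymmetric_binary"), the use of None for missing values, and the ranges argument (max minus min of each numeric attribute over the dataset) are all choices made for this example, and ordinal attributes are left out for brevity.

```python
def mixed_dissimilarity(obj_i, obj_j, kinds, ranges):
    """Average of per-attribute dissimilarities over the comparable attributes.

    kinds[f]  -- "nominal", "numeric" or "asymmetric_binary" (labels assumed here)
    ranges[f] -- max - min of numeric attribute f over the dataset, else None
    """
    total = 0.0      # sum of per-attribute dissimilarities that count
    compared = 0     # number of attributes that count
    for a, b, kind, rng in zip(obj_i, obj_j, kinds, ranges):
        if a is None or b is None:
            continue                      # missing value: attribute is skipped
        if kind == "asymmetric_binary" and a == 0 and b == 0:
            continue                      # negative match: attribute is skipped
        if kind == "numeric":
            d_f = abs(a - b) / rng if rng else 0.0   # scaled to [0, 1]
        else:
            d_f = 0.0 if a == b else 1.0             # nominal or binary mismatch
        total += d_f
        compared += 1
    # Assumption: objects with no comparable attribute get dissimilarity 0.0.
    return total / compared if compared else 0.0


kinds = ("nominal", "numeric", "asymmetric_binary")
ranges = (None, 46.0, None)   # hypothetical range of the numeric attribute
i = ("red", 45.0, 1)
j = ("blue", 22.0, 0)
print(mixed_dissimilarity(i, j, kinds, ranges))  # (1 + 23/46 + 1) / 3
```

Because every per-attribute term lies in [0.0,1.0], the result does too, which is what allows different attribute types to share one dissimilarity matrix.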

-------------------------------------------------

the cosine similarity:

s(i,j)=(i*j)/(|i|*|j|)=((x1*y1)+(x2*y2)+...+(xn*yn))/(sqrt(power(x1,2)+power(x2,2)+...+power(xn,2))*sqrt(power(y1,2)+power(y2,2)+...+power(yn,2)))
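The cosine similarity can be sketched in plain Python as the dot product divided by the product of the two vector lengths. The function name and the sample term-frequency vectors are illustrative. Raising an error for a zero vector is a choice made here, since the formula is undefined in that case.

```python
import math


def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their Euclidean norms."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Hypothetical term-frequency vectors of two documents.
x = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
y = (3, 0, 2, 0, 1, 1, 0, 1, 0, 1)
print(cosine_similarity(x, y))  # 25 / (sqrt(42) * sqrt(17))
```

A value close to 1 means the two vectors point in nearly the same direction, and orthogonal vectors give 0.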

---------------------------------------------------
