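The dissimilarity between two objects i and j can be computed from the ratio of mismatches:
d(i,j) = (p-m)/p;
where m is the number of matches (that is, the number of attributes for which i and j are in the same state) and p is the total number of attributes describing the objects.
Similarity:
sim(i,j)=1-d(i,j);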
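The mismatch ratio above can be sketched in a few lines of Python. The function names are illustrative only, and each object is assumed to be a sequence of attribute values of equal length.

```python
def nominal_dissimilarity(x, y):
    """d(i,j) = (p - m) / p for two equal-length sequences of nominal values."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("objects need the same, non-zero number of attributes")
    p = len(x)                                  # total number of attributes
    m = sum(1 for a, b in zip(x, y) if a == b)  # number of matching attributes
    return (p - m) / p


def nominal_similarity(x, y):
    """sim(i,j) = 1 - d(i,j)."""
    return 1 - nominal_dissimilarity(x, y)


# Two objects described by three nominal attributes; one attribute differs.
obj_i = ("red", "small", "round")
obj_j = ("red", "large", "round")
print(nominal_dissimilarity(obj_i, obj_j))
print(nominal_similarity(obj_i, obj_j))
```

Here p = 3 and m = 2, so the dissimilarity is 1/3 and the similarity is 2/3.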
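For symmetric binary attributes, both states are equally important. Dissimilarity based on symmetric binary attributes is called symmetric binary dissimilarity:
d(i,j)=(r+s)/(q+r+s+t);
where q is the number of attributes that equal 1 for both i and j, r is the number that equal 1 for i but 0 for j, s is the number that equal 0 for i but 1 for j, and t is the number that equal 0 for both.
For asymmetric binary attributes, the two states are not equally important. In asymmetric binary dissimilarity, the number of negative matches t is considered unimportant and is left out of the denominator:
d(i,j)=(r+s)/(q+r+s);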
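A minimal sketch of both binary measures, assuming each object is a sequence of 0/1 values. The helper name binary_counts is hypothetical, and returning 0.0 when the denominator is zero is an assumption made here, not something the formulas specify.

```python
def binary_counts(x, y):
    """Return (q, r, s, t) for two equal-length 0/1 sequences."""
    q = r = s = t = 0
    for a, b in zip(x, y):
        if a == 1 and b == 1:
            q += 1          # positive match
        elif a == 1 and b == 0:
            r += 1          # 1 for i, 0 for j
        elif a == 0 and b == 1:
            s += 1          # 0 for i, 1 for j
        else:
            t += 1          # negative match
    return q, r, s, t


def symmetric_binary_dissimilarity(x, y):
    q, r, s, t = binary_counts(x, y)
    denom = q + r + s + t
    return (r + s) / denom if denom else 0.0   # assumption: 0.0 for empty input


def asymmetric_binary_dissimilarity(x, y):
    q, r, s, _ = binary_counts(x, y)           # negative matches t are ignored
    denom = q + r + s
    return (r + s) / denom if denom else 0.0   # assumption: 0.0 if only negative matches


obj_i = [1, 0, 1, 0, 0, 0]
obj_j = [1, 0, 1, 0, 1, 0]
print(symmetric_binary_dissimilarity(obj_i, obj_j))
print(asymmetric_binary_dissimilarity(obj_i, obj_j))
```

For these two objects q = 2, r = 0, s = 1 and t = 3, so the symmetric value is 1/6 while the asymmetric value, which ignores the three negative matches, is 1/3.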
Dissimilarity of numeric attributes: Euclidean distance, Manhattan distance, Minkowski distance;
Euclidean distance: d(i,j)=sqrt(power((x1-y1),2) + power((x2-y2),2) + ... + power((xn-yn),2));
Manhattan distance: d(i,j)=abs(x1-y1) + abs(x2-y2) + ... + abs(xn-yn);
supremum distance: the maximum absolute difference over all dimensions of the objects, d(i,j)=max(abs(x1-y1), abs(x2-y2), ..., abs(xn-yn));
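The numeric distances can be sketched as follows, assuming objects are equal-length sequences of numbers. The Minkowski form with exponent h is not written out in these notes; the version below is the usual definition, of which Manhattan (h = 1) and Euclidean (h = 2) are special cases. Function names are illustrative.

```python
import math


def minkowski(x, y, h):
    """(sum of |x_f - y_f|^h) ^ (1/h); h=1 is Manhattan, h=2 is Euclidean."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1.0 / h)


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def supremum(x, y):
    """Largest absolute difference over any single dimension."""
    return max(abs(a - b) for a, b in zip(x, y))


obj_i = (1, 2)
obj_j = (3, 5)
print(euclidean(obj_i, obj_j))   # sqrt(2^2 + 3^2)
print(manhattan(obj_i, obj_j))   # 2 + 3
print(supremum(obj_i, obj_j))    # max(2, 3)
```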
-------------------------------------------------------
weighted Euclidean distance:
d(i,j)=sqrt(w1*power((x1-y1),2) + w2*power((x2-y2),2) + ... + wn*power((xn-yn),2))
where w1, w2, ..., wn are the weights assigned to the individual attributes
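A short sketch of the weighted form, assuming one non-negative weight per attribute supplied as a sequence of the same length as the objects. The function name is illustrative.

```python
import math


def weighted_euclidean(x, y, weights):
    """sqrt(sum of w_f * (x_f - y_f)^2) with one weight per attribute."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


obj_i = (1, 2)
obj_j = (3, 5)
print(weighted_euclidean(obj_i, obj_j, (1, 1)))   # same as plain Euclidean
print(weighted_euclidean(obj_i, obj_j, (1, 0)))   # second attribute ignored
```

Setting every weight to 1 recovers the ordinary Euclidean distance, and a weight of 0 removes an attribute from the comparison.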
--------------------------------------------------------
So how can we calculate the dissimilarity of objects that have attributes of mixed types?
One method is to group the attributes by type and then run a separate data mining analysis for each type. In real applications, however, analyzing each attribute type separately is unlikely to produce compatible results.
A better way is to process all attributes together in a single analysis. One such technique combines the different attribute types into a single dissimilarity matrix and brings all meaningful attributes onto the common interval [0.0,1.0].
Assume the data set contains p attributes of mixed types. The dissimilarity d(i,j) between objects i and j is then defined over all p attributes, each contributing a value in [0.0,1.0].
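The notes do not spell out the combined formula. The sketch below assumes the commonly used averaged form, d(i,j) = sum(delta_f * d_f) / sum(delta_f), where d_f is the contribution of attribute f in [0.0,1.0] and delta_f is 0 when a value is missing or when an asymmetric binary attribute is 0 for both objects, and 1 otherwise. The labels in kinds, the ranges argument, and the function name are all hypothetical.

```python
def mixed_dissimilarity(x, y, kinds, ranges):
    """Average the per-attribute dissimilarities, each scaled to [0.0, 1.0].

    kinds[f]  -- "nominal", "numeric" or "asymmetric" (hypothetical labels)
    ranges[f] -- max minus min of numeric attribute f over the data set
                 (ignored for other kinds)
    """
    total = 0.0   # sum of per-attribute contributions
    used = 0      # number of attributes that count (sum of the indicators)
    for f, kind in enumerate(kinds):
        a, b = x[f], y[f]
        if a is None or b is None:
            continue                      # missing value: attribute is skipped
        if kind == "asymmetric" and a == 0 and b == 0:
            continue                      # negative match: attribute is skipped
        if kind == "numeric":
            # scale the absolute difference by the attribute's range
            total += abs(a - b) / ranges[f] if ranges[f] else 0.0
        else:
            # nominal or binary: 0 if the states match, 1 otherwise
            total += 0.0 if a == b else 1.0
        used += 1
    return total / used if used else 0.0


kinds = ["nominal", "numeric", "asymmetric", "asymmetric"]
ranges = [None, 20.0, None, None]
obj_i = ["red", 10.0, 0, 1]
obj_j = ["blue", 15.0, 0, 1]
print(mixed_dissimilarity(obj_i, obj_j, kinds, ranges))
```

In this example the nominal attribute contributes 1, the numeric attribute contributes 5/20 = 0.25, the first binary attribute is skipped as a negative match and the second contributes 0, giving 1.25/3.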
-------------------------------------------------
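cosine similarity:
s(i,j)=(i*j)/(|i|*|j|)=(x1*y1 + x2*y2 + ... + xn*yn)/(sqrt(power(x1,2) + power(x2,2) + ... + power(xn,2))*sqrt(power(y1,2) + power(y2,2) + ... + power(yn,2)))
where i*j is the dot product of the two vectors and |i|, |j| are their Euclidean lengths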
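A minimal sketch of cosine similarity for two equal-length numeric vectors. Raising an error for a zero-length vector is a choice made here, since the formula is undefined in that case; the function name is illustrative.

```python
import math


def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


print(cosine_similarity((1, 0), (0, 1)))   # orthogonal vectors
print(cosine_similarity((1, 2), (2, 4)))   # same direction
```

Orthogonal vectors give 0, and vectors pointing in the same direction give a value of 1 up to floating-point rounding, regardless of their lengths.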
---------------------------------------------------