
Similarity Metric in KATE

KATE uses a version of the nearest neighbor algorithm to compute the similarity between cases. A simplified version of this algorithm is described here; for a more detailed description, see [6]. For a discussion of k-nearest neighbor algorithms, see [24].

The similarity between two cases x and y having p features is:

\begin{displaymath}
Similarity(x,y) = -\sqrt{\sum_{i=1}^{p} f(x_{i},y_{i}) }
\end{displaymath}

where f is defined as:

\begin{displaymath}
f(x_{i},y_{i}) = \left\{ \begin{array}{ll}
 (x_{i} - y_{i})^{2} & \textrm{if $x_{i}$, $y_{i}$ are numeric} \\
 0 & \textrm{if $x_{i} = y_{i}$ and $x_{i}$, $y_{i}$ are symbolic} \\
 1 & \textrm{if $x_{i} \neq y_{i}$ and $x_{i}$, $y_{i}$ are symbolic}
 \end{array} \right.
\end{displaymath}
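
As an illustration only (not KATE's actual code), a minimal Python sketch of this metric could look as follows; it assumes numeric features can be recognized with a simple type test, and the function names are hypothetical:

	import math

	def feature_distance(xi, yi):
	    # Per-feature term f(x_i, y_i): squared difference for numeric
	    # values, 0/1 mismatch for symbolic values.
	    if isinstance(xi, (int, float)) and isinstance(yi, (int, float)):
	        return (xi - yi) ** 2
	    return 0.0 if xi == yi else 1.0

	def similarity(x, y):
	    # Similarity(x, y) = -sqrt(sum_i f(x_i, y_i)); higher means more similar.
	    return -math.sqrt(sum(feature_distance(xi, yi) for xi, yi in zip(x, y)))

	# Example: one numeric and one symbolic feature.
	print(similarity((1.0, "red"), (3.0, "blue")))   # -sqrt(4 + 1), about -2.24

Note that the sum is negated, so a higher similarity value means a closer match; the nearest neighbors of a case are simply the cases with the largest similarity to it.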

The algorithm is then:


Classified Data = {}
for each case x in Casebase do
	1. for each y in Classified Data do
	      Sim(y) = Similarity(y,x)
	2. y_max = (y_1,...,y_k), the k cases in Classified Data
	   with the highest Sim values (the k nearest neighbors of x)
	3. if class(y_max) = class(x)
	   then the classification is correct
	      Classified Data = Classified Data + {x}
	   else the classification is incorrect
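
The loop above can be sketched in Python as follows. This is one reading of the pseudocode rather than KATE's implementation: it assumes a majority vote over the k nearest neighbors in step 3, seeds the classified set with the first case so that later cases have neighbors to compare against, and repeats the similarity metric from the previous sketch; all names are hypothetical.

	import math
	from collections import Counter

	def similarity(x, y):
	    # Same metric as in the previous sketch.
	    def f(xi, yi):
	        if isinstance(xi, (int, float)) and isinstance(yi, (int, float)):
	            return (xi - yi) ** 2
	        return 0.0 if xi == yi else 1.0
	    return -math.sqrt(sum(f(xi, yi) for xi, yi in zip(x, y)))

	def run_casebase(casebase, k=3):
	    # casebase is a list of (features, class) pairs.
	    # Assumption: the first case is kept unconditionally, since an empty
	    # classified set offers nothing to compare against.
	    classified = [casebase[0]]
	    correct = 0
	    for features, label in casebase[1:]:
	        # Steps 1-2: rank the kept cases by similarity to x and take
	        # the k nearest neighbors.
	        neighbours = sorted(classified,
	                            key=lambda case: similarity(case[0], features),
	                            reverse=True)[:k]
	        # Step 3: predict the majority class of the neighbors and keep
	        # x only if the prediction matches its real class.
	        predicted = Counter(lbl for _, lbl in neighbours).most_common(1)[0][0]
	        if predicted == label:
	            correct += 1
	            classified.append((features, label))
	    return correct, len(casebase) - 1

	cases = [((1.0, "red"), "A"), ((1.2, "red"), "A"),
	         ((5.0, "blue"), "B"), ((4.8, "blue"), "B")]
	print(run_casebase(cases, k=1))   # (1, 3): one of three cases classified correctly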


