
Similarity Metric in KATE

KATE uses a version of the nearest neighbor algorithm to compute the similarity between cases. A simplified version of this algorithm is described here; for a more detailed description, see [6]. For a discussion of k-nearest neighbor algorithms, see [24].

The similarity between two cases x and y having p features is:

\begin{displaymath}
Similarity(x,y) = -\sqrt{\sum_{i=1}^{p} f(x_{i},y_{i}) }
\end{displaymath}

where f is defined as:

\begin{displaymath}
f(x_{i},y_{i}) = \left\{ \begin{array}{ll}
 (x_{i} - y_{i})^{2} & \textrm{if $x_{i}$, $y_{i}$ are numeric} \\
 0 & \textrm{if $x_{i} = y_{i}$ and $x_{i}$, $y_{i}$ are symbolic} \\
 1 & \textrm{if $x_{i} \neq y_{i}$ and $x_{i}$, $y_{i}$ are symbolic}
 \end{array} \right.
\end{displaymath}
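
As an illustration only (not KATE's actual code), a minimal Python sketch of this metric could look as follows; it assumes numeric features can be recognized with a simple type test, and the function names are hypothetical:

	import math

	def feature_distance(xi, yi):
	    # Per-feature term f(x_i, y_i): squared difference for numeric
	    # values, 0/1 mismatch for symbolic values.
	    if isinstance(xi, (int, float)) and isinstance(yi, (int, float)):
	        return (xi - yi) ** 2
	    return 0.0 if xi == yi else 1.0

	def similarity(x, y):
	    # Similarity(x, y) = -sqrt(sum_i f(x_i, y_i)); higher means more similar.
	    return -math.sqrt(sum(feature_distance(xi, yi) for xi, yi in zip(x, y)))

	# Example: one numeric and one symbolic feature.
	print(similarity((1.0, "red"), (3.0, "blue")))   # -sqrt(4 + 1), about -2.24

Note that the sum is negated, so a higher similarity value means a closer match; the nearest neighbors of a case are simply the cases with the largest similarity to it.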

The algorithm is then:


Classified Data = {}
for each case x in Casebase do
	1. for each y in Classified Data do
	      Sim(y) = Similarity(y,x)
	2. y_max = (y_1,...,y_k), the k cases in Classified Data
	   with the highest Sim values (the k nearest neighbors of x)
	3. if class(y_max) = class(x)
	   then the classification is correct
	      Classified Data = Classified Data + {x}
	   else the classification is incorrect
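
The loop above can be sketched in Python as follows. This is one reading of the pseudocode rather than KATE's implementation: it assumes a majority vote over the k nearest neighbors in step 3, seeds the classified set with the first case so that later cases have neighbors to compare against, and repeats the similarity metric from the previous sketch; all names are hypothetical.

	import math
	from collections import Counter

	def similarity(x, y):
	    # Same metric as in the previous sketch.
	    def f(xi, yi):
	        if isinstance(xi, (int, float)) and isinstance(yi, (int, float)):
	            return (xi - yi) ** 2
	        return 0.0 if xi == yi else 1.0
	    return -math.sqrt(sum(f(xi, yi) for xi, yi in zip(x, y)))

	def run_casebase(casebase, k=3):
	    # casebase is a list of (features, class) pairs.
	    # Assumption: the first case is kept unconditionally, since an empty
	    # classified set offers nothing to compare against.
	    classified = [casebase[0]]
	    correct = 0
	    for features, label in casebase[1:]:
	        # Steps 1-2: rank the kept cases by similarity to x and take
	        # the k nearest neighbors.
	        neighbours = sorted(classified,
	                            key=lambda case: similarity(case[0], features),
	                            reverse=True)[:k]
	        # Step 3: predict the majority class of the neighbors and keep
	        # x only if the prediction matches its real class.
	        predicted = Counter(lbl for _, lbl in neighbours).most_common(1)[0][0]
	        if predicted == label:
	            correct += 1
	            classified.append((features, label))
	    return correct, len(casebase) - 1

	cases = [((1.0, "red"), "A"), ((1.2, "red"), "A"),
	         ((5.0, "blue"), "B"), ((4.8, "blue"), "B")]
	print(run_casebase(cases, k=1))   # (1, 3): one of three cases classified correctly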


