Next: Evaluation of Results Up: Data Mining and Influence Previous: Differences in Results

Why Differences Occur

If we define the difference between the query and the retrieved case as the number of features that do not match and do not have the value ``NA'' in the query, we get the list in Tab 5.12. We see that these differences, in the lines ``KATE-Query'' and ``CBRDM-Query'', are never higher than 4, the average is 1.02, and 17 out of 45 are exact matches. The difference in the line ``CBRDM-KATE'' is always larger. This indicates that the differences experienced come mostly from feature values other than the ones given by the user in the ``new'' case.
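The difference measure defined above can be sketched in a few lines of Python. This is only an illustration: the representation of a case as a dictionary, the function name difference and the example feature names are assumptions made here, not taken from the CBRDM or KATE implementations.

```python
NA = "NA"  # marker for a feature the user left unspecified in the query


def difference(query, retrieved):
    """Count features that differ between query and retrieved case.

    Features whose value is NA in the query are ignored, as in the
    definition in the text. Cases are modelled as plain dicts
    (an assumption of this sketch).
    """
    return sum(
        1
        for feature, value in query.items()
        if value != NA and retrieved.get(feature) != value
    )


# Hypothetical cases with made-up feature names and values.
query = {"a": "x", "b": "NA", "c": "z"}
retrieved = {"a": "x", "b": "y", "c": "w"}
print(difference(query, retrieved))  # 1: only feature "c" counts
```

Feature "b" does not contribute because it is ``NA'' in the query, which is why a retrieved case can differ from another retrieved case on many features and still be close to the query.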

Why does this occur? In the similarity metric for CBRDM (see page [*]) we add 1 for each feature value that matches. If a feature value does not match, we instead add the probability (<1) of the feature having the matching value, given the rest of the feature values of the case. In that way, if we cannot find an exact hit, we choose the case that has the highest probability of occurring in the future (assuming that the frequency in the casebase approximates the probability when the number of cases is large).
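A minimal sketch of a metric of this kind is given below. It is not the CBRDM implementation: here the conditional probability is simply estimated as a relative frequency over the cases in the casebase that agree with the candidate on all other features, whereas the text refers to a Bayesian network. The names similarity and conditional_frequency, the dictionary representation of cases and the example data are all assumptions of this sketch, and handling of ``NA'' values is left out.

```python
def conditional_frequency(casebase, feature, value, given):
    """Estimate P(feature == value | other feature values of `given`).

    Assumption of this sketch: the probability is approximated by the
    relative frequency among cases that agree with `given` on every
    feature except `feature`.
    """
    matching = [
        case
        for case in casebase
        if all(case.get(f) == v for f, v in given.items() if f != feature)
    ]
    if not matching:
        return 0.0
    hits = sum(1 for case in matching if case.get(feature) == value)
    return hits / len(matching)


def similarity(query, candidate, casebase):
    """Add 1 per matching feature, otherwise the estimated probability."""
    score = 0.0
    for feature, value in query.items():
        if candidate.get(feature) == value:
            score += 1.0
        else:
            score += conditional_frequency(casebase, feature, value, candidate)
    return score


# Hypothetical casebase with made-up features "a" and "b".
casebase = [
    {"a": "x", "b": "p"},
    {"a": "x", "b": "q"},
    {"a": "y", "b": "p"},
    {"a": "x", "b": "p"},
]
query = {"a": "x", "b": "p"}
candidate = {"a": "x", "b": "q"}
score = similarity(query, candidate, casebase)
```

In this example feature "a" matches and contributes 1, while feature "b" does not match and contributes the fraction of cases with a = x that have b = p, here 2 out of 3. A candidate whose non-matching values are frequent in comparable cases therefore scores higher than one whose non-matching values are rare.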

The difference between the query and the retrieved cases is relatively stable over the different series. The difference between the CBRDM and KATE results, however, varies greatly with the series, as shown in Tab 5.12. This is not surprising, as some nodes in the Bayesian network in Fig 5.3 are not connected, while other nodes have several connections. The similarity values for features that have connections will be higher than for those without connections.


Torgeir Dingsoyr
2/26/1998