Specification of an Integrated System

To demonstrate an integrated system, we propose the following algorithm which we will denote CBRDM: A new case description is taken from the user. The CBR engine does a search in the casebase for similar cases. The similarity metric uses a Bayesian network computed from the database for inference such that: If no complete match is found, the most similar case is assumed to be the case which for the features that are different, has the highest probability of having the feature values, given other values and a probability distribution. The procedure of constructing the Bayesian network and Casebase, and an overview of the system is given in Fig 4.4. We will use a database from NOEMIE, OREDA, which is described in chapter 5. Then we use Bayesian Knowledge Discoverer to construct the Bayesian network. The whole process of constructing the casebase and network is explained in the next chapter. We use the Case-Based Reasoning tool KATE to compare with the results obtained from the new Systems. Both KATE and Bayesian Knowledge Discoverer are briefly explained in section 4.5.

**Figure 4.1:** Architecture of the Proposed Integrated System (CBRDM).
$\begin{figure} \begin{center} \scalebox {0.5}{\includegraphics{newsyst.eps}} \end{center}\end{figure}$

To better explain the use of the Bayesian network, we illustrate the new method for computing similarity metrics by an example:

If we then can derive the following probability distribution from a network using Bayesian inference:

Then, hand-crushed would be the most similar, because it has the highest probability, given the other feature values.

**Figure 4.2:** Pseudocode for Proposed System (CBRDM).
$\begin{figure} \begin{center} \begin{verbatim} // Retrieve Casebase CB Cas... ... // Retain Update(Bayesian_network)\end{verbatim} \end{center} \end{figure}$

$\begin{displaymath} Similarity(x,y) = -\sqrt{\sum_{i=1}^{p} f(x_{i},y_{i}) } \end{displaymath}$

$\begin{displaymath} f(x_{i},y_{i}) = \left \{ \begin{array} {ll} (x_{i} - y_{i... ..._{p}) = q$} \\ 1 & \mbox{Otherwise} \end{array} \right. \end{displaymath}$

The strengths of this approach are that the similarity metric is learned from the database, thus it is dynamic and can be updated as new data arrive. It is also easy to control the network for an expert by viewing it in an editor. The tools used in the implementation are described in the next section.

We originally planned to implement algorithms for reuse , revise and retain as well which would make use of the Bayesian network as background knowledge. It was also planned to use Dynamic Link Library (DLL) routines from the tools KATE and Microsoft Belief Networks described in the next section, to reduce implementation time. But this was abandoned due to lack of time and compiler and platform difficulties with the programs available. The program was therefore created with Borland C++ version 3.1 for DOS, and routines for case management and Bayesian inference was implemented. The source code can be found in appendix E.