
Bayesian Networks

Bayesian networks are a statistical method for Data Mining, the process of discovering valid, novel and potentially useful patterns in data. A description of probabilistic networks and machine learning techniques is given in [15].

Bayesian networks are used to represent essential information in databases in a network structure. The network consists of vertices and edges, where the vertices represent events and the edges represent relations between events. A simple Bayesian network is illustrated in Fig 3.2, where symptoms depend on a disease, and the disease depends on age, work and work environment. Introductions to Bayesian networks can be found in [13], [25], [17], [14] and [18].

Bayesian networks are easy for humans to interpret, and are able to store causal relationships, that is, relations between causes and effects. The networks can be used to represent domain knowledge, and it is possible to control inference and produce explanations from a network.


  
Figure 3.3: A Naive Bayesian Network. [figure: naiveb.eps]

A simple application of Bayesian networks is naive Bayesian classification [15]. Such a network consists of a single parent node and several child nodes, as in Fig 3.3. Classification is done by treating the parent node as a hidden variable (H in the figure) stating which class each object in the database belongs to, while the child nodes represent the observed attributes of the object. An existing system using naive Bayesian classification is AutoClass [29].
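The following Python sketch makes this concrete: a single hidden class variable (the parent) is inferred from conditionally independent attribute variables (the children). All class names, attribute names and probability values are invented for illustration; this is a minimal sketch of the technique, not AutoClass itself.

\begin{verbatim}
# Naive Bayesian classification: one parent (the hidden class H) and
# conditionally independent children (the observed attributes).
priors = {"healthy": 0.8, "sick": 0.2}              # P(H)
likelihoods = {                                     # P(attribute | H)
    "healthy": {"fever": 0.05, "cough": 0.10},
    "sick":    {"fever": 0.70, "cough": 0.60},
}

def classify(symptoms):
    """Return P(H | symptoms) for each class via Bayes' rule."""
    scores = {}
    for h, prior in priors.items():
        score = prior
        for s in symptoms:                          # naive assumption:
            score *= likelihoods[h][s]              # attributes are
        scores[h] = score                           # independent given H
    total = sum(scores.values())                    # normalisation, P(e)
    return {h: s / total for h, s in scores.items()}

print(classify(["fever", "cough"]))
# {'healthy': 0.045..., 'sick': 0.954...}
\end{verbatim}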

The theoretical foundation for Bayesian networks is Bayes' rule, which states:

\begin{displaymath}
P(H \mid e) = \frac{P(e \mid H)P(H)}{P(e)}
\end{displaymath}

where H is a hypothesis and e an event (the evidence). $P(H \mid e)$ is the posterior probability, $P(e \mid H)$ is the likelihood, and P(H) is the prior probability. A proof of Bayes' rule is given in appendix A.
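As a small worked example with invented numbers: let the prior probability of a hypothesis H be P(H) = 0.2, and suppose an event e is observed with probability $P(e \mid H) = 0.7$ if H holds and $P(e \mid \neg H) = 0.1$ otherwise. Expanding $P(e)$ over the two cases gives

\begin{displaymath}
P(H \mid e) = \frac{P(e \mid H)P(H)}{P(e \mid H)P(H) + P(e \mid \neg H)P(\neg H)}
            = \frac{0.7 \cdot 0.2}{0.7 \cdot 0.2 + 0.1 \cdot 0.8}
            = \frac{0.14}{0.22} \approx 0.64
\end{displaymath}

so observing e raises the probability of H from 0.2 to about 0.64.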

To give a formal definition of Bayesian networks, we introduce some terminology taken from [25]:

If a set of nodes Z in a graph G intercepts all paths between the nodes X and Y (written $\langle X\mid Z\mid Y \rangle_G$), then this corresponds to conditional independence between X and Y given Z:

\begin{displaymath}
\langle X\mid Z\mid Y \rangle_G \Rightarrow I(X,Z,Y)_M
\end{displaymath}

conversely:

\begin{displaymath}
I(X,Z,Y)_M \Rightarrow \langle X\mid Z\mid Y \rangle_G
\end{displaymath}

with respect to some dependency model M.
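This correspondence can be checked numerically on a small example. The Python sketch below builds a chain $X \rightarrow Z \rightarrow Y$, so that Z intercepts the only path between X and Y, and verifies that $P(x,y \mid z) = P(x \mid z)P(y \mid z)$ for every assignment. All table values are invented for illustration.

\begin{verbatim}
import itertools

# Chain X -> Z -> Y: Z intercepts the only path between X and Y,
# so I(X, Z, Y) should hold in the induced distribution.
P_X = {0: 0.6, 1: 0.4}
P_Z_given_X = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}    # P(z | x)
P_Y_given_Z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}  # P(y | z)

def joint(x, z, y):
    return P_X[x] * P_Z_given_X[x][z] * P_Y_given_Z[z][y]

def independent_given_z(tol=1e-12):
    """Check P(x, y | z) == P(x | z) * P(y | z) everywhere."""
    for z in (0, 1):
        pz = sum(joint(x, z, y) for x in (0, 1) for y in (0, 1))
        for x, y in itertools.product((0, 1), repeat=2):
            p_xy = joint(x, z, y) / pz
            p_x = sum(joint(x, z, v) for v in (0, 1)) / pz
            p_y = sum(joint(u, z, y) for u in (0, 1)) / pz
            if abs(p_xy - p_x * p_y) > tol:
                return False
    return True

print(independent_given_z())   # True
\end{verbatim}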


 
Figure 3.4: X is Independent of Y Given Z. [figure: indepx.eps]

A Directed Acyclic Graph (DAG) D is said to be an I-map of a dependency model M if for every three disjoint sets of vertices X, Y and Z we have:

\begin{displaymath}
\langle X\mid Z\mid Y \rangle_D \Rightarrow I(X,Z,Y)_M
\end{displaymath}

A DAG is a minimal I-map of M if none of its arrows can be deleted without destroying its I-mapness.

Given a probability distribution P on a set of variables U, a DAG $D=(U,\vec{E})$ is called a Bayesian network of P iff D is a minimal I-map of P.

A Bayesian network is shown in Fig 3.5, representing the probability distribution P:

\begin{displaymath}
P(x_1,\ldots,x_6) = P(x_6 \mid x_5)P(x_5 \mid x_2,x_3)P(x_4 \mid x_2,x_1)P(x_3 \mid x_1)P(x_2 \mid x_1)P(x_1)
\end{displaymath}


  
Figure 3.5: A Bayesian Network Representing the Distribution P. [figure: xs.eps]
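Given the factorisation, any entry of the joint distribution is computed by multiplying one conditional probability per node. The Python sketch below does this for binary variables; the CPT values are invented, and only the factorisation structure is taken from the network.

\begin{verbatim}
# One factor per node, conditioned on its parents in the network.
p1 = {1: 0.5, 0: 0.5}                                # P(x1)
p2 = {x1: {1: 0.7, 0: 0.3} for x1 in (0, 1)}         # P(x2 | x1)
p3 = {x1: {1: 0.4, 0: 0.6} for x1 in (0, 1)}         # P(x3 | x1)
p4 = {(x2, x1): {1: 0.9, 0: 0.1}
      for x2 in (0, 1) for x1 in (0, 1)}             # P(x4 | x2, x1)
p5 = {(x2, x3): {1: 0.2, 0: 0.8}
      for x2 in (0, 1) for x3 in (0, 1)}             # P(x5 | x2, x3)
p6 = {x5: {1: 0.6, 0: 0.4} for x5 in (0, 1)}         # P(x6 | x5)

def joint(x1, x2, x3, x4, x5, x6):
    return (p1[x1] * p2[x1][x2] * p3[x1][x3] *
            p4[(x2, x1)][x4] * p5[(x2, x3)][x5] * p6[x5][x6])

print(joint(1, 1, 0, 1, 0, 1))   # one entry of the joint distribution
\end{verbatim}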



 