
Construction of Casebase

The OREDA database contains around 33000 objects and several hundred features. Tab 5.1 shows some of the features, which Sintef extracted for use in the NOEMIE project with the SQL search given in appendix B.

**Table 5.1:** Some of the Features in OREDA.
FAILURE_EVENT.F_ID	M_ID	INVENTORY.I_ID	FM_CODE
FD_NO	FAILURE_EVENT.REM_CODE	FC_CODE	OM_CODE
FAILURE_EVENT.EC_CODE	F_DETECTED_DATE	F_DETECTED_DAY	F_DETECTED_month
F_DETECTED_year	MAIN_EVENT.SU_CODE	MAC_CODE	MC_CODE
MAIN_EVENT.MC_CODE	MAIN_EVENT.EC_CODE	M_MAINT_DATE	M_MAINT_DAY
M_MAINT_MONTH	M_MAINT_YEAR	M_MEC_MANHOUR	M_EL_MANHOUR
M_INST_MANHOUR	M_PATF_PERS	M_ACTIVE_MAINT	M_DOWNTIME
INVENTORY_OWNER_ID	INVENTORY.INST_ID	OP_CODE	INVENTORY.EE_CODE
DC_CODE	INVENTORY.EC_CODE	I_CHECKED_BY	I_CHECKED_DATE
I_TAG_NO	I_SURV_START_DATE	I_SURV_START_DATE	I_SURV_END_DATE
I_SURV_END_DAY	I_SURV_END_MONTH	I_SURV_END_YEAR	I_OPER_TIME
I_OPER_TIME_CODE	I_NO_OF_STARTS	I_INSTALLED_DATE	I_INSTALLED_DAY
I_INSTALLED_MONTH	I_INSTALLED_YEAR	I_SCRAPPED_AT_END	M_OTHER_MANHOUR
M_RES_DRILL_RIG	M_RES_DIVING_VESSEL	M_RES_SERVICE_VESSEL	M_RES_DIVERS

For our use, we reduced the number of objects and features, for two reasons:

We wanted to experiment with only one type of inventory item. OREDA contains several different types of items, such as compressors and pumps, that have little or nothing in common. Here we focus on compressors.
Bayesian Knowledge Discoverer is extremely slow when the number of features and possible feature values is large, and it tends to crash when the number of features is large.

We selected the features that are relevant for compressors on the advice of an OREDA database expert.

The value COMP in the column INVENTORY.EC_CODE was replaced by #COMP# so that only the compressors could be selected with the following Unix command:


grep '#COMP#' oreda332.txt > comp.txt

This produced 4646 objects.
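
Because grep matches its pattern anywhere on a line, the selection above depends on the #COMP# marker being unique to the equipment class column. As an alternative, the same selection can be sketched by comparing the column value directly, which needs no marker. This is a minimal sketch, not the procedure used here: it assumes a tab-separated file with a header line, which the text does not state, and the function name `select_rows` and the sample rows are made up for illustration.

```python
import csv
import io


def select_rows(text, column, value, delimiter="\t"):
    """Return the header and the rows whose `column` equals `value`."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row[column] == value]
    return reader.fieldnames, rows


# Made-up sample data; the real layout of oreda332.txt is an assumption.
sample = (
    "INVENTORY.EC_CODE\tFM_CODE\tM_DOWNTIME\n"
    "COMP\tA\t12\n"
    "PUMP\tB\t3\n"
    "COMP\tC\t40\n"
)

header, compressors = select_rows(sample, "INVENTORY.EC_CODE", "COMP")
print(len(compressors))  # number of rows kept
```

With the real file, the text would be read from oreda332.txt and the kept rows written to comp.txt.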

The database was then loaded into a text editor that could handle large files, where it was prepared in Rosetta format by adding a new line that states whether each feature is a string, an integer or a float. Missing values were given the value "Undefined". The data was then imported into the Data Mining tool Rosetta [23] for manual inspection. We found that the feature MC_CODE had only one value, "CORRECT", so it was removed. This gave the features that are described in Tab 5.2. A description of some of the codes used as feature values is given in appendix G.
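
The preparation described above was done by hand in a text editor and in Rosetta. The sketch below shows how the three steps (adding a type line, filling missing values, and dropping a feature with a single value) could be automated. It is an illustration under assumptions: the type names `Integer`, `Float` and `String` are placeholders and are not claimed to be Rosetta's exact syntax, the helper names are hypothetical, and the sample rows are made up.

```python
def infer_type(values):
    """Classify a column as Integer, Float or String from its non-missing values."""
    present = [v for v in values if v != ""]
    for name, cast in (("Integer", int), ("Float", float)):
        try:
            for v in present:
                cast(v)
            return name
        except ValueError:
            continue
    return "String"


def prepare(header, rows, missing="Undefined"):
    """Drop single-valued columns, add a type line, and fill missing values."""
    columns = list(zip(*rows))
    # Keep only columns that take more than one distinct value.
    keep = [i for i, col in enumerate(columns) if len(set(col)) > 1]
    names = [header[i] for i in keep]
    types = [infer_type(columns[i]) for i in keep]
    body = [[row[i] if row[i] != "" else missing for i in keep] for row in rows]
    return [names, types] + body


# Made-up rows; an empty string stands for a missing value.
header = ["FM_CODE", "MC_CODE", "M_DOWNTIME", "I_OPER_TIME"]
rows = [
    ["A", "CORRECT", "12", "1500.5"],
    ["B", "CORRECT", "", "800"],
    ["A", "CORRECT", "40", "2300.0"],
]

table = prepare(header, rows)
for line in table:
    print("\t".join(line))
```

In this sample MC_CODE is dropped because it only takes the value CORRECT, which mirrors the manual finding above.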

**Table 5.2:** Description of Features Used.
Feature	Description
INVENTORY_INST_ID	Installation identification. Unique number.
FM_CODE	Failure mode.
FD_NO	Failure description number.
FAILURE_EVENT_REM_CODE	Failure remark.
FC_CODE	Failure consequence code.
MAIN_EVENT_SU_CODE	Subunit code.
M_DOWNTIME	Downtime in hours.
DC_CODE	Design class code.
I_OPER_TIME	Operational time in hours.

The ranges of the features M_DOWNTIME, I_OPER_TIME and FD_NO were reduced in Rosetta, and the ranges of the features FC_CODE and FM_CODE were reduced with the perl script given in appendix C. This produced the casebase used in experiments, with the features in Tab 5.2.
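
Reducing the range of a feature means mapping many distinct values to a few. The sketch below illustrates the two cases mentioned above: numeric features mapped to intervals, and codes merged into groups. It does not reproduce the Rosetta settings or the perl script in appendix C. The cut points, the code names and the group names are all hypothetical, as are the function names `to_interval` and `merge_code`.

```python
from bisect import bisect_right


def to_interval(value, cut_points):
    """Map a number to an interval label, given sorted cut points."""
    i = bisect_right(cut_points, value)
    if i == 0:
        return f"< {cut_points[0]}"
    if i == len(cut_points):
        return f">= {cut_points[-1]}"
    return f"[{cut_points[i - 1]}, {cut_points[i]})"


def merge_code(code, groups, other="OTHER"):
    """Map a code to its group, or to `other` if it is not listed."""
    return groups.get(code, other)


# Hypothetical cut points for M_DOWNTIME, in hours.
DOWNTIME_CUTS = [1, 24, 168]

# Hypothetical grouping of codes; X1, X2 and GROUP_A are made-up names.
CODE_GROUPS = {"X1": "GROUP_A", "X2": "GROUP_A"}

print(to_interval(0.5, DOWNTIME_CUTS))
print(to_interval(24, DOWNTIME_CUTS))
print(to_interval(500, DOWNTIME_CUTS))
print(merge_code("X1", CODE_GROUPS))
print(merge_code("Z9", CODE_GROUPS))
```

Fewer distinct values per feature also addresses the speed problem noted earlier for Bayesian Knowledge Discoverer, which is slow when the number of possible feature values is large.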


Torgeir Dingsoyr
2/26/1998