Data mining methods can be classified according to their output and the
characteristics of the algorithms. Of the five methods listed below, the
first two produce predictive patterns from data, and the last three produce
informative patterns. Predictive patterns give an educated guess at the
value of a feature, given some known features. Informative patterns are
patterns that are interesting to a domain expert. Each method is described
and illustrated by a small example.
- Classification - learn a function that assigns each data item to one of
a set of predefined classes. A bank might want to learn a function that says
whether a customer should get a loan or not. Decision trees and Bayesian
classifiers are examples of classification algorithms.
- Regression - learn a function that maps data to a real-valued output.
A car salesperson might want to know how many cars she can expect to sell
next month. Linear regression and neural networks are commonly used
regression methods.
- Clustering - find categories of objects that are "similar".
A mail-order firm might, for instance, want to divide its customers into
groups that are likely to purchase different items. Thorough
investigation is required to understand the meaning of the clusters. An
example of a clustering algorithm is the minimax algorithm.
- Dependency modeling - find structural or quantitative dependencies
between features. A dairy might want to know whether there is a dependency
between the sales of milk and milkshake products, or whether the latter
compete with other products. (That is, can a function be derived from the
data?) Association rules and rough sets are examples of dependency models.
- Change and deviation analysis - find changes in a database by
comparing current values with previously measured or normative values. A
supermarket might want to know if there is a shift in the market towards
products that are considered healthy. The multistrategy tool Explora can be
used for this kind of analysis.
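
As a minimal sketch of the regression item above, the following fits a straight line to monthly car sales by ordinary least squares and uses it to guess next month's sales. The sales figures and the names `months`, `cars_sold` and `fit_line` are hypothetical and chosen only for illustration; they do not come from any real dataset or tool.

```python
# Hypothetical monthly car sales (illustrative data, not from any real source).
months = [1, 2, 3, 4, 5, 6]
cars_sold = [10, 12, 14, 16, 18, 20]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (unnormalised; the factor 1/n cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(months, cars_sold)

# Predictive pattern: an educated guess for month 7, given the known months.
predicted_next_month = slope * 7 + intercept
print(predicted_next_month)
```

A straight line is the simplest possible choice here; real sales data would usually call for more features and a check of how well the line fits.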
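
The dependency modeling item above mentions association rules. A minimal sketch, assuming a handful of made-up dairy purchases, computes the two usual measures for a rule such as "milkshake -> milk": support (how often the items occur together) and confidence (how often the consequent occurs when the antecedent does). The transactions and the helper names `support` and `confidence` are assumptions for illustration only.

```python
# Hypothetical dairy purchases (illustrative data, not from any real source).
transactions = [
    {"milk", "milkshake"},
    {"milk"},
    {"milk", "milkshake", "yoghurt"},
    {"yoghurt"},
    {"milk", "yoghurt"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


# Rule under test: customers who buy milkshake also buy milk.
rule_support = support({"milk", "milkshake"}, transactions)
rule_confidence = confidence({"milkshake"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

Whether a rule with these values is interesting is a judgment for the domain expert, which is what makes the pattern informative and not predictive.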
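
Change and deviation analysis, as described in the last item above, compares a current measurement with previously measured values. The sketch below is one simple way to do that, not the method used by Explora: it expresses the current share of "healthy" product sales as a number of standard deviations from the historical mean. The figures, the name `deviation_score` and the threshold of 3.0 are all assumptions.

```python
from statistics import mean, stdev

# Hypothetical weekly share of sales for products considered healthy
# (illustrative data, not from any real source).
baseline_share = [0.20, 0.22, 0.19, 0.21, 0.20, 0.18]
current_share = 0.30


def deviation_score(value, reference):
    """Distance of value from the reference mean, in reference standard deviations."""
    return (value - mean(reference)) / stdev(reference)


score = deviation_score(current_share, baseline_share)

# Assumed threshold: flag a shift when the score exceeds 3 standard deviations.
shift_detected = abs(score) > 3.0
print(score, shift_detected)
```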
Torgeir Dingsoyr
2/26/1998