Missing Value Treatment in Oracle Data Mining

Missing value treatment depends on the algorithm and on the nature of the data (categorical or numerical, sparse or missing at random). Table 3-2 summarizes the treatment for each algorithm.

Note:

Oracle Data Mining performs the same missing value treatment whether or not Automatic Data Preparation is being used.


Table 3-2 Missing Value Treatment by Algorithm

Missing Data: NUMERICAL missing at random

EM, GLM, NMF, k-Means, SVD, SVM: The algorithm replaces missing numerical values with the mean. For EM, the replacement occurs only in columns that are modeled with Gaussian distributions.

DT, MDL, NB, OC: The algorithm handles missing values naturally as missing at random.

Apriori: The algorithm interprets all missing data as sparse.

Missing Data: CATEGORICAL missing at random

EM, GLM, NMF, k-Means, SVD, SVM: GLM, NMF, k-Means, and SVM replace missing categorical values with the mode. SVD does not support categorical data. EM does not replace missing categorical values. EM treats NULLs as a distinct value with its own frequency count.

DT, MDL, NB, OC: The algorithm handles missing values naturally as missing at random.

Apriori: The algorithm interprets all missing data as sparse.

Missing Data: NUMERICAL sparse

EM, GLM, NMF, k-Means, SVD, SVM: The algorithm replaces sparse numerical data with zeros.

DT, MDL, NB, OC: O-Cluster does not support nested data and therefore does not support sparse data. DT, MDL, and NB replace sparse numerical data with zeros.

Apriori: The algorithm handles sparse data.

Missing Data: CATEGORICAL sparse

EM, GLM, NMF, k-Means, SVD, SVM: All algorithms except SVD replace sparse categorical data with zero vectors. SVD does not support categorical data.

DT, MDL, NB, OC: O-Cluster does not support nested data and therefore does not support sparse data. DT, MDL, and NB replace sparse categorical data with the special value DM$SPARSE.

Apriori: The algorithm handles sparse data.
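The replacement rules in Table 3-2 can be illustrated with a small Python sketch. This is not Oracle Data Mining code and does not reflect its internal implementation. The function names and the dictionary representation of a sparse row are assumptions made for illustration only. The sketch shows mean and mode replacement for data that is missing at random, zero replacement for sparse numerical data, and the DM$SPARSE marker that DT, MDL, and NB use for sparse categorical data.

```python
from statistics import mean, mode

# Hypothetical helpers for illustration; not part of any Oracle API.

def impute_missing_at_random(values, categorical):
    """Replace None with the column mode (categorical) or mean (numerical).

    Assumes the column has at least one non-missing value.
    """
    present = [v for v in values if v is not None]
    fill = mode(present) if categorical else mean(present)
    return [fill if v is None else v for v in values]

def fill_sparse_numerical(row, attributes):
    """Treat numerical attributes absent from a sparse row as zero."""
    return {name: row.get(name, 0) for name in attributes}

def fill_sparse_categorical(row, attributes, marker="DM$SPARSE"):
    """Treat categorical attributes absent from a sparse row as a special value."""
    return {name: row.get(name, marker) for name in attributes}

# Numerical, missing at random: the missing age becomes the mean of 20 and 40.
ages = impute_missing_at_random([20, None, 40], categorical=False)

# Categorical, missing at random: the missing color becomes the mode, "red".
colors = impute_missing_at_random(["red", None, "red", "blue"], categorical=True)

# Sparse numerical: attributes not present in the row are read as zero.
basket = fill_sparse_numerical({"milk": 2}, ["milk", "bread", "eggs"])

# Sparse categorical: attributes not present get the special marker value.
tags = fill_sparse_categorical({"genre": "jazz"}, ["genre", "label"])
```

The sketch keeps the two cases separate on purpose: a missing-at-random value is estimated from the rest of the column, while a sparse value is assumed to be known (zero or absent) and is filled without looking at other rows.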