Handling Missing Values

Oracle Data Mining distinguishes between sparse data and data that contains random missing values. The latter means that some attribute values are unknown. Sparse data, on the other hand, contains values that are assumed to be known, although they are not represented in the data.

A typical example of sparse data is market basket data. Out of hundreds or thousands of available items, only a few are present in an individual case (the basket or transaction). All the item values are known, but they are not all included in the basket. Present values may have a quantity, while the items that are not represented are sparse (with a known quantity of zero).

Oracle Data Mining interprets missing data as follows:

  • Missing at random — Missing values in columns with a simple data type (not nested) are assumed to be missing at random.

  • Sparse — Missing values in nested columns indicate sparsity.