This preface lists changes in Oracle Data Mining Concepts.
The following are changes in Oracle Data Mining Concepts for Oracle Database 12c Release 1 (12.1).
The following features are new in this release:
New clustering algorithm: Expectation Maximization
In addition to enhanced k-Means and O-Cluster, Oracle Data Mining now supports Expectation Maximization, a probabilistic clustering algorithm that creates a density model of the data. The density model allows for an improved approach to combining data originating in different domains (for example, sales transactions and customer demographics, or structured data and text or other unstructured data).
Because of the probabilistic nature of Expectation Maximization, its cluster assignment probabilities may be more reliable than those produced by k-Means or O-Cluster. Also, the Expectation Maximization algorithm automatically determines the optimal number of clusters needed to model the data.
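To show what "probabilistic" cluster assignment means, here is a minimal sketch of textbook Expectation Maximization for a one-dimensional Gaussian mixture. It is not Oracle's implementation, and it makes one simplifying assumption: the caller fixes the number of clusters through the starting means, whereas the Oracle algorithm chooses the number automatically. The function names `normal_pdf`, `e_step`, and `em_gmm_1d` are hypothetical. The E-step assigns each row a probability for every cluster. The M-step then re-estimates each cluster's weight, mean, and variance from those soft assignments.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def e_step(data, weights, means, variances):
    """Posterior probability that each point belongs to each component."""
    resp = []
    for x in data:
        dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(dens)
        resp.append([d / total for d in dens])
    return resp

def em_gmm_1d(data, init_means, n_iter=100, min_var=1e-6):
    """Hypothetical helper: fit a 1-D Gaussian mixture by EM.

    Assumption: the number of components is fixed at len(init_means)."""
    n, k = len(data), len(init_means)
    means = list(init_means)
    mu = sum(data) / n
    variances = [sum((x - mu) ** 2 for x in data) / n] * k  # start from the overall variance
    weights = [1.0 / k] * k                                  # start with equal mixing weights
    for _ in range(n_iter):
        resp = e_step(data, weights, means, variances)
        # M-step: re-estimate each component from the soft (probabilistic) assignments
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)  # floor keeps a component from collapsing
    # Final E-step so the returned probabilities match the returned parameters
    return weights, means, variances, e_step(data, weights, means, variances)

data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.9, 5.1]
weights, means, variances, probs = em_gmm_1d(data, init_means=[0.0, 6.0])
for x, p in zip(data, probs):
    print(f"{x:4.1f} -> " + ", ".join(f"{q:.3f}" for q in p))
```

Each row of `probs` sums to 1. A row that falls between the two groups would receive a split probability instead of a hard label, and this is the kind of assignment probability the paragraph above refers to.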
New feature extraction algorithm: Singular Value Decomposition with Principal Component Analysis
In addition to Non-Negative Matrix Factorization, Oracle Data Mining now supports Singular Value Decomposition and Principal Component Analysis, two powerful feature extraction methods that use orthogonal linear projections to capture the underlying variance of the data. Principal Component Analysis is implemented as a special scoring method for the Singular Value Decomposition algorithm.
Singular Value Decomposition and Principal Component Analysis scale well to very large data sizes (both rows and attributes), and they have a powerful data compression capability. With the introduction of these new methods, Oracle Data Mining extends its feature extraction capabilities to new contexts involving time series, unstructured data, and very large numerical data sets (for example, data from sensors such as Radio Frequency Identification).
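This preface states that Principal Component Analysis is implemented as a special scoring method for Singular Value Decomposition. The sketch below shows the standard linear-algebra relationship behind that idea: PCA is an SVD of mean-centered data, and projecting onto the leading right singular vectors gives the principal component scores. It is an illustration built on NumPy's `numpy.linalg.svd`. It does not reproduce Oracle's scoring interface, and the function name `pca_via_svd` is hypothetical. Keeping only the first few components is what provides the data compression mentioned above.

```python
import numpy as np

def pca_via_svd(X, n_components):
    """Hypothetical helper: principal components of X (rows = cases, columns = attributes)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # PCA works on mean-centered attributes
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]           # orthonormal (orthogonal, unit-length) directions
    scores = Xc @ components.T               # same as U[:, :n_components] * s[:n_components]
    explained_var = s[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_var

# Three nearly collinear attributes: one component should capture almost all variance
X = np.array([[2.0, 1.0, 0.5], [4.0, 2.1, 1.0], [6.0, 2.9, 1.6],
              [8.0, 4.0, 2.0], [10.0, 5.0, 2.4]])
scores, components, explained_var = pca_via_svd(X, n_components=1)
total_var = X.var(axis=0, ddof=1).sum()
print("share of variance kept by 1 component:", explained_var[0] / total_var)
print("compressed representation (one number per row):", scores.ravel())
```

When all components are kept, the projection is lossless. Dropping the components with small singular values trades a little reconstruction error for a much smaller representation.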
Generalized Linear Models enhanced to support feature selection and creation
Generalized Linear Models provide great transparency, which may be achieved at the expense of accuracy. With the introduction of a feature selection and creation capability, Generalized Linear Models can maintain a high degree of accuracy without sacrificing transparency (the ability to explain the predictions made by the model).
Feature selection is the process of selecting the most meaningful attributes. Feature creation is the process of combining attributes into features. With feature selection, Generalized Linear Models can be created with fewer predictors, leading to smaller models and faster scoring. With feature creation, Generalized Linear Models use non-linear terms (up to cubic terms), leading to more powerful models and increased transparency.
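The generic idea behind feature creation with non-linear terms can be sketched as follows. One numeric predictor is expanded into linear, quadratic, and cubic terms, and a linear model is fit on the expanded terms. This preface does not describe how Oracle Data Mining actually creates or selects features, so treat this only as an illustration under stated assumptions. It uses the Gaussian, identity-link case of a GLM, which is ordinary least squares fit with NumPy's `numpy.linalg.lstsq`. The names `cubic_features`, `fit_linear_glm`, and `TERM_NAMES` are hypothetical, and feature selection is not shown.

```python
import numpy as np

TERM_NAMES = ["intercept", "x", "x^2", "x^3"]  # hypothetical labels for the created terms

def cubic_features(x):
    """Feature creation: expand one numeric predictor into terms up to cubic."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])

def fit_linear_glm(X, y):
    """Least-squares fit, i.e. a GLM with Gaussian family and identity link."""
    coef, _residuals, _rank, _sv = np.linalg.lstsq(X, y, rcond=None)
    return coef

x = np.linspace(-2.0, 2.0, 21)
y = 1.0 + 0.5 * x - 2.0 * x ** 3           # synthetic target with a cubic effect
coef = fit_linear_glm(cubic_features(x), y)
for name, c in zip(TERM_NAMES, coef):
    print(f"{name:>9}: {c: .4f}")            # each coefficient belongs to a named term
```

Because every coefficient is tied to an explicit term, the fitted model can still be read directly. In this example the `x^2` coefficient comes out near zero, which is the sort of term a feature-selection step could drop.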
Significant enhancements in text mining
This enhancement greatly simplifies the data mining process (model build, deployment, and scoring) when unstructured text data is present in the input.
For an overview, see "Text Data". For details, see Oracle Data Mining User's Guide.
Prediction details expanded
The PREDICTION_DETAILS function now supports all predictive algorithms and returns more details about the predictors. New functions, CLUSTER_DETAILS and FEATURE_DETAILS, are introduced.
See "In-Database Scoring" for information about the Data Mining SQL functions. For syntax details, see Oracle Database SQL Language Reference.
Dynamic scoring
The Data Mining SQL functions now support an analytic clause for scoring data dynamically without a pre-defined model.
See "In-Database Scoring" for information about the Data Mining SQL functions. For syntax details, see Oracle Database SQL Language Reference.
The following features are no longer supported by Oracle. See Oracle Database Upgrade Guide for a complete list of desupported features in this release.
Oracle Data Mining Java API
Adaptive Bayes Network (ABN) algorithm
The following are additional changes in Oracle Data Mining Concepts for 12c Release 1 (12.1):
The single-chapter product overview that was previously in Part I has been divided into two chapters.
The chapter on Predictive Analytics that was previously in Part I has been removed.
This chapter was based on examples that were generated by Oracle Spreadsheet Add-In for Predictive Analytics. The Spreadsheet Add-In is still available for download on the Oracle Technology Network.