Oracle® Data Mining Application Developer's Guide, 10g Release 2 (10.2), Part Number B14340-01
This chapter helps you convert your data mining applications from the proprietary 10.1 Java API to the standards-compliant Java API available with Oracle 10g Release 2 (10.2).
See Also:
JSR-000073 Data Mining API page of the Java Community Process Web Site at http://jcp.org/aboutJava/communityprocess/final/jsr073
JDM 1.0 javadoc at http://www.oracle.com/technology/products/bi/odm
Oracle Data Mining Java API Reference (ODM 10.2 javadoc)
This chapter includes the following topics:
The new ODM Java API available with Oracle 10g Release 2 (10.2) is standardized under the Java Community Process and is fully compliant with the JDM 1.0 standard. Oracle supports open standards for Java and is one of the primary vendors that implement JDM.
The ODM 10.2 JDM-based API replaces the proprietary Java API for data mining that was available with Oracle 10.1.
Note:
The proprietary Java API is no longer supported in ODM 10.2. If you have created applications in 10.1 and you want to use them in your Oracle 10.2 installation, you must convert them to use the 10.2 API.
Table 8-1 lists the major differences between the ODM 10.1 and ODM 10.2 Java APIs.
Table 8-1 Differences Between Oracle 10.1 and 10.2 Java APIs for Data Mining
| Feature | ODM 10.1 Java API | ODM 10.2 Java API |
|---|---|---|
| Standards | Oracle proprietary Java API designed for accessing data mining functionality in the database. Not supported in Oracle 10.2. Not interoperable with models created by the PL/SQL API. | Java industry standard API defined under the Java Community Process (JCP). ODM 10.2 implements conformant subsets of the standard along with Oracle proprietary extensions. Interoperable with the PL/SQL API: all objects created using the ODM 10.2 Java API can be used with the PL/SQL API, and results and values are consistent with the PL/SQL API. |
| Functions and algorithms | Classification function; Clustering function; Regression function; Association function; Attribute Importance function; Feature Extraction function | Classification function; Clustering function; Regression function; Association function; Attribute Importance function; Feature Extraction function |
| Object creation | Primarily designed as Java classes. Objects are instantiated using constructors or static create methods. | Uses the factory method pattern to instantiate objects. |
| Task execution | Tasks executed by … Asynchronous task execution implemented by … | Tasks executed by … Asynchronous task execution implemented by … |
| Data | Supports both physical and logical data representations. Supports transactional and non-transactional formats. Transactional format enables sparse data representation and wide data (more than 1000 columns). | Supports only physical data representation. Logical data can be represented with database views. Supports nested tables in place of transactional format. |
| Settings for model building | Settings for model building created by … | Settings for model building created by … Settings are saved as a table in the user's schema. The name of the … |
| Model | Models represented by … The … | Models represented by … The … |
| Cost matrix | Cost matrix represented by … The cost matrix for all classification algorithms is specified at build time, even though it is used as a post-processing step to the apply operation. | Cost matrix represented by … For the decision tree algorithm, the cost matrix is specified at build time; for all other classification algorithms, it is specified with the apply and test operations. |
| Model detail | Model details are not represented as an object. Model details are stored with the associated model object. | Model details represented by … |
| Apply settings | Apply settings represented by … | Apply settings represented by … |
| Results object | Mining results represented by … | Mining results are not explicit objects. Each task creates either a Java object or a database object such as a table. |
| Transformations | Supports automated data preparation. Provides utility methods for external and embedded data preparation. | Does not support automated transformations. The transformation task … |
| Text transformation | Supports text data types, such as CLOB and BLOB, for SVM and NMF. No explicit text transformations are provided. | Supports explicit text transformations. These can be used with any algorithm to emulate text data type support. |
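The cost matrix rows in Table 8-1 describe the cost matrix as a post-processing step applied to the scores that apply produces. A minimal sketch of that general idea, assuming the model returns one probability per class, predicts the class with the lowest expected misclassification cost rather than the most probable class. This is the standard expected-cost decision rule and is not a description of ODM internals. The class `CostMatrixSketch`, the method `minCostClass`, and the `cost[actual][predicted]` layout are hypothetical names chosen for illustration.

```java
/** Hypothetical helper: cost-sensitive prediction as a post-processing step on class probabilities. */
public class CostMatrixSketch {

    /**
     * Returns the index of the class with the lowest expected cost.
     * probs[i]              = model probability that the actual class is i
     * cost[actual][predict] = cost of predicting 'predict' when the actual class is 'actual'
     */
    public static int minCostClass(double[] probs, double[][] cost) {
        int best = -1;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int pred = 0; pred < probs.length; pred++) {
            // Expected cost of predicting 'pred', weighted by how likely each actual class is
            double expected = 0.0;
            for (int actual = 0; actual < probs.length; actual++) {
                expected += probs[actual] * cost[actual][pred];
            }
            if (expected < bestCost) {
                bestCost = expected;
                best = pred;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Class 0 = no affinity card, class 1 = affinity card (illustrative values)
        double[] probs = {0.7, 0.3};
        // Missing a true 1 costs 5, a false 1 costs 1, correct predictions cost 0
        double[][] cost = {
            {0.0, 1.0},
            {5.0, 0.0}
        };
        System.out.println(minCostClass(probs, cost)); // prints 1
    }
}
```

With these numbers, class 0 is the more probable class. However, predicting 1 has an expected cost of 0.7, while predicting 0 has an expected cost of 1.5, so the cost matrix changes the decision.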
Most objects in the ODM 10.2 API are similar to the objects in the ODM 10.1 API. However, there are some major differences in class names, package structures, and object usage. Some of the primary differences are:
In 10.1, all primary objects are created using constructors or create methods. In 10.2, objects are created using object factories, as described in "Connection Factory" and "Features of a DMS Connection".

In 10.1, DMS metadata-related operations are distributed across the individual classes. In 10.2, most DMS metadata-related operations are centralized in a Connection object. For example, a mining task is restored in 10.1 with the MiningTask.restore method and in 10.2 with the Connection.retrieveObject method.

In 10.1, all named objects are persisted in the database. In 10.2, PhysicalDataSet and ApplySettings are transient objects.
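The first two differences can be illustrated with a toy, in-memory model of the 10.2 style. Objects are obtained from a factory that the connection looks up by name, and named objects are saved to and retrieved from the connection instead of through per-class store/restore methods. Every type in this sketch (`FactorySketch`, `MiniConnection`, `BuildSettings`) is hypothetical and heavily simplified. The real JDM `Connection.getFactory`, `saveObject`, and `retrieveObject` methods have different signatures and persist objects in the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Toy model of the 10.2 style: factories and named objects live on a connection (hypothetical, not JDM). */
public class FactorySketch {

    /** Hypothetical settings object, created only through a factory. */
    public static class BuildSettings {
        private String target;
        public void setTargetAttributeName(String name) { target = name; }
        public String getTargetAttributeName() { return target; }
    }

    /** Hypothetical connection that owns the factories and the named-object store. */
    public static class MiniConnection {
        private final Map<String, Supplier<?>> factories = new HashMap<>();
        private final Map<String, Object> namedObjects = new HashMap<>();

        public MiniConnection() {
            // Factories are registered under a type name, loosely mirroring lookup by interface name
            factories.put("BuildSettings", BuildSettings::new);
        }

        /** Looks up the factory registered for a type name. */
        public Supplier<?> getFactory(String typeName) {
            Supplier<?> factory = factories.get(typeName);
            if (factory == null) {
                throw new IllegalArgumentException("No factory for " + typeName);
            }
            return factory;
        }

        /** Stores an object under a name; refuses to overwrite unless replace is true. */
        public void saveObject(String name, Object obj, boolean replace) {
            if (!replace && namedObjects.containsKey(name)) {
                throw new IllegalStateException(name + " already exists");
            }
            namedObjects.put(name, obj);
        }

        /** Retrieves a previously saved object, cast to the requested type. */
        public <T> T retrieveObject(String name, Class<T> type) {
            return type.cast(namedObjects.get(name));
        }
    }

    public static void main(String[] args) {
        MiniConnection conn = new MiniConnection();
        // 10.2 style: obtain a factory from the connection instead of calling a constructor
        BuildSettings settings = (BuildSettings) conn.getFactory("BuildSettings").get();
        settings.setTargetAttributeName("affinity_card");
        // Save and retrieve through the connection rather than through the object itself
        conn.saveObject("nbBuildSettings", settings, true);
        BuildSettings restored = conn.retrieveObject("nbBuildSettings", BuildSettings.class);
        System.out.println(restored.getTargetAttributeName()); // prints affinity_card
    }
}
```

In this design, application code depends only on the connection and on object names. The 10.2 samples in Table 8-2 rely on the same idea: they save an object such as "nbBuildData" under a name and then refer to it by that name when creating later tasks.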
Note:
Although the ODM 10.1 Java API is incompatible with Oracle 10.2, future releases will follow the backward compatibility scheme proposed by the JDM standard.

Table 8-2 provides sample code for performing various mining operations in both 10.1 and 10.2. Refer to Chapter 6 for additional 10.2 code samples.
Table 8-2 Sample Code from 10.1 and 10.2 ODM Java APIs
ODM 10.1 Java API: Connect to the DMS

```java
// Create a DMS object
DataMiningServer m_dms = new DataMiningServer(
    "put DB URL here", // JDBC URL
    "user name",       // User name
    "password"         // Password
);
// Log in to the DMS and create a DMS connection
m_dmsConn = m_dms.login();
```

ODM 10.2 Java API: Connect to the DMS

```java
// Create the ConnectionFactory and the connection
OraConnectionFactory m_dmeConnFactory = new OraConnectionFactory();
ConnectionSpec connSpec = m_dmeConnFactory.getConnectionSpec();
connSpec.setURI( "put DB URL here" );
connSpec.setName( "user name" );
connSpec.setPassword( "password" );
m_dmeConn = m_dmeConnFactory.getConnection( connSpec );
```
ODM 10.1 Java API: Create a PhysicalDataSpecification

```java
LocationAccessData lad = new LocationAccessData(
    "MINING_DATA_BUILD_V", // Table/view name
    "DMUSER"               // Schema name
);
PhysicalDataSpecification pds = new NonTransactionalDataSpecification( lad );
```

ODM 10.2 Java API: Create and Save PhysicalDataSet

```java
m_pdsFactory = ( PhysicalDataSetFactory ) m_dmeConn.getFactory(
    "javax.datamining.data.PhysicalDataSet" );
m_paFactory = ( PhysicalAttributeFactory ) m_dmeConn.getFactory(
    "javax.datamining.data.PhysicalAttribute" );
// Describe the build data and mark cust_id as the case ID
PhysicalDataSet buildData = m_pdsFactory.create( "MINING_DATA_BUILD_V", false );
PhysicalAttribute pa = m_paFactory.create(
    "cust_id", AttributeDataType.integerType, PhysicalAttributeRole.caseId );
buildData.addAttribute( pa );
m_dmeConn.saveObject( "nbBuildData", buildData, true );
```
ODM 10.1 Java API: Create and Save MiningFunctionSettings

```java
NaiveBayesSettings nbAlgo = new NaiveBayesSettings( 0.01f, 0.01f );
ClassificationFunctionSettings mfs = ClassificationFunctionSettings.create(
    m_dmsConn,                        // DMS connection
    nbAlgo,                           // NB algorithm settings
    pds,                              // Build data specification
    "AFFINITY_CARD",                  // Target column
    AttributeType.categorical,        // Attribute type
    DataPreparationStatus.unprepared );
// Set the CUST_ID attribute as inactive
mfs.adjustAttributeUsage( new String[]{ "CUST_ID" }, AttributeUsage.inactive );
mfs.store( m_dmsConn, "NBDemo_MFS" );
```

ODM 10.2 Java API: Create BuildSettings

```java
m_clasFactory = ( ClassificationSettingsFactory ) m_dmeConn.getFactory(
    "javax.datamining.supervised.classification.ClassificationSettings" );
m_nbFactory = ( NaiveBayesSettingsFactory ) m_dmeConn.getFactory(
    "javax.datamining.algorithm.naivebayes.NaiveBayesSettings" );
// Create NB algorithm settings
NaiveBayesSettings nbAlgo = m_nbFactory.create();
nbAlgo.setPairwiseThreshold( 0.01f );
nbAlgo.setSingletonThreshold( 0.01f );
// Create ClassificationSettings
ClassificationSettings buildSettings = m_clasFactory.create();
buildSettings.setAlgorithmSettings( nbAlgo );
buildSettings.setTargetAttributeName( "affinity_card" );
m_dmeConn.saveObject( "nbBuildSettings", buildSettings, true );
```
ODM 10.1 Java API: Create and Execute MiningBuildTask

```java
MiningBuildTask buildTask = new MiningBuildTask(
    pds,            // Build data specification
    "NBDemo_MFS",   // Mining function settings
    "NBDemo_Model"  // Mining model name
);
// Store and execute the task
buildTask.store( m_dmsConn, "NBDemoBuildTask" );
buildTask.execute( m_dmsConn );
// Wait for completion of the task
MiningTaskStatus taskStatus = buildTask.waitForCompletion( m_dmsConn );
```

ODM 10.2 Java API: Create and Execute BuildTask

```java
m_buildFactory = ( BuildTaskFactory ) m_dmeConn.getFactory(
    "javax.datamining.task.BuildTask" );
BuildTask buildTask = m_buildFactory.create(
    "nbBuildData",      // Build data name
    "nbBuildSettings",  // Build settings name
    "nbModel"           // Mining model name
);
m_dmeConn.saveObject( "nbBuildTask", buildTask, true );
// Execute the saved task and wait for it to finish
ExecutionHandle execHandle = m_dmeConn.execute( "nbBuildTask" );
ExecutionStatus status = execHandle.waitForCompletion( Integer.MAX_VALUE );
```
ODM 10.1 Java API: Retrieve MiningModel

```java
NaiveBayesModel model = ( NaiveBayesModel ) SupervisedModel.restore(
    m_dmsConn, "NBDemo_Model" );
```

ODM 10.2 Java API: Retrieve Model

```java
ClassificationModel model = ( ClassificationModel ) m_dmeConn.retrieveObject(
    "nbModel", NamedObject.model );
```
ODM 10.1 Java API: Evaluate the Model

```java
// Compute accuracy and confusion matrix
LocationAccessData lad = new LocationAccessData(
    "MINING_DATA_TEST_V", "DMUSER" ); // Table/view name, schema name
PhysicalDataSpecification pds = new NonTransactionalDataSpecification( lad );
ClassificationTestTask testTask = new ClassificationTestTask(
    pds, "NBDemo_Model", "NBDemo_TestResults" );
testTask.store( m_dmsConn, "NBDemoTestTask" );
testTask.execute( m_dmsConn );
MiningTaskStatus taskStatus = testTask.waitForCompletion( m_dmsConn );
ClassificationTestResult testResult = ClassificationTestResult.restore(
    m_dmsConn, "NBDemo_TestResults" );
float accuracy = testResult.getAccuracy();
CategoryMatrix confusionMatrix = testResult.getConfusionMatrix();

// Compute lift
Category positiveCategory = new Category( "Positive value", "1", DataType.intType );
MiningLiftTask liftTask = new MiningLiftTask(
    pds,
    10,                   // Number of quantiles to be used
    positiveCategory,     // Positive target value
    "NBDemo_Model",       // Model to be tested
    "NBDemo_LiftResults"  // Lift results name
);
liftTask.store( m_dmsConn, "NBDemoLiftTask" );
liftTask.execute( m_dmsConn );
MiningTaskStatus liftTaskStatus = liftTask.waitForCompletion( m_dmsConn );
MiningLiftResult liftResult = MiningLiftResult.restore(
    m_dmsConn, "NBDemo_LiftResults" );
```

ODM 10.2 Java API: Evaluate the Model

```java
// Compute accuracy, confusion matrix, lift, and ROC
PhysicalDataSet testData = m_pdsFactory.create( "MINING_DATA_TEST_V", false );
PhysicalAttribute pa = m_paFactory.create(
    "cust_id", AttributeDataType.integerType, PhysicalAttributeRole.caseId );
testData.addAttribute( pa );
m_dmeConn.saveObject( "nbTestData", testData, true );
// m_testFactory is assumed to be a test task factory obtained earlier via m_dmeConn.getFactory
ClassificationTestTask testTask = m_testFactory.create(
    "nbTestData", "nbModel", "nbTestMetrics" );
testTask.setNumberOfLiftQuantiles( 10 );
testTask.setPositiveTargetValue( new Integer( 1 ) );
m_dmeConn.saveObject( "nbTestTask", testTask, true );
ExecutionHandle execHandle = m_dmeConn.execute( "nbTestTask" );
ExecutionStatus status = execHandle.waitForCompletion( Integer.MAX_VALUE );
ClassificationTestMetrics testMetrics = ( ClassificationTestMetrics ) m_dmeConn.retrieveObject(
    "nbTestMetrics", NamedObject.testMetrics );
Double accuracy = testMetrics.getAccuracy();
ConfusionMatrix confusionMatrix = testMetrics.getConfusionMatrix();
Lift lift = testMetrics.getLift();
ReceiverOperatingCharacterics roc = testMetrics.getROC();
```
ODM 10.1 Java API: Apply the Model

```java
LocationAccessData lad = new LocationAccessData( "MINING_DATA_APPLY_V", "DMUSER" );
PhysicalDataSpecification pds = new NonTransactionalDataSpecification( lad );
// Copy CUST_ID from the source data into the apply output
MiningApplyOutput mao = MiningApplyOutput.createDefault();
MiningAttribute srcAttribute = new MiningAttribute(
    "CUST_ID", DataType.intType, AttributeType.notApplicable );
Attribute destAttribute = new Attribute( "CUST_ID", DataType.intType );
ApplySourceAttributeItem m_srcAttrItem = new ApplySourceAttributeItem(
    srcAttribute, destAttribute );
mao.addItem( m_srcAttrItem );
LocationAccessData outputTable = new LocationAccessData(
    "NBDemo_Apply_Output", "DMUSER" );
MiningApplyTask applyTask = new MiningApplyTask(
    pds,                   // Apply data specification
    "NBDemo_Model",        // Input model name
    mao,                   // MiningApplyOutput object
    outputTable,           // Apply output table
    "NBDemo_ApplyResults"  // Apply results name
);
applyTask.store( m_dmsConn, "NBDemoApplyTask" );
applyTask.execute( m_dmsConn );
MiningTaskStatus taskStatus = applyTask.waitForCompletion( m_dmsConn );
```

ODM 10.2 Java API: Apply the Model

```java
PhysicalDataSet applyData = m_pdsFactory.create( "MINING_DATA_APPLY_V", false );
PhysicalAttribute pa = m_paFactory.create(
    "cust_id", AttributeDataType.integerType, PhysicalAttributeRole.caseId );
applyData.addAttribute( pa );
m_dmeConn.saveObject( "nbApplyData", applyData, true );
// m_applySettingsFactory and m_dsApplyFactory are assumed to be factories
// obtained earlier via m_dmeConn.getFactory
ClassificationApplySettings clasAS = m_applySettingsFactory.create();
m_dmeConn.saveObject( "nbApplySettings", clasAS, true );
DataSetApplyTask applyTask = m_dsApplyFactory.create(
    "nbApplyData", "nbModel", "nbApplySettings", "nb_apply_output" );
m_dmeConn.saveObject( "nbApplyTask", applyTask, true );
ExecutionHandle execHandle = m_dmeConn.execute( "nbApplyTask" );
ExecutionStatus status = execHandle.waitForCompletion( Integer.MAX_VALUE );
```
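Both versions of the evaluation step report accuracy, a confusion matrix, and lift computed with 10 quantiles and a positive target value of 1. When you compare 10.1 and 10.2 results, it can help to keep the textbook definitions of these metrics in mind. The following sketch computes them on small in-memory arrays. It is not the ODM implementation: `TestMetricsSketch` and its methods are hypothetical, the sketch reports only cumulative lift, and it assumes the number of cases divides evenly into quantiles. ODM's handling of ties and uneven quantile sizes may differ.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Generic test-metric definitions (hypothetical helper, not the ODM implementation). */
public class TestMetricsSketch {

    /** Fraction of cases whose predicted label equals the actual label. */
    public static double accuracy(int[] actual, int[] predicted) {
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == predicted[i]) correct++;
        }
        return (double) correct / actual.length;
    }

    /** confusion[a][p] = number of cases with actual class a that were predicted as class p. */
    public static int[][] confusionMatrix(int[] actual, int[] predicted, int numClasses) {
        int[][] m = new int[numClasses][numClasses];
        for (int i = 0; i < actual.length; i++) {
            m[actual[i]][predicted[i]]++;
        }
        return m;
    }

    /**
     * Cumulative lift: sort cases by descending positive-class score, split them into
     * equal-sized quantiles, and divide the positive rate of the top k quantiles
     * by the overall positive rate.
     */
    public static double[] cumulativeLift(double[] positiveScore, int[] actual, int positive, int numQuantiles) {
        int n = actual.length;
        if (n % numQuantiles != 0) {
            throw new IllegalArgumentException("Sketch assumes n is divisible by numQuantiles");
        }
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> positiveScore[i]).reversed());

        int totalPositives = 0;
        for (int a : actual) if (a == positive) totalPositives++;
        double overallRate = (double) totalPositives / n;

        int perQuantile = n / numQuantiles;
        double[] lift = new double[numQuantiles];
        int positivesSoFar = 0;
        for (int q = 0; q < numQuantiles; q++) {
            for (int j = q * perQuantile; j < (q + 1) * perQuantile; j++) {
                if (actual[order[j]] == positive) positivesSoFar++;
            }
            int casesSoFar = (q + 1) * perQuantile;
            lift[q] = ((double) positivesSoFar / casesSoFar) / overallRate;
        }
        return lift;
    }

    public static void main(String[] args) {
        int[] actual    = {1, 0, 1, 0, 0, 1, 0, 0, 0, 0};
        int[] predicted = {1, 0, 0, 0, 0, 1, 1, 0, 0, 0};
        double[] score  = {0.9, 0.2, 0.4, 0.1, 0.3, 0.8, 0.6, 0.05, 0.15, 0.25};
        System.out.println("Accuracy: " + accuracy(actual, predicted)); // Accuracy: 0.8
        System.out.println("Confusion: " + Arrays.deepToString(confusionMatrix(actual, predicted, 2))); // [[6, 1], [1, 2]]
        System.out.println("Cumulative lift: " + Arrays.toString(cumulativeLift(score, actual, 1, 5)));
    }
}
```

In this example, the top quantile contains only positive cases. Its cumulative lift is therefore 1.0 / 0.3, or about 3.33. By construction, the lift across all quantiles is 1.0.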