The query in Example 6-8 returns the attributes that are most important in predicting the age of customer 100010. The prediction functions apply a GLM regression model named GLMR_SH_Regr_sample
to the data selected from mining_data_apply_v
.
Example 6-8 Prediction Details for Regression
SELECT cust_id, PREDICTION(GLMR_SH_Regr_sample USING *) pr, PREDICTION_DETAILS(GLMR_SH_Regr_sample USING *) pd FROM mining_data_apply_v WHERE CUST_ID = 100010; CUST_ID PR PD ------- ----- ----------- 100010 25.45 <Details algorithm="Generalized Linear Model"> <Attribute name="FLAT_PANEL_MONITOR" actualValue="1" weight=".025" rank="1"/> <Attribute name="OCCUPATION" actualValue="Crafts" weight=".019" rank="2"/> <Attribute name="AFFINITY_CARD" actualValue="0" weight=".01" rank="3"/> <Attribute name="OS_DOC_SET_KANJI" actualValue="0" weight="0" rank="4"/> <Attribute name="BOOKKEEPING_APPLICATION" actualValue="1" weight="-.004" rank="5"/> </Details>
The query in Example 6-9 returns the customers who work in Tech Support and are likely to use an affinity card (with more than 85% probability). The prediction functions apply an SVM classification model named svmc_sh_clas_sample
. to the data selected from mining_data_apply_v
. The query includes the prediction details, which show that education is the most important predictor.
Example 6-9 Prediction Details for Classification
SELECT cust_id, PREDICTION_DETAILS(svmc_sh_clas_sample, 1 USING *) PD FROM mining_data_apply_v WHERE PREDICTION_PROBABILITY(svmc_sh_clas_sample, 1 USING *) > 0.85 AND occupation = 'TechSup' ORDER BY cust_id; CUST_ID PD ------- --------------------------------------------------------------------------------------- 100029 <Details algorithm="Support Vector Machines" class="1"> <Attribute name="EDUCATION" actualValue="Assoc-A" weight=".199" rank="1"/> <Attribute name="CUST_INCOME_LEVEL" actualValue="I: 170\,000 - 189\,999" weight=".044" rank="2"/> <Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".028" rank="3"/> <Attribute name="BULK_PACK_DISKETTES" actualValue="1" weight=".024" rank="4"/> <Attribute name="BOOKKEEPING_APPLICATION" actualValue="1" weight=".022" rank="5"/> </Details> 100378 <Details algorithm="Support Vector Machines" class="1"> <Attribute name="EDUCATION" actualValue="Assoc-A" weight=".21" rank="1"/> <Attribute name="CUST_INCOME_LEVEL" actualValue="B: 30\,000 - 49\,999" weight=".047" rank="2"/> <Attribute name="FLAT_PANEL_MONITOR" actualValue="0" weight=".043" rank="3"/> <Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".03" rank="4"/> <Attribute name="BOOKKEEPING_APPLICATION" actualValue="1" weight=".023" rank="5"/> </Details> 100508 <Details algorithm="Support Vector Machines" class="1"> <Attribute name="EDUCATION" actualValue="Bach." weight=".19" rank="1"/> <Attribute name="CUST_INCOME_LEVEL" actualValue="L: 300\,000 and above" weight=".046" rank="2"/> <Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".031" rank="3"/> <Attribute name="BULK_PACK_DISKETTES" actualValue="1" weight=".026" rank="4"/> <Attribute name="BOOKKEEPING_APPLICATION" actualValue="1" weight=".024" rank="5"/> </Details> 100980 <Details algorithm="Support Vector Machines" class="1"> <Attribute name="EDUCATION" actualValue="Assoc-A" weight=".19" rank="1"/> <Attribute name="FLAT_PANEL_MONITOR" actualValue="0" weight=".038" rank="2"/> <Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".026" rank="3"/> <Attribute name="BULK_PACK_DISKETTES" actualValue="1" weight=".022" rank="4"/> <Attribute name="BOOKKEEPING_APPLICATION" actualValue="1" weight=".02" rank="5"/> </Details>
The query in Example 6-10 returns the two customers that differ the most from the rest of the customers. The prediction functions apply an anomaly detection model named SVMO_SH_Clas_sample
to the data selected from mining_data_apply_v
. Anomaly detection uses a one-class SVM classifier.
Example 6-10 Prediction Details for Anomaly Detection
SELECT cust_id, pd FROM (SELECT cust_id, PREDICTION_DETAILS(SVMO_SH_Clas_sample, 0 USING *) pd, RANK() OVER (ORDER BY prediction_probability( SVMO_SH_Clas_sample, 0 USING *) DESC, cust_id) rnk FROM mining_data_one_class_v) WHERE rnk <= 2 ORDER BY rnk; CUST_ID PD ---------- ----------------------------------------------------------------------------------- 102366 <Details algorithm="Support Vector Machines" class="0"> <Attribute name="COUNTRY_NAME" actualValue="United Kingdom" weight=".078" rank="1"/> <Attribute name="CUST_MARITAL_STATUS" actualValue="Divorc." weight=".027" rank="2"/> <Attribute name="CUST_GENDER" actualValue="F" weight=".01" rank="3"/> <Attribute name="HOUSEHOLD_SIZE" actualValue="9+" weight=".009" rank="4"/> <Attribute name="AGE" actualValue="28" weight=".006" rank="5"/> </Details> 101790 <Details algorithm="Support Vector Machines" class="0"> <Attribute name="COUNTRY_NAME" actualValue="Canada" weight=".068" rank="1"/> <Attribute name="HOUSEHOLD_SIZE" actualValue="4-5" weight=".018" rank="2"/> <Attribute name="EDUCATION" actualValue="7th-8th" weight=".015" rank="3"/> <Attribute name="CUST_GENDER" actualValue="F" weight=".013" rank="4"/> <Attribute name="AGE" actualValue="38" weight=".001" rank="5"/> </Details>