As shown in Table 7-1, you can identify columns of CHAR, shorter VARCHAR2 (<=4000), BFILE, and BLOB as text attributes. If CHAR and shorter VARCHAR2 columns are not explicitly identified as unstructured text, then CREATE_MODEL processes them as categorical attributes. If BFILE and BLOB columns are not explicitly identified as unstructured text, then CREATE_MODEL returns an error.
To identify a column as a text attribute, supply the keyword TEXT in an attribute specification. The attribute specification is a field (attribute_spec) in a transformation record (transform_rec). Transformation records are components of transformation lists (xform_list) that can be passed to CREATE_MODEL.
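The relationship between the attribute specification, the transformation list, and CREATE_MODEL can be sketched as follows. This is a minimal illustration, not a definitive implementation: the build table mining_build_text, its columns cust_id, affinity_card, and comments, the model name my_text_model, and the policy my_policy are all hypothetical, and the policy is assumed to exist already.

```sql
DECLARE
  -- Transformation list that carries the attribute specification.
  xformlist DBMS_DATA_MINING_TRANSFORM.TRANSFORM_LIST;
BEGIN
  -- Arguments: list, attribute name, attribute subname, expression,
  -- reverse expression, attribute specification.
  -- The expression leaves the (hypothetical) COMMENTS column unchanged;
  -- only the attribute specification marks it as text.
  DBMS_DATA_MINING_TRANSFORM.SET_TRANSFORM(
    xformlist, 'comments', NULL, 'comments', NULL,
    'TEXT(POLICY_NAME:my_policy)');

  -- Pass the list to CREATE_MODEL. All object names are assumptions.
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'my_text_model',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'mining_build_text',
    case_id_column_name => 'cust_id',
    target_column_name  => 'affinity_card',
    xform_list          => xformlist);
END;
/
```

Because the transformation list is embedded in the model, the same text handling is applied automatically when the model is later used for scoring.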
Note:
An attribute specification can also include information that is not related to text. Instructions for constructing an attribute specification are in "Embedding Transformations in a Model" in Transforming the Data.
You can provide transformation instructions for any text attribute by qualifying the TEXT keyword in the attribute specification with the subsettings described in Table 7-7.
Table 7-7 Attribute-Specific Text Transformation Instructions
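Subsetting Name | Description | Example |
---|---|---|
POLICY_NAME | Name of an Oracle Text policy object created with CTX_DDL.CREATE_POLICY. | (POLICY_NAME:my_policy) |
TOKEN_TYPE | The following values are supported: NORMAL (the default), STEM, THEME. | (TOKEN_TYPE:THEME) |
MAX_FEATURES | Maximum number of features to use from the attribute. | (MAX_FEATURES:3000) |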
Note:
The TEXT keyword is only required for CLOB and longer VARCHAR2 (>4000) when you specify transformation instructions. The TEXT keyword is always required for CHAR, shorter VARCHAR2, BFILE, and BLOB, whether or not you specify transformation instructions.
Tip:
You can view attribute specifications in the data dictionary view ALL_MINING_MODEL_ATTRIBUTES, as shown in Oracle Database Reference.
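As a rough illustration of the tip above, a query along these lines lists the attribute specification stored for each attribute of a model. The model name MY_TEXT_MODEL is hypothetical; the query assumes the model exists and is visible to the current user.

```sql
-- Show each attribute of the (hypothetical) model together with
-- the attribute specification that was embedded at build time.
SELECT attribute_name,
       attribute_type,
       attribute_spec
  FROM all_mining_model_attributes
 WHERE model_name = 'MY_TEXT_MODEL'
 ORDER BY attribute_name;
```

Attributes built without an attribute specification are expected to show a null ATTRIBUTE_SPEC.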
When stems or themes are specified as the token type, the lexer preference for the text policy must support these types of tokens.
The following example adds themes and English stems to BASIC_LEXER.
BEGIN
  CTX_DDL.CREATE_PREFERENCE('my_lexer', 'BASIC_LEXER');
  CTX_DDL.SET_ATTRIBUTE('my_lexer', 'index_stems', 'ENGLISH');
  CTX_DDL.SET_ATTRIBUTE('my_lexer', 'index_themes', 'YES');
END;
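A lexer preference takes effect only when a text policy refers to it. As a sketch, assuming the my_lexer preference from the preceding example has been created and that the policy name my_policy is free to use, the policy could be created like this:

```sql
BEGIN
  -- Create a text policy that uses the lexer preference defined above.
  -- Parameters that are not named here keep their default values.
  CTX_DDL.CREATE_POLICY(
    policy_name => 'my_policy',
    lexer       => 'my_lexer');
END;
/
```

The policy name can then be supplied through the POLICY_NAME subsetting of an attribute specification.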
See Also:
DBMS_DATA_MINING_TRANSFORM.SET_TRANSFORM in Oracle Database PL/SQL Packages and Types Reference
Example 7-1 A Sample Attribute Specification for Text
This expression specifies that text transformation for the attribute should use the text policy named my_policy. The token type is THEME, and the maximum number of features is 3000.
"TEXT(POLICY_NAME:my_policy)(TOKEN_TYPE:THEME)(MAX_FEATURES:3000)"