Data attributes are columns in the data set used to build, test, or score a model. Model attributes are the data representations used internally by the model.
Data attributes and model attributes can be the same. For example, a column called SIZE, with values S, M, and L, might be an attribute used by an algorithm to build a model. Internally, the model attribute SIZE would most likely be the same as the data attribute from which it was derived.
On the other hand, a nested column SALES_PROD, containing the sales figures for a group of products, would not correspond to a model attribute. The data attribute would be SALES_PROD, but each product with its corresponding sales figure (each row in the nested column) would be a model attribute.
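The following Python sketch illustrates the distinction in a simplified way. It is not how the product implements nested columns. The product names PROD1 and PROD2, the function expand_model_attributes, and the column.subname naming pattern are assumptions made for illustration only.

```python
# Hypothetical case record: SIZE is a scalar column and SALES_PROD is a
# nested column holding one (product, sales figure) pair per nested row.
case_row = {
    "SIZE": "M",
    "SALES_PROD": {"PROD1": 120.0, "PROD2": 45.5},
}


def expand_model_attributes(row):
    """Flatten one case record into a dict of model attributes.

    Scalar columns map one-to-one onto model attributes. Each row of a
    nested column becomes its own model attribute, named here as
    column.subname (an illustrative naming assumption).
    """
    attrs = {}
    for column, value in row.items():
        if isinstance(value, dict):
            # Nested column: one model attribute per nested row.
            for subname, subvalue in value.items():
                attrs[f"{column}.{subname}"] = subvalue
        else:
            # Scalar column: the data attribute and model attribute coincide.
            attrs[column] = value
    return attrs


model_attrs = expand_model_attributes(case_row)
```

In this sketch, SIZE survives unchanged as a model attribute, while SALES_PROD itself does not appear among the model attributes; only its per-product entries do.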
Transformations also cause a discrepancy between data attributes and model attributes. For example, a transformation could apply a calculation to two data attributes and store the result in a new attribute. The new attribute would be a model attribute that has no corresponding data attribute. Other transformations, such as binning, normalization, and outlier treatment, cause the model's representation of an attribute to differ from the data attribute in the case table.
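As a rough illustration, the Python sketch below applies three such transformations to a small case table. The column names (LENGTH, WIDTH, AGE, INCOME), the bin boundaries, and the helper functions are all hypothetical. The sketch shows only the general idea, not the product's transformation interface.

```python
# Hypothetical case table: each dict is one case (row) of data attributes.
rows = [
    {"LENGTH": 2.0, "WIDTH": 3.0, "AGE": 25, "INCOME": 20000.0},
    {"LENGTH": 4.0, "WIDTH": 5.0, "AGE": 40, "INCOME": 50000.0},
    {"LENGTH": 1.0, "WIDTH": 10.0, "AGE": 65, "INCOME": 100000.0},
]


def bin_age(age):
    """Binning: replace a numeric age with a coarse label (assumed bins)."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle"
    return "senior"


# Min-max normalization needs the observed range of the column.
incomes = [r["INCOME"] for r in rows]
income_min, income_max = min(incomes), max(incomes)


def transform(row):
    """Map the data attributes of one case to model attributes."""
    return {
        # New attribute computed from two data attributes; it has no
        # corresponding column in the case table.
        "AREA": row["LENGTH"] * row["WIDTH"],
        # Same name as the data attribute, but a different representation.
        "AGE": bin_age(row["AGE"]),
        # Same name, rescaled to the range [0, 1].
        "INCOME": (row["INCOME"] - income_min) / (income_max - income_min),
    }


model_rows = [transform(r) for r in rows]
```

Here AREA exists only on the model side, while AGE and INCOME keep their names but hold values that differ from those in the case table.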
See Also:
Transforming the Data for information about transformations