Providing a Masking Format to Define a Column

When you create a masking definition ("Masking with an Application Data Model and Workloads"), you will be either importing a format or selecting one from the available types in the Define Column Mask page. Format entry options are as follows:

  • Array List

    The data type of each value in the list must be compatible with that of the masked column. Uniqueness must be guaranteed if needed. For example, for a unique key column that already has 10 distinct values, the array list should also contain at least 10 distinct values.

  • Delete

    Deletes the specified rows as identified by the condition clauses. If a column includes a delete format for one of its conditions, a foreign key constraint or a dependent column cannot refer to the table.

  • Encrypt

    Encrypts column data by specifying a regular expression. The column values in all the rows must match the regular expression. This format can be used to mask data consistently across databases. That is, for a given value it always generates the same masked value.

    For example, the regular expression [(][1-9][0-9]{2}[)][_][0-9]{3}[-][0-9]{4} generates U.S. phone numbers such as (123) 456-7890.

    This format supports a subset of the regular expression language. It supports encrypting strings of fixed widths. However, it does not support * or + syntax of regular expressions.

    If a value does not match the format specified, the encrypted value may no longer produce one-to-one mappings. All non-confirming values are mapped to a single encrypted value, thereby producing a many-to-one mapping.

  • Fixed Number

    The type of column applicable to this entry is a NUMBER column or a STRING column. For example, if you mask a column that has a social security number, one of the entries can be Fixed Number 900. This format is combinable.

  • Fixed String

    The type of column applicable to this entry is a STRING column. For example, if you mask a column that has a License Plate Number, one of the entries can be Fixed String CA. This format is combinable.

  • Null Value

    Masks the column using a value of NULL. The column must be nullable.

  • Post-Processing Function

    This is a special function that you can apply to the mask value that the masking engine generates. This function takes the mask value as input and returns the actual mask value to be used for masking.

    The post-processing function is called after the mask value is generated. You can use it, for instance, to add commas or dollar signs to a value. For example, if a mask value is a number such as 12000, the post processing function can modify this to $12,000. Another use is for adding checksums or special encodings for the mask value that is produced.

    In the following statement:

    Function post_proc_udf_func (rowid varchar2, column_name varchar2, mask_value varchar2) returns varchar2;
    
    • rowid is the min (rowid) of the rows that contains the value mask_value 3rd argument.

    • column_name is the name of the column being masked.

    • mask_value is the value being masked.

  • Preserve Original Data

    Retains the original values for rows that match the specified condition clause. This is used in cases where some rows that match a condition do not need to be masked.

  • Random Dates

    The uniqueness of the Date column is not maintained after masking. This format is combinable.

  • Random Decimal Numbers

    If used as part of a mixed random string, these have limited usage for generating unique values. This masking format generates unique values within the specified range. For example, a starting value of 5.5 and ending value of 9.99 generates a decimal number ranging from 5.5 to 9.99, both inclusive. This masking format is combinable.

  • Random Digits

    This format generates unique values within the specified range. For example, for a random digit with a length of [5,5], an integer between [0, 99999] is randomly generated, left padded with '0's to satisfy the length and uniqueness requirement. This is a complementary type of random number, which will not be padded. When using random digits, the random digit pads to the appropriate length in a string. It does not pad when used for a number column. This format is combinable.

    Data masking ensures that the generated values are unique, but if you do not specify enough digits, you could run out of unique values in that range.

  • Random Numbers

    If used as part of a mixed random string, these have limited usage for generating unique values. This format generates unique values within the specified range. For example, a starting value of 100 and ending value of 200 generates an integer number ranging from 100 to 200, both inclusive. Note that Oracle Enterprise Manager release 10.2.0.4.0 does not support float numbers. This format is combinable.

  • Random Strings

    This format generates unique values within the specified range. For example, a starting length of 2 and ending length of 6 generates a random string of 2 - 6 characters in length. This format is combinable.

  • Regular Expression

    This format enables you to use regular expressions to search for sensitive data in LOBs (BLOB, CLOB, NCLOB) and replace the data with a fixed string, a fixed number, null, or SQL Expression. Use rules to search for multiple strings within a LOB.

    Examples:

    • Use the regular expression [0-9]{3}[.][0-9]{3}[.][0-9]{4} to match strings of the format nnn.nnn.nnnn and replace with a masked value, for example, ***.***.****

    • Use the regular expression <SALARY>[0-9]{2,6}</SALARY> to zero out salary information by replacing with <SALARY>0</SALARY>

    • Use the regular expression [A-Z]+@[A-Z]+\.[A-Z]{2,4} to mask e-mail addresses by replacing with john.doe@acme.com

    You can also use this format with data type VARCHAR2 to mask part of a string.

  • Shuffle

    This format randomly shuffles the original column data. It maintains data distribution except when a column is conditionally masked and its values are not unique.

    For more information, see "Using the Shuffle Format".

  • Substitute

    This format uses a hash-based substitution for the original value and always yields the same mask value for any given input value. Specify the substitution masking table and column. This format has the following properties:

    • The masked data is not reversible. That is, this format is not vulnerable to external security breaches because the original value is replaced, so it is not possible to retrieve the original value from the mask value.

    • Masking multiple times with a hash substitute across different databases yields the same mask value. This characteristic is valid across multiple databases or multiple runs assuming that the same substitution values are used in the two runs. That is, the actual rows and values in the substitution table do not change. For example, suppose the two values Joe and Tom were masked to Henry and Peter. When you repeat the same mask on another database using the same substitution table, if there were Bob and Tom, they might be replaced with Louise and Peter. Notice that even though the two runs have different data, Tom is always replaced with Peter.

    • This format does not generate uniqueness.

  • SQL Expression

    This masking format enables you to enter a SQL Expression for masking a column. Data masking uses this expression to generate masked values to replace the original values. You cannot combine a column using this masking format type with other masking format types, such as Random Numbers or Random Strings.

    The SQL Expression can consist of one or more values, operators, and SQL functions that evaluates to a value. It can also contain substitution columns (columns from the same table as the masked column). You should specify substitution columns within percent signs (%). Use SQL Expressions with dbms_lob and other user-defined functions for LOB (BLOB, CLOB, NCLOB) masking.

    Examples:

    • dbms_random.string('u', 8) || '@company.com'

      Generates random e-mail addresses.

    • %first_name% || '.' || %last_name% || '@company.com'

      Generates e-mail addresses using first_name and last_name column values. In this example, first_name and last_name are the substitution columns.

    • CLOB Masking

      dbms_lob.empty_clob()

      Empties the CLOB.

      custom_mask_clob(%CLOB_COL%)

      Applies the custom mask function to the clob column CLOB_COL.

    • Conditional Mask

      (case when %PARTY_TYPE%='PERSON' then %PERSON_FIRST_NAME%|| ' ' ||%PERSON_LAST_NAME% else (select dbms_random.string('U', 10) from dual) end)

      Columns within %% are present in the same table. The expression masks PERSON_FULL_NAME with the first and last name; otherwise, the mask is a random string.

    • Substitution Mask

      select MASK_ZIPCODE from data_mask.DATA_MASK_ADDR where ADDR_SEQ = ora_hash( %ZIPCODE% , 1000, 1234)

      Select 1000 rows in the substitution table data_mask.DATA_MASK_ADDR. Mask %ZIPCODE% with the MASK_ZIPCODE column in the substitution table. The row selected is dependent on ora_hash and is deterministic in this case. Selection is random if dbms_random procedures are used.

  • Substitute

    This format uses a hash-based substitution for the original value and always yields the same mask value for any given input value. Specify the substitution masking table and column. This format has the following properties:

    • The masked data is not reversible. That is, this format is not vulnerable to external security breaches, because the original value is replaced, so it is not possible to retrieve the original value from the mask value.

    • Masking multiple times with a hash substitute across different databases yields the same mask value. This characteristic is valid across multiple databases or multiple runs assuming that the same substitution values are used in the two runs. That is, the actual rows and values in the substitution table do not change. For example, suppose the two values Joe and Tom were masked to Henry and Peter. When you repeat the same mask on another database using the same substitution table, if there were Bob and Tom, they might be replaced with Louise and Peter. Notice that even though the two runs have different data, Tom is always replaced with Peter.

    • This format does not guarantee uniqueness.

  • Substring

    Substring is similar to the database substr function. The start position can be either a positive or a negative integer. For example, if the original string is abcd, a substring with a start position of 2 and length of 3 generates a masked string of bcd. A substring with start position of -2 and length of 3 generates a masked string of cd. This format is combinable.

  • Table Column

    A table column enables you to select values from the chosen column as the replacement value or part thereof. The data type and uniqueness must be compatible. Otherwise, a failure occurs when the job runs. This format is combinable.

  • Truncate

    Truncates all rows in a table. If one of the columns in a table is marked as truncated, the entire table is truncated, so no other mask formats can be specified for any of the other columns. If a table is being truncated, it cannot be referred to by a foreign key constraint or a dependent column.

  • User Defined Function

    The data type and uniqueness of the output values must be compatible with the original output values. Otherwise, a failure occurs when the job runs.

    In the following statement:

    Function udf_func (rowid varchar2, column_name varchar2, original_value varchar2) returns varchar2;
    
    • rowid is the min (rowid) of the rows that contain the value original_value 3rd argument.

    • column_name is the name of the column being masked.

    • original_value is the value being masked.