1 Introduction to Oracle Data Masking and Subsetting

The Data Masking and Subsetting features of the Enterprise Manager for Oracle Database Plug-in help you to securely manage test data.

When performing real-world testing, there is a risk of exposing sensitive data to non-production users in a test environment. The test data management features of Oracle Database helps to minimize this risk by enabling you to perform data masking and data subsetting on the test data.

When production data is copied into a testing environment, there is the risk of breaching sensitive information to non-production users, such as application developers or external consultants. In order to perform real-world testing, these non-production users need to access some of the original data, but not all the data, especially when the information is deemed confidential.

Oracle Database offers test data management features that help reduce this risk by enabling you to:

  • Store the list of applications, tables, and relationships between table columns using Application Data Modeling, and sensitive columns. For more information, see Application Data Modeling.

  • Replace sensitive data from your production system with fictitious data that can be used during testing using data masking. For more information, see Data Masking.
  • Replicate information that pertains only to a particular site using data subsetting. For more information, see Data Subsetting.

Note:

You must have the Oracle Data Masking and Subsetting Pack license to use these security features.

The Need to Mask and Subset Data

Enterprises share data from their production applications with other users for a variety of business needs.

  • Most organizations, if not all, copy production data into test and development environments to allow system administrators to test upgrades, patches, and fixes.

  • Businesses to stay competitive require new and improved functionality in existing production applications. As a result, application developers require an environment mimicking close to that of production, to build and test the new functionality ensuring that the existing functionality does not break.

  • Retail companies share customer point-of–sale data with market researchers to analyze customer buying patterns.

  • Pharmaceutical or health care organizations share patient data with medical researchers to assess the efficiency of clinical trials or medical treatments.

As a result, organizations copy tens of millions of sensitive customer and consumer data to non-production environments and very few companies do anything to protect this data, even while sharing with outsourcing and third-party agencies.

Numerous industry studies on data privacy have concluded that companies do not prevent sensitive data from reaching the hands of wrong-doers. Almost 1 out of 4 companies responded that live data had been lost or stolen, and 50% said that they had no way of knowing if the data in non-production environment had been compromised.

Protecting vital company information in non-production environment has become one of the foremost critical tasks over the recent years. With Oracle Data Masking and Subsetting, sensitive and valuable information can be replaced with realistic values. This allows production data to be safely used for development, testing, outsourcing with partners and off-shore partners or other non-production purposes.

Components

This section discusses the components of Oracle Data Masking and Subsetting:

Application Data Modeling

Application Data Modeling (ADM) simplifies the effort of sensitive data discovery through automated discovery procedures and sensitive column types from Oracle database tables. The automated discovery procedures not only discover columns holding sensitive information, but also identify the referential or parent-child relationships between these columns defined in the database.

A sensitive column type creates forensics of sensitive data elements such as national insurance numbers using combination of data patterns in column names, column data, and column comments. Automated discovery procedures leverage sensitive column types to sample data in the database table columns while scanning the database for sensitive information.

Application Data Modeling provides various out-of-the-box sensitive columns types such as credit card numbers, social security numbers, phone number, etc. Custom–sensitive column types can be easily created using regular expressions. With Oracle Data Masking Pack, and the Data Discovery and Modeling capability in Oracle Enterprise Manager, enterprises can define data pattern search criteria allowing database administrators to locate these sensitive elements. For example, data pattern such as 15 or 16 digits for credit card numbers or 9 digit formatted US social security numbers.

Oracle Data Masking and Subsetting also allows you to import and export Application Data Models.

For more information, see the Application Data Modeling chapter.

Data Masking Format Library

Oracle Data Masking and Subsetting provides security administrators the ability to maintain a centralized definition for common masking formats. This centralized definition ensures that database administrators apply the same masking rules across the entire enterprise regardless of the database and where the sensitive data resides. Thus, organizations and enterprises can ensure that the sensitive data is masked consistently and in compliance with the established standards for data sharing.

The data masking format library contains a centralized set of data masking format for commonly used sensitive data. For example, there are mask formats for phone numbers, social security numbers, credit card numbers, and fictitious first and last names. Security administrators can define these mask formats and maintain them in the format library to be used by the DBAs across the entire enterprise.

The Data Masking format library provides:

  • several built-in out-of-the-box masking formats and primitives that can be used by the Data Masking definitions

  • Built-in mask primitives that can be combined to create additional formats

  • User-defined SQL or PL/SQL functions for handling complex formats

The Data Masking Library also includes the following primitives:

  • Fixed Numbers

  • Fixed Strings

  • Random Strings

  • Random Digits

  • Random Numbers

  • Random Dates

Data Masking Transformations

Oracle Data Masking and Subsetting provides sophisticated masking transformations. If masking formats are considered as building blocks of a data masking definition, then masking transformations align these masking formats according to the varied business requirements. Oracle Data Masking and Subsetting provides the following data masking transformations:

  • Conditional Masking: provides the ability to arrange masking formats according to different conditions. For example, consider masking a column containing unique person identifiers. Person identifiers that belong to United States of America are masked using Social Security Number format and the person identifiers that belong to United Kingdom are masked using the National Insurance Number format.

  • Compound Masking (also known as grouping option): masks related columns as a group to ensure that the data being masked across the related columns retain the same relationship. For example, consider masking the address fields such as city, state, and postal codes. The values in these address fields will be consistent even after the masking operation is completed.

  • Deterministic/Consistent Masking: generates consistent outputs for a given input across various databases. This transformation is helpful to maintain data integrity across multiple applications and also to preserve system integrity in a single sign-on environment. For example, consider three applications: a human capital management application, a customer relationship management application, and a sales data warehouse. These applications might have a common field such as EMPLOYEE ID that must be masked consistently. The Substitute and Encrypt masking formats provide deterministic masking transformation.

  • Shuffle: allows the fields within a column to be shuffled in a random manner. This transformation is helpful in breaking the one-to-one mapping between sensitive data elements. For example, columns containing personal health records can be shuffled while masking health care information.

  • Key-based reversible masking (also known as Encrypt Format): encrypts and decrypts the original data using a secure key string, while preserving the format of the input data. This transformation uses the 3DES algorithm, and is helpful when businesses need to mask and send their data to a third-party for analysis, reporting, or any other business processing purpose. After the processed data is received from the third–party, the original data can be recovered using the same key string that was used to encrypt the data.

  • Format Preserving Randomization (also known as Auto Mask Format): randomizes the data, preserving the input length, position of the characters and numbers, case of the character (upper or lower), and special characters in the input.

Data Subsetting Techniques

Oracle Data Masking and Subsetting simplifies subsetting through its easy-to-define goal-based and condition-based subsetting techniques.

  • Goal-based subsetting: Data is subsetted based on goals. A goal can be a relative table size. For example, extracting 1% subset of a table containing 1 million rows.

  • Condition-based subsetting: Data is subsetted based on conditions. The conditions are specified using “SQL where clause”. The “SQL where clause” supports bind variables.

    A condition can be based on time. For example, discarding all user records created prior to a particular year.

    A condition can be based on region. For example, extracting information pertaining to Asia Pacific for a new application development.

Application Templates

Oracle Data Masking Application Templates deliver pre-identified sensitive columns, their relationships, and industry-standard best practice masking techniques out-of-the box for packaged applications such as Oracle E-Business Suite and Oracle Fusion Applications. Use the Self Update feature to get the latest masking and subsetting templates available from Oracle.

Architecture

Oracle Data Masking and Subsetting architecture consists of a two-tier framework. The Cloud Control UI provides a graphical user interface for administrators to handle all management tasks. These client components can be installed locally or brought up with a Web browser. The Oracle Enterprise Manager framework is comprised of Oracle Management Server and a database repository.

The Management Server is the core of the Enterprise Manager framework. It provides administrative user accounts, processes management functions such as jobs and events, and manages the flow of information between the Cloud Control UI and the nodes with Enterprise Management Agents. The Oracle Enterprise Manager Agent communicates with the Oracle Management Server and performs tasks sent by Cloud Control UI and other client applications.

The Enterprise Management Agent is installed on each monitored host and is responsible for monitoring all of the targets running on those hosts, communicating that information to the Oracle Management Server, and managing and maintaining the hosts and its targets.

Figure 1-1 Architecture of the Oracle Data Masking and Subsetting

DMS_architecture.png diagram provides an overview of the architecture used in Oracle Data Masking and Subsetting.

Execution Methods

Oracle Data Masking and Subsetting provides two modes for masking and subsetting data:

  • In-Database

  • In-Export

In-Database

In-Database mode directly masks and subsets the data within a non-production database with minimal or no impact on production environments. Since the In-Database masking and subsetting mode permanently changes the data in a database, it is strongly recommended to use this mode for non-production environments such as staging, test, and development databases rather than using it on production databases.

In-Export

In-Export mode masks and subsets the data in near real-time while extracting the data from a database. The extracted masked and subsetted data is written to data pump export files that are further imported into test, development, or QA databases. In general, In-Export mode is used for production databases.

Heterogeneous

Oracle Data Masking and Subsetting also supports data masking of data from any non-Oracle database, such as IBM DB2, Microsoft SQL Server, and Sybase.

Oracle Data Masking and Subsetting uses the DB links and the gateways to read data from non-Oracle production databases, copy the data to an Oracle-based staging environment, mask in the Oracle staging environment, and then write to the non-Oracle test databases.

It uses the same masking techniques that are used in the Oracle databases for masking the data.

Note:

Oracle Data Masking and Subsetting does not support automatic data discovery for non-Oracle databases, and also does not perform referential integrity for non-Oracle databases.

Methodology

Oracle Data Masking and Subsetting uses the following methodology to secure non-production database and replaces sensitive data with fictitious, but yet relevant data that meets compliance requirements.

  • Creating an Application Data Model— Discover sensitive data and data relationships, and then create or assign an Application Data Model

  • Selecting Masking Formats and Criteria— Create data masking definition and masking format types and templates based on the sensitive data that is discovered.

  • Previewing and Validating — Secure sensitive data by previewing the masking algorithm results and the subset reduction results

  • Executing Masking Transformations— Execute In-Database or In-Export masking and subsetting transformations and validate the data that is masked.

The following figure describes the methodology used in Oracle Data Masking and Subsetting.

Figure 1-2 Methodology

The DMS_methodology.png explains the 4–step methodology that Oracle Data Masking and Subsetting uses.

Workflow

The following diagram explains the Oracle Data Masking and Subsetting workflow.

Figure 1-3 Oracle Data Masking and Subsetting Workflow

The DMS_workflow.png explains the Oracle Data Masking and Subsetting workflow in a flow chart format. The three important steps of the Data Masking and Subsetting workflow are a) creating an Application Data Model b) creating a data masking definition and c) creating a data subsetting definition.

The following steps describe the Oracle Data Masking and Subsetting Workflow:

  1. Create an Application Data Model — To begin using Oracle Data Masking and Subsetting, you must create an Application Data Model (ADM). ADMs capture application metadata, referential relationships, and discover sensitive data from the source database.

  2. Create a Data Masking Definition — After an ADM is created, the next step is to create a data masking definition. A masking definition includes information regarding the table columns and the masking format for each of these columns. The mask can be created by writing the masked data to the export file.

  3. Create a Data Subsetting Definition — Create a data subsetting definition to define the table rules and rule parameters. The subset can be created by writing the subset data to the export file.