Wednesday 6 February 2019

Regular Expression Based Redaction Policy

We can use the regular expression based redaction technique to mask a column data based on a pattern match when the function_type parameter is set to DBMS_REDACT.REGEXP. After setting the function_type parameter, we have to make use of the below parameters to build our regular expression based pattern matching.
 
·         REGEXP_PATTERN         parameter is used for defining the search pattern for matching the data.
 
·         REGEXP_REPLACE_STRING parameter is used for replacing the strings that are matched by the REGEXP_PATTERN parameter.
 
·         REGEXP_POSITION parameter specifies the starting position for the string search and replacement. This parameter accepts a positive integer indicating the column_name parameter’s character position. The default is 1 or RE_BEGINNING format, meaning that the search starts from the first character of the column data.
 
·         REGEXP_OCCURRENCE parameter defines at which occurrence the search and replace must occur. This parameter accepts all positive numbers to indicate the replace option.
 
è If we specify 0 or RE_ALL as its value, then all the occurrences of the match get replaced.
 
è If we specify 1 or RE_FIRST as its value, then the first occurrence of the match gets replaced.
 
è If we specify any positive integer, say n, then the nth occurrence of the match gets replaced.
 
·         REGEXP_MATCH_PARAMETER parameter allows us to change the default matching behavior. For e.g., to make the matching case insensitive, we must use the character i or the format RE_MATCH_CASE_INSENSITIVE.
 
We can use the below formats in place of the values for the REGEXP_PATTERN and REGEXP_REPLACE_STRING parameters in the DBMS_REDACT.ADD_POLICY procedure.
 
REGEXP_PATTERN & REGXP_REPLACE_STRING Default Formats
Format
Description
DBMS_REDACT.
RE_PATTERN_ANY_DIGIT
Searches for any digit. Below are the supported REGEXP_REPLACE_STRING default formats.
 
FORMAT
DESCRIPTION
DBMS_REDACT.
RE_REDACT_WITH_SINGLE_X
Replaces the data with a single X character for each actual data character.
DBMS_REDACT.
RE_REDACT_WITH_SINGLE_1
Replaces the data with a single 1 digit for each actual character.
DBMS_REDACT.
RE_PATTERN_CC_L6_T4
Searches for the digits in the credit card number having 6 leading and 4 trailing digits. Below are the supported REGEXP_REPLACE_STRING default formats.
 
FORMAT
DESCRIPTION
DBMS_REDACT.
RE_REDACT_CC_MIDDLE_DIGITS
Replaces the middle digits in a credit card number.
DBMS_REDACT.
RE_PATTERN_US_PHONE
Searches for any US telephone number format. Below are the supported REGEXP_REPLACE_STRING default formats.
 
FORMAT
DESCRIPTION
DBMS_REDACT.
RE_REDACT_PHONE_L7
Replaces the last 7 digits of a US telephone number.
DBMS_REDACT.
RE_PATTERN_EMAIL_ADDRESS
Searches for any email address. Below are the supported REGEXP_REPLACE_STRING default formats.
 
FORMAT
DESCRIPTION
DBMS_REDACT.
RE_REDACT_EMAIL_NAME
Replaces the email name.
DBMS_REDACT.
RE_REDACT_EMAIL_DOMAIN
Replaces the email domain name.
DBMS_REDACT.
RE_PATTERN_IP_ADDRESS
Searches for any IP address. Below are the supported REGEXP_REPLACE_STRING default formats.
 
FORMAT
DESCRIPTION
DBMS_REDACT.
RE_REDACT_IP_L3
Replaces the last 3 digits of the IP address.
 
In the below code listing, we have created a policy for redacting the email name from a list of email addresses using the default parameter values as shown below,
 
·         REGEXP_PATTERN parameter value as DBMS_REDACT.RE_PATTERN_EMAIL_ADDRESS.
 
·         REGEXP_REPLACE_STRING parameter value as DBMS_REDACT.RE_REDACT_EMAIL_NAME.
 
·         REGEXP_POSITION parameter value as DBMS_REDACT.RE_BEGINNING.
 
·         REGXP_OCCURRENCE parameter value as DBMS_REDACT.RE_FIRST.
 
·         REGEXP_MATCH_PARAMETER parameter value as DBMS_REDACT.RE_MATCH_CASE_INSENSITIVE.
 
BEGIN
 DBMS_REDACT.ADD_POLICY(
   object_schema          => 'C##',
   object_name            => 'CUSTOMER',
   column_name            => 'CUSTOMER_EMAIL',
   policy_name            => 'REDACT_CUSTOMER_EMAIL',
   function_type          => DBMS_REDACT.REGEXP,
   expression             => '1=1',
   regexp_pattern         => DBMS_REDACT.RE_PATTERN_EMAIL_ADDRESS,
   regexp_replace_string  => DBMS_REDACT.RE_REDACT_EMAIL_NAME,
   regexp_position        => DBMS_REDACT.RE_BEGINNING,
   regexp_occurrence      => DBMS_REDACT.RE_FIRST,
   regexp_match_parameter => DBMS_REDACT.RE_MATCH_CASE_INSENSITIVE);
END;
/
 
When we query the CUSTOMER table, we can see the name of the email address column being redacted to the default character x as shown below,
 
SELECT unredacted, customer_email FROM customer;
Script Result:
UNREDACTED
CUSTOMER_EMAIL
 
In the case of customized regular expression redaction policy, we can enter our own regular expression match pattern, replace string, position value, occurrence value, and the match parameter values instead of the default parameter formats. In our below example, the account number column of 11 characters each, for a set of customers has been redacted by replacing the first 7 characters with the user defined string x. Here, the parameter values used are,
 
·         REGEXP_PATTERN parameter value as '(\d\d\d\d)(\d\d\d)(\d\d\d\d)'.
 
·         REGEXP_REPLACE_STRING parameter value as 'XXXXXX\3'.
 
·         REGEXP_POSITION parameter value as 1.
 
·         REGEXP_OCCURRENCE parameter value as 0.
 
·         REGEXP_MATCH_PARAMETER parameter value as 'i'.
 
BEGIN
 DBMS_REDACT.ADD_POLICY(
   object_schema          => 'C##',
   object_name            => 'CUSTOMER',
   column_name            => 'CUST_ACC_NUM',
   policy_name            => 'REDACT_CUST_ACC_NUM',
   function_type          => DBMS_REDACT.REGEXP,
   expression             => '1=1',
   regexp_pattern         => '(\d\d\d\d)(\d\d\d)(\d\d\d\d)',
   regexp_replace_string  => 'XXXXXX\3',
   regexp_position        => 1,
   regexp_occurrence      => 0,
   regexp_match_parameter => 'i');
END;
/
 
When we query the account number field from the CUSTOMER table, we can see that the column has been redacted with 4 x characters in the place of first 7 characters.
 
SELECT unredacted, cust_acc_num FROM customer;
Script Result:
UNREDACTED
CUST_ACC_NUM
98237959239
XXXX9239
87395873094
XXXX3094
53069840568
XXXX0568
83859034854
XXXX4854

No comments:

Post a Comment