Database Anonymizer Tool

Data anonymization is the process of protecting private or sensitive information, such as passwords, by deleting or encrypting personally identifiable information. Because organizations tend to store user information on local or cloud servers for various business requirements, data anonymization is vital for maintaining data integrity and preventing security breaches.

The Database Anonymizer tool anonymizes sensitive information when you export data from the database. You can configure which tables, columns, or values to exclude from the exported data. By default, all user and password fields are excluded.

Note: This tool is mainly intended to hide passwords and dictionary values in the Digital.ai Deploy database. However, you can customize it based on your requirements.

Database Anonymizer Configuration File

The Database Anonymizer configuration file (central-config/xld-db-anonymize.yaml) defines which data is exported from the database and how it is anonymized. The configuration file contains three sections that define the rules for exporting:

1. Tables to not export: This section lists the tables that are not exported. For example, the XL_USERS table can contain sensitive information, so it is not exported by default.
2. Tables to anonymize: This section identifies a specific column within a specific table. The original content of that column is replaced with the content defined in the value field.
3. Content to anonymize: This section defines specific text within a column that is replaced with an updated value.

The configuration also contains an encrypted-fields-to-ignore section, which is described later in this topic. The following example shows all of these sections:

deploy.db-anonymizer:
  tables-to-not-export:
    - XL_USERS
  tables-to-anonymize:
    - table: XLD_DICT_ENTRIES
      column: value
      value: placeholder
    - table: XLD_DICT_ENC_ENTRIES
      column: value
      value: enc-placeholder
    - table: XLD_DB_ARTIFACTS
      column: data
      value: file
  content-to-anonymize: []
  encrypted-fields-to-ignore:
    - password-regex: "\\{aes:v0\\}.*"
      table: XLD_CI_PROPERTIES
      column: string_value
      value: password
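The following minimal Python sketch applies the tables-to-not-export and tables-to-anonymize rules from the example configuration to in-memory rows, which makes their effect easier to see. It is not the tool's implementation. The CONFIG dictionary is a hand-written mirror of the YAML above, and the row format and the function name anonymize_export are hypothetical, chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical in-memory mirror of the example YAML (not read from the real file).
CONFIG = {
    "tables-to-not-export": ["XL_USERS"],
    "tables-to-anonymize": [
        {"table": "XLD_DICT_ENTRIES", "column": "value", "value": "placeholder"},
        {"table": "XLD_DICT_ENC_ENTRIES", "column": "value", "value": "enc-placeholder"},
        {"table": "XLD_DB_ARTIFACTS", "column": "data", "value": "file"},
    ],
}

Row = Dict[str, str]


def anonymize_export(tables: Dict[str, List[Row]], config: dict) -> Dict[str, List[Row]]:
    """Return a copy of `tables` with the export rules applied (illustrative only)."""
    skipped = set(config.get("tables-to-not-export", []))
    result: Dict[str, List[Row]] = {}
    for table, rows in tables.items():
        if table in skipped:
            continue  # the table is left out of the export entirely
        # Map column -> replacement value for every rule that targets this table.
        replacements = {
            rule["column"]: rule["value"]
            for rule in config.get("tables-to-anonymize", [])
            if rule["table"] == table
        }
        # Replace only the configured columns; all other columns are copied as-is.
        result[table] = [
            {col: replacements.get(col, val) for col, val in row.items()}
            for row in rows
        ]
    return result


# Hypothetical sample data.
sample = {
    "XL_USERS": [{"username": "admin", "password": "secret"}],
    "XLD_DICT_ENTRIES": [{"key": "db.host", "value": "10.0.0.5"}],
}
print(anonymize_export(sample, CONFIG))
# {'XLD_DICT_ENTRIES': [{'key': 'db.host', 'value': 'placeholder'}]}
```

In this sketch the dictionary key survives and only its value is replaced. That behavior is consistent with the note that the tool is mainly intended to hide passwords and dictionary values.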

Caution:

  • Anonymizing content that matches a dictionary title changes both the dictionary key and the dictionary title.
  • Anonymizing content that matches the dictionary type corrupts the dictionary.

To anonymize CI passwords that are encrypted with the local key store, add the following configuration to the centralConfiguration/db-anonymizer.yaml file:

"encrypted-fields-to-ignore": [
    {
      "passwordRegex": "\\{aes:v0\\}.*",
      "table": "XLD_CI_PROPERTIES",
      "column": "string_value",
      "value": "password"
    }
  ]
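The doubled backslashes in "\\{aes:v0\\}.*" are string escapes. The regular expression the tool receives is therefore \{aes:v0\}.*, which matches values that begin with the literal prefix {aes:v0}. The Python sketch below confirms this by decoding the fragment, wrapped in braces so it parses as JSON, and testing the pattern against two made-up sample values. Two points are assumptions: that the match is anchored at the start of the value (as re.match does here), and that value is the replacement text, by analogy with tables-to-anonymize.

```python
import json
import re

# The fragment from above, wrapped in braces so that it parses as a JSON object.
fragment = r'''
{
  "encrypted-fields-to-ignore": [
    {
      "passwordRegex": "\\{aes:v0\\}.*",
      "table": "XLD_CI_PROPERTIES",
      "column": "string_value",
      "value": "password"
    }
  ]
}
'''

rule = json.loads(fragment)["encrypted-fields-to-ignore"][0]
pattern = rule["passwordRegex"]
print(pattern)  # \{aes:v0\}.*

# Hypothetical sample values for the string_value column.
for value in ["{aes:v0}q8Zk1mXyExample==", "plain-text-value"]:
    # Assumption: matching is anchored at the start of the value, as with re.match,
    # and a matching value is replaced with the rule's "value" field.
    replaced = rule["value"] if re.match(pattern, value) else value
    print(value, "->", replaced)
```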

Export Anonymized Data

To export anonymized data, run the following command:

./bin/db-anonymizer.sh

When you run the command, the data is written to the server home directory as xl-deploy-repository-dump.xml, together with its corresponding validation file, xl-deploy-repository-dump.dtd.

Important: If you are using two databases (repository and reporting), also run the command with the -reports flag to export the reporting database data file, xl-deploy-reporting-dump.xml.
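An import needs the .dtd validation file to be available next to the .xml dump. It can therefore help to confirm that the export produced every expected file before you move the files elsewhere. The following Python sketch is a hypothetical helper and is not part of the tool. The names missing_dump_files and server_home are illustrative, and only the file names come from this topic.

```python
from pathlib import Path
from typing import List

# File names produced by the export, as described above.
REPOSITORY_FILES = ["xl-deploy-repository-dump.xml", "xl-deploy-repository-dump.dtd"]
REPORTING_FILES = ["xl-deploy-reporting-dump.xml"]


def missing_dump_files(server_home: str, include_reporting: bool = False) -> List[str]:
    """Return the expected dump file names that are absent from server_home."""
    expected = REPOSITORY_FILES + (REPORTING_FILES if include_reporting else [])
    home = Path(server_home)
    # A name counts as present only if it exists as a regular file.
    return [name for name in expected if not (home / name).is_file()]
```

For example, missing_dump_files("/path/to/server-home", include_reporting=True) returns an empty list when all three files are present. The path is a placeholder. Set include_reporting only when you also ran the export with the -reports flag.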

Import Anonymized Data

To import anonymized data, run the following command:

./bin/db-anonymizer.sh -import

Command-specific Flag Options

The following table describes the command-specific flag options when importing data:

Flag          Description
-import       Imports data into an empty database.
              Note: If no file is specified, the system tries to import the file named xl-deploy-repository-dump.xml from the server home directory. To import a specific file from a different location, use the -import -f <absolute-path-of-file> command. Ensure that the xl-deploy-repository-dump.dtd file is available in the same directory as the xl-deploy-repository-dump.xml file.
-f            Imports a specified data file.
-refresh      Refreshes data in the database.
              Note: Every record is verified before it is inserted, which increases the import time.
-batchSize    Specifies the maximum number of commands in a batch.
              Note: The optimal batch size differs for each specific case and DBMS. However, the default value of 100 gives good results in most cases. To disable batch processing, set the value to 0.
-reports      Performs the import on the reporting database.
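To show what -batchSize controls, the Python sketch below groups a list of commands into batches of at most batch_size, and treats 0 as "no batching" by sending each command on its own. It models only the semantics described in the table. The function name and the sample statement strings are hypothetical, and the sketch does not show how the tool batches internally.

```python
from typing import Iterator, List, Sequence


def batches(commands: Sequence[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size commands.

    A batch_size of 0 disables batching: every command is yielded on its own.
    """
    if batch_size < 0:
        raise ValueError("batch_size must be 0 or a positive integer")
    step = batch_size if batch_size > 0 else 1
    for start in range(0, len(commands), step):
        yield list(commands[start:start + step])


# Hypothetical commands standing in for the rows of an import.
statements = [f"INSERT statement {i}" for i in range(250)]
print([len(b) for b in batches(statements)])  # [100, 100, 50]
print(len(list(batches(statements, 0))))  # 250
```

Larger batches typically mean fewer round trips to the database, at the cost of more work per batch. This trade-off is one reason the best value depends on the case and the DBMS.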