Downloading and anonymizing archived releases
With the Release Database Writer tool, you can make changes to your releases at a database level, anonymize your installation, or download archived releases as JSON files. This topic describes how to use the Database Writer to download and anonymize your archived releases.
Data hidden in the process
The Database Writer connects to the Release archiving database and downloads each release as a JSON file to a specified folder. The tool scans the contents of each release and removes sensitive information such as user names. The tool searches for the following items:
- emails (john.smith@my-org.com)
- people names (John, Smith)
- user names (jsmith)
- organization names (my-org, MyOrg, My Org)
- location names (countries)
- telephone numbers (+1 …)
- mentions (@j-smith)
This information can be found using regular expressions or using dictionaries that are stored as text files inside the tool distribution folder. You can find and customize the dictionaries inside the database-writer-<version>.jar archive under BOOT-INF/classes/dictionary/.
Examples:
- emails: using regular expression
- people names: in first_names.txt and surnames.txt files
- user names: loaded from the Release users table
- organization names or other additional names: are supplied as a parameter when running the tool
- location names: in the geolocations.txt file
- telephone numbers: using regular expressions selected based on the locale specified when running the tool
- mentions: using regular expression
There is an additional file named ignore.txt that contains a list of exclusions: strings that should not be replaced.
For example: the word release is used throughout release JSONs, so if the tool replaces this string, the JSON might lose the meaning.
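The combination of regular expressions, dictionaries, and an exclusion list described above can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the `Anonymizer` class, both regular expressions, and the single counter shared by all placeholder types are assumptions made for the example.

```python
import re

# Hypothetical patterns for illustration only; the expressions the tool
# really uses are not documented here.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
MENTION_RE = re.compile(r"(?<![\w.])@[\w-]+")
WORD_RE = re.compile(r"\b[A-Za-z]+\b")


class Anonymizer:
    def __init__(self, names, ignore):
        self.names = set(names)    # dictionary entries, e.g. from first_names.txt
        self.ignore = set(ignore)  # exclusions, e.g. from ignore.txt
        self.mapping = {}          # placeholder label -> original string
        self._seen = {}            # (kind, original) -> placeholder label

    def _placeholder(self, kind, original):
        # Reuse the same placeholder when the same string is seen again.
        key = (kind, original)
        if key not in self._seen:
            label = f"{kind}_{len(self._seen) + 1}"
            self._seen[key] = label
            self.mapping[label] = original
        return f"_{self._seen[key]}_"

    def anonymize(self, text):
        # Pattern-based items first, so that names inside an email address
        # are replaced together with the address.
        text = EMAIL_RE.sub(lambda m: self._placeholder("EMAIL", m.group()), text)
        text = MENTION_RE.sub(lambda m: self._placeholder("MENTION", m.group()), text)

        def replace_word(match):
            word = match.group()
            if word in self.names and word not in self.ignore:
                return self._placeholder("NAME", word)
            return word

        # Dictionary lookup is case-sensitive; excluded words are left untouched.
        return WORD_RE.sub(replace_word, text)


anonymizer = Anonymizer(names=["John", "Smith", "release"], ignore=["release"])
print(anonymizer.anonymize("John Smith (john.smith@my-org.com) approved the release"))
print(anonymizer.mapping)
```

Because `release` appears in the exclusion list, it survives even though it is also listed as a name, which mirrors the purpose of ignore.txt.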
Installation
You can download the latest version of the database-writer tool from the customer download area.
- Unpack database-writer-*.zip into a new directory database-writer.
- If you are using the repository keystore, copy XLR_HOME/conf/repository-keystore.jceks to database-writer/repository-keystore.jceks.
- Create a new file database-writer/app.properties with the following content:
# Connection details for the reporting database of the Release
xlr.datasource.url=jdbc:mysql://localhost/xlarchive?useSSL=false
xlr.datasource.username=xlarchive
xlr.datasource.password=password
xlr.datasource.driver-class-name=com.mysql.jdbc.Driver
# If you have a repository keystore file with a password then you should specify the following two properties:
repository.keystore.location=./repository-keystore.jceks
repository.keystore.password=thepassword
- Copy the JDBC driver of the archive database to database-writer/lib/.
- Optionally, you can create a file with extra names of organizations or people that you want to make sure are replaced. Place one string per line. Multi-word strings are supported: the whole string is replaced, and matching is case-sensitive.
Example: a file ./names.txt with the following content:
ACME
ACME Inc.
acme
- To start the tool, execute:
./bin/database-writer --spring.config.location=app.properties
- To download the anonymized archived releases, execute the following command:
read_all_archived --path ./archived_releases --anonymize --locale US --additional-replacements-path ./names.txt --anonymization-output-path ./replacements-made.txt --skip-existing
- The anonymization process can take several minutes to finish and is CPU-intensive. After the process is finished, all the archived releases will be present as JSON files in the specified directory (./archived_releases/**/*.json in this example).
The file ./replacements-made.txt will contain a mapping of the strings that were replaced by placeholders. This file is there for you to verify the contents and must not be shared. For example:
EMAIL_4: jsmith@my-org.com
...
MENTION_9: @j-smith
...
NAME_7: John
...
ORG_1: My Org
...
USERNAME_5: jsmith
...
TELEPHONE_6: 2122620703
In releases where the specified user was mentioned, you will see placeholders such as _EMAIL_4_ or _NAME_7_.
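One way to verify the result is to check that none of the original strings listed in the mapping file still appear in the downloaded JSON files. The sketch below assumes the `PLACEHOLDER: original` line format shown in the example above and UTF-8 encoded files; the function names `load_replacements` and `find_leaks` are hypothetical and not part of the tool.

```python
from pathlib import Path


def load_replacements(mapping_file):
    """Parse 'PLACEHOLDER: original' lines into a dict, skipping other lines."""
    replacements = {}
    for line in Path(mapping_file).read_text(encoding="utf-8").splitlines():
        placeholder, separator, original = line.partition(": ")
        if separator and original.strip():
            replacements[placeholder.strip()] = original.strip()
    return replacements


def find_leaks(releases_dir, replacements):
    """Return (file, placeholder) pairs where an original string still appears."""
    leaks = []
    for json_file in sorted(Path(releases_dir).rglob("*.json")):
        content = json_file.read_text(encoding="utf-8")
        for placeholder, original in replacements.items():
            # Plain substring check: simple, but it can report false positives
            # when a short string occurs inside an unrelated longer token.
            if original in content:
                leaks.append((str(json_file), placeholder))
    return leaks
```

For the example in this topic, you would call `find_leaks("./archived_releases", load_replacements("./replacements-made.txt"))` and review every reported pair by hand, since a hit can also be a legitimate occurrence of an excluded word.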
To run the tool without the interactive shell:
- Create a file with the command line, e.g. ./command.cli:
read_all_archived --path ./archived_releases --anonymize --locale US --additional-replacements-path ./names.txt --anonymization-output-path ./replacements-made.txt --skip-existing
- Run the tool:
./bin/database-writer --spring.config.location=app.properties @command.cli
After the tool finishes, the archived releases are available in the specified folder. You can pack them and use them in a testing environment.
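Packing the releases could look like the sketch below. It is only an illustration: the function name `pack_releases` and the archive name are assumptions, and any archiving utility works equally well. Only `*.json` files are added, so a mapping file such as replacements-made.txt is never included, even if it was written into the same folder.

```python
import tarfile
from pathlib import Path


def pack_releases(releases_dir, archive_path):
    """Pack every *.json file under releases_dir into a gzip-compressed tar file.

    Returns the number of files added to the archive.
    """
    releases_dir = Path(releases_dir)
    json_files = sorted(releases_dir.rglob("*.json"))
    with tarfile.open(archive_path, "w:gz") as archive:
        for json_file in json_files:
            # Store paths relative to the releases folder so that the archive
            # unpacks cleanly in the testing environment.
            archive.add(json_file, arcname=json_file.relative_to(releases_dir).as_posix())
    return len(json_files)
```

For the example in this topic, `pack_releases("./archived_releases", "./archived_releases.tar.gz")` would produce a single file to copy to the testing environment.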