Risk Prediction module

The risk prediction module adds machine learning to XL Release. Before a release starts, it predicts whether the release or its tasks will fail and how long they will take to run. This topic describes the architecture of the module and explains how to install it. For more information about using the risk prediction module, see Using the XL Release Risk Prediction module.

Overview

The risk prediction module uses the “DevOps Prediction Engine” service to train machine learning models and request predictions. Because training runs in this separate service, it does not overload the XL Release server or affect the normal execution of releases.

The following diagram shows the interactions between XL Release and the DevOps Prediction Engine service:

[Figure: Risk Prediction Module Architecture]

The DevOps Prediction Engine service is shipped as a Docker image that runs Python processes. It exposes an HTTP API and stores trained models as files on the filesystem. To train the models, DevOps Prediction Engine requires access to the archiving database, which stores XL Release historical data.

In XL Release, the risk prediction module (shipped as xlr-risk-predictions-plugin) adds a new page to the user interface and connects to the DevOps Prediction Engine. The xlr-risk-predictions-plugin calls the DevOps Prediction Engine service for two purposes:

  1. To train the models. The module sends the archiving database connection details to DevOps Prediction Engine (1.1 in the diagram above), which scans historical releases (1.2) and runs machine learning algorithms on them. DevOps Prediction Engine builds multiple models one after another, then serializes them and saves them on disk (1.3). This batch job causes high network and CPU usage and can run for several minutes, so it should not be run often. The recommended frequency is once per week.

  2. To request predictions for a new or updated release. When the risk of a release must be evaluated, or when a user opens the release forecasting page, the module sends the release content to the DevOps Prediction Engine service (2.1). The service loads the current models from disk (or from its cache) and runs the release through them. The output contains the probabilities of release failure and task failures, estimated durations, and other information. This data is returned to xlr-risk-predictions-plugin in the response, and the plugin displays it in the user interface.

Requirements

Technical

  • XL Release version 7.5 or later.

  • XL Release must be configured with an external archiving database. The currently supported databases are PostgreSQL and MySQL; other external databases can be supported on customer request. The default embedded Apache Derby database is not supported because it only accepts connections from within the XL Release JVM, so the DevOps Prediction Engine service could not scan historical releases (1.2).

  • The DevOps Prediction Engine Docker container should run on a separate VM from XL Release, so that the heavy CPU usage during model training does not affect the XL Release server.

  • The DevOps Prediction Engine Docker container should run with a mounted persistent disk, so that the trained models are not lost if the container is recreated.

  • Ensure connectivity between XL Release, DevOps Prediction Engine, and the archiving database according to the diagram above.

  • The trained models require up to 1 GB of disk space. The DevOps Prediction Engine service can use up to 8 GB of memory. There are no strict CPU requirements, although higher clock speeds make training and predictions faster.
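As a rough preflight check for the VM that will host the container, the following Python sketch compares free disk space and physical memory against the figures above. It assumes a Linux host (it relies on os.sysconf names available there) and treats 1 GB as 2**30 bytes; the function names are hypothetical and the script is not part of the product.

```python
import os
import shutil

# Thresholds from the requirements above, treating 1 GB as 2**30 bytes (an assumption).
REQUIRED_DISK_BYTES = 1 * 2**30    # trained models: up to 1 GB
REQUIRED_MEMORY_BYTES = 8 * 2**30  # DevOps Prediction Engine: up to 8 GB


def free_disk_bytes(path: str) -> int:
    """Free space on the filesystem that contains `path` (the path must exist)."""
    return shutil.disk_usage(path).free


def total_memory_bytes() -> int:
    """Total physical memory, using sysconf names available on Linux."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def check_host(model_dir: str) -> list:
    """Return human-readable problems; an empty list means both checks passed."""
    problems = []
    if free_disk_bytes(model_dir) < REQUIRED_DISK_BYTES:
        problems.append(f"less than 1 GB free on the filesystem holding {model_dir}")
    if total_memory_bytes() < REQUIRED_MEMORY_BYTES:
        problems.append("less than 8 GB of physical memory")
    return problems
```

Run check_host on the Docker host with the directory you plan to mount for the models, for example check_host("/var/lib/devopsml/data"); the directory must already exist.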

Functional

  • For the predictions to succeed, you must have archived releases. To check whether you have any, go to the Releases Overview page and click Archived.
  • The more archived releases you have, the better the predictions. You should have at least 100 archived releases for the module to have enough data.

Installation

The DevOps Prediction Engine service

  1. Run the Docker container with the DevOps Prediction Engine service, using the command line, Kubernetes, or any other method you prefer. The image serves HTTP requests on port 4000, stores models internally under /app/output/pipelines, and writes log files internally to /app/output/log. For example, the following command starts the service on port 4321, stores models under /var/lib/devopsml/data, and stores log files under /var/lib/devopsml/log:
docker run -p 4321:4000 -v /var/lib/devopsml/data:/app/output/pipelines -v /var/lib/devopsml/log:/app/output/log xebialabs/xl-devops-ml:latest
  2. Ensure that the container logs are recorded and kept for a long period, such as two months, so that XebiaLabs Support can debug your models if needed.

  3. To test the service, send the following HTTP request and check that the response is similar to the one shown:

curl http://my-docker-host:4321

{"hostname":"my-docker-host","service":"xl-devops-ml","version":"0.0.6"}
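To script the same check, for example in a monitoring job, the Python sketch below requests the service root and verifies the service field of the JSON response shown above. It uses only the standard library; the function names and the 10-second timeout are assumptions, and it relies only on the fields visible in the sample response.

```python
import json
import urllib.request

EXPECTED_SERVICE = "xl-devops-ml"  # value from the sample response of the service root


def parse_health(body: str) -> dict:
    """Parse the JSON returned by the service root and check which service answered."""
    info = json.loads(body)
    if info.get("service") != EXPECTED_SERVICE:
        raise ValueError(f"unexpected service: {info.get('service')!r}")
    return info


def check_service(base_url: str, timeout_s: float = 10.0) -> dict:
    """GET the service root, like the curl command above, and return the parsed JSON."""
    with urllib.request.urlopen(base_url, timeout=timeout_s) as resp:
        return parse_health(resp.read().decode("utf-8"))
```

For example, check_service("http://my-docker-host:4321") returns a dictionary with the hostname, service, and version keys, or raises an exception if the service is unreachable or another service answers.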

The XL Release module

  1. Copy the latest xlr-risk-predictions-plugin JAR file from the XebiaLabs Distribution Site into the XL_RELEASE_SERVER_HOME/plugins/xlr-official directory.

  2. To configure the module, modify the XL_RELEASE_SERVER_HOME/conf/xl-release.conf file by adding an xl.plugin.riskPredictions section:

    xl {
      plugin {
        riskPredictions {

          # URL of the DevOps Prediction Engine web application, typically started as a Docker container
          xlDevopsMlUrl = "http://my-docker-host:4321/"

          # Interval between rebuilds of all models.
          # Rebuilding is a heavy operation, so do not schedule it too often.
          modelBuildPeriod = 14 days
          # Initial delay before the first model rebuild after XL Release starts.
          modelBuildInitialDelay = 14 days

          # Connection details that the DevOps Prediction Engine app uses to connect to the XL Release reporting (archiving) database.
          # By default, the same connection details as XL Release itself are used.
          db-connection {
            reporting {
              db-driver-classname = org.postgresql.Driver
              db-url = "jdbc:postgresql://my-db/xlarchive?ssl=false"
              db-username = "xlarchive"
              db-password = "some_password"
            }
          }

          timeouts {
            # Timeout when waiting for prediction responses from the DevOps Prediction Engine service
            # predictionsRequest = 5 minutes
            # Use a more aggressive timeout in the risk assessor, so that the overall release
            # risk score calculation is not delayed
            # riskAssessorPredictionsRequest = 10 seconds
          }
        }
      }
    }

Set xlDevopsMlUrl to the URL of the DevOps Prediction Engine service that you started above, and set the database connection details. The DevOps Prediction Engine Docker container must be able to reach the archiving database using these details.

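  3. Restart the XL Release server. Check the XL Release log files for lines similar to the following. The first line reports the training schedule derived from modelBuildPeriod and modelBuildInitialDelay as ISO-8601 durations (PT4H means four hours), so the values depend on your configuration:
INFO  c.x.x.p.p.s.ReleasePredictionService - Will start building prediction models every PT4H after PT4H
INFO  c.x.x.p.p.s.ReleasePredictionService - Connected to DevOpsML at http://my-docker-host:4321: version 0.0.6, hostname my-docker-host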
  4. After the initial start, you must trigger model training before the DevOps Prediction Engine service can make predictions; until the models are trained, the DevOps Prediction Engine logs show errors. If you do not trigger it, the first training only starts once modelBuildInitialDelay has elapsed after startup. Run the following POST request as a user with admin permissions in XL Release:
curl --user admin --request POST 'https://xl-release-host/api/predictions/train'

This command starts training the models. The operation can take more than ten minutes; monitor the logs for progress and the result. If an issue occurs, check the DevOps Prediction Engine logs to find the cause, either by running docker logs <container ID> or by reading the log files in the Docker-mounted log volume described above.

  5. Open any template or release in the XL Release UI.
  6. Go to the predictions page to see the predictions.
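The training request shown above can also be sent from a script. The following Python sketch builds the same POST request to /api/predictions/train with HTTP basic authentication, which is what the curl --user option uses by default. The host name and credentials are placeholders, it assumes XL Release accepts basic authentication for this endpoint, and the function names are hypothetical.

```python
import base64
import urllib.request


def build_train_request(xlr_url: str, username: str, password: str) -> urllib.request.Request:
    """Build the POST that the curl command sends, with an HTTP basic auth header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        xlr_url.rstrip("/") + "/api/predictions/train",
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )


def trigger_training(xlr_url: str, username: str, password: str) -> int:
    """Send the training request and return the HTTP status code."""
    req = build_train_request(xlr_url, username, password)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.status
```

For example, trigger_training("https://xl-release-host", "admin", password) starts training in the same way as the curl command; the request only starts the job, so progress still has to be followed in the logs.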

For more information about using the risk prediction module, see Using the XL Release Risk Prediction module.

Operation

XL Release availability and performance do not depend on the DevOps Prediction Engine service. If the DevOps Prediction Engine container becomes unavailable:

  • New risk assessments are marked as “OK”, and a warning is written to the log files.
  • When you open the Risk Forecast page, a message states that risk predictions are not available.
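The fallback described above can be pictured with the following Python sketch, which is an illustration and not the plugin's implementation: a prediction call that fails with a network error or timeout produces an “OK” risk assessment and a logged warning instead of an error. The 10-second default mirrors the commented riskAssessorPredictionsRequest value in the sample configuration; the predict callable and the function name are hypothetical.

```python
import logging
from typing import Callable

log = logging.getLogger("risk-assessment-sketch")


def assess_risk(predict: Callable[[float], str], timeout_s: float = 10.0) -> str:
    """Return the predicted risk, or "OK" if the prediction service cannot be reached.

    `predict` is a hypothetical callable that queries the DevOps Prediction Engine
    and raises OSError (which covers urllib.error.URLError, connection errors,
    and TimeoutError) when the service is down or too slow.
    """
    try:
        return predict(timeout_s)
    except OSError as exc:
        # Degrade gracefully: never block or fail the release because predictions are missing.
        log.warning("DevOps Prediction Engine unavailable (%s); marking risk as OK", exc)
        return "OK"
```

Catching only OSError keeps genuine programming errors visible while treating every kind of connectivity failure the same way.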

The models are retrained automatically at the configured interval. Retraining ensures that newly archived releases are taken into account in predictions. We recommend retraining once a week, and not more often than once a day. Two settings in the xl-release.conf file control retraining:

  • xl.plugin.riskPredictions.modelBuildPeriod: how often the models are retrained, for example 14 days. The period is counted from XL Release server startup, so if you start the server at midday on a Monday, the models are trained at midday on the Monday two weeks later.
  • xl.plugin.riskPredictions.modelBuildInitialDelay: how long after XL Release server startup the models are trained for the first time. For example, if you set it to 1 minute, XL Release starts training the models almost immediately on every start or restart.
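The schedule produced by these two settings can be sketched in Python as follows: the first training runs modelBuildInitialDelay after startup, and each later one runs modelBuildPeriod after the previous one. The iso_duration helper formats a timedelta the way the startup log line appears to (for example PT4H), assuming it uses Java's Duration.toString(), which has no day unit, so 14 days would print as PT336H. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def training_times(server_start: datetime, initial_delay: timedelta,
                   period: timedelta, count: int) -> list:
    """First `count` scheduled trainings: after initial_delay, then every period."""
    first = server_start + initial_delay
    return [first + i * period for i in range(count)]


def iso_duration(delta: timedelta) -> str:
    """Format a non-negative timedelta like Java's Duration.toString() (hours, minutes, seconds)."""
    total = int(delta.total_seconds())
    if total == 0:
        return "PT0S"
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    out = "PT"
    if hours:
        out += f"{hours}H"
    if minutes:
        out += f"{minutes}M"
    if seconds:
        out += f"{seconds}S"
    return out
```

With both settings at 14 days and a server started at midday on Monday 1 January 2024, training_times returns 15 January, 29 January, and 12 February 2024 at midday for count=3, all Mondays, matching the example above.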

Security

The DevOps Prediction Engine service is exposed over plain HTTP: the connection is not encrypted and does not require authentication. If the service is accessible from sources other than the XL Release server in your installation, configure an HTTPS reverse proxy with authentication in front of it.

Trained pipelines are currently stored as *.pkl files, which are serialized (pickled) Python objects containing the trained models. The information that can be obtained from these files includes:

  • IDs of releases and tasks from your XL Release archive
  • Numbers of tasks of different types in every release

Note: Titles, descriptions, input properties, and similar values cannot be obtained through the service, because they are all hashed during training. Passwords are not stored in the XL Release archive, so they cannot be passed to or used by the DevOps Prediction Engine service.
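Because unpickling a *.pkl file can execute arbitrary code, a safer way to review what a pipeline file exposes is to list its string constants without loading it. The Python sketch below does this with the standard pickletools module. It is an auditing aid that assumes the file is a standard pickle stream, not a tool provided with the service, and the function name is hypothetical.

```python
import pickletools

# Opcodes whose argument is a string: text constants, plus GLOBAL ("module name") references.
STRING_OPCODES = {"UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8", "GLOBAL"}


def pickle_strings(data: bytes) -> list:
    """List the string arguments in a pickle stream without unpickling it (no code runs)."""
    return [arg for opcode, arg, _pos in pickletools.genops(data)
            if opcode.name in STRING_OPCODES]
```

To audit a file from the mounted model directory, read it in binary mode and pass its bytes to pickle_strings; release and task IDs would be expected to appear, while titles and descriptions should only appear in hashed form.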