High availability with master-worker setup
In XL Deploy, deployment tasks and control tasks are executed by the task execution engine. Based on your deployment task, the XL Deploy planner generates a deployment plan with steps that XL Deploy will carry out to deploy an application. You can manually modify the plan. When the plan is ready, XL Deploy generates a DeploymentTask that is sent to the task execution engine.
The preferred production setup separates the task execution engine from the core XL Deploy servers, which are called master instances. The tasks in XL Deploy are then executed by one or more processes called workers.
For high availability, you must configure multiple master instances. Although these master instances do not need to know about each other, workers must be connected to each of them, as described below. You must front the XL Deploy masters with a load balancer that supports sticky sessions based on session cookies, so that user traffic from the same user ends up on the same XL Deploy master.
For the export of reports and CI trees to work seamlessly across multiple masters, the XL_DEPLOY_SERVER_HOME/export folder must be on a shared file system that each of the masters and workers has read-write access to. The user that the XL Deploy master or worker runs as must have permissions to create and destroy files and folders on that file system.
You can set up workers in various ways:
- In-process worker: The default out-of-the-box configuration contains a worker that is part of the master and runs in the same process. This is called an in-process worker and it is unique. Do not use this option in a high availability setup.
- Local workers: These workers run in separate processes but are located on the same machine and run in the same installation directory as the XL Deploy master.
- External workers: These workers also run in separate processes. They can either be located on different machines from the master, or on the same machine as the master, but in a different installation directory.
The master-worker setup enables you to scale XL Deploy by adding multiple workers connected to a master to execute more tasks from XL Deploy. The task is created on the master instance and sent to a worker to be executed. A worker can have multiple tasks assigned to the execution queue. Each task from the master can be assigned to only one worker.
When you install XL Deploy, the default configuration executes tasks on an in-process worker that runs in the same process as the master. This is not a production setup. You can change the default configuration and use multiple workers that run in different processes from the master. You cannot use the in-process worker and other workers in the same configuration.
When you install an XL Deploy instance, the default configuration is to execute the tasks on an internal worker that runs in the same process as the master. This setting is defined by the xl.task.in-process-worker=true property. This configuration is not intended for production environments.
The worker is a task executor that runs in a different Java process from the XL Deploy master process. To start a worker, its configuration must be identical to that of the master. Specifically, the folder structure, contents, and configuration files must be identical, and the worker must use the same database. Workers will not be available for execution of new tasks from a master if their plugins or configuration settings differ. See also Changing configuration of masters and workers.
You can start a separate Java process for each worker from the same folder location as the master. These are called local workers.
This setup provides improved availability and a faster XL Deploy restart procedure for simple configuration changes such as a new plugin or configuration file updates. You can use the startlocalworker script to quickly add new workers. Ensure that you allocate to each worker the same resources that the master requires.
For external workers on the same machine as the master, you must copy the installation directory of the master to a different location on that machine for each worker and start a process from the new location.
This setup supports a faster restart procedure of XL Deploy for advanced configuration changes such as replacing or removing a plugin. Ensure that for each worker you allocate the same resources as required for the master.
For external workers on different machines, you must copy the installation directory of the master to a location on each worker machine and start a process from the new location. The master and the registered workers must all exist in the same subnetwork.
This setup supports high availability and better scalability for XL Deploy by using multiple machines to run workers connected to one or more masters.
If you are running XL Deploy with an in-process worker configuration, see Requirements for installing XL Deploy.
If you are running XL Deploy with multiple workers on the same machine, ensure that you allocate the same resources for each worker as for the master. A worker process has the same resource requirements as the master process.
If you are running XL Deploy in a configuration with multiple workers on different machines, the master and the workers must reside in the same subnetwork.
To configure secure communication between master and workers, refer to Configure secure communication with workers and satellites.
The master-worker setup allows for high availability of XL Deploy. XL Deploy needs to be restarted for configuration, plugin, type system changes, or upgrades. With the master-worker setup, you can safely restart XL Deploy without waiting for all the running tasks to be completed. The tasks are executed by the workers and will continue to run on the existing configuration while the master is restarted.
After XL Deploy is restarted with the new configuration, you must manually synchronize the configuration for the existing workers. The master will not assign new tasks to workers that have the old configuration. You can either register new workers to the master, or free up workers before restarting the XL Deploy master.
Important: Upgrades to XL Deploy that include database changes or breaking type system changes still require a complete XL Deploy restart and cannot be performed while parts of the system are running.
When you perform a deployment, the deployment mapping and the planning are created on the master instance. The deployment execution task, which consists of executing the steps of the plan, is sent to a worker. Each task from the master is assigned to one worker. The XL Deploy master uses a round-robin task scheduling method to assign tasks to workers. The method consists of assigning tasks one by one to each worker in the list in turn.
Important: The master assigns tasks only if there are available workers registered with the master that have an identical configuration.
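The round-robin assignment described above can be illustrated with a minimal sketch. This is not XL Deploy's scheduler. The Worker class, its compatible flag, and the RoundRobinScheduler name are hypothetical and only model the two rules stated here: tasks are handed out one by one to each worker in turn, and only workers with an identical configuration receive tasks.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Worker:
    name: str
    compatible: bool = True  # hypothetical flag: configuration matches the master
    queue: List[str] = field(default_factory=list)


class RoundRobinScheduler:
    """Hands each task to exactly one worker, cycling through the list."""

    def __init__(self, workers: List[Worker]) -> None:
        self.workers = workers
        self._next = 0  # index of the worker to try first for the next task

    def assign(self, task_id: str) -> Optional[Worker]:
        # Try each worker at most once, starting where the last assignment stopped.
        for offset in range(len(self.workers)):
            index = (self._next + offset) % len(self.workers)
            worker = self.workers[index]
            if worker.compatible:
                worker.queue.append(task_id)
                self._next = (index + 1) % len(self.workers)
                return worker
        return None  # no compatible worker registered, so nothing is assigned


workers = [Worker("w1"), Worker("w2", compatible=False), Worker("w3")]
scheduler = RoundRobinScheduler(workers)
assigned = [scheduler.assign(f"task-{n}").name for n in range(4)]
print(assigned)  # ['w1', 'w3', 'w1', 'w3']
```

In this sketch the incompatible worker w2 is skipped on every pass, so its queue stays empty while w1 and w3 alternate.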
Task recovery occurs on the worker level. Each worker has a recovery space reserved on the disk where it writes the task information. When the system goes down, the tasks can be recovered and re-initiated from the worker recovery space on the disk.
These are the main characteristics that differentiate workers and XL Satellites:
|Characteristic||Workers||Satellites|
|Network topology||Workers should reside on the same machine or in the same network as the master for stability, speed, and reduced latency.||Satellites can exist in a data center away from the master or workers.|
|Network stability||Workers and master must reside in the same stable network.||Satellites can be connected by unstable networks.|
|Task execution||One task is assigned to one worker.||Satellites execute a part of a task containing a block of steps. One deployment task can involve multiple satellites. The host that you deploy to determines which satellites execute the block.|
|Functionality||Full functionality and identical to the master.||Limited functionality, does not require access to all resources. Satellites are CIs that exist in the XL Deploy repository and cannot execute all step types.|
Satellites can be used with all master-worker configurations.
For more information about XL Satellites, refer to Getting started with the satellite module.
Communication between the master process and the worker process when sending data is done through two channels.
- When you start a new worker, you must specify the DNS address of the master. The worker registers with the master.
- The master selects a worker and sends a task to execute it.
- The master sends instructions to the worker such as starting the task, pausing the task, etc.
All of the command information is pulled from the master to the worker. These tasks are communicated through one command channel.
Resolution of the master hostname is by default done by DNS lookup, which is expected to return an SRV record listing all the masters, or an A record if there is only a single one. The DNS server is polled every few seconds, and the worker will dynamically connect to new masters or disconnect from removed masters.
Note: Since resolution of the master hostname is done by DNS lookup, /etc/hosts-based resolution (notably localhost) will fail. Add a setting akka.io.dns.resolver = inet-address to xl-deploy.conf if you do want to rely on
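The polling behavior described above, where a worker connects to newly listed masters and disconnects from removed ones, amounts to a set difference between the current connections and the latest DNS answer. The sketch below only models that bookkeeping. It performs no real DNS lookup, and the reconcile function and the example.com host names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def reconcile(connected: Set[str], resolved: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (masters to connect to, masters to disconnect from)."""
    latest = set(resolved)
    return sorted(latest - connected), sorted(connected - latest)


# Simulated answers from three consecutive polls (placeholder host names).
dns_answers = [
    ["master-1.example.com"],
    ["master-1.example.com", "master-2.example.com"],
    ["master-2.example.com"],
]

connected: Set[str] = set()
log = []
for answer in dns_answers:
    to_connect, to_disconnect = reconcile(connected, answer)
    connected.update(to_connect)
    connected.difference_update(to_disconnect)
    log.append((to_connect, to_disconnect))

for entry in log:
    print(entry)
```

After the third poll in this simulation, the worker is connected only to master-2.example.com.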
The second channel of communication is in the other direction, from the worker to the master, through the REST API. This is used when a worker must perform an action that is only available on the master.
To set up the communication, you must specify the locations for these channels when you start a worker. For more information, see Add, start, and use workers.
In the Explorer, workers are shown in the Monitoring section under Workers. The list contains the following information:
|ID||A unique (technical) identifier of the worker.|
|Name||The user-assigned name of the worker (not unique).|
|Address||The (technical) unique address of the worker containing hostname and port.|
|State||The state that the worker is in.|
|# Deployment tasks||The number of deployment tasks assigned to the worker.|
|# Control tasks||The number of control tasks assigned to the worker.|
The state of a worker can be:
|Connected||The worker is connected to the master and can be used to execute tasks.|
|Incompatible||The worker is connected to the master, but is on a different configuration. The master will not send new tasks to this worker. This is a temporary state, while the system is being updated.|
|Draining||The worker will shut down once all of its tasks are completed (cancelled or archived). The master will not send new tasks to this worker.|
|Disconnected||The worker is not connected to the master. This can occur due to network interruptions, a failure of the machine, or an issue in the worker. The master displays the number of deployment or control tasks that are running on it, but these tasks cannot be managed.|
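The state table can be summarized as two yes/no questions per state: does the master send new tasks to the worker, and can the tasks on it be managed? The following sketch encodes that reading of the table. The WorkerState enum and both helper functions are hypothetical names and are not part of any XL Deploy API.

```python
from enum import Enum


class WorkerState(Enum):
    CONNECTED = "Connected"
    INCOMPATIBLE = "Incompatible"
    DRAINING = "Draining"
    DISCONNECTED = "Disconnected"


def accepts_new_tasks(state: WorkerState) -> bool:
    # Only a connected worker on the same configuration receives new tasks.
    return state is WorkerState.CONNECTED


def tasks_manageable(state: WorkerState) -> bool:
    # Tasks on a disconnected worker are still counted but cannot be managed.
    return state is not WorkerState.DISCONNECTED


summary = {
    state.value: (accepts_new_tasks(state), tasks_manageable(state))
    for state in WorkerState
}
for name, flags in summary.items():
    print(name, flags)
```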
When you are using XL Deploy in an active/hot-standby or active/active setup, the database is shared by all the XL Deploy master nodes. The workers you register with an XL Deploy master will also connect to the same database. The communication between the workers and the active XL Deploy master node is done through the load balancer.
The out-of-the-box evaluation setup is to have an in-process worker in each of the hot-standby master nodes. To disable the in-process worker and add other workers, set xl.task.in-process-worker=false in the XL_DEPLOY_SERVER_HOME/conf/xl-deploy.conf file for each master node.
To create a worker in a hot-standby scenario:
- For the -api flag, specify the REST endpoint of the load balancer.
- Point one -master flag to each of the hot-standby master nodes (there can be multiple
To create a worker in an active/active scenario:
- For the -api flag, specify the REST endpoint of the load balancer.
- Point one -master flag to the DNS XL Deploy service. The DNS service is required to return an SRV record containing IP addresses of each of the available XL Deploy master instances, or an A record if there is only a single one.
The contents of the installation directories must be identical between all the master nodes and all the workers. Ensure that the contents of all the ext folders are synchronized in all locations. See also Changing configuration of masters and workers.
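One way to check that folders are in sync before starting workers is to compare a content fingerprint of each copy. The sketch below is an illustrative helper, not an XL Deploy tool and not a description of how XL Deploy itself detects differences. The function names, the demo directory layout, and the synthetic.xml file contents are made up for the example.

```python
import hashlib
import os
import tempfile
from typing import List


def folder_fingerprint(root: str) -> str:
    """Hash relative paths and file contents under root into one hex digest."""
    digest = hashlib.sha256()
    for current, dirs, files in os.walk(root):
        dirs.sort()  # make the traversal order deterministic
        for name in sorted(files):
            path = os.path.join(current, name)
            relative = os.path.relpath(path, root).replace(os.sep, "/")
            digest.update(relative.encode("utf-8"))
            digest.update(b"\0")
            with open(path, "rb") as handle:
                digest.update(handle.read())
            digest.update(b"\0")
    return digest.hexdigest()


def out_of_sync(reference: str, others: List[str]) -> List[str]:
    """Return the folders whose fingerprint differs from the reference folder."""
    expected = folder_fingerprint(reference)
    return [path for path in others if folder_fingerprint(path) != expected]


# Demo on temporary folders standing in for a master and two workers.
with tempfile.TemporaryDirectory() as base:
    def make_ext(node: str, content: str) -> str:
        ext = os.path.join(base, node, "ext")
        os.makedirs(ext)
        with open(os.path.join(ext, "synthetic.xml"), "w") as handle:
            handle.write(content)
        return ext

    master = make_ext("master", "<synthetic/>")
    worker_a = make_ext("worker-a", "<synthetic/>")
    worker_b = make_ext("worker-b", "<synthetic><type/></synthetic>")
    stale = [
        os.path.basename(os.path.dirname(path))
        for path in out_of_sync(master, [worker_a, worker_b])
    ]

print(stale)  # ['worker-b']
```

The same comparison could be pointed at any folder that must match across installations. Which folders matter in a given setup is for the operator to decide.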
To ensure high availability, add workers across multiple locations in the same subnetwork. All the workers can communicate with any active master node.
If the communication between a master and a worker fails and the worker cannot be reached, the system will display the worker in a disconnected state, and its tasks in an Unknown state.
Important: The master and workers require access to the artifacts in XL Deploy. You must store the artifacts in an external system such as Nexus or Artifactory, in a central database, or using a file share for the storage location.