Set up an active-active cluster
This topic describes how to set up an active-active cluster for Deploy. Running Deploy in this mode enables you to have a Highly Available (HA) Deploy setup with improved scalability.
Running Deploy in an active-active cluster requires the following:
- Deploy must be installed according to the system requirements. For more information, see requirements for installing Deploy.
- A load balancer that receives HTTP(S) traffic and forwards it to the Deploy master nodes. For more information, see the HAProxy load balancer documentation.
- Two or more Deploy master nodes that are stateless and provide control over the workers and other functions (e.g. CI editing and reporting).
- Two or more Deploy worker nodes that contain and execute tasks and are configured to connect to all masters.
- A database server.
- A shared drive location to store exported CIs and reports.
The communication between the masters and the workers is done through a two-way peer-to-peer protocol using a single port for each master or worker node.
Most Deploy functions and configurations are the same in a cluster setup as in a single instance. The exception is that, in a cluster setup, the functions can operate on all masters and/or on all workers.
The active-active cluster provides high availability for Deploy through a failover mechanism. If one node goes down, another node is available to take over its tasks. This means that the user interface remains responsive and scheduled deployments can be run by other nodes.
The active-active cluster also lets you scale the number of deployments that can be performed per hour or per day. Adding nodes makes it possible to handle more deployments in a given period, until the database or some other component becomes the bottleneck.
When planning an active-active cluster for Deploy, make sure you are aware of the following:
All masters and workers must have the same configuration, which consists of:
- The plugins
- The extensions folder (e.g. for scripts)
- The configuration files (some parts will be node specific)
- All masters and workers must have access to the database.
- All masters and workers must have access to the artifacts.
- Communication between masters and workers requires a low latency, high bandwidth network.
- All masters and workers need access to all target hosts (and Deploy Satellites, if applicable).
- For the HTML5 UI to function correctly, all requests for a single user session must be handled by the same master.
- For exports of CIs and reports to work correctly across masters, the export/ folder should be a shared, read-write accessible volume for each master and worker.
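The requirement that all masters and workers share the same plugins, extensions, and configuration can be spot-checked by comparing a content fingerprint of a directory on each node. The following is a minimal sketch, not part of Deploy: the function names `fingerprint` and `mismatched` are hypothetical, and it assumes you run it against a local copy or mount of each node's directory. Files that are intentionally node specific will differ and need to be excluded or compared by hand.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def fingerprint(root: str) -> str:
    """Return a SHA-256 digest over the relative paths and contents under root."""
    digest = hashlib.sha256()
    base = Path(root)
    # Sort the files so the digest does not depend on directory listing order.
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        digest.update(path.relative_to(base).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def mismatched(reference: str, others: Dict[str, str]) -> List[str]:
    """Return the names of nodes whose fingerprint differs from the reference."""
    return sorted(name for name, value in others.items() if value != reference)
```

For example, compute `fingerprint()` for the plugins directory of one master, compute it for every other node, and pass the results to `mismatched()` to list the nodes that have drifted.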
Based on the planning phase considerations, these settings are strongly recommended:
- All masters and workers are part of the same network segment.
- The network segment for the masters and workers is properly secured.
- The hostnames and IP addresses for all masters and workers are stored and maintained in a DNS server that is accessible to all masters and workers.
- The load balancer is configured to be highly available and can properly reach the masters.
- The load balancer handles SSL and forwards unencrypted data.
- The load balancer is configured with session affinity (“Sticky sessions”).
- The database is configured for high availability and can be properly reached by masters and workers.
- Artifacts are stored in the database or, preferably, in one or more external systems.
- When Deploy Satellite is used, all communication between masters, workers and satellites is secured using SSL Certificates.
The configuration of the load balancer, the network, and the database is not covered in this document.
When setting up a new system, the setup procedure should be executed on a single master node and the resulting configuration files shared with other nodes (masters and workers).
When upgrading, the upgrade procedure should be executed on all masters and workers.
In both cases, the configuration files to be shared between the masters and the workers include:
- The YAML files in the centralConfiguration directory (see Central Configuration as a Standalone Service)
- The license (
- The repository keystore (
- The truststore (if applicable)
Each master defines its fully qualified host name in the configuration property
XL_DEPLOY_SERVER_HOME/centralConfiguration/deploy-server.yaml. The worker name is defined in the deploy.server.hostname property in the xl-worker.conf file. Each master and worker should configure the correct trust-store in the xl.server.ssl section (if SSL is enabled), including certificates for Deploy Satellites (if applicable). Follow these instructions to set it up.
To start master and worker nodes:
- Masters can be started with the normal procedure, e.g. invoking
- Workers can be started with the literal 'worker' as the first argument to bin/run.sh, an -api flag pointing to the load balancer, and one or more -master flags, one for each fully qualified master name. For example:
bin/run.sh worker -api http://xld-loadbalancer.example.com -master xld1.example.com:8180 -master xld2.example.com:8180
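When the list of masters is maintained in a script or inventory, the worker command line can be assembled programmatically. The sketch below is illustrative only: `worker_command` is a hypothetical helper, and it uses nothing beyond the `worker`, `-api`, and `-master` arguments shown in the example above.

```python
from typing import List


def worker_command(api_url: str, masters: List[str],
                   script: str = "bin/run.sh") -> List[str]:
    """Build the argument list for starting a worker against the given masters."""
    if not masters:
        raise ValueError("at least one master is required")
    # The literal 'worker' must be the first argument to the start script.
    command = [script, "worker", "-api", api_url]
    # Add one -master flag per fully qualified master name.
    for master in masters:
        command += ["-master", master]
    return command


if __name__ == "__main__":
    print(" ".join(worker_command(
        "http://xld-loadbalancer.example.com",
        ["xld1.example.com:8180", "xld2.example.com:8180"],
    )))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting concerns.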
Note: If no DNS server is used and the mapping is done using /etc/hosts or a similar local mechanism, the configuration setting xl.tasks.system.akka.io.dns.resolver must be set to
XL_DEPLOY_SERVER_HOME/centralConfiguration/deploy-server.yaml on all masters and hosts.
A running active-active cluster for Deploy can be scaled for better performance if properly configured.
When using SSL for communication between masters, workers, and satellites, the certificates of new masters and workers must be trusted by the other nodes and satellites. In this case, it is recommended to use a trusted root certificate to sign all certificates used by masters, workers, and satellites. A (self-signed) root certificate can be added to the trust store.
Additional workers can be started and directed to an existing cluster of workers without additional configuration.
Note that scheduled or ongoing work (tasks) is not rebalanced when workers are added. Workers are assigned tasks in a round-robin fashion when a task is created on one of the masters. Once a task is assigned to a worker, it cannot be moved to another worker.
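The effect of adding a worker under round-robin assignment can be illustrated with a small model. This is a conceptual sketch and not Deploy's actual scheduler: the class `RoundRobinAssigner` and its methods are hypothetical, and the only behavior taken from the description above is that tasks are handed out in turn and stay with the worker they were first assigned to.

```python
from typing import Dict, Iterable, List


class RoundRobinAssigner:
    """Toy model of round-robin task assignment without rebalancing."""

    def __init__(self, workers: Iterable[str]) -> None:
        self.workers: List[str] = list(workers)
        self.next_index = 0
        self.assignments: Dict[str, str] = {}

    def add_worker(self, worker: str) -> None:
        # A new worker only joins the rotation; existing assignments stay put.
        self.workers.append(worker)

    def assign(self, task_id: str) -> str:
        worker = self.workers[self.next_index]
        # Advance the rotation, wrapping around at the end of the worker list.
        self.next_index = (self.next_index + 1) % len(self.workers)
        self.assignments[task_id] = worker
        return worker
```

In this model, a worker added after three tasks have been assigned receives none of those tasks; it only starts receiving work when the rotation reaches it for newly created tasks.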
To enable workers to find masters that are added while the workers are running, the available masters should be registered in a DNS SRV record:
xld-masters IN SRV 1 0 0 xld-master-1
xld-masters IN SRV 1 0 0 xld-master-2
...
The workers can now be started with a single -master parameter that points to the SRV record:
The port number for a master can be configured in the DNS SRV record or in the parameter value:
xld-masters IN SRV 1 0 9001 xld-master-1
sets the port for xld-master-1 to 9001. If the port in the DNS SRV record is 0, it is ignored.
A parameter value of -master xld-masters:9002 means that all masters found in the DNS SRV record will use port 9002. If both are specified, the port number in the DNS SRV record takes precedence.
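The precedence between the two port sources can be expressed as a small function. The sketch below is an illustration of the rule as described, not code from Deploy: `SrvRecord`, `parse_srv`, `split_master_param`, and `effective_port` are hypothetical names, and returning `None` when neither source supplies a port is an assumption, since the behavior in that case is not stated here.

```python
from typing import NamedTuple, Optional, Tuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(line: str) -> SrvRecord:
    """Parse a line of the form '<name> IN SRV <priority> <weight> <port> <target>'."""
    fields = line.split()
    if len(fields) != 7 or fields[1:3] != ["IN", "SRV"]:
        raise ValueError(f"not an SRV record line: {line!r}")
    return SrvRecord(int(fields[3]), int(fields[4]), int(fields[5]), fields[6])


def split_master_param(value: str) -> Tuple[str, Optional[int]]:
    """Split a -master value such as 'xld-masters:9002' into name and optional port."""
    name, sep, port = value.partition(":")
    return name, (int(port) if sep else None)


def effective_port(record: SrvRecord, param_port: Optional[int]) -> Optional[int]:
    """Apply the precedence rule: a non-zero SRV port wins over the parameter port."""
    if record.port != 0:
        return record.port
    # A port of 0 in the SRV record is ignored, so fall back to the parameter.
    # Assumption: None is returned when the parameter carries no port either.
    return param_port
```

With the record `xld-masters IN SRV 1 0 9001 xld-master-1` and the parameter `-master xld-masters:9002`, this rule selects 9001 for xld-master-1; a record with port 0 would fall back to 9002.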