Cluster mode

This topic describes how to run XL Release as a cluster. Running XL Release in cluster mode gives you a Highly Available (HA) XL Release setup. The following mode is supported:

  • Active/active: Two or more nodes are running simultaneously to process all requests. A load balancer is needed to distribute requests.

Cluster configuration

Requirements

Using XL Release in cluster mode requires the following:

  • XL Release must be installed according to the system requirements. For more information, see requirements for installing XL Release.

  • The XL Release repository and archive must be stored in an external database, as described in the Configure the XL Release repository in a database topic. Note: Cluster mode is not supported for the default configuration with an embedded database.

  • A load balancer. For more information, see the HAProxy load balancer documentation.

  • The time on all XL Release nodes must be synchronized through an NTP server (see the example check after this list).

  • The servers running XL Release must run on the same operating system.

  • XL Release servers and load balancers must be on the same network.
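
A quick way to verify the NTP requirement is to check clock synchronization on each node. This is a minimal sketch that assumes a systemd-based Linux host where timedatectl is available; use the equivalent command for your NTP tooling (for example, ntpstat or chronyc tracking).

# Check that the system clock is synchronized on this node
# (look for "System clock synchronized: yes" in the output)
timedatectl status | grep -i synchronized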

Important: All the XL Release cluster nodes must reside in the same network segment. This is required for the clustering protocol to function correctly. For optimal performance, it is also recommended that you put the database server in the same network segment to minimize network latency.

Setup procedure

The initial cluster setup consists of:

  • A load balancer
  • A database server
  • Two XL Release servers

To set up the cluster, perform the following configuration steps before starting XL Release.

Step 1 - Set up external databases

Follow the procedure described in Configure the XL Release SQL repository in a database.

Important: Both the xlrelease repository and the reporting archive must be configured in an external database.
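
For orientation, the following sketch shows the general shape of the external database settings in XL_RELEASE_SERVER_HOME/conf/xl-release.conf, with the repository and the reporting archive each pointing to its own schema. The driver class, URLs, credentials, and database names are illustrative assumptions for a MySQL setup; the SQL repository topic referenced above is the authoritative source for the exact properties for your database type and XL Release version.

    xl {
        # Repository database (illustrative values for a MySQL setup)
        database {
            db-driver-classname = "com.mysql.jdbc.Driver"
            db-url = "jdbc:mysql://db.example.com:3306/xlrelease"
            db-username = "xlrelease"
            db-password = "secret"
        }
        # Reporting archive database (illustrative values)
        reporting {
            db-driver-classname = "com.mysql.jdbc.Driver"
            db-url = "jdbc:mysql://db.example.com:3306/xlarchive"
            db-username = "xlarchive"
            db-password = "secret"
        }
    }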

Step 2 - Set up the cluster in the XL Release configuration

All active/active configuration settings are specified in the XL_RELEASE_SERVER_HOME/conf/xl-release.conf file, which uses the HOCON format.

  1. Enable clustering by setting xl.cluster.mode to full (active/active).
  2. Define the cluster identity and the port used for incoming cluster TCP connections in the xl.cluster and xl.cluster.node sections, using the following properties:
  • xl.cluster.mode: Possible values: default (single node, no cluster) or full (active/active). Set this property to full to turn on the cluster.
  • xl.cluster.name: A label to identify the cluster.
  • xl.cluster.node.id: Unique ID that identifies this node in the cluster.
  • xl.cluster.node.hostname: IP address or host name of the machine where the node is running. Do not use a loopback address such as 127.0.0.1 or localhost.
  • xl.cluster.node.clusterPort: Port used for cluster-wide communication; defaults to 5531.

Sample configuration

This is an example of the xl-release.conf configuration for an active/active setup:

    xl {
        cluster {
            mode = full
            name = "xlr-cluster"
            node {
                clusterPort = 5531
                hostname = "xlrelease-1.example.com"
                id = "xlrelease-1"
            }
        }
        database {
            ...
        }
    }

Step 3 - Set up the first node

  1. Open a command prompt and run the following server setup command: ./bin/run.sh -setup
  2. Follow the on-screen instructions.

Step 4 - Prepare another node in the cluster

  1. Zip the contents of the XL_RELEASE_SERVER_HOME/ folder from the first node.
  2. Copy the ZIP file to another node and unzip it.
  3. Edit the xl.cluster.node section of the XL_RELEASE_SERVER_HOME/conf/xl-release.conf file.
  4. Update the values for the specific node.

Note: You do not need to run the server setup command on each node.
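
The following shell sketch illustrates these steps for a hypothetical second node; the installation path, archive name, and host names are examples only.

# On the first node: package the installation directory (path is an example)
cd /opt
zip -r xl-release-node.zip xl-release-server/

# Copy the archive to the new node and unpack it there
scp xl-release-node.zip xlrelease-2.example.com:/opt/
ssh xlrelease-2.example.com 'cd /opt && unzip xl-release-node.zip'

# On the new node: edit xl-release-server/conf/xl-release.conf and update at least
#   xl.cluster.node.id       (for example, "xlrelease-2")
#   xl.cluster.node.hostname (for example, "xlrelease-2.example.com")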

Step 5 - Set up the load balancer

When running in cluster mode, you must configure a load balancer to route the requests to the available servers.

The load balancer checks the /ha/health endpoint with a HEAD or GET request to verify that the node is up. This endpoint will return:

  • A 200 OK HTTP status code if the node is active and able to serve requests
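
You can also query the health endpoint manually to verify a node before adding it to the load balancer pool. This is a minimal check; the host name is an example, and the port assumes the default XL Release HTTP port 5516 used in the sample configuration below.

# Send a HEAD request to the health endpoint of one node
curl -I http://xlrelease-1.example.com:5516/ha/health
# Expected response from a healthy node: HTTP/1.1 200 OK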

This is a sample haproxy.cfg configuration for HAProxy. Ensure that your configuration is hardened before using it in a production environment.

global
  log 127.0.0.1 local0
  log 127.0.0.1 local1 notice
  log-send-hostname
  maxconn 4096
  pidfile /var/run/haproxy.pid
  user haproxy
  group haproxy
  daemon
  stats socket /var/run/haproxy.stats level admin
  ssl-default-bind-options no-sslv3
  ssl-default-bind-ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:AES128-GCM-SHA256:AES128-SHA256:AES128-SHA:AES256-GCM-SHA384:AES256-SHA256:AES256-SHA:DHE-DSS-AES128-SHA
defaults
  balance roundrobin
  log global
  mode http
  option redispatch
  option httplog
  option dontlognull
  option forwardfor
  timeout connect 5000
  timeout client 50000
  timeout server 50000
listen stats
  bind :1936
  mode http
  stats enable
  timeout connect 10s
  timeout client 1m
  timeout server 1m
  stats hide-version
  stats realm Haproxy\ Statistics
  stats uri /
  stats auth stats:stats
frontend default_port_80
  bind :80
  reqadd X-Forwarded-Proto:\ http
  maxconn 4096
  default_backend default_service
backend default_service
  cookie JSESSIONID prefix
  option httpchk HEAD /ha/health HTTP/1.0
  server node_1 node_1:5516 cookie node_1 check inter 2000 rise 2 fall 3
  server node_2 node_2:5516 cookie node_2 check inter 2000 rise 2 fall 3

Limitation on HTTP session sharing and resiliency in cluster setups

XL Release does not share HTTP sessions among nodes. If an XL Release node becomes unavailable:

  • Users that were logged in through that node are effectively logged out and lose any data that was not yet stored in the database.
  • Any script tasks that were running on that node will have the failed status. Failover to the remaining nodes happens automatically, and you can then restart the failed tasks.

Note: Performing a TCP check or GET operation on / will indicate that a node is running.

Step 6 - Start the nodes

Beginning with the first node that you configured, start XL Release on each node. Ensure that each node is fully up and running before starting the next one.
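
Assuming the standard startup script from the XL Release installation (run.cmd on Windows) and an example installation path, starting the nodes could look like this:

# On the first node; wait until XL Release is fully up before continuing
cd /opt/xl-release-server
./bin/run.sh

# Then repeat on each remaining node, one at a time
cd /opt/xl-release-server
./bin/run.sh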

Advanced configuration

Network split resolution

In the case of a network split, the XL Release cluster is configured with a default strategy that prevents the original cluster from breaking up into multiple independent cluster partitions. The default strategy is the MajorityLeaderAutoDowningProvider.

This auto-downing strategy shuts down every cluster partition that is in the minority, that is, every partition for which partition size < cluster size / 2. For example, if a five-node cluster splits into partitions of three and two nodes, the two-node partition shuts itself down and the three-node partition keeps running.

When the cluster is split into two equal halves (partition size == cluster size / 2), the partition containing the oldest active cluster member survives. If no partition contains a sufficient number of members, quorum cannot be achieved and the whole cluster shuts down. If this occurs, an external restart of the cluster is required.

An alternative strategy, available out of the box, is the OldestLeaderAutoDowningProvider. This strategy can be activated in the XL_RELEASE_SERVER_HOME/conf/xl-release.conf file by specifying:

xl {
    cluster {
        akka {
            cluster {
                downing-provider-class = "com.xebialabs.xlplatform.cluster.full.downing.OldestLeaderAutoDowningProvider"
            }
        }
    }
    ...
}

This strategy keeps the partition containing the oldest active node alive. It is suitable for an XL Release cluster that needs to stay up as long as possible, regardless of the number of members in the partitions.