Troubleshooting the Operator-Based Installer
This section describes how to troubleshoot some of the common issues you may face when installing the Release application using the Operator-based installer.
Restart the deployment using XL CLI.
xl apply -v -f digital-ai.yaml
`digital-ai.yaml` is the wrapper file that bundles all other files, such as the infrastructure file, the environment file, and so on.
The XL CLI script runs successfully, but only the Operator controller manager pods are deployed on the Kubernetes cluster. No other pods are deployed.
Clear the Operator deployment as follows:
- Run the following command:
kubectl get crd
- Delete the Operator corresponding to
kubectl delete crd digitalaireleases.xlr.digital.ai
- Go to the `/digital-ai/kubernetes/template` path in the extracted ZIP file, and run the following command:
kubectl delete -f
- Restart the deployment using XL CLI.
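The CRD deletion step above can be wrapped in a small guard so that it is safe to rerun. This is a minimal sketch, not part of the installer: the function name `clear_release_crd` is hypothetical, and it assumes `kubectl` is already configured for the target cluster.

```shell
# Hypothetical helper: delete the Release CRD only if it is registered.
CRD_NAME="digitalaireleases.xlr.digital.ai"

clear_release_crd() {
  # "kubectl get crd <name>" exits non-zero when the CRD does not exist.
  if kubectl get crd "$CRD_NAME" >/dev/null 2>&1; then
    kubectl delete crd "$CRD_NAME"
  else
    echo "CRD $CRD_NAME not found; nothing to delete"
  fi
}
```

After sourcing the snippet, run `clear_release_crd`, then continue with the remaining steps.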
Note: To troubleshoot the issue on an OpenShift AWS cluster, replace the
kubectl command with
After deleting the Operator custom resource definition (CRD) and the Operator, the redeployment process fails to create pods when you attempt to start the deployment by running the following command:
xl apply -v -f digital-ai.yaml
Use the `kubectl delete -f` command to remove the Release instance only if you do not have a local Release instance. If you have a local Release instance with deployment details, use the `make undeploy` command to remove the Operator, and then retry the deployment process.
The upgrade from the Helm Charts-based solution to the Operator-based solution fails.
- Restore the database instance.
- Clean the deployments. For more information, see [Uninstall Release](/release/how-to/k8s-operator/uninstalling_release_operator.html).
- Update the `dairelease_cr.yaml` file to use the external database as follows:
- Update the external database credentials.
- Redeploy the Release instance.
The Operator-to-Operator upgrade fails with the following error:
“Fetching values from cluster… / Missing CRD and CR resources during Upgrade, Could not upgrade: exit status 1”
During the upgrade, the CRD and CR resources are backed up in
To troubleshoot the issue:
- Restore the CRD using the following command:
kubectl apply -f dairelease_cr_original.yaml
- Restart the upgrade.
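Before restarting the upgrade, it can help to confirm that the resources are back on the cluster. This is a minimal sketch with a hypothetical function name, `check_release_resources`. It assumes that the custom resources can be listed with the `digitalaireleases.xlr.digital.ai` resource name taken from the CRD name used elsewhere in this section.

```shell
# Hypothetical check: confirm the CRD is registered, then list the
# Release custom resources in the current namespace.
check_release_resources() {
  if kubectl get crd digitalaireleases.xlr.digital.ai >/dev/null 2>&1; then
    echo "CRD is present"
    kubectl get digitalaireleases.xlr.digital.ai
  else
    echo "CRD is still missing" >&2
    return 1
  fi
}
```

A non-zero return value indicates that the CRD is not registered and the upgrade is likely to fail again.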
Note: Make sure that you don’t delete PVCs (answer to question
Should we preserve persisted volume claims? with
xl op --clean
Run the following cleanup script:
kubectl delete crd digitalaireleases.xlr.digital.ai
kubectl delete role xlr-operator-leader-election-role
kubectl delete clusterrole xlr-operator-manager-role
kubectl delete clusterrole xlr-operator-metrics-reader
kubectl delete clusterrole xlr-operator-proxy-role
kubectl delete rolebinding xlr-operator-leader-election-rolebinding
kubectl delete clusterrolebinding xlr-operator-manager-rolebinding
kubectl delete clusterrolebinding xlr-operator-proxy-rolebinding
kubectl delete service xlr-operator-controller-manager-metrics-service
kubectl delete deployment xlr-operator-controller-manager
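If some of these resources have already been removed, a plain `kubectl delete` reports an error for each missing one. The following sketch runs the same deletions with the `--ignore-not-found` flag so that the script can be rerun safely. The function name `cleanup_release_operator` is hypothetical, and the namespaced resources (role, role binding, service, deployment) are deleted from the namespace of the current kubectl context.

```shell
# Hypothetical wrapper around the cleanup commands listed above.
cleanup_release_operator() {
  # Each input line is "<resource kind> <resource name>".
  while read -r kind name; do
    [ -n "$kind" ] || continue
    # Redirect stdin so kubectl cannot consume the remaining list.
    kubectl delete "$kind" "$name" --ignore-not-found </dev/null
  done <<'EOF'
crd digitalaireleases.xlr.digital.ai
role xlr-operator-leader-election-role
clusterrole xlr-operator-manager-role
clusterrole xlr-operator-metrics-reader
clusterrole xlr-operator-proxy-role
rolebinding xlr-operator-leader-election-rolebinding
clusterrolebinding xlr-operator-manager-rolebinding
clusterrolebinding xlr-operator-proxy-rolebinding
service xlr-operator-controller-manager-metrics-service
deployment xlr-operator-controller-manager
EOF
}
```

Source the snippet and run `cleanup_release_operator` from a shell that points at the affected cluster and namespace.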
The Keycloak pod does not start on an OpenShift cluster, and the following error appears for the Keycloak StatefulSet:
Warning FailedCreate 2m11s (x3 over 2m11s) statefulset-controller create Pod dai-ocp-xlr-cn1502-k-0 in StatefulSet dai-ocp-xlr-cn1502-k failed error: pods "dai-ocp-xlr-cn1502-k-0" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount
Add the `anyuid` security context constraint (SCC) to the service accounts group of your custom namespace:
oc adm policy add-scc-to-group anyuid system:serviceaccounts:<custom-namespace>
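The command above can be parameterized so that the namespace placeholder is filled in consistently. This is a minimal sketch: the function name `grant_anyuid_scc` and the namespace value in the usage note are hypothetical, and it assumes you are logged in with `oc` as a user who is allowed to change SCC assignments.

```shell
# Hypothetical helper: grant the anyuid SCC to all service accounts
# in the namespace passed as the first argument.
grant_anyuid_scc() {
  namespace="$1"
  if [ -z "$namespace" ]; then
    echo "usage: grant_anyuid_scc <namespace>" >&2
    return 1
  fi
  oc adm policy add-scc-to-group anyuid "system:serviceaccounts:${namespace}"
}
```

For example, `grant_anyuid_scc my-release-namespace` runs the command for a namespace named `my-release-namespace`.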
The `xlr-operator-controller-manager` pod periodically restarts with an out-of-memory (OOM) error. You can observe the error by describing the pod in the
Edit the deployment for xlr-operator-controller-manager and increase the value for the memory limits:
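As an alternative to editing the deployment by hand, the limit can be raised from the command line. This is a minimal sketch under stated assumptions: both function names are hypothetical, the namespace and the new limit are values you supply, and `kubectl set resources` without a container filter applies the limit to every container in the deployment.

```shell
# Hypothetical helper: print, per pod in the given namespace, the reason
# its containers last terminated ("OOMKilled" indicates an OOM restart).
last_termination_reasons() {
  kubectl get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'
}

# Hypothetical helper: set a new memory limit on the Operator deployment.
# $1 = namespace, $2 = memory limit (for example 512Mi; choose your own value).
raise_operator_memory_limit() {
  namespace="$1"
  limit="$2"
  if [ -z "$namespace" ] || [ -z "$limit" ]; then
    echo "usage: raise_operator_memory_limit <namespace> <limit>" >&2
    return 1
  fi
  kubectl set resources deployment xlr-operator-controller-manager \
    -n "$namespace" --limits="memory=${limit}"
}
```

Changing the limit triggers a rollout of the deployment, so the controller manager pod is recreated with the new value.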