Eleanor is a consultant at Red Hat, interested in containerized application development and user interface design.

10 Most Common Build & Deployment Errors in OpenShift

Whether you're looking for a quick fix or gearing up for future troubleshooting, these are all pretty standard errors you'll run into while developing on OpenShift. Below are the 10 most common ones I've seen while working with developers who are getting started with the platform.

But first:

Where to Look for Error Info

Pod/Container Logs

If your build or deployment started and failed halfway through, this is the best place to start. You can see build logs by looking here: 

buildLogs.PNG
buildLogs2.PNG

You can see deployment logs by looking at the specific deployment and either looking at that deployment's logs or the pod's logs directly. Click on the "1 pod" section to find that deployment's pods, then click "Logs."

deployLogs.PNG
deployLogs2.PNG

Monitoring/Events

Most OpenShift objects include an "Events" tab so you can watch new events as they happen. You can also see all of the events happening in the project by clicking on "Monitoring" in the sidebar.

podEvents.PNG

Most of the time, errors will be visible in either of those locations.

10 Common Errors

1. Missing configmap/secret/volume in deployment config

This will appear as a "RunContainerError" when your pods are attempting to spin up. If the required ConfigMap/Secret is missing, or if the key you're looking for in a ConfigMap/Secret is missing, you'll see this error under "Events."

configmapError.PNG
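If you'd rather catch this before deploying, the check is easy to sketch. The Python below is only an illustration: find_missing_refs is a made-up helper, and the container, ConfigMaps, and Secrets are plain dicts that I'm assuming you've already loaded from your YAML. It walks each environment variable and reports any configMapKeyRef or secretKeyRef that points at a missing object or a missing key.

```python
def find_missing_refs(container, configmaps, secrets):
    """Return a list of problems for env entries whose ConfigMap/Secret,
    or the key inside it, does not exist.

    container  -- dict shaped like one entry of spec.template.spec.containers
    configmaps -- dict of {configmap name: {key: value}} (assumed input)
    secrets    -- dict of {secret name: {key: value}} (assumed input)
    """
    problems = []
    for env in container.get("env", []):
        source = env.get("valueFrom", {})
        for ref_field, kind, available in (
            ("configMapKeyRef", "ConfigMap", configmaps),
            ("secretKeyRef", "Secret", secrets),
        ):
            ref = source.get(ref_field)
            if ref is None:
                continue
            if ref["name"] not in available:
                problems.append(
                    f'{env["name"]}: {kind} "{ref["name"]}" not found'
                )
            elif ref["key"] not in available[ref["name"]]:
                problems.append(
                    f'{env["name"]}: key "{ref["key"]}" missing from '
                    f'{kind} "{ref["name"]}"'
                )
    return problems


# Hypothetical example: the ConfigMap exists, but the key does not.
example_container = {
    "name": "hello-world",
    "env": [
        {
            "name": "DB_HOST",
            "valueFrom": {
                "configMapKeyRef": {"name": "app-config", "key": "db.host"}
            },
        }
    ],
}
example_configmaps = {"app-config": {"db.port": "5432"}}

for problem in find_missing_refs(example_container, example_configmaps, {}):
    print(problem)
```

The same idea extends to volumes that mount a ConfigMap or Secret, which this sketch doesn't cover.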

2. Health check using the wrong port

This one is generally a bit harder to find. If your application looks like it has spun up fine with no errors, then appears as Failed with the pods constantly restarting, the liveness probe might be hitting the wrong port. Your readiness probe should hit the correct port too, but a failing readiness probe won't restart the pod (the pod will just appear as "not ready").

port2.PNG
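A quick way to spot this is to compare the probe ports against the ports the container declares. Here's a rough Python sketch of that comparison. probe_port_problems is a hypothetical helper, the container is a plain dict shaped like the DeploymentConfig YAML, and it only handles numeric ports (named ports are skipped). A container can listen on a port it never declares, so treat a mismatch as a hint rather than proof.

```python
def probe_port_problems(container):
    """Report liveness/readiness probes that target a port the container
    does not declare under `ports`. Numeric ports only (an assumption of
    this sketch); named ports and exec probes are ignored."""
    declared = {p["containerPort"] for p in container.get("ports", [])}
    problems = []
    for probe_name in ("livenessProbe", "readinessProbe"):
        probe = container.get(probe_name)
        if not probe:
            continue
        # httpGet and tcpSocket probes both carry a `port` field.
        handler = probe.get("httpGet") or probe.get("tcpSocket") or {}
        port = handler.get("port")
        if isinstance(port, int) and port not in declared:
            problems.append(
                f"{probe_name} targets port {port}, "
                f"but declared ports are {sorted(declared)}"
            )
    return problems


# Hypothetical example: the app listens on 8080, the liveness probe hits 8081.
example_container = {
    "name": "hello-world",
    "ports": [{"containerPort": 8080}],
    "livenessProbe": {"httpGet": {"path": "/health", "port": 8081}},
    "readinessProbe": {"httpGet": {"path": "/health", "port": 8080}},
}

for problem in probe_port_problems(example_container):
    print(problem)
```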

3. Missing build secret for authenticating with source repo

If you're seeing a "Fetch source failed" error when you try to build, you might need to set up a build secret to authenticate with your Git repo. This will be either a username and password (new-basicauth secret) for an HTTPS URL or an SSH key (new-sshauth secret) for an SSH URL.

buildSecret.PNG

4. Project quota exceeded

The system admins for your OpenShift Container Platform (OCP) cluster usually set project quotas to keep individual projects from taking up too many resources. If you've already reached your project quota, trying to deploy a new container will fail. You can decrease replicas for other containers, reduce the resource requests/limits for each service, or get the OCP admins to increase your project quota.

quota.PNG
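The arithmetic behind a quota failure is simple, and it can help to work it out before you scale. The Python below is a minimal sketch with made-up numbers. The function names are hypothetical, all values are memory in Mi, and it assumes every replica requests the same amount.

```python
def fits_in_quota(hard, used, per_pod_request, replicas):
    """True if `replicas` new pods, each requesting `per_pod_request`,
    still fit under the quota's hard limit given what is already used."""
    return used + per_pod_request * replicas <= hard


def max_new_replicas(hard, used, per_pod_request):
    """How many more pods of this size fit in the remaining quota."""
    return max(0, (hard - used) // per_pod_request)


# Hypothetical project: 4096Mi quota, 3584Mi already requested,
# and a new service that requests 512Mi per pod.
print(fits_in_quota(4096, 3584, 512, replicas=2))  # 3584 + 1024 = 4608 > 4096
print(fits_in_quota(4096, 3584, 512, replicas=1))  # 3584 + 512 = 4096, just fits
print(max_new_replicas(4096, 3584, 512))
```

Running the same numbers for CPU tells you whether to cut replicas, shrink requests, or ask for a bigger quota.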

5. Resources outside of request/limit ratio

In addition to project quotas, sometimes OCP admins will add limits on what individual pods can request in terms of CPU and memory. Sometimes you'll be inside the limit range for the pod but you'll still get an error about your resource request/limit. This is because the limit range can also cap the ratio between limit and request (maxLimitRequestRatio), so a limit that is too many times larger than its request gets rejected even when both values are individually allowed.

resource1.PNG
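Here's the ratio rule as a small Python sketch, in case the error message alone doesn't make it obvious which value to change. The function names and numbers are mine, and I'm assuming a maximum limit-to-request ratio of 4 purely for illustration. CPU values are in millicores.

```python
def ratio_ok(request, limit, max_ratio):
    """True if limit/request does not exceed the allowed maximum ratio."""
    return limit / request <= max_ratio


def smallest_allowed_request(limit, max_ratio):
    """The lowest request that keeps a given limit within the ratio."""
    return limit / max_ratio


# Assumed limit range: limit may be at most 4x the request.
MAX_RATIO = 4

print(ratio_ok(request=100, limit=1000, max_ratio=MAX_RATIO))  # ratio is 10
print(ratio_ok(request=250, limit=1000, max_ratio=MAX_RATIO))  # ratio is 4
print(smallest_allowed_request(limit=1000, max_ratio=MAX_RATIO))
```

So with these assumed numbers, a 1000m limit needs a request of at least 250m. You can either raise the request or lower the limit to get back inside the ratio.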

6. Build terminates with exit code 137

I mostly see this on Maven builds, but it's not limited to them. It means the build container was killed for running out of memory (OOM). Increase the memory request and limit on your BuildConfig and this should go away.

mavenKill.PNG
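If you're wondering where 137 comes from: by shell convention, an exit code above 128 means the process was killed by a signal, and the signal number is the code minus 128. 137 is 128 + 9, and signal 9 is SIGKILL, which is what the kernel's OOM killer sends. Here's a small Python sketch that decodes it. describe_exit_code is a made-up helper, and the signal names assume you're running it on Linux.

```python
import signal


def describe_exit_code(code):
    """Translate a container/process exit code into a short description.
    Codes above 128 are read as 128 + signal number."""
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"failed with status {code}"


for code in (0, 1, 137, 143):
    print(code, "->", describe_exit_code(code))
```

SIGKILL doesn't always mean OOM, but on a build pod with a memory limit it's the first thing I'd check.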

7. Image pull from external registry instead of internal

If you're seeing an error that looks something like this:

image.PNG

There are a couple of reasons this could be happening. In most cases, you probably aren't trying to get the "hello-world" image from registry.access.redhat.com but want to retrieve it from the internal OCP registry instead. If this is the case, take a look at the ImageChange trigger in your DeploymentConfig and make sure that it is properly set up to update to the latest image when a new one is pushed to the internal registry.

To fix your current failed deployment, you'll need to get the full docker pull spec for the image and copy that into your DeploymentConfig. Go to the ImageStream for the image you want and click "Actions" and "Edit YAML":

imageStream.PNG

In the YAML, search for "dockerImageReference" and copy its value.

Then paste this into your DeploymentConfig for that container:

dockerImage2.PNG

This will ensure that your DeploymentConfig is pulling the correct image from the local registry instead of going out to an external registry.

8. Environment variables are "invalid"

If you try to use "oc apply" to update an environment variable from a name/value pair to a "valueFrom" retrieved from a ConfigMap or Secret, or vice versa, you'll get this error:

The DeploymentConfig "hello-world" is invalid: spec.template.spec.containers[0].env[0].valueFrom: Invalid value: "": may not be specified when `value` is not empty

There are a couple bug reports out for this error, but there isn't a fix as of this post. The easiest way to get rid of this error is to delete all of the environment variables for your DeploymentConfig and run the update again, or update them manually in the console rather than running an "oc apply."
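The rule behind the message is that a single environment variable entry can carry "value" or "valueFrom", but not both. The Python sketch below shows how an update that only adds the new field, without clearing the old one, ends up breaking that rule. To be clear, naive_merge is a deliberately simplified model I made up for illustration. It is not how "oc apply" actually computes its patch.

```python
def validate_env(entry):
    """Return an error string if the entry sets both `value` and
    `valueFrom`, otherwise None. Mirrors the rule in the error message."""
    if entry.get("value") and entry.get("valueFrom"):
        return "valueFrom: may not be specified when `value` is not empty"
    return None


def naive_merge(existing, update):
    """Simplified model of an update: new fields are added on top of the
    existing entry, and fields missing from the update are NOT removed."""
    merged = dict(existing)
    merged.update(update)
    return merged


# The variable currently has a literal value...
current = {"name": "DB_HOST", "value": "localhost"}
# ...and the new config wants to read it from a ConfigMap instead.
desired = {
    "name": "DB_HOST",
    "valueFrom": {"configMapKeyRef": {"name": "app-config", "key": "db.host"}},
}

print(validate_env(desired))                        # fine on its own
print(validate_env(naive_merge(current, desired)))  # both fields are set
```

That's also why deleting the environment variables first works: there's no old "value" left to collide with.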

9. Deployment config always appears as "canceled"

deployment.PNG

The deployment number causes this error. In the above example, we're currently running deployment #6, but somehow deployments #1 and #2 are more recent. The latest deployment/replication controller must always have the highest number. The reset to #1 generally happens if the DeploymentConfig is deleted and recreated with "oc delete"/"oc create" or "oc replace." If this happens, the quickest way to get the new deployments running again is to delete all of the previous replication controllers and re-deploy.
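To check whether you're in this state, compare deployment numbers against creation times. Here's a minimal Python sketch. The function names are hypothetical, and I'm assuming you've collected each replication controller's number and creation timestamp (as ISO 8601 strings, which sort correctly as text) into a list of tuples.

```python
def latest_is_highest(deployments):
    """deployments: list of (number, created) tuples.
    True if the most recently created deployment also has the highest number."""
    newest = max(deployments, key=lambda d: d[1])
    highest = max(deployments, key=lambda d: d[0])
    return newest[0] == highest[0]


def stale_numbers(deployments):
    """Numbers of older deployments that outrank the newest one."""
    newest = max(deployments, key=lambda d: d[1])
    return sorted(n for n, created in deployments if n > newest[0])


# Made-up example matching the situation above: #6 is still around,
# but #1 and #2 were created after the DeploymentConfig was recreated.
example = [
    (6, "2018-09-01T10:00:00Z"),
    (1, "2018-09-03T09:30:00Z"),
    (2, "2018-09-03T11:15:00Z"),
]

print(latest_is_highest(example))
print(stale_numbers(example))
```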

10. Build succeeds but fails to push image

Private registries sometimes require image push or image pull secrets for security purposes. If a BuildConfig doesn't include this secret or includes the wrong one for a secured registry, you'll see this error:

nexus.PNG

You can fix this by editing your BuildConfig and choosing "Show advanced options" to choose the right image push/pull secrets.

pullsecret.PNG
