Troubleshooting GitLab CI Runner Jobs Stuck in Pending: Registration Token & Credential Issues
Resolve GitLab CI/CD jobs stuck pending due to registration token errors, network connectivity, or configuration problems. A step-by-step guide for SysAdmins.
### Introduction
As a Systems Administrator or DevOps engineer, you’ve likely encountered GitLab CI/CD jobs that stay in a “pending” state indefinitely. This usually indicates a communication breakdown between your GitLab instance and the runners configured to serve it. Several factors can cause this, but the most common root causes are invalid registration or runner tokens, blocked network connectivity, and misconfiguration of the runner itself. This guide walks through a step-by-step approach to diagnosing and resolving these issues so your CI/CD pipelines run again.
### Symptom & Error Signature
The primary symptom is that jobs in your project’s CI/CD pipelines (visible on the Jobs page or in the pipeline view) show a status of pending indefinitely, even though runners that should pick them up appear to be available.
When investigating the runner host, you might observe the following:
1. `gitlab-runner verify` output:

   Running a verification command on the runner host frequently reveals the core issue:

   ```shell
   sudo gitlab-runner verify
   ```

   Expected output for a problematic runner:

   ```text
   Verifying runner... is invalid runner=<runner_id>
   FATAL: Runner not found or you don't have permission.
   ```

   Or, if the runner isn’t even aware of its registration:

   ```text
   Verifying runner... is not alive runner=<runner_id>
   ```
2. GitLab Runner service logs:

   Logs from the `gitlab-runner` service often provide more detailed insights. Check them using `journalctl`:

   ```shell
   sudo journalctl -u gitlab-runner.service -f
   ```

   Typical error signatures in the logs might include:

   ```text
   May 15 10:30:45 ci-runner01 gitlab-runner[1234]: ERROR: Failed to get new build while checking runner: status 403 Forbidden runner=<runner_id>
   May 15 10:30:45 ci-runner01 gitlab-runner[1234]: ERROR: Failed to get new build while checking runner: Post "<gitlab_url>/api/v4/jobs/request": x509: certificate signed by unknown authority
   May 15 10:30:45 ci-runner01 gitlab-runner[1234]: ERROR: Failed to get new build while checking runner: Post "<gitlab_url>/api/v4/jobs/request": dial tcp <gitlab_ip>:443: connect: no route to host
   May 15 10:30:45 ci-runner01 gitlab-runner[1234]: WARNING: Checking for jobs... failed runner=<runner_id> status=403 Forbidden
   ```

These errors range from authentication problems (`403 Forbidden`) to TLS/SSL certificate validation failures (`x509: ...`) and basic network connectivity blocks (`no route to host`).
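Both checks above produce plain text, so a first triage pass can be scripted. The sketch below is a pair of hypothetical helpers (not part of GitLab Runner): `failing_runners` reads `gitlab-runner verify` output and prints the ID of every runner not reported as alive, and `classify_runner_error` maps a single log line to a coarse category. It assumes the message formats shown in this section; the categories correspond to steps 4 (`auth`), 3 (`tls`), and 1 (`network`) of this guide.

```shell
# POSIX sh. Hypothetical triage helpers; they only parse text.

# Print the runner ID of every `gitlab-runner verify` line that is not "is alive".
# Reads verify output on stdin.
failing_runners() {
  grep 'Verifying runner' | grep -v 'is alive' | sed -n 's/.*runner=\([^ ]*\).*/\1/p'
}

# Map one runner log line to a coarse category: auth, tls, network, or other.
classify_runner_error() {
  case "$1" in
    *"403 Forbidden"*) echo "auth" ;;
    *"x509:"*) echo "tls" ;;
    *"no route to host"*|*"connection refused"*|*"no such host"*|*"i/o timeout"*)
      echo "network" ;;
    *) echo "other" ;;
  esac
}
```

For example, `sudo gitlab-runner verify 2>&1 | failing_runners` lists the runners that need attention (the redirect is there because the runner may log to stderr), and `classify_runner_error "$line"` tells you which step to start with for a given log line.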
### Root Cause Analysis
The stuck-pending scenario, particularly when it is related to tokens, typically stems from one of several core issues:

- Invalid or Expired Token: The most common culprit. The registration token is used only once, by `gitlab-runner register`; afterwards the runner authenticates with its own runner token stored in `config.toml`. If that runner token is revoked, reset, or has expired, every job request fails with `403 Forbidden` and jobs stay pending. A wrong registration token, or one generated for a different scope than intended (for example, a project token when you wanted an instance runner), makes registration fail or attaches the runner to the wrong project, group, or instance.
- Network Connectivity Issues: The runner host cannot reach the GitLab instance. This can be due to:
  - Firewall Rules: Host-based firewalls (UFW, `firewalld`), cloud security groups, or network firewalls blocking outbound TCP port 443 (or 80 for HTTP).
  - DNS Resolution: Incorrect DNS configuration on the runner host prevents resolution of the GitLab instance hostname.
  - Proxy Server Configuration: The runner must communicate through a proxy, but the proxy is missing or misconfigured.
  - Incorrect GitLab URL: The `url` specified in `config.toml` or during registration is wrong.
- TLS/SSL Certificate Issues: The runner cannot validate the GitLab instance’s SSL certificate. This often happens with self-signed certificates or certificates issued by a private Certificate Authority (CA) that the runner host’s OS does not trust.
- GitLab Runner Configuration Mismatch: The `config.toml` file (by default `/etc/gitlab-runner/config.toml`) contains an incorrect `url`, `token`, or other crucial setting for a specific `[[runners]]` entry.
- GitLab Instance Permissions: The user who obtained the registration token lacks sufficient permissions (for example, the Maintainer or Owner role) to register a runner in the specified project, group, or instance.
- Runner Deregistered from GitLab UI but Not Locally: If a runner is deleted from the GitLab web interface, its token becomes invalid, but the runner on the host keeps trying to connect with that invalid token.
- System Time Skew: A significant time difference between the runner host and the GitLab instance can make TLS certificate validation fail, because the certificate appears to be not yet valid or already expired.
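Several of these causes come down to what `config.toml` actually contains, so it helps to see each `[[runners]]` entry’s `name`, `url`, and (masked) `token` side by side. The awk-based sketch below assumes the simple `key = "value"` layout that `gitlab-runner register` writes. It is not a full TOML parser, and `summarize_runner_config` is a hypothetical helper name.

```shell
# POSIX sh + awk. Hypothetical helper: summarize [[runners]] entries in config.toml.
# Assumes simple `key = "value"` lines; not a complete TOML parser.
summarize_runner_config() {
  awk '
    # Print the collected entry, showing only the first 8 characters of the token.
    function flush() {
      if (seen) printf "name=%s url=%s token=%s...\n", name, url, substr(token, 1, 8)
    }
    # A new [[runners]] block: emit the previous one and reset the fields.
    /^[ \t]*\[\[runners\]\]/ { flush(); seen = 1; in_sub = 0; name = ""; url = ""; token = ""; next }
    # Sub-tables such as [runners.docker] or [runners.cache] are ignored.
    /^[ \t]*\[/ { in_sub = 1; next }
    seen && !in_sub && ($1 == "name" || $1 == "url" || $1 == "token") {
      split($0, parts, "\"")
      if ($1 == "name") name = parts[2]
      else if ($1 == "url") url = parts[2]
      else token = parts[2]
    }
    END { flush() }
  ' "${1:-/etc/gitlab-runner/config.toml}"
}
```

Run it as root (the file is usually readable only by root), for example from `sudo -i`, and compare each `url` with the address that works in step 1.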
### Step-by-Step Resolution
Follow these steps in order to methodically troubleshoot and resolve the pending job issue.
#### 1. Verify Network Connectivity to GitLab
Ensure your runner host can reach the GitLab instance.
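```shell
# Test DNS resolution (ping also resolves the name, but ICMP is often
# blocked even when HTTPS traffic is allowed)
getent hosts gitlab.yourdomain.com
# Test HTTPS connectivity (replace with your GitLab URL)
curl -v https://gitlab.yourdomain.com/api/v4/version
```

Expected curl output: an HTTP response from GitLab. Without credentials, `/api/v4/version` returns `401 Unauthorized`, which still proves that DNS, routing, and TLS work; add `--header "PRIVATE-TOKEN: <your_access_token>"` to get a `200 OK` status and JSON output detailing the GitLab version. If you see errors like `Could not resolve host`, `Connection refused`, `No route to host`, or SSL errors, investigate: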
- DNS: Check `/etc/resolv.conf` and ensure it lists the correct nameservers (on systems using systemd-resolved, `resolvectl status` shows the upstream servers).
- Firewall:
  - Local host: Run `sudo ufw status verbose` (Ubuntu/Debian) or `sudo firewall-cmd --list-all` (CentOS/RHEL) to confirm that outbound traffic to port 443 is allowed.
  - Network/cloud: Check network ACLs, security groups (AWS, Azure, GCP), or corporate firewalls.
- Proxy: If your network requires a proxy, ensure the `http_proxy`, `https_proxy`, and `no_proxy` environment variables are set for the `gitlab-runner` service (see step 6).
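The checks in this step can be bundled into a single function that runs them in order and stops at the first failure, which keeps the output short enough to paste into a ticket. This is a sketch: `host_from_url` and `check_gitlab_reachable` are hypothetical helpers, it assumes `getent` and `curl` are installed, and it treats any HTTP status (including `401`) as proof that DNS, routing, and TLS work.

```shell
# POSIX sh. Hypothetical connectivity helpers.

# Extract the hostname from a URL such as https://gitlab.yourdomain.com:8443/path
host_from_url() {
  printf '%s\n' "$1" | sed -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' -e 's|[/:].*$||'
}

# Check DNS first, then HTTPS; return non-zero at the first failure.
check_gitlab_reachable() {
  url=${1%/}
  host=$(host_from_url "$url")
  if ! getent hosts "$host" >/dev/null; then
    echo "DNS: cannot resolve $host" >&2
    return 1
  fi
  # -s silent, -o discard the body, -w print only the status code.
  # curl reports 000 when no HTTP response arrived (network, proxy, or TLS failure).
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url/api/v4/version")
  if [ "$code" = "000" ]; then
    echo "HTTPS: no HTTP response from $url (firewall, proxy, or TLS problem)" >&2
    return 1
  fi
  echo "OK: $host resolves and GitLab answered with HTTP $code"
}
```

Usage: `check_gitlab_reachable https://gitlab.yourdomain.com/`. If it reports `000`, rerun the `curl -v` command above to see whether the failure is at the TCP or TLS stage.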
#### 2. Inspect GitLab Runner Service Logs
The journalctl utility is your primary tool for detailed runner logs.
```shell
sudo journalctl -u gitlab-runner.service -f --no-hostname
```

Look for specific error messages (e.g., `403 Forbidden`, `x509: certificate signed by unknown authority`, `dial tcp: connect: no route to host`). These will guide your next steps.

> [!TIP]
> Use `grep` with `journalctl` to filter for specific keywords, e.g., `journalctl -u gitlab-runner.service | grep 'ERROR'`.
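When the journal is long, counting how often each distinct error repeats is often quicker than scrolling through it. The sketch below strips the prefix shown in the sample lines earlier (timestamp, optional host, `gitlab-runner[PID]:`) and counts identical messages. `top_runner_errors` is a hypothetical helper, and the prefix format is an assumption based on journalctl’s default short output.

```shell
# POSIX sh. Count repeated ERROR/WARNING messages from gitlab-runner journal output on stdin.
# Assumes lines look like "<date> [<host>] gitlab-runner[<pid>]: <message>".
top_runner_errors() {
  grep -E 'ERROR|WARNING' |
    sed 's/^.*gitlab-runner\[[0-9]*\]: //' |
    sort | uniq -c | sort -rn
}
```

For example: `sudo journalctl -u gitlab-runner.service --since "1 hour ago" --no-pager | top_runner_errors | head`. The message at the top of the list is usually the one to fix first.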
#### 3. Address TLS/SSL Certificate Issues (if applicable)
If logs show `x509: certificate signed by unknown authority`, the runner host doesn’t trust your GitLab instance’s SSL certificate.

For self-signed or private CA certificates:

1. Obtain the certificate:

   ```shell
   # Replace with your GitLab instance's hostname
   echo -n | openssl s_client -showcerts -connect gitlab.yourdomain.com:443 -servername gitlab.yourdomain.com 2>/dev/null \
     | sed -n '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/p' > /tmp/gitlab.crt
   ```

2. Copy the certificate to the trusted certificates directory (Debian/Ubuntu; on RHEL-based systems, copy it to `/etc/pki/ca-trust/source/anchors/` and run `sudo update-ca-trust` instead of the next step):

   ```shell
   sudo cp /tmp/gitlab.crt /usr/local/share/ca-certificates/gitlab.crt
   ```

3. Update the CA certificate store:

   ```shell
   sudo update-ca-certificates
   ```

4. Restart the GitLab Runner service:

   ```shell
   sudo systemctl restart gitlab-runner.service
   ```

Check the logs again to confirm the `x509` error is gone.

> [!IMPORTANT]
> `-showcerts` prints every certificate the server sends, but servers normally omit the root CA. If your certificate is issued by a private CA, make sure the full chain is trusted: obtain the root CA certificate (and any intermediates the server does not send) from your PKI administrators and install it as well.
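As an alternative to changing the system-wide trust store, GitLab Runner running as root also reads a PEM file named after the GitLab hostname from `/etc/gitlab-runner/certs/` (for example `/etc/gitlab-runner/certs/gitlab.yourdomain.com.crt`), and a specific file can be set per runner with `tls-ca-file` in `config.toml`. The sketch below installs a captured chain into that directory after checking that it actually contains certificates. `install_runner_ca` is a hypothetical helper, and the `RUNNER_CERTS_DIR` override exists only so the function can be tried against a scratch directory.

```shell
# POSIX sh. Hypothetical helper: install a PEM chain where GitLab Runner looks for per-host CA files.
# Usage: install_runner_ca <gitlab-hostname> <pem-file>
install_runner_ca() {
  certs_dir=${RUNNER_CERTS_DIR:-/etc/gitlab-runner/certs}
  # Refuse files without any PEM certificate (e.g. an empty capture from a failed openssl call).
  count=$(grep -c -- '-----BEGIN CERTIFICATE-----' "$2" 2>/dev/null)
  if [ "${count:-0}" -eq 0 ]; then
    echo "no certificates found in $2" >&2
    return 1
  fi
  mkdir -p "$certs_dir" &&
    cp "$2" "$certs_dir/$1.crt" &&
    echo "installed $count certificate(s) to $certs_dir/$1.crt"
}
```

Because `sudo` cannot run shell functions, call it from a root shell (for example after `sudo -i`): `install_runner_ca gitlab.yourdomain.com /tmp/gitlab.crt`, then restart the service.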
#### 4. Deregister and Re-register the GitLab Runner
This is often the most robust fix for registration token and credential issues, because it gives you a clean slate.

1. Stop the GitLab Runner service:

   ```shell
   sudo systemctl stop gitlab-runner.service
   ```

2. Identify the problematic runner: If you have multiple runners configured, check `/etc/gitlab-runner/config.toml`. Each `[[runners]]` block corresponds to one runner; note its `token` and `url`. Alternatively, in the GitLab UI, navigate to Admin Area > CI/CD > Runners or Project > Settings > CI/CD > Runners.

3. Deregister the runner from GitLab (UI): Go to your GitLab instance in the web browser.
   - Instance-level runners: Admin Area > CI/CD > Runners
   - Group-level runners: Group > Settings > CI/CD > Runners
   - Project-level runners: Project > Settings > CI/CD > Runners

   Find the runner associated with your host (check the description or IP address) and click the “Remove” button. This invalidates its token.

4. Deregister the runner locally (optional but recommended): If `gitlab-runner verify` was failing, `gitlab-runner unregister` might not work. However, if the runner was partly operational or you just want to be thorough:

   ```shell
   # List registered runners
   sudo gitlab-runner list
   # Unregister a specific runner by its token (from config.toml)
   sudo gitlab-runner unregister --url <gitlab_url> --token <runner_token_from_config.toml>
   # Or by its name (if you named it during registration)
   sudo gitlab-runner unregister --name "my-awesome-runner"
   ```

   If `unregister` fails, or you have multiple runners in `config.toml` and are unsure which one to remove, manually edit `/etc/gitlab-runner/config.toml` and delete the `[[runners]]` block for the problematic runner. Alternatively, `sudo gitlab-runner verify --delete` removes entries for runners that no longer exist in GitLab.

5. Generate a new token:
   - Go to your GitLab instance in the web browser.
   - Navigate to the correct scope for your runner (Instance, Group, or Project).
   - Follow the on-screen instructions to get a new token, selecting the correct tags and access settings for the runner. In recent GitLab versions, registration tokens are deprecated: you create the runner in the UI, set its tags there, and receive a runner authentication token prefixed with `glrt-`.

6. Register the GitLab Runner with the new token:

   ```shell
   # Legacy registration token (deprecated in recent GitLab versions)
   sudo gitlab-runner register \
     --non-interactive \
     --url "https://gitlab.yourdomain.com/" \
     --registration-token "YOUR_NEW_REGISTRATION_TOKEN" \
     --executor "shell" \
     --description "My CI Runner on ci-host-01" \
     --tag-list "shell,linux,ubuntu" \
     --run-untagged="true" \
     --locked="false"
   # Runner authentication token (glrt-...): tags and related options are set in the UI
   sudo gitlab-runner register \
     --non-interactive \
     --url "https://gitlab.yourdomain.com/" \
     --token "glrt-YOUR_RUNNER_AUTHENTICATION_TOKEN" \
     --executor "shell"
   ```

   > [!NOTE]
   > Adjust `--executor` (e.g., `docker`, `kubernetes`) and, for legacy registration tokens, `--description`, `--tag-list`, `--run-untagged`, and `--locked` as per your requirements. For the Docker executor, also add `--docker-image "ubuntu:latest"`.

7. Start the GitLab Runner service:

   ```shell
   sudo systemctl start gitlab-runner.service
   ```

8. Verify the runner’s status:

   ```shell
   sudo gitlab-runner verify
   sudo journalctl -u gitlab-runner.service -f
   ```

   You should now see `Verifying runner... is alive` and `Checking for jobs... done` or similar success messages in the logs.
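Because the correct flag depends on the kind of token you hold, a small wrapper can pick it automatically. This sketch assumes that runner authentication tokens start with `glrt-` (as issued by the UI-based runner creation flow) and treats anything else as a legacy registration token. `register_flag_for` and `register_runner` are hypothetical helpers, and the wrapper hard-codes the shell executor for brevity.

```shell
# POSIX sh. Pick the gitlab-runner register flag that matches the token type.
register_flag_for() {
  case "$1" in
    glrt-*) echo "--token" ;;               # runner authentication token (created in the UI)
    *)      echo "--registration-token" ;;  # legacy registration token
  esac
}

# Hypothetical wrapper: non-interactive registration with the shell executor.
# Usage: register_runner <gitlab-url> <token> [description]
register_runner() {
  flag=$(register_flag_for "$2")
  if [ "$flag" = "--token" ]; then
    # Tags, run-untagged and locked are configured in the UI for this token type.
    gitlab-runner register --non-interactive --url "$1" --token "$2" --executor shell
  else
    gitlab-runner register --non-interactive --url "$1" --registration-token "$2" \
      --executor shell --description "${3:-$(hostname)}"
  fi
}
```

Run it from a root shell, for example `register_runner https://gitlab.yourdomain.com/ glrt-YOUR_RUNNER_AUTHENTICATION_TOKEN`, then start the service as in the steps above.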
#### 5. Check GitLab Instance Permissions
Ensure the user account or group responsible for managing runners has at least the “Maintainer” role for project/group runners, or “Admin” privileges for instance runners. Incorrect permissions will prevent registration or job pickup, resulting in a 403 Forbidden error.
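To confirm a user’s effective role on a project before registering a project runner, you can query the Members API (`GET /projects/:id/members/all/:user_id`), which returns a numeric `access_level` (30 = Developer, 40 = Maintainer, 50 = Owner). The sketch below maps that number to a role name and checks it against the Maintainer threshold this step names. The helper names are hypothetical, and `GITLAB_TOKEN`, `PROJECT_ID`, and `USER_ID` in the usage note are placeholders. Instance runners are governed by administrator access instead, which this check does not cover.

```shell
# POSIX sh. Map a GitLab access_level number to its role name.
access_level_name() {
  case "$1" in
    10) echo "Guest" ;;
    20) echo "Reporter" ;;
    30) echo "Developer" ;;
    40) echo "Maintainer" ;;
    50) echo "Owner" ;;
    *)  echo "Unknown($1)" ;;
  esac
}

# Succeed if the access level is at least Maintainer (40).
meets_maintainer_threshold() {
  [ "${1:-0}" -ge 40 ]
}
```

For example, fetch the level with `curl -s --header "PRIVATE-TOKEN: $GITLAB_TOKEN" "https://gitlab.yourdomain.com/api/v4/projects/$PROJECT_ID/members/all/$USER_ID" | jq -r .access_level` (this assumes `jq` is installed) and pass the result to both functions.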
#### 6. Configure Proxy Settings (if required)
If your runner host is behind a corporate proxy, you must configure the runner to use it.

1. For the `gitlab-runner` service: Add the proxy environment variables to a `systemd` drop-in for the service:

   ```shell
   sudo systemctl edit gitlab-runner.service
   ```

   Add the following, adjusting the proxy details:

   ```ini
   [Service]
   Environment="http_proxy=http://proxy.yourcorp.com:8080"
   Environment="https_proxy=http://proxy.yourcorp.com:8080"
   Environment="no_proxy=localhost,127.0.0.1,.yourcorp.com,gitlab.yourdomain.com"
   ```

   List `gitlab.yourdomain.com` in `no_proxy` only if the runner can reach GitLab directly; leave it out if GitLab must be reached through the proxy. Save and exit, then reload the systemd daemon and restart the runner:

   ```shell
   sudo systemctl daemon-reload
   sudo systemctl restart gitlab-runner.service
   ```

2. For the Docker executor (if used): If your runner uses the Docker executor and Docker itself needs to pull images through a proxy, configure the Docker daemon. On Docker Engine 23.0 and later, create or edit `/etc/docker/daemon.json`:

   ```json
   {
     "proxies": {
       "http-proxy": "http://proxy.yourcorp.com:8080",
       "https-proxy": "http://proxy.yourcorp.com:8080",
       "no-proxy": "localhost,127.0.0.1,.yourcorp.com,gitlab.yourdomain.com"
     }
   }
   ```

   Then restart Docker:

   ```shell
   sudo systemctl restart docker
   ```
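Whether the runner sends GitLab traffic through the proxy depends on how the `no_proxy` entries match the GitLab hostname. The sketch below implements a simplified version of the matching used by Go programs such as GitLab Runner: an entry like `yourcorp.com` matches that host and its subdomains, an entry with a leading dot like `.yourcorp.com` matches subdomains only, and `*` matches everything. Ports, IP ranges, and the different rules other tools (curl, Docker) apply are deliberately ignored, and `bypasses_proxy` is a hypothetical helper.

```shell
# POSIX sh. Hypothetical helper: succeed if HOST would bypass the proxy for a no_proxy list.
# Simplified Go-style rules; ports and CIDR entries are not handled.
bypasses_proxy() {
  host=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  # One entry per line, spaces removed, lowercased; the brace group's exit status
  # becomes the function's return value.
  printf '%s\n' "$2" | tr ',' '\n' | tr -d ' ' | tr 'A-Z' 'a-z' | {
    while IFS= read -r entry; do
      case "$entry" in
        '') ;;                                            # skip empty entries
        '*') exit 0 ;;                                    # "*" disables proxying entirely
        .*) case "$host" in *"$entry") exit 0 ;; esac ;;  # ".example.com": subdomains only
        *) [ "$host" = "$entry" ] && exit 0               # "example.com": the host itself
           case "$host" in *".$entry") exit 0 ;; esac ;;  # ...and its subdomains
      esac
    done
    exit 1
  }
}
```

To see what the running service actually received, check `systemctl show gitlab-runner.service --property=Environment`, then test the GitLab host against that list, e.g. `bypasses_proxy gitlab.yourdomain.com "localhost,127.0.0.1,.yourcorp.com"`.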
#### 7. Synchronize System Time
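A significant time skew can make TLS handshakes fail, because the server’s certificate then appears to be not yet valid or already expired.

```shell
sudo timedatectl set-ntp true
sudo systemctl restart systemd-timesyncd # or ntp/chrony
```

Verify the time is correct with `date` or `timedatectl status`; the latter should report `System clock synchronized: yes`.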
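`timedatectl` only describes the local clock. To measure skew against the GitLab server itself, you can compare local time with the `Date` header GitLab returns on any HTTP response. This sketch assumes GNU `date` (for `date -d`) and `curl`; `clock_skew_seconds` and `server_date_header` are hypothetical helpers. Large values, in either direction, point to an NTP problem on one of the two hosts.

```shell
# POSIX sh (requires GNU date for -d).
# Print local time minus server time in seconds (positive: local clock is ahead).
# $1: an HTTP Date header value, e.g. "Thu, 15 May 2025 10:30:45 GMT"
# $2: local time as Unix seconds (defaults to now)
clock_skew_seconds() {
  server=$(date -u -d "$1" +%s) || return 1
  echo $(( ${2:-$(date +%s)} - server ))
}

# Fetch the Date header from a GitLab URL (header name matched case-insensitively).
server_date_header() {
  curl -sI --max-time 10 "$1" | awk -F': ' 'tolower($1) == "date" { print $2 }' | tr -d '\r'
}
```

Usage: `clock_skew_seconds "$(server_date_header https://gitlab.yourdomain.com/)"`.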
After performing these steps, your GitLab CI runner should successfully connect to your GitLab instance and pick up pending jobs. Always monitor `journalctl -u gitlab-runner.service -f` after making changes.
