Troubleshooting GitLab CI Runner Jobs Stuck in Pending: Registration Token & Credential Issues

Resolve GitLab CI/CD jobs stuck pending due to registration token errors, network connectivity, or configuration problems. A step-by-step guide for SysAdmins.


### Introduction

As a Systems Administrator or DevOps engineer, you’ve likely seen GitLab CI/CD jobs sit indefinitely in a “pending” state. This usually means that no configured runner is successfully requesting jobs from your GitLab instance. The most common causes are an invalid runner token, broken network connectivity between the runner host and GitLab, or a misconfigured runner. This guide walks through diagnosing and resolving each of these, step by step.

### Symptom & Error Signature

The primary symptom is that jobs within your GitLab project’s CI/CD pipelines (e.g., Jobs tab or pipeline view) will show a status of pending indefinitely, despite having runners available that should be picking up jobs.

When investigating the runner host, you might observe the following:

1. gitlab-runner verify Output:

Running a verification command on the runner host frequently reveals the core issue:

sudo gitlab-runner verify

Typical output for a problematic runner (exact wording varies by GitLab Runner version):

Verifying runner... is invalid                         runner=<runner_id>
FATAL: Runner not found or you don't have permission.

Or, if the runner cannot confirm its registration with GitLab:

Verifying runner... is not alive                       runner=<runner_id>

2. GitLab Runner Service Logs:

Logs from the gitlab-runner service often provide more detailed insights. Check them using journalctl:

journalctl -u gitlab-runner.service -f

Typical error signatures in the logs might include:

May 15 10:30:45 ci-runner01 gitlab-runner[1234]: ERROR: Failed to get new build while checking runner: status 403 Forbidden  runner=<runner_id>
May 15 10:30:45 ci-runner01 gitlab-runner[1234]: ERROR: Failed to get new build while checking runner: Post "<gitlab_url>/api/v4/jobs/request": x509: certificate signed by unknown authority
May 15 10:30:45 ci-runner01 gitlab-runner[1234]: ERROR: Failed to get new build while checking runner: Post "<gitlab_url>/api/v4/jobs/request": dial tcp <gitlab_ip>:443: connect: no route to host
May 15 10:30:45 ci-runner01 gitlab-runner[1234]: WARNING: Checking for jobs... failed                runner=<runner_id> status=403 Forbidden

These errors point to three distinct classes of problem: authentication failures (403 Forbidden), TLS/SSL certificate validation failures (x509), and basic network connectivity blocks (no route to host).
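
As a rough illustration of that mapping, the sketch below sorts a log line into one of the three classes. The function name classify_runner_error is hypothetical, and the extra patterns (connection refused, no such host, i/o timeout) are assumptions based on common Go network error strings rather than lines taken from the samples above.

```shell
# classify_runner_error LINE: print a likely cause category for one log line.
# Hypothetical helper; patterns beyond the sample log lines are assumptions.
classify_runner_error() {
  case "$1" in
    *"403 Forbidden"*) echo "auth" ;;     # token rejected by GitLab
    *"x509:"*)         echo "tls" ;;      # certificate validation failed
    *"no route to host"*|*"connection refused"*|*"no such host"*|*"i/o timeout"*) echo "network" ;;
    *)                 echo "unknown" ;;
  esac
}

# Usage on a runner host (needs permission to read the journal):
#   journalctl -u gitlab-runner.service --since "1 hour ago" --no-pager |
#     while IFS= read -r line; do classify_runner_error "$line"; done | sort | uniq -c
```

A count dominated by one category tells you which of the resolution steps below to start with.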

### Root Cause Analysis

Jobs stuck in pending, particularly when runner tokens are involved, typically stem from one of the following issues:

  1. Invalid or Expired Token: The most common culprit. Two different tokens are involved: the registration token, which is used once when running gitlab-runner register, and the runner authentication token, which GitLab issues in return and which is stored in config.toml. The registration token might be mistyped, might have been reset in GitLab (instance-level tokens can be rotated), or might belong to a different scope than intended (e.g., a project token used for what should be an instance-level runner). The authentication token stops working as soon as the runner is removed in GitLab (see cause 6).
  2. Network Connectivity Issues: The runner host cannot reach the GitLab instance. This can be due to:
    • Firewall Rules: Host-based firewalls (UFW, firewalld), cloud security groups, or network firewalls blocking TCP port 443 (or 80 if HTTP).
    • DNS Resolution: Incorrect DNS configuration on the runner host preventing resolution of the GitLab instance URL.
    • Proxy Server Configuration: The runner must communicate through a proxy, but the proxy settings are missing or wrong.
    • Incorrect GitLab URL: The url specified in the config.toml file or during registration is wrong.
  3. TLS/SSL Certificate Issues: The runner cannot validate the GitLab instance’s SSL certificate. This often happens with self-signed certificates or certificates issued by a private Certificate Authority (CA) that isn’t trusted by the runner’s host OS.
  4. GitLab Runner Configuration Mismatch: The config.toml file located at /etc/gitlab-runner/config.toml (default location) contains incorrect url, token, or other crucial settings for a specific runner entry.
  5. GitLab Instance Permissions: The user who generated the registration token or created the runner lacks the required role: Maintainer for a project runner, Owner for a group runner, or administrator access for an instance runner.
  6. Runner Deregistered from GitLab UI but Not Locally: If a runner is deleted from the GitLab web interface, its token becomes invalid. The runner on the host machine will continue attempting to connect with an invalid token.
  7. System Time Skew: A significant time difference between the runner host and the GitLab instance can lead to issues with SSL/TLS certificate validation and API communication.
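
For cause 4 in particular, it helps to see what the runner is actually configured with. The following is a minimal sketch, assuming the plain key = "value" layout that gitlab-runner normally writes to config.toml; summarize_runner_config is a hypothetical helper, not part of the gitlab-runner CLI. Tokens are truncated so the output can be pasted into a ticket.

```shell
# summarize_runner_config FILE: print name, url, and a truncated token
# (tab-separated) for each [[runners]] entry. Hypothetical helper.
summarize_runner_config() {
  awk '
    # Return the text between the first pair of double quotes on the line.
    function val(line) { sub(/^[^"]*"/, "", line); sub(/".*$/, "", line); return line }
    function emit()    { if (seen) printf "%s\t%s\t%s\n", name, url, token }
    /^[ \t]*\[\[runners\]\]/ { emit(); seen = 1; name = "-"; url = "-"; token = "-"; next }
    /^[ \t]*name[ \t]*=/     { name = val($0) }
    /^[ \t]*url[ \t]*=/      { url = val($0) }
    /^[ \t]*token[ \t]*=/    { token = substr(val($0), 1, 8) "..." }
    END { emit() }
  ' "$1"
}

# Usage (default path from this guide; reading it usually requires root):
#   sudo cat /etc/gitlab-runner/config.toml > /tmp/config.toml
#   summarize_runner_config /tmp/config.toml
```

Compare the printed url against the address you open in the browser; a stale hostname or an http/https mismatch shows up immediately.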

### Step-by-Step Resolution

Follow these steps in order to methodically troubleshoot and resolve the pending job issue.

#### 1. Verify Network Connectivity to GitLab

Ensure your runner host can reach the GitLab instance.

# Test DNS resolution (ping also depends on ICMP, which firewalls often block)
getent hosts gitlab.yourdomain.com

# Test HTTPS connectivity (replace with your GitLab URL)
curl -v https://gitlab.yourdomain.com/api/v4/version

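Expected curl output: a completed TLS handshake followed by an HTTP response. Because /api/v4/version requires authentication, an unauthenticated request returns 401 Unauthorized, which is still enough to prove that DNS, routing, and TLS work; add -H "PRIVATE-TOKEN: <your_access_token>" to get 200 OK and JSON output detailing the GitLab version. If you see errors like Could not resolve host, Connection refused, No route to host, or SSL errors, investigate: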

  • DNS: Check /etc/resolv.conf and ensure correct nameservers.
  • Firewall:
    • Local Host: sudo ufw status verbose (Ubuntu/Debian) or sudo firewall-cmd --list-all (CentOS/RHEL) to ensure outbound traffic to port 443 is allowed.
    • Network/Cloud: Check network ACLs, security groups (AWS, Azure, GCP), or corporate firewalls.
  • Proxy: If your network requires a proxy, ensure environment variables (http_proxy, https_proxy, no_proxy) are set for the gitlab-runner service itself; variables exported in your login shell are not inherited by the systemd unit (see step 6).
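
When curl fails, its exit code already narrows down which of the items above to look at. The helper below is a small sketch that translates the most relevant codes; explain_curl_exit is a hypothetical name, and the wording of each hint is an interpretation rather than curl's own message.

```shell
# explain_curl_exit CODE: map common curl exit codes to the checks above.
explain_curl_exit() {
  case "$1" in
    0)  echo "reachable: DNS, TCP, TLS and HTTP all succeeded" ;;
    5)  echo "proxy: could not resolve proxy host" ;;
    6)  echo "dns: could not resolve host" ;;
    7)  echo "network: failed to connect (refused, no route, or firewall)" ;;
    28) echo "network: operation timed out" ;;
    35) echo "tls: handshake failed" ;;
    60) echo "tls: certificate not trusted by this host" ;;
    *)  echo "other: see 'man curl' for exit code $1" ;;
  esac
}

# Usage (gitlab.yourdomain.com is the placeholder host used in this guide):
#   curl -sS -o /dev/null --max-time 10 https://gitlab.yourdomain.com/
#   explain_curl_exit $?
```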

#### 2. Inspect GitLab Runner Service Logs

The journalctl utility is your primary tool for detailed runner logs.

sudo journalctl -u gitlab-runner.service -f --no-hostname

Look for specific error messages (e.g., 403 Forbidden, x509: certificate signed by unknown authority, dial tcp: connect: no route to host). These will guide your next steps.

[!TIP] Use grep with journalctl for specific keywords, e.g., journalctl -u gitlab-runner.service | grep 'ERROR'

#### 3. Address TLS/SSL Certificate Issues (if applicable)

If logs show x509: certificate signed by unknown authority, the runner host doesn’t trust your GitLab instance’s SSL certificate.

For self-signed or private CA certificates:

  1. Obtain the certificate:
    # Replace with your GitLab instance
    echo -n | openssl s_client -showcerts -connect gitlab.yourdomain.com:443 2>/dev/null | sed -n '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/p' > /tmp/gitlab.crt
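  2. Copy the certificate to the trusted certificates directory (Debian/Ubuntu path shown; on RHEL-family hosts use /etc/pki/ca-trust/source/anchors/ instead):
    sudo cp /tmp/gitlab.crt /usr/local/share/ca-certificates/gitlab.crt
  3. Update the CA certificate store (on RHEL-family hosts run sudo update-ca-trust instead):
    sudo update-ca-certificates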
  4. Restart the GitLab Runner service:
    sudo systemctl restart gitlab-runner.service
    Verify logs again to confirm the x509 error is gone.

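[!IMPORTANT] The runner must be able to validate the full certificate chain. With -showcerts, openssl s_client prints every certificate the server sends, which normally includes intermediates but not the root CA. If your GitLab certificate is issued by a private CA, the more reliable fix is to obtain that CA’s root certificate from whoever operates it and install it instead of the server’s own certificate.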
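
Before installing /tmp/gitlab.crt, it is worth checking what the file actually contains. The two helpers below are a sketch with hypothetical names: the first counts the certificates in a PEM file, the second prints the subject and issuer of each one using standard openssl subcommands.

```shell
# count_pem_certs FILE: number of certificates contained in a PEM file.
count_pem_certs() {
  grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# describe_pem_certs FILE: print subject and issuer for every certificate.
# A certificate whose subject equals its issuer is self-signed, which is
# what a root CA (or a self-signed server certificate) looks like.
describe_pem_certs() {
  openssl crl2pkcs7 -nocrl -certfile "$1" | openssl pkcs7 -print_certs -noout
}

# Usage:
#   count_pem_certs /tmp/gitlab.crt
#   describe_pem_certs /tmp/gitlab.crt
```

If the count is 1 and subject and issuer differ, the file holds only a leaf certificate and the issuing CA certificate is still missing.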

#### 4. Deregister and Re-register the GitLab Runner

This is often the most robust solution for registration token and credential-related issues, as it ensures a clean slate.

  1. Stop the GitLab Runner service:

    sudo systemctl stop gitlab-runner.service
  2. Identify the problematic runner ID: If you have multiple runners configured, check /etc/gitlab-runner/config.toml. Each [[runners]] block corresponds to a runner. Note the token and url. Alternatively, on the GitLab UI, navigate to Admin Area > CI/CD > Runners or Project > Settings > CI/CD > Runners.

  3. Deregister the runner from GitLab (UI): Go to your GitLab instance in the web browser.

    • Instance-level runners: Admin Area > CI/CD > Runners
    • Group-level runners: Group > Settings > CI/CD > Runners
    • Project-level runners: Project > Settings > CI/CD > Runners

    Find the runner associated with your host (check the description or IP address) and click the “Remove” button. This invalidates its token.
  4. Deregister the runner locally (optional but recommended): gitlab-runner unregister authenticates with the runner’s token, so it will likely fail if that token is already invalid (for example, after the UI removal in the previous step). If the runner was still partly operational, or you just want to be thorough:

    # List registered runners
    sudo gitlab-runner list
    
    # Unregister a specific runner by its token (from config.toml)
    sudo gitlab-runner unregister --url <gitlab_url> --token <runner_token_from_config.toml>
    
    # Or by its name (if you named it during registration)
    sudo gitlab-runner unregister --name "my-awesome-runner"

    If unregister fails, run sudo gitlab-runner verify --delete to drop entries that GitLab no longer recognizes from config.toml. Alternatively, manually edit /etc/gitlab-runner/config.toml and remove the [[runners]] block corresponding to the problematic runner.

  5. Generate a NEW Registration Token:

    • Go to your GitLab instance in the web browser.
    • Navigate to the correct scope for your runner (Instance, Group, or Project).
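    • Follow the on-screen instructions to get a new token. On recent GitLab versions, the runner creation form is where you set tags and options such as running untagged jobs, and it returns a runner authentication token (prefixed glrt-) rather than a legacy registration token.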
  6. Register the GitLab Runner with the new token:

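    # Legacy workflow: a registration token plus settings passed as flags.
    # With a runner authentication token (glrt-...), pass it via --token instead
    # and drop --tag-list, --run-untagged, and --locked (set them in the GitLab UI).
    sudo gitlab-runner register \
      --non-interactive \
      --url "https://gitlab.yourdomain.com/" \
      --registration-token "YOUR_NEW_REGISTRATION_TOKEN" \
      --executor "shell" \
      --description "My CI Runner on ci-host-01" \
      --tag-list "shell,linux,ubuntu" \
      --run-untagged="true" \
      --locked="false"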

    [!NOTE] Adjust --executor (e.g., docker, kubernetes), --description, --tag-list, --run-untagged, and --locked as per your requirements. For Docker executor, you’d add --docker-image "ubuntu:latest".

  7. Start the GitLab Runner service:

    sudo systemctl start gitlab-runner.service
  8. Verify the runner’s status:

    sudo gitlab-runner verify
    sudo journalctl -u gitlab-runner.service -f

    You should now see Verifying runner... is alive in the verify output, and the 403, x509, or connection errors should no longer appear in the logs.
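
Which flag carries the token during registration depends on the kind of token GitLab gave you. As a hedged sketch, assuming the token was copied from the GitLab UI, the prefix can be used to choose the flag; registration_flag_for is a hypothetical helper, and older tokens without any prefix fall through to the legacy case.

```shell
# registration_flag_for TOKEN: print the gitlab-runner register flag that
# matches the token type. Hypothetical helper based on the token prefix.
registration_flag_for() {
  case "$1" in
    glrt-*) printf '%s\n' "--token" ;;               # runner authentication token from the UI
    *)      printf '%s\n' "--registration-token" ;;  # legacy registration token
  esac
}

# Usage:
#   flag=$(registration_flag_for "$RUNNER_TOKEN")
#   sudo gitlab-runner register --non-interactive \
#     --url "https://gitlab.yourdomain.com/" "$flag" "$RUNNER_TOKEN" --executor "shell"
```

When registering with a runner authentication token, settings such as tags, locked, and run-untagged are defined when the runner is created in the UI, so they are left out of the command here.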

#### 5. Check GitLab Instance Permissions

Ensure the user account that creates or registers the runner has the Maintainer role for a project runner, the Owner role for a group runner, or administrator access for an instance runner. Insufficient permissions prevent the runner from being created or registered. They do not affect job pickup: a 403 Forbidden on job requests points to an invalid runner token, which step 4 addresses.

#### 6. Configure Proxy Settings (if required)

If your runner host is behind a corporate proxy, you must configure the runner to use it.

  1. For the gitlab-runner service: Edit the systemd service file to include proxy environment variables.

    sudo systemctl edit gitlab-runner.service

    Add the following, adjusting proxy details:

    [Service]
    Environment="http_proxy=http://proxy.yourcorp.com:8080"
    Environment="https_proxy=http://proxy.yourcorp.com:8080"
    Environment="no_proxy=localhost,127.0.0.1,.yourcorp.com,gitlab.yourdomain.com"

    Save and exit. Then reload systemd daemon and restart the runner:

    sudo systemctl daemon-reload
    sudo systemctl restart gitlab-runner.service
  2. For Docker executor (if used): If your runner uses the Docker executor and Docker itself needs to pull images through a proxy, you might need to configure Docker’s daemon.json. The proxies key shown here requires Docker Engine 23.0 or later. Create or edit /etc/docker/daemon.json:

    {
      "proxies": {
        "http-proxy": "http://proxy.yourcorp.com:8080",
        "https-proxy": "http://proxy.yourcorp.com:8080",
        "no-proxy": "localhost,127.0.0.1,.yourcorp.com,gitlab.yourdomain.com"
      }
    }

    Then restart Docker:

    sudo systemctl restart docker
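
To confirm that the drop-in from item 1 took effect, you can inspect the environment systemd passes to the service. The check below is a minimal sketch; has_proxy_env is a hypothetical helper that only looks for an https_proxy (or HTTPS_PROXY) entry in the output of systemctl show.

```shell
# has_proxy_env ENV_LINE: succeed if a systemd "Environment=..." line
# defines https_proxy or HTTPS_PROXY. Hypothetical helper.
has_proxy_env() {
  case " ${1#Environment=} " in
    *" https_proxy="*|*" HTTPS_PROXY="*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   env_line=$(systemctl show gitlab-runner.service --property=Environment)
#   if has_proxy_env "$env_line"; then echo "proxy set"; else echo "no https_proxy in unit"; fi
```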

#### 7. Synchronize System Time

A significant time skew can cause issues with SSL handshakes and token validation.

sudo timedatectl set-ntp true
sudo systemctl restart systemd-timesyncd # or ntp/chrony

Verify the time is correct with date, or check synchronization state with timedatectl status.
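
To measure the skew rather than eyeball it, one option is to compare the local clock with the Date header that GitLab returns. This is a sketch under two assumptions: GNU date is available (for the -d option), and the server's own clock is correct. skew_seconds is a hypothetical helper.

```shell
# skew_seconds HTTP_DATE LOCAL_EPOCH: remote time minus local time, in seconds.
# Requires GNU date (-d), as shipped on most Linux distributions.
skew_seconds() {
  remote_epoch=$(date -u -d "$1" +%s) || return 1
  echo $(( remote_epoch - $2 ))
}

# Usage (gitlab.yourdomain.com is the placeholder host used in this guide):
#   hdr=$(curl -sI https://gitlab.yourdomain.com/ | tr -d '\r' | sed -n 's/^[Dd]ate: //p')
#   skew_seconds "$hdr" "$(date +%s)"
```

A result of a few seconds is normal; minutes or more suggests time synchronization is not working on the runner host.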

After performing these steps, your GitLab CI runner should successfully connect to your GitLab instance and pick up pending jobs. Always monitor journalctl -u gitlab-runner.service -f after making changes.