Troubleshooting Docker Container Exited with Code 137: OOM Killed Error

Diagnose and resolve Docker containers exiting with code 137 due to Out-Of-Memory (OOM) killer terminations. Optimize resource allocation and prevent service downtime.

When a Docker container stops with an Exited (137) status, particularly alongside an “OOM killed” message, its main process was killed because memory ran out, either inside the container's cgroup or on the host as a whole. This guide explains why that happens and walks through a systematic way to diagnose, resolve, and prevent Out-Of-Memory (OOM) kills so that containerized applications stay stable and performant.

Symptom & Error Signature

The primary symptom is an application or service running within a Docker container becoming unresponsive and then stopping abruptly. When inspecting the container’s status or logs, you will typically encounter the following signatures:

1. docker ps -a Output:

root@server:~# docker ps -a
CONTAINER ID   IMAGE                 COMMAND                  CREATED         STATUS                         PORTS     NAMES
a1b2c3d4e5f6   my-app:latest         "node server.js"         5 minutes ago   Exited (137) 3 minutes ago               my-app-container

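The Exited (137) status is a strong indicator, but not proof on its own. Exit codes above 128 encode a fatal signal as 128 + N, so 137 means the main process received SIGKILL (signal 9). The Linux kernel's Out-Of-Memory (OOM) killer is the most common sender, but docker kill, or docker stop after its grace period expires, produces the same code. The OOMKilled field in the container state reported by docker inspect helps tell these cases apart.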
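The 128 + N convention behind that status can be checked with a small helper. This is a minimal sketch: the function name signal_from_exit_code is hypothetical, and the commented docker inspect call assumes the container name from the example above.

```shell
# Decode an exit status above 128 into the number of the signal that ended the process.
signal_from_exit_code() {
  if [ "$1" -gt 128 ]; then
    echo $(( $1 - 128 ))
  else
    echo 0   # ordinary exit status, not a signal
  fi
}

# Example usage (the second command requires Docker):
#   signal_from_exit_code 137
#   docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' my-app-container
```

An exit code of 137 with OOMKilled reported as true points at the OOM killer; 137 with false suggests the SIGKILL came from somewhere else, such as docker kill.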

2. docker logs Output:

The container's logs usually just stop. Because SIGKILL cannot be caught, the application gets no chance to log a shutdown message or a memory error:

root@server:~# docker logs a1b2c3d4e5f6
[INFO] Application started on port 3000
[INFO] Processing request /api/data
[WARN] High memory usage detected...
<--- The logs abruptly stop here without any further application output or error handling --->

3. Kernel dmesg Output:

The most definitive evidence of an OOM kill comes from the kernel’s message buffer. This is where the OOM killer announces its actions:

root@server:~# dmesg -T | grep -i 'oom\|killed process' | tail -n 5
[Fri May 17 08:30:15 2026] node invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
[Fri May 17 08:30:15 2026] CPU: 3 PID: 12345 Comm: node Not tainted 5.15.0-89-generic #99-Ubuntu
[Fri May 17 08:30:15 2026] Mem-Info:
[Fri May 17 08:30:16 2026] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/docker/a1b2c3d4e5f6...,task_memcg=/docker/a1b2c3d4e5f6...,task=node,pid=12345,uid=0
[Fri May 17 08:30:16 2026] Memory cgroup out of memory: Killed process 12345 (node) total-vm:4194304kB, anon-rss:2097152kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:8192kB oom_score_adj:0

The oom-killer invocation and the Killed process <PID> (<process_name>) line confirm that the kernel terminated the process due to memory exhaustion. The memcg fields in the oom-kill line (task_memcg in particular) are especially useful because they usually contain the Docker container's cgroup path, and therefore its ID.
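To map a kernel log entry back to a container, the ID can be pulled out of the task_memcg field. The sketch below is an illustration under assumptions: the function name extract_container_id is hypothetical, and it expects the cgroup path to contain docker/<id> (cgroupfs driver) or docker-<id>.scope (systemd driver).

```shell
# Read kernel log lines on stdin and print the 12-character short container ID
# found in the task_memcg field of each oom-kill line.
extract_container_id() {
  sed -n -E 's|.*task_memcg=[^,]*docker[/-]([0-9a-f]{12})[0-9a-f]*.*|\1|p'
}

# Example usage (requires access to the kernel log and Docker):
#   dmesg -T | grep 'oom-kill:' | extract_container_id | sort -u
#   docker ps -a --filter "id=<short id printed above>"
```

Lines without a Docker cgroup path produce no output, so OOM kills of ordinary host processes are silently skipped.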

Root Cause Analysis

The “Docker container exited with code 137 OOM killed” error is a direct consequence of the Linux kernel’s Out-Of-Memory (OOM) killer terminating a process (or an entire cgroup, like a Docker container) because the system or the cgroup has exhausted its available memory. Understanding the underlying mechanisms is key:

Insufficient Host System Memory: The most straightforward cause. The entire Docker host (VM or bare metal) does not have enough physical RAM to accommodate all running containers and host processes. When memory runs out host-wide, the OOM killer reclaims it by terminating the process with the highest oom_score, which is usually the largest memory consumer.
Insufficient Docker Container Memory Limits: Docker uses Linux control groups (cgroups) to enforce resource limits on containers. If a container is configured with a memory limit (e.g., the --memory flag in docker run, or a memory limit in docker-compose.yml) that is lower than what the application actually needs, the OOM killer terminates a process in that container once usage reaches the cgroup's ceiling and the kernel cannot reclaim enough memory. This can happen even if the host machine has plenty of free RAM, because the kernel chooses its victim only from the cgroup that hit its limit.
Application Memory Leak or Inefficient Usage: The application running inside the container might have a memory leak, where it continuously consumes more and more RAM over time without releasing it. Alternatively, the application might be poorly optimized, using excessive memory for specific tasks, processing large datasets, or loading entire libraries into memory unnecessarily. A sudden spike in legitimate memory usage (e.g., handling a large number of concurrent requests, processing a massive file upload) can also trigger the OOM killer if the allocated memory limits are too tight.
Swap Space Depletion (or lack thereof): While not a direct cause of OOM kills, insufficient or non-existent swap space can exacerbate memory pressure. When physical RAM is exhausted, the kernel typically moves less-used pages to swap. If swap is full or unavailable, that relief valve is gone and the OOM killer is invoked sooner.
Cgroup V1 vs V2 Behavior: Modern Linux distributions commonly use cgroup v2 (Ubuntu has defaulted to it since 21.10, so 22.04 LTS and newer use it). The core concept of memory limits and OOM killing is unchanged, but the control files are named differently (for example, memory.max on v2 versus memory.limit_in_bytes on v1) and the reporting differs slightly. Docker handles both transparently, but the difference matters for advanced debugging.
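Before reading any cgroup files it helps to know which hierarchy is mounted. The following is a minimal sketch, assuming the usual /sys/fs/cgroup mount point; the function name cgroup_version is hypothetical. It relies on the cgroup.controllers file, which exists at the root of a cgroup v2 hierarchy.

```shell
# Print "v2" if the given cgroup mount looks like a unified (v2) hierarchy, else "v1".
cgroup_version() {
  if [ -f "${1:-/sys/fs/cgroup}/cgroup.controllers" ]; then
    echo v2
  else
    echo v1
  fi
}

# Example usage:
#   cgroup_version                 # checks /sys/fs/cgroup
#   docker info --format '{{.CgroupVersion}}'   # Docker's own view, for comparison
```

A host running in hybrid mode mounts the v2 hierarchy elsewhere, so this sketch reports v1 there.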

Step-by-Step Resolution

Resolving OOM kills requires a methodical approach, starting with identifying the immediate cause and then moving towards long-term optimizations.

1. Confirm the OOM Event and Identify the Culprit

First, re-verify that an OOM event indeed occurred and pinpoint which process was killed.

# Check the kernel message buffer for OOM events
dmesg -T | grep -i 'oom\|killed process' | less

Examine the output for lines like oom-kill:constraint=CONSTRAINT_MEMCG or Killed process <PID> (<process_name>). The task_memcg field usually contains the Docker container's cgroup path, confirming it was the target. Of the reported sizes, anon-rss is the most telling: it is the anonymous memory the process actually had resident when it was killed. total-vm is the size of its virtual address space, which is often far larger than real usage.
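The constraint value also tells you whether a container limit or the host as a whole ran out of memory. The helper below is a rough sketch with hypothetical names; CONSTRAINT_MEMCG and CONSTRAINT_NONE are the kernel's own labels for a cgroup-limit OOM and a host-wide OOM respectively.

```shell
# Classify a single oom-kill log line by the constraint the kernel reported.
classify_oom() {
  case "$1" in
    *constraint=CONSTRAINT_MEMCG*) echo "cgroup-limit" ;;  # a container hit its own limit
    *constraint=CONSTRAINT_NONE*)  echo "host-wide" ;;     # the whole host ran out of memory
    *)                             echo "other" ;;
  esac
}

# Example usage:
#   dmesg | grep 'oom-kill:' | while IFS= read -r line; do classify_oom "$line"; done
```

A cgroup-limit result points toward Steps 4 and 6 (limits and application usage); a host-wide result points toward Step 5 (host memory).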

2. Monitor Container and Host Memory Usage

Before making changes, gather data on current memory consumption.

a. Real-time Docker Container Stats:

Use docker stats to see live memory usage, limits, and percentage.

docker stats --no-stream my-app-container

CONTAINER ID   NAME               CPU %     MEM USAGE / LIMIT     MEM %     NET I/O     BLOCK I/O   PIDS
a1b2c3d4e5f6   my-app-container   0.50%     1.897GiB / 2GiB       94.85%    1.5MB / 0B  0B / 0B     12

A single snapshot shows how close the container is to its defined limit. Drop --no-stream, or sample repeatedly, to see whether usage stays near the limit or only spikes there.
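For a quick check across many containers, the MEM % column can be filtered against a threshold. This is a sketch under assumptions: the function name flag_high_memory and the 90% default are illustrative choices, and the input is expected in the "name percentage" form produced by the docker stats format string shown in the usage comment.

```shell
# Read "name percent%" lines on stdin and print the names at or above the threshold.
flag_high_memory() {
  awk -v t="${1:-90}" '{
    pct = $2
    sub(/%$/, "", pct)               # strip the trailing percent sign
    if (pct + 0 >= t + 0) print $1
  }'
}

# Example usage (requires Docker):
#   docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | flag_high_memory 90
```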

b. Host System Memory:

Check the overall memory usage of the Docker host.

free -h

               total        used        free      shared  buff/cache   available
Mem:            7.8Gi       6.5Gi       235Mi       1.2Gi       1.1Gi       100Mi
Swap:           2.0Gi       1.5Gi       500Mi

If available memory is consistently low, the host itself is under memory pressure.
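The same figure can be read in script form from /proc/meminfo, which is where free gets its numbers. A minimal sketch, with a hypothetical function name; the optional argument exists only so the parser can be pointed at a saved copy of the file.

```shell
# Print the kernel's MemAvailable estimate in MiB.
mem_available_mib() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Example usage:
#   mem_available_mib              # current host value
#   watch -n 5 'grep MemAvailable /proc/meminfo'
```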

c. Advanced Monitoring (e.g., Prometheus/Grafana, cAdvisor):

For long-term trends and historical data, integrate monitoring tools like cAdvisor, Prometheus, and Grafana. These tools provide invaluable insights into memory usage patterns over time, helping you identify peak loads or slow memory leaks.

3. Analyze Application Memory Footprint (Inside the Container)

If the container runs long enough, you can inspect its internal memory usage.

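# Execute a shell inside the running container (if it's not immediately OOM killed)
# Use /bin/sh instead if the image does not ship bash (e.g., Alpine-based images)
docker exec -it my-app-container /bin/bash

# Inside the container:
apt update && apt install -y procps # Debian/Ubuntu-based images only: installs ps and free if not present
ps aux --sort -rss | head -n 5     # Header plus the top processes by Resident Set Size (RSS)
free -h                            # Caution: reads /proc/meminfo, so it normally shows HOST totals, not the container's limit
cat /sys/fs/cgroup/memory.max      # The container's actual limit on cgroup v2 (v1: /sys/fs/cgroup/memory/memory.limit_in_bytes)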
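Inside a container, free normally reports the host's totals because /proc/meminfo is not cgroup-aware, so the cgroup control files are the reliable source for the container's limit. The sketch below reads the limit on either cgroup version; the function name is hypothetical and the paths assume the standard mount point inside the container.

```shell
# Print the memory limit that applies to the current cgroup.
cgroup_memory_limit() {
  root="${1:-/sys/fs/cgroup}"
  if [ -f "$root/memory.max" ]; then
    cat "$root/memory.max"                     # cgroup v2: bytes, or "max" if unlimited
  elif [ -f "$root/memory/memory.limit_in_bytes" ]; then
    cat "$root/memory/memory.limit_in_bytes"   # cgroup v1: bytes
  else
    echo "unknown"
  fi
}

# Example usage inside the container:
#   cgroup_memory_limit
#   cat /sys/fs/cgroup/memory.events   # cgroup v2 only: includes an oom_kill counter
```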

For language-specific applications:

Node.js: Use process.memoryUsage() or tools like heapdump.
Python: memory_profiler or objgraph.
Java: jstat, jmap (requires JDK inside container or attach from host if compatible).
PHP: memory_get_usage().

4. Increase Docker Container Memory Limits

This is often the quickest fix if the application legitimately needs more memory than allocated.

a. For docker run:

> [!IMPORTANT]
> Always specify `memory-swap` along with `memory`. `--memory-swap` is the total of memory plus swap, not the swap amount alone. If it is omitted while `--memory` is set, Docker sets it to `memory * 2`, meaning the container can use `memory` plus an equal amount of swap (when the host has swap configured). To keep the container from swapping at all, set `memory-swap` equal to `memory` (e.g., `--memory=1.5g --memory-swap=1.5g`). Note that `--memory-swap=0` does not disable swap: Docker treats it as unset. A value of `-1` allows unlimited swap.

# Stop and remove the old container
docker stop my-app-container && docker rm my-app-container

# Run with increased memory (e.g., 2GB physical RAM, no swap)
docker run -d --name my-app-container --memory="2g" --memory-swap="2g" my-app:latest
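Because --memory-swap is a combined total, the swap a container may use is the difference between the two flags. The helper below is only a worked-arithmetic sketch with a hypothetical name; both arguments must be in the same unit (MiB here).

```shell
# Swap available to a container: --memory-swap minus --memory (same unit for both).
swap_allowance() {
  echo $(( $2 - $1 ))
}

# Example usage:
#   swap_allowance 2048 2048   # equal values: the container cannot swap
#   swap_allowance 2048 3072   # 1 GiB of swap on top of 2 GiB of RAM
# Limits can also be changed on an existing container (requires Docker):
#   docker update --memory 2g --memory-swap 2g my-app-container
```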

b. For docker-compose.yml:

version: '3.8'
services:
  my-app:
    image: my-app:latest
    ports:
      - "80:3000"
    deploy: # Honored by Docker Compose v2 and Swarm; legacy docker-compose v1 needs --compatibility
      resources:
        limits:
          memory: 2G # Hard limit for the container
        reservations:
          memory: 1G # Soft reservation, not a hard guarantee
    restart: always

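limits.memory: The maximum amount of memory the container can use. If the container exceeds it, the OOM killer terminates a process inside the container.
reservations.memory: A soft reservation. Swarm uses it for scheduling decisions; on a single host it acts as a soft limit that the kernel tries to enforce only when the host is under memory pressure, so it does not guarantee that the memory is available.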
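After changing limits it is worth confirming what the runtime actually applied, since an ignored deploy section fails silently. docker inspect reports the limit in bytes, with 0 meaning no limit. The conversion helper below is a small sketch with a hypothetical name.

```shell
# Turn the byte value from HostConfig.Memory into a readable limit.
describe_limit() {
  if [ "$1" -eq 0 ]; then
    echo "unlimited"
  else
    echo "$(( $1 / 1048576 )) MiB"
  fi
}

# Example usage (requires Docker Compose v2; the service name comes from the file above):
#   cid="$(docker compose ps -q my-app)"
#   describe_limit "$(docker inspect --format '{{.HostConfig.Memory}}' "$cid")"
```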

5. Increase Host System Memory or Swap Space

If all containers combined are pressuring the host, you need more resources.

a. Upgrade Physical RAM: The most effective long-term solution. Increase the RAM of your VM or physical server.

b. Increase Swap Space (Use with Caution): While not a substitute for RAM, swap can prevent hard OOM kills by providing an overflow mechanism. However, relying heavily on swap will degrade performance significantly. A container benefits from host swap only if its --memory-swap setting is larger than its --memory setting.

> [!WARNING]
> Excessive swap usage can severely impact system performance and disk I/O. This is a temporary measure or for systems with occasional, non-performance-critical memory spikes.

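# Check current swap status
sudo swapon --show
free -h

# Create a new swap file (e.g., 4GB)
# If swapon rejects a fallocate-created file on your filesystem, create it with dd instead:
#   sudo dd if=/dev/zero of=/swapfile bs=1M count=4096
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile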

# Make swap persistent across reboots by adding to /etc/fstab
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

# Adjust swappiness (optional, lower value means kernel tries to keep more data in RAM)
# Current swappiness (default is 60 on Ubuntu)
cat /proc/sys/vm/swappiness
# Set swappiness to 10 (less aggressive swapping)
sudo sysctl vm.swappiness=10
echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
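The tee -a commands above append a new line every time they run, so repeating the procedure leaves duplicate entries in /etc/fstab and /etc/sysctl.conf. A small guard avoids that. This is a sketch with a hypothetical function name; it must run with write access to the target file (for the system files, as root).

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
  grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Example usage (run as root):
#   append_once '/swapfile none swap sw 0 0' /etc/fstab
#   append_once 'vm.swappiness=10' /etc/sysctl.conf
```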

6. Optimize Application Memory Usage

This is a fundamental solution for long-term stability and efficiency.

a. Identify and Fix Memory Leaks:

Use language-specific profiling tools (as mentioned in Step 3) to identify objects that are not being garbage collected or released.
Review application code, especially in loops, data structures, and resource handling (file descriptors, database connections).

b. Efficient Data Handling:

Process large files or datasets in chunks instead of loading them entirely into memory.
Use streaming APIs where possible.
Optimize database queries to fetch only necessary data.
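The difference between loading and streaming can be illustrated even in shell, used here only for consistency with the rest of this guide; the same principle applies in whatever language the application is written in. The sketch below (hypothetical function name) processes input one line at a time, so memory use stays flat regardless of input size, in contrast to reading the whole file into a variable first.

```shell
# Count the characters of all lines on stdin while holding only one line in memory.
total_line_chars() {
  total=0
  while IFS= read -r line || [ -n "$line" ]; do
    total=$(( total + ${#line} ))
  done
  echo "$total"
}

# Example usage:
#   total_line_chars < /var/log/syslog
# Memory-hungry alternative to avoid for large inputs:
#   data="$(cat /var/log/syslog)"; echo "${#data}"
```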

c. Configure Application-Specific Memory Settings:

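Java: Adjust JVM heap size (-Xmx, -Xms) within the container, keeping -Xmx below the container limit to leave room for non-heap memory.
PHP-FPM: Configure memory_limit in php.ini (or php_admin_value[memory_limit] in the FPM pool configuration), and bound total usage through the pool's worker count (pm.max_children).
Node.js: V8 caps the old-generation heap by default (the exact value depends on the Node.js version and available memory, often around 1.5-2GB on 64-bit systems). The cap can be raised with --max-old-space-size, which takes a value in MB.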

# Example for Node.js in a Dockerfile: allows the V8 heap to grow to about 3GB.
# The container's memory limit must be higher than this value, or the container is OOM killed first.
CMD ["node", "--max-old-space-size=3072", "server.js"]
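An application heap cap only helps if it sits below the container's memory limit, leaving headroom for stacks, buffers, and other non-heap memory. The sketch below derives a heap size from the limit. The function name and the 75% default are assumptions chosen for illustration, not an official recommendation.

```shell
# Derive a heap size in MB as a percentage of the container limit given in bytes.
heap_mb_for_limit() {
  limit_bytes="$1"
  percent="${2:-75}"   # assumed headroom rule: heap gets 75% of the limit by default
  echo $(( limit_bytes / 1048576 * percent / 100 ))
}

# Example usage for a 2 GiB container limit:
#   node --max-old-space-size="$(heap_mb_for_limit 2147483648)" server.js
```

The input must be a number of bytes; a cgroup v2 limit file containing "max" (no limit) needs to be handled separately.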

7. Adjust OOM Score (Advanced, Use with Extreme Caution)

The OOM killer computes an oom_score for each process and kills the one with the highest score. The oom_score_adj setting shifts that score up or down, and Docker lets you set it per container.

> [!WARNING]
> Modifying `oom_score_adj` can have severe consequences. A low score might protect your application container, but it could lead the OOM killer to terminate critical system processes or other essential services, potentially destabilizing or crashing the entire host. Use only if you fully understand the implications and have exhausted other options.

# For docker run
docker run -d --name my-app-container --memory="2g" --memory-swap="2g" --oom-score-adj=500 my-app:latest

# For docker-compose.yml
version: '3.8'
services:
  my-app:
    image: my-app:latest
    oom_score_adj: 500 # Range from -1000 (least likely) to 1000 (most likely)

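A value of 0 is the default. Negative values make the process less likely to be killed and positive values make it more likely, so the 500 used above marks this container as a preferred victim when the host runs out of memory. Setting it to -1000 exempts the process from OOM killing entirely. Note that oom_score_adj mainly influences which process dies under host-wide memory pressure; when a container hits its own memory limit, the victim is still chosen from inside that container's cgroup.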
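Since values outside the accepted range are rejected, a wrapper script can validate the adjustment before passing it to docker run. This is a minimal sketch with a hypothetical function name; the commented command at the end reads back the value the kernel actually holds for the container's main process.

```shell
# Succeed only if the argument is an integer between -1000 and 1000.
valid_oom_score_adj() {
  case "$1" in
    ''|*[!0-9-]*|?*-*) return 1 ;;   # empty, non-numeric, or a misplaced minus sign
  esac
  [ "$1" != "-" ] && [ "$1" -ge -1000 ] && [ "$1" -le 1000 ]
}

# Example usage:
#   valid_oom_score_adj 500 && docker run -d --oom-score-adj=500 my-app:latest
# Read back the effective value (requires Docker and a running container):
#   cat "/proc/$(docker inspect --format '{{.State.Pid}}' my-app-container)/oom_score_adj"
```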

8. Review Docker Daemon Configuration

Ensure the Docker daemon itself isn’t operating under resource constraints, although this is less common for OOM kills within containers.

systemctl status docker

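Check /etc/docker/daemon.json for settings that affect resource handling or enable experimental features. For example, default-ulimits changes the default ulimits (such as file descriptor limits) applied to every container, which can indirectly affect memory use, and cgroup-parent places containers under a parent cgroup whose own limits then apply.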

Applied systematically, these steps let you diagnose, resolve, and prevent code 137 OOM kills, keeping your containerized services stable and reliable.