Resolving High CPU & Disk I/O: systemd-journald and Log Rotation Issues on Linux
Address systemd-journald consuming excessive CPU and disk I/O due to aggressive logging or misconfigured log rotation, improving system performance and stability.
When managing production Linux servers, a common performance bottleneck can arise from excessive logging, particularly when systemd-journald begins consuming an inordinate amount of CPU cycles or disk I/O. This situation often manifests as system slowdowns, high load averages, and unresponsive applications, directly impacting user experience and service availability. This guide provides a comprehensive, step-by-step approach to diagnose and resolve such issues.
Symptom & Error Signature
The primary symptom is a noticeable degradation in system performance, often accompanied by alerts from monitoring systems about high CPU utilization or disk I/O wait times. When inspecting the system, you’ll typically observe the systemd-journald process consuming significant resources.
Typical top/htop output showing high CPU:
Tasks: 201 total, 1 running, 200 sleeping, 0 stopped, 0 zombie
%Cpu(s): 15.3 us, 5.7 sy, 0.0 ni, 78.4 id, 0.6 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7969.5 total, 3489.6 free, 1234.4 used, 3245.5 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 6091.7 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
354 systemd-+ 20 0 1373516 116840 24760 S 100.0 1.4 13:45.12 systemd-journal
1123 root 20 0 268388 18376 13304 S 0.7 0.2 0:15.21 sshd
1456 www-data 20 0 468968 24508 9408 S 0.3 0.3 0:02.11 nginx
Typical iotop output showing high disk write activity:
Total DISK READ: 0.00 B/s | Total DISK WRITE: 203.49 M/s
Current DISK READ: 0.00 B/s | Current DISK WRITE: 203.49 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
354 be/4 systemd-+ 0.00 B/s 203.49 M/s 0.00 % 99.99 % systemd-journald
987 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/u16:0-events]
1001 be/4 www-data 0.00 B/s 0.00 B/s 0.00 % 0.00 % nginx: worker process
You might also observe journalctl --disk-usage reporting an unexpectedly large journal size:
journalctl --disk-usage
Journals take up 3.7G on disk.
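To alert on journal size from a script, the human-readable figure can be converted to a number. The following is a minimal sketch; the function name journal_usage_mib is hypothetical, and it assumes the size appears as a number followed by a single unit letter, as in the output above.

```shell
# Read `journalctl --disk-usage` output on stdin and print the size in MiB.
# Assumes the size is printed as a number plus one unit letter (e.g. 3.7G).
journal_usage_mib() {
    grep -oE '[0-9]+(\.[0-9]+)?[BKMGT]' | head -n 1 | awk '
        {
            unit = substr($0, length($0), 1)
            value = substr($0, 1, length($0) - 1) + 0
            if (unit == "B") value /= 1024 * 1024
            else if (unit == "K") value /= 1024
            else if (unit == "G") value *= 1024
            else if (unit == "T") value *= 1024 * 1024
            # unit "M" needs no conversion
            printf "%.0f\n", value
        }'
}

# Usage: journalctl --disk-usage | journal_usage_mib
```

The result can then be compared against a threshold of your choosing in a cron job or monitoring check.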
Root Cause Analysis
The root causes for systemd-journald consuming excessive resources are typically multifaceted, often stemming from a combination of aggressive logging, misconfiguration, or underlying system issues.
- Excessive Log Verbosity:
- Application Misconfiguration: Debug-level logging enabled in production for applications (e.g., Nginx, Apache, PHP-FPM, Docker containers, custom applications), which can produce thousands of log entries per second.
- Rapid Error Loops: An application or service crashing and restarting repeatedly, flooding the journal with error messages and startup sequences.
- Security Events: Brute-force attacks or misconfigured security tools generating a high volume of authentication failures or firewall rejections.
- Misconfigured systemd-journald Limits: By default, journald caps persistent journals at 10% of the file system (at most 4G). On high-traffic servers or hosts with limited disk space, that can still mean several gigabytes of journal files and frequent, expensive rotation.
- Inefficient Log Rotation: While systemd-journald manages its own log rotation, other services may write directly to /var/log (outside of systemd’s direct control). If logrotate is misconfigured or failing for those files, the resulting disk I/O or disk-full conditions indirectly affect journald’s ability to prune.
- Disk Full Conditions: If the /var/log partition (or root partition) is full, journald may struggle to write new entries or prune old ones efficiently, leading to resource contention.
- Kernel/Systemd Bugs: While less common, specific versions of systemd or the Linux kernel can occasionally exhibit performance regressions or bugs related to journald’s I/O handling, especially under heavy load.
- Underlying Disk I/O Performance: A slow or failing disk subsystem can make journald’s legitimate logging activity appear disproportionately resource-intensive.
Step-by-Step Resolution
Addressing this issue requires a systematic approach, starting with identifying the source of logging and then configuring journald and other applications appropriately.
1. Identify the Source of Excessive Logging
The first step is to pinpoint which application or service is generating the log spam.
Monitor live log output:
journalctl -f
This command streams new log entries. Look for recurring patterns, specific service names, or IP addresses that appear frequently.
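Watching a live stream makes it hard to judge volume. As a rough sketch, the entry rate can be bucketed per minute from the short-iso output format; the function name entries_per_minute is hypothetical.

```shell
# Count journal entries per minute from `journalctl -o short-iso` output on stdin.
entries_per_minute() {
    awk '
        # Entry lines begin with an ISO timestamp such as 2024-01-01T12:34:56+0000;
        # boot markers and indented continuation lines are ignored.
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T/ { count[substr($1, 1, 16)]++ }
        END { for (minute in count) print minute, count[minute] }
    ' | sort
}

# Usage: journalctl --since "10 minutes ago" -o short-iso | entries_per_minute
```

A sudden jump between minutes usually points at the moment a service started misbehaving.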
Analyze recent errors/warnings:
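journalctl -p warning -b
This shows messages from the current boot at priority warning or more severe. The -p option takes a single level or a range, and a level such as warning already includes err, crit, alert, and emerg. High counts of these might indicate a looping failure.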
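A looping failure tends to repeat the same message text. One way to surface it, sketched here with a hypothetical helper name, is to count identical message lines:

```shell
# Print the N most frequently repeated lines read from stdin (default 5).
top_repeated_messages() {
    sort | uniq -c | sort -rn | head -n "${1:-5}"
}

# Usage (-o cat prints only the message text, without timestamps):
#   journalctl -p warning -b -o cat | top_repeated_messages 10
```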
Check disk usage by journal files:
sudo du -sh /var/log/journal/
This confirms the current size of the journal files.
Identify top journal contributors (advanced):
To see which units are logging the most, you can use a combination of journalctl and text processing. This is not directly available but can be approximated.
First, get a count of entries per unit for a recent period:
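journalctl --since "1 hour ago" -o export | grep -a '^_SYSTEMD_UNIT=' | cut -d'=' -f2- | sort | uniq -c | sort -nr | head -n 10
The default output format does not print the _SYSTEMD_UNIT field, so the command uses -o export, which emits one FIELD=value line per field. The result shows which systemd units (services) generated the most log entries in the last hour.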
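Not every noisy process runs as its own unit, so it can help to count by other journal fields as well. The sketch below generalizes the idea to any field name; count_journal_field is a hypothetical helper, and it expects export-format output on stdin.

```shell
# Count the values of one journal field in `journalctl -o export` output (stdin).
# $1 = field name (e.g. _SYSTEMD_UNIT, SYSLOG_IDENTIFIER), $2 = rows to show.
count_journal_field() {
    local field="$1"
    local limit="${2:-10}"
    # -a treats the stream as text even if an entry carries binary data.
    grep -a "^${field}=" | cut -d'=' -f2- | sort | uniq -c | sort -rn | head -n "$limit"
}

# Usage: journalctl --since "1 hour ago" -o export | count_journal_field SYSLOG_IDENTIFIER
```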
2. Configure systemd-journald Log Retention Policies
The journald configuration controls how much disk space the journal uses and how long entries are kept.
Edit the journald configuration file:
sudo vim /etc/systemd/journald.conf
Uncomment and set appropriate values for the following parameters. Here are recommended settings for a typical production server:
[Journal]
# Store the journal on disk. The default is "auto", which persists only if /var/log/journal exists.
Storage=persistent
# Maximum size of all journal files on disk.
SystemMaxUse=1G
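# Keep at least this much disk space free. journald expects an absolute size here
# (K, M, G, ...); when unset, it defaults to 15% of the file system.
SystemKeepFree=2G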
# Maximum individual journal file size.
SystemMaxFileSize=100M
# Max size of journal files in /run/log/journal (volatile, for boot).
RuntimeMaxUse=100M
# Retain journal entries for a maximum of 30 days.
MaxRetentionSec=30day
[!IMPORTANT] The SystemMaxUse and SystemKeepFree directives are crucial. SystemMaxUse sets an absolute maximum for the total size of all journal files. SystemKeepFree sets how much disk space journald must leave free for other uses. journald honors both limits, so whichever is more restrictive applies. Adjust SystemMaxUse based on your disk capacity and logging needs (e.g., 1G-5G is common).
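As an alternative to editing journald.conf in place, systemd also reads drop-in files from /etc/systemd/journald.conf.d/, which keeps local changes separate from the packaged file. The following is a sketch only; the function name and the file name 90-limits.conf are illustrative choices, not anything journald requires.

```shell
# Write journald limits as a drop-in file instead of editing journald.conf.
# $1 = target directory (defaults to the standard drop-in location).
write_journald_dropin() {
    local dir="${1:-/etc/systemd/journald.conf.d}"
    mkdir -p "$dir"
    cat > "$dir/90-limits.conf" <<'EOF'
[Journal]
SystemMaxUse=1G
SystemMaxFileSize=100M
MaxRetentionSec=30day
EOF
}

# Usage (as root): write_journald_dropin && systemctl restart systemd-journald
```

On reasonably recent systemd versions, systemd-analyze cat-config systemd/journald.conf should show the merged result.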
Apply changes by restarting systemd-journald:
sudo systemctl restart systemd-journald
Manually purge old journal entries (optional but recommended initially):
# Trim journal files to a maximum of 1GB
sudo journalctl --vacuum-size=1G
# Trim journal files older than 7 days
sudo journalctl --vacuum-time=7d
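Vacuuming only removes archived journal files, so the active file is left alone until it is rotated. A small wrapper, sketched below, rotates first and then reports the new usage. The function name and the DRY_RUN variable are assumptions made for illustration; with DRY_RUN set, the commands are printed instead of executed.

```shell
# Rotate the active journal, vacuum to a target size, then report disk usage.
# $1 = target size (default 1G). Set DRY_RUN=1 to print the commands only.
vacuum_journal() {
    local size="${1:-1G}"
    local run=""
    [ -n "${DRY_RUN:-}" ] && run="echo"
    $run journalctl --rotate
    $run journalctl --vacuum-size="$size"
    $run journalctl --disk-usage
}

# Usage (as root): vacuum_journal 1G
```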
3. Adjust Application Logging Verbosity
Once you’ve identified the source, adjust the logging level for those applications.
For Nginx:
If Nginx access logs are flooding your system, consider disabling them for specific static assets or reducing their verbosity.
To disable access logs for specific locations (e.g., static files):
# /etc/nginx/nginx.conf or a site-specific conf file
server {
listen 80;
server_name example.com;
access_log /var/log/nginx/example.com_access.log; # Main access log
location ~* \.(jpg|jpeg|gif|png|ico|css|js)$ {
access_log off; # Disable access logs for static assets
expires 30d;
add_header Cache-Control "public";
root /var/www/example.com/html;
}
# ... other configurations
}
To reduce Nginx error log verbosity:
# /etc/nginx/nginx.conf
error_log /var/log/nginx/error.log warn; # Change 'info' or 'notice' to 'warn' or 'error'
After modifying Nginx configuration, test and reload:
sudo nginx -t
sudo systemctl reload nginx
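Before disabling access logs for static assets, it may be worth estimating how much of the log they account for. This sketch assumes the standard combined log format, where the request path is the seventh whitespace-separated field; static_request_share is a hypothetical name.

```shell
# Print the percentage of requests in a combined-format access log (stdin)
# whose path ends in a static-asset extension.
static_request_share() {
    awk '
        {
            total++
            path = $7
            sub(/\?.*/, "", path)   # drop any query string
            if (tolower(path) ~ /\.(jpg|jpeg|gif|png|ico|css|js)$/) hits++
        }
        END { if (total) printf "%.1f\n", 100 * hits / total; else print "0.0" }
    '
}

# Usage: static_request_share < /var/log/nginx/example.com_access.log
```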
For Docker Containers:
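Docker containers can generate a huge amount of logs. With the default json-file driver these are written to per-container files under /var/lib/docker/containers; with the journald or syslog driver they are forwarded to the system logger and can end up in the journal.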
Configure Docker daemon-wide logging limits:
Edit /etc/docker/daemon.json (create if it doesn’t exist):
{
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "3"
}
}
This sets a default log limit of 10MB per file, keeping 3 files, for a total of 30MB per container.
[!IMPORTANT] After modifying daemon.json, you must restart the Docker daemon: sudo systemctl restart docker. The new defaults apply only to containers created after the restart. Existing containers keep the logging configuration they were created with, so you’ll need to stop, remove, and recreate them for the new options to take effect.
Configure logging for specific containers:
For individual containers, you can override the daemon defaults:
docker run -d \
--log-opt max-size=5m \
--log-opt max-file=2 \
your-image-name
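To see which containers would benefit most from these limits, you can list the largest json-file logs. This is a sketch that assumes the default Docker data root of /var/lib/docker; largest_container_logs is a hypothetical helper.

```shell
# List the largest json-file container logs, biggest first (size in KiB, path).
# $1 = containers directory (default assumes the standard Docker data root),
# $2 = number of rows to show.
largest_container_logs() {
    local dir="${1:-/var/lib/docker/containers}"
    find "$dir" -type f -name '*-json.log' -exec du -k {} + | sort -rn | head -n "${2:-10}"
}

# Usage (as root): largest_container_logs
# The driver a given container uses can be checked with:
#   docker inspect --format '{{.HostConfig.LogConfig.Type}}' <container>
```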
For Custom Applications:
Check the configuration of your custom applications (e.g., Python, Node.js, Java) and ensure they are not logging at DEBUG or INFO level in production unless absolutely necessary. Switch to WARN or ERROR levels.
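When an application's own log level cannot easily be changed, systemd can filter or rate-limit a unit's output before it reaches the journal. The sketch below writes a unit drop-in using LogLevelMax= and the LogRateLimit* settings, which require a reasonably recent systemd (roughly v240 or later for the rate-limit options). The unit name myapp.service, the file name, and the chosen values are all placeholders.

```shell
# Write a drop-in that caps log severity and rate for one systemd unit.
# $1 = unit name, $2 = base directory (defaults to /etc/systemd/system).
write_unit_log_limits() {
    local unit="$1"
    local base="${2:-/etc/systemd/system}"
    mkdir -p "$base/$unit.d"
    cat > "$base/$unit.d/10-log-limits.conf" <<'EOF'
[Service]
# Drop messages less severe than warning before they reach the journal.
LogLevelMax=warning
# Allow at most 200 messages per 30 seconds from this unit.
LogRateLimitIntervalSec=30s
LogRateLimitBurst=200
EOF
}

# Usage (as root, with a hypothetical unit name):
#   write_unit_log_limits myapp.service
#   systemctl daemon-reload && systemctl restart myapp.service
```

This treats the symptom rather than the cause, so fixing the application's log level remains preferable where possible.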
4. Verify logrotate for Non-Journald Logs
While systemd-journald manages its own logs, many applications still write directly to files in /var/log (e.g., Nginx access/error logs if configured directly, database logs, older applications). Ensure logrotate is correctly configured and running for these.
Check logrotate configuration:
ls -l /etc/logrotate.d/
Review the configuration files for your services (e.g., nginx, mysql, apache2).
Example logrotate config for Nginx (/etc/logrotate.d/nginx):
/var/log/nginx/*.log {
daily
missingok
rotate 14
compress
delaycompress
notifempty
create 0640 www-data adm
sharedscripts
prerotate
if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
run-parts /etc/logrotate.d/httpd-prerotate; \
fi \
endscript
postrotate
invoke-rc.d nginx rotate >/dev/null 2>&1
endscript
}
Ensure rotate and daily/weekly directives are set to appropriate values.
Manually run logrotate in debug mode to test:
sudo logrotate -d /etc/logrotate.conf
This command will simulate a rotation without making changes, showing you what would happen.
Force logrotate to run:
sudo logrotate -f /etc/logrotate.conf
This can be useful to immediately clean up old logs if disk space is critical.
[!WARNING] Forcing logrotate can sometimes cause issues if a service is actively writing to a log file and is not properly signaled to reopen its log file after rotation. Always ensure the postrotate script correctly reloads/restarts the service.
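Large files that logrotate has missed are often the quickest sign of a broken rotation rule. A minimal sketch for finding them follows; the helper name and the 100M default are arbitrary, and the journal directory is skipped because journald manages it separately.

```shell
# List regular files under a log directory that exceed a size threshold.
# $1 = directory (default /var/log), $2 = minimum size in find(1) syntax (default 100M).
find_large_logs() {
    local dir="${1:-/var/log}"
    local min="${2:-100M}"
    find "$dir" -xdev -type f -size "+$min" ! -path '*/journal/*' 2>/dev/null
}

# Usage: sudo find_large_logs is not possible for a shell function, so run it
# from a root shell, or paste the find command with sudo in front.
```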
5. Check Disk Space and I/O Performance
Ensure there are no underlying disk issues or full partitions preventing efficient log management.
Check disk space:
df -h
Look for any partitions at 90% or higher, especially / or /var.
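The same check can be scripted. The sketch below filters the POSIX-format output of df -P for file systems at or above a usage threshold; full_filesystems is a hypothetical name, and mount points containing spaces are not handled.

```shell
# Read `df -P` output on stdin and print mount points at or above a usage threshold.
# $1 = threshold in percent (default 90).
full_filesystems() {
    local threshold="${1:-90}"
    awk -v t="$threshold" '
        NR > 1 {
            used = $5
            sub(/%/, "", used)
            if (used + 0 >= t + 0) print $6, $5
        }'
}

# Usage: df -P | full_filesystems 90
```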
Monitor disk I/O performance:
iostat -x 1 10
This command provides detailed I/O statistics every second for 10 iterations. Look at %util (percentage of time the device was busy) and await (average time for I/O requests; recent sysstat versions report it split into r_await and w_await) for your disk device (e.g., sda, nvme0n1). High values indicate I/O bottlenecks.
vmstat 1 10
Focus on the wa (wait for I/O) column under cpu. A persistently high wa percentage indicates the CPU is spending a lot of time waiting for disk operations to complete.
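To put a number on "persistently high", the wa column can be averaged over the sampling window. This sketch locates the column by its header name instead of assuming a fixed position, and skips the first sample because vmstat's first report covers the time since boot; avg_iowait is a hypothetical helper.

```shell
# Average the "wa" column of vmstat output read from stdin.
avg_iowait() {
    awk '
        # Header line: locate the "wa" column by name.
        $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "wa") col = i; next }
        # Data lines: skip the first sample (averages since boot), sum the rest.
        col && $1 ~ /^[0-9]+$/ { if (seen++) { sum += $col; n++ } }
        END { if (n) printf "%.1f\n", sum / n; else print "n/a" }
    '
}

# Usage: vmstat 1 10 | avg_iowait
```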
If these tools indicate a struggling disk, consider upgrading your storage, optimizing database I/O, or investigating hardware faults.
6. Upgrade systemd (If Applicable)
In rare cases, a specific version of systemd might have a bug contributing to the issue. If all other steps fail and you suspect a software bug, consider upgrading systemd to the latest stable version available for your distribution. On Debian or Ubuntu, the following upgrades only the systemd package:
sudo apt update
sudo apt install --only-upgrade systemd
Always review release notes for systemd updates for any breaking changes or known issues before upgrading in a production environment.
By systematically applying these troubleshooting steps, you can identify the root cause of systemd-journald’s high resource consumption and implement lasting solutions to maintain your system’s performance and stability.
