Resolving 'Read-Only File System' Errors on Linux: A Comprehensive Guide to Filesystem Corruption Checks

Facing a read-only Linux filesystem? This expert guide details root causes like disk corruption and hardware failure, offering step-by-step resolution for common mount drive errors.

When a Linux filesystem unexpectedly switches to a read-only state, it’s a critical indicator of underlying issues that demand immediate attention. This condition prevents any new data from being written or existing files from being modified, effectively halting most applications and services running on the affected partition. Users typically encounter “Read-only file system” errors when attempting common operations like creating files, installing software, or even writing logs, leading to system instability and service outages. This guide, crafted by an experienced Systems Administrator, will walk you through the diagnostic and resolution steps to address read-only filesystem errors caused by drive corruption or other critical issues.

Symptom & Error Signature

The most prominent symptom is the inability to write to the affected filesystem. You’ll encounter specific error messages when performing write operations. System logs will often provide more granular details about the kernel’s decision to remount the filesystem as read-only.

Typical Error Messages:

touch: cannot touch 'newfile.txt': Read-only file system
mv: cannot move 'oldfile.txt' to 'new_location/oldfile.txt': Read-only file system
E: Could not open lock file /var/lib/dpkg/lock-frontend - open (30: Read-only file system)

Kernel Log (dmesg) Output:

[  123.456789] EXT4-fs (sda1): Remounting filesystem read-only
[  123.456790] EXT4-fs error (device sda1): ext4_find_entry: inode #123456: comm <process>: iget: bad entry in directory #987654: rec_len is too small for name_len - offset=0, inode=0, rec_len=0, name_len=0
[  124.567890] Buffer I/O error on device sda1, logical block 1234567
[  125.678901] XFS (sdb1): Metadata corruption detected at xfs_inode_alloc_inode+0x123/0x456 [xfs], IP 0x123456789abcdef0
[  125.678902] XFS (sdb1): Unmounting Filesystem
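
When the log is long, it can help to reduce messages like these to the list of devices they mention. The following is a minimal sketch, assuming the EXT4 and XFS message formats shown above; the function name `devices_in_kernel_log` is made up for illustration, and other filesystems or kernel versions may word their messages differently.

```shell
# Sketch: list the block devices named in EXT4/XFS kernel messages.
# Reads kernel log text on stdin. The function name is hypothetical.
devices_in_kernel_log() {
    # Matches "EXT4-fs (sda1)", "EXT4-fs error (device sda1)" and "XFS (sdb1)".
    sed -nE 's/.*(EXT4-fs|XFS)( error)? \((device )?([^)]+)\).*/\4/p' | sort -u
}
```

A typical invocation would be `dmesg | devices_in_kernel_log`, which prints each affected device once.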

Root Cause Analysis

A Linux kernel will typically remount a filesystem as read-only as a protective measure when it detects inconsistencies or critical I/O errors, preventing further corruption. Understanding the underlying cause is crucial for effective resolution.

Filesystem Corruption:
- Unclean Shutdowns/Power Loss: Abrupt power outages or system crashes can leave filesystem metadata in an inconsistent state. When the system reboots, the kernel detects these inconsistencies (e.g., superblock errors, corrupted inodes, orphaned blocks) and remounts the filesystem as read-only.
- Kernel Panics/Software Bugs: While less common, a kernel bug or a faulty driver interacting with the storage subsystem can lead to data corruption that triggers the read-only state.
Hardware Failure:
- Failing Disk Drive (HDD/SSD): This is a very common cause. As a drive begins to fail, it returns I/O errors when the kernel attempts to read or write data, typically because of unreadable sectors or other hardware faults. When such an error hits filesystem metadata or the journal, the filesystem driver remounts read-only (on ext4, this is the errors=remount-ro behavior) so that no further writes land on a device it can no longer trust.
- Faulty RAID Controller/HBA: In RAID setups, a failing controller or Host Bus Adapter can cause similar I/O errors across multiple disks, leading to widespread filesystem issues.
- Loose Cables/Connectivity Issues: Intermittent physical connectivity problems (e.g., loose SATA/SAS cables, faulty backplane) can also manifest as I/O errors, triggering the read-only state.
Misconfiguration (Less Common in this context):
- While it is possible to mount a filesystem read-only on purpose via fstab or mount -o ro, this does not normally produce “corruption detected” messages. If your fstab has an ro option for the affected mount point and you didn’t intend it, that’s a configuration issue rather than a corruption one.
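
To rule out the configuration case quickly, the options column of fstab can be scanned for a standalone `ro`. This is a rough sketch; the function name `fstab_ro_entries` is hypothetical, and it deliberately ignores options such as `errors=remount-ro` that merely contain the letters "ro".

```shell
# Sketch: print mount points whose fstab options include a standalone "ro".
# $1 is the fstab path (defaults to /etc/fstab). The function name is hypothetical.
fstab_ro_entries() {
    awk '
        /^[[:space:]]*(#|$)/ { next }   # skip comments and blank lines
        {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro") { print $2; break }
        }
    ' "${1:-/etc/fstab}"
}
```

Running `fstab_ro_entries` with no argument checks the live /etc/fstab; empty output means no entry requests a read-only mount.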

Step-by-Step Resolution

Addressing a read-only filesystem due to corruption requires careful diagnosis and often involves unmounting the filesystem to perform integrity checks. This typically means working outside the affected operating system or in a rescue environment.

1. Identify the Affected Filesystem and Device

First, determine which specific filesystem is read-only and its underlying device.

# Check currently mounted filesystems for the ro option
# (the options appear in parentheses, so match "(ro," or ",ro)" rather than " ro,")
mount | grep -E '[(,]ro[,)]'

# Example output:
# /dev/sda1 on / type ext4 (ro,relatime,errors=remount-ro)
# /dev/sdb1 on /data type xfs (ro,relatime,attr2,inode64,noquota)

# Check recent kernel messages for clues
dmesg -T | grep -i 'read-only\|error\|corruption\|fail'

From the mount output, you’ll see the device (e.g., /dev/sda1, /dev/sdb1) and the mount point (/, /data). This is crucial for the next steps.
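
On systems with many pseudo-filesystems, the mount output can be noisy. As an alternative sketch, the same information can be read from /proc/mounts and limited to real block devices. The function name `list_ro_mounts` is an illustrative assumption, and the `/dev/` filter will miss network or other non-device mounts.

```shell
# Sketch: list read-only mounts of /dev/* devices as "device mountpoint fstype".
# $1 is a /proc/mounts-style file (defaults to /proc/mounts). Name is hypothetical.
list_ro_mounts() {
    awk '
        $1 ~ /^\/dev\// {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro") { print $1, $2, $3; break }
        }
    ' "${1:-/proc/mounts}"
}
```

Calling `list_ro_mounts` without arguments inspects the running system.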

2. Check Disk Health with SMART

Before attempting any filesystem repairs, it’s vital to check the health of the physical disk. A failing disk can repeatedly corrupt the filesystem even after repairs.

# Install smartmontools if not already present (Ubuntu/Debian)
sudo apt update
sudo apt install smartmontools -y

# Check SMART data for the identified disk (e.g., /dev/sda)
# Replace /dev/sda with your actual disk device (e.g., /dev/sdb, /dev/nvme0n1)
sudo smartctl -a /dev/sda | less

Look for attributes like Reallocated_Sector_Ct, Current_Pending_Sector, and Offline_Uncorrectable. Non-zero or increasing raw values in these attributes strongly indicate a failing drive.

[!WARNING] If smartctl reports critical errors, such as a high number of reallocated or pending sectors, data backup is paramount. The disk is likely failing, and any further operations risk complete data loss. Consider replacing the drive immediately after backing up data.
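
The reallocated, pending, and offline-uncorrectable sector counters can also be checked mechanically. The sketch below assumes the usual ten-column attribute table that `smartctl -A` prints for ATA drives, with the raw value in the last column; NVMe drives report a different set of fields that it does not parse. The function name `smart_warnings` is hypothetical.

```shell
# Sketch: print sector-health attributes whose raw value is non-zero.
# Reads `smartctl -A` output for an ATA drive on stdin.
# Returns 1 if any counter is non-zero, 0 otherwise. Name is hypothetical.
smart_warnings() {
    awk '
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" {
            if ($10 + 0 > 0) { print $2 "=" $10; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

For example, `sudo smartctl -A /dev/sda | smart_warnings` prints nothing and exits 0 on a drive with clean counters. A clean result does not prove the drive is healthy, so treat it as one signal among several.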

3. Boot into a Rescue Environment or Single-User Mode

Filesystem repair tools such as fsck and xfs_repair must not be run against a mounted filesystem, least of all the root filesystem (/): repairing a filesystem that is in use can make the corruption worse. You must unmount it first.

For Cloud Instances (e.g., AWS EC2, DigitalOcean, Azure): Most cloud providers offer a “Rescue Mode” or “Recovery Console” option. You’ll typically stop your instance, boot it into a special rescue OS, and then manually mount your original root volume to a temporary mount point to perform repairs. Consult your cloud provider’s documentation.

For Physical Servers/VMs:

- Live CD/USB: Boot your server from a Linux Live CD/USB (e.g., Ubuntu Live Server, SystemRescueCD).
- Single-User Mode (for root filesystem): If the root filesystem is read-only, you might be able to boot into single-user mode (runlevel 1) by appending single to the kernel boot parameters in GRUB. Appending init=/bin/bash instead skips the init system entirely and drops you into a bare root shell. Either way, you get a minimal shell where the root filesystem might be mounted read-write or can be remounted.
```
# Try to remount root read-write in single-user mode if it's currently read-only
mount -o remount,rw /
```
If successful, you can then proceed, but for critical repairs, a dedicated rescue environment is generally safer.

4. Unmount the Affected Filesystem

Once in a rescue environment (where your original disk partitions are not actively mounted by the rescue OS’s root), identify your partitions and unmount the problematic one.

# List block devices and their filesystems
sudo lsblk -f

# Example output:
# NAME   FSTYPE LABEL UUID                                 MOUNTPOINTS
# sda
# ├─sda1 ext4   ROOT  xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /mnt/data
# └─sda2 swap   SWAP  yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy [SWAP]
#
# (In rescue mode, your original root might be mounted under /mnt/data or similar)

# Unmount the affected partition (e.g., /dev/sda1)
# If it's currently mounted, use its mount point (e.g., /mnt/data)
sudo umount /dev/sda1
# OR
sudo umount /mnt/data

[!IMPORTANT] If umount fails with “target is busy”, it means processes are still using the filesystem. You might need to identify and kill those processes, or more reliably, ensure you are in a pure rescue environment where the partition is not used by the rescue OS itself. sudo lsof /path/to/mountpoint can help identify processes.
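
Before moving on to a repair, it is worth confirming that the device really is unmounted. A minimal guard might look like the sketch below. The function name `is_mounted` is an assumption, and the exact-match comparison will not recognize a device that was mounted under a different path (for example a /dev/mapper name).

```shell
# Sketch: succeed (exit 0) if device $1 is listed in a /proc/mounts-style file.
# $2 is the file to read (defaults to /proc/mounts). Name is hypothetical.
is_mounted() {
    awk -v dev="$1" '
        $1 == dev { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}
```

A repair script could then start with `if is_mounted /dev/sda1; then echo "still mounted, aborting" >&2; exit 1; fi`.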

5. Run Filesystem Checks (`fsck` or `xfs_repair`)

Now, run the appropriate filesystem check tool based on your filesystem type.

A. For ext2/ext3/ext4 Filesystems:

# Check the filesystem type first if unsure
sudo blkid /dev/sda1
# Example: /dev/sda1: UUID="xxxx" TYPE="ext4"

# Run fsck on the device (e.g., /dev/sda1)
# -y: automatically answer yes to prompts (use with caution, can lose data)
# -f: force checking even if the filesystem appears clean
sudo fsck -y -f /dev/sda1

fsck will attempt to fix any detected inconsistencies. It will report its actions. If it finds many errors or fails repeatedly, it indicates severe corruption or a failing drive.

[!WARNING] fsck can sometimes lead to data loss if it encounters irrecoverable corruption and attempts to repair by discarding corrupted fragments. Always have backups before running fsck on critical data.
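
The exit status of fsck is a bitmask, so a value such as 5 combines two conditions. The sketch below decodes it using the values documented in the fsck(8) manual page; the function name `describe_fsck_status` is hypothetical.

```shell
# Sketch: translate an fsck exit status (a bitmask) into readable lines.
# $1 is the numeric status, e.g. the value of $? right after fsck.
describe_fsck_status() {
    if [ "$1" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    for bit in 1 2 4 8 16 32 128; do
        if [ $(( $1 & bit )) -ne 0 ]; then
            case $bit in
                1)   echo "errors corrected" ;;
                2)   echo "system should be rebooted" ;;
                4)   echo "errors left uncorrected" ;;
                8)   echo "operational error" ;;
                16)  echo "usage or syntax error" ;;
                32)  echo "checking canceled by user request" ;;
                128) echo "shared-library error" ;;
            esac
        fi
    done
}
```

Typical use is `sudo fsck -y -f /dev/sda1; describe_fsck_status $?`. Any result that includes "errors left uncorrected" means the filesystem should not be put back into service yet.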

B. For XFS Filesystems:

# Check the filesystem type
sudo blkid /dev/sdb1
# Example: /dev/sdb1: UUID="xxxx" TYPE="xfs"

# Run xfs_repair on the device (e.g., /dev/sdb1)
# -L: force log zeroing (use only if repair fails without it, can lose recent data)
sudo xfs_repair /dev/sdb1

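xfs_repair is the repair tool for XFS. It checks the filesystem and attempts to repair any problems it finds. If it refuses to run because the log contains uncommitted changes, first try mounting and cleanly unmounting the filesystem so that the kernel can replay the log. Use the -L option (zeroing the log) only if that mount fails: it discards any uncommitted transactions, potentially leading to loss of very recent data.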
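
Both tool families also offer a no-modify mode (`-n`) that reports problems without changing anything, which is a reasonable first pass before a real repair. The helper below is only a sketch that prints the matching dry-run command for a filesystem type; the function name `dry_run_check_command` is made up for illustration, and it covers only the ext and XFS families discussed here.

```shell
# Sketch: print a no-modify check command for a filesystem type.
# $1 is the type as reported by blkid (ext2/ext3/ext4/xfs), $2 is the device.
# Name is hypothetical; other filesystem types are rejected.
dry_run_check_command() {
    case $1 in
        ext2|ext3|ext4) echo "e2fsck -n -f $2" ;;
        xfs)            echo "xfs_repair -n $2" ;;
        *)              echo "unsupported filesystem type: $1" >&2
                        return 1 ;;
    esac
}
```

For instance, `dry_run_check_command xfs /dev/sdb1` prints `xfs_repair -n /dev/sdb1`, which can then be run with sudo against the unmounted device.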

6. Re-mount and Verify

After the filesystem check completes, attempt to re-mount the partition and verify write access.

# Re-mount the partition
# If it's your root filesystem from rescue mode, you might want to reboot instead.
# For other partitions:
sudo mount /dev/sda1 /mnt/data

# Test write access
sudo touch /mnt/data/test_file.txt
echo "This is a test." | sudo tee /mnt/data/test_content.txt > /dev/null
sudo rm /mnt/data/test_file.txt /mnt/data/test_content.txt

# Check dmesg for any new errors after re-mounting and testing
dmesg -T | tail -n 20

If you can successfully create and delete files, the repair was likely successful. If the filesystem immediately remounts read-only again, or if you still encounter errors, the corruption is severe, or the underlying hardware is failing.
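
The write test can be wrapped in a small reusable check, for example to re-run it after a reboot or from a monitoring script. This is a sketch under the assumption that the caller already has permission to write to the directory (run it as root otherwise); the function name `can_write` and the probe file prefix are hypothetical.

```shell
# Sketch: succeed if a scratch file can be created and removed in directory $1.
# Name and probe-file prefix are hypothetical.
can_write() {
    local probe
    # mktemp fails on a read-only filesystem or a missing directory.
    probe=$(mktemp "$1/.rw_probe.XXXXXX" 2>/dev/null) || return 1
    rm -f -- "$probe"
}
```

For example, `can_write /mnt/data && echo "writable" || echo "still read-only or not accessible"`.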

7. Update `/etc/fstab` (If Necessary)

If the read-only state was not due to corruption but rather an accidental ro option in /etc/fstab, or if you replaced a disk and need to update the UUID, edit the file from the rescue environment:

# In rescue mode, mount your root partition (e.g., /dev/sda1) to a temporary location
sudo mount /dev/sda1 /mnt

# Edit the fstab file using a text editor
sudo nano /mnt/etc/fstab

Make sure the options field contains defaults or rw rather than ro. Keeping errors=remount-ro on ext filesystems is a sensible default, since it limits further damage if corruption is detected again. Verify that the UUIDs match the output of sudo blkid.

# Example /etc/fstab entry for a root filesystem
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx / ext4 defaults,errors=remount-ro 0 1

[!IMPORTANT] Be extremely careful when editing /etc/fstab. Incorrect entries can prevent your system from booting. Always make a backup before editing: sudo cp /mnt/etc/fstab /mnt/etc/fstab.bak.
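
A quick structural check after editing can catch truncated or malformed lines before you reboot. The sketch below only counts fields (an fstab entry has six, of which the last two may be omitted), so it will not notice a wrong UUID or a mistyped option; where util-linux is recent enough, `findmnt --verify` performs a more thorough check. The function name `check_fstab_fields` is hypothetical.

```shell
# Sketch: report fstab lines that do not have between 4 and 6 fields.
# $1 is the fstab path (defaults to /etc/fstab). Returns 1 if any line is flagged.
# Name is hypothetical.
check_fstab_fields() {
    awk '
        /^[[:space:]]*(#|$)/ { next }   # skip comments and blank lines
        NF < 4 || NF > 6 {
            printf "line %d: expected 4 to 6 fields, found %d\n", NR, NF
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "${1:-/etc/fstab}"
}
```

From a rescue environment, this would be run as `check_fstab_fields /mnt/etc/fstab`.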

8. Consider Hardware Replacement and Data Migration

If smartctl indicates a failing drive, or if fsck/xfs_repair repeatedly finds errors that return after reboots, the underlying disk is likely failing.

- Backup All Data: This is your highest priority.
- Replace the Drive: Physically replace the faulty drive.
- Data Migration: Restore your backup to the new drive, or, if the old drive is still readable, use tools like dd or rsync to clone or copy its data to the new drive.
- Reinstall OS: In severe cases, a fresh OS install on the new drive might be the most reliable path, followed by restoring application data.

By following these steps methodically, you can diagnose and resolve most Linux read-only filesystem errors, restoring system stability and data integrity.