Salvaging a broken hard drive
2025-01-14
I recently had the misfortune of a hard drive randomly disappearing from the device list. See below for the dmesg output, which contains lots of I/O errors.
The following steps resolved the issue for me.
First, find the host the disk is connected to: readlink /sys/block/sdX (look for hostX in the resolved path)
To remove the offending disk: echo 1 | sudo tee /sys/block/sdX/device/delete
To reconnect: echo "- - -" | sudo tee /sys/class/scsi_host/hostX/scan
To start a SMART offline check that will hopefully resolve the issue: sudo smartctl -t offline /dev/sdX
To monitor the running check, watch the pending sector count: sudo smartctl -a /dev/sdX | grep Current_Pending_Sector
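The remove and rescan steps above could be wrapped in a small script. This is only a sketch: the function names host_from_path and reattach_disk are made up for illustration, the two-second pause is an arbitrary guess, and the disk name (sdd in the usage comment) has to be adapted to your system. It needs root, and the disk may come back under a different name after the rescan.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any tool.

# Print the first "hostN" component found in a sysfs path.
host_from_path() {
    printf '%s\n' "$1" | grep -oE 'host[0-9]+' | head -n 1
}

# Remove a disk from the kernel's device list, then rescan its SCSI host.
reattach_disk() {
    disk="$1"
    host="$(host_from_path "$(readlink -f "/sys/block/$disk")")"
    if [ -z "$host" ]; then
        echo "no SCSI host found for $disk" >&2
        return 1
    fi
    echo 1 | sudo tee "/sys/block/$disk/device/delete" > /dev/null
    sleep 2  # arbitrary pause (an assumption) before rescanning
    echo "- - -" | sudo tee "/sys/class/scsi_host/$host/scan" > /dev/null
}

# Usage (adapt the disk name):
#   reattach_disk sdd
```

Only host_from_path is safe to try without touching hardware; reattach_disk really does detach the disk, so make sure nothing on it is mounted first.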
dmesg output:
[ 846.095273] sd 0:0:0:0: [sdd] Attached SCSI disk
[ 903.578050] sdd:
[ 903.601468] sdd:
[ 903.695939] sdd: sdd1
[ 903.992813] sdd: sdd1
[ 919.723783] ata1: log page 10h reported inactive tag 26
[ 919.723789] ata1.00: exception Emask 0x1 SAct 0x407fffff SErr 0x0 action 0x0
[ 919.723792] ata1.00: irq_stat 0x40000008
[ 919.723794] ata1.00: failed command: WRITE FPDMA QUEUED
[ 919.723795] ata1.00: cmd 61/08:00:00:00:68/00:00:16:00:00/40 tag 0 ncq dma 4096 out
res 41/10:00:00:00:00/00:00:00:00:00/00 Emask 0x81 (invalid argument)
[ 919.723800] ata1.00: status: { DRDY ERR }
[ 919.723801] ata1.00: error: { IDNF }
[ 919.723803] ata1.00: failed command: WRITE FPDMA QUEUED
[ 919.723804] ata1.00: cmd 61/08:08:00:00:a8/00:00:16:00:00/40 tag 1 ncq dma 4096 out
res 41/10:00:03:00:00/00:00:00:00:00/00 Emask 0x81 (invalid argument)
[ 919.723807] ata1.00: status: { DRDY ERR }
[ 919.723808] ata1.00: error: { IDNF }
[ 919.724392] ata1.00: failed to read native max address (err_mask=0x40)
[ 919.724395] ata1.00: HPA support seems broken, skipping HPA handling
[ 919.724396] ata1.00: revalidation failed (errno=-5)
[ 919.724399] ata1: hard resetting link
[ 925.074509] ata1: link is slow to respond, please be patient (ready=0)
[ 929.757529] ata1: found unknown device (class 0)
[ 929.912524] ata1: softreset failed (device not ready)
[ 929.912530] ata1: hard resetting link
[ 935.268530] ata1: link is slow to respond, please be patient (ready=0)
[ 939.960555] ata1: found unknown device (class 0)
[ 940.120562] ata1: softreset failed (device not ready)
[ 940.120568] ata1: hard resetting link
[ 945.476584] ata1: link is slow to respond, please be patient (ready=0)
[ 950.476609] ata1: found unknown device (class 0)
[ 955.637622] ata1: link is slow to respond, please be patient (ready=0)
[ 975.125702] ata1: softreset failed (device not ready)
[ 975.125709] ata1: limiting SATA link speed to 3.0 Gbps
[ 975.125713] ata1: hard resetting link
[ 980.169732] ata1: found unknown device (class 0)
[ 980.328732] ata1: softreset failed (device not ready)
[ 980.328741] ata1: reset failed, giving up
[ 980.328744] ata1.00: disable device
[ 980.328807] ata1: EH complete
[ 980.328827] scsi_io_completion_action: 3 callbacks suppressed
[ 980.328840] sd 0:0:0:0: [sdd] tag#23 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=67s
[ 980.328845] sd 0:0:0:0: [sdd] tag#23 CDB: Write(16) 8a 00 00 00 00 00 16 68 00 00 00 00 00 08 00 00
[ 980.328847] blk_print_req_error: 3 callbacks suppressed
[ 980.328848] I/O error, dev sdd, sector 375914496 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 2
[ 980.328853] Buffer I/O error on dev sdd1, logical block 38797312, lost async page write
[ 980.328866] sd 0:0:0:0: [sdd] tag#31 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=67s
[ 980.328869] sd 0:0:0:0: [sdd] tag#31 CDB: Write(16) 8a 00 00 00 00 00 16 a8 00 00 00 00 00 08 00 00
[ 980.328870] I/O error, dev sdd, sector 380108800 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 2
[ 980.328873] Buffer I/O error on dev sdd1, logical block 39321600, lost async page write
[ 980.328878] sd 0:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=67s
[ 980.328880] sd 0:0:0:0: [sdd] tag#0 CDB: Write(16) 8a 00 00 00 00 00 16 e8 00 00 00 00 00 08 00 00
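As a cross-check when reading a log like this one, the sector in each "I/O error" line can be recovered from the CDB line just before it. In a Write(16) command, bytes 2 to 9 hold the starting LBA in big-endian order. The helper below is a rough sketch for decoding it; the name cdb_lba is made up for illustration, and it assumes the CDB is given as sixteen space-separated hex bytes, as printed by the kernel.

```shell
#!/bin/sh
# Sketch only: cdb_lba is a hypothetical helper name.

# Decode the starting LBA of a 16-byte read/write CDB.
# $1: the CDB as space-separated hex bytes, e.g. copied from dmesg.
cdb_lba() {
    # Deliberately unquoted so the bytes are split into $1..$16.
    set -- $1
    # Bytes 2-9 (counting from 0) are positional parameters 3-10.
    printf '%d\n' "0x$3$4$5$6$7$8$9${10}"
}

cdb_lba "8a 00 00 00 00 00 16 68 00 00 00 00 00 08 00 00"   # prints 375914496
cdb_lba "8a 00 00 00 00 00 16 a8 00 00 00 00 00 08 00 00"   # prints 380108800
```

Both values match the sectors reported for tag#23 and tag#31 in the log above.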