Saturday, May 12, 2018
Had another I/O error on the Terramaster. This looks a bit like this one: bz1315013
Just before it went berserk, I had run zpool status and gotten:
  pool: data
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     2
            sdd     ONLINE       1     0     1
            sde     ONLINE       0     0     1
            sdf     ONLINE       0     0     1

errors: No known data errors
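With five disks it is easy to miss which rows carry non-zero counters. A minimal sketch of a filter that keeps only those rows follows. The function name zpool_nonzero_errors is made up for this note and is not part of the zpool CLI. It also assumes plain integer counters, so rows where zpool abbreviates a large count (with a K suffix, for example) would be skipped.

```shell
#!/bin/sh
# Hypothetical helper (not part of zpool): read `zpool status` output on
# stdin and print "NAME READ WRITE CKSUM" for every row in the config
# table whose error counters are not all zero.
zpool_nonzero_errors() {
    awk '
        # Config rows have exactly five fields: NAME STATE READ WRITE CKSUM.
        # Requiring numeric counters skips the header and any prose lines.
        NF == 5 && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
            if ($3 + $4 + $5 > 0) print $1, $3, $4, $5
        }
    '
}

# Usage on a live system (assumes the pool is named "data", as above):
#   zpool status data | zpool_nonzero_errors
```

On the output above this should leave only the sdc, sdd, sde and sdf rows.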
When things went south, instead, I got:
  pool: data
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://zfsonlinux.org/msg/ZFS-8000-HC
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE      12    32     0
          raidz1-0  ONLINE      32    65     0
            sdb     ONLINE      10    49     0
            sdc     ONLINE      39    45     2
            sdd     ONLINE      13    31     1
            sde     ONLINE      36    15     1
            sdf     ONLINE      32    35     1

errors: List of errors unavailable: pool I/O is currently suspended
errors: 30 data errors, use '-v' for a list
The problem is that the device names (sdb, sdc, ...) change after the enclosure is shut down and restarted, so it's a bit hard to reconnect the zpool.
Replaced the names of the disks using the trick found here:
zpool export data
zpool import -d /dev/disk/by-id -aN
Now zpool status shows:
  pool: data
 state: ONLINE
  scan: resilvered 92K in 0h0m with 0 errors on Sat May 12 16:21:21 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        data                                            ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            usb-ST3000DM_001-1CH166_201609040218-0:0    ONLINE       0     0     0
            usb-WDC_WD30_EFRX-68EUZN0_201609040218-0:1  ONLINE       0     0     0
            usb-ST3000DM_001-1CH166_201609040218-0:2    ONLINE       0     0     0
            usb-WDC_WD30_EFRX-68EUZN0_201609040218-0:3  ONLINE       0     0     0
            usb-WDC_WD30_EFRX-68EUZN0_201609040218-0:4  ONLINE       0     0     0

errors: No known data errors
Hopefully, this means my system will now survive when switching the array off/on because of the USB reset...
Turns out that /dev/sdd is not the third disk in the array (I expected the disks to be numbered in array order). It seems to be disk #4, the one that was regularly blinking red. It's one of the two usb-ST3000DM_001-1CH166_201609040218-0 in the system.
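To check which by-id name currently sits behind a given /dev/sdX, something along these lines should do. It is only a sketch: list_disk_ids is a name invented for this note, and it relies on readlink -f as found on a typical Linux install.

```shell
#!/bin/sh
# Hypothetical helper: print "<by-id name> -> <kernel device>" for every
# symlink in a by-id style directory (defaults to /dev/disk/by-id).
list_disk_ids() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        [ -L "$link" ] || continue          # skip anything that is not a symlink
        target=$(readlink -f "$link")       # resolve to the real node, e.g. /dev/sdd
        printf '%s -> %s\n' "$(basename "$link")" "$target"
    done
}

# Usage: find the stable name that currently maps to /dev/sdd
#   list_disk_ids | grep '/dev/sdd$'
```

The directory also holds entries for partitions (names ending in -part1 and so on), so anchoring the grep on the whole-disk node keeps the output short.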