The joy of replacing a local disk with SVM
We still have some old Sun Fire V245 servers here, and today I had the pleasure of replacing a failed disk in one of them.
The server is still running Solaris 10 and using SVM to mirror the root disks.
What's the failed disk in question?

    # iostat -En c0t0d0
    c0t0d0           Soft Errors: 3039 Hard Errors: 75 Transport Errors: 14
    Vendor: FUJITSU  Product: MAY2073RCSUN72G  Revision: 0501 Serial No: xxxxxxxxxx
    Size: 73.41GB <73407865856 bytes>
    Media Error: 64 Device Not Ready: 0 No Device: 11 Recoverable: 3039
    Illegal Request: 2 Predictive Failure Analysis: 12

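You don't have to eyeball every entry to find the failing disk. The sketch below assumes the `iostat -En` layout shown above and prints only the devices with a non-zero Hard Errors counter. The function name `hard_error_disks` is hypothetical. On Solaris 10 you would probably point `AWK` at `nawk` or `/usr/xpg4/bin/awk` rather than the legacy `/usr/bin/awk`.

```shell
#!/bin/sh
# AWK defaults to awk; on Solaris 10 set AWK=nawk (assumption, see above).
AWK=${AWK:-awk}

# Read `iostat -En` output on stdin and print "<device> hard=<n> transport=<n>"
# for every device whose Hard Errors counter is non-zero.
hard_error_disks() {
    $AWK '
        /Soft Errors:/ {
            dev = $1; hard = 0; trans = 0
            # Scan the line for the "Hard Errors:" and "Transport Errors:" counters.
            for (i = 2; i <= NF; i++) {
                if ($i == "Hard" && $(i+1) == "Errors:") hard = $(i+2)
                if ($i == "Transport" && $(i+1) == "Errors:") trans = $(i+2)
            }
            if (hard + 0 > 0) printf "%s hard=%d transport=%d\n", dev, hard, trans
        }
    '
}
```

Run it as `iostat -En | hard_error_disks`. Triggering on hard errors is a judgment call. The Predictive Failure Analysis counter on the last line is another candidate.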
Nasty. Let's configure it out.

    # disk=c0t0d0
    # metastat -p > /etc/lvm/md.tab
    # grep $disk /etc/lvm/md.tab
    d21 1 1 c0t0d0s1
    d11 1 1 c0t0d0s0
    # metadetach d10 d11
    d10: submirror d11 is detached
    # metadetach d20 d21
    metadetach: solaris: d20: attempt an operation on a submirror that has erred components

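The detach commands can be derived from the `metastat -p` dump saved to /etc/lvm/md.tab. The sketch below is a dry run: it prints the metadetach command for each submirror on the given disk and executes nothing. It assumes concat lines look like `d11 1 1 c0t0d0s0`, as above. It also assumes mirror lines look like `d10 -m d11 d12 1` (mirror, submirrors, pass number), a layout not shown in this post. `detach_plan` is a hypothetical name.

```shell
#!/bin/sh
# AWK defaults to awk; on Solaris 10 set AWK=nawk (assumption).
AWK=${AWK:-awk}

# Usage: detach_plan <disk> < /etc/lvm/md.tab
# Prints "metadetach <mirror> <submirror>" for every submirror built on <disk>.
detach_plan() {
    $AWK -v disk="$1" '
        /^#/ { next }
        $2 == "-m" {
            # Mirror line (assumed): d10 -m d11 d12 1, trailing number is the pass.
            for (i = 3; i <= NF; i++) if ($i ~ /^d[0-9]+$/) parent[$i] = $1
            next
        }
        {
            # Concat/stripe line: d11 1 1 c0t0d0s0; remember it if a slice is on <disk>.
            for (i = 2; i <= NF; i++)
                if ($i ~ ("^(/dev/dsk/)?" disk "s[0-9]+$")) { subs[++n] = $1; break }
        }
        END {
            # Emit in input order, only for submirrors that belong to a mirror.
            for (j = 1; j <= n; j++)
                if (subs[j] in parent) printf "metadetach %s %s\n", parent[subs[j]], subs[j]
        }
    '
}
```

For a layout like the one in this post, `detach_plan c0t0d0 < /etc/lvm/md.tab` should list d11 under d10 and d21 under d20. The sketch does not add `-f`, so a submirror with erred components still has to be forced.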
Ooops, I hope the force is still with me...

    # metastat d20
    d20: Mirror
        Submirror 0: d21
          State: Needs maintenance
        Submirror 1: d22
          State: Okay
    ...
    # metadetach -f d20 d21
    d20: submirror d21 is detached

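You can check up front whether `-f` will be needed. The sketch below reads the `metastat` output for a mirror and lists every submirror whose state is not Okay. It assumes the wording shown above: a `Submirror N: dXX` line followed by a `State:` line. `erred_submirrors` is a hypothetical name.

```shell
#!/bin/sh
# AWK defaults to awk; on Solaris 10 set AWK=nawk (assumption).
AWK=${AWK:-awk}

# Usage: metastat <mirror> | erred_submirrors
# Prints "<submirror> <state>" for each submirror whose state is not "Okay".
erred_submirrors() {
    $AWK '
        # "Submirror 0: d21" names the submirror whose State line follows.
        /Submirror [0-9]+:/ { sub_name = $NF; next }
        /^[ \t]*State:/ && sub_name != "" {
            state = $0
            sub(/^[ \t]*State:[ \t]*/, "", state)   # strip the label
            sub(/[ \t]+$/, "", state)               # strip trailing padding
            if (state != "Okay") printf "%s %s\n", sub_name, state
            sub_name = ""   # ignore later State lines (e.g. per-submirror sections)
        }
    '
}
```

For the state shown above, `metastat d20 | erred_submirrors` would print `d21 Needs maintenance`, which is exactly the case where the plain metadetach refused to work.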
I had such anger, but it's all good now. Let's continue.

    # metaclear d11
    d11: Concat/Stripe is cleared
    # metaclear d21
    d21: Concat/Stripe is cleared
    ...
    # metadb | grep $disk
         a m  p  luo        16              8192            /dev/dsk/c0t0d0s7
         a    p  luo        8208            8192            /dev/dsk/c0t0d0s7
         a    p  luo        16400           8192            /dev/dsk/c0t0d0s7
    # metadb -d ${disk}s7
    # cfgadm -al | grep $disk
    c0::dsk/c0t0d0                 disk         connected    configured   unknown
    # cfgadm -c unconfigure c0::dsk/c0t0d0

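Before deleting replicas with `metadb -d`, it is worth confirming that enough state database replicas remain on the other disk. The sketch below assumes `metadb` prints the device as the last field of each replica line. The default threshold of three follows the common Solaris Volume Manager advice to keep at least three replicas. Both that default and the name `replicas_left_without` are assumptions.

```shell
#!/bin/sh
# AWK defaults to awk; on Solaris 10 set AWK=nawk (assumption).
AWK=${AWK:-awk}

# Usage: metadb | replicas_left_without <disk> [minimum]
# Counts replicas on <disk> versus elsewhere; exits 1 if fewer than
# [minimum] (default 3, an assumption) would remain after deleting them.
replicas_left_without() {
    $AWK -v disk="$1" -v min="${2:-3}" '
        # Only replica lines end in a /dev/dsk/ path; the header does not.
        $NF ~ /^\/dev\/dsk\// {
            if ($NF ~ ("^/dev/dsk/" disk "s[0-9]+$")) gone++
            else left++
        }
        END {
            printf "%d replica(s) on %s, %d elsewhere\n", gone, disk, left
            if (left + 0 < min + 0) exit 1
        }
    '
}
```

Used as `metadb | replicas_left_without $disk && metadb -d ${disk}s7`, the replicas are only deleted when the check passes.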
Now we can physically replace the failed disk.

    # tail -f /var/adm/messages
    ...
    May 23 12:30:20 solaris genunix: [ID 408114 kern.info] /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@0,0 (sd0) offline
    May 23 12:30:27 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1 (mpt0):
    May 23 12:30:27 solaris         mpt_handle_event_sync : SAS target 0 added.
    May 23 12:30:27 solaris scsi: [ID 583861 kern.info] sd0 at mpt0: unit-address 0,0: target 0 lun 0
    May 23 12:30:27 solaris genunix: [ID 936769 kern.info] sd0 is /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@0,0
    May 23 12:30:28 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@0,0 (sd0):
    May 23 12:30:28 solaris         Corrupt label - label checksum failed
    May 23 12:30:28 solaris genunix: [ID 408114 kern.info] /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@0,0 (sd0) online

Alright, let's configure the new disk in.

    # cfgadm -c configure c0::dsk/c0t0d0
    # format $disk
    c0t0d0: configured with capacity of 68.35GB
    selecting c0t0d0
    [disk formatted]
    ...
    format> label
    Ready to label disk, continue? y
    format> quit

Time to rebuild our SVM mirror.

    # metastat d20
    ...
    Device Relocation Information:
    Device   Reloc  Device ID
    c0t1d0   Yes    id1,sd@n500000e016af27c0
    # sourcedisk=c0t1d0
    # prtvtoc /dev/rdsk/${sourcedisk}s2 | fmthard -s - /dev/rdsk/${disk}s2
    # metadb -a -c3 ${disk}s7
    # metainit d11
    # metainit d21
    # metattach d10 d11
    d10: submirror d11 is attached
    # metattach d20 d21
    d20: submirror d21 is attached
    # metadevadm -u $disk
    Updating Solaris Volume Manager device relocation information for c0t0d0
    Old device reloc information:
            id1,sd@n500000e0135e2dd0
    New device reloc information:
            id1,sd@n500000e0135e2dd0
    # installboot /usr/platform/$(uname -i)/lib/fs/ufs/bootblk /dev/rdsk/${disk}s0
    # metastat -c d20
    d20              m  4.0GB d22 d21 (resync-1%)
        d22          s  4.0GB c0t1d0s1
        d21          s  4.0GB c0t0d0s1

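Here the source disk was read off the relocation table by eye. The sketch below picks it out of the `metastat` output instead. It returns the first cXtYdZ device in the Device Relocation Information table that is not the disk being replaced. It assumes plain cXtYdZ names with numeric targets, so MPxIO-style WWN names would not match. `mirror_partner` is a hypothetical name.

```shell
#!/bin/sh
# AWK defaults to awk; on Solaris 10 set AWK=nawk (assumption).
AWK=${AWK:-awk}

# Usage: metastat <mirror> | mirror_partner <disk being replaced>
# Prints the first other disk listed under "Device Relocation Information:".
mirror_partner() {
    $AWK -v disk="$1" '
        /^Device Relocation Information:/ { in_table = 1; next }
        # The "Device Reloc Device ID" header does not match the cXtYdZ pattern.
        in_table && $1 ~ /^c[0-9]+t[0-9]+d[0-9]+$/ && $1 != disk { print $1; exit }
    '
}
```

With ksh or bash, `sourcedisk=$(metastat d20 | mirror_partner $disk)` would replace the manual assignment. Copying the VTOC with prtvtoc and fmthard only makes sense when the replacement disk is at least as large as the source disk.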
Once the resync is done, we have our OS disks mirrored again.
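Rather than rerunning `metastat -c` by hand, you can poll the resync. The sketch below assumes the `(resync-N%)` marker seen in the `metastat -c` output above. It prints each mirror that is still resyncing together with its percentage, and exits 0 while any resync is in progress. `resync_pending` is a hypothetical name.

```shell
#!/bin/sh
# AWK defaults to awk; on Solaris 10 set AWK=nawk (assumption).
AWK=${AWK:-awk}

# Usage: metastat -c | resync_pending
# Prints "<mirror> <N>%" for each line carrying a "(resync-N%)" marker;
# exit status 0 if at least one resync is running, 1 otherwise.
resync_pending() {
    $AWK '
        match($0, /\(resync-[0-9]+%\)/) {
            # Skip "(resync-" (8 chars) and drop the trailing "%)" (2 chars).
            pct = substr($0, RSTART + 8, RLENGTH - 10)
            printf "%s %s%%\n", $1, pct
            found = 1
        }
        END { if (!found) exit 1 }
    '
}
```

A loop such as `while metastat -c | resync_pending; do sleep 60; done` would then return once no mirror reports a resync. The 60 second interval is arbitrary.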