The joy of replacing a local disk with SVM
We still have some old Sun Fire V245 servers here, and today I had the pleasure of replacing a failed disk.
The server is still running Solaris 10 and using SVM to mirror the root disks.
Which disk failed, and how badly?
# iostat -En c0t0d0
Soft Errors: 3039 Hard Errors: 75 Transport Errors: 14
Vendor: FUJITSU Product: MAY2073RCSUN72G Revision: 0501 Serial No: xxxxxxxxxx
Size: 73.41GB <73407865856 bytes>
Media Error: 64 Device Not Ready: 0 No Device: 11 Recoverable: 3039
Illegal Request: 2 Predictive Failure Analysis: 12
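With more than a handful of disks it can be handy to filter the `iostat -En` error counters instead of reading them by eye. This is a minimal sketch, assuming the usual `iostat -En` layout where each device block starts with a line of the form `<device> Soft Errors: N Hard Errors: N Transport Errors: N`. The function name `hard_errors` and its threshold argument are my own invention, not part of Solaris.

```shell
# hard_errors: read `iostat -En` output on stdin and print
# "<device> <hard error count>" for every device whose
# Hard Errors counter is greater than $1 (default: 0).
# Hypothetical helper, not a Solaris command.
hard_errors() {
    awk -v min="${1:-0}" '
        # only the first line of each device block carries the counters
        /Soft Errors:/ {
            for (i = 1; i < NF - 1; i++)
                if ($i == "Hard" && $(i + 1) == "Errors:" &&
                    ($(i + 2) + 0) > (min + 0))
                    print $1, $(i + 2)
        }'
}
```

On the server this would be used as `iostat -En | hard_errors 10`. Note that on Solaris 10 `/usr/bin/awk` is the old awk and may not accept `-v`; `nawk` should.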
Nasty. Let's configure it out.
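Here the `grep` in the next step shows only two submirrors on the disk, so reading it by eye is enough. On a box with more metadevices, the md.tab written by `metastat -p` can be turned into the list of `metadetach` commands. A minimal sketch, assuming the `metastat -p` line formats `dM -m dA dB <pass>` for mirrors and `dN 1 1 <slice>` for simple concats; `detach_plan` is a hypothetical helper that only prints commands and runs nothing.

```shell
# detach_plan: read md.tab / `metastat -p` output on stdin and print,
# without running them, the metadetach commands for every submirror
# that has a component on disk $1 (e.g. c0t0d0).
# Hypothetical helper, not a Solaris command.
detach_plan() {
    awk -v disk="$1" '
        # mirror lines, e.g. "d20 -m d22 d21 1": remember each parent
        $2 == "-m" {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^d[0-9]+$/) parent[$i] = $1
            next
        }
        # concat/stripe lines, e.g. "d21 1 1 c0t0d0s1": note slices on the disk
        {
            for (i = 2; i <= NF; i++)
                if (index($i, disk "s") == 1) onDisk[$1] = 1
        }
        # output order of for-in is unspecified
        END {
            for (sm in onDisk)
                if (sm in parent) print "metadetach " parent[sm] " " sm
        }'
}
```

Run as `detach_plan $disk < /etc/lvm/md.tab`; for this server it should list `metadetach d10 d11` and `metadetach d20 d21`, in no particular order.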
# disk=c0t0d0
# metastat -p > /etc/lvm/md.tab
# grep $disk /etc/lvm/md.tab
d21 1 1 c0t0d0s1
d11 1 1 c0t0d0s0
# metadetach d10 d11
d10: submirror d11 is detached
# metadetach d20 d21
metadetach: solaris: d20: attempt an operation on a submirror that has erred components
Ooops, I hope the force is still with me...
# metastat d20
d20: Mirror
Submirror 0: d21
State: Needs maintenance
Submirror 1: d22
State: Okay
...
# metadetach -f d20 d21
d20: submirror d21 is detached
I had such anger, but it's all good now. Let's continue.
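The forced detach was needed because d21 was in the `Needs maintenance` state. Such submirrors can be picked out of the `metastat` output before running `metadetach`, so you know in advance where `-f` will be required. A minimal sketch, assuming the `Submirror N: dX` / `State: ...` line pairs shown in the mirror section above; the helper name is hypothetical.

```shell
# erred_submirrors: read `metastat <mirror>` output on stdin and print
# "<submirror> <state>" for every submirror in the mirror section whose
# State line is not "Okay". These are likely the submirrors metadetach
# will refuse to detach without -f.
# Hypothetical helper, not a Solaris command.
erred_submirrors() {
    awk '
        # e.g. "    Submirror 0: d21"
        $1 == "Submirror" && $2 ~ /:$/ { sm = $3; next }
        # the State line that follows it, e.g. "      State: Needs maintenance"
        sm != "" && $1 == "State:" {
            state = $0
            sub(/^[ \t]*State:[ \t]*/, "", state)
            if (state != "Okay") print sm, state
            sm = ""
        }'
}
```

For example, `metastat d20 | erred_submirrors` would have reported d21 here.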
# metaclear d11
d11: Concat/Stripe is cleared
# metaclear d21
d21: Concat/Stripe is cleared
...
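The failed disk also holds state database replicas, which the session below deletes before the disk is pulled. SVM keeps running only while at least half of its replicas are reachable and needs a majority of them to boot into multiuser mode, so it is worth confirming that replicas exist on the other disk before running `metadb -d`. A minimal sketch that counts replicas per slice, assuming the device path is the last field of each replica line in `metadb` output; `replicas_per_slice` is a hypothetical name.

```shell
# replicas_per_slice: read `metadb` output on stdin and print
# "<slice> <number of replicas>" for every slice holding replicas.
# Hypothetical helper, not a Solaris command.
replicas_per_slice() {
    awk '
        # replica lines end with the device, e.g. "/dev/dsk/c0t0d0s7";
        # the header line does not, so it is skipped
        $NF ~ /^\/dev\/dsk\// { n[$NF]++ }
        END { for (s in n) print s, n[s] }'
}
```

Run `metadb | replicas_per_slice`: the slice on the failed disk should not be the only one listed.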
# metadb | grep $disk
a m p luo 16 8192 /dev/dsk/c0t0d0s7
a p luo 8208 8192 /dev/dsk/c0t0d0s7
a p luo 16400 8192 /dev/dsk/c0t0d0s7
# metadb -d ${disk}s7
# cfgadm -al | grep $disk
c0::dsk/c0t0d0 disk connected configured unknown
# cfgadm -c unconfigure c0::dsk/c0t0d0
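The attachment point passed to `cfgadm -c unconfigure` was copied from the `cfgadm -al` listing, but it can also be derived from `$disk`. A minimal sketch, assuming Ap_Ids of the form `<controller>::dsk/<disk>` as in the listing above; `ap_id` is a hypothetical helper.

```shell
# ap_id: read `cfgadm -al` output on stdin and print the Ap_Id whose
# name ends in "::dsk/<disk>", where <disk> is $1 (e.g. c0t0d0).
# Hypothetical helper, not a Solaris command.
ap_id() {
    awk -v disk="$1" '
        {
            suffix = "::dsk/" disk
            start = length($1) - length(suffix) + 1
            # require a non-empty controller prefix before "::dsk/"
            if (start > 1 && substr($1, start) == suffix) print $1
        }'
}
```

Usage on the server: `cfgadm -c unconfigure "$(cfgadm -al | ap_id $disk)"`.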
Now we can physically replace the failed disk.
# tail -f /var/adm/messages
...
May 23 12:30:20 solaris genunix: [ID 408114 kern.info] /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@0,0 (sd0) offline
May 23 12:30:27 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1 (mpt0):
May 23 12:30:27 solaris mpt_handle_event_sync : SAS target 0 added.
May 23 12:30:27 solaris scsi: [ID 583861 kern.info] sd0 at mpt0: unit-address 0,0: target 0 lun 0
May 23 12:30:27 solaris genunix: [ID 936769 kern.info] sd0 is /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@0,0
May 23 12:30:28 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@0,0 (sd0):
May 23 12:30:28 solaris Corrupt label - label checksum failed
May 23 12:30:28 solaris genunix: [ID 408114 kern.info] /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@0,0 (sd0) online
Alright, let's configure the new disk in.
# cfgadm -c configure c0::dsk/c0t0d0
# format $disk
c0t0d0: configured with capacity of 68.35GB
selecting c0t0d0
[disk formatted]
...
format> label
Ready to label disk, continue? y
format> quit
Time to rebuild our SVM mirror.
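Because the md.tab written with `metastat -p` at the start still describes d11 and d21, `metainit d11` and `metainit d21` in the session below recreate them from that file, and `metattach` puts them back into d10 and d20. With more submirrors, these pairs can be generated from the same file. A minimal sketch, assuming md.tab uses the `dM -m dA dB <pass>` format for mirrors and `dN 1 1 <slice>` for simple concats; `reattach_plan` is a hypothetical helper that only prints commands.

```shell
# reattach_plan: read md.tab on stdin and print, without running them,
# a "metainit <sub> && metattach <mirror> <sub>" line for every
# submirror with a component on disk $1 (e.g. c0t0d0).
# Hypothetical helper, not a Solaris command.
reattach_plan() {
    awk -v disk="$1" '
        # mirror lines, e.g. "d10 -m d11 d12 1": remember each parent
        $2 == "-m" {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^d[0-9]+$/) parent[$i] = $1
            next
        }
        # concat/stripe lines, e.g. "d11 1 1 c0t0d0s0"
        {
            for (i = 2; i <= NF; i++)
                if (index($i, disk "s") == 1) onDisk[$1] = 1
        }
        # output order of for-in is unspecified
        END {
            for (sm in onDisk)
                if (sm in parent)
                    print "metainit " sm " && metattach " parent[sm] " " sm
        }'
}
```

Review the output of `reattach_plan $disk < /etc/lvm/md.tab` before pasting it into a root shell.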
# metastat d20
...
Device Relocation Information:
Device Reloc Device ID
c0t1d0 Yes id1,sd@n500000e016af27c0
# sourcedisk=c0t1d0
# prtvtoc /dev/rdsk/${sourcedisk}s2 | fmthard -s - /dev/rdsk/${disk}s2
# metadb -a -c3 ${disk}s7
# metainit d11
# metainit d21
# metattach d10 d11
d10: submirror d11 is attached
# metattach d20 d21
d20: submirror d21 is attached
# metadevadm -u $disk
Updating Solaris Volume Manager device relocation information for c0t0d0
Old device reloc information:
id1,sd@n500000e0135e2dd0
New device reloc information:
id1,sd@n500000e0135e2dd0
# installboot /usr/platform/$(uname -i)/lib/fs/ufs/bootblk /dev/rdsk/${disk}s0
# metastat -c
d20 m 4.0GB d22 d21 (resync-1%)
d22 s 4.0GB c0t1d0s1
d21 s 4.0GB c0t0d0s1
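While the resync runs, a quick sanity check that `fmthard` really copied the source disk's partition table is to compare the two `prtvtoc` listings. The `*` comment header names the device, so it always differs and has to be ignored. A minimal sketch, assuming both listings were saved to files first; `same_vtoc` and the `/tmp` paths are hypothetical.

```shell
# same_vtoc: succeed (exit 0) if the two saved `prtvtoc` listings $1 and
# $2 contain the same slice lines. Lines starting with "*" (the comment
# header with device name and geometry) are ignored.
# Returns 2 if either file cannot be read.
# Hypothetical helper, not a Solaris command.
same_vtoc() {
    _vtoc_a=$(sed '/^\*/d' "$1") || return 2
    _vtoc_b=$(sed '/^\*/d' "$2") || return 2
    [ "$_vtoc_a" = "$_vtoc_b" ]
}

# Usage on the server:
#   prtvtoc /dev/rdsk/${sourcedisk}s2 > /tmp/src.vtoc
#   prtvtoc /dev/rdsk/${disk}s2 > /tmp/new.vtoc
#   same_vtoc /tmp/src.vtoc /tmp/new.vtoc && echo "VTOCs match"
```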
Once the resync is done, our OS disks are mirrored again.
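Instead of re-running `metastat -c` by hand, the resync progress can be polled. A minimal sketch, assuming the `(resync-N%)` marker on the mirror lines of `metastat -c` shown above; both function names are hypothetical and the 60-second default interval is arbitrary.

```shell
# resync_percent: read `metastat -c` output on stdin and print
# "<mirror> <percent>" for every mirror line (type "m") that carries a
# "resync-N%" marker. Hypothetical helper, not a Solaris command.
resync_percent() {
    awk '
        $2 == "m" && match($0, /resync-[0-9]+%/) {
            # skip the 7 characters of "resync-" and drop the trailing "%"
            print $1, substr($0, RSTART + 7, RLENGTH - 8)
        }'
}

# wait_for_resync: print the progress and sleep $1 seconds (default 60)
# until no mirror reports a resync any more. Only useful on the Solaris
# host itself, since it calls metastat.
wait_for_resync() {
    while metastat -c | resync_percent | grep . ; do
        sleep "${1:-60}"
    done
}
```

With the output shown above, `metastat -c | resync_percent` would print `d20 1`.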