Tuesday, May 23, 2017

The joy of replacing a local disk with SVM


We still have some old Sun Fire V245 servers here, and today I had the pleasure of replacing a failed disk in one of them.

The server still runs Solaris 10 and uses SVM (Solaris Volume Manager) to mirror the root disks.

Which disk has failed?

# iostat -En
c0t0d0           Soft Errors: 3039 Hard Errors: 75 Transport Errors: 14
Vendor: FUJITSU  Product: MAY2073RCSUN72G  Revision: 0501 Serial No: xxxxxxxxxx
Size: 73.41GB <73407865856 bytes>
Media Error: 64 Device Not Ready: 0 No Device: 11 Recoverable: 3039
Illegal Request: 2 Predictive Failure Analysis: 12
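On a box with several disks, the `iostat -En` header lines can be filtered instead of read by eye. A minimal sketch, assuming the header layout shown above (device name followed by the Soft/Hard/Transport error counters); `failing_disks` is a hypothetical helper, not a Solaris command, and on Solaris 10 it should be run with `AWK=nawk` because the default `/usr/bin/awk` there is the pre-POSIX awk.

```shell
# failing_disks (hypothetical helper): read `iostat -En` output on stdin and
# print "device hard_errors" for every device with a non-zero Hard Errors count.
# Set AWK=nawk on Solaris 10; the default /usr/bin/awk there is the old awk.
failing_disks() {
    "${AWK:-awk}" '
        # Header lines look like:
        # c0t0d0  Soft Errors: 3039 Hard Errors: 75 Transport Errors: 14
        $2 == "Soft" && $3 == "Errors:" {
            for (i = 4; i + 2 <= NF; i++)
                if ($i == "Hard" && $(i + 1) == "Errors:" && $(i + 2) + 0 > 0)
                    print $1, $(i + 2)
        }
    '
}
```

Usage: `iostat -En | failing_disks`. For the output above it prints the device name and its hard error count.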

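Nasty. Let's configure it out. The first step saves the current SVM configuration to /etc/lvm/md.tab, so the submirrors can later be recreated with a plain metainit.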

# disk=c0t0d0
# metastat -p > /etc/lvm/md.tab
# grep $disk /etc/lvm/md.tab
d21 1 1 c0t0d0s1
d11 1 1 c0t0d0s0
# metadetach d10 d11
d10: submirror d11 is detached
# metadetach d20 d21
metadetach: solaris: d20: attempt an operation on a submirror that has erred components
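The `grep` above only lists the submirrors on the failed disk; their parent mirrors (here d10 and d20) sit on the `-m` lines of the same md.tab. A minimal sketch that pairs them up, assuming the one-line-per-metadevice `metastat -p` format with no backslash continuations; `mirror_map` is a hypothetical helper, and on Solaris 10 it needs `AWK=nawk` for the `-v` option.

```shell
# mirror_map (hypothetical helper): read md.tab / `metastat -p` output on stdin
# and print "mirror submirror slice" for every submirror that uses a slice of
# the disk given as $1 (for example c0t0d0).
# Set AWK=nawk on Solaris 10; the default /usr/bin/awk there is the old awk.
mirror_map() {
    "${AWK:-awk}" -v disk="$1" '
        /^#/ { next }                       # skip comment lines in md.tab
        # Mirror lines: "d10 -m d11 d12 1" -> remember the parent of each submirror
        $2 == "-m" { for (i = 3; i <= NF; i++) parent[$i] = $1; next }
        # Concat/stripe lines: "d11 1 1 c0t0d0s0" -> note slices on our disk
        { for (i = 2; i <= NF; i++) if (index($i, disk "s") == 1) slice[$1] = $i }
        # Only report metadevices that really are submirrors of some mirror
        END { for (d in slice) if (d in parent) print parent[d], d, slice[d] }
    '
}
```

Usage: `mirror_map "$disk" < /etc/lvm/md.tab`. The order of the output lines is not defined, so pipe through `sort` if it matters.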

Oops, I hope the Force is still with me...

# metastat d20
d20: Mirror
    Submirror 0: d21
      State: Needs maintenance
    Submirror 1: d22
      State: Okay
...
# metadetach -f d20 d21
d20: submirror d21 is detached
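The detach-then-clear pattern repeats for every submirror on the dead disk, so with more than a couple of them it can help to generate the command list first. A minimal dry-run sketch: `plan_detach` is a hypothetical helper that reads "mirror submirror" pairs and only prints commands, including the `metadetach -f` fallback that d21 needed above; it runs nothing.

```shell
# plan_detach (hypothetical helper): read "mirror submirror" pairs on stdin,
# e.g. "d10 d11", and print the teardown commands used in this post for each.
# It only prints; review the list before pasting it into a root shell.
plan_detach() {
    while read mirror sub rest; do
        [ -n "$sub" ] || continue           # skip blank or malformed lines
        echo "metadetach $mirror $sub"
        # An erred submirror refuses a plain detach, as d21 did above.
        echo "# if that fails with erred components: metadetach -f $mirror $sub"
        echo "metaclear $sub"
    done
}
```

Usage: `printf 'd10 d11\nd20 d21\n' | plan_detach`.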

I had such anger, but it's all good now. Let's continue.

# metaclear d11
d11: Concat/Stripe is cleared
# metaclear d21
d21: Concat/Stripe is cleared
...
# metadb | grep $disk
     a m  p  luo        16              8192            /dev/dsk/c0t0d0s7
     a    p  luo        8208            8192            /dev/dsk/c0t0d0s7
     a    p  luo        16400           8192            /dev/dsk/c0t0d0s7
# metadb -d ${disk}s7
# cfgadm -al | grep $disk
c0::dsk/c0t0d0                 disk         connected    configured   unknown
# cfgadm -c unconfigure c0::dsk/c0t0d0
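Two small checks around this step, sketched under the assumption that the `metadb` and `cfgadm -al` output looks like the lines above; `replicas_per_slice` and `ap_id` are hypothetical helper names. Counting replicas per slice before `metadb -d` shows how many will remain on the surviving disk, which matters because SVM keeps running only while at least half of the state database replicas are available. The second helper pulls the attachment point out of `cfgadm -al` instead of copying it by hand.

```shell
# Set AWK=nawk on Solaris 10; the default /usr/bin/awk there is the old awk.

# replicas_per_slice (hypothetical helper): read `metadb` output on stdin and
# print "slice count" for every slice holding state database replicas.
replicas_per_slice() {
    "${AWK:-awk}" '$NF ~ /^\/dev\/dsk\// { n[$NF]++ } END { for (s in n) print s, n[s] }'
}

# ap_id (hypothetical helper): read `cfgadm -al` output on stdin and print the
# attachment point of the disk given as $1, e.g. c0::dsk/c0t0d0.
ap_id() {
    "${AWK:-awk}" -v disk="$1" '$1 ~ ("::dsk/" disk "$") { print $1 }'
}
```

Usage: `metadb | replicas_per_slice`, and `cfgadm -c unconfigure "$(cfgadm -al | ap_id "$disk")"`.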

Now we can physically replace the failed disk.

# tail -f /var/adm/messages
...
May 23 12:30:20 solaris genunix: [ID 408114 kern.info] /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@0,0 (sd0) offline
May 23 12:30:27 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1 (mpt0):
May 23 12:30:27 solaris    mpt_handle_event_sync : SAS target 0 added.
May 23 12:30:27 solaris scsi: [ID 583861 kern.info] sd0 at mpt0: unit-address 0,0: target 0 lun 0
May 23 12:30:27 solaris genunix: [ID 936769 kern.info] sd0 is /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@0,0
May 23 12:30:28 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@0,0 (sd0):
May 23 12:30:28 solaris    Corrupt label - label checksum failed
May 23 12:30:28 solaris genunix: [ID 408114 kern.info] /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@0,0 (sd0) online

Alright, let's configure the new disk in.

# cfgadm -c configure c0::dsk/c0t0d0
# format $disk

c0t0d0: configured with capacity of 68.35GB
selecting c0t0d0
[disk formatted]
...
format> label
Ready to label disk, continue? y

format> quit

Time to rebuild our SVM mirror.

# metastat d20
...
Device Relocation Information:
Device   Reloc  Device ID
c0t1d0   Yes    id1,sd@n500000e016af27c0

# sourcedisk=c0t1d0
# prtvtoc /dev/rdsk/${sourcedisk}s2 | fmthard -s - /dev/rdsk/${disk}s2
# metadb -a -c3 ${disk}s7
# metainit d11
# metainit d21
# metattach d10 d11
d10: submirror d11 is attached
# metattach d20 d21
d20: submirror d21 is attached
# metadevadm -u $disk
Updating Solaris Volume Manager device relocation information for c0t0d0
Old device reloc information:
        id1,sd@n500000e0135e2dd0
New device reloc information:
        id1,sd@n500000e0135e2dd0

# installboot /usr/platform/$(uname -i)/lib/fs/ufs/bootblk /dev/rdsk/${disk}s0

# metastat -c
d20              m   4.0GB d22 d21 (resync-1%)
    d22          s   4.0GB c0t1d0s1
    d21          s   4.0GB c0t0d0s1
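Since `fmthard` writes whatever table it is given, it may be worth confirming that the new disk really ended up with the same slices as c0t1d0. A minimal sketch, assuming both `prtvtoc` outputs were saved to files first; `same_vtoc` is a hypothetical helper that ignores the `*` comment header (which names the device) and compares only the partition lines. A slice mounted directly rather than through a metadevice may show an extra mount point column on one disk, which would make the comparison fail.

```shell
# same_vtoc (hypothetical helper): compare two saved `prtvtoc` outputs,
# ignoring the "*" comment header, and succeed only if the partition
# tables are identical.
same_vtoc() {
    a=$(grep -v '^\*' "$1") || return 1     # no partition lines or unreadable file
    b=$(grep -v '^\*' "$2") || return 1
    [ "$a" = "$b" ]
}
```

Usage: save `prtvtoc /dev/rdsk/${sourcedisk}s2` and `prtvtoc /dev/rdsk/${disk}s2` to two files, then `same_vtoc old.vtoc new.vtoc && echo "VTOC matches"`.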

Once the resync is done we have our OS disks mirrored again.
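The resync takes a while. A minimal polling sketch, assuming the `(resync-NN%)` marker that `metastat -c` prints on mirror lines as shown above; `resync_status` is a hypothetical helper whose exit status is 0 while any mirror is still resyncing, so it can drive a `while` loop.

```shell
# resync_status (hypothetical helper): read `metastat -c` output on stdin,
# print "mirror percent" for each mirror still resyncing, and exit 0 while
# any resync is in progress (1 once none is left).
# Set AWK=nawk on Solaris 10; the default /usr/bin/awk there is the old awk.
resync_status() {
    "${AWK:-awk}" '
        $2 == "m" && match($0, /resync-[0-9]+%/) {
            # RSTART/RLENGTH cover "resync-NN%"; skip the 7-char "resync-" prefix
            print $1, substr($0, RSTART + 7, RLENGTH - 7)
            busy = 1
        }
        END { exit (busy ? 0 : 1) }
    '
}
```

Usage: `while metastat -c | resync_status; do sleep 300; done`.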
