Thursday, September 14, 2017

When fstrim trims all our SAN paths on SLES12

Let's venture into Linux land today. We had several unplanned outages over the last few weeks, and they all boiled down to... fstrim(8)

Sep 11 00:04:40 sles12db multipathd[2086]: sdai: mark as failed
Sep 11 00:04:40 sles12db multipathd[2086]: mpath-lvm--002: remaining active paths: 3
Sep 11 00:04:40 sles12db kernel: [...] sd 0:0:3:0: [sdai] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 11 00:04:40 sles12db kernel: [...] sd 0:0:3:0: [sdai] tag#0 Sense Key : Illegal Request [current]
Sep 11 00:04:40 sles12db kernel: [...] sd 0:0:3:0: [sdai] tag#0 ASC=0x27 <<vendor>>ASCQ=0xb0
Sep 11 00:04:40 sles12db kernel: [...] sd 0:0:3:0: [sdai] tag#0 CDB: Write same(16) 93 08 00 00 00 00 05 04 08 00 00 00 10 a8 00 00
Sep 11 00:04:40 sles12db kernel: [...] blk_update_request: 38 callbacks suppressed
Sep 11 00:04:40 sles12db kernel: [...] blk_update_request: I/O error, dev sdai, sector 84150272
Sep 11 00:04:40 sles12db kernel: [...] EXT4-fs (dm-17): discard request in group:321 block:0 count:533 failed with -5
...
Sep 11 00:04:41 sles12db multipathd[2086]: sdaf: mark as failed
Sep 11 00:04:41 sles12db multipathd[2086]: mpath-lvm--002: remaining active paths: 2
Sep 11 00:04:41 sles12db multipathd[2086]: sdz: mark as failed
Sep 11 00:04:41 sles12db multipathd[2086]: mpath-lvm--002: remaining active paths: 1
Sep 11 00:04:41 sles12db multipathd[2086]: sdac: mark as failed
Sep 11 00:04:41 sles12db multipathd[2086]: mpath-lvm--002: remaining active paths: 0
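The failing command in the kernel log is actually a discard in disguise: in the WRITE SAME(16) CDB, byte 1 is 0x08 (the UNMAP bit), bytes 2-9 are the starting LBA, and bytes 10-13 the block count. A quick decode of the CDB values copied from the log above:

```shell
# WRITE SAME(16) CDB from the log:
#   93 08 00 00 00 00 05 04 08 00 00 00 10 a8 00 00
#   op flags |------- LBA --------| |- blocks -|
lba=$((16#0000000005040800))    # starting LBA
blocks=$((16#000010a8))         # number of 512-byte blocks
echo "LBA=$lba blocks=$blocks"  # prints LBA=84150272 blocks=4264
```

The decoded LBA, 84150272, is exactly the sector reported in the blk_update_request I/O error, so the array is rejecting the discard itself (Illegal Request with a vendor-specific ASCQ), not some unrelated write.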

Oops. There goes our Oracle database filesystem...

We concluded that the HBA, the cables, the Fibre Channel switch and the storage system were fine, because only 3 out of 7 LUNs were affected and everything was back to normal after 4 seconds. The following message was suspicious, though.

Sep 11 00:04:40 sles12db fstrim[138568]: fstrim: /oracle/SID/oraarch: FITRIM ioctl failed: Input/output error

And guess what...

# systemctl list-timers
NEXT LEFT LAST PASSED UNIT
...
Mon 2017-09-18 00:00:00 CEST 5 days left Mon 2017-09-11 00:00:00 CEST 1 day 8h ago fstrim.timer
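So the weekly fstrim.timer fired at exactly the moment the paths dropped. Until the storage side is fixed, a quick stopgap (assuming the stock systemd fstrim.timer, as shown above) is to take it out of the schedule:

```shell
# Stop the timer now and keep it from coming back on boot:
systemctl disable --now fstrim.timer
# Verify it no longer shows up in the schedule:
systemctl list-timers fstrim.timer --all
```

This only suppresses the periodic trim; any filesystem mounted with the discard option will still send discards to the array on every extent free.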

Luckily, we were able to reproduce this rather quickly on a test server using filebench.

#  grep -Hv "zz" /sys/block/sdai/queue/discard_max_bytes
/sys/block/sdai/queue/discard_max_bytes:8183808
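For scale: discard_max_bytes of 8183808 means the kernel caps each discard request to this device at roughly 7.8 MiB. The WRITE SAME from the kernel log above (4264 blocks of 512 bytes, taken from its CDB) was well below that, so the host was not exceeding the device's advertised limit:

```shell
# Compare the failing request size with the per-request discard cap
# the kernel reports in sysfs (both values from the outputs above):
discard_max_bytes=8183808
req_bytes=$((4264 * 512))    # blocks-from-CDB * 512-byte sector size
echo "request=$req_bytes cap=$discard_max_bytes"   # request=2183168 ...
```

In other words, the request was about 2 MiB, comfortably within the cap, which again points at the array rather than the host's discard splitting.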

# mount -v
...
/dev/mapper/testvg-oraarch on /mnt/oraarch type ext4 (rw,relatime,discard,nobarrier,data=ordered)
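Note the discard mount option: besides the weekly fstrim, ext4 also issues discards online, on every extent free, which is what lets filebench trigger the problem below. While debugging, the online variant can be switched off without a remount cycle (sketch, using the test box's mount point):

```shell
# Drop online discard; only explicit fstrim runs will discard afterwards:
mount -o remount,nodiscard /mnt/oraarch
```

Either way the array still sees discards eventually, so this narrows the exposure rather than fixing anything.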

# cat fileserver.f
set $dir=/mnt/oraarch
set $nfiles=10000
set $meandirwidth=40
set $filesize=cvar(type=cvar-gamma,parameters=mean:16m;gamma:1.5)
set $nthreads=50
set $iosize=1m
set $meanappendsize=16k
set $runtime=600

define fileset name=bigfileset,path=$dir,size=$filesize,entries=$nfiles,dirwidth=$meandirwidth,prealloc=80

define process name=filereader,instances=1
{
  thread name=filereaderthread,memsize=10m,instances=$nthreads
  {
    flowop createfile name=createfile1,filesetname=bigfileset,fd=1
    flowop writewholefile name=wrtfile1,srcfd=1,fd=1,iosize=$iosize
    flowop closefile name=closefile1,fd=1
    flowop openfile name=openfile1,filesetname=bigfileset,fd=1
    flowop appendfilerand name=appendfilerand1,iosize=$meanappendsize,fd=1
    flowop closefile name=closefile2,fd=1
    flowop openfile name=openfile2,filesetname=bigfileset,fd=1
    flowop readwholefile name=readfile1,fd=1,iosize=$iosize
    flowop closefile name=closefile3,fd=1
    flowop deletefile name=deletefile1,filesetname=bigfileset
    flowop statfile name=statfile1,filesetname=bigfileset
  }
}

echo  "File-server Version 3.0 personality successfully loaded"

run $runtime
# ./bin/filebench -f fileserver.f
...
160.033: Failed to open file 9103, /mnt/oraarch/bigfileset/00000001/00000004/00000002/00000001/00000065/00000056/00000023/00000018, with status 10: Read-only file system
160.033: filereaderthread-39: flowop createfile1-1 failed
160.033: failed to create file createfile1
160.034: filereaderthread-1: flowop createfile1-1 failed
160.034: failed to create file createfile1
160.034: filereaderthread-35: flowop createfile1-1 failed
160.033: failed to create file createfile1
160.034: filereaderthread-44: flowop createfile1-1 failed
160.193: Run took 159 seconds...

# mount -v
...
/dev/mapper/testvg-oraarch on /mnt/oraarch type ext4 (ro,relatime,discard,nobarrier,data=ordered)
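Once ext4 has aborted like this it stays read-only until the filesystem is checked. A recovery sketch, with the device and mount point from the test setup:

```shell
# Unmount, replay the journal and check, then mount again:
umount /mnt/oraarch
e2fsck -f /dev/mapper/testvg-oraarch
mount /mnt/oraarch
```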

Affected so far: SUSE SLES 12 SP2 (kernel-default-4.4.59-92.24.2.x86_64) with a Fujitsu Eternus DX8700 S3 serving thin-provisioned storage.

Update 2017-11-06 -- And here is the official SUSE KB document: Read-only or corrupted filesystem after fstrim operation on Eternus DXM provided storage LUN.
