When fstrim trims all our SAN paths on SLES12
Let's venture into Linux land today. We had several unplanned outages over the last few weeks, and it all boiled down to... fstrim(8)
Sep 11 00:04:40 sles12db multipathd[2086]: sdai: mark as failed
Sep 11 00:04:40 sles12db multipathd[2086]: mpath-lvm--002: remaining active paths: 3
Sep 11 00:04:40 sles12db kernel: [...] sd 0:0:3:0: [sdai] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 11 00:04:40 sles12db kernel: [...] sd 0:0:3:0: [sdai] tag#0 Sense Key : Illegal Request [current]
Sep 11 00:04:40 sles12db kernel: [...] sd 0:0:3:0: [sdai] tag#0 ASC=0x27 <<vendor>>ASCQ=0xb0
Sep 11 00:04:40 sles12db kernel: [...] sd 0:0:3:0: [sdai] tag#0 CDB: Write same(16) 93 08 00 00 00 00 05 04 08 00 00 00 10 a8 00 00
Sep 11 00:04:40 sles12db kernel: [...] blk_update_request: 38 callbacks suppressed
Sep 11 00:04:40 sles12db kernel: [...] blk_update_request: I/O error, dev sdai, sector 84150272
Sep 11 00:04:40 sles12db kernel: [...] EXT4-fs (dm-17): discard request in group:321 block:0 count:533 failed with -5
...
Sep 11 00:04:41 sles12db multipathd[2086]: sdaf: mark as failed
Sep 11 00:04:41 sles12db multipathd[2086]: mpath-lvm--002: remaining active paths: 2
Sep 11 00:04:41 sles12db multipathd[2086]: sdz: mark as failed
Sep 11 00:04:41 sles12db multipathd[2086]: mpath-lvm--002: remaining active paths: 1
Sep 11 00:04:41 sles12db multipathd[2086]: sdac: mark as failed
Sep 11 00:04:41 sles12db multipathd[2086]: mpath-lvm--002: remaining active paths: 0
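The failure signature is easy to pick out once you know what to look for. A minimal sketch that filters an excerpt for failed discards (the sample lines are embedded here for illustration; on a real box you would pipe `journalctl -k` instead):

```shell
# Sample journal lines, abbreviated from the incident above.
log='sd 0:0:3:0: [sdai] tag#0 CDB: Write same(16) 93 08 ...
blk_update_request: I/O error, dev sdai, sector 84150272
EXT4-fs (dm-17): discard request in group:321 block:0 count:533 failed with -5'

# Count only the lines that point at failed discards:
# rejected WRITE SAME commands and ext4 discard errors (-5 = EIO).
hits=$(printf '%s\n' "$log" | grep -Ec 'Write same|discard request .* failed')
echo "$hits"
```

The rejected WRITE SAME(16) is what the kernel sends for a discard when the device advertises it; the storage answering with Illegal Request is what cascades into the path failures.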
Ooops. There goes our Oracle database filesystem...
We concluded that the HBAs, cables, Fibre Channel switches and the storage system itself were fine, because only 3 LUNs out of 7 were affected and everything was back to normal after 4 seconds. The following message was suspicious, though.
Sep 11 00:04:40 sles12db fstrim[138568]: fstrim: /oracle/SID/oraarch: FITRIM ioctl failed: Input/output error
And guess what...
# systemctl list-timers
NEXT                          LEFT         LAST                          PASSED        UNIT
...
Mon 2017-09-18 00:00:00 CEST  5 days left  Mon 2017-09-11 00:00:00 CEST  1 day 8h ago  fstrim.timer
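That weekly run comes from the stock fstrim.timer unit shipped with util-linux. Until the storage side is fixed, one workaround is simply to keep the timer from firing; a sketch (adjust to your own change policy):

```shell
# Stop the scheduled weekly discard and keep it from coming back.
systemctl disable --now fstrim.timer     # stop the timer and its next activation
systemctl mask fstrim.timer              # survive package updates re-enabling it
systemctl list-timers --all fstrim.timer # verify nothing is scheduled anymore
```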
Luckily we were able to reproduce this on a test server using filebench rather quickly.
# grep -Hv "zz" /sys/block/sdai/queue/discard_max_bytes
/sys/block/sdai/queue/discard_max_bytes:8183808

# mount -v
...
/dev/mapper/testvg-oraarch on /mnt/oraarch type ext4 (rw,relatime,discard,nobarrier,data=ordered)

# cat fileserver.f
set $dir=/mnt/oraarch
set $nfiles=10000
set $meandirwidth=40
set $filesize=cvar(type=cvar-gamma,parameters=mean:16m;gamma:1.5)
set $nthreads=50
set $iosize=1m
set $meanappendsize=16k
set $runtime=600

define fileset name=bigfileset,path=$dir,size=$filesize,entries=$nfiles,dirwidth=$meandirwidth,prealloc=80

define process name=filereader,instances=1
{
  thread name=filereaderthread,memsize=10m,instances=$nthreads
  {
    flowop createfile name=createfile1,filesetname=bigfileset,fd=1
    flowop writewholefile name=wrtfile1,srcfd=1,fd=1,iosize=$iosize
    flowop closefile name=closefile1,fd=1
    flowop openfile name=openfile1,filesetname=bigfileset,fd=1
    flowop appendfilerand name=appendfilerand1,iosize=$meanappendsize,fd=1
    flowop closefile name=closefile2,fd=1
    flowop openfile name=openfile2,filesetname=bigfileset,fd=1
    flowop readwholefile name=readfile1,fd=1,iosize=$iosize
    flowop closefile name=closefile3,fd=1
    flowop deletefile name=deletefile1,filesetname=bigfileset
    flowop statfile name=statfile1,filesetname=bigfileset
  }
}

echo "File-server Version 3.0 personality successfully loaded"
run $runtime

# ./bin/filebench -f fileserver.f
...
160.033: Failed to open file 9103, /mnt/oraarch/bigfileset/00000001/00000004/00000002/00000001/00000065/00000056/00000023/00000018, with status 10: Read-only file system
160.033: filereaderthread-39: flowop createfile1-1 failed
160.033: failed to create file createfile1
160.034: filereaderthread-1: flowop createfile1-1 failed
160.034: failed to create file createfile1
160.034: filereaderthread-35: flowop createfile1-1 failed
160.033: failed to create file createfile1
160.034: filereaderthread-44: flowop createfile1-1 failed
160.193: Run took 159 seconds...

# mount -v
...
/dev/mapper/testvg-oraarch on /mnt/oraarch type ext4 (ro,relatime,discard,nobarrier,data=ordered)
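ext4 filesystems are typically created with errors=remount-ro, so a burst of failed discards flips the filesystem read-only, which is exactly what filebench ran into. A quick, read-only check (a sketch) for that aftermath:

```shell
# List ext4 mounts that are currently read-only -- the usual result of
# errors=remount-ro after enough I/O errors. Fields in /proc/mounts:
# $1 device, $2 mountpoint, $3 fstype, $4 mount options.
ro_ext4=$(awk '$3 == "ext4" && $4 ~ /(^|,)ro(,|$)/ {print $2}' /proc/mounts)
printf '%s\n' "${ro_ext4:-none}"
```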
Software affected so far: SuSE SLES 12 SP2 with kernel-default-4.4.59-92.24.2.x86_64, on a Fujitsu Eternus DX8700 S3 with thin provisioned storage.
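If you want to check whether your own devices advertise discard support at all, and with what limits, the queue attributes are readable per block device. A sketch (device names vary; sdai above was just one of our paths):

```shell
# Print discard_max_bytes for every block device that exposes it.
# A value of 0 means the device does not accept discards at all.
for q in /sys/block/*/queue; do
  [ -r "$q/discard_max_bytes" ] || continue
  dev=${q%/queue}; dev=${dev##*/}
  printf '%-12s %s\n' "$dev" "$(cat "$q/discard_max_bytes")"
done
```

On multipath setups it is worth comparing all paths of one LUN: they should report identical values.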
Update 2017-11-06 -- And here is the official SuSE KB document: "Read-only or corrupted filesystem after fstrim operation on Eternus DXM provided storage LUN".