Wednesday, April 19, 2017

LDOMs with SR-IOV NET and SR-IOV FC

So we got this shiny new S7-2 (running Solaris 11.3 SRU 18 and firmware 9.7.5.b) for testing. It has two 16Gb QLogic Fibre Channel adapters and a rather weird onboard network/SAS controller bus assignment:

# ldm ls-io -l
...
/SYS/MB/NET0                               PCIE   pci_0    primary   OCC
[pci@300/pci@1/pci@0/pci@1]
    network@0
    network@0,1
    network@0,2
...
/SYS/MB/RISER3/PCIE4                       PCIE   pci_2    primary   OCC
[pci@302/pci@2/pci@0/pci@14]
    LSI,sas@0/iport@4
    LSI,sas@0/iport@8

Looks like a classic primary/secondary root domain setup is off the table: handing either bus to a second root domain would leave the primary without its onboard NICs (pci_0) or without the internal SAS controller (pci_2). So let's create some fully SR-IOV'd LDOMs instead.

First make sure IOV is on.

# ldm ls-io | grep BUS
NAME                                       TYPE   BUS      DOMAIN    STATUS
pci_0                                      BUS    pci_0    primary   IOV
pci_2                                      BUS    pci_2    primary   IOV

and that we have Physical Function devices already:

# ldm ls-io | grep PF
/SYS/MB/NET0/IOVNET.PF1                    PF     pci_0    primary
/SYS/MB/NET0/IOVNET.PF2                    PF     pci_0    primary
/SYS/MB/NET0/IOVNET.PF0                    PF     pci_0    primary
/SYS/MB/NET0/IOVNET.PF3                    PF     pci_0    primary

Hm? That's strange, no IOVFC devices?! A quick MOS search leads to "Unable To Create SR-IOV Virtual Instance For Qlogic 16 Gb Fibre Channel PCIe Universal Host Bus Adapter (Doc ID 1950167.1)".

Looks like we need newer HBA firmware! What's the part number of my QLogic Fibre Channel adapter again? Let's check via the ILOM:

-> show /System/PCI_Devices/Add-on/Device_3

 /System/PCI_Devices/Add-on/Device_3
    Targets:

    Properties:
        part_number = 7101674

Time to download "QLE8362 (7101674) SRIOV Flash Kit for Solaris FC Adapters".

# unzip Oracle_QLE8362_SRIOV_Flash_Kit_07.zip
...
# cd XXX 
# ./update_sol.sh .
Flashing Board Config Data...
Installation directory: /usr/lib/ssm/fwupdate/qlogic
Working dir: /var/tmp/sriov
Updating Board Config parameters of HBA instance 0 - QLE8362...
Success
... many many lines ...
Flash update complete. Changes have been saved to all ports of this HBA.
You must reboot in order for the changes to become effective.

Reboot time... Sooo? Do we have our IOVFC devices now?

# ldm ls-io | grep IOVFC
/SYS/MB/RISER3/PCIE3/IOVFC.PF0             PF     pci_0    primary
/SYS/MB/RISER3/PCIE3/IOVFC.PF1             PF     pci_0    primary
/SYS/MB/RISER1/PCIE1/IOVFC.PF0             PF     pci_2    primary
/SYS/MB/RISER1/PCIE1/IOVFC.PF1             PF     pci_2    primary
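
It doesn't hurt to confirm the new HBA firmware level from within Solaris as well. fcinfo(1M) prints per-port details including the firmware version; the field names below are the standard ones, the values are obviously placeholders:

# fcinfo hba-port | egrep 'HBA Port WWN|Model|Firmware Version|Driver Name'
HBA Port WWN: 21000024xxxxxxxx
        Model: QLE8362
        Firmware Version: 8.xx.xx
        Driver Name: qlc
...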

Yay! Let's start (you can skip the set-io iov=on commands below if IOV is already enabled on the buses; if not, that's how you enable it).

# ldm add-vds primary-vds0 primary
# ldm add-vcc port-range=5000-5100 primary-vcc0 primary
# ldm set-core 4 primary
# svcadm enable vntsd
# ldm start-reconf primary
# ldm set-mem 16g primary
# ldm set-io iov=on pci_0
# ldm set-io iov=on pci_2
# shutdown -y -g0 -i6

# ldm set-domain failure-policy=reset primary
# ldm add-spconfig initial
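
To double-check what the SP has saved (and which config it will use on the next power cycle), list the stored configurations. The exact markers vary a bit between LDoms manager versions, but it looks roughly like this:

# ldm list-spconfig
factory-default
initial [current]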

The network cables on net1/net2 are for the LDOMs (IPMP config). Yes, I know, we should add another NIC to the last free PCIe slot and use that for the second IPMP port... Anyway, let's match net1/net2 against the right IOVNET.PF? devices just to be sure:

# dladm show-phys | egrep 'net(1|2)'
net1              Ethernet             up         1000   full      i40e1
net2              Ethernet             up         1000   full      i40e2

# ldm ls-io | grep IOVNET.PF
/SYS/MB/NET0/IOVNET.PF1                    PF     pci_0    primary
/SYS/MB/NET0/IOVNET.PF2                    PF     pci_0    primary
...
# ldm ls-io -l /SYS/MB/NET0/IOVNET.PF1
NAME                                       TYPE   BUS      DOMAIN    STATUS
----                                       ----   ---      ------    ------
/SYS/MB/NET0/IOVNET.PF1                    PF     pci_0    primary
[pci@300/pci@1/pci@0/pci@1/network@0,1]
    maxvfs = 31

# grep pci@300/pci@1/pci@0/pci@1/network@0,1 /etc/path_to_inst
"/pci@300/pci@1/pci@0/pci@1/network@0,1" 1 "i40e"

So net1 is i40e1 and i40e1 is "/pci@300/pci@1/pci@0/pci@1/network@0,1", which is IOVNET.PF1. Same with net2:

# ldm ls-io -l /SYS/MB/NET0/IOVNET.PF2
NAME                                       TYPE   BUS      DOMAIN    STATUS
----                                       ----   ---      ------    ------
/SYS/MB/NET0/IOVNET.PF2                    PF     pci_0    primary
[pci@300/pci@1/pci@0/pci@1/network@0,2]
    maxvfs = 31

# grep pci@300/pci@1/pci@0/pci@1/network@0,2 /etc/path_to_inst
"/pci@300/pci@1/pci@0/pci@1/network@0,2" 2 "i40e"

# ldm create-vf -n 4 /SYS/MB/NET0/IOVNET.PF1
# ldm create-vf -n 4 /SYS/MB/NET0/IOVNET.PF2
# ldm ls-io | grep VF
...
/SYS/MB/NET0/IOVNET.PF1.VF0                VF     pci_0
/SYS/MB/NET0/IOVNET.PF2.VF0                VF     pci_0
...
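
Side note: per-VF network properties can also be handed to create-vf directly instead of via set-io later, e.g. a fixed MAC address or the port VLAN ID (MAC and pvid values made up):

# ldm create-vf mac-addr=00:14:4f:xx:xx:xx pvid=1234 /SYS/MB/NET0/IOVNET.PF1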

Create VFs on the QLogic Fibre Channel HBAs as well (each HBA is connected to the fabric via its first port):

# ldm create-vf -n 4 /SYS/MB/RISER3/PCIE3/IOVFC.PF0
# ldm create-vf -n 4 /SYS/MB/RISER1/PCIE1/IOVFC.PF0
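
Each FC VF gets its own (auto-generated) port and node WWN, and those are what you zone on the fabric and use for LUN masking on the array. They show up in the long listing; WWNs masked here and layout roughly from memory:

# ldm ls-io -l /SYS/MB/RISER3/PCIE3/IOVFC.PF0.VF0
NAME                                       TYPE   BUS      DOMAIN    STATUS
----                                       ----   ---      ------    ------
/SYS/MB/RISER3/PCIE3/IOVFC.PF0.VF0         VF     pci_0
    Class properties [FC]
        port-wwn = 10:00:xx:xx:xx:xx:xx:xx
        node-wwn = 20:00:xx:xx:xx:xx:xx:xx
...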

Make sure NPIV support is enabled on the Brocade switch port:

switch:admin> portcfgshow x/xx | grep NPIV
NPIV capability           ON
...
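
If it had been OFF: on the FOS releases I've touched it can be enabled per port with portcfgnpivport, something along these lines (double-check the syntax against your FOS version):

switch:admin> portcfgnpivport --enable x/xx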

Good: the primary is configured, the VFs are there, and two LUNs have been created (one for the OS, one for aux data). Time to create the first LDOM guest (we use port VLAN IDs for all LDOMs/zones, which is why we have to set the pvid).

# ldm add-dom ldg1
# ldm set-core 4 ldg1
# ldm set-mem 32g ldg1
# ldm set-domain master=primary ldg1

# ldm add-io /SYS/MB/NET0/IOVNET.PF1.VF0 ldg1
# ldm add-io /SYS/MB/NET0/IOVNET.PF2.VF0 ldg1
# ldm set-io pvid=1234 /SYS/MB/NET0/IOVNET.PF1.VF0
# ldm set-io pvid=1234 /SYS/MB/NET0/IOVNET.PF2.VF0

# ldm add-io /SYS/MB/RISER3/PCIE3/IOVFC.PF0.VF0 ldg1
# ldm add-io /SYS/MB/RISER1/PCIE1/IOVFC.PF0.VF0 ldg1

# ldm set-var auto-boot\?=false ldg1

# ldm add-vdsdev options=ro /net/installsrv.mycompany.com/export/.../sol-11_3_18_6_0-text-sparc.iso solaris@primary-vds0
# ldm add-vdisk solaris solaris@primary-vds0 ldg1

# ldm bind-domain ldg1
# ldm start-domain ldg1
# telnet localhost 5000
{0} ok devalias
solaris                  /virtual-devices@100/channel-devices@200/disk@0
...
{0} ok boot solaris
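
(For the record: probe-scsi-all at the OK prompt is a handy sanity check that the FC VFs actually see the SAN LUNs; I'd run it before booting the installer. Output omitted here.)

{0} ok probe-scsi-all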

Installation complete? Then we're almost done...

# ldm set-var auto-boot\?=true ldg1
# ldm set-var multipath-boot\?=true ldg1
# ldm set-domain boot-policy=enforce ldg1

The boot-device eeprom variable is interesting (I added the second path):

# ldm list-var boot-device ldg1
boot-device=/pci@300/pci@2/pci@0/pci@13/SUNW,qlc@0,2/fp@0,0/disk@w5006016xxxxxxxd6,0:a \
/pci@302/pci@1/pci@0/pci@11/SUNW,qlc@0,2/fp@0,0/disk@w5006016xxxxxxxd6,0:a

We're booting from LUN 0, and with multipath-boot?=true OBP will try both paths (one on each HBA). Neat.
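
One way to append the second path is ldm set-var from the control domain (the paths are simply space-separated, same elided WWNs as above; eeprom(1M) inside the guest works just as well):

# ldm set-var boot-device='/pci@300/pci@2/pci@0/pci@13/SUNW,qlc@0,2/fp@0,0/disk@w5006016xxxxxxxd6,0:a /pci@302/pci@1/pci@0/pci@11/SUNW,qlc@0,2/fp@0,0/disk@w5006016xxxxxxxd6,0:a' ldg1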

We could remove the solaris install vdisk now, and we should definitely run ldm add-spconfig again to save the current LDOM config. Enabling MPxIO and IPMP in ldg1 is still on the TODO list, too...
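
A minimal sketch of that cleanup (the spconfig name is made up). On the primary, once nothing in ldg1 uses the installer ISO anymore:

# ldm rm-vdisk solaris ldg1
# ldm rm-vdsdev solaris@primary-vds0
# ldm add-spconfig sriov-ldg1

And inside ldg1, MPxIO can be switched on for the qlc paths with stmsboot (it warns that a reboot is needed):

# stmsboot -e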


4 comments:

  1. How do you implement IPMP on ldg1?

    Replies
    1. I usually do it like this while logged in via telnet on the vconsole:

      # ipadm delete-ip net0

      # ipadm create-ip net0
      # ipadm create-ip net1
      # ipadm create-ipmp -i net0 -i net1 ipmp0

      # ipadm create-addr -T static -a 10.x.x.x/25 ipmp0/v4
      # route -p add default 10.x.x.x

      (make sure you do "ipadm create-ipmp -i net1 -i net0 ipmp0" in ldg2 to get some inbound load balancing, see ipmpstat(1M))

  2. Is SR-IOV reliable? We are planning to have 9 LDOMs with Solaris 10, no cluster, and one primary domain. Are there any limitations for SR-IOV?

