Monday, May 8, 2017

FreeBSD aesni(4) and openssl

I have to admit, I also like FreeBSD. The question came up whether you need to kldload aesni to speed up openssl (or any other application that uses libcrypto). The short answer is no. The long answer follows.

What is AES-NI again?

The new AES-NI instruction set is comprised of six new instructions that perform several compute intensive parts of the AES algorithm. These instructions can execute using significantly less clock cycles than a software solution.
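A quick way to check whether a box has these instructions at all is to look at the CPU feature flags the kernel prints at boot. This is only a sketch: it assumes the usual FreeBSD Features2=0x...<...,AESNI,...> line in /var/run/dmesg.boot, and cpu_has_aesni is a helper name made up for this example.

```shell
# Succeeds if the boot messages in file $1 list AESNI among the CPU features.
# (cpu_has_aesni is a hypothetical helper, not a FreeBSD tool.)
cpu_has_aesni() {
    grep -q 'Features2=.*AESNI' "$1"
}

# /var/run/dmesg.boot is where FreeBSD keeps the boot-time messages.
if cpu_has_aesni /var/run/dmesg.boot 2>/dev/null; then
    echo "CPU advertises AES-NI"
else
    echo "AES-NI not listed (or dmesg.boot not readable)"
fi
```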

So the AES-NI instructions are basically just more mnemonics like ADD, SUB, XOR, MOV, AND, etc. They are not privileged, so you can execute them directly from user space. And that's what OpenSSL/LibreSSL do, see aesni-x86_64.S.

.globl aesni_cbc_encrypt
...
aesni_cbc_encrypt:
...
.byte 102,15,56,220,209
...

And there's our AESENC opcode. It's in .byte notation because FreeBSD is still using a GPLv2 binutils that's too old for AES-NI mnemonics.

In Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2 we see that the AESENC instruction has the following opcode.

66 0F 38 DC /r
AESENC xmm1, xmm2/m128

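So ".byte 102,15,56,220,209" translates, after some decimal to hexadecimal conversion, to "0x66, 0x0F, 0x38, 0xDC, 0xD1". The first four bytes are exactly the AESENC opcode above, and the trailing 0xD1 is the ModRM byte (the /r) that selects the two XMM registers.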
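If you don't want to do the conversion in your head, a few lines of shell will do it. This is just a sketch with two made-up helper names (bytes_to_hex and modrm_fields). It prints the bytes in hex and splits the last one into the mod, reg and rm fields of a ModRM byte.

```shell
# Convert the decimal .byte values from aesni-x86_64.S to hex.
bytes_to_hex() {
    hex=$(printf '%02X ' "$@")
    echo "${hex% }"            # drop the trailing space
}

# Split a ModRM byte into its mod (2 bits), reg (3 bits) and rm (3 bits) fields.
modrm_fields() {
    echo "mod=$(( ($1 >> 6) & 3 )) reg=$(( ($1 >> 3) & 7 )) rm=$(( $1 & 7 ))"
}

bytes_to_hex 102 15 56 220 209   # prints: 66 0F 38 DC D1
modrm_fields 209                 # prints: mod=3 reg=2 rm=1
```

mod=3 means both operands are registers, reg=2 is xmm2 and rm=1 is xmm1, so this particular instruction is AESENC xmm2, xmm1 in Intel syntax.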

So there is no kernel stuff involved. What is aesni(4) used for then?

The aesni driver registers itself to accelerate AES operations for crypto(4).

In sys/crypto/aesni/aesni.c we see that the AES-NI kernel module registers itself as a synchronous, hardware capable driver for the following ciphers (see also crypto(9)).

static int
aesni_attach(device_t dev)
{
 struct aesni_softc *sc;
...
 sc->cid = crypto_get_driverid(dev, CRYPTOCAP_F_HARDWARE |
     CRYPTOCAP_F_SYNC);
...
 crypto_register(sc->cid, CRYPTO_AES_CBC, 0, 0);
 crypto_register(sc->cid, CRYPTO_AES_ICM, 0, 0);
 crypto_register(sc->cid, CRYPTO_AES_NIST_GCM_16, 0, 0);
 crypto_register(sc->cid, CRYPTO_AES_128_NIST_GMAC, 0, 0);
 crypto_register(sc->cid, CRYPTO_AES_192_NIST_GMAC, 0, 0);
 crypto_register(sc->cid, CRYPTO_AES_256_NIST_GMAC, 0, 0);
 crypto_register(sc->cid, CRYPTO_AES_XTS, 0, 0);
...

So the aesni kernel module provides crypto services through the OpenCrypto framework, both to user space via crypto(4) (that's the /dev/crypto device) and to kernel space via crypto(9) (think of GELI and IPsec).

Still not convinced that openssl is not using aesni(4)? Let's fire up DTrace just to be sure...

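For the first test, no aesni(4) kernel module is loaded. We run openssl speed with -evp so that it goes through the EVP interface and actually uses hardware crypto (you'll see it calling libcrypto's aesni_cbc_encrypt function we talked about earlier). Notice there is almost no user/kernel space boundary crossing. The throughput in the first run is much lower than in the second, most likely because the pid probe fires on every single aesni_cbc_encrypt call.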

# kldstat 
Id Refs Address            Size     Name
 1    6 0xffffffff80200000 1fa7c38  kernel
 2    1 0xffffffff82219000 249d     ulpt.ko
 3    1 0xffffffff8221c000 adec     tmpfs.ko

# kldload dtraceall

# dtrace -n 'pid$target:libcrypto.so.8:*aesni*:entry { @[probefunc] = count(); }' \
  -c "openssl speed -elapsed -evp aes-128-cbc"
...
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc       5973.14k    23210.84k    84090.94k   238951.75k   543361.71k
...
  aesni_cbc_encrypt                                           4105425

# dtrace -n 'fbt:kernel:copy*:entry /pid == $target/ { @bytes[probefunc] = quantize(arg2); }' \
  -c "openssl speed -elapsed -evp aes-128-cbc"
...
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     602153.10k   640760.18k   643890.94k   660515.73k   666079.99k
...
  copyin                                            
           value  ------------- Distribution ------------- count    
               4 |                                         0        
               8 |@                                        2        
              16 |@@@@@@@@@@@@                             16       
              32 |@@@@@@@@@@@@@@@@                         22       
              64 |@@@                                      4        
             128 |                                         0        
             256 |@@@@                                     5        
             512 |@@@@                                     5        
            1024 |                                         0        

  copyinstr                                         
           value  ------------- Distribution ------------- count    
             512 |                                         0        
            1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 6        
            2048 |                                         0        

  copyout                                           
           value  ------------- Distribution ------------- count    
               4 |                                         0        
               8 |@                                        1        
              16 |@@@@@@@@@                                15       
              32 |@@@@@@@@                                 14       
              64 |@                                        2        
             128 |@@@@@@@@@@@@                             21       
             256 |@@@                                      5        
             512 |@@@                                      6        
            1024 |                                         0        
            2048 |@@@                                      5        
            4096 |                                         0      

Let's load the aesni(4) kernel module now and check if any kernel aesni probes fire.

# kldload aesni
aesni0: <AES-CBC,AES-XTS,AES-GCM,AES-ICM> on motherboard

# dtrace -n 'fbt:aesni::entry /pid == $target/ { @[probefunc] = count(); }' \
  -c "openssl speed -elapsed -evp aes-128-cbc"
...
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     616989.41k   668602.69k   670494.73k   673865.32k   679105.88k
...

No probes fired. That means openssl is not using the aesni kernel functions at all.
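Another cross-check, which was not part of the test run above, is to hide AES-NI from OpenSSL and see the numbers drop. OpenSSL documents the OPENSSL_ia32cap environment variable for this: bit 57 of its capability vector is AES-NI and bit 33 is PCLMULQDQ. The sketch below only builds the mask (aesni_mask is a made-up helper name). The openssl invocation is left as a comment because I have no output for it to show.

```shell
# Build the mask for CPUID.1:ECX bits 25 (AES-NI) and 1 (PCLMULQDQ).
# OpenSSL keeps ECX in the upper 32 bits of its capability vector, so the
# bit positions are 32+25 = 57 and 32+1 = 33.
aesni_mask() {
    printf '0x%x\n' $(( (1 << 57) | (1 << 33) ))
}

aesni_mask   # prints: 0x200000200000000

# Hypothetical comparison run with the AES-NI code path masked off:
# OPENSSL_ia32cap="~$(aesni_mask)" openssl speed -elapsed -evp aes-128-cbc
```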

BUT we can make openssl use aesni(4) through the cryptodev engine (-engine cryptodev). Let's check whether the kernel aesni probes and the user/kernel space boundary crossing probes fire now.

# kldload cryptodev
# openssl engine -c -tt
(cryptodev) BSD cryptodev engine
 [RSA, DSA, DH, AES-128-CBC, AES-192-CBC, AES-256-CBC]
     [ available ]
...

# dtrace -n 'fbt:aesni::entry /pid == $target/ { @[probefunc] = count(); }' \
  -c "openssl speed -elapsed -evp aes-128-cbc"
...
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc       4056.66k    15608.91k    57565.04k   183731.20k   492147.10k
...
  aesni_cipher_setup_common                                         8
  aesni_freesession                                                 8
  aesni_newsession                                                  8
  aesni_cipher_alloc                                          3036256
  aesni_encrypt_cbc                                           3036256
  aesni_process                                               3036256

# dtrace -n 'fbt:kernel:copy*:entry /pid == $target/ { @bytes[probefunc] = quantize(arg2); }' \
  -c "openssl speed -elapsed -evp aes-128-cbc"
...
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc       3602.78k    13827.67k    51851.57k   168562.00k   472572.78k
...
  copyinstr                                         
           value  ------------- Distribution ------------- count    
             512 |                                         0        
            1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 6        
            2048 |                                         0        

  copyout                                           
           value  ------------- Distribution ------------- count    
               2 |                                         0        
               4 |                                         8        
               8 |                                         1        
              16 |@@@@@                                    675536   
              32 |@@@@@@@@@@@@@@@@@@@@                     2603253  
              64 |@@@@@                                    649862   
             128 |                                         21       
             256 |@@@@@                                    609223   
             512 |                                         6        
            1024 |@@@@                                     495120   
            2048 |                                         5        
            4096 |                                         0        
            8192 |@                                        173512   
           16384 |                                         0        

  copyin                                            
           value  ------------- Distribution ------------- count    
               2 |                                         0        
               4 |                                         15       
               8 |                                         3        
              16 |@@@@@@@@@@@@@@@@@                        3278781  
              32 |@@@@@@@@@@@@@                            2603265  
              64 |@@@                                      649864   
             128 |                                         0        
             256 |@@@                                      609223   
             512 |                                         5        
            1024 |@@@                                      495120   
            2048 |                                         0        
            4096 |                                         0        
            8192 |@                                        173512   
           16384 |                                         0  

That's weird: the cryptodev engine seems to be on by default as soon as the cryptodev kernel module is loaded, even though we never passed -engine cryptodev. Also note the huge amount of user/kernel space boundary crossing: millions of copyin and copyout calls instead of a few dozen. That's a lot of data to be copied from user space to kernel space and back again for nothing. Mental note here: don't ever load the cryptodev kernel module unless you are using a hifn(4), safe(4) or ubsec(4) crypto accelerator.
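To put a number on it, here is a rough comparison of the two fbt:aesni runs above (aesni loaded without cryptodev versus aesni loaded with cryptodev). Treat it as a ballpark only: both runs were traced with DTrace, and in the cryptodev run the probes actually fired, which costs some extra time. The slowdown helper is a made-up name for this sketch.

```shell
# Ratio between two "openssl speed" throughput figures (1000s of bytes/s).
slowdown() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.1f\n", fast / slow }'
}

slowdown 616989.41 4056.66     # 16 byte blocks:   prints 152.1
slowdown 679105.88 492147.10   # 8192 byte blocks: prints 1.4
```

The small block sizes suffer most because every 16 byte operation pays for a full round trip through the kernel.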
