FreeBSD aesni(4) and openssl
I have to admit, I also like FreeBSD. Someone asked whether you need to kldload aesni to speed up openssl (or any application that uses libcrypto). The short answer is no. The long answer...
What is AES-NI again?
The new AES-NI instruction set is comprised of six new instructions that perform several compute intensive parts of the AES algorithm. These instructions can execute using significantly less clock cycles than a software solution.
So the AES-NI instructions are basically just more mnemonics like ADD, SUB, XOR, MOV, AND, etc. You can just use them from user space. And that's what OpenSSL/LibreSSL does, see aesni-x86_64.S.
    .globl  aesni_cbc_encrypt
    ...
    aesni_cbc_encrypt:
    ...
    .byte   102,15,56,220,209
    ...
And there's our AESENC opcode. It's in .byte notation because FreeBSD is still using a GPLv2 binutils that's too old for AES-NI mnemonics.
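Because these are ordinary unprivileged instructions, the only thing a user-space program has to find out is whether the CPU supports them. Here is a minimal sketch of such a check, assuming GCC or Clang and their <cpuid.h> helper header. The function name cpu_has_aesni is made up for illustration, and this is not how OpenSSL itself performs the check. CPUID leaf 1 reports AES-NI support in bit 25 of ECX.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */
#endif

/* CPUID leaf 1 reports AES-NI support in bit 25 of ECX. */
#define CPUID_ECX_AESNI (1u << 25)

/*
 * Hypothetical helper: returns 1 if the CPU advertises AES-NI,
 * 0 if it does not, and -1 when not built for x86.
 */
int cpu_has_aesni(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 if the requested leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx & CPUID_ECX_AESNI) != 0;
#else
    return -1;
#endif
}
```

No system call and no kernel module is involved here: CPUID is itself an unprivileged instruction.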
In the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2, we see that the AESENC instruction has the following opcode.
66 0F 38 DC /r
AESENC xmm1, xmm2/m128
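So ".byte 102,15,56,220,209" translates, with some decimal to hexadecimal conversion, to "0x66, 0x0F, 0x38, 0xDC, 0xD1". The first four bytes are just the opcode for AESENC above, and the trailing 0xD1 is the ModRM byte (the /r in the opcode listing) that selects the register operands.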
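The conversion is easy to check by hand, or with a few lines of C. The sketch below prints the five bytes in hexadecimal and splits the last one, 209 (0xD1), into its ModRM fields. The names format_hex and modrm_decode are made up for illustration and do not come from OpenSSL or binutils.

```c
#include <stdint.h>
#include <stdio.h>

/* Fields of an x86 ModRM byte: mod (2 bits), reg (3 bits), rm (3 bits). */
struct modrm {
    unsigned mod;
    unsigned reg;
    unsigned rm;
};

/* Split a ModRM byte into its three bit fields. */
struct modrm modrm_decode(uint8_t byte)
{
    struct modrm m;

    m.mod = (byte >> 6) & 0x3;
    m.reg = (byte >> 3) & 0x7;
    m.rm  = byte & 0x7;
    return m;
}

/*
 * Write the bytes as space-separated uppercase hex into out.
 * Returns the number of characters written, or -1 if out is too small.
 */
int format_hex(const uint8_t *bytes, size_t n, char *out, size_t outlen)
{
    size_t used = 0;

    if (outlen == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(out + used, outlen - used, "%s%02X",
                         i ? " " : "", (unsigned)bytes[i]);
        if (w < 0 || (size_t)w >= outlen - used)
            return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

Fed the array {102, 15, 56, 220, 209}, format_hex produces "66 0F 38 DC D1". Decoding 0xD1 gives mod = 3 (register-to-register), reg = 2 and rm = 1, so this particular AESENC uses xmm2 as the destination and xmm1 as the source.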
So there is no kernel stuff involved. What is aesni(4) used for then?
The aesni driver registers itself to accelerate AES operations for crypto(4).
In sys/crypto/aesni/aesni.c we see that the AES-NI kernel module registers itself as hardware capable for the following ciphers (see also crypto(9)).
    static int
    aesni_attach(device_t dev)
    {
        struct aesni_softc *sc;
        ...
        sc->cid = crypto_get_driverid(dev, CRYPTOCAP_F_HARDWARE |
            CRYPTOCAP_F_SYNC);
        ...
        crypto_register(sc->cid, CRYPTO_AES_CBC, 0, 0);
        crypto_register(sc->cid, CRYPTO_AES_ICM, 0, 0);
        crypto_register(sc->cid, CRYPTO_AES_NIST_GCM_16, 0, 0);
        crypto_register(sc->cid, CRYPTO_AES_128_NIST_GMAC, 0, 0);
        crypto_register(sc->cid, CRYPTO_AES_192_NIST_GMAC, 0, 0);
        crypto_register(sc->cid, CRYPTO_AES_256_NIST_GMAC, 0, 0);
        crypto_register(sc->cid, CRYPTO_AES_XTS, 0, 0);
        ...
So the aesni kernel module provides crypto services through the OpenCrypto framework, both to user space (that's the /dev/crypto device) via crypto(4) and to kernel space (think of GELI and IPsec) via crypto(9).
Still not convinced that openssl is not using aesni(4)? Let's fire up DTrace just to be sure...
For the first test, no aesni(4) kernel module is loaded. We use openssl with -evp to make sure we're actually using hardware crypto (and you'll see it's calling libcrypto's aesni_cbc_encrypt function we talked about earlier). Notice there is almost no user/kernel space boundary crossing.
    # kldstat
    Id Refs Address            Size     Name
     1    6 0xffffffff80200000 1fa7c38  kernel
     2    1 0xffffffff82219000 249d     ulpt.ko
     3    1 0xffffffff8221c000 adec     tmpfs.ko
    # kldload dtraceall
    # dtrace -n 'pid$target:libcrypto.so.8:*aesni*:entry { @[probefunc] = count(); }' \
        -c "openssl speed -elapsed -evp aes-128-cbc"
    ...
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc       5973.14k    23210.84k    84090.94k   238951.75k   543361.71k
    ...
      aesni_cbc_encrypt                                           4105425
    # dtrace -n 'fbt:kernel:copy*:entry /pid == $target/ { @bytes[probefunc] = quantize(arg2); }' \
        -c "openssl speed -elapsed -evp aes-128-cbc"
    ...
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc     602153.10k   640760.18k   643890.94k   660515.73k   666079.99k
    ...
      copyin
               value  ------------- Distribution ------------- count
                   4 |                                         0
                   8 |@                                        2
                  16 |@@@@@@@@@@@@                             16
                  32 |@@@@@@@@@@@@@@@@                         22
                  64 |@@@                                      4
                 128 |                                         0
                 256 |@@@@                                     5
                 512 |@@@@                                     5
                1024 |                                         0

      copyinstr
               value  ------------- Distribution ------------- count
                 512 |                                         0
                1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 6
                2048 |                                         0

      copyout
               value  ------------- Distribution ------------- count
                   4 |                                         0
                   8 |@                                        1
                  16 |@@@@@@@@@                                15
                  32 |@@@@@@@@                                 14
                  64 |@                                        2
                 128 |@@@@@@@@@@@@                             21
                 256 |@@@                                      5
                 512 |@@@                                      6
                1024 |                                         0
                2048 |@@@                                      5
                4096 |                                         0
Let's load the aesni(4) kernel module now and check if any kernel aesni probes fire.
    # kldload aesni
    aesni0: <AES-CBC,AES-XTS,AES-GCM,AES-ICM> on motherboard
    # dtrace -n 'fbt:aesni::entry /pid == $target/ { @[probefunc] = count(); }' \
        -c "openssl speed -elapsed -evp aes-128-cbc"
    ...
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc     616989.41k   668602.69k   670494.73k   673865.32k   679105.88k
    ...
No probes fired. That means openssl is not using the aesni kernel functions at all.
BUT, we can make openssl use aesni(4) with -engine cryptodev. Let's check whether the kernel aesni probes and the user/kernel space boundary crossing probes fire now.
    # kldload cryptodev
    # openssl engine -c -tt
    (cryptodev) BSD cryptodev engine
     [RSA, DSA, DH, AES-128-CBC, AES-192-CBC, AES-256-CBC]
         [ available ]
    ...
    # dtrace -n 'fbt:aesni::entry /pid == $target/ { @[probefunc] = count(); }' \
        -c "openssl speed -elapsed -evp aes-128-cbc"
    ...
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc       4056.66k    15608.91k    57565.04k   183731.20k   492147.10k
    ...
      aesni_cipher_setup_common                                         8
      aesni_freesession                                                 8
      aesni_newsession                                                  8
      aesni_cipher_alloc                                          3036256
      aesni_encrypt_cbc                                           3036256
      aesni_process                                               3036256
    # dtrace -n 'fbt:kernel:copy*:entry /pid == $target/ { @bytes[probefunc] = quantize(arg2); }' \
        -c "openssl speed -elapsed -evp aes-128-cbc"
    ...
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc       3602.78k    13827.67k    51851.57k   168562.00k   472572.78k
    ...
      copyinstr
               value  ------------- Distribution ------------- count
                 512 |                                         0
                1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 6
                2048 |                                         0

      copyout
               value  ------------- Distribution ------------- count
                   2 |                                         0
                   4 |                                         8
                   8 |                                         1
                  16 |@@@@@                                    675536
                  32 |@@@@@@@@@@@@@@@@@@@@                     2603253
                  64 |@@@@@                                    649862
                 128 |                                         21
                 256 |@@@@@                                    609223
                 512 |                                         6
                1024 |@@@@                                     495120
                2048 |                                         5
                4096 |                                         0
                8192 |@                                        173512
               16384 |                                         0

      copyin
               value  ------------- Distribution ------------- count
                   2 |                                         0
                   4 |                                         15
                   8 |                                         3
                  16 |@@@@@@@@@@@@@@@@@                        3278781
                  32 |@@@@@@@@@@@@@                            2603265
                  64 |@@@                                      649864
                 128 |                                         0
                 256 |@@@                                      609223
                 512 |                                         5
                1024 |@@@                                      495120
                2048 |                                         0
                4096 |                                         0
                8192 |@                                        173512
               16384 |                                         0
That's weird, -engine cryptodev seems to be on by default as soon as we load the cryptodev kernel module. Also note the huge number of user/kernel space boundary crossings. That's a lot of data to be copied from user space to kernel space and back again for nothing. Mental note here: don't ever load the cryptodev kernel module unless you're using a hifn(4), safe(4) or ubsec(4) crypto accelerator.
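To put a number on that overhead, the two runs traced with the copy* probes can be compared size by size. This is a small sketch using only figures printed in this post; the array and function names are made up for illustration. Because the DTrace probes fire millions of times in the cryptodev run and only a handful of times in the other, the ratios likely overstate the gap somewhat.

```c
/*
 * Throughput in thousands of bytes per second, as printed by
 * "openssl speed", for block sizes 16, 64, 256, 1024 and 8192 bytes.
 * Both rows come from the runs traced with fbt:kernel:copy*:entry.
 */
static const double userspace_k[] = {
    602153.10, 640760.18, 643890.94, 660515.73, 666079.99
};
static const double cryptodev_k[] = {
    3602.78, 13827.67, 51851.57, 168562.00, 472572.78
};

/* How many times slower the measured run is than the baseline. */
double slowdown(double baseline, double measured)
{
    return baseline / measured;
}
```

slowdown(userspace_k[0], cryptodev_k[0]) comes out at roughly 167 for 16-byte blocks, while for 8192-byte blocks the ratio is only about 1.4. That fits the picture of a fixed per-call cost for crossing into the kernel dominating small requests.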