
On Wed, Aug 14, 2024 at 09:42:08AM -0400, Raymond Mao wrote:
Hi Tom,
On Fri, 2 Aug 2024 at 11:34, Raymond Mao <raymond.mao@linaro.org> wrote:
Hi Tom,
On Thu, 1 Aug 2024 at 16:46, Tom Rini <trini@konsulko.com> wrote:
On Wed, Jul 31, 2024 at 10:25:10AM -0700, Raymond Mao wrote:
Integrate MbedTLS v3.6 LTS (currently v3.6.0) with U-Boot.
Motivations:
- MbedTLS is well maintained with LTS versions.
- LWIP is integrated with MbedTLS, which makes it easy to enable HTTPS.
- MbedTLS recently switched license back to GPLv2.
Prerequisite:
This patch series requires the mbedtls git repo to be added as a subtree to the main U-Boot repo via:
    $ git subtree add --prefix lib/mbedtls/external/mbedtls \
          https://github.com/Mbed-TLS/mbedtls.git \
          v3.6.0 --squash
Moreover, because the mbedtls git repo contains Windows-style files, we need to convert the CRLF line endings to LF and commit manually:
    $ git add --renormalize .
    $ git commit
New Kconfig options:
- `MBEDTLS_LIB` is the general MbedTLS switch.
- `MBEDTLS_LIB_CRYPTO` replaces the original digest and crypto libs with MbedTLS.
- `MBEDTLS_LIB_X509` replaces the original X509, PKCS7, MSCode, ASN1, and Pubkey parsers with MbedTLS.
- `LEGACY_CRYPTO` is introduced as the main switch for the legacy crypto library.
- `LEGACY_CRYPTO_BASIC` is for the basic crypto functionality and `LEGACY_CRYPTO_CERT` is for the certificate-related functionality.
For each algorithm, a pair of `<alg>_LEGACY` and `<alg>_MBEDTLS` Kconfig options is introduced. `SPL_` Kconfig options are introduced as well.
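To make the `<alg>_LEGACY` / `<alg>_MBEDTLS` pairing concrete, here is a minimal C sketch of how code might branch on such a pair at build time. SHA256 is only an example algorithm, the CONFIG_ symbols are defined by hand in place of the Kconfig-generated header, and treating the two options as mutually exclusive is an assumption. Real U-Boot code would more likely test these with CONFIG_IS_ENABLED(), which also resolves the `SPL_` variants; plain `#if defined()` keeps the sketch standalone.

```c
/*
 * Stand-ins for Kconfig-generated symbols (hand-defined for this sketch).
 * The <alg>_LEGACY / <alg>_MBEDTLS naming follows the cover letter;
 * SHA256 is just an example algorithm.
 */
#define CONFIG_SHA256_MBEDTLS 1
/* #define CONFIG_SHA256_LEGACY 1 */

/* Assumption: the two options of a pair are alternatives within one build. */
#if defined(CONFIG_SHA256_LEGACY) && defined(CONFIG_SHA256_MBEDTLS)
#error "SHA256_LEGACY and SHA256_MBEDTLS must not both be enabled"
#endif

enum sha256_impl { SHA256_NONE, SHA256_LEGACY, SHA256_MBEDTLS };

/* Report which implementation this build would link against. */
enum sha256_impl sha256_selected_backend(void)
{
#if defined(CONFIG_SHA256_MBEDTLS)
	return SHA256_MBEDTLS;
#elif defined(CONFIG_SHA256_LEGACY)
	return SHA256_LEGACY;
#else
	return SHA256_NONE;
#endif
}
```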
In this patch set, MBEDTLS_LIB, MBEDTLS_LIB_CRYPTO and MBEDTLS_LIB_X509 are enabled by default in qemu_arm64_defconfig and sandbox_defconfig for testing purposes.
Patches for external MbedTLS project:
Since U-Boot uses Microsoft Authentication Code to verify PE/COFF executables, which MbedTLS does not support at the moment, additional patches for MbedTLS have been created to adapt it to the EFI loader:
- Decoding of Microsoft Authentication Code.
- Decoding of PKCS#9 Authenticated Attributes.
- Extending the MbedTLS PKCS#7 lib to support multiple signers' certificates.
- MbedTLS native test suites for PKCS#7 signer's info.
All four patches above (tagged with `mbedtls/external`) have been submitted to the MbedTLS project and are under review; eventually they should become part of an MbedTLS LTS release. Until then, please merge them into U-Boot, otherwise the build will break when MBEDTLS_LIB_X509 is enabled.
See the PR below for reference: https://github.com/Mbed-TLS/mbedtls/pull/9001
Miscellaneous:
Optimized the MbedTLS library size by tailoring the config file and disabling all features the EFI loader does not need. From v2, the original libs (rsa, asn1_decoder, rsa_helper, md5, sha1, sha256, sha512) are completely replaced when MbedTLS is enabled. From v3, the size growth is slightly reduced by refactoring the hash functions.
Target (QEMU arm64) size growth when enabling MbedTLS:
- v1: 6.03%
- v2: 4.66%
- From v3: 4.55%
Please see the latest buildman output for the size growth on QEMU arm64, Sandbox and Nanopi A64. [1]
Let us inline the growth on qemu_arm64 for a moment:

    aarch64: (for 1/1 boards) all +6916.0 bss -32.0 data -64.0 rodata +200.0 text +6812.0
       qemu_arm64     : all +6916 bss -32 data -64 rodata +200 text +6812
          u-boot: add: 28/-17, grow: 12/-16 bytes: 15492/-8304 (7188)
    function                             old     new   delta
    mbedtls_internal_sha1_process          -    4540   +4540
    mbedtls_internal_md5_process           -    2928   +2928
    mbedtls_internal_sha256_process        -    2052   +2052
    mbedtls_internal_sha512_process        -    1056   +1056
    K                                      -     896    +896
    mbedtls_sha512_finish                  -     556    +556
    mbedtls_sha256_finish                  -     484    +484
    mbedtls_sha1_finish                    -     420    +420
    mbedtls_sha512_starts                  -     340    +340
    mbedtls_md5_finish                     -     336    +336
    mbedtls_sha512_update                  -     264    +264
    mbedtls_sha256_update                  -     252    +252
    mbedtls_sha1_update                    -     236    +236
    mbedtls_md5_update                     -     236    +236
    mbedtls_sha512                         -     148    +148
    mbedtls_sha256_starts                  -     124    +124
    hash_init_sha512                      52     128     +76
    hash_init_sha256                      52     128     +76
    mbedtls_sha1_starts                    -      72     +72
    mbedtls_md5_starts                     -      60     +60
    hash_init_sha1                        52     112     +60
    mbedtls_platform_zeroize               -      56     +56
    mbedtls_sha512_free                    -      16     +16
    mbedtls_sha256_free                    -      16     +16
    mbedtls_sha1_free                      -      16     +16
    mbedtls_md5_free                       -      16     +16
    hash_finish_sha512                    72      88     +16
    hash_finish_sha256                    72      88     +16
    hash_finish_sha1                      72      88     +16
    sha512_csum_wd                        68      80     +12
    sha256_csum_wd                        68      80     +12
    sha1_csum_wd                          68      80     +12
    md5_wd                                68      80     +12
    mbedtls_sha512_init                    -      12     +12
    mbedtls_sha256_init                    -      12     +12
    mbedtls_sha1_init                      -      12     +12
    mbedtls_md5_init                       -      12     +12
    memset_func                            -       8      +8
    sha512_update                          4       8      +4
    sha384_update                          4       8      +4
    sha256_update                         12       8      -4
    sha1_update                           12       8      -4
    sha256_process                        16       -     -16
    sha1_process                          16       -     -16
    hash_update_sha512                    36      16     -20
    hash_update_sha256                    36      16     -20
    hash_update_sha1                      36      16     -20
    MD5Init                               56      36     -20
    sha1_starts                           60      36     -24
    hash_update_sha384                    36       -     -36
    hash_init_sha384                      52       -     -52
    sha384_csum_wd                        68      12     -56
    sha256_starts                        104      40     -64
    sha256_padding                        64       -     -64
    sha1_padding                          64       -     -64
    hash_finish_sha384                    72       -     -72
    sha512_finish                        152      36    -116
    sha512_starts                        168      40    -128
    sha384_starts                        168      40    -128
    sha384_finish                        152       4    -148
    MD5Final                             196      44    -152
    sha512_base_do_finalize              160       -    -160
    static.sha256_update                 228       -    -228
    static.sha1_update                   240       -    -240
    sha512_base_do_update                244       -    -244
    MD5Update                            260       -    -260
    sha1_finish                          300      36    -264
    sha256_finish                        404      36    -368
    sha256_armv8_ce_process              428       -    -428
    sha1_armv8_ce_process                484       -    -484
    sha512_K                             640       -    -640
    sha512_block_fn                     1212       -   -1212
    MD5Transform                        2552       -   -2552
And to start with, that's not bad. In fact, tossing LTO in before mbedTLS only changes the top line a little:

    aarch64: (for 1/1 boards) all +5120.0 bss -16.0 data -64.0 rodata +200.0 text +5000.0
       qemu_arm64     : all +5120 bss -16 data -64 rodata +200 text +5000
          u-boot: add: 19/-18, grow: 11/-7 bytes: 14696/-7884 (6812)
But is there still something we can do? mbedTLS is a more robust solution and I accept there will be growth, but the process/start/finish functions are still much larger. Is there something configurable there?
I have investigated all of the large MbedTLS native functions (_process/_update/_finish). For MD5 and SHA1, there are no tunable configs. For SHA256 and SHA512, there are a few configs:
1. Performance configs, only for Armv8/a64. I didn't turn them on; they might affect the target size as well.
2. A smaller implementation with lower size (only for non-Armv8/a64), at the expense of performance.
I didn't enable either of them: #1 is more about performance and might increase the target size, while #2 compromises performance and is only for non-Armv8/a64. It looks like neither helps reduce the size of qemu_arm64, but I will try #1 on qemu_arm64 and #2 on sandbox and let you know the size impact soon.
The smaller-footprint implementation for SHA256/512 reduces the target size significantly for those "<hash>_process()" functions. Please see the buildman output below:

    aarch64: (for 2/2 boards) all -1468.0 bss +16.0 data -64.0 rodata +200.0 text -1620.0
       qemu_arm64     : all +4608 bss +80 data -64 rodata +200 text +4392
          u-boot: add: 29/-17, grow: 12/-16 bytes: 13072/-8304 (4768)
       nanopi_a64     : all -7544 bss -48 data -64 rodata +200 text -7632
          u-boot: add: 21/-8, grow: 4/-8 bytes: 10692/-4364 (6328)
    sandbox: (for 1/1 boards) all +19312.0 data +1440.0 rodata -4128.0 text +22000.0
       sandbox        : all +19312 data +1440 rodata -4128 text +22000
          u-boot: add: 258/-206, grow: 122/-59 bytes: 90286/-76286 (14000)

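The buildman figures quoted in this thread can be cross-checked with simple arithmetic. In every line above, the `all` delta equals text + rodata + data + bss; the `(for 2/2 boards)` summary is the per-board mean (qemu_arm64 +4608 and nanopi_a64 -7544 average to -1468); and the bloat summary `bytes: ADDED/-REMOVED (NET)` has NET = ADDED - REMOVED. The small C sketch below encodes those three relationships. They are inferred from the numbers in this thread, not taken from buildman's documentation, and the struct and function names are hypothetical.

```c
#include <assert.h>

/* Per-section size deltas for one board, in bytes, as printed by buildman. */
struct size_delta {
	long text, rodata, data, bss;
};

/* Observed in the quoted output: "all" is the sum of the section deltas. */
long delta_all(const struct size_delta *d)
{
	return d->text + d->rodata + d->data + d->bss;
}

/* Observed: the "(for N/N boards)" summary line is the per-board mean. */
double mean_all(const struct size_delta *d, int nboards)
{
	long sum = 0;
	int i;

	assert(nboards > 0);
	for (i = 0; i < nboards; i++)
		sum += delta_all(&d[i]);
	return (double)sum / nboards;
}

/* Observed: in "bytes: ADDED/-REMOVED (NET)", NET = ADDED - REMOVED. */
long bloat_net(long added, long removed)
{
	return added - removed;
}
```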
Since this is a trade-off between size and performance, I will add one more Kconfig option to allow the user to turn it on or off. What are your thoughts?
On the other hand, the "Armv8/a64 only" options depend on NEON instructions, so I will keep them off.
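The Kconfig option proposed above could feed MbedTLS's own configuration roughly as sketched here. `MBEDTLS_SHA256_SMALLER` and `MBEDTLS_SHA512_SMALLER` are existing MbedTLS config macros that select the smaller, slower implementations. `CONFIG_MBEDTLS_SHA_SMALLER` is a hypothetical name for the new U-Boot option, defined by hand in place of the Kconfig-generated header, and the Armv8/a64-only options are simply left undefined. The two helper functions are illustrative only.

```c
/*
 * Sketch of a U-Boot-side MbedTLS config fragment.
 * CONFIG_MBEDTLS_SHA_SMALLER is a hypothetical Kconfig name, defined by
 * hand here instead of coming from the generated config header.
 */
#define CONFIG_MBEDTLS_SHA_SMALLER 1

#if defined(CONFIG_MBEDTLS_SHA_SMALLER)
/* Existing MbedTLS options: smaller SHA-256/512 code, lower performance. */
#define MBEDTLS_SHA256_SMALLER
#define MBEDTLS_SHA512_SMALLER
#endif

/* The Armv8/a64-only acceleration options are intentionally not defined. */

/* Hypothetical helpers reporting which variants were compiled in. */
int sha256_smaller_enabled(void)
{
#ifdef MBEDTLS_SHA256_SMALLER
	return 1;
#else
	return 0;
#endif
}

int sha512_smaller_enabled(void)
{
#ifdef MBEDTLS_SHA512_SMALLER
	return 1;
#else
	return 0;
#endif
}
```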
Yes, we should have it as a Kconfig option. Can you perhaps do a real-world performance test (so, on some non-QEMU platform) where you sha256 a big hunk of memory? That might help inform the defaults. Also, sorry, I don't understand your comment about NEON instructions. Is it the kind of thing where there's too much core variability in terms of who has what? If so, it too should be an option, but not enabled by default.
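As a host-side illustration of what the smaller variant trades away, and not a substitute for the on-target measurement requested here, the sketch below is a compact SHA-256 whose 64 rounds run as a loop over a shifting state array. This is roughly the approach MbedTLS takes under MBEDTLS_SHA256_SMALLER, though it is not MbedTLS code. A helper then times hashing one large buffer with the standard C clock(). The helper names and the buffer fill pattern are arbitrary choices for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* SHA-256 round constants (FIPS 180-4, section 4.2.2). */
static const uint32_t K256[64] = {
	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
};

#define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

/* Compress one 64-byte block; the rounds are a loop, not unrolled code. */
static void sha256_block(uint32_t h[8], const uint8_t *p)
{
	uint32_t w[64], v[8];
	int i;

	for (i = 0; i < 16; i++)
		w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
		       (uint32_t)p[4 * i + 2] << 8 | p[4 * i + 3];
	for (i = 16; i < 64; i++) {
		uint32_t s0 = ROTR(w[i - 15], 7) ^ ROTR(w[i - 15], 18) ^ (w[i - 15] >> 3);
		uint32_t s1 = ROTR(w[i - 2], 17) ^ ROTR(w[i - 2], 19) ^ (w[i - 2] >> 10);
		w[i] = w[i - 16] + s0 + w[i - 7] + s1;
	}
	memcpy(v, h, sizeof(v));
	for (i = 0; i < 64; i++) {
		uint32_t t1 = v[7] + (ROTR(v[4], 6) ^ ROTR(v[4], 11) ^ ROTR(v[4], 25)) +
			      ((v[4] & v[5]) ^ (~v[4] & v[6])) + K256[i] + w[i];
		uint32_t t2 = (ROTR(v[0], 2) ^ ROTR(v[0], 13) ^ ROTR(v[0], 22)) +
			      ((v[0] & v[1]) ^ (v[0] & v[2]) ^ (v[1] & v[2]));
		memmove(&v[1], &v[0], 7 * sizeof(uint32_t)); /* h=g, ..., b=a */
		v[4] += t1;     /* new e = old d + t1 */
		v[0] = t1 + t2; /* new a */
	}
	for (i = 0; i < 8; i++)
		h[i] += v[i];
}

/* One-shot SHA-256 of len bytes at msg; the digest is written to out. */
void sha256(const uint8_t *msg, size_t len, uint8_t out[32])
{
	uint32_t h[8] = { 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
			  0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19 };
	uint8_t tail[128] = { 0 };
	size_t full = len / 64 * 64, rest = len - full, tlen, off;
	uint64_t bits = (uint64_t)len * 8;
	int i;

	for (off = 0; off < full; off += 64)
		sha256_block(h, msg + off);
	memcpy(tail, msg + full, rest);
	tail[rest] = 0x80;                 /* padding marker bit */
	tlen = rest < 56 ? 64 : 128;       /* room for the 8-byte length? */
	for (i = 0; i < 8; i++)
		tail[tlen - 1 - i] = (uint8_t)(bits >> (8 * i));
	for (off = 0; off < tlen; off += 64)
		sha256_block(h, tail + off);
	for (i = 0; i < 32; i++)
		out[i] = (uint8_t)(h[i / 4] >> (24 - 8 * (i % 4)));
}

static volatile uint8_t sha256_sink; /* keeps the timed call from being optimised out */

/* CPU seconds spent hashing one len-byte buffer; -1.0 if allocation fails. */
double sha256_seconds(size_t len)
{
	uint8_t *buf, out[32];
	clock_t t0, t1;

	assert(len > 0);
	buf = malloc(len);
	if (!buf)
		return -1.0;
	memset(buf, 0xa5, len);
	t0 = clock();
	sha256(buf, len, out);
	t1 = clock();
	sha256_sink = out[0];
	free(buf);
	return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

For example, `sha256_seconds(64u << 20)` returns the CPU time for hashing 64 MiB on the host, and 64 divided by that value gives MiB/s. On a real board, the comparable figure would come from U-Boot itself, e.g. running `time hash sha256 <addr> <len>` with CMD_TIME and CMD_HASH enabled, once with and once without the smaller implementation.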