[PATCH v2 0/5] Add ARMv8 CE sha1/sha256 support

This series adds support for the SHA-1 and SHA-256 Secure Hash Algorithms for CPUs that have support for the ARM v8 Crypto Extensions. It improves the speed of integrity and signature checking procedures.
V2:
- Add cover letter & sha256 support.
- Kconfig default 'y' only if SHA1 and SHA256 selected.

Loic Poulain (5):
  lib: sha1: Add support for hardware specific sha1_process
  sha1: Fix digest state size/type
  armv8: SHA-1 using ARMv8 Crypto Extensions
  lib: sha256: Add support for hardware specific sha256_process
  armv8: SHA-256 using ARMv8 Crypto Extensions

 arch/arm/cpu/armv8/Kconfig          |  15 ++++
 arch/arm/cpu/armv8/Makefile         |   2 +
 arch/arm/cpu/armv8/sha1_ce_core.S   | 132 +++++++++++++++++++++++++++++++++++
 arch/arm/cpu/armv8/sha1_ce_glue.c   |  21 ++++++
 arch/arm/cpu/armv8/sha256_ce_core.S | 134 ++++++++++++++++++++++++++++++++++++
 arch/arm/cpu/armv8/sha256_ce_glue.c |  21 ++++++
 include/u-boot/sha1.h               |   2 +-
 lib/sha1.c                          |  26 +++++--
 lib/sha256.c                        |  26 +++++--
 9 files changed, 364 insertions(+), 15 deletions(-)
 create mode 100644 arch/arm/cpu/armv8/sha1_ce_core.S
 create mode 100644 arch/arm/cpu/armv8/sha1_ce_glue.c
 create mode 100644 arch/arm/cpu/armv8/sha256_ce_core.S
 create mode 100644 arch/arm/cpu/armv8/sha256_ce_glue.c

Mark sha1_process as weak to allow hardware-specific implementations. Add a parameter to support processing multiple blocks.
Signed-off-by: Loic Poulain loic.poulain@linaro.org
---
 lib/sha1.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/lib/sha1.c b/lib/sha1.c
index 8154e1e..e5e42bc 100644
--- a/lib/sha1.c
+++ b/lib/sha1.c
@@ -25,6 +25,8 @@
 #include <watchdog.h>
 #include <u-boot/sha1.h>
+#include <linux/compiler_attributes.h>
+
 const uint8_t sha1_der_prefix[SHA1_DER_LEN] = {
 	0x30, 0x21, 0x30, 0x09, 0x06, 0x05, 0x2b, 0x0e,
 	0x03, 0x02, 0x1a, 0x05, 0x00, 0x04, 0x14
@@ -65,7 +67,7 @@ void sha1_starts (sha1_context * ctx)
 	ctx->state[4] = 0xC3D2E1F0;
 }
-static void sha1_process(sha1_context *ctx, const unsigned char data[64])
+static void __maybe_unused sha1_process_one(sha1_context *ctx, const unsigned char data[64])
 {
 	unsigned long temp, W[16], A, B, C, D, E;
@@ -219,6 +221,18 @@ static void sha1_process(sha1_context *ctx, const unsigned char data[64])
 	ctx->state[4] += E;
 }
+__weak void sha1_process(sha1_context *ctx, const unsigned char *data,
+			 unsigned int blocks)
+{
+	if (!blocks)
+		return;
+
+	while (blocks--) {
+		sha1_process_one(ctx, data);
+		data += 64;
+	}
+}
+
 /*
  * SHA-1 process buffer
  */
@@ -242,17 +256,15 @@ void sha1_update(sha1_context *ctx, const unsigned char *input,
 	if (left && ilen >= fill) {
 		memcpy ((void *) (ctx->buffer + left), (void *) input, fill);
-		sha1_process (ctx, ctx->buffer);
+		sha1_process(ctx, ctx->buffer, 1);
 		input += fill;
 		ilen -= fill;
 		left = 0;
 	}
-	while (ilen >= 64) {
-		sha1_process (ctx, input);
-		input += 64;
-		ilen -= 64;
-	}
+	sha1_process(ctx, input, ilen / 64);
+	input += ilen / 64 * 64;
+	ilen = ilen % 64;
 	if (ilen > 0) {
 		memcpy ((void *) (ctx->buffer + left), (void *) input, ilen);
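To illustrate how a platform can use this hook, a strong definition of sha1_process() in architecture code overrides the __weak one above. The sketch below assumes a hypothetical accelerator routine hw_sha1_blocks(); the real ARMv8 CE override appears later in this series.

#include <u-boot/sha1.h>

/* Hypothetical accelerator entry point, shown only for illustration. */
extern void hw_sha1_blocks(uint32_t state[5], const unsigned char *src,
			   unsigned int blocks);

/* Strong definition: replaces the __weak sha1_process() from lib/sha1.c. */
void sha1_process(sha1_context *ctx, const unsigned char *data,
		  unsigned int blocks)
{
	if (!blocks)
		return;

	hw_sha1_blocks(ctx->state, data, blocks);
}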

On Wed, Jun 01, 2022 at 08:26:27PM +0200, Loic Poulain wrote:
Mark sha1_process as weak to allow hardware-specific implementations. Add a parameter to support processing multiple blocks.
Signed-off-by: Loic Poulain loic.poulain@linaro.org
Applied to u-boot/next, thanks!

sha1 digest size is 5 * 32-bit => 160-bit. Using a 64-bit unsigned long does not cause issues with the current sha1 implementation, but could be problematic for vectorized access.
Signed-off-by: Loic Poulain loic.poulain@linaro.org
---
 include/u-boot/sha1.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/u-boot/sha1.h b/include/u-boot/sha1.h
index 283f103..09fee59 100644
--- a/include/u-boot/sha1.h
+++ b/include/u-boot/sha1.h
@@ -30,7 +30,7 @@ extern const uint8_t sha1_der_prefix[];
 typedef struct {
 	unsigned long total[2];		/*!< number of bytes processed */
-	unsigned long state[5];		/*!< intermediate digest state */
+	uint32_t state[5];		/*!< intermediate digest state */
 	unsigned char buffer[64];	/*!< data block being processed */
 } sha1_context;
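Why the narrower type matters for vectorized access: an accelerated implementation loads the digest as packed 32-bit lanes, which only works if state[] is a contiguous uint32_t[5] (20 bytes) rather than unsigned long[5] (40 bytes on 64-bit). A minimal compile-time check, shown only as an illustration of that assumption and not part of the patch:

#include <u-boot/sha1.h>

/*
 * With 64-bit unsigned long the five state words would be spread over
 * 40 bytes, so a vector load of four consecutive 32-bit words would
 * pick up the wrong data. With uint32_t the state is 5 * 4 = 20
 * contiguous bytes, matching what a hardware path expects.
 */
_Static_assert(sizeof(((sha1_context *)0)->state) == 5 * sizeof(uint32_t),
	       "SHA-1 digest state must be packed 32-bit words");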

On Wed, Jun 01, 2022 at 08:26:28PM +0200, Loic Poulain wrote:
sha1 digest size is 5*32-bit => 160-bit. Using 64-bit unsigned long does not cause issue with the current sha1 implementation, but could be problematic for vectorized access.
Signed-off-by: Loic Poulain loic.poulain@linaro.org
Applied to u-boot/next, thanks!

This patch adds support for the SHA-1 Secure Hash Algorithm for CPUs that have support for the SHA-1 part of the ARM v8 Crypto Extensions.
It greatly improves sha-1 based operations, about 10x faster on the iMX8M evk board: ~12ms vs ~165ms for a 20MiB kernel sha-1 verification.
The asm implementation is a simplified version of the Linux one (from Ard Biesheuvel).
Signed-off-by: Loic Poulain loic.poulain@linaro.org
---
 arch/arm/cpu/armv8/Kconfig        |  11 ++++
 arch/arm/cpu/armv8/Makefile       |   1 +
 arch/arm/cpu/armv8/sha1_ce_core.S | 132 ++++++++++++++++++++++++++++++++++++++
 arch/arm/cpu/armv8/sha1_ce_glue.c |  21 ++++++
 4 files changed, 165 insertions(+)
 create mode 100644 arch/arm/cpu/armv8/sha1_ce_core.S
 create mode 100644 arch/arm/cpu/armv8/sha1_ce_glue.c
diff --git a/arch/arm/cpu/armv8/Kconfig b/arch/arm/cpu/armv8/Kconfig
index 9967376..0b11ca8 100644
--- a/arch/arm/cpu/armv8/Kconfig
+++ b/arch/arm/cpu/armv8/Kconfig
@@ -171,4 +171,15 @@ config ARMV8_SECURE_BASE
endif
+menuconfig ARMV8_CRYPTO
+	bool "ARM64 Accelerated Cryptographic Algorithms"
+
+if ARMV8_CRYPTO
+
+config ARMV8_CE_SHA1
+	bool "SHA-1 digest algorithm (ARMv8 Crypto Extensions)"
+	default y if SHA1
+
+endif
+
 endif
diff --git a/arch/arm/cpu/armv8/Makefile b/arch/arm/cpu/armv8/Makefile
index 85fe047..ff2495c 100644
--- a/arch/arm/cpu/armv8/Makefile
+++ b/arch/arm/cpu/armv8/Makefile
@@ -44,3 +44,4 @@ obj-$(CONFIG_TARGET_HIKEY) += hisilicon/
 obj-$(CONFIG_ARMV8_PSCI) += psci.o
 obj-$(CONFIG_TARGET_BCMNS3) += bcmns3/
 obj-$(CONFIG_XEN) += xen/
+obj-$(CONFIG_ARMV8_CE_SHA1) += sha1_ce_glue.o sha1_ce_core.o
diff --git a/arch/arm/cpu/armv8/sha1_ce_core.S b/arch/arm/cpu/armv8/sha1_ce_core.S
new file mode 100644
index 0000000..fbf2714
--- /dev/null
+++ b/arch/arm/cpu/armv8/sha1_ce_core.S
@@ -0,0 +1,132 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * sha1_ce_core.S - SHA-1 secure hash using ARMv8 Crypto Extensions
+ *
+ * Copyright (C) 2014 Linaro Ltd ard.biesheuvel@linaro.org
+ * Copyright (C) 2022 Linaro Ltd loic.poulain@linaro.org
+ */
+
+#include <config.h>
+#include <linux/linkage.h>
+#include <asm/system.h>
+#include <asm/macro.h>
+
+	.text
+	.arch	armv8-a+crypto
+
+	k0	.req	v0
+	k1	.req	v1
+	k2	.req	v2
+	k3	.req	v3
+
+	t0	.req	v4
+	t1	.req	v5
+
+	dga	.req	q6
+	dgav	.req	v6
+	dgb	.req	s7
+	dgbv	.req	v7
+
+	dg0q	.req	q12
+	dg0s	.req	s12
+	dg0v	.req	v12
+	dg1s	.req	s13
+	dg1v	.req	v13
+	dg2s	.req	s14
+
+	.macro	add_only, op, ev, rc, s0, dg1
+	.ifc	\ev, ev
+	add	t1.4s, v\s0\().4s, \rc\().4s
+	sha1h	dg2s, dg0s
+	.ifnb	\dg1
+	sha1\op	dg0q, \dg1, t0.4s
+	.else
+	sha1\op	dg0q, dg1s, t0.4s
+	.endif
+	.else
+	.ifnb	\s0
+	add	t0.4s, v\s0\().4s, \rc\().4s
+	.endif
+	sha1h	dg1s, dg0s
+	sha1\op	dg0q, dg2s, t1.4s
+	.endif
+	.endm
+
+	.macro	add_update, op, ev, rc, s0, s1, s2, s3, dg1
+	sha1su0	v\s0\().4s, v\s1\().4s, v\s2\().4s
+	add_only	\op, \ev, \rc, \s1, \dg1
+	sha1su1	v\s0\().4s, v\s3\().4s
+	.endm
+
+	.macro	loadrc, k, val, tmp
+	movz	\tmp, :abs_g0_nc:\val
+	movk	\tmp, :abs_g1:\val
+	dup	\k, \tmp
+	.endm
+
+	/*
+	 * void sha1_armv8_ce_process(uint32_t state[5], uint8_t const *src,
+	 *			      uint32_t blocks)
+	 */
+ENTRY(sha1_armv8_ce_process)
+	/* load round constants */
+	loadrc	k0.4s, 0x5a827999, w6
+	loadrc	k1.4s, 0x6ed9eba1, w6
+	loadrc	k2.4s, 0x8f1bbcdc, w6
+	loadrc	k3.4s, 0xca62c1d6, w6
+
+	/* load state (4+1 digest states) */
+	ld1	{dgav.4s}, [x0]
+	ldr	dgb, [x0, #16]
+
+	/* load input (64 bytes into v8->v11 16B vectors) */
+0:	ld1	{v8.4s-v11.4s}, [x1], #64
+	sub	w2, w2, #1
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+	rev32	v8.16b, v8.16b
+	rev32	v9.16b, v9.16b
+	rev32	v10.16b, v10.16b
+	rev32	v11.16b, v11.16b
+#endif
+
+1:	add	t0.4s, v8.4s, k0.4s
+	mov	dg0v.16b, dgav.16b
+
+	add_update	c, ev, k0, 8, 9, 10, 11, dgb
+	add_update	c, od, k0, 9, 10, 11, 8
+	add_update	c, ev, k0, 10, 11, 8, 9
+	add_update	c, od, k0, 11, 8, 9, 10
+	add_update	c, ev, k1, 8, 9, 10, 11
+
+	add_update	p, od, k1, 9, 10, 11, 8
+	add_update	p, ev, k1, 10, 11, 8, 9
+	add_update	p, od, k1, 11, 8, 9, 10
+	add_update	p, ev, k1, 8, 9, 10, 11
+	add_update	p, od, k2, 9, 10, 11, 8
+
+	add_update	m, ev, k2, 10, 11, 8, 9
+	add_update	m, od, k2, 11, 8, 9, 10
+	add_update	m, ev, k2, 8, 9, 10, 11
+	add_update	m, od, k2, 9, 10, 11, 8
+	add_update	m, ev, k3, 10, 11, 8, 9
+
+	add_update	p, od, k3, 11, 8, 9, 10
+	add_only	p, ev, k3, 9
+	add_only	p, od, k3, 10
+	add_only	p, ev, k3, 11
+	add_only	p, od
+
+	/* update state */
+	add	dgbv.2s, dgbv.2s, dg1v.2s
+	add	dgav.4s, dgav.4s, dg0v.4s
+
+	/* loop on next block? */
+	cbz	w2, 2f
+	b	0b
+
+	/* store new state */
+2:	st1	{dgav.4s}, [x0]
+	str	dgb, [x0, #16]
+	mov	w0, w2
+	ret
+ENDPROC(sha1_armv8_ce_process)
diff --git a/arch/arm/cpu/armv8/sha1_ce_glue.c b/arch/arm/cpu/armv8/sha1_ce_glue.c
new file mode 100644
index 0000000..780b119
--- /dev/null
+++ b/arch/arm/cpu/armv8/sha1_ce_glue.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * sha1_ce_glue.c - SHA-1 secure hash using ARMv8 Crypto Extensions
+ *
+ * Copyright (C) 2022 Linaro Ltd loic.poulain@linaro.org
+ */
+
+#include <common.h>
+#include <u-boot/sha1.h>
+
+extern void sha1_armv8_ce_process(uint32_t state[5], uint8_t const *src,
+				  uint32_t blocks);
+
+void sha1_process(sha1_context *ctx, const unsigned char *data,
+		  unsigned int blocks)
+{
+	if (!blocks)
+		return;
+
+	sha1_armv8_ce_process(ctx->state, data, blocks);
+}
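Callers are unchanged: the accelerated block function is picked up transparently through the existing context API. A rough usage sketch, assuming the usual sha1_starts()/sha1_update()/sha1_finish() helpers from <u-boot/sha1.h> and placeholder buffer/length values:

#include <u-boot/sha1.h>

static void sha1_hash_image(const unsigned char *image, unsigned int len)
{
	sha1_context ctx;
	unsigned char digest[20];

	sha1_starts(&ctx);
	sha1_update(&ctx, image, len);	/* whole blocks go through sha1_process() */
	sha1_finish(&ctx, digest);
	/* ... compare digest[] against the expected value ... */
}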

On Wed, Jun 01, 2022 at 08:26:29PM +0200, Loic Poulain wrote:
This patch adds support for the SHA-1 Secure Hash Algorithm for CPUs that have support for the SHA-1 part of the ARM v8 Crypto Extensions.
It greatly improves sha-1 based operations, about 10x faster on iMX8M evk board. ~12ms vs ~165ms for a 20MiB kernel sha-1 verification.
asm implementation is a simplified version of the Linux version (from Ard Biesheuvel).
Signed-off-by: Loic Poulain loic.poulain@linaro.org
Applied to u-boot/next, thanks!

Mark sha256_process as weak to allow hardware-specific implementations. Add a parameter to support processing multiple blocks.
Signed-off-by: Loic Poulain loic.poulain@linaro.org
---
 lib/sha256.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/lib/sha256.c b/lib/sha256.c
index c1fe93d..50b0b51 100644
--- a/lib/sha256.c
+++ b/lib/sha256.c
@@ -14,6 +14,8 @@
 #include <watchdog.h>
 #include <u-boot/sha256.h>
+#include <linux/compiler_attributes.h>
+
 const uint8_t sha256_der_prefix[SHA256_DER_LEN] = {
 	0x30, 0x31, 0x30, 0x0d, 0x06, 0x09, 0x60, 0x86,
 	0x48, 0x01, 0x65, 0x03, 0x04, 0x02, 0x01, 0x05,
@@ -55,7 +57,7 @@ void sha256_starts(sha256_context * ctx)
 	ctx->state[7] = 0x5BE0CD19;
 }
-static void sha256_process(sha256_context *ctx, const uint8_t data[64])
+static void sha256_process_one(sha256_context *ctx, const uint8_t data[64])
 {
 	uint32_t temp1, temp2;
 	uint32_t W[64];
@@ -186,6 +188,18 @@ static void sha256_process(sha256_context *ctx, const uint8_t data[64])
 	ctx->state[7] += H;
 }
+__weak void sha256_process(sha256_context *ctx, const unsigned char *data,
+			   unsigned int blocks)
+{
+	if (!blocks)
+		return;
+
+	while (blocks--) {
+		sha256_process_one(ctx, data);
+		data += 64;
+	}
+}
+
 void sha256_update(sha256_context *ctx, const uint8_t *input, uint32_t length)
 {
 	uint32_t left, fill;
@@ -204,17 +218,15 @@ void sha256_update(sha256_context *ctx, const uint8_t *input, uint32_t length)
 	if (left && length >= fill) {
 		memcpy((void *) (ctx->buffer + left), (void *) input, fill);
-		sha256_process(ctx, ctx->buffer);
+		sha256_process(ctx, ctx->buffer, 1);
 		length -= fill;
 		input += fill;
 		left = 0;
 	}
-	while (length >= 64) {
-		sha256_process(ctx, input);
-		length -= 64;
-		input += 64;
-	}
+	sha256_process(ctx, input, length / 64);
+	input += length / 64 * 64;
+	length = length % 64;
 	if (length)
 		memcpy((void *) (ctx->buffer + left), (void *) input, length);
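The rewritten loop hands all whole 64-byte blocks to sha256_process() in a single call and only buffers the tail. A worked example of the arithmetic (values are illustrative):

/*
 * length = 150, left = 0:
 *   sha256_process(ctx, input, 150 / 64);  -> 2 blocks (128 bytes) hashed
 *   input  += 150 / 64 * 64;               -> input advances by 128 bytes
 *   length  = 150 % 64;                    -> 22 bytes left over
 * The remaining 22 bytes are memcpy()'d into ctx->buffer and picked up
 * by the next sha256_update() call or at finalization.
 */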

On Wed, Jun 01, 2022 at 08:26:30PM +0200, Loic Poulain wrote:
Mark sha256_process as weak to allow hardware-specific implementations. Add a parameter to support processing multiple blocks.
Signed-off-by: Loic Poulain loic.poulain@linaro.org
Applied to u-boot/next, thanks!

Hi Loic,
On Wed, 1 Jun 2022 at 12:27, Loic Poulain loic.poulain@linaro.org wrote:
Mark sha256_process as weak to allow hardware specific implementation. Add parameter for supporting multiple blocks processing.
Signed-off-by: Loic Poulain loic.poulain@linaro.org
lib/sha256.c | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/lib/sha256.c b/lib/sha256.c index c1fe93d..50b0b51 100644 --- a/lib/sha256.c +++ b/lib/sha256.c @@ -14,6 +14,8 @@ #include <watchdog.h> #include <u-boot/sha256.h>
+#include <linux/compiler_attributes.h>
const uint8_t sha256_der_prefix[SHA256_DER_LEN] = { 0x30, 0x31, 0x30, 0x0d, 0x06, 0x09, 0x60, 0x86, 0x48, 0x01, 0x65, 0x03, 0x04, 0x02, 0x01, 0x05, @@ -55,7 +57,7 @@ void sha256_starts(sha256_context * ctx) ctx->state[7] = 0x5BE0CD19; }
-static void sha256_process(sha256_context *ctx, const uint8_t data[64]) +static void sha256_process_one(sha256_context *ctx, const uint8_t data[64]) { uint32_t temp1, temp2; uint32_t W[64]; @@ -186,6 +188,18 @@ static void sha256_process(sha256_context *ctx, const uint8_t data[64]) ctx->state[7] += H; }
+__weak void sha256_process(sha256_context *ctx, const unsigned char *data,
unsigned int blocks)
+{
if (!blocks)
return;
while (blocks--) {
sha256_process_one(ctx, data);
data += 64;
}
+}
void sha256_update(sha256_context *ctx, const uint8_t *input, uint32_t length) { uint32_t left, fill; @@ -204,17 +218,15 @@ void sha256_update(sha256_context *ctx, const uint8_t *input, uint32_t length)
if (left && length >= fill) { memcpy((void *) (ctx->buffer + left), (void *) input, fill);
sha256_process(ctx, ctx->buffer);
sha256_process(ctx, ctx->buffer, 1); length -= fill; input += fill; left = 0; }
while (length >= 64) {
sha256_process(ctx, input);
length -= 64;
input += 64;
}
sha256_process(ctx, input, length / 64);
input += length / 64 * 64;
length = length % 64; if (length) memcpy((void *) (ctx->buffer + left), (void *) input, length);
-- 2.7.4
I just came across this patch as it broke minnowmax.
This should be using driver model, not weak functions. Please can you take a look?
Regards, Simon

Hi Simon,
Le lun. 6 févr. 2023 à 18:12, Simon Glass sjg@chromium.org a écrit :
Hi Loic,
On Wed, 1 Jun 2022 at 12:27, Loic Poulain loic.poulain@linaro.org wrote:
Mark sha256_process as weak to allow hardware specific implementation. Add parameter for supporting multiple blocks processing.
Signed-off-by: Loic Poulain loic.poulain@linaro.org
lib/sha256.c | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/lib/sha256.c b/lib/sha256.c index c1fe93d..50b0b51 100644 --- a/lib/sha256.c +++ b/lib/sha256.c @@ -14,6 +14,8 @@ #include <watchdog.h> #include <u-boot/sha256.h>
+#include <linux/compiler_attributes.h>
const uint8_t sha256_der_prefix[SHA256_DER_LEN] = { 0x30, 0x31, 0x30, 0x0d, 0x06, 0x09, 0x60, 0x86, 0x48, 0x01, 0x65, 0x03, 0x04, 0x02, 0x01, 0x05, @@ -55,7 +57,7 @@ void sha256_starts(sha256_context * ctx) ctx->state[7] = 0x5BE0CD19; }
-static void sha256_process(sha256_context *ctx, const uint8_t data[64]) +static void sha256_process_one(sha256_context *ctx, const uint8_t
data[64])
{ uint32_t temp1, temp2; uint32_t W[64]; @@ -186,6 +188,18 @@ static void sha256_process(sha256_context *ctx,
const uint8_t data[64])
ctx->state[7] += H;
}
+__weak void sha256_process(sha256_context *ctx, const unsigned char
*data,
unsigned int blocks)
+{
if (!blocks)
return;
while (blocks--) {
sha256_process_one(ctx, data);
data += 64;
}
+}
void sha256_update(sha256_context *ctx, const uint8_t *input, uint32_t
length)
{ uint32_t left, fill; @@ -204,17 +218,15 @@ void sha256_update(sha256_context *ctx, const
uint8_t *input, uint32_t length)
if (left && length >= fill) { memcpy((void *) (ctx->buffer + left), (void *) input,
fill);
sha256_process(ctx, ctx->buffer);
sha256_process(ctx, ctx->buffer, 1); length -= fill; input += fill; left = 0; }
while (length >= 64) {
sha256_process(ctx, input);
length -= 64;
input += 64;
}
sha256_process(ctx, input, length / 64);
input += length / 64 * 64;
length = length % 64; if (length) memcpy((void *) (ctx->buffer + left), (void *) input,
length);
-- 2.7.4
I just came across this patch as it broke minnowmax.
Ok, is it a build time or runtime break?
This should be using driver model, not weak functions. Please can you take a look?
Yes, I can look at it in the next few days. I have used a weak function because it's an architecture feature offered by armv8 instructions; it's not strictly speaking an internal device/IP.
Regards, Loic

Hi Loic,
On Mon, 6 Feb 2023 at 15:12, Loic Poulain loic.poulain@linaro.org wrote:
Hi Simon,
Le lun. 6 févr. 2023 à 18:12, Simon Glass sjg@chromium.org a écrit :
Hi Loic,
On Wed, 1 Jun 2022 at 12:27, Loic Poulain loic.poulain@linaro.org wrote:
Mark sha256_process as weak to allow hardware specific implementation. Add parameter for supporting multiple blocks processing.
Signed-off-by: Loic Poulain loic.poulain@linaro.org
lib/sha256.c | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/lib/sha256.c b/lib/sha256.c index c1fe93d..50b0b51 100644 --- a/lib/sha256.c +++ b/lib/sha256.c @@ -14,6 +14,8 @@ #include <watchdog.h> #include <u-boot/sha256.h>
+#include <linux/compiler_attributes.h>
const uint8_t sha256_der_prefix[SHA256_DER_LEN] = { 0x30, 0x31, 0x30, 0x0d, 0x06, 0x09, 0x60, 0x86, 0x48, 0x01, 0x65, 0x03, 0x04, 0x02, 0x01, 0x05, @@ -55,7 +57,7 @@ void sha256_starts(sha256_context * ctx) ctx->state[7] = 0x5BE0CD19; }
-static void sha256_process(sha256_context *ctx, const uint8_t data[64]) +static void sha256_process_one(sha256_context *ctx, const uint8_t data[64]) { uint32_t temp1, temp2; uint32_t W[64]; @@ -186,6 +188,18 @@ static void sha256_process(sha256_context *ctx, const uint8_t data[64]) ctx->state[7] += H; }
+__weak void sha256_process(sha256_context *ctx, const unsigned char *data,
unsigned int blocks)
+{
if (!blocks)
return;
while (blocks--) {
sha256_process_one(ctx, data);
data += 64;
}
+}
void sha256_update(sha256_context *ctx, const uint8_t *input, uint32_t length) { uint32_t left, fill; @@ -204,17 +218,15 @@ void sha256_update(sha256_context *ctx, const uint8_t *input, uint32_t length)
if (left && length >= fill) { memcpy((void *) (ctx->buffer + left), (void *) input, fill);
sha256_process(ctx, ctx->buffer);
sha256_process(ctx, ctx->buffer, 1); length -= fill; input += fill; left = 0; }
while (length >= 64) {
sha256_process(ctx, input);
length -= 64;
input += 64;
}
sha256_process(ctx, input, length / 64);
input += length / 64 * 64;
length = length % 64; if (length) memcpy((void *) (ctx->buffer + left), (void *) input, length);
-- 2.7.4
I just came across this patch as it broke minnowmax.
Ok, is it a build time or runtime break?
Build, but you need the binary blobs to see it :-(
This should be using driver model, not weak functions. Please can you take a look?
Yes I can look at it in the next few days. I have used weak function because it’s an architecture feature offered by armv8 instructions, It’s not strictly speaking an internal device/IP.
Thanks.
Right, same as hardware-accelerated hashing hardware in my book.
See hash.c which has become a mess. We have been trying to make do with a list of algos, but given all the updates I think it needs a new UCLASS_HASH with the same operations as in hash.h.
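As a rough illustration of the direction being suggested here (a hypothetical sketch, not an existing U-Boot interface; names and signatures are illustrative only), a hash uclass could expose per-device operations along these lines:

#include <linux/types.h>

struct udevice;

/* Hypothetical ops table for a UCLASS_HASH-style driver. */
struct hash_ops {
	int (*hash_init)(struct udevice *dev, const char *algo, void **ctxp);
	int (*hash_update)(struct udevice *dev, void *ctx,
			   const void *buf, uint32_t size);
	int (*hash_finish)(struct udevice *dev, void *ctx,
			   void *digest, int size);
	int (*hash_digest)(struct udevice *dev, const char *algo,
			   const void *buf, uint32_t size, void *digest);
};

An ARMv8 CE backend would then register as a driver of that uclass instead of overriding a weak symbol.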
Regards, Simon

Hi Simon,
On Tue, 7 Feb 2023 at 05:05, Simon Glass sjg@chromium.org wrote:
Hi Loic,
On Mon, 6 Feb 2023 at 15:12, Loic Poulain loic.poulain@linaro.org wrote:
Hi Simon,
Le lun. 6 févr. 2023 à 18:12, Simon Glass sjg@chromium.org a écrit :
Hi Loic,
On Wed, 1 Jun 2022 at 12:27, Loic Poulain loic.poulain@linaro.org wrote:
Mark sha256_process as weak to allow hardware specific implementation. Add parameter for supporting multiple blocks processing.
Signed-off-by: Loic Poulain loic.poulain@linaro.org
lib/sha256.c | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-)
[...]
+__weak void sha256_process(sha256_context *ctx, const unsigned char *data,
unsigned int blocks)
+{
if (!blocks)
return;
while (blocks--) {
sha256_process_one(ctx, data);
data += 64;
}
+}
void sha256_update(sha256_context *ctx, const uint8_t *input, uint32_t length) { uint32_t left, fill; @@ -204,17 +218,15 @@ void sha256_update(sha256_context *ctx, const uint8_t *input, uint32_t length)
if (left && length >= fill) { memcpy((void *) (ctx->buffer + left), (void *) input, fill);
sha256_process(ctx, ctx->buffer);
sha256_process(ctx, ctx->buffer, 1); length -= fill; input += fill; left = 0; }
while (length >= 64) {
sha256_process(ctx, input);
length -= 64;
input += 64;
}
sha256_process(ctx, input, length / 64);
input += length / 64 * 64;
length = length % 64; if (length) memcpy((void *) (ctx->buffer + left), (void *) input, length);
-- 2.7.4
I just came across this patch as it broke minnowmax.
Ok, is it a build time or runtime break?
Build, but you need the binary blobs to see it :-(
This should be using driver model, not weak functions. Please can you take a look?
Just tested the minnowmax build (b69026c91f2e; minnowmax_defconfig; gcc-11.3.0), and I've not observed any issue (but I had to fake some of the binary blobs...). Could you share the build problem/error you encountered? As you mentioned it, is the error specifically related to __weak function linking? I would like to have a simple and quick fix before trying to move on to a more proper DM-based solution.
Thanks, Loic
Yes I can look at it in the next few days. I have used weak function because it’s an architecture feature offered by armv8 instructions, It’s not strictly speaking an internal device/IP.
Thanks.
Right, same as hardware-accelerated hashing hardware in my book.
See hash.c which has become a mess. We have been trying to make do with a list of algos, but given all the updates I think needs a new UCLASS_HASH with the same operations as in hash.h
Regards, Simon

Hi Loic,
On Tue, 7 Feb 2023 at 14:47, Loic Poulain loic.poulain@linaro.org wrote:
Hi Simon,
On Tue, 7 Feb 2023 at 05:05, Simon Glass sjg@chromium.org wrote:
Hi Loic,
On Mon, 6 Feb 2023 at 15:12, Loic Poulain loic.poulain@linaro.org wrote:
Hi Simon,
Le lun. 6 févr. 2023 à 18:12, Simon Glass sjg@chromium.org a écrit :
Hi Loic,
On Wed, 1 Jun 2022 at 12:27, Loic Poulain loic.poulain@linaro.org wrote:
Mark sha256_process as weak to allow hardware specific implementation. Add parameter for supporting multiple blocks processing.
Signed-off-by: Loic Poulain loic.poulain@linaro.org
lib/sha256.c | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-)
[...]
+__weak void sha256_process(sha256_context *ctx, const unsigned char *data,
unsigned int blocks)
+{
if (!blocks)
return;
while (blocks--) {
sha256_process_one(ctx, data);
data += 64;
}
+}
void sha256_update(sha256_context *ctx, const uint8_t *input, uint32_t length) { uint32_t left, fill; @@ -204,17 +218,15 @@ void sha256_update(sha256_context *ctx, const uint8_t *input, uint32_t length)
if (left && length >= fill) { memcpy((void *) (ctx->buffer + left), (void *) input, fill);
sha256_process(ctx, ctx->buffer);
sha256_process(ctx, ctx->buffer, 1); length -= fill; input += fill; left = 0; }
while (length >= 64) {
sha256_process(ctx, input);
length -= 64;
input += 64;
}
sha256_process(ctx, input, length / 64);
input += length / 64 * 64;
length = length % 64; if (length) memcpy((void *) (ctx->buffer + left), (void *) input, length);
-- 2.7.4
I just came across this patch as it broke minnowmax.
Ok, is it a build time or runtime break?
Build, but you need the binary blobs to see it :-(
This should be using driver model, not weak functions. Please can you take a look?
Just tested the minnowmax build (b69026c91f2e; minnowmax_defconfig; gcc-11.3.0), and I've not observed any issue (but I had to fake some of the binary blobs...). Could you share the build problem/error you
Unfortunately you need the blobs!
encountered? As you mentioned it, Is the error specifically related to _weak function linking? Would like to have a simple and quick fix before trying to move on to a more proper DM based solution.
It is just because of the code size increase, I believe. I am planning to dig into it a bit as Bin Meng asked for more info as to why I sent a revert for his patch moving U-Boot.
Regards, Simon
Thanks, Loic
Yes I can look at it in the next few days. I have used weak function because it’s an architecture feature offered by armv8 instructions, It’s not strictly speaking an internal device/IP.
Thanks.
Right, same as hardware-accelerated hashing hardware in my book.
See hash.c which has become a mess. We have been trying to make do with a list of algos, but given all the updates I think needs a new UCLASS_HASH with the same operations as in hash.h
Regards, Simon

On Tue, Feb 07, 2023 at 03:25:16PM -0700, Simon Glass wrote:
Hi Loic,
On Tue, 7 Feb 2023 at 14:47, Loic Poulain loic.poulain@linaro.org wrote:
Hi Simon,
On Tue, 7 Feb 2023 at 05:05, Simon Glass sjg@chromium.org wrote:
Hi Loic,
On Mon, 6 Feb 2023 at 15:12, Loic Poulain loic.poulain@linaro.org wrote:
Hi Simon,
Le lun. 6 févr. 2023 à 18:12, Simon Glass sjg@chromium.org a écrit :
Hi Loic,
On Wed, 1 Jun 2022 at 12:27, Loic Poulain loic.poulain@linaro.org wrote:
Mark sha256_process as weak to allow hardware specific implementation. Add parameter for supporting multiple blocks processing.
Signed-off-by: Loic Poulain loic.poulain@linaro.org
lib/sha256.c | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-)
[...]
+__weak void sha256_process(sha256_context *ctx, const unsigned char *data,
unsigned int blocks)
+{
if (!blocks)
return;
while (blocks--) {
sha256_process_one(ctx, data);
data += 64;
}
+}
void sha256_update(sha256_context *ctx, const uint8_t *input, uint32_t length) { uint32_t left, fill; @@ -204,17 +218,15 @@ void sha256_update(sha256_context *ctx, const uint8_t *input, uint32_t length)
if (left && length >= fill) { memcpy((void *) (ctx->buffer + left), (void *) input, fill);
sha256_process(ctx, ctx->buffer);
sha256_process(ctx, ctx->buffer, 1); length -= fill; input += fill; left = 0; }
while (length >= 64) {
sha256_process(ctx, input);
length -= 64;
input += 64;
}
sha256_process(ctx, input, length / 64);
input += length / 64 * 64;
length = length % 64; if (length) memcpy((void *) (ctx->buffer + left), (void *) input, length);
-- 2.7.4
I just came across this patch as it broke minnowmax.
Ok, is it a build time or runtime break?
Build, but you need the binary blobs to see it :-(
This should be using driver model, not weak functions. Please can you take a look?
Just tested the minnowmax build (b69026c91f2e; minnowmax_defconfig; gcc-11.3.0), and I've not observed any issue (but I had to fake some of the binary blobs...). Could you share the build problem/error you
Unfortunately you need the blobs!
encountered? As you mentioned it, Is the error specifically related to _weak function linking? Would like to have a simple and quick fix before trying to move on to a more proper DM based solution.
It is just because of the code size increase, I believe. I am planning to dig into it a bit as Bin Meng asked for more info as to why I sent a revert for his patch moving U-Boot.
That honestly makes more sense, having stared at the commit in question. Perhaps Minnow needs LTO enabled.

Hi Tom,
On Tue, 7 Feb 2023 at 17:10, Tom Rini trini@konsulko.com wrote:
On Tue, Feb 07, 2023 at 03:25:16PM -0700, Simon Glass wrote:
Hi Loic,
On Tue, 7 Feb 2023 at 14:47, Loic Poulain loic.poulain@linaro.org wrote:
Hi Simon,
On Tue, 7 Feb 2023 at 05:05, Simon Glass sjg@chromium.org wrote:
Hi Loic,
On Mon, 6 Feb 2023 at 15:12, Loic Poulain loic.poulain@linaro.org wrote:
Hi Simon,
Le lun. 6 févr. 2023 à 18:12, Simon Glass sjg@chromium.org a écrit :
Hi Loic,
On Wed, 1 Jun 2022 at 12:27, Loic Poulain loic.poulain@linaro.org wrote: > > Mark sha256_process as weak to allow hardware specific implementation. > Add parameter for supporting multiple blocks processing. > > Signed-off-by: Loic Poulain loic.poulain@linaro.org > --- > lib/sha256.c | 26 +++++++++++++++++++------- > 1 file changed, 19 insertions(+), 7 deletions(-) >
[...]
> +__weak void sha256_process(sha256_context *ctx, const unsigned char *data, > + unsigned int blocks) > +{ > + if (!blocks) > + return; > + > + while (blocks--) { > + sha256_process_one(ctx, data); > + data += 64; > + } > +} > + > void sha256_update(sha256_context *ctx, const uint8_t *input, uint32_t length) > { > uint32_t left, fill; > @@ -204,17 +218,15 @@ void sha256_update(sha256_context *ctx, const uint8_t *input, uint32_t length) > > if (left && length >= fill) { > memcpy((void *) (ctx->buffer + left), (void *) input, fill); > - sha256_process(ctx, ctx->buffer); > + sha256_process(ctx, ctx->buffer, 1); > length -= fill; > input += fill; > left = 0; > } > > - while (length >= 64) { > - sha256_process(ctx, input); > - length -= 64; > - input += 64; > - } > + sha256_process(ctx, input, length / 64); > + input += length / 64 * 64; > + length = length % 64; > > if (length) > memcpy((void *) (ctx->buffer + left), (void *) input, length); > -- > 2.7.4 >
I just came across this patch as it broke minnowmax.
Ok, is it a build time or runtime break?
Build, but you need the binary blobs to see it :-(
This should be using driver model, not weak functions. Please can you take a look?
Just tested the minnowmax build (b69026c91f2e; minnowmax_defconfig; gcc-11.3.0), and I've not observed any issue (but I had to fake some of the binary blobs...). Could you share the build problem/error you
Unfortunately you need the blobs!
encountered? As you mentioned it, Is the error specifically related to _weak function linking? Would like to have a simple and quick fix before trying to move on to a more proper DM based solution.
It is just because of the code size increase, I believe. I am planning to dig into it a bit as Bin Meng asked for more info as to why I sent a revert for his patch moving U-Boot.
That honestly makes more sense, having stared at the commit in question. Perhaps Minnow needs LTO enabled.
Yes, that would likely help. But Bin's point is that it should be possible to move the text base.
Anyway, the point of this thread is that this needs to be done as drivers rather than weak functions. Unfortunately hash.c has grown a few appendages...this is yet another.
Regards, Simon

On Wed, Feb 08, 2023 at 11:28:21AM -0700, Simon Glass wrote:
Hi Tom,
On Tue, 7 Feb 2023 at 17:10, Tom Rini trini@konsulko.com wrote:
On Tue, Feb 07, 2023 at 03:25:16PM -0700, Simon Glass wrote:
Hi Loic,
On Tue, 7 Feb 2023 at 14:47, Loic Poulain loic.poulain@linaro.org wrote:
Hi Simon,
On Tue, 7 Feb 2023 at 05:05, Simon Glass sjg@chromium.org wrote:
Hi Loic,
On Mon, 6 Feb 2023 at 15:12, Loic Poulain loic.poulain@linaro.org wrote:
Hi Simon,
Le lun. 6 févr. 2023 à 18:12, Simon Glass sjg@chromium.org a écrit : > > Hi Loic, > > On Wed, 1 Jun 2022 at 12:27, Loic Poulain loic.poulain@linaro.org wrote: > > > > Mark sha256_process as weak to allow hardware specific implementation. > > Add parameter for supporting multiple blocks processing. > > > > Signed-off-by: Loic Poulain loic.poulain@linaro.org > > --- > > lib/sha256.c | 26 +++++++++++++++++++------- > > 1 file changed, 19 insertions(+), 7 deletions(-) > >
[...]
> > +__weak void sha256_process(sha256_context *ctx, const unsigned char *data, > > + unsigned int blocks) > > +{ > > + if (!blocks) > > + return; > > + > > + while (blocks--) { > > + sha256_process_one(ctx, data); > > + data += 64; > > + } > > +} > > + > > void sha256_update(sha256_context *ctx, const uint8_t *input, uint32_t length) > > { > > uint32_t left, fill; > > @@ -204,17 +218,15 @@ void sha256_update(sha256_context *ctx, const uint8_t *input, uint32_t length) > > > > if (left && length >= fill) { > > memcpy((void *) (ctx->buffer + left), (void *) input, fill); > > - sha256_process(ctx, ctx->buffer); > > + sha256_process(ctx, ctx->buffer, 1); > > length -= fill; > > input += fill; > > left = 0; > > } > > > > - while (length >= 64) { > > - sha256_process(ctx, input); > > - length -= 64; > > - input += 64; > > - } > > + sha256_process(ctx, input, length / 64); > > + input += length / 64 * 64; > > + length = length % 64; > > > > if (length) > > memcpy((void *) (ctx->buffer + left), (void *) input, length); > > -- > > 2.7.4 > > > > I just came across this patch as it broke minnowmax.
Ok, is it a build time or runtime break?
Build, but you need the binary blobs to see it :-(
> > This should be using driver model, not weak functions. Please can you > take a look?
Just tested the minnowmax build (b69026c91f2e; minnowmax_defconfig; gcc-11.3.0), and I've not observed any issue (but I had to fake some of the binary blobs...). Could you share the build problem/error you
Unfortunately you need the blobs!
encountered? As you mentioned it, Is the error specifically related to _weak function linking? Would like to have a simple and quick fix before trying to move on to a more proper DM based solution.
It is just because of the code size increase, I believe. I am planning to dig into it a bit as Bin Meng asked for more info as to why I sent a revert for his patch moving U-Boot.
That honestly makes more sense, having stared at the commit in question. Perhaps Minnow needs LTO enabled.
Yes, that would likely help. But Bin's point is that it should be possible to move the text base.
I suspect that might just be more fall-out of the size problem and possibly some hard coded assumptions in the blobs?
Anyway, the point of this thread is that this needs to be done as drivers rather than weak functions. Unfortunately hash.c has grown a few appendages...this is yet another.
I don't know, it seems fine, especially since we aren't really talking about a "hash driver" but rather introducing some performance sensitive code.

This patch adds support for the SHA-256 Secure Hash Algorithm for CPUs that have support for the SHA-256 part of the ARM v8 Crypto Extensions.
It greatly improves sha-256 based operations, about 17x faster on the iMX8M evk board: ~12ms vs ~208ms for a 20MiB kernel sha-256 verification.
The asm implementation is a simplified version of the Linux one (from Ard Biesheuvel).
Signed-off-by: Loic Poulain loic.poulain@linaro.org
---
 arch/arm/cpu/armv8/Kconfig          |   4 ++
 arch/arm/cpu/armv8/Makefile         |   1 +
 arch/arm/cpu/armv8/sha256_ce_core.S | 134 ++++++++++++++++++++++++++++++++++++
 arch/arm/cpu/armv8/sha256_ce_glue.c |  21 ++++++
 4 files changed, 160 insertions(+)
 create mode 100644 arch/arm/cpu/armv8/sha256_ce_core.S
 create mode 100644 arch/arm/cpu/armv8/sha256_ce_glue.c
diff --git a/arch/arm/cpu/armv8/Kconfig b/arch/arm/cpu/armv8/Kconfig
index 0b11ca8..0494a08 100644
--- a/arch/arm/cpu/armv8/Kconfig
+++ b/arch/arm/cpu/armv8/Kconfig
@@ -180,6 +180,10 @@ config ARMV8_CE_SHA1
 	bool "SHA-1 digest algorithm (ARMv8 Crypto Extensions)"
 	default y if SHA1
 
+config ARMV8_CE_SHA256
+	bool "SHA-256 digest algorithm (ARMv8 Crypto Extensions)"
+	default y if SHA256
+
 endif
 endif
diff --git a/arch/arm/cpu/armv8/Makefile b/arch/arm/cpu/armv8/Makefile
index ff2495c..2e4bf9e 100644
--- a/arch/arm/cpu/armv8/Makefile
+++ b/arch/arm/cpu/armv8/Makefile
@@ -45,3 +45,4 @@ obj-$(CONFIG_ARMV8_PSCI) += psci.o
 obj-$(CONFIG_TARGET_BCMNS3) += bcmns3/
 obj-$(CONFIG_XEN) += xen/
 obj-$(CONFIG_ARMV8_CE_SHA1) += sha1_ce_glue.o sha1_ce_core.o
+obj-$(CONFIG_ARMV8_CE_SHA256) += sha256_ce_glue.o sha256_ce_core.o
diff --git a/arch/arm/cpu/armv8/sha256_ce_core.S b/arch/arm/cpu/armv8/sha256_ce_core.S
new file mode 100644
index 0000000..fbae3ca
--- /dev/null
+++ b/arch/arm/cpu/armv8/sha256_ce_core.S
@@ -0,0 +1,134 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * sha256-ce-core.S - core SHA-256 transform using v8 Crypto Extensions
+ *
+ * Copyright (C) 2014 Linaro Ltd ard.biesheuvel@linaro.org
+ * Copyright (C) 2022 Linaro Ltd loic.poulain@linaro.org
+ */
+
+ #include <config.h>
+ #include <linux/linkage.h>
+ #include <asm/system.h>
+ #include <asm/macro.h>
+
+	.text
+	.arch	armv8-a+crypto
+
+	dga	.req	q20
+	dgav	.req	v20
+	dgb	.req	q21
+	dgbv	.req	v21
+
+	t0	.req	v22
+	t1	.req	v23
+
+	dg0q	.req	q24
+	dg0v	.req	v24
+	dg1q	.req	q25
+	dg1v	.req	v25
+	dg2q	.req	q26
+	dg2v	.req	v26
+
+	.macro	add_only, ev, rc, s0
+	mov	dg2v.16b, dg0v.16b
+	.ifeq	\ev
+	add	t1.4s, v\s0\().4s, \rc\().4s
+	sha256h	dg0q, dg1q, t0.4s
+	sha256h2	dg1q, dg2q, t0.4s
+	.else
+	.ifnb	\s0
+	add	t0.4s, v\s0\().4s, \rc\().4s
+	.endif
+	sha256h	dg0q, dg1q, t1.4s
+	sha256h2	dg1q, dg2q, t1.4s
+	.endif
+	.endm
+
+	.macro	add_update, ev, rc, s0, s1, s2, s3
+	sha256su0	v\s0\().4s, v\s1\().4s
+	add_only	\ev, \rc, \s1
+	sha256su1	v\s0\().4s, v\s2\().4s, v\s3\().4s
+	.endm
+
+	/*
+	 * The SHA-256 round constants
+	 */
+	.align	4
+.Lsha2_rcon:
+	.word	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5
+	.word	0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5
+	.word	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3
+	.word	0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174
+	.word	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc
+	.word	0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da
+	.word	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7
+	.word	0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967
+	.word	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13
+	.word	0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85
+	.word	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3
+	.word	0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070
+	.word	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5
+	.word	0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3
+	.word	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208
+	.word	0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
+
+	/*
+	 * void sha256_armv8_ce_process(struct sha256_ce_state *sst,
+	 *				uint8_t const *src, uint32_t blocks)
+	 */
+ENTRY(sha256_armv8_ce_process)
+	/* load round constants */
+	adr	x8, .Lsha2_rcon
+	ld1	{ v0.4s- v3.4s}, [x8], #64
+	ld1	{ v4.4s- v7.4s}, [x8], #64
+	ld1	{ v8.4s-v11.4s}, [x8], #64
+	ld1	{v12.4s-v15.4s}, [x8]
+
+	/* load state */
+	ldp	dga, dgb, [x0]
+
+	/* load input */
+0:	ld1	{v16.4s-v19.4s}, [x1], #64
+	sub	w2, w2, #1
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+	rev32	v16.16b, v16.16b
+	rev32	v17.16b, v17.16b
+	rev32	v18.16b, v18.16b
+	rev32	v19.16b, v19.16b
+#endif
+
+1:	add	t0.4s, v16.4s, v0.4s
+	mov	dg0v.16b, dgav.16b
+	mov	dg1v.16b, dgbv.16b
+
+	add_update	0, v1, 16, 17, 18, 19
+	add_update	1, v2, 17, 18, 19, 16
+	add_update	0, v3, 18, 19, 16, 17
+	add_update	1, v4, 19, 16, 17, 18
+
+	add_update	0, v5, 16, 17, 18, 19
+	add_update	1, v6, 17, 18, 19, 16
+	add_update	0, v7, 18, 19, 16, 17
+	add_update	1, v8, 19, 16, 17, 18
+
+	add_update	0, v9, 16, 17, 18, 19
+	add_update	1, v10, 17, 18, 19, 16
+	add_update	0, v11, 18, 19, 16, 17
+	add_update	1, v12, 19, 16, 17, 18
+
+	add_only	0, v13, 17
+	add_only	1, v14, 18
+	add_only	0, v15, 19
+	add_only	1
+
+	/* update state */
+	add	dgav.4s, dgav.4s, dg0v.4s
+	add	dgbv.4s, dgbv.4s, dg1v.4s
+
+	/* handled all input blocks? */
+	cbnz	w2, 0b
+
+	/* store new state */
+3:	stp	dga, dgb, [x0]
+	ret
+ENDPROC(sha256_armv8_ce_process)
diff --git a/arch/arm/cpu/armv8/sha256_ce_glue.c b/arch/arm/cpu/armv8/sha256_ce_glue.c
new file mode 100644
index 0000000..67dd796
--- /dev/null
+++ b/arch/arm/cpu/armv8/sha256_ce_glue.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * sha256_ce_glue.c - SHA-256 secure hash using ARMv8 Crypto Extensions
+ *
+ * Copyright (C) 2022 Linaro Ltd loic.poulain@linaro.org
+ */
+
+#include <common.h>
+#include <u-boot/sha256.h>
+
+extern void sha256_armv8_ce_process(uint32_t state[8], uint8_t const *src,
+				    uint32_t blocks);
+
+void sha256_process(sha256_context *ctx, const unsigned char *data,
+		    unsigned int blocks)
+{
+	if (!blocks)
+		return;
+
+	sha256_armv8_ce_process(ctx->state, data, blocks);
+}

Now that we can make use of CPU features for sha1/sha256, enable in QEMU so that we get some test coverage.
Cc: Loic Poulain loic.poulain@linaro.org
Cc: Tuomas Tynkkynen tuomas.tynkkynen@iki.fi
Signed-off-by: Tom Rini trini@konsulko.com
---
 configs/qemu_arm64_defconfig | 1 +
 1 file changed, 1 insertion(+)
diff --git a/configs/qemu_arm64_defconfig b/configs/qemu_arm64_defconfig
index 87acf00f30e3..f7c93ba2af54 100644
--- a/configs/qemu_arm64_defconfig
+++ b/configs/qemu_arm64_defconfig
@@ -8,6 +8,7 @@ CONFIG_ENV_SECT_SIZE=0x40000
 CONFIG_DEFAULT_DEVICE_TREE="qemu-arm64"
 CONFIG_DEBUG_UART_BASE=0x9000000
 CONFIG_DEBUG_UART_CLOCK=0
+CONFIG_ARMV8_CRYPTO=y
 CONFIG_SYS_LOAD_ADDR=0x40200000
 CONFIG_ENV_ADDR=0x4000000
 CONFIG_DEBUG_UART=y

On Thu, Jun 23, 2022 at 03:51:31PM -0400, Tom Rini wrote:
Now that we can make use of CPU features for sha1/sha256, enable in QEMU so that we get some test coverage.
Cc: Loic Poulain loic.poulain@linaro.org
Cc: Tuomas Tynkkynen tuomas.tynkkynen@iki.fi
Signed-off-by: Tom Rini trini@konsulko.com
Applied to u-boot/next, thanks!

On Wed, Jun 01, 2022 at 08:26:31PM +0200, Loic Poulain wrote:
This patch adds support for the SHA-256 Secure Hash Algorithm for CPUs that have support for the SHA-256 part of the ARM v8 Crypto Extensions.
It greatly improves sha-256 based operations, about 17x faster on iMX8M evk board. ~12ms vs ~208ms for a 20MiB kernel sha-256 verification.
asm implementation is a simplified version of the Linux version (from Ard Biesheuvel).
Signed-off-by: Loic Poulain loic.poulain@linaro.org
Applied to u-boot/next, thanks!

Hi Folks,
Any comments on this series? Anyone else to CC?
Thanks, Loic
On Wed, 1 Jun 2022 at 20:26, Loic Poulain loic.poulain@linaro.org wrote:
This series adds support for the SHA-1 and SHA-256 Secure Hash Algorithms for CPUs that have support for the ARM v8 Crypto Extensions. It improves the speed of integrity and signature checking procedures.
V2:
- Add cover letter & sha256 support.
- Kconfig default 'y' only if SHA1 and SHA256 selected
Loic Poulain (5):
  lib: sha1: Add support for hardware specific sha1_process
  sha1: Fix digest state size/type
  armv8: SHA-1 using ARMv8 Crypto Extensions
  lib: sha256: Add support for hardware specific sha256_process
  armv8: SHA-256 using ARMv8 Crypto Extensions

 arch/arm/cpu/armv8/Kconfig          |  15 ++++
 arch/arm/cpu/armv8/Makefile         |   2 +
 arch/arm/cpu/armv8/sha1_ce_core.S   | 132 +++++++++++++++++++++++++++++++++++
 arch/arm/cpu/armv8/sha1_ce_glue.c   |  21 ++++++
 arch/arm/cpu/armv8/sha256_ce_core.S | 134 ++++++++++++++++++++++++++++++++++++
 arch/arm/cpu/armv8/sha256_ce_glue.c |  21 ++++++
 include/u-boot/sha1.h               |   2 +-
 lib/sha1.c                          |  26 +++++--
 lib/sha256.c                        |  26 +++++--
 9 files changed, 364 insertions(+), 15 deletions(-)
 create mode 100644 arch/arm/cpu/armv8/sha1_ce_core.S
 create mode 100644 arch/arm/cpu/armv8/sha1_ce_glue.c
 create mode 100644 arch/arm/cpu/armv8/sha256_ce_core.S
 create mode 100644 arch/arm/cpu/armv8/sha256_ce_glue.c
-- 2.7.4
participants (3):
- Loic Poulain
- Simon Glass
- Tom Rini