[PATCH] powerpc: reduce number of WATCHDOG_RESET calls from flush_cache

Calling WATCHDOG_RESET for each and every cache line is overkill.
In our case, the kernel image is a little over 7MB, and the almost 500000 calls of WATCHDOG_RESET() adds about one second to the boottime.
I very highly doubt there's any real hardware where flushing 64K from cache to memory takes more than a few milliseconds, so this should be completely safe. Since it reduces the number of WATCHDOG_RESET() calls by roughly a factor of 1000, the overhead from those is practically eliminated. (Just in case the range flushed is so small that it doesn't cross a 64K boundary, add a single WATCHDOG_RESET() between the loops).
64K is chosen because that's also the default chunk size used by the hashing algorithms, and when, say, a sha256 digest of a kernel image of a few MB is being verified, that's almost guaranteed to be cache-cold, so apart from the computations being done, the hashing is also bounded by memory speed - so if 64K works for those cases, it should certainly also work when memory access is the only thing being done.
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk --- arch/powerpc/lib/cache.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/lib/cache.c b/arch/powerpc/lib/cache.c index 528361e972..df2310f4e2 100644 --- a/arch/powerpc/lib/cache.c +++ b/arch/powerpc/lib/cache.c @@ -8,6 +8,7 @@ #include <cpu_func.h> #include <asm/cache.h> #include <watchdog.h> +#include <linux/sizes.h>
void flush_cache(ulong start_addr, ulong size) { @@ -21,15 +22,18 @@ void flush_cache(ulong start_addr, ulong size) for (addr = start; (addr <= end) && (addr >= start); addr += CONFIG_SYS_CACHELINE_SIZE) { asm volatile("dcbst 0,%0" : : "r" (addr) : "memory"); - WATCHDOG_RESET(); + if ((addr & (SZ_64K - 1)) == 0) + WATCHDOG_RESET(); } /* wait for all dcbst to complete on bus */ asm volatile("sync" : : : "memory"); + WATCHDOG_RESET();
for (addr = start; (addr <= end) && (addr >= start); addr += CONFIG_SYS_CACHELINE_SIZE) { asm volatile("icbi 0,%0" : : "r" (addr) : "memory"); - WATCHDOG_RESET(); + if ((addr & (SZ_64K - 1)) == 0) + WATCHDOG_RESET(); } asm volatile("sync" : : : "memory"); /* flush prefetch queue */

On 04.06.20 11:30, Rasmus Villemoes wrote:
Calling WATCHDOG_RESET for each and every cache line is overkill.
In our case, the kernel image is a little over 7MB, and the almost 500000 calls of WATCHDOG_RESET() adds about one second to the boottime.
I very highly doubt there's any real hardware where flushing 64K from cache to memory takes more than a few milliseconds, so this should be completely safe. Since it reduces the number of WATCHDOG_RESET() calls by roughly a factor of 1000, the overhead from those is practically eliminated. (Just in case the range flushed is so small that it doesn't cross a 64K boundary, add a single WATCHDOG_RESET() between the loops).
64K is chosen because that's also the default chunk size used by the hashing algorithms, and when, say, a sha256 digest of a kernel image of a few MB is being verified, that's almost guaranteed to be cache-cold, so apart from the computations being done, the hashing is also bounded by memory speed - so if 64K works for those cases, it should certainly also work when memory access is the only thing being done.
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk
Reviewed-by: Stefan Roese sr@denx.de
Thanks, Stefan
arch/powerpc/lib/cache.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/lib/cache.c b/arch/powerpc/lib/cache.c index 528361e972..df2310f4e2 100644 --- a/arch/powerpc/lib/cache.c +++ b/arch/powerpc/lib/cache.c @@ -8,6 +8,7 @@ #include <cpu_func.h> #include <asm/cache.h> #include <watchdog.h> +#include <linux/sizes.h>
void flush_cache(ulong start_addr, ulong size) { @@ -21,15 +22,18 @@ void flush_cache(ulong start_addr, ulong size) for (addr = start; (addr <= end) && (addr >= start); addr += CONFIG_SYS_CACHELINE_SIZE) { asm volatile("dcbst 0,%0" : : "r" (addr) : "memory");
WATCHDOG_RESET();
if ((addr & (SZ_64K - 1)) == 0)
WATCHDOG_RESET();
} /* wait for all dcbst to complete on bus */ asm volatile("sync" : : : "memory");
WATCHDOG_RESET();
for (addr = start; (addr <= end) && (addr >= start); addr += CONFIG_SYS_CACHELINE_SIZE) { asm volatile("icbi 0,%0" : : "r" (addr) : "memory");
WATCHDOG_RESET();
if ((addr & (SZ_64K - 1)) == 0)
} asm volatile("sync" : : : "memory"); /* flush prefetch queue */WATCHDOG_RESET();

On 04/06/2020 12.31, Stefan Roese wrote:
On 04.06.20 11:30, Rasmus Villemoes wrote:
Calling WATCHDOG_RESET for each and every cache line is overkill.
In our case, the kernel image is a little over 7MB, and the almost 500000 calls of WATCHDOG_RESET() adds about one second to the boottime.
I very highly doubt there's any real hardware where flushing 64K from cache to memory takes more than a few milliseconds, so this should be completely safe. Since it reduces the number of WATCHDOG_RESET() calls by roughly a factor of 1000, the overhead from those is practically eliminated. (Just in case the range flushed is so small that it doesn't cross a 64K boundary, add a single WATCHDOG_RESET() between the loops).
64K is chosen because that's also the default chunk size used by the hashing algorithms, and when, say, a sha256 digest of a kernel image of a few MB is being verified, that's almost guaranteed to be cache-cold, so apart from the computations being done, the hashing is also bounded by memory speed - so if 64K works for those cases, it should certainly also work when memory access is the only thing being done.
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk
Reviewed-by: Stefan Roese sr@denx.de
Ping.
Thanks, Stefan
arch/powerpc/lib/cache.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/lib/cache.c b/arch/powerpc/lib/cache.c index 528361e972..df2310f4e2 100644 --- a/arch/powerpc/lib/cache.c +++ b/arch/powerpc/lib/cache.c @@ -8,6 +8,7 @@ #include <cpu_func.h> #include <asm/cache.h> #include <watchdog.h> +#include <linux/sizes.h> void flush_cache(ulong start_addr, ulong size) { @@ -21,15 +22,18 @@ void flush_cache(ulong start_addr, ulong size) for (addr = start; (addr <= end) && (addr >= start); addr += CONFIG_SYS_CACHELINE_SIZE) { asm volatile("dcbst 0,%0" : : "r" (addr) : "memory"); - WATCHDOG_RESET(); + if ((addr & (SZ_64K - 1)) == 0) + WATCHDOG_RESET(); } /* wait for all dcbst to complete on bus */ asm volatile("sync" : : : "memory"); + WATCHDOG_RESET(); for (addr = start; (addr <= end) && (addr >= start); addr += CONFIG_SYS_CACHELINE_SIZE) { asm volatile("icbi 0,%0" : : "r" (addr) : "memory"); - WATCHDOG_RESET(); + if ((addr & (SZ_64K - 1)) == 0) + WATCHDOG_RESET(); } asm volatile("sync" : : : "memory"); /* flush prefetch queue */

On Fri, Sep 18, 2020 at 10:20:46AM +0200, Rasmus Villemoes wrote:
On 04/06/2020 12.31, Stefan Roese wrote:
On 04.06.20 11:30, Rasmus Villemoes wrote:
Calling WATCHDOG_RESET for each and every cache line is overkill.
In our case, the kernel image is a little over 7MB, and the almost 500000 calls of WATCHDOG_RESET() adds about one second to the boottime.
I very highly doubt there's any real hardware where flushing 64K from cache to memory takes more than a few milliseconds, so this should be completely safe. Since it reduces the number of WATCHDOG_RESET() calls by roughly a factor of 1000, the overhead from those is practically eliminated. (Just in case the range flushed is so small that it doesn't cross a 64K boundary, add a single WATCHDOG_RESET() between the loops).
64K is chosen because that's also the default chunk size used by the hashing algorithms, and when, say, a sha256 digest of a kernel image of a few MB is being verified, that's almost guaranteed to be cache-cold, so apart from the computations being done, the hashing is also bounded by memory speed - so if 64K works for those cases, it should certainly also work when memory access is the only thing being done.
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk
Reviewed-by: Stefan Roese sr@denx.de
Ping.
This too should end up in -next "soon". Thanks.

Dear Rasmus,
In message afea5640-9d95-bf92-316a-0d674a5129ce@prevas.dk you wrote:
On 04/06/2020 12.31, Stefan Roese wrote:
On 04.06.20 11:30, Rasmus Villemoes wrote:
Calling WATCHDOG_RESET for each and every cache line is overkill.
In our case, the kernel image is a little over 7MB, and the almost 500000 calls of WATCHDOG_RESET() adds about one second to the boottime.
I very highly doubt there's any real hardware where flushing 64K from cache to memory takes more than a few milliseconds, so this should be completely safe. Since it reduces the number of
Maybe it is, maybe it is not. I remember boards where the watchdog trigger had to happen in pretty tight intervals, something like min 20, max 60 milliseconds.
+ if ((addr & (SZ_64K - 1)) == 0)
In any case, please write readable code. The use of SZ_ here is not a good idea. Also, you might want to make the actual parameter configurable, so boards with specific timing requirements can adjust this as needed.
Best regards,
Wolfgang Denk

On Wed, Sep 23, 2020 at 02:48:18PM +0200, Wolfgang Denk wrote:
Dear Rasmus,
In message afea5640-9d95-bf92-316a-0d674a5129ce@prevas.dk you wrote:
On 04/06/2020 12.31, Stefan Roese wrote:
On 04.06.20 11:30, Rasmus Villemoes wrote:
Calling WATCHDOG_RESET for each and every cache line is overkill.
In our case, the kernel image is a little over 7MB, and the almost 500000 calls of WATCHDOG_RESET() adds about one second to the boottime.
I very highly doubt there's any real hardware where flushing 64K from cache to memory takes more than a few milliseconds, so this should be completely safe. Since it reduces the number of
Maybe it is, maybe it is not. I remember boards where the watchdog trigger had to happen in pretty tight intervals, something like min 20, max 60 milliseconds.
+ if ((addr & (SZ_64K - 1)) == 0)
In any case, please write readable code. The use of SZ_ here is not a good idea. Also, you might want to make the actual parameter configurable, so boards with specific timing requirements can adjust this as needed.
We should add a comment somewhere here to match the commit message, but since we're flushing on 64K chunks, SZ_64K makes sense. If we make it build-time configurable then it's not a concern. I assume we don't want to make this run time configurable.

-----Original Message----- From: U-Boot u-boot-bounces@lists.denx.de On Behalf Of Rasmus Villemoes Sent: Friday, September 18, 2020 1:51 PM To: Stefan Roese sr@denx.de; u-boot@lists.denx.de Cc: Christophe Leroy christophe.leroy@c-s.fr; Wolfgang Denk wd@denx.de; Simon Glass sjg@chromium.org; Tom Rini trini@konsulko.com Subject: Re: [PATCH] powerpc: reduce number of WATCHDOG_RESET calls from flush_cache
On 04/06/2020 12.31, Stefan Roese wrote:
On 04.06.20 11:30, Rasmus Villemoes wrote:
Calling WATCHDOG_RESET for each and every cache line is overkill.
In our case, the kernel image is a little over 7MB, and the almost 500000 calls of WATCHDOG_RESET() adds about one second to the boottime.
I very highly doubt there's any real hardware where flushing 64K from cache to memory takes more than a few milliseconds, so this should be completely safe. Since it reduces the number of WATCHDOG_RESET() calls by roughly a factor of 1000, the overhead from those is practically eliminated. (Just in case the range flushed is so small that it doesn't cross a 64K boundary, add a single WATCHDOG_RESET() between the loops).
64K is chosen because that's also the default chunk size used by the hashing algorithms, and when, say, a sha256 digest of a kernel image of a few MB is being verified, that's almost guaranteed to be cache-cold, so apart from the computations being done, the hashing is also bounded by memory speed - so if 64K works for those cases, it should certainly also work when memory access is the only thing being
done.
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk
Reviewed-by: Stefan Roese sr@denx.de
<snip> Applied to u-boot-mpc85xx. Awaiting upstream
Thanks Priyanka

Dear Priyanka,
In message VE1PR04MB649416C255B3379B43E7F7FDE6380@VE1PR04MB6494.eurprd04.prod.outlook.com you wrote:
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk
Reviewed-by: Stefan Roese sr@denx.de
<snip> Applied to u-boot-mpc85xx. Awaiting upstream
I object. This must at least be configurable, so that boards with tight timing requirements can still work.
Best regards,
Wolfgang Denk

On Thu, Sep 24, 2020 at 12:57:36PM +0200, Wolfgang Denk wrote:
Dear Priyanka,
In message VE1PR04MB649416C255B3379B43E7F7FDE6380@VE1PR04MB6494.eurprd04.prod.outlook.com you wrote:
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk
Reviewed-by: Stefan Roese sr@denx.de
<snip> Applied to u-boot-mpc85xx. Awaiting upstream
I object. This must at least be configurable, so that boards with tight timing requirements can still work.
I'm removing just this patch from the -next pull request.
participants (5)
-
Priyanka Jain (OSS)
-
Rasmus Villemoes
-
Stefan Roese
-
Tom Rini
-
Wolfgang Denk