[PATCH] powerpc: reduce number of WATCHDOG_RESET calls from flush_cache

newer
pull request of u-boot-mpc85xx for...

Rasmus Villemoes

4 Jun 2020 4 Jun '20

11:30 a.m.

Calling WATCHDOG_RESET for each and every cache line is overkill.

In our case, the kernel image is a little over 7MB, and the almost 500000 calls of WATCHDOG_RESET() adds about one second to the boottime.

I very highly doubt there's any real hardware where flushing 64K from cache to memory takes more than a few milliseconds, so this should be completely safe. Since it reduces the number of WATCHDOG_RESET() calls by roughly a factor of 1000, the overhead from those is practically eliminated. (Just in case the range flushed is so small that it doesn't cross a 64K boundary, add a single WATCHDOG_RESET() between the loops).

64K is chosen because that's also the default chunk size used by the hashing algorithms, and when, say, a sha256 digest of a kernel image of a few MB is being verified, that's almost guaranteed to be cache-cold, so apart from the computations being done, the hashing is also bounded by memory speed - so if 64K works for those cases, it should certainly also work when memory access is the only thing being done.

Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk --- arch/powerpc/lib/cache.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/cache.c b/arch/powerpc/lib/cache.c index 528361e972..df2310f4e2 100644 --- a/arch/powerpc/lib/cache.c +++ b/arch/powerpc/lib/cache.c @@ -8,6 +8,7 @@ #include <cpu_func.h> #include <asm/cache.h> #include <watchdog.h> +#include <linux/sizes.h>

void flush_cache(ulong start_addr, ulong size) { @@ -21,15 +22,18 @@ void flush_cache(ulong start_addr, ulong size) for (addr = start; (addr <= end) && (addr >= start); addr += CONFIG_SYS_CACHELINE_SIZE) { asm volatile("dcbst 0,%0" : : "r" (addr) : "memory"); - WATCHDOG_RESET(); + if ((addr & (SZ_64K - 1)) == 0) + WATCHDOG_RESET(); } /* wait for all dcbst to complete on bus */ asm volatile("sync" : : : "memory"); + WATCHDOG_RESET();

for (addr = start; (addr <= end) && (addr >= start); addr += CONFIG_SYS_CACHELINE_SIZE) { asm volatile("icbi 0,%0" : : "r" (addr) : "memory"); - WATCHDOG_RESET(); + if ((addr & (SZ_64K - 1)) == 0) + WATCHDOG_RESET(); } asm volatile("sync" : : : "memory"); /* flush prefetch queue */

-- 2.23.0

Show replies by date

Stefan Roese

4 Jun 4 Jun

12:31 p.m.

On 04.06.20 11:30, Rasmus Villemoes wrote:

...

Calling WATCHDOG_RESET for each and every cache line is overkill.

In our case, the kernel image is a little over 7MB, and the almost 500000 calls of WATCHDOG_RESET() adds about one second to the boottime.

I very highly doubt there's any real hardware where flushing 64K from cache to memory takes more than a few milliseconds, so this should be completely safe. Since it reduces the number of WATCHDOG_RESET() calls by roughly a factor of 1000, the overhead from those is practically eliminated. (Just in case the range flushed is so small that it doesn't cross a 64K boundary, add a single WATCHDOG_RESET() between the loops).

64K is chosen because that's also the default chunk size used by the hashing algorithms, and when, say, a sha256 digest of a kernel image of a few MB is being verified, that's almost guaranteed to be cache-cold, so apart from the computations being done, the hashing is also bounded by memory speed - so if 64K works for those cases, it should certainly also work when memory access is the only thing being done.

Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk

Reviewed-by: Stefan Roese sr@denx.de

Thanks, Stefan

...

arch/powerpc/lib/cache.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/cache.c b/arch/powerpc/lib/cache.c index 528361e972..df2310f4e2 100644 --- a/arch/powerpc/lib/cache.c +++ b/arch/powerpc/lib/cache.c @@ -8,6 +8,7 @@ #include <cpu_func.h> #include <asm/cache.h> #include <watchdog.h> +#include <linux/sizes.h>

void flush_cache(ulong start_addr, ulong size) { @@ -21,15 +22,18 @@ void flush_cache(ulong start_addr, ulong size) for (addr = start; (addr <= end) && (addr >= start); addr += CONFIG_SYS_CACHELINE_SIZE) { asm volatile("dcbst 0,%0" : : "r" (addr) : "memory");
WATCHDOG_RESET();
if ((addr & (SZ_64K - 1)) == 0)
	WATCHDOG_RESET();
} /* wait for all dcbst to complete on bus */ asm volatile("sync" : : : "memory");
WATCHDOG_RESET();

for (addr = start; (addr <= end) && (addr >= start); addr += CONFIG_SYS_CACHELINE_SIZE) { asm volatile("icbi 0,%0" : : "r" (addr) : "memory");
WATCHDOG_RESET();
if ((addr & (SZ_64K - 1)) == 0)
	WATCHDOG_RESET();
} asm volatile("sync" : : : "memory"); /* flush prefetch queue */

Rasmus Villemoes

18 Sep 18 Sep

10:20 a.m.

On 04/06/2020 12.31, Stefan Roese wrote:

...

On 04.06.20 11:30, Rasmus Villemoes wrote:

...
Calling WATCHDOG_RESET for each and every cache line is overkill.

In our case, the kernel image is a little over 7MB, and the almost 500000 calls of WATCHDOG_RESET() adds about one second to the boottime.

I very highly doubt there's any real hardware where flushing 64K from cache to memory takes more than a few milliseconds, so this should be completely safe. Since it reduces the number of WATCHDOG_RESET() calls by roughly a factor of 1000, the overhead from those is practically eliminated. (Just in case the range flushed is so small that it doesn't cross a 64K boundary, add a single WATCHDOG_RESET() between the loops).

64K is chosen because that's also the default chunk size used by the hashing algorithms, and when, say, a sha256 digest of a kernel image of a few MB is being verified, that's almost guaranteed to be cache-cold, so apart from the computations being done, the hashing is also bounded by memory speed - so if 64K works for those cases, it should certainly also work when memory access is the only thing being done.

Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk

Reviewed-by: Stefan Roese sr@denx.de

Ping.

...

Thanks, Stefan

...

arch/powerpc/lib/cache.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/cache.c b/arch/powerpc/lib/cache.c index 528361e972..df2310f4e2 100644 --- a/arch/powerpc/lib/cache.c +++ b/arch/powerpc/lib/cache.c @@ -8,6 +8,7 @@ #include <cpu_func.h> #include <asm/cache.h> #include <watchdog.h> +#include <linux/sizes.h> void flush_cache(ulong start_addr, ulong size) { @@ -21,15 +22,18 @@ void flush_cache(ulong start_addr, ulong size) for (addr = start; (addr <= end) && (addr >= start); addr += CONFIG_SYS_CACHELINE_SIZE) { asm volatile("dcbst 0,%0" : : "r" (addr) : "memory"); - WATCHDOG_RESET(); + if ((addr & (SZ_64K - 1)) == 0) + WATCHDOG_RESET(); } /* wait for all dcbst to complete on bus */ asm volatile("sync" : : : "memory"); + WATCHDOG_RESET(); for (addr = start; (addr <= end) && (addr >= start); addr += CONFIG_SYS_CACHELINE_SIZE) { asm volatile("icbi 0,%0" : : "r" (addr) : "memory"); - WATCHDOG_RESET(); + if ((addr & (SZ_64K - 1)) == 0) + WATCHDOG_RESET(); } asm volatile("sync" : : : "memory"); /* flush prefetch queue */

-- Rasmus Villemoes Software Developer Prevas A/S Hedeager 3 DK-8200 Aarhus N +45 51210274 rasmus.villemoes@prevas.dk www.prevas.dk

Tom Rini

4:50 p.m.

On Fri, Sep 18, 2020 at 10:20:46AM +0200, Rasmus Villemoes wrote:

...

On 04/06/2020 12.31, Stefan Roese wrote:

...
On 04.06.20 11:30, Rasmus Villemoes wrote:

...
Calling WATCHDOG_RESET for each and every cache line is overkill.

In our case, the kernel image is a little over 7MB, and the almost 500000 calls of WATCHDOG_RESET() adds about one second to the boottime.

I very highly doubt there's any real hardware where flushing 64K from cache to memory takes more than a few milliseconds, so this should be completely safe. Since it reduces the number of WATCHDOG_RESET() calls by roughly a factor of 1000, the overhead from those is practically eliminated. (Just in case the range flushed is so small that it doesn't cross a 64K boundary, add a single WATCHDOG_RESET() between the loops).

64K is chosen because that's also the default chunk size used by the hashing algorithms, and when, say, a sha256 digest of a kernel image of a few MB is being verified, that's almost guaranteed to be cache-cold, so apart from the computations being done, the hashing is also bounded by memory speed - so if 64K works for those cases, it should certainly also work when memory access is the only thing being done.

Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk

Reviewed-by: Stefan Roese sr@denx.de

Ping.

This too should end up in -next "soon". Thanks.

-- Tom

Wolfgang Denk

23 Sep 23 Sep

2:48 p.m.

Dear Rasmus,

In message afea5640-9d95-bf92-316a-0d674a5129ce@prevas.dk you wrote:

...

On 04/06/2020 12.31, Stefan Roese wrote:

...
On 04.06.20 11:30, Rasmus Villemoes wrote:

...
Calling WATCHDOG_RESET for each and every cache line is overkill.

In our case, the kernel image is a little over 7MB, and the almost 500000 calls of WATCHDOG_RESET() adds about one second to the boottime.

I very highly doubt there's any real hardware where flushing 64K from cache to memory takes more than a few milliseconds, so this should be completely safe. Since it reduces the number of

Maybe it is, maybe it is not. I remember boards where the watchdog trigger had to happen in pretty tight intervals, something like min 20, max 60 milliseconds.

...

...
...
+ if ((addr & (SZ_64K - 1)) == 0)

In any case, please write readable code. The use of SZ_ here is not a good idea. Also, you might want to make the actual parameter configurable, so boards with specific timing requirements can adjust this as needed.

Best regards,

Wolfgang Denk

-- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de Perfection is reached, not when there is no longer anything to add, but when there is no longer anything to take away. - Antoine de Saint-Exupery

Tom Rini

24 Sep 24 Sep

2:32 p.m.

On Wed, Sep 23, 2020 at 02:48:18PM +0200, Wolfgang Denk wrote:

...

Dear Rasmus,

In message afea5640-9d95-bf92-316a-0d674a5129ce@prevas.dk you wrote:

...
On 04/06/2020 12.31, Stefan Roese wrote:

...
On 04.06.20 11:30, Rasmus Villemoes wrote:

...
Calling WATCHDOG_RESET for each and every cache line is overkill.

In our case, the kernel image is a little over 7MB, and the almost 500000 calls of WATCHDOG_RESET() adds about one second to the boottime.

I very highly doubt there's any real hardware where flushing 64K from cache to memory takes more than a few milliseconds, so this should be completely safe. Since it reduces the number of

Maybe it is, maybe it is not. I remember boards where the watchdog trigger had to happen in pretty tight intervals, something like min 20, max 60 milliseconds.

...
...
...
+ if ((addr & (SZ_64K - 1)) == 0)

In any case, please write readable code. The use of SZ_ here is not a good idea. Also, you might want to make the actual parameter configurable, so boards with specific timing requirements can adjust this as needed.

We should add a comment somewhere here to match the commit message, but since we're flushing on 64K chunks, SZ_64K makes sense. If we make it build-time configurable then it's not a concern. I assume we don't want to make this run time configurable.

-- Tom

Priyanka Jain (OSS)

23 Sep 23 Sep

3:28 p.m.

...

-----Original Message----- From: U-Boot u-boot-bounces@lists.denx.de On Behalf Of Rasmus Villemoes Sent: Friday, September 18, 2020 1:51 PM To: Stefan Roese sr@denx.de; u-boot@lists.denx.de Cc: Christophe Leroy christophe.leroy@c-s.fr; Wolfgang Denk wd@denx.de; Simon Glass sjg@chromium.org; Tom Rini trini@konsulko.com Subject: Re: [PATCH] powerpc: reduce number of WATCHDOG_RESET calls from flush_cache

On 04/06/2020 12.31, Stefan Roese wrote:

...
On 04.06.20 11:30, Rasmus Villemoes wrote:

...
Calling WATCHDOG_RESET for each and every cache line is overkill.

In our case, the kernel image is a little over 7MB, and the almost 500000 calls of WATCHDOG_RESET() adds about one second to the boottime.

I very highly doubt there's any real hardware where flushing 64K from cache to memory takes more than a few milliseconds, so this should be completely safe. Since it reduces the number of WATCHDOG_RESET() calls by roughly a factor of 1000, the overhead from those is practically eliminated. (Just in case the range flushed is so small that it doesn't cross a 64K boundary, add a single WATCHDOG_RESET() between the loops).

64K is chosen because that's also the default chunk size used by the hashing algorithms, and when, say, a sha256 digest of a kernel image of a few MB is being verified, that's almost guaranteed to be cache-cold, so apart from the computations being done, the hashing is also bounded by memory speed - so if 64K works for those cases, it should certainly also work when memory access is the only thing being

done.

...
...
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk

Reviewed-by: Stefan Roese sr@denx.de

<snip> Applied to u-boot-mpc85xx. Awaiting upstream

Thanks Priyanka

Wolfgang Denk

24 Sep 24 Sep

12:57 p.m.

Dear Priyanka,

In message VE1PR04MB649416C255B3379B43E7F7FDE6380@VE1PR04MB6494.eurprd04.prod.outlook.com you wrote:

...

...
...
...
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk

Reviewed-by: Stefan Roese sr@denx.de

<snip> Applied to u-boot-mpc85xx. Awaiting upstream

I object. This must at least be configurable, so that boards with tight timing requirements can still work.

Best regards,

Wolfgang Denk

-- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de The human mind ordinarily operates at only ten percent of its capacity. The rest is overhead for the operating system.

Tom Rini

2:32 p.m.

On Thu, Sep 24, 2020 at 12:57:36PM +0200, Wolfgang Denk wrote:

...

Dear Priyanka,

In message VE1PR04MB649416C255B3379B43E7F7FDE6380@VE1PR04MB6494.eurprd04.prod.outlook.com you wrote:

...
...
...
...
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk

Reviewed-by: Stefan Roese sr@denx.de

<snip> Applied to u-boot-mpc85xx. Awaiting upstream

I object. This must at least be configurable, so that boards with tight timing requirements can still work.

I'm removing just this patch from the -next pull request.

-- Tom

1685

Age (days ago)

1797

Last active (days ago)

List overview

Download

8 comments

5 participants

tags (0)

participants (5)

Priyanka Jain (OSS)
Rasmus Villemoes
Stefan Roese
Tom Rini
Wolfgang Denk