[U-Boot] [RFC PATCH v2] ARM: Avoid compiler optimization for usages of readb, writeb and friends.

gcc 4.5.1 seems to ignore (at least some) volatile definitions, avoid that as done in the kernel.
Reading C99 6.7.3 8 and the comment 114) there, I think it is a bug of that gcc version to ignore the volatile type qualifier used e.g. in __arch_getl(). Anyway, using a definition as in the kernel headers avoids such optimizations when gcc 4.5.1 is used.
Maybe the headers as used in the current linux-kernel should be used, but to avoid large changes, I've just added a small change to the current headers.
Signed-off-by: Alexander Holler holler@ahsoftware.de --- arch/arm/include/asm/io.h | 20 ++++++++++++++------ 1 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h index ff1518e..5364b78 100644 --- a/arch/arm/include/asm/io.h +++ b/arch/arm/include/asm/io.h @@ -125,13 +125,21 @@ extern inline void __raw_readsl(unsigned int addr, void *data, int longlen) #define __raw_readw(a) __arch_getw(a) #define __raw_readl(a) __arch_getl(a)
-#define writeb(v,a) __arch_putb(v,a) -#define writew(v,a) __arch_putw(v,a) -#define writel(v,a) __arch_putl(v,a) +/* + * TODO: The kernel offers some more advanced versions of barriers, it might + * have some advantages to use them instead of the simple one here. + */ +#define dmb() __asm__ __volatile__ ("" : : : "memory") +#define __iormb() dmb() +#define __iowmb() dmb() + +#define writeb(v,c) ({ __iowmb(); __arch_putb(v,c); }) +#define writew(v,c) ({ __iowmb(); __arch_putw(v,c); }) +#define writel(v,c) ({ __iowmb(); __arch_putl(v,c); })
-#define readb(a) __arch_getb(a) -#define readw(a) __arch_getw(a) -#define readl(a) __arch_getl(a) +#define readb(c) ({ u8 __v = __arch_getb(c); __iormb(); __v; }) +#define readw(c) ({ u16 __v = __arch_getw(c); __iormb(); __v; }) +#define readl(c) ({ u32 __v = __arch_getl(c); __iormb(); __v; })
/* * The compiler seems to be incapable of optimising constants

On 18.12.2010 23:27, Alexander Holler wrote:
gcc 4.5.1 seems to ignore (at least some) volatile definitions, avoid that as done in the kernel.
Reading C99 6.7.3 8 and the comment 114) there, I think it is a bug of that gcc version to ignore the volatile type qualifier used e.g. in __arch_getl(). Anyway, using a definition as in the kernel headers avoids such optimizations when gcc 4.5.1 is used.
Maybe the headers as used in the current linux-kernel should be used, but to avoid large changes, I've just added a small change to the current headers.
Signed-off-by: Alexander Hollerholler@ahsoftware.de
arch/arm/include/asm/io.h | 20 ++++++++++++++------ 1 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h index ff1518e..5364b78 100644 --- a/arch/arm/include/asm/io.h +++ b/arch/arm/include/asm/io.h @@ -125,13 +125,21 @@ extern inline void __raw_readsl(unsigned int addr, void *data, int longlen) #define __raw_readw(a) __arch_getw(a) #define __raw_readl(a) __arch_getl(a)
-#define writeb(v,a) __arch_putb(v,a) -#define writew(v,a) __arch_putw(v,a) -#define writel(v,a) __arch_putl(v,a) +/*
- TODO: The kernel offers some more advanced versions of barriers, it might
- have some advantages to use them instead of the simple one here.
- */
+#define dmb() __asm__ __volatile__ ("" : : : "memory") +#define __iormb() dmb() +#define __iowmb() dmb()
+#define writeb(v,c) ({ __iowmb(); __arch_putb(v,c); }) +#define writew(v,c) ({ __iowmb(); __arch_putw(v,c); }) +#define writel(v,c) ({ __iowmb(); __arch_putl(v,c); })
-#define readb(a) __arch_getb(a) -#define readw(a) __arch_getw(a) -#define readl(a) __arch_getl(a) +#define readb(c) ({ u8 __v = __arch_getb(c); __iormb(); __v; }) +#define readw(c) ({ u16 __v = __arch_getw(c); __iormb(); __v; }) +#define readl(c) ({ u32 __v = __arch_getl(c); __iormb(); __v; })
Using the test code below [1] and then looking at the disassembly from the two tool chains gcc version 4.3.3 (Sourcery G++ Lite 2009q1-203) versus gcc version 4.5.1 (Sourcery G++ Lite 2010.09-50): Yes, without the additional dmb() the gcc 4.5.1 just creates
00000000 <main>: 0: e3a00000 mov r0, #0 4: e12fff1e bx lr
while with the additional dmb() it creates
00000000 <main>: 0: e59f300c ldr r3, [pc, #12] ; 14 <main+0x14> 4: e5932028 ldr r2, [r3, #40] ; 0x28 8: e5930028 ldr r0, [r3, #40] ; 0x28 c: e0620000 rsb r0, r2, r0 10: e12fff1e bx lr 14: 48318000
what looks correct. And 4.3.3 does the same code for both readl() versions. So:
Acked-by: Dirk Behme dirk.behme@googlemail.com
Thanks
Dirk
[1]
arm-none-linux-gnueabi-gcc -Wall -O2 -c foo.c -o foo.o arm-none-linux-gnueabi-objdump -D foo.o > foo.dis
-- foo.c -- struct gptimer { unsigned int tidr; /* 0x00 r */ unsigned char res[0xc]; unsigned int tiocp_cfg; /* 0x10 rw */ unsigned int tistat; /* 0x14 r */ unsigned int tisr; /* 0x18 rw */ unsigned int tier; /* 0x1c rw */ unsigned int twer; /* 0x20 rw */ unsigned int tclr; /* 0x24 rw */ unsigned int tcrr; /* 0x28 rw */ unsigned int tldr; /* 0x2c rw */ unsigned int ttgr; /* 0x30 rw */ unsigned int twpc; /* 0x34 r*/ unsigned int tmar; /* 0x38 rw*/ unsigned int tcar1; /* 0x3c r */ unsigned int tcicr; /* 0x40 rw */ unsigned int tcar2; /* 0x44 r */ };
#define dmb() __asm__ __volatile__ ("" : : : "memory") #define __iormb() dmb()
#define __arch_getl(a) (*(volatile unsigned int *)(a)) #define readl(a) __arch_getl(a) //#define readl(c) ({ unsigned int __v = __arch_getl(c); __iormb(); __v; })
int main(void) {
struct gptimer *gpt1_base = (struct gptimer *)0x48318000; unsigned int cdiff, cstart, cend;
cstart = readl(&gpt1_base->tcrr);
cend = readl(&gpt1_base->tcrr);
cdiff = cend - cstart;
return cdiff;
} -- foo.c --

Hello,
On 19.12.2010 08:51, Dirk Behme wrote:
On 18.12.2010 23:27, Alexander Holler wrote:
gcc 4.5.1 seems to ignore (at least some) volatile definitions, avoid that as done in the kernel.
Acked-by: Dirk Behme dirk.behme@googlemail.com
Thanks for the ack, but I have to say, that those barriers are having side effects here. Reading NAND now fails on my BeagleBoard. Regardless if I use gcc 4.3.5 or gcc 4.5.1, after applying that patch the wrong ID of the NAND is read. In nand_get_flash_type() (drivers/mtd/nand/nand_base.c) without that patch I will get the following:
*maf_id: 44, dev_id: 186
with the patch the following is read:
*maf_id: 128, dev_id: 85
Which just is wrong.
I haven't looked further up to now, maybe thats just a side effect of some wrong clock settings because of different timings through those barrieres or whatever.
Regards,
Alexander

Le 19/12/2010 11:22, Alexander Holler a écrit :
Hello,
On 19.12.2010 08:51, Dirk Behme wrote:
On 18.12.2010 23:27, Alexander Holler wrote:
gcc 4.5.1 seems to ignore (at least some) volatile definitions, avoid that as done in the kernel.
Acked-by: Dirk Behmedirk.behme@googlemail.com
Thanks for the ack, but I have to say, that those barriers are having side effects here. Reading NAND now fails on my BeagleBoard. Regardless if I use gcc 4.3.5 or gcc 4.5.1, after applying that patch the wrong ID of the NAND is read. In nand_get_flash_type() (drivers/mtd/nand/nand_base.c) without that patch I will get the following:
*maf_id: 44, dev_id: 186
with the patch the following is read:
*maf_id: 128, dev_id: 85
Which just is wrong.
I haven't looked further up to now, maybe thats just a side effect of some wrong clock settings because of different timings through those barrieres or whatever.
IIUC, the patch is adding barriers to every HW register write and read, which makes the compiler stop reordering these, and keep them ordered as in the source code. Are you sure that the order as laid out in the source is the right one? Maybe you were just luck so far that the reordering masked a bug.
Amicalement,

Hello,
Am 19.12.2010 12:28, schrieb Albert ARIBAUD:
Le 19/12/2010 11:22, Alexander Holler a écrit :
Hello,
On 19.12.2010 08:51, Dirk Behme wrote:
On 18.12.2010 23:27, Alexander Holler wrote:
gcc 4.5.1 seems to ignore (at least some) volatile definitions, avoid that as done in the kernel.
Acked-by: Dirk Behmedirk.behme@googlemail.com
Thanks for the ack, but I have to say, that those barriers are having side effects here. Reading NAND now fails on my BeagleBoard. Regardless if I use gcc 4.3.5 or gcc 4.5.1, after applying that patch the wrong ID of the NAND is read. In nand_get_flash_type() (drivers/mtd/nand/nand_base.c) without that patch I will get the following:
*maf_id: 44, dev_id: 186
with the patch the following is read:
*maf_id: 128, dev_id: 85
Which just is wrong.
I haven't looked further up to now, maybe thats just a side effect of some wrong clock settings because of different timings through those barrieres or whatever.
IIUC, the patch is adding barriers to every HW register write and read, which makes the compiler stop reordering these, and keep them ordered as in the source code. Are you sure that the order as laid out in the source is the right one? Maybe you were just luck so far that the reordering masked a bug.
I don't know much about all the stuff the omap3-drivers are doing. E.g. I've already wondered why it's necessary to measure a clock frequency in the code which got (wrongly) optimized without that patch. I would think such isn't necessary because I would assume the drivers are actually there to set the clock settings, and not to measure them. ;)
Regards,
Alexander

On Sun, Dec 19, 2010 at 3:22 AM, Alexander Holler holler@ahsoftware.de wrote:
side effects here. Reading NAND now fails on my BeagleBoard. Regardless if I use gcc 4.3.5 or gcc 4.5.1, after applying that patch the wrong ID of the NAND is read. In nand_get_flash_type() (drivers/mtd/nand/nand_base.c) without that patch I will get the following:
*maf_id: 44, dev_id: 186
with the patch the following is read:
*maf_id: 128, dev_id: 85
The nand_get_flash_type routine reads these id's twice and compares them. Do your see an error message like this?
second ID read did not match
Which just is wrong.
I haven't looked further up to now, maybe thats just a side effect of some wrong clock settings because of different timings through those barrieres or whatever.
Regards,
Alexander _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

Am 19.12.2010 19:45, schrieb John Rigby:
On Sun, Dec 19, 2010 at 3:22 AM, Alexander Hollerholler@ahsoftware.de wrote:
side effects here. Reading NAND now fails on my BeagleBoard. Regardless if I use gcc 4.3.5 or gcc 4.5.1, after applying that patch the wrong ID of the NAND is read. In nand_get_flash_type() (drivers/mtd/nand/nand_base.c) without that patch I will get the following:
*maf_id: 44, dev_id: 186
with the patch the following is read:
*maf_id: 128, dev_id: 85
The nand_get_flash_type routine reads these id's twice and compares them. Do your see an error message like this?
second ID read did not match
No, the output is
without memory barrier in read*/write*: -------------------------------- U-Boot 2010.12-rc3-00013-g71aab09 (Dec 19 2010 - 01:19:38)
OMAP3530-GP ES3.1, CPU-OPP2, L3-165MHz, Max CPU Clock 720 mHz OMAP3 Beagle board + LPDDR/NAND I2C: ready DRAM: 256 MiB NAND: 256 MiB MMC: OMAP SD/MMC: 0 In: serial Out: serial Err: serial Beagle Rev C4 timed out in wait_for_pin: I2C_STAT=0 No EEPROM on expansion board Die ID #062a000400000000040365fa16019019 Hit any key to stop autoboot: 0 OMAP3 beagleboard.org # nand info
Device 0: nand0, sector size 128 KiB --------------------------------
with memory barrier in read*/write*: -------------------------------- U-Boot 2010.12-rc3-00014-gde0126f (Dec 19 2010 - 01:25:11)
OMAP3530-GP ES3.1, CPU-OPP2, L3-165MHz, Max CPU Clock 720 mHz OMAP3 Beagle board + LPDDR/NAND I2C: ready DRAM: 256 MiB NAND: 32 MiB MMC: OMAP SD/MMC: 0 NAND read from offset 260000 failed -74 *** Warning - readenv() failed, using default environment
In: serial Out: serial Err: serial Beagle Rev C4 timed out in wait_for_pin: I2C_STAT=0 No EEPROM on expansion board Die ID #062a000400000000040365fa16019019 Hit any key to stop autoboot: 0 OMAP3 beagleboard.org # nand info
Device 0: nand0, sector size 16 KiB --------------------------------
And with the memory barrier the kernel gets started but hangs afterwards (at least it looks so). I still haven't searched further and can only offer speculations like clocks are screwed up, memory setup is broken or such. Finding the reason and not just the impact will need some more time on my side.
Regards,
Alexander

On Sun, Dec 19, 2010 at 12:59 PM, Alexander Holler holler@ahsoftware.de wrote: ...
No EEPROM on expansion board Die ID #062a000400000000040365fa16019019 Hit any key to stop autoboot: 0 OMAP3 beagleboard.org # nand info
Device 0: nand0, sector size 16 KiB
I get the same output without your change. My gcc is linaro 4.4.5. I'll do some bisecting and try to find out what is going on.

Am 20.12.2010 01:39, schrieb John Rigby:
On Sun, Dec 19, 2010 at 12:59 PM, Alexander Hollerholler@ahsoftware.de wrote: ...
No EEPROM on expansion board Die ID #062a000400000000040365fa16019019 Hit any key to stop autoboot: 0 OMAP3 beagleboard.org # nand info
Device 0: nand0, sector size 16 KiB
I get the same output without your change. My gcc is linaro 4.4.5. I'll do some bisecting and try to find out what is going on.
Bisecting won't help you here. Not if the problem was always there (which is what I assume).
Regards,
Alexander

On Sun, Dec 19, 2010 at 5:56 PM, Alexander Holler holler@ahsoftware.de wrote:
Am 20.12.2010 01:39, schrieb John Rigby:
On Sun, Dec 19, 2010 at 12:59 PM, Alexander Hollerholler@ahsoftware.de wrote: ...
No EEPROM on expansion board Die ID #062a000400000000040365fa16019019 Hit any key to stop autoboot: 0 OMAP3 beagleboard.org # nand info
Device 0: nand0, sector size 16 KiB
I get the same output without your change. My gcc is linaro 4.4.5. I'll do some bisecting and try to find out what is going on.
Bisecting won't help you here. Not if the problem was always there (which is what I assume
Sorry, I was confused about my results.
If I replace include <asm/io.h> in drivers/mtd/nand/omap_gpmc.c with a copy of the original called orig_io.h: #include "orig_io.h"
Nand starts working again. So the problem seems to be isolated to this file.
Regards,
Alexander

On Sun, Dec 19, 2010 at 9:18 PM, John Rigby john.rigby@linaro.org wrote:
On Sun, Dec 19, 2010 at 5:56 PM, Alexander Holler holler@ahsoftware.de wrote:
Am 20.12.2010 01:39, schrieb John Rigby:
On Sun, Dec 19, 2010 at 12:59 PM, Alexander Hollerholler@ahsoftware.de wrote: ...
No EEPROM on expansion board Die ID #062a000400000000040365fa16019019 Hit any key to stop autoboot: 0 OMAP3 beagleboard.org # nand info
Device 0: nand0, sector size 16 KiB
I get the same output without your change. My gcc is linaro 4.4.5. I'll do some bisecting and try to find out what is going on.
Bisecting won't help you here. Not if the problem was always there (which is what I assume
Sorry, I was confused about my results.
If I replace include <asm/io.h> in drivers/mtd/nand/omap_gpmc.c with a copy of the original called orig_io.h: #include "orig_io.h"
Nand starts working again. So the problem seems to be isolated to this file.
Regards,
Alexander
With your patch and the following hack nand works:
diff --git a/drivers/mtd/nand/omap_gpmc.c b/drivers/mtd/nand/omap_gpmc.c index 99b9cef..5e94155 100644 --- a/drivers/mtd/nand/omap_gpmc.c +++ b/drivers/mtd/nand/omap_gpmc.c @@ -29,6 +29,8 @@ #include <linux/mtd/nand_ecc.h> #include <nand.h>
+#define origwriteb(v,a) __arch_putb(v,a) + static uint8_t cs; static struct nand_ecclayout hw_nand_oob = GPMC_NAND_HW_ECC_LAYOUT;
@@ -58,7 +60,7 @@ static void omap_nand_hwcontrol(struct mtd_info *mtd, int32_t cmd, }
if (cmd != NAND_CMD_NONE) - writeb(cmd, this->IO_ADDR_W); + origwriteb(cmd, this->IO_ADDR_W); }
/*
The working assembly looks like this:
if (cmd != NAND_CMD_NONE) 80024d28: e3710001 cmn r1, #1 origwriteb(cmd, this->IO_ADDR_W); 80024d2c: 15933004 ldrne r3, [r3, #4] 80024d30: 120110ff andne r1, r1, #255 ; 0xff 80024d34: 15c31000 strbne r1, [r3] 80024d38: e8bd8010 pop {r4, pc}
The broken assembly looks like this:
if (cmd != NAND_CMD_NONE) 80024d28: e3710001 cmn r1, #1 80024d2c: 08bd8010 popeq {r4, pc} writeb(cmd, this->IO_ADDR_W); 80024d30: e5933004 ldr r3, [r3, #4] 80024d34: e20110ff and r1, r1, #255 ; 0xff 80024d38: e5c31000 strb r1, [r3] 80024d3c: e5d33000 ldrb r3, [r3] 80024d40: e8bd8010 pop {r4, pc}
This is with gcc version "(Ubuntu/Linaro 4.4.4-14ubuntu4) 4.4.5". I'll try a 4.5.2 version next and see what happens.
br,
John

On 20.12.2010 07:07, John Rigby wrote:
On Sun, Dec 19, 2010 at 9:18 PM, John Rigbyjohn.rigby@linaro.org wrote:
On Sun, Dec 19, 2010 at 5:56 PM, Alexander Hollerholler@ahsoftware.de wrote:
Am 20.12.2010 01:39, schrieb John Rigby:
On Sun, Dec 19, 2010 at 12:59 PM, Alexander Hollerholler@ahsoftware.de wrote: ...
No EEPROM on expansion board Die ID #062a000400000000040365fa16019019 Hit any key to stop autoboot: 0 OMAP3 beagleboard.org # nand info
Device 0: nand0, sector size 16 KiB
I get the same output without your change. My gcc is linaro 4.4.5. I'll do some bisecting and try to find out what is going on.
Bisecting won't help you here. Not if the problem was always there (which is what I assume
Sorry, I was confused about my results.
If I replace include<asm/io.h> in drivers/mtd/nand/omap_gpmc.c with a copy of the original called orig_io.h: #include "orig_io.h"
Nand starts working again. So the problem seems to be isolated to this file.
Regards,
Alexander
With your patch and the following hack nand works:
diff --git a/drivers/mtd/nand/omap_gpmc.c b/drivers/mtd/nand/omap_gpmc.c index 99b9cef..5e94155 100644 --- a/drivers/mtd/nand/omap_gpmc.c +++ b/drivers/mtd/nand/omap_gpmc.c @@ -29,6 +29,8 @@ #include<linux/mtd/nand_ecc.h> #include<nand.h>
+#define origwriteb(v,a) __arch_putb(v,a)
- static uint8_t cs; static struct nand_ecclayout hw_nand_oob = GPMC_NAND_HW_ECC_LAYOUT;
@@ -58,7 +60,7 @@ static void omap_nand_hwcontrol(struct mtd_info *mtd, int32_t cmd, }
if (cmd != NAND_CMD_NONE)
writeb(cmd, this->IO_ADDR_W);
origwriteb(cmd, this->IO_ADDR_W);
}
/*
The working assembly looks like this:
if (cmd != NAND_CMD_NONE) 80024d28: e3710001 cmn r1, #1 origwriteb(cmd, this->IO_ADDR_W); 80024d2c: 15933004 ldrne r3, [r3, #4] 80024d30: 120110ff andne r1, r1, #255 ; 0xff 80024d34: 15c31000 strbne r1, [r3] 80024d38: e8bd8010 pop {r4, pc}
The broken assembly looks like this:
if (cmd != NAND_CMD_NONE) 80024d28: e3710001 cmn r1, #1 80024d2c: 08bd8010 popeq {r4, pc} writeb(cmd, this->IO_ADDR_W); 80024d30: e5933004 ldr r3, [r3, #4] 80024d34: e20110ff and r1, r1, #255 ; 0xff 80024d38: e5c31000 strb r1, [r3] 80024d3c: e5d33000 ldrb r3, [r3] 80024d40: e8bd8010 pop {r4, pc}
Hmm. From functionality point of view, the 'broken' assembly below should to the same as the working assembly, above. The main difference is the 'popeq {r4, pc}' and the additional 'ldrb r3, [r3]'. The write to the HW 'strb r1, [r3]' is there, so it should work. Is this understanding correct?
If it's correct, the question is, what breaks the below assembly? The popeq or the additional ldrb? The popeq looks correct, but why is the additional ldrb there?
For some further debugging, I had two ideas: Modifying the resulting binary with a hex editor and replacing the ldrb with a nop (if I remember correctly the hex code for a nop is ffffffff?). Or modifying the the C code and adding a barrier behind the writeb(). E.g.
if (cmd != NAND_CMD_NONE); writeb(cmd, this->IO_ADDR_W); __asm__ __volatile__ ("dsb" : : : "memory"); }
Best regards
Dirk

On Sun, Dec 19, 2010 at 11:49 PM, Dirk Behme dirk.behme@googlemail.com wrote:
On 20.12.2010 07:07, John Rigby wrote:
The working assembly looks like this:
if (cmd != NAND_CMD_NONE) 80024d28: e3710001 cmn r1, #1 origwriteb(cmd, this->IO_ADDR_W); 80024d2c: 15933004 ldrne r3, [r3, #4] 80024d30: 120110ff andne r1, r1, #255 ; 0xff 80024d34: 15c31000 strbne r1, [r3] 80024d38: e8bd8010 pop {r4, pc}
The broken assembly looks like this:
if (cmd != NAND_CMD_NONE) 80024d28: e3710001 cmn r1, #1 80024d2c: 08bd8010 popeq {r4, pc} writeb(cmd, this->IO_ADDR_W); 80024d30: e5933004 ldr r3, [r3, #4] 80024d34: e20110ff and r1, r1, #255 ; 0xff 80024d38: e5c31000 strb r1, [r3] 80024d3c: e5d33000 ldrb r3, [r3] 80024d40: e8bd8010 pop {r4, pc}
Hmm. From functionality point of view, the 'broken' assembly below should to the same as the working assembly, above. The main difference is the 'popeq {r4, pc}' and the additional 'ldrb r3, [r3]'. The write to the HW 'strb r1, [r3]' is there, so it should work. Is this understanding correct?
If it's correct, the question is, what breaks the below assembly? The popeq or the additional ldrb? The popeq looks correct, but why is the additional ldrb there?
I can't answer why the ldrb is there but I'm pretty sure it is the problem. From the TRM http://focus.ti.com/lit/ug/spruf98m/spruf98m.pdf:
Only write accesses must be issued to these locations, but the GPMC does not discard any read access. Accessing a NAND device with nOE and CLE or ALE asserted (read access) can produce undefined results.
br, John

Earlier in this thread Alexander said:
I haven't add the definitions which are using a memory barrier because I haven't found a place in the kernel where they were actually enabled (CONFIG_ARM_DMA_MEM_BUFFERABLE).
I think this is the problem because it is indeed defined for all v6 and v7 arm platforms. Here is the config snippet from arch/arm/mm/Kconfig:
config ARM_DMA_MEM_BUFFERABLE bool "Use non-cacheable memory for DMA" if CPU_V6 && !CPU_V7 depends on !(MACH_REALVIEW_PB1176 || REALVIEW_EB_ARM11MP || \ MACH_REALVIEW_PB11MP) default y if CPU_V6 || CPU_V7 help Historically, the kernel has used strongly ordered mappings to provide DMA coherent memory. With the advent of ARMv7, mapping memory with differing types results in unpredictable behaviour, so on these CPUs, this option is forced on.
Multiple mappings with differing attributes is also unpredictable on ARMv6 CPUs, but since they do not have aggressive speculative prefetch, no harm appears to occur.
However, drivers may be missing the necessary barriers for ARMv6, and therefore turning this on may result in unpredictable driver behaviour. Therefore, we offer this as an option.
You are recommended say 'Y' here and debug any affected drivers.

On Mon, Dec 20, 2010 at 9:08 AM, John Rigby john.rigby@linaro.org wrote:
Earlier in this thread Alexander said:
I haven't add the definitions which are using a memory barrier because I haven't found a place in the kernel where they were actually enabled (CONFIG_ARM_DMA_MEM_BUFFERABLE).
I think this is the problem because it is indeed defined for all v6 and v7 arm platforms. Here is the config snippet from arch/arm/mm/Kconfig:
config ARM_DMA_MEM_BUFFERABLE bool "Use non-cacheable memory for DMA" if CPU_V6 && !CPU_V7 depends on !(MACH_REALVIEW_PB1176 || REALVIEW_EB_ARM11MP || \ MACH_REALVIEW_PB11MP) default y if CPU_V6 || CPU_V7 help Historically, the kernel has used strongly ordered mappings to provide DMA coherent memory. With the advent of ARMv7, mapping memory with differing types results in unpredictable behaviour, so on these CPUs, this option is forced on.
On second thought maybe this is noise for us in u-boot without cacheable mappings? Sorry for the noise.
br,
John

Hello,
Am 20.12.2010 17:08, schrieb John Rigby:
Earlier in this thread Alexander said:
I haven't add the definitions which are using a memory barrier because I haven't found a place in the kernel where they were actually enabled (CONFIG_ARM_DMA_MEM_BUFFERABLE).
Because I've just run again into such a "search problem":
Don't use "git grep" on a kernel configured for x86, when you are searching an option for another architecture. ;)
Regards,
Alexander

Hello,
Am 20.12.2010 07:07, schrieb John Rigby:
With your patch and the following hack nand works:
diff --git a/drivers/mtd/nand/omap_gpmc.c b/drivers/mtd/nand/omap_gpmc.c index 99b9cef..5e94155 100644 --- a/drivers/mtd/nand/omap_gpmc.c +++ b/drivers/mtd/nand/omap_gpmc.c @@ -29,6 +29,8 @@ #include<linux/mtd/nand_ecc.h> #include<nand.h>
+#define origwriteb(v,a) __arch_putb(v,a)
- static uint8_t cs; static struct nand_ecclayout hw_nand_oob = GPMC_NAND_HW_ECC_LAYOUT;
@@ -58,7 +60,7 @@ static void omap_nand_hwcontrol(struct mtd_info *mtd, int32_t cmd, }
if (cmd != NAND_CMD_NONE)
writeb(cmd, this->IO_ADDR_W);
origwriteb(cmd, this->IO_ADDR_W);
}
/*
The working assembly looks like this:
if (cmd != NAND_CMD_NONE) 80024d28: e3710001 cmn r1, #1 origwriteb(cmd, this->IO_ADDR_W); 80024d2c: 15933004 ldrne r3, [r3, #4] 80024d30: 120110ff andne r1, r1, #255 ; 0xff 80024d34: 15c31000 strbne r1, [r3] 80024d38: e8bd8010 pop {r4, pc}
The broken assembly looks like this:
if (cmd != NAND_CMD_NONE) 80024d28: e3710001 cmn r1, #1 80024d2c: 08bd8010 popeq {r4, pc} writeb(cmd, this->IO_ADDR_W); 80024d30: e5933004 ldr r3, [r3, #4] 80024d34: e20110ff and r1, r1, #255 ; 0xff 80024d38: e5c31000 strb r1, [r3] 80024d3c: e5d33000 ldrb r3, [r3] 80024d40: e8bd8010 pop {r4, pc}
This is with gcc version "(Ubuntu/Linaro 4.4.4-14ubuntu4) 4.4.5". I'll try a 4.5.2 version next and see what happens.
There must be more problems. Using gcc 4.5.1, the read*/write*-patch and your hack, my kernel doesn't boot. Using gcc 4.3.5 and the same source to compile u-boot the kernel comes up. Here is the output for the non-working u-boot:
---------------- U-Boot 2010.12-rc3-00015-g3ae9687-dirty (Dec 20 2010 - 18:01:41, gcc 4.5.1)
OMAP3530-GP ES3.1, CPU-OPP2, L3-165MHz, Max CPU Clock 720 mHz OMAP3 Beagle board + LPDDR/NAND I2C: ready DRAM: 256 MiB NAND: 256 MiB MMC: OMAP SD/MMC: 0 In: serial Out: serial Err: serial Beagle Rev C4 timed out in wait_for_pin: I2C_STAT=0 No EEPROM on expansion board Die ID #062a000400000000040365fa16019019 Hit any key to stop autoboot: 0 reading boot.scr
422 bytes read Running bootscript from mmc ... ## Executing script at 82000000 reading uImage
2419940 bytes read Booting from mmc ... ## Booting kernel from Legacy Image at 82000000 ... Image Name: Linux-2.6.37-rc5-beagleboard-000 Image Type: ARM Linux Kernel Image (uncompressed) Data Size: 2419876 Bytes = 2.3 MiB Load Address: 80008000 Entry Point: 80008000 Verifying Checksum ... OK Loading Kernel Image ... OK OK ----------------
Nothing else.
Regards,
Alexander

On Mon, Dec 20, 2010 at 10:12 AM, Alexander Holler holler@ahsoftware.de wrote:
There must be more problems. Using gcc 4.5.1, the read*/write*-patch and your hack, my kernel doesn't boot. Using gcc 4.3.5 and the same source to compile u-boot the kernel comes up. Here is the output for the non-working u-boot:
U-Boot 2010.12-rc3-00015-g3ae9687-dirty (Dec 20 2010 - 18:01:41, gcc 4.5.1)
OMAP3530-GP ES3.1, CPU-OPP2, L3-165MHz, Max CPU Clock 720 mHz OMAP3 Beagle board + LPDDR/NAND I2C: ready DRAM: 256 MiB NAND: 256 MiB MMC: OMAP SD/MMC: 0 In: serial Out: serial Err: serial Beagle Rev C4 timed out in wait_for_pin: I2C_STAT=0 No EEPROM on expansion board Die ID #062a000400000000040365fa16019019 Hit any key to stop autoboot: 0 reading boot.scr
422 bytes read Running bootscript from mmc ... ## Executing script at 82000000 reading uImage
2419940 bytes read Booting from mmc ... ## Booting kernel from Legacy Image at 82000000 ... Image Name: Linux-2.6.37-rc5-beagleboard-000 Image Type: ARM Linux Kernel Image (uncompressed) Data Size: 2419876 Bytes = 2.3 MiB Load Address: 80008000 Entry Point: 80008000 Verifying Checksum ... OK Loading Kernel Image ... OK OK
Nothing else.
Regards,
Alexander
Yes, you are correct, I see the same here with 4.5.2. I noticed that bd did not have correct values of machine type and boot params:
bd address = 0x8FF24FE0 arch_number = 0xFF0084FF boot_params = 0xBB2000FE DRAM bank = 0x00000000 -> start = 0x80000000 -> size = 0x08000000 DRAM bank = 0x00000001 -> start = 0x88000000 -> size = 0x08000000 baudrate = 115200 bps TLB addr = 0x8FFF0000 relocaddr = 0x8FF85000 reloc off = 0x0FF7D000 irq_sp = 0x8FF24F68 sp start = 0x8FF24F60 FB base = 0x00000000
If we then look at board_init in beagle.c the problem is obvious:
800331ac <board_init>: 800331ac: e92d4008 push {r3, lr} 800331b0: ebff5a74 bl 80009b88 <gpmc_init> 800331b4: e3a00000 mov r0, #0 800331b8: e5983000 ldr r3, [r8] 800331bc: e5983000 ldr r3, [r8] 800331c0: e8bd8008 pop {r3, pc}
Here is with source mingled in: 800331ac <board_init>: /* * Routine: board_init * Description: Early hardware init. */ int board_init(void) { 800331ac: e92d4008 push {r3, lr} DECLARE_GLOBAL_DATA_PTR;
gpmc_init(); /* in SRAM or SDRAM, finish GPMC */ 800331b0: ebff5a74 bl 80009b88 <gpmc_init> gd->bd->bi_arch_number = MACH_TYPE_OMAP3_BEAGLE; /* boot param addr */ gd->bd->bi_boot_params = (OMAP34XX_SDRC_CS0 + 0x100);
return 0; } 800331b4: e3a00000 mov r0, #0 { DECLARE_GLOBAL_DATA_PTR;
gpmc_init(); /* in SRAM or SDRAM, finish GPMC */ /* board id for Linux */ gd->bd->bi_arch_number = MACH_TYPE_OMAP3_BEAGLE; 800331b8: e5983000 ldr r3, [r8] /* boot param addr */ gd->bd->bi_boot_params = (OMAP34XX_SDRC_CS0 + 0x100); 800331bc: e5983000 ldr r3, [r8]
return 0; } 800331c0: e8bd8008 pop {r3, pc}
br,
John

On Mon, Dec 20, 2010 at 5:25 PM, John Rigby john.rigby@linaro.org wrote:
On Mon, Dec 20, 2010 at 10:12 AM, Alexander Holler holler@ahsoftware.de wrote:
There must be more problems. Using gcc 4.5.1, the read*/write*-patch and your hack, my kernel doesn't boot. Using gcc 4.3.5 and the same source to compile u-boot the kernel comes up. Here is the output for the non-working u-boot:
U-Boot 2010.12-rc3-00015-g3ae9687-dirty (Dec 20 2010 - 18:01:41, gcc 4.5.1)
OMAP3530-GP ES3.1, CPU-OPP2, L3-165MHz, Max CPU Clock 720 mHz OMAP3 Beagle board + LPDDR/NAND I2C: ready DRAM: 256 MiB NAND: 256 MiB MMC: OMAP SD/MMC: 0 In: serial Out: serial Err: serial Beagle Rev C4 timed out in wait_for_pin: I2C_STAT=0 No EEPROM on expansion board Die ID #062a000400000000040365fa16019019 Hit any key to stop autoboot: 0 reading boot.scr
422 bytes read Running bootscript from mmc ... ## Executing script at 82000000 reading uImage
2419940 bytes read Booting from mmc ... ## Booting kernel from Legacy Image at 82000000 ... Image Name: Linux-2.6.37-rc5-beagleboard-000 Image Type: ARM Linux Kernel Image (uncompressed) Data Size: 2419876 Bytes = 2.3 MiB Load Address: 80008000 Entry Point: 80008000 Verifying Checksum ... OK Loading Kernel Image ... OK OK
Nothing else.
Regards,
Alexander
Yes, you are correct, I see the same here with 4.5.2. I noticed that bd did not have correct values of machine type and boot params:
bd address = 0x8FF24FE0 arch_number = 0xFF0084FF boot_params = 0xBB2000FE DRAM bank = 0x00000000 -> start = 0x80000000 -> size = 0x08000000 DRAM bank = 0x00000001 -> start = 0x88000000 -> size = 0x08000000 baudrate = 115200 bps TLB addr = 0x8FFF0000 relocaddr = 0x8FF85000 reloc off = 0x0FF7D000 irq_sp = 0x8FF24F68 sp start = 0x8FF24F60 FB base = 0x00000000
If we then look at board_init in beagle.c the problem is obvious:
800331ac <board_init>: 800331ac: e92d4008 push {r3, lr} 800331b0: ebff5a74 bl 80009b88 <gpmc_init> 800331b4: e3a00000 mov r0, #0 800331b8: e5983000 ldr r3, [r8] 800331bc: e5983000 ldr r3, [r8] 800331c0: e8bd8008 pop {r3, pc}
Apparently this is a known issue mentioned in README:
NOTE: DECLARE_GLOBAL_DATA_PTR must be used with file-global scope, or current versions of GCC may "optimize" the code too much.
With this fix I can boot again:
diff --git a/board/ti/beagle/beagle.c b/board/ti/beagle/beagle.c index d9b6f01..c066d6e 100644 --- a/board/ti/beagle/beagle.c +++ b/board/ti/beagle/beagle.c @@ -51,6 +51,8 @@
#define BEAGLE_NO_EEPROM 0xffffffff
+DECLARE_GLOBAL_DATA_PTR; + static struct { unsigned int device_vendor; unsigned char revision; @@ -66,8 +68,6 @@ static struct { */ int board_init(void) { - DECLARE_GLOBAL_DATA_PTR; - gpmc_init(); /* in SRAM or SDRAM, finish GPMC */ /* board id for Linux */ gd->bd->bi_arch_number = MACH_TYPE_OMAP3_BEAGLE;
Please let me know if you find any other problems.
br,
John

On 21.12.2010 01:46, John Rigby wrote:
On Mon, Dec 20, 2010 at 5:25 PM, John Rigbyjohn.rigby@linaro.org wrote:
On Mon, Dec 20, 2010 at 10:12 AM, Alexander Hollerholler@ahsoftware.de wrote:
There must be more problems. Using gcc 4.5.1, the read*/write*-patch and your hack, my kernel doesn't boot. Using gcc 4.3.5 and the same source to compile u-boot the kernel comes up. Here is the output for the non-working u-boot:
U-Boot 2010.12-rc3-00015-g3ae9687-dirty (Dec 20 2010 - 18:01:41, gcc 4.5.1)
OMAP3530-GP ES3.1, CPU-OPP2, L3-165MHz, Max CPU Clock 720 mHz OMAP3 Beagle board + LPDDR/NAND I2C: ready DRAM: 256 MiB NAND: 256 MiB MMC: OMAP SD/MMC: 0 In: serial Out: serial Err: serial Beagle Rev C4 timed out in wait_for_pin: I2C_STAT=0 No EEPROM on expansion board Die ID #062a000400000000040365fa16019019 Hit any key to stop autoboot: 0 reading boot.scr
422 bytes read Running bootscript from mmc ... ## Executing script at 82000000 reading uImage
2419940 bytes read Booting from mmc ... ## Booting kernel from Legacy Image at 82000000 ... Image Name: Linux-2.6.37-rc5-beagleboard-000 Image Type: ARM Linux Kernel Image (uncompressed) Data Size: 2419876 Bytes = 2.3 MiB Load Address: 80008000 Entry Point: 80008000 Verifying Checksum ... OK Loading Kernel Image ... OK OK
Nothing else.
Regards,
Alexander
Yes, you are correct, I see the same here with 4.5.2. I noticed that bd did not have correct values of machine type and boot params:
bd address = 0x8FF24FE0 arch_number = 0xFF0084FF boot_params = 0xBB2000FE DRAM bank = 0x00000000 -> start = 0x80000000 -> size = 0x08000000 DRAM bank = 0x00000001 -> start = 0x88000000 -> size = 0x08000000 baudrate = 115200 bps TLB addr = 0x8FFF0000 relocaddr = 0x8FF85000 reloc off = 0x0FF7D000 irq_sp = 0x8FF24F68 sp start = 0x8FF24F60 FB base = 0x00000000
If we then look at board_init in beagle.c the problem is obvious:
800331ac<board_init>: 800331ac: e92d4008 push {r3, lr} 800331b0: ebff5a74 bl 80009b88<gpmc_init> 800331b4: e3a00000 mov r0, #0 800331b8: e5983000 ldr r3, [r8] 800331bc: e5983000 ldr r3, [r8] 800331c0: e8bd8008 pop {r3, pc}
Apparently this is a known issue mentioned in README:
NOTE: DECLARE_GLOBAL_DATA_PTR must be used with file-global scope, or current versions of GCC may "optimize" the code too much.
With this fix I can boot again:
diff --git a/board/ti/beagle/beagle.c b/board/ti/beagle/beagle.c index d9b6f01..c066d6e 100644 --- a/board/ti/beagle/beagle.c +++ b/board/ti/beagle/beagle.c @@ -51,6 +51,8 @@
#define BEAGLE_NO_EEPROM 0xffffffff
+DECLARE_GLOBAL_DATA_PTR;
- static struct { unsigned int device_vendor; unsigned char revision;
@@ -66,8 +68,6 @@ static struct { */ int board_init(void) {
DECLARE_GLOBAL_DATA_PTR;
gpmc_init(); /* in SRAM or SDRAM, finish GPMC */ /* board id for Linux */ gd->bd->bi_arch_number = MACH_TYPE_OMAP3_BEAGLE;
Please let me know if you find any other problems.
Just to not loose the overview:
This is fixed by your patch
http://patchwork.ozlabs.org/patch/76250/
?
But the issue with drivers/mtd/nand/omap_gpmc.c (i.e. the additional ldrb r3, [r3]) is still open? Has anybody tried to replace it with a nop in the binary to be sure this is the root cause?
Thanks
Dirk

Hi Dirk,
Le 21/12/2010 08:11, Dirk Behme a écrit :
But the issue with drivers/mtd/nand/omap_gpmc.c (i.e. the additional ldrb r3, [r3]) is still open? Has anybody tried to replace it with a nop in the binary to be sure this is the root cause?
Can you try and preprocess the C file for both the broken and working cases, then post the preprocessed C extract? Differences at the C level may help understanding differences at the asm level.
Thanks
Dirk
Amicalement,

On 21.12.2010 08:21, Albert ARIBAUD wrote:
Hi Dirk,
Le 21/12/2010 08:11, Dirk Behme a écrit :
But the issue with drivers/mtd/nand/omap_gpmc.c (i.e. the additional ldrb r3, [r3]) is still open? Has anybody tried to replace it with a nop in the binary to be sure this is the root cause?
Can you try and preprocess the C file for both the broken and working cases, then post the preprocessed C extract? Differences at the C level may help understanding differences at the asm level.
gcc version 4.5.1 (Sourcery G++ Lite 2010.09-50)
Work: ====
static void omap_nand_hwcontrol(struct mtd_info *mtd, int32_t cmd, uint32_t ctrl) { register struct nand_chip *this = mtd->priv; ... if (cmd != -1)
(*(volatile unsigned char *)(this->IO_ADDR_W) = (cmd)); }
if (cmd != NAND_CMD_NONE) 84: e3710001 cmn r1, #1 origwriteb(cmd, this->IO_ADDR_W); 88: 15933004 ldrne r3, [r3, #4] 8c: 120110ff andne r1, r1, #255 ; 0xff 90: 15c31000 strbne r1, [r3] 94: e12fff1e bx lr ...
Broken: ======
static void omap_nand_hwcontrol(struct mtd_info *mtd, int32_t cmd, uint32_t ctrl) { register struct nand_chip *this = mtd->priv; ... if (cmd != -1) ({ do { } while (0); (*(volatile unsigned char *)(this->IO_ADDR_W) = (cmd)); }); }
if (cmd != NAND_CMD_NONE) 84: e3710001 cmn r1, #1 writeb(cmd, this->IO_ADDR_W); 88: 15933004 ldrne r3, [r3, #4] 8c: 120110ff andne r1, r1, #255 ; 0xff 90: 15c31000 strbne r1, [r3] 94: 15d33000 ldrbne r3, [r3] 98: e12fff1e bx lr ...
The issue seems to be the additional 'ldrbne r3, [r3]' added by the compiler in the broken version.
Best regards
Dirk

Dear Dirk Behme,
In message 4D105FB3.3090801@googlemail.com you wrote:
Broken:
...
static void omap_nand_hwcontrol(struct mtd_info *mtd, int32_t cmd, uint32_t ctrl) { register struct nand_chip *this = mtd->priv; ... if (cmd != -1) ({ do { } while (0); (*(volatile unsigned char *)(this->IO_ADDR_W) > = (cmd)); });
But this is the old, discarded version of the patch.
I though we had agreed that we have to use the
__asm__ __volatile__ ("" : : : "memory")
version instead?
Best regards,
Wolfgang Denk

On 21.12.2010 09:17, Wolfgang Denk wrote:
Dear Dirk Behme,
In message4D105FB3.3090801@googlemail.com you wrote:
Broken:
...
static void omap_nand_hwcontrol(struct mtd_info *mtd, int32_t cmd, uint32_t ctrl) { register struct nand_chip *this = mtd->priv; ... if (cmd != -1) ({ do { } while (0); (*(volatile unsigned char *)(this->IO_ADDR_W)> = (cmd)); });
But this is the old, discarded version of the patch.
I though we had agreed that we have to use the
__asm__ __volatile__ ("" : : : "memory")
version instead?
Yes.
I mixed the patches :(
Sorry for the noise. I just sent the hopefully correct version some minutes ago.
Thanks
Dirk

(Resend with corrected broken example)
On 21.12.2010 08:21, Albert ARIBAUD wrote:
Hi Dirk,
Le 21/12/2010 08:11, Dirk Behme a écrit :
But the issue with drivers/mtd/nand/omap_gpmc.c (i.e. the additional ldrb r3, [r3]) is still open? Has anybody tried to replace it with a nop in the binary to be sure this is the root cause?
Can you try and preprocess the C file for both the broken and working cases, then post the preprocessed C extract? Differences at the C level may help understanding differences at the asm level.
gcc version 4.5.1 (Sourcery G++ Lite 2010.09-50)
Work: ====
static void omap_nand_hwcontrol(struct mtd_info *mtd, int32_t cmd, uint32_t ctrl) { register struct nand_chip *this = mtd->priv; ... if (cmd != -1)
(*(volatile unsigned char *)(this->IO_ADDR_W) = (cmd)); }
if (cmd != NAND_CMD_NONE) 84: e3710001 cmn r1, #1 origwriteb(cmd, this->IO_ADDR_W); 88: 15933004 ldrne r3, [r3, #4] 8c: 120110ff andne r1, r1, #255 ; 0xff 90: 15c31000 strbne r1, [r3] 94: e12fff1e bx lr ...
Broken: ======
static void omap_nand_hwcontrol(struct mtd_info *mtd, int32_t cmd, uint32_t ctrl) { register struct nand_chip *this = mtd->priv;
...
if (cmd != -1) ({ __asm__ __volatile__ ("" : : : "memory"); (*(volatile unsigned char *)(this->IO_ADDR_W) = (cmd)); }); }
if (cmd != NAND_CMD_NONE) 84: e3710001 cmn r1, #1 88: 012fff1e bxeq lr writeb(cmd, this->IO_ADDR_W); 8c: e5933004 ldr r3, [r3, #4] 90: e20110ff and r1, r1, #255 ; 0xff 94: e5c31000 strb r1, [r3] 98: e5d33000 ldrb r3, [r3] 9c: e12fff1e bx lr
The issue seems to be the additional 'ldrb r3, [r3]' added by the compiler in the broken version.
Best regards
Dirk

On Tue, Dec 21, 2010 at 1:35 AM, Dirk Behme dirk.behme@googlemail.com wrote:
(Resend with corrected broken example)
On 21.12.2010 08:21, Albert ARIBAUD wrote:
Hi Dirk,
Le 21/12/2010 08:11, Dirk Behme a écrit :
But the issue with drivers/mtd/nand/omap_gpmc.c (i.e. the additional ldrb r3, [r3]) is still open? Has anybody tried to replace it with a nop in the binary to be sure this is the root cause?
Can you try and preprocess the C file for both the broken and working cases, then post the preprocessed C extract? Differences at the C level may help understanding differences at the asm level.
gcc version 4.5.1 (Sourcery G++ Lite 2010.09-50)
Work:
static void omap_nand_hwcontrol(struct mtd_info *mtd, int32_t cmd, uint32_t ctrl) { register struct nand_chip *this = mtd->priv; ... if (cmd != -1)
(*(volatile unsigned char *)(this->IO_ADDR_W) = (cmd)); }
if (cmd != NAND_CMD_NONE) 84: e3710001 cmn r1, #1 origwriteb(cmd, this->IO_ADDR_W); 88: 15933004 ldrne r3, [r3, #4] 8c: 120110ff andne r1, r1, #255 ; 0xff 90: 15c31000 strbne r1, [r3] 94: e12fff1e bx lr ...
Broken:
static void omap_nand_hwcontrol(struct mtd_info *mtd, int32_t cmd, uint32_t ctrl) { register struct nand_chip *this = mtd->priv;
...
if (cmd != -1) ({ __asm__ __volatile__ ("" : : : "memory"); (*(volatile unsigned char *)(this->IO_ADDR_W) = (cmd)); }); }
if (cmd != NAND_CMD_NONE) 84: e3710001 cmn r1, #1 88: 012fff1e bxeq lr writeb(cmd, this->IO_ADDR_W); 8c: e5933004 ldr r3, [r3, #4] 90: e20110ff and r1, r1, #255 ; 0xff 94: e5c31000 strb r1, [r3] 98: e5d33000 ldrb r3, [r3] 9c: e12fff1e bx lr
The issue seems to be the additional 'ldrb r3, [r3]' added by the compiler in the broken version.
And I at your suggestion tried modifying the binary changing the extra ldrb to a nop and it works.

Le 21/12/2010 09:46, John Rigby a écrit :
On Tue, Dec 21, 2010 at 1:35 AM, Dirk Behmedirk.behme@googlemail.com wrote:
(Resend with corrected broken example)
On 21.12.2010 08:21, Albert ARIBAUD wrote:
Hi Dirk,
Le 21/12/2010 08:11, Dirk Behme a écrit :
But the issue with drivers/mtd/nand/omap_gpmc.c (i.e. the additional ldrb r3, [r3]) is still open? Has anybody tried to replace it with a nop in the binary to be sure this is the root cause?
Can you try and preprocess the C file for both the broken and working cases, then post the preprocessed C extract? Differences at the C level may help understanding differences at the asm level.
gcc version 4.5.1 (Sourcery G++ Lite 2010.09-50)
Work:
static void omap_nand_hwcontrol(struct mtd_info *mtd, int32_t cmd, uint32_t ctrl) { register struct nand_chip *this = mtd->priv; ... if (cmd != -1)
(*(volatile unsigned char *)(this->IO_ADDR_W) = (cmd)); }
if (cmd != NAND_CMD_NONE)
84: e3710001 cmn r1, #1 origwriteb(cmd, this->IO_ADDR_W); 88: 15933004 ldrne r3, [r3, #4] 8c: 120110ff andne r1, r1, #255 ; 0xff 90: 15c31000 strbne r1, [r3] 94: e12fff1e bx lr ...
Broken:
static void omap_nand_hwcontrol(struct mtd_info *mtd, int32_t cmd, uint32_t ctrl) { register struct nand_chip *this = mtd->priv;
...
if (cmd != -1) ({ __asm__ __volatile__ ("" : : : "memory"); (*(volatile unsigned char *)(this->IO_ADDR_W) = (cmd)); }); }
if (cmd != NAND_CMD_NONE)
84: e3710001 cmn r1, #1 88: 012fff1e bxeq lr writeb(cmd, this->IO_ADDR_W); 8c: e5933004 ldr r3, [r3, #4] 90: e20110ff and r1, r1, #255 ; 0xff 94: e5c31000 strb r1, [r3] 98: e5d33000 ldrb r3, [r3] 9c: e12fff1e bx lr
The issue seems to be the additional 'ldrb r3, [r3]' added by the compiler in the broken version.
And I at your suggestion tried modifying the binary changing the extra ldrb to a nop and it works.
Seems like a compiler issue to me, as the preprocessed C source is the same for the register access and does not call for a re-read (that is what I wanted to see in the preprocessed version), yet the ASM sequence does the re-read.
Amicalement,

Dear Albert ARIBAUD,
In message 4D1083B4.2060704@free.fr you wrote:
And I at your suggestion tried modifying the binary changing the extra ldrb to a nop and it works.
Seems like a compiler issue to me, as the preprocessed C source is the same for the register access and does not call for a re-read (that is what I wanted to see in the preprocessed version), yet the ASM sequence does the re-read.
I also tend to think this is a compiler problem. Searching the gcc bugzilla entries for "ldrb" turns up quite a number of hits. I'm not sure which of these we are running into here, but there are enough of them so you can chose freely :-(
Best regards,
Wolfgang Denk

Am 21.12.2010 11:53, schrieb Wolfgang Denk:
Dear Albert ARIBAUD,
In message4D1083B4.2060704@free.fr you wrote:
And I at your suggestion tried modifying the binary changing the extra ldrb to a nop and it works.
Seems like a compiler issue to me, as the preprocessed C source is the same for the register access and does not call for a re-read (that is what I wanted to see in the preprocessed version), yet the ASM sequence does the re-read.
I also tend to think this is a compiler problem. Searching the gcc bugzilla entries for "ldrb" turns up quite a number of hits. I'm not sure which of these we are running into here, but there are enough of them so you can chose freely :-(
Hmm, is there actual somethinbg which should forbid the compiler to generate such code which rereads something? It might not be nice, but I don't think that it is forbidden for a compiler to do so. So the proper way to handle such, might be to use asm to avoid that the compiler touches that register.
Regards,
Alexander

Le 21/12/2010 13:35, Alexander Holler a écrit :
Hmm, is there actual somethinbg which should forbid the compiler to generate such code which rereads something? It might not be nice, but I don't think that it is forbidden for a compiler to do so. So the proper way to handle such, might be to use asm to avoid that the compiler touches that register.
Yes there is something that should prevent a compiler from inserting reads: these accesses are to hardware, not memory, and may cause side effects even on read (these could be acknowledges, for instance; I've seen instances of that myself on some HW).
Another way to look at it is that the semantics of " *ptr = value " is a pure write and should not result in a write-then-read.
Regards,
Alexander
Amicalement,

Am 21.12.2010 13:51, schrieb Albert ARIBAUD:
Le 21/12/2010 13:35, Alexander Holler a écrit :
Hmm, is there actual somethinbg which should forbid the compiler to generate such code which rereads something? It might not be nice, but I don't think that it is forbidden for a compiler to do so. So the proper way to handle such, might be to use asm to avoid that the compiler touches that register.
Yes there is something that should prevent a compiler from inserting reads: these accesses are to hardware, not memory, and may cause side effects even on read (these could be acknowledges, for instance; I've seen instances of that myself on some HW).
Another way to look at it is that the semantics of " *ptr = value " is a pure write and should not result in a write-then-read.
I think it's something like atomic_read. E.g. when reading an 32bit int (uint32_t i = *bla;), nothing forbids that the compiler generates code which reads those 4 bytes byte by byte (and so becoming a non-atomic operation). It's unusual to do so on 32bit architectures but valid.
Regards,
Alexander

Le 21/12/2010 14:30, Alexander Holler a écrit :
Am 21.12.2010 13:51, schrieb Albert ARIBAUD:
Le 21/12/2010 13:35, Alexander Holler a écrit :
Hmm, is there actual somethinbg which should forbid the compiler to generate such code which rereads something? It might not be nice, but I don't think that it is forbidden for a compiler to do so. So the proper way to handle such, might be to use asm to avoid that the compiler touches that register.
Yes there is something that should prevent a compiler from inserting reads: these accesses are to hardware, not memory, and may cause side effects even on read (these could be acknowledges, for instance; I've seen instances of that myself on some HW).
Another way to look at it is that the semantics of " *ptr = value " is a pure write and should not result in a write-then-read.
I think it's something like atomic_read. E.g. when reading an 32bit int (uint32_t i = *bla;), nothing forbids that the compiler generates code which reads those 4 bytes byte by byte (and so becoming a non-atomic operation). It's unusual to do so on 32bit architectures but valid.
OTOH, this still respects the semantics (the compiler is allowed to do a non-atomic read) while the bug we're seeing does not repect the semantics (the compiler is not asked to do any read but does one).
Regards,
Alexander
Amicalement,

Dear Albert & friends,
what is your opinion? Should we include the memory barrier patch into the upcoming release (and eventually delay it for further testing), or release as is and solve this issue in the next release?
I tend to leave it as is, as I expect that most people will disappear in the next few days for holidays, so no much testing will be done anyway, and we then can solve this with less pressure in the next release - but I'm not really sure if this is a good idea?
Best regards,
Wolfgang Denk

On 21.12.2010 20:52, Wolfgang Denk wrote:
Dear Albert& friends,
what is your opinion? Should we include the memory barrier patch into the upcoming release (and eventually delay it for further testing), or release as is and solve this issue in the next release?
I tend to leave it as is, as I expect that most people will disappear in the next few days for holidays, so no much testing will be done anyway, and we then can solve this with less pressure in the next release - but I'm not really sure if this is a good idea?
I somehow tend to leave it as is, too.
We have issues with some recent compilers. For these we found a fix using the io.h somehow the same way the Linux kernel does. But this introduces new issues for us, we haven't found a proper fix yet (except changing the code to the 'old' io.h style). But we don't know where we might have this issue additionally, yet.
I would like to talk with some tool chain guys about this, too.
Could we add a readme or a note somewhere about the issues with the recent tool chains in this release?
Best regards
Dirk

Am 21.12.2010 21:04, schrieb Dirk Behme:
On 21.12.2010 20:52, Wolfgang Denk wrote:
Dear Albert& friends,
what is your opinion? Should we include the memory barrier patch into the upcoming release (and eventually delay it for further testing), or release as is and solve this issue in the next release?
I tend to leave it as is, as I expect that most people will disappear in the next few days for holidays, so no much testing will be done anyway, and we then can solve this with less pressure in the next release - but I'm not really sure if this is a good idea?
I somehow tend to leave it as is, too.
We have issues with some recent compilers. For these we found a fix using the io.h somehow the same way the Linux kernel does. But this introduces new issues for us, we haven't found a proper fix yet (except changing the code to the 'old' io.h style). But we don't know where we might have this issue additionally, yet.
The only real problem found with that patch was one with a register which doesn't like an (unmotivated) read after write.
On the other side, without that patch, using gcc >= 4.5.x (at least on arm) proved to fail. In contrast to that problem of gcc 4.5.x ignoring that volatile, 4.5.x still fixes many bugs for arm and gcc >= 4.5.x is necessary for hardfloat. So it's likely that more people will start using 4.5.x (4.5.2 was just released).
Regards,
Alexander

Le 22/12/2010 01:11, Alexander Holler a écrit :
Am 21.12.2010 21:04, schrieb Dirk Behme:
On 21.12.2010 20:52, Wolfgang Denk wrote:
Dear Albert& friends,
what is your opinion? Should we include the memory barrier patch into the upcoming release (and eventually delay it for further testing), or release as is and solve this issue in the next release?
I tend to leave it as is, as I expect that most people will disappear in the next few days for holidays, so no much testing will be done anyway, and we then can solve this with less pressure in the next release - but I'm not really sure if this is a good idea?
I somehow tend to leave it as is, too.
We have issues with some recent compilers. For these we found a fix using the io.h somehow the same way the Linux kernel does. But this introduces new issues for us, we haven't found a proper fix yet (except changing the code to the 'old' io.h style). But we don't know where we might have this issue additionally, yet.
The only real problem found with that patch was one with a register which doesn't like an (unmotivated) read after write.
Yes, and this is enough for me to not want it right away: we caught this one, but how many others, so far unseen, will creep up?
On the other side, without that patch, using gcc>= 4.5.x (at least on arm) proved to fail. In contrast to that problem of gcc 4.5.x ignoring that volatile, 4.5.x still fixes many bugs for arm and gcc>= 4.5.x is necessary for hardfloat. So it's likely that more people will start using 4.5.x (4.5.2 was just released).
Do we need hard floating point in u-boot? IIRC, and unless this changed, the kernel does not want floating point, so I wonder why u-boot would.
As for getting 4.5 to work, for the next cycle people can still use pre 4.5 gccs / toolchains, so this is important but not urgent to the point of rushing decisions.
Regards,
Alexander
Amicalement,

On 22.12.2010 08:02, Albert ARIBAUD wrote:
Le 22/12/2010 01:11, Alexander Holler a écrit :
Am 21.12.2010 21:04, schrieb Dirk Behme:
On 21.12.2010 20:52, Wolfgang Denk wrote:
Dear Albert& friends,
what is your opinion? Should we include the memory barrier patch into the upcoming release (and eventually delay it for further testing), or release as is and solve this issue in the next release?
I tend to leave it as is, as I expect that most people will disappear in the next few days for holidays, so no much testing will be done anyway, and we then can solve this with less pressure in the next release - but I'm not really sure if this is a good idea?
I somehow tend to leave it as is, too.
We have issues with some recent compilers. For these we found a fix using the io.h somehow the same way the Linux kernel does. But this introduces new issues for us, we haven't found a proper fix yet (except changing the code to the 'old' io.h style). But we don't know where we might have this issue additionally, yet.
The only real problem found with that patch was one with a register which doesn't like an (unmotivated) read after write.
Yes, and this is enough for me to not want it right away: we caught this one, but how many others, so far unseen, will creep up?
On the other side, without that patch, using gcc>= 4.5.x (at least on arm) proved to fail. In contrast to that problem of gcc 4.5.x ignoring that volatile, 4.5.x still fixes many bugs for arm and gcc>= 4.5.x is necessary for hardfloat. So it's likely that more people will start using 4.5.x (4.5.2 was just released).
Do we need hard floating point in u-boot? IIRC, and unless this changed, the kernel does not want floating point, so I wonder why u-boot would.
As for getting 4.5 to work, for the next cycle people can still use pre 4.5 gccs / toolchains, so this is important but not urgent to the point of rushing decisions.
Agree.
Btw, I tried to send a summary of our issues to the Codesourcery mailing list:
http://www.codesourcery.com/archives/arm-gnu/msg03989.html
Let's see if we get an answer.
Best regards
Dirk

Dear Dirk Behme,
In message 4D11A65E.8040200@googlemail.com you wrote:
Btw, I tried to send a summary of our issues to the Codesourcery =
mailing list:
http://www.codesourcery.com/archives/arm-gnu/msg03989.html
Let's see if we get an answer.
You got one: http://www.codesourcery.com/archives/arm-gnu/msg03990.html
And it sounds like a reasonable explanation to me.
Best regards,
Wolfgang Denk

On 22.12.2010 08:52, Wolfgang Denk wrote:
Dear Dirk Behme,
In message4D11A65E.8040200@googlemail.com you wrote:
Btw, I tried to send a summary of our issues to the Codesourcery =
mailing list:
http://www.codesourcery.com/archives/arm-gnu/msg03989.html
Let's see if we get an answer.
You got one: http://www.codesourcery.com/archives/arm-gnu/msg03990.html
And it sounds like a reasonable explanation to me.
An additional answer, mainly covering the question why the volatile is optimized away:
http://www.codesourcery.com/archives/arm-gnu/msg03999.html
Best regards
Dirk

Am 22.12.2010 08:02, schrieb Albert ARIBAUD:
Le 22/12/2010 01:11, Alexander Holler a écrit :
Am 21.12.2010 21:04, schrieb Dirk Behme:
On 21.12.2010 20:52, Wolfgang Denk wrote:
Dear Albert& friends,
what is your opinion? Should we include the memory barrier patch into the upcoming release (and eventually delay it for further testing), or release as is and solve this issue in the next release?
I tend to leave it as is, as I expect that most people will disappear in the next few days for holidays, so no much testing will be done anyway, and we then can solve this with less pressure in the next release - but I'm not really sure if this is a good idea?
I somehow tend to leave it as is, too.
We have issues with some recent compilers. For these we found a fix using the io.h somehow the same way the Linux kernel does. But this introduces new issues for us, we haven't found a proper fix yet (except changing the code to the 'old' io.h style). But we don't know where we might have this issue additionally, yet.
The only real problem found with that patch was one with a register which doesn't like an (unmotivated) read after write.
Yes, and this is enough for me to not want it right away: we caught this one, but how many others, so far unseen, will creep up?
On the other side, without that patch, using gcc>= 4.5.x (at least on arm) proved to fail. In contrast to that problem of gcc 4.5.x ignoring that volatile, 4.5.x still fixes many bugs for arm and gcc>= 4.5.x is necessary for hardfloat. So it's likely that more people will start using 4.5.x (4.5.2 was just released).
Do we need hard floating point in u-boot? IIRC, and unless this changed, the kernel does not want floating point, so I wonder why u-boot would.
Most people won't use U-Boot as base for the decision which compiler version to use.
As for getting 4.5 to work, for the next cycle people can still use pre 4.5 gccs / toolchains, so this is important but not urgent to the point of rushing decisions.
I've just written down my opinion.
Regards,
Alexander

Hello,
Am 21.12.2010 01:25, schrieb John Rigby:
On Mon, Dec 20, 2010 at 10:12 AM, Alexander Hollerholler@ahsoftware.de wrote:
There must be more problems. Using gcc 4.5.1, the read*/write*-patch and
...
Yes, you are correct, I see the same here with 4.5.2. I noticed that bd did not have correct values of machine type and boot params:
bd address = 0x8FF24FE0
...
Great!
Thanks a lot for searching and finding that.
I've just tested an u-boot for the BeagleBoard with your "Move DECLARE..."-patch compiled with gcc 4.5.1 and it now boots. ;)
I remember having seen those two DECLARE_GLOBAL_DATA_PTR in beagle.c too, but I was to lazy at that time to check if that has (bad) consequences. ;)
Regards,
Alexander

Dear Alexander Holler,
In message 1292711230-3234-1-git-send-email-holler@ahsoftware.de you wrote:
gcc 4.5.1 seems to ignore (at least some) volatile definitions, avoid that as done in the kernel.
...
+#define writeb(v,c) ({ __iowmb(); __arch_putb(v,c); }) +#define writew(v,c) ({ __iowmb(); __arch_putw(v,c); }) +#define writel(v,c) ({ __iowmb(); __arch_putl(v,c); })
http://www.codesourcery.com/archives/arm-gnu/msg03990.html explains why this construct is causing errors in cases where an additional read from the address is unsupported.
Can you please try the following patch instead?
-------------------------------------------------------------------------
From 4672bbddaf8ce7e17a99ba737782cc527d46e5eb Mon Sep 17 00:00:00 2001
From: Alexander Holler holler@ahsoftware.de Date: Sat, 18 Dec 2010 23:27:10 +0100 Subject: [PATCH] ARM: Avoid compiler optimization for readb, writeb and friends.
gcc 4.5.1 seems to ignore (at least some) volatile definitions, avoid that as done in the kernel.
Reading C99 6.7.3 8 and the comment 114) there, I think it is a bug of that gcc version to ignore the volatile type qualifier used e.g. in __arch_getl(). Anyway, using a definition as in the kernel headers avoids such optimizations when gcc 4.5.1 is used.
Maybe the headers as used in the current linux-kernel should be used, but to avoid large changes, I've just added a small change to the current headers.
Signed-off-by: Alexander Holler holler@ahsoftware.de Signed-off-by: Wolfgang Denk wd@denx.de --- arch/arm/include/asm/io.h | 32 ++++++++++++++++++++------------ 1 files changed, 20 insertions(+), 12 deletions(-)
diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h index ff1518e..647503a 100644 --- a/arch/arm/include/asm/io.h +++ b/arch/arm/include/asm/io.h @@ -117,21 +117,29 @@ extern inline void __raw_readsl(unsigned int addr, void *data, int longlen) *buf++ = __arch_getl(addr); }
-#define __raw_writeb(v,a) __arch_putb(v,a) -#define __raw_writew(v,a) __arch_putw(v,a) -#define __raw_writel(v,a) __arch_putl(v,a) +#define __raw_writeb(v,a) __arch_putb(v,a) +#define __raw_writew(v,a) __arch_putw(v,a) +#define __raw_writel(v,a) __arch_putl(v,a)
-#define __raw_readb(a) __arch_getb(a) -#define __raw_readw(a) __arch_getw(a) -#define __raw_readl(a) __arch_getl(a) +#define __raw_readb(a) __arch_getb(a) +#define __raw_readw(a) __arch_getw(a) +#define __raw_readl(a) __arch_getl(a)
-#define writeb(v,a) __arch_putb(v,a) -#define writew(v,a) __arch_putw(v,a) -#define writel(v,a) __arch_putl(v,a) +/* + * TODO: The kernel offers some more advanced versions of barriers, it might + * have some advantages to use them instead of the simple one here. + */ +#define dmb() __asm__ __volatile__ ("" : : : "memory") +#define __iormb() dmb() +#define __iowmb() dmb() + +#define writeb(v,c) do { __iowmb(); __arch_putb(v,c); } while (0) +#define writew(v,c) do { __iowmb(); __arch_putw(v,c); } while (0) +#define writel(v,c) do { __iowmb(); __arch_putl(v,c); } while (0)
-#define readb(a) __arch_getb(a) -#define readw(a) __arch_getw(a) -#define readl(a) __arch_getl(a) +#define readb(c) ({ u8 __v = __arch_getb(c); __iormb(); __v; }) +#define readw(c) ({ u16 __v = __arch_getw(c); __iormb(); __v; }) +#define readl(c) ({ u32 __v = __arch_getl(c); __iormb(); __v; })
/* * The compiler seems to be incapable of optimising constants

Hello,
Am 22.12.2010 09:02, schrieb Wolfgang Denk:
In message1292711230-3234-1-git-send-email-holler@ahsoftware.de you wrote:
gcc 4.5.1 seems to ignore (at least some) volatile definitions, avoid that as done in the kernel.
...
+#define writeb(v,c) ({ __iowmb(); __arch_putb(v,c); }) +#define writew(v,c) ({ __iowmb(); __arch_putw(v,c); }) +#define writel(v,c) ({ __iowmb(); __arch_putl(v,c); })
http://www.codesourcery.com/archives/arm-gnu/msg03990.html explains why this construct is causing errors in cases where an additional read from the address is unsupported.
Can you please try the following patch instead?
Indeed. Using do {} while (0) instead of that "GCC statement-expression" fixes the problem with the read after write.
I didn't know about that "GCC statement-expression" and my usage of ({...}) was based on the writeb in the kernel-headers. But they (seem to) expand to something returning (void) which might avoid the problem.
Good that I've added the sentence that using the kernel-headers instead of that patch might be a better solution. But this might bring in much more changes, including real memory barriers. ;)
Anyway, now the master (including the GLOBAL...-patch) + patch v3 for read*/write* is good here. Just tested with both gcc 4.3.5 and gcc 4.5.1 using binutils 2.20.1.
Regards,
Alexander

On 22.12.2010 09:02, Wolfgang Denk wrote:
Dear Alexander Holler,
In message1292711230-3234-1-git-send-email-holler@ahsoftware.de you wrote:
gcc 4.5.1 seems to ignore (at least some) volatile definitions, avoid that as done in the kernel.
...
+#define writeb(v,c) ({ __iowmb(); __arch_putb(v,c); }) +#define writew(v,c) ({ __iowmb(); __arch_putw(v,c); }) +#define writel(v,c) ({ __iowmb(); __arch_putl(v,c); })
http://www.codesourcery.com/archives/arm-gnu/msg03990.html explains why this construct is causing errors in cases where an additional read from the address is unsupported.
Just for the record:
The trick is to ensure that the __arch_putx() containing the volatile is not the last statement in the GCC statement-expression. So, using something like
#define writeb(v,c) ({ __iowmb(); __arch_putb(v,c); v;})
(note the additional 'v;') should result in correct code, too.
The patches sent by Wolfgang and Alexander using
#define writeb(v,c) do { __iowmb(); __arch_putb(v,c); } while (0)
do the same with a slightly different syntax, so these patches are fine, too.
Thanks
Dirk
Can you please try the following patch instead?
From 4672bbddaf8ce7e17a99ba737782cc527d46e5eb Mon Sep 17 00:00:00 2001
From: Alexander Hollerholler@ahsoftware.de Date: Sat, 18 Dec 2010 23:27:10 +0100 Subject: [PATCH] ARM: Avoid compiler optimization for readb, writeb and friends.
gcc 4.5.1 seems to ignore (at least some) volatile definitions, avoid that as done in the kernel.
Reading C99 6.7.3 8 and the comment 114) there, I think it is a bug of that gcc version to ignore the volatile type qualifier used e.g. in __arch_getl(). Anyway, using a definition as in the kernel headers avoids such optimizations when gcc 4.5.1 is used.
Maybe the headers as used in the current linux-kernel should be used, but to avoid large changes, I've just added a small change to the current headers.
Signed-off-by: Alexander Hollerholler@ahsoftware.de Signed-off-by: Wolfgang Denkwd@denx.de
arch/arm/include/asm/io.h | 32 ++++++++++++++++++++------------ 1 files changed, 20 insertions(+), 12 deletions(-)
diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h index ff1518e..647503a 100644 --- a/arch/arm/include/asm/io.h +++ b/arch/arm/include/asm/io.h @@ -117,21 +117,29 @@ extern inline void __raw_readsl(unsigned int addr, void *data, int longlen) *buf++ = __arch_getl(addr); }
-#define __raw_writeb(v,a) __arch_putb(v,a) -#define __raw_writew(v,a) __arch_putw(v,a) -#define __raw_writel(v,a) __arch_putl(v,a) +#define __raw_writeb(v,a) __arch_putb(v,a) +#define __raw_writew(v,a) __arch_putw(v,a) +#define __raw_writel(v,a) __arch_putl(v,a)
-#define __raw_readb(a) __arch_getb(a) -#define __raw_readw(a) __arch_getw(a) -#define __raw_readl(a) __arch_getl(a) +#define __raw_readb(a) __arch_getb(a) +#define __raw_readw(a) __arch_getw(a) +#define __raw_readl(a) __arch_getl(a)
-#define writeb(v,a) __arch_putb(v,a) -#define writew(v,a) __arch_putw(v,a) -#define writel(v,a) __arch_putl(v,a) +/*
- TODO: The kernel offers some more advanced versions of barriers, it might
- have some advantages to use them instead of the simple one here.
- */
+#define dmb() __asm__ __volatile__ ("" : : : "memory") +#define __iormb() dmb() +#define __iowmb() dmb()
+#define writeb(v,c) do { __iowmb(); __arch_putb(v,c); } while (0) +#define writew(v,c) do { __iowmb(); __arch_putw(v,c); } while (0) +#define writel(v,c) do { __iowmb(); __arch_putl(v,c); } while (0)
-#define readb(a) __arch_getb(a) -#define readw(a) __arch_getw(a) -#define readl(a) __arch_getl(a) +#define readb(c) ({ u8 __v = __arch_getb(c); __iormb(); __v; }) +#define readw(c) ({ u16 __v = __arch_getw(c); __iormb(); __v; }) +#define readl(c) ({ u32 __v = __arch_getl(c); __iormb(); __v; })
/*
- The compiler seems to be incapable of optimising constants

Dirk Behme:
Just for the record:
The trick is to ensure that the __arch_putx() containing the volatile is not the last statement in the GCC statement-expression. So, using something like
#define writeb(v,c) ({ __iowmb(); __arch_putb(v,c); v;})
(note the additional 'v;') should result in correct code, too.
Yes, that's good. Also "0" may work, and may be more readable, (or not, according to who reads it).
The patches sent by Wolfgang and Alexander using
#define writeb(v,c) do { __iowmb(); __arch_putb(v,c); } while (0)
do the same with a slightly different syntax, so these patches are fine, too.
It's not just different syntax, it's different semantics.
The ({...}) trick turns statements into expressions, while the "do {...} while(0)" does not. I'd better not forbid statements like
while (reg = readb(addr), reg != VALUE) { .... }
or
if (readb(addr) == VALUE) { ... }
or swtich (readb(addr)) { ... }
While I agree they may in general not be clean, I can forsee meaningful uses. Moreover, I'd better allow expression-looking macros to really behave like expressions -- otherwise error messages are quite hard to understand for the unaquainted.
IMHO, the only reason to use "do {} while(0)" over statemente expressions is being portable but in u-boot we are gcc-specific anyways.
/alessandro

On 30.12.2010 00:10, Alessandro Rubini wrote:
Dirk Behme:
Just for the record:
The trick is to ensure that the __arch_putx() containing the volatile is not the last statement in the GCC statement-expression. So, using something like
#define writeb(v,c) ({ __iowmb(); __arch_putb(v,c); v;})
(note the additional 'v;') should result in correct code, too.
Yes, that's good. Also "0" may work, and may be more readable, (or not, according to who reads it).
Yes, indeed,
#define writeb(v,c) ({ __iowmb(); __arch_putb(v,c); 0;})
seems to work, too :)
Thanks
Dirk
The patches sent by Wolfgang and Alexander using
#define writeb(v,c) do { __iowmb(); __arch_putb(v,c); } while (0)
do the same with a slightly different syntax, so these patches are fine, too.
It's not just different syntax, it's different semantics.
The ({...}) trick turns statements into expressions, while the "do {...} while(0)" does not. I'd better not forbid statements like
while (reg = readb(addr), reg != VALUE) { .... }
or
if (readb(addr) == VALUE) { ... }
or swtich (readb(addr)) { ... }
While I agree they may in general not be clean, I can forsee meaningful uses. Moreover, I'd better allow expression-looking macros to really behave like expressions -- otherwise error messages are quite hard to understand for the unaquainted.
IMHO, the only reason to use "do {} while(0)" over statemente expressions is being portable but in u-boot we are gcc-specific anyways.
/alessandro

Dear Alessandro Rubini,
In message 20101229231004.GA17999@mail.gnudd.com you wrote:
#define writeb(v,c) ({ __iowmb(); __arch_putb(v,c); v;})
(note the additional 'v;') should result in correct code, too.
Yes, that's good. Also "0" may work, and may be more readable, (or not, according to who reads it).
'0' looks wrong to me, as it changes behaviour. Keep in mind that the previous implemention usually looks like this:
#define writeb(val,addr) (*(volatile unsigned char *)(addr)) = (val)
The value of this should be "val", not 0.
#define writeb(v,c) do { __iowmb(); __arch_putb(v,c); } while (0)
do the same with a slightly different syntax, so these patches are fine, too.
It's not just different syntax, it's different semantics.
This is true. Thanks for pointing out.
The ({...}) trick turns statements into expressions, while the "do {...} while(0)" does not. I'd better not forbid statements like
while (reg = readb(addr), reg != VALUE) { .... }
or
if (readb(addr) == VALUE) { ... }
or swtich (readb(addr)) { ... }
Here you are mixing things up. There is no issue with the read*() code, only the write*() code is affected.
And while no person ion a sane mind of state should use write*() in such a context, I agree that for the sake of backward compatibility we should change the code. Patch follows.
While I agree they may in general not be clean, I can forsee meaningful uses. Moreover, I'd better allow expression-looking macros to really behave like expressions -- otherwise error messages are quite hard to understand for the unaquainted.
Agreed. But the write*() "functions" are considered to return void, so there should not be any such issues.
IMHO, the only reason to use "do {} while(0)" over statemente expressions is being portable but in u-boot we are gcc-specific anyways.
This is wrong interpretation, too; nothing in this construct is GCC specific.
Best regards,
Wolfgang Denk
participants (6)
-
Albert ARIBAUD
-
Alessandro Rubini
-
Alexander Holler
-
Dirk Behme
-
John Rigby
-
Wolfgang Denk