[U-Boot-Users] v2p4 porting troubles

Gidday there,
We have a custom made board containing not much more than a Virtex-II Pro (v2p4) and 16 Megs of Mobile SDRAM. We are trying to port Linux, and in preparing we've successfully built kernels and code for a development board from Amirix.
But...
We have had a series of problems with getting u-boot to work. Firstly, this is a small Virtex-II, so we are unable to put everything into BRAM. So we've made a stripped down version of Amirix's ppcboot-lite, that is supposed to load u-boot, which then will load Linux.
Ppcboot-lite runs fine out of BRAM. However, we've had several problems with executing from SDRAM. With various versions of the IPIF interface we have run into a range of Machine Check Exception problems, with jumps to 0x700 and 0x200.
Now, it seems that we have found a working configuration (ipif v1.0.d) that lets us run code compiled by EDK (on WinXP under VMWare), but there is an odd problem with the same code compiled by our Linux cross-compiler.
We have some LEDs connected to memory mapped I/O, and our test code writes to them. The EDK compiled binary produces the following assembly for the output part:
stw r0,0(r9)
This works fine, and the LEDs blink all pretty and flashy like.
The Linux tools compile...
stw r9,0(r0)
However, the LEDs respond with blankity nothingness. Using XMD, we swap around these general purpose (according to how we read the docs) registers (r8, r10, r1, etc.), and sometimes it works and sometimes it doesn't.
In the mailing-list archive, I've seen mention of a common SDRAM problem with u-boot, but it wasn't clear if this was just for Virtex-IIs or what.
Has anyone seen a problem like this? Is it our toolchain? Could it be signal integrity on our board? (we have slowed things on the bus down to 16MHz, just in case)
We are using crosstools-0.26, with gcc-3.3-20040112 and glibc-2.3.2.
Questions for clarification are welcome. Pointers, smack-up-side-the-heads, hints, cajoles, rants are also welcome.
Joshua Lamorie Department of Random Walks Xiphos Technologies Inc.

Gidday again,
In further investigating this problem we have seen a pattern. The failure consistently occurs when the instruction is compiled as...
stw rX,0(r0)
where X is 1-31.
We see that instead of writing the data to the address pointed to by r0, it writes the data to address 0x0 of physical memory (we have virtual memory disabled). When I look at the PowerPC Processor Reference Guide (pp. 80-82), it mentions that for the addressing modes Register-Indirect with Immediate Index, Register-Indirect with Index, and Register Indirect, if the rA field is 0 (which is also how r0 is encoded) then a value of 0 is used instead of the contents of r0.
So if this is the problem, why is our compiler making code that does this? I'm new to PPC assembler, so perhaps I'm interpreting this incorrectly.
Can you recommend a toolchain that would be better than the one we're currently using? (listed below)
Thanks in advance.
Joshua Lamorie Director of Brownian Motion Control Xiphos Technologies Inc.
----- Original Message ----- [snip]
The Linux tools compile...
stw r9,0(r0)
However, the LEDs respond with blankity nothingness. Using XMD, we swap around these general purpose (according to how we read the docs) registers (r8,r10, r1 etc.) , and sometimes it works and sometimes it doesn't.
[snip]
We are using crosstools-0.26, with gcc-3.3-20040112 and glibc-2.3.2
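
As a rough illustration of the addressing rule described above, the following C sketch models how the effective address of a D-form store such as stw is formed. It is only a software model under stated assumptions: the function name dform_ea and the gpr array are hypothetical and do not come from any Xilinx or U-Boot source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of the D-form effective-address rule.
 * gpr: the 32 general purpose register values (illustrative array).
 * ra:  the rA field of the instruction (0..31).
 * d:   the signed 16-bit displacement field. */
static uint32_t dform_ea(const uint32_t gpr[32], unsigned ra, int16_t d)
{
    assert(ra < 32);
    /* An rA field of 0 selects the constant 0, not the contents of r0. */
    uint32_t base = (ra == 0) ? 0u : gpr[ra];
    /* The displacement is sign-extended to 32 bits before the add. */
    return base + (uint32_t)(int32_t)d;
}
```

With gpr[0] and gpr[9] both holding 0x40000000, dform_ea(gpr, 0, 0) yields 0 while dform_ea(gpr, 9, 0) yields 0x40000000, which matches the behaviour seen with "stw rX,0(r0)" versus "stw r0,0(r9)".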

In message 004101c40b6d$69274aa0$cb01a8c0@xiphos.ca you wrote:
> So if this is the problem, why is our compiler making code that does
> this? I'm new to PPC assembler, so perhaps I'm interpreting this
Maybe your compiler / assembler is broken?
> Can you recommend a toolchain that would be better than the one we're
> currently using? (listed below)
ELDK 3.0?
Best regards,
Wolfgang Denk

(i'm working with the original poster, Joshua Lamorie)
On Tue, 16 Mar 2004, Wolfgang Denk wrote:
> In message 004101c40b6d$69274aa0$cb01a8c0@xiphos.ca you wrote:
> > So if this is the problem, why is our compiler making code that does
> > this? I'm new to PPC assembler, so perhaps I'm interpreting this
> Maybe your compiler / assembler is broken?
> > Can you recommend a toolchain that would be better than the one we're
> > currently using? (listed below)
> ELDK 3.0?
thank you for the suggestion!
i had missed the ELDK when i looked for a toolchain originally. I downloaded the latest version, installed from binary:
ELDK version 3.0 ppc_4xx: Build 2004-02-16
and compiled my little test program, which basically just loops and outputs values to a memory-mapped register that is plugged into LEDs.
to my amazement, the ELDK compiler made the same "mistake" (in quotes because it's entirely possible that it's my limited ppc405 knowledge that leads me to think it's a mistake):
00018304 <XIo_Out32>:
   18304: 94 21 ff e0   stwu  r1,-32(r1)
   18308: 93 e1 00 1c   stw   r31,28(r1)
   1830c: 7c 3f 0b 78   mr    r31,r1
   18310: 90 7f 00 08   stw   r3,8(r31)
   18314: 90 9f 00 0c   stw   r4,12(r31)
   18318: 3d 20 00 03   lis   r9,3
   1831c: 80 1f 00 0c   lwz   r0,12(r31)
   18320: 90 09 84 b0   stw   r0,-31568(r9)
   18324: 81 3f 00 0c   lwz   r9,12(r31)
   18328: 80 1f 00 08   lwz   r0,8(r31)
   1832c: 91 20 00 00   stw   r9,0(r0)
   18330: 7c 00 06 ac   eieio
   18334: 81 61 00 00   lwz   r11,0(r1)
   18338: 83 eb ff fc   lwz   r31,-4(r11)
   1833c: 7d 61 5b 78   mr    r1,r11
   18340: 4e 80 00 20   blr
(notice that this is the XIo_Out32() function generated by Xilinx XPS)
the problem is at 1832c. it correctly puts the absolute address in r0, and the data to output in r9. it then tries to store r9's contents at [r0+0].
Looking at the PPC405's UM, i see this in the stw instruction page:
=====================
An effective address (EA) is calculated by adding a displacement to a base address, which are formed as follows:
* The displacement is formed by sign-extending the 16-bit d instruction field to 32 bits.
* If the rA field is 0, the base address is 0.
* If the rA field is not 0, the contents of register rA are used as the base address.
The contents of register rS are stored into the word referenced by EA.
=====================
so using r0 as a pointer won't work: the cpu won't use the contents of r0, but will use 0. And we verified that: the value indeed gets output to 0 in memory.
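To see how the rA field ends up as 0 for the store at 1832c, the instruction words in the listing can be split into their D-form fields. The sketch below is an illustration only; the struct and function names (dform, decode_dform) are hypothetical and the field layout is the standard 32-bit D-form encoding (6-bit primary opcode, 5-bit rS, 5-bit rA, 16-bit d).

```c
#include <stdint.h>

/* Fields of a PowerPC D-form instruction such as stw (illustrative names). */
struct dform {
    unsigned opcode; /* primary opcode, top 6 bits; 36 is stw */
    unsigned rs;     /* source register field rS */
    unsigned ra;     /* base register field rA */
    int32_t  d;      /* displacement, sign-extended from 16 bits */
};

/* Split a 32-bit instruction word into its D-form fields. */
static struct dform decode_dform(uint32_t insn)
{
    struct dform f;
    f.opcode = (insn >> 26) & 0x3Fu;
    f.rs     = (insn >> 21) & 0x1Fu;
    f.ra     = (insn >> 16) & 0x1Fu;
    f.d      = (int32_t)(insn & 0xFFFFu);
    if (f.d & 0x8000)       /* sign-extend the 16-bit field by hand */
        f.d -= 0x10000;
    return f;
}
```

Decoding 0x91200000 (the word at 1832c) gives opcode 36, rS 9, rA 0, d 0, i.e. stw r9,0(r0), so the base is the constant 0. Decoding 0x900984b0 (the word at 18320) gives rS 0, rA 9, d -31568, where r0 is used only as the data register and is therefore harmless.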
Both ELDK's compiler and my other toolchain, a crosstool 0.26-generated toolchain (powerpc-405-linux-gnu-gcc version 3.3.3 20040112 (prerelease)) generate this "flawed" code.
Which would explain why our u-boot port isn't working, as presumably, "mistaken" code is generated elsewhere as well.
This looks like a gcc problem, but perhaps someone on this list could offer some guidance:
- do all powerpc's, not just the 405, have this behavior (of using 0 instead of r0 when the rA field is 0)?
- if so, why would these compilers generate code that seems flawed?
- if not, are we supposed to invoke the compiler in a particular way?
The compiler that is part of Xilinx EDK, powerpc-eabi-gcc gcc version 2.95.3-4 Xilinx EDK 6.2 Build EDK_Gm.1, does not produce this flaw (it uses r9 as the address, r0 as the data), and so outputs to the correct address.
i realize that at this level, it's not a u-boot problem and that this mailing list is probably not the best avenue to ask for advice, but seeing as some people here have ppc / xilinx experience, perhaps someone would care to make a comment.
thank you!

In message Pine.LNX.4.58.0403161421230.24165@strange.wwd.ca you wrote:
> to my amazement, the ELDK compiler made the same "mistake" (in quotes because it's entirely possible that it's my limited ppc405 knowledge that leads me to think it's a mistake):
...
> (notice that this is the XIo_Out32() function generated by Xilinx XPS)
Which sort of code does this tool generate? C code? Then it would indeed be a compiler problem, and I definitely would like to see this C code.
> Looking at the PPC405's UM, i see this in the stw instruction page:
This behaviour is standard and well-known on PPC.
> This looks like a gcc problem, but perhaps someone on this list could
I don't think that GCC has such a serious problem.
> offer some guidance:
> - do all powerpc's, not just the 405, have this behavior (of using 0
> instead of r0 when the rA field is 0)?
Yes, this is standard on all PPC.
> - if so, why would these compilers generate code that seems flawed?
I don't think that a compiler would generate such code.
> - if not, are we supposed to invoke the compiler in a particular way?
Please show us the source code that produces such output; without it, all discussion is in vain.
Best regards,
Wolfgang Denk

We're doing quite the tag team here.
> > (notice that this is the XIo_Out32() function generated by Xilinx XPS)
> Which sort of code does this tool generate? C code? Then it would indeed be a compiler problem, and I definitely would like to see this C code.
The code from Xilinx is as follows...
void XIo_Out32(XIo_Address OutAddress, Xuint32 Value)
{
    __asm__ volatile ("stw %0,0(%1); eieio" : : "r" (Value), "r" (OutAddress));
}
... which is called from a simple program (found at the end of this email).
When this is compiled with the Xilinx toolchain (gcc 2.95 I think), it generates the 'real' assembler as follows.
stw r0,0(r9)
However, when we put this into a linux toolchain (crosstools 0.26, or ELDK 3.0) it generates it with r0 and r9 in the opposite spots.
stw r9,0(r0)
We can change the C code to
__asm__ volatile ("stw %1,0(%0); eieio" : : "r" (OutAddress), "r" (Value));
Which seems to generate it correctly with the linux tools.
We have added this same function inside u-boot to toggle some LEDs as we are debugging, but without changing anything it generates assembler that uses r4 and r3 and not r0 and r9, so there is no chance of a problem.
But why? What is it about this C code which causes this sort of thing? What constraints should be used? Where is decent, unambiguous documentation of gcc inline assembler (instead of poorly explained examples)?
Thank you Wolfgang for your help so far. Your comments have been very useful.
Joshua
The original test code is as follows.
#define XPAR_IOBRIDGEV2P_PLB_0_BASEADDR 0x40000000
typedef unsigned long Xuint32; /**< unsigned 32-bit */
typedef Xuint32 XIo_Address;
void XIo_Out32(XIo_Address OutAddress, Xuint32 Value);
int simpletest();
int _start()
{
    simpletest(0,0);
}
int simpletest()
{
    int i,j,k;
    XIo_Out32(XPAR_IOBRIDGEV2P_PLB_0_BASEADDR, 0x55);
    while (1) {
        for (i=1; i<30000; i++) {
            for (j=i; j<5000; j*=2) {
                for (k=0; k<1000; k+=5) {
                }
            }
            XIo_Out32(XPAR_IOBRIDGEV2P_PLB_0_BASEADDR, i&0xFF);
        }
    }
    return(0);
}
void XIo_Out32(XIo_Address OutAddress, Xuint32 Value)
{
    //__asm__ volatile ("stw %0,0(%1); eieio" : : "r" (Value), "r" (OutAddress));
    __asm__ volatile ("stw %1,0(%0); eieio" : : "r" (OutAddress), "r" (Value));
}

In message 009b01c40baa$a324f2b0$cb01a8c0@xiphos.ca you wrote:
> We're doing quite the tag team here.
;-)
> The code from Xilinx is as follows...
> void XIo_Out32(XIo_Address OutAddress, Xuint32 Value) { __asm__ volatile ("stw %0,0(%1); eieio" : : "r" (Value), "r" (OutAddress)); }
This code is broken. The "r" operand specification says "a register operand is allowed provided that it is in a general register." Of course R0 is a GPR as well. I think you need to use "m" (any kind of address that the machine supports in general) here:
__asm__ volatile ("stw %0,0(%1); eieio" : : "r" (Value), "m" (OutAddress));
> But why? What is it about this C code which causes this sort of thing?
Actually this is not C code, but inline assembler. And yes, it is the code, which uses an inappropriate operand specification.
> What constraints should be used? Where is decent, unambiguous documentation
Use "m".
> of gcc inline assembler (instead of poorly explained examples)?
The GCC info pages are pretty useful. See File: gcc.info, Node: Simple Constraints,
> Thank you Wolfgang for your help so far. Your comments have been very useful.
I'm glad if I could help.
I'm glad if I could help.
Best regards,
Wolfgang Denk
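
Since an "m" operand describes a memory location rather than a register holding an address, one way to apply the suggestion is to pass the dereferenced location and let the compiler print the whole address expression. The following is a minimal sketch under that assumption, not the code posted in this thread and not the xio.h implementation; the helper name xio_out32_mem is hypothetical, the typedefs are pointer-safe stand-ins for the thread's types, and the plain C fallback exists only so the sketch builds on non-PowerPC hosts.

```c
#include <stdint.h>

typedef uint32_t  Xuint32;      /* assumption: fixed-width stand-in for the thread's Xuint32 */
typedef uintptr_t XIo_Address;  /* assumption: pointer-sized so the sketch also builds on 64-bit hosts */

/* Hypothetical helper: 32-bit I/O write using a memory operand. */
static void xio_out32_mem(XIo_Address OutAddress, Xuint32 Value)
{
    volatile Xuint32 *reg = (volatile Xuint32 *)OutAddress;
#if defined(__GNUC__) && (defined(__powerpc__) || defined(__PPC__))
    /* "=m" hands the compiler the memory location itself, so it chooses a
     * valid base register. %U0 and %X0 let it emit the update or indexed
     * form of stw if the chosen address needs one. */
    __asm__ volatile ("stw%U0%X0 %1,%0; eieio" : "=m" (*reg) : "r" (Value));
#else
    /* Non-PowerPC fallback: a plain volatile store, without eieio ordering. */
    *reg = Value;
#endif
}
```

Note that the template no longer contains "0(%1)": with a memory operand the compiler supplies the displacement and base register itself, which is what keeps r0 out of the base position.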

> In message 009b01c40baa$a324f2b0$cb01a8c0@xiphos.ca you wrote:
> > We're doing quite the tag team here.
> ;-)
> > The code from Xilinx is as follows...
> > void XIo_Out32(XIo_Address OutAddress, Xuint32 Value) { __asm__ volatile ("stw %0,0(%1); eieio" : : "r" (Value), "r" (OutAddress)); }
> This code is broken. The "r" operand specification says "a register operand is allowed provided that it is in a general register." Of course R0 is a GPR as well. I think you need to use "m" (any kind of address that the machine supports in general) here:
> __asm__ volatile ("stw %0,0(%1); eieio" : : "r" (Value), "m" (OutAddress));
If you want to avoid using R0, you should be able to use the "clobber" list, like this:
__asm__ volatile ("stw %0,0(%1); eieio" : : "r" (Value), "r" (OutAddress) : "r0" );
Jocke

> > > The code from Xilinx is as follows...
> > > void XIo_Out32(XIo_Address OutAddress, Xuint32 Value) { __asm__ volatile ("stw %0,0(%1); eieio" : : "r" (Value), "r" (OutAddress)); }
> > This code is broken. The "r" operand specification says "a register operand is allowed provided that it is in a general register." Of course R0 is a GPR as well. I think you need to use "m" (any kind of address that the machine supports in general) here:
> > __asm__ volatile ("stw %0,0(%1); eieio" : : "r" (Value), "m" (OutAddress));
> If you want to avoid using R0, you should be able to use the "clobber" list, like this:
> __asm__ volatile ("stw %0,0(%1); eieio" : : "r" (Value), "r" (OutAddress) : "r0" );
> Jocke
It appears that PPC has yet another register operand, "b". "b" appears to mean all "r" registers but r0. The asm would then become:
__asm__ volatile ("stw %0,0(%1); eieio" : : "r" (Value), "b" (OutAddress));
Does anyone know more about the "b" operand?
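
For reference, GCC's PowerPC machine constraints describe "b" as an address base register, which excludes r0. A minimal sketch of the write helper using it might look as follows. The name xio_out32_base is hypothetical, the typedefs are pointer-safe stand-ins for the thread's types, the "memory" clobber is an addition not present in the posted code, and the plain C fallback exists only so the sketch builds on non-PowerPC hosts.

```c
#include <stdint.h>

typedef uint32_t  Xuint32;      /* assumption: fixed-width stand-in for the thread's Xuint32 */
typedef uintptr_t XIo_Address;  /* assumption: pointer-sized so the sketch also builds on 64-bit hosts */

/* Hypothetical helper: 32-bit I/O write with the address in a base register. */
static void xio_out32_base(XIo_Address OutAddress, Xuint32 Value)
{
#if defined(__GNUC__) && (defined(__powerpc__) || defined(__PPC__))
    /* "b" restricts OutAddress to base registers (any GPR except r0), so
     * the rA field of the stw can never be 0. Value may still live in r0,
     * which is fine for the rS field. */
    __asm__ volatile ("stw %0,0(%1); eieio"
                      : /* no outputs */
                      : "r" (Value), "b" (OutAddress)
                      : "memory");
#else
    /* Non-PowerPC fallback: a plain volatile store, without eieio ordering. */
    *(volatile Xuint32 *)OutAddress = Value;
#endif
}
```

Compared with listing r0 as a clobber, this keeps r0 available for the data operand and only constrains the operand that actually appears in the base position.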

Joshua,
> The code from Xilinx is as follows...
> void XIo_Out32(XIo_Address OutAddress, Xuint32 Value) { __asm__ volatile ("stw %0,0(%1); eieio" : : "r" (Value), "r" (OutAddress)); }
Yes, this code is broken. You can use the workaround pointed out by Wolfgang, or you can use the xio.h that is part of u-boot and the Linux kernel.
In u-boot you can find it at board/xilinx/common/xio.h
In Linux you can find it at arch/ppc/platforms/xilinx_ocp/xio.h
I will make sure this piece of code gets fixed in EDK.
- Peter
participants (5)
- Eric St-Jean
- Joakim Tjernlund
- Joshua Lamorie
- Peter Ryser
- Wolfgang Denk