
Hi Andreas,
On Mon, 13 May 2013 10:35:12 +0200, "Andreas Bießmann" andreas.devel@googlemail.com wrote:
Hi Albert,
On 05/10/2013 05:09 PM, Albert ARIBAUD wrote:
Hi Andreas,
On Fri, 10 May 2013 12:57:21 +0200, "Andreas Bießmann" andreas.devel@googlemail.com wrote:
Hi Albert,
On 05/10/2013 11:24 AM, Albert ARIBAUD wrote:
Hi Andreas,
On Wed, 8 May 2013 11:25:17 +0200, Andreas Bießmann andreas.devel@googlemail.com wrote:
Commit 1865286466a5d0c7f2e3c37632da56556c838e9e (Introduce generic link section.h symbol files) changed the __bss_end symbol type from char[] to ulong. This led to wrong relocation parameters which ended up in a not working u-boot. Unfortunately this is not clear to see cause due to RAM aliasing we may get a 'half-working' u-boot then.
Fix this by dereferencing the __bss_end symbol where needed.
(cc:ing Simon and Tom)
The dereferencing is correct, so this patch seems good per se (it could actually have applied when __bss_end was still a char[]).
well, as I understood this the __bss_end being a char[] did implicitly take the address when accessing __bss_end (as we do when we have a definition of char foo[2] and we take just 'foo'). But you say here we should reference the address of __bss_end while it was still of type char[]. Sorry, I do not understand that, can you please clarify?
There are several concepts here, some pertaining to the compiler, some to the linker.
From the linker viewpoint, a symbol is *always* and *only* an address, the first address of the object corresponding to the symbol, and an object is just some area in the addressable space.
From the compiler viewpoint, an object has a C type, possibly with an initial value, and a name, which is the symbol. The compiler considers the name/symbol to be value, not the address of the corresponding object... at least most of the time: as you indicate, when the symbol denotes a C array, then the C compiler understand the symbol as the address of the array.
The __bss_end symbol does not actually correspond to an object in the usual sense, since the BSS contains all sorts of data: any C global, either uninitialized or initialized with zeroes, whatever its type, ends up in BSS. The most general way one can assign a type to BSS itself is by considering it as a shapeless array of bytes -- hence the char[] definition.
Thus, the C compiler considered the name __bss_end to denote the address of the BSS "object", and the C code for AVR32 was correct as it was actually referring to the BSS "object"'s address.
When the __bss_end symbol's C type was changed to 'ulong', this changed the way the compiler understood the symbol: it now thinks __bss_end is the BSS' "value", which has no true sense, and certainly does not mean 'the first 4 bytes of BSS considered as a 32-bit value'.
To compensate this, the AVR32 code has to add an & to find the address of __bss_end, but the original error is to have changed the "type" of the BSS.
IOW, we should *always* take the address of __bss_end, since this is the only thing it was defined for. We should never give it a chance to even *have* a value at the C level, because we don't want to read, or worse, write, the BSS itself; we only want to access C globals in the BSS.
thank you for your detailed explanation. So now its clear why referring the address of an object of type char[] will also work. Another question, wouldn't it make sense to declare these C globals as const then?
Indeed, const may help prevent these symbols from being accidentally written into, at least in the most expected cases such as passing &__bss_end to a function expecting a non-const char*.
There is, however, a much better way of preventing this and more: just give these symbols a C type of 'struct {}' (empty struct).
Since this type has absolutely no field which could be written into or read from, it is completely protected from direct write but also from direct read; and a pointer to it has type "struct {} *" which is incompatible with any other pointer, so any inconsiderate use of it is detected at compile time.
I had thought of the 'struct {}' method for linker lists refactoring, when I needed a zero-size type; I finally turned to char[0] instead (and comment on this at line 150 of include/linker_lists.h) because the struct method would cause gcc 4.4 and earlier, such as eldk42, to throw diagnostics like "warning: dereferencing type-punned pointer will break strict-aliasing rules" -- that is the incompatibility I am talking about.
Note that the diagnostics did not stem from the empty struct variable declarations as such, but type-casting the address of an empty struct into a pointer to a known, non-empty, struct; I just checked now, and doing an intermediate cast to char* or void* prevents the warnings. Why I failed to find this when I was refactoring linker lists, I'll never know.
Of course, there is always a possibility that the area around BSS end be accidentally read or written into if &__bss_end is force-cast into, say, a char*, but that's true whatever type you give __bss_end. Aside from this force-casting, the empty struct approach is IMO the most protective.
Thus if one wants to protect symbols in sections.h, I suggest giving them a type of 'struct {}', then building U-Boot with e.g. ELDK42, and fixing all type-punning diagnostics.
(and I will 'fix' include/linker_lists.h myself to use empty structs)
Best regards
Andreas Bießmann
Amicalement,