[U-Boot] [RFC] [PATCH] rewrite doc/README.arm-unaligned-accesses

Signed-off-by: Albert ARIBAUD albert.u.boot@aribaud.net --- There has been a few back-and-forths (and sideways too) about how unaligned accesses are considered in ARM U-Boot. This post is to (try and) get us together in one place, get things straight about what is currently done and why as far as alignment is concerned, and to get doc/README.arm-unaligned-accesses is clear and consistent with it.
Please, in discussing this topic, always elaborate on your opinions on whatever you are commenting. Avoid saying "this is wrong" or "you should do XYZ instead" without also telling what exactly is wrong and why, or what you compare XYZ to and what are the pros and cons of each; your reasoning may seem obvious to you, but it probably is not to the person whose post you are commenting, and won't be to the readers of the resulting doc/README-arm-unaligned-accesses file if it is to include your contribution.
I also suggest giving concrete examples, code excerpts, possibly even generated assembly language if required to make a point; just make sure that any generated code is actually generated (as opposed to manually constructed) and how it was produced (including compiler options), so that people can reproduce it.
And yes, this goes for me as well :) so don't hesitate to point out parts of this text which require improvement... and why they do. :)
doc/README.arm-unaligned-accesses | 379 +++++++++++++++++++++++++++++--------- 1 file changed, 289 insertions(+), 90 deletions(-)
diff --git a/doc/README.arm-unaligned-accesses b/doc/README.arm-unaligned-accesses index c37d135..5873689 100644 --- a/doc/README.arm-unaligned-accesses +++ b/doc/README.arm-unaligned-accesses @@ -1,55 +1,229 @@ -If you are reading this because of a data abort: the following MIGHT -be relevant to your abort, if it was caused by an alignment violation. -In order to determine this, use the PC from the abort dump along with -an objdump -s -S of the u-boot ELF binary to locate the function where -the abort happened; then compare this function with the examples below. -If they match, then you've been hit with a compiler generated unaligned -access, and you should rewrite your code or add -mno-unaligned-access -to the command line of the offending file. +Notes:
-Note that the PC shown in the abort message is relocated. In order to -be able to match it to an address in the ELF binary dump, you will need -to know the relocation offset. If your target defines CONFIG_CMD_BDI -and if you can get to the prompt and enter commands before the abort -happens, then command "bdinfo" will give you the offset. Otherwise you -will need to try a build with DEBUG set, which will display the offset, -or use a debugger and set a breakpoint at relocate_code() to see the -offset (passed as an argument). - -* - -Since U-Boot runs on a variety of hardware, some only able to perform -unaligned accesses with a strong penalty, some unable to perform them -at all, the policy regarding unaligned accesses is to not perform any, -unless absolutely necessary because of hardware or standards. - -Also, on hardware which permits it, the core is configured to throw -data abort exceptions on unaligned accesses in order to catch these -unallowed accesses as early as possible. - -Until version 4.7, the gcc default for performing unaligned accesses -(-mno-unaligned-access) is to emulate unaligned accesses using aligned -loads and stores plus shifts and masks. Emulated unaligned accesses -will not be caught by hardware. These accesses may be costly and may -be actually unnecessary. In order to catch these accesses and remove -or optimize them, option -munaligned-access is explicitly set for all -versions of gcc which support it. - -From gcc 4.7 onward starting at armv7 architectures, the default for -performing unaligned accesses is to use unaligned native loads and -stores (-munaligned-access), because the cost of unaligned accesses -has dropped on armv7 and beyond. This should not affect U-Boot's -policy of controlling unaligned accesses, however the compiler may -generate uncontrolled unaligned accesses on its own in at least one -known case: when declaring a local initialized char array, e.g. +i) If you are reading this because of a data abort: section 10 is what +you're looking for, but please read the whole document.
-function foo() -{ - char buffer[] = "initial value"; -/* or */ - char buffer[] = { 'i', 'n', 'i', 't', 0 }; - ... -} +ii) In order to make sure the following is self-sufficient, it goes +through the basics of alignment and assumes only good, not expert, +knowledge of the C language. + +1. C99 alignment requirements + +The C99 standard [1] (henceforth: 'C99') defines alignment requirements +as a "requirement that objects of a particular type be located on +storage boundaries with addresses that are particular multiples of a +byte address". + +In C99, unaligned accesses (those which which do not respect alignment +requirements of the object type being accessed) are deemed 'undefined'. +This means programs which contain unaligned accesses might build and +execute as if there were no alignment constraints, or build and execute +but not as expected, or build and crash at execution, or not build +at all--or even crash the compiler. + +2. Implementation alignment requirements + +While C99 does define alignment requirements, it does not lay out any +actual alignment requirements, because these depend greatly on the +hardware involved; they are thus defined by C99 implementations. + +For ARM, the C alignment requirements are laid out in the ARM EABI [2]. +For instance, is is the ARM EABI which sets the alignment constraint of +type 'long' to four-byte boundaries. + +Alignment requirements for a given architecture may differ from +hardware capabilities, i.e. they might be stricter than what the +hardware can actually do. One example is (32-bit) x86, which can do +unaligned accesses at the hardware level at some performance cost, but +has stricter requirement analogous to the ARM EABI ones. ARM is a mixed +bag: some older ARM hardware cannot perform unaligned access at all; +some can but at a cost; some can at virtually no cost. + +Further, even when a given architecture is capable of emitting such +unaligned accesses at the core or CPU level, at a higher (system) level +they might be forbidden because the target address falls within a +region in which only aligned accesses are possible. + +3. Native vs emulated unaligned accesses. + +There is a different between the alignment of accesses in a C program +source code and the alignment of accesses at the core or CPU level. In +the following, translated accesses (core or CPU accesses) will be +qualified as /native/ accesses to distinguish them from (untranslated) C +program accesses. + +A C99 implementation might translate an unaligned access into a native +unaligned access, or it might emulate it by a combination of native +aligned accesses. + +4. Alignment requirements in U-Boot + +As U-Boot runs on a variety of hardware using a variety of C99 +implementations, alignment requirements for the U-Boot code are the +strictest possible so that all implementations building U-Boot will have +their requirements satisfied. For instance, aligning longs to four-byte +boundaries is compatible with all implementations and is thus the +requirement for U-Boot. + +[U-Boot alignment requirements are actually stricter than just C99: in +U-Boot, at least some accesses of a given width must not be broken into +smaller accesses or lumped into a wider access, because they are done +to or from a device for which each physical access may have side +effects. C99 does not explicitly state such a constraint except in +general terms, that a C99 implementation .] + +The strict alignment requirements imply that all type declarations in +U-Boot should respect alignment requirements; especially, that no +structure should contain unaligned members. + +5. Purposeful unaligned accesses in U-Boot + +However, there might be cases where some data representation may need +to include objects which are not aligned to their requirements. This +happens in structs representing protocols such as IP or USB, for +instance, or de facto standards such as FAT. Taking FAT as an example, +here is the start of the FAT header, expressed as a succession of C +types from position 0 in the first disk sector: + + 0 3 jump instruction + 3 8 Name of the tool which formatted the FAT + 11 2 Number of bytes per sector + +Since the third field is a short, as per ARM EABI it should be +even-aligned. However, it is not: its offset is 11, an odd value. +Therefore, a struct representing the FAT header (examples below are +adapted from actual code in fs/fdos/dos.h) would either have to split +the field in two smaller bytes... + + typedef struct bootsector + { + unsigned char jump [3]; /* 0 Jump to boot code */ + char banner[BANNER_LG]; /* 3 OEM name & version */ + unsigned char secsiz_lo; /* 11 Bytes per sector LO */ + unsigned char secsiz_hi; /* 12 Bytes per sector HI */ + [...] + } Directory_t; + +... but this is awkward and makes accesses to the field more +complicated to express and read. The alternative is to use a short +field, made unaligned by way of a 'packed' attribute on the struct: + + typedef struct bootsector + { + unsigned char jump [3]; /* 0 Jump to boot code */ + char banner[BANNER_LG]; /* 3 OEM name & version */ + unsigned short secsiz; /* 11 Bytes per sector */ + [...] + } __attribute__ ((packed)) Directory_t; + +The latter is what is done in the U-Boot code, and the access is then +explicitly turned from one unaligned 16-bit to two aligned 8-bit ones +by using the macro __le16_to_cpu (which is slightly below U-Boot +requirements, as we should be using put_unaligned_le16 instead). + +6. Unintended unaligned accesses in U-Boot + +As stated in section 4, U-Boot code should be conforming enough that it +would not perform accesses which are unaligned as per ARM EABI, and +access to intentionally unaligned C objects in U-Boot should be done by +emulating through get_unaligned/put_unaligned macros. + +However, mistakes do happen, and therefore a patch might slip past +review, or a developer might write new code before submitting it, which +could cause some unintended unaligned access. + +Mostly, the cause of unintended unaligned accesses fall into two +categories: needlessly packed structs and pointer arithmetic mistakes, +the latter including compile-time mistakes and run-time mistakes. + +Such unaligned accesses might only actually happen on some architecture +different from the one the developer is working on, which will delay +discovery of the issue and complicate its analysis and resolution. +As we cannot test for all targets at each release, it might even happen +that some release of U-Boot out there would fail to work for a given +target because an unintended unaligned access was not detected early. + +Early detection of unaligned accesses is therefore important, even on +architecture which could handle such unaligned accesses natively or +by emulation, for the benefit of architectures which could not. + +7. ARM: detecting unintentional unaligned accesses by hardware + +[This section assumes that unaligned accesses are translated into native +unaligned accesses rather than emulated. For a discussion of this +assumption, see next section.] + +In order to catch unintended unaligned accesses on ARM platforms, we can +use the A bit in SRC: setting it will cause the core to throw a Data +Abort when a native unaligned access occurs. + +This will catch pointer arithmetic errors when they result in badly +aligned pointers, as well as accesses to unaligned fields in packed +structures of -munaligned-access is in effect (see next section). + +8. ARM: detecting unintentional unaligned accesses by software + +In the previous section, we have chosen to configure the ARM hardware +to catch native unaligned accesses. However, whether unaligned accesses +are translated into native or emulated accesses depends, on ARM, on a +compiler option: -m[no-]unaligned-access. + +With -munaligned-access, the compiler is allowed to translate unaligned +accesses from a C program into native unaligned accesses at the core +level. + +With -mno-unaligned-access, the compiler is *not* allowed to translate +unaligned accesses from a C program into native unaligned accesses at +the core level, and must therefore emulate them with a combination of +smaller aligned native accesses. + +At first sight, -mno-unaligne-access seems to solve our problem as it +prevents the core from emitting native unaligned accesses; and +-munaligned-access appears counter-intuitive as it allows the compiler +to emit native unaligned accesses, which we precisely want to avoid. + +However, if -mno-unaligned-access prevents native unaligned accesses on +ARM, it has no effect on other architectures for which it does not even +always have an equivalent; thus, a C program which builds and runs +without error with -mno-unaligned-access on ARM platforms might fail to +build, or build but fail to run properly, on other platforms. + +On the other hand, -munaligned-access only causes natie unaligned +accesses if the C program has unaligned accesses to begin with; for C +programs in which unaligned accesses are all intentional, the compiler +will not generate native unaligned accesses (except in very specific +situations which can be controlled, see next section). + +We can see that while -mno-unaligned-accesses prevents the ill effects +of unintended unaligned accesses by turning them into smaller native +aligned ones, this translation also prevents detecting unaligned +accesses early, whereas -munaligned-access does. + +Again, building ARM U-Boot with -munaligned-access might seem +contradictory with setting the SRC A bit to catch native unaligned +accesses; but this is because what we want is not to get rid of native +unaligned accesses /per se/; what we want is to find and fix unintended +unaligned accesses in the source code, and we use -munaligned-access +for this because it makes a better job of translating unintended +unaligned accesses into native unaligned accesses, and thus at getting +the SCR.A bit setting to detect them, than -mno-unaligned-access does. + +Again: if we had wanted to just not emit native unaligned accesses, then +globally turning -mno-unaligned-access on would have been our option of +choice; but this is not what we want. + +9. Corner cases: array initializations + +There is actually one single corner case where gcc might generate +native unaligned accesses from a C program which does not at first +sight appear to contain unaligned accesses at all. Consider the +following: + + function foo() + { + char buffer[] = "initial value"; + ... + }
Under -munaligned-accesses with optimizations on, this declaration causes the compiler to generate native loads from the literal string @@ -57,8 +231,12 @@ and native stores to the buffer, and the literal string alignment cannot be controlled. If it is misaligned, then the core will throw a data abort exception.
-Quite probably the same might happen for 16-bit array initializations -where the constant is aligned on a boundary which is a multiple of 2 +[under -munaligned-accesses *without* optimizations, there is no issue +as the compiler emits the initialization as a memcpy() which does not +cause any native unaligned access] + +The same might happen for 16-bit array initializations where the +literal constant array is aligned on a boundary which is a multiple of 2 but not of 4:
function foo() @@ -67,12 +245,17 @@ function foo() ... }
-The long term solution to this issue is to add an option to gcc to -allow controlling the general alignment of data, including constant -initialization values. +Under -munaligned-accesses with optimizations on, the compiler will +use unaligned long native accesses to copy the literal into the +variable. + +The long term solution to this issue is to modify the compiler in +order to get finer control on this particular initialization either by +controlling literal alignment, or by controlling use of native +unaligned accesses in this case.
-However this will only apply to the version of gcc which will have such -an option. For other versions, there are four workarounds: +However this will at best only apply to a future version of gcc. For +existing versions, there are four workarounds:
a) Enforce as a rule that array initializations as described above are forbidden. This is generally not acceptable as they are valid, @@ -80,43 +263,59 @@ a) Enforce as a rule that array initializations as described above is when they actually equate to a const char* declaration, i.e. the array is initialized and never modified in the function's scope.
-b) Drop the requirement on unaligned accesses at least for ARMv7, - i.e. do not throw a data abort exception upon unaligned accesses. - But that will allow adding badly aligned code to U-Boot, only for - it to fail when re-used with a stricter target, possibly once the - bad code is already in mainline. +b) Drop the hardware requirement on unaligned accesses at least for + ARMv7, i.e. do not throw a data abort exception upon unaligned + accesses. But that will allow adding badly aligned code to U-Boot, + only for it to fail when re-used with a stricter target, possibly + once the bad code is already in mainline.
c) Relax the -munaligned-access rule globally. This will prevent native - unaligned accesses of course, but that will also hide any bug caused - by a bad unaligned access, making it much harder to diagnose it. It - is actually what already happens when building ARM targets with a - pre-4.7 gcc, and it may actually already hide some bugs yet unseen - until the target gets compiled with -munaligned-access. - -d) Relax the -munaligned-access rule only for for files susceptible to - the local initialized array issue and for armv7 architectures and + unaligned accesses from being generated of course, but that will also + hide any bug caused by an unwanted unaligned access, making it much + harder to diagnose it. It is actually what already happens + when building ARM targets with a pre-4.7 gcc, and it may actually + already hide some bugs yet unseen until the target gets compiled + with -munaligned-access. + +d) Relax the -munaligned-access rule only for files susceptible to the + local initialized array issue and for armv7 architectures and beyond. This minimizes the quantity of code which can hide unwanted misaligned accesses.
The option retained is d).
-Considering that actual occurrences of the issue are rare (as of this -writing, 5 files out of 7840 in U-Boot, or .3%, contain an initialized -local char array which cannot actually be replaced with a const char*), -contributors should not be required to systematically try and detect -the issue in their patches. - -Detecting files susceptible to the issue can be automated through a -filter installed as a hook in .git which recognizes local char array -initializations. Automation should err on the false positive side, for -instance flagging non-local arrays as if they were local if they cannot -be told apart. - -In any case, detection shall not prevent committing the patch, but -shall pre-populate the commit message with a note to the effect that -this patch contains an initialized local char or 16-bit array and thus -should be protected from the gcc 4.7 issue. - -Upon a positive detection, either $(PLATFORM_NO_UNALIGNED) should be -added to CFLAGS for the affected file(s), or if the array is a pseudo -const char*, it should be replaced by an actual one. +10. Alignment issues and Data Aborts + +If you are reading this file because of the message displayed during a +data abort: the following MIGHT be relevant to your abort, if it was +caused by an alignment requirement violation. This may have been caused +by bad pointer arithmetic, or improper access to unaligned fields in a +packed struct, or a corner case as described in the previous section, +or an unlikely new corner case. + +In order to determine which is which, you must first determine whether +the data abort is an alignment abort. Refer to the ARM architecture +reference manual to determine the exact cause of the abort. + +If the abort is indeed an alignment one, then use the PC from the abort +dump along with an objdump -d -S of the u-boot ELF binary to locate the +function where the abort happened; then compare this function with the +examples of this file. If they match, then you've been hit with a +compiler-generated unaligned access, and you should rewrite your code +or add -mno-unaligned-access to the command line of the offending file. +Otherwise, you're probably having a bad pointer arithmetic case. + +Note that the PC shown in the abort message is relocated. In order to +be able to match it to an address in the ELF binary dump, you will need +to know the relocation offset. If your target defines CONFIG_CMD_BDI +and if you can get to the prompt and enter commands before the abort +happens, then command "bdinfo" will give you the offset. Otherwise you +will need to try a build with DEBUG set, which will display the offset +(but you'll have to witness the abort again on the DEBUG build, as the +abort PC will have changed), or use a debugger and set a breakpoint at +relocate_code() to see the offset (passed as an argument). + +11. References + +[1] ISO/IEC 9899:1999 +[2] ARM IHI 0042E (AAPCS)

On Tue, Feb 18, 2014 at 01:52:42PM +0100, Albert ARIBAUD wrote:
There has been a few back-and-forths (and sideways too) about how unaligned accesses are considered in ARM U-Boot. This post is to (try and) get us together in one place, get things straight about what is currently done and why as far as alignment is concerned, and to get doc/README.arm-unaligned-accesses is clear and consistent with it.
I agree that we need to get on the same page and understand what is and isn't (and is and cannot be done for us) going on, so we can get back to bigger problems that need solving.
I wonder if we shouldn't largely crib from https://www.kernel.org/doc/Documentation/unaligned-memory-access.txt and modify the examples to be relevant to our sources, and perhaps drop the networking section (as that doesn't directly apply).
And as an aside, building with -Wcast-align shows some places we _may_ need to investigate.
[snip]
+ii) In order to make sure the following is self-sufficient, it goes +through the basics of alignment and assumes only good, not expert, +knowledge of the C language.
+1. C99 alignment requirements
+The C99 standard [1] (henceforth: 'C99') defines alignment requirements +as a "requirement that objects of a particular type be located on +storage boundaries with addresses that are particular multiples of a +byte address".
+In C99, unaligned accesses (those which which do not respect alignment +requirements of the object type being accessed) are deemed 'undefined'. +This means programs which contain unaligned accesses might build and +execute as if there were no alignment constraints, or build and execute +but not as expected, or build and crash at execution, or not build +at all--or even crash the compiler.
OK.
+2. Implementation alignment requirements
+While C99 does define alignment requirements, it does not lay out any +actual alignment requirements, because these depend greatly on the +hardware involved; they are thus defined by C99 implementations.
+For ARM, the C alignment requirements are laid out in the ARM EABI [2]. +For instance, is is the ARM EABI which sets the alignment constraint of +type 'long' to four-byte boundaries.
+Alignment requirements for a given architecture may differ from +hardware capabilities, i.e. they might be stricter than what the +hardware can actually do. One example is (32-bit) x86, which can do +unaligned accesses at the hardware level at some performance cost, but +has stricter requirement analogous to the ARM EABI ones. ARM is a mixed +bag: some older ARM hardware cannot perform unaligned access at all; +some can but at a cost; some can at virtually no cost.
+Further, even when a given architecture is capable of emitting such +unaligned accesses at the core or CPU level, at a higher (system) level +they might be forbidden because the target address falls within a +region in which only aligned accesses are possible.
The compiler must make reasonable decisions based on what it knows about a particular architecture and ABI based on what it can know. The developer must make correct decisions based on things the compiler cannot know such as memory region constraints.
+3. Native vs emulated unaligned accesses.
+There is a different between the alignment of accesses in a C program +source code and the alignment of accesses at the core or CPU level. In +the following, translated accesses (core or CPU accesses) will be +qualified as /native/ accesses to distinguish them from (untranslated) C +program accesses.
+A C99 implementation might translate an unaligned access into a native +unaligned access, or it might emulate it by a combination of native +aligned accesses.
This isn't relevant, given the contraint on the compiler to do sane things for a given architecture.
+4. Alignment requirements in U-Boot
+As U-Boot runs on a variety of hardware using a variety of C99 +implementations, alignment requirements for the U-Boot code are the +strictest possible so that all implementations building U-Boot will have +their requirements satisfied. For instance, aligning longs to four-byte +boundaries is compatible with all implementations and is thus the +requirement for U-Boot.
+[U-Boot alignment requirements are actually stricter than just C99: in +U-Boot, at least some accesses of a given width must not be broken into +smaller accesses or lumped into a wider access, because they are done +to or from a device for which each physical access may have side +effects. C99 does not explicitly state such a constraint except in +general terms, that a C99 implementation .]
+The strict alignment requirements imply that all type declarations in +U-Boot should respect alignment requirements; especially, that no +structure should contain unaligned members.
This is where I moderately disagree. I agree we don't want to make things unaligned for no reason in data structures.
+5. Purposeful unaligned accesses in U-Boot
+However, there might be cases where some data representation may need +to include objects which are not aligned to their requirements. This +happens in structs representing protocols such as IP or USB, for +instance, or de facto standards such as FAT. Taking FAT as an example,
Right. We should combine 4 and 5 to an extent. We don't normally want, but have cases we end up with it.
+here is the start of the FAT header, expressed as a succession of C +types from position 0 in the first disk sector:
0 3 jump instruction
3 8 Name of the tool which formatted the FAT
11 2 Number of bytes per sector
+Since the third field is a short, as per ARM EABI it should be +even-aligned. However, it is not: its offset is 11, an odd value. +Therefore, a struct representing the FAT header (examples below are +adapted from actual code in fs/fdos/dos.h) would either have to split +the field in two smaller bytes...
- typedef struct bootsector
- {
unsigned char jump [3]; /* 0 Jump to boot code */
char banner[BANNER_LG]; /* 3 OEM name & version */
unsigned char secsiz_lo; /* 11 Bytes per sector LO */
unsigned char secsiz_hi; /* 12 Bytes per sector HI */
[...]
- } Directory_t;
+... but this is awkward and makes accesses to the field more +complicated to express and read. The alternative is to use a short +field, made unaligned by way of a 'packed' attribute on the struct:
- typedef struct bootsector
- {
unsigned char jump [3]; /* 0 Jump to boot code */
char banner[BANNER_LG]; /* 3 OEM name & version */
unsigned short secsiz; /* 11 Bytes per sector */
[...]
- } __attribute__ ((packed)) Directory_t;
+The latter is what is done in the U-Boot code, and the access is then +explicitly turned from one unaligned 16-bit to two aligned 8-bit ones +by using the macro __le16_to_cpu (which is slightly below U-Boot +requirements, as we should be using put_unaligned_le16 instead).
FWIW, this doesn't match the U-Boot sources where we just let the compiler do padding and now that I type that I wonder if that's not part of the blame for some of our fatwrite problems?
But here's the non-problem, demonstrated with the 'gpt write' code on an armv5te target. Having described a struct to the compiler, and given the compiler knows architecture aligment requirements it will either go: (a) The struct is not packed, invisble padding time (b) The struct is packed, this is unaligned, do it correctly.
+6. Unintended unaligned accesses in U-Boot
We indeed don't want to allow actually bad things to be written and work. On processors such as ARM where we may, or may not, do a silent fixup, we choose not to do a silent fixup and instead fail. For example, the following blows right up as we want it to today, and continues to blow up with the patch I sent where we send -mno-unaligned-access. It _only_ works if we cleared the SCTRL.A bit which _we_do_not_want_.
diff --git a/common/cmd_misc.c b/common/cmd_misc.c index 93f9eab..fa4a146 100644 --- a/common/cmd_misc.c +++ b/common/cmd_misc.c @@ -38,6 +38,18 @@ U_BOOT_CMD( " - delay execution for N seconds (N is _decimal_ !!!)" );
+static int do_unaligned(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[]) +{ + unsigned char buf[8] = { 0 }; + return *(int*)((unsigned long)buf | 1); +} + +U_BOOT_CMD( + unaligned, 1, 1, do_unaligned, + "Make some unaligned accesses happen", + "Make some unaligned accesses happen" +); + #ifdef CONFIG_CMD_TIMER static int do_timer(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[]) {
And that's where we need to stop, more or less. The above is the type of unaligned access problem we care about because it fails, everywhere _unless_ we fix it up, and we don't want to. All of the problems we're encountering now are either: - Self inflicted (due to the mismatch in what the compiler requires and what we've not done) - Potentially abusing the packed attribute.
In the case of the GPT code, I'm still not convinced there's a problem that isn't due to how we're building code that validly describes the standard and how we tell the compiler to build it. It should be __packed to match the on-disk format. Being __packed means the compiler does the right thing about access or it's a compiler bug.

Hi Tom,
On Thu, 20 Feb 2014 16:34:44 -0500, Tom Rini trini@ti.com wrote:
On Tue, Feb 18, 2014 at 01:52:42PM +0100, Albert ARIBAUD wrote:
There has been a few back-and-forths (and sideways too) about how unaligned accesses are considered in ARM U-Boot. This post is to (try and) get us together in one place, get things straight about what is currently done and why as far as alignment is concerned, and to get doc/README.arm-unaligned-accesses is clear and consistent with it.
I agree that we need to get on the same page and understand what is and isn't (and is and cannot be done for us) going on, so we can get back to bigger problems that need solving.
I wonder if we shouldn't largely crib from https://www.kernel.org/doc/Documentation/unaligned-memory-access.txt and modify the examples to be relevant to our sources, and perhaps drop the networking section (as that doesn't directly apply).
And as an aside, building with -Wcast-align shows some places we _may_ need to investigate.
Thanks for the suggestion. If investigation shows it to be a good sign of some alignment issue, then maybe we could add it systematically if it does not adversely affect build or run performance.
[snip]
+ii) In order to make sure the following is self-sufficient, it goes +through the basics of alignment and assumes only good, not expert, +knowledge of the C language.
+1. C99 alignment requirements
+The C99 standard [1] (henceforth: 'C99') defines alignment requirements +as a "requirement that objects of a particular type be located on +storage boundaries with addresses that are particular multiples of a +byte address".
+In C99, unaligned accesses (those which which do not respect alignment +requirements of the object type being accessed) are deemed 'undefined'. +This means programs which contain unaligned accesses might build and +execute as if there were no alignment constraints, or build and execute +but not as expected, or build and crash at execution, or not build +at all--or even crash the compiler.
OK.
+2. Implementation alignment requirements
+While C99 does define alignment requirements, it does not lay out any +actual alignment requirements, because these depend greatly on the +hardware involved; they are thus defined by C99 implementations.
+For ARM, the C alignment requirements are laid out in the ARM EABI [2]. +For instance, is is the ARM EABI which sets the alignment constraint of +type 'long' to four-byte boundaries.
+Alignment requirements for a given architecture may differ from +hardware capabilities, i.e. they might be stricter than what the +hardware can actually do. One example is (32-bit) x86, which can do +unaligned accesses at the hardware level at some performance cost, but +has stricter requirement analogous to the ARM EABI ones. ARM is a mixed +bag: some older ARM hardware cannot perform unaligned access at all; +some can but at a cost; some can at virtually no cost.
+Further, even when a given architecture is capable of emitting such +unaligned accesses at the core or CPU level, at a higher (system) level +they might be forbidden because the target address falls within a +region in which only aligned accesses are possible.
The compiler must make reasonable decisions based on what it knows about a particular architecture and ABI based on what it can know. The developer must make correct decisions based on things the compiler cannot know such as memory region constraints.
OK -- is this an addition suggestion?
+3. Native vs emulated unaligned accesses.
+There is a different between the alignment of accesses in a C program +source code and the alignment of accesses at the core or CPU level. In +the following, translated accesses (core or CPU accesses) will be +qualified as /native/ accesses to distinguish them from (untranslated) C +program accesses.
+A C99 implementation might translate an unaligned access into a native +unaligned access, or it might emulate it by a combination of native +aligned accesses.
This isn't relevant, given the contraint on the compiler to do sane things for a given architecture.
"Sane things", yes; but not one sane thing only: for instance, both options are sane with an architecture where native unaligned accesses are possible at some cost, possibly without one option being clearly better than the other" so there is a choice here.
So I think this is relevant here, in order to inform the reader that there are options in the way unaligned accesses should be translated, and that this option might have to be decided on a case-by-case, or even access-by-access, basis.
I probably should make that explicit in the text.
+4. Alignment requirements in U-Boot
+As U-Boot runs on a variety of hardware using a variety of C99 +implementations, alignment requirements for the U-Boot code are the +strictest possible so that all implementations building U-Boot will have +their requirements satisfied. For instance, aligning longs to four-byte +boundaries is compatible with all implementations and is thus the +requirement for U-Boot.
+[U-Boot alignment requirements are actually stricter than just C99: in +U-Boot, at least some accesses of a given width must not be broken into +smaller accesses or lumped into a wider access, because they are done +to or from a device for which each physical access may have side +effects. C99 does not explicitly state such a constraint except in +general terms, that a C99 implementation .]
+The strict alignment requirements imply that all type declarations in +U-Boot should respect alignment requirements; especially, that no +structure should contain unaligned members.
This is where I moderately disagree. I agree we don't want to make things unaligned for no reason in data structures.
+5. Purposeful unaligned accesses in U-Boot
+However, there might be cases where some data representation may need +to include objects which are not aligned to their requirements. This +happens in structs representing protocols such as IP or USB, for +instance, or de facto standards such as FAT. Taking FAT as an example,
Right. We should combine 4 and 5 to an extent. We don't normally want, but have cases we end up with it.
Agreed on this and the previous comment; I'll rewrite sections 4 and 5 to make it clearer that the alignment rule on struct fields is not entirely strict.
+here is the start of the FAT header, expressed as a succession of C +types from position 0 in the first disk sector:
0 3 jump instruction
3 8 Name of the tool which formatted the FAT
11 2 Number of bytes per sector
+Since the third field is a short, as per ARM EABI it should be +even-aligned. However, it is not: its offset is 11, an odd value. +Therefore, a struct representing the FAT header (examples below are +adapted from actual code in fs/fdos/dos.h) would either have to split +the field in two smaller bytes...
- typedef struct bootsector
- {
unsigned char jump [3]; /* 0 Jump to boot code */
char banner[BANNER_LG]; /* 3 OEM name & version */
unsigned char secsiz_lo; /* 11 Bytes per sector LO */
unsigned char secsiz_hi; /* 12 Bytes per sector HI */
[...]
- } Directory_t;
+... but this is awkward and makes accesses to the field more +complicated to express and read. The alternative is to use a short +field, made unaligned by way of a 'packed' attribute on the struct:
- typedef struct bootsector
- {
unsigned char jump [3]; /* 0 Jump to boot code */
char banner[BANNER_LG]; /* 3 OEM name & version */
unsigned short secsiz; /* 11 Bytes per sector */
[...]
- } __attribute__ ((packed)) Directory_t;
+The latter is what is done in the U-Boot code, and the access is then +explicitly turned from one unaligned 16-bit to two aligned 8-bit ones +by using the macro __le16_to_cpu (which is slightly below U-Boot +requirements, as we should be using put_unaligned_le16 instead).
FWIW, this doesn't match the U-Boot sources where we just let the compiler do padding and now that I type that I wonder if that's not part of the blame for some of our fatwrite problems?
Hmm... I'd taken the code straight from my master branch. Which part of the FAT code does padding? In any case, you are right that we need to investigate.
But here's the non-problem, demonstrated with the 'gpt write' code on an armv5te target. Having described a struct to the compiler, and given the compiler knows architecture aligment requirements it will either go: (a) The struct is not packed, invisble padding time (b) The struct is packed, this is unaligned, do it correctly.
I'd say (b) is "do it as instructed" rather than "do it correctly", precisely because options exist to control how unaligned accesses go.
Apart from this, I agree about the packed vs non-packed behavior -- which of course is not specific to ARMv5te or even ARM. It is pretty much a de facto standard although to be complete, (b) does not conform to C99 per se.
However, ARMv5TE is not able to do native unaligned accesses, and the following does not seem specific to ARMv5TE so I am not sure why ARMv5TE specifically matters here.
+6. Unintended unaligned accesses in U-Boot
We indeed don't want to allow actually bad things to be written and work. On processors such as ARM where we may, or may not, do a silent fixup, we choose not to do a silent fixup and instead fail. For example, the following blows right up as we want it to today, and continues to blow up with the patch I sent where we send -mno-unaligned-access. It _only_ works if we cleared the SCTRL.A bit which _we_do_not_want_.
diff --git a/common/cmd_misc.c b/common/cmd_misc.c index 93f9eab..fa4a146 100644 --- a/common/cmd_misc.c +++ b/common/cmd_misc.c @@ -38,6 +38,18 @@ U_BOOT_CMD( " - delay execution for N seconds (N is _decimal_ !!!)" );
+static int do_unaligned(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[]) +{
- unsigned char buf[8] = { 0 };
- return *(int*)((unsigned long)buf | 1);
+}
+U_BOOT_CMD(
- unaligned, 1, 1, do_unaligned,
- "Make some unaligned accesses happen",
- "Make some unaligned accesses happen"
+);
#ifdef CONFIG_CMD_TIMER static int do_timer(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[]) {
Agreed so far: this code is an example of usually unpredictable run-time unaligned access.
And that's where we need to stop, more or less. The above is the type of unaligned access problem we care about because it fails, everywhere _unless_ we fix it up, and we don't want to. All of the problems we're encountering now are either:
- Self inflicted (due to the mismatch in what the compiler requires and what we've not done)
- Potentially abusing the packed attribute.
Indeed, and "that's where we need to stop" is where we disagree, the point of disagreement being that unpredictable run-time unaligned accesses would be "the type of unaligned access problem we care about because it fails everywhere", if by this you meant "the *only* type".
I don't only care about code which fails everywhere; I care about code which I can tell will fail elsewhere even though it "works for me", whether "me" means ARMv7 or ARM in general, or any other architecture for that matter.
(and I am and will be grateful to people using other architectures than ARM if they decide to suffer some small cost in order to try and catch an issue that might bite us even though it would not bite them.)
Having -munaligned-access in conjunction with SRC.A allows us to trap unintended unaligned accesses, at the only expense of having to make intended ones explicit rather than relying on a compiler option. That is a small cost to us, and a benefit overall.
In the case of the GPT code, I'm still not convinced there's a problem that isn't due to how we're building code that validly describes the standard and how we tell the compiler to build it. It should be __packed to match the on-disk format. Being __packed means the compiler does the right thing about access or it's a compiler bug.
Being packed just tells the compiler not to pad, it does not tell the compiler how to access the unaligned fields; how to access unaligned fields is what -m[no-]unaligned-access is about, which means the decision is in the hands of the developer who builds the code, based on what this developer intends the code to do.
Amicalement,

On Fri, Feb 21, 2014 at 11:25:36AM +0100, Albert ARIBAUD wrote:
Hi Tom,
On Thu, 20 Feb 2014 16:34:44 -0500, Tom Rini trini@ti.com wrote:
On Tue, Feb 18, 2014 at 01:52:42PM +0100, Albert ARIBAUD wrote:
There has been a few back-and-forths (and sideways too) about how unaligned accesses are considered in ARM U-Boot. This post is to (try and) get us together in one place, get things straight about what is currently done and why as far as alignment is concerned, and to get doc/README.arm-unaligned-accesses is clear and consistent with it.
I agree that we need to get on the same page and understand what is and isn't (and is and cannot be done for us) going on, so we can get back to bigger problems that need solving.
I wonder if we shouldn't largely crib from https://www.kernel.org/doc/Documentation/unaligned-memory-access.txt and modify the examples to be relevant to our sources, and perhaps drop the networking section (as that doesn't directly apply).
And as an aside, building with -Wcast-align shows some places we _may_ need to investigate.
Thanks for the suggestion. If investigation shows it to be a good sign of some alignment issue, then maybe we could add it systematically if it does not adversely affect build or run performance.
The problem is it's going to give a _lot_ of false positives. One of the things we've got right now with W= thanks to Kbuild, and W=2 turns it on. It's not in the default list for the kernel because there's many cases of "but we know this is safe after investigation" and there's (apparently) not an easy way to annotate things.
[snip]
+ii) In order to make sure the following is self-sufficient, it goes +through the basics of alignment and assumes only good, not expert, +knowledge of the C language.
+1. C99 alignment requirements
+The C99 standard [1] (henceforth: 'C99') defines alignment requirements +as a "requirement that objects of a particular type be located on +storage boundaries with addresses that are particular multiples of a +byte address".
+In C99, unaligned accesses (those which which do not respect alignment +requirements of the object type being accessed) are deemed 'undefined'. +This means programs which contain unaligned accesses might build and +execute as if there were no alignment constraints, or build and execute +but not as expected, or build and crash at execution, or not build +at all--or even crash the compiler.
OK.
+2. Implementation alignment requirements
+While C99 does define alignment requirements, it does not lay out any +actual alignment requirements, because these depend greatly on the +hardware involved; they are thus defined by C99 implementations.
+For ARM, the C alignment requirements are laid out in the ARM EABI [2]. +For instance, is is the ARM EABI which sets the alignment constraint of +type 'long' to four-byte boundaries.
+Alignment requirements for a given architecture may differ from +hardware capabilities, i.e. they might be stricter than what the +hardware can actually do. One example is (32-bit) x86, which can do +unaligned accesses at the hardware level at some performance cost, but +has stricter requirement analogous to the ARM EABI ones. ARM is a mixed +bag: some older ARM hardware cannot perform unaligned access at all; +some can but at a cost; some can at virtually no cost.
+Further, even when a given architecture is capable of emitting such +unaligned accesses at the core or CPU level, at a higher (system) level +they might be forbidden because the target address falls within a +region in which only aligned accesses are possible.
The compiler must make reasonable decisions based on what it knows about a particular architecture and ABI based on what it can know. The developer must make correct decisions based on things the compiler cannot know such as memory region constraints.
OK -- is this an addition suggestion?
Assuming we don't go for a more direct copy of the kernel doc, yes.
+3. Native vs emulated unaligned accesses.
+There is a different between the alignment of accesses in a C program +source code and the alignment of accesses at the core or CPU level. In +the following, translated accesses (core or CPU accesses) will be +qualified as /native/ accesses to distinguish them from (untranslated) C +program accesses.
+A C99 implementation might translate an unaligned access into a native +unaligned access, or it might emulate it by a combination of native +aligned accesses.
This isn't relevant, given the contraint on the compiler to do sane things for a given architecture.
"Sane things", yes; but not one sane thing only: for instance, both options are sane with an architecture where native unaligned accesses are possible at some cost, possibly without one option being clearly better than the other" so there is a choice here.
So I think this is relevant here, in order to inform the reader that there are options in the way unaligned accesses should be translated, and that this option might have to be decided on a case-by-case, or even access-by-access, basis.
I probably should make that explicit in the text.
Well, the thing is we inform the compiler of the architecture and additional constraints and it does the correct thing, that needs to be clear.
+4. Alignment requirements in U-Boot
+As U-Boot runs on a variety of hardware using a variety of C99 +implementations, alignment requirements for the U-Boot code are the +strictest possible so that all implementations building U-Boot will have +their requirements satisfied. For instance, aligning longs to four-byte +boundaries is compatible with all implementations and is thus the +requirement for U-Boot.
+[U-Boot alignment requirements are actually stricter than just C99: in +U-Boot, at least some accesses of a given width must not be broken into +smaller accesses or lumped into a wider access, because they are done +to or from a device for which each physical access may have side +effects. C99 does not explicitly state such a constraint except in +general terms, that a C99 implementation .]
+The strict alignment requirements imply that all type declarations in +U-Boot should respect alignment requirements; especially, that no +structure should contain unaligned members.
This is where I moderately disagree. I agree we don't want to make things unaligned for no reason in data structures.
+5. Purposeful unaligned accesses in U-Boot
+However, there might be cases where some data representation may need +to include objects which are not aligned to their requirements. This +happens in structs representing protocols such as IP or USB, for +instance, or de facto standards such as FAT. Taking FAT as an example,
Right. We should combine 4 and 5 to an extent. We don't normally want, but have cases we end up with it.
Agreed on this and the previous comment; I'll rewrite sections 4 and 5 to make it clearer that the alignment rule on struct fields is not entirely strict.
OK.
+here is the start of the FAT header, expressed as a succession of C +types from position 0 in the first disk sector:
0 3 jump instruction
3 8 Name of the tool which formatted the FAT
11 2 Number of bytes per sector
+Since the third field is a short, as per ARM EABI it should be +even-aligned. However, it is not: its offset is 11, an odd value. +Therefore, a struct representing the FAT header (examples below are +adapted from actual code in fs/fdos/dos.h) would either have to split +the field in two smaller bytes...
- typedef struct bootsector
- {
unsigned char jump [3]; /* 0 Jump to boot code */
char banner[BANNER_LG]; /* 3 OEM name & version */
unsigned char secsiz_lo; /* 11 Bytes per sector LO */
unsigned char secsiz_hi; /* 12 Bytes per sector HI */
[...]
- } Directory_t;
+... but this is awkward and makes accesses to the field more +complicated to express and read. The alternative is to use a short +field, made unaligned by way of a 'packed' attribute on the struct:
- typedef struct bootsector
- {
unsigned char jump [3]; /* 0 Jump to boot code */
char banner[BANNER_LG]; /* 3 OEM name & version */
unsigned short secsiz; /* 11 Bytes per sector */
[...]
- } __attribute__ ((packed)) Directory_t;
+The latter is what is done in the U-Boot code, and the access is then +explicitly turned from one unaligned 16-bit to two aligned 8-bit ones +by using the macro __le16_to_cpu (which is slightly below U-Boot +requirements, as we should be using put_unaligned_le16 instead).
FWIW, this doesn't match the U-Boot sources where we just let the compiler do padding and now that I type that I wonder if that's not part of the blame for some of our fatwrite problems?
Hmm... I'd taken the code straight from my master branch. Which part of the FAT code does padding? In any case, you are right that we need to investigate.
Oh my. We've got fs/fdos/ that's (aside from automated things) untouched since 2003 and not normally built. fs/fat/ is where everyone goes for FAT code today.
But here's the non-problem, demonstrated with the 'gpt write' code on an armv5te target. Having described a struct to the compiler, and given the compiler knows architecture aligment requirements it will either go: (a) The struct is not packed, invisble padding time (b) The struct is packed, this is unaligned, do it correctly.
I'd say (b) is "do it as instructed" rather than "do it correctly", precisely because options exist to control how unaligned accesses go.
Keep in mind -mno-unaligned-access is an ARM thing, for all other architectures it just does the right thing.
Apart from this, I agree about the packed vs non-packed behavior -- which of course is not specific to ARMv5te or even ARM. It is pretty much a de facto standard although to be complete, (b) does not conform to C99 per se.
The compiler must, for a given ABI and C standard make things correct. That includes invisible padding of structs for alignment requirements and dealing with unaligned access correctly.
However, ARMv5TE is not able to do native unaligned accesses, and the following does not seem specific to ARMv5TE so I am not sure why ARMv5TE specifically matters here.
It matters here because I took this "bad" code and ran it on an ARMv5TE board without issue, because the code is not bad, it's correct. And it's going to be correct everywhere on any architecture.
+6. Unintended unaligned accesses in U-Boot
We indeed don't want to allow actually bad things to be written and work. On processors such as ARM where we may, or may not, do a silent fixup, we choose not to do a silent fixup and instead fail. For example, the following blows right up as we want it to today, and continues to blow up with the patch I sent where we send -mno-unaligned-access. It _only_ works if we cleared the SCTRL.A bit which _we_do_not_want_.
diff --git a/common/cmd_misc.c b/common/cmd_misc.c index 93f9eab..fa4a146 100644 --- a/common/cmd_misc.c +++ b/common/cmd_misc.c @@ -38,6 +38,18 @@ U_BOOT_CMD( " - delay execution for N seconds (N is _decimal_ !!!)" );
+static int do_unaligned(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[]) +{
- unsigned char buf[8] = { 0 };
- return *(int*)((unsigned long)buf | 1);
+}
+U_BOOT_CMD(
- unaligned, 1, 1, do_unaligned,
- "Make some unaligned accesses happen",
- "Make some unaligned accesses happen"
+);
#ifdef CONFIG_CMD_TIMER static int do_timer(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[]) {
Agreed so far: this code is an example of usually unpredictable run-time unaligned access.
Right. And part of my point in showing the above is that after adding -mno-unaligned-access to armv7 builds we're _still_ crashing like we want when people do things that aren't portable.
And that's where we need to stop, more or less. The above is the type of unaligned access problem we care about because it fails, everywhere _unless_ we fix it up, and we don't want to. All of the problems we're encountering now are either:
- Self inflicted (due to the mismatch in what the compiler requires and what we've not done)
- Potentially abusing the packed attribute.
Indeed, and "that's where we need to stop" is where we disagree, the point of disagreement being that unpredictable run-time unaligned accesses would be "the type of unaligned access problem we care about because it fails everywhere", if by this you meant "the *only* type".
I don't only care about code which fails everywhere; I care about code which I can tell will fail elsewhere even though it "works for me", whether "me" means ARMv7 or ARM in general, or any other architecture for that matter.
Right. But you're catching code that you can tell won't fail elsewhere and claiming that it will fail. The reason you can tell that it won't fail is that we aren't doing pointer math or other tricky things. We're doing well annotated plain old C. Lets quote the kernel doc[1]: "Another point worth mentioning is the use of __attribute__((packed)) on a structure type. This GCC-specific attribute tells the compiler never to insert any padding within structures, useful when you want to use a C struct to represent some data that comes in a fixed arrangement 'off the wire'.
You might be inclined to believe that usage of this attribute can easily lead to unaligned accesses when accessing fields that do not satisfy architectural alignment requirements. However, again, the compiler is aware of the alignment constraints and will generate extra instructions to perform the memory access in a way that does not cause unaligned access. Of course, the extra instructions obviously cause a loss in performance compared to the non-packed case, so the packed attribute should only be used when avoiding structure padding is of importance."
Which is the rule we need to be following, and I'm sure we can find cases where we aren't.
In other words, this is code that at first glance my look like it will generate unaligned access but in fact it never generate unaligned access, it will generate potentially slow access (which is why we should avoid them, unless important). Which is why I say "and that's where we need to stop", because the others aren't. We only see them as unaligned access on ARMv7 because the compiler assumes SCTRL.A being clear and not needing to take alignment care here, but needing to because we set SCTRL.A.
(and I am and will be grateful to people using other architectures than ARM if they decide to suffer some small cost in order to try and catch an issue that might bite us even though it would not bite them.)
Having -munaligned-access in conjunction with SRC.A allows us to trap unintended unaligned accesses, at the only expense of having to make intended ones explicit rather than relying on a compiler option. That is a small cost to us, and a benefit overall.
No. It's causing us to have correct code fail and in some cases add unneeded overhead for all architectures. We don't need to use get/put_unaligned in certain places where, regardless of architecture, things will work correctly.
In the case of the GPT code, I'm still not convinced there's a problem that isn't due to how we're building code that validly describes the standard and how we tell the compiler to build it. It should be __packed to match the on-disk format. Being __packed means the compiler does the right thing about access or it's a compiler bug.
Being packed just tells the compiler not to pad, it does not tell the compiler how to access the unaligned fields; how to access unaligned fields is what -m[no-]unaligned-access is about, which means the decision is in the hands of the developer who builds the code, based on what this developer intends the code to do.
No, what tells the compiler how to access the unaligned fields is -march/-mcpu/-mtune for all architectures as that informs gcc of lots of things about the architecture.
[1]: https://www.kernel.org/doc/Documentation/unaligned-memory-access.txt

Hi Tom,
On Fri, 21 Feb 2014 08:38:18 -0500, Tom Rini trini@ti.com wrote:
And as an aside, building with -Wcast-align shows some places we _may_ need to investigate.
Thanks for the suggestion. If investigation shows it to be a good sign of some alignment issue, then maybe we could add it systematically if it does not adversely affect build or run performance.
The problem is it's going to give a _lot_ of false positives. One of the things we've got right now with W= thanks to Kbuild, and W=2 turns it on. It's not in the default list for the kernel because there's many cases of "but we know this is safe after investigation" and there's (apparently) not an easy way to annotate things.
Drat.
[snip]
+ii) In order to make sure the following is self-sufficient, it goes +through the basics of alignment and assumes only good, not expert, +knowledge of the C language.
+1. C99 alignment requirements
+The C99 standard [1] (henceforth: 'C99') defines alignment requirements +as a "requirement that objects of a particular type be located on +storage boundaries with addresses that are particular multiples of a +byte address".
+In C99, unaligned accesses (those which which do not respect alignment +requirements of the object type being accessed) are deemed 'undefined'. +This means programs which contain unaligned accesses might build and +execute as if there were no alignment constraints, or build and execute +but not as expected, or build and crash at execution, or not build +at all--or even crash the compiler.
OK.
+2. Implementation alignment requirements
+While C99 does define alignment requirements, it does not lay out any +actual alignment requirements, because these depend greatly on the +hardware involved; they are thus defined by C99 implementations.
+For ARM, the C alignment requirements are laid out in the ARM EABI [2]. +For instance, is is the ARM EABI which sets the alignment constraint of +type 'long' to four-byte boundaries.
+Alignment requirements for a given architecture may differ from +hardware capabilities, i.e. they might be stricter than what the +hardware can actually do. One example is (32-bit) x86, which can do +unaligned accesses at the hardware level at some performance cost, but +has stricter requirement analogous to the ARM EABI ones. ARM is a mixed +bag: some older ARM hardware cannot perform unaligned access at all; +some can but at a cost; some can at virtually no cost.
+Further, even when a given architecture is capable of emitting such +unaligned accesses at the core or CPU level, at a higher (system) level +they might be forbidden because the target address falls within a +region in which only aligned accesses are possible.
The compiler must make reasonable decisions based on what it knows about a particular architecture and ABI based on what it can know. The developer must make correct decisions based on things the compiler cannot know such as memory region constraints.
OK -- is this an addition suggestion?
Assuming we don't go for a more direct copy of the kernel doc, yes.
The kernel doc teaches developers how to specify wanted unaligned accesses at source code level. I want our doc to also teach its readers what our policy is regarding wanted *and* *unwanted* unaligned accesses, how it is implemented, and why; we can refer the kernel doc for some of this content, but not for all if it.
+3. Native vs emulated unaligned accesses.
+There is a different between the alignment of accesses in a C program +source code and the alignment of accesses at the core or CPU level. In +the following, translated accesses (core or CPU accesses) will be +qualified as /native/ accesses to distinguish them from (untranslated) C +program accesses.
+A C99 implementation might translate an unaligned access into a native +unaligned access, or it might emulate it by a combination of native +aligned accesses.
This isn't relevant, given the contraint on the compiler to do sane things for a given architecture.
"Sane things", yes; but not one sane thing only: for instance, both options are sane with an architecture where native unaligned accesses are possible at some cost, possibly without one option being clearly better than the other" so there is a choice here.
So I think this is relevant here, in order to inform the reader that there are options in the way unaligned accesses should be translated, and that this option might have to be decided on a case-by-case, or even access-by-access, basis.
I probably should make that explicit in the text.
Well, the thing is we inform the compiler of the architecture and additional constraints and it does the correct thing, that needs to be clear.
Yes, but again, "the correct thing" sounds like there is only one obviously correct thing to do and the others are incorrect, which is wrong: there are several options and which one is best (as opposed to correct) requires a choice based on a criterion.
[sections 4 and 5 to be merged]
Oh my. We've got fs/fdos/ that's (aside from automated things) untouched since 2003 and not normally built. fs/fat/ is where everyone goes for FAT code today.
Yes, I saw the removal patch. I will use another example then. :)
But here's the non-problem, demonstrated with the 'gpt write' code on an armv5te target. Having described a struct to the compiler, and given the compiler knows architecture aligment requirements it will either go: (a) The struct is not packed, invisble padding time (b) The struct is packed, this is unaligned, do it correctly.
I'd say (b) is "do it as instructed" rather than "do it correctly", precisely because options exist to control how unaligned accesses go.
Keep in mind -mno-unaligned-access is an ARM thing, for all other architectures it just does the right thing.
Well, yes, -m[no-]unalined-accesses is ARM-specific (and this patch is for an ARM-specific document even though parts of it are more general).
But again, "the right thing" does not exist per se: in ARM, like on PowerPC if I'm not mistaken, "the right thing" could be either to allow native unaligned accesses or to emulate unaligned accesses through smaller native ones, depending on what the goal is.
Apart from this, I agree about the packed vs non-packed behavior -- which of course is not specific to ARMv5te or even ARM. It is pretty much a de facto standard although to be complete, (b) does not conform to C99 per se.
The compiler must, for a given ABI and C standard make things correct. That includes invisible padding of structs for alignment requirements and dealing with unaligned access correctly.
However, ARMv5TE is not able to do native unaligned accesses, and the following does not seem specific to ARMv5TE so I am not sure why ARMv5TE specifically matters here.
It matters here because I took this "bad" code and ran it on an ARMv5TE board without issue, because the code is not bad, it's correct. And it's going to be correct everywhere on any architecture.
Which code do you mean? The code below, which generates then traps unaligned accesses? If that's what you mean, then no, this code is not valid since it is not C99 conformant - as any code which does unaligned accesses.
+6. Unintended unaligned accesses in U-Boot
We indeed don't want to allow actually bad things to be written and work. On processors such as ARM where we may, or may not, do a silent fixup, we choose not to do a silent fixup and instead fail. For example, the following blows right up as we want it to today, and continues to blow up with the patch I sent where we send -mno-unaligned-access. It _only_ works if we cleared the SCTRL.A bit which _we_do_not_want_.
diff --git a/common/cmd_misc.c b/common/cmd_misc.c index 93f9eab..fa4a146 100644 --- a/common/cmd_misc.c +++ b/common/cmd_misc.c @@ -38,6 +38,18 @@ U_BOOT_CMD( " - delay execution for N seconds (N is _decimal_ !!!)" );
+static int do_unaligned(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[]) +{
- unsigned char buf[8] = { 0 };
- return *(int*)((unsigned long)buf | 1);
+}
+U_BOOT_CMD(
- unaligned, 1, 1, do_unaligned,
- "Make some unaligned accesses happen",
- "Make some unaligned accesses happen"
+);
#ifdef CONFIG_CMD_TIMER static int do_timer(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[]) {
Agreed so far: this code is an example of usually unpredictable run-time unaligned access.
Right. And part of my point in showing the above is that after adding -mno-unaligned-access to armv7 builds we're _still_ crashing like we want when people do things that aren't portable.
We're still crashing for that scenario indeed, and yes, unaligned run-time pointer dereferencement is unportable; but to be fair, so is accessing unaligned fields in packed structs. It is not in C99.
And that's where we need to stop, more or less. The above is the type of unaligned access problem we care about because it fails, everywhere _unless_ we fix it up, and we don't want to. All of the problems we're encountering now are either:
- Self inflicted (due to the mismatch in what the compiler requires and what we've not done)
- Potentially abusing the packed attribute.
Indeed, and "that's where we need to stop" is where we disagree, the point of disagreement being that unpredictable run-time unaligned accesses would be "the type of unaligned access problem we care about because it fails everywhere", if by this you meant "the *only* type".
I don't only care about code which fails everywhere; I care about code which I can tell will fail elsewhere even though it "works for me", whether "me" means ARMv7 or ARM in general, or any other architecture for that matter.
Right. But you're catching code that you can tell won't fail elsewhere and claiming that it will fail. The reason you can tell that it won't fail is that we aren't doing pointer math or other tricky things. We're doing well annotated plain old C.
I understand what you are saying, and I would agree if I shared the assumptions it is based upon, which I don't. Here are these assumptions:
- it wrongly assumes that all architectures and systems properly handle accesses to unaligned struct fields when these accesses are not explicitly specified to be unaligned. Some simply won't. Some even will seem to tolerate unaligned accesses but perform them *wrong* -- I've been personally bitten by this kind of issue. ARM is not the only case where Linux suggests the use of get_unaligned() and put_unaligned(), and that's because ARM is not the only arch where the compiler might not "do the right thing" all the time and we have to tell it what to do on a case by case basis.
- it wrongly assumes that the only type of unaligned access we care about is caused by bad pointer arithmetic. As I said, we don't simply care about what will always fail. We care about what is likely to fail.
- it wrongly assumes packing structs is conformant. The 'packed' attribute is not even mentioned in C99. C99 never says unaligned access should or will work. By using the 'packed' attribute, we may be doing plain old C, but we're not doing conformant C. As soon as we let a C program perform implicit unaligned accesses (as opposed to explicitly using put_unaligned and get_unaligned), we're in dragon territory.
Lets quote the kernel doc[1]: "Another point worth mentioning is the use of __attribute__((packed)) on a structure type. This GCC-specific attribute tells the compiler never to insert any padding within structures, useful when you want to use a C struct to represent some data that comes in a fixed arrangement 'off the wire'.
BTW, note: GCC-spectific attribute -- not 'plain old C'. But granted, many compilers provide a similar option to pack structs, and compilers always know how to translate access to unaligned fields in packed structs, but...
You might be inclined to believe that usage of this attribute can easily lead to unaligned accesses when accessing fields that do not satisfy architectural alignment requirements. However, again, the compiler is aware of the alignment constraints and will generate extra instructions to perform the memory access in a way that does not cause unaligned access. Of course, the extra instructions obviously cause a loss in performance compared to the non-packed case, so the packed attribute should only be used when avoiding structure padding is of importance."
... this only describes one way the compiler can translate such accesses. There are others, and as the compiler knows several ways, there is a choice. And as there is a choice, ther has to be a criterion for choosing -- my criterion being to use the option to catch potential issues rather than not.
Which is the rule we need to be following, and I'm sure we can find cases where we aren't.
I do agree that we must follow a rule which mandates emulating unaligned accesses. And as far as I can tell, we do, even though we enable -munaligned-access.
In other words, this is code that at first glance my look like it will generate unaligned access but in fact it never generate unaligned access, it will generate potentially slow access (which is why we should avoid them, unless important). Which is why I say "and that's where we need to stop", because the others aren't. We only see them as unaligned access on ARMv7 because the compiler assumes SCTRL.A being clear and not needing to take alignment care here, but needing to because we set SCTRL.A.
Sorry, I don't really get what "the others aren't" means exactly.
(and I am and will be grateful to people using other architectures than ARM if they decide to suffer some small cost in order to try and catch an issue that might bite us even though it would not bite them.)
Having -munaligned-access in conjunction with SRC.A allows us to trap unintended unaligned accesses, at the only expense of having to make intended ones explicit rather than relying on a compiler option. That is a small cost to us, and a benefit overall.
No. It's causing us to have correct code fail and in some cases add unneeded overhead for all architectures. We don't need to use get/put_unaligned in certain places where, regardless of architecture, things will work correctly.
Code which traps with -munaligned-access and SRC.A set is code which is not correct, since the native unaligned access which was trapped reflects an unaligned access within the C source code, which is not allowed as per C99. If that access had been performed through get_ or put_unaligned, then the code would be conformant and wouldn't trap.
In the case of the GPT code, I'm still not convinced there's a problem that isn't due to how we're building code that validly describes the standard and how we tell the compiler to build it. It should be __packed to match the on-disk format. Being __packed means the compiler does the right thing about access or it's a compiler bug.
Being packed just tells the compiler not to pad, it does not tell the compiler how to access the unaligned fields; how to access unaligned fields is what -m[no-]unaligned-access is about, which means the decision is in the hands of the developer who builds the code, based on what this developer intends the code to do.
No, what tells the compiler how to access the unaligned fields is -march/-mcpu/-mtune for all architectures as that informs gcc of lots of things about the architecture.
Yes, but I think -march/-mcpu/-mtune take us away from the subject, in that they just set the default for -m[no-]unaligned-access which is the option controlling how unaligned accesses are translated into native accesses, and which can be set regardless of -march/-mcpu/-mtune.
BTW, the mere existence of -m[no-]unaligned-access shows i) that there is not one, but two "right things" the compiler can do when translating unaligned accesses into native accesses, and ii) that the default one is not ncessarily the right one.
Again -- I do insist -- I understand your viewpoint, which is that compilers can always be made to produce correct code from packed structs with unaligned fields, therefore such unaligned fields are not a problem and we should just let the compilers deal with it by passing them adequate options, which in the case of ARM means pasing gcc "-mno-unaligned-access" at least when it is not the default already.
And I would agree with this viewpoint if I did not want us to catch unaligned accesses in the source code which we don't know about yet.
My viewpoint is that not all unaligned accesses are wanted, and we should catch those that we did not and either remove them or make them explicit. And I think that is the gist of the kernel doc -- if relying on the compiler was enough, put/get_unaligned would not be suggested.
Now maybe another way to look at it is to compare the pros and cons of "enabling -m[no-]unaligned-code generally" vs "not enabling it unless we have no other choice for a give file" which is the current choice.
Pros of -mno-unaligned-access
- no case-by-case choice: all code is -mno-unaligned-access. - no risk of a build-time unaligned access causing a data abort. - no need to use get/put_unaligned, the compiler does the emulation.
Cons of -mno-unaligned-access
- performance hit (but unaligned accesses are infrequent). - new and possibly unwanted unaligned accesses are not detected.
Pros of -munaligned-access
- unwanted unaligned accesses are caught at run time.
Cons of -munaligned-access
- wanted unaligned accesses must use get_unaligned/put_unaligned (but that's what the kernel doc asks for anyway). - some files don't use get/put_unaligned as they should, and must thus be compiled on a case-by-case basis with -mno-unaligned-access.
Anyone feel free to correct/add to these cons/pros.
The way I see it, if we follow the kernel doc advice of always making unaligned accesses explicit in the source code (and we should), then -munaligned-access only has advantages over -mno-unaligned-access, because both behave the same way for wanted unaligned accesses, but -munaligned-access will catch unwanted ones whereas -mno-unaligned-access won't.
IOW, when does -munaligned-access bother us? When we have code which performs unaligned accesses that we have not looked at and saud we wanted tt that's the GPT case for instance. Piotr's patch says that we want these unaligned accesses, and then -munaligned-access does not bother us any more.
Thanks for you feedback
Amicalement,

On Sat, 22 Feb 2014 13:17:50 +0100, Albert ARIBAUD albert.u.boot@aribaud.net wrote:
Code which traps with -munaligned-access and SRC.A set is code which is not correct, since the native unaligned access which was trapped reflects an unaligned access within the C source code, which is not allowed as per C99. If that access had been performed through get_ or put_unaligned, then the code would be conformant and wouldn't trap.
To be more exact: ... then all accesses in the code would be conformant and wouldn't trap (the code as a whole would quite probably still not be conformant because of the way the unaligned access would be rewritten into aligned accesses by the get/put_unaligned macro).
Amicalement,

On Sat, Feb 22, 2014 at 01:17:50PM +0100, Albert ARIBAUD wrote:
[snip]
Again -- I do insist -- I understand your viewpoint, which is that compilers can always be made to produce correct code from packed structs with unaligned fields, therefore such unaligned fields are not a problem and we should just let the compilers deal with it by passing them adequate options, which in the case of ARM means pasing gcc "-mno-unaligned-access" at least when it is not the default already.
OK, good. I made a mistake when I accepted the ARM change that moved us away from -mno-unaligned-access + -march=armv7 (or armv6 but we don't have that tune atm). To be clear, our project policy is not a strict C99 implementation. We use that to guide us in many cases (and frankly, we may let C11 guide us as needed and things mature). We rely upon certain gcc behaviors. In this case we rely on access to packed structs and unaligned members working seamlessly. ARM must be corrected to work here as all other architectures do.
And I would agree with this viewpoint if I did not want us to catch unaligned accesses in the source code which we don't know about yet.
This is wrong. Every case we see on ARMv7 now is the case of "compiler was told unaligned access needs no special handling, perform no special handling here". For _all_ architectures gcc would look at this code and then based on what it's been told about CPU capabilities make a decision that it believes would not lead to a fault.
My viewpoint is that not all unaligned accesses are wanted, and we should catch those that we did not and either remove them or make them explicit. And I think that is the gist of the kernel doc -- if relying on the compiler was enough, put/get_unaligned would not be suggested.
No, this isn't quite right. The gist of the kernel doc is that there are some cases where the compiler cannot know if the addresses will be aligned or not, and _those_ require special care. Most cases however the compiler does take care to align pointers or break down the accesses when it can know things are unaligned.
[snip]
The way I see it, if we follow the kernel doc advice of always making unaligned accesses explicit in the source code (and we should), then -munaligned-access only has advantages over -mno-unaligned-access, because both behave the same way for wanted unaligned accesses, but -munaligned-access will catch unwanted ones whereas -mno-unaligned-access won't.
This glosses over the very important part where, in short, "__packed structs may look dangerous at first glance, but they aren't", with respect to the constraints the kernel (and ourself) place upon the compiler.
participants (2)
-
Albert ARIBAUD
-
Tom Rini