
Dear Albert,
On Wed 21 Sep 2011 14:03:09 CEST, Albert ARIBAUD wrote:
On 21/09/2011 13:20, Andreas Bießmann wrote:
Dear "GROYER, Anthony", Dear Albert,
On Wed 21 Sep 2011 12:51:33 CEST, Albert ARIBAUD wrote:
On 21/09/2011 11:29, GROYER, Anthony wrote:
<snip>
- replace use of r9-r10 with e.g. r10-r11 in the copy loop, to
preserve r9 during relocation.
If someone is changing this code, I would like to discuss another point here. In my last relocation changeset I found an implementation in a/a/c/pxa/start.S that saves registers to the stack before copy_loop, uses almost all registers in the loop (only r8, which holds gd_t on ARM, is left out, but I think it could be used here too because it is saved on the stack) and restores the registers afterwards. I guess this could speed up copy_loop a bit, but that needs to be proven. Anthony, if you change all the start.S files, could you consider this as well?
I am not 100% sure I get your point, but I assume that you are asking for *removal* of the saving and restoring, right?
No, that was not the point. I think 'saving registers before copy_loop so that more registers are available to the ldmia/stmia instructions' is a good approach that could speed up copy_loop for all ARM implementations.
I would tend to agree that saving and restoring registers in relocate_code is moot, as this function does not return in the usual sense.
No, the code saves the registers before copy_loop and restores them right afterwards. Therefore even r8 could be used in copy_loop, because it is preserved on the (newly created) stack. Have a look at a/a/c/pxa/start.S from line 241 (relocate_code) to line 263 (end of copy_loop). I guess the ldmia/stmia instructions could even use r3-r11; only r0-r2 need to be preserved for loop control. I wonder whether this would improve copy_loop ... I will try to test it in the next few days, unless someone else can do it (Anthony?).
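The wider-burst idea can be modeled in C as a rough sketch of the data flow only. This is not the pxa start.S code: the function name `copy_loop_model` is hypothetical, and treating the three pointer parameters as r0 (source), r1 (destination) and r2 (end) is an assumption about the register roles. Each iteration loads eight words into locals and then stores them, mirroring one wide `ldmia`/`stmia` pair such as `{r3-r10}`. The trailing word-by-word loop is an addition of this sketch; a pure fixed-chunk assembly loop would instead have to assume the copied length is a multiple of the chunk size.

```c
#include <stdint.h>

/*
 * Hypothetical C model of a relocation copy loop (not the pxa start.S code).
 * src plays the role of r0, dst of r1 and end of r2 (assumed roles).
 */
static void copy_loop_model(const uint32_t *src, uint32_t *dst,
                            const uint32_t *end)
{
    /* Main loop: move 8 words per iteration, like
     * ldmia r0!, {r3-r10} followed by stmia r1!, {r3-r10}. */
    while (end - src >= 8) {
        uint32_t w0 = src[0], w1 = src[1], w2 = src[2], w3 = src[3];
        uint32_t w4 = src[4], w5 = src[5], w6 = src[6], w7 = src[7];

        dst[0] = w0; dst[1] = w1; dst[2] = w2; dst[3] = w3;
        dst[4] = w4; dst[5] = w5; dst[6] = w6; dst[7] = w7;

        src += 8;
        dst += 8;
    }

    /* Tail: copy any remaining words one at a time (added by this sketch). */
    while (src < end)
        *dst++ = *src++;
}
```

Whether more registers per burst actually makes the loop faster depends on the core and its memory system, which is exactly what still needs to be measured on real ARM hardware.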
As for r8, it should be preserved as it points to gd, but that is ensured by the C code already IIRC.
We use -ffixed-r8, so the compiler takes care of this in the C code, but we need to respect it in asm.
Well, if we save r8 before copy_loop and restore it right afterwards, we could use it in copy_loop for copying, because copy_loop never dereferences r8.
best regards
Andreas Bießmann