
Dear Wolfgang,
On Thursday, 5 May 2011, 07:32:20, Wolfgang Denk wrote:
In message 201105030848.17576.alexander.stein@systec-electronic.com you wrote:
This specific version was selected due to relocation problems on ARM. But I don't expect the dcache to have much influence on that part of the code, as the environment is already in RAM.
Your expectation is most likely completely wrong. Reading from / writing to uncached RAM is painfully slow compared to a system with caches turned on. And if you - as I speculate - need to checksum a huge amount of data, this will delay things without need.
Are you also still using the old environment code in your port, or the new, hash-table-based one? When using the old code, there are additional penalties for a needlessly big environment, as each call to setenv() recalculates the checksum.
I dug into this problem for a short while. And yes, the CRC checksum calculation takes about 25 ms per run. setenv() is called once each for stdin, stdout and stderr, which sums up to ~75 ms. So you're right, this is the old environment code. A dcache will of course speed up this part. But our standard startup just starts U-Boot, copies the Linux kernel into RAM and starts it. The dcache is of little use during that copy.
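The following is a minimal host-side sketch of why every setenv() is expensive with the old code. It assumes a flat environment image with a CRC stored in front of the data. The names `struct env_image`, `setenv_flat()`, `crc_bytes` and the 128 KiB size are hypothetical and are not U-Boot's actual code. The point it makes is that any change, however small, triggers a CRC over the full data area. Three calls for stdin, stdout and stderr therefore checksum the whole environment three times.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size; the thread only says "> 100 KiB". */
#define ENV_SIZE (128u * 1024u)

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, zlib-compatible). */
static uint32_t crc32_bitwise(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical flat image: CRC followed by "name=value\0...name=value\0\0". */
struct env_image {
    uint32_t crc;
    unsigned char data[ENV_SIZE];
};

/* Instrumentation for this sketch only: total bytes fed to the CRC. */
static size_t crc_bytes;

/* Old-style setenv (simplified): append the pair, then re-checksum the
 * WHOLE data area no matter how small the change was. Removing an earlier
 * definition of the same name is omitted for brevity. */
static int setenv_flat(struct env_image *env, const char *name, const char *value)
{
    assert(env != NULL && name != NULL && value != NULL);
    size_t pos = 0;
    size_t nlen = strlen(name), vlen = strlen(value);

    /* Walk the NUL-separated entries up to the terminating empty string. */
    while (pos < ENV_SIZE && env->data[pos] != '\0')
        pos += strlen((const char *)env->data + pos) + 1;

    /* Need room for "name=value\0" plus the final '\0'. */
    if (pos + nlen + 1 + vlen + 1 + 1 > ENV_SIZE)
        return -1;

    memcpy(env->data + pos, name, nlen);
    env->data[pos + nlen] = '=';
    memcpy(env->data + pos + nlen + 1, value, vlen);
    env->data[pos + nlen + 1 + vlen] = '\0';
    env->data[pos + nlen + 1 + vlen + 1] = '\0';

    /* The expensive part: the full ENV_SIZE is checksummed on every call. */
    env->crc = crc32_bitwise(0, env->data, ENV_SIZE);
    crc_bytes += ENV_SIZE;
    return 0;
}
```

With the hypothetical 128 KiB size, the three console assignments push 384 KiB through the CRC even though only a few dozen bytes changed.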
(III) you are running on a narrow system bus (16 bit) with non-optimal RAM timings;
It uses a 32-bit RAM bus, so no.
And your NOR flash?
It is connected with 16 bits, which is all most devices support, but it is set up to use page read mode.
And your memory timings?
Should be pretty good.
(IV) you do all this with caches turned off;
The dcache should be off, while the icache is on. So yes and no.
DC off makes things awfully slow. See the comments of commits c3330e9, 95c6f6d and 7e4a9e6: for plain RAM-bound operations like copying/uncompressing an image from RAM to RAM, switching on the DC can accelerate the system by a factor of up to 15 or more.
Yes, for RAM-to-RAM operations the dcache will help a lot. But we neither copy from RAM to RAM nor uncompress anything.
(V) you measure some numbers but you don't understand what they mean.
These numbers show me that this part of the code increases the start time by a considerable amount.
You don't even understand that you have > 100 KiB of environment which gets checksummed without need.
Mh, this might be an option for future ports.
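The cost of each run is roughly proportional to the number of bytes checksummed, so a back-of-the-envelope estimate shows what shrinking the environment would buy. This sketch assumes a purely linear model. It also assumes a hypothetical 128 KiB environment measured at 25 ms per run. The 16 KiB alternative is likewise only an example, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Throughput implied by one measured CRC run (linear model assumed). */
static double throughput_bytes_per_ms(size_t env_size, double measured_ms)
{
    assert(measured_ms > 0.0);
    return (double)env_size / measured_ms;
}

/* Estimated total time spent in CRC runs for `calls` setenv() invocations,
 * each of which checksums the full environment of `env_size` bytes. */
static double crc_cost_ms(size_t env_size, double bytes_per_ms, unsigned calls)
{
    assert(bytes_per_ms > 0.0);
    return (double)calls * (double)env_size / bytes_per_ms;
}
```

Under these assumptions, the three console setenv() calls cost about 75 ms with 128 KiB and about 9.4 ms with 16 KiB.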
Fact is, the code that you claim takes 100 (or 500) ms to run has no potential for such a long run time unless your system is seriously misconfigured. I guess it runs at least 100 times faster on all systems I have access to.
Well, as already said, this is related to the CRC calculation of the environment. I did a quick port to v2011.03 and setenv is a lot faster there, which is due to the new env code base. But I also noticed that kernel_entry is called about 30 ms later after reset than with the old code base. I didn't investigate further to see what caused this. But AFAICS the new U-Boot code doesn't enable the dcache on ARM1136 either.
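The new code is faster because setenv() becomes a hash-table update instead of a full re-checksum. As far as I understand it, U-Boot's newer environment code is built around a re-entrant variant of the hsearch interface (hsearch_r). The CRC is then only needed when the table is exported for saveenv. The following is a minimal sketch using the plain POSIX hcreate()/hsearch() calls. The wrappers `env_init()`, `env_set()` and `env_get()` are hypothetical and are not U-Boot's API.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <search.h>
#include <stddef.h>
#include <string.h>

/* Create the (process-wide) POSIX hash table with room for nel entries. */
static int env_init(size_t nel)
{
    return hcreate(nel) != 0 ? 0 : -1;
}

/* Hash-table setenv: touches one entry, no checksum over the environment.
 * POSIX hsearch() stores the key pointer, so name/value must stay valid. */
static int env_set(char *name, char *value)
{
    ENTRY item = { .key = name, .data = value };
    ENTRY *ep = hsearch(item, ENTER);
    if (ep == NULL)
        return -1;        /* table full */
    ep->data = value;     /* ENTER returns the existing entry if name exists */
    return 0;
}

/* Look up a variable; returns NULL if it is not set. */
static const char *env_get(char *name)
{
    ENTRY item = { .key = name, .data = NULL };
    ENTRY *ep = hsearch(item, FIND);
    return ep != NULL ? ep->data : NULL;
}
```

Each env_set() costs one hash lookup regardless of how large the rest of the environment is. This is consistent with setenv being much faster after the port to v2011.03.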
Regards, Alexander