[U-Boot] da850/L138 SPI flash transfer speed

Hello Sekhar,
I am working on reducing boot time on an L138 EVM, and SPI flash transfer speed is currently the worst offender. U-Boot transfers from the SPI flash at 0.6 Mbytes/s, which is a lot slower than I would expect for a 50 MHz SPI clock. Using a scope we found that the chip select is active throughout the transfer (as expected); we see ~160 ns bursts of activity on the clock line for each byte transferred (8 bits @ 50 MHz) with 1 us idle periods in between. Where does the 1 us delay between byte transfers come from? Is reading data bytes from the SPI registers very slow, or is writing to RAM one byte at a time slowing the transfer?
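As a rough cross-check of the scope figures above, the per-byte timing can be turned into a throughput estimate. This is only a sketch: the function name is hypothetical, and the inputs are the approximate scope readings (160 ns of clocking, 1 us of idle time per byte).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes per second for a transfer in which every byte costs
 * active_ns of clocking plus idle_ns of dead time on the bus.
 * Hypothetical helper, not part of U-Boot.
 */
static uint32_t spi_bytes_per_second(uint32_t active_ns, uint32_t idle_ns)
{
    uint32_t period_ns = active_ns + idle_ns;

    assert(period_ns > 0); /* avoid dividing by zero */
    return 1000000000u / period_ns;
}
```

With these inputs, spi_bytes_per_second(160, 0) gives 6250000 (6.25 Mbytes/s, the back-to-back limit at 50 MHz), while spi_bytes_per_second(160, 1000) gives 862068, which is the same order of magnitude as the measured 0.6 Mbytes/s. The idle gap accounts for roughly 86% of the time spent on each byte.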
Reading the source I can see that FAST_READ is being issued to the SPI flash and, unless I am missing something, there shouldn't be a delay between byte transfers. Looking at the spi_xfer() function in drivers/spi/davinci_spi.c and the L138 SPI module documentation, I can think of the following improvements: call spi_readl(ds, BUF) only once per byte transfer; take advantage of the tx/rx buffers for pipelining; write received data to RAM 32 bits at a time instead of 1 byte at a time. Do any of these improvements go in the right direction?
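The first idea (one BUF read per byte) could look roughly like the sketch below. It runs against a simulated register rather than real hardware: struct fake_spi, fake_read_buf() and read_bytes_single_access() are hypothetical names, and the RXEMPTY bit position is an assumption to be checked against the SPI module documentation. The point is only that the value used to test the "empty" flag is the same value the data byte is taken from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed position of the "receive buffer empty" flag in BUF. */
#define SPIBUF_RXEMPTY (1u << 31)

/* Hypothetical stand-in for the SPI module: a simulated BUF register. */
struct fake_spi {
    const uint8_t *flash;    /* bytes the simulated flash returns */
    size_t pos;              /* next byte to deliver */
    unsigned int polls;      /* empty reads seen before each byte */
    unsigned int countdown;  /* empty reads left for the current byte */
    unsigned long buf_reads; /* total number of register reads */
};

/* Simulated register read: counts accesses, reports "empty" while waiting. */
static uint32_t fake_read_buf(struct fake_spi *s)
{
    s->buf_reads++;
    if (s->countdown > 0) {
        s->countdown--;
        return SPIBUF_RXEMPTY;
    }
    s->countdown = s->polls;
    return s->flash[s->pos++];
}

/* Poll and consume with the same read: no second access to fetch the data. */
static void read_bytes_single_access(struct fake_spi *s, uint8_t *rx, size_t len)
{
    assert(rx != NULL);
    for (size_t i = 0; i < len; i++) {
        uint32_t v;

        do {
            v = fake_read_buf(s);
        } while (v & SPIBUF_RXEMPTY);
        rx[i] = (uint8_t)(v & 0xff);
    }
}
```

When the data is already available on the first poll (polls set to 0), this costs exactly one register read per byte, which matters if each uncached register access is slow.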
Thank you -- Delio

Dear Delio Brignoli,
please mind the NetiQuette and restrict your line length to some 70 characters or so. Thanks.
In message 4D573595-069A-4490-AF2D-38ED3AAD70EB@audioscience.com you wrote:
I am working on reducing boot time on an L138 EVM, and SPI flash transfer speed is currently the worst offender. U-Boot transfers from the SPI flash at 0.6 Mbytes/s, which is a lot slower than I would expect for a 50 MHz SPI clock. Using a scope we found that the chip select is active throughout the transfer (as expected); we see ~160 ns bursts of activity on the clock line for each byte transferred (8 bits @ 50 MHz) with 1 us idle periods in between. Where does the 1 us delay between byte transfers come from? Is reading data bytes from the SPI registers very slow, or is writing to RAM one byte at a time slowing the transfer?
Everything is slow as caches are not enabled.
Best regards,
Wolfgang Denk

Hello Wolfgang,
On 24/04/2010, at 10:29 AM, Wolfgang Denk wrote:
please mind the NetiQuette and restrict your line length to some 70 characters or so. Thanks.
Will do, thanks.
Everything is slow as caches are not enabled.
OK, so reducing the number of reads from registers and writes to RAM should improve performance. To your knowledge, would enabling the cache for davinci da850 break anything in U-Boot?
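To illustrate the "fewer writes to RAM" part, here is a minimal sketch that packs four received bytes into one 32-bit store. The function name is hypothetical, the source array stands in for bytes already read from the SPI module, and the byte order assumes a little-endian CPU so that the resulting memory layout matches byte-wise writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pack received bytes into 32-bit words and store each word once.
 * Byte 0 goes into the least significant bits (little-endian layout
 * is assumed). Returns the number of 32-bit stores performed.
 */
static size_t store_rx_as_words(uint32_t *dst, const uint8_t *rx, size_t len)
{
    size_t stores = 0;

    /* Sketch only: a trailing partial word is not handled. */
    assert(len % 4 == 0);
    for (size_t i = 0; i < len; i += 4) {
        uint32_t word = (uint32_t)rx[i]
                      | ((uint32_t)rx[i + 1] << 8)
                      | ((uint32_t)rx[i + 2] << 16)
                      | ((uint32_t)rx[i + 3] << 24);

        dst[stores++] = word;
    }
    return stores;
}
```

For 8 received bytes this performs 2 stores instead of 8. A real implementation would also need to cope with unaligned destination buffers and lengths that are not a multiple of four.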
Best regards -- Delio

Dear Delio Brignoli,
In message ADFB8F9F-6368-4DEB-A994-E3F49C8F0620@audioscience.com you wrote:
OK, so reducing the number of reads from registers and writes to RAM should improve performance. To your knowledge, would enabling the cache for davinci da850 break anything in U-Boot?
No, except that it should be done consistently for all ARM processors.
Best regards,
Wolfgang Denk

On 24/04/2010, at 11:42 AM, Wolfgang Denk wrote:
In message ADFB8F9F-6368-4DEB-A994-E3F49C8F0620@audioscience.com you wrote:
OK, so reducing the number of reads from registers and writes to RAM should improve performance. To your knowledge, would enabling the cache for davinci da850 break anything in U-Boot?
No, except that it should be done consistently for all ARM processors.
Thank you Wolfgang. I will give the source a closer look to see how similar per-CPU setups are handled in U-Boot.
Kind regards -- Delio

To your knowledge, would enabling the cache for davinci da850 break anything in U-Boot?
No, except that it should be done consistently for all ARM processors.
Which reminds me that I have to post V2 of my cache patch. V1 was sent on 2010-01-26 and a flush was missing. I didn't notice because network download worked fine.
I'll try to respin it as soon as possible.
/alessandro

Hello Delio,
On Sat, Apr 24, 2010 at 05:00:49, Delio Brignoli wrote:
Hello Wolfgang,
On 24/04/2010, at 10:29 AM, Wolfgang Denk wrote:
please mind the NetiQuette and restrict your line length to some 70 characters or so. Thanks.
Will do, thanks.
Everything is slow as caches are not enabled.
OK, so reducing the number of reads from registers and writes to RAM should improve performance. To your knowledge, would enabling the cache for davinci da850 break anything in U-Boot?
It would break the EMAC driver for sure. The driver does not flush/invalidate the buffers, under the assumption that the data cache is kept disabled.
Thanks, Sekhar

On Sat, Apr 24, 2010 at 03:59:22, Wolfgang Denk wrote:
Dear Delio Brignoli,
please mind the NetiQuette and restrict your line length to some 70 characters or so. Thanks.
In message 4D573595-069A-4490-AF2D-38ED3AAD70EB@audioscience.com you wrote:
I am working on reducing boot time on an L138 EVM, and SPI flash transfer speed is currently the worst offender. U-Boot transfers from the SPI flash at 0.6 Mbytes/s, which is a lot slower than I would expect for a 50 MHz SPI clock. Using a scope we found that the chip select is active throughout the transfer (as expected); we see ~160 ns bursts of activity on the clock line for each byte transferred (8 bits @ 50 MHz) with 1 us idle periods in between. Where does the 1 us delay between byte transfers come from? Is reading data bytes from the SPI registers very slow, or is writing to RAM one byte at a time slowing the transfer?
Everything is slow as caches are not enabled.
The only delays being configured in the driver are the chip-select hold time delays which should not matter here as you see delays inserted between bytes which are part of a single transfer. I am starting to doubt peripheral mis-configuration as a possible cause here.
One way to mitigate the slow access to RAM would be to take advantage of external RAM burst by using EDMA. That will be some work because there is no other example of EDMA usage in U-Boot.
Thanks, Sekhar

Hello Sekhar,
Thank you for the advice below and sorry for the late reply; I have been offline for a week and I am catching up with my correspondence only now.
On 26/04/2010, at 13:56, Nori, Sekhar wrote: [...]
The only delays being configured in the driver are the chip-select hold time delays which should not matter here as you see delays inserted between bytes which are part of a single transfer. I am starting to doubt peripheral mis-configuration as a possible cause here.
As you say, chip-select hold time delays should not matter in this case. WDELAY and WDEL (in the SPIFMT and SPIDAT1 registers respectively) control the delay between transmissions, but they are both set to zero (disabled). So I believe slow RAM (and SPI module register) access is the most likely cause.
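A check like the one described above can be written as a small helper operating on raw register values. The field positions used here (WDELAY in bits 29:24 of SPIFMT, WDEL in bit 26 of SPIDAT1) are assumptions and should be verified against the L138 SPI module documentation; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed field positions; verify against the SPI module documentation. */
#define SPIFMT_WDELAY_SHIFT 24
#define SPIFMT_WDELAY_MASK  (0x3fu << SPIFMT_WDELAY_SHIFT)
#define SPIDAT1_WDEL        (1u << 26)

/*
 * True if either the WDELAY count or the WDEL enable bit is non-zero,
 * i.e. if a delay between transmissions may have been programmed.
 */
static bool spi_word_delay_configured(uint32_t spifmt, uint32_t spidat1)
{
    return (spifmt & SPIFMT_WDELAY_MASK) != 0 ||
           (spidat1 & SPIDAT1_WDEL) != 0;
}
```

With both fields at zero, as reported above, spi_word_delay_configured() returns false, which is consistent with the idle gaps coming from somewhere other than the programmed delays.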
One way to mitigate the slow access to RAM would be to take advantage of external RAM burst by using EDMA. That will be some work because there is no other example of EDMA usage in U-Boot.
Thanks -- Delio