[U-Boot] [PATCH] DaVinci: Improve DaVinci SPI speed.

I have updated this patch based on the comments [1] by Wolfgang Denk and removed unused variables. [1][http://lists.denx.de/pipermail/u-boot/2010-May/071728.html]
Reduce the number of reads per byte transferred on the BUF register from 2 to 1 and take advantage of the TX buffer in the SPI module. On LogicPD OMAP-L138 EVM, SPI read throughput goes up from ~0.8Mbyte/s to ~1.3Mbyte/s. Tested with a 2Mbyte image file. Remove unused variables in the spi_xfer() function.
Signed-off-by: Delio Brignoli dbrignoli@audioscience.com Tested-by: Ben Gardiner bengardiner@nanometrics.ca --- drivers/spi/davinci_spi.c | 69 ++++++++++++++++++++++----------------------- 1 files changed, 34 insertions(+), 35 deletions(-)
diff --git a/drivers/spi/davinci_spi.c b/drivers/spi/davinci_spi.c index 60ba007..d7f130d 100644 --- a/drivers/spi/davinci_spi.c +++ b/drivers/spi/davinci_spi.c @@ -131,12 +131,10 @@ int spi_xfer(struct spi_slave *slave, unsigned int bitlen, { struct davinci_spi_slave *ds = to_davinci_spi(slave); unsigned int len, data1_reg_val = readl(&ds->regs->dat1); - int ret, i; + unsigned int i_cnt = 0, o_cnt = 0, buf_reg_val; const u8 *txp = dout; /* dout can be NULL for read operation */ u8 *rxp = din; /* din can be NULL for write operation */
- ret = 0; - if (bitlen == 0) /* Finish any previously submitted transfers */ goto out; @@ -159,41 +157,42 @@ int spi_xfer(struct spi_slave *slave, unsigned int bitlen, readl(&ds->regs->buf);
/* keep writing and reading 1 byte until done */ - for (i = 0; i < len; i++) { - /* wait till TXFULL is asserted */ - while (readl(&ds->regs->buf) & SPIBUF_TXFULL_MASK); - - /* write the data */ - data1_reg_val &= ~0xFFFF; - if (txp) { - data1_reg_val |= *txp; - txp++; + while ((i_cnt < len) || (o_cnt < len)) { + /* read RX buffer and flags */ + buf_reg_val = readl(&ds->regs->buf); + + /* if data is available */ + if ((i_cnt < len) && (buf_reg_val & SPIBUF_RXEMPTY_MASK) == 0) { + /* if there is no read buffer simply ignore the read character */ + if (rxp) + *rxp++ = buf_reg_val & 0xFF; + /* increment read words count */ + i_cnt++; }
- /* - * Write to DAT1 is required to keep the serial transfer going. - * We just terminate when we reach the end. - */ - if ((i == (len - 1)) && (flags & SPI_XFER_END)) { - /* clear CS hold */ - writel(data1_reg_val & - ~(1 << SPIDAT1_CSHOLD_SHIFT), &ds->regs->dat1); - } else { - /* enable CS hold */ - data1_reg_val |= ((1 << SPIDAT1_CSHOLD_SHIFT) | + /* if the tx buffer is empty and there is still data to transmit */ + if ((o_cnt < len) && ((buf_reg_val & SPIBUF_TXFULL_MASK) == 0)) { + /* write the data */ + data1_reg_val &= ~0xFFFF; + if (txp) + data1_reg_val |= *txp++; + /* + * Write to DAT1 is required to keep the serial transfer going. + * We just terminate when we reach the end. + */ + if ((o_cnt == (len - 1)) && (flags & SPI_XFER_END)) { + /* clear CS hold */ + writel(data1_reg_val & + ~(1 << SPIDAT1_CSHOLD_SHIFT), + &ds->regs->dat1); + } else { + /* enable CS hold and write TX register */ + data1_reg_val |= ((1 << SPIDAT1_CSHOLD_SHIFT) | (slave->cs << SPIDAT1_CSNR_SHIFT)); - writel(data1_reg_val, &ds->regs->dat1); - } - - /* read the data - wait for data availability */ - while (readl(&ds->regs->buf) & SPIBUF_RXEMPTY_MASK); - - if (rxp) { - *rxp = readl(&ds->regs->buf) & 0xFF; - rxp++; - } else { - /* simply drop the read character */ - readl(&ds->regs->buf); + writel(data1_reg_val, &ds->regs->dat1); + } + /* increment written words count */ + o_cnt++; } } return 0;

On Tue, Jun 1, 2010 at 7:36 AM, Delio Brignoli dbrignoli@audioscience.com wrote:
I have updated this patch based on the comments [1] by Wolfgang Denk and removed unused variables. [1][http://lists.denx.de/pipermail/u-boot/2010-May/071728.html]
Reduce the number of reads per byte transferred on the BUF register from 2 to 1 and take advantage of the TX buffer in the SPI module. On LogicPD OMAP-L138 EVM, SPI read throughput goes up from ~0.8Mbyte/s to ~1.3Mbyte/s. Tested with a 2Mbyte image file. Remove unused variables in the spi_xfer() function.
Signed-off-by: Delio Brignoli dbrignoli@audioscience.com Tested-by: Ben Gardiner bengardiner@nanometrics.ca
Tested-by: Ben Gardiner bengardiner@nanometrics.ca
Applies cleanly to HEAD (9bb3b3d4406c1e388a99f6fb189147d6a06cc2cf) of git://git.denx.de/u-boot.git and HEAD (5f16b8551b125f16cd8d58f278cb25b94272fd9f) of git://arago-project.org/git/people/sekhar/u-boot-omapl1.git .
I am unable to build an SPI-capable u-boot for the da830 or da850 on git://git.denx.de/u-boot.git; tested with HEAD of git://arago-project.org/git/people/sekhar/u-boot-omapl1.git -- there is a noticeable speed improvement in SPI read speed.

I have updated this patch based on the comments [1] by Wolfgang Denk and removed unused variables. [1][http://lists.denx.de/pipermail/u-boot/2010-May/071728.html]
Reduce the number of reads per byte transferred on the BUF register from 2 to 1 and take advantage of the TX buffer in the SPI module. On LogicPD OMAP-L138 EVM, SPI read throughput goes up from ~0.8Mbyte/s to ~1.3Mbyte/s. Tested with a 2Mbyte image file. Remove unused variables in the spi_xfer() function.
Signed-off-by: Delio Brignoli dbrignoli@audioscience.com Tested-by: Ben Gardiner bengardiner@nanometrics.ca
Checkpatch returned 4 errors.
Please fix and resubmit
Thanks, Sandeep

I have updated this patch based on the comments [1] by Wolfgang Denk and removed unused variables. [1][http://lists.denx.de/pipermail/u-boot/2010-May/071728.html]
Reduce the number of reads per byte transferred on the BUF register from
2
to 1 and take advantage of the TX buffer in the SPI module. On LogicPD OMAP-L138 EVM, SPI read throughput goes up from ~0.8Mbyte/s to ~1.3Mbyte/s. Tested with
a
2Mbyte image file. Remove unused variables in the spi_xfer() function.
Signed-off-by: Delio Brignoli dbrignoli@audioscience.com Tested-by: Ben Gardiner bengardiner@nanometrics.ca
Checkpatch returned 4 errors.
Please fix and resubmit
Thanks, Sandeep
I fixed them myself.
Pushed to u-boot-ti

On 01/06/10 12:36, Delio Brignoli wrote:
I have updated this patch based on the comments [1] by Wolfgang Denk and removed unused variables. [1][http://lists.denx.de/pipermail/u-boot/2010-May/071728.html]
Reduce the number of reads per byte transferred on the BUF register from 2 to 1 and take advantage of the TX buffer in the SPI module. On LogicPD OMAP-L138 EVM, SPI read throughput goes up from ~0.8Mbyte/s to ~1.3Mbyte/s. Tested with a 2Mbyte image file. Remove unused variables in the spi_xfer() function.
Signed-off-by: Delio Brignoli dbrignoli@audioscience.com Tested-by: Ben Gardiner bengardiner@nanometrics.ca
Sorry, I'm a bit late to the party on this.
I have an alternative patch that tries to be even quicker, but I don't have the same platform as Delio, so can't compare like with like.
This diff applies before Delio's patch. If it is any faster I am prepared to create a proper patch. If nobody can test it for speed I'll probably just drop it, since it produces a slightly bigger executable and I don't know that it is actually any faster...
In essence, it splits up read and write operations to avoid testing pointers in every loop iteration. It also unrolls the last iteration so that it doesn't have to test for loop ending twice each time round. Finally it avoids bit setting/clearing on each iteration when the results would only turn out to be the same anyway.
Here's the diff:
diff --git a/drivers/spi/davinci_spi.c b/drivers/spi/davinci_spi.c index 60ba007..a90d2f4 100644 --- a/drivers/spi/davinci_spi.c +++ b/drivers/spi/davinci_spi.c @@ -126,16 +126,98 @@ void spi_release_bus(struct spi_slave *slave) writel(SPIGCR0_SPIRST_MASK, &ds->regs->gcr0); }
-int spi_xfer(struct spi_slave *slave, unsigned int bitlen, - const void *dout, void *din, unsigned long flags) +static inline u8 davinci_spi_read_data(struct davinci_spi_slave *ds, u32 data) +{ + unsigned int buf_reg_val; + /* wait till TXFULL is deasserted */ + while (readl(&ds->regs->buf) & SPIBUF_TXFULL_MASK) + ; + writel(data, &ds->regs->dat1); + + /* read the data - wait for data availability */ + while ((buf_reg_val = readl(&ds->regs->buf)) & SPIBUF_RXEMPTY_MASK) + ; + return buf_reg_val & 0xFF; +} + +static int davinci_spi_read(struct spi_slave *slave, unsigned int len, + u8 *rxp, unsigned long flags) { struct davinci_spi_slave *ds = to_davinci_spi(slave); - unsigned int len, data1_reg_val = readl(&ds->regs->dat1); - int ret, i; - const u8 *txp = dout; /* dout can be NULL for read operation */ - u8 *rxp = din; /* din can be NULL for write operation */ + unsigned int data1_reg_val = readl(&ds->regs->dat1); + + /* do an empty read to clear the current contents */ + (void)readl(&ds->regs->buf); + + /* enable CS hold */ + data1_reg_val |= ((1 << SPIDAT1_CSHOLD_SHIFT) | + (slave->cs << SPIDAT1_CSNR_SHIFT)); + data1_reg_val &= ~0xFFFF; + + /* keep writing and reading 1 byte until only 1 byte left to read */ + while ((len--) > 1) { + *rxp++ = davinci_spi_read_data(ds, data1_reg_val); + }
- ret = 0; + /* + * clear CS hold when we reach the end. + */ + if (flags & SPI_XFER_END) + data1_reg_val &= ~(1 << SPIDAT1_CSHOLD_SHIFT); + + *rxp = davinci_spi_read_data(ds, data1_reg_val); + + return 0; +} + +static inline void davinci_spi_write_data(struct davinci_spi_slave *ds, u32 data) +{ + /* wait till TXFULL is deasserted */ + while (readl(&ds->regs->buf) & SPIBUF_TXFULL_MASK) + ; + writel(data, &ds->regs->dat1); + + /* wait for read data availability */ + while (readl(&ds->regs->buf) & SPIBUF_RXEMPTY_MASK) + ; +} + +static int davinci_spi_write(struct spi_slave *slave, unsigned int len, + const u8 *txp, unsigned long flags) +{ + struct davinci_spi_slave *ds = to_davinci_spi(slave); + unsigned int data1_reg_val = readl(&ds->regs->dat1); + + /* do an empty read to clear the current contents */ + (void)readl(&ds->regs->buf); + + /* enable CS hold */ + data1_reg_val |= ((1 << SPIDAT1_CSHOLD_SHIFT) | + (slave->cs << SPIDAT1_CSNR_SHIFT)); + data1_reg_val &= ~0xFFFF; + + /* keep writing and reading 1 byte until only 1 byte left to write */ + while ((len--) > 1) { + /* write the data */ + davinci_spi_write_data(ds, data1_reg_val | *txp++); + } + + /* + * clear CS hold when we reach the end. + */ + if (flags & SPI_XFER_END) + data1_reg_val &= ~(1 << SPIDAT1_CSHOLD_SHIFT); + + /* write the data */ + davinci_spi_write_data(ds, data1_reg_val | *txp); + + return 0; +} + +int spi_xfer(struct spi_slave *slave, unsigned int bitlen, + const void *dout, void *din, unsigned long flags) +{ + unsigned int len;
if (bitlen == 0) /* Finish any previously submitted transfers */ @@ -155,53 +237,15 @@ int spi_xfer(struct spi_slave *slave, unsigned int bitlen,
len = bitlen / 8;
- /* do an empty read to clear the current contents */ - readl(&ds->regs->buf); - - /* keep writing and reading 1 byte until done */ - for (i = 0; i < len; i++) { - /* wait till TXFULL is asserted */ - while (readl(&ds->regs->buf) & SPIBUF_TXFULL_MASK); - - /* write the data */ - data1_reg_val &= ~0xFFFF; - if (txp) { - data1_reg_val |= *txp; - txp++; - } - - /* - * Write to DAT1 is required to keep the serial transfer going. - * We just terminate when we reach the end. - */ - if ((i == (len - 1)) && (flags & SPI_XFER_END)) { - /* clear CS hold */ - writel(data1_reg_val & - ~(1 << SPIDAT1_CSHOLD_SHIFT), &ds->regs->dat1); - } else { - /* enable CS hold */ - data1_reg_val |= ((1 << SPIDAT1_CSHOLD_SHIFT) | - (slave->cs << SPIDAT1_CSNR_SHIFT)); - writel(data1_reg_val, &ds->regs->dat1); - } - - /* read the data - wait for data availability */ - while (readl(&ds->regs->buf) & SPIBUF_RXEMPTY_MASK); - - if (rxp) { - *rxp = readl(&ds->regs->buf) & 0xFF; - rxp++; - } else { - /* simply drop the read character */ - readl(&ds->regs->buf); - } - } - return 0; + if (din) + return davinci_spi_read(slave, len, din, flags); + else + return davinci_spi_write(slave, len, dout, flags);
out: if (flags & SPI_XFER_END) { - writel(data1_reg_val & - ~(1 << SPIDAT1_CSHOLD_SHIFT), &ds->regs->dat1); + u8 dummy = 0; + davinci_spi_write(slave, 1, &dummy, flags); } return 0; }

On 01/06/10 12:36, Delio Brignoli wrote:
I have updated this patch based on the comments [1] by Wolfgang Denk and removed unused variables. [1][http://lists.denx.de/pipermail/u-boot/2010-May/071728.html]
Reduce the number of reads per byte transferred on the BUF register from
2 to 1 and
take advantage of the TX buffer in the SPI module. On LogicPD OMAP-L138
EVM,
SPI read throughput goes up from ~0.8Mbyte/s to ~1.3Mbyte/s. Tested with
a 2Mbyte image file.
Remove unused variables in the spi_xfer() function.
Signed-off-by: Delio Brignoli dbrignoli@audioscience.com Tested-by: Ben Gardiner bengardiner@nanometrics.ca
Sorry, I'm a bit late to the party on this.
It is late. Pull request already sent to Wolfgang
I have an alternative patch that tries to be even quicker, but I don't have the same platform as Delio, so can't compare like with like.
Compare it on your platform. I believe you have the OMAP L137. And post the results.
This diff applies before Delio's patch. If it is any faster I am prepared to create a proper patch. If nobody can test it for speed I'll probably just drop it, since it produces a slightly bigger executable and I don't know that it is actually any faster...
In essence, it splits up read and write operations to avoid testing pointers in every loop iteration. It also unrolls the last iteration so that it doesn't have to test for loop ending twice each time round. Finally it avoids bit setting/clearing on each iteration when the results would only turn out to be the same anyway.
Here's the diff:
diff --git a/drivers/spi/davinci_spi.c b/drivers/spi/davinci_spi.c index 60ba007..a90d2f4 100644 --- a/drivers/spi/davinci_spi.c +++ b/drivers/spi/davinci_spi.c @@ -126,16 +126,98 @@ void spi_release_bus(struct spi_slave *slave) writel(SPIGCR0_SPIRST_MASK, &ds->regs->gcr0); }
-int spi_xfer(struct spi_slave *slave, unsigned int bitlen,
const void *dout, void *din, unsigned long flags)
+static inline u8 davinci_spi_read_data(struct davinci_spi_slave *ds, u32 data) +{
- unsigned int buf_reg_val;
- /* wait till TXFULL is deasserted */
- while (readl(&ds->regs->buf) & SPIBUF_TXFULL_MASK)
;
- writel(data, &ds->regs->dat1);
- /* read the data - wait for data availability */
- while ((buf_reg_val = readl(&ds->regs->buf)) & SPIBUF_RXEMPTY_MASK)
;
- return buf_reg_val & 0xFF;
+}
+static int davinci_spi_read(struct spi_slave *slave, unsigned int len,
u8 *rxp, unsigned long flags)
{ struct davinci_spi_slave *ds = to_davinci_spi(slave);
- unsigned int len, data1_reg_val = readl(&ds->regs->dat1);
- int ret, i;
- const u8 *txp = dout; /* dout can be NULL for read operation */
- u8 *rxp = din; /* din can be NULL for write operation */
- unsigned int data1_reg_val = readl(&ds->regs->dat1);
- /* do an empty read to clear the current contents */
- (void)readl(&ds->regs->buf);
- /* enable CS hold */
- data1_reg_val |= ((1 << SPIDAT1_CSHOLD_SHIFT) |
(slave->cs << SPIDAT1_CSNR_SHIFT));
- data1_reg_val &= ~0xFFFF;
- /* keep writing and reading 1 byte until only 1 byte left to read */
- while ((len--) > 1) {
*rxp++ = davinci_spi_read_data(ds, data1_reg_val);
- }
- ret = 0;
- /*
* clear CS hold when we reach the end.
*/
- if (flags & SPI_XFER_END)
data1_reg_val &= ~(1 << SPIDAT1_CSHOLD_SHIFT);
- *rxp = davinci_spi_read_data(ds, data1_reg_val);
- return 0;
+}
+static inline void davinci_spi_write_data(struct davinci_spi_slave *ds, u32 data) +{
- /* wait till TXFULL is deasserted */
- while (readl(&ds->regs->buf) & SPIBUF_TXFULL_MASK)
;
- writel(data, &ds->regs->dat1);
- /* wait for read data availability */
- while (readl(&ds->regs->buf) & SPIBUF_RXEMPTY_MASK)
;
+}
+static int davinci_spi_write(struct spi_slave *slave, unsigned int len,
const u8 *txp, unsigned long flags)
+{
- struct davinci_spi_slave *ds = to_davinci_spi(slave);
- unsigned int data1_reg_val = readl(&ds->regs->dat1);
- /* do an empty read to clear the current contents */
- (void)readl(&ds->regs->buf);
- /* enable CS hold */
- data1_reg_val |= ((1 << SPIDAT1_CSHOLD_SHIFT) |
(slave->cs << SPIDAT1_CSNR_SHIFT));
- data1_reg_val &= ~0xFFFF;
- /* keep writing and reading 1 byte until only 1 byte left to write
*/
- while ((len--) > 1) {
/* write the data */
davinci_spi_write_data(ds, data1_reg_val | *txp++);
- }
- /*
* clear CS hold when we reach the end.
*/
- if (flags & SPI_XFER_END)
data1_reg_val &= ~(1 << SPIDAT1_CSHOLD_SHIFT);
- /* write the data */
- davinci_spi_write_data(ds, data1_reg_val | *txp);
- return 0;
+}
+int spi_xfer(struct spi_slave *slave, unsigned int bitlen,
const void *dout, void *din, unsigned long flags)
+{
unsigned int len;
if (bitlen == 0) /* Finish any previously submitted transfers */
@@ -155,53 +237,15 @@ int spi_xfer(struct spi_slave *slave, unsigned int bitlen,
len = bitlen / 8;
- /* do an empty read to clear the current contents */
- readl(&ds->regs->buf);
- /* keep writing and reading 1 byte until done */
- for (i = 0; i < len; i++) {
/* wait till TXFULL is asserted */
while (readl(&ds->regs->buf) & SPIBUF_TXFULL_MASK);
/* write the data */
data1_reg_val &= ~0xFFFF;
if (txp) {
data1_reg_val |= *txp;
txp++;
}
/*
* Write to DAT1 is required to keep the serial transfer going.
* We just terminate when we reach the end.
*/
if ((i == (len - 1)) && (flags & SPI_XFER_END)) {
/* clear CS hold */
writel(data1_reg_val &
~(1 << SPIDAT1_CSHOLD_SHIFT), &ds->regs->dat1);
} else {
/* enable CS hold */
data1_reg_val |= ((1 << SPIDAT1_CSHOLD_SHIFT) |
(slave->cs << SPIDAT1_CSNR_SHIFT));
writel(data1_reg_val, &ds->regs->dat1);
}
/* read the data - wait for data availability */
while (readl(&ds->regs->buf) & SPIBUF_RXEMPTY_MASK);
if (rxp) {
*rxp = readl(&ds->regs->buf) & 0xFF;
rxp++;
} else {
/* simply drop the read character */
readl(&ds->regs->buf);
}
- }
- return 0;
- if (din)
return davinci_spi_read(slave, len, din, flags);
- else
return davinci_spi_write(slave, len, dout, flags);
out: if (flags & SPI_XFER_END) {
writel(data1_reg_val &
~(1 << SPIDAT1_CSHOLD_SHIFT), &ds->regs->dat1);
u8 dummy = 0;
} return 0;davinci_spi_write(slave, 1, &dummy, flags);
} _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

On 17/06/10 16:10, Paulraj, Sandeep wrote:
On 01/06/10 12:36, Delio Brignoli wrote:
I have updated this patch based on the comments [1] by Wolfgang Denk and removed unused variables. [1][http://lists.denx.de/pipermail/u-boot/2010-May/071728.html]
Reduce the number of reads per byte transferred on the BUF register from
2 to 1 and
take advantage of the TX buffer in the SPI module. On LogicPD OMAP-L138
EVM,
SPI read throughput goes up from ~0.8Mbyte/s to ~1.3Mbyte/s. Tested with
a 2Mbyte image file.
Remove unused variables in the spi_xfer() function.
Signed-off-by: Delio Brignoli dbrignoli@audioscience.com Tested-by: Ben Gardiner bengardiner@nanometrics.ca
Sorry, I'm a bit late to the party on this.
It is late. Pull request already sent to Wolfgang
I have an alternative patch that tries to be even quicker, but I don't have the same platform as Delio, so can't compare like with like.
Compare it on your platform. I believe you have the OMAP L137. And post the results.
I don't have a scope to get an accurate measure. The best I can do right now is use a serial snooper to time between me pressing return and the next prompt turning up.
To try and drown out inaccuracies and delays, I ran:
sf read 0xc0008000 0 0x800000
So 8MiB in a reasonably consistent 5.62 - 5.63 seconds, which is about 1.49MiB/s by my reckoning. A bit faster, but way short of 6.25MiB/s.
Nick.

Hello Nick,
On 17/06/2010, at 17:02, Nick Thompson wrote:
On 01/06/10 12:36, Delio Brignoli wrote:
I have updated this patch based on the comments [1] by Wolfgang Denk and removed unused variables. [1][http://lists.denx.de/pipermail/u-boot/2010-May/071728.html]
[...]
I have an alternative patch that tries to be even quicker, but I don't have the same platform as Delio, so can't compare like with like.
I will test your patch on my L138 and let you know.
In essence, it splits up read and write operations to avoid testing pointers in every loop iteration. It also unrolls the last iteration so that it doesn't have to test for loop ending twice each time round. Finally it avoids bit setting/clearing on each iteration when the results would only turn out to be the same anyway.
I am wondering if it is OK to split reading and writing: when I wrote the patch I assumed that spi_xfer() was supposed to be able to read and write to two distinct buffers at the same time. As I understand it, your patch allows spi_xfer() to be either a read or a write operation but not both and I do not know if this semantic change is acceptable. Maybe Sekhar or Sandeep can clarify this.
Thanks -- Delio

On 17/06/10 18:38, Delio Brignoli wrote:
Hello Nick,
On 17/06/2010, at 17:02, Nick Thompson wrote:
On 01/06/10 12:36, Delio Brignoli wrote:
I have updated this patch based on the comments [1] by Wolfgang Denk and removed unused variables. [1][http://lists.denx.de/pipermail/u-boot/2010-May/071728.html]
[...]
I have an alternative patch that tries to be even quicker, but I don't have the same platform as Delio, so can't compare like with like.
I will test your patch on my L138 and let you know.
That would be great, thanks - I'm using a SPANSION FLASH, though I don't think that makes any difference. It your results are consistent I will be more confident with my L137 platform as a comparison.
In essence, it splits up read and write operations to avoid testing pointers in every loop iteration. It also unrolls the last iteration so that it doesn't have to test for loop ending twice each time round. Finally it avoids bit setting/clearing on each iteration when the results would only turn out to be the same anyway.
I am wondering if it is OK to split reading and writing: when I wrote the patch I assumed that spi_xfer() was supposed to be able to read and write to two distinct buffers at the same time. As I understand it, your patch allows spi_xfer() to be either a read or a write operation but not both and I do not know if this semantic change is acceptable. Maybe Sekhar or Sandeep can clarify this.
I wondered about that too. I don't think it matters for SPI FLASH, but I guess it could for other SPI devices (Audio?).
I was thinking I could add a third routine 'read_write' that does both, which only gets called if a bi-direction transfer is requested. This would make the code even bigger, though it could be conditional on a CONFIG option so it could be excluded if only FLASH support is required.
(I'm not sure if U-Boot uses SPI for anything else anyway...)
In the mean time, I have reads working at 1.76MiB/s. I need to read the SPI manual carefully before I let that change into the wild though. Basically, I've added an assumption that by the time the RX buffer is ready with the returned bits, the TXer must have completed since they run to the same clock: so no need to test for TX not full in the loop.
Nick.

Hi Nick,
On Fri, Jun 18, 2010 at 09:26:59AM +0100, Nick Thompson wrote:
(I'm not sure if U-Boot uses SPI for anything else anyway...)
it can be useful for booting an FPGA. On some architectures this is faster than GPIO bit bang. This is of course board dependent, as implementation of the actual FPGA functions is in the board code anyways. (Using this in a not-yet-published board.)
Another use could be SPI-controlled displays (framebuffer driver for displaying a boot loge etc.). Did not have time to implement this yet, unfortunately... ;-)
Best regards, Wolfgang
participants (5)
-
Ben Gardiner
-
Delio Brignoli
-
Nick Thompson
-
Paulraj, Sandeep
-
wolfgang@leila.ping.de