[U-Boot] [PATCH] spi: cadence_qspi_apb: Improve indirect write transaction

If the write transaction size(write_bytes) is not a multiple of word length, then issue word length writes till the we reach the dangling bytes. On the final write, issue byte by byte write to complete the transaction. This marginally improves write throughput when performing random sized writes to the flash.
Signed-off-by: Vignesh R vigneshr@ti.com ---
Tested on K2G GP EVM.
drivers/spi/cadence_qspi_apb.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/cadence_qspi_apb.c b/drivers/spi/cadence_qspi_apb.c index e285d3c1e761..4b891f227243 100644 --- a/drivers/spi/cadence_qspi_apb.c +++ b/drivers/spi/cadence_qspi_apb.c @@ -752,10 +752,16 @@ int cadence_qspi_apb_indirect_write_execute(struct cadence_spi_platdata *plat, while (remaining > 0) { write_bytes = remaining > page_size ? page_size : remaining; /* Handle non-4-byte aligned access to avoid data abort. */ - if (((uintptr_t)txbuf % 4) || (write_bytes % 4)) + if ((uintptr_t)txbuf % 4) { writesb(plat->ahbbase, txbuf, write_bytes); - else + } else { writesl(plat->ahbbase, txbuf, write_bytes >> 2); + if (write_bytes % 4) { + writesb(plat->ahbbase, + txbuf + rounddown(write_bytes, 4), + write_bytes % 4); + } + }
ret = wait_for_bit("QSPI", plat->regbase + CQSPI_REG_SDRAMLEVEL, CQSPI_REG_SDRAMLEVEL_WR_MASK <<

On Thursday 06 October 2016 04:49 PM, Vignesh R wrote:
If the write transaction size(write_bytes) is not a multiple of word length, then issue word length writes till the we reach the dangling bytes. On the final write, issue byte by byte write to complete the transaction. This marginally improves write throughput when performing random sized writes to the flash.
Signed-off-by: Vignesh R vigneshr@ti.com
Gentle ping... Any comments?
Tested on K2G GP EVM.
drivers/spi/cadence_qspi_apb.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/cadence_qspi_apb.c b/drivers/spi/cadence_qspi_apb.c index e285d3c1e761..4b891f227243 100644 --- a/drivers/spi/cadence_qspi_apb.c +++ b/drivers/spi/cadence_qspi_apb.c @@ -752,10 +752,16 @@ int cadence_qspi_apb_indirect_write_execute(struct cadence_spi_platdata *plat, while (remaining > 0) { write_bytes = remaining > page_size ? page_size : remaining; /* Handle non-4-byte aligned access to avoid data abort. */
if (((uintptr_t)txbuf % 4) || (write_bytes % 4))
if ((uintptr_t)txbuf % 4) { writesb(plat->ahbbase, txbuf, write_bytes);
else
} else { writesl(plat->ahbbase, txbuf, write_bytes >> 2);
if (write_bytes % 4) {
writesb(plat->ahbbase,
txbuf + rounddown(write_bytes, 4),
write_bytes % 4);
}
}
ret = wait_for_bit("QSPI", plat->regbase + CQSPI_REG_SDRAMLEVEL, CQSPI_REG_SDRAMLEVEL_WR_MASK <<

On 10/18/2016 10:23 AM, Vignesh R wrote:
On Thursday 06 October 2016 04:49 PM, Vignesh R wrote:
If the write transaction size(write_bytes) is not a multiple of word length, then issue word length writes till the we reach the dangling bytes. On the final write, issue byte by byte write to complete the transaction. This marginally improves write throughput when performing random sized writes to the flash.
Signed-off-by: Vignesh R vigneshr@ti.com
Gentle ping... Any comments?
Tested on K2G GP EVM.
drivers/spi/cadence_qspi_apb.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/cadence_qspi_apb.c b/drivers/spi/cadence_qspi_apb.c index e285d3c1e761..4b891f227243 100644 --- a/drivers/spi/cadence_qspi_apb.c +++ b/drivers/spi/cadence_qspi_apb.c @@ -752,10 +752,16 @@ int cadence_qspi_apb_indirect_write_execute(struct cadence_spi_platdata *plat, while (remaining > 0) { write_bytes = remaining > page_size ? page_size : remaining; /* Handle non-4-byte aligned access to avoid data abort. */
if (((uintptr_t)txbuf % 4) || (write_bytes % 4))
if ((uintptr_t)txbuf % 4) { writesb(plat->ahbbase, txbuf, write_bytes);
else
} else { writesl(plat->ahbbase, txbuf, write_bytes >> 2);
if (write_bytes % 4) {
writesb(plat->ahbbase,
txbuf + rounddown(write_bytes, 4),
write_bytes % 4);
}
You can probably pull this block from the else branch.
}
ret = wait_for_bit("QSPI", plat->regbase + CQSPI_REG_SDRAMLEVEL, CQSPI_REG_SDRAMLEVEL_WR_MASK <<

Hi,
On Tuesday 18 October 2016 05:15 PM, Marek Vasut wrote:
On 10/18/2016 10:23 AM, Vignesh R wrote:
On Thursday 06 October 2016 04:49 PM, Vignesh R wrote:
If the write transaction size(write_bytes) is not a multiple of word length, then issue word length writes till the we reach the dangling bytes. On the final write, issue byte by byte write to complete the transaction. This marginally improves write throughput when performing random sized writes to the flash.
Signed-off-by: Vignesh R vigneshr@ti.com
Gentle ping... Any comments?
Tested on K2G GP EVM.
drivers/spi/cadence_qspi_apb.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/cadence_qspi_apb.c b/drivers/spi/cadence_qspi_apb.c index e285d3c1e761..4b891f227243 100644 --- a/drivers/spi/cadence_qspi_apb.c +++ b/drivers/spi/cadence_qspi_apb.c @@ -752,10 +752,16 @@ int cadence_qspi_apb_indirect_write_execute(struct cadence_spi_platdata *plat, while (remaining > 0) { write_bytes = remaining > page_size ? page_size : remaining; /* Handle non-4-byte aligned access to avoid data abort. */
if (((uintptr_t)txbuf % 4) || (write_bytes % 4))
if ((uintptr_t)txbuf % 4) { writesb(plat->ahbbase, txbuf, write_bytes);
else
} else { writesl(plat->ahbbase, txbuf, write_bytes >> 2);
if (write_bytes % 4) {
writesb(plat->ahbbase,
txbuf + rounddown(write_bytes, 4),
write_bytes % 4);
}
You can probably pull this block from the else branch.
Yeah, I guess writesb() can handle zero byte write request I believe.
With above change, can I have your Acked-by/Reviewed-by?

On 10/19/2016 06:40 AM, Vignesh R wrote:
Hi,
On Tuesday 18 October 2016 05:15 PM, Marek Vasut wrote:
On 10/18/2016 10:23 AM, Vignesh R wrote:
On Thursday 06 October 2016 04:49 PM, Vignesh R wrote:
If the write transaction size(write_bytes) is not a multiple of word length, then issue word length writes till the we reach the dangling bytes. On the final write, issue byte by byte write to complete the transaction. This marginally improves write throughput when performing random sized writes to the flash.
Signed-off-by: Vignesh R vigneshr@ti.com
Gentle ping... Any comments?
Tested on K2G GP EVM.
drivers/spi/cadence_qspi_apb.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/cadence_qspi_apb.c b/drivers/spi/cadence_qspi_apb.c index e285d3c1e761..4b891f227243 100644 --- a/drivers/spi/cadence_qspi_apb.c +++ b/drivers/spi/cadence_qspi_apb.c @@ -752,10 +752,16 @@ int cadence_qspi_apb_indirect_write_execute(struct cadence_spi_platdata *plat, while (remaining > 0) { write_bytes = remaining > page_size ? page_size : remaining; /* Handle non-4-byte aligned access to avoid data abort. */
if (((uintptr_t)txbuf % 4) || (write_bytes % 4))
if ((uintptr_t)txbuf % 4) { writesb(plat->ahbbase, txbuf, write_bytes);
else
} else { writesl(plat->ahbbase, txbuf, write_bytes >> 2);
if (write_bytes % 4) {
writesb(plat->ahbbase,
txbuf + rounddown(write_bytes, 4),
write_bytes % 4);
}
You can probably pull this block from the else branch.
Yeah, I guess writesb() can handle zero byte write request I believe.
With above change, can I have your Acked-by/Reviewed-by?
Sure , just send a V2 and CC me.

On 10/19/2016 06:40 AM, Vignesh R wrote:
Hi,
On Tuesday 18 October 2016 05:15 PM, Marek Vasut wrote:
On 10/18/2016 10:23 AM, Vignesh R wrote:
On Thursday 06 October 2016 04:49 PM, Vignesh R wrote:
If the write transaction size(write_bytes) is not a multiple of word length, then issue word length writes till the we reach the dangling bytes. On the final write, issue byte by byte write to complete the transaction. This marginally improves write throughput when performing random sized writes to the flash.
Signed-off-by: Vignesh R vigneshr@ti.com
Gentle ping... Any comments?
Tested on K2G GP EVM.
drivers/spi/cadence_qspi_apb.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/cadence_qspi_apb.c b/drivers/spi/cadence_qspi_apb.c index e285d3c1e761..4b891f227243 100644 --- a/drivers/spi/cadence_qspi_apb.c +++ b/drivers/spi/cadence_qspi_apb.c @@ -752,10 +752,16 @@ int cadence_qspi_apb_indirect_write_execute(struct cadence_spi_platdata *plat, while (remaining > 0) { write_bytes = remaining > page_size ? page_size : remaining; /* Handle non-4-byte aligned access to avoid data abort. */
if (((uintptr_t)txbuf % 4) || (write_bytes % 4))
if ((uintptr_t)txbuf % 4) { writesb(plat->ahbbase, txbuf, write_bytes);
else
} else { writesl(plat->ahbbase, txbuf, write_bytes >> 2);
if (write_bytes % 4) {
writesb(plat->ahbbase,
txbuf + rounddown(write_bytes, 4),
write_bytes % 4);
}
You can probably pull this block from the else branch.
Yeah, I guess writesb() can handle zero byte write request I believe.
Hmmmm, good point, in which case this is OK as-is.
With above change, can I have your Acked-by/Reviewed-by?
Reviewed-by: Marek Vasut marex@denx.de

On Wed, Oct 19, 2016 at 10:10 AM, Vignesh R vigneshr@ti.com wrote:
Hi,
On Tuesday 18 October 2016 05:15 PM, Marek Vasut wrote:
On 10/18/2016 10:23 AM, Vignesh R wrote:
On Thursday 06 October 2016 04:49 PM, Vignesh R wrote:
If the write transaction size(write_bytes) is not a multiple of word length, then issue word length writes till the we reach the dangling bytes. On the final write, issue byte by byte write to complete the transaction. This marginally improves write throughput when performing random sized writes to the flash.
Signed-off-by: Vignesh R vigneshr@ti.com
Gentle ping... Any comments?
Tested on K2G GP EVM.
drivers/spi/cadence_qspi_apb.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/cadence_qspi_apb.c b/drivers/spi/cadence_qspi_apb.c index e285d3c1e761..4b891f227243 100644 --- a/drivers/spi/cadence_qspi_apb.c +++ b/drivers/spi/cadence_qspi_apb.c @@ -752,10 +752,16 @@ int cadence_qspi_apb_indirect_write_execute(struct cadence_spi_platdata *plat, while (remaining > 0) { write_bytes = remaining > page_size ? page_size : remaining; /* Handle non-4-byte aligned access to avoid data abort. */
if (((uintptr_t)txbuf % 4) || (write_bytes % 4))
if ((uintptr_t)txbuf % 4) { writesb(plat->ahbbase, txbuf, write_bytes);
else
} else { writesl(plat->ahbbase, txbuf, write_bytes >> 2);
if (write_bytes % 4) {
writesb(plat->ahbbase,
txbuf + rounddown(write_bytes, 4),
write_bytes % 4);
}
You can probably pull this block from the else branch.
Yeah, I guess writesb() can handle zero byte write request I believe.
With above change, can I have your Acked-by/Reviewed-by?
Also try to get the 'sf update' data before and after and append it on commit message.
thanks!

On 10/19/2016 04:41 PM, Jagan Teki wrote:
On Wed, Oct 19, 2016 at 10:10 AM, Vignesh R vigneshr@ti.com wrote:
Hi,
On Tuesday 18 October 2016 05:15 PM, Marek Vasut wrote:
On 10/18/2016 10:23 AM, Vignesh R wrote:
On Thursday 06 October 2016 04:49 PM, Vignesh R wrote:
If the write transaction size(write_bytes) is not a multiple of word length, then issue word length writes till the we reach the dangling bytes. On the final write, issue byte by byte write to complete the transaction. This marginally improves write throughput when performing random sized writes to the flash.
Signed-off-by: Vignesh R vigneshr@ti.com
Gentle ping... Any comments?
Tested on K2G GP EVM.
drivers/spi/cadence_qspi_apb.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/cadence_qspi_apb.c b/drivers/spi/cadence_qspi_apb.c index e285d3c1e761..4b891f227243 100644 --- a/drivers/spi/cadence_qspi_apb.c +++ b/drivers/spi/cadence_qspi_apb.c @@ -752,10 +752,16 @@ int cadence_qspi_apb_indirect_write_execute(struct cadence_spi_platdata *plat, while (remaining > 0) { write_bytes = remaining > page_size ? page_size : remaining; /* Handle non-4-byte aligned access to avoid data abort. */
if (((uintptr_t)txbuf % 4) || (write_bytes % 4))
if ((uintptr_t)txbuf % 4) { writesb(plat->ahbbase, txbuf, write_bytes);
else
} else { writesl(plat->ahbbase, txbuf, write_bytes >> 2);
if (write_bytes % 4) {
writesb(plat->ahbbase,
txbuf + rounddown(write_bytes, 4),
write_bytes % 4);
}
You can probably pull this block from the else branch.
Yeah, I guess writesb() can handle zero byte write request I believe.
With above change, can I have your Acked-by/Reviewed-by?
Also try to get the 'sf update' data before and after and append it on commit message.
Why? Seems useless to me.

On Wed, Oct 19, 2016 at 8:18 PM, Marek Vasut marex@denx.de wrote:
On 10/19/2016 04:41 PM, Jagan Teki wrote:
On Wed, Oct 19, 2016 at 10:10 AM, Vignesh R vigneshr@ti.com wrote:
Hi,
On Tuesday 18 October 2016 05:15 PM, Marek Vasut wrote:
On 10/18/2016 10:23 AM, Vignesh R wrote:
On Thursday 06 October 2016 04:49 PM, Vignesh R wrote:
If the write transaction size(write_bytes) is not a multiple of word length, then issue word length writes till the we reach the dangling bytes. On the final write, issue byte by byte write to complete the transaction. This marginally improves write throughput when performing random sized writes to the flash.
Signed-off-by: Vignesh R vigneshr@ti.com
Gentle ping... Any comments?
Tested on K2G GP EVM.
drivers/spi/cadence_qspi_apb.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/cadence_qspi_apb.c b/drivers/spi/cadence_qspi_apb.c index e285d3c1e761..4b891f227243 100644 --- a/drivers/spi/cadence_qspi_apb.c +++ b/drivers/spi/cadence_qspi_apb.c @@ -752,10 +752,16 @@ int cadence_qspi_apb_indirect_write_execute(struct cadence_spi_platdata *plat, while (remaining > 0) { write_bytes = remaining > page_size ? page_size : remaining; /* Handle non-4-byte aligned access to avoid data abort. */
if (((uintptr_t)txbuf % 4) || (write_bytes % 4))
if ((uintptr_t)txbuf % 4) { writesb(plat->ahbbase, txbuf, write_bytes);
else
} else { writesl(plat->ahbbase, txbuf, write_bytes >> 2);
if (write_bytes % 4) {
writesb(plat->ahbbase,
txbuf + rounddown(write_bytes, 4),
write_bytes % 4);
}
You can probably pull this block from the else branch.
Yeah, I guess writesb() can handle zero byte write request I believe.
With above change, can I have your Acked-by/Reviewed-by?
Also try to get the 'sf update' data before and after and append it on commit message.
Why? Seems useless to me.
Since it's a performance improvement patch better to have that numbers, no harm getting that data.
thanks!

On 10/19/2016 05:19 PM, Jagan Teki wrote:
On Wed, Oct 19, 2016 at 8:18 PM, Marek Vasut marex@denx.de wrote:
On 10/19/2016 04:41 PM, Jagan Teki wrote:
On Wed, Oct 19, 2016 at 10:10 AM, Vignesh R vigneshr@ti.com wrote:
Hi,
On Tuesday 18 October 2016 05:15 PM, Marek Vasut wrote:
On 10/18/2016 10:23 AM, Vignesh R wrote:
On Thursday 06 October 2016 04:49 PM, Vignesh R wrote: > If the write transaction size(write_bytes) is not a multiple of word > length, then issue word length writes till the we reach the dangling > bytes. On the final write, issue byte by byte write to complete the > transaction. This marginally improves write throughput when performing > random sized writes to the flash. > > Signed-off-by: Vignesh R vigneshr@ti.com > ---
Gentle ping... Any comments?
> > Tested on K2G GP EVM. > > drivers/spi/cadence_qspi_apb.c | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/drivers/spi/cadence_qspi_apb.c b/drivers/spi/cadence_qspi_apb.c > index e285d3c1e761..4b891f227243 100644 > --- a/drivers/spi/cadence_qspi_apb.c > +++ b/drivers/spi/cadence_qspi_apb.c > @@ -752,10 +752,16 @@ int cadence_qspi_apb_indirect_write_execute(struct cadence_spi_platdata *plat, > while (remaining > 0) { > write_bytes = remaining > page_size ? page_size : remaining; > /* Handle non-4-byte aligned access to avoid data abort. */ > - if (((uintptr_t)txbuf % 4) || (write_bytes % 4)) > + if ((uintptr_t)txbuf % 4) { > writesb(plat->ahbbase, txbuf, write_bytes); > - else > + } else { > writesl(plat->ahbbase, txbuf, write_bytes >> 2); > + if (write_bytes % 4) { > + writesb(plat->ahbbase, > + txbuf + rounddown(write_bytes, 4), > + write_bytes % 4); > + }
You can probably pull this block from the else branch.
Yeah, I guess writesb() can handle zero byte write request I believe.
With above change, can I have your Acked-by/Reviewed-by?
Also try to get the 'sf update' data before and after and append it on commit message.
Why? Seems useless to me.
Since it's a performance improvement patch better to have that numbers, no harm getting that data.
Urghhhh, sf update is mixing multiple access patterns, it is by no means a good performance metric for evaluating performance of the write path.
What you would need to do here is perform long unaligned writes repeatedly (to eliminate outliers) and measure the improvement. And you'd have to make sure the erase cycle is not counted in.
I suspect the performance improvement would be negligible, but I'd be happy to be proven wrong. If it'd be negligible, then we should probably not complicate the code more and just drop this patch.

On Wednesday 19 October 2016 08:58 PM, Marek Vasut wrote:
On 10/19/2016 05:19 PM, Jagan Teki wrote:
On Wed, Oct 19, 2016 at 8:18 PM, Marek Vasut marex@denx.de wrote:
On 10/19/2016 04:41 PM, Jagan Teki wrote:
On Wed, Oct 19, 2016 at 10:10 AM, Vignesh R vigneshr@ti.com wrote:
Hi,
...
You can probably pull this block from the else branch.
Yeah, I guess writesb() can handle zero byte write request I believe.
With above change, can I have your Acked-by/Reviewed-by?
Also try to get the 'sf update' data before and after and append it on commit message.
Why? Seems useless to me.
Since it's a performance improvement patch better to have that numbers, no harm getting that data.
Urghhhh, sf update is mixing multiple access patterns, it is by no means a good performance metric for evaluating performance of the write path.
What you would need to do here is perform long unaligned writes repeatedly (to eliminate outliers) and measure the improvement. And you'd have to make sure the erase cycle is not counted in.
I suspect the performance improvement would be negligible, but I'd be happy to be proven wrong. If it'd be negligible, then we should probably not complicate the code more and just drop this patch.
Today, I was performing unaligned writes of various sizes to get performance numbers and discovered that unaligned writes (i.e txbuf address is not word aligned or write_byte % 4 != 0) are sometimes failing on TI platforms with Cadence QSPI (with or w/o this patch) :( Reverting the patch "mtd: cqspi: Simplify indirect write code" seems to be helping. I don't see anything obviously wrong here. Let me debug whats causing the difference and get back.

On 10/20/2016 01:01 PM, Vignesh R wrote:
On Wednesday 19 October 2016 08:58 PM, Marek Vasut wrote:
On 10/19/2016 05:19 PM, Jagan Teki wrote:
On Wed, Oct 19, 2016 at 8:18 PM, Marek Vasut marex@denx.de wrote:
On 10/19/2016 04:41 PM, Jagan Teki wrote:
On Wed, Oct 19, 2016 at 10:10 AM, Vignesh R vigneshr@ti.com wrote:
Hi,
...
> You can probably pull this block from the else branch.
Yeah, I guess writesb() can handle zero byte write request I believe.
With above change, can I have your Acked-by/Reviewed-by?
Also try to get the 'sf update' data before and after and append it on commit message.
Why? Seems useless to me.
Since it's a performance improvement patch better to have that numbers, no harm getting that data.
Urghhhh, sf update is mixing multiple access patterns, it is by no means a good performance metric for evaluating performance of the write path.
What you would need to do here is perform long unaligned writes repeatedly (to eliminate outliers) and measure the improvement. And you'd have to make sure the erase cycle is not counted in.
I suspect the performance improvement would be negligible, but I'd be happy to be proven wrong. If it'd be negligible, then we should probably not complicate the code more and just drop this patch.
Today, I was performing unaligned writes of various sizes to get performance numbers and discovered that unaligned writes (i.e txbuf address is not word aligned or write_byte % 4 != 0) are sometimes failing on TI platforms with Cadence QSPI (with or w/o this patch) :( Reverting the patch "mtd: cqspi: Simplify indirect write code" seems to be helping. I don't see anything obviously wrong here. Let me debug whats causing the difference and get back.
Can you please drill into it ? I'd be happy to help testing.
participants (3)
-
Jagan Teki
-
Marek Vasut
-
Vignesh R