[U-Boot] [PATCHv4] Optimized nand_read_buf for kirkwood

From: Nico Erfurth ne@erfurth.eu
The basic idea is taken from the linux-kernel, but further optimized.
First align the buffer to 8 bytes, then use ldrd/strd to read and store in 8 byte quantities, then do the final bytes.
Tested using: 'date ; nand read.raw 0xE00000 0x0 0x10000 ; date'. Without this patch, NAND read of 132MB took 49s (~2.69MB/s). With this patch in place, reading the same amount of data was done in 27s (~4.89MB/s). So read performance is increased by ~80%!
Signed-off-by: Nico Erfurth ne@erfurth.eu Tested-by: Phil Sutter phil.sutter@viprinet.com Cc: Prafulla Wadaskar prafulla@marvell.com --- Changed since V3: - fixed author --- drivers/mtd/nand/kirkwood_nand.c | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+)
diff --git a/drivers/mtd/nand/kirkwood_nand.c b/drivers/mtd/nand/kirkwood_nand.c index 0a99a10..85ea5d2 100644 --- a/drivers/mtd/nand/kirkwood_nand.c +++ b/drivers/mtd/nand/kirkwood_nand.c @@ -38,6 +38,37 @@ struct kwnandf_registers { static struct kwnandf_registers *nf_reg = (struct kwnandf_registers *)KW_NANDF_BASE;
+ +/* + * The basic idea is stolen from the linux kernel, but the inner loop is + * optimized a bit more. + */ +static void kw_nand_read_buf(struct mtd_info *mtd, uint8_t *buf, int len) +{ + struct nand_chip *chip = mtd->priv; + + while (len && (unsigned long)buf & 7) { + *buf++ = readb(chip->IO_ADDR_R); + len--; + }; + + /* This loop reads and writes 64bit per round. */ + asm volatile ( + "1:\n" + " subs %0, #8\n" + " ldrpld r2, [%2]\n" + " strpld r2, [%1], #8\n" + " bhi 1b\n" + " addne %0, #8\n" + : "+&r" (len), "+&r" (buf) + : "r" (chip->IO_ADDR_R) + : "r2", "r3", "memory", "cc" + ); + + while (len--) + *buf++ = readb(chip->IO_ADDR_R); +} + /* * hardware specific access to control-lines/bits */ @@ -80,6 +111,7 @@ int board_nand_init(struct nand_chip *nand) nand->ecc.mode = NAND_ECC_SOFT; #endif nand->cmd_ctrl = kw_nand_hwcontrol; + nand->read_buf = kw_nand_read_buf; nand->chip_delay = 40; nand->select_chip = kw_nand_select_chip; return 0;

On Mon, 2013-08-26 at 14:10 +0200, Phil Sutter wrote:
From: Nico Erfurth ne@erfurth.eu
The basic idea is taken from the linux-kernel, but further optimized.
First align the buffer to 8 bytes, then use ldrd/strd to read and store in 8 byte quantities, then do the final bytes.
Tested using: 'date ; nand read.raw 0xE00000 0x0 0x10000 ; date'. Without this patch, NAND read of 132MB took 49s (~2.69MB/s). With this patch in place, reading the same amount of data was done in 27s (~4.89MB/s). So read performance is increased by ~80%!
Signed-off-by: Nico Erfurth ne@erfurth.eu Tested-by: Phil Sutter phil.sutter@viprinet.com Cc: Prafulla Wadaskar prafulla@marvell.com
Changed since V3:
- fixed author
It needs your Signed-off-by: as well -- can I add that when applying?
-Scott

On Mon, Aug 26, 2013 at 10:43:38AM -0500, Scott Wood wrote:
On Mon, 2013-08-26 at 14:10 +0200, Phil Sutter wrote:
From: Nico Erfurth ne@erfurth.eu
The basic idea is taken from the linux-kernel, but further optimized.
First align the buffer to 8 bytes, then use ldrd/strd to read and store in 8 byte quantities, then do the final bytes.
Tested using: 'date ; nand read.raw 0xE00000 0x0 0x10000 ; date'. Without this patch, NAND read of 132MB took 49s (~2.69MB/s). With this patch in place, reading the same amount of data was done in 27s (~4.89MB/s). So read performance is increased by ~80%!
Signed-off-by: Nico Erfurth ne@erfurth.eu Tested-by: Phil Sutter phil.sutter@viprinet.com Cc: Prafulla Wadaskar prafulla@marvell.com
Changed since V3:
- fixed author
It needs your Signed-off-by: as well -- can I add that when applying?
Yes, that's fine by me. So if I submit others' patches, I need to sign them off as well?
Best wishes,
Phil Sutter Software Engineer

On Mon, Aug 26, 2013 at 06:00:39PM +0200, Phil Sutter wrote:
On Mon, Aug 26, 2013 at 10:43:38AM -0500, Scott Wood wrote:
On Mon, 2013-08-26 at 14:10 +0200, Phil Sutter wrote:
From: Nico Erfurth ne@erfurth.eu
The basic idea is taken from the linux-kernel, but further optimized.
First align the buffer to 8 bytes, then use ldrd/strd to read and store in 8 byte quantities, then do the final bytes.
Tested using: 'date ; nand read.raw 0xE00000 0x0 0x10000 ; date'. Without this patch, NAND read of 132MB took 49s (~2.69MB/s). With this patch in place, reading the same amount of data was done in 27s (~4.89MB/s). So read performance is increased by ~80%!
Signed-off-by: Nico Erfurth ne@erfurth.eu Tested-by: Phil Sutter phil.sutter@viprinet.com Cc: Prafulla Wadaskar prafulla@marvell.com
Changed since V3:
- fixed author
It needs your Signed-off-by: as well -- can I add that when applying?
Yes, that's fine by me. So if I submit others' patches, I need to sign them off as well?
Ah, nevermind. Reading 1.12 of Documentation/SubmittingPatches helps.
Best wishes,
Phil Sutter Software Engineer

On Mon, Aug 26, 2013 at 02:10:56PM +0200, Phil Sutter wrote:
From: Nico Erfurth ne@erfurth.eu
The basic idea is taken from the linux-kernel, but further optimized.
First align the buffer to 8 bytes, then use ldrd/strd to read and store in 8 byte quantities, then do the final bytes.
Tested using: 'date ; nand read.raw 0xE00000 0x0 0x10000 ; date'. Without this patch, NAND read of 132MB took 49s (~2.69MB/s). With this patch in place, reading the same amount of data was done in 27s (~4.89MB/s). So read performance is increased by ~80%!
Signed-off-by: Nico Erfurth ne@erfurth.eu Tested-by: Phil Sutter phil.sutter@viprinet.com Cc: Prafulla Wadaskar prafulla@marvell.com
Changed since V3:
- fixed author
drivers/mtd/nand/kirkwood_nand.c | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+)
I tried to build-test this, and I couldn't find any board that defines CONFIG_NAND_KIRKWOOD.
The patch that removed it was commit b5befd8211b54ae2d2fca3fbed061c879951ceaa ("arm/km: fix u-boot.kwb build breakage"), over two years ago. It's not clear whether the removal was intentional.
What target did you use to test this?
-Scott

Hi Scott,
On 11/14/2013 02:31 AM, Scott Wood wrote:
On Mon, Aug 26, 2013 at 02:10:56PM +0200, Phil Sutter wrote:
From: Nico Erfurth ne@erfurth.eu
The basic idea is taken from the linux-kernel, but further optimized.
First align the buffer to 8 bytes, then use ldrd/strd to read and store in 8 byte quantities, then do the final bytes.
Tested using: 'date ; nand read.raw 0xE00000 0x0 0x10000 ; date'. Without this patch, NAND read of 132MB took 49s (~2.69MB/s). With this patch in place, reading the same amount of data was done in 27s (~4.89MB/s). So read performance is increased by ~80%!
Signed-off-by: Nico Erfurth ne@erfurth.eu Tested-by: Phil Sutter phil.sutter@viprinet.com Cc: Prafulla Wadaskar prafulla@marvell.com
Changed since V3:
- fixed author
drivers/mtd/nand/kirkwood_nand.c | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+)
I tried to build-test this, and I couldn't find any board that defines CONFIG_NAND_KIRKWOOD.
it's not in board specific code defined it's defined in a common kirkwood header:
arch/arm/include/asm/arch-kirkwood/config.h:58:#define CONFIG_NAND_KIRKWOOD
The patch that removed it was commit b5befd8211b54ae2d2fca3fbed061c879951ceaa ("arm/km: fix u-boot.kwb build breakage"), over two years ago. It's not clear whether the removal was intentional.
yes it was. We include this common header and therefore we don't need to redefine it in our board setup.
Regards Holger

On Thu, 2013-11-14 at 09:18 +0100, Holger Brunck wrote:
Hi Scott,
On 11/14/2013 02:31 AM, Scott Wood wrote:
On Mon, Aug 26, 2013 at 02:10:56PM +0200, Phil Sutter wrote:
From: Nico Erfurth ne@erfurth.eu
The basic idea is taken from the linux-kernel, but further optimized.
First align the buffer to 8 bytes, then use ldrd/strd to read and store in 8 byte quantities, then do the final bytes.
Tested using: 'date ; nand read.raw 0xE00000 0x0 0x10000 ; date'. Without this patch, NAND read of 132MB took 49s (~2.69MB/s). With this patch in place, reading the same amount of data was done in 27s (~4.89MB/s). So read performance is increased by ~80%!
Signed-off-by: Nico Erfurth ne@erfurth.eu Tested-by: Phil Sutter phil.sutter@viprinet.com Cc: Prafulla Wadaskar prafulla@marvell.com
Changed since V3:
- fixed author
drivers/mtd/nand/kirkwood_nand.c | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+)
I tried to build-test this, and I couldn't find any board that defines CONFIG_NAND_KIRKWOOD.
it's not in board specific code defined it's defined in a common kirkwood header:
arch/arm/include/asm/arch-kirkwood/config.h:58:#define CONFIG_NAND_KIRKWOOD
Oops. I thought I grepped the whole tree rather than just include/, but apparently not.
-Scott

Scott,
On Wed, Nov 13, 2013 at 07:31:02PM -0600, Scott Wood wrote:
On Mon, Aug 26, 2013 at 02:10:56PM +0200, Phil Sutter wrote:
From: Nico Erfurth ne@erfurth.eu
The basic idea is taken from the linux-kernel, but further optimized.
First align the buffer to 8 bytes, then use ldrd/strd to read and store in 8 byte quantities, then do the final bytes.
Tested using: 'date ; nand read.raw 0xE00000 0x0 0x10000 ; date'. Without this patch, NAND read of 132MB took 49s (~2.69MB/s). With this patch in place, reading the same amount of data was done in 27s (~4.89MB/s). So read performance is increased by ~80%!
Signed-off-by: Nico Erfurth ne@erfurth.eu Tested-by: Phil Sutter phil.sutter@viprinet.com Cc: Prafulla Wadaskar prafulla@marvell.com
Changed since V3:
- fixed author
drivers/mtd/nand/kirkwood_nand.c | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+)
I tried to build-test this, and I couldn't find any board that defines CONFIG_NAND_KIRKWOOD.
The patch that removed it was commit b5befd8211b54ae2d2fca3fbed061c879951ceaa ("arm/km: fix u-boot.kwb build breakage"), over two years ago. It's not clear whether the removal was intentional.
What target did you use to test this?
I tested using a custom board with Marvell Kirkwood SoC, but e.g. the Marvell OpenRD Ultimate should be fine.
Best wishes,
Phil Sutter Software Engineer
participants (3)
-
Holger Brunck
-
Phil Sutter
-
Scott Wood