[U-Boot] [PATCH] net: fec: Avoid MX28 bus sync issue

older
[U-Boot] [PATCH v4 0/8] ARMv7: Add...

Marek Vasut

12 Jul 2013 12 Jul '13

1:03 a.m.

The MX28 multi-layer AHB bus can be too slow and trigger the FEC DMA too early, before all the data hit the DRAM. This patch ensures the data are written in the RAM before the DMA starts. Please see the comment in the patch for full details.

This patch was produced with an amazing help from Albert Aribaud, who pointed out it can possibly be such a bus synchronisation issue.

Signed-off-by: Marek Vasut marex@denx.de Cc: Albert ARIBAUD albert.u.boot@aribaud.net Cc: Fabio Estevam fabio.estevam@freescale.com Cc: Stefano Babic sbabic@denx.de --- drivers/net/fec_mxc.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)

diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c index 97bf8fe..ec5b9db 100644 --- a/drivers/net/fec_mxc.c +++ b/drivers/net/fec_mxc.c @@ -737,6 +737,28 @@ static int fec_send(struct eth_device *dev, void *packet, int length) flush_dcache_range(addr, addr + size);

/* + * Below we read the DMA descriptor's last four bytes back from the + * DRAM. This is important in order to make sure that all WRITE + * operations on the bus that were triggered by previous cache FLUSH + * have completed. + * + * Otherwise, on MX28, it is possible to observe a corruption of the + * DMA descriptors. Please refer to schematic "Figure 1-2" in MX28RM + * for the bus structure of MX28. The scenario is as follows: + * + * 1) ARM core triggers a series of WRITEs on the AHB_ARB2 bus going + * to DRAM due to flush_dcache_range() + * 2) ARM core writes the FEC registers via AHB_ARB2 + * 3) FEC DMA starts reading/writing from/to DRAM via AHB_ARB3 + * + * Note that 2) does sometimes finish before 1) due to reordering of + * WRITE accesses on the AHB bus, therefore triggering 3) before the + * DMA descriptor is fully written into DRAM. This results in occasional + * corruption of the DMA descriptor. + */ + readl(addr + size - 4); + + /* * Enable SmartDMA transmit task */ fec_tx_task_enable(fec);

-- 1.7.10.4

Show replies by date

Fabio Estevam

12 Jul 12 Jul

1:18 a.m.

On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut marex@denx.de wrote:

...

The MX28 multi-layer AHB bus can be too slow and trigger the FEC DMA too early, before all the data hit the DRAM. This patch ensures the data are written in the RAM before the DMA starts. Please see the comment in the patch for full details.

This patch was produced with an amazing help from Albert Aribaud, who pointed out it can possibly be such a bus synchronisation issue.

Signed-off-by: Marek Vasut marex@denx.de Cc: Albert ARIBAUD albert.u.boot@aribaud.net Cc: Fabio Estevam fabio.estevam@freescale.com Cc: Stefano Babic sbabic@denx.de

Excellent, managed to transfer 90MB via TFTP on mx28evk without a single timeout.

Tested-by: Fabio Estevam fabio.estevam@freescale.com

Alexandre Pereira da Silva

5:41 a.m.

On Thu, Jul 11, 2013 at 8:18 PM, Fabio Estevam festevam@gmail.com wrote:

...

On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut marex@denx.de wrote:

...
The MX28 multi-layer AHB bus can be too slow and trigger the FEC DMA too early, before all the data hit the DRAM. This patch ensures the data are written in the RAM before the DMA starts. Please see the comment in the patch for full details.

This patch was produced with an amazing help from Albert Aribaud, who pointed out it can possibly be such a bus synchronisation issue.

Signed-off-by: Marek Vasut marex@denx.de Cc: Albert ARIBAUD albert.u.boot@aribaud.net Cc: Fabio Estevam fabio.estevam@freescale.com Cc: Stefano Babic sbabic@denx.de

Excellent, managed to transfer 90MB via TFTP on mx28evk without a single timeout.

Tested-by: Fabio Estevam fabio.estevam@freescale.com

It's working here too.

Tested-by: Alexandre Pereira da Silva aletes.xgr@gmail.com

Marek Vasut

5:51 a.m.

Hi,

...

On Thu, Jul 11, 2013 at 8:18 PM, Fabio Estevam festevam@gmail.com wrote:

...
On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut marex@denx.de wrote:

...
The MX28 multi-layer AHB bus can be too slow and trigger the FEC DMA too early, before all the data hit the DRAM. This patch ensures the data are written in the RAM before the DMA starts. Please see the comment in the patch for full details.

This patch was produced with an amazing help from Albert Aribaud, who pointed out it can possibly be such a bus synchronisation issue.

Signed-off-by: Marek Vasut marex@denx.de Cc: Albert ARIBAUD albert.u.boot@aribaud.net Cc: Fabio Estevam fabio.estevam@freescale.com Cc: Stefano Babic sbabic@denx.de

Excellent, managed to transfer 90MB via TFTP on mx28evk without a single timeout.

Tested-by: Fabio Estevam fabio.estevam@freescale.com

It's working here too.

Tested-by: Alexandre Pereira da Silva aletes.xgr@gmail.com

Nice to hear, thank Albert for finding this.

Best regards, Marek Vasut

Hector Palacios

1:37 p.m.

Dear Marek,

On 07/12/2013 05:51 AM, Marek Vasut wrote:

...

Hi,

...
On Thu, Jul 11, 2013 at 8:18 PM, Fabio Estevam festevam@gmail.com wrote:

...
On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut marex@denx.de wrote:

...
The MX28 multi-layer AHB bus can be too slow and trigger the FEC DMA too early, before all the data hit the DRAM. This patch ensures the data are written in the RAM before the DMA starts. Please see the comment in the patch for full details.

This patch was produced with an amazing help from Albert Aribaud, who pointed out it can possibly be such a bus synchronisation issue.

Signed-off-by: Marek Vasut marex@denx.de Cc: Albert ARIBAUD albert.u.boot@aribaud.net Cc: Fabio Estevam fabio.estevam@freescale.com Cc: Stefano Babic sbabic@denx.de

Excellent, managed to transfer 90MB via TFTP on mx28evk without a single timeout.

Tested-by: Fabio Estevam fabio.estevam@freescale.com

It's working here too.

Tested-by: Alexandre Pereira da Silva aletes.xgr@gmail.com

Nice to hear, thank Albert for finding this.

Thanks for sharing.

Unfortunately I'm still seeing non-recoverable timeouts when doing tftp transfers. Nevertheless, with this patch sometimes I'm able to transfer big files (100MiB) without problems (I was never able before). So this is a big improvement. I applied this patch over a v2013.01, does it need any additional patch? I think I saw one email about flush dcache...

Considering the other guys seem to work without problems I guess this scenario is specific to my board. I'm using a Micrel KSZ8031RNLI at 50MHz. I always suspect from the PHY. I'm disconcerted because usually the timeouts occur after having successfully downloaded 22 or 24 MiB. Other times it just downloads completely.

In any case, good job!

Best regards, -- Hector Palacios

Fabio Estevam

1:39 p.m.

Hi Hector,

On Fri, Jul 12, 2013 at 8:37 AM, Hector Palacios hector.palacios@digi.com wrote:

...

Thanks for sharing.

Unfortunately I'm still seeing non-recoverable timeouts when doing tftp transfers. Nevertheless, with this patch sometimes I'm able to transfer big files (100MiB) without problems (I was never able before). So this is a big improvement. I applied this patch over a v2013.01, does it need any additional patch? I think I saw one email about flush dcache...

Please try Stefano's tree instead: http://git.denx.de/?p=u-boot/u-boot-imx.git;a=shortlog;h=refs/heads/master

Regards,

Fabio Estevam

Marek Vasut

2:01 p.m.

Hi Hector,

...

Dear Marek,

On 07/12/2013 05:51 AM, Marek Vasut wrote:

...
Hi,

...
On Thu, Jul 11, 2013 at 8:18 PM, Fabio Estevam festevam@gmail.com wrote:

...
On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut marex@denx.de wrote:

...
The MX28 multi-layer AHB bus can be too slow and trigger the FEC DMA too early, before all the data hit the DRAM. This patch ensures the data are written in the RAM before the DMA starts. Please see the comment in the patch for full details.

This patch was produced with an amazing help from Albert Aribaud, who pointed out it can possibly be such a bus synchronisation issue.

Signed-off-by: Marek Vasut marex@denx.de Cc: Albert ARIBAUD albert.u.boot@aribaud.net Cc: Fabio Estevam fabio.estevam@freescale.com Cc: Stefano Babic sbabic@denx.de

Excellent, managed to transfer 90MB via TFTP on mx28evk without a single timeout.

Tested-by: Fabio Estevam fabio.estevam@freescale.com

It's working here too.

Tested-by: Alexandre Pereira da Silva aletes.xgr@gmail.com

Nice to hear, thank Albert for finding this.

Thanks for sharing.

Unfortunately I'm still seeing non-recoverable timeouts when doing tftp transfers. Nevertheless, with this patch sometimes I'm able to transfer big files (100MiB) without problems (I was never able before). So this is a big improvement. I applied this patch over a v2013.01, does it need any additional patch? I think I saw one email about flush dcache...

Try Stefano's tree as Fabio suggested. I think it's already pushed and includes the fixes.

...

Considering the other guys seem to work without problems I guess this scenario is specific to my board. I'm using a Micrel KSZ8031RNLI at 50MHz. I always suspect from the PHY.

You can try using the PHYLIB (CONFIG_PHYLIB and CONFIG_PHY_SMSC as in sc_sps_1.h ) . Also, can you check which of the two "ret = -EINVAL" is triggered in fec_send() ? You can add simple printf() alongside both of them.

...

I'm disconcerted because usually the timeouts occur after having successfully downloaded 22 or 24 MiB. Other times it just downloads completely.

In any case, good job!

Best regards,

Hector Palacios

Best regards, Marek Vasut

Hector Palacios

5:08 p.m.

Hi Marek,

On 07/12/2013 02:01 PM, Marek Vasut wrote:

...

Hi Hector,

...
Dear Marek,

On 07/12/2013 05:51 AM, Marek Vasut wrote:

...
Hi,

...
On Thu, Jul 11, 2013 at 8:18 PM, Fabio Estevam festevam@gmail.com wrote:

...
On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut marex@denx.de wrote:

...
The MX28 multi-layer AHB bus can be too slow and trigger the FEC DMA too early, before all the data hit the DRAM. This patch ensures the data are written in the RAM before the DMA starts. Please see the comment in the patch for full details.

This patch was produced with an amazing help from Albert Aribaud, who pointed out it can possibly be such a bus synchronisation issue.

Signed-off-by: Marek Vasut marex@denx.de Cc: Albert ARIBAUD albert.u.boot@aribaud.net Cc: Fabio Estevam fabio.estevam@freescale.com Cc: Stefano Babic sbabic@denx.de

Excellent, managed to transfer 90MB via TFTP on mx28evk without a single timeout.

Tested-by: Fabio Estevam fabio.estevam@freescale.com

It's working here too.

Tested-by: Alexandre Pereira da Silva aletes.xgr@gmail.com

Nice to hear, thank Albert for finding this.

Thanks for sharing.

Unfortunately I'm still seeing non-recoverable timeouts when doing tftp transfers. Nevertheless, with this patch sometimes I'm able to transfer big files (100MiB) without problems (I was never able before). So this is a big improvement. I applied this patch over a v2013.01, does it need any additional patch? I think I saw one email about flush dcache...

Try Stefano's tree as Fabio suggested. I think it's already pushed and includes the fixes.

I just tried, but it didn't help.

...

...
Considering the other guys seem to work without problems I guess this scenario is specific to my board. I'm using a Micrel KSZ8031RNLI at 50MHz. I always suspect from the PHY.

You can try using the PHYLIB (CONFIG_PHYLIB and CONFIG_PHY_SMSC as in sc_sps_1.h ) . Also, can you check which of the two "ret = -EINVAL" is triggered in fec_send() ? You can add simple printf() alongside both of them.

fec_send() *does not* ever fail, but I found something: It is very strange that the timeouts appear always after transferring between 20 and 24 MiB. So I thought maybe it was not an issue with the size of the file or the number of packets received, but instead a timed issue (an issue that happens after some period of time). I checked, and in fact the timeouts occur exactly 10 seconds after running the tftp command. I verified that this is what is happening by adding a udelay(100000) at fec_send(). In this case, the timeout also occurs after 10 seconds, but due to the delay, I have transferred only a few Kbytes. I tried to change different timeout related constants at tftp.c but still the issue happens after 10s. It's like if, after these 10 seconds, the PHY lost the link or something. Really odd. Does it tell you anything?

Best regards, -- Hector Palacios

Albert ARIBAUD

5:50 p.m.

Hi Hector,

On Fri, 12 Jul 2013 17:08:59 +0200, Hector Palacios hector.palacios@digi.com wrote:

...

Hi Marek,

On 07/12/2013 02:01 PM, Marek Vasut wrote:

...
Hi Hector,

...
Dear Marek,

On 07/12/2013 05:51 AM, Marek Vasut wrote:

...
Hi,

...
On Thu, Jul 11, 2013 at 8:18 PM, Fabio Estevam festevam@gmail.com wrote:

...
On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut marex@denx.de wrote: > The MX28 multi-layer AHB bus can be too slow and trigger the > FEC DMA too early, before all the data hit the DRAM. This patch > ensures the data are written in the RAM before the DMA starts. > Please see the comment in the patch for full details. > > This patch was produced with an amazing help from Albert Aribaud, > who pointed out it can possibly be such a bus synchronisation > issue. > > Signed-off-by: Marek Vasut marex@denx.de > Cc: Albert ARIBAUD albert.u.boot@aribaud.net > Cc: Fabio Estevam fabio.estevam@freescale.com > Cc: Stefano Babic sbabic@denx.de

Excellent, managed to transfer 90MB via TFTP on mx28evk without a single timeout.

Tested-by: Fabio Estevam fabio.estevam@freescale.com

It's working here too.

Tested-by: Alexandre Pereira da Silva aletes.xgr@gmail.com

Nice to hear, thank Albert for finding this.

Thanks for sharing.

Unfortunately I'm still seeing non-recoverable timeouts when doing tftp transfers. Nevertheless, with this patch sometimes I'm able to transfer big files (100MiB) without problems (I was never able before). So this is a big improvement. I applied this patch over a v2013.01, does it need any additional patch? I think I saw one email about flush dcache...

Try Stefano's tree as Fabio suggested. I think it's already pushed and includes the fixes.

I just tried, but it didn't help.

...
...
Considering the other guys seem to work without problems I guess this scenario is specific to my board. I'm using a Micrel KSZ8031RNLI at 50MHz. I always suspect from the PHY.

You can try using the PHYLIB (CONFIG_PHYLIB and CONFIG_PHY_SMSC as in sc_sps_1.h ) . Also, can you check which of the two "ret = -EINVAL" is triggered in fec_send() ? You can add simple printf() alongside both of them.

fec_send() *does not* ever fail, but I found something: It is very strange that the timeouts appear always after transferring between 20 and 24 MiB. So I thought maybe it was not an issue with the size of the file or the number of packets received, but instead a timed issue (an issue that happens after some period of time). I checked, and in fact the timeouts occur exactly 10 seconds after running the tftp command. I verified that this is what is happening by adding a udelay(100000) at fec_send(). In this case, the timeout also occurs after 10 seconds, but due to the delay, I have transferred only a few Kbytes. I tried to change different timeout related constants at tftp.c but still the issue happens after 10s. It's like if, after these 10 seconds, the PHY lost the link or something. Really odd. Does it tell you anything?

Well, such a round number makes me think of an application-layer time-out. Do you have control over how your TFTP server is configured?

...

Best regards,

Hector Palacios

Amicalement,

-- Albert.

Marek Vasut

6:48 p.m.

Hi Hector,

[...]

...

...
Try Stefano's tree as Fabio suggested. I think it's already pushed and includes the fixes.

I just tried, but it didn't help.

OK, then it's something else.

...

...
...
Considering the other guys seem to work without problems I guess this scenario is specific to my board. I'm using a Micrel KSZ8031RNLI at 50MHz. I always suspect from the PHY.

You can try using the PHYLIB (CONFIG_PHYLIB and CONFIG_PHY_SMSC as in sc_sps_1.h ) . Also, can you check which of the two "ret = -EINVAL" is triggered in fec_send() ? You can add simple printf() alongside both of them.

fec_send() *does not* ever fail

OK, it might be something else entirely. Let's take a look ...

...

but I found something: It is very strange that the timeouts appear always after transferring between 20 and 24 MiB. So I thought maybe it was not an issue with the size of the file or the number of packets received, but instead a timed issue (an issue that happens after some period of time). I checked, and in fact the timeouts occur exactly 10 seconds after running the tftp command. I verified that this is what is happening by adding a udelay(100000) at fec_send(). In this case, the timeout also occurs after 10 seconds, but due to the delay, I have transferred only a few Kbytes.

Holy moly!

...

I tried to change different timeout related constants at tftp.c but still the issue happens after 10s. It's like if, after these 10 seconds, the PHY lost the link or something. Really odd. Does it tell you anything?

LAN8720 phy, right? Try implementing something like [1], by clearing the EDPWRDOWN bit , the PHY will never enter low-power mode. It's just a simple PHY register RMW which you can stick somewhere into the PHY net/phy/smsc.c code.

[1] https://kernel.googlesource.com/pub/scm/linux/kernel/git/djbw/dmaengine/+/b6...

Best regards, Marek Vasut

Hector Palacios

15 Jul 15 Jul

10:58 a.m.

Hi Marek,

On 07/12/2013 06:48 PM, Marek Vasut wrote:

...

...
[...]

but I found something: It is very strange that the timeouts appear always after transferring between 20 and 24 MiB. So I thought maybe it was not an issue with the size of the file or the number of packets received, but instead a timed issue (an issue that happens after some period of time). I checked, and in fact the timeouts occur exactly 10 seconds after running the tftp command. I verified that this is what is happening by adding a udelay(100000) at fec_send(). In this case, the timeout also occurs after 10 seconds, but due to the delay, I have transferred only a few Kbytes.

Holy moly!

...
I tried to change different timeout related constants at tftp.c but still the issue happens after 10s. It's like if, after these 10 seconds, the PHY lost the link or something. Really odd. Does it tell you anything?

LAN8720 phy, right? Try implementing something like [1], by clearing the EDPWRDOWN bit , the PHY will never enter low-power mode. It's just a simple PHY register RMW which you can stick somewhere into the PHY net/phy/smsc.c code.

[1] https://kernel.googlesource.com/pub/scm/linux/kernel/git/djbw/dmaengine/+/b6...

No, my PHY is a Micrel KSZ8031RNLI.

The hint about the PHY possibly going to power down mode is interesting but I checked the PHY registers and EDPD mode (Energy Detect Power Down) is off, at least before running the tftp command. Power Down mode is off too, so unless these are somehow enabled during the TFTP process, this is not what's happening.

The sniffer shows that the TFTP server simply stops sending data packets. I can see however the target sending several times the ACK packet to the last received data packet. This would point to the TFTP server (as Albert suggested), but the fact is the problem occurs with different TFTP servers (I tried three different servers) and it does not happen with an old v2009 U-Boot using the same target.

Many times, though not always, after the timeout occurs, I cancel with CTRL+C and run the tftp command again, and then the file downloads completely. The PHY is my usual suspect...

Best regards, -- Hector Palacios

Marek Vasut

2:30 p.m.

Dear Hector Palacios,

...

Hi Marek,

On 07/12/2013 06:48 PM, Marek Vasut wrote:

...
...
[...]

but I found something: It is very strange that the timeouts appear always after transferring between 20 and 24 MiB. So I thought maybe it was not an issue with the size of the file or the number of packets received, but instead a timed issue (an issue that happens after some period of time). I checked, and in fact the timeouts occur exactly 10 seconds after running the tftp command. I verified that this is what is happening by adding a udelay(100000) at fec_send(). In this case, the timeout also occurs after 10 seconds, but due to the delay, I have transferred only a few Kbytes.

Holy moly!

...
I tried to change different timeout related constants at tftp.c but still the issue happens after 10s. It's like if, after these 10 seconds, the PHY lost the link or something. Really odd. Does it tell you anything?

LAN8720 phy, right? Try implementing something like [1], by clearing the EDPWRDOWN bit , the PHY will never enter low-power mode. It's just a simple PHY register RMW which you can stick somewhere into the PHY net/phy/smsc.c code.

[1] https://kernel.googlesource.com/pub/scm/linux/kernel/git/djbw/dmaengine/+ /b629820d18fa65cc598390e4b9712fd5f83ee693%5E!/#F0

No, my PHY is a Micrel KSZ8031RNLI.

The hint about the PHY possibly going to power down mode is interesting but I checked the PHY registers and EDPD mode (Energy Detect Power Down) is off, at least before running the tftp command. Power Down mode is off too, so unless these are somehow enabled during the TFTP process, this is not what's happening.

OK, makes sense.

...

The sniffer shows that the TFTP server simply stops sending data packets. I can see however the target sending several times the ACK packet to the last received data packet. This would point to the TFTP server (as Albert suggested), but the fact is the problem occurs with different TFTP servers (I tried three different servers) and it does not happen with an old v2009 U-Boot using the same target.

Can you try running "dcache off" command before running the TFTP transfer? Does it still behave like this?

You might need to define #define CONFIG_CMD_CACHE for this to work.

...

Many times, though not always, after the timeout occurs, I cancel with CTRL+C and run the tftp command again, and then the file downloads completely. The PHY is my usual suspect...

Dang :-(

Best regards, Marek Vasut

Hector Palacios

5:09 p.m.

Hi Marek,

On 07/15/2013 02:30 PM, Marek Vasut wrote:

...

Dear Hector Palacios,

...
Hi Marek,

On 07/12/2013 06:48 PM, Marek Vasut wrote:

...
...
[...]

but I found something: It is very strange that the timeouts appear always after transferring between 20 and 24 MiB. So I thought maybe it was not an issue with the size of the file or the number of packets received, but instead a timed issue (an issue that happens after some period of time). I checked, and in fact the timeouts occur exactly 10 seconds after running the tftp command. I verified that this is what is happening by adding a udelay(100000) at fec_send(). In this case, the timeout also occurs after 10 seconds, but due to the delay, I have transferred only a few Kbytes.

Holy moly!

...
I tried to change different timeout related constants at tftp.c but still the issue happens after 10s. It's like if, after these 10 seconds, the PHY lost the link or something. Really odd. Does it tell you anything?

LAN8720 phy, right? Try implementing something like [1], by clearing the EDPWRDOWN bit , the PHY will never enter low-power mode. It's just a simple PHY register RMW which you can stick somewhere into the PHY net/phy/smsc.c code.

[1] https://kernel.googlesource.com/pub/scm/linux/kernel/git/djbw/dmaengine/+ /b629820d18fa65cc598390e4b9712fd5f83ee693%5E!/#F0

No, my PHY is a Micrel KSZ8031RNLI.

The hint about the PHY possibly going to power down mode is interesting but I checked the PHY registers and EDPD mode (Energy Detect Power Down) is off, at least before running the tftp command. Power Down mode is off too, so unless these are somehow enabled during the TFTP process, this is not what's happening.

OK, makes sense.

...
The sniffer shows that the TFTP server simply stops sending data packets. I can see however the target sending several times the ACK packet to the last received data packet. This would point to the TFTP server (as Albert suggested), but the fact is the problem occurs with different TFTP servers (I tried three different servers) and it does not happen with an old v2009 U-Boot using the same target.

Can you try running "dcache off" command before running the TFTP transfer? Does it still behave like this?

You might need to define #define CONFIG_CMD_CACHE for this to work.

Sourcery! It's not that it works with dcache off, I found something even more strange: The way I reproduce this issue is by setting the 'bootcmd' to 'dboot ${loadaddr} file100M'. When you set the 'bootcmd' like this:

setenv bootcmd tftp ${loadaddr} file100M

this eventually expands to

bootcmd=tftp 0x42000000 file100M

So this is the command that runs automatically after the bootdelay. I just discovered that if instead of letting the auto boot run, I press a key to stop the auto boot and run the command by hand (either running 'boot' or typing the command 'tftp 0x42000000 file100M'), the tftp transfer works perfectly. On the other hand, if I do the same but use the environment variable ${loadaddr}, i.e. 'tftp ${loadaddr} file100M'. It will stop after 10 seconds.

And to make things funnier I just reproduced this issue on the MX28EVK using the imx U-Boot custodian tree at commit a3f170cdbf7ae0bd24c94c2f46725699bbd69f05. That discards being a platform issue. @Fabio: could you manually run the command 'tftp ${loadaddr} file100M' in your EVK? If it doesn't fail, could you try running it again after playing with the environment (setting/printing some variables). As I said, this issue appeared with different TFTP servers and is independent of whether the dcache is or not enabled.

Best regards, -- Hector Palacios

Marek Vasut

5:12 p.m.

Hi Hector,

...

Hi Marek,

On 07/15/2013 02:30 PM, Marek Vasut wrote:

...
Dear Hector Palacios,

...
Hi Marek,

On 07/12/2013 06:48 PM, Marek Vasut wrote:

...
...
[...]

but I found something: It is very strange that the timeouts appear always after transferring between 20 and 24 MiB. So I thought maybe it was not an issue with the size of the file or the number of packets received, but instead a timed issue (an issue that happens after some period of time). I checked, and in fact the timeouts occur exactly 10 seconds after running the tftp command. I verified that this is what is happening by adding a udelay(100000) at fec_send(). In this case, the timeout also occurs after 10 seconds, but due to the delay, I have transferred only a few Kbytes.

Holy moly!

...
I tried to change different timeout related constants at tftp.c but still the issue happens after 10s. It's like if, after these 10 seconds, the PHY lost the link or something. Really odd. Does it tell you anything?

LAN8720 phy, right? Try implementing something like [1], by clearing the EDPWRDOWN bit , the PHY will never enter low-power mode. It's just a simple PHY register RMW which you can stick somewhere into the PHY net/phy/smsc.c code.

[1] https://kernel.googlesource.com/pub/scm/linux/kernel/git/djbw/dmaengine /+ /b629820d18fa65cc598390e4b9712fd5f83ee693%5E!/#F0

No, my PHY is a Micrel KSZ8031RNLI.

The hint about the PHY possibly going to power down mode is interesting but I checked the PHY registers and EDPD mode (Energy Detect Power Down) is off, at least before running the tftp command. Power Down mode is off too, so unless these are somehow enabled during the TFTP process, this is not what's happening.

OK, makes sense.

...
The sniffer shows that the TFTP server simply stops sending data packets. I can see however the target sending several times the ACK packet to the last received data packet. This would point to the TFTP server (as Albert suggested), but the fact is the problem occurs with different TFTP servers (I tried three different servers) and it does not happen with an old v2009 U-Boot using the same target.

Can you try running "dcache off" command before running the TFTP transfer? Does it still behave like this?

You might need to define #define CONFIG_CMD_CACHE for this to work.

Sourcery! It's not that it works with dcache off, I found something even more strange: The way I reproduce this issue is by setting the 'bootcmd' to 'dboot ${loadaddr} file100M'. When you set the 'bootcmd' like this:

setenv bootcmd tftp ${loadaddr} file100M

this eventually expands to

bootcmd=tftp 0x42000000 file100M

So this is the command that runs automatically after the bootdelay. I just discovered that if instead of letting the auto boot run, I press a key to stop the auto boot and run the command by hand (either running 'boot' or typing the command 'tftp 0x42000000 file100M'), the tftp transfer works perfectly. On the other hand, if I do the same but use the environment variable ${loadaddr}, i.e. 'tftp ${loadaddr} file100M'. It will stop after 10 seconds.

Count it be your hardware needs some more delay to stabilize?

...

And to make things funnier I just reproduced this issue on the MX28EVK using the imx U-Boot custodian tree at commit a3f170cdbf7ae0bd24c94c2f46725699bbd69f05. That discards being a platform issue.

It still might be a PHY issue, no?

...

@Fabio: could you manually run the command 'tftp ${loadaddr} file100M'

I guess that'd be a 100MB file?

...

in your EVK? If it doesn't fail, could you try running it again after playing with the environment (setting/printing some variables). As I said, this issue appeared with different TFTP servers and is independent of whether the dcache is or not enabled.

Best regards,

Hector Palacios

Best regards, Marek Vasut

Hector Palacios

5:24 p.m.

On 07/15/2013 05:12 PM, Marek Vasut wrote:

...

Hi Hector,

...
Hi Marek,

On 07/15/2013 02:30 PM, Marek Vasut wrote:

...
Dear Hector Palacios,

...
Hi Marek,

On 07/12/2013 06:48 PM, Marek Vasut wrote:

...
...
[...]

but I found something: It is very strange that the timeouts appear always after transferring between 20 and 24 MiB. So I thought maybe it was not an issue with the size of the file or the number of packets received, but instead a timed issue (an issue that happens after some period of time). I checked, and in fact the timeouts occur exactly 10 seconds after running the tftp command. I verified that this is what is happening by adding a udelay(100000) at fec_send(). In this case, the timeout also occurs after 10 seconds, but due to the delay, I have transferred only a few Kbytes.

Holy moly!

...
I tried to change different timeout related constants at tftp.c but still the issue happens after 10s. It's like if, after these 10 seconds, the PHY lost the link or something. Really odd. Does it tell you anything?

LAN8720 phy, right? Try implementing something like [1], by clearing the EDPWRDOWN bit , the PHY will never enter low-power mode. It's just a simple PHY register RMW which you can stick somewhere into the PHY net/phy/smsc.c code.

[1] https://kernel.googlesource.com/pub/scm/linux/kernel/git/djbw/dmaengine /+ /b629820d18fa65cc598390e4b9712fd5f83ee693%5E!/#F0

No, my PHY is a Micrel KSZ8031RNLI.

The hint about the PHY possibly going to power down mode is interesting but I checked the PHY registers and EDPD mode (Energy Detect Power Down) is off, at least before running the tftp command. Power Down mode is off too, so unless these are somehow enabled during the TFTP process, this is not what's happening.

OK, makes sense.

...
The sniffer shows that the TFTP server simply stops sending data packets. I can see however the target sending several times the ACK packet to the last received data packet. This would point to the TFTP server (as Albert suggested), but the fact is the problem occurs with different TFTP servers (I tried three different servers) and it does not happen with an old v2009 U-Boot using the same target.

Can you try running "dcache off" command before running the TFTP transfer? Does it still behave like this?

You might need to define #define CONFIG_CMD_CACHE for this to work.

Sourcery! It's not that it works with dcache off, I found something even more strange: The way I reproduce this issue is by setting the 'bootcmd' to 'dboot ${loadaddr} file100M'. When you set the 'bootcmd' like this:

setenv bootcmd tftp ${loadaddr} file100M

this eventually expands to

bootcmd=tftp 0x42000000 file100M

So this is the command that runs automatically after the bootdelay. I just discovered that if instead of letting the auto boot run, I press a key to stop the auto boot and run the command by hand (either running 'boot' or typing the command 'tftp 0x42000000 file100M'), the tftp transfer works perfectly. On the other hand, if I do the same but use the environment variable ${loadaddr}, i.e. 'tftp ${loadaddr} file100M'. It will stop after 10 seconds.

Count it be your hardware needs some more delay to stabilize?

It's not a problem of running the command later. If I run the command later, manually, but use ${loadaddr} instead of the hardcoded value 0x42000000, then it will fail. It's like if accessing the environment for grabbing the value of ${loadaddr} or for grabbing the value of ${bootcmd}, made the problem appear, while if I run a manual command that does not require accesing the environment, it works.

...

...
And to make things funnier I just reproduced this issue on the MX28EVK using the imx U-Boot custodian tree at commit a3f170cdbf7ae0bd24c94c2f46725699bbd69f05. That discards being a platform issue.

It still might be a PHY issue, no?

I guess it might, but the MX28EVK uses a different PHY. An SMSC, I believe.

...

...
@Fabio: could you manually run the command 'tftp ${loadaddr} file100M'

I guess that'd be a 100MB file?

Yes. The size is not that important as far as it takes more than 10 seconds to download. The keypoint here is using the environment variable, rather than an hex value.

Best regards, -- Hector Palacios

Fabio Estevam

16 Jul 16 Jul

5:51 a.m.

On Mon, Jul 15, 2013 at 12:09 PM, Hector Palacios hector.palacios@digi.com wrote:

...

@Fabio: could you manually run the command 'tftp ${loadaddr} file100M' in your EVK?

Yes, this is what I have been running since the beginning.

...

If it doesn't fail, could you try running it again after playing with the environment (setting/printing some variables).

I can't reproduce the problem here.

...

As I said, this issue appeared with different TFTP servers and is independent of whether the dcache is or not enabled.

Can you transfer from a PC to another PC via TFTP?

Fabio Estevam

6:18 a.m.

On Tue, Jul 16, 2013 at 12:51 AM, Fabio Estevam festevam@gmail.com wrote:

...

On Mon, Jul 15, 2013 at 12:09 PM, Hector Palacios hector.palacios@digi.com wrote:

...
@Fabio: could you manually run the command 'tftp ${loadaddr} file100M' in your EVK?

Yes, this is what I have been running since the beginning.

...
If it doesn't fail, could you try running it again after playing with the environment (setting/printing some variables).

I can't reproduce the problem here.

...
As I said, this issue appeared with different TFTP servers and is independent of whether the dcache is or not enabled.

Can you transfer from a PC to another PC via TFTP?

This is my tftp configuration for your reference:

$ cat /etc/xinetd.d/tftp service tftp { protocol = udp port = 69 socket_type = dgram wait = yes user = nobody server = /usr/sbin/in.tftpd server_args = /tftpboot disable = no }

Marek Vasut

6:44 a.m.

Dear Fabio Estevam,

...

On Tue, Jul 16, 2013 at 12:51 AM, Fabio Estevam festevam@gmail.com wrote:

...
On Mon, Jul 15, 2013 at 12:09 PM, Hector Palacios

hector.palacios@digi.com wrote:

...
@Fabio: could you manually run the command 'tftp ${loadaddr} file100M' in your EVK?

Yes, this is what I have been running since the beginning.

...
If it doesn't fail, could you try running it again after playing with the environment (setting/printing some variables).

I can't reproduce the problem here.

...
As I said, this issue appeared with different TFTP servers and is independent of whether the dcache is or not enabled.

Can you transfer from a PC to another PC via TFTP?

This is my tftp configuration for your reference:

$ cat /etc/xinetd.d/tftp service tftp { protocol = udp port = 69 socket_type = dgram wait = yes user = nobody server = /usr/sbin/in.tftpd server_args = /tftpboot disable = no }

Another thing of interest would be a 'tcpdump' pcap capture of that connection.

Best regards, Marek Vasut

Hector Palacios

17 Jul 17 Jul

5:55 p.m.

Dear Marek,

On 07/16/2013 06:44 AM, Marek Vasut wrote:

...

Dear Fabio Estevam,

...
On Tue, Jul 16, 2013 at 12:51 AM, Fabio Estevam festevam@gmail.com wrote:

...
On Mon, Jul 15, 2013 at 12:09 PM, Hector Palacios

hector.palacios@digi.com wrote:

...
@Fabio: could you manually run the command 'tftp ${loadaddr} file100M' in your EVK?

Yes, this is what I have been running since the beginning.

...
If it doesn't fail, could you try running it again after playing with the environment (setting/printing some variables).

I can't reproduce the problem here.

...
As I said, this issue appeared with different TFTP servers and is independent of whether the dcache is or not enabled.

Can you transfer from a PC to another PC via TFTP?

Yes I can.

...

Another thing of interest would be a 'tcpdump' pcap capture of that connection.

I was initially filtering out only TFTP packets of my wireshark trace and all looked correct. After taking a second look to the full trace I see now a hint. Around 7 seconds after starting the TFTP transfer the server is sending an ARP to the target asking for the owner of the target's IP. The target is receiving this ARP and apparently responding (at least this is what my debug code shows as it gets into arp.c:ArpReceive(), case ARPOP_REQUEST and sending a packet), but this ARP reply from the target is not reaching the network. My sniffer does not capture this reply.

The server resends the ARP request twice more (seconds 8 and 9) to the target and since it doesn't get a reply then sends a broadcast ARP (seconds 10) asking who has that IP. Since nobody responds it stops sending data.

The times that it works (and I don't know the magic behind using a numeric address versus using ${loadaddr} when they have the same value), the ARP replies do reach the network and the server continues the transmission normally.

Using a v2009 U-Boot, the behaviour is exactly the same, but the target's ARP replies always reach the network, and the transfers always succeed.

Since Fabio cannot reproduce it I guess it must be a local ghost. :o(

Best regards, -- Hector Palacios

Marek Vasut

18 Jul 18 Jul

6:12 a.m.

Dear Hector Palacios,

...

Dear Marek,

On 07/16/2013 06:44 AM, Marek Vasut wrote:

...
Dear Fabio Estevam,

...
On Tue, Jul 16, 2013 at 12:51 AM, Fabio Estevam festevam@gmail.com wrote:

...
On Mon, Jul 15, 2013 at 12:09 PM, Hector Palacios

hector.palacios@digi.com wrote:

...
@Fabio: could you manually run the command 'tftp ${loadaddr} file100M' in your EVK?

Yes, this is what I have been running since the beginning.

...
If it doesn't fail, could you try running it again after playing with the environment (setting/printing some variables).

I can't reproduce the problem here.

...
As I said, this issue appeared with different TFTP servers and is independent of whether the dcache is or not enabled.

Can you transfer from a PC to another PC via TFTP?

Yes I can.

...
Another thing of interest would be a 'tcpdump' pcap capture of that connection.

I was initially filtering out only TFTP packets of my wireshark trace and all looked correct. After taking a second look to the full trace I see now a hint. Around 7 seconds after starting the TFTP transfer the server is sending an ARP to the target asking for the owner of the target's IP. The target is receiving this ARP and apparently responding (at least this is what my debug code shows as it gets into arp.c:ArpReceive(), case ARPOP_REQUEST and sending a packet), but this ARP reply from the target is not reaching the network. My sniffer does not capture this reply.

The server resends the ARP request twice more (seconds 8 and 9) to the target and since it doesn't get a reply then sends a broadcast ARP (seconds 10) asking who has that IP. Since nobody responds it stops sending data.

The times that it works (and I don't know the magic behind using a numeric address versus using ${loadaddr} when they have the same value), the ARP replies do reach the network and the server continues the transmission normally.

Using a v2009 U-Boot, the behaviour is exactly the same, but the target's ARP replies always reach the network, and the transfers always succeed.

Since Fabio cannot reproduce it I guess it must be a local ghost. :o(

Try a 1:1 connection with a direct cable. Also, try multiple cables ;-)

Best regards, Marek Vasut

Hector Palacios

12 Sep 12 Sep

12:22 p.m.

Hello,

Going back to this old thread I have some news regarding the problem with TFTP transmissions blocking (timed out) after 10 seconds on the FEC of the MX28. See below:

On 07/17/2013 05:55 PM, Hector Palacios wrote:

...

Dear Marek,

On 07/16/2013 06:44 AM, Marek Vasut wrote:

...
Dear Fabio Estevam,

...
On Tue, Jul 16, 2013 at 12:51 AM, Fabio Estevam festevam@gmail.com wrote:

...
On Mon, Jul 15, 2013 at 12:09 PM, Hector Palacios

hector.palacios@digi.com wrote:

...
@Fabio: could you manually run the command 'tftp ${loadaddr} file100M' in your EVK?

Yes, this is what I have been running since the beginning.

...
If it doesn't fail, could you try running it again after playing with the environment (setting/printing some variables).

I can't reproduce the problem here.

...
As I said, this issue appeared with different TFTP servers and is independent of whether the dcache is or not enabled.

Can you transfer from a PC to another PC via TFTP?

Yes I can.

...
Another thing of interest would be a 'tcpdump' pcap capture of that connection.

I was initially filtering out only TFTP packets of my wireshark trace and all looked correct. After taking a second look to the full trace I see now a hint. Around 7 seconds after starting the TFTP transfer the server is sending an ARP to the target asking for the owner of the target's IP. The target is receiving this ARP and apparently responding (at least this is what my debug code shows as it gets into arp.c:ArpReceive(), case ARPOP_REQUEST and sending a packet), but this ARP reply from the target is not reaching the network. My sniffer does not capture this reply.

The server resends the ARP request twice more (seconds 8 and 9) to the target and since it doesn't get a reply then sends a broadcast ARP (seconds 10) asking who has that IP. Since nobody responds it stops sending data.

The times that it works (and I don't know the magic behind using a numeric address versus using ${loadaddr} when they have the same value), the ARP replies do reach the network and the server continues the transmission normally.

Using a v2009 U-Boot, the behaviour is exactly the same, but the target's ARP replies always reach the network, and the transfers always succeed.

Since Fabio cannot reproduce it I guess it must be a local ghost. :o(

We tracked down the issue to an ARP request from the server that was never answered by the target. We later noticed that the problem did not happen anymore when building U-Boot with a different toolchain and that the issue seemed to be in the alignment of the RX buffer in the stack, which old GCC compilers seem to do wrong.

Here is a patch:

From: Robert Hodaszi robert.hodaszi@digi.com Date: Fri, 6 Sep 2013 09:50:52 +0200 Subject: [PATCH] net: fec: fix invalid temporary RX buffer alignment because of GCC bug

Older GCC versions don't handle well alignment on stack variables. The temporary RX buffer is a local variable, so it is on the stack. Because the FEC driver is using DMA for transmission, receive and transmit buffers should be aligned on 64 byte. The transmit buffers are not allocated in the driver internally, it sends the packets directly as it gets them. So these packets should be aligned. When the ARP layer wants to reply to an ARP request, it uses the FEC driver's temporary RX buffer (used to pass data to the ARP layer) to store the new packet, and pass it back to the FEC driver's send function. Because of a GCC bug, this buffer is not aligned well, and when the driver tries to send it, it first rounds the address down to the alignment boundary. That causes invalid data.

To fix it, don't put the temporary onto the stack.

Signed-off-by: Robert Hodaszi robert.hodaszi@digi.com Signed-off-by: Hector Palacios hector.palacios@digi.com --- drivers/net/fec_mxc.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c index f4f72b7..315017e 100644 --- a/drivers/net/fec_mxc.c +++ b/drivers/net/fec_mxc.c @@ -828,7 +828,10 @@ static int fec_recv(struct eth_device *dev) uint16_t bd_status; uint32_t addr, size, end; int i; - uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN); + /* Don't place this variable on the stack, because older GCC version + * doesn't handle alignement on stack well. + */ + static uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);

/* * Check if any critical events have happened

Marek Vasut

12:50 p.m.

Dear Hector Palacios,

...

Hello,

Going back to this old thread I have some news regarding the problem with TFTP transmissions blocking (timed out) after 10 seconds on the FEC of the MX28. See below:

On 07/17/2013 05:55 PM, Hector Palacios wrote:

...
Dear Marek,

On 07/16/2013 06:44 AM, Marek Vasut wrote:

...
Dear Fabio Estevam,

...
On Tue, Jul 16, 2013 at 12:51 AM, Fabio Estevam festevam@gmail.com

wrote:

...

...
...
...
...
On Mon, Jul 15, 2013 at 12:09 PM, Hector Palacios

hector.palacios@digi.com wrote:

...
@Fabio: could you manually run the command 'tftp ${loadaddr} file100M' in your EVK?

Yes, this is what I have been running since the beginning.

...
If it doesn't fail, could you try running it again after playing with the environment (setting/printing some variables).

I can't reproduce the problem here.

...
As I said, this issue appeared with different TFTP servers and is independent of whether the dcache is or not enabled.

Can you transfer from a PC to another PC via TFTP?

Yes I can.

...
Another thing of interest would be a 'tcpdump' pcap capture of that connection.

I was initially filtering out only TFTP packets of my wireshark trace and all looked correct. After taking a second look to the full trace I see now a hint. Around 7 seconds after starting the TFTP transfer the server is sending an ARP to the target asking for the owner of the target's IP. The target is receiving this ARP and apparently responding (at least this is what my debug code shows as it gets into arp.c:ArpReceive(), case ARPOP_REQUEST and sending a packet), but this ARP reply from the target is not reaching the network. My sniffer does not capture this reply.

The server resends the ARP request twice more (seconds 8 and 9) to the target and since it doesn't get a reply then sends a broadcast ARP (seconds 10) asking who has that IP. Since nobody responds it stops sending data.

The times that it works (and I don't know the magic behind using a numeric address versus using ${loadaddr} when they have the same value), the ARP replies do reach the network and the server continues the transmission normally.

Using a v2009 U-Boot, the behaviour is exactly the same, but the target's ARP replies always reach the network, and the transfers always succeed.

Since Fabio cannot reproduce it I guess it must be a local ghost. :o(

We tracked down the issue to an ARP request from the server that was never answered by the target. We later noticed that the problem did not happen anymore when building U-Boot with a different toolchain and that the issue seemed to be in the alignment of the RX buffer in the stack, which old GCC compilers seem to do wrong.

Here is a patch:

From: Robert Hodaszi robert.hodaszi@digi.com Date: Fri, 6 Sep 2013 09:50:52 +0200 Subject: [PATCH] net: fec: fix invalid temporary RX buffer alignment because of GCC bug

Older GCC versions don't handle well alignment on stack variables. The temporary RX buffer is a local variable, so it is on the stack. Because the FEC driver is using DMA for transmission, receive and transmit buffers should be aligned on 64 byte. The transmit buffers are not allocated in the driver internally, it sends the packets directly as it gets them. So these packets should be aligned. When the ARP layer wants to reply to an ARP request, it uses the FEC driver's temporary RX buffer (used to pass data to the ARP layer) to store the new packet, and pass it back to the FEC driver's send function. Because of a GCC bug

Can you point to this GCC bug in the GCC bugzilla or something?

...

this buffer is not aligned well, and when the driver tries to send it, it first rounds the address down to the alignment boundary. That causes invalid data.

To fix it, don't put the temporary onto the stack.

Signed-off-by: Robert Hodaszi robert.hodaszi@digi.com Signed-off-by: Hector Palacios hector.palacios@digi.com

drivers/net/fec_mxc.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c index f4f72b7..315017e 100644 --- a/drivers/net/fec_mxc.c +++ b/drivers/net/fec_mxc.c @@ -828,7 +828,10 @@ static int fec_recv(struct eth_device *dev) uint16_t bd_status; uint32_t addr, size, end; int i;
  uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);
  /* Don't place this variable on the stack, because older GCC
version + * doesn't handle alignement on stack well.
   */
  static uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);

The buffer might as well be allocated using ALLOC_CACHE_ALIGN_BUFFER() from include/common.h . Still, are you _really_ sure the buffer is unaligned ? Do you have a testcase maybe ?

btw. I am able to replicate this issue sometimes even using GCC 4.8.0 .

Best regards, Marek Vasut

Robert Hodaszi

1:02 p.m.

Hi,

Sorry, hopefully that will be a plain-text.

There are a lot of bug announcement, just make a search: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33721 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16660

Also, I printed out the buffer addresses, and that temporary RX buffer was not aligned. So the transmit function rounded it down to the alignment boundary, and so caused invalid data transmission. (By the way. Shouldn't the transmit function check whether the alignment is proper, and throw an error message, instead of round it down? That would make more sense.)

Best regards, Robert Hodaszi

On 2013-09-12 12:50, Marek Vasut wrote:

...

Dear Hector Palacios,

...
Hello,

Going back to this old thread I have some news regarding the problem with TFTP transmissions blocking (timed out) after 10 seconds on the FEC of the MX28. See below:

On 07/17/2013 05:55 PM, Hector Palacios wrote:

...
Dear Marek,

On 07/16/2013 06:44 AM, Marek Vasut wrote:

...
Dear Fabio Estevam,

...
On Tue, Jul 16, 2013 at 12:51 AM, Fabio Estevamfestevam@gmail.com

wrote:

...
...
...
...
...
On Mon, Jul 15, 2013 at 12:09 PM, Hector Palacios

hector.palacios@digi.com wrote: > @Fabio: could you manually run the command 'tftp ${loadaddr} > file100M' in your EVK? Yes, this is what I have been running since the beginning.

> If it doesn't fail, could you try running it again after playing with > the environment (setting/printing some variables). I can't reproduce the problem here.

> As I said, this issue appeared with different TFTP servers and is > independent of whether the dcache is or not enabled. Can you transfer from a PC to another PC via TFTP?

Yes I can.

...
Another thing of interest would be a 'tcpdump' pcap capture of that connection.

I was initially filtering out only TFTP packets of my wireshark trace and all looked correct. After taking a second look to the full trace I see now a hint. Around 7 seconds after starting the TFTP transfer the server is sending an ARP to the target asking for the owner of the target's IP. The target is receiving this ARP and apparently responding (at least this is what my debug code shows as it gets into arp.c:ArpReceive(), case ARPOP_REQUEST and sending a packet), but this ARP reply from the target is not reaching the network. My sniffer does not capture this reply.

The server resends the ARP request twice more (seconds 8 and 9) to the target and since it doesn't get a reply then sends a broadcast ARP (seconds 10) asking who has that IP. Since nobody responds it stops sending data.

The times that it works (and I don't know the magic behind using a numeric address versus using ${loadaddr} when they have the same value), the ARP replies do reach the network and the server continues the transmission normally.

Using a v2009 U-Boot, the behaviour is exactly the same, but the target's ARP replies always reach the network, and the transfers always succeed.

Since Fabio cannot reproduce it I guess it must be a local ghost. :o(

We tracked down the issue to an ARP request from the server that was never answered by the target. We later noticed that the problem did not happen anymore when building U-Boot with a different toolchain and that the issue seemed to be in the alignment of the RX buffer in the stack, which old GCC compilers seem to do wrong.

Here is a patch:

From: Robert Hodaszirobert.hodaszi@digi.com Date: Fri, 6 Sep 2013 09:50:52 +0200 Subject: [PATCH] net: fec: fix invalid temporary RX buffer alignment because of GCC bug

Older GCC versions don't handle well alignment on stack variables. The temporary RX buffer is a local variable, so it is on the stack. Because the FEC driver is using DMA for transmission, receive and transmit buffers should be aligned on 64 byte. The transmit buffers are not allocated in the driver internally, it sends the packets directly as it gets them. So these packets should be aligned. When the ARP layer wants to reply to an ARP request, it uses the FEC driver's temporary RX buffer (used to pass data to the ARP layer) to store the new packet, and pass it back to the FEC driver's send function. Because of a GCC bug

Can you point to this GCC bug in the GCC bugzilla or something?

...
this buffer is not aligned well, and when the driver tries to send it, it first rounds the address down to the alignment boundary. That causes invalid data.

To fix it, don't put the temporary onto the stack.

Signed-off-by: Robert Hodaszirobert.hodaszi@digi.com Signed-off-by: Hector Palacioshector.palacios@digi.com

drivers/net/fec_mxc.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c index f4f72b7..315017e 100644 --- a/drivers/net/fec_mxc.c +++ b/drivers/net/fec_mxc.c @@ -828,7 +828,10 @@ static int fec_recv(struct eth_device *dev) uint16_t bd_status; uint32_t addr, size, end; int i;
  uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);
  /* Don't place this variable on the stack, because older GCC
version + * doesn't handle alignement on stack well.
   */
  static uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);
The buffer might as well be allocated using ALLOC_CACHE_ALIGN_BUFFER() from include/common.h . Still, are you _really_ sure the buffer is unaligned ? Do you have a testcase maybe ?

btw. I am able to replicate this issue sometimes even using GCC 4.8.0 .

Best regards, Marek Vasut

Marek Vasut

4:05 p.m.

Dear Robert Hodaszi,

...

Hi,

Sorry, hopefully that will be a plain-text.

There are a lot of bug announcement, just make a search: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33721

This was apparently fixed three years ago.

...

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16660

And this one six years ago ...

...

Also, I printed out the buffer addresses, and that temporary RX buffer was not aligned. So the transmit function rounded it down to the alignment boundary, and so caused invalid data transmission. (By the way. Shouldn't the transmit function check whether the alignment is proper, and throw an error message, instead of round it down? That would make more sense.)

Looking at the code one more time, it'd make most sense to simply allocate the buffer NOT on stack, but with some memalign-kind-of call to avoid this abuse of stack. You see, the max packet size is around 2k, which is quite a lot. How does this proposal sound to you ?

Best regards, Marek Vasut

Robert Hodaszi

4:15 p.m.

I just brought up two samples, that it was a long-term problem. Now it is fixed. But if somebody is trying to compile the U-Boot with an older toolchain, it could cause problems. I had problems with 4.4.6.

Yes, 2k is a lot, and I would not put it on the stack. That was just a quick fix. It would be nicer to allocate the memory with malloc, and should do that at initialization time.

Best regards, Robert Hodaszi

On 2013-09-12 16:05, Marek Vasut wrote:

...

Dear Robert Hodaszi,

...
Hi,

Sorry, hopefully that will be a plain-text.

There are a lot of bug announcement, just make a search: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33721

This was apparently fixed three years ago.

...
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16660

And this one six years ago ...

...
Also, I printed out the buffer addresses, and that temporary RX buffer was not aligned. So the transmit function rounded it down to the alignment boundary, and so caused invalid data transmission. (By the way. Shouldn't the transmit function check whether the alignment is proper, and throw an error message, instead of round it down? That would make more sense.)

Looking at the code one more time, it'd make most sense to simply allocate the buffer NOT on stack, but with some memalign-kind-of call to avoid this abuse of stack. You see, the max packet size is around 2k, which is quite a lot. How does this proposal sound to you ?

Best regards, Marek Vasut

Marek Vasut

4:31 p.m.

Dear Robert Hodaszi,

...

I just brought up two samples, that it was a long-term problem. Now it is fixed. But if somebody is trying to compile the U-Boot with an older toolchain, it could cause problems. I had problems with 4.4.6.

Yes, 2k is a lot, and I would not put it on the stack. That was just a quick fix. It would be nicer to allocate the memory with malloc, and should do that at initialization time.

Please do not top-post.

Memalign should do here with proper rounding. Can you prepare another patch please?

Best regards, Marek Vasut

Robert Hodaszi

4:32 p.m.

On 2013-09-12 16:31, Marek Vasut wrote:

...

Dear Robert Hodaszi,

Please do not top-post.

Memalign should do here with proper rounding. Can you prepare another patch please?

Best regards, Marek Vasut

Ok. I will try to do that tomorrow.

Best regards, Robert Hodaszi

Marek Vasut

5:06 p.m.

Dear Robert Hodaszi,

...

On 2013-09-12 16:31, Marek Vasut wrote:

...
Dear Robert Hodaszi,

Please do not top-post.

Memalign should do here with proper rounding. Can you prepare another patch please?

Best regards, Marek Vasut

Ok. I will try to do that tomorrow.

Cool, thanks ! Nice find btw ;-)

Best regards, Marek Vasut

Wolfgang Denk

8:17 p.m.

Dear Marek Vasut,

In message 201309121605.04824.marex@denx.de you wrote:

...

Looking at the code one more time, it'd make most sense to simply allocate the buffer NOT on stack, but with some memalign-kind-of call to avoid this abuse of stack. You see, the max packet size is around 2k, which is quite a lot. How does this proposal sound to you ?

It makes perfect sense to allocate variable with function scope only on the stack. That's what the stack has been invented for.

If there is a bug with that, this mug must be isolated and fixed. It makes zero sense to just paper over it.

Best regards,

Wolfgang Denk

Fabio Estevam

8:39 p.m.

On Thu, Sep 12, 2013 at 3:17 PM, Wolfgang Denk wd@denx.de wrote:

...

Dear Marek Vasut,

In message 201309121605.04824.marex@denx.de you wrote:

...
Looking at the code one more time, it'd make most sense to simply allocate the buffer NOT on stack, but with some memalign-kind-of call to avoid this abuse of stack. You see, the max packet size is around 2k, which is quite a lot. How does this proposal sound to you ?

It makes perfect sense to allocate variable with function scope only on the stack. That's what the stack has been invented for.

This buffer in the fec driver will be used in DMA transfer, so maybe that's the reason we should use malloc instead of using the stack?

At least in the kernel, we don't use stack for DMA buffers.

Wolfgang Denk

8:53 p.m.

Dear Fabio Estevam,

In message CAOMZO5BY6kDOCoWn_SRwhPE7ssMjAReZ2bcFXGgFF-_Wrdgo0g@mail.gmail.com you wrote:

...

...
It makes perfect sense to allocate variable with function scope only on the stack. That's what the stack has been invented for.

This buffer in the fec driver will be used in DMA transfer, so maybe that's the reason we should use malloc instead of using the stack?

What has DMA to do with that? We're talking about alignment only.

...

At least in the kernel, we don't use stack for DMA buffers.

Yes, but them, the kernel uses a much more complicated memory setup, with somewhat different mappings for different regions. We don't do that - or do we? IMO we use a single mapping for the whole RAM ?

Best regards,

Wolfgang Denk

-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de Far back in the mists of ancient time, in the great and glorious days of the former Galactic Empire, life was wild, rich and largely tax free. - Douglas Adams, _The Hitchhiker's Guide to the Galaxy_

Fabio Estevam

9:37 p.m.

On Thu, Sep 12, 2013 at 3:53 PM, Wolfgang Denk wd@denx.de wrote:

...

Dear Fabio Estevam,

In message CAOMZO5BY6kDOCoWn_SRwhPE7ssMjAReZ2bcFXGgFF-_Wrdgo0g@mail.gmail.com you wrote:

...
...
It makes perfect sense to allocate variable with function scope only on the stack. That's what the stack has been invented for.

This buffer in the fec driver will be used in DMA transfer, so maybe that's the reason we should use malloc instead of using the stack?

What has DMA to do with that? We're talking about alignment only.

I mentioned DMA because we align the buffer with __aligned(ARCH_DMA_MINALIGN).

Will try to see if I can reproduce the problem here, but the last time I tried I was not able to.

Maybe the gcc version that Robert and Hector pointed out may explain the different behaviour.

Regards,

Fabio Estevam

Robert Hodaszi

13 Sep 13 Sep

1:11 p.m.

On 2013-09-12 21:37, Fabio Estevam wrote:

...

On Thu, Sep 12, 2013 at 3:53 PM, Wolfgang Denk wd@denx.de wrote:

...
Dear Fabio Estevam,

In message CAOMZO5BY6kDOCoWn_SRwhPE7ssMjAReZ2bcFXGgFF-_Wrdgo0g@mail.gmail.com you wrote:

...
...
It makes perfect sense to allocate variable with function scope only on the stack. That's what the stack has been invented for.

This buffer in the fec driver will be used in DMA transfer, so maybe that's the reason we should use malloc instead of using the stack?

What has DMA to do with that? We're talking about alignment only.

I mentioned DMA because we align the buffer with __aligned(ARCH_DMA_MINALIGN).

Will try to see if I can reproduce the problem here, but the last time I tried I was not able to.

Maybe the gcc version that Robert and Hector pointed out may explain the different behaviour.

Regards,

Fabio Estevam

Ok. Then what about if I would use the stack, but align the buffer manually.

From: Robert Hodaszi robert.hodaszi@digi.com Date: Fri, 13 Sep 2013 13:07:52 +0200 Subject: [PATCH] net: fec: fix invalid temporary RX buffer alignment because of GCC bug

Older GCC versions don't handle well alignment on stack variables. The temporary RX buffer is a local variable, so it is on the stack. Because the FEC driver is using DMA for transmission, receive and transmit buffers should be align on 64 byte. The transmit buffers are not allocated in the driver internally, it sends the packets directly as it gets them. So these packets should be aligned. When the ARP layer wants to reply to an ARP request, it uses the FEC driver's temporary RX buffer (used to pass data to the ARP layer) to store the new packet, and pass it back to the FEC driver's send function. Because of a GCC bug, this buffer is not aligned well, and when the driver tries to send it, it first rounds the address down to the alignment boundary. That causes invalid data.

To fix it, allocate more space on the stack, and align the buffer manually.

Signed-off-by: Robert Hodaszi robert.hodaszi@digi.com --- drivers/net/fec_mxc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c index f4f72b7..09d816d 100644 --- a/drivers/net/fec_mxc.c +++ b/drivers/net/fec_mxc.c @@ -828,7 +828,9 @@ static int fec_recv(struct eth_device *dev) uint16_t bd_status; uint32_t addr, size, end; int i; - uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN); + /* Align the receive buffer */ + uchar buff_unaligned[FEC_MAX_PKT_SIZE + (ARCH_DMA_MINALIGN - 1)]; + uchar *buff = ((uint32_t)buff_unaligned + (ARCH_DMA_MINALIGN - 1)) & ~(ARCH_DMA_MINALIGN - 1);

/* * Check if any critical events have happened

Best regards, Robert Hodaszi

Robert Hodaszi

1:13 p.m.

On 2013-09-13 13:11, Robert Hodaszi wrote:

...

On 2013-09-12 21:37, Fabio Estevam wrote:

...
On Thu, Sep 12, 2013 at 3:53 PM, Wolfgang Denk wd@denx.de wrote:

...
Dear Fabio Estevam,

In message CAOMZO5BY6kDOCoWn_SRwhPE7ssMjAReZ2bcFXGgFF-_Wrdgo0g@mail.gmail.com you wrote:

...
...
It makes perfect sense to allocate variable with function scope only on the stack. That's what the stack has been invented for.

This buffer in the fec driver will be used in DMA transfer, so maybe that's the reason we should use malloc instead of using the stack?

What has DMA to do with that? We're talking about alignment only.

I mentioned DMA because we align the buffer with __aligned(ARCH_DMA_MINALIGN).

Will try to see if I can reproduce the problem here, but the last time I tried I was not able to.

Maybe the gcc version that Robert and Hector pointed out may explain the different behaviour.

Regards,

Fabio Estevam

Ok. Then what about if I would use the stack, but align the buffer manually.

From: Robert Hodaszi robert.hodaszi@digi.com Date: Fri, 13 Sep 2013 13:07:52 +0200 Subject: [PATCH] net: fec: fix invalid temporary RX buffer alignment because of GCC bug

Older GCC versions don't handle well alignment on stack variables. The temporary RX buffer is a local variable, so it is on the stack. Because the FEC driver is using DMA for transmission, receive and transmit buffers should be align on 64 byte. The transmit buffers are not allocated in the driver internally, it sends the packets directly as it gets them. So these packets should be aligned. When the ARP layer wants to reply to an ARP request, it uses the FEC driver's temporary RX buffer (used to pass data to the ARP layer) to store the new packet, and pass it back to the FEC driver's send function. Because of a GCC bug, this buffer is not aligned well, and when the driver tries to send it, it first rounds the address down to the alignment boundary. That causes invalid data.

To fix it, allocate more space on the stack, and align the buffer manually.

Signed-off-by: Robert Hodaszi robert.hodaszi@digi.com

drivers/net/fec_mxc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c index f4f72b7..09d816d 100644 --- a/drivers/net/fec_mxc.c +++ b/drivers/net/fec_mxc.c @@ -828,7 +828,9 @@ static int fec_recv(struct eth_device *dev) uint16_t bd_status; uint32_t addr, size, end; int i;
  uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);
  /* Align the receive buffer */
  uchar buff_unaligned[FEC_MAX_PKT_SIZE + (ARCH_DMA_MINALIGN - 1)];
  uchar *buff = ((uint32_t)buff_unaligned + (ARCH_DMA_MINALIGN - 
1)) & ~(ARCH_DMA_MINALIGN - 1);
    /*
     * Check if any critical events have happened
Best regards, Robert Hodaszi

I think, I sent it again in HTML format... Sorry...

Best regards, Robert Hodaszi

Marek Vasut

4:01 p.m.

Dear Robert Hodaszi,

...

On 2013-09-13 13:11, Robert Hodaszi wrote:

...
On 2013-09-12 21:37, Fabio Estevam wrote:

...
On Thu, Sep 12, 2013 at 3:53 PM, Wolfgang Denk wd@denx.de wrote:

...
Dear Fabio Estevam,

In message CAOMZO5BY6kDOCoWn_SRwhPE7ssMjAReZ2bcFXGgFF-_Wrdgo0g@mail.gmail.com

you wrote:

...
...
It makes perfect sense to allocate variable with function scope only on the stack. That's what the stack has been invented for.

This buffer in the fec driver will be used in DMA transfer, so maybe that's the reason we should use malloc instead of using the stack?

What has DMA to do with that? We're talking about alignment only.

I mentioned DMA because we align the buffer with __aligned(ARCH_DMA_MINALIGN).

Will try to see if I can reproduce the problem here, but the last time I tried I was not able to.

Maybe the gcc version that Robert and Hector pointed out may explain the different behaviour.

Regards,

Fabio Estevam

Ok. Then what about if I would use the stack, but align the buffer manually.

From: Robert Hodaszi robert.hodaszi@digi.com Date: Fri, 13 Sep 2013 13:07:52 +0200 Subject: [PATCH] net: fec: fix invalid temporary RX buffer alignment because

of GCC bug

Older GCC versions don't handle well alignment on stack variables. The temporary RX buffer is a local variable, so it is on the stack. Because the FEC driver is using DMA for transmission, receive and transmit buffers should be align on 64 byte. The transmit buffers are not allocated in the driver internally, it sends the packets directly as it gets them. So these packets should be aligned. When the ARP layer wants to reply to an ARP request, it uses the FEC driver's temporary RX buffer (used to pass data to the ARP layer) to store the new packet, and pass it back to the FEC driver's send function. Because of a GCC bug, this buffer is not aligned well, and when the driver tries to send it, it first rounds the address down to the alignment boundary. That causes invalid data.

To fix it, allocate more space on the stack, and align the buffer manually.

Signed-off-by: Robert Hodaszi robert.hodaszi@digi.com

drivers/net/fec_mxc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c index f4f72b7..09d816d 100644 --- a/drivers/net/fec_mxc.c +++ b/drivers/net/fec_mxc.c @@ -828,7 +828,9 @@ static int fec_recv(struct eth_device *dev)
    uint16_t bd_status;
    uint32_t addr, size, end;
    int i;
  uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);
  /* Align the receive buffer */
  uchar buff_unaligned[FEC_MAX_PKT_SIZE + (ARCH_DMA_MINALIGN - 1)];
  uchar *buff = ((uint32_t)buff_unaligned + (ARCH_DMA_MINALIGN -
1)) & ~(ARCH_DMA_MINALIGN - 1);
    /*

     * Check if any critical events have happened
Best regards, Robert Hodaszi
I think, I sent it again in HTML format... Sorry...

OK, but as Wolfgang already pointed out, can you tell use what compiler exposes this behavior and show us the details WD requested please ?

Best regards, Marek Vasut

Robert Hodaszi

4:24 p.m.

...

OK, but as Wolfgang already pointed out, can you tell use what compiler exposes this behavior and show us the details WD requested please ?

Best regards, Marek Vasut

You can find the informations below.

Please, use my patch or don't use it, or feel free to modify it as you wish, but sorry, I really don't have more time to work on this problem.

/ $ arm-linux-gcc --version// //arm-linux-gcc (crosstool-NG 1.12.1) 4.4.6// //Copyright (C) 2010 Free Software Foundation, Inc.// //This is free software; see the source for copying conditions. There is NO// //warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.// // // $ arm-linux-ld --version// //GNU ld (crosstool-NG 1.12.1) 2.20.1.20100303// //Copyright 2009 Free Software Foundation, Inc.// //This program is free software; you may redistribute it under the terms of// //the GNU General Public License version 3 or (at your option) a later version.// //This program has absolutely no warranty./

*Before**:

*/4003ff5c <fec_recv>:// //4003ff5c: e92d44f0 push {r4, r5, r6, r7, sl, lr}// //4003ff60: e590403c ldr r4, [r0, #60] ; 0x3c// //4003ff64: e24ddc06 sub sp, sp, #1536 ; 0x600// //4003ff68: e5943000 ldr r3, [r4]// //4003ff6c: e594600c ldr r6, [r4, #12]// //4003ff70: e5935004 ldr r5, [r3, #4]// //4003ff74: e594a008 ldr sl, [r4, #8]// //4003ff78: e1a07000 mov r7, r0// //4003ff7c: e5943000 ldr r3, [r4]// //4003ff80: e3150101 tst r5, #1073741824 ; 0x40000000// //4003ff84: e5835004 str r5, [r3, #4]// //4003ff88: 0a000007 beq 4003ffac <fec_recv+0x50>// //4003ff8c: ebfffd9e bl 4003f60c <fec_halt>// //4003ff90: e5941018 ldr r1, [r4, #24]// //4003ff94: e1a00007 mov r0, r7// //4003ff98: ebfffef1 bl 4003fb64 <fec_init>// //4003ff9c: e1a01005 mov r1, r5// //4003ffa0: e59f01a4 ldr r0, [pc, #420] ; 4004014c <fec_recv+0x1f0>// //4003ffa4: ebff76b7 bl 4001da88 <printf>// //4003ffa8: ea00001d b 40040024 <fec_recv+0xc8>// //4003ffac: e3550000 cmp r5, #0// //4003ffb0: aa000003 bge 4003ffc4 <fec_recv+0x68>// //4003ffb4: e59320c4 ldr r2, [r3, #196] ; 0xc4// //4003ffb8: e3822001 orr r2, r2, #1// //4003ffbc: e5943000 ldr r3, [r4]// //4003ffc0: e58320c4 str r2, [r3, #196] ; 0xc4// //4003ffc4: e3150201 tst r5, #268435456 ; 0x10000000// //4003ffc8: 0a00000d beq 40040004 <fec_recv+0xa8>// //4003ffcc: e5943000 ldr r3, [r4]// //4003ffd0: e59330c4 ldr r3, [r3, #196] ; 0xc4// //4003ffd4: e3130001 tst r3, #1// //4003ffd8: 0a000009 beq 40040004 <fec_recv+0xa8>// //4003ffdc: e1a00007 mov r0, r7// //4003ffe0: ebfffd89 bl 4003f60c <fec_halt>// //4003ffe4: e5943000 ldr r3, [r4]// //4003ffe8: e59320c4 ldr r2, [r3, #196] ; 0xc4// //4003ffec: e3c22001 bic r2, r2, #1// //4003fff0: e5943000 ldr r3, [r4]// //4003fff4: e1a00007 mov r0, r7// //4003fff8: e58320c4 str r2, [r3, #196] ; 0xc4// //4003fffc: e5941018 ldr r1, [r4, #24]// //40040000: ebfffed7 bl 4003fb64 <fec_init>// //40040004: e1a05186 lsl r5, r6, #3// //40040008: e08a6005 add r6, sl, r5// //4004000c: e3c6003f bic r0, r6, #63 ; 0x3f// //40040010: e2801040 add r1, r0, #64 ; 0x40// //40040014: ebff0152 bl 40000564 <invalidate_dcache_range>// //40040018: e1d620b2 ldrh r2, [r6, #2]// //4004001c: e3120902 tst r2, #32768 ; 0x8000// //40040020: 0a000001 beq 4004002c <fec_recv+0xd0>// //40040024: e3a05000 mov r5, #0// //40040028: ea000044 b 40040140 <fec_recv+0x1e4>// //4004002c: e59f311c ldr r3, [pc, #284] ; 40040150 <fec_recv+0x1f4>// //40040030: e0023003 and r3, r2, r3// //40040034: e3530b02 cmp r3, #2048 ; 0x800// //40040038: 1a000016 bne 40040098 <fec_recv+0x13c>// //4004003c: e19a30b5 ldrh r3, [sl, r5]// //40040040: e3530012 cmp r3, #18// //40040044: da000013 ble 40040098 <fec_recv+0x13c>// //40040048: e5967004 ldr r7, [r6, #4]// //4004004c: e19a50b5 ldrh r5, [sl, r5]// //40040050: e2455004 sub r5, r5, #4// //40040054: e287103f add r1, r7, #63 ; 0x3f// //40040058: e0811005 add r1, r1, r5// //4004005c: e3c7003f bic r0, r7, #63 ; 0x3f// //40040060: e3c1103f bic r1, r1, #63 ; 0x3f// //40040064: ebff013e bl 40000564 <invalidate_dcache_range>// //40040068: e1a00007 mov r0, r7// //4004006c: e1a01005 mov r1, r5// //40040070: ebfffe57 bl 4003f9d4 <swap_packet>// //40040074: e1a01007 mov r1, r7// //40040078: e1a02005 mov r2, r5// //4004007c: e1a0000d mov r0, sp// //40040080: eb0030c5 bl 4004c39c <memcpy>// //40040084: e1a0000d mov r0, sp// //40040088: e1a01005 mov r1, r5// //4004008c: e1a0600d mov r6, sp// //40040090: eb00465e bl 40051a10 <NetReceive>// //40040094: ea000005 b 400400b0 <fec_recv+0x154>// //40040098: e2125037 ands r5, r2, #55 ; 0x37// //4004009c: 0a000003 beq 400400b0 <fec_recv+0x154>// //400400a0: e5961004 ldr r1, [r6, #4]// //400400a4: e59f00a8 ldr r0, [pc, #168] ; 40040154 <fec_recv+0x1f8>// //400400a8: ebff7676 bl 4001da88 <printf>// //400400ac: e3a05000 mov r5, #0// //400400b0: e594300c ldr r3, [r4, #12]// //400400b4: e2032007 and r2, r3, #7// //400400b8: e3520007 cmp r2, #7// //400400bc: 1a000014 bne 40040114 <fec_recv+0x1b8>// //400400c0: e5940008 ldr r0, [r4, #8]// //400400c4: e243227e sub r2, r3, #-536870905 ; 0xe0000007// //400400c8: e2433007 sub r3, r3, #7// //400400cc: e0800183 add r0, r0, r3, lsl #3// //400400d0: e1a02182 lsl r2, r2, #3// //400400d4: ea000009 b 40040100 <fec_recv+0x1a4>// //400400d8: e5941008 ldr r1, [r4, #8]// //400400dc: e353003f cmp r3, #63 ; 0x3f// //400400e0: 13a0e902 movne lr, #32768 ; 0x8000// //400400e4: 03a0ea0a moveq lr, #40960 ; 0xa000// //400400e8: e081c002 add ip, r1, r2// //400400ec: e1cce0b2 strh lr, [ip, #2]// //400400f0: e3a0c000 mov ip, #0// //400400f4: e181c0b2 strh ip, [r1, r2]// //400400f8: e2833001 add r3, r3, #1// //400400fc: e2822008 add r2, r2, #8// //40040100: e594100c ldr r1, [r4, #12]// //40040104: e1530001 cmp r3, r1// //40040108: dafffff2 ble 400400d8 <fec_recv+0x17c>// //4004010c: e2801040 add r1, r0, #64 ; 0x40// //40040110: ebff011d bl 4000058c <flush_dcache_range>// //40040114: e5943000 ldr r3, [r4]// //40040118: e3a02401 mov r2, #16777216 ; 0x1000000// //4004011c: e5832010 str r2, [r3, #16]// //40040120: e594300c ldr r3, [r4, #12]// //40040124: e2833001 add r3, r3, #1// //40040128: e21331fe ands r3, r3, #-2147483585 ; 0x8000003f// //4004012c: 42433001 submi r3, r3, #1// //40040130: 41e03d03 mvnmi r3, r3, lsl #26// //40040134: 41e03d23 mvnmi r3, r3, lsr #26// //40040138: 42833001 addmi r3, r3, #1// //4004013c: e584300c str r3, [r4, #12]// //40040140: e1a00005 mov r0, r5// //40040144: e28ddc06 add sp, sp, #1536 ; 0x600// //40040148: e8bd84f0 pop {r4, r5, r6, r7, sl, pc}// //4004014c: 40068c43 .word 0x40068c43// //40040150: 00000837 .word 0x00000837// //40040154: 40068c58 .word 0x40068c58/

*After:

*/4003ff5c <fec_recv>:// //4003ff5c: e92d44f0 push {r4, r5, r6, r7, sl, lr}// //4003ff60: e590403c ldr r4, [r0, #60] ; 0x3c// //4003ff64: e1a0a000 mov sl, r0// //4003ff68: e5943000 ldr r3, [r4]// //4003ff6c: e594600c ldr r6, [r4, #12]// //4003ff70: e5935004 ldr r5, [r3, #4]// //4003ff74: e5947008 ldr r7, [r4, #8]// //4003ff78: e5943000 ldr r3, [r4]// //4003ff7c: e3150101 tst r5, #1073741824 ; 0x40000000// //4003ff80: e5835004 str r5, [r3, #4]// //4003ff84: 0a000007 beq 4003ffa8 <fec_recv+0x4c>// //4003ff88: ebfffd9f bl 4003f60c <fec_halt>// //4003ff8c: e5941018 ldr r1, [r4, #24]// //4003ff90: e1a0000a mov r0, sl// //4003ff94: ebfffef2 bl 4003fb64 <fec_init>// //4003ff98: e1a01005 mov r1, r5// //4003ff9c: e59f019c ldr r0, [pc, #412] ; 40040140 <fec_recv+0x1e4>// //4003ffa0: ebff76b8 bl 4001da88 <printf>// //4003ffa4: ea00001d b 40040020 <fec_recv+0xc4>// //4003ffa8: e3550000 cmp r5, #0// //4003ffac: aa000003 bge 4003ffc0 <fec_recv+0x64>// //4003ffb0: e59320c4 ldr r2, [r3, #196] ; 0xc4// //4003ffb4: e3822001 orr r2, r2, #1// //4003ffb8: e5943000 ldr r3, [r4]// //4003ffbc: e58320c4 str r2, [r3, #196] ; 0xc4// //4003ffc0: e3150201 tst r5, #268435456 ; 0x10000000// //4003ffc4: 0a00000d beq 40040000 <fec_recv+0xa4>// //4003ffc8: e5943000 ldr r3, [r4]// //4003ffcc: e59330c4 ldr r3, [r3, #196] ; 0xc4// //4003ffd0: e3130001 tst r3, #1// //4003ffd4: 0a000009 beq 40040000 <fec_recv+0xa4>// //4003ffd8: e1a0000a mov r0, sl// //4003ffdc: ebfffd8a bl 4003f60c <fec_halt>// //4003ffe0: e5943000 ldr r3, [r4]// //4003ffe4: e59320c4 ldr r2, [r3, #196] ; 0xc4// //4003ffe8: e3c22001 bic r2, r2, #1// //4003ffec: e5943000 ldr r3, [r4]// //4003fff0: e1a0000a mov r0, sl// //4003fff4: e58320c4 str r2, [r3, #196] ; 0xc4// //4003fff8: e5941018 ldr r1, [r4, #24]// //4003fffc: ebfffed8 bl 4003fb64 <fec_init>// //40040000: e1a05186 lsl r5, r6, #3// //40040004: e0876005 add r6, r7, r5// //40040008: e3c6003f bic r0, r6, #63 ; 0x3f// //4004000c: e2801040 add r1, r0, #64 ; 0x40// //40040010: ebff0153 bl 40000564 <invalidate_dcache_range>// //40040014: e1d620b2 ldrh r2, [r6, #2]// //40040018: e3120902 tst r2, #32768 ; 0x8000// //4004001c: 0a000001 beq 40040028 <fec_recv+0xcc>// //40040020: e3a05000 mov r5, #0// //40040024: ea000043 b 40040138 <fec_recv+0x1dc>// //40040028: e59f3114 ldr r3, [pc, #276] ; 40040144 <fec_recv+0x1e8>// //4004002c: e0023003 and r3, r2, r3// //40040030: e3530b02 cmp r3, #2048 ; 0x800// //40040034: 1a000015 bne 40040090 <fec_recv+0x134>// //40040038: e19730b5 ldrh r3, [r7, r5]// //4004003c: e3530012 cmp r3, #18// //40040040: da000012 ble 40040090 <fec_recv+0x134>// //40040044: e5966004 ldr r6, [r6, #4]// //40040048: e19750b5 ldrh r5, [r7, r5]// //4004004c: e2455004 sub r5, r5, #4// //40040050: e286103f add r1, r6, #63 ; 0x3f// //40040054: e0811005 add r1, r1, r5// //40040058: e3c6003f bic r0, r6, #63 ; 0x3f// //4004005c: e3c1103f bic r1, r1, #63 ; 0x3f// //40040060: ebff013f bl 40000564 <invalidate_dcache_range>// //40040064: e1a00006 mov r0, r6// //40040068: e1a01005 mov r1, r5// //4004006c: ebfffe58 bl 4003f9d4 <swap_packet>// //40040070: e1a01006 mov r1, r6// //40040074: e1a02005 mov r2, r5// //40040078: e59f00c8 ldr r0, [pc, #200] ; 40040148 <fec_recv+0x1ec>// //4004007c: eb0030c4 bl 4004c394 <memcpy>// //40040080: e59f00c0 ldr r0, [pc, #192] ; 40040148 <fec_recv+0x1ec>// //40040084: e1a01005 mov r1, r5// //40040088: eb00465e bl 40051a08 <NetReceive>// //4004008c: ea000005 b 400400a8 <fec_recv+0x14c>// //40040090: e2125037 ands r5, r2, #55 ; 0x37// //40040094: 0a000003 beq 400400a8 <fec_recv+0x14c>// //40040098: e5961004 ldr r1, [r6, #4]// //4004009c: e59f00a8 ldr r0, [pc, #168] ; 4004014c <fec_recv+0x1f0>// //400400a0: ebff7678 bl 4001da88 <printf>// //400400a4: e3a05000 mov r5, #0// //400400a8: e594300c ldr r3, [r4, #12]// //400400ac: e2032007 and r2, r3, #7// //400400b0: e3520007 cmp r2, #7// //400400b4: 1a000014 bne 4004010c <fec_recv+0x1b0>// //400400b8: e5940008 ldr r0, [r4, #8]// //400400bc: e243227e sub r2, r3, #-536870905 ; 0xe0000007// //400400c0: e2433007 sub r3, r3, #7// //400400c4: e0800183 add r0, r0, r3, lsl #3// //400400c8: e1a02182 lsl r2, r2, #3// //400400cc: ea000009 b 400400f8 <fec_recv+0x19c>// //400400d0: e5941008 ldr r1, [r4, #8]// //400400d4: e353003f cmp r3, #63 ; 0x3f// //400400d8: 13a0e902 movne lr, #32768 ; 0x8000// //400400dc: 03a0ea0a moveq lr, #40960 ; 0xa000// //400400e0: e081c002 add ip, r1, r2// //400400e4: e1cce0b2 strh lr, [ip, #2]// //400400e8: e3a0c000 mov ip, #0// //400400ec: e181c0b2 strh ip, [r1, r2]// //400400f0: e2833001 add r3, r3, #1// //400400f4: e2822008 add r2, r2, #8// //400400f8: e594100c ldr r1, [r4, #12]// //400400fc: e1530001 cmp r3, r1// //40040100: dafffff2 ble 400400d0 <fec_recv+0x174>// //40040104: e2801040 add r1, r0, #64 ; 0x40// //40040108: ebff011f bl 4000058c <flush_dcache_range>// //4004010c: e5943000 ldr r3, [r4]// //40040110: e3a02401 mov r2, #16777216 ; 0x1000000// //40040114: e5832010 str r2, [r3, #16]// //40040118: e594300c ldr r3, [r4, #12]// //4004011c: e2833001 add r3, r3, #1// //40040120: e21331fe ands r3, r3, #-2147483585 ; 0x8000003f// //40040124: 42433001 submi r3, r3, #1// //40040128: 41e03d03 mvnmi r3, r3, lsl #26// //4004012c: 41e03d23 mvnmi r3, r3, lsr #26// //40040130: 42833001 addmi r3, r3, #1// //40040134: e584300c str r3, [r4, #12]// //40040138: e1a00005 mov r0, r5// //4004013c: e8bd84f0 pop {r4, r5, r6, r7, sl, pc}// //40040140: 40068c3b .word 0x40068c3b// //40040144: 00000837 .word 0x00000837// //40040148: 4008a880 .word 0x4008a880// //4004014c: 40068c50 .word 0x40068c50// /

Robert Hodaszi

Wolfgang Denk

6:06 p.m.

Dear Robert Hodaszi,

In message 5232F2E7.4050407@digi.com you wrote:

...

Ok. Then what about if I would use the stack, but align the buffer manually.

Has this been tested? Does it work?

...

  uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);

```
  /* Align the receive buffer */
```

  uchar buff_unaligned[FEC_MAX_PKT_SIZE + (ARCH_DMA_MINALIGN - 1)];

  uchar *buff = ((uint32_t)buff_unaligned + (ARCH_DMA_MINALIGN - 1)) & ~(ARCH_DMA_MINALIGN - 1);

You should use the ALIGN() macro here.

Best regards,

Wolfgang Denk

-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de Niklaus Wirth has lamented that, whereas Europeans pronounce his name correctly (Ni-klows Virt), Americans invariably mangle it into (Nick- les Worth). Which is to say that Europeans call him by name, but Americans call him by value.

Marek Vasut

6:24 p.m.

Dear Wolfgang Denk,

...

Dear Robert Hodaszi,

In message 5232F2E7.4050407@digi.com you wrote:

...
Ok. Then what about if I would use the stack, but align the buffer manually.

Has this been tested? Does it work?

...
  uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);
  /* Align the receive buffer */
  uchar buff_unaligned[FEC_MAX_PKT_SIZE + (ARCH_DMA_MINALIGN - 1)];
  uchar *buff = ((uint32_t)buff_unaligned + (ARCH_DMA_MINALIGN -
1)) & ~(ARCH_DMA_MINALIGN - 1);
You should use the ALIGN() macro here.

We already have this stuff in include/common.h ... ALLOC_CACHE_ALIGNED_BUFFER it is called IIRC

Best regards, Marek Vasut

Wolfgang Denk

7:46 p.m.

Dear Marek Vasut,

In message 201309131824.52506.marex@denx.de you wrote:

...

...
...
  uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);
  /* Align the receive buffer */
  uchar buff_unaligned[FEC_MAX_PKT_SIZE + (ARCH_DMA_MINALIGN - 1)];
  uchar *buff = ((uint32_t)buff_unaligned + (ARCH_DMA_MINALIGN -
1)) & ~(ARCH_DMA_MINALIGN - 1);
You should use the ALIGN() macro here.
We already have this stuff in include/common.h ... ALLOC_CACHE_ALIGNED_BUFFER it is called IIRC

That's ALLOC_CACHE_ALIGN_BUFFER. Thanks.

Wolfgang Denk

Fabio Estevam

15 Sep 15 Sep

12:05 a.m.

Hi Robert and Hector,

On Fri, Sep 13, 2013 at 2:46 PM, Wolfgang Denk wd@denx.de wrote:

...

That's ALLOC_CACHE_ALIGN_BUFFER. Thanks.

Could you please let us know wthether the change below fix the problem?

Thanks,

Fabio Estevam

--- a/drivers/net/fec_mxc.c +++ b/drivers/net/fec_mxc.c @@ -794,7 +794,7 @@ static int fec_recv(struct eth_device *dev) uint16_t bd_status; uint32_t addr, size, end; int i; - uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN); + ALLOC_CACHE_ALIGN_BUFFER(uchar, buff, FEC_MAX_PKT_SIZE);

/* * Check if any critical events have happened

Robert Hodaszi

12 Sep 12 Sep

1:08 p.m.

Hi,

Sorry, hopefully that will be a plain-text.

There are a lot of bug announcement, just make a search: gcc.gnu.org/bugzilla/show_bug.cgi?id=33721 gcc.gnu.org/bugzilla/show_bug.cgi?id=16660

Best regards, Robert Hodaszi

On 2013-09-12 12:50, Marek Vasut wrote:

...

Dear Hector Palacios,

...
Hello,

Going back to this old thread I have some news regarding the problem with TFTP transmissions blocking (timed out) after 10 seconds on the FEC of the MX28. See below:

On 07/17/2013 05:55 PM, Hector Palacios wrote:

...
Dear Marek,

On 07/16/2013 06:44 AM, Marek Vasut wrote:

...
Dear Fabio Estevam,

...
On Tue, Jul 16, 2013 at 12:51 AM, Fabio Estevamfestevam@gmail.com

wrote:

...
...
...
...
...
On Mon, Jul 15, 2013 at 12:09 PM, Hector Palacios

hector.palacios@digi.com wrote: > @Fabio: could you manually run the command 'tftp ${loadaddr} > file100M' in your EVK? Yes, this is what I have been running since the beginning.

> If it doesn't fail, could you try running it again after playing with > the environment (setting/printing some variables). I can't reproduce the problem here.

> As I said, this issue appeared with different TFTP servers and is > independent of whether the dcache is or not enabled. Can you transfer from a PC to another PC via TFTP?

Yes I can.

...
Another thing of interest would be a 'tcpdump' pcap capture of that connection.

I was initially filtering out only TFTP packets of my wireshark trace and all looked correct. After taking a second look to the full trace I see now a hint. Around 7 seconds after starting the TFTP transfer the server is sending an ARP to the target asking for the owner of the target's IP. The target is receiving this ARP and apparently responding (at least this is what my debug code shows as it gets into arp.c:ArpReceive(), case ARPOP_REQUEST and sending a packet), but this ARP reply from the target is not reaching the network. My sniffer does not capture this reply.

The server resends the ARP request twice more (seconds 8 and 9) to the target and since it doesn't get a reply then sends a broadcast ARP (seconds 10) asking who has that IP. Since nobody responds it stops sending data.

The times that it works (and I don't know the magic behind using a numeric address versus using ${loadaddr} when they have the same value), the ARP replies do reach the network and the server continues the transmission normally.

Using a v2009 U-Boot, the behaviour is exactly the same, but the target's ARP replies always reach the network, and the transfers always succeed.

Since Fabio cannot reproduce it I guess it must be a local ghost. :o(

We tracked down the issue to an ARP request from the server that was never answered by the target. We later noticed that the problem did not happen anymore when building U-Boot with a different toolchain and that the issue seemed to be in the alignment of the RX buffer in the stack, which old GCC compilers seem to do wrong.

Here is a patch:

From: Robert Hodaszirobert.hodaszi@digi.com Date: Fri, 6 Sep 2013 09:50:52 +0200 Subject: [PATCH] net: fec: fix invalid temporary RX buffer alignment because of GCC bug

Older GCC versions don't handle well alignment on stack variables. The temporary RX buffer is a local variable, so it is on the stack. Because the FEC driver is using DMA for transmission, receive and transmit buffers should be aligned on 64 byte. The transmit buffers are not allocated in the driver internally, it sends the packets directly as it gets them. So these packets should be aligned. When the ARP layer wants to reply to an ARP request, it uses the FEC driver's temporary RX buffer (used to pass data to the ARP layer) to store the new packet, and pass it back to the FEC driver's send function. Because of a GCC bug

Can you point to this GCC bug in the GCC bugzilla or something?

...
this buffer is not aligned well, and when the driver tries to send it, it first rounds the address down to the alignment boundary. That causes invalid data.

To fix it, don't put the temporary onto the stack.

Signed-off-by: Robert Hodaszirobert.hodaszi@digi.com Signed-off-by: Hector Palacioshector.palacios@digi.com

drivers/net/fec_mxc.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c index f4f72b7..315017e 100644 --- a/drivers/net/fec_mxc.c +++ b/drivers/net/fec_mxc.c @@ -828,7 +828,10 @@ static int fec_recv(struct eth_device *dev) uint16_t bd_status; uint32_t addr, size, end; int i;
  uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);
  /* Don't place this variable on the stack, because older GCC
version + * doesn't handle alignement on stack well.
   */
  static uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);
The buffer might as well be allocated using ALLOC_CACHE_ALIGN_BUFFER() from include/common.h . Still, are you _really_ sure the buffer is unaligned ? Do you have a testcase maybe ?

btw. I am able to replicate this issue sometimes even using GCC 4.8.0 .

Best regards, Marek Vasut

Wolfgang Denk

8:12 p.m.

Dear Robert Hodaszi,

In message 5231A0C0.8070308@digi.com you wrote:

...

Sorry, hopefully that will be a plain-text.

There are a lot of bug announcement, just make a search: gcc.gnu.org/bugzilla/show_bug.cgi?id=33721 gcc.gnu.org/bugzilla/show_bug.cgi?id=16660

Hm... what exactly are the values of STACK_BOUNDARY and PREFERRED_STACK_BOUNDARY for the failing version of GCC, then?

Best regards,

Wolfgang Denk

-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de Each team building another component has been using the most recent tested version of the integrated system as a test bed for debugging its piece. Their work will be set back by having that test bed change under them. Of course it must. But the changes need to be quantized. Then each user has periods of productive stability, interrupted by bursts of test-bed change. This seems to be much less disruptive than a constant rippling and trembling. - Frederick Brooks Jr., "The Mythical Man Month"

Wolfgang Denk

7:50 p.m.

Dear Hector Palacios,

In message 523195CA.3010305@digi.com you wrote:

...

Here is a patch:

From: Robert Hodaszi robert.hodaszi@digi.com Date: Fri, 6 Sep 2013 09:50:52 +0200 Subject: [PATCH] net: fec: fix invalid temporary RX buffer alignment because of GCC bug

Older GCC versions don't handle well alignment on stack variables.

Can you please be specific - which exact versions of GCC are supposed to misbehave here?

...

To fix it, don't put the temporary onto the stack.

This is not a good idea, as it wastes a memory for no good reason.

...

Signed-off-by: Robert Hodaszi robert.hodaszi@digi.com Signed-off-by: Hector Palacios hector.palacios@digi.com

drivers/net/fec_mxc.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c index f4f72b7..315017e 100644 --- a/drivers/net/fec_mxc.c +++ b/drivers/net/fec_mxc.c @@ -828,7 +828,10 @@ static int fec_recv(struct eth_device *dev) uint16_t bd_status; uint32_t addr, size, end; int i;
  uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);
  /* Don't place this variable on the stack, because older GCC version
   * doesn't handle alignement on stack well.
   */
  static uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);

I have to admit that I doubt the explanation - somthing else is probaly wrong instead. I would really like to know which compiler version misbehaves, and what the generated code looks like, both in the working and in the broken case.

Thanks.

Best regards,

Wolfgang Denk

-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de I had the rare misfortune of being one of the first people to try and implement a PL/1 compiler. -- T. Cheatham

Troy Kisky

13 Jul 13 Jul

4:43 a.m.

On 7/11/2013 4:18 PM, Fabio Estevam wrote:

...

On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut marex@denx.de wrote:

...
The MX28 multi-layer AHB bus can be too slow and trigger the FEC DMA too early, before all the data hit the DRAM. This patch ensures the data are written in the RAM before the DMA starts. Please see the comment in the patch for full details.

This patch was produced with an amazing help from Albert Aribaud, who pointed out it can possibly be such a bus synchronisation issue.

Signed-off-by: Marek Vasut marex@denx.de Cc: Albert ARIBAUD albert.u.boot@aribaud.net Cc: Fabio Estevam fabio.estevam@freescale.com Cc: Stefano Babic sbabic@denx.de

Excellent, managed to transfer 90MB via TFTP on mx28evk without a single timeout.

Tested-by: Fabio Estevam fabio.estevam@freescale.com _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

Perhaps this because our memory barriers are lacking.

Linux has this code asm/io.h:#define writel(v,c) ({ __iowmb(); writel_relaxed(v,c); })

asm/io.h-#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE asm/io.h-#include <asm/barrier.h> asm/io.h-#define __iormb() rmb() asm/io.h:#define __iowmb() wmb() asm/io.h-#else asm/io.h-#define __iormb() do { } while (0) asm/io.h:#define __iowmb() do { } while (0) asm/io.h-#endif

asm/barrier.h-#elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP) asm/barrier.h-#define mb() do { dsb(); outer_sync(); } while (0) asm/barrier.h-#define rmb() dsb() asm/barrier.h:#define wmb() mb() asm/barrier.h-#else asm/barrier.h-#define mb() barrier() asm/barrier.h-#define rmb() barrier() asm/barrier.h:#define wmb() barrier() asm/barrier.h-#endif

asm/barrier.h-#if __LINUX_ARM_ARCH__ >= 7 asm/barrier.h-#define isb() __asm__ __volatile__ ("isb" : : : "memory") asm/barrier.h:#define dsb() __asm__ __volatile__ ("dsb" : : : "memory") asm/barrier.h-#define dmb() __asm__ __volatile__ ("dmb" : : : "memory") asm/barrier.h-#elif defined(CONFIG_CPU_XSC3) || __LINUX_ARM_ARCH__ == 6 asm/barrier.h-#define isb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c5, 4" \ asm/barrier.h- : : "r" (0) : "memory") asm/barrier.h:#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \ asm/barrier.h- : : "r" (0) : "memory") asm/barrier.h-#define dmb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \

_____________________________________ Can you try just adding a dsb() instead of the dummy read?

If this also fixes your problem, it seems better to fix our writel macro

Thanks Troy

Albert ARIBAUD

15 Jul 15 Jul

3:41 p.m.

Hi Troy,

On Fri, 12 Jul 2013 19:43:07 -0700, Troy Kisky troy.kisky@boundarydevices.com wrote:

...

On 7/11/2013 4:18 PM, Fabio Estevam wrote:

...
On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut marex@denx.de wrote:

...
The MX28 multi-layer AHB bus can be too slow and trigger the FEC DMA too early, before all the data hit the DRAM. This patch ensures the data are written in the RAM before the DMA starts. Please see the comment in the patch for full details.

This patch was produced with an amazing help from Albert Aribaud, who pointed out it can possibly be such a bus synchronisation issue.

Signed-off-by: Marek Vasut marex@denx.de Cc: Albert ARIBAUD albert.u.boot@aribaud.net Cc: Fabio Estevam fabio.estevam@freescale.com Cc: Stefano Babic sbabic@denx.de

Excellent, managed to transfer 90MB via TFTP on mx28evk without a single timeout.

Tested-by: Fabio Estevam fabio.estevam@freescale.com _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

Perhaps this because our memory barriers are lacking.

Linux has this code asm/io.h:#define writel(v,c) ({ __iowmb(); writel_relaxed(v,c); })

asm/io.h-#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE asm/io.h-#include <asm/barrier.h> asm/io.h-#define __iormb() rmb() asm/io.h:#define __iowmb() wmb() asm/io.h-#else asm/io.h-#define __iormb() do { } while (0) asm/io.h:#define __iowmb() do { } while (0) asm/io.h-#endif

asm/io.h-#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE asm/io.h-#include <asm/barrier.h> asm/io.h-#define __iormb() rmb() asm/io.h:#define __iowmb() wmb() asm/io.h-#else asm/io.h-#define __iormb() do { } while (0) asm/io.h:#define __iowmb() do { } while (0) asm/io.h-#endif

asm/barrier.h-#elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP) asm/barrier.h-#define mb() do { dsb(); outer_sync(); } while (0) asm/barrier.h-#define rmb() dsb() asm/barrier.h:#define wmb() mb() asm/barrier.h-#else asm/barrier.h-#define mb() barrier() asm/barrier.h-#define rmb() barrier() asm/barrier.h:#define wmb() barrier() asm/barrier.h-#endif

asm/barrier.h-#if __LINUX_ARM_ARCH__ >= 7 asm/barrier.h-#define isb() __asm__ __volatile__ ("isb" : : : "memory") asm/barrier.h:#define dsb() __asm__ __volatile__ ("dsb" : : : "memory") asm/barrier.h-#define dmb() __asm__ __volatile__ ("dmb" : : : "memory") asm/barrier.h-#elif defined(CONFIG_CPU_XSC3) || __LINUX_ARM_ARCH__ == 6 asm/barrier.h-#define isb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c5, 4" \ asm/barrier.h- : : "r" (0) : "memory") asm/barrier.h:#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \ asm/barrier.h- : : "r" (0) : "memory") asm/barrier.h-#define dmb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \

Can you try just adding a dsb() instead of the dummy read?

If this also fixes your problem, it seems better to fix our writel macro

Not sure I understand who you are answering to exactly, as Fabio indicates his problem is solved.

Besides, Marek and I had in fact investigated barriers, adding some as I did in times past in mvgbe.c, and fiddling with the one already in dcache_flush_range(). None of this had any effect.

However, if our barriers are lacking, then they may fail us somehow on other occasions, so best is if we analyze the failings. Can you clarify in what respect exactly they are lacking? For instance, are all our barriers lacking, or only some, and which ones?

Amicalement,

-- Albert.

Troy Kisky

7:39 p.m.

On 7/15/2013 6:41 AM, Albert ARIBAUD wrote:

...

Hi Troy,

On Fri, 12 Jul 2013 19:43:07 -0700, Troy Kisky troy.kisky@boundarydevices.com wrote:

...
On 7/11/2013 4:18 PM, Fabio Estevam wrote:

...
On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut marex@denx.de wrote:

...
The MX28 multi-layer AHB bus can be too slow and trigger the FEC DMA too early, before all the data hit the DRAM. This patch ensures the data are written in the RAM before the DMA starts. Please see the comment in the patch for full details.

This patch was produced with an amazing help from Albert Aribaud, who pointed out it can possibly be such a bus synchronisation issue.

Signed-off-by: Marek Vasut marex@denx.de Cc: Albert ARIBAUD albert.u.boot@aribaud.net Cc: Fabio Estevam fabio.estevam@freescale.com Cc: Stefano Babic sbabic@denx.de

Excellent, managed to transfer 90MB via TFTP on mx28evk without a single timeout.

Tested-by: Fabio Estevam fabio.estevam@freescale.com _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

Perhaps this because our memory barriers are lacking.

Linux has this code asm/io.h:#define writel(v,c) ({ __iowmb(); writel_relaxed(v,c); })

asm/io.h-#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE asm/io.h-#include <asm/barrier.h> asm/io.h-#define __iormb() rmb() asm/io.h:#define __iowmb() wmb() asm/io.h-#else asm/io.h-#define __iormb() do { } while (0) asm/io.h:#define __iowmb() do { } while (0) asm/io.h-#endif

asm/io.h-#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE asm/io.h-#include <asm/barrier.h> asm/io.h-#define __iormb() rmb() asm/io.h:#define __iowmb() wmb() asm/io.h-#else asm/io.h-#define __iormb() do { } while (0) asm/io.h:#define __iowmb() do { } while (0) asm/io.h-#endif

asm/barrier.h-#elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP) asm/barrier.h-#define mb() do { dsb(); outer_sync(); } while (0) asm/barrier.h-#define rmb() dsb() asm/barrier.h:#define wmb() mb() asm/barrier.h-#else asm/barrier.h-#define mb() barrier() asm/barrier.h-#define rmb() barrier() asm/barrier.h:#define wmb() barrier() asm/barrier.h-#endif

asm/barrier.h-#if __LINUX_ARM_ARCH__ >= 7 asm/barrier.h-#define isb() __asm__ __volatile__ ("isb" : : : "memory") asm/barrier.h:#define dsb() __asm__ __volatile__ ("dsb" : : : "memory") asm/barrier.h-#define dmb() __asm__ __volatile__ ("dmb" : : : "memory") asm/barrier.h-#elif defined(CONFIG_CPU_XSC3) || __LINUX_ARM_ARCH__ == 6 asm/barrier.h-#define isb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c5, 4" \ asm/barrier.h- : : "r" (0) : "memory") asm/barrier.h:#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \ asm/barrier.h- : : "r" (0) : "memory") asm/barrier.h-#define dmb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \

Can you try just adding a dsb() instead of the dummy read?

If this also fixes your problem, it seems better to fix our writel macro

Not sure I understand who you are answering to exactly, as Fabio indicates his problem is solved.

Besides, Marek and I had in fact investigated barriers, adding some as I did in times past in mvgbe.c, and fiddling with the one already in dcache_flush_range(). None of this had any effect.

You tried adding a dsb() to dcache_flush_range()? That should have fixed the problem as well.

...

However, if our barriers are lacking, then they may fail us somehow on other occasions, so best is if we analyze the failings. Can you clarify in what respect exactly they are lacking? For instance, are all our barriers lacking, or only some, and which ones?

Amicalement,

Linux has

Kconfig:config ARM_DMA_MEM_BUFFERABLE Kconfig- bool "Use non-cacheable memory for DMA" if (CPU_V6 || CPU_V6K) && !CPU_V7 Kconfig- depends on !(MACH_REALVIEW_PB1176 || REALVIEW_EB_ARM11MP || \ Kconfig- MACH_REALVIEW_PB11MP) Kconfig- default y if CPU_V6 || CPU_V6K || CPU_V7 Kconfig- help

So, if this symbol is y then all writel/readl will be preceded by a dsb() as shown above.

However, u-boot probably uses cacheable memory for dma, so maybe u-boot is also correct with

asm/io.h-#define dmb() __asm__ __volatile__ ("" : : : "memory") asm/io.h-#define __iormb() dmb() asm/io.h:#define __iowmb() dmb()

and no dsb(), but perhaps flush_dcache still needs one at the end.

Troy

Troy Kisky

9:59 p.m.

On 7/15/2013 10:39 AM, Troy Kisky wrote:

...

On 7/15/2013 6:41 AM, Albert ARIBAUD wrote:

...
Hi Troy,

On Fri, 12 Jul 2013 19:43:07 -0700, Troy Kisky troy.kisky@boundarydevices.com wrote:

...
On 7/11/2013 4:18 PM, Fabio Estevam wrote:

...
On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut marex@denx.de wrote:

...
The MX28 multi-layer AHB bus can be too slow and trigger the FEC DMA too early, before all the data hit the DRAM. This patch ensures the data are written in the RAM before the DMA starts. Please see the comment in the patch for full details.

This patch was produced with an amazing help from Albert Aribaud, who pointed out it can possibly be such a bus synchronisation issue.

Signed-off-by: Marek Vasut marex@denx.de Cc: Albert ARIBAUD albert.u.boot@aribaud.net Cc: Fabio Estevam fabio.estevam@freescale.com Cc: Stefano Babic sbabic@denx.de

Excellent, managed to transfer 90MB via TFTP on mx28evk without a single timeout.

Tested-by: Fabio Estevam fabio.estevam@freescale.com _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

Perhaps this because our memory barriers are lacking.

Linux has this code asm/io.h:#define writel(v,c) ({ __iowmb(); writel_relaxed(v,c); })

asm/io.h-#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE asm/io.h-#include <asm/barrier.h> asm/io.h-#define __iormb() rmb() asm/io.h:#define __iowmb() wmb() asm/io.h-#else asm/io.h-#define __iormb() do { } while (0) asm/io.h:#define __iowmb() do { } while (0) asm/io.h-#endif

asm/io.h-#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE asm/io.h-#include <asm/barrier.h> asm/io.h-#define __iormb() rmb() asm/io.h:#define __iowmb() wmb() asm/io.h-#else asm/io.h-#define __iormb() do { } while (0) asm/io.h:#define __iowmb() do { } while (0) asm/io.h-#endif

asm/barrier.h-#elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP) asm/barrier.h-#define mb() do { dsb(); outer_sync(); } while (0) asm/barrier.h-#define rmb() dsb() asm/barrier.h:#define wmb() mb() asm/barrier.h-#else asm/barrier.h-#define mb() barrier() asm/barrier.h-#define rmb() barrier() asm/barrier.h:#define wmb() barrier() asm/barrier.h-#endif

asm/barrier.h-#if __LINUX_ARM_ARCH__ >= 7 asm/barrier.h-#define isb() __asm__ __volatile__ ("isb" : : : "memory") asm/barrier.h:#define dsb() __asm__ __volatile__ ("dsb" : : : "memory") asm/barrier.h-#define dmb() __asm__ __volatile__ ("dmb" : : : "memory") asm/barrier.h-#elif defined(CONFIG_CPU_XSC3) || __LINUX_ARM_ARCH__ == 6 asm/barrier.h-#define isb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c5, 4" \ asm/barrier.h- : : "r" (0) : "memory") asm/barrier.h:#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \ asm/barrier.h- : : "r" (0) : "memory") asm/barrier.h-#define dmb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \

Can you try just adding a dsb() instead of the dummy read?

If this also fixes your problem, it seems better to fix our writel macro

Not sure I understand who you are answering to exactly, as Fabio indicates his problem is solved.

Besides, Marek and I had in fact investigated barriers, adding some as I did in times past in mvgbe.c, and fiddling with the one already in dcache_flush_range(). None of this had any effect.

You tried adding a dsb() to dcache_flush_range()? That should have fixed the problem as well.

...
However, if our barriers are lacking, then they may fail us somehow on other occasions, so best is if we analyze the failings. Can you clarify in what respect exactly they are lacking? For instance, are all our barriers lacking, or only some, and which ones?

Amicalement,

Linux has

Kconfig:config ARM_DMA_MEM_BUFFERABLE Kconfig- bool "Use non-cacheable memory for DMA" if (CPU_V6 || CPU_V6K) && !CPU_V7 Kconfig- depends on !(MACH_REALVIEW_PB1176 || REALVIEW_EB_ARM11MP || \ Kconfig- MACH_REALVIEW_PB11MP) Kconfig- default y if CPU_V6 || CPU_V6K || CPU_V7 Kconfig- help

So, if this symbol is y then all writel/readl will be preceded by a dsb() as shown above.

However, u-boot probably uses cacheable memory for dma, so maybe u-boot is also correct with

asm/io.h-#define dmb() __asm__ __volatile__ ("" : : : "memory") asm/io.h-#define __iormb() dmb() asm/io.h:#define __iowmb() dmb()

and no dsb(), but perhaps flush_dcache still needs one at the end.

Troy

for armv7, flush dcache does have a dsb.

//u-boot #define CP15DSB asm volatile ("mcr p15, 0, %0, c7, c10, 4" : : "r" (0))

//linux asm/barrier.h:#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" : : "r" (0) : "memory")

Don't know why I didn't see that before. So, I don't know why that wasn't good enough.

Maybe CONFIG_SYS_DCACHE_OFF was set and

void flush_dcache_range(unsigned long start, unsigned long stop) { }

Needs a dsb too???

Troy

Albert ARIBAUD

10:20 p.m.

Hi Troy,

On Mon, 15 Jul 2013 12:59:56 -0700, Troy Kisky troy.kisky@boundarydevices.com wrote:

...

On 7/15/2013 10:39 AM, Troy Kisky wrote:

...
On 7/15/2013 6:41 AM, Albert ARIBAUD wrote:

...
Hi Troy,

On Fri, 12 Jul 2013 19:43:07 -0700, Troy Kisky troy.kisky@boundarydevices.com wrote:

...
On 7/11/2013 4:18 PM, Fabio Estevam wrote:

...
On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut marex@denx.de wrote:

...
The MX28 multi-layer AHB bus can be too slow and trigger the FEC DMA too early, before all the data hit the DRAM. This patch ensures the data are written in the RAM before the DMA starts. Please see the comment in the patch for full details.

This patch was produced with an amazing help from Albert Aribaud, who pointed out it can possibly be such a bus synchronisation issue.

Signed-off-by: Marek Vasut marex@denx.de Cc: Albert ARIBAUD albert.u.boot@aribaud.net Cc: Fabio Estevam fabio.estevam@freescale.com Cc: Stefano Babic sbabic@denx.de

Excellent, managed to transfer 90MB via TFTP on mx28evk without a single timeout.

Tested-by: Fabio Estevam fabio.estevam@freescale.com _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

Perhaps this because our memory barriers are lacking.

Linux has this code asm/io.h:#define writel(v,c) ({ __iowmb(); writel_relaxed(v,c); })

asm/io.h-#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE asm/io.h-#include <asm/barrier.h> asm/io.h-#define __iormb() rmb() asm/io.h:#define __iowmb() wmb() asm/io.h-#else asm/io.h-#define __iormb() do { } while (0) asm/io.h:#define __iowmb() do { } while (0) asm/io.h-#endif

asm/io.h-#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE asm/io.h-#include <asm/barrier.h> asm/io.h-#define __iormb() rmb() asm/io.h:#define __iowmb() wmb() asm/io.h-#else asm/io.h-#define __iormb() do { } while (0) asm/io.h:#define __iowmb() do { } while (0) asm/io.h-#endif

asm/barrier.h-#elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP) asm/barrier.h-#define mb() do { dsb(); outer_sync(); } while (0) asm/barrier.h-#define rmb() dsb() asm/barrier.h:#define wmb() mb() asm/barrier.h-#else asm/barrier.h-#define mb() barrier() asm/barrier.h-#define rmb() barrier() asm/barrier.h:#define wmb() barrier() asm/barrier.h-#endif

asm/barrier.h-#if __LINUX_ARM_ARCH__ >= 7 asm/barrier.h-#define isb() __asm__ __volatile__ ("isb" : : : "memory") asm/barrier.h:#define dsb() __asm__ __volatile__ ("dsb" : : : "memory") asm/barrier.h-#define dmb() __asm__ __volatile__ ("dmb" : : : "memory") asm/barrier.h-#elif defined(CONFIG_CPU_XSC3) || __LINUX_ARM_ARCH__ == 6 asm/barrier.h-#define isb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c5, 4" \ asm/barrier.h- : : "r" (0) : "memory") asm/barrier.h:#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \ asm/barrier.h- : : "r" (0) : "memory") asm/barrier.h-#define dmb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \

Can you try just adding a dsb() instead of the dummy read?

If this also fixes your problem, it seems better to fix our writel macro

Not sure I understand who you are answering to exactly, as Fabio indicates his problem is solved.

Besides, Marek and I had in fact investigated barriers, adding some as I did in times past in mvgbe.c, and fiddling with the one already in dcache_flush_range(). None of this had any effect.

You tried adding a dsb() to dcache_flush_range()? That should have fixed the problem as well.

...
However, if our barriers are lacking, then they may fail us somehow on other occasions, so best is if we analyze the failings. Can you clarify in what respect exactly they are lacking? For instance, are all our barriers lacking, or only some, and which ones?

Amicalement,

Linux has

Kconfig:config ARM_DMA_MEM_BUFFERABLE Kconfig- bool "Use non-cacheable memory for DMA" if (CPU_V6 || CPU_V6K) && !CPU_V7 Kconfig- depends on !(MACH_REALVIEW_PB1176 || REALVIEW_EB_ARM11MP || \ Kconfig- MACH_REALVIEW_PB11MP) Kconfig- default y if CPU_V6 || CPU_V6K || CPU_V7 Kconfig- help

So, if this symbol is y then all writel/readl will be preceded by a dsb() as shown above.

However, u-boot probably uses cacheable memory for dma, so maybe u-boot is also correct with

asm/io.h-#define dmb() __asm__ __volatile__ ("" : : : "memory") asm/io.h-#define __iormb() dmb() asm/io.h:#define __iowmb() dmb()

and no dsb(), but perhaps flush_dcache still needs one at the end.

Troy

for armv7, flush dcache does have a dsb.

//u-boot #define CP15DSB asm volatile ("mcr p15, 0, %0, c7, c10, 4" : : "r" (0))

//linux asm/barrier.h:#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" : : "r" (0) : "memory")

Don't know why I didn't see that before. So, I don't know why that wasn't good enough.

Maybe CONFIG_SYS_DCACHE_OFF was set and

void flush_dcache_range(unsigned long start, unsigned long stop) { }

Needs a dsb too???

I think everything is explained in Marek's commit message.

...

Troy

Amicalement,

-- Albert.

Albert ARIBAUD

10:20 p.m.

Hi Troy,

On Mon, 15 Jul 2013 10:39:54 -0700, Troy Kisky troy.kisky@boundarydevices.com wrote:

...

...
Besides, Marek and I had in fact investigated barriers, adding some as I did in times past in mvgbe.c, and fiddling with the one already in dcache_flush_range(). None of this had any effect.

You tried adding a dsb() to dcache_flush_range()? That should have fixed the problem as well.

There already was a memory barrier -- the only one kind known bu ARM926J-S, which is a write buffer(s) drain -- and no, it should not have fixed the problem (and did not), because the issue is not about pushing the transactions out of the CPU soon enough; it is about having the EMI perform them before it passes our 'go' command to the ENET DMA.

...

...
However, if our barriers are lacking, then they may fail us somehow on other occasions, so best is if we analyze the failings. Can you clarify in what respect exactly they are lacking? For instance, are all our barriers lacking, or only some, and which ones?

Amicalement,

Linux has

Kconfig:config ARM_DMA_MEM_BUFFERABLE Kconfig- bool "Use non-cacheable memory for DMA" if (CPU_V6 || CPU_V6K) && !CPU_V7 Kconfig- depends on !(MACH_REALVIEW_PB1176 || REALVIEW_EB_ARM11MP || \ Kconfig- MACH_REALVIEW_PB11MP) Kconfig- default y if CPU_V6 || CPU_V6K || CPU_V7 Kconfig- help

So, if this symbol is y then all writel/readl will be preceded by a dsb() as shown above.

While I do understand what is done there, I do not see the interest since 1) in our issue, the problem was not in any writel() or readl(), and 2) U-Boot uses readl() and writel() for device memory ranges, which are not cached at all and thus do not need any flush, invalidate or barrier.

...

However, u-boot probably uses cacheable memory for dma, so maybe u-boot is also correct with

Actually, in the driver where the issue occurred, U-boot does use cacheable memory for DMA; that's why the driver contains calls to flush_dcache_range() and invalidate_dcache_range() on the descriptors...

...

asm/io.h-#define dmb() __asm__ __volatile__ ("" : : : "memory") asm/io.h-#define __iormb() dmb() asm/io.h:#define __iowmb() dmb()

and no dsb(), but perhaps flush_dcache still needs one at the end.

... but that's for descriptors, which are not accessed uwing writel() or readl(); for device registers, such as ENET, we use writel() and readl() but no cache.

So far we never had to use any data memory barrier on peripheral accesses. The worst that happened AFAIK is when we needed instruction barriers to prevent the compiler from mis-ordering device writes wrt their source code order; and at that time, our readl() and writel() implementations were not volatile. Today, I am pretty sure that these instruction barriers are uneeded.

...

Troy

Amicalement,

-- Albert.

Troy Kisky

11:18 p.m.

On 7/15/2013 1:20 PM, Albert ARIBAUD wrote:

...

Hi Troy,

On Mon, 15 Jul 2013 10:39:54 -0700, Troy Kisky troy.kisky@boundarydevices.com wrote:

...
...
Besides, Marek and I had in fact investigated barriers, adding some as I did in times past in mvgbe.c, and fiddling with the one already in dcache_flush_range(). None of this had any effect.

You tried adding a dsb() to dcache_flush_range()? That should have fixed the problem as well.

There already was a memory barrier -- the only one kind known bu ARM926J-S, which is a write buffer(s) drain -- and no, it should not have fixed the problem (and did not), because the issue is not about pushing the transactions out of the CPU soon enough; it is about having the EMI perform them before it passes our 'go' command to the ENET DMA.

Thanks for straightening me out. My back just popped a couple of times.

Troy

Albert ARIBAUD

12 Jul 12 Jul

7:57 a.m.

Hi Marek,

On Fri, 12 Jul 2013 01:03:04 +0200, Marek Vasut marex@denx.de wrote:

...

The MX28 multi-layer AHB bus can be too slow and trigger the FEC DMA too early, before all the data hit the DRAM. This patch ensures the data are written in the RAM before the DMA starts. Please see the comment in the patch for full details.

This patch was produced with an amazing help from Albert Aribaud, who pointed out it can possibly be such a bus synchronisation issue.

Signed-off-by: Marek Vasut marex@denx.de Cc: Albert ARIBAUD albert.u.boot@aribaud.net Cc: Fabio Estevam fabio.estevam@freescale.com Cc: Stefano Babic sbabic@denx.de

drivers/net/fec_mxc.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)

diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c index 97bf8fe..ec5b9db 100644 --- a/drivers/net/fec_mxc.c +++ b/drivers/net/fec_mxc.c @@ -737,6 +737,28 @@ static int fec_send(struct eth_device *dev, void *packet, int length) flush_dcache_range(addr, addr + size);

/*
* Below we read the DMA descriptor's last four bytes back from the
* DRAM. This is important in order to make sure that all WRITE
* operations on the bus that were triggered by previous cache FLUSH
* have completed.
*
* Otherwise, on MX28, it is possible to observe a corruption of the
* DMA descriptors. Please refer to schematic "Figure 1-2" in MX28RM
* for the bus structure of MX28. The scenario is as follows:
*
* 1) ARM core triggers a series of WRITEs on the AHB_ARB2 bus going
*    to DRAM due to flush_dcache_range()
* 2) ARM core writes the FEC registers via AHB_ARB2
* 3) FEC DMA starts reading/writing from/to DRAM via AHB_ARB3
*
* Note that 2) does sometimes finish before 1) due to reordering of
* WRITE accesses on the AHB bus, therefore triggering 3) before the
* DMA descriptor is fully written into DRAM. This results in occasional
* corruption of the DMA descriptor.
*/
readl(addr + size - 4);

/*

Enable SmartDMA transmit task

*/ fec_tx_task_enable(fec);

This being a bugfix patch, and having been tested twice, I suggest that it go in 2013.07, maybe with the commit message reduced to its first paragraph above -- although of course I do appreciate the second one, except it tends to minimize Marek's own contribution to the fix, which is by far the most important.

Amicalement,

-- Albert.

Albert ARIBAUD

8:39 a.m.

Afterthought:

On Fri, 12 Jul 2013 07:57:18 +0200, Albert ARIBAUD albert.u.boot@aribaud.net wrote:

...

except it tends to minimize Marek's own contribution to the fix, which is by far the most important.

'The most important' 'by far' being of course Marek's contribution, not minimizing it. :)

Amicalement,

-- Albert.

Marek Vasut

1:51 p.m.

Hi Albert,

...

Afterthought:

On Fri, 12 Jul 2013 07:57:18 +0200, Albert ARIBAUD

albert.u.boot@aribaud.net wrote:

...
except it tends to minimize Marek's own contribution to the fix, which is by far the most important.

'The most important' 'by far' being of course Marek's contribution, not minimizing it. :)

I'm convinced I owe you a truckload of tartelette ;-)

Best regards, Marek Vasut

Stefano Babic

8:56 a.m.

Hi Marek, hi Albert,

On 12/07/2013 07:57, Albert ARIBAUD wrote:

...

This being a bugfix patch, and having been tested twice, I suggest that it go in 2013.07, maybe with the commit message reduced to its first paragraph above -- although of course I do appreciate the second one, except it tends to minimize Marek's own contribution to the fix, which is by far the most important.

Well, a big thank to both of you !

I merge this into u-boot-imx and I will send soon my PR.

Regards, Stefano

-- ===================================================================== DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: +49-8142-66989-53 Fax: +49-8142-66989-80 Email: sbabic@denx.de =====================================================================

Stefano Babic

9:30 a.m.

On 12/07/2013 01:03, Marek Vasut wrote:

...

The MX28 multi-layer AHB bus can be too slow and trigger the FEC DMA too early, before all the data hit the DRAM. This patch ensures the data are written in the RAM before the DMA starts. Please see the comment in the patch for full details.

This patch was produced with an amazing help from Albert Aribaud, who pointed out it can possibly be such a bus synchronisation issue.

Signed-off-by: Marek Vasut marex@denx.de Cc: Albert ARIBAUD albert.u.boot@aribaud.net Cc: Fabio Estevam fabio.estevam@freescale.com Cc: Stefano Babic sbabic@denx.de

Applied to u-boot-imx (fix), thanks !

Best regards, Stefano Babic

4252

Age (days ago)

4317

Last active (days ago)

List overview

Download

54 comments

10 participants

tags (0)

participants (10)

Albert ARIBAUD
Alexandre Pereira da Silva
Fabio Estevam
Hector Palacios
Marek Vasut
Marek Vasut
Robert Hodaszi
Stefano Babic
Troy Kisky
Wolfgang Denk