[U-Boot] common/xyzmodem.c, ymodem, slow behavior receiving bytes

Dear all,
I am seeing strange behavior with the U-Boot ymodem protocol. I am running U-Boot (pulled from origin/head yesterday) on an mcf5307-based board. I am currently working on adding support for this board/CPU to U-Boot.
The CPU runs at 90 MHz (bus clock at 45 MHz), baud rate 115200.
On an older U-Boot (2011.09.00122), I was able to upload files through ymodem at 115200 without any issue.
After pulling yesterday's head and installing it on the board, I can no longer upload files. U-Boot times out while the first 128-byte block is being sent and sends the start character (C) again, so I started some debugging in xyzmodem.c:
The protocol times out here, from line 434:
....
      xyz.len = (c == SOH) ? 128 : 1024;
      xyz.bufp = xyz.pkt;
      for (i = 0; i < xyz.len; i++)
        {
          res = CYGACC_COMM_IF_GETC_TIMEOUT (*xyz.__chan, &c);
          ZM_DEBUG (zm_save (c));
          if (res)
            {
              xyz.pkt[i] = c;
            }
          else
            {
              ZM_DEBUG (zm_dump (__LINE__));
---->>        return xyzModem_timeout;
            }
        }
Echoing each received char back to the PC, I see that about 1 in every 3 received chars is lost somewhere, so the first 128-byte block is never completely received (timeout).
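The effect described here can be reproduced on a host with a small simulation. This is only a sketch under stated assumptions: `lossy_uart`, `getc_timeout` and `receive_block` are hypothetical names invented for illustration, not part of xyzmodem.c, and the "every third byte is lost" pattern is a simplification of the observation above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a UART that silently loses every third byte. */
struct lossy_uart {
    size_t sent;   /* bytes the sender has put on the wire so far */
    size_t total;  /* bytes the sender will transmit in total */
};

/* Returns true and stores a byte if one arrives; false models a timeout. */
static bool getc_timeout(struct lossy_uart *u, unsigned char *c)
{
    while (u->sent < u->total) {
        size_t idx = u->sent++;
        if (idx % 3 == 2)
            continue;              /* overrun: this byte never reaches us */
        *c = (unsigned char)idx;
        return true;
    }
    return false;                  /* sender is done, nothing left to read */
}

/* Same shape as the receive loop quoted above: returns the number of bytes
 * stored, which is less than len when a read times out. */
size_t receive_block(struct lossy_uart *u, unsigned char *pkt, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c;

        if (!getc_timeout(u, &c))
            break;                 /* corresponds to xyzModem_timeout */
        pkt[i] = c;
    }
    return i;
}
```

With a 128-byte block on the wire, receive_block() stores only 86 bytes and then hits the timeout path, because the receiver keeps waiting for bytes that the sender has already transmitted.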
I measured the baud rate with an oscilloscope: from the PC to the U-Boot board I have 115600; from the U-Boot board to the PC I have 117180 (still inside the 2% tolerance).
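As a quick check of the tolerance claim, the deviation from the nominal 115200 can be computed directly. A minimal sketch; the function name `baud_error_percent` is made up for this example.

```c
/* Relative deviation of a measured baud rate from the nominal one,
 * in percent. Positive means the measured rate is too fast. */
double baud_error_percent(double measured, double nominal)
{
    return (measured - nominal) / nominal * 100.0;
}
```

For the two measurements above this gives roughly 0.35% (115600) and 1.72% (117180), so both are indeed under 2%.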
I also verified that the timer/delay functions work properly in the new U-Boot build. As clients I used minicom and others, with the same result.
So at first look, it seems the receive routine from line 434 is no longer fast enough to receive the whole block. When I lowered the speed to 57600, the upload worked fine again.
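To put "fast enough" into numbers, here is a rough sketch of the per-character budget. It assumes 8N1 framing (10 bits on the wire per character), which the thread does not state explicitly; the function names are invented for illustration.

```c
#include <stdint.h>

/* Bits on the wire per character for 8N1 framing: start + 8 data + stop.
 * 8N1 is an assumption; the thread does not name the framing. */
#define BITS_PER_CHAR_8N1 10u

/* CPU clock cycles available between two consecutive received characters. */
uint32_t cycles_per_char(uint32_t cpu_hz, uint32_t baud)
{
    return (uint32_t)(((uint64_t)cpu_hz * BITS_PER_CHAR_8N1) / baud);
}

/* Time one character occupies on the line, in nanoseconds. */
uint32_t char_time_ns(uint32_t baud)
{
    return (uint32_t)((1000000000ull * BITS_PER_CHAR_8N1) / baud);
}
```

At 90 MHz and 115200 baud this gives about 7812 CPU cycles (about 86.8 us) per character; at 57600 the budget doubles to 15625 cycles. A polling loop that misses characters with that much headroom suggests each instruction is taking far longer than expected.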
What do you think?
Best Regards, Angelo Dureghello

Hi all,
It seems that the mcf5307, running at 90 MHz, is not fast enough when the "-Os -g" compile options are set.
For a test, I changed config.mk from
DBGFLAGS= -g # -DDEBUG
OPTFLAGS= -Os #-fomit-frame-pointer
to
DBGFLAGS= #-g # -DDEBUG
OPTFLAGS= -O2 #-Os #-fomit-frame-pointer
common now compiles with -O2 and ymodem works fine again at 115200.
Also, I don't understand why "-g" is set by default. Is there a way to override/customize these options for this CPU? Or do I definitely have to step down to 57600?
Best Regards, Angelo Dureghello
On Sat, Nov 10, 2012 at 02:40:07PM +0100, Angelo Dureghello wrote:

Dear Angelo Dureghello,
> Also, I don't understand why "-g" is set by default. Is there a way to override/customize these options for this CPU? Or do I definitely have to step down to 57600?
You can set up the CFLAGS in config.mk for your CPU.
btw. read RFC1855 please.
U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

Dear Angelo Dureghello,
please don't top-post / full quote.
In message 20121113001651.GA21177@angel3 you wrote:
> It seems that the mcf5307, running at 90 MHz, is not fast enough when the "-Os -g" compile options are set.
> For a test, I changed config.mk from
> DBGFLAGS= -g # -DDEBUG
> OPTFLAGS= -Os #-fomit-frame-pointer
> to
> DBGFLAGS= #-g # -DDEBUG
> OPTFLAGS= -O2 #-Os #-fomit-frame-pointer
> common now compiles with -O2 and ymodem works fine again at 115200.
Removing -g makes no sense in this context. It has no impact on the generated code.
I am really surprised about your claim that the -O2 compiled code is actually running faster than the -Os compiled one on a low-end system such as yours (90 MHz CPU clock, 8 kB cache size).
Which exact tool chain are you using to build the code?
> Also, I don't understand why "-g" is set by default.
It is set because it is useful to some (those who need to debug their code) and does not hurt others.
> Is there a way to override/customize these options for this CPU?
I am not convinced that it makes sense to change settings on a per-CPU basis. A 90 MHz CPU should be more than sufficient to receive data at 115 kbps. I can only compare against 50 MHz PowerQuicc I systems (which are about the lowest-end machines I have at hand now), and no such problem exists there.
It would be good to understand exactly where the problem is coming from. I don't think that the -Os setting is the core of the problem; I rather tend to suspect your tool chain or your serial driver or such.
> Or do I definitely have to step down to 57600?
There should be no need for that.
Best regards,
Wolfgang Denk

Hi Wolfgang and all,
On Tue, Nov 13, 2012 at 08:09:19AM +0100, Wolfgang Denk wrote:
> Dear Angelo Dureghello,
> please don't top-post / full quote.
Ack.
> I am really surprised about your claim that the -O2 compiled code is actually running faster than the -Os compiled one on a low-end system such as yours (90 MHz CPU clock, 8 kB cache size).
> Which exact tool chain are you using to build the code?
I tested 2 different toolchains,
m68k-elf-gcc (Sourcery CodeBench Lite 2011.09-21) 4.6.1
and another, older one downloaded from the uClinux site, based on
m68k-elf-gcc (GCC) 4.2.4
Nothing changes; the issue is the same.
> > Also, I don't understand why "-g" is set by default.
> It is set because it is useful to some (those who need to debug their code) and does not hurt others.
I asked because embedded programmers sometimes fight for every free byte in flash. On limited boards like mine (4 MB flash), once the kernel and some apps are stored, very little space remains for U-Boot and some non-volatile data storage. But it is no problem for me; I can change the flags as needed.
> I am not convinced that it makes sense to change settings on a per-CPU basis. A 90 MHz CPU should be more than sufficient to receive data at 115 kbps. I can only compare against 50 MHz PowerQuicc I systems (which are about the lowest-end machines I have at hand now), and no such problem exists there.
Sure, 20 MHz boards are able to handle 115200 as well. So there is clearly something not working properly on my custom board. My first thought is that the SDRAM is not well initialized, resulting in very slow execution of the U-Boot code. Is there a way in U-Boot to test code execution speed?
Best Regards, Angelo Dureghello

On Wed, 2012-11-14 at 10:47 +0100, Angelo Dureghello wrote:
> > > Also, I don't understand why "-g" is set by default.
> I asked because embedded programmers sometimes fight for every free byte in flash. On limited boards like mine (4 MB flash), once the kernel and some apps are stored, very little space remains for U-Boot and some non-volatile data storage. But it is no problem for me; I can change the flags as needed.
-g does not change the on-flash size at all. The debug information is not included in the u-boot.bin file you install in flash.
> Sure, 20 MHz boards are able to handle 115200 as well. So there is clearly something not working properly on my custom board. My first thought is that the SDRAM is not well initialized, resulting in very slow execution of the U-Boot code. Is there a way in U-Boot to test code execution speed?
Memory speed can be tested with the mw command, filling some region of unused memory with a pattern.
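The same idea, timing a pattern fill over a memory region, can be sketched in C. This is a host-side illustration only: clock() stands in for whatever timer the target provides (on the board, something like U-Boot's get_timer() would take its place; that substitution is an assumption, not something stated in this thread), and `time_fill` is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Fills buf with a pattern `passes` times and returns the elapsed CPU time
 * in seconds. The buffer is volatile so the stores are not optimized away.
 * clock() is a host-side stand-in for a board timer. */
double time_fill(volatile uint32_t *buf, size_t words, unsigned passes,
                 uint32_t pattern)
{
    clock_t start = clock();

    for (unsigned p = 0; p < passes; p++)
        for (size_t i = 0; i < words; i++)
            buf[i] = pattern;

    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Comparing the result for the same region size on a known-good board and on the suspect board, or with the cache on and off, gives a relative figure rather than an absolute benchmark.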

Dear Angelo,
In message 20121114094706.GA5697@angel3 you wrote:
> > Which exact tool chain are you using to build the code?
> I tested 2 different toolchains,
> m68k-elf-gcc (Sourcery CodeBench Lite 2011.09-21) 4.6.1
OK, this should be recent enough...
> > It is set because it is useful to some (those who need to debug their code) and does not hurt others.
> I asked because embedded programmers sometimes fight for every free byte in flash. On limited boards like mine (4 MB flash), once the kernel and some apps are stored, very little space remains for U-Boot and some non-volatile data storage. But it is no problem for me; I can change the flags as needed.
-g has zero impact on the image size!! The debug symbol tables are only present in the ELF file, but they are not included in the binary image you install on your flash. This is what I mean by "it does not hurt others" - the U-Boot image size with or without -g is exactly the same.
> Sure, 20 MHz boards are able to handle 115200 as well. So there is clearly something not working properly on my custom board. My first thought is that the SDRAM is not well initialized, resulting in very slow execution of the U-Boot code. Is there a way in U-Boot to test code execution speed?
Not directly. If you just want to compare boards it is often sufficient to do this with some shell loops, say:
date ; for i in 1 2 3 4 5 6 7 8 9 0 ; do
  for j in 1 2 3 4 5 6 7 8 9 0 ; do
    for k in 1 2 3 4 5 6 7 8 9 0 ; do
      setenv foo 1 ; setenv foo
done ; done ; done ; date
Best regards,
Wolfgang Denk

Hi all,
I finally found an issue in the cache initialization for this CPU.
Setting the correct CACR / ACR registers solved the issue.
Many thanks,
Best Regards Angelo Dureghello

Dear Angelo,
In message 20121119215454.GA7107@angel3 you wrote:
> I finally found an issue in the cache initialization for this CPU.
> Setting the correct CACR / ACR registers solved the issue.
Ah! Thanks a lot for the feedback.
I'm glad my initial suspicions were right. Can you please post some hints / patches what needs to be fixed, and where?
Thanks in advance.
Best regards,
Wolfgang Denk

On Mon, Nov 19, 2012 at 11:58:24PM +0100, Wolfgang Denk wrote:
Dear Wolfgang and All,
> I'm glad my initial suspicions were right. Can you please post some hints / patches what needs to be fixed, and where?
Sorry for the late reply. I damaged a GPIO port on the board and am still soldering up a better one for my tests.
The issue was related only to my "board.h", and maybe to the mcf5307 support that I am trying to add to U-Boot.
The mcf5307 seems to be something of a special case for the cache. It has an 8 KB unified instruction+data cache which, quite oddly, seems to be the same as that of the V2 mcf5249.
First, my board.h was not configuring any cache-enabled SDRAM address area. This was probably causing very slow non-cached code execution and non-cached data reads/writes. In this case, execution is a bit faster when changing -Os to -O2.
So I enabled the SDRAM cache with:
#define CONFIG_SYS_CACHELINE_SIZE 16
#define ICACHE_STATUS (CONFIG_SYS_INIT_RAM_ADDR + \
                       CONFIG_SYS_INIT_RAM_SIZE - 8)
#define DCACHE_STATUS (CONFIG_SYS_INIT_RAM_ADDR + \
                       CONFIG_SYS_INIT_RAM_SIZE - 4)
#define CONFIG_SYS_ICACHE_INV (CF_CACR_CINVA)
#define CONFIG_SYS_CACHE_ACR0 (CF_ACR_CM_WT | CF_ACR_SM_ALL | \
                               CF_ACR_EN)
#define CONFIG_SYS_CACHE_ICACR (CF_CACR_DCM_P | CF_CACR_ESB | \
                                CF_CACR_EC)
Still, I had to use "write-through", because "copyback" also needs a "flush" (m68k "cpushl") that is still not implemented in U-Boot for m68k (it seems to be a to-do).
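Why copyback needs a flush while write-through does not can be shown with a toy model. This is a deliberately simplified sketch (one cached word in front of one memory word); it does not model the ColdFire cache, and all names in it are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

enum cache_mode { WRITE_THROUGH, COPY_BACK };

/* Toy model: a single cached word in front of a single memory word. */
struct toy_cache {
    enum cache_mode mode;
    uint32_t line;    /* value held in the cache */
    uint32_t memory;  /* value held in backing memory */
    bool dirty;       /* cache holds data that memory has not seen yet */
};

void cache_write(struct toy_cache *c, uint32_t value)
{
    c->line = value;
    if (c->mode == WRITE_THROUGH)
        c->memory = value;   /* every store also reaches memory */
    else
        c->dirty = true;     /* memory stays stale until a flush */
}

/* Plays the role of a push/flush operation; a no-op for write-through,
 * because no line is ever dirty in that mode. */
void cache_flush(struct toy_cache *c)
{
    if (c->dirty) {
        c->memory = c->line;
        c->dirty = false;
    }
}
```

In write-through mode memory is always current after cache_write(). In copy-back mode anything that reads memory directly sees the old value until cache_flush() runs, which is why a missing flush implementation rules that mode out.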
Finally, I had to remove every mask from the ACR (the CF_ADDRMASK(x) macro), since in my case it was causing a complete hang after cache initialization. For this I looked at the uClinux cache init, but I still have to understand clearly why the mask setup blocks the boot. I will clarify this as soon as I have a new board running and will post back.
Best Regards, Angelo Dureghello

Dear all,
I have found out that this issue was also, and probably mainly, caused by the issue fixed in my last posted patch:
http://patchwork.ozlabs.org/patch/201421/
All serial driver routines (mcfuart.c in my case) were executed from flash memory, even after monitor relocation to RAM.
Considering the flash memory access time (10 wait states set in my case), this was making code execution dramatically slow, near the limit for receiving at 115200 UART speed.
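A back-of-the-envelope model shows how wait states eat into the per-character budget. It rests on loud assumptions that are not from the thread: each uncached access costs one bus clock plus the configured wait states, there is no bursting, and a character is 10 bits on the wire. The real bus timing of the part may differ, so treat the result as an order of magnitude only.

```c
#include <stdint.h>

/* Bus clocks consumed by one memory access with the given wait states.
 * Assumption: one base clock plus the wait states, no bursting. */
uint32_t bus_clocks_per_access(uint32_t wait_states)
{
    return 1u + wait_states;
}

/* Upper bound on uncached memory accesses that fit in one character time. */
uint32_t accesses_per_char(uint32_t bus_hz, uint32_t baud,
                           uint32_t bits_per_char, uint32_t wait_states)
{
    uint64_t bus_clocks = ((uint64_t)bus_hz * bits_per_char) / baud;

    return (uint32_t)(bus_clocks / bus_clocks_per_access(wait_states));
}
```

With the 45 MHz bus clock, 115200 baud and 10 wait states, this model leaves room for about 355 flash accesses per received character, against about 3906 with zero wait states. If every instruction fetch of the serial driver and the receive loop goes to flash, a budget of a few hundred accesses is plausibly "near the limit".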
Enabling the cache for the SDRAM sped up operations a bit and solved the issue, but the main problem here was certainly the execution from flash.
Best Regards Angelo Dureghello

Dear Angelo,
In message 20121123225721.GA28751@angel3 you wrote:
> I have found out that this issue was also, and probably mainly, caused by the issue fixed in my last posted patch:
> http://patchwork.ozlabs.org/patch/201421/
> All serial driver routines (mcfuart.c in my case) were executed from flash memory, even after monitor relocation to RAM.
> Considering the flash memory access time (10 wait states set in my case), this was making code execution dramatically slow, near the limit for receiving at 115200 UART speed.
> Enabling the cache for the SDRAM sped up operations a bit and solved the issue, but the main problem here was certainly the execution from flash.
Thanks for following up on this!
Best regards,
Wolfgang Denk

participants (4)
- Angelo Dureghello
- Henrik Nordström
- Marek Vasut
- Wolfgang Denk