How to debug a U-Boot data abort

Hi:
I have a custom AM335X board connected to my computer via usbnet. It always reports a data abort when running 'dhcp'.
Here is the log:
U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 +0800)
CPU : AM335X-GP rev 2.1
Model: WISDOM AM335X CCT
DRAM: 512 MiB
NAND: 256 MiB
MMC: OMAP SD/MMC: 0
Loading Environment from NAND... *** Warning - bad CRC, using default environment
Net: Could not get PHY for ethernet@4a100000: addr 0
eth2: ethernet@4a100000, eth3: usb_ether
Hit any key to stop autoboot: 0
=> setenv autoload no
=> dhcp
using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
MAC de:ad:be:ef:00:01
HOST MAC de:ad:be:ef:00:00
RNDIS ready
musb-hdrc: peripheral reset irq lost!
high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
USB RNDIS network up!
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
DHCP client bound to address 192.168.200.4 (757 ms)
data abort
pc : [<9fe9b0a2>] lr : [<9febbc3f>]
reloc pc : [<808130a2>] lr : [<80833c3f>]
sp : 9de53410 ip : 9de53578 fp : 00000001
r10: 9de5345c r9 : 9de67e80 r8 : 9febbae5
r7 : 9de72c30 r6 : 9feec710 r5 : 0000000d r4 : 00000018
r3 : 3fdd8e04 r2 : 00000002 r1 : 9feec728 r0 : 9feec700
Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
Code: f023 0303 60ca 4403 (6091) 685a
Resetting CPU ...
resetting ...
Is there any documentation on how to debug a data abort? Or has this bug already been fixed?
Thanks
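As a hedged starting point for reading a dump like the one above: the halfword in parentheses on the "Code:" line is conventionally the instruction at pc, and because the flags show Thumb state (T) it can be decoded by hand. The sketch below is not part of U-Boot; decode_thumb_ldst and fault_address are hypothetical helpers that only understand the 16-bit Thumb "STR/LDR Rt, [Rn, #imm]" word encoding, and they assume the register values printed in the dump are those at the time of the fault.

```c
#include <stdint.h>
#include <stdbool.h>

/* Decoded form of a 16-bit Thumb "STR/LDR Rt, [Rn, #imm]" instruction. */
struct thumb_ldst {
    bool is_load;     /* true for LDR, false for STR */
    unsigned rt;      /* transfer register number (0-7) */
    unsigned rn;      /* base register number (0-7) */
    unsigned offset;  /* byte offset, imm5 * 4 */
};

/*
 * Hypothetical helper: decode the encodings 0110 0 (STR) and 0110 1 (LDR)
 * with a 5-bit word offset. Returns false for any other halfword.
 */
bool decode_thumb_ldst(uint16_t insn, struct thumb_ldst *out)
{
    if ((insn >> 12) != 0x6)
        return false;
    out->is_load = (insn >> 11) & 1;
    out->offset = ((insn >> 6) & 0x1f) * 4;
    out->rn = (insn >> 3) & 0x7;
    out->rt = insn & 0x7;
    return true;
}

/* Address the instruction accesses, given the value of its base register. */
uint32_t fault_address(const struct thumb_ldst *d, uint32_t rn_value)
{
    return rn_value + d->offset;
}
```

For this dump, 0x6091 decodes to a store of r1 to [r2, #8]. With r2 = 00000002 the target address is 0x0000000a, which is not word-aligned and is nowhere near the 0x9d... and 0x9f... pointers held in the other registers, so r2 itself looks corrupted.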

On 2022/3/23 10:28, qianfan wrote:
This bug is not fixed on master. I found that v2021.01 is good and v2021.04-rc2 is bad.
I also tested this on a BeagleBone Black with am335x_evm_defconfig, and it has a similar problem.
I searched for the first bad commit with 'git bisect', which reported that commit e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. This is very strange, because that commit does not touch any DHCP or network code.
➜ u-boot-main git:(e97eb638de) ✗ git bisect bug
e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
Author: Heinrich Schuchardt xypron.glpk@gmx.de
Date: Wed Jan 20 22:21:53 2021 +0100
    fs: fat: consistent error handling for flush_dir()
    Provide function description for flush_dir().
    Move all error messages for flush_dir() from the callers to the function.
    Move mapping of errors to -EIO to the function.
    Always check return value of flush_dir() (Coverity CID 316362).
    In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
    Signed-off-by: Heinrich Schuchardt xypron.glpk@gmx.de
:040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e 77d188b1c99181fd71f2167fdeee3434a09db209 M fs
184aa6504143b452132e28cd3ebecc7b941cdfa1 is the commit immediately before e97eb638de0dc8f6e989e20eaeb0342f103cb917:
* e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error handling for flush_dir()
*   184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag 'u-boot-rockchip-20210121' of https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
|\
| * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc based PCIe controller driver
I checked that 184aa6504143b452132e28cd3ebecc7b941cdfa1 works fine:
U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
CPU : AM335X-GP rev 2.1
Model: TI AM335x BeagleBone Black
DRAM: 512 MiB
WDT: Started with servicing (60s timeout)
NAND: 0 MiB
MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
Loading Environment from FAT... <ethaddr> not set. Validating first E-fuse MAC
Net: eth2: ethernet@4a100000, eth3: usb_ether
Hit any key to stop autoboot: 0
=> dhcp
ethernet@4a100000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
MAC de:ad:be:ef:00:01
HOST MAC de:ad:be:ef:00:00
RNDIS ready
musb-hdrc: peripheral reset irq lost!
high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
USB RNDIS network up!
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
DHCP client bound to address 192.168.200.157 (757 ms)
Using usb_ether device
TFTP from server 192.168.200.1; our IP address is 192.168.200.157
Filename 'u-boot.img'.
Load address: 0x82000000
Loading: #################################################################
 #################################################################
 #################################################################
 #########################
 2.5 MiB/s
done
Bytes transferred = 1123888 (112630 hex)
=>

On 2022/3/23 15:45, qianfan wrote:

On 2022/3/23 16:02, qianfan wrote:
"data abort" messages:
data abort
pc : [<9ff8196c>] lr : [<9ffa1cd7>]
reloc pc : [<8081496c>] lr : [<80834cd7>]
sp : 9df38e60 ip : 9df38fc8 fp : 00000001
r10: 9df38eac r9 : 9df4ceb0 r8 : 9ffa1b7d
r7 : 9df52fd0 r6 : 9ffdbba8 r5 : 0000000d r4 : 00000018
r3 : 3ff589e0 r2 : 9ffafa11 r1 : 9ffdbbc0 r0 : 9ffdbb00
Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
Code: 0303 60ca 4403 6091 (685a) f042
Resetting CPU ...
Disassembling u-boot with objdump shows that pc is in malloc and lr is in env_attr_walk:
unlink(victim, bck, fwd);
80814966: 60ca str r2, [r1, #12]
set_inuse_bit_at_offset(victim, victim_size);
80814968: 4403 add r3, r0
unlink(victim, bck, fwd);
8081496a: 6091 str r1, [r2, #8]
set_inuse_bit_at_offset(victim, victim_size);
8081496c: 685a ldr r2, [r3, #4]
8081496e: f042 0201 orr.w r2, r2, #1
80814972: 605a str r2, [r3, #4]
r3 is 3ff589e0, which is not a valid RAM address on AM335x, so the load from [r3, #4] at the faulting pc aborts.
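The arithmetic behind this analysis can be sketched as follows. The dump prints both the runtime pc/lr and the "reloc" values that match the addresses in the objdump listing, so their difference is the relocation offset. The helper names are hypothetical, and the DRAM window (base 0x80000000, 512 MiB) is an assumption taken from the usual AM335x memory map and the "DRAM: 512 MiB" line in the boot log.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed DRAM window for this board: 512 MiB starting at 0x80000000. */
#define DRAM_BASE 0x80000000u
#define DRAM_SIZE (512u * 1024u * 1024u)

/* Relocation offset: runtime address minus link-time ("reloc") address. */
uint32_t reloc_offset(uint32_t runtime, uint32_t link)
{
    return runtime - link;
}

/*
 * Map a runtime code address back to the address used in the objdump
 * listing. Bit 0 is cleared because in Thumb state lr carries the
 * Thumb bit, which is not part of the instruction address.
 */
uint32_t to_objdump_addr(uint32_t runtime, uint32_t offset)
{
    return (runtime - offset) & ~1u;
}

/* True if addr lies inside the assumed DRAM window. */
bool in_dram(uint32_t addr)
{
    return addr >= DRAM_BASE && (addr - DRAM_BASE) < DRAM_SIZE;
}
```

With the values from the dump, pc and lr both give an offset of 0x1f76d000, lr maps to 0x80834cd6 in the listing, and r3 = 3ff589e0 falls outside the window, which is consistent with the observation above.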

On 3/23/22 10:13, qianfan wrote:
I have previously seen crashes in common/dlmalloc.c after a double free() or a free() with an incorrect pointer.
The assert() statements in do_check_inuse_chunk() are meant to catch this, but assert() as defined in include/log.h does not stop the code and does not even print anything without _DEBUG=1.
You should be able to get the assert output with
#include <common.h>
#define _DEBUG 1
#include <log.h>
at the top of common/dlmalloc.c.
You should get full malloc debug output with
#define DEBUG 1
#include <common.h>
#include <log.h>
Best regards
Heinrich
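To illustrate the point about assert(), here is a simplified model, not the actual include/log.h code: a check macro that only reports when a compile-time debug flag is non-zero and never halts. MODEL_DEBUG, model_assert and count_reported are hypothetical names used only for this sketch.

```c
#include <stdio.h>

/* Stand-in for the debug flag; 0 unless set on the compiler command line. */
#ifndef MODEL_DEBUG
#define MODEL_DEBUG 0
#endif

static int reported;

/* Reports a failed condition only when MODEL_DEBUG is non-zero; never halts. */
#define model_assert(cond) \
    do { \
        if (!(cond) && MODEL_DEBUG) { \
            reported++; \
            printf("assertion failed: %s\n", #cond); \
        } \
    } while (0)

/* Run one failing and one passing check; return how many were reported. */
int count_reported(void)
{
    int chunk_size = 0;             /* deliberately invalid value */

    reported = 0;
    model_assert(chunk_size > 0);   /* condition is false */
    model_assert(chunk_size == 0);  /* condition is true */
    return reported;
}
```

Built as is, count_reported() returns 0 even though one condition is false. Built with -DMODEL_DEBUG=1 it returns 1 and prints the failed expression. In both cases execution continues past the failed check.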

On 2022/3/23 17:51, Heinrich Schuchardt wrote:
Hi: I tried adding the DEBUG macro before <log.h>, but no additional malloc messages were printed.

On 3/23/22 11:07, qianfan wrote:
在 2022/3/23 17:51, Heinrich Schuchardt 写道:
On 3/23/22 10:13, qianfan wrote:
在 2022/3/23 16:02, qianfan 写道:
在 2022/3/23 15:45, qianfan 写道:
在 2022/3/23 10:28, qianfan 写道:
Hi:
I had a custom AM335X board connected my computer by usbnet. It always report data abort when 'dhcp':
Next it the log:
U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 +0800)
CPU : AM335X-GP rev 2.1 Model: WISDOM AM335X CCT DRAM: 512 MiB NAND: 256 MiB MMC: OMAP SD/MMC: 0 Loading Environment from NAND... *** Warning - bad CRC, using default environment
Net: Could not get PHY for ethernet@4a100000: addr 0 eth2: ethernet@4a100000, eth3: usb_ether Hit any key to stop autoboot: 0 => setenv autoload no => dhcp using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in MAC de:ad:be:ef:00:01 HOST MAC de:ad:be:ef:00:00 RNDIS ready musb-hdrc: peripheral reset irq lost! high speed config #2: 2 mA, Ethernet Gadget, using RNDIS USB RNDIS network up! BOOTP broadcast 1 BOOTP broadcast 2 BOOTP broadcast 3 DHCP client bound to address 192.168.200.4 (757 ms) data abort pc : [<9fe9b0a2>] lr : [<9febbc3f>] reloc pc : [<808130a2>] lr : [<80833c3f>] sp : 9de53410 ip : 9de53578 fp : 00000001 r10: 9de5345c r9 : 9de67e80 r8 : 9febbae5 r7 : 9de72c30 r6 : 9feec710 r5 : 0000000d r4 : 00000018 r3 : 3fdd8e04 r2 : 00000002 r1 : 9feec728 r0 : 9feec700 Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T) Code: f023 0303 60ca 4403 (6091) 685a Resetting CPU ...
resetting ...
It's there has any doc about how to debug data abort? Or is the bug is already fixed?
Thanks
This bug doesn't fixed on master code. I found v2021.01 is good and v2021.04-rc2 is bad.
Also I had tested this on beaglebone black with am335x_evm_defconfig, has the simliar problem.
find the first bug commit via 'git bisect': it told me that commit e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very strange due to this commit doesn't touch any dhcp or network code.
➜ u-boot-main git:(e97eb638de) ✗ git bisect bug e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit commit e97eb638de0dc8f6e989e20eaeb0342f103cb917 Author: Heinrich Schuchardt xypron.glpk@gmx.de Date: Wed Jan 20 22:21:53 2021 +0100
fs: fat: consistent error handling for flush_dir()
Provide function description for flush_dir(). Move all error messages for flush_dir() from the callers to the function. Move mapping of errors to -EIO to the function. Always check return value of flush_dir() (Coverity CID 316362).
In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
Signed-off-by: Heinrich Schuchardt xypron.glpk@gmx.de
:040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e 77d188b1c99181fd71f2167fdeee3434a09db209 M fs
184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before e97eb638de0dc8f6e989e20eaeb0342f103cb917:
- e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error
handling for flush_dir() * 184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag 'u-boot-rockchip-20210121' of https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip |\ | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc based PCIe controller driver
I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.
U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)

CPU : AM335X-GP rev 2.1
Model: TI AM335x BeagleBone Black
DRAM: 512 MiB
WDT: Started with servicing (60s timeout)
NAND: 0 MiB
MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
Loading Environment from FAT... <ethaddr> not set. Validating first E-fuse MAC
Net: eth2: ethernet@4a100000, eth3: usb_ether
Hit any key to stop autoboot: 0
=> dhcp
ethernet@4a100000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
MAC de:ad:be:ef:00:01
HOST MAC de:ad:be:ef:00:00
RNDIS ready
musb-hdrc: peripheral reset irq lost!
high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
USB RNDIS network up!
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
DHCP client bound to address 192.168.200.157 (757 ms)
Using usb_ether device
TFTP from server 192.168.200.1; our IP address is 192.168.200.157
Filename 'u-boot.img'.
Load address: 0x82000000
Loading: #################################################################
         #################################################################
         #################################################################
         #########################
         2.5 MiB/s
done
Bytes transferred = 1123888 (112630 hex)
=>

"data abort" messages:

data abort
pc : [<9ff8196c>] lr : [<9ffa1cd7>]
reloc pc : [<8081496c>] lr : [<80834cd7>]
sp : 9df38e60 ip : 9df38fc8 fp : 00000001
r10: 9df38eac r9 : 9df4ceb0 r8 : 9ffa1b7d
r7 : 9df52fd0 r6 : 9ffdbba8 r5 : 0000000d r4 : 00000018
r3 : 3ff589e0 r2 : 9ffafa11 r1 : 9ffdbbc0 r0 : 9ffdbb00
Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
Code: 0303 60ca 4403 6091 (685a) f042
Resetting CPU ...

objdump of u-boot shows that pc is in malloc and lr is in env_attr_walk:

unlink(victim, bck, fwd);
80814966: 60ca str r2, [r1, #12]
set_inuse_bit_at_offset(victim, victim_size);
80814968: 4403 add r3, r0
unlink(victim, bck, fwd);
8081496a: 6091 str r1, [r2, #8]
set_inuse_bit_at_offset(victim, victim_size);
8081496c: 685a ldr r2, [r3, #4]
8081496e: f042 0201 orr.w r2, r2, #1
80814972: 605a str r2, [r3, #4]

r3 is 3ff589e0, which is not a valid RAM address on the AM335x.
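The two observations above (the faulting address lies outside RAM, and the "reloc pc" value is the one to look up in objdump) can be checked with a little arithmetic. The following is only a sketch: the helper names are made up for illustration, and the DRAM window is an assumption based on the 512 MiB reported in the boot log and on SDRAM starting at 0x80000000 on this SoC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed DRAM window: base 0x80000000, 512 MiB as printed in the boot log. */
#define DRAM_BASE 0x80000000u
#define DRAM_SIZE (512u * 1024u * 1024u)

/* True if addr lies inside the assumed DRAM window (hypothetical helper). */
bool in_dram(uint32_t addr)
{
	return addr >= DRAM_BASE && addr - DRAM_BASE < DRAM_SIZE;
}

/*
 * Offset between the address U-Boot runs at after relocation and the
 * link-time address shown by objdump (run-time pc minus "reloc pc").
 */
uint32_t reloc_offset(uint32_t pc, uint32_t reloc_pc)
{
	return pc - reloc_pc;
}

/* Convert any run-time code address from the dump to an objdump address. */
uint32_t to_link_address(uint32_t runtime_addr, uint32_t offset)
{
	return runtime_addr - offset;
}
```

With the values from the dump, reloc_offset(0x9ff8196c, 0x8081496c) gives 0x1f76d000, and subtracting that offset from lr (0x9ffa1cd7) gives the 0x80834cd7 printed as the relocated lr. in_dram(0x3ff589e0) is false, while r0 and r1 fall inside the window.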
I have seen crashes in common/dlmalloc.c before, after a double free() or a free() with an incorrect pointer.

The assert() statements in do_check_inuse_chunk() are meant to catch this, but assert() as defined in include/log.h does not stop the code and does not even print anything without _DEBUG=1.

You should be able to get the assert output with

#include <common.h>
#define _DEBUG 1
#include <log.h>

at the top of common/dlmalloc.c.

You should get full malloc debug output with

#define DEBUG 1
#include <common.h>
#include <log.h>

> Hi: I tried adding the DEBUG macro before <log.h>, and no other malloc messages were printed.

assert() checks for _DEBUG. Defining DEBUG after common.h will not define _DEBUG.

Best regards

Heinrich
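The ordering issue Heinrich describes can be reproduced in a few lines. This is a simplified stand-alone model, not the real include/log.h: MODEL_DEBUG plays the role of _DEBUG, model_assert is a hypothetical assert-like macro, and the only assumption is that the header derives its flag from DEBUG with an #ifdef at the point of inclusion.

```c
#include <stdio.h>

/* Start from a known state for this demonstration. */
#undef DEBUG

/* ---- stand-in for the header: this #ifdef is evaluated here, once ---- */
#ifdef DEBUG
#define MODEL_DEBUG 1	/* plays the role of _DEBUG */
#else
#define MODEL_DEBUG 0
#endif

/* Hypothetical assert-like macro: reports only when MODEL_DEBUG is nonzero. */
#define model_assert(x) \
	do { \
		if (!(x) && MODEL_DEBUG) { \
			reports++; \
			fprintf(stderr, "%s:%d: Assertion `%s' failed.\n", \
				__FILE__, __LINE__, #x); \
		} \
	} while (0)
/* ---- end of stand-in header ---- */

/* Too late: the #ifdef above has already chosen MODEL_DEBUG 0. */
#define DEBUG 1

/* Run one check that is false and return how many reports were produced. */
int run_failing_check(void)
{
	int reports = 0;

	model_assert(1 + 1 == 3);
	return reports;
}
```

As written, run_failing_check() returns 0 and prints nothing, even though the checked condition is false. Moving the #define DEBUG 1 line above the stand-in header (after the #undef) makes the same call report the failure.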

No malloc messages are printed even if I remove the _DEBUG macro check in assert(). Maybe the problem can't be detected by do_check_inuse_chunk().

I added 'while (1);' before bad_mode() in the data abort handler so that I can attach gdb to u-boot when the data abort occurs.

I can connect via gdb, but bt can't show the full stack:

(gdb) add-symbol-file u-boot 0x9ff66000
add symbol table from file "u-boot" at
        .text_addr = 0x9ff66000
(y or n) y
Reading symbols from u-boot...done.
(gdb) bt
#0 do_data_abort (pt_regs=0x9df30eb8) at arch/arm/lib/interrupts.c:169
#1 0x9ff661c8 in data_abort () at arch/arm/lib/vectors.S:271
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Hi @Sean Anderson, I noticed that you just submitted some patches about malloc. Could you please also check this?

Finally I got a malloc error message on the console:

TFTP from server 192.168.200.1; our IP address is 192.168.200.39
Filename 'u-boot.img'.
Load address: 0x82000000
Loading: #################################################################
         #################################################################
         #################################################################
         ###################################################### 0 Bytes
         1.9 MiB/s
done
Bytes transferred = 1274816 (1373c0 hex)
common/dlmalloc.c:819: do_check_chunk: Assertion `(char*)p + sz <= (char*)top' failed.

I tried many times. do_check_chunk does not always fail, and sometimes it reports

common/dlmalloc.c:802: do_check_chunk: Assertion `!chunk_is_mmapped(p)' failed.

instead, so the failure is not the same on every run.

I got a backtrace when the malloc check failed:

(gdb) bt
#0 0x9ffb5684 in panic_finish () at lib/panic.c:23
#1 panic (fmt=0x9ffbd96b "%s:%u: %s: Assertion `%s' failed.") at lib/panic.c:49
#2 0x9ffb5696 in __assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, function=<optimized out>) at lib/panic.c:56
#3 0x9ff76910 in do_check_inuse_chunk (p=p@entry=0x9ffd7200) at common/dlmalloc.c:866
#4 0x9ff769d6 in do_check_malloced_chunk (p=p@entry=0x9ffd7200, s=s@entry=24) at common/dlmalloc.c:900
#5 0x9ff76da6 in malloc (bytes=<optimized out>) at common/dlmalloc.c:1552
#6 0x9ff96b72 in env_attr_walk (attr_list=<optimized out>, callback=0x9ff969f9 <regex_callback>, priv=0x9df28dc8) at env/attr.c:70
#7 0x9ff96bc2 in env_attr_lookup (attr_list=<optimized out>, name=<optimized out>, attributes=0x9df28dec "") at env/attr.c:184
#8 0x9ff97146 in env_callback_init (var_entry=0x9df46f60) at env/callback.c:67
#9 0x9ffb36fc in hsearch_r (item=..., action=ENV_ENTER, retval=0x9df28f60, htab=0x9ffdbce8, flag=512) at lib/hashtable.c:403
#10 0x9ff7090e in _do_env_set (argc=<optimized out>, argv=<optimized out>, env_flag=512, flag=0) at cmd/nvedit.c:296
#11 0x9ff70b64 in env_set (varname=<optimized out>, varvalue=<optimized out>) at cmd/nvedit.c:318
#12 0x9ff6d522 in netboot_update_env () at cmd/net.c:133
#13 netboot_common (proto=DHCP, cmdtp=0x9ffdd0e8, argc=<optimized out>, argv=0x9df442c8) at cmd/net.c:268
#14 0x9ff783a4 in cmd_call (repeatable=0x9df29008, argv=0x9df442c8, argc=1, flag=0, cmdtp=0x9ffdd0e8) at common/command.c:580
#15 cmd_process (flag=<optimized out>, argc=1, argv=0x9df442c8, repeatable=0x9ffdf6a0, ticks=0x0) at common/command.c:635
#16 0x9ff71d16 in run_pipe_real (pi=0x9df44220) at common/cli_hush.c:1676
#17 run_list_real (pi=<optimized out>) at common/cli_hush.c:1873
#18 0x9ff71e28 in run_list (pi=0x9df44220) at common/cli_hush.c:2022
#19 parse_stream_outer (inp=inp@entry=0x9df290e8, flag=flag@entry=2) at common/cli_hush.c:3206
#20 0x9ff721ba in parse_file_outer () at common/cli_hush.c:3289
#21 0x9ff77c1a in cli_loop () at common/cli.c:229
#22 0x9ff70d3e in main_loop () at common/main.c:66
#23 0x9ff72672 in run_main_loop () at common/board_r.c:584
#24 0x9ff72830 in initcall_run_list (init_sequence=0x9ffd7224) at include/initcall.h:46
#25 board_init_r (new_gd=<optimized out>, dest_addr=<optimized out>) at common/board_r.c:822
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
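One way a bounds assertion like `(char*)p + sz <= (char*)top' can fail is when something overwrites the size field stored in a chunk header, for example a write that runs past the end of the neighbouring allocation. The toy model below illustrates only that mechanism. It is not U-Boot's dlmalloc, every name in it is hypothetical, and it makes no claim that this is what happens in the crash reported here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 128u
#define HDR_SIZE   sizeof(size_t)	/* toy header: only a size field */

static unsigned char arena[ARENA_SIZE];

/* Store the total size of the chunk whose header starts at byte offset off. */
void set_chunk_size(size_t off, size_t size)
{
	memcpy(&arena[off], &size, sizeof(size));
}

/* Read back the size field of the chunk at byte offset off. */
size_t get_chunk_size(size_t off)
{
	size_t size;

	memcpy(&size, &arena[off], sizeof(size));
	return size;
}

/* Toy version of the bounds check: the chunk must end at or below the top. */
bool chunk_in_bounds(size_t off)
{
	return get_chunk_size(off) <= ARENA_SIZE - off;
}

/*
 * Lay out two adjacent 32-byte chunks, fill the first chunk's payload and
 * write `overrun` extra bytes past its end, then check the second chunk.
 */
bool second_chunk_ok_after_write(size_t overrun)
{
	const size_t chunk_size = 32;

	memset(arena, 0, sizeof(arena));
	set_chunk_size(0, chunk_size);
	set_chunk_size(chunk_size, chunk_size);

	/* The payload starts right after the header of the first chunk. */
	memset(&arena[HDR_SIZE], 0xff, chunk_size - HDR_SIZE + overrun);

	return chunk_in_bounds(chunk_size);
}
```

second_chunk_ok_after_write(0) returns true, while an overrun of sizeof(size_t) bytes replaces the second chunk's size with an all-ones value and the check returns false. In the real allocator the damage would only be noticed at the next malloc() or free() that inspects the chunk, which may be far from the code that caused it.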

It's very strange, and I can't tell whether this is a bug in the USB code or in dlmalloc.
1. Starting u-boot and running dhcp via the AM335x's Ethernet (cpsw driver) works.
2. Starting u-boot and running dhcp via the AM335x's USB net causes a data abort.
3. Starting fastboot, pressing Ctrl-C right away, and then running dhcp via the AM335x's USB net works.
在 2022/3/24 17:33, qianfan 写道:
在 2022/3/23 18:12, Heinrich Schuchardt 写道:
On 3/23/22 11:07, qianfan wrote:
在 2022/3/23 17:51, Heinrich Schuchardt 写道:
On 3/23/22 10:13, qianfan wrote:
在 2022/3/23 16:02, qianfan 写道:
在 2022/3/23 15:45, qianfan 写道: > > > 在 2022/3/23 10:28, qianfan 写道: >> >> Hi: >> >> I had a custom AM335X board connected my computer by usbnet. It >> always report data abort when 'dhcp': >> >> Next it the log: >> >> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 >> +0800) >> >> CPU : AM335X-GP rev 2.1 >> Model: WISDOM AM335X CCT >> DRAM: 512 MiB >> NAND: 256 MiB >> MMC: OMAP SD/MMC: 0 >> Loading Environment from NAND... *** Warning - bad CRC, using >> default environment >> >> Net: Could not get PHY for ethernet@4a100000: addr 0 >> eth2: ethernet@4a100000, eth3: usb_ether >> Hit any key to stop autoboot: 0 >> => setenv autoload no >> => dhcp >> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in >> MAC de:ad:be:ef:00:01 >> HOST MAC de:ad:be:ef:00:00 >> RNDIS ready >> musb-hdrc: peripheral reset irq lost! >> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS >> USB RNDIS network up! >> BOOTP broadcast 1 >> BOOTP broadcast 2 >> BOOTP broadcast 3 >> DHCP client bound to address 192.168.200.4 (757 ms) >> data abort >> pc : [<9fe9b0a2>] lr : [<9febbc3f>] >> reloc pc : [<808130a2>] lr : [<80833c3f>] >> sp : 9de53410 ip : 9de53578 fp : 00000001 >> r10: 9de5345c r9 : 9de67e80 r8 : 9febbae5 >> r7 : 9de72c30 r6 : 9feec710 r5 : 0000000d r4 : 00000018 >> r3 : 3fdd8e04 r2 : 00000002 r1 : 9feec728 r0 : 9feec700 >> Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T) >> Code: f023 0303 60ca 4403 (6091) 685a >> Resetting CPU ... >> >> resetting ... >> >> >> It's there has any doc about how to debug data abort? Or is the bug >> is already fixed? >> >> Thanks >> > This bug doesn't fixed on master code. I found v2021.01 is good and > v2021.04-rc2 is bad. > > Also I had tested this on beaglebone black with am335x_evm_defconfig, > has the simliar problem. > > find the first bug commit via 'git bisect': it told me that commit > e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very > strange due to this commit doesn't touch any dhcp or network code. 
> > ➜ u-boot-main git:(e97eb638de) ✗ git bisect bug > e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit > commit e97eb638de0dc8f6e989e20eaeb0342f103cb917 > Author: Heinrich Schuchardt xypron.glpk@gmx.de > Date: Wed Jan 20 22:21:53 2021 +0100 > > fs: fat: consistent error handling for flush_dir() > > Provide function description for flush_dir(). > Move all error messages for flush_dir() from the callers to the > function. > Move mapping of errors to -EIO to the function. > Always check return value of flush_dir() (Coverity CID 316362). > > In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails. > > Signed-off-by: Heinrich Schuchardt xypron.glpk@gmx.de > > :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e > 77d188b1c99181fd71f2167fdeee3434a09db209 M fs > > > 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before > e97eb638de0dc8f6e989e20eaeb0342f103cb917: > > * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error > handling for flush_dir() > * 184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag > 'u-boot-rockchip-20210121' of > https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip > |\ > | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc > based PCIe controller driver > > I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine. > > U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800) > > CPU : AM335X-GP rev 2.1 > Model: TI AM335x BeagleBone Black > DRAM: 512 MiB > WDT: Started with servicing (60s timeout) > NAND: 0 MiB > MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1 > Loading Environment from FAT... <ethaddr> not set. Validating first > E-fuse MAC > Net: eth2: ethernet@4a100000, eth3: usb_ether > Hit any key to stop autoboot: 0 > => dhcp > ethernet@4a100000 Waiting for PHY auto negotiation to > complete......... TIMEOUT ! > using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in > MAC de:ad:be:ef:00:01 > HOST MAC de:ad:be:ef:00:00 > RNDIS ready > musb-hdrc: peripheral reset irq lost! 
> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS > USB RNDIS network up! > BOOTP broadcast 1 > BOOTP broadcast 2 > BOOTP broadcast 3 > DHCP client bound to address 192.168.200.157 (757 ms) > Using usb_ether device > TFTP from server 192.168.200.1; our IP address is 192.168.200.157 > Filename 'u-boot.img'. > Load address: 0x82000000 > Loading: > ################################################################# > ################################################################# > ################################################################# > ######################### > 2.5 MiB/s > done > Bytes transferred = 1123888 (112630 hex) > => >
"data abort" messages:
data abort pc : [<9ff8196c>] lr : [<9ffa1cd7>] reloc pc : [<8081496c>] lr : [<80834cd7>] sp : 9df38e60 ip : 9df38fc8 fp : 00000001 r10: 9df38eac r9 : 9df4ceb0 r8 : 9ffa1b7d r7 : 9df52fd0 r6 : 9ffdbba8 r5 : 0000000d r4 : 00000018 r3 : 3ff589e0 r2 : 9ffafa11 r1 : 9ffdbbc0 r0 : 9ffdbb00 Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T) Code: 0303 60ca 4403 6091 (685a) f042 Resetting CPU ...
objdump u-boot:pc is in malloc and lr is in env_attr_walk
unlink(victim, bck, fwd); 80814966: 60ca str r2, [r1, #12] set_inuse_bit_at_offset(victim, victim_size); 80814968: 4403 add r3, r0 unlink(victim, bck, fwd); 8081496a: 6091 str r1, [r2, #8] set_inuse_bit_at_offset(victim, victim_size); 8081496c: 685a ldr r2, [r3, #4] 8081496e: f042 0201 orr.w r2, r2, #1 80814972: 605a str r2, [r3, #4]
r3 is 3ff589e0 and it's not a valid ram address on am335x.
I have seen crashes in common/dlmalloc.c before after double free() or free() with an incorrect pointer.
The assert() statements in do_check_inuse_chunk() are meant to catch this but assert() as defined in include/log.h does not stop the code and even does not print without _DEBUG=1.
You should be able to get the assert output with
#include <common.h> #define _DEBUG 1 #include <log.h>
at the top of common/dlmalloc.c.
You should get full malloc debug output with
Hi: I had try add DEBUG marco before <log.h> and no other malloc message
assert() checks for _DEBUG. Defining DEBUG after common.h will not define _DEBUG.
Finally I got a malloc error message on console:
TFTP from server 192.168.200.1; our IP address is 192.168.200.39 Filename 'u-boot.img'. Load address: 0x82000000 Loading: ################################################################# ################################################################# ################################################################# ###################################################### 0 Bytes 1.9 MiB/s done Bytes transferred = 1274816 (1373c0 hex) common/dlmalloc.c:819: do_check_chunk: Assertion `(char*)p + sz <= (char*)top' failed.
I had tried many times, do_check_chunk not always failed, and sometimes it report common/dlmalloc.c:802: do_check_chunk: Assertion `!chunk_is_mmapped(p)' failed. The situation is not the same.
I got a bt stack when malloc failed:
(gdb) bt
#0  0x9ffb5684 in panic_finish () at lib/panic.c:23
#1  panic (fmt=0x9ffbd96b "%s:%u: %s: Assertion `%s' failed.") at lib/panic.c:49
#2  0x9ffb5696 in __assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, function=<optimized out>) at lib/panic.c:56
#3  0x9ff76910 in do_check_inuse_chunk (p=p@entry=0x9ffd7200) at common/dlmalloc.c:866
#4  0x9ff769d6 in do_check_malloced_chunk (p=p@entry=0x9ffd7200, s=s@entry=24) at common/dlmalloc.c:900
#5  0x9ff76da6 in malloc (bytes=<optimized out>) at common/dlmalloc.c:1552
#6  0x9ff96b72 in env_attr_walk (attr_list=<optimized out>, callback=0x9ff969f9 <regex_callback>, priv=0x9df28dc8) at env/attr.c:70
#7  0x9ff96bc2 in env_attr_lookup (attr_list=<optimized out>, name=<optimized out>, attributes=0x9df28dec "") at env/attr.c:184
#8  0x9ff97146 in env_callback_init (var_entry=0x9df46f60) at env/callback.c:67
#9  0x9ffb36fc in hsearch_r (item=..., action=ENV_ENTER, retval=0x9df28f60, htab=0x9ffdbce8, flag=512) at lib/hashtable.c:403
#10 0x9ff7090e in _do_env_set (argc=<optimized out>, argv=<optimized out>, env_flag=512, flag=0) at cmd/nvedit.c:296
#11 0x9ff70b64 in env_set (varname=<optimized out>, varvalue=<optimized out>) at cmd/nvedit.c:318
#12 0x9ff6d522 in netboot_update_env () at cmd/net.c:133
#13 netboot_common (proto=DHCP, cmdtp=0x9ffdd0e8, argc=<optimized out>, argv=0x9df442c8) at cmd/net.c:268
#14 0x9ff783a4 in cmd_call (repeatable=0x9df29008, argv=0x9df442c8, argc=1, flag=0, cmdtp=0x9ffdd0e8) at common/command.c:580
#15 cmd_process (flag=<optimized out>, argc=1, argv=0x9df442c8, repeatable=0x9ffdf6a0, ticks=0x0) at common/command.c:635
#16 0x9ff71d16 in run_pipe_real (pi=0x9df44220) at common/cli_hush.c:1676
#17 run_list_real (pi=<optimized out>) at common/cli_hush.c:1873
#18 0x9ff71e28 in run_list (pi=0x9df44220) at common/cli_hush.c:2022
#19 parse_stream_outer (inp=inp@entry=0x9df290e8, flag=flag@entry=2) at common/cli_hush.c:3206
#20 0x9ff721ba in parse_file_outer () at common/cli_hush.c:3289
#21 0x9ff77c1a in cli_loop () at common/cli.c:229
#22 0x9ff70d3e in main_loop () at common/main.c:66
#23 0x9ff72672 in run_main_loop () at common/board_r.c:584
#24 0x9ff72830 in initcall_run_list (init_sequence=0x9ffd7224) at include/initcall.h:46
#25 board_init_r (new_gd=<optimized out>, dest_addr=<optimized out>) at common/board_r.c:822
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Best regards
Heinrich
printed.
#define DEBUG 1
#include <common.h>
#include <log.h>
Best regards
Heinrich

Hello,
qianfanguijin@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:

> It's very strange, and I can't tell whether it's a bug in USB or in dlmalloc.
>
> Starting U-Boot and running dhcp via the AM335x's Ethernet (cpsw driver): OK.
> Starting U-Boot and running dhcp via the AM335x's USB net: data abort.
> Starting fastboot, pressing CTRL-C right away, then running dhcp via the AM335x's USB net: OK.
I am sorry to re-open a thread that is one year old, but this is still an open bug. The BBB is affected, and the BBBW in particular: it has no Ethernet connector, which makes the Eth-over-USB emulation even more important. All U-Boot versions since 2021 are affected: spurious data aborts, usually at the end of network interactions (tftp, ping). I could not bisect it because the boot was deeply broken as well on a significant range of commits :-/.
On my side I narrowed it down to an env update which fails in malloc as well. If I comment out the env update, it fails a bit later. It really looks like a stack corruption related either to the Ethernet USB gadget or to the USB controller driver itself. Network transfers on the BBBW using regular Ethernet do not trigger any error.
I also observe the very strange "fix" mentioned above: starting and killing fastboot makes all tftp transfers pass... If anyone has more details to share, or can point to a subsequent thread giving more details, I would really like to see this fixed upstream. I suppose I am not the only one :-)
Thanks, Miquèl
On 2022/3/24 17:33, qianfan wrote:
On 2022/3/23 18:12, Heinrich Schuchardt wrote:
On 3/23/22 11:07, qianfan wrote:
On 2022/3/23 17:51, Heinrich Schuchardt wrote:
On 3/23/22 10:13, qianfan wrote:
On 2022/3/23 16:02, qianfan wrote:
>
>
> On 2022/3/23 15:45, qianfan wrote:
>>
>>
>> On 2022/3/23 10:28, qianfan wrote:
>>>
>>> Hi:
>>>
>>> I have a custom AM335X board connected to my computer by usbnet. It
>>> always reports a data abort on 'dhcp':
>>>
>>> Here is the log:
>>>
>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02
>>> +0800)
>>>
>>> CPU : AM335X-GP rev 2.1
>>> Model: WISDOM AM335X CCT
>>> DRAM: 512 MiB
>>> NAND: 256 MiB
>>> MMC: OMAP SD/MMC: 0
>>> Loading Environment from NAND... *** Warning - bad CRC, using
>>> default environment
>>>
>>> Net: Could not get PHY for ethernet@4a100000: addr 0
>>> eth2: ethernet@4a100000, eth3: usb_ether
>>> Hit any key to stop autoboot: 0
>>> => setenv autoload no
>>> => dhcp
>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>> MAC de:ad:be:ef:00:01
>>> HOST MAC de:ad:be:ef:00:00
>>> RNDIS ready
>>> musb-hdrc: peripheral reset irq lost!
>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>> USB RNDIS network up!
>>> BOOTP broadcast 1
>>> BOOTP broadcast 2
>>> BOOTP broadcast 3
>>> DHCP client bound to address 192.168.200.4 (757 ms)
>>> data abort
>>> pc : [<9fe9b0a2>] lr : [<9febbc3f>]
>>> reloc pc : [<808130a2>] lr : [<80833c3f>]
>>> sp : 9de53410 ip : 9de53578 fp : 00000001
>>> r10: 9de5345c r9 : 9de67e80 r8 : 9febbae5
>>> r7 : 9de72c30 r6 : 9feec710 r5 : 0000000d r4 : 00000018
>>> r3 : 3fdd8e04 r2 : 00000002 r1 : 9feec728 r0 : 9feec700
>>> Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
>>> Code: f023 0303 60ca 4403 (6091) 685a
>>> Resetting CPU ...
>>>
>>> resetting ...
>>>
>>>
>>> Is there any doc about how to debug a data abort? Or is the bug
>>> already fixed?
>>>
>>> Thanks
>>>
>> This bug is not fixed in the master code. I found v2021.01 is good and
>> v2021.04-rc2 is bad.
>>
>> I also tested this on a BeagleBone Black with am335x_evm_defconfig;
>> it has a similar problem.
>>
>> I looked for the first buggy commit via 'git bisect': it told me that commit
>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But this is very
>> strange because this commit doesn't touch any dhcp or network code.
>>
>> ➜ u-boot-main git:(e97eb638de) ✗ git bisect bug
>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
>> Author: Heinrich Schuchardt xypron.glpk@gmx.de
>> Date: Wed Jan 20 22:21:53 2021 +0100
>>
>>     fs: fat: consistent error handling for flush_dir()
>>
>>     Provide function description for flush_dir().
>>     Move all error messages for flush_dir() from the callers to the
>>     function.
>>     Move mapping of errors to -EIO to the function.
>>     Always check return value of flush_dir() (Coverity CID 316362).
>>
>>     In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
>>
>>     Signed-off-by: Heinrich Schuchardt xypron.glpk@gmx.de
>>
>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e
>> 77d188b1c99181fd71f2167fdeee3434a09db209 M fs
>>
>>
>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before
>> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
>>
>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error
>> handling for flush_dir()
>> * 184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag
>> 'u-boot-rockchip-20210121' of
>> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
>> |\
>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc
>> based PCIe controller driver
>>
>> I checked that 184aa6504143b452132e28cd3ebecc7b941cdfa1 works fine.
>>
>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
>>
>> CPU : AM335X-GP rev 2.1
>> Model: TI AM335x BeagleBone Black
>> DRAM: 512 MiB
>> WDT: Started with servicing (60s timeout)
>> NAND: 0 MiB
>> MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
>> Loading Environment from FAT... <ethaddr> not set. Validating first
>> E-fuse MAC
>> Net: eth2: ethernet@4a100000, eth3: usb_ether
>> Hit any key to stop autoboot: 0
>> => dhcp
>> ethernet@4a100000 Waiting for PHY auto negotiation to
>> complete......... TIMEOUT !
>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>> MAC de:ad:be:ef:00:01
>> HOST MAC de:ad:be:ef:00:00
>> RNDIS ready
>> musb-hdrc: peripheral reset irq lost!
>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>> USB RNDIS network up!
>> BOOTP broadcast 1
>> BOOTP broadcast 2
>> BOOTP broadcast 3
>> DHCP client bound to address 192.168.200.157 (757 ms)
>> Using usb_ether device
>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
>> Filename 'u-boot.img'.
>> Load address: 0x82000000
>> Loading:
>> #################################################################
>> #################################################################
>> #################################################################
>> #########################
>> 2.5 MiB/s
>> done
>> Bytes transferred = 1123888 (112630 hex)
>> =>
>>
"data abort" messages:
data abort
pc : [<9ff8196c>]  lr : [<9ffa1cd7>]
reloc pc : [<8081496c>]  lr : [<80834cd7>]
sp : 9df38e60  ip : 9df38fc8  fp : 00000001
r10: 9df38eac  r9 : 9df4ceb0  r8 : 9ffa1b7d
r7 : 9df52fd0  r6 : 9ffdbba8  r5 : 0000000d  r4 : 00000018
r3 : 3ff589e0  r2 : 9ffafa11  r1 : 9ffdbbc0  r0 : 9ffdbb00
Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
Code: 0303 60ca 4403 6091 (685a) f042
Resetting CPU ...

objdump u-boot: pc is in malloc and lr is in env_attr_walk

    unlink(victim, bck, fwd);
80814966:  60ca       str    r2, [r1, #12]
    set_inuse_bit_at_offset(victim, victim_size);
80814968:  4403       add    r3, r0
    unlink(victim, bck, fwd);
8081496a:  6091       str    r1, [r2, #8]
    set_inuse_bit_at_offset(victim, victim_size);
8081496c:  685a       ldr    r2, [r3, #4]
8081496e:  f042 0201  orr.w  r2, r2, #1
80814972:  605a       str    r2, [r3, #4]
r3 is 3ff589e0 and it's not a valid ram address on am335x.
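
The "pc" and "reloc pc" values in the dump differ by a constant relocation offset, which is what allows the crash address to be matched against objdump output of the unrelocated u-boot ELF. A minimal sketch of that arithmetic in C follows. It assumes that DRAM on the AM335x starts at 0x80000000 and that 512 MiB is populated, as the boot log reports; the function names are made up for illustration and are not U-Boot APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: AM335x DRAM starts at 0x80000000; the boot log reports 512 MiB. */
#define DRAM_BASE 0x80000000u
#define DRAM_SIZE (512u * 1024u * 1024u)

/* Offset between the running (relocated) image and its link-time addresses. */
uint32_t reloc_offset(uint32_t pc, uint32_t reloc_pc)
{
	return pc - reloc_pc;
}

/* Translate a run-time address into the address objdump/addr2line expects. */
uint32_t to_link_addr(uint32_t runtime_addr, uint32_t offset)
{
	return runtime_addr - offset;
}

/* True if addr lies inside the populated DRAM window. */
bool in_dram(uint32_t addr)
{
	return addr >= DRAM_BASE && (addr - DRAM_BASE) < DRAM_SIZE;
}

/* Checks using the register values printed in the dump above. */
void check_dump_values(void)
{
	uint32_t off = reloc_offset(0x9ff8196cu, 0x8081496cu);

	assert(off == 0x1f76d000u);
	/* lr translates to the "reloc lr" value U-Boot printed. */
	assert(to_link_addr(0x9ffa1cd7u, off) == 0x80834cd7u);
	/* r3 is outside DRAM, so the ldr through it aborts. */
	assert(!in_dram(0x3ff589e0u));
}
```

Subtracting the same offset from any other run-time code address in the dump gives the address to look up in the objdump listing.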
I have seen crashes in common/dlmalloc.c before after double free() or free() with an incorrect pointer.
The assert() statements in do_check_inuse_chunk() are meant to catch this, but assert() as defined in include/log.h does not stop execution and does not even print anything unless _DEBUG is 1.
You should be able to get the assert output with
#include <common.h>
#define _DEBUG 1
#include <log.h>
at the top of common/dlmalloc.c.
You should get full malloc debug output with

#define DEBUG 1
#include <common.h>
#include <log.h>

Hi: I tried adding the DEBUG macro before <log.h>, but no other malloc message was printed.
assert() checks for _DEBUG. Defining DEBUG after common.h will not define _DEBUG.
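
The ordering problem can be reproduced without U-Boot. The sketch below is a stand-alone imitation, not the actual contents of include/log.h: the SKETCH_ names are invented, and the first preprocessor block only stands in for a header that derives an internal flag from DEBUG at the moment it is first processed (for example because another header pulled it in earlier).

```c
#include <assert.h>

/*
 * Stand-in for a header that samples DEBUG once, when it is first included.
 * Hypothetical names; this is not the real log.h.
 */
#ifdef DEBUG
#define SKETCH_DEBUG 1
#else
#define SKETCH_DEBUG 0
#endif

/* Defined after the "header" was processed: too late to change SKETCH_DEBUG. */
#define DEBUG 1

/* Reports the flag value that the earlier block captured. */
int sketch_debug_enabled(void)
{
	return SKETCH_DEBUG;
}

/* The late definition of DEBUG had no effect on the derived flag. */
void check_late_define(void)
{
	assert(sketch_debug_enabled() == 0);
}
```

Moving the `#define DEBUG 1` line above the first block makes sketch_debug_enabled() return 1, which is the same reason the define has to come before the first include in common/dlmalloc.c.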
Finally I got a malloc error message on console:
TFTP from server 192.168.200.1; our IP address is 192.168.200.39
Filename 'u-boot.img'.
Load address: 0x82000000
Loading: #################################################################
         #################################################################
         #################################################################
         ######################################################
         0 Bytes
         1.9 MiB/s
done
Bytes transferred = 1274816 (1373c0 hex)
common/dlmalloc.c:819: do_check_chunk: Assertion `(char*)p + sz <= (char*)top' failed.
I tried many times: do_check_chunk does not always fail, and sometimes it reports common/dlmalloc.c:802: do_check_chunk: Assertion `!chunk_is_mmapped(p)' failed. The failure is not the same every time.
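
Both messages come from the same sanity check on a chunk header, which may be why the reported assertion varies: whichever condition the corrupted size word violates first is the one that fires. The following is a simplified sketch, assuming the classic dlmalloc flag layout (bit 0 is PREV_INUSE, bit 1 is IS_MMAPPED) and using plain integers rather than U-Boot's real chunk structures. The enum, the function names and the example addresses are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PREV_INUSE 0x1u	/* low bit of the size word */
#define IS_MMAPPED 0x2u	/* second bit of the size word */

enum chunk_verdict { CHUNK_OK, CHUNK_MMAPPED_BIT_SET, CHUNK_PAST_TOP };

/* Mimics the order of the two checks seen in the assertion messages. */
enum chunk_verdict classify_chunk(uint32_t chunk_addr, uint32_t size_word,
				  uint32_t top_addr)
{
	uint32_t sz = size_word & ~PREV_INUSE;

	if (size_word & IS_MMAPPED)
		return CHUNK_MMAPPED_BIT_SET;	/* `!chunk_is_mmapped(p)' */
	/* 64-bit sum so that a huge bogus size cannot wrap around. */
	if ((uint64_t)chunk_addr + sz > top_addr)
		return CHUNK_PAST_TOP;		/* `(char*)p + sz <= (char*)top' */
	return CHUNK_OK;
}

/* Made-up values, chosen only to exercise each outcome. */
void classify_chunk_examples(void)
{
	/* A sane 24-byte chunk with PREV_INUSE set. */
	assert(classify_chunk(0x9df40000u, 0x19u, 0x9df50000u) == CHUNK_OK);
	/* A pointer-like value in the size word with bit 1 set. */
	assert(classify_chunk(0x9df40000u, 0x9ff7cee2u, 0x9df50000u)
	       == CHUNK_MMAPPED_BIT_SET);
	/* The same kind of value with bit 1 clear: the size runs past top. */
	assert(classify_chunk(0x9df40000u, 0x9ff7cee1u, 0x9df50000u)
	       == CHUNK_PAST_TOP);
}
```

If the size word is being overwritten with arbitrary data, the low bits of that data decide which of the two assertions is reported, so seeing both on different runs would be consistent with a single corruption source.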

On 20 July 2023 18:39:17 CEST, Miquel Raynal miquel.raynal@bootlin.com wrote:
> Hello,
>
> qianfanguijin@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:
>
>> It's very strange, and I can't tell whether it's a bug in USB or in dlmalloc.
>>
>> Starting U-Boot and running dhcp via the AM335x's Ethernet (cpsw driver): OK.
>> Starting U-Boot and running dhcp via the AM335x's USB net: data abort.
>> Starting fastboot, pressing CTRL-C right away, then running dhcp via the AM335x's USB net: OK.
>
> I am sorry to re-open a thread that is one year old, but this is still an open bug. The BBB is affected, and the BBBW in particular: it has no Ethernet connector, which makes the Eth-over-USB emulation even more important. All U-Boot versions since 2021 are affected: spurious data aborts, usually at the end of network interactions (tftp, ping). I could not bisect it because the boot was deeply broken as well on a significant range of commits :-/.
>
> On my side I narrowed it down to an env update which fails in malloc as well. If I comment out the env update, it fails a bit later. It really looks like a stack corruption related either to the Ethernet USB gadget or to the USB controller driver itself. Network transfers on the BBBW using regular Ethernet do not trigger any error.
>
> I also observe the very strange "fix" mentioned above: starting and killing fastboot makes all tftp transfers pass... If anyone has more details to share, or can point to a subsequent thread giving more details, I would really like to see this fixed upstream. I suppose I am not the only one :-)
>
> Thanks, Miquèl

Can this problem be reproduced on QEMU?
Best regards
Heinrich

Hi Heinrich,
xypron.glpk@gmx.de wrote on Thu, 20 Jul 2023 19:55:39 +0200:
> Can this problem be reproduced on QEMU?

I haven't tried on QEMU, what do you have in mind? What should we try to do?
Thanks, Miquèl
Best regards
Heinrich
在 2022/3/24 17:33, qianfan 写道:
在 2022/3/23 18:12, Heinrich Schuchardt 写道:
On 3/23/22 11:07, qianfan wrote:
在 2022/3/23 17:51, Heinrich Schuchardt 写道: > On 3/23/22 10:13, qianfan wrote: >> >> 在 2022/3/23 16:02, qianfan 写道: >>> >>> >>> 在 2022/3/23 15:45, qianfan 写道: >>>> >>>> >>>> 在 2022/3/23 10:28, qianfan 写道: >>>>> >>>>> Hi: >>>>> >>>>> I had a custom AM335X board connected my computer by usbnet. It >>>>> always report data abort when 'dhcp': >>>>> >>>>> Next it the log: >>>>> >>>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 >>>>> +0800) >>>>> >>>>> CPU : AM335X-GP rev 2.1 >>>>> Model: WISDOM AM335X CCT >>>>> DRAM: 512 MiB >>>>> NAND: 256 MiB >>>>> MMC: OMAP SD/MMC: 0 >>>>> Loading Environment from NAND... *** Warning - bad CRC, using >>>>> default environment >>>>> >>>>> Net: Could not get PHY for ethernet@4a100000: addr 0 >>>>> eth2: ethernet@4a100000, eth3: usb_ether >>>>> Hit any key to stop autoboot: 0 >>>>> => setenv autoload no >>>>> => dhcp >>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in >>>>> MAC de:ad:be:ef:00:01 >>>>> HOST MAC de:ad:be:ef:00:00 >>>>> RNDIS ready >>>>> musb-hdrc: peripheral reset irq lost! >>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS >>>>> USB RNDIS network up! >>>>> BOOTP broadcast 1 >>>>> BOOTP broadcast 2 >>>>> BOOTP broadcast 3 >>>>> DHCP client bound to address 192.168.200.4 (757 ms) >>>>> data abort >>>>> pc : [<9fe9b0a2>] lr : [<9febbc3f>] >>>>> reloc pc : [<808130a2>] lr : [<80833c3f>] >>>>> sp : 9de53410 ip : 9de53578 fp : 00000001 >>>>> r10: 9de5345c r9 : 9de67e80 r8 : 9febbae5 >>>>> r7 : 9de72c30 r6 : 9feec710 r5 : 0000000d r4 : 00000018 >>>>> r3 : 3fdd8e04 r2 : 00000002 r1 : 9feec728 r0 : 9feec700 >>>>> Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T) >>>>> Code: f023 0303 60ca 4403 (6091) 685a >>>>> Resetting CPU ... >>>>> >>>>> resetting ... >>>>> >>>>> >>>>> It's there has any doc about how to debug data abort? Or is the bug >>>>> is already fixed? >>>>> >>>>> Thanks >>>>> >>>> This bug doesn't fixed on master code. I found v2021.01 is good and >>>> v2021.04-rc2 is bad. 
>>>> >>>> Also I had tested this on beaglebone black with am335x_evm_defconfig, >>>> has the simliar problem. >>>> >>>> find the first bug commit via 'git bisect': it told me that commit >>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very >>>> strange due to this commit doesn't touch any dhcp or network code. >>>> >>>> ➜ u-boot-main git:(e97eb638de) ✗ git bisect bug >>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit >>>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917 >>>> Author: Heinrich Schuchardt xypron.glpk@gmx.de >>>> Date: Wed Jan 20 22:21:53 2021 +0100 >>>> >>>> fs: fat: consistent error handling for flush_dir() >>>> >>>> Provide function description for flush_dir(). >>>> Move all error messages for flush_dir() from the callers to the >>>> function. >>>> Move mapping of errors to -EIO to the function. >>>> Always check return value of flush_dir() (Coverity CID 316362). >>>> >>>> In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails. >>>> >>>> Signed-off-by: Heinrich Schuchardt xypron.glpk@gmx.de >>>> >>>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e >>>> 77d188b1c99181fd71f2167fdeee3434a09db209 M fs >>>> >>>> >>>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before >>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917: >>>> >>>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error >>>> handling for flush_dir() >>>> * 184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag >>>> 'u-boot-rockchip-20210121' of >>>> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip >>>> |\ >>>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc >>>> based PCIe controller driver >>>> >>>> I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine. 
>>>> >>>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800) >>>> >>>> CPU : AM335X-GP rev 2.1 >>>> Model: TI AM335x BeagleBone Black >>>> DRAM: 512 MiB >>>> WDT: Started with servicing (60s timeout) >>>> NAND: 0 MiB >>>> MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1 >>>> Loading Environment from FAT... <ethaddr> not set. Validating first >>>> E-fuse MAC >>>> Net: eth2: ethernet@4a100000, eth3: usb_ether >>>> Hit any key to stop autoboot: 0 >>>> => dhcp >>>> ethernet@4a100000 Waiting for PHY auto negotiation to >>>> complete......... TIMEOUT ! >>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in >>>> MAC de:ad:be:ef:00:01 >>>> HOST MAC de:ad:be:ef:00:00 >>>> RNDIS ready >>>> musb-hdrc: peripheral reset irq lost! >>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS >>>> USB RNDIS network up! >>>> BOOTP broadcast 1 >>>> BOOTP broadcast 2 >>>> BOOTP broadcast 3 >>>> DHCP client bound to address 192.168.200.157 (757 ms) >>>> Using usb_ether device >>>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157 >>>> Filename 'u-boot.img'. >>>> Load address: 0x82000000 >>>> Loading: >>>> ################################################################# >>>> ################################################################# >>>> ################################################################# >>>> ######################### >>>> 2.5 MiB/s >>>> done >>>> Bytes transferred = 1123888 (112630 hex) >>>> => >>>> >> "data abort" messages: >> >> data abort >> pc : [<9ff8196c>] lr : [<9ffa1cd7>] >> reloc pc : [<8081496c>] lr : [<80834cd7>] >> sp : 9df38e60 ip : 9df38fc8 fp : 00000001 >> r10: 9df38eac r9 : 9df4ceb0 r8 : 9ffa1b7d >> r7 : 9df52fd0 r6 : 9ffdbba8 r5 : 0000000d r4 : 00000018 >> r3 : 3ff589e0 r2 : 9ffafa11 r1 : 9ffdbbc0 r0 : 9ffdbb00 >> Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T) >> Code: 0303 60ca 4403 6091 (685a) f042 >> Resetting CPU ... 
>> >> objdump u-boot:pc is in malloc and lr is in env_attr_walk >> >> unlink(victim, bck, fwd); >> 80814966: 60ca str r2, [r1, #12] >> set_inuse_bit_at_offset(victim, victim_size); >> 80814968: 4403 add r3, r0 >> unlink(victim, bck, fwd); >> 8081496a: 6091 str r1, [r2, #8] >> set_inuse_bit_at_offset(victim, victim_size); >> 8081496c: 685a ldr r2, [r3, #4] >> 8081496e: f042 0201 orr.w r2, r2, #1 >> 80814972: 605a str r2, [r3, #4] >> >> r3 is 3ff589e0 and it's not a valid ram address on am335x. >> >> > > I have seen crashes in common/dlmalloc.c before after double free() or > free() with an incorrect pointer. > > The assert() statements in do_check_inuse_chunk() are meant to catch > this but assert() as defined in include/log.h does not stop the code and > even does not print without _DEBUG=1. > > You should be able to get the assert output with > > #include <common.h> > #define _DEBUG 1 > #include <log.h> > > at the top of common/dlmalloc.c. > > You should get full malloc debug output with
Hi: I had try add DEBUG marco before <log.h> and no other malloc message
assert() checks for _DEBUG. Defining DEBUG after common.h will not define _DEBUG.
Finally I got a malloc error message on console:
TFTP from server 192.168.200.1; our IP address is 192.168.200.39
Filename 'u-boot.img'.
Load address: 0x82000000
Loading:
#################################################################
#################################################################
#################################################################
######################################################
0 Bytes
1.9 MiB/s
done
Bytes transferred = 1274816 (1373c0 hex)
common/dlmalloc.c:819: do_check_chunk: Assertion `(char*)p + sz <= (char*)top' failed.
I tried many times. do_check_chunk does not always fail, and sometimes it reports common/dlmalloc.c:802: do_check_chunk: Assertion `!chunk_is_mmapped(p)' failed. The situation is not always the same.
I got a backtrace when malloc failed:
(gdb) bt
#0 0x9ffb5684 in panic_finish () at lib/panic.c:23
#1 panic (fmt=0x9ffbd96b "%s:%u: %s: Assertion `%s' failed.") at lib/panic.c:49
#2 0x9ffb5696 in __assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, function=<optimized out>) at lib/panic.c:56
#3 0x9ff76910 in do_check_inuse_chunk (p=p@entry=0x9ffd7200) at common/dlmalloc.c:866
#4 0x9ff769d6 in do_check_malloced_chunk (p=p@entry=0x9ffd7200, s=s@entry=24) at common/dlmalloc.c:900
#5 0x9ff76da6 in malloc (bytes=<optimized out>) at common/dlmalloc.c:1552
#6 0x9ff96b72 in env_attr_walk (attr_list=<optimized out>, callback=0x9ff969f9 <regex_callback>, priv=0x9df28dc8) at env/attr.c:70
#7 0x9ff96bc2 in env_attr_lookup (attr_list=<optimized out>, name=<optimized out>, attributes=0x9df28dec "") at env/attr.c:184
#8 0x9ff97146 in env_callback_init (var_entry=0x9df46f60) at env/callback.c:67
#9 0x9ffb36fc in hsearch_r (item=..., action=ENV_ENTER, retval=0x9df28f60, htab=0x9ffdbce8, flag=512) at lib/hashtable.c:403
#10 0x9ff7090e in _do_env_set (argc=<optimized out>, argv=<optimized out>, env_flag=512, flag=0) at cmd/nvedit.c:296
#11 0x9ff70b64 in env_set (varname=<optimized out>, varvalue=<optimized out>) at cmd/nvedit.c:318
#12 0x9ff6d522 in netboot_update_env () at cmd/net.c:133
#13 netboot_common (proto=DHCP, cmdtp=0x9ffdd0e8, argc=<optimized out>, argv=0x9df442c8) at cmd/net.c:268
#14 0x9ff783a4 in cmd_call (repeatable=0x9df29008, argv=0x9df442c8, argc=1, flag=0, cmdtp=0x9ffdd0e8) at common/command.c:580
#15 cmd_process (flag=<optimized out>, argc=1, argv=0x9df442c8, repeatable=0x9ffdf6a0, ticks=0x0) at common/command.c:635
#16 0x9ff71d16 in run_pipe_real (pi=0x9df44220) at common/cli_hush.c:1676
#17 run_list_real (pi=<optimized out>) at common/cli_hush.c:1873
#18 0x9ff71e28 in run_list (pi=0x9df44220) at common/cli_hush.c:2022
#19 parse_stream_outer (inp=inp@entry=0x9df290e8, flag=flag@entry=2) at common/cli_hush.c:3206
#20 0x9ff721ba in parse_file_outer () at common/cli_hush.c:3289
#21 0x9ff77c1a in cli_loop () at common/cli.c:229
#22 0x9ff70d3e in main_loop () at common/main.c:66
#23 0x9ff72672 in run_main_loop () at common/board_r.c:584
#24 0x9ff72830 in initcall_run_list (init_sequence=0x9ffd7224) at include/initcall.h:46
#25 board_init_r (new_gd=<optimized out>, dest_addr=<optimized out>) at common/board_r.c:822
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Best regards
Heinrich
printed.
>
> #define DEBUG 1
> #include <common.h>
> #include <log.h>
>
> Best regards
>
> Heinrich

On Thu, Jul 20, 2023 at 06:39:17PM +0200, Miquel Raynal wrote:
Hello,
qianfanguijin@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:
It's very strange, and I can't tell whether it's a bug in usb or in dlmalloc.
Starting u-boot and running dhcp via am335x's ethernet (cpsw driver): it's ok.
Starting u-boot and running dhcp via am335x's usb net: data abort.
Starting fastboot, pressing CTRL-C right away, then running dhcp via am335x's usb net: it's ok.
I am sorry to re-open a thread that is one year old but this is still an open bug. The BBB is affected. In particular the BBBW because there is no Ethernet connector, which makes the Eth-over-USB emulation even more important. All U-Boots since 2021 are affected: spurious data aborts, usually at the end of network interactions (tftp, ping). I could not bisect it because the boot was deeply broken as well on a significant range of commits :-/.
On my side I narrowed it down to an env update which fails in malloc as well. If I comment out the env update, it fails a bit later. It really looks like a stack corruption which is either related to the Ethernet USB gadget or the USB controller driver itself. Network transfers on the BBBW using regular Ethernet do not trigger any error.
I also observe the very strange "fix" mentioned above: starting and killing fastboot makes all tftp pass... If anyone has more details to share, or perhaps a subsequent thread giving more details, I would really like to see this fixed upstream, I suppose I am not the only one :-)
What happens if you increase the malloc pool from say 32MB (current value, 0x2000000) to 64MB (so 0x4000000) ?

Hi Tom,
trini@konsulko.com wrote on Thu, 20 Jul 2023 14:34:52 -0400:
On Thu, Jul 20, 2023 at 06:39:17PM +0200, Miquel Raynal wrote:
Hello,
qianfanguijin@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:
It's very strange, and I can't tell whether it's a bug in usb or in dlmalloc.
Starting u-boot and running dhcp via am335x's ethernet (cpsw driver): it's ok.
Starting u-boot and running dhcp via am335x's usb net: data abort.
Starting fastboot, pressing CTRL-C right away, then running dhcp via am335x's usb net: it's ok.
I am sorry to re-open a thread that is one year old but this is still an open bug. The BBB is affected. In particular the BBBW because there is no Ethernet connector, which makes the Eth-over-USB emulation even more important. All U-Boots since 2021 are affected: spurious data aborts, usually at the end of network interactions (tftp, ping). I could not bisect it because the boot was deeply broken as well on a significant range of commits :-/.
On my side I narrowed it down to an env update which fails in malloc as well. If I comment out the env update, it fails a bit later. It really looks like a stack corruption which is either related to the Ethernet USB gadget or the USB controller driver itself. Network transfers on the BBBW using regular Ethernet do not trigger any error.
I also observe the very strange "fix" mentioned above: starting and killing fastboot makes all tftp pass... If anyone has more details to share, or perhaps a subsequent thread giving more details, I would really like to see this fixed upstream, I suppose I am not the only one :-)
What happens if you increase the malloc pool from say 32MB (current value, 0x2000000) to 64MB (so 0x4000000) ?
Same result. I tried to increase the heap size to 64MB as well as the stack size (16 -> 64MB): same behavior.
Thanks, Miquèl

On 2023/7/21 0:39, Miquel Raynal wrote:
Hello,
qianfanguijin@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:
It's very strange, and I can't tell whether it's a bug in usb or in dlmalloc.
Starting u-boot and running dhcp via am335x's ethernet (cpsw driver): it's ok.
Starting u-boot and running dhcp via am335x's usb net: data abort.
Starting fastboot, pressing CTRL-C right away, then running dhcp via am335x's usb net: it's ok.
I am sorry to re-open a thread that is one year old but this is still an open bug. The BBB is affected. In particular the BBBW because there is no Ethernet connector, which makes the Eth-over-USB emulation even more important. All U-Boots since 2021 are affected: spurious data aborts, usually at the end of network interactions (tftp, ping). I could not bisect it because the boot was deeply broken as well on a significant range of commits :-/.
On my side I narrowed it down to an env update which fails in malloc as well. If I comment out the env update, it fails a bit later. It really looks like a stack corruption which is either related to the Ethernet USB gadget or the USB controller driver itself. Network transfers on the BBBW using regular Ethernet do not trigger any error.
I also observe the very strange "fix" mentioned above: starting and killing fastboot makes all tftp pass... If anyone has more details to share, or perhaps a subsequent thread giving more details, I would really like to see this fixed upstream, I suppose I am not the only one :-)
Hi:
Could you please try these two patches?
http://patchwork.ozlabs.org/project/uboot/patch/20220402025836.19374-1-qianf...
http://patchwork.ozlabs.org/project/uboot/patch/20220402025836.19374-2-qianf...
Thanks
Thanks, Miquèl
On 2022/3/24 17:33, qianfan wrote:
On 2022/3/23 18:12, Heinrich Schuchardt wrote:
On 3/23/22 11:07, qianfan wrote:
On 2022/3/23 17:51, Heinrich Schuchardt wrote:
On 3/23/22 10:13, qianfan wrote: > 在 2022/3/23 16:02, qianfan 写道: >> >> 在 2022/3/23 15:45, qianfan 写道: >>> >>> 在 2022/3/23 10:28, qianfan 写道: >>>> Hi: >>>> >>>> I had a custom AM335X board connected my computer by usbnet. It >>>> always report data abort when 'dhcp': >>>> >>>> Next it the log: >>>> >>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 >>>> +0800) >>>> >>>> CPU : AM335X-GP rev 2.1 >>>> Model: WISDOM AM335X CCT >>>> DRAM: 512 MiB >>>> NAND: 256 MiB >>>> MMC: OMAP SD/MMC: 0 >>>> Loading Environment from NAND... *** Warning - bad CRC, using >>>> default environment >>>> >>>> Net: Could not get PHY for ethernet@4a100000: addr 0 >>>> eth2: ethernet@4a100000, eth3: usb_ether >>>> Hit any key to stop autoboot: 0 >>>> => setenv autoload no >>>> => dhcp >>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in >>>> MAC de:ad:be:ef:00:01 >>>> HOST MAC de:ad:be:ef:00:00 >>>> RNDIS ready >>>> musb-hdrc: peripheral reset irq lost! >>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS >>>> USB RNDIS network up! >>>> BOOTP broadcast 1 >>>> BOOTP broadcast 2 >>>> BOOTP broadcast 3 >>>> DHCP client bound to address 192.168.200.4 (757 ms) >>>> data abort >>>> pc : [<9fe9b0a2>] lr : [<9febbc3f>] >>>> reloc pc : [<808130a2>] lr : [<80833c3f>] >>>> sp : 9de53410 ip : 9de53578 fp : 00000001 >>>> r10: 9de5345c r9 : 9de67e80 r8 : 9febbae5 >>>> r7 : 9de72c30 r6 : 9feec710 r5 : 0000000d r4 : 00000018 >>>> r3 : 3fdd8e04 r2 : 00000002 r1 : 9feec728 r0 : 9feec700 >>>> Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T) >>>> Code: f023 0303 60ca 4403 (6091) 685a >>>> Resetting CPU ... >>>> >>>> resetting ... >>>> >>>> >>>> It's there has any doc about how to debug data abort? Or is the bug >>>> is already fixed? >>>> >>>> Thanks >>>> >>> This bug doesn't fixed on master code. I found v2021.01 is good and >>> v2021.04-rc2 is bad. >>> >>> Also I had tested this on beaglebone black with am335x_evm_defconfig, >>> has the simliar problem. 
>>> >>> find the first bug commit via 'git bisect': it told me that commit >>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very >>> strange due to this commit doesn't touch any dhcp or network code. >>> >>> ➜ u-boot-main git:(e97eb638de) ✗ git bisect bug >>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit >>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917 >>> Author: Heinrich Schuchardt xypron.glpk@gmx.de >>> Date: Wed Jan 20 22:21:53 2021 +0100 >>> >>> fs: fat: consistent error handling for flush_dir() >>> >>> Provide function description for flush_dir(). >>> Move all error messages for flush_dir() from the callers to the >>> function. >>> Move mapping of errors to -EIO to the function. >>> Always check return value of flush_dir() (Coverity CID 316362). >>> >>> In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails. >>> >>> Signed-off-by: Heinrich Schuchardt xypron.glpk@gmx.de >>> >>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e >>> 77d188b1c99181fd71f2167fdeee3434a09db209 M fs >>> >>> >>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before >>> e97eb638de0dc8f6e989e20eaeb0342f103cb917: >>> >>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error >>> handling for flush_dir() >>> * 184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag >>> 'u-boot-rockchip-20210121' of >>> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip >>> |\ >>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc >>> based PCIe controller driver >>> >>> I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine. >>> >>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800) >>> >>> CPU : AM335X-GP rev 2.1 >>> Model: TI AM335x BeagleBone Black >>> DRAM: 512 MiB >>> WDT: Started with servicing (60s timeout) >>> NAND: 0 MiB >>> MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1 >>> Loading Environment from FAT... <ethaddr> not set. 
Validating first >>> E-fuse MAC >>> Net: eth2: ethernet@4a100000, eth3: usb_ether >>> Hit any key to stop autoboot: 0 >>> => dhcp >>> ethernet@4a100000 Waiting for PHY auto negotiation to >>> complete......... TIMEOUT ! >>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in >>> MAC de:ad:be:ef:00:01 >>> HOST MAC de:ad:be:ef:00:00 >>> RNDIS ready >>> musb-hdrc: peripheral reset irq lost! >>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS >>> USB RNDIS network up! >>> BOOTP broadcast 1 >>> BOOTP broadcast 2 >>> BOOTP broadcast 3 >>> DHCP client bound to address 192.168.200.157 (757 ms) >>> Using usb_ether device >>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157 >>> Filename 'u-boot.img'. >>> Load address: 0x82000000 >>> Loading: >>> ################################################################# >>> ################################################################# >>> ################################################################# >>> ######################### >>> 2.5 MiB/s >>> done >>> Bytes transferred = 1123888 (112630 hex) >>> => >>> > "data abort" messages: > > data abort > pc : [<9ff8196c>] lr : [<9ffa1cd7>] > reloc pc : [<8081496c>] lr : [<80834cd7>] > sp : 9df38e60 ip : 9df38fc8 fp : 00000001 > r10: 9df38eac r9 : 9df4ceb0 r8 : 9ffa1b7d > r7 : 9df52fd0 r6 : 9ffdbba8 r5 : 0000000d r4 : 00000018 > r3 : 3ff589e0 r2 : 9ffafa11 r1 : 9ffdbbc0 r0 : 9ffdbb00 > Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T) > Code: 0303 60ca 4403 6091 (685a) f042 > Resetting CPU ... 
>
> objdump u-boot: pc is in malloc and lr is in env_attr_walk
>
> unlink(victim, bck, fwd);
> 80814966: 60ca str r2, [r1, #12]
> set_inuse_bit_at_offset(victim, victim_size);
> 80814968: 4403 add r3, r0
> unlink(victim, bck, fwd);
> 8081496a: 6091 str r1, [r2, #8]
> set_inuse_bit_at_offset(victim, victim_size);
> 8081496c: 685a ldr r2, [r3, #4]
> 8081496e: f042 0201 orr.w r2, r2, #1
> 80814972: 605a str r2, [r3, #4]
>
> r3 is 3ff589e0 and it's not a valid ram address on am335x.
>
I have seen crashes in common/dlmalloc.c before after double free() or free() with an incorrect pointer.
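The quoted listing ends with the observation that r3 does not hold a valid RAM address. A minimal shell sketch of that check follows. It assumes DRAM starts at 0x80000000 (the usual AM335x SDRAM base, consistent with the 0x82000000 load address in the log) and takes the size from the "DRAM: 512 MiB" line. The helper name in_dram is made up for illustration.

```shell
#!/bin/sh
# Sketch only: DRAM_BASE is an assumption (usual AM335x SDRAM base),
# the size comes from the "DRAM: 512 MiB" line of the boot log.
DRAM_BASE=$(( 0x80000000 ))
DRAM_END=$(( DRAM_BASE + 512 * 1024 * 1024 ))

# in_dram ADDR
# Succeeds if ADDR lies inside [DRAM_BASE, DRAM_END). (Hypothetical helper.)
in_dram() {
    addr=$(( $1 ))
    [ "$addr" -ge "$DRAM_BASE" ] && [ "$addr" -lt "$DRAM_END" ]
}

# Address touched by the trapping "ldr r2, [r3, #4]" with r3 = 3ff589e0
fault=$(( 0x3ff589e0 + 4 ))

if in_dram "$fault"; then
    printf '0x%08x is inside DRAM\n' "$fault"
else
    printf '0x%08x is outside DRAM\n' "$fault"
fi
```

The same check can be applied to the other registers in the dump to see which ones still look like plausible heap or stack pointers.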
The assert() statements in do_check_inuse_chunk() are meant to catch this but assert() as defined in include/log.h does not stop the code and even does not print without _DEBUG=1.
You should be able to get the assert output with
#include <common.h>
#define _DEBUG 1
#include <log.h>
at the top of common/dlmalloc.c.
You should get full malloc debug output with
Hi: I tried adding the DEBUG macro before <log.h>, and no other malloc message
assert() checks for _DEBUG. Defining DEBUG after common.h will not define _DEBUG.
Finally I got a malloc error message on console:
TFTP from server 192.168.200.1; our IP address is 192.168.200.39
Filename 'u-boot.img'.
Load address: 0x82000000
Loading:
#################################################################
#################################################################
#################################################################
######################################################
0 Bytes
1.9 MiB/s
done
Bytes transferred = 1274816 (1373c0 hex)
common/dlmalloc.c:819: do_check_chunk: Assertion `(char*)p + sz <= (char*)top' failed.
I tried many times. do_check_chunk does not always fail, and sometimes it reports common/dlmalloc.c:802: do_check_chunk: Assertion `!chunk_is_mmapped(p)' failed. The situation is not always the same.
I got a backtrace when malloc failed:
(gdb) bt
#0 0x9ffb5684 in panic_finish () at lib/panic.c:23
#1 panic (fmt=0x9ffbd96b "%s:%u: %s: Assertion `%s' failed.") at lib/panic.c:49
#2 0x9ffb5696 in __assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, function=<optimized out>) at lib/panic.c:56
#3 0x9ff76910 in do_check_inuse_chunk (p=p@entry=0x9ffd7200) at common/dlmalloc.c:866
#4 0x9ff769d6 in do_check_malloced_chunk (p=p@entry=0x9ffd7200, s=s@entry=24) at common/dlmalloc.c:900
#5 0x9ff76da6 in malloc (bytes=<optimized out>) at common/dlmalloc.c:1552
#6 0x9ff96b72 in env_attr_walk (attr_list=<optimized out>, callback=0x9ff969f9 <regex_callback>, priv=0x9df28dc8) at env/attr.c:70
#7 0x9ff96bc2 in env_attr_lookup (attr_list=<optimized out>, name=<optimized out>, attributes=0x9df28dec "") at env/attr.c:184
#8 0x9ff97146 in env_callback_init (var_entry=0x9df46f60) at env/callback.c:67
#9 0x9ffb36fc in hsearch_r (item=..., action=ENV_ENTER, retval=0x9df28f60, htab=0x9ffdbce8, flag=512) at lib/hashtable.c:403
#10 0x9ff7090e in _do_env_set (argc=<optimized out>, argv=<optimized out>, env_flag=512, flag=0) at cmd/nvedit.c:296
#11 0x9ff70b64 in env_set (varname=<optimized out>, varvalue=<optimized out>) at cmd/nvedit.c:318
#12 0x9ff6d522 in netboot_update_env () at cmd/net.c:133
#13 netboot_common (proto=DHCP, cmdtp=0x9ffdd0e8, argc=<optimized out>, argv=0x9df442c8) at cmd/net.c:268
#14 0x9ff783a4 in cmd_call (repeatable=0x9df29008, argv=0x9df442c8, argc=1, flag=0, cmdtp=0x9ffdd0e8) at common/command.c:580
#15 cmd_process (flag=<optimized out>, argc=1, argv=0x9df442c8, repeatable=0x9ffdf6a0, ticks=0x0) at common/command.c:635
#16 0x9ff71d16 in run_pipe_real (pi=0x9df44220) at common/cli_hush.c:1676
#17 run_list_real (pi=<optimized out>) at common/cli_hush.c:1873
#18 0x9ff71e28 in run_list (pi=0x9df44220) at common/cli_hush.c:2022
#19 parse_stream_outer (inp=inp@entry=0x9df290e8, flag=flag@entry=2) at common/cli_hush.c:3206
#20 0x9ff721ba in parse_file_outer () at common/cli_hush.c:3289
#21 0x9ff77c1a in cli_loop () at common/cli.c:229
#22 0x9ff70d3e in main_loop () at common/main.c:66
#23 0x9ff72672 in run_main_loop () at common/board_r.c:584
#24 0x9ff72830 in initcall_run_list (init_sequence=0x9ffd7224) at include/initcall.h:46
#25 board_init_r (new_gd=<optimized out>, dest_addr=<optimized out>) at common/board_r.c:822
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Best regards
Heinrich
printed.
#define DEBUG 1
#include <common.h>
#include <log.h>
Best regards
Heinrich

Hi qianfan,
qianfanguijin@163.com wrote on Fri, 21 Jul 2023 08:31:17 +0800:
On 2023/7/21 0:39, Miquel Raynal wrote:
Hello,
qianfanguijin@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:
It's very strange, and I can't tell whether it's a bug in usb or in dlmalloc.
Starting u-boot and running dhcp via am335x's ethernet (cpsw driver): it's ok.
Starting u-boot and running dhcp via am335x's usb net: data abort.
Starting fastboot, pressing CTRL-C right away, then running dhcp via am335x's usb net: it's ok.
I am sorry to re-open a thread that is one year old but this is still an open bug. The BBB is affected. In particular the BBBW because there is no Ethernet connector, which makes the Eth-over-USB emulation even more important. All U-Boots since 2021 are affected: spurious data aborts, usually at the end of network interactions (tftp, ping). I could not bisect it because the boot was deeply broken as well on a significant range of commits :-/.
On my side I narrowed it down to an env update which fails in malloc as well. If I comment out the env update, it fails a bit later. It really looks like a stack corruption which is either related to the Ethernet USB gadget or the USB controller driver itself. Network transfers on the BBBW using regular Ethernet do not trigger any error.
I also observe the very strange "fix" mentioned above: starting and killing fastboot makes all tftp pass... If anyone has more details to share, or perhaps a subsequent thread giving more details, I would really like to see this fixed upstream, I suppose I am not the only one :-)
Hi:
Could you please try these two patches?
http://patchwork.ozlabs.org/project/uboot/patch/20220402025836.19374-1-qianf...
http://patchwork.ozlabs.org/project/uboot/patch/20220402025836.19374-2-qianf...
Indeed, these patches work. I ended up rewriting one of them to propose a different approach. I also found two other proposals for the same issue which are still pending. I hope this submission will make it in, so that no more time needs to be spent on this :-)
Thanks a lot for the pointers, I've Cc'ed you on the submissions.
Kind regards, Miquèl

On 3/23/22 08:45, qianfan wrote:
On 2022/3/23 10:28, qianfan wrote:
Hi:
I have a custom AM335X board connected to my computer via usbnet. It always reports a data abort on 'dhcp':
Here is the log:
U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 +0800)
CPU : AM335X-GP rev 2.1
Model: WISDOM AM335X CCT
DRAM: 512 MiB
NAND: 256 MiB
MMC: OMAP SD/MMC: 0
Loading Environment from NAND... *** Warning - bad CRC, using default environment
Net: Could not get PHY for ethernet@4a100000: addr 0
eth2: ethernet@4a100000, eth3: usb_ether
Hit any key to stop autoboot: 0
=> setenv autoload no
=> dhcp
using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
MAC de:ad:be:ef:00:01
HOST MAC de:ad:be:ef:00:00
RNDIS ready
musb-hdrc: peripheral reset irq lost!
high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
USB RNDIS network up!
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
DHCP client bound to address 192.168.200.4 (757 ms)
data abort
This could be an alignment error.
pc : [<9fe9b0a2>] lr : [<9febbc3f>]
reloc pc : [<808130a2>] lr : [<80833c3f>]
You can use these addresses together with the u-boot.map file to figure out in which function the abort occurs and from where it was called.
Use 'arm-linux-gnueabihf-objdump -S -D' to find the exact code positions.
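The "pc" and "reloc pc" values in the dump differ by U-Boot's relocation offset, so any run-time address from the dump can be mapped back to the link-time address that appears in u-boot.map and in the objdump output. A minimal shell sketch of that arithmetic, using the values from the dump above (the function names are made up for illustration and are not part of U-Boot):

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not U-Boot tooling.

# reloc_offset RUNTIME_PC LINK_PC
# Offset between where U-Boot runs after relocation and where it was linked.
reloc_offset() {
    printf '0x%08x\n' "$(( $1 - $2 ))"
}

# to_link_addr RUNTIME_ADDR OFFSET
# Map a run-time address (pc, lr, ...) back to a link-time address.
to_link_addr() {
    printf '0x%08x\n' "$(( $1 - $2 ))"
}

# "pc" and "reloc pc" as printed in the data abort dump
off=$(reloc_offset 0x9fe9b0a2 0x808130a2)
echo "relocation offset: $off"

# "lr" from the same dump; the result should match the printed reloc "lr"
echo "lr at link time: $(to_link_addr 0x9febbc3f "$off")"
```

The resulting link-time address can then be looked up in u-boot.map, or, assuming a binutils cross toolchain is installed, passed to `${CROSS_COMPILE}addr2line -f -e u-boot <address>`.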
sp : 9de53410 ip : 9de53578 fp : 00000001
r10: 9de5345c r9 : 9de67e80 r8 : 9febbae5
r7 : 9de72c30 r6 : 9feec710 r5 : 0000000d r4 : 00000018
r3 : 3fdd8e04 r2 : 00000002 r1 : 9feec728 r0 : 9feec700
Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
Code: f023 0303 60ca 4403 (6091) 685a
This is how to find the exact instruction causing the problem:
$ echo 'Code: f023 0303 60ca 4403 (6091) 685a' | \
ARCH=arm scripts/decodecode
Code: f023 0303 60ca 4403 (6091) 685a
All code
========
   0: 23 f0 and %eax,%esi
   2: 03 03 add (%rbx),%eax
   4: ca 60 03 lret $0x360
   7:* 44 91 rex.R xchg %eax,%ecx <-- trapping instruction
   9: 60 (bad)
   a: 5a pop %rdx
   b: 68 .byte 0x68
Code starting with the faulting instruction
===========================================
   0: 91 xchg %eax,%ecx
   1: 60 (bad)
   2: 5a pop %rdx
   3: 68 .byte 0x68
I hope this helps to figure out where exactly the problem occurs.
Best regards
Heinrich
Resetting CPU ...
resetting ...
Is there any doc about how to debug a data abort? Or is the bug already fixed?
Thanks
This bug is not fixed on master. I found that v2021.01 is good and v2021.04-rc2 is bad.
I also tested this on a BeagleBone Black with am335x_evm_defconfig; it has a similar problem.
I found the first bug commit via 'git bisect': it told me that commit e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But that is very strange, because this commit doesn't touch any dhcp or network code.
➜ u-boot-main git:(e97eb638de) ✗ git bisect bug
e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
Author: Heinrich Schuchardt xypron.glpk@gmx.de
Date: Wed Jan 20 22:21:53 2021 +0100
fs: fat: consistent error handling for flush_dir()
Provide function description for flush_dir().
Move all error messages for flush_dir() from the callers to the function.
Move mapping of errors to -EIO to the function.
Always check return value of flush_dir() (Coverity CID 316362).
In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
Signed-off-by: Heinrich Schuchardt xypron.glpk@gmx.de
:040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e 77d188b1c99181fd71f2167fdeee3434a09db209 M fs
184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before e97eb638de0dc8f6e989e20eaeb0342f103cb917:
* e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error handling for flush_dir()
* 184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag 'u-boot-rockchip-20210121' of https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
|\
| * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc based PCIe controller driver
I checked that 184aa6504143b452132e28cd3ebecc7b941cdfa1 works fine.
U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
CPU : AM335X-GP rev 2.1
Model: TI AM335x BeagleBone Black
DRAM: 512 MiB
WDT: Started with servicing (60s timeout)
NAND: 0 MiB
MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
Loading Environment from FAT... <ethaddr> not set. Validating first E-fuse MAC
Net: eth2: ethernet@4a100000, eth3: usb_ether
Hit any key to stop autoboot: 0
=> dhcp
ethernet@4a100000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
MAC de:ad:be:ef:00:01
HOST MAC de:ad:be:ef:00:00
RNDIS ready
musb-hdrc: peripheral reset irq lost!
high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
USB RNDIS network up!
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
DHCP client bound to address 192.168.200.157 (757 ms)
Using usb_ether device
TFTP from server 192.168.200.1; our IP address is 192.168.200.157
Filename 'u-boot.img'.
Load address: 0x82000000
Loading:
#################################################################
#################################################################
#################################################################
#########################
2.5 MiB/s
done
Bytes transferred = 1123888 (112630 hex)
=>

On Wed, Mar 23, 2022 at 09:27:08AM +0100, Heinrich Schuchardt wrote:
On 3/23/22 08:45, qianfan wrote:
On 2022/3/23 10:28, qianfan wrote:
Hi:
I have a custom AM335X board connected to my computer via usbnet. It always reports a data abort on 'dhcp':
Here is the log:
U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 +0800)
CPU : AM335X-GP rev 2.1
Model: WISDOM AM335X CCT
DRAM: 512 MiB
NAND: 256 MiB
MMC: OMAP SD/MMC: 0
Loading Environment from NAND... *** Warning - bad CRC, using default environment
Net: Could not get PHY for ethernet@4a100000: addr 0
eth2: ethernet@4a100000, eth3: usb_ether
Hit any key to stop autoboot: 0
=> setenv autoload no
=> dhcp
using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
MAC de:ad:be:ef:00:01
HOST MAC de:ad:be:ef:00:00
RNDIS ready
musb-hdrc: peripheral reset irq lost!
high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
USB RNDIS network up!
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
DHCP client bound to address 192.168.200.4 (757 ms)
data abort
This could be an alignment error.
pc : [<9fe9b0a2>] lr : [<9febbc3f>]
reloc pc : [<808130a2>] lr : [<80833c3f>]
You can use these addresses together with the u-boot.map file to figure out in which function the abort occurs and from where it was called.
Use 'arm-linux-gnueabihf-objdump -S -D' to find the exact code positions.
sp : 9de53410 ip : 9de53578 fp : 00000001
r10: 9de5345c r9 : 9de67e80 r8 : 9febbae5
r7 : 9de72c30 r6 : 9feec710 r5 : 0000000d r4 : 00000018
r3 : 3fdd8e04 r2 : 00000002 r1 : 9feec728 r0 : 9feec700
Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
Code: f023 0303 60ca 4403 (6091) 685a
This is how to find the exact instruction causing the problem:
$ echo 'Code: f023 0303 60ca 4403 (6091) 685a' | \
ARCH=arm scripts/decodecode
Code: f023 0303 60ca 4403 (6091) 685a
All code
========
   0: 23 f0 and %eax,%esi
   2: 03 03 add (%rbx),%eax
   4: ca 60 03 lret $0x360
   7:* 44 91 rex.R xchg %eax,%ecx <-- trapping instruction
   9: 60 (bad)
   a: 5a pop %rdx
   b: 68 .byte 0x68
Code starting with the faulting instruction
===========================================
   0: 91 xchg %eax,%ecx
   1: 60 (bad)
   2: 5a pop %rdx
   3: 68 .byte 0x68
The code looks like x86 instructions. Please don't forget to add "CROSS_COMPILE=..." :)
Code: f023 0303 60ca 4403 (6091) 685a
All code
========
   0: f023 0303 bic.w r3, r3, #3
   4: 60ca str r2, [r1, #12]
   6: 4403 add r3, r0
   8:* 6091 str r1, [r2, #8] <-- trapping instruction
   a: 685a ldr r2, [r3, #4]
Code starting with the faulting instruction
===========================================
   0: 6091 str r1, [r2, #8]
   2: 685a ldr r2, [r3, #4]
Then,
${CROSS_COMPILE}objdump --disassemble=malloc -lS ${BUILDDIR}/u-boot | grep -A 10 -B 20 ${PATTERN}
# Here, PATTERN may be the instruction ("6091") or the location ("8081496c" in your case?)
or similarly
${CROSS_COMPILE}gdb --batch -ex "disas/m ${LOC}" ${BUILDDIR}/u-boot | grep -A 10 -B 20 ${LOC}
# Here, LOC is your "reloc pc" (0x80817586)
gives you some hint about the exact location.
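The PATTERN for the objdump pipeline above can be taken straight from the "Code:" line of the dump, where the abort handler wraps the trapping halfword in parentheses. A small sketch, assuming a 16-bit Thumb instruction as in this dump (the helper name trapping_opcode is made up for illustration):

```shell
#!/bin/sh
# Sketch only: trapping_opcode is a hypothetical helper name.

# trapping_opcode CODE_LINE
# Print the halfword that the abort handler wrapped in parentheses,
# i.e. the opcode of the instruction that trapped.
trapping_opcode() {
    printf '%s\n' "$1" | sed -n 's/.*(\([0-9a-fA-F]\{1,\}\)).*/\1/p'
}

PATTERN=$(trapping_opcode 'Code: f023 0303 60ca 4403 (6091) 685a')
echo "search pattern: $PATTERN"
```

Because a short opcode can occur many times inside one function, searching for the link-time location is usually the more selective of the two options.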
-Takahiro Akashi
I hope this helps to figure out where exactly the problem occurs.
Best regards
Heinrich
Resetting CPU ...
resetting ...
Is there any doc about how to debug a data abort? Or is the bug already fixed?
Thanks
This bug is not fixed on master. I found that v2021.01 is good and v2021.04-rc2 is bad.
I also tested this on a BeagleBone Black with am335x_evm_defconfig; it has a similar problem.
I found the first bug commit via 'git bisect': it told me that commit e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But that is very strange, because this commit doesn't touch any dhcp or network code.
➜ u-boot-main git:(e97eb638de) ✗ git bisect bug
e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
Author: Heinrich Schuchardt xypron.glpk@gmx.de
Date: Wed Jan 20 22:21:53 2021 +0100
fs: fat: consistent error handling for flush_dir()
Provide function description for flush_dir().
Move all error messages for flush_dir() from the callers to the function.
Move mapping of errors to -EIO to the function.
Always check return value of flush_dir() (Coverity CID 316362).
In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
Signed-off-by: Heinrich Schuchardt xypron.glpk@gmx.de
:040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e 77d188b1c99181fd71f2167fdeee3434a09db209 M fs
184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before e97eb638de0dc8f6e989e20eaeb0342f103cb917:
* e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error handling for flush_dir()
* 184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag 'u-boot-rockchip-20210121' of https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
|\
| * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc based PCIe controller driver
I checked that 184aa6504143b452132e28cd3ebecc7b941cdfa1 works fine.
U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
CPU : AM335X-GP rev 2.1
Model: TI AM335x BeagleBone Black
DRAM: 512 MiB
WDT: Started with servicing (60s timeout)
NAND: 0 MiB
MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
Loading Environment from FAT... <ethaddr> not set. Validating first E-fuse MAC
Net: eth2: ethernet@4a100000, eth3: usb_ether
Hit any key to stop autoboot: 0
=> dhcp
ethernet@4a100000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
MAC de:ad:be:ef:00:01
HOST MAC de:ad:be:ef:00:00
RNDIS ready
musb-hdrc: peripheral reset irq lost!
high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
USB RNDIS network up!
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
DHCP client bound to address 192.168.200.157 (757 ms)
Using usb_ether device
TFTP from server 192.168.200.1; our IP address is 192.168.200.157
Filename 'u-boot.img'.
Load address: 0x82000000
Loading:
#################################################################
#################################################################
#################################################################
#########################
2.5 MiB/s
done
Bytes transferred = 1123888 (112630 hex)
=>

On 2022/3/24 11:18, AKASHI Takahiro wrote:
On Wed, Mar 23, 2022 at 09:27:08AM +0100, Heinrich Schuchardt wrote:
On 3/23/22 08:45, qianfan wrote:
On 2022/3/23 10:28, qianfan wrote:
Hi:
I have a custom AM335X board connected to my computer via usbnet. It always reports a data abort on 'dhcp':
Here is the log:
U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 +0800)

CPU : AM335X-GP rev 2.1
Model: WISDOM AM335X CCT
DRAM: 512 MiB
NAND: 256 MiB
MMC: OMAP SD/MMC: 0
Loading Environment from NAND... *** Warning - bad CRC, using default environment

Net: Could not get PHY for ethernet@4a100000: addr 0
eth2: ethernet@4a100000, eth3: usb_ether
Hit any key to stop autoboot: 0
=> setenv autoload no
=> dhcp
using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
MAC de:ad:be:ef:00:01
HOST MAC de:ad:be:ef:00:00
RNDIS ready
musb-hdrc: peripheral reset irq lost!
high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
USB RNDIS network up!
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
DHCP client bound to address 192.168.200.4 (757 ms)
data abort
This could be an alignment error.
pc : [<9fe9b0a2>] lr : [<9febbc3f>]
reloc pc : [<808130a2>] lr : [<80833c3f>]
You can use these addresses together with the u-boot.map file to figure out in which function the abort occurs and from where it was called.
Use 'arm-linux-gnueabihf-objdump -S -D' to find the exact code positions.
sp : 9de53410 ip : 9de53578 fp : 00000001
r10: 9de5345c r9 : 9de67e80 r8 : 9febbae5
r7 : 9de72c30 r6 : 9feec710 r5 : 0000000d r4 : 00000018
r3 : 3fdd8e04 r2 : 00000002 r1 : 9feec728 r0 : 9feec700
Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
Code: f023 0303 60ca 4403 (6091) 685a
This is how to find the exact instruction causing the problem:
$ echo 'Code: f023 0303 60ca 4403 (6091) 685a' | \
ARCH=arm scripts/decodecode
Code: f023 0303 60ca 4403 (6091) 685a
All code
========
   0:   23 f0       and    %eax,%esi
   2:   03 03       add    (%rbx),%eax
   4:   ca 60 03    lret   $0x360
   7:*  44 91       rex.R xchg %eax,%ecx    <-- trapping instruction
   9:   60          (bad)
   a:   5a          pop    %rdx
   b:   68          .byte 0x68

Code starting with the faulting instruction
   0:   91          xchg   %eax,%ecx
   1:   60          (bad)
   2:   5a          pop    %rdx
   3:   68          .byte 0x68
The code looks like x86 instructions. Please don't forget to add "CROSS_COMPILE=..." :)
Code: f023 0303 60ca 4403 (6091) 685a
All code
========
   0:   f023 0303   bic.w  r3, r3, #3
   4:   60ca        str    r2, [r1, #12]
   6:   4403        add    r3, r0
   8:*  6091        str    r1, [r2, #8]    <-- trapping instruction
   a:   685a        ldr    r2, [r3, #4]

Code starting with the faulting instruction
   0:   6091        str    r1, [r2, #8]
   2:   685a        ldr    r2, [r3, #4]
Then,
  ${CROSS_COMPILE}objdump --disassemble=malloc -lS ${BUILDDIR}/u-boot | grep -A 10 -B 20 ${PATTERN}
  # Here, PATTERN may be the instruction ("6091") or the location ("8081496c" in your case?)
or similarly
  ${CROSS_COMPILE}gdb --batch -ex "disas/m ${LOC}" ${BUILDDIR}/u-boot | grep -A 10 -B 20 ${LOC}
  # Here, LOC is your "reloc pc" (0x80817586)
gives you some hint about the exact location.
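The trapping halfword and the register dump can also be combined by hand to see which address the store tried to touch. The following is only a minimal Python sketch: the helper name decode_ldr_str_imm_t1 is made up for illustration, it handles nothing but the 16-bit Thumb "STR/LDR Rt, [Rn, #imm]" encoding, and the register values are copied from the first dump in this thread.

```python
# Sketch: decode a 16-bit Thumb "STR/LDR Rt, [Rn, #imm]" halfword (encoding T1)
# and compute the address it accesses from the dumped register values.
# decode_ldr_str_imm_t1 is a hypothetical helper, not part of U-Boot.

def decode_ldr_str_imm_t1(halfword):
    """Return (mnemonic, rt, rn, offset), or None for any other encoding."""
    top5 = halfword >> 11
    if top5 not in (0b01100, 0b01101):
        return None
    mnemonic = "str" if top5 == 0b01100 else "ldr"
    imm5 = (halfword >> 6) & 0x1F
    rn = (halfword >> 3) & 0x7
    rt = halfword & 0x7
    return mnemonic, rt, rn, imm5 * 4  # imm5 is scaled by 4 for word accesses

# r0..r3 as printed in the first register dump of this thread.
regs = {0: 0x9FEEC700, 1: 0x9FEEC728, 2: 0x00000002, 3: 0x3FDD8E04}

mnemonic, rt, rn, offset = decode_ldr_str_imm_t1(0x6091)
address = regs[rn] + offset
print(f"{mnemonic} r{rt}, [r{rn}, #{offset}] -> address 0x{address:08x}")
print("word aligned:", address % 4 == 0)
```

With r2 = 00000002 the store targets 0x0000000a, which is not word aligned and sits next to address zero. That is consistent with the alignment suggestion above; one possible reading is that malloc was following an already corrupted pointer, but the dump alone does not prove that.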
-Takahiro Akashi
Hi:
Thanks for your guidance. I now know that the pc is in malloc and the lr is in env_attr_walk, but I can't get the full call stack leading to malloc.
I don't understand dlmalloc's logic, so it is hard for me to solve this problem.
I hope this helps to figure out where exactly the problem occurs.
Best regards
Heinrich
Resetting CPU ...
resetting ...
Is there any documentation on how to debug a data abort? Or has this bug already been fixed?
Thanks
This bug is not fixed on master. I found that v2021.01 is good and v2021.04-rc2 is bad.
I also tested this on a BeagleBone Black with am335x_evm_defconfig, and it has a similar problem.
I searched for the first commit that introduces the bug with 'git bisect', which reported that commit e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. This is very strange, because that commit doesn't touch any DHCP or network code.
➜ u-boot-main git:(e97eb638de) ✗ git bisect bug
e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
Date: Wed Jan 20 22:21:53 2021 +0100

    fs: fat: consistent error handling for flush_dir()

    Provide function description for flush_dir().
    Move all error messages for flush_dir() from the callers to the function.
    Move mapping of errors to -EIO to the function.
    Always check return value of flush_dir() (Coverity CID 316362).

    In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.

    Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>

:040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e 77d188b1c99181fd71f2167fdeee3434a09db209 M fs

On Wed, 23 Mar 2022 at 03:28, qianfan <qianfanguijin@163.com> wrote:
Hi:
I have a custom AM335X board connected to my computer via usbnet. It always reports a data abort when running 'dhcp'.
Here is the log:
U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 +0800)

CPU : AM335X-GP rev 2.1
Model: WISDOM AM335X CCT
DRAM: 512 MiB
NAND: 256 MiB
MMC: OMAP SD/MMC: 0
Loading Environment from NAND... *** Warning - bad CRC, using default environment

Net: Could not get PHY for ethernet@4a100000: addr 0
eth2: ethernet@4a100000, eth3: usb_ether
Hit any key to stop autoboot: 0
=> setenv autoload no
=> dhcp
using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
MAC de:ad:be:ef:00:01
HOST MAC de:ad:be:ef:00:00
RNDIS ready
musb-hdrc: peripheral reset irq lost!
high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
USB RNDIS network up!
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
DHCP client bound to address 192.168.200.4 (757 ms)
data abort
pc : [<9fe9b0a2>] lr : [<9febbc3f>]
reloc pc : [<808130a2>] lr : [<80833c3f>]
sp : 9de53410 ip : 9de53578 fp : 00000001
r10: 9de5345c r9 : 9de67e80 r8 : 9febbae5
r7 : 9de72c30 r6 : 9feec710 r5 : 0000000d r4 : 00000018
r3 : 3fdd8e04 r2 : 00000002 r1 : 9feec728 r0 : 9feec700
Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
Code: f023 0303 60ca 4403 (6091) 685a
Resetting CPU ...
I don't have any idea what is causing the crash, but to answer your question about debugging a data abort: from the register dump, you can look at the PC and LR registers to see the function that caused the crash (in PC) and its caller (in LR) by using the .map file (generated at build time). Use the pre-relocation values of pc and lr (the 2nd line in the dump above: reloc pc ...).
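The relation between the two address lines in the dump is plain subtraction, which the short Python sketch below works through. The function names reloc_offset and to_link_address are made up for illustration, and treating r8 as a code pointer is only an assumption made because its value lies close to lr.

```python
# Sketch: derive the relocation offset from the "pc"/"reloc pc" pair and use it
# to map other run-time addresses back to link-time (u-boot.map) addresses.
# reloc_offset and to_link_address are hypothetical helper names.

def reloc_offset(runtime_addr, link_addr):
    """Distance U-Boot moved itself in RAM."""
    return runtime_addr - link_addr

def to_link_address(runtime_addr, offset):
    """Translate a run-time address into the address used in u-boot.map."""
    return runtime_addr - offset

# Values copied from the first dump in this thread.
pc, reloc_pc = 0x9FE9B0A2, 0x808130A2
lr, reloc_lr = 0x9FEBBC3F, 0x80833C3F

offset = reloc_offset(pc, reloc_pc)
# Both pairs must give the same offset, otherwise the dump was misread.
assert offset == reloc_offset(lr, reloc_lr)
print(f"relocation offset: 0x{offset:08x}")

# Assumption: r8 (9febbae5) holds a code pointer; if so, this is its map address.
print(f"r8 in map terms:   0x{to_link_address(0x9FEBBAE5, offset):08x}")
```

The same subtraction applies to any other register or stack value that points into U-Boot's relocated image; values such as sp that point into the stack or heap have no counterpart in the map file.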
Regards -- Abder
resetting ...
Is there any documentation on how to debug a data abort? Or has this bug already been fixed?
Thanks

On 2022/3/23 15:51, Abder wrote:
I don't have any idea what is causing the crash, but to answer your question about debugging a data abort: from the register dump, you can look at the PC and LR registers to see the function that caused the crash (in PC) and its caller (in LR) by using the .map file (generated at build time). Use the pre-relocation values of pc and lr (the 2nd line in the dump above: reloc pc ...).
Hi:
Thanks for your guidance. I got this data abort message:
data abort
pc : [<9ff8196c>] lr : [<9ffa1cd7>]
reloc pc : [<8081496c>] lr : [<80834cd7>]
and found the pc and lr addresses in u-boot.map:
 .text.env_attr_walk
                0x0000000080834c54       0xb4 env/built-in.o
                0x0000000080834c54                env_attr_walk
 .text.malloc
                0x0000000080814900      0x420 common/built-in.o
                0x0000000080814900                malloc
Does this mean that the data abort happened in 'malloc', called from env_attr_walk? Is there a better way to dump the stack?
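The lookup described here can be checked mechanically. The Python sketch below parses only the two map entries quoted above and reports which function contains each address. The parse_map and lookup helpers are made up for illustration, and the regular expressions are an assumption that fits this excerpt (one section per function, plain .o inputs); a full u-boot.map has more line shapes than this handles.

```python
import re

# The two u-boot.map entries quoted in this message.
MAP_TEXT = """\
 .text.env_attr_walk
                0x0000000080834c54       0xb4 env/built-in.o
                0x0000000080834c54                env_attr_walk
 .text.malloc
                0x0000000080814900      0x420 common/built-in.o
                0x0000000080814900                malloc
"""

# "<address> <size> <object file>" and "<address> <symbol>" lines.
SECTION_RE = re.compile(r"^\s*(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s+\S+\.o\s*$")
SYMBOL_RE = re.compile(r"^\s*(0x[0-9a-fA-F]+)\s+([A-Za-z_]\w*)\s*$")

def parse_map(text):
    """Return a list of (start, size, symbol) tuples (hypothetical helper)."""
    entries = []
    pending = None  # (start, size) of the most recent input section
    for line in text.splitlines():
        m = SECTION_RE.match(line)
        if m:
            pending = (int(m.group(1), 16), int(m.group(2), 16))
            continue
        m = SYMBOL_RE.match(line)
        if m and pending and int(m.group(1), 16) == pending[0]:
            entries.append((pending[0], pending[1], m.group(2)))
            pending = None
    return entries

def lookup(entries, addr):
    """Return (symbol, offset) for the entry containing addr, else None."""
    addr &= ~1  # in Thumb mode bit 0 of lr is a state flag, not part of the address
    for start, size, name in entries:
        if start <= addr < start + size:
            return name, addr - start
    return None

entries = parse_map(MAP_TEXT)
for label, addr in (("reloc pc", 0x8081496C), ("reloc lr", 0x80834CD7)):
    name, off = lookup(entries, addr)
    print(f"{label} 0x{addr:08x} -> {name}+0x{off:x}")
```

Both addresses fall inside the quoted ranges, so the abort is in malloc and lr points into env_attr_walk. Because lr holds the return address of the most recent call, this is consistent with malloc having been called from env_attr_walk, as long as malloc had not itself made another call before the fault.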
Regards
Abder
resetting ...
Is there any documentation on how to debug a data abort? Or has this bug already been fixed?
Thanks