
On 9/2/20 8:26 AM, Heinrich Schuchardt wrote:
On 01.09.20 03:19, Rick Chen wrote:
Hi Sean
On 8/20/20 4:47 AM, Rick Chen wrote:
Hi Sean
Hi Sean
On 8/18/20 11:48 PM, Rick Chen wrote:
> Hi Tom
>
>> This patch adds the necessary configs and docs for FPIOA and GPIO support
>> on the K210.
>>
>> The board does not boot unless CONSOLE_LOGLEVEL is set to a non-default
>> value. It also boots when the tree is dirty (and CONSOLE_LOGLEVEL is not
>> changed). It also boots when changes are made to the device tree and then
>> committed. I don't know why this happens. These breakages only occur after
>> bf2fb81ad3.
>>
>> Signed-off-by: Sean Anderson seanga2@gmail.com
>> ---
>>
>> Changes in v5:
>> - Increase CONSOLE_LOGLEVEL to 5 as a hack to get the board booting again
>> - Patch 05/12 "gpio: sifive: Use generic reg read function" has been superseded
>> by commit 2548493ab4.
>
> Would you like to pick up this series, [PATCH v5 00/11] riscv: Add
> FPIOA and GPIO support for Kendryte K210 ?
> Or maybe it is better to figure out what is wrong here and find the
> root cause why it need to Increase CONSOLE_LOGLEVEL to 5 as a hack ?
As an additional note, *CONFIG_LOGLEVEL (whoops) can also be decreased for the same effect. In addition, there are several other ways I found to "fix" this bug (as noted in the commit message). If you would like to test this out, I have two trees [1, 2] where this series (actually a slightly earlier version of this series) is applied just before and just after bf2fb81ad3. The original patch is located at [3].
--Sean
[1] https://github.com/Forty-Bot/u-boot/tree/maix_gpio_good
[2] https://github.com/Forty-Bot/u-boot/tree/maix_gpio_bad
[3] https://patchwork.ozlabs.org/project/uboot/patch/20200724111225.12513-15-ovi...
I don't have a K210 board for verification, but your series runs fine on an AE350 board after I apply it.
I checked the commit "common/board_r: Remove initr_serial wrapper", and it does not look like it should affect anything; it just changes the code to call serial_initialize directly. The only thing I can think of is a cache problem, like when adding a printf sometimes makes a problem go away.
Can you dig in to find the root cause? For code stability, it is better not to have any unknown issues. You can work around it with a dirty hack for now, but it may crash again after several commits.
Ok, so I did some further digging, but I was unable to pin down the cause of the bug. My efforts to determine a cause have been primarily thwarted because the bug disappears after any change to initialization code. Adding any function to init_sequence_f or init_sequence_r, even a no-op function which just returns 0, causes the board to boot normally. In addition, adding a nop() to any function in those sequences will cause the board to boot normally. The board seems to fail to boot only with a very specific boot sequence and timing.
If modifying any code changes the result, then you should debug it with a debugger (GDB) without any code modification.
When the board fails to boot, it hangs in a manner similar to when the
Maybe you can try setting a breakpoint and accessing the bus. If the bus access fails, set the breakpoint a bit earlier, and repeat until the bus access does NOT fail.
Yeah, I was investigating that, however I was unable to get the k210 to break at 0x80000000. I suspect this may be a problem with openocd, as the k210 port is rather buggy (e.g. it can cause address misaligned errors, and sometimes leaves the pc in the debug section of memory). I *can* get it to break in the otp (0x88000000), so perhaps I just need to identify the address before it jumps to U-Boot.
This way you may be able to pin down which instruction or crucial piece of code causes the bus hang.
If you can find the instruction which causes the bus hang, you can run info all-registers and compare the differences between the NG and OK cases, then guess what the root cause may be.
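The register comparison suggested above could be scripted roughly as follows. This is only a sketch under assumptions that do not come from the thread: OpenOCD is already running and serving GDB on its default port 3333, the target is halted at the point of interest, and a RISC-V capable GDB is installed under the name given in GDB (gdb-multiarch is just a guess). The function and file names are hypothetical.

```shell
# Assumption: a RISC-V capable gdb binary; override GDB if yours differs.
GDB=${GDB:-gdb-multiarch}

# Attach to the halted target through OpenOCD and save all registers.
# $1: output file
dump_regs() {
	"$GDB" -batch \
		-ex 'target extended-remote localhost:3333' \
		-ex 'info all-registers' > "$1"
}

# Show the differences between two register dumps.
# diff exits with 1 when the files differ, which is the expected case here,
# so only an exit status above 1 is treated as a failure.
# $1: dump from the failing (NG) boot, $2: dump from the working (OK) boot
compare_dumps() {
	diff -u "$1" "$2" || [ $? -eq 1 ]
}

# Hypothetical usage, one dump per boot:
#   dump_regs regs.ng      (with the failing image flashed)
#   dump_regs regs.ok      (with the working image flashed)
#   compare_dumps regs.ng regs.ok
```

Both dumps need to be taken at the same stop address for the comparison to mean anything.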
Trying to narrow down on the problem I found the following:
The system hangs before arch_cpu_init_dm() is called.
This is not always the case. On most boots, the following output is present:
U-Boot 2020.10-rc3-00045-g7532b003f0 (Sep 02 2020 - 11:09:16 -0400)
DRAM: 8 MiB
which means at least everything up to dram_init gets called.
After adding some debug functions the error appeared and disappeared when changing the code in function panic(). So my guess is that there is some alignment problem in the static data section.
I investigated this further using the following script
while true; do
	sed -i 's/nop();$/nop(); nop();/g' board/sipeed/maix/maix.c &&
	git commit --amend --no-edit board/sipeed/maix/maix.c &&
	CROSS_COMPILE=riscv64-linux-gnu- make -j$(nproc) &&
	kflash -tp /dev/ttyUSB0 -B bit_mic -b 1500000 u-boot-dtb.bin
done
To start this process, create a commit which adds a nop() to board/sipeed/maix/maix.c. On every iteration, this script will amend that commit by adding another nop. I tried up to 65 nops. If the number of nops is 0, 24, 28, 29, 30, 31, 32, 40, 44, 46, 49, 56, 60, 61, 62, 63, or 64 the board fails to boot. Of these failures, all printed up to "DRAM: ..." except for those with 28, 29, 30, 31, 60, 61, 62, or 64 nops. There is clearly a pattern with failures occurring at or near (but not always exactly on) multiples of 4, and in the lead-up to multiples of 32.
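To make the pattern easier to see, the failing counts can be tabulated by their remainder modulo 4 and modulo 32. The following is a minimal sketch using only the numbers listed above; the variable and function names are made up for illustration.

```shell
# Failing nop counts as reported above (0 through 65 were tried).
failing="0 24 28 29 30 31 32 40 44 46 49 56 60 61 62 63 64"

# Print each failing count followed by its remainder mod 4 and mod 32.
residues() {
	for n in $failing; do
		echo "$n $((n % 4)) $((n % 32))"
	done
}

# Print how many failing counts are exact multiples of 4, and the total.
multiples_of_4() {
	count=0
	total=0
	for n in $failing; do
		total=$((total + 1))
		if [ $((n % 4)) -eq 0 ]; then
			count=$((count + 1))
		fi
	done
	echo "$count of $total"
}

residues
echo "exact multiples of 4: $(multiples_of_4)"
```

In the mod 32 column the remainders 28 through 31 show up twice (for 28-31 and for 60-63), which is the "lead-up to multiples of 32" mentioned above.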
My next line of investigation will be to determine the size and alignment of the various sections in the failing configurations. I also plan to try enabling more debug info and see if I can trigger this issue by adding some choice no-ops.
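One possible way to collect the section sizes and addresses for that comparison is sketched below. It assumes the binutils size tool from the same riscv64-linux-gnu- toolchain used in the build command above, and that the ELF from each build has been saved under its own name. The helper names and file names are hypothetical.

```shell
# Cross toolchain prefix, matching the build command used in this thread.
CROSS_COMPILE=${CROSS_COMPILE:-riscv64-linux-gnu-}

# Print every section of an ELF file with its size and address, in hex.
# $1: ELF file (for example the u-boot file in the build directory)
section_table() {
	"${CROSS_COMPILE}size" -A -x "$1"
}

# Print how far an address lies past the previous multiple of an alignment.
# A result of 0 means the address is aligned.
# $1: address, $2: alignment in bytes
misalignment() {
	echo $(( $1 % $2 ))
}

# Hypothetical usage, comparing a failing and a working build:
#   section_table u-boot.24nops > sections.24nops
#   section_table u-boot.25nops > sections.25nops
#   diff -u sections.24nops sections.25nops
#   misalignment 0x80000014 8
```

Diffing the tables from a failing build and a neighbouring working build should show which section start addresses moved between the two.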
--Sean