
On 6/6/21 1:57 PM, Heinrich Schuchardt wrote:
On 6/6/21 7:52 PM, Sean Anderson wrote:
On 6/6/21 1:28 PM, Heinrich Schuchardt wrote:
On 6/6/21 6:44 PM, Simon Glass wrote:
Hi Heinrich,
On Mon, 22 Mar 2021 at 18:56, Simon Glass sjg@chromium.org wrote:
Hi Heinrich,
On Mon, 22 Mar 2021 at 23:02, Heinrich Schuchardt xypron.glpk@gmx.de wrote:
On 22.03.21 06:21, Simon Glass wrote: > At present if sandbox crashes it prints a message and tries to > exit. But > with the recently introduced signal handler, it often seems to get > stuck > in a loop until the stack overflows: > > Segmentation violation > > Segmentation violation > > Segmentation violation > > Segmentation violation > > Segmentation violation > > Segmentation violation > > Segmentation violation > ...
Hello Simon,
do you have a reproducible example? I never have seen this.
https://source.denx.de/u-boot/custodians/u-boot-dm/-/jobs/242433
You need to run that commit with pytest though...it does not happen when run directly.
BTW this sems to expose some rather nasty bug in dlmalloc or how it is used. I notice that as soon as the first test is run, the 'top' value in dlmalloc is outside the range of the malloc pool, which seems wrong. I wonder if there is something broken with how dm_test_pre_run() and dm_test_post_run() work.
Corrupting gd could cause an endless recursive loop, as these lines follow printing the observed string:
printf("pc = 0x%lx, ", pc); printf("pc_reloc =0x%lx\n\n", pc - gd->reloc_off);
Yes I suspect printf() is dead.
If we remove SA_NODEFER from the signal mask in arch/sandbox/cpu/os.c, recursion cannot occur anymore. If a segmentation violation occurs inside the handler it will be delegated to the default handler.
Furthermore we could consider removing the signal handler at the start of os_signal_action().
The issue is that if you get a segfault you really don't know if you can continue and do anything else.
What is the goal with the signal handler? I don't think the user can do anything about it.
Hello Simon,
the signal handler prints out the crash location and this makes analyzing problems much easier. It proved valuable to me several times.
Can't you just rerun with gdb?
This would require that the problem is easily reproducible which may not be the case.
Hm, perhaps you could keep track of how many times we've tried to catch a signal, and bail if this is the second time around. E.g. something like
static void os_signal_handler(int sig, siginfo_t *info, void *con) { /* other variables */ static int level = 0;
switch (level++) { case 0: break; case 1: sandbox_exit(); default: os_exit(0); }
/* rest of the handler */ }
--Sean
Best regards
Heinrich
I keep hitting this problem during development with sandbox, so I think I need to apply this patch.
Does anything need to be updated in the tests?
Regards, Simon
Did you try removing SA_NODEFER as proposed?
Best regards
Heinrich