Re: [U-Boot] [PATCH v9] [RFC] Add dmmalloc module for DM.

26 Oct 2012

Hi Tomas,
On Fri, Oct 26, 2012 at 6:16 AM, Tomas Hlavacek tmshlvck@gmail.com wrote:
...
Hello Graeme,
On Thu, Oct 25, 2012 at 3:40 AM, Graeme Russ graeme.russ@gmail.com wrote:
...
...

diff --git a/arch/arm/include/asm/global_data.h b/arch/arm/include/asm/global_data.h
index 2b9af93..9045829 100644
--- a/arch/arm/include/asm/global_data.h
+++ b/arch/arm/include/asm/global_data.h
@@ -82,6 +82,9 @@ typedef       struct  global_data {
        unsigned long   post_log_res; /* success of POST test */
        unsigned long   post_init_f_time; /* When post_init_f started */
 #endif
+#ifdef CONFIG_SYS_EARLY_MALLOC

  void            *early_heap;    /* heap for early_malloc */



+#endif
Why not early_heap_header *early_heap; ?
It might be.
Actually it is a good point because I am using 3 different ways of
dealing with addresses: 1) struct early_heap header * or struct
early_block_header * - I am using this when I want to access members
of the stucture or compute address past the structure (which is where
the heap or block starts); 2) phys_addr_t - which is plain integer and
I use this for simple computations when I do not want to worry about
pointer arithmetic; 3) void * when I have just plain address,
especially when I want to pass an addres among logical parts of the
mallocator or outside. This may a bit controversial and perhaps I
should replace it by specific strucutre pointers internally.
I am unable to decide: Should I remove struct early_heap_header from
dmmalloc.h making it publicly unavailable or should I rather change
the void * to struct early_heap_header * in the GD structure? What do
you think is better?
I think struct early_heap_header * in the GD structure is the better way to
go as that is exactly what it is.
[snip]
...
...
...

  h = (struct early_heap_header *)CONFIG_SYS_EARLY_HEAP_ADDR;


  b = (struct early_block_header *)(h + 1);



Hmmm, does this really work? I would have thought:
b = (struct early_block_header *)(h + sizeof(struct early_heap_header));
but I could be mistaken
It seems that it works as it is (at least I wrote bunch of tests and I
inspected resulting heaps and it was all right). I believe that since
h is a pointer to the struct early_heap_header then pointer arithmetic
is in effect and h+1 actually means "next element in the array of
struct early_heap_header". Which is the address past the header that
equals beginning of the heap data block. (?)
As I said, I could be mistaken - it appears I am :)
...
...
...
+int early_malloc_active(void)
+{

  return ((gd->flags & GD_FLG_RELOC) != GD_FLG_RELOC);



+}
I think we need another flag - GD_FLG_RELOC gets set before the permanent
heap is initialised, so there is a window of opportunity where things may
break
Well, as I wrote in the commit message - this is only a temporary
implementation. I suppose I am going to change this when we have more
coarse initialization flags wired into DM (which I believe we are
going to have it anyway). So now I am just working around "forward
dependency" here.
...
...



+void early_heap_dump(struct early_heap_header *h)
+{

  struct early_block_header *b;



  debug("heap: h=%p, h->size=%d\n", h, h->size);



  b = (struct early_block_header *)(h+1);


  while ((phys_addr_t)b + sizeof(struct early_block_header)


                  < (phys_addr_t)h + h->size) {


          debug("block: h=%p h->size=%d b=%p b->size=%d b->(used)=%d\n",


                          h, h->size, b, BLOCK_SIZE(b->size),


                          BLOCK_USED(b->size));


          assert(BLOCK_SIZE(b->size) > 0);


          b = (struct early_block_header *)((phys_addr_t)b +


                          sizeof(struct early_block_header) +


                          BLOCK_SIZE(b->size));


  }


  debug("--- heap dump end ---\n");



+}
Nice touch, but could we just iterate through all ealry heap chunks starting
from gd->early_heap?
Or I can have two functions. One heap specific and one for all heaps.
I think both might be useful when somebody needs to debug early_malloc
or memory usage etc. in the early stage. Thanks.
True, just adding another function which iterates through the heaps and
calls this function would be fine.
[snip]
...
...
...
+static inline void *dmrealloc(void *oldaddr, size_t bytes)
+{
+#ifdef CONFIG_SYS_EARLY_MALLOC

  char *addr;


  struct early_block_header *h;


  if (early_malloc_active()) {


          addr = dmmalloc(bytes);


          if (addr == NULL)


                  return NULL;



          h = BLOCK_HEADER(oldaddr);


          if (BLOCK_FREE(h->size))


                  return NULL;



          if (bytes > BLOCK_SIZE(h->size))


                  bytes = BLOCK_SIZE(h->size);



          memcpy(addr, oldaddr, bytes);


          dmfree(oldaddr);


          return addr;


  }



+#endif /* CONFIG_SYS_EARLY_MALLOC */

  return realloc(oldaddr, bytes);



+}
Hmmm, we have a very intersting corner case to deal with. What if we use
dmrealloc() to move allocated data from the early malloc pool to the final
malloc pool?
At some point, we need to assume the developer is only ever going to pass
early malloc'd memory to dmrealloc()
Good point! The current code would do the job assuming that the
early_malloc can still access the proper early_heap (or a copy, but in
that case some additional magic is needed) and the real malloc is
already initialized.
As you can see I am still sticking with the double-copy method. It is
maybe due to lack of insight. But the discussion here was not
absolutely conclusive last time. I have even some experimental code
(not ready for submitting at all) for double copy method but I would
prefer to discuss it separately when the early_malloc() is done.
Well the double-copy approach to reallocating early-malloc'd memory and
how dmrealloc() is implemented are tightly coupled.
I understand the reason for the double copy is performance related (getting
into a position to enable cache as early as possible), but I wonder how
much will really be gained. It seems to me that the only performance gain
will be for the execution of the driver 'relocation fixup' code wich, I
assume, would not really consume that many CPU cycles.
There is a point where code simplicity outweighs performance gains.
...
...
The other option is to use the gd->flags...
Define two flags - something like:
GD_FLG_HEAP_INIT -> Final heap has been initialised
GD_FLG_EH_DONE    -> free(), realloc() refer to final heap
(I don't like the names, I'm just not up to thinking of anything better)
This way we can use dmalloc() prior to the heap being initialised and then
set GD_FLG_HEAP_INIT. Once GD_FLG_HEAP_INIT has been set, do a bunch of
dmrealloc() calls to essentially move the data into the permanent heap (of
course pointers int the structures need to be fixed up by the drivers) and
finally set GD_FLG_EH_DONE (and call early_heap_dump)
One problem I see is what happens of you call malloc() and free() on the
same block between the setting of GD_FLG_HEAP_INIT and GD_FLG_EH_DONE? The
code will try to reference the block as if it is in the early heap, but it
won't be.
One solution is, on detection of GD_FLG_HEAP_INIT being set, dmfree() and
dmrealloc() could search the early heap chunks instead of just assuming
that the referenced block is in the early heap (if GD_FLG_HEAP_INIT has not
been set, it will be safe to assume the memory is on the early heap)
Sure we will have that flags. But I think we can use them as well for
switching from DM driver instance list to tree for example. Or other
way around: I can use DM flags for early_malloc. Therefore I would
like to synchronize with DM cores for PCI and another low-level things
which are certainly going to start in early stage. It would be best to
use the same flags and switch on/off early_malloc based on DM internal
state.
Ah, I see where more performance gains are to be made by switching on cache
earlier - During the reallocation phase, you are switching from the list
based structure (fast for the small number of pre-SDRAM drivers) into the
final tree based structure.
I'm looking at this early malloc code from a much more generic point of
view - I think there are use-cases outside the driver model, so I don't
see a need (rather the opposite) to tie early malloc to the driver model
Regards,
Graeme