Sunday, January 17, 2016

Udoo Neo M4 memory layout and performance

The M4 has several non-contiguous memory blocks for code and data:

  • TCM (tightly coupled memory)
    • TCMU
      • This is 32K SRAM for Code
    • TCML
      • This is 32K SRAM for Data
  • OCRAM
    • (How much of it the M4 can access is still TBD)
  • DDR
    • Linux sets aside 8MB 
      • A9 and M4 must share this RAM 
    • The last 1MB is used for MCC/RPMSG
  • SPI Flash

The baseflight port will have more than 32K of text (code), so the code must either live in a single larger memory or be split across several.

Splitting requires some up-front work on the linker file and the loader. It is possible to write a linker file that places some code in TCMU and the rest in DDR. The idea is that frequently executed code, such as the RTOS and the interrupt handlers, goes into the fast memory, while less frequently executed code goes into DDR, or even OCRAM or SPI flash.

Here is an example of a linker file:


/* Specify the memory areas */
MEMORY
{
  m_interrupts          (RX)  : ORIGIN = 0x9ff00000, LENGTH = 0x00008000
  m_text                (RX)  : ORIGIN = 0x84000000, LENGTH = 0x00040000
  m_data                (RW)  : ORIGIN = 0x84040000, LENGTH = 0x00028000
}

__FLASH_START = ORIGIN(m_interrupts);
__FLASH_END   = ORIGIN(m_text) + LENGTH(m_text);

/* Define output sections */
SECTIONS
{
  /* The startup code goes first into Flash */
  .interrupts :
  {
    __VECTOR_TABLE = .;
    . = ALIGN(4);
    KEEP(*(.isr_vector))     /* Startup code */
    . = ALIGN(4);
    *croutine.c.obj (.text .text*)
    *event_groups.c.obj (.text .text*)
    *list.c.obj (.text .text*)
    *queue.c.obj (.text .text*)
    *tasks.c.obj (.text .text*)
    *timers.c.obj (.text .text*)
    *port.c.obj (.text .text*)
    *startup_MCIMX6X_M4.S.obj (.text .text*)
    *system_MCIMX6X_M4.c.obj (.text .text*)
    *uart_imx.c.obj (.text .text*)
    *mu_imx.c.obj (.text .text*)
    *heap_2.c.obj (.text .text*)
    *rpmsg_rtos.c.obj (.text .text*)
    *platform.c.obj (.text .text*)
    *hil.c.obj (.text .text*)
    *sh_mem.c.obj (.text .text*)
    *remote_device.c.obj (.text .text*)
    *rpmsg.c.obj (.text .text*)
    *rpmsg_ext.c.obj (.text .text*)
    *rpmsg_core.c.obj (.text .text*)
    *rpmsg_porting.c.obj (.text .text*)
    
    *baseflight.c.obj (.text .text*)
    
  } > m_interrupts

  /* The program code and other data goes into Flash */
  .text :
  {
    . = ALIGN(4);
    *(.text)                 /* .text sections (code) */
    *(.text*)                /* .text* sections (code) */
    *(.rodata)               /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)              /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)               /* glue arm to thumb code */
    *(.glue_7t)              /* glue thumb to arm code */
    *(.eh_frame)
    KEEP (*(.init))
    KEEP (*(.fini))
    . = ALIGN(4);
  } > m_text


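In the .interrupts output section (placed in m_interrupts), each object file has to be called out by name together with its .text input sections; this is what moves that file's code into the section. Code that can run slower is simply left out of the list, so it lands in the generic .text section in DDR.

When creating a binary file, the code is in two separate sections: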

  m_interrupts          (RX)  : ORIGIN = 0x9ff00000, LENGTH = 0x00008000
  m_text                (RX)  : ORIGIN = 0x84000000, LENGTH = 0x00040000

A flat binary would therefore span from 0x84000000 to 0x9FF08000, roughly 447 MB, almost all of it padding. To avoid this, use an Intel HEX file instead. An Intel HEX file only contains records for the address ranges that actually hold data.
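To put numbers on the gap, here is a small C sketch that computes the span a flat binary would have to cover for the two regions above. The `struct region` type and the `flat_span` helper are names made up for this illustration, and the result is the worst case based on the region lengths from the MEMORY block, not on the actual code size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: one MEMORY region from the linker file. */
struct region {
    uint32_t origin;
    uint32_t length;
};

/* Bytes a flat binary must cover to hold both regions:
   from the lowest origin to the highest end address. */
uint32_t flat_span(struct region a, struct region b)
{
    /* Guard against address wrap-around in the end calculation. */
    assert(a.length <= UINT32_MAX - a.origin);
    assert(b.length <= UINT32_MAX - b.origin);

    uint32_t lo = a.origin < b.origin ? a.origin : b.origin;
    uint32_t end_a = a.origin + a.length;
    uint32_t end_b = b.origin + b.length;
    uint32_t hi = end_a > end_b ? end_a : end_b;
    return hi - lo;
}
```

With m_interrupts (0x9ff00000, 0x8000) and m_text (0x84000000, 0x40000) this gives 0x1BF08000 bytes, while the two regions together hold at most 0x48000 bytes (288 KB).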

This means the M4 loader must now be able to load Intel HEX files into memory.
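As a rough idea of what such a loader has to do, here is a minimal C sketch of an Intel HEX parser. It is not the actual loader: the names `ihex_load`, `ihex_write_fn` and `ihex_stats_sink` are made up for this example, and it only handles data (00), end-of-file (01) and extended linear address (04) records, ignoring the start address record (05). The bytes are handed to a callback, so the same code can be tried on a PC first; on the board the callback is assumed to copy into the mapped M4 memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback: store len bytes at absolute address addr. */
typedef void (*ihex_write_fn)(uint32_t addr, const uint8_t *data,
                              size_t len, void *ctx);

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse one ":LLAAAATT<data>CC" record. Returns 1 for the end-of-file
   record, 0 for any other valid record, -1 on error.
   *upper carries the upper 16 address bits between records. */
static int ihex_parse_line(const char *line, uint32_t *upper,
                           ihex_write_fn sink, void *ctx)
{
    uint8_t rec[1 + 2 + 1 + 255 + 1]; /* count, address, type, data, checksum */
    size_t n = 0;
    uint8_t sum = 0;

    assert(line && upper && sink);
    if (line[0] != ':') return -1;
    for (const char *p = line + 1; *p && *p != '\r' && *p != '\n'; p += 2) {
        int hi = hex_nibble(p[0]);
        int lo = (hi < 0) ? -1 : hex_nibble(p[1]);
        if (lo < 0 || n >= sizeof rec) return -1;
        rec[n] = (uint8_t)((hi << 4) | lo);
        sum = (uint8_t)(sum + rec[n]);
        n++;
    }
    /* All bytes, checksum included, must sum to zero modulo 256. */
    if (n < 5 || n != (size_t)rec[0] + 5 || sum != 0) return -1;

    switch (rec[3]) {
    case 0x00: /* data */
        sink((*upper << 16) | ((uint32_t)rec[1] << 8) | rec[2],
             &rec[4], rec[0], ctx);
        return 0;
    case 0x01: /* end of file */
        return 1;
    case 0x04: /* extended linear address: new upper 16 bits */
        if (rec[0] != 2) return -1;
        *upper = ((uint32_t)rec[4] << 8) | rec[5];
        return 0;
    case 0x05: /* start linear address (entry point), ignored here */
        return 0;
    default:   /* segment records (02, 03) are not handled in this sketch */
        return -1;
    }
}

/* Load a NUL-terminated Intel HEX image. Returns 0 on success,
   -1 on a bad record or a missing end-of-file record. */
int ihex_load(const char *text, ihex_write_fn sink, void *ctx)
{
    uint32_t upper = 0;
    while (*text) {
        if (*text == '\r' || *text == '\n') { text++; continue; }
        int rc = ihex_parse_line(text, &upper, sink, ctx);
        if (rc < 0) return -1;
        if (rc == 1) return 0;
        text += strcspn(text, "\n"); /* advance to the end of this line */
    }
    return -1;
}

/* Example sink for testing on a PC: records the address range and byte count. */
struct ihex_stats { uint32_t lowest; uint32_t highest_end; size_t total; };

void ihex_stats_sink(uint32_t addr, const uint8_t *data, size_t len, void *ctx)
{
    struct ihex_stats *s = ctx;
    (void)data;
    if (s->total == 0 || addr < s->lowest) s->lowest = addr;
    if (addr + len > s->highest_end) s->highest_end = addr + (uint32_t)len;
    s->total += len;
}
```

Because each type 04 record switches the upper 16 address bits, one file can carry a few bytes at 0x9ff00000 and a few at 0x84000000 without any padding in between, which is exactly what the split layout needs.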

Also, the next pass will move .data sections into SRAM along with the stack. 




