In my previous articles I used the command line on Linux to build and debug NXP MCUXpresso SDK applications. In this article I’m running code on the NXP i.MX RT1064 from RAM or FLASH.
Outline
In this tutorial I’m going to run code in RAM and FLASH (XiP, eXecute in Place) on the i.MX RT1064. To get started with the MCUXpresso SDK on Linux, I recommend reading my previous articles:
- Tutorial: MCUXpresso SDK with Linux, Part 1: Installation and Build with Make
- Tutorial: MCUXpresso SDK with Linux, Part 2: Commandline Debugging with GDB
💡 I’m using the command line on purpose in this article. The MCUXpresso IDE is available on Linux too and is usually a better and easier starting point for development.
I’m using Linux in an Oracle VM (Ubuntu) with the NXP MCUXpresso SDK for the i.MX RT1064 EVK board (see First Steps with the NXP i.MX RT1064-EVK Board).
NXP i.MX RT1064 Board
The board has the following memory areas available:
- internal ITC SRAM, base address: 0x0000’0000, size 0x2’0000 (128 KByte)
- internal DTC SRAM, base address: 0x2000’0000, size 0x2’0000 (128 KByte)
- internal OC SRAM, base address: 0x2020’0000, size 0xC’0000 (768 KByte)
- internal SPI FLASH, base address: 0x7000’0000, size 0x40’0000 (4 MByte)
- external SDRAM, base address: 0x8000’0000, size 0x200’0000 (32 MByte)
💡 There is an extra 64 MByte Hyperflash available on the board, but using it requires adding/removing resistors on the backside of the board.
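The memory map above can be written down as a small lookup table. This is only a sketch for illustration: the type `mem_region_t`, the table `k_regions` and the function `region_of()` are names I made up here and are not part of the MCUXpresso SDK. The region names follow the ones used in the linker files further below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per memory area from the list above (hypothetical helper type). */
typedef struct {
    const char *name;
    uint32_t base;
    uint32_t size;
} mem_region_t;

static const mem_region_t k_regions[] = {
    { "SRAM_ITC",      0x00000000u, 0x00020000u }, /* 128 KByte */
    { "SRAM_DTC",      0x20000000u, 0x00020000u }, /* 128 KByte */
    { "SRAM_OC",       0x20200000u, 0x000C0000u }, /* 768 KByte */
    { "PROGRAM_FLASH", 0x70000000u, 0x00400000u }, /* 4 MByte   */
    { "BOARD_SDRAM",   0x80000000u, 0x02000000u }, /* 32 MByte  */
};

/* Returns the region containing addr, or NULL if it is not in the table. */
const mem_region_t *region_of(uint32_t addr)
{
    for (size_t i = 0; i < sizeof(k_regions) / sizeof(k_regions[0]); i++) {
        /* For addr < base the unsigned subtraction wraps around to a large
         * value, so a single comparison covers both bounds. */
        if (addr - k_regions[i].base < k_regions[i].size) {
            return &k_regions[i];
        }
    }
    return NULL;
}
```

Such a table can be handy on the host side, for example to check in which memory a symbol from the linker map file ended up.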
Running from FLASH (XiP)
The i.MX RT does not have FLASH memory integrated with the MCU, as is the case for most microcontrollers. Instead it uses serial (SPI) FLASH memory, which usually is an external memory chip. In the case of the i.MX RT1064 there are 4 MByte of FLASH wired to the device internally. Technically it is the same as having it externally, except that the needed board space is smaller. Because the CPU does not know about the FLASH, the FLASH needs a special header programmed at the start of the memory, which is read at boot time. For this the following defines need to be turned on:
XIP_EXTERNAL_FLASH=1
XIP_BOOT_HEADER_ENABLE=1
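To illustrate what such defines typically do, here is a minimal sketch of a source file that only emits header data when the define is set. It is an assumption about the general pattern, not a copy of the SDK sources: `demo_boot_hdr_word`, `DEMO_HAS_BOOT_HEADER` and `demo_has_boot_header()` are hypothetical names and the data word is a placeholder. Only the two macro names and the section name `.boot_hdr.conf` are taken from this article.

```c
#include <assert.h>
#include <stdint.h>

/* The defines are normally passed on the compiler command line, e.g.
 * -DXIP_EXTERNAL_FLASH=1 -DXIP_BOOT_HEADER_ENABLE=1. The fallbacks below
 * only keep this sketch self-contained. */
#ifndef XIP_EXTERNAL_FLASH
#define XIP_EXTERNAL_FLASH 1
#endif
#ifndef XIP_BOOT_HEADER_ENABLE
#define XIP_BOOT_HEADER_ENABLE 1
#endif

#if (XIP_EXTERNAL_FLASH == 1) && (XIP_BOOT_HEADER_ENABLE == 1)
/* Placeholder data, NOT a real boot header. On the target the section
 * attribute lets the linker file collect it with KEEP(*(.boot_hdr.conf)). */
#if defined(__arm__)
__attribute__((section(".boot_hdr.conf"), used))
#endif
const uint32_t demo_boot_hdr_word = 0xFFFFFFFFu;
#define DEMO_HAS_BOOT_HEADER 1
#else
#define DEMO_HAS_BOOT_HEADER 0
#endif

/* Reports whether this build contains the (placeholder) header data. */
int demo_has_boot_header(void)
{
    return DEMO_HAS_BOOT_HEADER;
}
```

If the define is missing, the header data is simply not compiled in and the corresponding input section in the linker file stays empty.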
On many microcontrollers a serial (SPI) FLASH is only used to store data. Here the processor can execute code directly from it, which is called XiP or ‘eXecute in Place’.
The next thing is the linker file. The following file places the header, code, constants and vector table into the FLASH starting at address 0x7000’0000. The Data-Tightly-Coupled (DTC) RAM is used for data, heap and stack.
GROUP (
  "libcr_nohost_nf.a"
  "libcr_c.a"
  "libcr_eabihelpers.a"
  "libgcc.a"
)

MEMORY
{
  /* Define each memory region */
  PROGRAM_FLASH (rx) : ORIGIN = 0x70000000, LENGTH = 0x400000 /* 4M bytes (alias Flash) */
  SRAM_DTC (rwx) : ORIGIN = 0x20000000, LENGTH = 0x20000 /* 128K bytes (alias RAM) */
  SRAM_ITC (rwx) : ORIGIN = 0x0, LENGTH = 0x20000 /* 128K bytes (alias RAM2) */
  SRAM_OC (rwx) : ORIGIN = 0x20200000, LENGTH = 0xc0000 /* 768K bytes (alias RAM3) */
  BOARD_SDRAM (rwx) : ORIGIN = 0x80000000, LENGTH = 0x2000000 /* 32M bytes (alias RAM4) */
}

ENTRY(ResetISR)

SECTIONS
{
    /* Image Vector Table and Boot Data for booting from external flash */
    .boot_hdr : ALIGN(4)
    {
        FILL(0xff)
        __boot_hdr_start__ = ABSOLUTE(.) ;
        KEEP(*(.boot_hdr.conf))
        . = 0x1000 ;
        KEEP(*(.boot_hdr.ivt))
        . = 0x1020 ;
        KEEP(*(.boot_hdr.boot_data))
        . = 0x1030 ;
        KEEP(*(.boot_hdr.dcd_data))
        __boot_hdr_end__ = ABSOLUTE(.) ;
        . = 0x2000 ;
    } >PROGRAM_FLASH

    /* MAIN TEXT SECTION */
    .text : ALIGN(4)
    {
        FILL(0xff)
        __vectors_start__ = ABSOLUTE(.) ;
        KEEP(*(.isr_vector))
        /* Global Section Table */
        . = ALIGN(4) ;
        __section_table_start = .;
        __data_section_table = .;
        LONG(LOADADDR(.data)); LONG( ADDR(.data)); LONG( SIZEOF(.data));
        LONG(LOADADDR(.data_RAM2)); LONG( ADDR(.data_RAM2)); LONG( SIZEOF(.data_RAM2));
        LONG(LOADADDR(.data_RAM3)); LONG( ADDR(.data_RAM3)); LONG( SIZEOF(.data_RAM3));
        LONG(LOADADDR(.data_RAM4)); LONG( ADDR(.data_RAM4)); LONG( SIZEOF(.data_RAM4));
        __data_section_table_end = .;
        __bss_section_table = .;
        LONG( ADDR(.bss)); LONG( SIZEOF(.bss));
        LONG( ADDR(.bss_RAM2)); LONG( SIZEOF(.bss_RAM2));
        LONG( ADDR(.bss_RAM3)); LONG( SIZEOF(.bss_RAM3));
        LONG( ADDR(.bss_RAM4)); LONG( SIZEOF(.bss_RAM4));
        __bss_section_table_end = .;
        __section_table_end = . ;
        /* End of Global Section Table */
        *(.after_vectors*)
    } > PROGRAM_FLASH

    .text : ALIGN(4)
    {
        *(.text*)
        *(.rodata .rodata.* .constdata .constdata.*)
        . = ALIGN(4);
    } > PROGRAM_FLASH

    /*
     * for exception handling/unwind - some Newlib functions (in common
     * with C++ and STDC++) use this.
     */
    .ARM.extab : ALIGN(4)
    {
        *(.ARM.extab* .gnu.linkonce.armextab.*)
    } > PROGRAM_FLASH

    __exidx_start = .;
    .ARM.exidx : ALIGN(4)
    {
        *(.ARM.exidx* .gnu.linkonce.armexidx.*)
    } > PROGRAM_FLASH
    __exidx_end = .;

    _etext = .;

    /* DATA section for SRAM_ITC */
    .data_RAM2 : ALIGN(4)
    {
        FILL(0xff)
        PROVIDE(__start_data_RAM2 = .) ;
        *(.ramfunc.$RAM2)
        *(.ramfunc.$SRAM_ITC)
        *(.data.$RAM2*)
        *(.data.$SRAM_ITC*)
        . = ALIGN(4) ;
        PROVIDE(__end_data_RAM2 = .) ;
    } > SRAM_ITC AT>PROGRAM_FLASH

    /* DATA section for SRAM_OC */
    .data_RAM3 : ALIGN(4)
    {
        FILL(0xff)
        PROVIDE(__start_data_RAM3 = .) ;
        *(.ramfunc.$RAM3)
        *(.ramfunc.$SRAM_OC)
        *(.data.$RAM3*)
        *(.data.$SRAM_OC*)
        . = ALIGN(4) ;
        PROVIDE(__end_data_RAM3 = .) ;
    } > SRAM_OC AT>PROGRAM_FLASH

    /* DATA section for BOARD_SDRAM */
    .data_RAM4 : ALIGN(4)
    {
        FILL(0xff)
        PROVIDE(__start_data_RAM4 = .) ;
        *(.ramfunc.$RAM4)
        *(.ramfunc.$BOARD_SDRAM)
        *(.data.$RAM4*)
        *(.data.$BOARD_SDRAM*)
        . = ALIGN(4) ;
        PROVIDE(__end_data_RAM4 = .) ;
    } > BOARD_SDRAM AT>PROGRAM_FLASH

    /* MAIN DATA SECTION */
    .uninit_RESERVED : ALIGN(4)
    {
        KEEP(*(.bss.$RESERVED*))
        . = ALIGN(4) ;
        _end_uninit_RESERVED = .;
    } > SRAM_DTC

    /* Main DATA section (SRAM_DTC) */
    .data : ALIGN(4)
    {
        FILL(0xff)
        _data = . ;
        *(vtable)
        *(.ramfunc*)
        *(NonCacheable.init)
        *(.data*)
        . = ALIGN(4) ;
        _edata = . ;
    } > SRAM_DTC AT>PROGRAM_FLASH

    /* BSS section for SRAM_ITC */
    .bss_RAM2 : ALIGN(4)
    {
        PROVIDE(__start_bss_RAM2 = .) ;
        *(.bss.$RAM2*)
        *(.bss.$SRAM_ITC*)
        . = ALIGN (. != 0 ? 4 : 1) ; /* avoid empty segment */
        PROVIDE(__end_bss_RAM2 = .) ;
    } > SRAM_ITC

    /* BSS section for SRAM_OC */
    .bss_RAM3 : ALIGN(4)
    {
        PROVIDE(__start_bss_RAM3 = .) ;
        *(.bss.$RAM3*)
        *(.bss.$SRAM_OC*)
        . = ALIGN (. != 0 ? 4 : 1) ; /* avoid empty segment */
        PROVIDE(__end_bss_RAM3 = .) ;
    } > SRAM_OC

    /* BSS section for BOARD_SDRAM */
    .bss_RAM4 : ALIGN(4)
    {
        PROVIDE(__start_bss_RAM4 = .) ;
        *(.bss.$RAM4*)
        *(.bss.$BOARD_SDRAM*)
        . = ALIGN (. != 0 ? 4 : 1) ; /* avoid empty segment */
        PROVIDE(__end_bss_RAM4 = .) ;
    } > BOARD_SDRAM

    /* MAIN BSS SECTION */
    .bss : ALIGN(4)
    {
        _bss = .;
        *(NonCacheable)
        *(.bss*)
        *(COMMON)
        . = ALIGN(4) ;
        _ebss = .;
        PROVIDE(end = .);
    } > SRAM_DTC

    /* NOINIT section for SRAM_ITC */
    .noinit_RAM2 (NOLOAD) : ALIGN(4)
    {
        *(.noinit.$RAM2*)
        *(.noinit.$SRAM_ITC*)
        . = ALIGN(4) ;
    } > SRAM_ITC

    /* NOINIT section for SRAM_OC */
    .noinit_RAM3 (NOLOAD) : ALIGN(4)
    {
        *(.noinit.$RAM3*)
        *(.noinit.$SRAM_OC*)
        . = ALIGN(4) ;
    } > SRAM_OC

    /* NOINIT section for BOARD_SDRAM */
    .noinit_RAM4 (NOLOAD) : ALIGN(4)
    {
        *(.noinit.$RAM4*)
        *(.noinit.$BOARD_SDRAM*)
        . = ALIGN(4) ;
    } > BOARD_SDRAM

    /* DEFAULT NOINIT SECTION */
    .noinit (NOLOAD): ALIGN(4)
    {
        _noinit = .;
        *(.noinit*)
        . = ALIGN(4) ;
        _end_noinit = .;
    } > SRAM_DTC

    /* Reserve and place Heap within memory map */
    _HeapSize = 0x1000;
    .heap : ALIGN(4)
    {
        _pvHeapStart = .;
        . += _HeapSize;
        . = ALIGN(4);
        _pvHeapLimit = .;
    } > SRAM_DTC

    _StackSize = 0x1000;
    /* Reserve space in memory for Stack */
    .heap2stackfill :
    {
        . += _StackSize;
    } > SRAM_DTC

    /* Locate actual Stack in memory map */
    .stack ORIGIN(SRAM_DTC) + LENGTH(SRAM_DTC) - _StackSize - 0: ALIGN(4)
    {
        _vStackBase = .;
        . = ALIGN(4);
        _vStackTop = . + _StackSize;
    } > SRAM_DTC

    /* Provide basic symbols giving location and size of main text
     * block, including initial values of RW data sections. Note that
     * these will need extending to give a complete picture with
     * complex images (e.g multiple Flash banks).
     */
    _image_start = LOADADDR(.text);
    _image_end = LOADADDR(.data) + SIZEOF(.data);
    _image_size = _image_end - _image_start;
}
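To make the layout produced by this linker file a bit more tangible, the following sketch redoes the address arithmetic in C. The offsets and sizes are taken from the `.boot_hdr`, `.heap2stackfill` and `.stack` parts of the file above. All `DEMO_*` and `demo_*` names are hypothetical and only exist for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the MEMORY block of the linker file above. */
#define DEMO_FLASH_ORIGIN   0x70000000u
#define DEMO_DTC_ORIGIN     0x20000000u
#define DEMO_DTC_LENGTH     0x00020000u
#define DEMO_STACK_SIZE     0x00001000u /* _StackSize */

/* Offsets inside .boot_hdr, from the ". = 0x...." assignments. */
#define DEMO_CONF_OFFSET      0x0000u /* .boot_hdr.conf      */
#define DEMO_IVT_OFFSET       0x1000u /* .boot_hdr.ivt       */
#define DEMO_BOOT_DATA_OFFSET 0x1020u /* .boot_hdr.boot_data */
#define DEMO_DCD_OFFSET       0x1030u /* .boot_hdr.dcd_data  */
#define DEMO_IMAGE_OFFSET     0x2000u /* start of .text (vector table) */

/* Absolute FLASH address of a part of the boot header. */
uint32_t demo_boot_hdr_addr(uint32_t offset)
{
    return DEMO_FLASH_ORIGIN + offset;
}

/* _vStackBase: ORIGIN(SRAM_DTC) + LENGTH(SRAM_DTC) - _StackSize */
uint32_t demo_stack_base(void)
{
    return DEMO_DTC_ORIGIN + DEMO_DTC_LENGTH - DEMO_STACK_SIZE;
}

/* _vStackTop: _vStackBase + _StackSize, i.e. the end of the DTC RAM */
uint32_t demo_stack_top(void)
{
    return demo_stack_base() + DEMO_STACK_SIZE;
}
```

With these numbers the IVT ends up at 0x7000’1000, the vector table of the application at 0x7000’2000 and the stack in the last 4 KByte of the DTC RAM (0x2001’F000 to 0x2002’0000). The real addresses should always be checked in the linker map file.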
Running from OC (On-Chip) RAM
It is possible to avoid the FLASH by linking everything into the RAM areas of the device. The following linker file places code and data in the On-Chip (OC) RAM, so everything runs from RAM:
GROUP (
  "libcr_nohost_nf.a"
  "libcr_c.a"
  "libcr_eabihelpers.a"
  "libgcc.a"
)

MEMORY
{
  /* Define each memory region */
  PROGRAM_FLASH (rx) : ORIGIN = 0x70000000, LENGTH = 0x400000 /* 4M bytes (alias Flash) */
  SRAM_OC (rwx) : ORIGIN = 0x20200000, LENGTH = 0xc0000 /* 768K bytes (alias RAM) */
  SRAM_ITC (rwx) : ORIGIN = 0x0, LENGTH = 0x20000 /* 128K bytes (alias RAM2) */
  SRAM_DTC (rwx) : ORIGIN = 0x20000000, LENGTH = 0x20000 /* 128K bytes (alias RAM3) */
  BOARD_SDRAM (rwx) : ORIGIN = 0x80000000, LENGTH = 0x2000000 /* 32M bytes (alias RAM4) */
}

/* Define a symbol for the top of each memory region */
__base_PROGRAM_FLASH = 0x70000000 ; /* PROGRAM_FLASH */
__base_Flash = 0x70000000 ; /* Flash */
__top_PROGRAM_FLASH = 0x70000000 + 0x400000 ; /* 4M bytes */
__top_Flash = 0x70000000 + 0x400000 ; /* 4M bytes */
__base_SRAM_OC = 0x20200000 ; /* SRAM_OC */
__base_RAM = 0x20200000 ; /* RAM */
__top_SRAM_OC = 0x20200000 + 0xc0000 ; /* 768K bytes */
__top_RAM = 0x20200000 + 0xc0000 ; /* 768K bytes */
__base_SRAM_ITC = 0x0 ; /* SRAM_ITC */
__base_RAM2 = 0x0 ; /* RAM2 */
__top_SRAM_ITC = 0x0 + 0x20000 ; /* 128K bytes */
__top_RAM2 = 0x0 + 0x20000 ; /* 128K bytes */
__base_SRAM_DTC = 0x20000000 ; /* SRAM_DTC */
__base_RAM3 = 0x20000000 ; /* RAM3 */
__top_SRAM_DTC = 0x20000000 + 0x20000 ; /* 128K bytes */
__top_RAM3 = 0x20000000 + 0x20000 ; /* 128K bytes */
__base_BOARD_SDRAM = 0x80000000 ; /* BOARD_SDRAM */
__base_RAM4 = 0x80000000 ; /* RAM4 */
__top_BOARD_SDRAM = 0x80000000 + 0x2000000 ; /* 32M bytes */
__top_RAM4 = 0x80000000 + 0x2000000 ; /* 32M bytes */

ENTRY(ResetISR)

SECTIONS
{
    /* MAIN TEXT SECTION */
    .text : ALIGN(4)
    {
        FILL(0xff)
        __vectors_start__ = ABSOLUTE(.) ;
        KEEP(*(.isr_vector))
        /* Global Section Table */
        . = ALIGN(4) ;
        __section_table_start = .;
        __data_section_table = .;
        LONG(LOADADDR(.data)); LONG( ADDR(.data)); LONG( SIZEOF(.data));
        LONG(LOADADDR(.data_RAM2)); LONG( ADDR(.data_RAM2)); LONG( SIZEOF(.data_RAM2));
        LONG(LOADADDR(.data_RAM3)); LONG( ADDR(.data_RAM3)); LONG( SIZEOF(.data_RAM3));
        LONG(LOADADDR(.data_RAM4)); LONG( ADDR(.data_RAM4)); LONG( SIZEOF(.data_RAM4));
        __data_section_table_end = .;
        __bss_section_table = .;
        LONG( ADDR(.bss)); LONG( SIZEOF(.bss));
        LONG( ADDR(.bss_RAM2)); LONG( SIZEOF(.bss_RAM2));
        LONG( ADDR(.bss_RAM3)); LONG( SIZEOF(.bss_RAM3));
        LONG( ADDR(.bss_RAM4)); LONG( SIZEOF(.bss_RAM4));
        __bss_section_table_end = .;
        __section_table_end = . ;
        /* End of Global Section Table */
        *(.after_vectors*)
    } > SRAM_OC

    .text : ALIGN(4)
    {
        *(.text*)
        *(.rodata .rodata.* .constdata .constdata.*)
        . = ALIGN(4);
    } > SRAM_OC

    /*
     * for exception handling/unwind - some Newlib functions (in common
     * with C++ and STDC++) use this.
     */
    .ARM.extab : ALIGN(4)
    {
        *(.ARM.extab* .gnu.linkonce.armextab.*)
    } > SRAM_OC

    __exidx_start = .;
    .ARM.exidx : ALIGN(4)
    {
        *(.ARM.exidx* .gnu.linkonce.armexidx.*)
    } > SRAM_OC
    __exidx_end = .;

    _etext = .;

    /* DATA section for SRAM_ITC */
    .data_RAM2 : ALIGN(4)
    {
        FILL(0xff)
        PROVIDE(__start_data_RAM2 = .) ;
        *(.ramfunc.$RAM2)
        *(.ramfunc.$SRAM_ITC)
        *(.data.$RAM2*)
        *(.data.$SRAM_ITC*)
        . = ALIGN(4) ;
        PROVIDE(__end_data_RAM2 = .) ;
    } > SRAM_ITC AT>SRAM_OC

    /* DATA section for SRAM_DTC */
    .data_RAM3 : ALIGN(4)
    {
        FILL(0xff)
        PROVIDE(__start_data_RAM3 = .) ;
        *(.ramfunc.$RAM3)
        *(.ramfunc.$SRAM_DTC)
        *(.data.$RAM3*)
        *(.data.$SRAM_DTC*)
        . = ALIGN(4) ;
        PROVIDE(__end_data_RAM3 = .) ;
    } > SRAM_DTC AT>SRAM_OC

    /* DATA section for BOARD_SDRAM */
    .data_RAM4 : ALIGN(4)
    {
        FILL(0xff)
        PROVIDE(__start_data_RAM4 = .) ;
        *(.ramfunc.$RAM4)
        *(.ramfunc.$BOARD_SDRAM)
        *(.data.$RAM4*)
        *(.data.$BOARD_SDRAM*)
        . = ALIGN(4) ;
        PROVIDE(__end_data_RAM4 = .) ;
    } > BOARD_SDRAM AT>SRAM_OC

    /* MAIN DATA SECTION */
    .uninit_RESERVED : ALIGN(4)
    {
        KEEP(*(.bss.$RESERVED*))
        . = ALIGN(4) ;
        _end_uninit_RESERVED = .;
    } > SRAM_OC

    /* Main DATA section (SRAM_OC) */
    .data : ALIGN(4)
    {
        FILL(0xff)
        _data = . ;
        *(vtable)
        *(.ramfunc*)
        *(.data*)
        . = ALIGN(4) ;
        _edata = . ;
    } > SRAM_OC AT>SRAM_OC

    /* BSS section for SRAM_ITC */
    .bss_RAM2 : ALIGN(4)
    {
        PROVIDE(__start_bss_RAM2 = .) ;
        *(.bss.$RAM2*)
        *(.bss.$SRAM_ITC*)
        . = ALIGN (. != 0 ? 4 : 1) ; /* avoid empty segment */
        PROVIDE(__end_bss_RAM2 = .) ;
    } > SRAM_ITC

    /* BSS section for SRAM_DTC */
    .bss_RAM3 : ALIGN(4)
    {
        PROVIDE(__start_bss_RAM3 = .) ;
        *(.bss.$RAM3*)
        *(.bss.$SRAM_DTC*)
        . = ALIGN (. != 0 ? 4 : 1) ; /* avoid empty segment */
        PROVIDE(__end_bss_RAM3 = .) ;
    } > SRAM_DTC

    /* BSS section for BOARD_SDRAM */
    .bss_RAM4 : ALIGN(4)
    {
        PROVIDE(__start_bss_RAM4 = .) ;
        *(.bss.$RAM4*)
        *(.bss.$BOARD_SDRAM*)
        . = ALIGN (. != 0 ? 4 : 1) ; /* avoid empty segment */
        PROVIDE(__end_bss_RAM4 = .) ;
    } > BOARD_SDRAM

    /* MAIN BSS SECTION */
    .bss : ALIGN(4)
    {
        _bss = .;
        *(.bss*)
        *(COMMON)
        . = ALIGN(4) ;
        _ebss = .;
        PROVIDE(end = .);
    } > SRAM_OC

    /* NOINIT section for SRAM_ITC */
    .noinit_RAM2 (NOLOAD) : ALIGN(4)
    {
        *(.noinit.$RAM2*)
        *(.noinit.$SRAM_ITC*)
        . = ALIGN(4) ;
    } > SRAM_ITC

    /* NOINIT section for SRAM_DTC */
    .noinit_RAM3 (NOLOAD) : ALIGN(4)
    {
        *(.noinit.$RAM3*)
        *(.noinit.$SRAM_DTC*)
        . = ALIGN(4) ;
    } > SRAM_DTC

    /* NOINIT section for BOARD_SDRAM */
    .noinit_RAM4 (NOLOAD) : ALIGN(4)
    {
        *(.noinit.$RAM4*)
        *(.noinit.$BOARD_SDRAM*)
        . = ALIGN(4) ;
    } > BOARD_SDRAM

    /* DEFAULT NOINIT SECTION */
    .noinit (NOLOAD): ALIGN(4)
    {
        _noinit = .;
        *(.noinit*)
        . = ALIGN(4) ;
        _end_noinit = .;
    } > SRAM_OC

    /* Reserve and place Heap within memory map */
    _HeapSize = 0x1000;
    .heap : ALIGN(4)
    {
        _pvHeapStart = .;
        . += _HeapSize;
        . = ALIGN(4);
        _pvHeapLimit = .;
    } > SRAM_OC

    _StackSize = 0x1000;
    /* Reserve space in memory for Stack */
    .heap2stackfill :
    {
        . += _StackSize;
    } > SRAM_OC

    /* Locate actual Stack in memory map */
    .stack ORIGIN(SRAM_OC) + LENGTH(SRAM_OC) - _StackSize - 0: ALIGN(4)
    {
        _vStackBase = .;
        . = ALIGN(4);
        _vStackTop = . + _StackSize;
    } > SRAM_OC

    /* Provide basic symbols giving location and size of main text
     * block, including initial values of RW data sections. Note that
     * these will need extending to give a complete picture with
     * complex images (e.g multiple Flash banks).
     */
    _image_start = LOADADDR(.text);
    _image_end = LOADADDR(.data) + SIZEOF(.data);
    _image_size = _image_end - _image_start;
}
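Both linker files emit a ‘Global Section Table’ with LOADADDR/ADDR/SIZEOF triples for the data sections and ADDR/SIZEOF pairs for the bss sections. The following is a host-side sketch of how startup code could walk such a table. It is my own simplified model and not the SDK startup code: the types `data_entry_t` and `bss_entry_t` and the function `demo_init_sections()` are hypothetical, and real startup code works on raw 32-bit addresses instead of pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors one LONG(LOADADDR) / LONG(ADDR) / LONG(SIZEOF) triple. */
typedef struct {
    const void *load_addr; /* where the initial values are stored */
    void *run_addr;        /* where the program accesses the data */
    size_t size;
} data_entry_t;

/* Mirrors one LONG(ADDR) / LONG(SIZEOF) pair. */
typedef struct {
    void *run_addr;
    size_t size;
} bss_entry_t;

/* Copies initialized data to its run address and zeroes the bss areas. */
void demo_init_sections(const data_entry_t *data, size_t n_data,
                        const bss_entry_t *bss, size_t n_bss)
{
    for (size_t i = 0; i < n_data; i++) {
        /* If load and run address are the same (as with
         * "> SRAM_OC AT>SRAM_OC"), the data is already in place.
         * Skipping the copy here is a choice of this sketch. */
        if (data[i].load_addr != data[i].run_addr) {
            memcpy(data[i].run_addr, data[i].load_addr, data[i].size);
        }
    }
    for (size_t i = 0; i < n_bss; i++) {
        memset(bss[i].run_addr, 0, bss[i].size);
    }
}
```

In the XiP linker file the load addresses are in FLASH and the run addresses in RAM, so the copy is needed. In the RAM linker file the main `.data` section has identical load and run addresses, so it occupies the memory only once.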
Mixing RAM and FLASH
Running code from RAM has a performance benefit. So it makes sense to keep the less time-critical parts in normal XiP FLASH and to run code from RAM where higher performance is needed.
To place a function into RAM, I add an attribute with the desired section name:
static void __attribute__((section (".ramfunc"))) blinkRAM(void) {
  if (g_pinSet) {
    GPIO_PinWrite(EXAMPLE_LED_GPIO, EXAMPLE_LED_GPIO_PIN, 0U);
    g_pinSet = false;
  } else {
    GPIO_PinWrite(EXAMPLE_LED_GPIO, EXAMPLE_LED_GPIO_PIN, 1U);
    g_pinSet = true;
  }
}
With this, the function is stored in FLASH and copied to RAM during startup, because the XiP linker file above collects .ramfunc* in the .data section, which runs in RAM but is loaded from FLASH. One thing to note is that, depending on the call distance, a veneer function might be used to reach the RAM address where the function is placed. More details on this are in Execute-Only Code with GNU and gcc, which covers the topic from a different angle.
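If several functions need to go into RAM, the attribute can be wrapped in a macro. The following is a minimal sketch, assuming GCC for ARM on the target. `DEMO_RAMFUNC`, `demo_pin_level` and `demo_toggle()` are hypothetical names for this example. The `noinline` attribute is my addition to keep the compiler from inlining the body into a caller located in FLASH. On other hosts the macro expands to nothing, so the logic can still be compiled and tested.

```c
#include <assert.h>
#include <stdbool.h>

/* Place a function into the .ramfunc input section on the ARM target.
 * On any other compiler/target the macro is empty (host build). */
#if defined(__GNUC__) && defined(__arm__)
#define DEMO_RAMFUNC __attribute__((section(".ramfunc"), noinline))
#else
#define DEMO_RAMFUNC
#endif

/* Stand-in for the LED pin state, instead of a real GPIO access. */
static bool demo_pin_level = false;

/* Toggles the stand-in pin state and returns the new level. */
DEMO_RAMFUNC bool demo_toggle(void)
{
    demo_pin_level = !demo_pin_level;
    return demo_pin_level;
}
```

Whether the function really ended up in RAM is best checked in the linker map file or with the debugger, by looking at the address of the function.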
Summary
It is possible to place code and data either in FLASH or in RAM. All that is needed is the correct linker file. The placement is controlled by the linker file, and with __attribute__ individual parts of the application can be placed in FLASH or RAM. If you are not familiar with the GNU linker file syntax, I recommend you start with the MCUXpresso IDE because it provides projects with working linker files.
Happy XiPing 🙂
Links
- Tutorial: MCUXpresso SDK with Linux, Part 1: Installation and Build with Make
- Tutorial: MCUXpresso SDK with Linux, Part 2: Commandline Debugging with GDB
- NXP MCUXpresso SDK web site: https://mcuxpresso.nxp.com
- First Steps with the NXP i.MX RT1064-EVK Board
- Regaining Debug Access to NXP i.MX RT1064-EVK executing WFI
- Tutorial: Booting the NXP i.MX RT from Micro SD Card
- NXP i.MX RT1064 Board: https://www.nxp.com/support/developer-resources/evaluation-and-development-boards/sabre-development-system/mimxrt1064-evk-i.mx-rt1064-evaluation-kit:MIMXRT1064-EVK
Thanks Erich as always!
Perhaps you could summarize difference between 3 types of SRAM?
There is an application note on that subject: https://www.nxp.com/docs/en/application-note/AN12077.pdf
The ITCM and DTCM run at the same frequency as the ARM core, so should provide better performance for I (Instructions) and D (Data), while the OC RAM runs at 1/4 core frequency.
When running from RAM, SRAM_OC is duplicated at SRAM_OC for the initialization data:
} > SRAM_OC AT>SRAM_OC
Does this consume twice the space? Is there any way to avoid that?
Do you have an example of the RAM version with the headers to get it to load from flash (load to RAM)?
The SRAM_OC on the left is the virtual address, while the SRAM_OC on the right is the load memory address. It does not use the space twice.