Optimizing CI/CD with RAM Target Applications

Usually, I run applications from the micro-controller FLASH memory. But for a CI/CD or testing environment, that is not the best choice.

It is possible to have a ‘RAM target’, where the application runs in RAM instead of FLASH memory. This has the advantage of not ‘wearing out’ the FLASH memory. Plus, loading into and running from RAM is faster. This makes RAM targets especially useful for testing.

In this article I’m using the NXP LPC55S16-EVK board, but the approach applies to any other target or board.

NXP LPC55S16-EVK

RAM Targets

RAM targets have to link the code and everything else into RAM. I have an example on GitHub showing how to do this. The example uses the LPC55S16 MCU on the LPC55S16-EVK board:

LPC55S16 MCU

The memory map of the LPC55S16 looks like this:

MEMORY
{
  /* Define each memory region */
  PROGRAM_FLASH (rx) : ORIGIN = 0x0,        LENGTH = 0x3d000 /* 244K bytes (alias Flash) */
  SRAM (rwx)         : ORIGIN = 0x20000000, LENGTH = 0x10000 /* 64K bytes (alias RAM) */
  USB_RAM (rwx)      : ORIGIN = 0x20010000, LENGTH = 0x4000  /* 16K bytes (alias RAM2) */
  SRAMX (rwx)        : ORIGIN = 0x4000000,  LENGTH = 0x4000  /* 16K bytes (alias RAM3) */
}

In my linker file I place, for example, all the code into the SRAM region:

.text : ALIGN(4)
{
    FILL(0xff)
    __vectors_start__ = ABSOLUTE(.) ;
    KEEP(*(.isr_vector))
    /* Global Section Table */
    . = ALIGN(4) ;
    __section_table_start = .;
    __data_section_table = .;
    LONG(LOADADDR(.data));
    LONG(    ADDR(.data));
    LONG(  SIZEOF(.data));
    LONG(LOADADDR(.data_RAM2));
    LONG(    ADDR(.data_RAM2));
    LONG(  SIZEOF(.data_RAM2));
    LONG(LOADADDR(.data_RAM3));
    LONG(    ADDR(.data_RAM3));
    LONG(  SIZEOF(.data_RAM3));
    __data_section_table_end = .;
    __bss_section_table = .;
    LONG(    ADDR(.bss));
    LONG(  SIZEOF(.bss));
    LONG(    ADDR(.bss_RAM2));
    LONG(  SIZEOF(.bss_RAM2));
    LONG(    ADDR(.bss_RAM3));
    LONG(  SIZEOF(.bss_RAM3));
    __bss_section_table_end = .;
    __section_table_end = . ;
    /* End of Global Section Table */

    *(.after_vectors*)

    *(.text*)
    *(.rodata .rodata.* .constdata .constdata.*)
    . = ALIGN(4);
} > SRAM
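The global section table built above is consumed by the startup code: for each `.data` entry, it copies the initialized data from the load address (LMA) to the run address (VMA). A minimal sketch of that copy loop, with illustrative names and `uintptr_t` entries so it builds on a host too (in the real table the entries are 32-bit `LONG` words):

```c
#include <stdint.h>
#include <string.h>

/* Each .data entry in the section table holds three words:
   load address (LMA), run address (VMA), and size in bytes. */
static void copy_data_sections(const uintptr_t *entry,
                               const uintptr_t *entry_end) {
    while (entry < entry_end) {
        const uint8_t *load = (const uint8_t *)entry[0]; /* LOADADDR(.data) */
        uint8_t *run        = (uint8_t *)entry[1];       /* ADDR(.data)     */
        size_t size         = (size_t)entry[2];          /* SIZEOF(.data)   */
        memcpy(run, load, size);
        entry += 3;                                      /* next table entry */
    }
}
```

Note that for a pure RAM target, LOADADDR and ADDR of a section are typically identical, so these copies degenerate to no-ops; the table mechanism stays in place so the same startup code works for FLASH and RAM builds.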

Because the vector table is in RAM too, I have to point the VTOR (Vector Table Offset Register) to it. This is done at the beginning of the reset vector or startup code:

__attribute__ ((naked, section(".after_vectors.reset")))
void ResetISR(void) {
    // Disable interrupts
    __asm volatile ("cpsid i");

    // Config VTOR & MSPLIM register
    __asm volatile ("LDR R0, =0xE000ED08  \n"
                    "STR %0, [R0]         \n"
                    "LDR R1, [%0]         \n"
                    "MSR MSP, R1          \n"
                    "MSR MSPLIM, %1       \n"
                    :
                    : "r"(g_pfnVectors), "r"(_vStackBase)
                    : "r0", "r1");
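In other words: the assembly writes the address of the RAM vector table into VTOR (0xE000ED08), reloads MSP from the first table entry, and sets the ARMv8-M stack limit. The table layout this relies on can be sketched in plain C (illustrative helper names, not part of the project):

```c
#include <stdint.h>

/* The first two words of a Cortex-M vector table: the initial main
   stack pointer, then the address of the reset handler. */
typedef struct {
    uint32_t initial_msp;    /* what "LDR R1, [%0]" above reads into MSP */
    uint32_t reset_handler;  /* entry point (with the Thumb bit set) */
} vector_table_head_t;

/* The stack pointer value the startup code loads into MSP from a
   vector table placed at 'table'. */
static uint32_t initial_sp_from_table(const vector_table_head_t *table) {
    return table->initial_msp;
}
```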

Loading into RAM

To load the application into the target RAM, an easy way is to use the LinkServer ‘run’ command. For example:

LinkServer run LPC55S16:LPCXpresso55S16 LPC55S16_Blinky_RAM.elf

Or for a binary file:

LinkServer run LPC55S16:LPCXpresso55S16 LPC55S16_Blinky_RAM.bin -a 0x20000000

The -a option tells LinkServer at which address to load the binary file.

With LinkServer run, it loads and runs the application, and LinkServer waits until the application terminates with a *STOP* message (see On-Target Testing with LinkServer Runner and VS Code).
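On the target side, the test application emits that *STOP* marker when it is done. A minimal sketch of such a reporting function, assuming the marker goes out over the debug console as in the linked article (the function name is illustrative; only the printed string matters to the runner):

```c
#include <stdio.h>

/* Print the test verdict followed by the *STOP* marker that
   'LinkServer run' waits for before detaching. Returns the
   failure count so the caller can park the CPU afterwards. */
static int report_and_stop(int failures) {
    printf("tests done, %d failure(s)\n", failures);
    printf("*STOP*\n");   /* termination marker for the LinkServer runner */
    return failures;
}
```

On the target this would be called at the end of main(), followed by an endless loop or a breakpoint.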

Starting with LinkServer v25.6.13 there is a timeout option:

--exit-timeout INTEGER    Timeout (seconds) to wait for application termination

With this, I can simply use

LinkServer run LPC55S16:LPCXpresso55S16 LPC55S16_Blinky_RAM.elf --exit-timeout 0

or for a binary file:

LinkServer run LPC55S16:LPCXpresso55S16 LPC55S16_Blinky_RAM.bin -a 0x20000000 --exit-timeout 0

With a timeout of zero, LinkServer loads the application, starts it, and exits immediately without waiting for termination.

Summary

RAM targets are very useful: I can load an application faster, because no time-consuming flash programming is needed. And because the FLASH is not programmed, flash wear-out is not a topic. This is a consideration in test farms where the application gets loaded many times. Of course, the downside is that the amount of RAM on embedded targets is limited. But running unit tests in a CI/CD environment doesn’t need the full application code, only the unit to be tested. And with the above LinkServer commands, loading and running in RAM is very easy.

Happy RAMing 🙂

Links

9 thoughts on “Optimizing CI/CD with RAM Target Applications”

  1. Yes, running unit-tests in RAM is really useful.

    Even more useful is running portable unit-tests in RAM under QEMU. (By portable I mean without dependencies on specific hardware.) For example, I run my unit-tests on Cortex-M0/M3/M4F/M7F/A15/A72/RV32IMAC/RV64IMACFD via QEMU.

    These configurations also allow running the tests via GitHub Actions.


    • Hi Liviu,
      are you using QEMU as a full-chip simulator, with all the peripherals simulated? Running the core works well, but the lack of peripheral support from the silicon vendors is a severe limitation to me. If I cannot have peripheral simulation, I would rather run the code natively on the host. Thoughts?


      • I know that this is hard to believe, because I’ve been there, but in a good design you split the code into multiple libraries so you can separate the portable parts from the specific parts. Therefore you can unit test the portable parts on any platform, including on your native development platform (Windows/Mac/Linux). Actually, unit tests are mainly native applications that start with main() and return an exit code, possibly reading files for input stimulus and writing files for results; on Arm/RISC-V they run via semihosting.

        On QEMU I use only the core, basically SysTick for Cortex-M or a few other registers for RISC-V.

        For some tests it is necessary to use mocks to replace the library dependencies, and this is how you manage the hardware dependencies too. For simple tests you can inject the input data directly from the code; for more complicated tests you get the data from external files. It may seem complicated, but it isn’t. It has the big advantage that it makes the dependency graph clearer.

        Of course, for the application’s final integration tests you need the real hardware, but this is a different story, and if you have good unit tests for all dependent libraries, you have fewer surprises during the integration phase.


        • Hi Liviu,
          it is not hard to believe. I’m doing exactly this: separating the application part from the low-level, hardware-dependent parts. This allows me to run the tests and application on the host (Linux, in a Docker container, to be exact). Exactly what you describe. The question I have is: if QEMU only provides the core and the SysTick, why should I use it? Compared to running things natively on the host, it is slower and more complex, plus an additional setup. And for the on-target and hardware-dependent tests, I need the embedded hardware anyway. Yes, with QEMU I could have some coverage of SysTick plus basic instruction set tests, but that’s what I get anyway, and much more easily, with a single on-target test. I see some value in QEMU, but it is mostly academic. It would dramatically change the picture if silicon vendors would provide FCS (Full Chip Simulation) models of their devices for QEMU. What I see is some pieces, but mostly community-provided. I guess the silicon vendors are not supporting it, or are very reluctant, because of its GPL2 license?


        • | if QEMU only provides the core and the SysTick, why should I use it?

          Well, probably some may consider this absurd, but I prefer to run the tests on all those architectures mentioned before, in addition to the native platform, to guarantee that my code is really portable. Not all compilers on all architectures behave absolutely the same, and sometimes there are small differences in warnings, or even worse, in undefined behaviours. So, to have a good sleep at night, I prefer to test my code on all architectures, with as many compilers and compiler versions as I can. With the xPack binary tools this is quite easy, and does not require any heavy solutions like Docker. For example, the current test set includes 216 different combinations: debug/release, cmake/meson, gcc/clang, native/cortex/riscv, etc.


          But you will only know that it is portable with the abstraction, and only up to the ‘SysTick level’, not whether it would really be portable and run on the hardware. I see your point if you want to run the high-level part on, say, RISC-V and on Arm Cortex-M. But QEMU would be, imho, of very limited value if you need to test your system for a given hardware, and you have an abstraction layer working on the host (e.g. Linux) anyway.


        • If you build and run your unit tests only in the Docker container, you’ll only prove that your code compiles and runs on Linux. If you were able to do this on Linux, your tests definitely don’t need any specific hardware. Therefore, if you additionally build exactly the same tests with various Cortex-M and RISC-V compilers and run them via QEMU, you’ll also know that your code compiles and runs on those architectures. The SysTick is not relevant here; I use it only to run unit tests for RTOS-based applications, but truly portable libraries don’t need it at all. Testing for particular hardware is part of integration tests—you cannot run them in your Docker container. Therefore, for portable tests, QEMU is a great addition.


        • I build on Docker (Linux), but then I run the application either on that host (Linux) or with a test runner on the embedded target as well. So it runs both on the host and on-target that way.


        • Sure, running the test on your specific embedded target is the ultimate test, but this requires your target board to be attached to the CI/CD server, so you cannot use public test runners (like GitHub Actions).

          For portable libraries there is no specific target; they are expected to run on (almost) any Cortex/RISC-V board. It is not realistic to keep a large set of boards attached to my Mac to run the tests. Therefore, I selected the existing QEMU Cortex/RISC-V emulated boards and configured builds for all of them. They run exactly the same code as the native (Windows/Linux/macOS) tests, compiled for each architecture, so there is no need for specific hardware. This solution also has the advantage that I can run the CI tests automatically on each push via GitHub Actions, not on my local machine.

          So, for specific use cases, running the tests locally may be enough. For more complex use cases involving portable libraries, running in RAM via QEMU may be a useful solution.

