Avoiding Stack Overflows: Application Monitoring the Stack Usage

One of the biggest fears of embedded systems developers is a stack overflow. FreeRTOS includes a cool feature to monitor and catch task stack overflows. But what about the MSP (Main Stack Pointer) on ARM, or the interrupt stack? And what if you are not using an RTOS and run a bare-metal application?

There is a simple way to monitor stack usage at runtime, and in this article I want to share the routines for it, which are now available inside the McuArm module.

On the ARM Cortex-M (e.g. 0, 3, 4 or 7) architecture, there are two stack pointers:

MSP: Main Stack Pointer. It is active after reset and is used for the startup code, main() and interrupts.
PSP: Process Stack Pointer. It is used, for example, by an RTOS like FreeRTOS as the stack pointer of the current thread or process. The image below shows the MSP and PSP registers for a bare-metal application, where the PSP is not used.

That architecture is actually very nice: the tasks or processes do not need to allocate extra space for the interrupt stacks. But as with every stack, it is a problem if the allocated space is not enough for the application. The Cortex-M33 architecture features dedicated stack limit registers for overflow protection. Some cores (e.g. M4) feature a memory protection unit (MPU), but its regions are coarse: on ARMv7-M each region must be a power of two in size and aligned to that size, which makes it awkward to use as a stack guard.

This article shows how a simple approach, a predefined pattern written to the stack, can be used to monitor stack usage. For other approaches, check out the links section at the end of this article.

Heap and Stack usage

The MCUXpresso IDE has a nice view for the current stack usage. However, that view only shows the status with the current MSP, and does *not* consider any stack used previously which could be much higher (e.g. during interrupt execution):

Image Information

Another useful source of information is the ‘Image Information’ view. Based on gcc compiler information it can estimate the stack usage. However, it misses things like recursion or standard library function calls, or any code for which there is no debug information:

FreeRTOS

FreeRTOS has built-in support to monitor and trap on stack overflows, and the usage is nicely shown too:

But this only covers the tasks, not the MSP or interrupt stack.

Stack Fill Pattern

A very generic way to determine any stack usage is:

Fill the stack with a predefined pattern
Check how much of that pattern is still present (not overwritten)
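
The two steps above can be tried out on a host PC before touching any target code. The following is a minimal sketch, not the McuArm implementation: the names `region_fill`, `region_unused_words` and `region_simulate_use` are made up for this illustration, and a plain array stands in for the stack memory. It assumes a descending stack, so usage is simulated from the highest index downwards.

```c
#include <stddef.h>
#include <stdint.h>

#define FILL_PATTERN 0xdeadbeefu /* same value as the default pattern used later in this article */

/* Step 1: mark every word of the region as 'unused'. */
static void region_fill(uint32_t *base, size_t words) {
  for (size_t i = 0; i < words; i++) {
    base[i] = FILL_PATTERN;
  }
}

/* Step 2: count the words at the low end which still hold the pattern. */
static size_t region_unused_words(const uint32_t *base, size_t words) {
  size_t n = 0;

  while (n < words && base[n] == FILL_PATTERN) {
    n++;
  }
  return n;
}

/* Simulate a descending stack: 'used' words get overwritten from the high end down. */
static void region_simulate_use(uint32_t *base, size_t words, size_t used) {
  for (size_t i = 0; i < used && i < words; i++) {
    base[words - 1 - i] = (uint32_t)i; /* any value different from the pattern */
  }
}
```

With a 64-word region, filling it and then simulating 10 used words leaves 54 words reported as unused. The scan starts at the low end on purpose: it finds the deepest point the stack ever reached, not where the stack pointer currently is.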

I have used that approach ad hoc, e.g. with the GDB debugger. But when I recently benchmarked the stack usage of different standard libraries, this was not efficient. So I decided to add a few routines to the McuArm module to make it easier, so that everyone else can take advantage of them.

McuArm Implementation

The module has a few configuration macros which can be set or overwritten by the project settings.

#ifndef McuArmTools_CONFIG_STACK_CHECK_PATTERN
  #define McuArmTools_CONFIG_STACK_CHECK_PATTERN  (0xdeadbeef)
    /*!< 32-bit pattern written to the stack, to mark a word as 'unused' */
#endif

/* The two symbols below shall be set by the linker script file to mark top and bottom of stack. Note that the two addresses need to be 32bit aligned! */
#ifndef McuArmTools_CONFIG_LINKER_SYMBOL_STACK_TOP
  #define McuArmTools_CONFIG_LINKER_SYMBOL_STACK_TOP  _vStackTop
#endif

#ifndef McuArmTools_CONFIG_LINKER_SYMBOL_STACK_BASE
  #define McuArmTools_CONFIG_LINKER_SYMBOL_STACK_BASE _vStackBase
#endif

/* on ARM Cortex, the stack grows from 'top' (higher address) to the 'bottom' (lower address) */
extern uint32_t McuArmTools_CONFIG_LINKER_SYMBOL_STACK_BASE; /*!< base address of stack, this is a numerically lower address than the top */
extern uint32_t McuArmTools_CONFIG_LINKER_SYMBOL_STACK_TOP;  /*!< top of stack, the highest address. The stack grows from the top down to the base */

That way a different pattern or different linker symbols can be configured.
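
Why would one want a different pattern? The value has to be something the application is unlikely to store on the stack by itself. The sketch below illustrates this on a host PC under my own assumptions: `measured_unused_words` and `REGION_WORDS` are hypothetical names, an array stands in for the stack, and the "application" is assumed to write only zeros (as zero-initialized locals would).

```c
#include <stddef.h>
#include <stdint.h>

#define REGION_WORDS 32u /* hypothetical size of the simulated stack region */

/* Fill a simulated stack with 'pattern', overwrite the top 'used' words
 * with zeros, and return how many words the pattern scan from the low end
 * still reports as unused. */
static size_t measured_unused_words(uint32_t pattern, size_t used) {
  uint32_t region[REGION_WORDS];
  size_t i, unused = 0;

  for (i = 0; i < REGION_WORDS; i++) {
    region[i] = pattern;
  }
  for (i = 0; i < used && i < REGION_WORDS; i++) {
    region[REGION_WORDS - 1u - i] = 0u; /* descending stack: highest words first */
  }
  while (unused < REGION_WORDS && region[unused] == pattern) {
    unused++;
  }
  return unused;
}
```

With 0xdeadbeef as pattern and 8 used words, the scan reports 24 unused words. With 0 as pattern it reports all 32 words as unused, because the zeros written by the application cannot be told apart from the fill value, so the usage is underestimated.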

Next, there are functions to get the current stack pointer, linker allocated stack size and the linker symbols:

/*!
 * \brief Return the current value of the main stack pointer (MSP).
 * \return Value of the MSP register, or NULL if the compiler is not supported
 */
void *McuArmTools_GetSP(void) {
#ifdef __GNUC__
  void *sp;

  __asm__ __volatile__ ("mrs %0, msp" : "=r"(sp));
  return sp;
#else
  #warning "only for GCC"
  return NULL;
#endif
}

/*!
 * \brief Return the stack base (bottom), as configured in the linker file. The stack grows from the top (higher address) to the base (lower address).
 * \return Address of the base (lowest address) of the stack
 */
uint32_t *McuArmTools_GetLinkerMainStackBase(void) {
  return &McuArmTools_CONFIG_LINKER_SYMBOL_STACK_BASE;
}

/*!
 * \brief Return the stack top, as set in the linker file. The stack grows from the top (higher address) to the base (lower address).
 * \return Address of the top (highest address) of the stack
 */
 */
uint32_t *McuArmTools_GetLinkerMainStackTop(void) {
  return &McuArmTools_CONFIG_LINKER_SYMBOL_STACK_TOP;
}

/*!
 * \brief Returns the size of the main (MSP) stack size, using linker symbols for top (higher address) and base (lower address).
 * \return Number of bytes allocated by the linker for the stack
 */
uint32_t McuArmTools_GetLinkerMainStackSize(void) {
  return (uint32_t)&McuArmTools_CONFIG_LINKER_SYMBOL_STACK_TOP - (uint32_t)&McuArmTools_CONFIG_LINKER_SYMBOL_STACK_BASE;
}
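
With the stack pointer, base and top at hand, the instantaneous usage is simple pointer arithmetic. The following is a small sketch with hypothetical names (`stack_snapshot_t`, `snapshot_in_use`, `snapshot_headroom`, `snapshot_sp_in_range`), written so it can be compiled on a host with made-up addresses. On the target, the three fields would presumably be filled from McuArmTools_GetLinkerMainStackBase(), McuArmTools_GetLinkerMainStackTop() and McuArmTools_GetSP().

```c
#include <stdint.h>

/* Snapshot of a descending stack: base is the lowest address, top the highest. */
typedef struct {
  uintptr_t base; /* lowest address of the stack area */
  uintptr_t top;  /* highest address, initial stack pointer value */
  uintptr_t sp;   /* current stack pointer */
} stack_snapshot_t;

/* Bytes between top and SP: what is in use at this instant. */
static uint32_t snapshot_in_use(const stack_snapshot_t *s) {
  return (uint32_t)(s->top - s->sp);
}

/* Bytes between SP and base: the headroom left at this instant. */
static uint32_t snapshot_headroom(const stack_snapshot_t *s) {
  return (uint32_t)(s->sp - s->base);
}

/* Non-zero if SP is inside [base, top]. An SP below base means the overflow already happened. */
static int snapshot_sp_in_range(const stack_snapshot_t *s) {
  return s->sp >= s->base && s->sp <= s->top;
}
```

Note that this only describes the moment the snapshot was taken, the same limitation as the IDE view mentioned earlier. It says nothing about how deep the stack went before, which is what the fill pattern is for.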

To fill the stack with the pre-defined pattern, the following function is used:

/*!
 * \brief Fill the stack space with the checking pattern, up to the current MSP.
 */
void McuArmTools_FillMainStackSpace(void) {
  uint32_t *base = (uint32_t*)&McuArmTools_CONFIG_LINKER_SYMBOL_STACK_BASE;
  uint32_t *msp = McuArmTools_GetSP(); /* get current MSP stack pointer */
  /* the current MSP is near the top */
  while(base<msp) { /* fill from the base to the top */
    *base = McuArmTools_CONFIG_STACK_CHECK_PATTERN;
    base++;
  }
}

Finally, two functions to get the size of used and unused stack space:

/*!
 * \brief Calculates the unused stack space, based on the checking pattern.
 * \return Number of unused main stack bytes.
 */
uint32_t McuArmTools_GetUnusedMainStackSpace(void) {
  uint32_t unused = 0; /* number of unused bytes */
  uint32_t *p = (uint32_t*)&McuArmTools_CONFIG_LINKER_SYMBOL_STACK_BASE;
  uint32_t *top = (uint32_t*)&McuArmTools_CONFIG_LINKER_SYMBOL_STACK_TOP;

  /* count pattern words from the base upwards: stop at the first overwritten word, or at the top of the stack */
  while (p<top && *p==McuArmTools_CONFIG_STACK_CHECK_PATTERN) {
    unused += sizeof(uint32_t); /* count number of unused bytes */
    p++;
  }
  return unused; /* return the number of unused bytes */
}

/*!
 * \brief Returns the used main stack space, based on the overwritten checking pattern.
 * \return Number of used main stack bytes
 */
uint32_t McuArmTools_GetUsedMainStackSpace(void) {
  return McuArmTools_GetLinkerMainStackSize()-McuArmTools_GetUnusedMainStackSpace();
}
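
The two numbers can also feed a periodic health check instead of a one-time printout. Below is a minimal sketch of such a check. The names `stack_status_t` and `stack_classify` and the idea of a minimum free percentage are my assumptions for illustration and not part of the McuArm module. On the target, the arguments would come from McuArmTools_GetLinkerMainStackSize() and McuArmTools_GetUnusedMainStackSpace().

```c
#include <stdint.h>

typedef enum {
  STACK_OK,        /* enough untouched stack left */
  STACK_LOW,       /* untouched space below the requested margin */
  STACK_EXHAUSTED  /* pattern at the base is gone: stack reached or passed the base */
} stack_status_t;

/* Classify the stack state from its total size and the untouched bytes.
 * min_free_percent is the margin (0..100) which shall remain untouched. */
static stack_status_t stack_classify(uint32_t size_bytes, uint32_t unused_bytes,
                                     uint32_t min_free_percent) {
  if (unused_bytes == 0u) {
    return STACK_EXHAUSTED;
  }
  /* compare unused/size against the percentage without a division,
   * using 64-bit math so the multiplication cannot overflow */
  if ((uint64_t)unused_bytes * 100u < (uint64_t)size_bytes * min_free_percent) {
    return STACK_LOW;
  }
  return STACK_OK;
}
```

For a 4096 byte stack and a 20% margin, 1024 untouched bytes count as OK and 512 as low. A result of zero unused bytes deserves special attention: the very first word at the base has been overwritten, so the memory below the stack may already be corrupted.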

Usage

Usage is very simple: first fill the stack with the pattern, and later on calculate the used (or free) space:

int main(void) {
  McuArmTools_FillMainStackSpace();

  BOARD_InitBootPins();
  BOARD_InitBootClocks();

  stdlib_test();

  printf("stack size used: %lu\n", (unsigned long)McuArmTools_GetUsedMainStackSpace());

  for(;;) {
    __asm volatile ("nop");
  }
  return 0 ;
}

This can be verified in the memory view too:

Memory View with used and untouched stack space

Summary

There are many different ways to estimate or calculate the needed stack space: from static analysis to trial-and-error up to checking it with a stack memory pattern. Using a stack memory pattern is very simple and does not require any dedicated hardware, yet it is very useful. Just keep in mind that it only covers what you have actually executed (or tested), so a static worst-case analysis is still required for safety-critical applications.

Happy stacking 🙂

Links

11 thoughts on “Avoiding Stack Overflows: Application Monitoring the Stack Usage”

Dave Nadler on February 19, 2023 at 18:29 said:

Hi Erich – I made a FreeRTOS port for M3-M7 Cortex that checks MSP (interrupt stack), see: https://github.com/DRNadler/FreeRTOS_helpers.
It’s not mainlined as I did not code support for compilers other than GCC.
Hope that helps someone!
Best Regards, Dave

- Erich Styger on February 19, 2023 at 19:42 said:
  
  Hi Dave,
  thanks, I was not aware of your implementation. I’ll have a look in the next days.
  
Tommy Murphy on February 19, 2023 at 21:09 said:

Stack underflow is another thing that should be checked for/guarded against. But, I suppose if a high level language such as C/C++ is used exclusively, other than for some startup/utility code, then that should never happen?

- Erich Styger on February 20, 2023 at 11:48 said:
  
  Good questions! 🙂
  Yes, it could happen, and this is why usually I place the stack at the end of the RAM. So if it would underflow, then it would run into illegal memory and trap on that.
  Other than that, I could imagine a stack underflow if using variable stack frames (yes, gcc supports that, which is very, very ugly and I would not let pass a student who would use that) and the code would damage the variable stack frame size on the stack.
  But usually in such scenarios, the caller function would easily crash and that would be easy to isolate?
  
pozzugno on February 19, 2023 at 22:38 said:

Great article, but it is worth noting that this approach doesn’t return the worst case stack usage. Of course, the most important value is the WORST CASE stack usage because we should size the stack space on this value.
It’s very odd there aren’t tools that calculate exactly the amount of the stack space in the worst case condition. I know it isn’t a simple task (recursion, function pointers and so on), but I think that in many embedded software the developer could instruct effectively this type of tool (for example, associating to the function pointers a list of possible values) so that it could calculate the exact worst case stack usage.

- Erich Styger on February 20, 2023 at 11:44 said:
  
  Thank you!
  The good thing with the approach presented is that it returns the worst case for all the test cases or how the application is running.
  As for finding the true worst case with static analysis, you have to specify the recursion depths plus map any function pointer calls, plus giving the interrupt nesting levels.
  With this, you can calculate the worst case, and you can do this for example with https://mcuoneclipse.com/2015/08/21/gnu-static-stack-usage-analysis/ (this is part of the links at the end of the article too).
  
rlpaddock on February 20, 2023 at 16:12 said:

I take a different approach.

I put the macro WATCHDOG_STACK_WATCH() , found below, at the entry point of all of my ISRs.

This macro saves the stack pointer and the return address of the calling function
in non-initialized global RAM, in a location unlikely to be overwritten by any stack faults. These can then be examined at system start after a fault. They can also be watched at runtime by a watchdog process, to see if things are getting too close to the edge.

static __inline__ void *sp_get(void)
{
  void *sp;

  __asm__ __volatile__ ("mrs %0, msp" : "=r"(sp));

  return( sp );
}
/*
* Pure C version, that generates a warning about returning the
* address of a temporary value:
*
* void *sp_get( void )
* {
* volatile uint32_t dummy = 0UL; // Put an initialized variable on the stack
* return( (void *) &dummy ); // Return its address – therefore the (approx.) present SP value
* }
*
*/

#define WATCHDOG_STACK_WATCH() \
  do{ \
    if( sp_get() < stack_watch_vptr_g ) \
    { \
      stack_watch_ptr_vng = sp_get(); /* If (sp) <= __bss_end CRASH! Do a WatchDog Bark here, and leave bread crumbs? */ \
      stack_return_ptr_vng = __builtin_extract_return_addr( __builtin_return_address(0) ); \
    } \
  }while( 0 ) /* Save the stack low point mark */
/*
* This version avoids allocating a local variable, because if there
* is a fault that could fail.
* The downside is by using two calls to sp_get(), nested interrupts
* have the potential to return the wrong value.
* Nested interrupts are best avoided in the first place, so no issue
* in current code.
*/

Another approach to detecting stack overflow is to put the stack at the bottom of RAM. Then a hardware fault will be generated, assuming there is nothing allocated below the RAM, which is usually the case.
[I was not sure my earlier comment about that was posted because of an error during posting.]

- Erich Styger on February 21, 2023 at 10:45 said:
  
  That is a really good and cool way to handle this, thanks for sharing!
  
- rlpaddock on February 21, 2023 at 17:11 said:
  
  "if( sp_get() < stack_watch_vptr_g ) \"
  
  should read:
  
  if( sp_get() < stack_watch_ptr_vng ) \
  
DrPi on February 25, 2023 at 13:18 said:

Hello Erich,
Good post, as usual 🙂
You’ll find another good complementary page on the same subject here : https://interrupt.memfault.com/blog/using-psp-msp-limit-registers-for-stack-overflow

- Erich Styger on February 25, 2023 at 13:35 said:
  
  Thanks! And thanks for the link: this is what I meant by the ability of the M33 to use a stack limit register.
  

MCU on Eclipse

Everything on Eclipse, Microcontrollers and Software
