One of the biggest fears of embedded systems developers is a stack overflow. FreeRTOS includes a cool feature to monitor and catch task stack overflows. But what about the MSP (Main Stack Pointer) on ARM, or the interrupt stack? And what if you are not using an RTOS at all and are running a bare-metal application?

There is a simple way to monitor stack usage at runtime. In this article I share the routines for it, which are now available in the McuArm module.
On the ARM Cortex-M architecture (e.g. M0, M3, M4 or M7), there are two stack pointers:
- MSP: Main Stack Pointer. This one is active after reset and is used for the startup code, main() and interrupts.
- PSP: Process Stack Pointer. This one is used for example by an RTOS like FreeRTOS as the stack pointer of the current thread or process. The image below shows the MSP and PSP registers for a bare-metal application, where the PSP is not used.

That architecture is actually very nice: the tasks or processes do not need to allocate extra space for the interrupt stacks. But as with every stack, it is a problem if the allocated space is not enough for the application. The Cortex-M33 (ARMv8-M) adds dedicated stack limit registers (MSPLIM and PSPLIM) for overflow protection. Some other cores (e.g. the M4) can include a memory protection unit (MPU), but I do not consider it well suited for this job: its granularity is coarse, because region sizes must be a power of two and each region must be aligned to its size, which makes it hard to use as a stack guard.
This article shows how using a simple approach with a predefined pattern on the stack can be used to monitor stack usage. For other approaches, check out the links section at the end of this article.
Heap and Stack usage
The MCUXpresso IDE has a nice view for the current stack usage. However, that view only shows the status with the current MSP, and does *not* consider any stack used previously which could be much higher (e.g. during interrupt execution):

Image Information
Another useful source of information is the ‘Image Information’ view. Based on gcc compiler information it can estimate the stack usage. However, it misses things like recursion and standard library function calls, and any code for which there is no debug information:

FreeRTOS
FreeRTOS has built-in support to monitor and trap on stack overflows, and the usage is nicely shown too:

But this only covers the tasks, not the MSP or interrupt stack.
Stack Fill Pattern
A very generic way to determine any stack usage is:
- Fill the stack with a predefined pattern
- Check how much of that pattern is still present (not overwritten)
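The two steps above can be tried out on a host PC before touching any target. The following is only a minimal sketch under stated assumptions: the stack is simulated by a plain array, and the names `sim_stack`, `sim_fill`, `sim_use` and `sim_unused_bytes` are hypothetical and not part of McuArm.

```c
#include <stdint.h>
#include <stddef.h>

#define STACK_WORDS   64u          /* hypothetical size of the simulated stack, in 32-bit words */
#define FILL_PATTERN  0xdeadbeefu  /* marks a word as 'never written' */

static uint32_t sim_stack[STACK_WORDS]; /* index 0 is the base (lowest address) */

/* Step 1: fill the whole simulated stack with the pattern. */
static void sim_fill(void) {
  for (size_t i = 0; i < STACK_WORDS; i++) {
    sim_stack[i] = FILL_PATTERN;
  }
}

/* Simulate stack usage: a descending stack overwrites words from the top down. */
static void sim_use(size_t words) {
  for (size_t i = 0; i < words && i < STACK_WORDS; i++) {
    sim_stack[STACK_WORDS - 1u - i] = (uint32_t)i; /* any value other than the pattern */
  }
}

/* Step 2: count the pattern words that are still intact, starting at the base. */
static size_t sim_unused_bytes(void) {
  size_t i = 0;

  while (i < STACK_WORDS && sim_stack[i] == FILL_PATTERN) {
    i++;
  }
  return i * sizeof(uint32_t);
}
```

Calling `sim_fill()`, then `sim_use(10)` followed by `sim_use(3)`, leaves `sim_unused_bytes()` at 216 of 256 bytes: the scan reports the deepest usage seen so far, not the current one.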
I have used that approach ad hoc, e.g. with the GDB debugger. But when I recently benchmarked the stack usage of different standard libraries, this was not efficient. So I decided to add a few routines to the McuArm module to make it easier, so that everyone else can take advantage of it too.
McuArm Implementation
The module has a few configuration macros which can be set or overridden by the project settings.
#ifndef McuArmTools_CONFIG_STACK_CHECK_PATTERN
  #define McuArmTools_CONFIG_STACK_CHECK_PATTERN (0xdeadbeef)
    /*!< 32-bit pattern written to the stack, to mark a word as 'unused' */
#endif
/* The two symbols below shall be set by the linker script file to mark top and bottom of stack. Note that the two addresses need to be 32bit aligned! */
#ifndef McuArmTools_CONFIG_LINKER_SYMBOL_STACK_TOP
  #define McuArmTools_CONFIG_LINKER_SYMBOL_STACK_TOP _vStackTop
#endif
#ifndef McuArmTools_CONFIG_LINKER_SYMBOL_STACK_BASE
  #define McuArmTools_CONFIG_LINKER_SYMBOL_STACK_BASE _vStackBase
#endif
/* on ARM Cortex, the stack grows from 'top' (higher address) to the 'bottom' (lower address) */
extern uint32_t McuArmTools_CONFIG_LINKER_SYMBOL_STACK_BASE; /*!< base address of stack, this is a numerically lower address than the top */
extern uint32_t McuArmTools_CONFIG_LINKER_SYMBOL_STACK_TOP; /*!< top or end of stack. Highest address. The stack grows from top down to base */
That way a different pattern or different linker symbols can be configured.
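As a small illustration of how such an override works: the `#ifndef` guard only supplies the default if the project has not defined the macro already. In the sketch below, the value `0xa5a5a5a5` and the helper `active_stack_pattern()` are assumptions made up for the example. In a real project the define would more typically come from the compiler command line (e.g. `-DMcuArmTools_CONFIG_STACK_CHECK_PATTERN=0xa5a5a5a5` with gcc) than from the source file.

```c
#include <stdint.h>

/* Project-level override (assumed value), defined before the module defaults are seen. */
#define McuArmTools_CONFIG_STACK_CHECK_PATTERN (0xa5a5a5a5)

/* Module default: skipped here, because the macro is already defined. */
#ifndef McuArmTools_CONFIG_STACK_CHECK_PATTERN
  #define McuArmTools_CONFIG_STACK_CHECK_PATTERN (0xdeadbeef)
#endif

/* Hypothetical helper for illustration: returns the pattern that is in effect. */
static uint32_t active_stack_pattern(void) {
  return (uint32_t)McuArmTools_CONFIG_STACK_CHECK_PATTERN;
}
```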
Next, there are functions to get the current stack pointer, linker allocated stack size and the linker symbols:
void *McuArmTools_GetSP(void) {
#ifdef __GNUC__
  void *sp;

  __asm__ __volatile__ ("mrs %0, msp" : "=r"(sp)); /* read the Main Stack Pointer */
  return sp;
#else
  #warning "only for GCC"
  return NULL;
#endif
}

/*!
 * \brief Return the stack base, as configured in the linker file. The stack grows from the top (higher address) to the base (lower address).
 * \return Return the address of the base (lowest) stack unit
 */
uint32_t *McuArmTools_GetLinkerMainStackBase(void) {
  return &McuArmTools_CONFIG_LINKER_SYMBOL_STACK_BASE;
}

/*!
 * \brief Return the stack top, as set in the linker file. The stack grows from the top (higher address) to the base (lower address).
 * \return Return the address of the top of the stack (highest address)
 */
uint32_t *McuArmTools_GetLinkerMainStackTop(void) {
  return &McuArmTools_CONFIG_LINKER_SYMBOL_STACK_TOP;
}

/*!
 * \brief Returns the size of the main (MSP) stack, using linker symbols for top (higher address) and base (lower address).
 * \return Number of bytes allocated by the linker for the stack
 */
uint32_t McuArmTools_GetLinkerMainStackSize(void) {
  return (uint32_t)&McuArmTools_CONFIG_LINKER_SYMBOL_STACK_TOP - (uint32_t)&McuArmTools_CONFIG_LINKER_SYMBOL_STACK_BASE;
}
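Because the fill and check routines access the stack as 32-bit words, it can be worth checking the two linker addresses once at startup. The helpers below are a hypothetical sketch, not part of McuArm: they take the two addresses as plain pointers so they can be tried on a host too, and use `uintptr_t` so the arithmetic also works where pointers are wider than 32 bits.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical helper: true if both ends are 32-bit aligned and the top lies above the base. */
static bool stack_bounds_valid(const void *base, const void *top) {
  uintptr_t b = (uintptr_t)base;
  uintptr_t t = (uintptr_t)top;

  return (b % sizeof(uint32_t)) == 0u && (t % sizeof(uint32_t)) == 0u && t > b;
}

/* Hypothetical helper: size in bytes of the region [base, top). */
static size_t stack_region_size(const void *base, const void *top) {
  return (size_t)((uintptr_t)top - (uintptr_t)base);
}
```

On the target, the arguments would be the results of `McuArmTools_GetLinkerMainStackBase()` and `McuArmTools_GetLinkerMainStackTop()`.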
To fill the stack with the pre-defined pattern, the following function is used:
/*!
 * \brief Fill the stack space with the checking pattern, up to the current MSP.
 */
void McuArmTools_FillMainStackSpace(void) {
  uint32_t *base = (uint32_t*)&McuArmTools_CONFIG_LINKER_SYMBOL_STACK_BASE;
  uint32_t *msp = (uint32_t*)McuArmTools_GetSP(); /* get current MSP stack pointer */

  /* the current MSP is near the top */
  while(base<msp) { /* fill from the base up to the current MSP */
    *base = McuArmTools_CONFIG_STACK_CHECK_PATTERN;
    base++;
  }
}
Finally, two functions to get the size of used and unused stack space:
/*!
 * \brief Calculates the unused stack space, based on the checking pattern.
 * \return Number of unused main stack bytes.
 */
uint32_t McuArmTools_GetUnusedMainStackSpace(void) {
  uint32_t unused = 0; /* number of unused bytes */
  uint32_t *p = (uint32_t*)&McuArmTools_CONFIG_LINKER_SYMBOL_STACK_BASE;
  uint32_t *top = (uint32_t*)&McuArmTools_CONFIG_LINKER_SYMBOL_STACK_TOP;

  /* count pattern words from the base upwards; stop at the first overwritten word, or at the top of the stack */
  while (p<top && *p==McuArmTools_CONFIG_STACK_CHECK_PATTERN) {
    unused += sizeof(uint32_t); /* count number of unused bytes */
    p++;
  }
  return unused; /* return the number of unused bytes */
}

/*!
 * \brief Returns the used main stack space, based on the overwritten checking pattern.
 * \return Number of used main stack bytes
 */
uint32_t McuArmTools_GetUsedMainStackSpace(void) {
  return McuArmTools_GetLinkerMainStackSize()-McuArmTools_GetUnusedMainStackSpace();
}
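The used and total sizes can be combined into a simple high-water-mark check, for example to warn during testing once usage gets close to the limit. This is only a sketch: `stack_usage_percent()` and `stack_usage_ok()` are hypothetical helpers and not part of McuArm, and the threshold is whatever margin suits the project.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: peak stack usage as a percentage (0..100) of the allocated size. */
static uint32_t stack_usage_percent(uint32_t used_bytes, uint32_t size_bytes) {
  if (size_bytes == 0u || used_bytes >= size_bytes) {
    return 100u; /* treat a missing or fully used stack as 100% */
  }
  /* 64-bit intermediate avoids overflow of used_bytes * 100 */
  return (uint32_t)(((uint64_t)used_bytes * 100u) / size_bytes);
}

/* Hypothetical helper: true while peak usage stays at or below the given limit. */
static bool stack_usage_ok(uint32_t used_bytes, uint32_t size_bytes, uint32_t limit_percent) {
  return stack_usage_percent(used_bytes, size_bytes) <= limit_percent;
}
```

On the target this could be called as `stack_usage_ok(McuArmTools_GetUsedMainStackSpace(), McuArmTools_GetLinkerMainStackSize(), 80)`, where 80 is an arbitrary example threshold.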
Usage
Usage is very simple: first fill the stack with the pattern, and later on calculate the used (or free) space:
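int main(void) {
  McuArmTools_FillMainStackSpace(); /* fill as early as possible, before the stack gets used */
  BOARD_InitBootPins();
  BOARD_InitBootClocks();
  stdlib_test();
  /* uint32_t is not guaranteed to be 'long': cast to match the format specifier */
  printf("stack size used: %lu\n", (unsigned long)McuArmTools_GetUsedMainStackSpace());
  for(;;) {
    __asm volatile ("nop");
  }
  return 0;
}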
This can be verified in the memory view too:

Summary
There are many different ways to estimate or calculate the needed stack space: from static analysis to trial-and-error up to checking it with a stack memory pattern. Using a stack memory pattern is very simple and does not require any dedicated hardware, yet it is very useful. Just keep in mind that it only covers what you have actually executed (or tested), so a static worst-case analysis is still required for safety-critical applications.
Happy stacking :-)
Links
- McuLib: https://github.com/ErichStyger/McuOnEclipseLibrary
- Stack Canaries with GCC: Checking for Stack Overflow at Runtime
- Adding the Picolib C/C++ Standard Library to an existing GNU ARM Embedded Toolchain
- Changing Heap and Stack Size for NXP Kinetis SDK V2.0 gcc Projects
- Understanding FreeRTOS Task Stack Usage and Kernel Awareness Information
- GNU Static Stack Usage Analysis
- Optimized FreeRTOS: Stack Check and SysTick for ARM Cortex Cores
Hi Erich – I made a FreeRTOS port for M3-M7 Cortex that checks MSP (interrupt stack), see: https://github.com/DRNadler/FreeRTOS_helpers.
It’s not mainlined as I did not code support for compilers other than GCC.
Hope that helps someone!
Best Regards, Dave
Hi Dave,
thanks, I was not aware of your implementation. I’ll have a look in the next days.
Stack underflow is another thing that should be checked for/guarded against. But, I suppose if a high level language such as C/C++ is used exclusively, other than for some startup/utility code, then that should never happen?
Good questions! :-)
Yes, it could happen, and this is why usually I place the stack at the end of the RAM. So if it would underflow, then it would run into illegal memory and trap on that.
Other than that, I could imagine a stack underflow if using variable stack frames (yes, gcc supports that, which is very, very ugly and I would not let pass a student who would use that) and the code would damage the variable stack frame size on the stack.
But usually in such scenarios, the caller function would easily crash and that would be easy to isolate?
Great article, but it is worth noting that this approach doesn’t return the worst case stack usage. Of course, the most important value is the WORST CASE stack usage because we should size the stack space on this value.
It’s very odd there aren’t tools that calculate exactly the amount of the stack space in the worst case condition. I know it isn’t a simple task (recursion, function pointers and so on), but I think that in many embedded software the developer could instruct effectively this type of tool (for example, associating to the function pointers a list of possible values) so that it could calculate the exact worst case stack usage.
Thank you!
The good thing with the approach presented is that it returns the worst case for all the test cases or how the application is running.
As for finding the true worst case with static analysis, you have to specify the recursion depths plus map any function pointer calls, plus giving the interrupt nesting levels.
With this, you can calculate the worst case, and you can do this for example with https://mcuoneclipse.com/2015/08/21/gnu-static-stack-usage-analysis/ (this is part of the links at the end of the article too).
I take a different approach.
I put the macro WATCHDOG_STACK_WATCH() , found below, at the entry point of all of my ISRs.
This macro saves the stack pointer and the return address of the calling function,
in non-initialized global RAM, in a location unlikely to be overwritten by any stack faults. These can then be examined at system start after a fault. They can also be watched at runtime by a watchdog process, to see if things are getting too close to the edge.
static __inline__ void *sp_get(void)
{
  void *sp;
  __asm__ __volatile__ ("mrs %0, msp" : "=r"(sp));
  return( sp );
}
/*
* Pure C version, that generates a warning about returning the
* address of a temporary value:
*
* void *sp_get( void )
* {
* volatile uint32_t dummy = 0UL; // Put an initialized variable on the stack
* return( (void *) &dummy ); // Return its address – therefore the (approx.) present SP value
* }
*
*/
#define WATCHDOG_STACK_WATCH() \
do{ \
if( sp_get() < stack_watch_vptr_g ) \
{ \
stack_watch_ptr_vng = sp_get(); /* If (sp) <= __bss_end CRASH! Do a WatchDog Bark here, and leave bread crumbs? */ \
stack_return_ptr_vng = __builtin_extract_return_addr( __builtin_return_address(0) ); \
} \
}while( 0 ) /* Save the stack low point mark */
/*
* This version avoids allocating a local variable, because if there
* is a fault that could fail.
* The downside is by using two calls to sp_get(), nested interrupts
* have the potential to return the wrong value.
* Nested interrupts are best avoided in the first place, so no issue
* in current code.
*/
Another approach to detecting stack overflow is to put the stack at the bottom of RAM. Then a hardware fault will be generated, assuming there is nothing allocated below the RAM, which is usually the case.
[I was not sure my earlier comment about that was posted because of an error during posting.]
That is a really good and cool way to handle this, thanks for sharing!
"if( sp_get() < stack_watch_vptr_g ) \"
should read:
if( sp_get() < stack_watch_ptr_vng ) \
Hello Erich,
Good post, as usual :-)
You’ll find another good complementary page on the same subject here : https://interrupt.memfault.com/blog/using-psp-msp-limit-registers-for-stack-overflow
Thanks! And thanks for the link: this is what I meant with the M33's ability to use a stack limit register.