Stack Canaries with GCC: Checking for Stack Overflow at Runtime

Stack overflows are probably the number one enemy of embedded applications: a call to a printf() monster will likely use too much stack space, resulting in overwritten memory and crashing applications. Stack memory is limited and expensive on these devices, so you don’t want to reserve too much space for it. But certainly not too little either, or bad things will happen.

The Eclipse based MCUXpresso IDE has a ‘Heap and Stack Usage’ view which can be used to monitor the stack usage and shows that a stack overflow happened:

Heap and Stack Usage

But this relies on the debugger: how can stack overflows be caught at runtime without a debugger? There is an option in the GNU gcc compiler that helps with this kind of situation, even though it was originally intended for something different.

Stack Overflows

The problem is that the application call stack (function calls, pushed parameters and local variables) grows in one direction. If the reserved stack space is not large enough, the call stack can grow into another memory area and corrupt data:

stack overflow

There are different ways to deal with this:

  • Static analysis: analyzing how much stack is needed. Recursion can be a problem.
  • Using the MPU (hardware memory protection) to detect and prevent the overflow
  • Using hardware watchpoints to detect the overwrite
  • Placing sentinel values at the end of the stack space which are periodically checked

The last option is what can be turned on in FreeRTOS.

Stack Overflow Detection in FreeRTOS

This article uses the NXP MCUXpresso IDE V11 which uses GNU tools. In this article I describe an approach with the GNU gcc in a bare-metal (no RTOS) environment, because FreeRTOS already includes an option to check for a stack overflow at runtime: the check is performed at task context switch, see “FreeRTOS – stacks and stack overflow” for more details.

FreeRTOS has two methods. The first just compares the current task stack pointer with a known stack limit value to see whether it is outside the stack range. The second includes the first and additionally places a pattern at the end of the stack and verifies that it has not been touched. The second method takes more time. Both methods run at context switch time only, so a stack overflow might go undetected for a while.
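The two kinds of check can be sketched in plain C on a simulated stack. This is an illustration only and not the FreeRTOS source: the TaskStack type, the sizes and the fill pattern are assumptions made for this sketch, and the stack pointer is modeled as an array index so that the code runs on a host PC.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS    64u          /* hypothetical task stack size in 32-bit words */
#define SENTINEL_WORDS 4u           /* hypothetical number of pattern words at the stack end */
#define FILL_PATTERN   0xA5A5A5A5u  /* arbitrary fill value chosen for this sketch */

/* Simulated task stack: index 0 is the stack end (limit), the stack grows downwards. */
typedef struct {
  uint32_t mem[STACK_WORDS];
  long sp; /* saved stack pointer as an index; STACK_WORDS means "empty" */
} TaskStack;

void stack_init(TaskStack *s) {
  for (size_t i = 0; i < STACK_WORDS; i++) {
    s->mem[i] = FILL_PATTERN;
  }
  s->sp = (long)STACK_WORDS;
}

/* First kind of check: is the saved stack pointer still inside the stack range? */
bool sp_within_limit(const TaskStack *s) {
  return s->sp >= 0 && s->sp <= (long)STACK_WORDS;
}

/* Pattern check: are the words at the stack end still untouched? */
bool sentinel_intact(const TaskStack *s) {
  for (size_t i = 0; i < SENTINEL_WORDS; i++) {
    if (s->mem[i] != FILL_PATTERN) {
      return false;
    }
  }
  return true;
}

/* Second kind of check: the first one plus the pattern check. */
bool stack_ok_at_context_switch(const TaskStack *s) {
  return sp_within_limit(s) && sentinel_intact(s);
}
```

The pattern check also reports a stack that dipped into the sentinel area and unwound again before the check ran, which a pure stack pointer comparison cannot see.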

Static Stack Usage Checking

The MCUXpresso IDE V11 includes the ‘Image Info’ view which calculates the stack space needed:

Image Info

This is a good start, but it does not have the information from the libraries. To get that information, one would have to rebuild all the GNU libraries correctly, which can be a daunting task.

GNU StackGuard (Buffer Overflow Exploit Protection)

There is another problem, especially when considering security: an attacker deliberately causing a stack overflow/corruption in order to execute arbitrary code and take control of the system. These are called ‘stack overflow exploits’. See http://phrack.org/issues/49/14.html for a good tutorial on this concept (and if you want to get into the hacking business 😉 ).

To counter these exploits, compilers including gcc started to add ‘hardening’ options to detect them. One of them is the GNU gcc StackGuard (see ftp://gcc.gnu.org/pub/gcc/summit/2003/Stackguard.pdf). In that approach, the compiler places a ‘canary’ guard into the stack frame of each instrumented function:

Similar to the canaries used in coal mines, a stack canary is a variable with a special value placed at the end of the stack memory. Assuming that an exploit with a stack buffer overflow will very likely overwrite that canary, the overflow can be detected by the running program.

GCC and -fstack-protector

The gcc compiler provides a set of options to use canaries (see https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html).

-fstack-protector: Emit extra code to check for buffer overflows, such as stack smashing attacks. This is done by adding a guard variable to functions with vulnerable objects. This includes functions that call alloca, and functions with buffers larger than 8 bytes. The guards are initialized when a function is entered and then checked when the function exits. If a guard check fails, an error message is printed and the program exits.

-fstack-protector-all: Like -fstack-protector except that all functions are protected.
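As a small illustration of the difference between the two options, consider the pair of functions below. The function names are made up for this sketch. Going by the option text quoted above, the first function has a character buffer larger than 8 bytes and would be guarded by -fstack-protector, while the second has only scalar locals and would be guarded only with -fstack-protector-all. Checking the disassembly is the reliable way to confirm what a given compiler version does.

```c
#include <stdio.h>
#include <string.h>

/* Has a 32-byte character buffer on the stack: a "vulnerable object"
 * in the sense of the -fstack-protector description. */
size_t format_label(int id) {
  char buf[32];
  snprintf(buf, sizeof buf, "sensor-%d", id);
  return strlen(buf);
}

/* Only scalar locals: expected to be instrumented only with
 * -fstack-protector-all. */
int add_offset(int raw) {
  int offset = 3;
  return raw + offset;
}
```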

For example add that option to the compiler settings like this:

-fstack-protector-all

How does it work?

Below is a small function which prints a value. That printf() might possibly cause a stack overflow:

void printValue(int val) {
  printf("The value is: '%d'", val);
}

If using the Stack Guard functionality of the GNU compiler, I have to provide two things:

  1. stack guard (32bit) value, ideally with a ‘random’ value, named __stack_chk_guard
  2. error callback function, named __stack_chk_fail

Below is a very simple implementation of this:

unsigned long __stack_chk_guard = 0xDEADBEEF;

void __stack_chk_fail(void) { /* will be called if guard/canary gets corrupted */
  /* Handle error, print error message, stop the target, ... */
  DisableInterrupts();
  __asm volatile("bkpt #0"); /* break target */
  for (;;) { /* the hook must not return to the caller, so stay here */
  }
}

💡 Check your library implementation! For example the NXP provided NewLib and NewLib nano libraries in MCUXpresso IDE V11.0.1 already include a default implementation of the guard variable and fail hook (as weak symbols). The RedLib library does not have them, so you have to add them yourself. For newlib and newlib nano, provide your own implementation to override the weak defaults.

The compiler generates the following code:

  1. At function entry, it stores the __stack_chk_guard value into the stack frame
  2. At function exit, the guard value on the stack is compared against the value in __stack_chk_guard
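Written by hand in C, the two steps look roughly like the sketch below. The names demo_guard, demo_chk_fail and printValue_by_hand are stand-ins invented for this illustration (they deliberately avoid the reserved __stack_chk_* names so the sketch can run on a host), and the simulate_corruption parameter exists only to make the failure path visible. The real instrumentation is emitted by the compiler, not written in the source.

```c
#include <stdio.h>

/* Hypothetical stand-ins for __stack_chk_guard and __stack_chk_fail. */
volatile unsigned long demo_guard = 0xDEADBEEFul;
int demo_fail_count = 0;

void demo_chk_fail(void) {
  demo_fail_count++; /* a real hook would stop the target instead */
}

/* Hand-written equivalent of an instrumented printValue(). */
void printValue_by_hand(int val, int simulate_corruption) {
  volatile unsigned long canary = demo_guard; /* step 1: copy guard into the frame */

  printf("The value is: '%d'\n", val);

  if (simulate_corruption) {
    canary ^= 1ul; /* pretend an overflow modified the local copy */
  }
  if (canary != demo_guard) { /* step 2: compare at function exit */
    demo_chk_fail();
  }
}
```

Calling printValue_by_hand(42, 0) leaves demo_fail_count unchanged, while printValue_by_hand(42, 1) increments it once.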

To illustrate this, here is the commented disassembly (ARM Cortex-M4F):

00000000 <printValue>:
   0:	b580      	push	{r7, lr}    ; push used regs
   2:	b084      	sub	sp, #16         ; reserve space
   4:	af00      	add	r7, sp, #0      ; move SP to R7
   6:	6078      	str	r0, [r7, #4]    ; store param
   8:	4b08      	ldr	r3, [pc, #32]	; load &__stack_chk_guard
   a:	681b      	ldr	r3, [r3, #0]    ; load content of it
   c:	60fb      	str	r3, [r7, #12]   ; store canary value
   e:	6879      	ldr	r1, [r7, #4]    ; load function param
  10:	4807      	ldr	r0, [pc, #28]	; load address of format string (.rodata)
  12:	f7ff fffe 	bl	0 <_printf>     ; call printf
			12: R_ARM_THM_CALL	_printf
  16:	bf00      	nop
  18:	4b04      	ldr	r3, [pc, #16]	; load &__stack_chk_guard
  1a:	68fa      	ldr	r2, [r7, #12]   ; load canary
  1c:	681b      	ldr	r3, [r3, #0]    ; load __stack_chk_guard
  1e:	429a      	cmp	r2, r3          ; compare it
  20:	d001      	beq.n	26              ; match?
  22:	f7ff fffe 	bl	0 <printValue>  ; no: call error handler
			22: R_ARM_THM_CALL	__stack_chk_fail
  26:	3710      	adds	r7, #16         ; normal exit code
  28:	46bd      	mov	sp, r7          ; restore sp
  2a:	bd80      	pop	{r7, pc}        ; restore pushed regs
	...
			2c: R_ARM_ABS32	__stack_chk_guard
			30: R_ARM_ABS32	.rodata

💡 The point to make here is: the check is whether something has overwritten the stack space of the instrumented function (printValue() in this case). The original gcc implementation does not catch my case above, where the stack allocated for the application overflows.

Excluding Functions from Protection

With -fstack-protector-all I’m instrumenting all functions. Of course that instrumentation has a cost at runtime. The other thing is that the startup code might cause a false alarm if the canary variable has not been initialized yet. For this, I can use the following attribute:

__attribute__ ((no_instrument_function))

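For example I have excluded the data initialization (zero-out and copy-down) in my startup code that way. Note that the GCC manual documents no_instrument_function as the attribute that suppresses the profiling hooks of -finstrument-functions, and that GCC 11 and later also provide a dedicated no_stack_protector function attribute, so check the generated code to confirm that the canary check is really gone: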

disabled canary check with no_instrument_function attribute
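On newer toolchains the exclusion can also be sketched with the dedicated attribute. The macro name NO_STACK_PROTECTOR and the function startup_init_data are made up for this example, the fallback through the optimize attribute for older GCC releases is a commonly used workaround that I treat as an assumption here, and in every case the disassembly should be checked to confirm that no canary code was emitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Pick an attribute that switches off the stack protector for one function.
 * NO_STACK_PROTECTOR is a name invented for this sketch. */
#if defined(__has_attribute)
#  if __has_attribute(no_stack_protector)
#    define NO_STACK_PROTECTOR __attribute__((no_stack_protector))
#  endif
#endif
#if !defined(NO_STACK_PROTECTOR) && defined(__GNUC__) && !defined(__clang__)
/* Assumption: workaround for older GCC releases, verify in the disassembly. */
#  define NO_STACK_PROTECTOR __attribute__((optimize("no-stack-protector")))
#endif
#ifndef NO_STACK_PROTECTOR
#  define NO_STACK_PROTECTOR /* unknown compiler: no attribute available */
#endif

/* Hypothetical startup helper: copy-down of initialized data and zero-out
 * of .bss, running before the canary is guaranteed to be set up. */
NO_STACK_PROTECTOR
void startup_init_data(uint32_t *data, const uint32_t *init, size_t data_words,
                       uint32_t *bss, size_t bss_words) {
  for (size_t i = 0; i < data_words; i++) {
    data[i] = init[i]; /* copy-down */
  }
  for (size_t i = 0; i < bss_words; i++) {
    bss[i] = 0u; /* zero-out */
  }
}
```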

Stack End Canary

As explained above, the gcc implementation is about exploit code which tries to overwrite the stack and return address to execute arbitrary code. My goal is to detect the problem that there is not enough stack space for the application.

So instead of checking whether the canary in each function has been overwritten, I can check whether the ‘global’ canary __stack_chk_guard is overwritten :-).

For this, I’m placing the global canary at the end of the stack, using the approach I have described in “Defining Variables at Absolute Addresses with gcc”:

/* place the following canary variable at the end of the stack */
__attribute__((section (".stack"))) unsigned long __stack_chk_guard = 0xDEADBEEF;

This of course depends on the linker file, and this is how I have my stack space allocated:

     _StackSize = 0x100;
     /* Reserve space in memory for Stack */
    .heap2stackfill  :
    {
        . += _StackSize;
    } > SRAM_UPPER
    /* Locate actual Stack in memory map */
    .stack ORIGIN(SRAM_UPPER) + LENGTH(SRAM_UPPER) - _StackSize - 0:  ALIGN(4)
    {
        _vStackBase = .;
        . = ALIGN(4);
        _vStackTop = . + _StackSize;
    } > SRAM_UPPER

Verify in the map file that the canary is at the right place:

__stack_chk_guard

In the above case the stack bottom is at 0x2001’0000 and the stack grows towards 0x2000’ff00.

💡 Some linker files use an approach to let the heap and stack grow towards each other. This might be a smart idea to utilize memory, but personally I don’t like it. First, I would rather avoid a growing heap altogether, and I prefer a controlled environment with a clearly defined stack area.

Stack Overflow Detection in Action

Below is a debug session which caught such a stack overflow :-). If the local canary value no longer matches the global one, the error hook gets called:

Stack Overflow Detected
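The mechanism can be replayed on a host PC with a small simulation. Everything in it is an assumption made for illustration: the ram array stands in for SRAM, STACK_END marks the lowest word of the stack region where the global canary sits, and sim_function() models an instrumented function that writes a given number of words of local data.

```c
#include <stdint.h>
#include <string.h>

#define RAM_WORDS 32u /* hypothetical size of the simulated SRAM in words */
#define STACK_END 8u  /* hypothetical index of the lowest word of the stack */

/* The stack occupies ram[STACK_END .. RAM_WORDS-1] and grows downwards;
 * everything below STACK_END belongs to other data. */
static uint32_t ram[RAM_WORDS];
#define GLOBAL_GUARD (ram[STACK_END]) /* global canary on the last stack word */

static unsigned sim_sp;  /* simulated stack pointer (array index) */
int sim_fail_count;      /* counts calls of the simulated error hook */

void sim_reset(void) {
  memset(ram, 0, sizeof ram);
  GLOBAL_GUARD = 0xDEADBEEFu;
  sim_sp = RAM_WORDS;
  sim_fail_count = 0;
}

/* Models an instrumented function that writes `locals` words of stack.
 * The saved canary is kept in a C variable instead of the simulated frame. */
void sim_function(unsigned locals) {
  uint32_t saved = GLOBAL_GUARD; /* prologue: copy the global canary */
  unsigned sp_on_entry = sim_sp;

  for (unsigned i = 0; i < locals && sim_sp > 0u; i++) {
    ram[--sim_sp] = 0x11111111u; /* local data being written */
  }

  sim_sp = sp_on_entry; /* epilogue: release the frame */
  if (saved != GLOBAL_GUARD) {
    sim_fail_count++; /* stands in for the error hook */
  }
}
```

With these sizes, sim_function(10) stays inside the stack and reports nothing, while sim_function(24) writes down to the last stack word, overwrites the global canary and triggers the hook once. A function entered after the canary has already been corrupted copies the corrupted value and compares equal again, so in this model the overflow is reported by the functions that were active when it happened.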

Summary

The gcc StackGuard can be used not only to detect stack overflow exploits, it is also useful for checking the application stack overflow case. Of course this is not a 100% check, because it relies on the overflow actually changing the canary at the end of the stack. There are cases where stack space is allocated but not written, so the canary is skipped. Still, it is a good check with little overhead for each function.

If using FreeRTOS, I use the FreeRTOS built-in task stack overflow protection. I can combine this with the gcc StackGuard feature, but then it would either only check the interrupt (MSP) stack, or I would use it to harden my code against buffer overflow exploits too. This will slow down each instrumented function, but there is no free security.

Happy Canaring 🙂
