Debugging Hard Faults on ARM Cortex-M

It is as bad as this: my application stopped in an unhandled interrupt service routine:

Cpu_Interrupt

That does not tell me much. I’m using Processor Expert generated code, and with it all my ‘unhandled’ vectors point to the same handler:

Default Handlers in Vectors.c

Vectors.c and Default Handlers

That vectors.c is generated by Processor Expert, but I can change it so it generates a different handler for each interrupt. This is configured in the Build options tab of the CPU properties:

Own Handler for every unhandled interrupt

With this my vector table changes to use a dedicated handler for each vector:

Own Handlers in Vector Table

And now I see what is causing my problem: a Hard Fault:

Hard Fault

The question now is: what is causing that hard fault? Answers to this are behind this link. As a simple example, calling a NULL function pointer like this will likely cause such a hard fault:

void (*f)(void);
void call_null_pointer_function(void) {
  f(); /* will execute code at address zero */
}

Executing code at address zero is not wrong in itself, but that is where the vector table is located, and the values stored there are unlikely to be valid instructions.

Another example is the one below, which tries to write 10 to address zero: on most ARM Cortex-M devices the vector table at address zero is in FLASH memory, so writing to it is likely to fail and cause a hard fault too:

void write_to_rom(void) {
  *((int*)0x0) = 10; /* tries to write to address zero */
}

The problem is: how do I find the offending position in the code? The hard fault handler does not provide any help yet. But the linked application note gives more details and explains that the system stores a lot of information about the fault itself.

What makes things a lot easier is to use a custom handler.

Simple PC Handler

A very minimalistic handler just provides the offending PC (Program Counter) value. I’m using the syntax for ARM gcc here (as used with CodeWarrior for MCU10.3 and the KL25Z Freedom board), but it can easily be adapted to any other compiler.

An easy method is to replace the Processor Expert generated code for the Hard_Fault handler with the following one:

__attribute__((naked))
PE_ISR(Cpu_ivINT_Hard_Fault)
{
  __asm volatile (
    " movs r0,#4                  \n"  /* load bit mask into R0 */
    " mov r1, lr                  \n"  /* load link register into R1 */
    " tst r0, r1                  \n"  /* test bit 2 of the EXC_RETURN value in LR */
    " beq _MSP                    \n"  /* bit clear: stack frame is on MSP. Otherwise it is on PSP */
    " mrs r0, psp                 \n"  /* bit set: load PSP into R0 */
    " b _HALT                    \n"  /* go to part which loads the PC */
  "_MSP:                          \n"  /* stack pointer is in MSP register */
    " mrs r0, msp                 \n"  /* load stack pointer into R0 */
  "_HALT:                        \n"  /* find out where the hard fault happened */
    " ldr r1,[r0,#24]             \n"  /* load program counter into R1. R1 contains address of the next instruction where the hard fault happened */
    " bkpt #0                     \n" /* cause the debugger to stop */
  );
}

The assembly code checks which stack we are using (MSP or PSP), and then loads the offending PC position on the stack into the register R1. So R1 will contain the code address where the problem happened:

R1 contains the offending PC
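
To make the offsets used in the assembly easier to follow, here is a minimal C sketch of the stack frame the core pushes on exception entry, together with the check of bit 2 of the EXC_RETURN value in LR. The names ExceptionFrame and frame_is_on_psp are made up for illustration and are not part of the Processor Expert code; the layout assumes the basic 8-word frame without floating-point context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic exception stack frame (hypothetical type name, for illustration).
   The field order follows the order in which the core stacks the registers. */
typedef struct {
  uint32_t r0;
  uint32_t r1;
  uint32_t r2;
  uint32_t r3;
  uint32_t r12;
  uint32_t lr;   /* offset 20: LR of the interrupted code */
  uint32_t pc;   /* offset 24: what the handler loads with ldr r1,[r0,#24] */
  uint32_t psr;  /* offset 28 */
} ExceptionFrame;

/* The handler relies on the stacked PC being at byte offset 24 */
static_assert(offsetof(ExceptionFrame, pc) == 24, "stacked PC is at offset 24");

/* Hypothetical helper: returns 1 if EXC_RETURN (the LR value inside the
   handler) has bit 2 set, which means the frame was pushed on the PSP stack;
   returns 0 if the frame is on the MSP stack. */
static int frame_is_on_psp(uint32_t exc_return) {
  return (exc_return & 0x4u) != 0;
}
```

For example, an EXC_RETURN of 0xFFFFFFFD (thread mode using PSP) selects the PSP, while 0xFFFFFFF9 (thread mode using MSP) and 0xFFFFFFF1 (handler mode) select the MSP.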

Entering that address in the Disassembly View jumps to that position. I just need to keep in mind that the stacked program counter can point *after* the instruction causing the problem, and that addresses for ARM Thumb code can show up as odd values (bit 0 set). So for my example here the problem is caused by the instruction at address 0x608:

Problem at 0x608

Extended Handler

The handler can be extended so that it also shows the other registers stored on the stack:

/**
 * This is called from the HardFaultHandler with a pointer to the fault stack
 * as the parameter. We can then read the values from the stack and place them
 * into local variables for ease of reading.
 * We then read the various Fault Status and Address Registers to help decode
 * cause of the fault.
 * The function ends with a BKPT instruction to force control back into the debugger
 */
#pragma GCC diagnostic ignored "-Wunused-but-set-variable"
void McuHardFault_HandlerC(uint32_t *hardfault_args)
{
  /*lint -save  -e550 Symbol not accessed. */
  static volatile unsigned long stacked_r0;
  static volatile unsigned long stacked_r1;
  static volatile unsigned long stacked_r2;
  static volatile unsigned long stacked_r3;
  static volatile unsigned long stacked_r12;
  static volatile unsigned long stacked_lr;
  static volatile unsigned long stacked_pc;
  static volatile unsigned long stacked_psr;
  static volatile unsigned long _CFSR;
  static volatile unsigned long _HFSR;
  static volatile unsigned long _DFSR;
  static volatile unsigned long _AFSR;
  static volatile unsigned long _BFAR;
  static volatile unsigned long _MMAR;
  stacked_r0 = ((unsigned long)hardfault_args[0]);          /* http://www.asciiworld.com/-Smiley,20-.html                                   */
  stacked_r1 = ((unsigned long)hardfault_args[1]);          /*                         oooo$$$$$$$$$$$$oooo                                 */
  stacked_r2 = ((unsigned long)hardfault_args[2]);          /*                      oo$$$$$$$$$$$$$$$$$$$$$$$$o                             */
  stacked_r3 = ((unsigned long)hardfault_args[3]);          /*                    oo$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$o         o$   $$ o$      */
  stacked_r12 = ((unsigned long)hardfault_args[4]);         /*    o $ oo        o$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$o       $$ $$ $$o$     */
  stacked_lr = ((unsigned long)hardfault_args[5]);          /* oo $ $ "$      o$$$$$$$$$    $$$$$$$$$$$$$    $$$$$$$$$o       $$$o$$o$      */
  stacked_pc = ((unsigned long)hardfault_args[6]);          /* "$$$$$$o$     o$$$$$$$$$      $$$$$$$$$$$      $$$$$$$$$$o    $$$$$$$$       */
  stacked_psr = ((unsigned long)hardfault_args[7]);         /*   $$$$$$$    $$$$$$$$$$$      $$$$$$$$$$$      $$$$$$$$$$$$$$$$$$$$$$$       */
                                                            /*   $$$$$$$$$$$$$$$$$$$$$$$    $$$$$$$$$$$$$    $$$$$$$$$$$$$$  """$$$         */
  /* Configurable Fault Status Register */                  /*    "$$$""""$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$     "$$$        */
  /* Consists of MMSR, BFSR and UFSR */                     /*     $$$   o$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$     "$$$o      */
  _CFSR = (*((volatile unsigned long *)(0xE000ED28)));      /*    o$$"   $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$       $$$o     */
                                                            /*    $$$    $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$" "$$$$$$ooooo$$$$o   */
  /* Hard Fault Status Register */                          /*   o$$$oooo$$$$$  $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$   o$$$$$$$$$$$$$$$$$  */
  _HFSR = (*((volatile unsigned long *)(0xE000ED2C)));      /*   $$$$$$$$"$$$$   $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$     $$$$""""""""        */
                                                            /*  """"       $$$$    "$$$$$$$$$$$$$$$$$$$$$$$$$$$$"      o$$$                 */
  /* Debug Fault Status Register */                         /*             "$$$o     """$$$$$$$$$$$$$$$$$$"$$"         $$$                  */
  _DFSR = (*((volatile unsigned long *)(0xE000ED30)));      /*               $$$o          "$$""$$$$$$""""           o$$$                   */
                                                            /*                $$$$o                                o$$$"                    */
  /* Auxiliary Fault Status Register */                     /*                 "$$$$o      o$$$$$$o"$$$$o        o$$$$                      */
  _AFSR = (*((volatile unsigned long *)(0xE000ED3C)));      /*                   "$$$$$oo     ""$$$$o$$$$$o   o$$$$""                       */
                                                            /*                      ""$$$$$oooo  "$$$o$$$$$$$$$"""                          */
                                                            /*                         ""$$$$$$$oo $$$$$$$$$$                               */
  /* Read the Fault Address Registers. */                   /*                                 """"$$$$$$$$$$$                              */
  /* These may not contain valid values. */                 /*                                     $$$$$$$$$$$$                             */
  /* Check BFARVALID/MMARVALID to see */                    /*                                      $$$$$$$$$$"                             */
  /* if they are valid values */                            /*                                       "$$$""                                 */
  /* MemManage Fault Address Register */
  _MMAR = (*((volatile unsigned long *)(0xE000ED34)));
  /* Bus Fault Address Register */
  _BFAR = (*((volatile unsigned long *)(0xE000ED38)));

#if 0 /* experimental, seems not to work properly with GDB in KDS V3.2.0 */
#ifdef __GNUC__ /* might improve stack, see https://www.element14.com/community/message/199113/l/gdb-assisted-debugging-of-hard-faults#199113 */
  __asm volatile (
      "tst lr,#4     \n" /* check which stack pointer we are using */
      "ite eq        \n"
      "mrseq r0, msp \n" /* use MSP */
      "mrsne r0, psp \n" /* use PSP */
      "mov sp, r0    \n" /* set stack pointer so GDB shows proper stack frame */
  );
#endif
#endif
  __asm("BKPT #0\n") ; /* cause the debugger to stop */
  /*lint -restore */
}

/*
** ===================================================================
**     Method      :  HardFaultHandler (component HardFault)
**
**     Description :
**         Hard Fault Handler
**     Parameters  : None
**     Returns     : Nothing
** ===================================================================
*/
#pragma GCC diagnostic ignored "-Wunused-but-set-variable"
__attribute__((naked))
#if McuLib_CONFIG_SDK_VERSION_USED==McuLib_CONFIG_SDK_RPI_PICO
void isr_hardfault(void)
#elif McuLib_CONFIG_SDK_VERSION_USED != McuLib_CONFIG_SDK_PROCESSOR_EXPERT
void HardFault_Handler(void)
#else
void McuHardFault_HardFaultHandler(void)
#endif
{
  __asm volatile (
    ".syntax unified              \n"  /* needed for the 'adds r1,#2' below */
    " movs r0,#4                  \n"  /* load bit mask into R0 */
    " mov r1, lr                  \n"  /* load link register into R1 */
    " tst r0, r1                  \n"  /* test bit 2 of the EXC_RETURN value in LR */
    " beq _MSP                    \n"  /* bit clear: stack frame is on MSP. Otherwise it is on PSP */
    " mrs r0, psp                 \n"  /* bit set: load PSP into R0 */
    " b _GetPC                    \n"  /* go to part which loads the PC */
  "_MSP:                          \n"  /* stack pointer is in MSP register */
    " mrs r0, msp                 \n"  /* load stack pointer into R0 */
  "_GetPC:                        \n"  /* find out where the hard fault happened */
    " ldr r1,[r0,#24]             \n"  /* load program counter into R1. R1 contains address of the next instruction where the hard fault happened */
#if McuHardFault_CONFIG_SETTING_SEMIHOSTING
  /* The following code checks if the hard fault is caused by a semihosting BKPT instruction which is "BKPT 0xAB" (opcode: 0xBEAB)
     The idea is taken from the MCUXpresso IDE/SDK code, so credits and kudos to the MCUXpresso IDE team! */
    " ldrh r2,[r1]                \n"  /* load opcode causing the fault */
    " ldr r3,=0xBEAB              \n"  /* load constant 0xBEAB (BKPT 0xAB) into R3 */
    " cmp r2,r3                   \n"  /* is it the BKPT 0xAB? */
    " beq _SemihostReturn         \n"  /* if yes, return from semihosting */
    " b McuHardFault_HandlerC   \n"  /* if no, dump the register values and halt the system */
  "_SemihostReturn:               \n"  /* returning from semihosting fault */
    " adds r1,#2                  \n"  /* r1 points to the semihosting BKPT instruction. Adjust the PC to skip it (2 bytes) */
    " str r1,[r0,#24]             \n"  /* store back the adjusted PC value to the interrupt stack frame */
    " movs r1,#32                 \n"  /* need to pass back a return value to emulate a successful semihosting operation. 32 is an arbitrary value */
    " str r1,[r0,#0]              \n"  /* store the return value on the stack frame */
    " bx lr                       \n"  /* return from the exception handler back to the application */
#else
    " b McuHardFault_HandlerC   \n"  /* decode more information. R0 contains pointer to stack frame */
#endif
  );
}

This will store all the stacked registers into variables I can inspect:

Stacked Registers
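
The fault status registers read by the handler are bit fields, so a small decoder can help when looking at the raw values. The following is only a sketch: the FaultSummary type and summarize_fault() are hypothetical names, it covers just a few bits of HFSR and CFSR, and it assumes an ARMv7-M core (Cortex-M3/M4/M7). As far as I know, Cortex-M0/M0+ parts such as the KL25Z do not implement CFSR, HFSR, MMFAR and BFAR, so there only the stacked registers are useful.

```c
#include <stdint.h>

/* Bit positions as defined for ARMv7-M */
#define HFSR_VECTTBL    (1UL << 1)   /* fault while reading the vector table */
#define HFSR_FORCED     (1UL << 30)  /* another fault was escalated to a hard fault */
#define CFSR_MMARVALID  (1UL << 7)   /* MMAR holds a valid fault address */
#define CFSR_PRECISERR  (1UL << 9)   /* precise data bus error */
#define CFSR_BFARVALID  (1UL << 15)  /* BFAR holds a valid fault address */
#define CFSR_UNDEFINSTR (1UL << 16)  /* undefined instruction */
#define CFSR_INVSTATE   (1UL << 17)  /* invalid state, e.g. branch to a non-Thumb address */

/* Hypothetical summary type, only for illustration */
typedef struct {
  int forced;            /* HFSR.FORCED: look at CFSR for the real cause */
  int vector_table;      /* HFSR.VECTTBL */
  int mmar_valid;        /* the _MMAR value can be trusted */
  int bfar_valid;        /* the _BFAR value can be trusted */
  int precise_bus_error;
  int undefined_instr;
  int invalid_state;
} FaultSummary;

/* Decode a few bits from the raw _CFSR and _HFSR values */
static FaultSummary summarize_fault(uint32_t cfsr, uint32_t hfsr) {
  FaultSummary s;
  s.forced            = (hfsr & HFSR_FORCED) != 0;
  s.vector_table      = (hfsr & HFSR_VECTTBL) != 0;
  s.mmar_valid        = (cfsr & CFSR_MMARVALID) != 0;
  s.bfar_valid        = (cfsr & CFSR_BFARVALID) != 0;
  s.precise_bus_error = (cfsr & CFSR_PRECISERR) != 0;
  s.undefined_instr   = (cfsr & CFSR_UNDEFINSTR) != 0;
  s.invalid_state     = (cfsr & CFSR_INVSTATE) != 0;
  return s;
}
```

With this, summarize_fault(_CFSR, _HFSR) could be called from the debugger or before the BKPT in the C handler. If bfar_valid is set, _BFAR contains the address of the access that failed.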

Summary

With a custom hard fault handler in place, hard faults get a lot easier to track down. So I’m adding that custom handler to my Processor Expert projects to find out what is causing the problem. The only small issue with the above approach is that Processor Expert will overwrite my handlers/modifications in Cpu.c if I do not disable code generation for it. That problem could be solved with a custom handler in the Processor Expert settings. If there is interest in how to do this: post a comment 🙂

Happy Faulting 🙂

34 thoughts on “Debugging Hard Faults on ARM Cortex-M”

  5. Hi Erich.
    I have Hard Fault problems with a KL04.
    I use Kinetis Design Studio 1.1.1.
    Using this post, I found that the problem is in the function “zero_fill_bss()”.

    I don’t know how to fix this problem.
    Can you help me?
    Thanks.

  6. Hi Erich,

    I’m trying to follow your steps but I’m facing some problems that I can’t figure out how to solve. I just copied the suggested code to be generated by Processor Expert for the function PE_ISR(Cpu_ivINT_Hard_Fault), but the compiler says the labels _MSP and _HALT are already defined in the file ccTv4L8N.s, which I cannot find as a source file in my project. Could you please help me to solve that issue? Btw, my CodeWarrior is 10.5.

    Thanks in advance

  7. Nice info. But my registers tab is blank, what could cause this? I looked at the registers tab before, no problems. Strange…

  8. I figured out the problem above (not seeing my registers). But now the problem I’m having is the inline assembly code that you put above is not being executed. The debugger just skips over it. I

  10. Firstly, thanks for an excellent article. A great resource of information.

    I’ve included the full custom hardfault handler but (as noted in the ‘Summary’), I’m now having this issue:

    > The only small issue with above approach is that Processor Expert will overwrite my handlers/modifications in Cpu.c, if I do not disable code generation for it. That problem could be solved with a custom handler in the Processor Expert settings. If there is interest about how to do this: post a comment

    How do we get around this?
    I’ve tried changing the processor expert settings, but when I recreate the PE code the function of course reverts back to the default. I’ve tried altering the code but if I put a function call in to the ISR that will then affect the registers that I want to inspect.
    Any guidance you can offer is gratefully received.

  14. Hi Erich,

    This write up is super helpful. I am not clear with the assembly code used to call the default handler . Especially the below four lines:

    " movs r0,#4 \n"
    " movs r1, lr \n"
    " tst r0, r1 \n"
    " beq _MSP \n"

    From what I understand LR is checked for 3rd bit set(4 byte alignment). How does this tell us which stack(PSP/ MSP) we are using ?

    Thanks

  15. Hi, I see you are reading from offset 20 in the pushed stack frame. This is the pushed LR. Why don’t you read from offset 24, which is the pushed PC?
