Debugging Hard Faults on ARM Cortex-M

Posted on November 24, 2012 by Erich Styger

It is as bad as this: my application stopped in an unhandled interrupt service routine:

That does not tell me much. I’m using Processor Expert generated code, and with it all my ‘unhandled’ vectors point to the same handler:

Vectors.c and Default Handlers

That vectors.c is generated by Processor Expert, but I can change it so it generates a different handler for each interrupt. This is configured in the Build options tab of the CPU properties:

Own Handler for every unhandled interrupt

With this my vector table changes to use a dedicated handler for each vector:

And now I see what is causing my problem: a Hard Fault:

The question now is: what is causing that hard fault? Answers to this are behind this link. As a simple example, a call through a NULL function pointer like this will likely cause such a hard fault:

void (*f)(void);
void call_null_pointer_function(void) {
  f(); /* will execute code at address zero */
}

Executing code at address zero is not wrong in itself, but that is where the vector table is located, so what gets fetched there is data and likely not a legal instruction.

Another example is the one below, which tries to write 10 to address zero: on most ARM Cortex-M devices the vector table at address zero is in FLASH memory, so writing to that ROM is likely to fail and to cause a hard fault too:

void write_to_rom(void) {
  *((int*)0x0) = 10; /* tries to write to address zero */
}

The problem is: how to find the offending position in the code? The hard fault handler does not provide any help yet. But this application note link gives more details and explains that a lot of information about the fault itself is stored in the system.

What makes things a lot easier is to use a custom handler.

Simple PC Handler

A very minimalistic handler just provides the offending PC (Program Counter position). I’m using the syntax for ARM gcc here (as used with CodeWarrior for MCU10.3 and the KL25Z Freedom board), but it can easily be adapted to any other compiler.

An easy method is to replace the Processor Expert generated code for the Hard_Fault handler with the following one:

__attribute__((naked))
PE_ISR(Cpu_ivINT_Hard_Fault)
{
  __asm volatile (
    " movs r0,#4                  \n"  /* load bit mask into R0 */
    " mov r1, lr                  \n"  /* load link register into R1 */
    " tst r0, r1                  \n"  /* compare with bitmask */
    " beq _MSP                    \n"  /* bit 2 clear: the stack frame is on MSP, so branch */
    " mrs r0, psp                 \n"  /* bit 2 set: the stack frame is on PSP, load it into R0 */
    " b _HALT                    \n"  /* go to part which loads the PC */
  "_MSP:                          \n"  /* stack pointer is in MSP register */
    " mrs r0, msp                 \n"  /* load stack pointer into R0 */
  "_HALT:                        \n"  /* find out where the hard fault happened */
    " ldr r1,[r0,#24]             \n"  /* load program counter into R1. R1 contains address of the next instruction where the hard fault happened */
    " bkpt #0                     \n" /* cause the debugger to stop */
  );
}

The assembly code checks which stack was in use (MSP or PSP) by testing bit 2 of the value in LR, and then loads the stacked PC from byte offset 24 of the stack frame into register R1. So R1 will contain the code address where the problem happened.
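To make these two steps more concrete, here is a minimal host-side sketch in C of what the assembly does. The names ExceptionFrame, select_frame and stacked_pc are made up for this illustration and are not part of Processor Expert, and the layout assumes the basic eight-word stack frame without floating point state:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Registers pushed by the core on exception entry (basic frame, no FPU state).
 * Hypothetical type name, used only for this illustration. */
typedef struct {
  uint32_t r0, r1, r2, r3, r12, lr, pc, psr;
} ExceptionFrame;

/* The handler reads the stacked PC with "ldr r1,[r0,#24]". */
static_assert(offsetof(ExceptionFrame, pc) == 24, "stacked PC is at byte offset 24");

/* Bit 2 of the EXC_RETURN value in LR: 0 = frame on MSP, 1 = frame on PSP. */
#define EXC_RETURN_STACK_MASK UINT32_C(0x4)

/* Pick the stack frame the same way the "tst"/"beq _MSP" pair does. */
static const ExceptionFrame *select_frame(uint32_t exc_return,
                                          const ExceptionFrame *msp,
                                          const ExceptionFrame *psp)
{
  return ((exc_return & EXC_RETURN_STACK_MASK) != 0u) ? psp : msp;
}

/* Return the program counter that was stacked when the fault was taken. */
static uint32_t stacked_pc(uint32_t exc_return,
                           const ExceptionFrame *msp,
                           const ExceptionFrame *psp)
{
  const ExceptionFrame *frame = select_frame(exc_return, msp, psp);
  assert(frame != NULL);
  return frame->pc;
}
```

With an EXC_RETURN value of 0xFFFFFFF9 (bit 2 clear) the frame is taken from MSP, and with 0xFFFFFFFD (bit 2 set) from PSP.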

Entering that address in the Disassembly View jumps to that position. I just need to keep in mind that the program counter is *after* the problem, and that the program counter has an odd address for ARM Thumb code. So for my example here the problem is caused by the instruction at address 0x608.
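The address arithmetic can be sketched as below. This is only an illustration under the assumptions stated in the text (bit 0 may be set, and the value lies just past a 16-bit Thumb instruction); the helper names are hypothetical. With a 32-bit instruction the start would be 4 bytes back, so the disassembly has the final word, not the arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Clear bit 0, which marks Thumb code and is not part of the address. */
static uint32_t thumb_address(uint32_t pc)
{
  return pc & ~UINT32_C(1);
}

/* Address of the 16-bit halfword just before the (masked) program counter.
 * Assumes the offending instruction is a 16-bit Thumb instruction. */
static uint32_t previous_halfword(uint32_t pc)
{
  uint32_t addr = thumb_address(pc);
  assert(addr >= 2u); /* avoid wrapping around below address zero */
  return addr - 2u;
}
```

For instance, a hypothetical R1 value of 0x60B would give 0x60A after masking and 0x608 as the previous halfword.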

Extended Handler

The handler can be extended so it shows as well the other registers stored on the stack:

/**
 * This is called from the HardFaultHandler with a pointer to the fault stack
 * as the parameter. We can then read the values from the stack and place them
 * into local variables for ease of reading.
 * We then read the various Fault Status and Address Registers to help decode
 * cause of the fault.
 * The function ends with a BKPT instruction to force control back into the debugger
 */
#pragma GCC diagnostic ignored "-Wunused-but-set-variable"
void McuHardFault_HandlerC(uint32_t *hardfault_args)
{
  /*lint -save  -e550 Symbol not accessed. */
  static volatile unsigned long stacked_r0;
  static volatile unsigned long stacked_r1;
  static volatile unsigned long stacked_r2;
  static volatile unsigned long stacked_r3;
  static volatile unsigned long stacked_r12;
  static volatile unsigned long stacked_lr;
  static volatile unsigned long stacked_pc;
  static volatile unsigned long stacked_psr;
  static volatile unsigned long _CFSR;
  static volatile unsigned long _HFSR;
  static volatile unsigned long _DFSR;
  static volatile unsigned long _AFSR;
  static volatile unsigned long _BFAR;
  static volatile unsigned long _MMAR;
  stacked_r0 = ((unsigned long)hardfault_args[0]);          /* http://www.asciiworld.com/-Smiley,20-.html                                   */
  stacked_r1 = ((unsigned long)hardfault_args[1]);          /*                         oooo$$$$$$$$$$$$oooo                                 */
  stacked_r2 = ((unsigned long)hardfault_args[2]);          /*                      oo$$$$$$$$$$$$$$$$$$$$$$$$o                             */
  stacked_r3 = ((unsigned long)hardfault_args[3]);          /*                    oo$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$o         o$   $$ o$      */
  stacked_r12 = ((unsigned long)hardfault_args[4]);         /*    o $ oo        o$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$o       $$ $$ $$o$     */
  stacked_lr = ((unsigned long)hardfault_args[5]);          /* oo $ $ "$      o$$$$$$$$$    $$$$$$$$$$$$$    $$$$$$$$$o       $$$o$$o$      */
  stacked_pc = ((unsigned long)hardfault_args[6]);          /* "$$$$$$o$     o$$$$$$$$$      $$$$$$$$$$$      $$$$$$$$$$o    $$$$$$$$       */
  stacked_psr = ((unsigned long)hardfault_args[7]);         /*   $$$$$$$    $$$$$$$$$$$      $$$$$$$$$$$      $$$$$$$$$$$$$$$$$$$$$$$       */
                                                            /*   $$$$$$$$$$$$$$$$$$$$$$$    $$$$$$$$$$$$$    $$$$$$$$$$$$$$  """$$$         */
  /* Configurable Fault Status Register */                  /*    "$$$""""$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$     "$$$        */
  /* Consists of MMSR, BFSR and UFSR */                     /*     $$$   o$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$     "$$$o      */
  _CFSR = (*((volatile unsigned long *)(0xE000ED28)));      /*    o$$"   $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$       $$$o     */
                                                            /*    $$$    $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$" "$$$$$$ooooo$$$$o   */
  /* Hard Fault Status Register */                          /*   o$$$oooo$$$$$  $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$   o$$$$$$$$$$$$$$$$$  */
  _HFSR = (*((volatile unsigned long *)(0xE000ED2C)));      /*   $$$$$$$$"$$$$   $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$     $$$$""""""""        */
                                                            /*  """"       $$$$    "$$$$$$$$$$$$$$$$$$$$$$$$$$$$"      o$$$                 */
  /* Debug Fault Status Register */                         /*             "$$$o     """$$$$$$$$$$$$$$$$$$"$$"         $$$                  */
  _DFSR = (*((volatile unsigned long *)(0xE000ED30)));      /*               $$$o          "$$""$$$$$$""""           o$$$                   */
                                                            /*                $$$$o                                o$$$"                    */
  /* Auxiliary Fault Status Register */                     /*                 "$$$$o      o$$$$$$o"$$$$o        o$$$$                      */
  _AFSR = (*((volatile unsigned long *)(0xE000ED3C)));      /*                   "$$$$$oo     ""$$$$o$$$$$o   o$$$$""                       */
                                                            /*                      ""$$$$$oooo  "$$$o$$$$$$$$$"""                          */
                                                            /*                         ""$$$$$$$oo $$$$$$$$$$                               */
  /* Read the Fault Address Registers. */                   /*                                 """"$$$$$$$$$$$                              */
  /* These may not contain valid values. */                 /*                                     $$$$$$$$$$$$                             */
  /* Check BFARVALID/MMARVALID to see */                    /*                                      $$$$$$$$$$"                             */
  /* if they are valid values */                            /*                                       "$$$""                                 */
  /* MemManage Fault Address Register */
  _MMAR = (*((volatile unsigned long *)(0xE000ED34)));
  /* Bus Fault Address Register */
  _BFAR = (*((volatile unsigned long *)(0xE000ED38)));

#if 0 /* experimental, seems not to work properly with GDB in KDS V3.2.0 */
#ifdef __GNUC__ /* might improve stack, see https://www.element14.com/community/message/199113/l/gdb-assisted-debugging-of-hard-faults#199113 */
  __asm volatile (
      "tst lr,#4     \n" /* check which stack pointer we are using */
      "ite eq        \n"
      "mrseq r0, msp \n" /* use MSP */
      "mrsne r0, psp \n" /* use PSP */
      "mov sp, r0    \n" /* set stack pointer so GDB shows proper stack frame */
  );
#endif
#endif
  __asm("BKPT #0\n") ; /* cause the debugger to stop */
  /*lint -restore */
}

/*
** ===================================================================
**     Method      :  HardFaultHandler (component HardFault)
**
**     Description :
**         Hard Fault Handler
**     Parameters  : None
**     Returns     : Nothing
** ===================================================================
*/
#pragma GCC diagnostic ignored "-Wunused-but-set-variable"
__attribute__((naked))
#if McuLib_CONFIG_SDK_VERSION_USED==McuLib_CONFIG_SDK_RPI_PICO
void isr_hardfault(void)
#elif McuLib_CONFIG_SDK_VERSION_USED != McuLib_CONFIG_SDK_PROCESSOR_EXPERT
void HardFault_Handler(void)
#else
void McuHardFault_HardFaultHandler(void)
#endif
{
  __asm volatile (
    ".syntax unified              \n"  /* needed for the 'adds r1,#2' below */
    " movs r0,#4                  \n"  /* load bit mask into R0 */
    " mov r1, lr                  \n"  /* load link register into R1 */
    " tst r0, r1                  \n"  /* compare with bitmask */
    " beq _MSP                    \n"  /* bit 2 clear: the stack frame is on MSP, so branch */
    " mrs r0, psp                 \n"  /* bit 2 set: the stack frame is on PSP, load it into R0 */
    " b _GetPC                    \n"  /* go to part which loads the PC */
  "_MSP:                          \n"  /* stack pointer is in MSP register */
    " mrs r0, msp                 \n"  /* load stack pointer into R0 */
  "_GetPC:                        \n"  /* find out where the hard fault happened */
    " ldr r1,[r0,#24]             \n"  /* load program counter into R1. R1 contains address of the next instruction where the hard fault happened */
#if McuHardFault_CONFIG_SETTING_SEMIHOSTING
  /* The following code checks if the hard fault is caused by a semihosting BKPT instruction which is "BKPT 0xAB" (opcode: 0xBEAB)
     The idea is taken from the MCUXpresso IDE/SDK code, so credits and kudos to the MCUXpresso IDE team! */
    " ldrh r2,[r1]                \n"  /* load opcode causing the fault */
    " ldr r3,=0xBEAB              \n"  /* load constant 0xBEAB (BKPT 0xAB) into R3" */
    " cmp r2,r3                   \n"  /* is it the BKPT 0xAB? */
    " beq _SemihostReturn         \n"  /* if yes, return from semihosting */
    " b McuHardFault_HandlerC   \n"  /* if no, dump the register values and halt the system */
  "_SemihostReturn:               \n"  /* returning from semihosting fault */
    " adds r1,#2                  \n"  /* r1 points to the semihosting BKPT instruction. Adjust the PC to skip it (2 bytes) */
    " str r1,[r0,#24]             \n"  /* store back the adjusted PC value to the interrupt stack frame */
    " movs r1,#32                 \n"  /* need to pass back a return value to emulate a successful semihosting operation. 32 is an arbitrary value */
    " str r1,[r0,#0]              \n"  /* store the return value on the stack frame */
    " bx lr                       \n"  /* return from the exception handler back to the application */
#else
    " b McuHardFault_HandlerC   \n"  /* decode more information. R0 contains pointer to stack frame */
#endif
  );
}

This will store all the stacked registers and the fault status registers into variables I can inspect in the debugger.
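Once the values are captured, the status bits tell which of them are meaningful. The following is a minimal sketch of such a decode, assuming the ARMv7-M bit positions (MMARVALID is bit 7 and BFARVALID is bit 15 of CFSR, FORCED is bit 30 of HFSR). The function names are hypothetical and not part of the handler above. To my knowledge, ARMv6-M cores such as the Cortex-M0+ do not implement these status registers, so on those parts only the stacked registers are useful:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions as defined for ARMv7-M (Cortex-M3/M4/M7). */
#define CFSR_MMARVALID (UINT32_C(1) << 7)   /* MMAR holds a valid fault address */
#define CFSR_BFARVALID (UINT32_C(1) << 15)  /* BFAR holds a valid fault address */
#define HFSR_FORCED    (UINT32_C(1) << 30)  /* another fault was escalated to a hard fault */

/* True if the hard fault is an escalated configurable fault,
 * in which case CFSR describes the original cause. */
static bool is_escalated(uint32_t hfsr)
{
  return (hfsr & HFSR_FORCED) != 0u;
}

/* Store the faulting data address in *address if one of the VALID bits is set.
 * Returns false if neither BFAR nor MMAR is valid. */
static bool fault_address(uint32_t cfsr, uint32_t mmar, uint32_t bfar,
                          uint32_t *address)
{
  assert(address != NULL);
  if ((cfsr & CFSR_BFARVALID) != 0u) {
    *address = bfar;
    return true;
  }
  if ((cfsr & CFSR_MMARVALID) != 0u) {
    *address = mmar;
    return true;
  }
  return false;
}
```

In the handler above, the inputs would be the captured _CFSR, _HFSR, _MMAR and _BFAR values.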

Summary

With a custom hard fault handler in place, such problems get a lot easier to solve. So I’m adding that custom handler to my Processor Expert projects to find out what is causing the problem. The only small issue with the above approach is that Processor Expert will overwrite my handlers/modifications in Cpu.c if I do not disable code generation for it. That problem could be solved with a custom handler in the Processor Expert settings. If there is interest in how to do this: post a comment 🙂

Happy Faulting 🙂

34 thoughts on “Debugging Hard Faults on ARM Cortex-M”

Pingback: ARM Cortex-M0+ Interrupts and FreeRTOS | MCU on Eclipse
Gabriel on December 27, 2012 at 06:10 said:

Great write up! Thanks!

Pingback: A Processor Expert Component to Help with Hard Faults | MCU on Eclipse
Pingback: A new Freedom Board: FRDM-KL05Z | MCU on Eclipse
pradyumna on July 26, 2013 at 10:15 said:

experiencing hard fault with KL05 I’m using Keil how do i solve it? i’m doing a bare board project

- Erich Styger on July 26, 2013 at 10:28 said:
  
  Hi,
  I have added Keil support in the HardFault component on GitHub. At the time of writing this article it was not done yet, but now it is present. Are you using the latest one?
  
  - pradyumna on July 26, 2013 at 12:31 said:
    
    if you are talking about Keil i have the latest MDK otherwise i did not get you 😐
    
    how to add this hard fault component to my existing project ?
    
    - Erich Styger on July 26, 2013 at 12:36 said:
      
      How to use Processor Expert with Keil: https://mcuoneclipse.com/2013/06/28/using-keil-%C2%B5vision-arm-mdk-with-processor-expert-driver-suite/
      How to add my additional components to Processor Expert: https://mcuoneclipse.com/2013/05/09/processor-expert-component-peupd-files-on-github/
      I hope this helps.
      
  - pradyumna on July 26, 2013 at 12:48 said:
    
    thank you so much for your reply …
    will check n reply
    thank you
    
  - pradyumna on July 26, 2013 at 15:23 said:
    
    I could import my Keil project into Eclipse. How do I add this hard fault component? What is the procedure to generate code into the existing API, so I can try solving my existing hard fault?
    
    thankyou
    
Dan Quist on August 15, 2013 at 18:41 said:

This is great stuff, thank you!

Pingback: Tutorial: DIY Kinetis SDK Project with Eclipse – Board Configuration | MCU on Eclipse
Sergi on November 4, 2014 at 10:10 said:

Hi Erich.
I have problems of Hard Fault with a KL04.
I use the Kinetis Design Studio 1.1.1.
Using this post, the problem is in function “zero_fill_bss()”.

I don’t know how fix this problem.
Can you help me?
Thank’s.

- Erich Styger on November 4, 2014 at 10:36 said:
  
  Hard to say, but I guess you have hit that heap size problem with the tool chain in KDS. See https://mcuoneclipse.com/2014/07/11/switching-arm-gnu-tool-chain-and-libraries-in-kinetis-design-studio/. I recommend that you go into the CPU properties > Build options and increase the heap size to, say, 0x400 (under Generate Linker file).
  I hope this helps,
  Erich
  
  - Sergi on November 5, 2014 at 08:51 said:
    
    Hi Erich.
    I found the solution in your reply on the Freescale forum.
    
    Thank’s.
    Sergi
    
    - Erich Styger on November 5, 2014 at 08:53 said:
      
      Ah, I did not think that it could be that memory/linker map issue. Sorry that I did not think of this one 😦
      
William Junqueira on March 4, 2015 at 17:47 said:

Hi Erich,

I’m trying to follow your steps but I’m facing a problem that I can’t figure out how to solve. I just copied the suggested code over the one generated by Processor Expert for the function PE_ISR(Cpu_ivINT_Hard_Fault), but the compiler says the labels _MSP and _HALT are already defined in the file ccTv4L8N.s, for which I cannot find a source file in my project. Could you please help me to solve that issue? Btw, my CodeWarrior is 10.5.

Thanks in advance

- Erich Styger on March 4, 2015 at 19:39 said:
  
  Hi William,
  no idea where this file ccTv4L8N.s comes from. Can you check whether the linker map file provides any clue? Or the console command line output?
  Other than that, I have a Processor Expert component for this too: https://mcuoneclipse.com/2012/12/28/a-processor-expert-component-to-help-with-hard-faults/
  
  - William Junqueira on March 9, 2015 at 15:54 said:
    
    Hey Erich, I just solved this problem changing the option for the vectors handlers back to one handler for all. I don’t know whether this problem is going to happen again. If so, I’m taking your tips.
    
    Thank you!
    
Mike on May 6, 2015 at 22:35 said:

Nice info. But my registers tab is blank, what could cause this? I looked at the registers tab before, no problems. Strange…

- Mike on May 6, 2015 at 22:45 said:
  
  I should add I’m using KDS 2.0.0
  
Mike on May 7, 2015 at 00:02 said:

I figured out the problem above (not seeing my registers). But now the problem I’m having is the inline assembly code that you put above is not being executed. The debugger just skips over it. I

- Erich Styger on May 7, 2015 at 13:08 said:
  
  for asm() blocks, you need to perform assembly/instruction stepping.
  
Pingback: Debugging ARM Cortex-M Hard Faults with GDB Custom Command | MCU on Eclipse
Phil on March 14, 2016 at 15:16 said:

Firstly, thanks for an excellent article. A great resource of information.

I’ve included the full custom hardfault handler but (as noted in the ‘Summary’), I’m now having this issue:

> The only small issue with above approach is that Processor Expert will overwrite my handlers/modifications in Cpu.c, if I do not disable code generation for it. That problem could be solved with a custom handler in the Processor Expert settings. If there is interest about how to do this: post a comment

How do we get around this?
I’ve tried changing the processor expert settings, but when I recreate the PE code the function of course reverts back to the default. I’ve tried altering the code but if I put a function call in to the ISR that will then affect the registers that I want to inspect.
Any guidance you can offer is gratefully received.

- Erich Styger on March 16, 2016 at 12:42 said:
  
  Hi Phil,
  have you seen the Processor Expert component to help with hard faults (https://mcuoneclipse.com/2012/12/28/a-processor-expert-component-to-help-with-hard-faults/)? With this one it will automatically add the correct vector table entry.
  If you would like to get control over the files generated, you can disable code generation say for the CPU component, see https://mcuoneclipse.com/2012/03/23/disable-my-code-generation/.
  I hope this can help you?
  
Pingback: ARM Cortex-M, Interrupts and FreeRTOS: Part 1 | MCU on Eclipse
Pingback: Debugging ARM Cortex-M0+ HardFaults | MCU on Eclipse
Pingback: Cortex-M – Debugging runtime memory corruption – LB9MG
Sudeep Chandrasekaran on October 11, 2018 at 10:56 said:

Hi Erich,

This write up is super helpful. I am not clear with the assembly code used to call the default handler . Especially the below four lines:

" movs r0,#4 \n"
" movs r1, lr \n"
" tst r0, r1 \n"
" beq _MSP \n"

From what I understand LR is checked for 3rd bit set(4 byte alignment). How does this tell us which stack(PSP/ MSP) we are using ?

Thanks

- Erich Styger on October 11, 2018 at 11:09 said:
  
  TST does a bitwise AND operation, so 4 is bit #2. See table 1 in https://www.embeddedrelated.com/showarticle/912.php
  I hope this helps?
  
Rudi Gerrits on October 12, 2018 at 09:15 said:

Hi, I see you are reading from offset 20 in the pushed stack frame. This is the pushed LR. Why don’t you read from offset 24, which is the pushed PC?

- Erich Styger on October 12, 2018 at 20:16 said:
  
  You will need either the PC or the LR to find out where the exception happened, depending on how that function was called.
  
Pingback: Tutorial: Adding FreeRTOS to where there is no FreeRTOS | MCU on Eclipse