In “A Processor Expert Component to Help with Hard Faults” I use a C handler with some assembly code, created with Processor Expert, to help me debug hard faults on ARM Cortex-M. Inspired by a GNU gdb script here, I now have an alternative. Because this approach uses the GDB command line, it works both with an Eclipse GUI and with GDB in command-line mode only :-).
The idea is:
- Set a breakpoint in the hard fault exception handler
- When a hard fault occurs, the CPU will call the hard fault exception handler, and the debugger will stop the target
- Execute the ‘armex’ (ARM Exception) script/command in GDB to dump the stacked registers to show the program counter where the problem happened.
.gdbinit Script
There are several ways to extend GDB with your own commands. One easy way is to add the extra functions to the .gdbinit script, which GDB loads on startup.
I have added the following to my .gdbinit file to define my ‘armex’ command:
define armex
  printf "EXEC_RETURN (LR):\n"
  info registers $lr
  if (($lr & 0x4) == 0x4)
    printf "Uses PSP 0x%x return.\n", $psp
    set $armex_base = $psp
  else
    printf "Uses MSP 0x%x return.\n", $msp
    set $armex_base = $msp
  end
  printf "xPSR            0x%x\n", *($armex_base+28)
  printf "ReturnAddress   0x%x\n", *($armex_base+24)
  printf "LR (R14)        0x%x\n", *($armex_base+20)
  printf "R12             0x%x\n", *($armex_base+16)
  printf "R3              0x%x\n", *($armex_base+12)
  printf "R2              0x%x\n", *($armex_base+8)
  printf "R1              0x%x\n", *($armex_base+4)
  printf "R0              0x%x\n", *($armex_base)
  printf "Return instruction:\n"
  x/i *($armex_base+24)
  printf "LR instruction:\n"
  x/i *($armex_base+20)
end

document armex
ARMv7 Exception entry behavior.
xPSR, ReturnAddress, LR (R14), R12, R3, R2, R1, and R0
end
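The offsets used by the script follow the basic exception stack frame that a Cortex-M core pushes on exception entry. As a minimal sketch in C, the same layout can be written down and checked at compile time. The type name armex_frame_t and its field names are made up for illustration, and a basic frame without floating-point context is assumed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: the eight words stacked on exception entry
 * (basic frame, no FPU context), lowest address first. */
typedef struct {
  uint32_t r0;             /* base + 0  */
  uint32_t r1;             /* base + 4  */
  uint32_t r2;             /* base + 8  */
  uint32_t r3;             /* base + 12 */
  uint32_t r12;            /* base + 16 */
  uint32_t lr;             /* base + 20, LR (R14) of the interrupted code */
  uint32_t return_address; /* base + 24, where execution would resume */
  uint32_t xpsr;           /* base + 28 */
} armex_frame_t;

/* These mirror the offsets the armex script adds to $armex_base. */
static_assert(offsetof(armex_frame_t, r12) == 16, "R12 offset");
static_assert(offsetof(armex_frame_t, lr) == 20, "LR offset");
static_assert(offsetof(armex_frame_t, return_address) == 24, "ReturnAddress offset");
static_assert(offsetof(armex_frame_t, xpsr) == 28, "xPSR offset");
```

Reading the script next to this struct makes it easier to see which stacked word each printf line dumps.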
You can place the .gdbinit file anywhere. I have it placed where my gdb is located inside the Freescale Kinetis Design Studio (C:\Freescale\KDS_3.0.0\toolchain\bin).
To make sure GDB finds the .gdbinit, I specify the path to it in the Eclipse workspace preferences:
Debugging Hard Fault
To debug a hard fault, I set a breakpoint in my hard fault interrupt handler to stop the debugger when the fault happens:
To find out where the problem occurred, I now use the ‘armex’ command in the gdb console:
💡 Use the ‘triangle’ menu of the console to switch to the arm-none-eabi-gdb view
The armex command lists the stacked registers (same as with my handler shown in “Debugging Hard Faults on ARM Cortex-M“). The important information is either the return instruction or the LR instruction information. I can enter that address in the disassembly view to find out where the problem happened:
In the above example, the stacked LR (Link Register) was 0xbd2 (0xbd3 with the Thumb bit set). In the disassembly view this is the address the code would have returned to after its last branch-with-link, so the problem must be just before that. Checking the assembly code, there is an indirect branch through a register:
blx r3
The stacked register shows
R3 0x0
This causes the hard fault. If the problem is not that clear, then simply set a breakpoint around that location and restart the application to debug what happens before the hard fault is triggered. With this, it should hopefully be easy to find and fix the problem.
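The stacked LR carries the Thumb bit in bit 0, so the value to enter in the disassembly view is the one with that bit cleared. A small sketch of that arithmetic in C (the helper name armex_code_address is hypothetical and not part of any tool):

```c
#include <stdint.h>

/* Hypothetical helper: strip the Thumb bit (bit 0) from a stacked
 * LR or return address to get the instruction address. */
static uint32_t armex_code_address(uint32_t stacked_value)
{
  return stacked_value & ~UINT32_C(1);
}
```

For the example above, armex_code_address(0xbd3) gives 0xbd2, the address to look up in the disassembly view.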
Summary
I now have yet another way to debug my hard faults: using my custom gdb command to dump the stacked registers. Compared to my earlier solution, the advantage of this approach is that it does not need any additional resources on the target (no extra handler in the code and no variables). And the added benefit is that I now know how to extend GDB with my own commands :-).
Happy Faulting 🙂
Hello Eric
Try a hard fault handler like
void hard_fault(void)
{
}
and set a breakpoint at the { (which is the return).
Then, when the fault occurs, you step out of the routine (in disassembly view) and you should be at the line of code that faulted, with all register content as it was, showing bad pointers etc., without needing to dump any registers.
Regards
Mark
Hi Mark,
unless I’m really missing the obvious thing: how are you able to step out of a hard fault handler? Yes, I can do this for normal interrupts, but not for a hard fault. So how do you recover from a hard fault?
Eric
A hard fault handler will simply return like any other interrupt handler; the faulting instruction then repeats the error and raises the hard fault exception again. It returns again, faults again, and so on until the watchdog fires.
This means that hard faults are usually very easy to debug because you simply disable the watchdog, and then, when it happens, it results in an endless loop (hard fault, hard fault exception, return, hard fault, hard fault exception, return, hard fault, hard fault except…) and you can then connect the debugger (without resetting), pause and see the instruction causing the problem. As long as the watchdog doesn’t cause a reset, a system can be started without the debugger and left to run (for as long as it takes) until such a fault occurs – then carefully connect the debugger without causing a reset, pause, see the problem and solve it. In some cases it is even possible to manually correct a pointer or such and let the system recover (until it maybe fails again later).
This works well with IAR as long as you step out of the hard fault in disassembly mode (not in source code view, since the debugger gets confused), and I think that KDS manages it too.
Regards
Mark
Additional comment – it is only the core error that is unrecoverable (eg. a hard fault with an invalid handler), which will result in a reset.
Regards
Mark
Hi Erich Styger,
In my application the hard fault happens before even entering main. In the disassembly view I get the address, but it does not show the C source.
My question is: how should we debug startup and initialization code before main? Can we put a breakpoint before main?
So it crashes inside the library. You could use the program counter and check in the linker map file which function it is.
You would need to rebuild the library (which is not easy): https://mcuoneclipse.com/2014/08/23/gnu-libs-with-debug-information-rebuilding-the-gnu-arm-libraries/
But I think it is crashing for you in the malloc for the libraries. Have a look at this one:
https://mcuoneclipse.com/2014/03/16/freertos-malloc-and-sp-check-with-gnu-tools/
Maybe this helps?
And yes, you could put breakpoints before main, but if you don’t have debug information, then you need to do assembly stepping.
Erich
Thanks, I will go through it.
I’m trying to debug a hard fault on a KE06 MCU in KDS in Windows. Is there any chance that you could help with that? The PE Hard Fault component wants to be in the .text portion of memory, and my project doesn’t have that. When I define it, it doesn’t like the leading period.
I then tried the steps here, but I can’t create a .gdbinit because Windows also doesn’t like the preceding period. The menus are all different in KDS, too.
These look like very helpful tips to help with hard faults, but sadly they are difficult to implement in KDS.
Hi Dan,
I’m using KDS (v3.x) too. And the screenshots in this post are from KDS too, so I’m not sure why your menus etc. are different?
Yes, you cannot rename a file to name with the dot at the beginning with the Windows Explorer :-(. Simply use the DOS shell (command prompt) for this.
As for “The PE Hard Fault component wants to be in the .text portion of memory, and my project doesn’t have that.”, I’m not sure what your problem is either. I have created a Processor Expert project (no SDK!) for the KE06Z128, and I can add the HardFault component without any problems. And the component has no requirement on the text section or section name, as far as I can tell. What am I missing?
I hope that this is of some help?
Thanks for this information. There are a couple problems: gdb parses “$lr & 0x4 == 0x4” as “$lr & (0x4 == 0x4)” which tests the wrong condition. With the common EXC_RETURN value 0xfffffff9 this obscured the fact that when “$lr & 0x04” is non-zero the active stack is PSP, not MSP. The correct check to handle all return modes is:
if (0x04 == ($lr & 0x04))
set $armex_base = $psp
else
set $armex_base = $msp
end
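GDB evaluates expressions with C operator precedence here, so the pitfall can be reproduced in plain C. The following is a small sketch with made-up function names: the first function computes what the unparenthesized test really evaluates to, the second is the intended check of EXC_RETURN bit 2.

```c
#include <stdbool.h>
#include <stdint.h>

/* What "lr & 0x4 == 0x4" really computes: == binds tighter than &,
 * so the expression is lr & (0x4 == 0x4), i.e. lr & 1. */
static bool armex_uses_psp_buggy(uint32_t lr)
{
  return (lr & (0x4 == 0x4)) != 0;
}

/* Intended check: EXC_RETURN bit 2 set means the frame is on PSP. */
static bool armex_uses_psp(uint32_t lr)
{
  return (lr & 0x4u) == 0x4u;
}
```

With 0xfffffff9 (return to thread mode using MSP) and 0xfffffffd (return to thread mode using PSP) the buggy version yields true in both cases because bit 0 is set in both values, while the corrected version tells them apart.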
hi pabigot,
indeed, thanks for pointing this out. I have used this approach mostly for bare metal applications, so did not see that bug. I have now updated the code, thank you!
Erich
Hi Erich,
I wrote a small write-up on a method which I didn’t see discussed anywhere:
https://www.element14.com/community/message/199113/l/gdb-assisted-debugging-of-hard-faults#199113
what do you think about this approach?
Hi Peter,
interesting approach! I quickly tried setting the SP based on MSP or PSP in a FreeRTOS thread, but somehow it did not show the stack for me. Not sure what was wrong. I have added your code sequence to my HardFault handler, but have it disabled for now. I’m using the GDB which is in KDS V3.2.0 (GNU gdb (GNU Tools for ARM Embedded Processors) 7.6.0.20140731-cvs). Are you using a different version?
I’m not using the toolchain coming with KDS
my versions:
gdb 7.8.2
gcc 4.9.3
I extracted the KDS 3.2.0 installer and used the bundled gdb (7.6.0, as you said).
but it also displays the thread when the hardfault handler is active.
I did a build run using the gcc toolchain supplied with KDS 3.2
now the call stacks with the Segger Thread Awareness are not working anymore, and I also don’t see the call stack when I trigger an intentional hard fault.
Interesting, so it depends on the gcc toolchain? Segger Thread Awareness works mostly (not always) for that combination.
I just compiled with the gcc launchpad toolchain.
stack view also works. but only when I use gdb 7.8
the bundled 7.10 does not play well with my GNU ARM Eclipse Plugin or Segger GDBServer.
Arne from Segger shares my motivation on this topic.
the quality of embedded systems debugging is about to be improved again 🙂
http://forum.segger.com/index.php?page=Thread&postID=11390#post11390
That’s indeed good news! Hopefully this will be in beta drop/release soon 🙂
Hi, Erich! How do you create the .gdbinit file in Windows Explorer? What kind of file is this? Is it a .exe file?
Hi Marco,
.gdbinit is a text file with gdb commands and settings in it. You cannot create a file with a dot at the beginning with the Windows Explorer. Use the Windows command shell (command prompt, cmd.exe) for this, e.g. create a dummy.txt and then rename it with ‘rename dummy.txt .gdbinit’
I hope this helps,
Erich
Thank you, Erich
I followed your advice and now my .gdbinit file is created. I still haven’t faced a Hard Fault condition to test your tool, but sometimes such a situation occurs to some customers and it is hard to find out what caused the Hard Fault condition. And this certainly will be a great tool to help me in those cases.
Hi Marco,
it depends a bit on the core you are using, but common things to trigger a hard fault are for example
*(int*)0x0 = 1;
e.g. writing to read only memory.
Hi, Erich
Thanks for your response. Such an instruction really causes a Hard Fault condition, but after typing the command “armex” in Console, I received the following:
armex
EXEC_RETURN (LR):
Value can’t be converted to integer.
lr 0xfffffff9 -7
Uses MSP 0x
I don’t know what could be causing such a problem. Can you throw some light on that, please?
I tested with a firmware in KDS 3.2 and FreeRTOS on a FRDM-K64F kit.
Hi Marco,
‘armex’ does not ring any bell for me. Which console are you talking about?
But after a hard fault you have to reboot the device.
Hi, Erich
The Console I am using is “[GDB PEMicro Interface Debugging] arm-none-eabi-gdb”. And the interface used is the on board P&E Open-SDA built on FRDM-K64F kit.
I just set a breakpoint at the beginning of “HardFault_Handler”, ran the code with that illegal memory access and received the error message I described in my previous message.
Hi Marco,
I suggest giving that special HardFault handler a chance: for me it solved nearly every hard fault problem because it pointed out the location where it happened.
Hi, Erich
I don’t know why “armex” didn’t work for me.
I will try the other Hard Fault debugging solution you posted before. In that solution (Simple PC Handler), you suggest to replace the default code with:
__asm volatile (
" movs r0,#4 \n"
" movs r1, lr \n"
" tst r0, r1 \n"
" beq _MSP \n"
" mrs r0, psp \n"
" b _HALT \n"
"_MSP: \n"
" mrs r0, msp \n"
"_HALT: \n"
" ldr r1,[r0,#20] \n"
" bkpt #0 \n"
);
In my code (that uses KSDK 2 and FreeRTOS, but not Processor Expert), my Hard Fault Handler is already in assembly language and looks like this:
HardFault_Handler:
ldr r0,=HardFault_Handler
bx r0
.size HardFault_Handler, . - HardFault_Handler
.align 1
.thumb_func
.weak SVC_Handler
.type SVC_Handler, %function
SVC_Handler:
ldr r0,=SVC_Handler
bx r0
.size SVC_Handler, . - SVC_Handler
.align 1
.thumb_func
.weak PendSV_Handler
.type PendSV_Handler, %function
My question: is it necessary to change something or add more code in addition to yours? Sorry for the question. I’m not very used to assembly language, especially on ARM MCUs.
Hi Marco,
your handlers basically do nothing, so you won’t easily get the information you need (what is causing the hard fault). The handler discussed in this article does extra work, like getting the values from the stack and storing them in local variables so they can be easily inspected in the debugger. Of course, if you are able to do that manually (looking at the stack and memory), you don’t need that, but this might not be an easy task for everyone. So if you would like to see the information in the debugger, then you should use such a handler.
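To illustrate the idea of copying the stacked values into variables that are easy to inspect, here is a minimal sketch in C. It is not the handler from the article: the names fault_snapshot_t, fault_snapshot and fault_capture are hypothetical, and the function assumes it is handed a pointer to the basic (non-FPU) exception stack frame, for example by a small assembly stub.

```c
#include <stdint.h>

/* Hypothetical snapshot of the eight stacked words. */
typedef struct {
  uint32_t r0, r1, r2, r3, r12, lr, pc, psr;
} fault_snapshot_t;

/* Global and volatile so the copy is not optimized away and can be
 * inspected in the debugger after the fault. */
volatile fault_snapshot_t fault_snapshot;

/* 'frame' is assumed to point at the stacked R0 (the MSP or PSP
 * value at exception entry). */
void fault_capture(const uint32_t *frame)
{
  fault_snapshot.r0  = frame[0];
  fault_snapshot.r1  = frame[1];
  fault_snapshot.r2  = frame[2];
  fault_snapshot.r3  = frame[3];
  fault_snapshot.r12 = frame[4];
  fault_snapshot.lr  = frame[5];
  fault_snapshot.pc  = frame[6];
  fault_snapshot.psr = frame[7];
}
```

On a target, a breakpoint placed after the call would let the debugger show fault_snapshot.pc and fault_snapshot.lr directly in the variables view.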
Hi, Erich
As I told you previously, “armex” didn’t work for me. So, the last chance I have is to try the Extended Handler posted on https://mcuoneclipse.com/2012/11/24/debugging-hard-faults-on-arm-cortex-m/
But I have a problem. As my project does not use Processor Expert, HardFault Handler is in “startup_MK64F12.S” file and is in assembly language. Do you have a similar solution in assembly language?
Thanks!
Hi, Erich again
I just noticed that you left a comment above the Extended Handler code that says: “This is called from the HardFault_HandlerAsm with a pointer the Fault stack as the parameter”. Can you please explain in more detail how I should make that call in HardFault_HandlerAsm? I suppose “HardFault_HandlerAsm” is the same function we find in the “startup_MK64F12.S” file, isn’t it?
Thanks!
Hi, Erich
I just replaced the code of HardFault Handler(ASM) with the one you suggested, just removing the quotation marks and put “HardFault_HandlerC” function in my main.c file. It worked perfectly!
It really is such a great HardFault Handler debugging tool. The best (maybe the only one) I’ve ever seen!
Thank you very much!
Great! For myself, these kind of things are maybe hard to manage the first time, but it is such a great learning experience (I hope it is the same for you). So keep going that way!
For those having problems with ARMEX, my version of the GDB (GNU Tools for Arm Embedded Processors 7-2017-q4-major 8.0.50.20171128-git ) uses $msp and $psp instead of the upper case $MSP and $PSP. Changing to lower case definitions of these fixes the armex command for me.
Thanks for letting us know. I agree that registers should not be case sensitive, but in that case I always can change them to what is accepted?
Yes, changing to all lower case works for me.
The error ” Value can’t be converted to integer.” people keep getting appears because p $MSP evaluates to void (apparently like any other unknown variable/register) which cannot be converted to integer.
p $msp evaluates to the correct value of the register, and if I send GDB “info registers” it prints all the registers with lower case names, so logically they should always be referred to in lower case?
Thanks for the additional explanation, this absolutely makes sense. Are you going to contribute that back to the gdb sources or have you already done that?
Thanks!
It’s not an issue with the GDB sources, I think; it’s with the .gdbinit you give. I needed to change all instances of $MSP, $PSP to $msp, $psp, e.g.:
if ($lr & (0x4 == 0x4))
printf "Uses MSP 0x%x return.\n", $MSP
set $armex_base = $MSP
Should be
if ($lr & (0x4 == 0x4))
printf "Uses MSP 0x%x return.\n", $msp
set $armex_base = $msp
ah, I misunderstood when you wrote ‘my version of gdb’: I thought you changed the GDB sources to accept both lower and upper case. It is clear now 🙂
Hi Chrizlax & Erich,
Thanks ! It works. Changing all the $MSP and $PSP to lowercase does solve the “Value can’t be converted to integer.” issue.
Now I have more information on the underlying issue. Could someone help me shed some light on this armex report?
“PE_ISR(Unhandled_ivINT_Hard_Fault);
PE_ISR(Unhandled_ivINT_Hard_Fault)
{
PE_DEBUGHALT();
}”
armex
EXEC_RETURN (LR):
lr 0xffffffe9 4294967273
Uses MSP 0x2002fea8 return.
xPSR 0x5e4f
ReturnAddress 0xa
LR (R14) 0x0
R12 0x5e6c
R3 0x2002ffbe
R2 0x0
R1 0x2
R0 0x2002feb0
Return instruction:
0xa : movs r0, r0
LR instruction:
0x0 : movs r0, r0
Sincerely,
Vu
So your application has caused a hard fault (see https://mcuoneclipse.com/2012/11/24/debugging-hard-faults-on-arm-cortex-m/ and https://mcuoneclipse.com/2012/12/28/a-processor-expert-component-to-help-with-hard-faults/).
But your register dump does not give enough information to track down the issue (LR does not show usable information). You have to debug and step through your code to find out where it happens.
It could be an illegal access to memory or accessing a peripheral which is not clocked.
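One more detail that can be read from the dump is the EXC_RETURN value itself. As a hedged sketch in C, assuming an ARMv7-M core with the floating-point extension (for example a Cortex-M4F) and using made-up names, the three low flag bits can be decoded like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of an EXC_RETURN value (ARMv7-M with the
 * floating-point extension assumed). */
typedef struct {
  bool uses_psp;     /* bit 2 set: frame is on the process stack (PSP) */
  bool thread_mode;  /* bit 3 set: exception was taken from thread mode */
  bool fp_frame;     /* bit 4 clear: extended frame with FP context */
} exc_return_info_t;

static exc_return_info_t decode_exc_return(uint32_t lr)
{
  exc_return_info_t info;
  info.uses_psp    = (lr & 0x04u) != 0;
  info.thread_mode = (lr & 0x08u) != 0;
  info.fp_frame    = (lr & 0x10u) == 0;
  return info;
}
```

Under these assumptions, 0xffffffe9 decodes as thread mode, MSP, with floating-point context stacked, while the more common 0xfffffff9 decodes as thread mode, MSP, with the basic frame only.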
Hi Erich,
Thanks for your feedback. I captured more information using your “simple PC handler” along with the GDB custom command. I captured the issue and posted in the below thread.
https://community.nxp.com/message/1283392
Thanks,
Vu
Somehow my post hasn’t shown up in the forum yet. It probably needs approval from the moderator.
Do you mean on the NXP forum? Yes, posts up there need to be approved first.