Murphy’s Law, “Anything that can go wrong, will go wrong”, strikes again. Well, the modified version of it:
“Anything that can go wrong, will go wrong, but it will wait until it really, really goes wrong”.
It is always amazing to see that systems with a fundamental flaw can work for a long period. Only on day X does my application crash. And once I have found the problem, I wonder how in the world it was *ever* working with that bug in it :-(.
I had this kind of experience this week: my ARM Cortex-M0+ application on the Freedom board was working fine for more than 2 months. Then I added some USB CDC support, along with other things, and my application crashed with a hard fault :-(. Good: it crashed consistently. Bad: as soon as I tried to minimize the application, the problem disappeared. And it happened only with higher ARM GNU gcc optimization levels. Thinking about it and debugging it, it seemed clear that it had something to do with interrupt timing and interrupt levels. I was even at the point of suspecting a silicon problem. It turned out I was wrong. As (nearly) always: I created the problem. But it took an incredibly long time to show up. Amazing (and embarrassing).
Cortex-M and Interrupts
The search for the root cause was a good occasion to re-read all the notes about FreeRTOS and ARM Cortex interrupt handling. Notably:
- An article on FreeRTOS.org on how to debug hard faults
- Another article on FreeRTOS.org about Interrupt levels for an RTOS on Cortex-M
The above two articles really had to sink in. And I realized that my FreeRTOS port was not using the right number of available interrupt levels: it assumed 4 interrupt bits (16 levels), which is fine for the ARM Cortex-M4 used in the Kinetis-K. But the Kinetis-L family (ARM Cortex-M0+) only has 2 bits, which means 4 interrupt levels. So configPRIO_BITS in FreeRTOSConfig.h was defined with the wrong number of bits :-(. This is fixed with an additional setting in the Processor Expert FreeRTOS component:
The ARM core/family selection has an impact on the available interrupt levels (Please read again the article about interrupt levels on FreeRTOS.org):
- Lowest Interrupt Priority: This value is informational only: it defines the lowest interrupt level supported by the core (3 for Kinetis-L ARM Cortex-M0+, 15 for Kinetis-K ARM Cortex-M4). Keep in mind that higher numbers mean *lower* priorities for ARM Cortex!
- Library Lowest Interrupt Priority: This is the priority of the RTOS itself (tick timer and performance tick timer). Usually this is also the lowest priority in the system. That way any other interrupt can preempt the RTOS, which keeps interrupt latency low.
- Max SysCall Interrupt Priority: Interrupts with priority *higher* (numerically lower!) than this value shall *NOT* ❗ call any RTOS API routines.
So with the above settings, the RTOS will run at level 3 (lowest level), and any interrupts with level 3, 2 and 1 can call RTOS API routines, while the interrupts with level 0 (highest interrupt priority) shall *not* call any RTOS API routines.
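To make the numbers above concrete, here is a minimal host-side sketch of how a priority level maps into the 8-bit priority field when only the top bits are implemented. The function names are hypothetical and chosen for illustration only; they are not part of FreeRTOS or of the Processor Expert component. Only the bit counts (2 for Cortex-M0+, 4 for Cortex-M4) and the levels come from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not FreeRTOS or Processor Expert API. */

/* Lowest priority (numerically highest level) for a given number of priority bits. */
uint8_t lowest_priority(unsigned prio_bits) {
  assert(prio_bits >= 1u && prio_bits <= 8u);
  return (uint8_t)((1u << prio_bits) - 1u);
}

/* The implemented priority bits are the most significant bits of the 8-bit field. */
uint8_t priority_to_register(uint8_t level, unsigned prio_bits) {
  assert(level <= lowest_priority(prio_bits));
  return (uint8_t)((unsigned)level << (8u - prio_bits));
}

/* Level the hardware actually sees: the unimplemented low bits are ignored. */
uint8_t effective_level(uint8_t reg_value, unsigned hw_prio_bits) {
  assert(hw_prio_bits >= 1u && hw_prio_bits <= 8u);
  return (uint8_t)(reg_value >> (8u - hw_prio_bits));
}

/* An ISR may use the RTOS API only if its level is numerically at or above
 * the Max SysCall Interrupt Priority (that is, at the same or a lower priority). */
int may_call_rtos_api(uint8_t isr_level, uint8_t max_syscall_level) {
  return isr_level >= max_syscall_level;
}
```

With 2 bits, level 3 becomes the register value 0xC0. If the port computes the value for 4 bits instead, level 5 becomes 0x50, which a 2-bit core interprets as level 1: the interrupt silently ends up at a different priority than intended. With a Max SysCall level of 1, `may_call_rtos_api()` is true for levels 1 to 3 and false for level 0.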
Well, while all of the above are very good changes and fixes, the wrong settings were not the cause of my problem. It was something completely different….
Problem found
After debugging the thing for a very long time, I finally found the reason in my FreeRTOS Cortex-M0+ port: a register was not correctly restored. The offending code is in port.c, at the end of the function vOnCounterRestart(): with optimizations enabled, the register R7 was restored instead of R3. To correct this, the highlighted lines below have been added:
    __attribute__ ((naked)) void vOnCounterRestart(void) {
      ...
    #if __OPTIMIZE_SIZE__ || __OPTIMIZE__
      __asm volatile (
        " pop {r3,pc} \n" /* start exit sequence from interrupt: r3 and lr were pushed above */
      );
    #else
      __asm volatile (
        " pop {r7,pc} \n" /* start exit sequence from interrupt: r7 and lr were pushed above */
      );
    #endif
    #endif
    }
With the wrong register restored, my application from time to time tried to do a function pointer call with the register R7 (which was zero by chance), and crashed with a hard fault. With the fix in place, the crash did not happen any more.
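The effect of the mismatched pop can be reproduced on a host with a tiny register and stack model. This is only a sketch: `CpuModel` and the helper functions are hypothetical names, and it assumes the entry sequence pushed r3 and lr, as the comment in the fixed code says.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of a few registers and a descending stack. */
typedef struct {
  uint32_t r3, r7, lr, pc;
  uint32_t stack[8];
  int sp; /* index of the top-of-stack entry; 8 means the stack is empty */
} CpuModel;

void cpu_init(CpuModel *cpu) {
  cpu->r3 = cpu->r7 = cpu->lr = cpu->pc = 0u;
  for (int i = 0; i < 8; i++) {
    cpu->stack[i] = 0u;
  }
  cpu->sp = 8;
}

/* Models "push {r3, lr}": the lower-numbered register lands at the lower address. */
void push_r3_lr(CpuModel *cpu) {
  assert(cpu->sp >= 2);
  cpu->stack[--cpu->sp] = cpu->lr;
  cpu->stack[--cpu->sp] = cpu->r3;
}

/* Models "pop {r3, pc}": the matching restore, r7 is left untouched. */
void pop_r3_pc(CpuModel *cpu) {
  assert(cpu->sp <= 6);
  cpu->r3 = cpu->stack[cpu->sp++];
  cpu->pc = cpu->stack[cpu->sp++];
}

/* Models "pop {r7, pc}": the mismatched restore, r7 receives the saved r3 value. */
void pop_r7_pc(CpuModel *cpu) {
  assert(cpu->sp <= 6);
  cpu->r7 = cpu->stack[cpu->sp++];
  cpu->pc = cpu->stack[cpu->sp++];
}
```

If r7 holds something the interrupted code still needs (a function pointer, for example) and r3 happens to be zero at interrupt entry, then `push_r3_lr()` followed by `pop_r7_pc()` leaves r7 at zero while the stack pointer and return address look perfectly fine. That matches the symptom: the return works, and the crash only comes later when r7 is used.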
Conclusions
Even if something is wrong, things might still work, only to bite me later on. That FreeRTOS bug is now fixed and the updated version (V1.201) is available here. So if anyone is running into strange problems with my FreeRTOS port and gcc optimizations enabled, it could indeed be my bug above. I feel bad about it, and it has caused me a lot of grief. Nobody is perfect…
Happy Cortexing 🙂
When I set the CPU component to build with “Keil ARM C/C++ Compiler”, PE generates a “portasm.s” that Keil complains it cannot compile. Keil does not recognize
/* comment */ ; #include “FreeRTOSConfig.h” etc. pseudo code.
Any suggestions?
Cai.
Hi Cai,
the short answer is: I have not tried it with Keil. I know the Keil assembler is different, so I would need to adapt the assembly code.
Maybe you could do this on your end and send me your changes, then I can integrate it.
Otherwise this is on my growing ‘to-do-list’ 🙂
I have downloaded the demo “IAR_Freedom_FreeRTOS” for the FRDM-KL25Z from GitHub, but when I import it into PE, it shows the CPU has been configured as “Cortex M4”.
OK, I changed it to M0+ and set “Library Lowest Interrupt Priority” = 3 and “Max SysCall Interrupt Priority” = 1, but when I run it in IAR, it never reaches MyTask(). After some tracing, I found that in vPortStartFirstTask() the last statement is “svc 0”, but the svc ends up in the unused interrupt handler, in the code placed in Cpu.c at PE_ISR(Cpu_Interrupt), not in the expected vPortPendSVHandler() or vPortSVCHandler().
Finally I created a new project as you describe above: same problem.
Cai.
Now I changed back to FreeRTOS V1.203 and Utility V1.083, and it works.
I’m glad.
Cai.
Hi Cai,
sorry for my late response. I have not had time to look into this, but will in the next hours.
Hi Cai,
yes, that ARM core setting in the project was indeed wrong :-(. I have now fixed this in the project on GitHub.
But this uncovered another issue: interrupts were not enabled in vPortStartFirstTask() in portasm.s 😦
I have fixed this too, and the changes are committed on GitHub.
Sorry about that.
No need to be sorry, you did a great job. I’ll try it later.
Hi Cai,
thanks. I just debugged another interrupt problem (and solved it). That ARM interrupt controller is so complex that it is easy to mess things up with it :-(. The problem was that I had a case where interrupts were enabled during startup. I fixed this in portmacro.h and this is now on GitHub too (not as *.PEupd file, but in the repository).
Could you help me with how to update the code from the GitHub source into my PE without a *.PEupd file?
Hi Cai,
yes, have you seen the Wiki pages at
https://github.com/ErichStyger/mcuoneclipse/wiki/Getting-Started
?
If you are using Git/GitHub, you can automatically update all the components.
If this is just for the FreeRTOS component, then let me know.
Erich
Yes, the new version runs smoothly now. Because I’m not familiar with CDE, I spent some time learning how to deploy a component from source (I don’t want to install too many things in PE). Good job!
I found another problem: I cleared all the components installed before and deployed FreeRTOS and Utility from source. After that, PE complains that the FreeRTOS property “Utility” has an unassigned interface, no matter whether the component was created before or is new, although I can see that a UTIL1 has already been inserted into the project.
Hi Cai,
It looks like you might not have all the component files installed. If you do this by hand, it is very easy to miss copying some files. I recommend that you try the full package instead.
Yes, it is OK now. I copied everything from Git, but there are too many things in there that I don’t need. In any case, everything is running now, thanks.
I have not used it, but you can delete components from the Component Library view. But the additional components should not be a problem for you at all.