When something goes wrong in an embedded system, a watchdog timer is the last line of defense against a blocked or malfunctioning system. A watchdog is a special timer which needs to be ‘kicked’ in a special way, otherwise the timer will run out and reset the system.
For example, a watchdog is an important safety feature in the E-Vehicle charging controller with Raspberry Pi Pico-W RP2040.

Outline
Watchdog (aka COP: Computer Operating Properly) timers are very useful, yet still not used as much as they probably should be. Most SDKs come with some very basic examples, but do not explain or show the concept in a more complex system with tasks. If you are new to watchdogs, I highly recommend reading “Great Watchdog Timers For Embedded Systems” by Jack Ganssle.
There are certainly many ways to use a watchdog timer with tasks or threads. In my designs, I’m using the following concept with FreeRTOS:
- A dedicated watchdog task in a watchdog module monitors the system health and periodically kicks the watchdog, according to the hardware watchdog settings (e.g. every second).
- The priority of the watchdog task depends on the system architecture and requirements. Usually I have it running at a lower priority: a low priority starves the watchdog task if any unmonitored higher-priority task is getting too much of the CPU time. In FreeRTOS, the IDLE hook could be used for this too.
- Each task, interrupt or other important entity of the system reports its ‘alive time’ to the watchdog module and task. You can think of this as ‘work time reports’, with every worker expected to report a certain amount of work time and activity. The ‘alive’ time includes the effective work and the idle (suspended) time. So for a system time duration of 1000 ms, it is expected that the task reports a total of about 1000 ms.
- The watchdog task and module use that report table as a ‘health’ checklist. If a ‘worker’ is not reporting back (deadlock?), is reporting not enough working time (held up in an unexpected way?) or is reporting too much overtime (task is doing too much work?), then the watchdog task considers the system ‘unhealthy’ and does not serve or kick the hardware watchdog timer, causing a reboot of the system.
That concept can be extended or augmented as needed. What is considered ‘healthy’ always depends on the system itself.
I’ll show how this works in the next sections. You can find the full example project on GitHub. This project uses the internal watchdog timer of the Raspberry Pi RP2040 microcontroller with FreeRTOS, but the concept presented is applicable to other MCUs or operating systems.
McuWatchdog Module
The main work is done inside the McuWatchdog module. Below is its interface, McuWatchdog.h:
/*
* Copyright (c) 2023, Erich Styger
*
* SPDX-License-Identifier: BSD-3-Clause
*/
#ifndef SRC_MCUWATCHDOG_H_
#define SRC_MCUWATCHDOG_H_
#ifdef __cplusplus
extern "C" {
#endif
#include "McuWatchdog_config.h"
#if McuWatchdog_CONFIG_USE_WATCHDOG
#include "McuRTOS.h"
#include "app_platform.h"
#include <stdint.h>
#define McuWatchdog_REPORT_ID_CURR_TASK (McuWatchdog_REPORT_ID_NOF) /* special id to report time for the current task, which has been registered earlier with McuWatchdog_SetTaskHandle() */
/*!
* \brief Initialize the report structure
* \param id ID of the entry
* \param name Name for the entry
* \param msForOneSec Average number of milliseconds of reporting time per second, usually 1000
* \param minPercent Minimum percentage of time reporting needed
* \param maxPercent Maximum percentage of allowed time reporting
*/
void McuWatchdog_InitReportEntry(McuWatchdog_ReportID_e id, const unsigned char *name, uint32_t msForOneSec, uint8_t minPercent, uint8_t maxPercent);
/*!
* \brief Used to start measuring a time for later reporting
* \return Number of RTOS ticks at the start, at the time of call.
*/
TickType_t McuWatchdog_ReportTimeStart(void);
/*!
* \brief Report the time spent, which has been recorded with McuWatchdog_ReportTimeStart()
* \param id Task ID
* \param startTickCount Tick count previously recorded with McuWatchdog_ReportTimeStart()
*/
void McuWatchdog_ReportTimeEnd(McuWatchdog_ReportID_e id, TickType_t startTickCount);
/*!
* \brief Delay a task with vTaskDelay for a given number of times, each time for ms, and report the delay.
* \param id Task ID
* \param nof Number of delays
* \param ms Iteration delay time in milliseconds
*/
void McuWatchdog_DelayAndReport(McuWatchdog_ReportID_e id, uint32_t nof, uint32_t ms);
/*!
* \brief Set the task handle for an id. With this we can report time using McuWatchdog_REPORT_ID_CURR_TASK
* \param id ID of item
* \param task FreeRTOS task handle
*/
void McuWatchdog_SetTaskHandle(McuWatchdog_ReportID_e id, TaskHandle_t task);
/*!
* \brief Suspend checking for a given id
* \param id ID of item to be suspended
*/
void McuWatchdog_SuspendCheck(McuWatchdog_ReportID_e id);
/*!
* \brief Resume checking for a given id
* \param id ID of item to be resumed
*/
void McuWatchdog_ResumeCheck(McuWatchdog_ReportID_e id);
/*!
* \brief Report the time spent for an item (id)
* \param id ID of item
* \param ms Time in milliseconds
*/
void McuWatchdog_Report(McuWatchdog_ReportID_e id, uint32_t ms);
/*!
* \brief Enable the watchdog timer. Do this early in the application.
*/
void McuWatchdog_EnableTimer(void);
/*!
* \brief Module de-initialization.
*/
void McuWatchdog_Deinit(void);
/*!
* \brief Module initialization. This creates the monitoring watchdog task.
*/
void McuWatchdog_Init(void);
#endif /* McuWatchdog_CONFIG_USE_WATCHDOG */
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* SRC_MCUWATCHDOG_H_ */
The module is configured with McuWatchdog_config.h. It sets the hardware watchdog timer timeout, how often the system health shall be checked, and the list of reporter items. An example is below:
/*
* Copyright (c) 2023, Erich Styger
*
* SPDX-License-Identifier: BSD-3-Clause
*/
#ifndef MCUWATCHDOG_CONFIG_H_
#define MCUWATCHDOG_CONFIG_H_
#include "app_platform.h"
#ifndef McuWatchdog_CONFIG_USE_WATCHDOG
#define McuWatchdog_CONFIG_USE_WATCHDOG (0)
#endif
#define McuWatchdog_CONFIG_HEALT_CHECK_TIME_SEC (5) /*!< interval for checking health */
#define McuWatchdog_CONFIG_TIMEOUT_MS (1000) /*!< number of ms for hardware watchdog timer */
#define McuWatchdog_CONFIG_DISABLED_FOR_DEBUG (0) /* set to 1 for easier debugging, set to 0 for production code! */
#define McuWatchdog_CONFIG_REPORT_TIME_VALUES (0) /* 1: report time values during safety check, useful for debugging */
/* list of IDs to identify items monitored by the watchdog task */
typedef enum McuWatchdog_ReportID_e {
  McuWatchdog_REPORT_ID_TASK_APP,
#if PL_CONFIG_USE_GUI
  McuWatchdog_REPORT_ID_TASK_GUI,
#endif
#if PL_CONFIG_USE_SHELL
  McuWatchdog_REPORT_ID_TASK_SHELL,
#endif
#if PL_CONFIG_USE_LIGHTS
  McuWatchdog_REPORT_ID_TASK_LIGHTS,
#endif
#if PL_CONFIG_USE_WIFI
  McuWatchdog_REPORT_ID_TASK_WIFI,
#endif
  McuWatchdog_REPORT_ID_NOF /* sentinel, must be last! */
} McuWatchdog_ReportID_e;
#endif /* MCUWATCHDOG_CONFIG_H_ */
The application shall enable the watchdog timer early during startup of the system with McuWatchdog_EnableTimer():
void McuWatchdog_EnableTimer(void) {
#if McuWatchdog_CONFIG_DISABLED_FOR_DEBUG
  #warning "Watchdog is disabled"
#else
  /* Enable the watchdog: it has to be updated periodically, otherwise the chip will reboot.
     The second argument is 'pause on debug': the watchdog pauses while stepping through code. */
  watchdog_enable(McuWatchdog_CONFIG_TIMEOUT_MS, true); /* enable watchdog timer */
#endif
}
From this point on, the watchdog timer is ticking, so the rest of the initialization is covered by it.
McuWatchdog_Init() checks whether the system booted because of a watchdog reset or a normal power-on reset, and creates the watchdog task:
void McuWatchdog_Init(void) {
  if (watchdog_caused_reboot()) {
    McuLog_fatal("Rebooted by Watchdog");
  } else {
    McuLog_info("Clean boot");
  }
  for(int i=0; i<McuWatchdog_REPORT_ID_NOF; i++) {
    McuWatchdog_Recordings[i].ms = 0;
    McuWatchdog_Recordings[i].task = NULL;
  }
  if (xTaskCreate(WatchdogTask, "Watchdog", 900/sizeof(StackType_t), NULL, tskIDLE_PRIORITY+4, NULL) != pdPASS) {
    McuLog_fatal("failed creating Watchdog task");
    for(;;){} /* error */
  }
}
The watchdog task uses a report table for everything reported, so the application has to provide a list of items to be monitored. Below is a list of tasks, each expected to report 1000 ms every second, within a tolerance of 70% to 120%. The tolerance is needed because the reporting happens in chunks of time, which do not necessarily line up with the health check interval.
void PL_InitWatchdogReportTable(void) {
  McuWatchdog_InitReportEntry(McuWatchdog_REPORT_ID_TASK_APP, "App", 1000, 70, 120);
#if PL_CONFIG_USE_GUI
  McuWatchdog_InitReportEntry(McuWatchdog_REPORT_ID_TASK_GUI, "Gui", 1000, 70, 120);
#endif
#if PL_CONFIG_USE_SHELL
  McuWatchdog_InitReportEntry(McuWatchdog_REPORT_ID_TASK_SHELL, "Shell", 1000, 70, 120);
#endif
#if PL_CONFIG_USE_LIGHTS
  McuWatchdog_InitReportEntry(McuWatchdog_REPORT_ID_TASK_LIGHTS, "Lights", 1000, 70, 120);
#endif
#if PL_CONFIG_USE_WIFI
  McuWatchdog_InitReportEntry(McuWatchdog_REPORT_ID_TASK_WIFI, "WiFi", 1000, 70, 120);
#endif
}
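To make the tolerance window concrete, here is a small host-side sketch of the arithmetic, using the same formula as the health check shown later in this article. The names ReportWindow_t, ComputeWindow() and PrintExampleWindow() are hypothetical and not part of the McuWatchdog module.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper type, not part of McuWatchdog: allowed reporting window in ms. */
typedef struct {
  uint32_t minMs;
  uint32_t maxMs;
} ReportWindow_t;

/* Compute the allowed reporting window for one health check interval. */
ReportWindow_t ComputeWindow(uint32_t checkTimeSec, uint32_t msForOneSec,
                             uint8_t minPercent, uint8_t maxPercent) {
  ReportWindow_t w;
  uint32_t expectedMs = checkTimeSec*msForOneSec; /* e.g. 5 s * 1000 ms = 5000 ms */

  assert(minPercent<=maxPercent);
  w.minMs = expectedMs*minPercent/100;
  w.maxMs = expectedMs*maxPercent/100;
  return w;
}

/* Print the window for the values used in this article. */
void PrintExampleWindow(void) {
  ReportWindow_t w = ComputeWindow(5, 1000, 70, 120);

  printf("min: %lu ms, max: %lu ms\n", (unsigned long)w.minMs, (unsigned long)w.maxMs);
}
```

With a health check every 5 seconds, an entry configured with 1000 ms per second and 70% to 120% has to report between 3500 ms and 6000 ms per check.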
Watchdog Task
The watchdog task itself is simple:
static void WatchdogTask(void *pv) {
  uint32_t ms = 0;

  McuLog_trace("started watchdog task");
  for(;;) {
    McuWatchdog_StateA();
    vTaskDelay(pdMS_TO_TICKS(McuWatchdog_CONFIG_TIMEOUT_MS/4)); /* give back some CPU time. We are doing this here at a higher rate than the HW watchdog timer timeout */
    ms += McuWatchdog_CONFIG_TIMEOUT_MS/4;
    McuWatchdog_StateB();
    if (ms>=McuWatchdog_CONFIG_HEALT_CHECK_TIME_SEC*1000) {
      McuWatchdog_CheckHealth(); /* if not healthy, we will block here */
      ms = 0;
    }
  }
}
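The timing relationship in the task loop can be checked with a small host-side simulation. This is only a sketch: SimulateWatchdogTask() and SimResult_t are hypothetical names, and the delay is simulated by a counter rather than vTaskDelay(). It assumes the configuration values from the example above (1000 ms timeout, 5 s health check interval).

```c
#include <assert.h>
#include <stdint.h>

#define TIMEOUT_MS            (1000u) /* assumed: same value as McuWatchdog_CONFIG_TIMEOUT_MS */
#define HEALTH_CHECK_TIME_SEC (5u)    /* assumed: same value as McuWatchdog_CONFIG_HEALT_CHECK_TIME_SEC */

typedef struct {
  uint32_t nofKicks;        /* how often the hardware watchdog would be kicked */
  uint32_t nofHealthChecks; /* how often the report table would be checked */
} SimResult_t;

/* Simulate the task loop for a given time span, without any RTOS. */
SimResult_t SimulateWatchdogTask(uint32_t simulatedMs) {
  SimResult_t res = {0u, 0u};
  uint32_t ms = 0u;      /* time since the last health check */
  uint32_t elapsed = 0u; /* total simulated time */

  assert(TIMEOUT_MS/4u > 0u); /* otherwise the loop below would not advance */
  while (elapsed+TIMEOUT_MS/4u <= simulatedMs) {
    elapsed += TIMEOUT_MS/4u; /* stands in for vTaskDelay() */
    ms += TIMEOUT_MS/4u;
    res.nofKicks++;           /* stands in for McuWatchdog_StateB() */
    if (ms >= HEALTH_CHECK_TIME_SEC*1000u) {
      res.nofHealthChecks++;  /* stands in for McuWatchdog_CheckHealth() */
      ms = 0u;
    }
  }
  return res;
}
```

For one simulated minute this gives 240 kicks and 12 health checks. Kicking at a quarter of the timeout leaves margin for scheduling jitter, while the health check runs far less often.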
The watchdog task uses a clever state machine as proposed by Jack Ganssle:
/* extra safety checks, idea by Jack Ganssle, see "Great Watchdog Timers for Embedded Systems" */
static void McuWatchdog_a(void) {
  if (McuWatchdog_State!=0x5555) {
    McuLog_fatal("something wrong");
    for(;;) {
      __asm("nop");
    }
  }
  McuWatchdog_State += 0x1111;
}
static void McuWatchdog_b(void) {
  if (McuWatchdog_State!=0x8888) {
    McuLog_fatal("something wrong");
    for(;;) { /* getting here in case of run-away code?!? */
      __asm("nop");
    }
  }
#if McuWatchdog_CONFIG_DISABLED_FOR_DEBUG
  #warning "Watchdog is disabled!"
#else
  watchdog_update();
#endif
  if (McuWatchdog_State!=0x8888) {
    McuLog_fatal("something wrong");
    for(;;) { /* getting here in case of run-away code?!? */
      __asm("nop");
    }
  }
  McuWatchdog_State = 0; /* reset state */
}
static void McuWatchdog_StateA(void) {
  McuWatchdog_State = 0x5555;
  McuWatchdog_a();
}
static void McuWatchdog_StateB(void) {
  McuWatchdog_State += 0x2222;
  McuWatchdog_b(); /* here we kick the dog */
}
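The effect of the state variable can be tried out on a host with a simplified variant. This sketch is not the module code: instead of blocking in an endless loop, the hypothetical functions Sim_StateA() and Sim_StateB() return false when the check fails, and the kick is only counted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint16_t simState = 0;    /* stands in for McuWatchdog_State */
static uint32_t simNofKicks = 0; /* counts simulated calls of watchdog_update() */

/* First half of the sequence: set the magic value, verify it and advance it. */
bool Sim_StateA(void) {
  simState = 0x5555;
  if (simState!=0x5555) {
    return false;
  }
  simState += 0x1111; /* now 0x6666 */
  return true;
}

/* Second half: only kick if the state shows that Sim_StateA() has run before. */
bool Sim_StateB(void) {
  simState += 0x2222; /* 0x6666 + 0x2222 = 0x8888 */
  if (simState!=0x8888) {
    return false; /* sequence violated: do not kick */
  }
  simNofKicks++; /* stands in for watchdog_update() */
  simState = 0;  /* reset state */
  return true;
}

/* Demo: the normal A-then-B sequence kicks once, B without A is rejected. */
void Sim_Demo(void) {
  bool okA, okB, okRunaway;

  simNofKicks = 0;
  okA = Sim_StateA();
  okB = Sim_StateB();
  okRunaway = Sim_StateB(); /* simulates run-away code jumping into B */
  assert(okA && okB);
  assert(!okRunaway);
  assert(simNofKicks==1);
}
```

Code that jumps into the second half without having passed through the first half finds a wrong state value and never reaches the kick.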
Reporting
There are different ways to report. The simplest one is to report directly in the task loop, which is sufficient if the task work itself is only minor:
static void task(void *pv) {
  for(;;) {
    ...
    vTaskDelay(pdMS_TO_TICKS(100));
#if PL_CONFIG_USE_WATCHDOG
    McuWatchdog_Report(McuWatchdog_REPORT_ID_TASK_LIGHTS, 100);
#endif
  }
}
When waiting for a longer time, the reporting should be divided into smaller chunks, e.g. 100 ms:
#if PL_CONFIG_USE_WATCHDOG
McuWatchdog_DelayAndReport(McuWatchdog_REPORT_ID_TASK_LIGHTS, 10, 100);
#else
vTaskDelay(pdMS_TO_TICKS(10*100));
#endif
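The implementation of McuWatchdog_DelayAndReport() is not shown here. A minimal host-side sketch of the idea, assuming it simply delays and reports in a loop, could look like this. Sim_Delay(), Sim_Report() and Sim_DelayAndReport() are hypothetical stand-ins for vTaskDelay(), McuWatchdog_Report() and the module function.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_NOF_IDS (2) /* assumed number of monitored items */

static uint32_t simRecordedMs[SIM_NOF_IDS]; /* reported time per item */
static uint32_t simDelayedMs;               /* total simulated delay time */

/* Stand-in for vTaskDelay(pdMS_TO_TICKS(ms)): only accumulates the time. */
void Sim_Delay(uint32_t ms) {
  simDelayedMs += ms;
}

/* Stand-in for McuWatchdog_Report(): add the time to the record of the item. */
void Sim_Report(int id, uint32_t ms) {
  assert(id>=0 && id<SIM_NOF_IDS);
  simRecordedMs[id] += ms;
}

/* Delay 'nof' times for 'ms' milliseconds and report after each chunk. */
void Sim_DelayAndReport(int id, uint32_t nof, uint32_t ms) {
  while (nof>0) {
    Sim_Delay(ms);
    Sim_Report(id, ms);
    nof--;
  }
}
```

Calling Sim_DelayAndReport(1, 10, 100) delays for 1000 ms in total and reports it in ten chunks of 100 ms, so a health check that falls into the middle of the wait still sees most of the time already reported. On the target, the record is shared between tasks, so the update would likely need protection such as a critical section.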
If a task gets suspended, the reporting and monitoring can be suspended too:
void Lights_Suspend(void) {
  currLights.isSupended = true;
#if PL_CONFIG_USE_WATCHDOG
  McuWatchdog_SuspendCheck(McuWatchdog_REPORT_ID_TASK_LIGHTS);
#endif
  vTaskSuspend(Lights_TaskHandle);
}
void Lights_Resume(void) {
  vTaskResume(Lights_TaskHandle);
#if PL_CONFIG_USE_WATCHDOG
  McuWatchdog_ResumeCheck(McuWatchdog_REPORT_ID_TASK_LIGHTS);
#endif
  currLights.isSupended = false;
}
If an operation in the task can take longer and the time is not known in advance, the time used can be measured and reported like this:
McuLog_info("connecting to SSID '%s'...", wifi.ssid);
#if PL_CONFIG_USE_WATCHDOG
TickType_t tickCount = McuWatchdog_ReportTimeStart();
#endif
res = cyw43_arch_wifi_connect_timeout_ms(wifi.ssid, wifi.pass, CYW43_AUTH_WPA2_AES_PSK, 5000); /* can take 1000-3500 ms */
#if PL_CONFIG_USE_WATCHDOG
McuWatchdog_ReportTimeEnd(McuWatchdog_REPORT_ID_TASK_WIFI, tickCount);
#endif
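How the elapsed time is derived from the tick count is not shown in this article. The following host-side sketch shows one common way to do it, assuming a free-running unsigned 32-bit tick counter. Sim_TickType_t, SIM_TICK_RATE_HZ and Sim_ElapsedMs() are hypothetical names, not the module's code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Sim_TickType_t; /* assumed: 32-bit tick counter */
#define SIM_TICK_RATE_HZ (1000u) /* assumed: 1 kHz tick, 1 tick = 1 ms */

/* Milliseconds between two tick counts. The unsigned subtraction also gives
   the right result if the counter wrapped around once in between. */
uint32_t Sim_ElapsedMs(Sim_TickType_t startTicks, Sim_TickType_t endTicks) {
  Sim_TickType_t ticks = endTicks-startTicks; /* modulo 2^32 arithmetic */

  assert(SIM_TICK_RATE_HZ>0u);
  return (uint32_t)(((uint64_t)ticks*1000u)/SIM_TICK_RATE_HZ);
}
```

With a 1 kHz tick, a connect attempt that starts at tick 100 and ends at tick 3600 would be reported as 3500 ms.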
Watchdog Monitoring
With this, the watchdog task can periodically check the report table. If an item is below or above the allowed reporting time, the task reports a fatal error and stops kicking the watchdog, so that the hardware watchdog times out:
static void McuWatchdog_CheckHealth(void) {
  uint32_t min, max;

#if McuWatchdog_CONFIG_REPORT_TIME_VALUES
  ReportTime();
#endif
  for(int i=0; i<McuWatchdog_REPORT_ID_NOF; i++) {
    min = (McuWatchdog_CONFIG_HEALT_CHECK_TIME_SEC*reports[i].reportMsPerSec)*reports[i].minPercent/100;
    max = (McuWatchdog_CONFIG_HEALT_CHECK_TIME_SEC*reports[i].reportMsPerSec)*reports[i].maxPercent/100;
    taskENTER_CRITICAL();
    if (McuWatchdog_Recordings[i].ms>=min && McuWatchdog_Recordings[i].ms<=max) {
      McuWatchdog_Recordings[i].ms = 0; /* within boundaries, reset counter */
    } else if (reports[i].isSuspended) {
      McuLog_warn("%s is suspended", reports[i].name);
      McuWatchdog_Recordings[i].ms = 0; /* reset counter */
    } else {
      uint8_t buf[48];

      McuUtility_strcpy(buf, sizeof(buf), (unsigned char*)"WDT FAILURE: ");
      McuUtility_strcat(buf, sizeof(buf), reports[i].name);
      McuUtility_strcat(buf, sizeof(buf), (unsigned char*)" ms:");
      McuUtility_strcatNum32u(buf, sizeof(buf), McuWatchdog_Recordings[i].ms);
      McuUtility_strcat(buf, sizeof(buf), (unsigned char*)" min:");
      McuUtility_strcatNum32u(buf, sizeof(buf), min);
      McuUtility_strcat(buf, sizeof(buf), (unsigned char*)" max:");
      McuUtility_strcatNum32u(buf, sizeof(buf), max);
      McuUtility_strcat(buf, sizeof(buf), (unsigned char*)"\r\n");
      McuLog_fatal(buf);
      for(;;) {
        __asm("nop"); /* wait for WDT to time out */
      }
    }
    taskEXIT_CRITICAL();
  }
}
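For unit testing on a host, the decision logic can be separated from the logging and the blocking loop. The sketch below does that with hypothetical names (Sim_Health_e, Sim_CheckEntry()). It mirrors the order of the checks above but is not the module implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
  SIM_HEALTH_OK,         /* reported time is within the window */
  SIM_HEALTH_SUSPENDED,  /* outside the window, but checking is suspended */
  SIM_HEALTH_TOO_LITTLE, /* reported less than the minimum */
  SIM_HEALTH_TOO_MUCH    /* reported more than the maximum */
} Sim_Health_e;

/* Classify one report entry for a health check interval. */
Sim_Health_e Sim_CheckEntry(uint32_t reportedMs, uint32_t minMs, uint32_t maxMs, bool isSuspended) {
  assert(minMs<=maxMs);
  if (reportedMs>=minMs && reportedMs<=maxMs) {
    return SIM_HEALTH_OK;
  }
  if (isSuspended) {
    return SIM_HEALTH_SUSPENDED;
  }
  return (reportedMs<minMs) ? SIM_HEALTH_TOO_LITTLE : SIM_HEALTH_TOO_MUCH;
}
```

With the window of 3500 ms to 6000 ms from the example configuration, a task that reported only 1200 ms is classified as SIM_HEALTH_TOO_LITTLE, and one that reported 7000 ms as SIM_HEALTH_TOO_MUCH. Only these two results would stop the kicking of the hardware watchdog.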
Summary
A watchdog is very useful to automatically reset a system if it is not healthy enough. In a system with an RTOS, one usable concept is to have a dedicated watchdog task monitoring the system health. Depending on the system, multiple aspects can be monitored. In this article I showed how to monitor the task activity: if a task reports less or more time than allowed, the system is reset by the hardware watchdog timer. Implementing a system with a watchdog of course takes some effort, but in my view it makes a system better and more reliable, if done correctly. I hope the McuWatchdog module gives you a starting point on your journey with embedded systems.
Happy resetting 🙂
Links
- Jack Ganssle: http://www.ganssle.com/watchdogs.htm
- Post about the EVCC project: Using MQTT with the Raspberry Pi Pico W and HomeAssistant for an Optimized Solar Energy Electrical Vehicle Charger
- Project on GitHub: https://github.com/ErichStyger/mcuoneclipse/tree/master/Examples/RaspberryPiPico/pico_Heidelberg
Such thing exists LwWDG: https://github.com/MaJerle/lwwdg
Thanks for sharing, and nicely documented.