Cycle Counting on ARM Cortex-M with DWT

Some ARM Cortex-M cores implement a DWT (Data Watchpoint and Trace) unit, and that unit has a nice feature: a counter for execution cycles. The DWT is present on most Cortex-M3, M4 and M7 devices, including e.g. the NXP Kinetis or LPC devices.

Outline

Execution profiling tools like SEGGER SystemView use it to measure the time spent executing code. This post is about how to use it directly from application code, and how to enable cycle counting and inspect the counter during debugging.

Registers and Access Functions

The DWT is usually implemented in Cortex-M3 or higher, but not on Cortex-M0(+). To use the cycle counter, I need access to a few debug registers. You could use the CMSIS-Core header files for this, but since only a few registers are involved, here are the defines I’m going to use in case CMSIS-Core is not available:

  #include <stdint.h> /* for uint32_t */

  /* DWT (Data Watchpoint and Trace) registers, only exist on ARM Cortex with a DWT unit */
  #define KIN1_DWT_CONTROL             (*((volatile uint32_t*)0xE0001000))
    /*!< DWT Control register */
  #define KIN1_DWT_CYCCNTENA_BIT       (1UL<<0)
    /*!< CYCCNTENA bit in DWT_CONTROL register */
  #define KIN1_DWT_CYCCNT              (*((volatile uint32_t*)0xE0001004))
    /*!< DWT Cycle Counter register */
  #define KIN1_DEMCR                   (*((volatile uint32_t*)0xE000EDFC))
    /*!< DEMCR: Debug Exception and Monitor Control Register */
  #define KIN1_TRCENA_BIT              (1UL<<24)
    /*!< Trace enable bit in DEMCR register */

To use the registers, I have defined a set of ‘function like’ macros I can use in my application code:

#define KIN1_InitCycleCounter() \
  KIN1_DEMCR |= KIN1_TRCENA_BIT
  /*!< TRCENA: Enable trace and debug block in DEMCR (Debug Exception and Monitor Control Register) */

#define KIN1_ResetCycleCounter() \
  KIN1_DWT_CYCCNT = 0
  /*!< Reset cycle counter */

#define KIN1_EnableCycleCounter() \
  KIN1_DWT_CONTROL |= KIN1_DWT_CYCCNTENA_BIT
  /*!< Enable cycle counter */

#define KIN1_DisableCycleCounter() \
  KIN1_DWT_CONTROL &= ~KIN1_DWT_CYCCNTENA_BIT
  /*!< Disable cycle counter */

#define KIN1_GetCycleCounter() \
  KIN1_DWT_CYCCNT
  /*!< Read cycle counter register */
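Because not every Cortex-M has the cycle counter, it can be useful to check for it at run time before relying on the macros above. The following is a minimal sketch, not part of the KIN1 macros: it assumes the ARMv7-M layout of DWT_CTRL (Cortex-M3/M4/M7), where the NOCYCCNT flag in bit 25 reads as 1 if no cycle counter is implemented. The helper name `dwt_has_cycle_counter` and the bit define are my own, hypothetical names. It takes the register value as a parameter, so TRCENA should already be set before the value is read from the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: NOCYCCNT is bit 25 of DWT_CTRL on ARMv7-M (Cortex-M3/M4/M7). */
#define DWT_CTRL_NOCYCCNT_BIT (1UL << 25)

/* Hypothetical helper: takes a value read from DWT_CTRL (0xE0001000)
 * and reports whether the CYCCNT cycle counter is implemented. */
static inline bool dwt_has_cycle_counter(uint32_t dwt_ctrl_value) {
  return (dwt_ctrl_value & DWT_CTRL_NOCYCCNT_BIT) == 0u;
}

/* On target, after KIN1_InitCycleCounter():
 *   if (dwt_has_cycle_counter(KIN1_DWT_CONTROL)) { KIN1_EnableCycleCounter(); }
 */
```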

Typical Usage

To use the cycle counting feature, the DWT has to be configured and enabled. If you connect to the target with a debugger, the debugger has usually enabled it already. To make it work with no debug session active, I have to initialize it in the code first:

uint32_t cycles; /* number of cycles */

KIN1_InitCycleCounter(); /* enable DWT hardware */
KIN1_ResetCycleCounter(); /* reset cycle counter */
KIN1_EnableCycleCounter(); /* start counting */
foo(); /* call function and count cycles */
cycles = KIN1_GetCycleCounter(); /* get cycle counter */
KIN1_DisableCycleCounter(); /* disable counting if not used any more */
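The sequence above resets the counter before the measurement. If something else is using the counter too (a profiling tool, for example), an alternative is to leave it running and take two snapshots instead. The sketch below shows that idea together with a conversion to microseconds. The helper names `cycles_elapsed` and `cycles_to_us` are hypothetical, and `core_hz` is assumed to be the core clock in Hz (for example `SystemCoreClock` if CMSIS-Core is used) and must not be zero.

```c
#include <stdint.h>

/* Hypothetical helper: cycles between two CYCCNT snapshots.
 * The unsigned subtraction is modulo 2^32, so a single counter
 * wrap-around between the two snapshots is handled correctly. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end) {
  return end - start;
}

/* Hypothetical helper: convert a cycle count to whole microseconds.
 * core_hz is the core clock in Hz and must be non-zero; the 64-bit
 * intermediate avoids overflow in the multiplication. */
static inline uint32_t cycles_to_us(uint32_t cycles, uint32_t core_hz) {
  return (uint32_t)(((uint64_t)cycles * 1000000u) / core_hz);
}

/* On target, with the counter already enabled:
 *   uint32_t start = KIN1_GetCycleCounter();
 *   foo();
 *   uint32_t us = cycles_to_us(
 *       cycles_elapsed(start, KIN1_GetCycleCounter()), SystemCoreClock);
 */
```

Keep in mind that the 32-bit counter wraps: at a 120 MHz core clock that happens after roughly 35.8 seconds, so this only works for intervals shorter than that.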

Cycle Counter with Debugger

Monitoring the cycle counter during a debug session is easy: add the following expression to the ‘Expressions’ view:

(*((volatile uint32_t*)0xE0001004))

With this, it shows the current cycle counter:

Cycle Counter in Expressions View

Processor Expert Component

To make it even easier to use, I have extended the KinetisTools component with the needed macros and functions. This component will be available with the next release:

Cycle Counting Functions

Summary

If your ARM Cortex-M has a DWT, you can use the cycle counter to measure the cycles spent executing code. That could be used for delay loops or to measure execution time.
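For the delay loop use case, here is a minimal sketch of how a busy-wait on the cycle counter could look. The helper names `us_to_cycles` and `delay_expired` are hypothetical and not part of the KinetisTools component. `core_hz` is again assumed to be the core clock in Hz, and the requested delay has to be shorter than one full wrap of the 32-bit counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: number of core cycles in 'us' microseconds.
 * The result is truncated to 32 bits, so keep the delay below one
 * counter wrap (2^32 cycles). */
static inline uint32_t us_to_cycles(uint32_t us, uint32_t core_hz) {
  return (uint32_t)(((uint64_t)us * core_hz) / 1000000u);
}

/* Hypothetical helper: true once at least 'cycles' cycles have passed
 * between the snapshots 'start' and 'now'. The modulo-2^32 subtraction
 * keeps the comparison valid across a counter wrap-around. */
static inline bool delay_expired(uint32_t start, uint32_t now, uint32_t cycles) {
  return (uint32_t)(now - start) >= cycles;
}

/* On target, with the counter already enabled (10 us busy-wait):
 *   uint32_t start = KIN1_GetCycleCounter();
 *   uint32_t ticks = us_to_cycles(10u, SystemCoreClock);
 *   while (!delay_expired(start, KIN1_GetCycleCounter(), ticks)) {}
 */
```

The actual delay will be slightly longer than requested because of the loop and register read overhead, and longer still if interrupts fire while waiting.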

Happy Cycling 🙂

8 thoughts on “Cycle Counting on ARM Cortex-M with DWT”

  1. Hi Erich,
    Just what I was looking for! Well, I think. I need a high speed time stamp mechanism (+/-50us roughly or better) to measure several event times in an application and I found your discussion here from a Google search (I read your stuff almost daily). I am trying to determine timing relationships between an incoming GPIO signal (which generates an interrupt) and inbound/outbound serial data. I wrote a logging function to log this information, which I can dump and inspect via a command line interface. I need to implement this across four K64F targets.
    My question is how do I compute the time from the DWT_CYCCNT? I am guessing this is running at CPU speed but is it incrementing every clock tick or every machine cycle? If machine cycle, then how to translate to time? My target is a Kinetis K64F Cortex-M4.

    As always thank you!

    • Hi Mike,
      yes, the DWT_CYCCNT runs at the core/system clock speed (SystemCoreClock in CMSIS-Core terms). It is good as a time stamp to measure code execution time in an accurate way. But that time will be affected by the overhead of reading the DWT_CYCCNT register and by what the compiler or the pipelines/caches are doing, so it might not always be the same. There is a good article about this here: http://www.carminenoviello.com/2015/09/04/precisely-measure-microseconds-stm32/. But +/- 50us should be really doable. Other than that, if you have a high speed timer available, you can use that instead: run the timer with, say, a 5us clock (without interrupts, of course!), then read the timer counter register for your measurement.

  2. Pingback: What is “Realtime Debugging”? | MCU on Eclipse

  3. Pingback: McuOnEclipse Components: 12-Mar-2017 Release | MCU on Eclipse

  4. Pingback: ARM SWO Performance Counters | MCU on Eclipse
