Investigating ARM Cortex® M33 core – DSP Acceleration 2 (PowerQuad Matrix Engine Tutorial)

Last week I showed you how to use the coprocessor interface of PowerQuad to calculate (mostly) unary functions. For example, the natural logarithm ln(x) takes just one operand, whilst the floating-point divide in PowerQuad requires two operands, (x1)/(x2). PowerQuad is very efficient at accelerating these functions, requiring just 6 clock cycles for ln(x) and 6 clock cycles for the float divide (x1)/(x2). In comparison, the single-precision floating-point unit in the Cortex® M4F and M33F requires 13 clock cycles to perform the same float divide.

But there are two ‘sides’ to the PowerQuad:

  • The Coprocessor interface, using ARMv8-M coprocessor instructions;
  • The AHB bus interface, where we address PowerQuad as a peripheral.

So this week… operating the PowerQuad as a peripheral. I’ll show you how to use the PowerQuad SDK driver in MCUXpresso in a new project, and use the Matrix Engine in the PowerQuad to solve simultaneous equations.

Highlighting the AHB bus interface to PowerQuad

The downside of the coprocessor interface is that we have only a single-cycle, 64-bit pipe to the PowerQuad. That is great for sending a simple PowerQuad opcode (such as ‘Perform sin(x)’) and the operand x, but it is not efficient for matrix, transform or correlation algorithms, which need larger data sets and more complex parameters. It would be time-consuming to send the twiddle factors and 512 points for a Fast Fourier Transform into the PowerQuad via the coprocessor interface.

Instead, the peripheral interface of the PowerQuad presents a simple, orthogonal set of registers that let us hand data pointers to the PowerQuad. For example, when using the Matrix Engine we write pointers into the BASE (base address) registers for:

  • the input matrices A and B,
  • the result matrix X, and
  • scratch RAM that the engine can use for intermediate results.

PowerQuad also needs to know the format of each data structure, so each of the four BASE registers is paired with a FORMAT register. It looks like this:

Part of PowerQuad peripheral registers, showing the four BASE and FORMAT registers
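As a rough picture of that BASE/FORMAT pairing, here is a host-side sketch in C. Everything in it is a hypothetical stand-in: the struct `pq_regs_sketch_t`, its field names and the helper `pq_sketch_set_buffers()` are invented for illustration and are not the real PowerQuad register map or the SDK API. The `format` argument is an opaque placeholder; the real encoding is defined in the device documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the four BASE/FORMAT register pairs.
 * It is an ordinary struct in RAM so the sketch runs on a PC;
 * it does NOT reproduce the real peripheral layout or offsets. */
typedef struct {
    uintptr_t ina_base;  uint32_t ina_format;  /* input matrix A  */
    uintptr_t inb_base;  uint32_t inb_format;  /* input matrix B  */
    uintptr_t out_base;  uint32_t out_format;  /* result matrix X */
    uintptr_t tmp_base;  uint32_t tmp_format;  /* scratch RAM     */
} pq_regs_sketch_t;

/* Record where each buffer lives and which (placeholder) format it uses.
 * b may be NULL for a one-operand function such as a matrix inverse. */
void pq_sketch_set_buffers(pq_regs_sketch_t *regs,
                           const float *a, const float *b,
                           float *x, float *scratch,
                           uint32_t format)
{
    assert(regs != NULL && a != NULL && x != NULL && scratch != NULL);

    regs->ina_base = (uintptr_t)a;
    regs->inb_base = (uintptr_t)b;
    regs->out_base = (uintptr_t)x;
    regs->tmp_base = (uintptr_t)scratch;

    /* This sketch gives all four buffers the same format. */
    regs->ina_format = format;
    regs->inb_format = format;
    regs->out_format = format;
    regs->tmp_format = format;
}
```

The point of the sketch is only that the engine receives addresses rather than the data itself, which is why the peripheral interface suits larger data sets.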

Next, PowerQuad must be programmed with the Engine that we want to use (Matrix, CORDIC, etc.) and the specific function (Matrix Inverse, Matrix Multiply). These are written to the CONTROL register. In our MCUXpresso IDE example we use the PQ_SetFormat() function and a direct write to the CONTROL register:

Matrix Inversion example, showing steps necessary to program PowerQuad

The LENGTH register takes a parameter that defines the number of datapoints to process (in the case of the Transform engine) or the dimensions of the matrices (in the case of the Matrix engine).
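For the Matrix engine, one way to picture the LENGTH parameter is as the matrix dimensions packed into a single 32-bit word. The helper below is a sketch only: `pq_sketch_matrix_len()` is a hypothetical name, and the byte positions (rows of A in the low byte, then columns of A, then columns of B) are an assumption about the layout. The SDK driver header appears to provide a `POWERQUAD_MAKE_MATRIX_LEN` macro for this purpose; check `fsl_powerquad.h` and the reference manual for the real encoding before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Pack matrix dimensions into one word (assumed layout, see text):
 *   bits  7..0  rows of A
 *   bits 15..8  columns of A
 *   bits 23..16 columns of B (0 for a one-operand function) */
uint32_t pq_sketch_matrix_len(uint32_t rows_a, uint32_t cols_a, uint32_t cols_b)
{
    /* Each dimension must fit in its own byte. */
    assert(rows_a <= 0xFFu && cols_a <= 0xFFu && cols_b <= 0xFFu);
    return rows_a | (cols_a << 8) | (cols_b << 16);
}
```

Under that assumed layout, a 2×2 matrix multiplied by a 2×1 vector would be described by `pq_sketch_matrix_len(2, 2, 1)`.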

Well, no more theory. In the tutorial video this week, I create a new MCUXpresso IDE project based on the SDK v2.6.3 for lpcxpresso55S69, add the PowerQuad driver and use it to solve a simultaneous force equation:

Pendant lamp in my office – Forces equation

The video shows how to use the coprocessor interface to calculate the matrix of sin(x) and cos(x) values (with the angles in radians), and then how to configure the PowerQuad to calculate the inverse of this matrix using the Matrix Engine. Once we have the inverse matrix, the forces matrix is calculated by the matrix multiplication function. Everything is shown from first principles and you’ll get a thorough understanding of PowerQuad.
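If you want something to check the PowerQuad results against, the same "invert, then multiply" approach can be sketched in plain C for a 2×2 system. This is a software reference only, not the PowerQuad driver. The function names are hypothetical, and the 2×2 size is an assumption, since the actual force equation is only shown in the figure and video.

```c
#include <assert.h>
#include <stddef.h>

/* Invert a 2x2 row-major matrix m into inv.
 * Returns 1 on success, 0 if the matrix is singular. */
int inv2x2_ref(const float m[4], float inv[4])
{
    assert(m != NULL && inv != NULL);
    const float det = m[0] * m[3] - m[1] * m[2];
    if (det == 0.0f) {
        return 0;
    }
    inv[0] =  m[3] / det;
    inv[1] = -m[1] / det;
    inv[2] = -m[2] / det;
    inv[3] =  m[0] / det;
    return 1;
}

/* Solve A * x = b by computing x = inverse(A) * b.
 * Returns 1 on success, 0 if A is singular. */
int solve2x2_ref(const float a[4], const float b[2], float x[2])
{
    float inv[4];
    if (!inv2x2_ref(a, inv)) {
        return 0;
    }
    x[0] = inv[0] * b[0] + inv[1] * b[1];
    x[1] = inv[2] * b[0] + inv[3] * b[1];
    return 1;
}
```

As a hypothetical example, take two cables at 45° supporting a 10 N load. The rows of A would then be {-cos 45°, cos 45°} and {sin 45°, sin 45°}, with b = {0, 10}, and both tensions come out at about 7.07 N.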

How many clock cycles for the inverse and multiplications? You’ll have to watch the video to find out… it’s here.

PowerQuad Matrix Engine tutorial

If you enjoy this video, then please subscribe to my channel on YouTube where you will find many more video tutorials using LPC55S69, LPC55S69-EVK and MCUXpresso IDE. Please take the time to send me your comments. I have a further 4 videos to write and will be happy to hear your ideas about what to include.

3 thoughts on “Investigating ARM Cortex® M33 core – DSP Acceleration 2 (PowerQuad Matrix Engine Tutorial)”

  1. Looks like a really powerful addition, but to make it immediately usable…
    Is there a nice C++ math library which wraps access to PowerQuad and optimized common functions?
    What about call-compatible replacements for CMSIS functions?
    Thanks for the articles!
    Best Regards, Dave


    • Hi Dave,
      Call-compatible replacements for the CMSIS functions are available inside the SDK library. As far as I know, there is no C++ math library supporting PowerQuad.
      Best Regards, Petr


  2. Thanks Dave, thanks Petr.
    Yes, all of the CMSIS-DSP functions have been realised in a PowerQuad API. You might look at the Driver example in the lpcxpresso55s69 SDK v2.6.3 – see the powerquad_cmsis project. So where the CMSIS-DSP library has arm_rfft_q31 (for example), the powerquad_cmsis example also implements arm_rfft_q31, but passes the calculation to the Transform engine in the PowerQuad.
    Similarly, CMSIS-DSP has arm_sin_q15, arm_sin_q31 and arm_sin_f32. The powerquad_cmsis example shows all three passed into the PowerQuad… and 8 clock cycles!

