# ARM SWO ITM Console Bidirectional Standard I/O Retargeting

The ARM Cortex-M architecture has many features that are underused, probably simply because engineers are not aware of them. SWO (Serial Wire Output) is a single trace pin of the ARM Cortex-M CoreSight debug block. The trace pin is driven by the ITM (Instrumentation Trace Macrocell) of the Cortex-M core. It provides a serial output channel that is typically faster than the usual UART, because it is clocked at half or a quarter of the core clock frequency, depending on the core and implementation.

As such, it is an ideal high-speed output channel for sending text or data to the host. This is how it is usually used, but what many do not know is that it can be used bidirectionally with the help of the debugger.

The topic of this article is how to redirect standard I/O such as printf() or scanf() to the SWO ITM console. This means both sending *and* receiving data over the SWO debug channel, so I can use it as a kind of UART with a single pin only.

## Outline

In this article I describe the infrastructure and software needed to use SWO for both output and input. Using SWO as an output channel is something I have done in multiple projects, but this is the first time I managed to get it working as an input channel.

The reason I needed it is that we have a LoRaWAN project with the LPC55S16, and we wanted to use SWO as the command line interface to the system.

While exploring the capabilities of SWO, I have implemented SWO standard I/O redirection: this allows things like printf() and even scanf() to use SWO. Sending out data over SWO is kind of easy, and an external tool like the J-Link SWOViewer can be used.

In this article I am using the NXP LPC55S16 and LPC55S69, together with the MCUXpresso IDE 11.7.0 with a LinkServer debug connection. Links to example projects and the implementation McuSWO.c can be found at the end of this article.

## Sending data to the host

The SWO pin is used to transmit data to the host, with the help of a debug probe. This requires that the SWO trace pin is present on the debug header. A common mistake is not routing that pin to the debug header during hardware development.

The application then writes the characters to the ITM port register:

```c
static inline bool SWO_WriteChar(char c, uint8_t portNo) {
  volatile int timeout = 5000; /* arbitrary timeout value */

  while (ITM->PORT[portNo].u32 == 0) {
    /* Wait until STIMx is ready, then send data */
    timeout--;
    if (timeout==0) {
      return false; /* not able to send */
    }
  }
  ITM->PORT[portNo].u8 = c;
  return true;
}
```
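The `_write()` hook later in this article calls `SWO_Enabled()` and `SWO_WriteBuf()`, which are not listed here. What follows is a minimal sketch of how such helpers could look, not the McuSWO implementation. It assumes that "enabled" means the ITMENA bit in `ITM->TCR` and the port's bit in `ITM->TER` are both set. The `FakeItm` structure is a hypothetical host-side stand-in for the CMSIS `ITM` register block so that the sketch can be built on a PC; on the target, the device header provides the real `ITM` and `ITM_TCR_ITMENA_Msk`. `SWO_WriteChar()` is repeated only to keep the block self-contained.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: host-side stand-in for the CMSIS ITM register block, only so
 * that this sketch builds off-target. On the target, include the device
 * header instead and delete this section. */
typedef struct {
  union {
    volatile uint8_t u8;
    volatile uint32_t u32;
  } PORT[32];            /* stimulus ports: a non-zero read means "ready" */
  volatile uint32_t TER; /* Trace Enable Register, one bit per port */
  volatile uint32_t TCR; /* Trace Control Register */
} FakeItm;

static FakeItm fakeItm;
#define ITM (&fakeItm)
#define ITM_TCR_ITMENA_Msk (1UL) /* bit 0 of TCR: ITM enabled */

/* True if the ITM and the given stimulus port have both been enabled. */
static bool SWO_Enabled(uint8_t portNo) {
  if (portNo >= 32) {
    return false; /* there are only 32 stimulus ports */
  }
  if ((ITM->TCR & ITM_TCR_ITMENA_Msk) == 0) {
    return false; /* ITM globally disabled */
  }
  if ((ITM->TER & (1UL << portNo)) == 0) {
    return false; /* this stimulus port is disabled */
  }
  return true;
}

/* Same logic as the SWO_WriteChar() shown above. */
static bool SWO_WriteChar(char c, uint8_t portNo) {
  volatile int timeout = 5000; /* arbitrary timeout value */

  while (ITM->PORT[portNo].u32 == 0) {
    if (--timeout == 0) {
      return false; /* not able to send */
    }
  }
  ITM->PORT[portNo].u8 = (uint8_t)c;
  return true;
}

/* Sends up to count bytes and returns how many were actually written. */
size_t SWO_WriteBuf(const char *buf, size_t count, uint8_t portNo) {
  size_t sent = 0;

  if (!SWO_Enabled(portNo)) {
    return 0; /* nobody is listening: do not wait for timeouts */
  }
  while (sent < count && SWO_WriteChar(buf[sent], portNo)) {
    sent++;
  }
  return sent;
}
```

Checking the enable bits first means that a target running without a debugger attached skips the write immediately instead of burning a timeout per character.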

## Receiving data from the host

Because the SWO stream by design only goes from the target to the host, we need a different route back to the target. For this, ARM has defined a special variable as part of CMSIS-Core:

```c
/* ##################################### Debug In/Output function ########################################### */
/**
  \ingroup  CMSIS_Core_FunctionInterface
  \defgroup CMSIS_core_DebugFunctions ITM Functions
  \brief    Functions that access the ITM debug interface.
  @{
 */

extern volatile int32_t ITM_RxBuffer;                              /*!< External variable to receive characters. */
#define                 ITM_RXBUFFER_EMPTY  ((int32_t)0x5AA55AA5U) /*!< Value identifying \ref ITM_RxBuffer is ready for next character. */
```

This means that the debugger can send data to the target by writing to that variable. With this, I can implement a receive function like this:

```c
static inline int32_t SWO_ReceiveChar(void) {
  int32_t ch = EOF; /* EOF, no character available */

  if (ITM_RxBuffer != ITM_RXBUFFER_EMPTY) {
    ch = ITM_RxBuffer;
    ITM_RxBuffer = ITM_RXBUFFER_EMPTY; /* mark it ready for next character */
  }
  return ch;
}
```
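Two details of this handshake are easy to miss. First, CMSIS only declares `ITM_RxBuffer` as `extern`, so the application has to define it once and initialize it to the "empty" value. Second, the debugger can only deliver the next character after the target has reset the variable. The sketch below plays through that handshake on a host PC. `HostSim_DebuggerSendChar()` is a hypothetical stand-in for what the debug probe does over SWD/JTAG, and the macro is repeated from the CMSIS excerpt above only to keep the block self-contained.

```c
#include <stdint.h>
#include <stdio.h> /* EOF */

/* On the target this macro comes from the CMSIS core header. */
#define ITM_RXBUFFER_EMPTY ((int32_t)0x5AA55AA5U)

/* The application owns the variable: define it once, initialized to "empty". */
volatile int32_t ITM_RxBuffer = ITM_RXBUFFER_EMPTY;

/* HYPOTHETICAL host-side simulation of the debugger writing into target RAM.
 * Returns 1 if the character was placed, 0 if the target is still busy. */
int HostSim_DebuggerSendChar(char c) {
  if (ITM_RxBuffer != ITM_RXBUFFER_EMPTY) {
    return 0; /* previous character not consumed yet */
  }
  ITM_RxBuffer = (unsigned char)c;
  return 1;
}

/* Target side: same logic as SWO_ReceiveChar() above. */
int32_t SWO_ReceiveChar(void) {
  int32_t ch = EOF; /* EOF, no character available */

  if (ITM_RxBuffer != ITM_RXBUFFER_EMPTY) {
    ch = ITM_RxBuffer;
    ITM_RxBuffer = ITM_RXBUFFER_EMPTY; /* mark it ready for next character */
  }
  return ch;
}
```

Because only one character is in flight at a time, the input data rate is bounded by how often the application polls `SWO_ReceiveChar()` and how quickly the debugger notices that the variable is empty again.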

## Simple Input and Output

With this I can implement a very simple input and output demo:

Note that in the current version of the IDE the SWO ITM Console sends the data only after pressing the [Enter] key. That is not ideal, but it works once you know about it.

## Standard I/O re-targeting

With the basics in place, the next level is to retarget the standard I/O character stream operations, so that things like printf() and scanf() can take advantage of the SWO communication channel.

This is named ‘re-targeting’. For this, low-level routines like _read() and _write() in the standard library need to be overridden with an implementation using SWO:

First, make sure that a Standard Library with no default low level hooks is used (see Which Embedded GCC Standard Library? newlib, newlib-nano, …):

💡 Note that SWO retargeting does not work with the proprietary NXP RedLib library.

The McuSWO.c module (see link at the end of the article) replaces the low level hooks with a version using SWO, for example for reading:

```c
int _read(int fd, char *buffer, int size) {
  int32_t c;

  (void)size; /* only a single byte is read per call */
  if (fd!=McuSWO_StdIn) { /* 0 is stdin */
    return EOF; /* failed */
  }
  if (!SWO_Enabled(McuSWO_CONFIG_TERMINAL_CHANNEL)) {
    return EOF; /* failed */
  }
  do {
    c = SWO_ReceiveChar(); /* poll until the debugger has delivered a character */
  } while (c==EOF); /* blocking */
  *buffer = (char)c;
  return 1; /* number of bytes read */
}
```

and writing:

```c
int _write(int fd, char *buffer, unsigned int count) {
  if (fd!=McuSWO_StdOut && fd!=McuSWO_StdErr) {
    return EOF; /* failed */
  }
  if (!SWO_Enabled(McuSWO_CONFIG_TERMINAL_CHANNEL)) {
    return EOF; /* failed */
  }
  SWO_WriteBuf(buffer, count, McuSWO_CONFIG_TERMINAL_CHANNEL);
  return count; /* return the number of chars written */
}
```

Notice that this deals with a character file device (hence the fd file descriptor). So in order to have things working properly, other file handling routines have to be retargeted too. This is all managed by the McuSWO implementation.

That way things like

```c
printf("Using printf(), putc and putchar with SWO\n");
putc('*', stdout);
putc('#', stderr);
putchar('!');
putchar('\n');
```

will work too.

## scanf(), fgets()

Now about how to read in text or a line. While using scanf() might seem the logical way, it is actually a rather bad one because of possible buffer overflows and the need to deal with line endings; see just one discussion about it on StackOverflow. But if you insist on that way, here is something which works more or less:

```c
static void ReadLine_scanf(unsigned char *buf, size_t bufSize) {
  /* C standard library way with scanf(). Actually, you better do NOT use scanf() for this, use fgets() instead! */
  int res;
  char ch = '\0';

  printf("scanf: Enter a single name/word and press ENTER:\n");
  assert(bufSize>=21); /* "%20s" stores up to 20 characters plus the terminating '\0' */
  buf[0] = '\0'; /* keep the printf() below safe if scanf() reads nothing */
  res = scanf("%20s%c", (char*)buf, &ch); /* note: ch for the newline */
  printf("scanf: %s, res: %d, ch:%d\n", buf, res, ch);
}
```

As noted above in the comment, a much better way is to use fgets():

```c
static void ReadLine_fgets(unsigned char *buf, size_t bufSize) {
  /* C standard library way with fgets() */
  char *p;

  printf("fgets: Enter a line and press ENTER:\n");
  p = fgets((char*)buf, bufSize, stdin);
  if (p!=NULL) {
    printf("fgets: %s\n", buf);
  } else {
    printf("fgets FAILED\n");
  }
}
```
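One thing to keep in mind with fgets() is that it stores the line ending in the buffer as well, so a command parser will usually want to remove it first. A small sketch of that step is shown below; `StripLineEnding()` is a hypothetical helper name, not part of McuSWO.

```c
#include <stddef.h>
#include <string.h>

/* Cuts the string at the first '\r' or '\n' and returns the new length.
 * strcspn() returns the index of the first character from the set, or the
 * string length if none is present, so the write below is always in bounds. */
size_t StripLineEnding(char *line) {
  size_t len = strcspn(line, "\r\n");

  line[len] = '\0';
  return len;
}
```

It would be called on the buffer right after a successful fgets(), before comparing the input against command names.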

## Footprint Consideration

Now you have seen how nicely you can use SWO with printf/scanf/fgets and the like. Well, it looks nice, but I actually recommend that you do not use such standard library functions at all!

As always, using printf() and the like, even with newlib-nano, has a high price. The reason is that the standard library internally allocates buffers and file handles which come with a huge overhead, including dynamic memory allocation. You can easily see this with the heap and stack view.

The heap and stack view shows the baseline when neither printf() nor scanf() is used.

Using things like printf(), putc() and putchar() requires 1.43 KByte of heap.

Now add scanf() to the mix, and I end up using 2.44 KByte of RAM!

Needless to say, I do not recommend using any of these Standard Library functions. It is possible, but up to you ;-).

What I’m using instead are the functions provided by the McuSWO module. They do not need any heap memory and are more efficient:

```c
void McuSWO_PrintChar(char c, uint8_t portNo);
void McuSWO_StdIOSendChar(uint8_t ch);
bool McuSWO_StdIOKeyPressed(void);
void McuSWO_ReadLineBlocking(unsigned char *buf, size_t bufSize);
uint8_t McuSWO_ReadAppendLine(unsigned char *buf, size_t bufSize);
unsigned McuSWO_printf(const char *fmt, ...);
```

Using the above only adds around 1.2 KByte of code, while the same thing with the Standard Library adds around 14 KByte of code, which is more than 10x as much.
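To give an idea of why such routines can get by without any heap, here is a sketch of a non-blocking "append to line" reader that works purely on a caller-provided buffer. It is only an illustration of the idea and not the McuSWO code: `Line_ReadAppend()`, the `CharSourceFn` callback type and the string-fed `Demo_NextChar()` source are all hypothetical names. On the target, the callback would wrap something like the `SWO_ReceiveChar()` shown earlier.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL character source: returns the next character, or -1 if
 * nothing is pending. */
typedef int32_t (*CharSourceFn)(void);

/* Appends all pending characters to the NUL-terminated string in buf
 * (bufSize must be at least 1). Returns true once a '\n' has been seen,
 * false if the line is still incomplete. Characters that do not fit are
 * dropped, so the buffer can never overflow. */
bool Line_ReadAppend(char *buf, size_t bufSize, CharSourceFn getChar) {
  size_t len = strlen(buf);

  for (;;) {
    int32_t c = getChar();

    if (c < 0) {
      return false; /* nothing pending: call again later */
    }
    if (c == '\r') {
      continue; /* ignore CR, the line ends on LF */
    }
    if (c == '\n') {
      return true; /* line complete, newline not stored */
    }
    if (len + 1 < bufSize) {
      buf[len++] = (char)c;
      buf[len] = '\0';
    }
  }
}

/* Host-side demo source that feeds characters from a string. */
static const char *demoInput = "";

int32_t Demo_NextChar(void) {
  return (*demoInput != '\0') ? (unsigned char)*demoInput++ : -1;
}
```

The caller clears the buffer (`buf[0] = '\0'`) before starting a new line and then calls the function from its main loop or task until it returns true. All state lives in the buffer itself, which is why no heap or file handle is needed.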

## Open Points

While things are working well with the IDE 11.7.0 using the LinkServer debug connection with the LPC55Sxx, I was not able to get the input working with PEMICRO and SEGGER connections. Output of SWO data worked fine. I have the latest probe firmware available at this time, so this might be supported in a future version.

The other thing to be aware of: SWO communication sometimes failed while I was debugging (stepping, stopping) with SWO traffic going on. This is probably because the debugger is doing some heavy lifting in the background.

## Summary

The beauty of SWO is that with the help of a single pin and the debugger I have a bidirectional communication line between the target and the host. It is less intrusive than semihosting, and without a debugger attached it does not affect the application performance.

But Standard Library re-targeting comes with a price, so I recommend using the McuSWO module routines directly: they are more efficient, and include a printf-style variant with a variable argument list too.

If you have a spare UART or want to use RTT, then go for it, as this is probably the easiest and most direct way. But if all you have is the SWO pin with a debugger, then SWO re-targeting is a solution.

You can find the full implementation with example projects on GitHub (see links below).

Happy redirecting 🙂

## 4 thoughts on “ARM SWO ITM Console Bidirectional Standard I/O Retargeting”

• good catch, thank you! Fixed now 🙂

1. Nice feature to easily add an input channel (to target) with a bit of help from the debugger and definitely a great alternative to semihosting.
But a bit weird that Arm seems to advertise it as an SWO/ITM feature, while it is not more than a convention to name a variable in RAM, which is written via SWD/JTAG.

• Yes, one of the many weird ARM things. My thinking is that they designed it (as it is in the hardware) as an output only channel. Only then realized with the user feedback that indeed it would be more useful if it would be bidirectional. So with that global variable it is indeed a kind of hack, and performance is not great anyway. So actually the Segger RTT way is the way to go in most cases 🙂
