Spilling the Beans: volatile Qualifier

Posted on October 12, 2021 by Erich Styger

It is interesting to see that some aspects (mostly unintended) can stimulate lots of good and fruitful discussions. This happened with “Spilling the Beans: Endless Loops” (recommended reading 🙂), where using (or not using) volatile for inline assembly prompted thoughts which warrant an article of their own.

The volatile qualifier in C/C++ is misunderstood, or wrongly used, by many programmers.

Still, ‘volatile’ is very useful if you know what it means for the compiler and what a good use of it is.

Code like the ring buffer structure shown later in this article is simply wrong, only trying to hack around the real problem (lack of re-entrancy). So if you see something like this, you had better not use that code, because that coder clearly did things the wrong way :-(.

In essence, the volatile qualifier marks an object or variable as ‘it can change outside of what the compiler might assume’, for both read and write operations. And it should only be used for hardware registers.

Assume the following (illustrative, non-hardware) example:

int var;

If this variable could be changed outside the compiler’s knowledge, then whenever it is read, the next read operation might return something different.

Consider the following (simplified) example:

var = 5;
var++;

An optimizing compiler could combine the two operations into a single ‘var = 6;’ because it has just stored the value 5. If that variable were marked with ‘volatile’

volatile int var;

then the compiler has to assume that the write might have side effects (changing other variables) or that a read of ‘var’ will not return what has been previously stored.

Now many developers wrongly reduce ‘volatile’ to ‘prevents compiler optimizations’: yes, the net effect is kind of like that, but that is not the full story.

So the compiler has to perform the reads and writes whenever that variable is accessed, because things might have changed. Nothing more and nothing less. I highly recommend that you read “Nine ways to break your systems code using volatile” by John Regehr.

So don’t think you can ‘control’ the code beyond telling the compiler that it needs to generate these extra reads and writes.

What are legitimate uses of volatile? The above example with ‘var’ certainly is not one (it was for illustration only!).

A good use of volatile is for hardware registers. Below is a simplified case of a memory-mapped Analog-to-Digital Converter (ADC):

#include <stdint.h>

typedef struct {
  volatile uint32_t CTRL; /* ADC control register */
  volatile uint32_t VAL;  /* ADC result register */
} ADC;

The registers are marked as ‘volatile’ because reading or writing the control register starts or stops a conversion (a hardware side effect), the register content might be changed by the hardware at any time, and some parts or bits of it might be write-only and cannot be read back. The same applies to the VAL register, which contains the conversion result: it can change at any time. So here ‘volatile’ is appropriate, because it warns the compiler about side effects and tells it that it cannot make any assumptions about read and write accesses. You still cannot make assumptions about how the read/write accesses are performed (a single 32-bit access? two 16-bit accesses? four 8-bit accesses? in which order?). Here only assembly code comes to the rescue.
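To illustrate how such a register block is typically used, here is a minimal sketch. The register layout is the one above; the ADC_CTRL_START bit, the function name ADC_StartAndRead() and any base address are hypothetical (real values come from the device reference manual). So that the sketch can run on a host, the pointer is aimed at an ordinary object instead of a fixed hardware address.

```c
#include <stdint.h>

typedef struct {
  volatile uint32_t CTRL; /* ADC control register */
  volatile uint32_t VAL;  /* ADC result register */
} ADC;

/* Hypothetical: bit 0 of CTRL starts a conversion. */
#define ADC_CTRL_START (1u << 0)

/* On the real device this would be something like ((ADC *)0x40000000u)
   (hypothetical address). For a host build we point it at a plain object. */
static ADC adcSim;
static ADC *const adc = &adcSim;

uint32_t ADC_StartAndRead(void) {
  adc->CTRL = ADC_CTRL_START; /* volatile write: must be emitted, starts the conversion */
  /* Real code would wait here for a (device-specific) conversion-complete flag. */
  return adc->VAL;            /* volatile read: fetched from the register on every call */
}
```

Note that volatile only guarantees that both accesses happen; it says nothing about their width, as discussed above.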

And if you have seen volatile used with inline assembly code:

__asm volatile("nop");

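This is not wrong, but strictly speaking not needed, as the compiler shall not touch or change the assembly instruction; GCC, for example, treats an asm statement without output operands, like this one, as implicitly volatile. If you need it here, then it is probably a sign of a compiler bug.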

So in my view volatile should be used for hardware registers only, or at least in most cases. One use case where it is OK is something like this:

static volatile bool done = false;

void UART_Completed_Interrupt(void) {
  ....
  done = true;
}

void foo(void) {
  done = false;
  UART_SendString("hello world!"); /* shall trigger UART_Completed_Interrupt()! */
  while(!done) {
    /* wait, should add a timeout here! */
  }
}

The above assumes that reads and writes of ‘done’ are atomic, and that it is used only in the places above and nowhere else. While the above ‘works’, I would rather use a semaphore or another signalling mechanism.
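The comment in foo() asks for a timeout. A minimal sketch of such a bounded wait, assuming a simple loop counter as the timeout (real code would more likely use a hardware timer or an RTOS tick); WaitForDone() is a hypothetical name, and the same atomicity assumptions for ‘done’ apply:

```c
#include <stdbool.h>
#include <stdint.h>

static volatile bool done = false; /* set by the (hypothetical) UART completion interrupt */

/* Busy-waits for 'done', giving up after maxPolls loop iterations.
   Returns true if the flag was seen, false on timeout. */
bool WaitForDone(uint32_t maxPolls) {
  while (!done) {              /* volatile: 'done' is re-read on every iteration */
    if (maxPolls == 0u) {
      return false;            /* timeout */
    }
    maxPolls--;
  }
  return true;
}
```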

There is another legitimate use case: working around a compiler bug where rewriting the code is not appropriate. In that case ‘volatile’ can cause the compiler to skip the (wrong) optimization, which in that (very rare) case is appropriate in my view.

I recommend that you search your code base for the ‘volatile’ qualifier. You might be surprised by how many places it is used in: if it is not used for hardware registers, it is very likely wrong.

The ring buffer code mentioned at the beginning of this article is such a case:

/* receive state structure */
typedef struct _debug_console_write_ring_buffer
{
    uint32_t ringBufferSize;
    volatile uint32_t ringHead;
    volatile uint32_t ringTail;
    uint8_t ringBuffer[DEBUG_CONSOLE_TRANSMIT_BUFFER_LEN];
} debug_console_write_ring_buffer_t;

The underlying problem is that the code using this ring buffer is not re-entrant and does not use the correct critical sections to make it re-entrant. Adding ‘volatile’ only hides the problem: it makes the code ‘work’ in ‘most cases’ only, because it reduces the chances that things go wrong. But again: it is simply wrong. Now, if you have not read “Nine ways to break your systems code using volatile” yet, this would be a good time. 🙂
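For contrast, here is a minimal sketch of a ring buffer protected by a critical section instead of volatile. This is not the SDK implementation: all names are hypothetical, and ENTER_CRITICAL()/EXIT_CRITICAL() are stubs so it builds on a host. On a target they would disable and restore interrupts (or mask only the UART interrupt), and they must also act as a compiler barrier, as the CMSIS __disable_irq()/__enable_irq() intrinsics do with GCC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical critical section macros, stubbed out for a host build. */
#define ENTER_CRITICAL() do { } while (0)
#define EXIT_CRITICAL()  do { } while (0)

#define RB_SIZE 64u /* hypothetical buffer length */

typedef struct {
  uint32_t head;  /* next write position */
  uint32_t tail;  /* next read position */
  uint32_t count; /* number of stored bytes */
  uint8_t  buf[RB_SIZE];
} RingBuffer_t;

bool RB_Put(RingBuffer_t *rb, uint8_t data) {
  bool ok = false;
  ENTER_CRITICAL(); /* head, count and buf are updated as one consistent unit */
  if (rb->count < RB_SIZE) {
    rb->buf[rb->head] = data;
    rb->head = (rb->head + 1u) % RB_SIZE;
    rb->count++;
    ok = true;
  }
  EXIT_CRITICAL();
  return ok;
}

bool RB_Get(RingBuffer_t *rb, uint8_t *data) {
  bool ok = false;
  ENTER_CRITICAL(); /* tail, count and buf are read/updated as one consistent unit */
  if (rb->count > 0u) {
    *data = rb->buf[rb->tail];
    rb->tail = (rb->tail + 1u) % RB_SIZE;
    rb->count--;
    ok = true;
  }
  EXIT_CRITICAL();
  return ok;
}
```

No field is volatile: consistency comes from the critical section, not from forced memory accesses.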

Again: I recommend that you have a look at the good previous discussion in “Spilling the Beans: Endless Loops“.

Happy volatiling 🙂

37 thoughts on “Spilling the Beans: volatile Qualifier”

JohnCoppola on October 12, 2021 at 08:35 said:

Good point about the reentrancy problem.

What about memory which is affected by DMA? Surely by “hardware register” you mean any memory mapped location that can be altered by external events?

Also, very interesting to read about the C on PDP-11… I wrote for PDP 11/73’s but only in assembler! Those were the days! 🙂

Yasen on October 12, 2021 at 09:02 said:

Thank you for this nice article (like all of yours about embedded programming; keep writing).
But I think that using volatile ONLY for registers is too conservative a statement for bare-metal embedded systems. John Regehr’s example (really nice article) with the interrupt software flag is what I mean: a very common situation in practice (one that has caused me headaches several times). So this situation in particular must always be kept in mind.
*****
int done;

__attribute((signal)) void __vector_4 (void) {
done = 1;
}

void wait_for_done (void) {
while (!done) ;
}
****

- Erich Styger on October 12, 2021 at 18:43 said:
  
  Hi Yasen,
  thank you, and yes, this has been noted: indeed it is too restrictive, and in this case volatile would be fine. I think I need to add this to the article. Still, I prefer a semaphore or similar so I don’t have to busy-wait, but this usually requires a runtime environment like an RTOS, which is pretty standard for many embedded applications too, at least for the ‘medium’ or ‘larger’ ones.
  
mark embeddedpro on October 12, 2021 at 10:50 said:

More great information, Erich, thank you.
My “rule of thumb” or my default assumption for volatile is this:

If I’m writing a driver, then I will need volatile in some places in my code.
If I’m writing control code that uses the driver, then I don’t need to use volatile in my code.

Your next article is surely ‘static’ 🙂

- Erich Styger on October 12, 2021 at 18:40 said:
  
  Hi Mark,
  no, I did not consider ‘static’, but you are right: I should add this to my list :-).
  
Paul Abbott on October 12, 2021 at 17:05 said:

I’m enjoying these daily refreshers on the basics.

What about this common construct that I’ve been using forever?!

volatile int done_flag;

void ISR_TIMER (void) {
done_flag = 1;
}

void wait_for_done (void) {
while (!done_flag) ;
// do something at regular interval
}

It’s also mentioned in the article you linked as a valid use in addition to hardware register access.
– “The volatile qualifier forces stores to go to memory and loads to come from memory, giving us a way to ensure visibility across multiple computations (threads, interrupts, coroutines, or whatever).”

- Erich Styger on October 12, 2021 at 18:39 said:
  
  Hi Paul,
  yes, agreed, for me that’s one of the few cases where it is valid. However, keep in mind that it only works properly if only one party writes it, only one party reads it, and read and write operations are atomic.
  In general, I avoid such ‘interrupt polling flags’, as they waste CPU cycles. I would rather synchronize with a semaphore or similar so I don’t have to busy-wait.
  
- juanpm on October 12, 2021 at 22:39 said:
  
  I remember that in the Linux kernel volatile is almost strictly forbidden.
  
  If I recall correctly, the argument was that it isn’t variables that should be marked volatile, but particular reads or writes to a memory address, whose behaviour can be made more specific than just “volatile” (e.g. hardware-specific caching behaviour, etc.). And in some cases it may make sense to mix volatile-like and non-volatile accesses.
  
  I found this on a quick search: https://www.kernel.org/doc/html/latest/process/volatile-considered-harmful.html
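The ‘volatile accesses instead of volatile variables’ idea can be sketched like this. The macro names are hypothetical and only loosely inspired by the Linux kernel’s READ_ONCE()/WRITE_ONCE(); this is not the kernel implementation, it only covers uint32_t objects, and like volatile in general it provides neither atomicity for wider types nor ordering.

```c
#include <stdint.h>

/* Hypothetical helpers: the object is NOT declared volatile; only the
   accesses made through these macros are volatile accesses. */
#define READ_ONCE_U32(x)     (*(const volatile uint32_t *)&(x))
#define WRITE_ONCE_U32(x, v) (*(volatile uint32_t *)&(x) = (v))

static uint32_t sharedFlag; /* plain, non-volatile object */

uint32_t PollFlag(void) {
  return READ_ONCE_U32(sharedFlag); /* forced load from memory on every call */
}

void SetFlag(uint32_t value) {
  WRITE_ONCE_U32(sharedFlag, value); /* forced store */
}
```

Other code may still access sharedFlag normally where the compiler is free to optimize, which is the mixing of access kinds mentioned above.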
  
  - Erich Styger on October 13, 2021 at 20:21 said:
    
    Thanks for that article link, really interesting. It confirms for me that ‘volatile’ is overused and used in the wrong places, just to ‘fix’ things in an improper way. For dealing with hardware caching it would be the wrong way too: there, ‘sync’ barriers or ‘pipeline flush’ instructions are my choice for solving these problems.
    
Myself on October 13, 2021 at 12:47 said:

Why do you think you need re-entrancy when you work with ring buffers?
If you have multi-write point, you need to guard anyway, same for read.

For me, it’s not the best example.

Where I see a bigger problem is, for example, multiplication:

int32_t
square(volatile int16_t* in) {
return (*in) * (*in);
}

Here result may not really be square.

- Erich Styger on October 13, 2021 at 13:55 said:
  
  >>Why do you think you need re-entrancy when you work with ring buffers?
  
  A ring buffer usually involves two things: where to enter data and where to take it out. Together with the data, they form the consistent state of the ring buffer. ‘Concurrent’ access to the ring buffer needs to be managed in a re-entrant way, as operations on the ring buffer are usually not atomic.
  For ‘simpler’ producer/consumer problems, lock-free versions are possible, for example https://andrea.lattuada.me/blog/2019/the-design-and-implementation-of-a-lock-free-ring-buffer-with-contiguous-reservations.html
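A much simpler lock-free variant than the one in the linked article can be sketched with C11 atomics instead of volatile. Assumptions: exactly one producer (for example the main loop) and one consumer (for example a UART interrupt), a toolchain providing <stdatomic.h>, and hypothetical names throughout. Head and tail are free-running counters, so unsigned wrap-around works as long as the size is a power of two.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SPSC_SIZE 16u /* must be a power of two */

typedef struct {
  _Atomic uint32_t head; /* written only by the producer */
  _Atomic uint32_t tail; /* written only by the consumer */
  uint8_t buf[SPSC_SIZE];
} SpscRing_t;

/* Producer side only. */
bool Spsc_Put(SpscRing_t *rb, uint8_t data) {
  uint32_t head = atomic_load_explicit(&rb->head, memory_order_relaxed);
  uint32_t tail = atomic_load_explicit(&rb->tail, memory_order_acquire);
  if (head - tail == SPSC_SIZE) {
    return false; /* full */
  }
  rb->buf[head & (SPSC_SIZE - 1u)] = data;
  /* release: the data byte is visible before the new head value */
  atomic_store_explicit(&rb->head, head + 1u, memory_order_release);
  return true;
}

/* Consumer side only. */
bool Spsc_Get(SpscRing_t *rb, uint8_t *data) {
  uint32_t tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
  uint32_t head = atomic_load_explicit(&rb->head, memory_order_acquire);
  if (head == tail) {
    return false; /* empty */
  }
  *data = rb->buf[tail & (SPSC_SIZE - 1u)];
  /* release: the slot is consumed before the producer may reuse it */
  atomic_store_explicit(&rb->tail, tail + 1u, memory_order_release);
  return true;
}
```

With more than one producer or consumer this sketch is no longer correct, and a critical section is needed again.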
  
- Erich Styger on October 13, 2021 at 13:58 said:
  
  About your ‘square()’ example: any code which uses non-private data (possibly the data behind the pointer in your case), and which can be interrupted by something like an interrupt or thread changing that data, is subject to a possible re-entrancy problem.
  
JohnCoppola on October 14, 2021 at 05:29 said:

The article on lock free ring buffers is very interesting.

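As for the square problem, it should be:

int32_t square(volatile int16_t *in) {
  int16_t x = *in; /* read the volatile object exactly once */
  return (int32_t)x * x;
}

which will always be square!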

- Erich Styger on October 14, 2021 at 05:43 said:
  
  It would still be a reentrancy problem if *in is not accessed atomically and something else changes the value in between. For example, on an 8-bit processor the access to the 16-bit variable might be split into two 8-bit accesses. And volatile would not help here either.
  
  - JohnCoppola on October 14, 2021 at 08:15 said:
    
    Yeah, point taken, I just don’t consider 8 bit micros anymore. The same point, but in reverse, applies to the NOP.
    
    - Erich Styger on October 14, 2021 at 08:31 said:
      
      The same problem arises with 64-bit data types on a 32-bit MCU. So you always need to know whether something is atomic or not.
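A minimal sketch of reading such a 64-bit value consistently on a 32-bit MCU, assuming a counter that is incremented by a timer interrupt. Making tickCount volatile would not help, because the load still consists of two 32-bit accesses; the critical section is what prevents a torn read. ENTER_CRITICAL()/EXIT_CRITICAL() are hypothetical stubs for a host build; on the target they would disable and restore (at least the timer) interrupts and act as a compiler barrier.

```c
#include <stdint.h>

/* Hypothetical critical section macros, stubbed out for a host build. */
#define ENTER_CRITICAL() do { } while (0)
#define EXIT_CRITICAL()  do { } while (0)

static uint64_t tickCount; /* incremented by the (hypothetical) timer interrupt */

void Timer_ISR(void) { /* hypothetical ISR name */
  tickCount++;
}

uint64_t GetTickCount(void) {
  uint64_t copy;
  ENTER_CRITICAL(); /* the 64-bit load needs two 32-bit accesses on a 32-bit MCU */
  copy = tickCount;
  EXIT_CRITICAL();
  return copy;
}
```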
      
Bob Paddock on October 15, 2021 at 15:43 said:

Here is another article to add to the reading list, and it is interesting that it came out about the same day this discussion started (Something in the Aether?):

“Preventing an optimising compiler from removing or reordering your code.”

https://theunixzoo.co.uk/blog/2021-10-14-preventing-optimisations.html

A separate issue where volatile can appear is ‘Code Motion’. The above discusses it in terms of C++ rather than the embedded level. John did address it to some degree in his article.

When you see “The compiler is broken. When I turn optimization off, my code runs fine.” on the message boards, you can be reasonably sure (I have found real compiler bugs in the early days of GCC-AVR) that there is either a missing volatile or an issue with code motion reordering things unexpectedly.

In the foo() example, I have never cared for the whole set/clear flag methodology. Consider a timer IRQ setting a ‘done’ flag. With the main loop clearing the flag, race conditions can arise such that IRQs are missed. Because the timer IRQs are periodic, the fact that some of them are being missed can be hard to notice. The error would show up as a long-term time drift rather than an overt bug.

A simple approach to the problem, if no RTOS with semaphores etc. is around, is to increment a volatile event variable in the IRQ. The main loop then compares it with a previously saved copy to see whether the two are now different. This avoids the whole done set/clear race between the IRQ and the main loop. The event variable needs to be of an atomic size for the processor at hand, so as not to introduce other subtle bugs. There are also issues of IRQ rates vs. main loop rates that need to be considered etc.
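The event counter approach described above might look like this minimal sketch. Assumptions: a 32-bit MCU where a 32-bit load and store are single atomic accesses (an 8-bit part would use uint8_t), only the ISR writes the counter, and the names Timer_IRQHandler() and Timer_NewEvents() are hypothetical. Since nothing is ever cleared, there is no set/clear race, and several events that occurred between two checks are counted rather than lost.

```c
#include <stdint.h>

/* Written only by the timer ISR, read only by the main loop. */
static volatile uint32_t timerEvents;

void Timer_IRQHandler(void) { /* hypothetical ISR name */
  timerEvents++;
}

/* Main-loop side: returns the number of events since the previous call. */
uint32_t Timer_NewEvents(void) {
  static uint32_t lastSeen;        /* private to the main loop */
  uint32_t now = timerEvents;      /* single read of the volatile counter */
  uint32_t delta = now - lastSeen; /* unsigned arithmetic handles wrap-around */
  lastSeen = now;
  return delta;
}
```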

Alan Hawse on October 17, 2021 at 20:13 said:

The compiler optimization level can make these bugs hard to find. The answer to many of these problems is an RTOS. Mostly I have decided that I only write RTOS programs…. That being said, these multi-CPU MCUs can inject all-kinda-brain-damage.

- Erich Styger on October 17, 2021 at 20:42 said:
  
  Yes, that’s usually what I do. Using a semaphore or another RTOS notification mechanism might seem like overhead, but this is what turns a ‘might work with volatile’ thing into an ‘always works’ one. Again, volatile should only be used where it is the right thing, for example for hardware registers.
  
Pozz on October 22, 2021 at 11:53 said:

Maybe it’s a stupid question, but I couldn’t understand why the first piece of code (from fsl_debug_console.c) in the article is wrong.

It’s just a struct definition of a ring buffer. Because of the volatile qualifier on ringHead and ringTail, I suspect they can be changed in interrupt context and are used in the mainline code too.

Most probably the mainline code copies new data to transmit into the buffer at the ringHead position and increments it; the ISR gets new data from ringTail and increments it. Moreover, the ISR checks ringTail==ringHead to detect the buffer-empty condition.

What’s wrong with this? I think volatile is important in this scenario.

- Erich Styger on October 22, 2021 at 12:53 said:
  
  In the first place, it is a ‘code smell’ which should be flagged by any source code review: using volatile (as with the whole discussion in this and the previous article) is justified in very, very few places. I did not check all border conditions, but I would not trust it. First, it is not necessary from the DbgConsole_SendData() point of view, because that function disables all interrupts to create a critical section (that’s fine). If you create a critical section, there are no other accesses, so the volatile is simply not needed, as the code is reentrant.
  There is DbgConsole_SendDataReliable(), which does the same: it disables interrupts and creates a critical section.
  DbgConsole_Flush() is not OK, as it does not use a critical section: it compares ringHead with ringTail, and in between the two accesses an interrupt might happen and invalidate the return value: the flush function might return success even if this is wrong, which is not good.
  The bigger issue is DbgConsole_SerialManagerTxCallback(), which accesses and modifies both ringTail and ringHead at the same time without any critical section, in a non-reentrant way. I don’t see a critical section established outside the call of the callback() either. That could be implemented, but it would not be a good solution, as it extends the section beyond what is needed, especially if the CS is done by disabling all interrupts, which creates lots of latency in the system.
  The implementation *might* work in all cases, but the volatile is not warranted here in my view, as the code does not do the ‘polling loop’ explained as a possible exception. So if the volatile here really makes or breaks the implementation, then it only hides a true reentrancy issue imho.
  I hope this helps.
  
  - Pozz on October 22, 2021 at 13:52 said:
    
    I’m sorry, but I don’t know the full code of this source file, so I can’t follow you thoroughly.
    
    I agree with you about critical sections (disabling interrupts): if you disable interrupts before accessing variables changed in ISR, volatile keyword isn’t needed and should be avoided.
    
    However, there are many situations where disabling interrupts is not desired (because of latency), or at least isn’t strictly needed.
    Even Nigel Jones, whom you cited, suggests in one of his articles[1] to use the volatile qualifier for variables accessed in an ISR. He talks about “Global variables modified by an interrupt service routine”. He writes “Global variables”, but they can be static variables used in a well-confined single source file (a driver?).
    
    Your article is very critical of volatile, and this is OK for uncontrolled use. IMHO it is very useful for writing ISRs in bare-metal systems where you don’t have locks, mutexes, semaphores and similar tools.
    
    I understand volatile could be sub-optimal in some cases, because it forces the compiler to skip optimizations. However the same thing can be said for critical sections.
    Entering a critical section only for reading and incrementing ringHead (while ringHead is only accessed for reading in ISR) isn’t necessary and can lead to a sub-optimal solution (increased interrupt latency).
    
    In bare-metal and simple systems, entering a critical section means disabling all interrupts. I don’t know if this is better than declaring a couple of variables volatile in a well-confined and well-controlled part of the code (a UART driver).
    
    [1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
    
    - Erich Styger on October 23, 2021 at 08:16 said:
      
      >> I’m sorry, but I don’t know the full code of this source file, so I can’t follow you thoroughly.
      
      You can find the code in the NXP MCUXpresso SDK, and for example here:
      https://github.com/ErichStyger/mcuoneclipse/blob/master/Examples/MCUXpresso/FRDM-K22F/FRDM-K22F_FreeRTOS/utilities/fsl_debug_console.c
      There are different versions of that implementation, depending on the SDK version.
      
    - Erich Styger on October 23, 2021 at 09:15 said:
      
      >>However, there are many situations where disabling interrupts is not desired (because of latency), or at least isn’t strictly needed.
      Even Nigel Jones, whom you cited, suggests in one of his articles[1] to use the volatile qualifier for variables accessed in an ISR. He talks about “Global variables modified by an interrupt service routine”. He writes “Global variables”, but they can be static variables used in a well-confined single source file (a driver?).
      
      ‘Global variables’ means objects with a static (not dynamic) address; their linkage (static or external) does not matter in this context. If they have a static address and are not ‘private’ to a single usage, they are subject to a reentrancy problem (I recommend the article on that topic by the legendary Jack Ganssle: http://www.ganssle.com/articles/areentra.htm).
      It is important to understand that ‘volatile’ only tells the compiler to do explicit reads and writes, nothing else. It does *not* solve the reentrancy problem: these are two separate things, but many developers mix them up. I disagree that volatile is needed in general in ISRs: if you need to make sure that writes are done in a certain order, volatile does not help here. You will need proper memory barriers and flushing.
      
      The only place where volatile is justified (apart from hardware registers) is something like this in the application code, polling a flag:
      volatile bool flagSetByInterrupt;
      …
      while(!flagSetByInterrupt) { /* wait */ } // variable is volatile to ensure reading it during every access by the compiler
      The above only works correctly if the access is atomic.
      
      >>Your article is very critical of volatile, and this is OK for uncontrolled use. IMHO it is very useful for writing ISRs in bare-metal systems where you don’t have locks, mutexes, semaphores and similar tools.
      
      My point is exactly against this: ‘volatile’ is *never* a replacement for mutexes, semaphores or similar tools. Volatile is never the correct tool to ensure reentrancy, you need a critical section for this!
      
      >>I understand volatile could be sub-optimal in some cases, because it forces the compiler to skip optimizations. However the same thing can be said for critical sections.
      
      The point is that volatile is *not* aimed at disabling optimizations. Yes, the net effect can look like that, but it only tells the compiler that the object might change between reads and writes, so it has to perform an actual read every time the object is read and an actual write every time it is written. As pointed out by point 5 in https://blog.regehr.org/archives/28, it does not give any guarantee about code ordering.
      And: a critical section does not provide this to you either!
      
      >>Entering a critical section only for reading and incrementing ringHead (while ringHead is only accessed for reading in ISR) isn’t necessary and can lead to a sub-optimal solution (increased interrupt latency).
      
      A critical section is needed even for read operations if an interruption of the control flow can cause inconsistent states. It is a common misunderstanding that only reading things is safe. For example
      if (ringHead==ringTail) { /* buffer is empty */
      are just two read operations. But what if the code gets interrupted between the two reads and an item is added to the buffer? The code following the comparison is then wrong and leads to wrong results :-(.
      
      In summary:
      – ‘volatile’ only tells the compiler that the object might change after a read or write, forcing the compiler to do explicit reads and writes. Nothing else. Really nothing else.
      – ‘reentrant’ or ‘reentrancy’ is an attribute of a section or piece of code, ensuring that this piece of code works correctly even if it is re-entered at any time (for example by a task or interrupt or whatever else can re-enter it). Reentrancy is a topic of shared code with shared data which can be changed, so it would affect self-modifying code as well (not usually a subject in embedded programming, but just added here for completeness).
      – a ‘critical section’ is a tool or method to ensure reentrancy: tools to ensure reentrancy are disabling interrupts, semaphore, mutex, etc
      – ‘access order’ is yet another topic: none of the above solves that: for this you need proper flushing/memory barriers/etc to make it happen.
      
      >>In bare metal and simple systems, entering critical section means disabling all interrupts.
      No, not necessarily: that is just one (brute-force) way. And in most cases you don’t even need to disable all interrupts. I recommend the series starting with https://mcuoneclipse.com/2016/08/14/arm-cortex-m-interrupts-and-freertos-part-1/ which shows that, for example, you only need to disable some interrupts (depending on your hardware). As pointed out above: reentrancy means that the code can be re-entered correctly under all circumstances, so it needs to be protected from being interrupted. So if you have a ring buffer for a UART, you might just need to protect it from being interrupted by that specific UART interrupt, not by every interrupt. There is a common misunderstanding that you always need to protect things against all interrupts: there are many cases where you only need to protect against a few. But: you need to make sure that your protection is reentrant too!
      
      Last but not least: I recommend reading yet another article I wrote: https://mcuoneclipse.com/2014/01/26/entercritical-and-exitcritical-why-things-are-failing-badly/
      
      I hope this helps. I know this is a complex topic, and still many embedded applications fail to implement things correctly.
      
Mark Butcher on November 25, 2021 at 16:38 said:

Hi Erich

Is this a bad use of volatile to solve a GCC optimisation issue?

Code:

// Called to activate an endpoint, calling the hardware activation after setting endpoint variables
//
static void fnActivateEndpoint(const USB_ENDPOINT_DESCRIPTOR *ptrEndpointDesc, unsigned short usMaxLength, int iChannel)
{
volatile unsigned short usEndpointLength = ptrEndpointDesc->wMaxPacketSize[0];
…. (non-relevant code removed)

usEndpointLength |= (ptrEndpointDesc->wMaxPacketSize[1] << 8);

This hard faults when -Os is used and usEndpointLength is not declared as volatile.

The reason is due to packed struct USB_ENDPOINT_DESCRIPTOR as follows

typedef struct _PACK stUSB_ENDPOINT_DESCRIPTOR
{
unsigned char bLength; // descriptor size in bytes
unsigned char bDescriptorType; // device descriptor
unsigned char bEndpointAddress; // direction and address of endpoint
unsigned char bmAttributes; // endpoint attributes
unsigned char wMaxPacketSize[2]; // endpoint FIFO size
unsigned char bInterval; // polling interval in ms
} USB_ENDPOINT_DESCRIPTOR;

which means that wMaxPacketSize[0] is possibly at an uneven address.

With high optimisation, the compiler tries to optimise the calculation of the length by reading the array as a half-word. On a Cortex-M7 this is not allowed, and so it hard faults.
By declaring usEndpointLength as volatile it doesn't hard fault, since (if I understand correctly) the compiler is then 'forced' to do the operation as coded, which is reading the two array entries as bytes and then constructing the 16-bit length value (since it can't assume that its own value hasn't changed between the two operations).

As your article states, some uses of volatile may be incorrect and mask the real issue. Do you consider this 'workaround' to be a misuse of the volatile keyword, and is there a better way, or is it a legitimate use case?

Thanks

Mark

- Erich Styger on November 25, 2021 at 21:43 said:
  
  Hi Mark,
  I don’t see this as a valid use case of volatile. A valid use case would be one where the compiler optimization is actually wrong, but in this case it is not: there is no guarantee or way to specify the memory access in a high-level language such as C or C++. To me this would be a good use case for some assembly access routines. The correct way to deal with this would be to have/use some pragma which specifies the memory access (half-word, full-word, etc.) for an object. Using volatile will cause some performance penalty too, so here again: I would use assembly code for such low-level accesses.
  The other (also less than ideal) way would be to compile the module/function accessing/using the data structure with a lower compiler optimization level, but here again there would be a performance drawback.
  
  - Mark Butcher on November 25, 2021 at 23:15 said:
    
    Hi Erich
    
    I want to avoid assembler because the code above is used for 7 different processor architectures and so would need 7 different assembler solutions (as assembly code is generally dedicated to the processor) to be developed and maintained. Also, if it is used on further architectures in the future, it will need the same additional work.
    
    In some cases I build the file with -O1, which solves it, but that is a nuisance, since it has to be set up for the file in every IDE and also for every project it is used in, and if it is forgotten someone can lose a day’s work trying to identify the same crash again.
    
    Pragmas are also a big nuisance, since they are typically not standardised and so again need to be maintained as different IDEs and different versions are used: again a very sub-optimal solution in practice, even if purists may be of the opinion that they would be cleanest.
    
    Optimising individual routines differently (again using pragmas) works (with the pragma hassle, of course) but reduces the performance of the complete subroutine.
    
    In this practical case the volatile variable definition causes only 4 additional assembler instructions, and the instructions then represent the C code exactly (two byte reads are made and the two bytes concatenated), so it doesn’t actually represent any loss in performance over the designer’s intention.
    
    I have also had this problem in code like this:
    unsigned long ulBlockLengthInBytes = ((ptrCapacity->ucBlockLengthInBytes[0] << 24) | (ptrCapacity->ucBlockLengthInBytes[1] << 16) | (ptrCapacity->ucBlockLengthInBytes[2] << 8) | ptrCapacity->ucBlockLengthInBytes[3]);
    
    where with high optimisation the newer GCC versions optimise this to (equivalently)
    unsigned long ulBlockLengthInBytes = *(unsigned long *)(ptrCapacity->ucBlockLengthInBytes); (as it recognises that it is just reading 4 bytes and shifting their bits into a long word position) and again hard faults on any architecture that can’t access the pointer if it happens not to be long-word aligned. I solved it by setting -O1 in the projects that are using it, but I am looking for a better solution due to the maintenance overhead (and the risk of projects initially crashing when it is forgotten) when the code is used as part of a library shared between many projects.
    
    I have also noted that mbedTLS-based projects (for AWS IoT) built with -Os and newer GCC versions can crash during TLS handshakes, and so I have had to drop these down to -O1 until the causes have been analysed. This shows that even if new optimisation techniques may not be wrong, they can cause widely distributed libraries to have new problems.
    
    Since I have found that “volatiling” affected variables is a simple and effective solution to these difficulties up to now (since it effectively de-optimises just the variable access in a very compatible way), it is a shame that it is frowned upon, especially as all other solutions that I have looked into have less than ideal characteristics, since they have little standardisation and rely on specialities of assembly languages or individual tool chains. In the case of volatile, I have analysed the assembly code with and without it, so I understand how it affects the instruction generation and how it avoids the issue (as far as I can see in a 100% guaranteed way that will not break with time). So I am still toying with finding other workarounds, or accepting the volatile use when accompanied by an analysis of why it may be a decent method in each use case (????????).
    
    Regards
    
    Mark
    
    - Erich Styger on November 27, 2021 at 07:32 said:
      
      Hi Mark,
      thanks for all the details. Strictly speaking, in your case you are using volatile to band-aid or cover a compiler/code generation problem with hardware accesses.
      
      I have used volatile in the past as a workaround for wrong compiler register optimization until this was fixed by the compiler. But this was clearly a compiler bug (happening even without optimizations), so volatile was the band-aid because rewriting that particular code sequence was not possible. Later on the volatile has been removed as the compiler had fixed that bug.
      
      I think your use of volatile for ulBlockLengthInBytes falls into the same category.
      
      I feel it is not the right approach if it is about a specific memory layout (packed), because there the code really needs to be tied to the hardware, and the programming language has no way to properly deal with such things; hence my thinking that in such a case the real solution would be to use assembly. After all, who knows whether a newer version of the compiler might reorder things in a different way? If the access and the order of accesses need to be guaranteed, then I think assembly is the only way. And yes, it is painful.
      
      As for mbedTLS: I have faced similar issues in different projects. I did not dig too deeply into it, but here I think there are code bugs involved (maybe even reentrancy issues?), and I feel the compiler optimization levels simply uncover those bugs.
      
      Back to the original question/problem: hats off that you checked the generated overhead with volatile! I feel that using volatile for it is not the ‘correct’ solution. But as an engineer sometimes you have to cut corners, and it is not only about getting it ‘right’, but as well ‘getting it done and working’. So in an ‘academic’ sense, volatile would not be justified. But in an ‘engineering’ sense I feel you have justified the usage of it in that case.
      
      I hope this helps, and keep up your outstanding work with the uTasker project!
      
    - Mark Butcher on November 29, 2021 at 02:30 said:
      
      Hi Erich
      
      After much deliberation I have done this:
      
      unsigned short usEndpointLength = fnSafeGetBufLittleShort(ptrEndpointDesc->wMaxPacketSize);
      
      where
      static unsigned short fnSafeGetBufLittleShort(const unsigned char *ptrBuf)
      {
          volatile register unsigned short usValue = *ptrBuf++; // first byte: least significant
          usValue |= (*ptrBuf << 8);                            // second byte: most significant
          return usValue;
      }
      
      so that I have a 'special' routine which I can use to control this.
      
      I have also gone for the volatile workaround, noting the following:
      
      A. If volatile is not used, the new routine (in-lined anyway at high optimisation levels) becomes the following assembler code:
      ldrh r2, [r4, #4]
      which crashes on M7 processors if the buffer location is not aligned. It is a single instruction because the optimiser realises that the operation can be performed by a short-word read (on a little-endian processor), since the byte ordering already puts the bytes in the correct place.
      
      B. If volatile "is" used the assembler code is (still in-lined)
      ldrb r0, [r4, #4]
      ldrb r1, [r4, #5]
      orr.w r0, r0, r1, lsl #8
      
      and so uses byte accesses and is still efficient.
      
      C. I will comment the routine, explaining the background and the use case, in case anyone is against its use; they can then remove the volatile keyword and edit the routine (according to their IDE/compiler) to remove optimisation from it.
      
      D. I have the following argument as to why I consider that, in this case, the volatile key word is in fact suitable:
      D.1 Consider a driver case where a watchdog (accessible only as bytes – other accesses are not allowed) needs to be written with an exact sequence:
      WDOG_R1 = 0x55; // address is 0x40020001
      WDOG_R2 = 0xAA; // address is 0x40020002
      and the programmer does 'not' use volatile to define the registers.
      With optimisation, the compiler decides that two byte writes are a waste of time and writes 0xAA55 as a short word instead. I am sure most people have experienced such HW requirements, and the failure that results when such C code is built with an optimising compiler.
      D.2 In this case every experienced engineer and academic will say that 'of course' the volatile keyword should be used, so that the compiler doesn't change the ordering or make assumptions about the register contents or behavior. Setting the keyword then results in the compiler writing 0x55 as a byte, followed by writing 0xAA as a byte, to the correct registers, in the same order as the C code writes them. It works and everyone is happy.
      D.3 Now I compare this case with my code (the routine that I want to operate exactly as I have written it), and I find that, apart from the fact that it reads instead of writes, it is almost identical: it wants to read two bytes in a defined order. The big difference is that this is not necessarily 'driver' code that is accessing registers; instead it is 'general purpose' code that is accessing memory (although the pointer could also point to memory-mapped register space).
      D.4 Now I ask myself: if the two pieces of code are essentially the same (wanting two byte accesses in the written order), and the theory and idea of the volatile keyword ensure that the generated accesses and their order are respected, why is it that everybody would be happy with its use in the "watchdog" case (as it is defined as "driver code") but may be against it in the other case (because it is not "driver code")?
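The watchdog case in D.1 and D.2 can be sketched with byte-wide `volatile` register definitions. This is a hypothetical sketch: the addresses are the ones from the example, and a small simulated register array (`sim_wdog`, an assumption for illustration) stands in for the peripheral so the code also runs on a host. Because `WDOG_R1` and `WDOG_R2` are distinct volatile objects, the compiler has to perform two byte stores, in source order, rather than one 16-bit store of 0xAA55.

```c
#include <stdint.h>

/* Hypothetical register definitions. On the target these would point at
 * the absolute addresses, for example:
 *   #define WDOG_R1 (*(volatile uint8_t *)0x40020001UL)
 *   #define WDOG_R2 (*(volatile uint8_t *)0x40020002UL)
 * For a host build, a simulated register block stands in for the hardware. */
static volatile uint8_t sim_wdog[4];
#define WDOG_R1 (sim_wdog[1])   /* 0x40020001 on the target */
#define WDOG_R2 (sim_wdog[2])   /* 0x40020002 on the target */

/* Writes the two-byte sequence. Each assignment is a volatile access, so
 * both byte stores are performed, and in this order. */
static void fnWatchdogSequence(void)
{
    WDOG_R1 = 0x55;
    WDOG_R2 = 0xAA;
}
```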
      
      After much deliberation I haven't been able to identify any logic that requires the two cases to be treated differently, and I am now confident that the volatile use is justified, good and even correct. As noted, if any library user is of a different opinion they can easily adapt it to whatever solution they are more comfortable with, but at the end of the day I don't expect that doing it differently will result in better code, long-term reliability, efficiency or portability.
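For users who prefer to drop the volatile keyword and instead remove optimisation from just this routine, one toolchain-specific way is a function attribute. The sketch below is an assumption-laden example rather than part of the library: the macro `NO_OPT_ROUTINE` and the function name are hypothetical, `optimize("O0")` is GCC's function attribute and `optnone` is Clang's equivalent, and `noinline` keeps an optimised caller from absorbing the body. The GCC manual describes the `optimize` attribute as intended for debugging and not suitable for production code.

```c
/* Hypothetical macro selecting a per-function "no optimisation" attribute. */
#if defined(__clang__)
#define NO_OPT_ROUTINE __attribute__((noinline, optnone))
#elif defined(__GNUC__)
#define NO_OPT_ROUTINE __attribute__((noinline, optimize("O0")))
#else
#define NO_OPT_ROUTINE /* other toolchains: use their own pragma or attribute */
#endif

/* Reads a little-endian 16-bit value from a buffer of any alignment.
 * When the attribute applies, the routine is compiled without optimisation,
 * so the two byte loads are expected to stay separate. */
static NO_OPT_ROUTINE unsigned short fnGetBufLittleShortNoOpt(const unsigned char *ptrBuf)
{
    unsigned short usValue = ptrBuf[0];          /* least significant byte */
    usValue |= (unsigned short)(ptrBuf[1] << 8); /* most significant byte */
    return usValue;
}
```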
      
      Regards
      
      Mark
      
    - Erich Styger on December 1, 2021 at 11:53 said:
      
      Hi Mark,
      hats off for all the details provided, thank you. I too believe that in this case using ‘volatile’ does what is necessary, even if it might not be intended for this. The example with the watchdog is something I have seen (and used) too. But here again volatile is not a synchronization point; it just ‘makes it the way I wish it should be’, and only as a side effect. So from an ‘academic’ point of view it is not appropriate, but from a practical one it is indeed appropriate (or better: it does the job).
      
      Erich
      
    - Mark Butcher on December 8, 2021 at 00:23 said:
      
      Hi Erich
      
      Try this one!
      
      Code:
      
      VECTOR_TABLE *ptrVect = (VECTOR_TABLE *)RAM_START_ADDRESS_ITC;
      VECTOR_TABLE_OFFSET_REG = (unsigned long)RAM_START_ADDRESS_ITC; // position the vector table at the bottom of instruction RAM
      ptrVect->ptrNMI = irq_NMI;
      ptrVect->ptrHardFault = irq_hard_fault;
      ptrVect->ptrMemManagement = irq_memory_man;
      ptrVect->ptrBusFault = irq_bus_fault;
      ptrVect->ptrUsageFault = irq_usage_fault;
      ptrVect->ptrDebugMonitor = irq_debug_monitor;
      ptrVect->ptrSysTick = irq_default;
      
      VECTOR_TABLE_OFFSET_REG is VTOR in the Cortex-M7, and RAM_START_ADDRESS_ITC is the start of RAM, which happens to be at address 0x00000000 in this processor.
      
      This was with GCC, with optimisation enabled (maybe even without any optimisation level?); I actually had the problem 2 years ago with a new GCC release and don’t remember the details any more.
      
      The code runs, but subsequent interrupt operation fails since the vectors ptrVect->ptrNMI, etc. are not filled out by the above code.
      
      If I do this:
      VECTOR_TABLE_OFFSET_REG = (unsigned long)RAM_START_ADDRESS_ITC; // position the vector table at the bottom of instruction RAM
      VECTOR_TABLE *ptrVect = (VECTOR_TABLE *)VECTOR_TABLE_OFFSET_REG; // read the table base back from the register
      
      It fails too if VECTOR_TABLE_OFFSET_REG is “not” volatile.
      
      If VECTOR_TABLE_OFFSET_REG “is” volatile it works.
      
      Another detail: If RAM_START_ADDRESS_ITC is not at the address 0x00000000 but, say, 0x00000400 it works in all three versions.
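The workaround in the second variant (write the base to VTOR, then read the pointer back from the volatile register) can be sketched as follows. Everything here is hypothetical: `VECTOR_TABLE_SKETCH` only mirrors the field names used above and follows the ARMv7-M exception order, and a simulated `sim_vtor` variable replaces the real VTOR (at 0xE000ED08 on Cortex-M) so the sketch runs on a host. Reading the base back through a volatile object means its value is not known at compile time, so the compiler cannot treat it as a constant such as 0 when it generates the stores.

```c
#include <stdint.h>

typedef void (*ISR_FN)(void);

/* Hypothetical layout following the ARMv7-M exception order; the offsets
 * in the comments assume 32-bit pointers as on a Cortex-M7. */
typedef struct {
    void  *ptrInitialStack;   /* 0x00 */
    ISR_FN ptrReset;          /* 0x04 */
    ISR_FN ptrNMI;            /* 0x08 */
    ISR_FN ptrHardFault;      /* 0x0C */
    ISR_FN ptrMemManagement;  /* 0x10 */
    ISR_FN ptrBusFault;       /* 0x14 */
    ISR_FN ptrUsageFault;     /* 0x18 */
    ISR_FN ptrReserved1[4];   /* 0x1C..0x28 */
    ISR_FN ptrSVCall;         /* 0x2C */
    ISR_FN ptrDebugMonitor;   /* 0x30 */
    ISR_FN ptrReserved2;      /* 0x34 */
    ISR_FN ptrPendSV;         /* 0x38 */
    ISR_FN ptrSysTick;        /* 0x3C */
} VECTOR_TABLE_SKETCH;

/* Stand-in for VTOR; on the target: (*(volatile uint32_t *)0xE000ED08UL) */
static volatile uintptr_t sim_vtor;

static void irq_NMI(void) {}
static void irq_hard_fault(void) {}
static void irq_default(void) {}

/* Sets the table base, then takes the table pointer from the volatile
 * register instead of from the constant used to set it. */
static void fnInstallVectors(uintptr_t ulTableBase)
{
    sim_vtor = ulTableBase;
    VECTOR_TABLE_SKETCH *ptrVect = (VECTOR_TABLE_SKETCH *)sim_vtor;
    ptrVect->ptrNMI = irq_NMI;
    ptrVect->ptrHardFault = irq_hard_fault;
    ptrVect->ptrSysTick = irq_default;
}
```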
      
      I used volatile to solve it but is it the right way????
      
      Happy puzzlin’
      
      Regards
      
      Mark
      
    - Erich Styger on December 26, 2021 at 07:40 said:
      
      Hi Mark,
      I tried to reproduce it (GNU ARM Embedded 2021.07), and here things are correct. So I think this really was a compiler bug. And to me, using volatile to work around a compiler bug is good usage of it, and I have used it that way in the past too. It is interesting that it fails in your case only with the value of zero: could it be wrong constant propagation of registers in the compiler? But I did not find any specific entry in the forums/release notes about this one.
      
    - Mark Butcher on December 26, 2021 at 15:17 said:
      
      Erich
      My take on that (it started when I ported the project to the i.MX RT, since its ITC is at 0x00000000, where the interrupt vector table is best located) is that the compiler rejected the use of a zero pointer.
      In fact it was completely removing “all” code that used the zero pointer without any error or warning being generated.
      I found that if I set the pointer to any non-zero value the code was present, so my reasoning was that if the compiler couldn’t be sure that the value was zero (because it was read back as 0 from a “volatile” register that had been set to the same required value), it couldn’t presume the value and so needed to keep the code, which it then does.
      I had never seen a compiler remove the use of a zero pointer before. Since a NULL pointer may be considered invalid, I didn’t think of it as a compiler error but rather as the compiler purposely restricting the use of a NULL pointer (though the fact that it could remove large chunks of code without even issuing a warning was more like an error).
      It happened over two years ago with the compiler version integrated in MCUXpresso at -Os, and I haven’t tested how the latest one handles it; as we know, these details can change with each version. In any case the workaround avoids the potential issue and has no side effects such as a code size increase.
      Regards
      Mark
      
    - Erich Styger on December 26, 2021 at 15:53 said:
      
      Hi Mark,
      Yes, I had the same thinking that the compiler somehow tries to ‘optimize’ NULL pointer accesses.
      But the compiler cannot make any assumption about what the NULL pointer value is. It is a macro with just a special value indicating that it is ‘invalid’. Most systems use zero for it, but it could be -1; it could be anything. I have used other architectures which do have valid data at address zero, never seen a problem with it, and never had to use volatile for it. So I am not sure what was triggering this, but it definitely looks like a compiler bug to me.
      
Pingback: Spilling the Beans: storage class and linkage in C, including static locals | MCU on Eclipse
Dambo on January 17, 2022 at 01:07 said:

From previous post (there is no reply button):
„ ‘Global variables’ mean objects with a static (not dynamic address), their linkage (static or external) does not matter in this context.”
So what about a „local static variable”? Is this also a global?

- Erich Styger on January 17, 2022 at 05:42 said:
  
  Yes, static local variables are treated like global variables. They have static addresses and are initialized in the startup code, because they have static storage duration (strictly speaking, a static local variable has no linkage at all). The only difference is that their scope (visibility) is limited to the function in which they are defined.
  I hope this helps?
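The behaviour described in the reply can be seen in a short sketch (the function name is hypothetical). In standard C terms the counter has static storage duration, block scope and no linkage: it is initialised once, before `main()` runs (on a typical embedded target by the startup code), keeps its value between calls, and is visible only inside the function.

```c
/* Returns how many times it has been called. uiCalls lives at a fixed
 * address like a global (static storage duration) but is only visible
 * inside this function (block scope, no linkage). */
static unsigned int fnCountCalls(void)
{
    static unsigned int uiCalls = 0; /* initialised once, not on every call */
    return ++uiCalls;
}
```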
  