It is always good to have a close look at what ends up in a microcontroller's FLASH memory, for example by using the EHEP Eclipse plugin to inspect the binary file:
Obviously it has path and source file information in it. Why is that? And is this really needed?
What about:
- Privacy: the path or file name might expose information (a secret project name?) or might be used for reverse engineering.
- Size: the strings add to the final data/FLASH size, which increases the need for ROM space.
So let’s have a look at the reason for this and how it can be avoided or at least reduced.
Outline
This article covers why file names and path information can be present in a binary. It goes through how assert() checks are used, how they can be enabled or disabled, and how the information about the files can be avoided, changed or removed to address privacy or code size concerns.
assert()
The reason for the file name strings is the use of asserts, for example like this:
void McuLED_GetDefaultConfig(McuLED_Config_t *config) {
  assert(config!=NULL);
  memcpy(config, &defaultConfig, sizeof(*config));
}
Asserts are used to verify a condition in order to catch error cases (a NULL pointer in the case above). The assert checks whether the condition is true. If it is false, it triggers an error handler.
The assert is typically a macro. A typical library implementation is like below, found in assert.h:
#ifdef NDEBUG           /* required by ANSI standard */
# define assert(__e) ((void)0)
#else
# define assert(__e) ((__e) ? (void)0 : __assert_func (__FILE__, __LINE__, \
                                                       __ASSERT_FUNC, #__e))
#endif
The macro (if turned on) uses the __FILE__ macro/preprocessor symbol, which the compiler fills in with the file name. It is used together with the __LINE__ preprocessor symbol to write an error message indicating the file name and line number where the assertion failed.
💡 Whether __FILE__ gets resolved to just the file name or to the file name with the path depends on the implementation of the compiler, more about this later.
To what extent the file name and path are useful for reverse engineering of course depends on the case, but at the very least they might expose some information you do not want to share.
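To make the mechanism concrete, here is a minimal sketch of a check macro with the same shape as the library assert(). The names CHECK, check_failed and last_failure are made up for this illustration and are not part of any library. Instead of printing, the failure handler just stores what the macro passes to it:

```c
#include <stddef.h>

/* Hypothetical record of the most recent failed check (illustration only). */
struct check_failure {
  const char *file;  /* whatever the compiler substituted for __FILE__ */
  int line;          /* value of __LINE__ at the check */
  const char *expr;  /* text of the checked expression, produced by #e */
};

static struct check_failure last_failure;

/* Called only when a check fails: stores where and what, then returns 0. */
static int check_failed(const char *file, int line, const char *expr) {
  last_failure.file = file;
  last_failure.line = line;
  last_failure.expr = expr;
  return 0;
}

/* Same shape as the library assert(): the file name string ends up in the
 * binary because the macro names __FILE__ at every place it is used. */
#define CHECK(e) ((e) ? 1 : check_failed(__FILE__, __LINE__, #e))
```

Every source file that uses such a macro contributes its own __FILE__ string and one expression string per check, which is where the extra constant data comes from.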
Turning Asserts Off
As we can see, we can completely turn off the assert functionality by defining NDEBUG. This is usually defined for a Release build (see Debug vs. Release?). So one solution is obviously to change the usual DEBUG define to NDEBUG, or just to add NDEBUG to the list of defines like below:
__FILE__
The advantage of defining NDEBUG is of course that these file names are removed, because the asserts no longer generate them. As a side effect, this can reduce code size too, depending on how many asserts are present in the code (see Tutorial: How to Optimize Code and RAM Size).
But the safety checks provided by the asserts are gone too. So what can I do to keep the asserts without exposing the full path through the __FILE__ macro?
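One more consequence of NDEBUG is worth seeing in code: with NDEBUG defined, the expression inside assert() is not evaluated at all, so it must not have side effects the program depends on. The sketch below relies on the fact that <assert.h> may be included more than once and re-evaluates NDEBUG each time. The two function names are made up for the illustration:

```c
#undef NDEBUG
#include <assert.h>   /* NDEBUG not defined: assert() is active */

/* With active asserts the expression is evaluated, so n becomes 1. */
static int evaluated_with_asserts(void) {
  int n = 0;
  assert(++n == 1);
  return n;
}

#define NDEBUG
#include <assert.h>   /* re-including <assert.h> re-evaluates NDEBUG */

/* With NDEBUG, assert(e) expands to ((void)0): ++n never runs, n stays 0. */
static int evaluated_without_asserts(void) {
  int n = 0;
  assert(++n == 1);
  return n;
}

#undef NDEBUG
#include <assert.h>   /* switch asserts back on for the rest of the file */
```

In a real project NDEBUG would normally come from the build settings (-DNDEBUG) rather than from the source file. It is toggled inside the file here only to show both behaviours side by side.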
The thing is that the C/C++ standards do not specify whether __FILE__ includes the full path or not. Some compilers implement dedicated options to configure exactly that. GNU gcc basically uses what I pass on the command line: if I compile a file with its full path, this is what __FILE__ resolves to. In the example below the full path is used:
What gets passed to the compiler as file name depends on your build environment, e.g. how you call the compiler in the make file.
Eclipse for example usually uses a relative path to the file:
💡 In Eclipse CDT the relative paths are relative to the ‘output’ folder which is where the binaries and object files are stored. Usually this is the ‘Debug’ or ‘Release’ folder in your project.
Again it is up to the compiler what is then used for __FILE__. For the above file compiled with “../source/main.c” the result is interesting:
So obviously the compiler has shortened the path to something I did not expect: “.main.c”
Compiling it with an absolute path shows that __FILE__ is absolute too:
__BASE_FILE__
When I first saw that there is a __BASE_FILE__ macro in gcc, I thought this could be the solution and just have the file name without path. But this is not the case:
Confirmed by the documentation on https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html
__BASE_FILE__
This macro expands to the name of the main input file, in the form of a C string constant. This is the source file that was specified on the command line of the preprocessor or C compiler.
So to me it is the same as __FILE__. Well, not really: if used in an included (header) file, it reports the including file and not the included file (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42579#c3). So this one might not be helpful, but let’s see :-).
Linux
Because paths are handled differently on Windows and Linux, I created a simple test on a Raspberry Pi (using gcc for that ARM core):
#include <stdio.h>

int main(void) {
  printf("__FILE__ is '%s'\n", __FILE__);
  return 0;
}
Compiled with
gcc main.c
it gives
__FILE__ is 'main.c'
which is expected.
Compiled with
gcc ../0_test/main.c
it gives
__FILE__ is '../0_test/main.c'
which is different from what I had on Windows.
Finally compiled it with
gcc /home/pi/aembs/0_test/main.c
it gives
__FILE__ is '/home/pi/aembs/0_test/main.c'
To me things are a bit more consistent on Linux. It seems that GNU gcc for ARM has an issue with relative paths on Windows and somehow truncates them to a single dot (.).
Absolute Paths
The first thing would be to get rid of the path or at least make it relative. This depends on the build environment used. In Eclipse, look at the build console output to see what kind of path is used for the file:
If using Linked Files or Folders, they get expanded to an absolute path which is passed to the build tools:
Unfortunately, depending on how the project is organized, this cannot be easily changed. The project itself is still portable because the link is stored as a relative path, but an absolute path gets passed to the compiler.
-ffile-prefix-map Option
The GNU compiler has a nice option (see https://gcc.gnu.org/onlinedocs/gcc/Preprocessor-Options.html) which can be used to cut off paths in the preprocessor:
When compiling files residing in directory old, record any references to them in the result of the compilation as if the files resided in directory new instead. Specifying this option is equivalent to specifying all the individual -f*-prefix-map options. This can be used to make reproducible builds that are location independent. See also -fmacro-prefix-map and -fdebug-prefix-map.
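Say I want to cut off “c:/tmp/” from __FILE__. Because only the macro expansion matters here, the narrower -fmacro-prefix-map option (which affects only the __FILE__ and __BASE_FILE__ macros) is sufficient: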
-fmacro-prefix-map=c:/tmp/=
That way a “c:/tmp/main.c” gets mapped just to “main.c”:
If a path has spaces, use a double quoted path, e.g.
-fmacro-prefix-map="c:/path with spaces/"="new path"
Of course this means I have to do this for multiple directories depending on my application structure, but at least that way I can keep the strings short and still useful.
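As a small sketch, the Raspberry Pi test from above can be reused to observe the option. The helper names this_file and print_this_file are made up, and the mapped path in the comment is what I would expect from the documented prefix replacement, assuming a gcc version that supports the option:

```c
#include <stdio.h>

/*
 * Build variants to compare (paths taken from the Linux test above):
 *
 *   gcc /home/pi/aembs/0_test/main.c
 *       -> __FILE__ is '/home/pi/aembs/0_test/main.c'
 *
 *   gcc -fmacro-prefix-map=/home/pi/aembs/= /home/pi/aembs/0_test/main.c
 *       -> expected: __FILE__ is '0_test/main.c'
 */

/* Returns the string the compiler substituted for __FILE__ in this file. */
const char *this_file(void) {
  return __FILE__;
}

/* Prints it so that different compiler invocations can be compared. */
void print_this_file(void) {
  printf("__FILE__ is '%s'\n", this_file());
}
```

Calling print_this_file() from main() and rebuilding with different map options is a quick way to check each mapping before adding it to the project settings.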
For that example shown at the beginning of the article I can easily make that path shorter and get the first part of the absolute path removed:
So with this I can shorten the path/file name as much as I want. In addition to that, what could be done with the __FILE__ macro to have it represent just the file name?
__FILE__ without path: __FILENAME__
With the __FILE__ being the problem, why not create a new one (__FILENAME__) which only contains the file name and not path?
That’s actually not that hard to do:
#define __FILENAME__ (strrchr("/"__FILE__, '/') + 1)
In the above macro, strrchr() is from <string.h>. It locates the last occurrence of a character (in this case ‘/’) in a string (in this case __FILE__), and the +1 points past it. A bit of trickery is the implicit string concatenation with the “/” prefix: that way the string always has a ‘/’ present.
But wait! strrchr() is a function call: first, it is not efficient, and second, with __FILE__ the string including the path will still be in my binary :-(. And yes, indeed, looking at the assembly code confirms this:
#define __FILENAME__ (strrchr("/"__FILE__, '/') + 1)
const char *fileName;
...
fileName = __FILENAME__;
gives:
Definitely not good if there is such a call for each __FILE__ usage.
Actually there is help with gcc built-in functions (see GNU gcc printf() and BuiltIn Optimizations and list of gcc built-in functions).
First, make sure that you don’t have a -fno-builtin in your project settings, so remove that option:
With that option removed or not present, the gcc compiler can optimize, replace or inline standard library functions. Because just removing the option might not be enough, I do call the built-in function directly:
#define __FILENAME__ (__builtin_strrchr("/"__FILE__, '/') + 1)
💡 Note that strrchr() returns a pointer into the __FILE__ string.
With this there is no extra call, and the code just uses the pointer to the constant string “main.c”, which is at address 0x2f7c below:
So with this I have a __FILENAME__ macro which, unlike the normal __FILE__, contains only the file name and no path.
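For comparison, here is a sketch of a plain run-time helper that also copes with Windows-style backslashes, which the ‘/’-only macro does not handle. The names path_basename and FILENAME_ONLY are made up for this example. The #if branch assumes a compiler that predefines __FILE_NAME__ (Clang, and gcc from version 12 according to the comments below) and otherwise falls back to the helper:

```c
#include <string.h>

/* Hypothetical helper: returns the part of 'path' after the last '/' or '\\'.
 * If there is no separator, the whole string is returned. */
const char *path_basename(const char *path) {
  const char *slash = strrchr(path, '/');
  const char *backslash = strrchr(path, '\\');
  const char *last = slash;

  if (backslash != NULL && (last == NULL || backslash > last)) {
    last = backslash;  /* the backslash is the later separator */
  }
  return (last != NULL) ? last + 1 : path;
}

/* Prefer the compiler-provided file name if it exists. */
#if defined(__FILE_NAME__)
  #define FILENAME_ONLY __FILE_NAME__
#else
  #define FILENAME_ONLY path_basename(__FILE__)
#endif
```

The fallback is an ordinary function call on __FILE__, so the full path string is still stored in the binary. It only changes what gets reported, not what gets stored.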
Using __FILENAME__ in assert()?
Instead of using __FILE__ I can use __FILENAME__ and I should be fine. That’s OK for my own code which I can change from using __FILE__ to __FILENAME__.
Unfortunately this does not work for the assert() macro which is inside <assert.h>: I can disable assert() with NDEBUG, but I cannot easily overwrite it with my own define. I could define my own __assert_func(), but it is the assert() macro itself which uses __FILE__, as shown below:
#ifdef NDEBUG           /* required by ANSI standard */
# define assert(__e) ((void)0)
#else
# define assert(__e) ((__e) ? (void)0 : __assert_func (__FILE__, __LINE__, \
                                                       __ASSERT_FUNC, #__e))
#endif
Well, I could change the <assert.h> library header file or recompile the GNU standard library. But this is not easy, and I would rather keep the library as it is.
Redefining __FILE__
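If the __FILE__ macro is not what I want, why not change that macro instead?
#define __FILE__ (__builtin_strrchr("/"__FILE__, '/') + 1)
A macro is not expanded again inside its own definition, so the inner __FILE__ would be left unresolved. To get rid of this recursion, I can rewrite it with __BASE_FILE__ (keeping in mind that inside a header file this reports the including source file) as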
#define __FILE__ (__builtin_strrchr("/"__BASE_FILE__, '/') + 1)
This of course raises a gcc warning which I can suppress with
-Wno-builtin-macro-redefined
To have this define present for every file I compile in the project, I use the -include option with the following file:
/*
 * __FILE__def.h
 *
 * Copyright (c) 2020: Erich Styger
 * License: SPDX-License-Identifier: BSD-3-Clause
 */
#ifndef FILE__DEF_H_
#define FILE__DEF_H_

/* Redefine the __FILE__ macro so it contains just the file name and no path
 * Add -Wno-builtin-macro-redefined to the compiler options to suppress the warning about this.
 */
#define __FILE__ (__builtin_strrchr("/"__BASE_FILE__, '/') + 1)

#endif /* FILE__DEF_H_ */
To include it, I use the -include option:
With this, no warnings, I still have the asserts in place, the file names are without path information plus I save FLASH space 🙂
Assert callback
The last thing is about what should happen in case the assertion triggers. By default the library will print an error message like this using printf():
What I recommend is overwriting the callbacks with custom routines: this not only avoids using the printf() bloat, it gives you the ability to do custom actions (blink an LED) or log the error. Below is what I usually use with the McuLog library:
/* overwrite assertion callback */
#include "McuLog.h"

void __assertion_failed(char *_Expr) {
  McuLog_fatal("%s", _Expr); /* pass the expression as an argument, not as the format string */
  McuLog_fatal("Assert failed!");
  __asm volatile("bkpt #0"); /* stop in the debugger */
  for(;;) {
    __asm("nop");
  }
}

void __assert_func(const char *file, int line, const char *func, const char *expr) {
  McuLog_fatal("%s:%d %s() %s", file, line, func, expr);
  McuLog_fatal("Assert failed!");
  __asm volatile("bkpt #0"); /* stop in the debugger */
  for(;;) {
    __asm("nop");
  }
}
Simply add the two above functions to the code as the base implementation in the GNU library is marked as ‘weak’ and can be easily overwritten.
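If no logging library is available, the same idea can be sketched with a small static buffer. The functions assert_record, assert_last_message and assert_fail_count below are made-up names for illustration and not part of McuLog or the GNU library. On the target, __assert_func() would call assert_record() and then halt as in the code above:

```c
#include <stdio.h>

/* Hypothetical storage for the most recent assertion failure. */
static char last_assert_msg[96];
static unsigned int fail_count;

/* Formats the failure into the buffer instead of printing it.
 * snprintf() truncates safely if the message does not fit. */
void assert_record(const char *file, int line, const char *func, const char *expr) {
  snprintf(last_assert_msg, sizeof(last_assert_msg), "%s:%d %s() %s",
           file, line, func, expr);
  fail_count++;
}

/* Accessors, e.g. for a debugger watch or a later crash report. */
const char *assert_last_message(void) {
  return last_assert_msg;
}

unsigned int assert_fail_count(void) {
  return fail_count;
}
```

Keeping the message in RAM means it can be inspected with the debugger after the breakpoint hits. Note that snprintf() can itself be large on small targets, so this is a trade-off rather than a strict improvement over a dedicated logger.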
Summary
Using assert() in the source code is a good way to catch errors early. The assert() checks a condition, and if it fails, the default implementation reports the source file name (__FILE__) and line number (__LINE__). That way the path and source file name get added as constant strings to the binary, which can be a concern because of privacy, code size, or both. What exactly __FILE__ represents depends on the compiler and on how the file gets passed to the compiler. The asserts can be turned off with the NDEBUG macro. In case asserts shall still be checked in a release binary, the assert handler can be overridden and modified to do whatever you want.
With the help of this article you can now turn asserts on and off, limit or replace the path information for the files, get the file name without the path from the __FILE__ preprocessor macro, and use custom assert() hooks. Congratulations!
Happy asserting 🙂
Hi, some questions please.
1.- How can I see the disassembly of the binary in MCUXpresso?
2.- Do you know if this is possible to see in all IDE based on Eclipse as in ST for STM32?
3.- Can it be seen in any binary or only in the full projects with sources developed in MCUXpresso?
4.- Is it possible to modify it or is it only displayed?
Regards
1. While Debugging: Window > Show View > Disassembly. Or: see the Disassemble menu in https://mcuoneclipse.com/2018/07/08/creating-disassembly-listings-with-gnu-tools-and-eclipse/
2. Yes, see https://mcuoneclipse.com/2018/07/08/creating-disassembly-listings-with-gnu-tools-and-eclipse/ with the exception of the Disassemble menu which is a great feature of the NXP MCUXpresso IDE only. Other IDEs might have similar things too.
3. In any binary
4. Depends. You can change the assembly code in the debugger (if in RAM), or you can create the listing as in https://mcuoneclipse.com/2018/07/08/creating-disassembly-listings-with-gnu-tools-and-eclipse/ and then run it through the compiler/assembler again. You even can create C source code out of it again (see https://mcuoneclipse.com/2019/05/26/reverse-engineering-of-a-not-so-secure-iot-device/).
Clang has an additional macro called `__FILE_NAME__` with the value of the last path component of `__FILE__`. Documented here: https://clang.llvm.org/docs/LanguageExtensions.html#builtin-macros
Yes, I saw that when I was looking for a solution. There were several gcc pull requests to have that implemented too, but they did not make it, which is too bad.
Others have created patches for a compiler option to have __FILE__ without path (as it is for example in the IAR compiler), but this did not make it into the GNU ARM one either.
GCC 12 will be adding support for __FILE_NAME__.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42579
Thanks for the heads up!
Hi Erich,
This partially worked for me. The call to strrchr does get optimized out, and just the relevant portion of the filename gets used, however, the full path is still included in the binary. I tried a bunch of different optimization levels and options, but couldn’t find anything that eliminated the full path from the binary. Do you have any suggestions?
These are my options:
-Og -mcmodel=medium -g3 -Wall -mcpu=n25f -ffunction-sections -fdata-sections -c -fmessage-length=0 -fomit-frame-pointer -fno-strict-aliasing -Werror -fstrict-volatile-bitfields
I’m using the AndeSight IDE (eclipse-based) with an Andes gcc compiler, so maybe things are just too different.
Thanks,
Ben
Hi Ben,
I did not know about this AndeSight IDE. I checked their web site and it seems to me that they are not using the standard ARM/GNU compiler, so the compiler/linker might not be doing the same thing. Your options look fine. Did you check/verify from where that absolute path is used? Maybe you are using something different or assert is still using the full path?
Hi Erich,
Thanks for your response. I’ve tried to eliminate variables to solve this issue so I’m not working with “assert” yet. I just have
#define __FILE_NAME__ (__builtin_strrchr("/"__FILE__, '/') + 1)
and then I call
printf("__FILE_NAME__ is %s\n", __FILE_NAME__);
in main(). The functionality is correct and no calls to strrchr are found in the assembly, but the full path of the file can still be found in the binary.
Thanks,
Ben
I guess you have something wrong with your includes. Place that #define directly in front of your printf and check the assembly code.
“I guess you have something wrong with your includes. Place that #define directly in front of your printf and check the assembly code.”
No luck with that either. And I just got a response from my question to Andes – they don’t think it’s possible. Must be something about their toolchain…
So this seems indeed a problem with the tool chain, which is very strange to me.
Thanks, it has been very useful to me!
Glad to hear it helped you!