One of the new features in CodeWarrior for MCU10.2 is the ability to build in parallel. Does that not sound exciting? Well, when I tried it for the first time in MCU10.2, I immediately noticed the reduction in build time: twice as fast compared to MCU10.1! Wow! This improvement is based on a make utility which can spawn multiple jobs on multicore host machines. CodeWarrior tries to use an optimized setting to make the build as fast as possible using parallel builds. The question is: is it really optimal?
So what is the difference compared to MCU10.1? I can see a different make inside the \gnu\bin folder of my CodeWarrior/Eclipse installation: it is using mingw make instead of the normal GNU make. This is reflected in the Build Command setting:
I notice the -j option passed to mingw32-make:
Checking the options of the make utility shows:
C:\Freescale\CW MCU v10.2\gnu\bin>mingw32-make.exe -help
Usage: mingw32-make.exe [options] [target] ...
Options:
  ...
  -j [N], --jobs[=N]          Allow N jobs at once; infinite jobs with no arg.
  ...
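To make the usage text concrete, here is a minimal Python sketch that assembles a make command line with a job limit. The function name `make_command` and the `all` target are my own assumptions for illustration; only the `-j` option itself comes from the help output above.

```python
def make_command(jobs=1, make="mingw32-make", targets=("all",)):
    """Return an argument list for running make with a given job limit.

    jobs=None produces a bare -j, which the usage text describes as
    'infinite jobs'. The 'all' target is an assumption, not something
    the help output prescribes.
    """
    if jobs is not None and jobs < 1:
        raise ValueError("jobs must be >= 1 or None")
    flag = "-j" if jobs is None else f"-j{jobs}"
    return [make, flag, *targets]


print(make_command(6))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting, which matters for install paths that contain spaces.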
So the -j option controls how many jobs will be created. The screenshot above shows -j6 for my dual-core machine. I can specify the number of jobs in the project settings under C/C++ Build > Behaviour:
I can observe the number of jobs created in the console output:
CodeWarrior Default Optimal Settings
In my tests I used the following three machines:
|Machine|Cores|RAM (GB)|Clock (GHz)|OS|
|---|---|---|---|---|
|T7400 (Dell M65)|2|3|2.16|Win7 32bit|
|Q8200 (Siemens)|4|8|2.33|Win7 64bit|
|i7-2720QM (Dell E6420)|8|8|2.2|Win7 64bit|
Inspecting the console output, the machines were using the following default settings for the -j option:
- 2 core machine: -j6
- 4 core machine: -j8
- 8 core machine: -j16
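The defaults listed above can be put side by side with the core counts. The following sketch only does that arithmetic on the three values I observed; it does not claim to reproduce how CodeWarrior actually picks its default, which I do not know.

```python
# Defaults observed in the console output: number of cores -> -j value.
OBSERVED_DEFAULTS = {2: 6, 4: 8, 8: 16}


def jobs_per_core(defaults):
    """Return {cores: jobs / cores} for each observed default."""
    return {cores: jobs / cores for cores, jobs in defaults.items()}


for cores, ratio in jobs_per_core(OBSERVED_DEFAULTS).items():
    print(f"{cores} cores: {ratio:g} jobs per core")
```

So the defaults are two to three times the number of cores, noticeably more than the "jobs equal to cores" rule would suggest.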
But is this really optimal? In this post by Danny Thorpe I have found the following:
“Set the number of concurrent jobs equal to or less than the number of execution cores on your system. Trying to bake 5 cakes in 4 pans at the same time is just silly.”
For this I have run some benchmarks.
Benchmarking – Clean Build
By default MCU10.2 uses the workspace settings, which are accessible from the project properties or through the Window > Preferences menu. So I changed the workspace settings to use -j1, -j2, -j3 and so on:
The first benchmark was a ‘clean’ followed by a ‘build’ for two projects: one project is an HCS08 project, the other is a ColdFire V2 project. That way I test it with two compilers:
More precisely, this does the following:
- clean the SRB and Tower project
- rebuild the make files for the SRB project
- build the SRB project
- rebuild the make files for the Tower project
- build the Tower project
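I ran these steps from the IDE, but a similar sweep can be scripted. The sketch below is an approximation under several assumptions: the make files already exist (it does not regenerate them as the IDE does), each project has `clean` and `all` targets, and the function names and project directories are hypothetical.

```python
import subprocess
import time


def time_command(cmd, cwd=None):
    """Run cmd to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True)
    return time.perf_counter() - start


def clean_build_seconds(project_dir, jobs, make="mingw32-make"):
    """Time 'clean' followed by a parallel 'all' build for one project."""
    clean = time_command([make, "clean"], cwd=project_dir)
    build = time_command([make, f"-j{jobs}", "all"], cwd=project_dir)
    return clean + build


def sweep(project_dirs, job_counts, measure=clean_build_seconds):
    """Return {jobs: total seconds over all projects} for each -j value."""
    return {
        jobs: sum(measure(d, jobs) for d in project_dirs)
        for jobs in job_counts
    }
```

A call such as `sweep([srb_dir, tower_dir], range(1, 9))` would then give one total per -j value, where `srb_dir` and `tower_dir` stand for the build output folders of the two projects. The `measure` parameter is there so the sweep logic can be tried out without a real toolchain.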
Running it on the dual-core machine with different job options gives me the following chart (time in seconds):
There is a dramatic improvement in build time (about a factor of two) with the parallel build (-j1 means the parallel build is disabled). Interestingly, it was not -j2 that gave the best result: the CodeWarrior default of -j6 was indeed the fastest. The difference between -j2 and -j6 was a measurable 6 seconds. I can observe that with -j greater than 6 there is some CPU thrashing, as the numbers increase again. It is also worth mentioning that the numbers vary by 2-3 seconds depending on the system load or other activities.
The next chart is using the 4-core machine:
Here the best result is with -j4. However, the difference is only in the 1 second range, so it is probably within the noise.
The next test is with the 8-core machine:
The minimum is at -j8, but up to -j16 the measurements showed some noise depending on the background tasks of the machine. Still, -j16 is a good default choice here. Data point 17 is with -j24 and data point 18 is with -j32: here again we see some increase in build time due to thrashing.
Benchmarking – Touch Build
Looking at the console output, it seems that only the compilation runs in parallel, but not the clean operation or the regeneration of the make files. In order to see the impact of parallel builds on the build time alone, I changed the test setup: this time I touch/change a central header file used by many source files, and then do a build with the S08 build tools.
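The touch build can be scripted in the same spirit. This is a sketch under assumptions: the header path, the project directory and the `all` target are hypothetical, and it relies on make using file modification times to decide what to rebuild.

```python
import subprocess
import time
from pathlib import Path


def touch(path):
    """Update the file's modification time so make sees it as changed."""
    Path(path).touch(exist_ok=True)


def touch_build_seconds(header, project_dir, jobs, make="mingw32-make"):
    """Touch a header, then time an incremental build with -j<jobs>."""
    touch(header)
    start = time.perf_counter()
    subprocess.run([make, f"-j{jobs}", "all"], cwd=project_dir, check=True)
    return time.perf_counter() - start
```

Because only the build itself is timed, the serial clean and make file generation steps no longer dilute the effect of the -j setting.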
First the result for the dual core machine:
Here I can observe that the results starting with -j4 are in the noise range, with a slight increase with more jobs specified. Still the default -j6 looks reasonable.
Next is the same on the 4-core machine:
Now this gets a bit more interesting, as the minimum is around -j4. That would confirm the ‘-j<number of cores>’ rule. But looking at the differences in the numbers, here again I don’t think it is statistically relevant, as it would have saved less than 1 second.
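One way to judge whether such a small difference is real is to repeat each measurement several times and compare the means against the spread. The sketch below uses a deliberately crude rule of thumb (difference of means larger than the sum of the standard deviations); it is my own heuristic for illustration, not a formal statistical test, and the function names are hypothetical.

```python
from statistics import mean, stdev


def summarize(samples):
    """Return (mean, standard deviation) of repeated build times in seconds."""
    return mean(samples), stdev(samples)


def clearly_faster(a, b):
    """True if runs 'a' beat runs 'b' by more than their combined spread."""
    mean_a, sd_a = summarize(a)
    mean_b, sd_b = summarize(b)
    return (mean_b - mean_a) > (sd_a + sd_b)
```

Each argument is a list of at least two timings for one -j setting. With a run-to-run variation of a second or more, a saving of less than 1 second would not pass this check.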
Finally the results for the 8-core machine:
At least for my test cases, the CodeWarrior default settings for parallel builds make sense. The rule of -j<number of cores> makes more sense for machines with >= 4 cores. The charts for the ‘touch builds’ on the 4-core and 8-core machines show that -j4 and -j8 are better, but it does not make much of a difference for the build time in my examples. It is possible that using -j4 and -j8 respectively would show a bigger benefit if more files were compiled. As such, the CodeWarrior defaults are OK, but they might not be optimal, depending on the machine and what else is going on in the background. So if I want to squeeze out a few seconds, I will probably go with -j<number of cores> if I have more than two cores.
An opportunity I see would be to use parallel jobs for the code generation of Processor Expert. Right now that takes a lot of time with my projects. Given the performance gain with parallel builds on my old dual-core machine, that could boost things for code generation as well. There is room for improvement, as always.
Tip: disabling parallel builds (with -j1) is an especially good idea if my project has a lot of errors. Parallel builds interleave the error messages, which can be confusing. I had better switch them off during the ‘error storm’ times.
Happy Parallel Building