Using Parallel Builds: what is optimal?


One of the new features in CodeWarrior for MCU10.2 is the ability to build in parallel. Does that not sound exciting? Well, when I tried this the first time in MCU10.2, I immediately noticed the reduction in build time: twice as fast compared to MCU10.1! Wow! This improvement is based on using a make utility which can spawn multiple jobs on multicore host machines. CodeWarrior tries to use an optimized setting to make the build as fast as possible using parallel builds. The question is: is it really optimal?

Mingw32-Make

So what is the difference compared to MCU10.1? I can see the different make inside the \gnu\bin folder of my CodeWarrior/Eclipse installation: it uses mingw32-make instead of the normal GNU make. This is reflected in the Build Command setting:

Build Command using mingw32-make

I notice the -j option passed to mingw32-make:

Build command with -j option

Checking the options of the make utility shows:

C:\Freescale\CW MCU v10.2\gnu\bin>mingw32-make.exe -help
Usage: mingw32-make.exe [options] [target] ...
Options:
...
  -j [N], --jobs[=N]          Allow N jobs at once; infinite jobs with no arg.
...

So the option -j controls how many jobs will be created. The above screenshot shows -j6 for my dual-core machine. I can specify the number of jobs in the project settings under C/C++ Build > Behaviour:
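To see the effect of -j outside the IDE, here is a minimal sketch. It uses plain GNU make on a Unix-style shell instead of mingw32-make on Windows, and the three-target Makefile is made up purely for the demo; the -j flag behaves the same way in both make flavors:

```shell
#!/bin/sh
# Minimal demo of make's -j option (the same flag CodeWarrior passes
# to mingw32-make). The Makefile below is a made-up toy example.
mkdir -p /tmp/jdemo && cd /tmp/jdemo
# \t keeps the TAB character that make requires at the start of recipe lines
printf 'all: a b c\na b c:\n\t@echo building $@\n' > Makefile
make -j3 all    # up to three jobs run at once
make -j1 all    # serial build: one job at a time
```

With -j3 the three "building" lines may appear in any order, since the jobs run concurrently; with -j1 the order is deterministic.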

Default Parallel Build Settings

I can observe the number of jobs created in the console output:

Build with -j3 creating three compiler jobs

CodeWarrior Default Optimal Settings

In my tests I used the following three machines:

CPU                     Cores  RAM (GB)  GHz   OS
T7400 (Dell M65)          2       3      2.16  Win7 32bit
Q8200 (Siemens)           4       8      2.33  Win7 64bit
i7-2720QM (Dell E6420)    8       8      2.2   Win7 64bit

Inspecting the console output, the machines were using the following default settings for the -j option:

  • 2 core machine: -j6
  • 4 core machine: -j8
  • 8 core machine: -j16

But is this really optimal? In this post by Danny Thorpe I found the following:

“Set the number of concurrent jobs equal to or less than the number of execution cores on your system.  Trying to bake 5 cakes in 4 pans at the same time is just silly.”

To find out, I ran some benchmarks.
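The benchmark idea can be sketched as a small script. This is not the actual CodeWarrior build: it generates a made-up toy project where each "compile" is simulated with a 0.1-second sleep, then times a clean build for several -j values on plain GNU make:

```shell
#!/bin/sh
# Benchmark sketch with a toy project (all names are made up):
# 12 independent "source files", each taking a simulated 0.1 s to compile.
set -e
mkdir -p /tmp/jbench && cd /tmp/jbench
for i in $(seq 1 12); do echo "int f$i(void){return $i;}" > "f$i.c"; done
# \t keeps the TAB character that make requires at the start of recipe lines
printf 'OBJS := $(patsubst %%.c,%%.o,$(wildcard f*.c))\nall: $(OBJS)\n%%.o: %%.c\n\t@sleep 0.1; cp $< $@\nclean:\n\t@rm -f *.o\n' > Makefile
for J in 1 2 4 6; do
  make clean >/dev/null
  T0=$(date +%s%N)
  make -j"$J" all >/dev/null
  T1=$(date +%s%N)
  echo "-j$J: $(( (T1 - T0) / 1000000 )) ms"
done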

Benchmarking – Clean Build

By default MCU10.2 uses the workspace settings, which are accessible from the project properties or through the Window > Preferences menu. So I changed the workspace settings to use -j1, -j2, -j3, and so on:

Workspace Parallel Build Settings

The first benchmark was to run a ‘clean’ followed by a ‘build’ for two projects: one is an HCS08 project, the other a ColdFire V2 project. That way I test it with two compilers:

Clean and Build Benchmark

More precisely, this does the following:

  1. Clean the SRB and the Tower project.
  2. Rebuild the make files for the SRB project.
  3. Build the SRB project.
  4. Rebuild the make files for the Tower project.
  5. Build the Tower project.

Running this on the dual-core machine with different job options gives the following chart (time in seconds):

Clean Build with 2-Core Machine

There is a dramatic improvement in build time (about a factor of two) with parallel builds enabled (-j1 means parallel builds are disabled) :-). Interestingly, it was not -j2 that gave the best result: the CodeWarrior default of -j6 was indeed the fastest, beating -j2 by a measurable 6 seconds. With -j greater than 6 there is some CPU thrashing, as the numbers increase again. Note as well that the numbers vary by 2-3 seconds depending on the system load and other activities.

The next chart is using the 4-core machine:

Clean Build with 4-Core Machine

Here the best result is with -j4. However, the difference is only in the 1-second range, so it is probably within the noise.

Next test is with an 8-core machine:

Clean Build with 8-Core Machine

The minimum is at -j8, but up to -j16 the measurements show some noise depending on the background tasks of the machine. Still, -j16 is a good default choice here. Data point 17 is with -j24 and data point 18 with -j32: here again we see some increase in build time due to thrashing.

Benchmarking – Touch Build

Looking at the console output, it seems that only the compilation runs in parallel, not the clean operation and the regeneration of the make files. To measure the impact of parallel builds on just the compile time, I changed the test setup: this time I touch/change a central header file used by many source files, and then do a build with the S08 build tools.

First the result for the dual-core machine:

Touch Build with 2-Core Machine

Here I can observe that the results from -j4 onward are within the noise, with a slight increase as more jobs are specified. Still, the default -j6 looks reasonable.

Next is the same on the 4-core machine:

Touch Build with 4-Core Machine

Now this gets a bit more interesting, as the minimum is around -j4. That would confirm the ‘-j<number of cores>’ rule. But looking at the differences, I again do not think it is statistically relevant, as it would have saved less than 1 second.

Finally the results for the 8-core machine:

Touch Build with 8-Core Machine

Conclusions

At least for my test cases, the CodeWarrior default settings for parallel builds make sense. The rule of -j<number of cores> fits better for machines with >= 4 cores: the ‘touch build’ charts for the 4-core and 8-core machines show that -j4 and -j8 are slightly better, but it does not make much of a difference for the build time in my examples. The impact of -j4 and -j8 respectively might be larger if more files were compiled. As such, the CodeWarrior defaults are OK, but (depending on the machine and what else is going on in the background) they might not be optimal. So if I want to squeeze out a few seconds, I would go with -j<number of cores> on a machine with more than two cores.

An opportunity I see would be to use parallel jobs for the code generation of Processor Expert. Right now that takes a lot of time with my projects. Given the performance gain with parallel builds on my old dual-core machine, parallel jobs could boost code generation as well. There is room for improvement, as always 🙂

Tip: disabling parallel builds (with -j1) is a particularly good idea if my project has a lot of errors. With parallel builds enabled, the error messages from different files get interleaved, which might confuse things. I better switch parallel builds off during such ‘error storm’ times.
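The interleaving effect is easy to reproduce with a made-up two-target Makefile on plain GNU make, where each fake compile prints two diagnostic lines:

```shell
#!/bin/sh
# Sketch of why -j1 helps during an "error storm" (toy Makefile, made up):
# each fake compile prints two diagnostic lines. With -j1 they always come
# out grouped per file; with -j2 the two recipes may interleave their output.
mkdir -p /tmp/errdemo && cd /tmp/errdemo
# \t keeps the TAB character that make requires at the start of recipe lines
printf 'all: a.o b.o\na.o b.o:\n\t@echo "$@: error 1"; echo "$@: error 2"\n' > Makefile
make -j1 all    # deterministic order: all a.o messages, then all b.o messages
```

With -j1 the output is always the four lines in file order, which is what makes serial builds easier to read when many files fail.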

Happy Parallel Building 🙂

5 thoughts on “Using Parallel Builds: what is optimal?”

  1. Great post, Erich. The last paragraph about the interleaved error messages is especially helpful–I was confused by this for a while when first using 10.2.

  2. Thanks, Michael. I noticed the problem with interleaved messages when I was porting a project which had a lot of unresolved includes. Disabling parallel builds allowed me to solve the problems one after another.

  3. Pingback: Filter my Problems | MCU on Eclipse

  4. Pingback: Reducing the build time with gcc for ARM and CodeWarrior | MCU on Eclipse

  5. Pingback: Reducing Build Time in Eclipse with Parallel Build | MCU on Eclipse
