Using Git Submodules in GitLab CI/CD Pipelines

Git submodules allow me to keep a Git repository as a sub-directory of another Git repository. This lets me clone another repository into my project and keep sources and libraries/SDKs in sync.

This can be a challenge with CI/CD runners, because they have to clone the repositories recursively. It gets more complex if the sub-modules are not public, because the CI/CD runner does not have access rights to the non-public repositories.

GitLab CI/CD Pipeline with successful private submodule usage

In this article I explain how I use Git sub-modules in my GitLab CI/CD pipeline, for both public and private repositories.

Outline

I’m using GitLab CI/CD runners for my projects. For example, I have the private GitLab repository ‘grp_00’ with two submodules:

VS Code with git Sub-Modules
  • McuLib is a public repository hosted on GitHub.
  • common is a private repository hosted on an internal GitLab server.

It took me a while to get the GitLab runner working with such a scenario. The information is not easy to find in the documentation, and many examples or instructions on the internet are outdated and do not work with recent GitLab versions.

That’s why I believe this article should help you if you are using GitLab CI/CD with git submodules.

CI/CD Runner Recursive Clone

Because the repository contains sub-modules, I have to tell the GitLab runner to check it out in a recursive way. For this I add the following variable to the .gitlab-ci.yml:

GIT_SUBMODULE_STRATEGY: recursive

This works fine for public sub-repositories such as McuLib on GitHub, because no authentication is needed to clone them in the runner.
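
To show where the variable goes, here is a minimal sketch of a .gitlab-ci.yml. The job name ‘show-submodules’ and its script are hypothetical and only there for illustration. I assume the job image provides git.

```yaml
# Sketch only: the job name and its script are hypothetical.
variables:
  # Check out sub-modules, including nested ones, before the job script runs.
  GIT_SUBMODULE_STRATEGY: recursive

show-submodules:
  script:
    # Print the commit each sub-module is checked out at.
    - git submodule status --recursive
```

Setting the variable at the top level applies it to all jobs of the pipeline. It can also be set in the variables section of a single job.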

.gitmodules

The project sub-modules are listed in the .gitmodules file inside the main repository:

[submodule "projects/robot/common"]
path = projects/robot/common
url = git@gitlab. ......../common.git
[submodule "projects/robot/McuLib"]
path = projects/robot/McuLib
url = https://github.com/ErichStyger/McuLib

Non-Public Repositories

The issue starts with non-public repositories. The reason is that the runner runs as a Docker container. The container has access rights to its own repository, but it does not have the credentials and access rights for other protected or non-public repositories.

Below is the runner log which fails for the mentioned project:

Updating/initializing submodules recursively with git depth set to 20...
Submodule 'projects/esp_ctf/McuLib' (https://github.com/ErichStyger/McuLib) registered for path 'projects/esp_ctf/McuLib'
Submodule 'projects/esp_ctf/common' (git@gitlab.switch.ch:hslu/edu/.../common.git) registered for path 'projects/esp_ctf/common'
Submodule 'projects/robot/McuLib' (https://github.com/ErichStyger/McuLib) registered for path 'projects/robot/McuLib'
Submodule 'projects/robot/common' (git@gitlab.switch.ch:hslu/edu/.../common.git) registered for path 'projects/robot/common'
Synchronizing submodule url for 'projects/esp_ctf/McuLib'
Synchronizing submodule url for 'projects/esp_ctf/common'
Synchronizing submodule url for 'projects/robot/McuLib'
Synchronizing submodule url for 'projects/robot/common'
Entering 'projects/esp_ctf/McuLib'
Entering 'projects/esp_ctf/common'
Entering 'projects/robot/McuLib'
Entering 'projects/robot/common'
Entering 'projects/esp_ctf/McuLib'
HEAD is now at ccaac34 fixed small typo
Entering 'projects/esp_ctf/common'
HEAD is now at c5cabc2 moved common into subrepo, initial commit
Entering 'projects/robot/McuLib'
HEAD is now at ccaac34 fixed small typo
Entering 'projects/robot/common'
HEAD is now at c5cabc2 moved common into subrepo, initial commit
error: cannot run ssh: No such file or directory
fatal: unable to fork
Unable to fetch in submodule path 'projects/robot/common'; trying to directly fetch 4dd1b4ea9dc3bd9151d796dd5427aa72a4b7d077:
error: cannot run ssh: No such file or directory
fatal: unable to fork
fatal: Fetched in submodule path 'projects/robot/common', but it did not contain 4dd1b4ea9dc3bd9151d796dd5427aa72a4b7d077. Direct fetching of that commit failed.
Updating submodules failed. Retrying...

It fails because it cannot access the private ‘common’ repository.

The runner is able to access its own repository because GitLab creates a short-lived access token for it, which is not visible to the outside.

No SSH

The important message is this:

Entering 'projects/robot/common'
HEAD is now at c5cabc2 moved common into subrepo, initial commit
error: cannot run ssh: No such file or directory
fatal: unable to fork

The runner tries to connect using SSH, but cannot find an ssh client in its container. It uses SSH because this is the access method I have in .gitmodules:

url = git@gitlab. ......../common.git

💡 A similar problem would happen if I used HTTPS, because the runner needs login information to access the repository.

The challenge is: how do I give it the credentials? I don’t want to add SSH keys or a password to the Docker image, as this would be a security issue.

Job Tokens

The solution is to grant job token permissions in the sub-module repository. This allows CI/CD jobs of another GitLab project to access the private repository with their job token.

For this, go to the settings of the submodule repository:

Submodule Repository Settings

In the Settings, choose CI/CD:

CI/CD Settings

There, go to ‘Job token permissions’:

Job token permissions

Add a new permission entry with the ‘Add’ button, and select ‘Group or project’:

Adding Job token for repository

Then specify the full path of the repository which shall be able to use the token:

Specify repository which can use the token

Next, I recommend using ‘Fine-grained permissions’ and allowing only read access to the repository.

Press ‘Add’ and the new job token entry is created:

Added Job Token

Then press the Save button to store it.
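
To check that the permission works, a small test job in the ‘main’ repository can help. The following is only a sketch under assumptions: the job name and the project path ‘my-group/common’ are hypothetical placeholders (use the real path of the sub-module repository), and I assume the job image provides git. CI_JOB_TOKEN and CI_SERVER_HOST are predefined GitLab CI/CD variables.

```yaml
# Hypothetical test job: checks that this project's job token can read the sub-module repository.
check-job-token:
  variables:
    # Skip the automatic sub-module checkout, so this job runs even if that step would fail.
    GIT_SUBMODULE_STRATEGY: none
    # Hypothetical placeholder: replace with the real path of the sub-module project.
    SUBMODULE_PROJECT: my-group/common
  script:
    # Ask the server for HEAD over HTTPS; this fails if the job token may not read that project.
    - 'git ls-remote "https://gitlab-ci-token:${CI_JOB_TOKEN}@${CI_SERVER_HOST}/${SUBMODULE_PROJECT}.git" HEAD'
```

If the job prints a commit hash, the job token of the ‘main’ repository is allowed to read the sub-module repository.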

Forcing HTTPS

The last step is to use HTTPS access instead of SSH. In addition to setting GIT_SUBMODULE_STRATEGY to recursive, I have to force HTTPS in the .gitlab-ci.yml:

GIT_SUBMODULE_STRATEGY: recursive
GIT_SUBMODULE_FORCE_HTTPS: "true"

Submodule Setting Variables for the Pipeline
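
Put together, the two variables could look like this in a .gitlab-ci.yml. This is a sketch, not my complete pipeline: the job name ‘verify-submodules’ and its script are hypothetical, and I assume the job image provides git and grep.

```yaml
# Sketch only: the job name and its script are hypothetical.
variables:
  # Check out sub-modules, including nested ones, before the job script runs.
  GIT_SUBMODULE_STRATEGY: recursive
  # Use HTTPS instead of SSH for the sub-module URLs; quoted so YAML keeps it a string.
  GIT_SUBMODULE_FORCE_HTTPS: "true"

verify-submodules:
  script:
    - |
      # 'git submodule status' prefixes sub-modules that are not initialized with '-'.
      if git submodule status --recursive | grep -q '^-'; then
        echo "At least one sub-module was not checked out"
        exit 1
      fi
```

The check makes the job fail early with a readable message if a sub-module is missing, instead of failing later in the build.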

Result

With this, private and public sub-module cloning works in a GitLab CI/CD pipeline :-).

submodule clone and checkout with GitLab runner

Summary

Git submodules are very useful. I can build a project from modules and libraries, each in its own Git repository. That way I can easily re-use code and keep it in sync. Nonetheless, such ‘recursive’ modules can be a challenge in a CI/CD environment, especially if different login details are used: credentials should stay secret, so they are not known to the runner or its Docker container.

The solution in GitLab is to set a variable to enable recursive repository cloning. A second GitLab variable (not well documented?) is required to force HTTPS connections to the sub-modules. Finally, a job token permission needs to be set up in the ‘sub’ repository. It allows the runner of the ‘main’ repository to read the sub-module.

With this, I have a successful GitLab pipeline, both for private and public submodules. And YES: a passing pipeline is something very rewarding 🙂

passed pipeline with private submodules

Happy gitlabing:-)
