A (multi-) monorepo setup with Git Submodules
tl;dr: Git submodules provide a practical development setup for monorepos that want to share modules with other monorepos. Repos inside repos, multiple times... it's awesome!
Introduction
Monorepos simplify the development setup for non-trivial apps significantly. They allow us to develop apps and modules at the same time in a practical way.
A mono repository could look like this:
/app-repository
  /app1
  /app2
  /shared-module
Here app1 and app2 operate on a similar domain, and some utilities are captured in the shared module.
Shared modules can be linked on source code level, which enables use of in-progress features, leading to fast feedback on their APIs and usefulness.
Limitations
The shared module in the example above might be useful beyond app1 and app2 for another app, let's call it app3. app3 lives in its own repository for whatever reason (maybe it targets a different domain and/or is developed by a different organization/community).
If the build output of shared-module is published, app3 can make use of its functionality. However, app3 could also drive the development and maintenance of shared-module if it had been set up in a monorepository with shared-module, similar to how app1 and app2 have been set up.
Git Submodules
Git submodules allow us to have a setup where shared-module participates in more than one monorepository. In order to do that, shared-module needs its own repository:
/module-repository
  /shared-module
The above repository can then be mounted as a Git submodule in other (mono-)repositories:
/mono-repository1
  /app1
  /app2
  /generic-module-as-a-git-submodule
/mono-repository2
  /app3
  /generic-module-as-a-git-submodule
The generic-module-as-a-git-submodule entries in the above schema are, on the repository layer, links to a specific commit of some Git repository (identified by its URL). The mono-repositories do not include the sources of their submodules.
When cloning one of the above mono-repositories, however, the local working tree can have the source files of all its submodules checked out. This enables linking the modules' components on source code level.
Working with submodules
Commands executed inside a submodule change the submodule, not the parent. Commands executed inside the parent change the parent, not the submodule. They are nicely isolated on the repository layer, but the working trees put the source files side-by-side on our filesystem, allowing us to work as if everything was coming from one upstream.
Adding a submodule
git submodule add <path/to/repo.git>
Adding a submodule sets up a .gitmodules file in the parent project, specifying the local paths and Git URLs of all submodules, as well as one entry per submodule (a so-called gitlink) that keeps track of the referenced commit. Example entry in .gitmodules:
[submodule "my-module"]
  path = my-module
  url = https://github.com/jannikbuschke/my-module.git
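To make the two pieces of bookkeeping visible, here is a minimal sketch that can be run in a scratch directory. It assumes a reasonably recent Git and uses throwaway local repositories instead of a hosted one; the names my-module and parent, the file lib.txt and the demo identity are made up for illustration.

```shell
set -eu
work=$(mktemp -d)   # scratch area; every repository below is a throwaway
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-alone repository for the shared module (hypothetical content).
git init -q "$work/my-module"
echo "shared code" > "$work/my-module/lib.txt"
git -C "$work/my-module" add lib.txt
git -C "$work/my-module" commit -q -m "Add lib"

# Parent repository that mounts the module as a submodule.
git init -q "$work/parent"
cd "$work/parent"
# protocol.file.allow=always is only needed because the URL is a local path;
# recent Git versions refuse file-based submodule clones by default.
git -c protocol.file.allow=always submodule add "$work/my-module" my-module
git commit -q -m "Mount my-module as a submodule"

cat .gitmodules              # local path and URL of the submodule
git ls-tree HEAD my-module   # the entry that pins the referenced commit
```

The last command lists my-module with mode 160000 and type commit, which is how the parent stores the link instead of the module's files.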
Cloning a repository that has submodules
git clone <domain/repository.git> --recursive
When cloning your domain repository that uses submodules, it's important to use the --recursive flag, otherwise the submodules are not initialized and you will eventually figure out that recovering non-initialized submodules is painful.
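In case the flag was forgotten, the sketch below shows one way to catch up with git submodule update --init --recursive. It is a self-contained demo on throwaway local repositories; the names my-module, parent and plain are assumptions made for illustration.

```shell
set -eu
work=$(mktemp -d)   # scratch area; every repository below is a throwaway
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A module repository and a parent that mounts it (hypothetical names).
git init -q "$work/my-module"
echo "shared code" > "$work/my-module/lib.txt"
git -C "$work/my-module" add lib.txt
git -C "$work/my-module" commit -q -m "Add lib"
git init -q "$work/parent"
# protocol.file.allow=always is only needed for local-path submodule URLs.
git -C "$work/parent" -c protocol.file.allow=always submodule add "$work/my-module" my-module
git -C "$work/parent" commit -q -m "Mount my-module"

# Clone WITHOUT --recursive: the submodule directory exists but is empty.
git clone -q "$work/parent" "$work/plain"
listing_before=$(ls -A "$work/plain/my-module")
echo "files before init: '${listing_before}'"

# Initialize and check out all submodules after the fact.
git -C "$work/plain" -c protocol.file.allow=always submodule update --init --recursive
ls "$work/plain/my-module"
```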
Referencing new commits in the parent repository
After committing in the submodule, the commit referenced by the parent repository remains unchanged. We need to explicitly reference the new commit if we want to pick it up. git status and git diff will tell us that our submodule has a new commit, and what its hash is. If we execute git add <submodule> and git commit -m "<useful message>", the new commit will be referenced in our repository.
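The round trip can be tried out with the following sketch. It runs entirely on throwaway local repositories; my-module, parent and lib.txt are hypothetical names, and the variables before, new and recorded only exist to compare the hashes.

```shell
set -eu
work=$(mktemp -d)   # scratch area; every repository below is a throwaway
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A module repository and a parent that mounts it (hypothetical names).
git init -q "$work/my-module"
echo "shared code" > "$work/my-module/lib.txt"
git -C "$work/my-module" add lib.txt
git -C "$work/my-module" commit -q -m "Add lib"
git init -q "$work/parent"
# protocol.file.allow=always is only needed for local-path submodule URLs.
git -C "$work/parent" -c protocol.file.allow=always submodule add "$work/my-module" my-module
git -C "$work/parent" commit -q -m "Mount my-module"
before=$(git -C "$work/parent" rev-parse HEAD:my-module)

# 1. Commit inside the submodule's working tree.
cd "$work/parent/my-module"
echo "more shared code" >> lib.txt
git commit -q -am "Extend lib"
new=$(git rev-parse HEAD)

# 2. The parent notices the new commit but still references the old one.
cd "$work/parent"
git status --short
git --no-pager diff

# 3. Reference the new commit in the parent.
git add my-module
git commit -q -m "Reference new my-module commit"
recorded=$(git rev-parse HEAD:my-module)
echo "parent now references $recorded"
```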
Summary
Git submodules do not seem to be very popular. At least whenever I took a few minutes to research them, I got the impression that they are not considered a proper solution: "easy to mess things up", "badly documented", "weird behavior". I did experience some of these pains, but consider them just part of the normal learning path. The benefits outweigh the minor pitfalls by far.
Some things I stumbled over that need to be watched out for:
The developer is responsible for keeping the upstream repositories consistent. If a submodule's commit has not been pushed to its upstream, the mono-repository commit that references it should not be pushed either. Otherwise other people or your CI pipeline will check out the repository with a submodule that references a commit that only exists on some other developer's machine.
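Git can help guard against this: git push --recurse-submodules=check refuses to push the parent while a referenced submodule commit is not available on a remote of the submodule. The sketch below demonstrates that under a few assumptions: bare local repositories stand in for the hosted upstreams, and the names seed, my-module.git, mono.git and dev are made up for illustration.

```shell
set -eu
work=$(mktemp -d)   # scratch area; every repository below is a throwaway
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Upstream of the shared module (a bare repository plays the hosted remote).
git init -q "$work/seed"
echo "shared code" > "$work/seed/lib.txt"
git -C "$work/seed" add lib.txt
git -C "$work/seed" commit -q -m "Add lib"
branch=$(git -C "$work/seed" symbolic-ref --short HEAD)
git clone -q --bare "$work/seed" "$work/my-module.git"

# Upstream of the monorepo, mounting the module as a submodule.
git init -q "$work/mono"
# protocol.file.allow=always is only needed for local-path submodule URLs.
git -C "$work/mono" -c protocol.file.allow=always submodule add "$work/my-module.git" my-module
git -C "$work/mono" commit -q -m "Mount my-module"
git clone -q --bare "$work/mono" "$work/mono.git"

# Developer clone: commit in the submodule, then reference it in the parent.
git -c protocol.file.allow=always clone -q --recursive "$work/mono.git" "$work/dev"
cd "$work/dev/my-module"
git checkout -q "$branch"
echo "more shared code" >> lib.txt
git commit -q -am "Extend lib"
cd "$work/dev"
git add my-module
git commit -q -m "Reference new my-module commit"

# Pushing the parent first is refused: the submodule commit is local only.
if git push -q --recurse-submodules=check origin HEAD 2>/dev/null; then
  first_push=accepted
else
  first_push=rejected
fi
echo "parent push before submodule push: $first_push"

# Push the submodule first, then the parent goes through.
git -C my-module push -q origin HEAD
git push -q --recurse-submodules=check origin HEAD
echo "parent push after submodule push: accepted"
```

Setting push.recurseSubmodules to check in the Git configuration makes this the default behaviour for every push.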
Other than that, it's important to use the --recursive flag when cloning the parent repository. Otherwise your submodules will be empty, and it's a bit weird to initialize them afterwards. Also, when navigating into a submodule, make sure to check out a branch by its name. Your submodules will start out with a reference to a commit by its hash (a detached HEAD).
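The detached state and the fix can be observed with a small sketch like the following. It assumes throwaway local repositories; my-module, parent and fresh are hypothetical names, and the branch name is read from the module repository rather than hard-coded.

```shell
set -eu
work=$(mktemp -d)   # scratch area; every repository below is a throwaway
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A module repository and a parent that mounts it (hypothetical names).
git init -q "$work/my-module"
echo "shared code" > "$work/my-module/lib.txt"
git -C "$work/my-module" add lib.txt
git -C "$work/my-module" commit -q -m "Add lib"
branch=$(git -C "$work/my-module" symbolic-ref --short HEAD)
git init -q "$work/parent"
# protocol.file.allow=always is only needed for local-path submodule URLs.
git -C "$work/parent" -c protocol.file.allow=always submodule add "$work/my-module" my-module
git -C "$work/parent" commit -q -m "Mount my-module"

# A fresh recursive clone checks the submodule out at the recorded hash.
git -c protocol.file.allow=always clone -q --recursive "$work/parent" "$work/fresh"
cd "$work/fresh/my-module"
if git symbolic-ref -q HEAD >/dev/null; then
  state_before=branch
else
  state_before=detached
fi
echo "submodule HEAD after clone: $state_before"

# Check out the branch by name before starting to commit.
git checkout -q "$branch"
state_after=$(git symbolic-ref --short HEAD)
echo "submodule HEAD after checkout: $state_after"
```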
Removing a submodule or renaming its path is also not straightforward. My practical advice here would be to modify the .gitmodules file and then clone the containing repository into a new location.
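As an alternative that I would treat as a sketch rather than a recipe: reasonably recent Git versions can move and remove a submodule with git mv and git rm, which also update .gitmodules. The demo below uses throwaway local repositories, and the names my-module and renamed-module are made up for illustration.

```shell
set -eu
work=$(mktemp -d)   # scratch area; every repository below is a throwaway
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A module repository and a parent that mounts it (hypothetical names).
git init -q "$work/my-module"
echo "shared code" > "$work/my-module/lib.txt"
git -C "$work/my-module" add lib.txt
git -C "$work/my-module" commit -q -m "Add lib"
git init -q "$work/parent"
cd "$work/parent"
# protocol.file.allow=always is only needed for local-path submodule URLs.
git -c protocol.file.allow=always submodule add "$work/my-module" my-module
git commit -q -m "Mount my-module"

# Rename the path: git mv moves the working tree and rewrites .gitmodules.
git mv my-module renamed-module
git commit -q -m "Rename submodule path"
# The submodule keeps its name (my-module); only its path changes.
path_after_rename=$(git config -f .gitmodules submodule.my-module.path)
echo "path recorded in .gitmodules: $path_after_rename"

# Remove the submodule: git rm drops the working tree and its .gitmodules
# section. The submodule's Git data under .git/modules is left behind.
git rm -q renamed-module
git commit -q -m "Remove submodule"
```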
Conclusion: A multi-monorepo setup provided by git submodules is very powerful. If you build more than a couple of apps and want to share code, or you are into OSS and want to use but also actively develop shared projects, give it a try.