Split a GitHub repository into a main repository and a submodule

Context

We have a local folder MAIN associated with a GitHub repository, also called MAIN. The structure is as follows:

MAIN
├── 1_dir_A
│   ├── content_AA
│   │   ├── file_AAA.ext
│   │   └── file_AAB.ext
│   └── content_AB
│       ├── file_ABA.ext
│       ├── file_ABB.ext
│       └── file_ABC.ext
├── 2_dir_B
│   ├── content_BA
│   │   ├── file_BAA.ext
│   │   └── file_BAB.ext
│   └── content_BB
│       ├── file_BBA.ext
│       └── file_BBB.ext
├── README.md
├── .git
└── .gitignore

Goal

We want to:

  • keep the structure of MAIN
  • create a new folder TOTO holding the content of 2_dir_B
  • keep the history of 2_dir_B in the new repository
  • (default) keep the history of 2_dir_B in the main repository; removing it is optional
  • define 2_dir_B as a submodule of MAIN

Method

  1. Make sure that the MAIN repository is up-to-date with your local folder, using git pull and/or git push commands.
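One way to check this before going further is to look for uncommitted changes and to count the commits that differ between the local branch and its upstream. The sketch below runs on a throwaway repository: the temporary paths, the demo identity and the branch name main are assumptions made for the demo. On the real MAIN folder, only the commands of the last part would be needed.

```shell
# Demo identity, so that the commits work without any global git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway setup: a bare repository stands in for MAIN on GitHub.
work=$(mktemp -d)
git init -q --bare "$work/MAIN.git"
git clone -q "$work/MAIN.git" "$work/MAIN" 2>/dev/null
cd "$work/MAIN"
git symbolic-ref HEAD refs/heads/main
echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q -u origin main
echo "more" >> README.md
git commit -q -am "Local commit, not pushed yet"

# The actual check: run this part from inside MAIN.
git fetch -q origin
if [ -z "$(git status --porcelain)" ]; then
    echo "working tree clean"
else
    echo "uncommitted or untracked changes present"
fi
# Left count: commits only on the upstream (to pull).
# Right count: commits only on the local branch (to push).
counts=$(git rev-list --left-right --count '@{upstream}...HEAD')
behind=$(echo "$counts" | cut -f1)
ahead=$(echo "$counts" | cut -f2)
echo "commits to pull: $behind, commits to push: $ahead"
```

Both counts should be 0 and the working tree clean before the repository is duplicated.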

Duplicate the MAIN repository

  1. Clone the MAIN repository to initiate the new one:
git clone https://github.com/user-name/MAIN.git TOTO
  1. Move to TOTO:
cd TOTO

Get all branches

  1. (optional) Check the branches and tags inherited from the MAIN repository. git clone creates a local branch only for the default branch; the other branches exist only as remote-tracking branches (origin/...):
git branch -a  # check the current branches
git fetch origin
git branch -a  # check the current branches
  1. (optional) For each branch to keep, here called SOME, create a local branch from its remote-tracking branch:
git checkout -b SOME origin/SOME
git branch -a  # check the current branches
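When there are many branches, the git checkout -b command above can be replaced by a loop over the remote-tracking branches. This is a minimal sketch on a throwaway repository; the temporary paths, the demo identity and the branch names dev and feature/x are hypothetical.

```shell
# Demo identity, so that the commits work without any global git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway setup: a stand-in for MAIN with three branches, cloned as TOTO.
work=$(mktemp -d)
git init -q "$work/MAIN"
cd "$work/MAIN"
git symbolic-ref HEAD refs/heads/main
echo "a" > README.md
git add README.md
git commit -q -m "Initial commit"
git branch dev
git branch feature/x
git clone -q "$work/MAIN" "$work/TOTO"
cd "$work/TOTO"

# Create a local branch for every remote-tracking branch of origin.
git for-each-ref --format='%(refname:lstrip=3)' refs/remotes/origin |
while read -r branch; do
    # origin/HEAD is only a pointer to the default branch, not a branch.
    [ "$branch" = "HEAD" ] && continue
    # Skip branches that already exist locally (the default branch).
    git show-ref --verify --quiet "refs/heads/$branch" && continue
    git branch --quiet --track "$branch" "origin/$branch"
done
git branch  # check the local branches
```

Unlike git checkout -b, git branch --track creates the branches without switching to them, so the working tree stays on the default branch.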

Detach from origin

  1. Detach the local repository TOTO from the MAIN repository.
git remote -v  # check the remote host: MAIN.git
git remote rm origin
git remote -v  # no more host

Subset the new repository

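  1. Filter the history to keep only the 2_dir_B folder. The --force flag is needed because git filter-repo refuses by default to rewrite a repository that no longer looks like a fresh clone, which is the case once origin has been removed.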
git filter-repo --path 2_dir_B/ --force
  1. Move the 2_dir_B folder content to the root:
git mv 2_dir_B/* .
git mv 2_dir_B/.gitignore .gitignore # for the hidden files
rm -Rf 2_dir_B
  1. Copy the untracked files from MAIN/2_dir_B to TOTO. This step is easiest to do manually, for example with a graphical file explorer.

  2. Check that TOTO contains everything present in MAIN/2_dir_B:

diff -r --exclude=.git /path/to/TOTO /path/to/MAIN/2_dir_B  # .git exists only in TOTO
  1. Make sure that the folders and files that should not be tracked by git are ignored:
nano .gitignore # fill in manually
  1. Commit the changes, but do not push yet since there is no remote.
git status  # everything should be added
git add -A  # if necessary; also stages hidden files such as .gitignore, which "git add *" skips
git commit -m "Filter only 2_dir_B content"
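As an alternative to the filter and git mv steps of this section, git filter-repo has a --subdirectory-filter option that keeps only one directory and makes it the root of the repository in a single pass. This is not the method used in this guide, only a sketch of that option on a throwaway repository. The temporary path, the demo identity and the file names are assumptions, and the demo is skipped when git filter-repo is not installed.

```shell
# Demo identity, so that the commit works without any global git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway setup: a small stand-in for the TOTO clone.
work=$(mktemp -d)
git init -q "$work/TOTO"
cd "$work/TOTO"
mkdir -p 1_dir_A 2_dir_B/content_BA
echo "a" > 1_dir_A/file_A.ext
echo "b" > 2_dir_B/content_BA/file_BAA.ext
echo "*.log" > 2_dir_B/.gitignore
git add -A
git commit -q -m "Initial commit"

if command -v git-filter-repo >/dev/null 2>&1; then
    # Keep only the history of 2_dir_B and move its content to the root.
    git filter-repo --quiet --force --subdirectory-filter 2_dir_B
    git ls-files  # tracked files, now at the root
else
    echo "git filter-repo is not installed; skipping the demo"
fi
```

With this option the hidden files of 2_dir_B, such as its .gitignore, are moved to the root along with everything else, and the history records the files at their new paths rather than as a later rename.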

Define the new repository

  1. On GitHub, create a new empty (no README, no LICENSE) repository, for instance called TITI.

  2. Define the new origin for the TOTO folder:

git remote add origin https://github.com/username/TITI.git
git remote -v  # check the remote host
  1. Push the changes. --force overwrites whatever history the remote already holds, which is needed when pushing a filtered (rewritten) history. --all includes all the local branches.
git push --force --all origin
git push --tags origin # optionally, if there are tags
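After the push, it can be worth confirming that every local branch and tag reached the remote. A possible check, sketched on a throwaway setup, compares the local references with the output of git ls-remote. The temporary paths, the demo identity, the branch dev and the tag v1.0 are hypothetical, and a local bare repository stands in for TITI on GitHub.

```shell
# Demo identity, so that the commit works without any global git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway setup: TOTO with two branches and one tag, pushed to a bare TITI.
work=$(mktemp -d)
git init -q --bare "$work/TITI.git"
git init -q "$work/TOTO"
cd "$work/TOTO"
git symbolic-ref HEAD refs/heads/main
echo "b" > file_BAA.ext
git add -A
git commit -q -m "Filter only 2_dir_B content"
git branch dev
git tag v1.0
git remote add origin "$work/TITI.git"
git push -q --force --all origin
git push -q --tags origin

# The actual check: same commit ids and same names on both sides.
local_refs=$(git for-each-ref --format='%(objectname) %(refname)' \
    refs/heads refs/tags | sort)
remote_refs=$(git ls-remote --refs --heads --tags origin | tr '\t' ' ' | sort)
if [ "$local_refs" = "$remote_refs" ]; then
    echo "remote is in sync with the local branches and tags"
else
    echo "remote differs from the local branches and tags"
fi
```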

Define a submodule in MAIN

  1. Go to the MAIN directory:
cd /path/to/MAIN
  1. Remove the 2_dir_B folder, both from tracking and locally:
git rm -r 2_dir_B
rm -Rf 2_dir_B/
  1. (optional) To also remove this folder from the history of MAIN, use git filter-branch:
git filter-branch --index-filter 'git rm -rf --cached --ignore-unmatch 2_dir_B' HEAD
git push origin main --force

Note: I don’t know how to adapt this command so that it applies to all branches. In principle, git filter-repo --path 2_dir_B/ --invert-paths is the better way to remove a folder from history, but it also removes the remote, and I have not worked out how to make it work.
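One possible way around the problem described in the note, given here as a sketch and not as a verified recipe for a real GitHub repository: run git filter-repo in a fresh clone, where it rewrites all branches by default, then add the remote back by hand and force-push. The temporary paths, the demo identity and the branch dev are hypothetical, a local bare repository stands in for MAIN on GitHub, and the demo is skipped when git filter-repo is not installed.

```shell
# Demo identity, so that the commit works without any global git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway setup: a bare MAIN.git with two branches containing 2_dir_B.
work=$(mktemp -d)
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/main
mkdir -p 1_dir_A 2_dir_B
echo "a" > 1_dir_A/file_A.ext
echo "b" > 2_dir_B/file_B.ext
git add -A
git commit -q -m "Initial commit"
git branch dev
git clone -q --bare "$work/seed" "$work/MAIN.git"

# Work in a fresh clone; --no-local makes a clone from a local path
# behave like a clone over the network.
git clone -q --no-local "$work/MAIN.git" "$work/MAIN-rewrite"
cd "$work/MAIN-rewrite"
url=$(git remote get-url origin)  # remember the remote before it is removed

if command -v git-filter-repo >/dev/null 2>&1; then
    # Remove 2_dir_B from the history of every branch.
    git filter-repo --quiet --invert-paths --path 2_dir_B/
    # filter-repo deleted the origin remote: add it back, then overwrite
    # the remote history.
    git remote add origin "$url"
    git push -q --force --all origin
    git push -q --force --tags origin
else
    echo "git filter-repo is not installed; skipping the demo"
fi
```

Force-pushing a rewritten history changes the commit ids of MAIN, so any other clone of the repository would need to be cloned again afterwards.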

  1. Define 2_dir_B as a submodule of MAIN. It creates a .gitmodules file.
git submodule add https://github.com/username/TITI.git 2_dir_B
git commit -m "Define submodule 2_dir_B"
git push origin main # push the .gitmodules file

That’s all! :smile:

Output

The structure of the local MAIN directory has not changed:

MAIN
├── 1_dir_A
│   ├── content_AA
│   │   ├── file_AAA.ext
│   │   └── file_AAB.ext
│   └── content_AB
│       ├── file_ABA.ext
│       ├── file_ABB.ext
│       └── file_ABC.ext
├── 2_dir_B
│   ├── content_BA
│   │   ├── file_BAA.ext
│   │   └── file_BAB.ext
│   └── content_BB
│       ├── file_BBA.ext
│       └── file_BBB.ext
├── README.md
├── .git
└── .gitignore

However, on the MAIN GitHub repository, 2_dir_B is now displayed as a link to the commit of TITI recorded by the submodule, for instance 2_dir_B @ 3c8464b.
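The recorded commit can also be inspected locally: in the tree of MAIN, a submodule is stored as a single entry of mode 160000 and type commit. The sketch below shows this on a throwaway setup. The temporary paths and the demo identity are assumptions, a local repository stands in for TITI, and the protocol.file.allow setting is only there because recent git versions block submodules cloned from local paths by default.

```shell
# Demo identity, so that the commits work without any global git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway setup: a stand-in for TITI with one commit.
work=$(mktemp -d)
git init -q "$work/TITI"
cd "$work/TITI"
echo "b" > file_BAA.ext
git add -A
git commit -q -m "Filter only 2_dir_B content"

# A stand-in for MAIN, with TITI added as the submodule 2_dir_B.
git init -q "$work/MAIN"
cd "$work/MAIN"
echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"
git -c protocol.file.allow=always submodule --quiet add "$work/TITI" 2_dir_B
git commit -q -m "Define submodule 2_dir_B"

# The submodule is one tree entry: mode 160000, type commit, pinned commit id.
git ls-tree HEAD 2_dir_B
# Same commit id, as reported by the submodule machinery.
git submodule status
```

Since MAIN only stores this commit id, someone cloning MAIN gets an empty 2_dir_B folder unless they clone with git clone --recurse-submodules or run git submodule update --init afterwards.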

The TOTO folder contains:

TOTO
├── content_BA
│   ├── file_BAA.ext
│   └── file_BAB.ext
├── content_BB
│   ├── file_BBA.ext
│   └── file_BBB.ext
├── .git
└── .gitignore

On the TITI GitHub repository, there is no trace of the rest of MAIN, but the history of every tracked file is preserved from its creation in the MAIN repository.

Resources

Useful resources: