Split a GitHub repository into a main repository and a submodule
Context
We have a local folder MAIN associated with a GitHub repository, also called MAIN. The structure is as follows:
MAIN
├── 1_dir_A
│   ├── content_AA
│   │   ├── file_AAA.ext
│   │   └── file_AAB.ext
│   └── content_AB
│       ├── file_ABA.ext
│       ├── file_ABB.ext
│       └── file_ABC.ext
├── 2_dir_B
│   ├── content_BA
│   │   ├── file_BAA.ext
│   │   └── file_BAB.ext
│   └── content_BB
│       ├── file_BBA.ext
│       └── file_BBB.ext
├── README.md
├── .git
└── .gitignore
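To rehearse the method without risking the real repository, a throwaway MAIN with the same layout can be built locally. This is a minimal sandbox sketch, assuming placeholder file contents, a dummy commit identity, a branch named main, and an extra 2_dir_B/.gitignore (the tree above does not show one, but a later step moves such a file).

```shell
#!/bin/sh
# Sandbox sketch: build a throwaway MAIN repository with the layout above.
# Assumptions: placeholder contents, dummy identity, branch "main",
# and a hypothetical 2_dir_B/.gitignore.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

DEMO=$(mktemp -d)
mkdir "$DEMO/MAIN"
cd "$DEMO/MAIN"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main"

for f in 1_dir_A/content_AA/file_AAA.ext 1_dir_A/content_AA/file_AAB.ext \
         1_dir_A/content_AB/file_ABA.ext 1_dir_A/content_AB/file_ABB.ext \
         1_dir_A/content_AB/file_ABC.ext \
         2_dir_B/content_BA/file_BAA.ext 2_dir_B/content_BA/file_BAB.ext \
         2_dir_B/content_BB/file_BBA.ext 2_dir_B/content_BB/file_BBB.ext; do
  mkdir -p "$(dirname "$f")"
  echo "placeholder content of $f" > "$f"
done
echo "# MAIN" > README.md
echo "*.tmp" > .gitignore
echo "*.log" > 2_dir_B/.gitignore       # hypothetical hidden file in 2_dir_B

git add -A
git commit -q -m "Initial layout"
git ls-files                            # the 12 tracked files
```

The other steps can then be tried on this copy, replacing the GitHub URLs by local paths.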
Goal
We want to:
- keep the structure of MAIN
- make a new folder TOTO associated with the content of 2_dir_B
- keep the history of 2_dir_B in the new repository
- (default) keep the history of 2_dir_B in the main repository
- define 2_dir_B as a submodule of MAIN
Method
- Make sure that the MAIN repository on GitHub is up to date with your local folder, using the git pull and/or git push commands.
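Before starting, the synchronization can also be checked from the command line. A minimal sketch, assuming a local bare repository as a stand-in for the GitHub repository MAIN; only the last part (fetch, status, rev-list) is meant to be run in the real folder.

```shell
#!/bin/sh
# Sandbox sketch: check that a local clone is in sync with its remote.
# Assumption: a local bare repository stands in for the GitHub repository MAIN.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

DEMO=$(mktemp -d)
git init -q --bare "$DEMO/MAIN.git"                  # stand-in for GitHub
git clone -q "$DEMO/MAIN.git" "$DEMO/MAIN" 2>/dev/null
cd "$DEMO/MAIN"
echo "# MAIN" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q -u origin HEAD                           # publish and set upstream

# The check itself, to run in the real MAIN folder:
git fetch -q origin
if [ -n "$(git status --porcelain)" ]; then
  echo "Uncommitted changes: commit or stash them first."
fi
# Number of commits only on the remote (left) and only local (right).
counts=$(git rev-list --left-right --count '@{u}...HEAD')
echo "behind/ahead: $counts"   # "0" and "0" (tab-separated) means in sync
```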
Duplicate the MAIN repository
- Clone the MAIN repository to initialize the new one:
git clone https://github.com/user-name/MAIN.git TOTO
- Move to TOTO:
cd TOTO
Get all branches
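- (optional) Create local copies of all the branches of the MAIN repository. git clone only creates a local branch for the default branch; the other branches exist only as remote-tracking branches (origin/...), which are deleted along with the remote in the next step. First, list and update them: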
git branch -a # check the current branches
git fetch origin
git branch -a # check the current branches
- (optional) For each branch to keep, for instance a branch called SOME, create its local copy:
git checkout -b SOME origin/SOME
git branch -a # check the current branches
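When there are many branches, the git checkout -b step above can be automated. A sketch, assuming the remote is called origin; the sandbox part (a local MAIN with a hypothetical branch named feature) only exists to make the example runnable.

```shell
#!/bin/sh
# Sandbox sketch: create a local branch for every remote-tracking branch of
# origin, so that it survives "git remote rm origin" and "git push --all".
# Assumptions: a local repository stands in for GitHub's MAIN;
# "feature" is a hypothetical extra branch.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

DEMO=$(mktemp -d)
git init -q "$DEMO/MAIN"
cd "$DEMO/MAIN"
git symbolic-ref HEAD refs/heads/main
echo "# MAIN" > README.md
git add README.md && git commit -q -m "Initial commit"
git checkout -q -b feature
echo "work" > feature.txt
git add feature.txt && git commit -q -m "Feature work"
git checkout -q main

git clone -q "$DEMO/MAIN" "$DEMO/TOTO"
cd "$DEMO/TOTO"
# The loop itself, to run inside TOTO before removing origin:
for ref in $(git for-each-ref --format='%(refname)' refs/remotes/origin); do
  branch=${ref#refs/remotes/origin/}
  if [ "$branch" = HEAD ]; then
    continue                                  # skip the origin/HEAD alias
  fi
  if ! git show-ref --verify --quiet "refs/heads/$branch"; then
    git branch -q --track "$branch" "origin/$branch"
  fi
done
git branch                                    # main and feature are now local
```

The loop only creates branches; it does not change the checked-out branch.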
Detach from origin
- Detach the local repository TOTO from the MAIN repository:
git remote -v # check the remote host: MAIN.git
git remote rm origin
git remote -v # no more host
Subset the new repository
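- Filter the repository to keep only the 2_dir_B folder, using git filter-repo (a separate tool, not shipped with git). The --force flag is required because, after the previous steps (origin removed, extra checkouts), the repository no longer looks like a fresh clone, and git filter-repo refuses to rewrite it otherwise.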
git filter-repo --path 2_dir_B/ --force
- Move the content of the 2_dir_B folder to the root:
git mv 2_dir_B/* .                      # the * glob does not match hidden files
git mv 2_dir_B/.gitignore .gitignore    # so move the hidden files one by one
rm -Rf 2_dir_B                          # remove the folder, now empty
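- Copy the untracked files (including ignored ones) of MAIN/2_dir_B into TOTO: they are not part of the git history, so the clone does not contain them. This step can be done manually, for instance with a graphical file explorer.
- Check that TOTO contains everything present in MAIN/2_dir_B: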
diff -r -x .git /path/to/TOTO /path/to/MAIN/2_dir_B   # -x .git: skip TOTO's own .git folder
- Make sure that the folders and files that git should not track are listed in .gitignore:
nano .gitignore # fill in manually
- Commit the changes, but do not push, since there is no remote yet.
git status   # most changes should already be staged by git mv
git add -A   # if necessary; unlike "git add *", this also stages hidden files such as .gitignore
git commit -m "Filter only 2_dir_B content"
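As an alternative to the filter-then-git mv steps above, the history itself can be rewritten so that 2_dir_B is the root in every commit, which keeps hidden files and avoids the extra move commit. git filter-repo offers this as git filter-repo --subdirectory-filter 2_dir_B (with --force here as well). The sketch below instead swaps in core git's git filter-branch --subdirectory-filter (deprecated, but always available) so that it runs without extra tools; local repositories stand in for GitHub and the file names are placeholders.

```shell
#!/bin/sh
# Sandbox sketch: make 2_dir_B the repository root in every commit.
# Assumptions: local repositories stand in for GitHub; placeholder files;
# core git's filter-branch replaces git-filter-repo so the sketch runs anywhere.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1     # skip filter-branch's deprecation pause

DEMO=$(mktemp -d)
git init -q "$DEMO/MAIN"
cd "$DEMO/MAIN"
git symbolic-ref HEAD refs/heads/main
mkdir -p 1_dir_A/content_AA 2_dir_B/content_BA
echo "A" > 1_dir_A/content_AA/file_AAA.ext
echo "B v1" > 2_dir_B/content_BA/file_BAA.ext
echo "*.log" > 2_dir_B/.gitignore
git add -A && git commit -q -m "Add both folders"
echo "B v2" >> 2_dir_B/content_BA/file_BAA.ext
git commit -q -am "Update file_BAA"

# Steps from the method: clone, detach, then filter in one pass.
git clone -q "$DEMO/MAIN" "$DEMO/TOTO"
cd "$DEMO/TOTO"
git remote rm origin
git filter-branch --subdirectory-filter 2_dir_B --tag-name-filter cat -- --all
# filter-branch keeps backups of the old refs under refs/original: drop them.
for ref in $(git for-each-ref --format='%(refname)' refs/original); do
  git update-ref -d "$ref"
done
ls -A                                            # .git .gitignore content_BA
git log --oneline -- content_BA/file_BAA.ext     # both commits, no --follow needed
```

Since the files never move in the rewritten history, git log shows their full history without --follow.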
Define the new repository
- On GitHub, create a new empty repository (no README, no LICENSE), for instance called TITI.
- Define the new origin for the TOTO folder:
git remote add origin https://github.com/username/TITI.git
git remote -v # check the remote host
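- Push the changes. The --all flag pushes every local branch (which is why the branches were copied locally earlier). The --force flag is not required when TITI is empty; it only matters if TITI already contains commits, for instance a README created on GitHub.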
git push --force --all origin
git push --tags origin # optionally, if there are tags
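The push step can be tried locally too. A sketch, assuming a local bare repository as a stand-in for the empty GitHub repository TITI, and the hypothetical names feature and v1.0 for a branch and a tag.

```shell
#!/bin/sh
# Sandbox sketch: push every local branch and tag of TOTO to a new, empty remote.
# Assumptions: a local bare repository stands in for the empty GitHub TITI;
# "feature" and "v1.0" are hypothetical branch and tag names.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

DEMO=$(mktemp -d)
git init -q "$DEMO/TOTO"
cd "$DEMO/TOTO"
git symbolic-ref HEAD refs/heads/main
mkdir content_BA
echo "B" > content_BA/file_BAA.ext
git add -A && git commit -q -m "Filter only 2_dir_B content"
git tag v1.0
git branch feature

git init -q --bare "$DEMO/TITI.git"       # stand-in for the empty TITI
git remote add origin "$DEMO/TITI.git"
git push -q --all origin                  # every local branch
git push -q --tags origin                 # every tag
git ls-remote origin                      # what TITI now contains
```

No --force is needed here because TITI starts empty.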
Define a submodule in MAIN
- Go to the MAIN directory:
cd /path/to/MAIN
- Remove the 2_dir_B folder, both from tracking and locally:
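git rm -r 2_dir_B   # stop tracking the folder and delete its tracked files
rm -Rf 2_dir_B/     # delete what is left (untracked or ignored files)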
- (optional) To also remove this folder from the history of MAIN, use git filter-branch:
git filter-branch --index-filter 'git rm -rf --cached --ignore-unmatch 2_dir_B' HEAD
git push origin main --force
Note: I don't know how to apply this to all branches. The recommended tool to remove a folder from the history is git filter-repo --path 2_dir_B/ --invert-paths, but it also removes the origin remote, and I have not found how to make it work.
- Define 2_dir_B as a submodule of MAIN. This creates a .gitmodules file:
git submodule add https://github.com/username/TITI.git 2_dir_B
git commit -m "Define submodule 2_dir_B"
git push origin main # push the .gitmodules file
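Regarding the note above, one possible way to apply the optional git filter-branch step to every branch and tag is to pass -- --branches --tags instead of HEAD, then force-push everything. This is a sketch, not tested on the real repository: a local bare repository stands in for GitHub, feature is a hypothetical branch, and it is meant to run before the submodule is added. With git filter-repo, the counterpart is git filter-repo --path 2_dir_B/ --invert-paths; as far as I know, filter-repo removes origin on purpose (to avoid pushing rewritten history by accident), and it can be added back with git remote add origin <URL> before force-pushing. Either way, other clones of MAIN then need to be re-cloned.

```shell
#!/bin/sh
# Sandbox sketch: remove 2_dir_B from the history of all branches and tags,
# then force-push them. Assumptions: a local bare repository stands in for
# GitHub's MAIN; "feature" is a hypothetical extra branch.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1     # skip filter-branch's deprecation pause

DEMO=$(mktemp -d)
git init -q --bare "$DEMO/MAIN.git"        # stand-in for GitHub
git init -q "$DEMO/MAIN"
cd "$DEMO/MAIN"
git symbolic-ref HEAD refs/heads/main
mkdir -p 1_dir_A 2_dir_B
echo "A" > 1_dir_A/file_A.ext
echo "B" > 2_dir_B/file_B.ext
git add -A && git commit -q -m "Add both folders"
git checkout -q -b feature
echo "B2" > 2_dir_B/file_B2.ext
git add -A && git commit -q -m "More in 2_dir_B"
git checkout -q main
git remote add origin "$DEMO/MAIN.git"
git push -q --all origin

# Rewrite every local branch and tag, not only HEAD;
# --prune-empty drops commits that only touched 2_dir_B.
git filter-branch --prune-empty \
  --index-filter 'git rm -r -q --cached --ignore-unmatch 2_dir_B' \
  --tag-name-filter cat -- --branches --tags
# Drop the backup refs kept by filter-branch under refs/original.
for ref in $(git for-each-ref --format='%(refname)' refs/original); do
  git update-ref -d "$ref"
done
git push -q --force --all origin
git push -q --force --tags origin
```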
That's all! :smile:
Output
The MAIN directory has not changed:
MAIN
├── 1_dir_A
│   ├── content_AA
│   │   ├── file_AAA.ext
│   │   └── file_AAB.ext
│   └── content_AB
│       ├── file_ABA.ext
│       ├── file_ABB.ext
│       └── file_ABC.ext
├── 2_dir_B
│   ├── content_BA
│   │   ├── file_BAA.ext
│   │   └── file_BAB.ext
│   └── content_BB
│       ├── file_BBA.ext
│       └── file_BBB.ext
├── README.md
├── .git
└── .gitignore
However, on the MAIN GitHub repository, 2_dir_B now points to a specific commit of TITI, the one recorded in MAIN when the submodule was last committed, for instance 2_dir_B @ 3c8464b.
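The submodule steps and this pointer can be reproduced locally. A sketch, assuming local repositories as stand-ins for MAIN and TITI; protocol.file.allow=always is only there because recent git versions refuse local paths as submodule URLs by default. It shows the commit MAIN records for 2_dir_B (what GitHub displays as 2_dir_B @ <commit>), then moves that pointer after a new commit in TITI.

```shell
#!/bin/sh
# Sandbox sketch: add 2_dir_B as a submodule, inspect the recorded commit,
# then move the pointer. Assumptions: local repositories stand in for GitHub's
# MAIN and TITI; protocol.file.allow=always is only needed for local paths.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

DEMO=$(mktemp -d)
git init -q "$DEMO/TITI"                   # stand-in for TITI
cd "$DEMO/TITI"
git symbolic-ref HEAD refs/heads/main
mkdir content_BA
echo "B" > content_BA/file_BAA.ext
git add -A && git commit -q -m "Filter only 2_dir_B content"

git init -q "$DEMO/MAIN"                   # stand-in for MAIN
cd "$DEMO/MAIN"
git symbolic-ref HEAD refs/heads/main
mkdir -p 1_dir_A 2_dir_B/content_BA
echo "A" > 1_dir_A/file_AAA.ext
echo "B" > 2_dir_B/content_BA/file_BAA.ext
git add -A && git commit -q -m "Initial layout"

# Steps from the method: remove the folder, then add it back as a submodule.
git rm -r -q 2_dir_B
rm -Rf 2_dir_B
git -c protocol.file.allow=always submodule add "$DEMO/TITI" 2_dir_B
git commit -q -m "Define submodule 2_dir_B"

# The commit recorded for 2_dir_B: mode 160000, type "commit".
git ls-tree HEAD 2_dir_B
git submodule status

# Later, after a new commit in TITI, move the pointer and commit it in MAIN.
(cd "$DEMO/TITI" && echo "B v2" >> content_BA/file_BAA.ext && git commit -q -am "Update")
git -C 2_dir_B fetch -q origin
git -C 2_dir_B checkout -q origin/main
git add 2_dir_B
git commit -q -m "Update submodule 2_dir_B"
```

Cloning MAIN later requires git clone --recurse-submodules (or git submodule update --init after a plain clone) to fill 2_dir_B.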
The TOTO folder contains:
TOTO
├── content_BA
│   ├── file_BAA.ext
│   └── file_BAB.ext
├── content_BB
│   ├── file_BBA.ext
│   └── file_BBB.ext
├── .git
└── .gitignore
On the TITI GitHub repository, there is no trace of MAIN, but the history of all the tracked files is preserved from their creation in the MAIN repository.
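The history is kept, but because the files were moved by git mv in a separate commit, git log only shows their earlier history when it follows renames. A small sketch, assuming a toy repository that mimics TOTO right after the git mv step; names and contents are placeholders.

```shell
#!/bin/sh
# Sandbox sketch: history of a file moved with "git mv" needs --follow.
# Assumption: a toy repository mimics TOTO right after "git mv 2_dir_B/* .".
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

DEMO=$(mktemp -d)
git init -q "$DEMO/TOTO"
cd "$DEMO/TOTO"
mkdir -p 2_dir_B/content_BA
printf 'first version of file_BAA\n' > 2_dir_B/content_BA/file_BAA.ext
git add -A && git commit -q -m "Create file_BAA in MAIN"
printf 'second version of file_BAA\n' >> 2_dir_B/content_BA/file_BAA.ext
git commit -q -am "Edit file_BAA in MAIN"
git mv 2_dir_B/* .
git commit -q -m "Filter only 2_dir_B content"

git log --oneline -- content_BA/file_BAA.ext            # only the move commit
git log --oneline --follow -- content_BA/file_BAA.ext   # all three commits
```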
Resources
Useful resources: