Split a GitHub repository into a main repository and a submodule
Context
We have a local folder MAIN associated with a GitHub repository, also called MAIN. The structure is as follows:
MAIN
├── 1_dir_A
│ ├── content_AA
│ │ ├── file_AAA.ext
│ │ └── file_AAB.ext
│ └── content_AB
│ ├── file_ABA.ext
│ ├── file_ABB.ext
│ └── file_ABC.ext
├── 2_dir_B
│ ├── content_BA
│ │ ├── file_BAA.ext
│ │ └── file_BAB.ext
│ └── content_BB
│ ├── file_BBA.ext
│ └── file_BBB.ext
├── README.md
├── .git
└── .gitignore
Goal
We want to:
- keep the structure of MAIN
- make a new folder TOTO associated with the content of 2_dir_B
- keep the history of 2_dir_B in the new repository
- (default) keep the history of 2_dir_B in the main repository
- define 2_dir_B as a submodule of MAIN
Method
- Make sure that the MAIN repository is up to date with your local folder, using the git pull and/or git push commands.
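A minimal sketch of that check, run in a throwaway sandbox where a local bare repository stands in for GitHub (the temporary paths, the README.md content and the commit message are illustrative assumptions). It relies on git rev-list --left-right --count, which reports how many commits the local branch is ahead of and behind its upstream.

```shell
set -e
# Sandbox: a local bare repository plays the role of the GitHub remote.
tmp=$(mktemp -d)
git init -q --bare "$tmp/remote.git"
git init -q "$tmp/MAIN"
cd "$tmp/MAIN"
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$tmp/remote.git"
echo "demo" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q -u origin HEAD    # publish and set the upstream branch

# The actual check, to run in the real MAIN folder:
git fetch -q origin
uncommitted=$(git status --porcelain)   # empty when nothing is left to commit
set -- $(git rev-list --left-right --count "HEAD...@{upstream}")
ahead=$1     # commits still to send with git push
behind=$2    # commits still to get with git pull
echo "uncommitted: '${uncommitted}', ahead: ${ahead}, behind: ${behind}"
```

When the first value is empty and both counters are 0, the local folder and the remote repository are in sync.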
Duplicate the MAIN repository
- Clone the MAIN repository to initiate the new one:
git clone https://github.com/user-name/MAIN.git TOTO
- Move to TOTO:
cd TOTO
Get all branches
- (optional) Get all the branches and tags from the MAIN repository. git clone creates a local branch only for the default branch; the other branches exist only as remote-tracking branches under origin/:
git branch -a # check the current branches
git fetch origin
git branch -a # check the current branches
- (optional) For each branch SOME to copy, run:
git checkout -b SOME origin/SOME
git branch -a # check the current branches
Detach from origin
- Detach the local repository TOTO from the MAIN repository:
git remote -v # check the remote host: MAIN.git
git remote rm origin
git remote -v # no more host
Subset the new repository
- Filter the repository to keep only the 2_dir_B folder. We need --force because, once origin has been removed, git filter-repo no longer considers the repository a fresh clone and refuses to rewrite its history otherwise.
git filter-repo --path 2_dir_B/ --force
- Move the content of the 2_dir_B folder to the root:
git mv 2_dir_B/* .
git mv 2_dir_B/.gitignore .gitignore # for the hidden files
rm -Rf 2_dir_B
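As a possible shortcut for the two steps above, git filter-repo also offers a --subdirectory-filter option, which keeps only the history of a folder and treats that folder as the project root, so no git mv is needed afterwards. The sketch below tries it in a throwaway repository; the temporary path and the file contents are assumptions, and the filtering is skipped if git filter-repo is not installed.

```shell
set -e
# Sandbox reproducing a small part of the MAIN layout.
tmp=$(mktemp -d)
git init -q "$tmp/TOTO"
cd "$tmp/TOTO"
git config user.name "Example User"
git config user.email "user@example.com"
mkdir -p 1_dir_A/content_AA 2_dir_B/content_BA
echo "a" > 1_dir_A/content_AA/file_AAA.ext
echo "b" > 2_dir_B/content_BA/file_BAA.ext
git add .
git commit -q -m "Add both folders"

if git filter-repo --version >/dev/null 2>&1; then
    have_filter_repo=yes
    # Keep only 2_dir_B and make it the root of the repository.
    git filter-repo --force --subdirectory-filter 2_dir_B
else
    have_filter_repo=no
    echo "git filter-repo is not installed; skipping"
fi
tracked=$(git ls-files)   # after filtering: content_BA/file_BAA.ext
echo "$tracked"
```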
- Add the untracked files from MAIN to TOTO. This step is done manually, for instance with the graphical file explorer.
- Check that TOTO contains everything present in MAIN/2_dir_B (the .git folder of TOTO is excluded from the comparison):
diff -r -x .git /path/to/TOTO /path/to/MAIN/2_dir_B
- Make sure to ignore the folders and files that should not be tracked by git:
nano .gitignore # fill in manually
- Create the commit, but do not push since there is no remote yet:
git status # everything should be added
git add -A # if necessary, for instance .gitignore (a plain * does not match hidden files)
git commit -m "Filter only 2_dir_B content"
Define the new repository
- On GitHub, create a new empty (no README, no LICENSE) repository, for instance called TITI.
- Define the new origin for the TOTO folder:
git remote add origin https://github.com/username/TITI.git
git remote -v # check the remote host
- Push the changes. We use --force if no file has been changed but only the git history has been filtered. We use --all to include all the branches.
git push --force --all origin
git push --tags origin # optionally, if there are tags
Define a submodule in MAIN
- Go to the MAIN directory:
cd /path/to/MAIN
- Remove the 2_dir_B folder, both from tracking and locally, and commit the removal (git filter-branch below refuses to run with uncommitted changes):
git rm -r 2_dir_B
rm -Rf 2_dir_B/
git commit -m "Remove 2_dir_B folder"
- (optional) To also remove this folder from the history, use git filter-branch:
git filter-branch --index-filter 'git rm -rf --cached --ignore-unmatch 2_dir_B' HEAD
git push origin main --force
Note: I don’t know how to modify this to apply it to all branches. Actually, one should use git filter-repo --path 2_dir_B/ --invert-paths to remove a folder from the history. But it also removes the remote, so I have no idea how to make it work.
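One possible answer to the note above, offered as a sketch rather than a recipe validated on a real project: git filter-branch accepts a list of revisions after --, so replacing HEAD with -- --all applies the filter to every local branch. The demonstration below runs in a throwaway repository (the temporary path, the branch name SOME and the file contents are assumptions).

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/MAIN"
cd "$tmp/MAIN"
git config user.name "Example User"
git config user.email "user@example.com"
mkdir 1_dir_A 2_dir_B
echo "a" > 1_dir_A/file.ext
echo "b" > 2_dir_B/file.ext
git add .
git commit -q -m "Add both folders"
git checkout -q -b SOME              # a second branch that also touches 2_dir_B
echo "c" > 2_dir_B/other.ext
git add .
git commit -q -m "Extend 2_dir_B on SOME"
git checkout -q -                    # back to the initial branch

# Rewrite all refs ('-- --all') instead of HEAD only;
# --prune-empty drops the commits left empty by the removal.
export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch -f --prune-empty \
    --index-filter 'git rm -r -q --cached --ignore-unmatch 2_dir_B' \
    -- --all >/dev/null

# No commit reachable from a local branch should mention 2_dir_B any more.
remaining=$(git log --branches --oneline -- 2_dir_B)
files_on_some=$(git ls-tree -r --name-only SOME)
echo "remaining: '${remaining}', files on SOME: ${files_on_some}"
```

On the real repository, the rewritten branches would then be published with git push origin --force --all. With git filter-repo, the remote that it removes can simply be declared again afterwards with git remote add origin followed by the repository URL, as done for TOTO above, before pushing.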
- Define 2_dir_B as a submodule of MAIN. It creates a .gitmodules file.
git submodule add https://github.com/username/TITI.git 2_dir_B
git commit -m "Define submodule 2_dir_B"
git push origin main # push the .gitmodules file
That’s all! :smile:
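For anyone cloning MAIN afterwards, a plain git clone leaves 2_dir_B empty until the submodule is initialized. The sketch below illustrates this with local repositories standing in for GitHub; the temporary paths, the file contents and the protocol.file.allow override (needed by recent Git versions for submodules served from local paths) are assumptions of the demonstration, not part of the method.

```shell
set -e
tmp=$(mktemp -d)

# Stand-in for the TITI repository.
git init -q "$tmp/TITI"
cd "$tmp/TITI"
git config user.name "Example User"
git config user.email "user@example.com"
mkdir content_BA
echo "b" > content_BA/file_BAA.ext
git add .
git commit -q -m "Content of 2_dir_B"

# Stand-in for the MAIN repository, with 2_dir_B defined as a submodule.
git init -q "$tmp/MAIN"
cd "$tmp/MAIN"
git config user.name "Example User"
git config user.email "user@example.com"
echo "demo" > README.md
git add README.md
git commit -q -m "Initial commit"
git -c protocol.file.allow=always submodule --quiet add "$tmp/TITI" 2_dir_B
git commit -q -m "Define submodule 2_dir_B"

# A collaborator clones MAIN together with its submodules.
git -c protocol.file.allow=always clone -q --recurse-submodules \
    "$tmp/MAIN" "$tmp/MAIN_copy"
submodule_file="$tmp/MAIN_copy/2_dir_B/content_BA/file_BAA.ext"
ls "$submodule_file"
```

In a clone that already exists, git submodule update --init run from the MAIN folder has the same effect.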
Output
The MAIN directory has not changed:
MAIN
├── 1_dir_A
│ ├── content_AA
│ │ ├── file_AAA.ext
│ │ └── file_AAB.ext
│ └── content_AB
│ ├── file_ABA.ext
│ ├── file_ABB.ext
│ └── file_ABC.ext
├── 2_dir_B
│ ├── content_BA
│ │ ├── file_BAA.ext
│ │ └── file_BAB.ext
│ └── content_BB
│ ├── file_BBA.ext
│ └── file_BBB.ext
├── README.md
├── .git
└── .gitignore
However, on the MAIN GitHub repository, 2_dir_B points to the last synchronized commit of the submodule, for instance 2_dir_B @3c8464b.
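This pointer can also be inspected locally: in the tree of MAIN, a submodule is recorded as a special entry of mode 160000 and type commit, whose hash is the pinned commit of the submodule. A sketch with local stand-in repositories (the temporary paths and file contents are assumptions, as is the protocol.file.allow override needed by recent Git versions for local submodule URLs):

```shell
set -e
tmp=$(mktemp -d)

# Stand-in for the TITI repository.
git init -q "$tmp/TITI"
cd "$tmp/TITI"
git config user.name "Example User"
git config user.email "user@example.com"
echo "b" > file_BAA.ext
git add .
git commit -q -m "Content of 2_dir_B"

# Stand-in for MAIN with 2_dir_B as a submodule.
git init -q "$tmp/MAIN"
cd "$tmp/MAIN"
git config user.name "Example User"
git config user.email "user@example.com"
git -c protocol.file.allow=always submodule --quiet add "$tmp/TITI" 2_dir_B
git commit -q -m "Define submodule 2_dir_B"

# The entry stored in MAIN for 2_dir_B: "<mode> <type> <hash>  <path>".
entry=$(git ls-tree HEAD 2_dir_B)
mode=$(echo "$entry" | awk '{print $1}')
pinned=$(echo "$entry" | awk '{print $3}')
# The commit currently checked out inside the submodule.
sub_head=$(git -C 2_dir_B rev-parse HEAD)
echo "mode: ${mode}, pinned: ${pinned}, submodule HEAD: ${sub_head}"
```

The pinned hash only changes when a new state of 2_dir_B is committed in MAIN.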
The TOTO folder contains:
TOTO
├── content_BA
│ ├── file_BAA.ext
│ └── file_BAB.ext
├── content_BB
│ ├── file_BBA.ext
│ └── file_BBB.ext
├── .git
└── .gitignore
On the TITI GitHub repository, there is no trace of MAIN, but the history of all the tracked files is preserved from their creation in the MAIN repository.