Merging 2 Git repos while preserving commit history

Recently, for the first time in my career, I had to merge 2 working repos, each with a long commit log, into one repo. The challenge was to keep the history of both repos after merging.


Doing that without caring much about the history is super easy: simply copy/paste one of them into the other. In that case you keep only the history of the target repo. But what if you would like to create a brand new repo with a new setup that holds both of them (which was my case)? Now the problem becomes clearer, as the copy/paste technique will make you lose both histories.

To explain the issue better, let's assume that we have a repo Original_A with 4 years of changes and a repo Original_B with about 3 years of different changes. We would like to have both of them in a new monorepo (which is what we decided on in the end), because we found quite a few monorepo features that fit us best. I may talk about that in a different post.


The target in the end is to have both apps inside a new repo, in different folders. To be honest, the article mentioned in the resources was quite helpful in giving me a starting point, but it was misleading and imprecise about which changes to make and in which path. That's why I decided to write this post in a clearer way.

Let's assume that the final result should be:

NewRepo
  |_ ProjectA // a folder that holds the content of Original_A repo
  |_ ProjectB // a folder that holds the content of Original_B repo

Implementation

Step One: Cloning original repos

You should first have the 2 original repos that you want to merge in your projects folder. If you have already cloned them, you can skip this step.

Hint: You should do that outside the new repo folder, ideally at the same level as the new repo folder.

Step Two: Rewriting the git history of both repos

⚠️ Update: after some trials and reading, I found that you could use git subtree instead. You can also skip this step entirely if you would like to keep the commit IDs as they were, for tracking or any other reason, since rewriting history changes them.

Since the content of each repo will live in a different folder inside the new repo, it is important to rewrite the paths of all files to the right folder before moving them. We are going to use the git filter-branch command, which has quite a lot of options but is very dangerous to use, so try to avoid it if you can.

Go to the path of the first repo Original_A and run the following command:

git filter-branch --index-filter \
  'git ls-files -s | sed "s-\t\"*-&ProjectA/-" |
   GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
   git update-index --index-info &&
   mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD

Don't worry, I'll explain it line by line.

  • line 1 is the main command, as I mentioned earlier. It takes the --index-filter option, which you can read more about from here; concisely, it rewrites the index (the list of tracked files) of each commit.
  • line 2 lists all the files in the index using git ls-files -s, then sed rewrites their paths to start with ProjectA/ instead of the root of the original repo (ProjectA is just the folder name in the final setup); read more about the sed command from here.
  • line 3 sets GIT_INDEX_FILE to $GIT_INDEX_FILE.new for the next command only, so the rewritten entries go into a new index file instead of the current one.
  • line 4 builds that new index from the rewritten list; read more.
  • line 5 finally moves the new index file over the original one, so the commit is recreated with the rewritten paths.
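
To see what lines 2 to 5 do in isolation, here is a minimal sketch that applies the same index rewrite once, by hand, in a throwaway repo. The temporary directory, the README.md file and the index.new path are assumptions made up for the demo; inside filter-branch, git sets $GIT_INDEX_FILE itself and repeats the rewrite for every commit.

```shell
set -e
demo="$(mktemp -d)"   # throwaway directory, not one of the real repos
cd "$demo"
git init -q
echo "hello" > README.md
git add README.md     # the index now holds a single entry: README.md

TAB="$(printf '\t')"  # a literal tab character for sed
new_index="$demo/.git/index.new"

# List the index entries, prefix every path with ProjectA/, and write the
# result into a separate index file instead of the live one.
git ls-files -s | sed "s-${TAB}\"*-&ProjectA/-" |
  GIT_INDEX_FILE="$new_index" git update-index --index-info

# Swap the rewritten index in place of the original one.
mv "$new_index" "$demo/.git/index"

git ls-files          # the tracked path now starts with ProjectA/
```

Only the index changes here; the file on disk stays where it was, which is fine for filter-branch because it builds each new commit from the index.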

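Note: the sed that ships with macOS does not treat \t as a tab, so if you are using macOS, replace \t in line 2 with a literal tab character (press control+v, then TAB), so that it becomes as follows: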

'git ls-files -s | sed "s-control+v, TAB\"*-&ProjectA/-" |
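
If typing a literal tab is awkward, one possible alternative is to let printf produce it. The sketch below is only an illustration of that idea on a single sample line in the format that git ls-files -s prints; the path src/app.js and the TAB variable are assumptions for the demo.

```shell
# Sample line in the format printed by `git ls-files -s`:
#   <mode> <object id> <stage><TAB><path>
TAB="$(printf '\t')"
sample="100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0${TAB}src/app.js"

# Same substitution as in the command above, but the tab comes from printf,
# so it does not depend on how sed interprets \t.
rewritten="$(printf '%s\n' "$sample" | sed "s-${TAB}\"*-&ProjectA/-")"

printf '%s\n' "$rewritten"
```

The path on the sample line comes out as ProjectA/src/app.js. Using the same trick inside the single-quoted filter should be possible too, but the nested quoting needs care, so try it on a copy of the repo first.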

Also, one of the articles in the resources mentions that if the repo you are merging has a single root folder (which rarely happens, IMO), you can use the command below instead. The first path given to mv is the folder being renamed, so it should be the name of that existing root folder:

git filter-branch --force --tree-filter \
  'mv NewRepo/ ProjectA/ || true' --tag-name-filter cat -- --all

And of course, in both cases you need to repeat that for each repo, using ProjectB/ as the folder name for Original_B.

Step Three: Create a new Repo

All the previous steps were done in the original repos. Now we are going to create a brand new directory, at the same level as the other repos, that will eventually become the monorepo. First create a new directory (folder), which we will call NewRepo, and open the terminal in its path.

In the end, the NewRepo folder should contain both repos merged into 2 folders, ProjectA and ProjectB. For now, initialize the NewRepo folder as a git repo by running this command in the folder path:

git init

Step Four: Add the 2 repos as remote repos for the "NewRepo" and fetch them

Add these two repos as remotes of the new repo and fetch their content as follows:

git remote add --fetch ProjectA ../Original_A/
git remote add --fetch ProjectB ../Original_B/

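At this point nothing seems to have happened: the commits of both repos have been fetched, but nothing is merged yet, so you will not find any change in your folder and it will remain empty.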
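
If you want to confirm that the fetch really brought the history in, you can look at the remote-tracking branches. Here is a minimal sketch using a throwaway stand-in for Original_A; the temporary directory, the a.txt file and the demo identity are assumptions, not part of the real setup.

```shell
set -e
work="$(mktemp -d)"   # throwaway directory for the demo
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for Original_A with a single commit
git init -q Original_A
( cd Original_A
  echo "a" > a.txt
  git add a.txt
  git commit -q -m "A: first commit" )

git init -q NewRepo
cd NewRepo
git remote add --fetch ProjectA ../Original_A/

git branch -r                            # lists ProjectA/<branch name>
fetched="$(git rev-list --all --count)"  # commits reachable from any ref
echo "fetched commits: $fetched"
ls -A                                    # only .git, the working tree is empty
```

In the real repos, git log --oneline ProjectA/master should show the fetched history of Original_A before anything is merged.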

Step Five: Merge the 2 remote repos

Now we are in the final step: merging the history of both repos. It is best to merge them one by one, to make sure that the root files of the two repos do not conflict.

It is worth mentioning that merging disparate branches (repos) that share no common history is disabled by default in git, but it can be enabled with the --allow-unrelated-histories flag.

First run the following command:

git merge ProjectA/master --allow-unrelated-histories

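You will find all the content of the Original_A repo in NewRepo, including the history. If you rewrote the paths in Step Two, the files are already inside the ProjectA folder; if you skipped that step, they land in the root of NewRepo. Either way, if you run the following command you should find the whole log of the original repo: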

git log --oneline

If you skipped Step Two, now create a folder ProjectA in the root, move all the merged files into it and commit the move, again to prevent any conflicts when merging the other repo. Then run the same command for the other repo:
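
One possible way to do that move is with git mv, so that git records the move as renames. The sketch below runs on a throwaway repo; the file names, the temporary directory and the demo identity are assumptions, and the loop assumes plain file names without unusual characters.

```shell
set -e
demo="$(mktemp -d)"   # throwaway repo standing in for NewRepo after the merge
cd "$demo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q
echo one > one.txt
mkdir src
echo two > src/two.txt
git add .
git commit -q -m "content as merged from Original_A"

mkdir ProjectA
# Move every top-level entry recorded in HEAD into ProjectA/.
# ProjectA itself is not in HEAD yet, so it is not part of the list.
git ls-tree --name-only HEAD | while IFS= read -r entry; do
  git mv "$entry" ProjectA/
done
git commit -q -m "Move Original_A content into ProjectA/"

git ls-files          # every tracked path now starts with ProjectA/
```

The earlier commits stay in the log; to trace a single file across the move, git log --follow -- ProjectA/one.txt can help.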

git merge ProjectB/master --allow-unrelated-histories

Then do the same thing again if needed: create a folder ProjectB and move all the files merged from Original_B into it. Now, by running git log, you will also find the log history of Original_B alongside the Original_A history, all together in the same place. The good thing is that there should not be any conflicts, since they are originally 2 different codebases kept in separate folders.

Hint: you can replace the branch master in the command with whatever branch you would like to merge from.
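
To try Steps Three to Five safely before touching the real repos, here is a minimal end-to-end sketch on two throwaway repos. Everything in it is an assumption for the demo: the temporary directory, the single file in each repo, the demo identity, and the fact that the files already live under ProjectA/ and ProjectB/, as they would after the rewrite in Step Two.

```shell
set -e
work="$(mktemp -d)"   # throwaway directory for the whole demo
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Two unrelated repos, each with one commit under its own folder
for name in A B; do
  git init -q "Original_$name"
  ( cd "Original_$name"
    git symbolic-ref HEAD refs/heads/master   # make sure the branch is master
    mkdir "Project$name"
    echo "$name" > "Project$name/file.txt"
    git add .
    git commit -q -m "$name: first commit" )
done

# Step Three: the new repo
git init -q NewRepo
cd NewRepo

# Step Four: add both repos as remotes and fetch them
git remote add --fetch ProjectA ../Original_A/
git remote add --fetch ProjectB ../Original_B/

# Step Five: merge them one by one
git merge -q --allow-unrelated-histories -m "Merge Original_A" ProjectA/master
git merge -q --allow-unrelated-histories -m "Merge Original_B" ProjectB/master

git log --oneline     # commits of both repos plus the merge commit
commits="$(git rev-list --count HEAD)"
```

In this demo the first merge lands in an empty repo, so git simply adopts the commit from Original_A, and only the second merge creates a merge commit.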

In the end, I hope that this article was helpful for you. If you have any comments or need some help with a similar case, you can easily reach out to me on Twitter @med7atdawoud.

Resources

Tot ziens 👋