How to extract a script and its history from a git repository
A few days ago, I needed to extract a script from a git repository maintained by another team and import it into my own repository.
The first thing I did was copy and paste the script into a new repository and start from there. However, I later realized that I also wanted to keep the script’s changesets from the old repository. To complicate things, the script had already been modified in the new one.
For the purpose of this post, let’s call myscript the script I want to import, myrepo the new repository I created to hold this script, and oldrepo the repository where it was previously maintained.
Extracting the history
The first step was to extract the changesets in the old repository that modified my script. To do so, I first created a new branch to work on.
$ cd /path/to/oldrepo
$ git checkout -b export_myscript
Then I used git’s filter-branch command to extract only the commits I was interested in.
$ git filter-branch --prune-empty --tree-filter \
    'find -not -name myscript -delete'
The filter-branch command loops over every commit of the current branch and executes the command provided. The find command I used deletes every file whose name is not myscript. Then, thanks to the --prune-empty option, filter-branch removes the commits that end up empty, which are basically the ones that don’t alter myscript.
The result is the same as if you manually checked out every commit one by one, ran the command and performed a git commit --amend, except that it is much faster and far less tedious.
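To see the effect on a small scale, here is a minimal sketch that builds a throwaway repository and runs the same kind of filter on it. The file names, commit messages and temporary directory are made up for the illustration. The find expression is a slightly more conservative variant of the one above (explicit `.` and `-type f`), not the exact command I ran. FILTER_BRANCH_SQUELCH_WARNING only silences the notice that recent git versions print.

```shell
#!/bin/sh
set -e
# Throwaway identity so that commits work anywhere (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
cd "$demo"
git init -q .
git checkout -q -b export_myscript

# Three commits: only the first and the last one touch myscript.
echo 'echo v1' > myscript
echo 'notes' > other.txt
git add . && git commit -q -m 'add myscript and other.txt'
echo 'more notes' >> other.txt
git commit -q -am 'touch other.txt only'
echo 'echo v2' > myscript
git commit -q -am 'update myscript'

# Keep only files named myscript in every commit, then drop empty commits.
git filter-branch --prune-empty --tree-filter \
    'find . -type f -not -name myscript -delete'

files=$(git ls-tree -r --name-only HEAD)
count=$(git rev-list --count HEAD)
echo "files: $files, commits: $count"
```

The commit that only touched other.txt becomes empty after the filter, so --prune-empty drops it and the rewritten branch keeps just the two commits that changed myscript.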
You can always roll back your changes with git reset --hard master if your parent branch is master. In addition, filter-branch backs up the original branch under the name refs/original/refs/heads/YOUR_BRANCH. So in this case you could do:
$ git reset --hard refs/original/refs/heads/export_myscript
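As a quick check of that backup mechanism, the sketch below rewrites a throwaway branch and then restores it from refs/original. The repository, the file names and the trivial rm filter are invented for the example.

```shell
#!/bin/sh
set -e
# Throwaway identity and repository (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
cd "$demo"
git init -q .
git checkout -q -b export_myscript
echo 'echo v1' > myscript
echo 'notes' > other.txt
git add . && git commit -q -m 'add myscript and other.txt'

before=$(git rev-parse HEAD)

# Rewrite the branch: remove other.txt from every commit.
git filter-branch --tree-filter 'rm -f other.txt'
rewritten=$(git rev-parse HEAD)

# filter-branch saved the old tip of the branch under refs/original/.
backup=$(git rev-parse refs/original/refs/heads/export_myscript)

# Roll back to the state before the rewrite.
git reset -q --hard refs/original/refs/heads/export_myscript
after=$(git rev-parse HEAD)
echo "before=$before rewritten=$rewritten after=$after"
```

After the reset, the branch points at the original commit again and other.txt is back in the working tree.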
To add a little more fun, the repository contained a submodule. In that case, you first need to unregister the submodule with the following command.
$ git submodule deinit mysubmodule
Then, you can remove the submodule folder from all the commits. Because a submodule is not a regular file, it needs to be removed with the
git rm command.
$ git filter-branch -f --prune-empty --index-filter \
    "git rm -r -f --cached --ignore-unmatch mysubmodule"
The advantage of --index-filter is that it doesn’t need to check out the files before executing the command, which makes it much faster. However, it only operates on the index, so it is limited to git commands; that’s why I didn’t use it before. Finally, the -f flag tells filter-branch to overwrite the backup left by its previous run; without it, the command refuses to start while that backup exists.
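Here is a small sketch of the index filter in action. To keep it self-contained, it removes a plain directory, hypothetically named vendor, instead of a real submodule; the git rm invocation is otherwise the same as above, with -q added only to keep the output short. All names and contents are made up.

```shell
#!/bin/sh
set -e
# Throwaway identity and repository (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
cd "$demo"
git init -q .
git checkout -q -b export_myscript

mkdir vendor
echo 'echo v1' > myscript
echo 'lib' > vendor/lib.txt
git add . && git commit -q -m 'add myscript and vendor/'
echo 'lib v2' > vendor/lib.txt
git commit -q -am 'update vendor/ only'
echo 'echo v2' > myscript
git commit -q -am 'update myscript'

# Drop vendor/ from the index of every commit; no files are checked out.
git filter-branch --prune-empty --index-filter \
    'git rm -r -f -q --cached --ignore-unmatch vendor'

files=$(git ls-tree -r --name-only HEAD)
count=$(git rev-list --count HEAD)
echo "files: $files, commits: $count"
```

The --ignore-unmatch flag matters here: it keeps git rm from failing on commits where the path does not exist, which would otherwise abort the whole rewrite.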
Extracting a folder
If you need to extract a folder instead of a file, there is a very simple way to do so. You just need to run the following command:
$ git checkout extract-foodir
$ git filter-branch --subdirectory-filter foodir -- --all
After this command, the foodir directory becomes the new root of the repository and all your commits are rewritten to reflect this change. Note that -- --all applies the rewrite to every branch, not only the current one.
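The following sketch shows the subdirectory filter on a throwaway repository. Unlike the command above, it passes HEAD instead of -- --all, so that only the dedicated branch is rewritten and the original branch stays intact. The branch name trunk, the files and their contents are assumptions made for the example.

```shell
#!/bin/sh
set -e
# Throwaway identity and repository (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
cd "$demo"
git init -q .
git checkout -q -b trunk

mkdir foodir
echo 'a' > foodir/a.txt
echo 'top' > top.txt
git add . && git commit -q -m 'add foodir/a.txt and top.txt'
echo 'top v2' > top.txt
git commit -q -am 'update top.txt only'
echo 'a v2' > foodir/a.txt
git commit -q -am 'update foodir/a.txt'

# Work on a dedicated branch and rewrite only that branch (HEAD).
git checkout -q -b extract-foodir
git filter-branch --subdirectory-filter foodir HEAD

files=$(git ls-tree -r --name-only extract-foodir)
count=$(git rev-list --count extract-foodir)
echo "files: $files, commits: $count"
```

On the rewritten branch, a.txt sits at the root and only the commits that touched foodir remain, while trunk still contains foodir/a.txt and top.txt.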
You can also use the subtree command, which is even more straightforward. In fact, the following command directly creates a new branch extract-foodir whose root is the foodir directory.
$ git subtree split --prefix=foodir -b extract-foodir
The subtree command is also much more powerful, as it allows you to keep the subdirectory in the original repository and merge changes back and forth. You can read more in the official subtree documentation.
Importing the history in the new repository
Finally, I needed to import the history into my own repository. To do so, I fetched the export_myscript branch of the old repository into the new one.
$ cd /path/to/myrepo
$ git remote add oldrepo /path/to/oldrepo
$ git fetch oldrepo export_myscript
Then I reapplied all the commits of my repository on top of the export_myscript branch I had just imported. I did that on a new branch called import_myscript.
$ git checkout -b import_myscript
$ git rebase oldrepo/export_myscript
Git replays each commit of myrepo on top of the imported history and recomputes its diff. As a consequence, the commit corresponding to the copy-pasting of myscript into myrepo should be empty. This can be verified with git log --stat.
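The whole import can be tried safely on two throwaway repositories. In the sketch below, oldrepo directly contains a two-commit export_myscript branch as a stand-in for the filtered history, and myrepo contains the copy-pasted script followed by one local change. Paths, contents and commit messages are all hypothetical.

```shell
#!/bin/sh
set -e
# Throwaway identity (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Stand-in for the filtered branch of oldrepo: two commits touching myscript.
git init -q "$work/oldrepo"
cd "$work/oldrepo"
git checkout -q -b export_myscript
echo 'echo v1' > myscript
git add myscript && git commit -q -m 'old: first version'
echo 'echo v2' > myscript
git commit -q -am 'old: second version'

# myrepo: the copy-pasted script, then a local modification.
git init -q "$work/myrepo"
cd "$work/myrepo"
git checkout -q -b import_myscript
cp "$work/oldrepo/myscript" myscript
git add myscript && git commit -q -m 'new: copy-paste myscript'
echo 'echo v3' > myscript
git commit -q -am 'new: local change'

# Import the old history and replay the local commits on top of it.
git remote add oldrepo "$work/oldrepo"
git fetch -q oldrepo export_myscript
git rebase oldrepo/export_myscript

subjects=$(git log --format=%s)
content=$(cat myscript)
git log --stat --format='%h %s'
```

The resulting history starts with the old commits and ends with the local change. Depending on the git version, the copy-paste commit may appear as an empty commit or be dropped by the rebase altogether.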
I could now reset my master branch to the import_myscript branch and do some cleanup.
$ git checkout master
$ git reset --hard import_myscript
$ git branch -d import_myscript
$ git remote rm oldrepo
$ cd /path/to/oldrepo
$ git checkout master
$ git branch -D export_myscript
This article should give you a glimpse of the power of git. If you want to learn more about git, I recommend the Pro Git book, especially chapter 7, in which you will learn some advanced tricks. Chapter 10 is also interesting, as it explains how git works internally.