Splitting a git repository
For several years I've kept all my Perl source code under version control. This is good. However, I was keeping all my distributions - all 40-odd of them - in a single repository. This is bad. It means that anyone who wants to check out the code has to check out every one of those distributions, some of them very big, just to get the one they're actually interested in.
So I've split the repository up into lots of separate ones, and I've uploaded them to GitHub instead of keeping them on my own machine. Normally I'm dead set against uploading my data to Teh Clowd, because you lose control over it and it's hard to make backups. Git and GitHub are an exception to this. My own checkouts - on my laptop and elsewhere - are complete copies of the entire repository, so if GitHub were to go out of business overnight, I'd not lose a damned thing, I'd just need to find somewhere else to act as the public front-end for my repositories. And it's all stuff that I want to be public anyway, so I really don't care if they lose a copy!
Splitting a git repository while still keeping all the history is a bit tricky, but the lovely Paul Johnson gave me a recipe, which I reproduce here with a few minor changes. Assuming that your monolithic repository contains a bunch of directories, each of which is to become a separate repository ...
mkdir split-repo
cd split-repo
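for i in $(cd ../monolithic-repo; ls); do
    # make a fresh, independent clone of the whole monolithic repository
    git clone --no-hardlinks ADDRESS_OF_REPOSITORY "$i"
    cd "$i"
    # make directory $i the new project root, keeping only its history
    git filter-branch --subdirectory-filter "$i" HEAD -- --all
    # drop the backup refs and everything that is now unreachable
    rm -r .git/refs/original
    git reflog expire --expire=now --all
    git gc --aggressive
    git prune
    # point the new repository at its own home on GitHub
    git remote rm origin
    git remote add origin git@github.com:YOUR_USERNAME/$i.git
    git push origin master
    cd -
done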
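One caveat with the loop: plain ls also lists any ordinary files sitting at the top level of the monolithic repository (a README, say), and the loop would then try to split those too. Here is a minimal sketch of a stricter listing, assuming that none of your directory names contain whitespace. The function name list_subdirs is my own invention and not part of Paul's recipe.

```shell
# list_subdirs DIR
# Print the name of each top-level directory in DIR, one per line,
# skipping ordinary files and hidden entries such as .git.
# (list_subdirs is a hypothetical helper, not part of the original recipe.)
list_subdirs() {
    for path in "$1"/*/; do
        # If DIR has no subdirectories the glob stays unexpanded; skip it.
        [ -d "$path" ] || continue
        # basename ignores the trailing slash left by the glob.
        basename "$path"
    done
}
```

To use it, the first line of the loop would become something like: for i in $(list_subdirs ../monolithic-repo); do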
This leaves the original repository unchanged, so if anything goes wrong you need not worry. I did get some warnings and errors from 'git gc' and 'git prune' about running out of memory when trying to compress files, but that's because my repository has some very big files. These errors were in fact harmless and just meant that the new copies of the repositories on my laptop were wasting lots of disk space. Once I'd uploaded them to GitHub, deleted the local copy, and then re-downloaded from GitHub, that was fixed.
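Before deleting a local copy it's worth checking that the remote really has everything. This is a minimal sketch, assuming the remote is called origin and the branch is master, as in the recipe. The function name pushed_ok is hypothetical, and it only compares a single branch, so run it once per branch if you have several.

```shell
# pushed_ok [BRANCH]
# Succeed only if the remote "origin" has BRANCH (default: master) at
# exactly the same commit as the repository in the current directory.
# (pushed_ok is a hypothetical helper name.)
pushed_ok() {
    branch=${1:-master}
    # Local commit id; give up if the branch does not exist locally.
    local_id=$(git rev-parse --verify --quiet "refs/heads/$branch") || return 1
    # "git ls-remote" prints "<commit id><TAB><ref name>" for each match.
    remote_id=$(git ls-remote origin "refs/heads/$branch" | cut -f1)
    # An empty remote_id means the branch is missing or the remote was unreachable.
    [ -n "$remote_id" ] && [ "$local_id" = "$remote_id" ]
}
```

Run it from inside one of the new repositories. If it succeeds, the commit at the tip of that branch is on the remote, and it should be safe to delete the directory and clone it again.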