Greetings, fellow traveller. By this point in your journey you may have run across the enigmatic Eric S. Raymond a couple of times; I assume you have hit upon this page because of a legendary and exact blade he has forged called reposurgeon. However, you, like me, are young, and are not yet wise enough in the ways of SVN to make even a toy conversion to Git – fear not. I will give you just enough to get started.
All commands below were tested on a fresh Debian 10 VM, brought up by Vagrant and VirtualBox:
rm -rf reposurgeon-tutorial/ && mkdir reposurgeon-tutorial/
cd reposurgeon-tutorial/
vagrant init debian/buster64
vagrant up
vagrant ssh
if you want to fully reproduce my environment. Oh, and one last note – we’re going to install things only just before we need them, for pedagogical purposes; in a real environment, you probably want to collect all those apt calls together into one place.
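For reference, the packages this walkthrough installs piecemeal can be gathered into one command. This is a minimal sketch, assuming a Debian-style system with apt. The package names are the ones used in the sections below, and the command is printed rather than run so you can look it over first:

```shell
#!/bin/sh
# Every apt package this tutorial installs, gathered in one place.
packages="subversion tree git build-essential asciidoctor lynx"

# Print the combined command instead of running it, so nothing is
# installed until you have reviewed it.
install_cmd="sudo apt update -y && sudo apt install -y $packages"
echo "$install_cmd"
```

Go is not in this list because the tutorial installs it from the upstream tarball rather than from apt.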
Let’s make a toy Subversion (SVN) repository
Before you can convert a Subversion (hereafter referred to by its command line program name, svn, capitalized as in “SVN”) repository, you need to find one. Preferably a small one.
This is not trivial in 2024! When you’re trying to learn Git this is a non-issue: GitHub has almost half a billion repos online at the time of writing. SVN, however, is exhibiting the species-typical behavior of most long-tail database-like software, that is to say, it is dying a very slow death.
We’re going to sidestep that problem by creating our own SVN repo to play around with, in /tmp. Begin by installing svn from wherever you get your software. For instance:
sudo apt update -y
sudo apt install subversion -y
In a lot of ways, SVN actually has a simpler basic mental model than git does. git is decentralized; SVN has a good old-fashioned server that is expected to hold the full (often enormous) history of the project, and you, dear end user, are a humble client, checking out only the HEAD of the server. We’ll simulate both of these in silico now:
# Create an SVN repo, "server-side".
svnadmin create /tmp/hello
# Check out the "server" HEAD to a "client-side" folder.
svn co file:///tmp/hello /tmp/hello-work
# Make a few test commits.
cd /tmp/hello-work
echo "Garbage Data 1" > garbage1.txt
svn add garbage1.txt
svn commit -m "Garbage Commit 1"
echo "Garbage Data 2" > garbage2.txt
svn add garbage2.txt
svn commit -m "Garbage Commit 2"
There is no svn push. As soon as you commit, these changes are written into SVN’s internal databases. And – another interesting difference between Git and SVN – if you then run
cd /tmp/hello
find . -name garbage1.txt # should return nothing
you’ll notice that there is no file called garbage1.txt anywhere in the SVN server-side master repository. So where is it?
sudo apt install tree -y
tree db/
will show you something like
db
├── current
├── format
├── fs-type
├── fsfs.conf
├── min-unpacked-rev
├── rep-cache.db
├── rep-cache.db-journal
├── revprops
│   └── 0
│       ├── 0
│       ├── 1 # here!
│       └── 2 # here!
├── revs
│   └── 0
│       ├── 0
│       ├── 1 # here!
│       └── 2 # here!
├── transactions
├── txn-current
├── txn-current-lock
├── txn-protorevs
├── uuid
└── write-lock
Aha! So that’s where it is. And
grep -r Garbage
will show us where our commit messages and our garbage data were stored, to confirm our intuitions:
vagrant@buster:/tmp/hello$ grep -r Garbage
db/revprops/0/2:Garbage Commit 2
db/revprops/0/1:Garbage Commit 1
Binary file db/revs/0/2 matches
Binary file db/revs/0/1 matches
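Grepping through db/ works, but Subversion also ships svnlook, a read-only inspection tool that understands the storage format. Here is a small sketch, assuming the toy repository at /tmp/hello from above. inspect_repo is a hypothetical helper name, and it simply returns non-zero if svnlook or the repository is missing:

```shell
#!/bin/sh
# inspect_repo REPO_DIR: peek inside a server-side SVN repository.
inspect_repo() {
    repo=$1
    if ! command -v svnlook >/dev/null 2>&1 || [ ! -d "$repo/db" ]; then
        echo "svnlook or $repo not available; skipping" >&2
        return 1
    fi
    svnlook youngest "$repo"            # number of the latest revision
    svnlook tree "$repo"                # paths present in that revision
    svnlook cat "$repo" garbage1.txt    # contents of one stored file
}

# Try it on the toy repo; carry on quietly if it is not there.
inspect_repo /tmp/hello || true
```

This reads the same revs files that grep flagged as binary, just through a tool that knows how to decode them.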
We won’t beat this dead horse much, but it’s instructive for someone who was born and raised in the age of Git to see that yes, SVN really is different from Git in a lot of ways. And this explains a bit about why SVN is still used in places where huge shared files are common, like manufacturing: the “head-only” checkout style keeps downloads as small as they can be, while still preserving some kind of useful history, and the database-over-filesystem approach makes it easier to work with huge binaries as well.
Are these good reasons to stick with SVN in 2024? Of course not. But that’s why we’re here, and a little nuance never hurts.
Let’s get the latest and greatest reposurgeon
Did I raise some eyebrows when I went with Debian 10 “Buster”, instead of the latest release? Good – reposurgeon is still an ongoing project, and we want to make sure we have the latest version.
reposurgeon was rewritten from Python to Go in recent years, so we will first follow the instructions at https://go.dev/doc/install to get Go installed, with a little shell scripting built in:
cd # a bare cd just returns you to your home directory
wget https://golang.org/dl/go1.22.1.linux-amd64.tar.gz
sudo rm -rf /usr/local/go
sudo tar -C /usr/local -xzf go1.22.1.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin
go version # should be 1.22.1
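The export above only lasts for the current shell session. If you expect to log out of the VM and come back, one option is to append the same line to your shell profile. This is a sketch, assuming a POSIX shell that reads ~/.profile on login. persist_go_path is a hypothetical helper name:

```shell
#!/bin/sh
# persist_go_path FILE: append the Go PATH line to FILE unless it is already there.
persist_go_path() {
    profile=$1
    line='export PATH=$PATH:/usr/local/go/bin'
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    grep -qxF "$line" "$profile" 2>/dev/null || printf '%s\n' "$line" >> "$profile"
}
```

Call it as persist_go_path "$HOME/.profile". Running it twice is harmless, since the line is only added when it is missing.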
The project currently lives at ESR’s Gitlab repo, so we will next do
sudo apt install git -y
cd # a bare cd just returns you to your home directory
git clone https://gitlab.com/esr/reposurgeon.git
Just two more dependencies left:
sudo apt install build-essential asciidoctor -y
cd reposurgeon/
make
Happy with what you see? Go ahead and install it to your $PATH by running
sudo make install
Check it with
reposurgeon version
If it’s 4.38 or above, you’re good to go, my friend.
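Comparing version numbers by eye is easy to get wrong, since 4.9 is older than 4.38. The sketch below does the comparison with GNU sort's version ordering. version_ge is a hypothetical helper name, and you pass the version number in by hand, because the exact output format of reposurgeon version is not assumed here:

```shell
#!/bin/sh
# version_ge HAVE WANT: succeed if version HAVE is at least WANT.
# Relies on GNU sort's -V (version sort), as shipped on Debian.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

version_ge 4.40 4.38 && echo "4.40 is new enough"
version_ge 4.9 4.38 || echo "4.9 is older than 4.38"
```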
Converting our toy repo with reposurgeon
Now let’s get to playing around with reposurgeon.
mkdir /tmp/scratch
cd /tmp/scratch
What follows here is a very brief guide meant to prepare you for reading the “true” guide, 4. A Guide to Repository Conversion.
Per 4.3, start by running
repotool initmake hello # in /tmp/scratch
When it asks you what VCS you want to convert from, type svn. When it asks you what VCS you want to convert to, type git.
repotool: what VCS do you want to convert from? svn
repotool: what VCS do you want to convert to? git
repotool: generating Makefile, some variables in it need to be set.
repotool: generating a stub options file.
repotool: generating a stub lift file.
repotool: generating a stub map file.
Change the REMOTE_URL variable in the Makefile to
REMOTE_URL = file:///tmp/hello
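If you would rather script that edit than open an editor, something like the following could work. It is a sketch that assumes the generated Makefile contains a single line beginning with REMOTE_URL followed by an equals sign. set_remote_url is a hypothetical helper name:

```shell
#!/bin/sh
# set_remote_url MAKEFILE URL: rewrite the REMOTE_URL assignment in MAKEFILE.
# Note: URL must not contain "|" or "&", which are special to this sed command.
set_remote_url() {
    makefile=$1
    url=$2
    tmp=$(mktemp)
    # Replace any line that begins with "REMOTE_URL =" and keep the rest as-is.
    sed "s|^REMOTE_URL *=.*|REMOTE_URL = $url|" "$makefile" > "$tmp" &&
        cat "$tmp" > "$makefile"
    rm -f "$tmp"
}
```

From /tmp/scratch you would call it as set_remote_url Makefile file:///tmp/hello, then check the result with grep REMOTE_URL Makefile.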
(N.B., file:// URLs may not work on earlier versions of reposurgeon. There was a reason we built it from source, instead of using Debian 10’s apt version!)
Again run
make
You should now see two folders in /tmp/scratch: one called hello-mirror, which is a full mirror of the original SVN repo; and one called hello-git, which contains reposurgeon’s out-of-the-box Git conversion.
If you
cd hello-git/
git log
you should now see 2 commits, that look something like
commit da2435b14afccf0c6f518633efda2592ea01e0a4 (HEAD -> unbranched)
Author: vagrant <vagrant>
Date:   Thu Mar 28 07:49:32 2024 +0000

    Garbage Commit 2

commit e98d42229e7157a99100f340bf00fdf54a7efd9f
Author: vagrant <vagrant>
Date:   Thu Mar 28 07:49:31 2024 +0000

    Garbage Commit 1
Et voilà! Your repo has been converted.
Convert SVN authors to Git authors
We could do better, though. So much better. I’ll go over only the absolute simplest of human-level fixups, more to give you an idea of how you’re supposed to work with reposurgeon than anything.
SVN repos save the whoami of their committers (that is, a bare username), rather than the (name, email) 2-tuples Git prefers. reposurgeon knows this! reposurgeon respects this! And reposurgeon has first-class support for this very simple kind of conversion. First cd /tmp/scratch, then edit the hello.map file to contain the following:
# Author map for hello
vagrant = Vagrant Wanderer <vagrant.wanderer@example.com>
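On a real repository you will not know every committer up front. One way to get a starting list is to pull the usernames out of svn log. The sketch below assumes the usual svn log --quiet layout of "rN | user | date" lines. svn_authors and map_stub are hypothetical helper names, and the example.com addresses are placeholders to be edited by hand:

```shell
#!/bin/sh
# svn_authors: read `svn log --quiet` output on stdin, print each distinct author once.
svn_authors() {
    awk -F'|' '/^r[0-9]+ / {
        gsub(/^ +| +$/, "", $2)   # trim the spaces around the author field
        print $2
    }' | sort -u
}

# map_stub: turn a list of usernames into placeholder map lines.
map_stub() {
    while read -r user; do
        printf '%s = %s <%s@example.com>\n' "$user" "$user" "$user"
    done
}
```

For the toy repo, svn log --quiet file:///tmp/hello | svn_authors | map_stub should print a single placeholder line for vagrant, which you would then fill in with a real name and address.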
Then simply rerun
make # in /tmp/scratch
and
cd hello-git/
git log
again to see your handiwork:
commit 1f44b3b376419381cc49a20de6da23794ba33b70 (HEAD -> unbranched)
Author: Vagrant Wanderer <vagrant.wanderer@example.com>
Date:   Thu Mar 28 18:25:01 2024 +0000

    Garbage Commit 2

commit a382b234d6ecf22ed65bd8412add24b26226f2d3
Author: Vagrant Wanderer <vagrant.wanderer@example.com>
Date:   Thu Mar 28 18:25:01 2024 +0000

    Garbage Commit 1
You may have noticed that the commit hashes have changed from the first time we did this. Does this mean reposurgeon is nondeterministic? I can’t speak for the whole tool, but we can run
cd /tmp/scratch
make local-clobber
make
to delete and rebuild the new Git repo from scratch. Are the commit hashes the same as the first time?
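Part of the answer can be worked out without reposurgeon. A Git commit ID is the SHA-1 of the commit object, and that object contains the author and committer identity lines, so rewriting vagrant into Vagrant Wanderer is by itself enough to change every hash. The sketch below rebuilds that calculation with sha1sum. The tree ID (Git's well-known empty tree), the timestamp, and the helper name commit_id are illustrative assumptions, not values taken from the converted repo:

```shell
#!/bin/sh
# commit_id IDENTITY: print the ID Git would give a parentless commit
# whose author and committer are both IDENTITY.
commit_id() {
    identity=$1
    obj=$(mktemp)
    {
        printf 'tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n'   # empty tree (assumed)
        printf 'author %s 1711612171 +0000\n' "$identity"
        printf 'committer %s 1711612171 +0000\n' "$identity"
        printf '\nGarbage Commit 1\n'
    } > "$obj"
    size=$(wc -c < "$obj" | tr -d ' ')
    # Git hashes the header "commit <size>" plus a NUL byte, then the body.
    { printf 'commit %s\000' "$size"; cat "$obj"; } | sha1sum | cut -d' ' -f1
    rm -f "$obj"
}

commit_id 'vagrant <vagrant>'
commit_id 'Vagrant Wanderer <vagrant.wanderer@example.com>'
```

The two printed IDs differ, while calling commit_id twice with the same identity gives the same ID. So whether a clean rebuild reproduces the same hashes comes down to whether the conversion hands Git the same bytes each time.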
Step 2: Draw the rest of the owl
Verily, I have given you the smallest taste of the power reposurgeon offers. No, but seriously, this is a phenomenal tool, and both its source code and the ways you learn to interact with it are great examples of old-school Unix hackerdom.
ESR is a terrific writer as well, and I invite you to now go to the official reposurgeon documentation online, or, even better, to the most up-to-date documentation you generated yourself:
sudo apt install lynx -y # command line web browser
cd ~/reposurgeon/
lynx repository-editing.html
Read away, young wanderer! Happy hacking!