A gentle introduction to ESR's `reposurgeon`

Some of us need our hands held more than others


Greetings, fellow traveller. By this point in your journey you may have run across the enigmatic Eric S. Raymond a couple of times; I assume you have hit upon this page because of a legendary and exact blade he has forged called reposurgeon. However, you, like me, are young, and are not yet wise enough in the ways of SVN to even make a toy conversion to Git – fear not. I will give you just enough to get started.

All commands below were tested on a fresh Debian 10 VM, brought up by Vagrant and VirtualBox:

    rm -rf reposurgeon-tutorial/ && mkdir reposurgeon-tutorial/
    cd reposurgeon-tutorial/

    vagrant init debian/buster64
    vagrant up
    vagrant ssh

if you want to fully reproduce my environment. Oh, and one last note – we’re going to install things only just before we need them, for pedagogical purposes; in a real environment, you probably want to collect all those apts together into one place.

Let’s make a toy Subversion (SVN) repository

Before you can convert a Subversion (hereafter referred to by its command line program name, svn, capitalized as in “SVN”) repository, you need to find one. Preferably a small one.

This is not trivial in 2024! When you’re trying to learn Git this is a non-issue: GitHub has almost half a billion repos online as of the time of writing. But SVN is undergoing the species-typical behavior of most long-tail database-like software; that is to say, it is dying a very slow death.

We’re going to sidestep that problem, by creating our own SVN repo to play around with, in /tmp. Begin by installing svn from wherever you get your software. For instance:

    sudo apt update -y
    sudo apt install subversion -y

In a lot of ways, SVN actually has a simpler basic mental model than Git does. Git is decentralized; SVN has a good old-fashioned server that is expected to hold the full (often enormous) history of the project, and you, dear end user, are a humble client, checking out only the HEAD of the server. We’ll simulate both of these in silico now:

    # Create a SVN repo, "server-side".
    svnadmin create /tmp/hello

    # Check out the "server" HEAD to a "client-side" folder.
    svn co file:///tmp/hello /tmp/hello-work

    # Make a few test commits.
    cd /tmp/hello-work
    echo "Garbage Data 1" > garbage1.txt
    svn add garbage1.txt
    svn commit -m "Garbage Commit 1"
    echo "Garbage Data 2" > garbage2.txt
    svn add garbage2.txt
    svn commit -m "Garbage Commit 2"

There is no svn push. As soon as you commit, these changes are written into SVN’s internal databases. And – another interesting difference between Git and SVN – if you then run

    cd /tmp/hello
    find . -name garbage1.txt  # should return nothing

you’ll notice that there is no file called garbage1.txt anywhere in the SVN server-side master repository. So where is it?

    sudo apt install tree -y

    tree db/

will show you something like

    db
    ├── current
    ├── format
    ├── fs-type
    ├── fsfs.conf
    ├── min-unpacked-rev
    ├── rep-cache.db
    ├── rep-cache.db-journal
    ├── revprops
    │   └── 0
    │       ├── 0
    │       ├── 1 # here!
    │       └── 2 # here!
    ├── revs
    │   └── 0
    │       ├── 0
    │       ├── 1 # here!
    │       └── 2 # here!
    ├── transactions
    ├── txn-current
    ├── txn-current-lock
    ├── txn-protorevs
    ├── uuid
    └── write-lock

Aha! So that’s where it is. And

    grep -r Garbage

will show us where our Commits and our garbage data were stored, to confirm our intuitions:

    vagrant@buster:/tmp/hello$ grep -r Garbage
    db/revprops/0/2:Garbage Commit 2
    db/revprops/0/1:Garbage Commit 1
    Binary file db/revs/0/2 matches
    Binary file db/revs/0/1 matches
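Grepping through db/ works, but Subversion also ships svnlook, which asks the server-side repo the same questions more politely. Here is a small sketch, assuming the toy repo from above lives at /tmp/hello; the repo_summary helper and the REPO variable are names I made up for illustration.

```shell
# Summarize a server-side SVN repository without any working copy.
# repo_summary and REPO are illustrative names, not part of Subversion.
repo_summary() {
    repo=$1
    # svnlook youngest prints the newest revision number in the repo.
    echo "youngest revision: $(svnlook youngest "$repo")"
    # svnlook tree lists the paths that exist as of that revision.
    echo "tree:"
    svnlook tree "$repo"
}

REPO="${REPO:-/tmp/hello}"
if command -v svnlook >/dev/null 2>&1 && [ -d "$REPO/db" ]; then
    repo_summary "$REPO"
    # Read a committed file straight out of the repository database.
    svnlook cat "$REPO" garbage1.txt
else
    echo "svnlook or $REPO not available; skipping" >&2
fi
```

After the two commits above, the youngest revision should match the highest-numbered file under db/revs/0/.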

We won’t beat this dead horse much, but it’s instructive for someone who was born and raised in the age of Git to see that yes, SVN really is different from Git in a lot of ways. And this explains a bit about why SVN is still used in places where huge shared files are common, like manufacturing: the “head-only” checkout style keeps downloads as small as they can be, while still preserving some kind of useful history, and the database-over-filesystem approach makes it easier to work with huge binaries as well.

Are these good reasons to stick with SVN in 2024? Of course not. But that’s why we’re here, and a little nuance never hurts.

Let’s get the latest and greatest reposurgeon

Did I raise some eyebrows when I went with Debian 10 “Buster”, instead of the latest release? Good – reposurgeon is still an ongoing project, and we want to make sure we have the latest version.

reposurgeon was rewritten from Python to Go in recent years, so we will first follow the instructions at https://go.dev/doc/install to get Go installed, with a little shell scripting built in:

    cd  # a bare cd just returns you to your home directory

    wget https://golang.org/dl/go1.22.1.linux-amd64.tar.gz
    sudo rm -rf /usr/local/go
    sudo tar -C /usr/local -xzf go1.22.1.linux-amd64.tar.gz

    export PATH=$PATH:/usr/local/go/bin

    go version  # should be 1.22.1
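One gotcha: the export above only lasts for the current shell session, so a fresh vagrant ssh will have forgotten where Go lives. A minimal sketch of one way to make it stick follows. The helper name persist_go_path is made up, and ~/.profile as the default is an assumption about how your login shell is set up.

```shell
# Append the Go bin directory to a profile file, but only once.
# persist_go_path is an illustrative helper; ~/.profile is an assumed default.
persist_go_path() {
    profile=${1:-"$HOME/.profile"}
    # Single quotes keep $PATH literal so it is expanded at login time.
    line='export PATH=$PATH:/usr/local/go/bin'
    touch "$profile"
    # -x matches whole lines, -F treats the pattern as a fixed string.
    grep -qxF "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}
```

Run persist_go_path with no arguments to edit ~/.profile, then log out and back in. Running it twice is harmless, since the line is only appended when it is missing.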

The project currently lives at ESR’s GitLab repo, so we will next do

    sudo apt install git -y

    cd  # a bare cd just returns you to your home directory
    git clone https://gitlab.com/esr/reposurgeon.git

Just two more dependencies left:

    sudo apt install build-essential asciidoctor -y

    cd reposurgeon/
    make

Happy with what you see? Go ahead and install it to your $PATH by running

    sudo make install

Check it with

    reposurgeon version

If it’s 4.38 or above, you’re good to go, my friend.
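If you would rather not eyeball version numbers, the comparison can be scripted. This is a sketch under a couple of assumptions: version_at_least is a helper I made up, it leans on GNU sort -V (fine on Debian), and I am assuming the first dotted number printed by reposurgeon version is the version itself.

```shell
# True when version $1 is greater than or equal to version $2.
# version_at_least is an illustrative helper; it relies on GNU sort -V.
version_at_least() {
    # Sort the two versions; if the smaller one is $2, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v reposurgeon >/dev/null 2>&1; then
    # Assumption: the first dotted number in the output is the version.
    found=$(reposurgeon version | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)
    if version_at_least "$found" 4.38; then
        echo "reposurgeon $found is new enough"
    else
        echo "reposurgeon $found is older than 4.38" >&2
    fi
fi
```

Note that version ordering is not decimal ordering: under sort -V, 4.9 counts as older than 4.38.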

Converting our toy repo with reposurgeon

Now let’s get to playing around with reposurgeon.

    mkdir /tmp/scratch
    cd /tmp/scratch

What follows here is a very brief guide meant to prepare you for reading the “true” guide, 4. A Guide to Repository Conversion.

Per 4.3, start by running

    repotool initmake hello  # in /tmp/scratch

When it asks you what VCS you want to convert from, type svn. When it asks you what VCS you want to convert to, type git.

    repotool: what VCS do you want to convert from? svn
    repotool: what VCS do you want to convert to? git
    repotool: generating Makefile, some variables in it need to be set.
    repotool: generating a stub options file.
    repotool: generating a stub lift file.
    repotool: generating a stub map file.

Change the REMOTE_URL variable in the Makefile to

REMOTE_URL = file:///tmp/hello

(N.B., file:// URLs may not work on earlier versions of reposurgeon. There was a reason we built it from scratch, instead of using Debian 10’s apt version!)
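If you end up redoing this a few times, the REMOTE_URL edit can be done without opening an editor. A minimal sketch, assuming the generated Makefile has a single line beginning with REMOTE_URL (which is what the instruction above implies); set_remote_url is a made-up helper, and sed -i here is the GNU flavor.

```shell
# Rewrite the REMOTE_URL assignment in a Makefile in place.
# set_remote_url is an illustrative helper; sed -i assumes GNU sed.
set_remote_url() {
    makefile=$1
    url=$2
    # Using | as the sed delimiter avoids escaping the slashes in the URL.
    # This would misbehave if the URL itself contained | or &.
    sed -i "s|^REMOTE_URL *=.*|REMOTE_URL = $url|" "$makefile"
}

if [ -f /tmp/scratch/Makefile ]; then
    set_remote_url /tmp/scratch/Makefile file:///tmp/hello
    grep '^REMOTE_URL' /tmp/scratch/Makefile
fi
```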

Again run

make

You should now see 2 folders in /tmp/scratch: one called hello-mirror, which is a full mirror of the original SVN repo; and one called hello-git, which contains reposurgeon’s out-of-the-box Git conversion.

If you

    cd hello-git/
    git log

you should now see 2 commits, that look something like

    commit da2435b14afccf0c6f518633efda2592ea01e0a4 (HEAD -> unbranched)
    Author: vagrant <vagrant>
    Date:   Thu Mar 28 07:49:32 2024 +0000

        Garbage Commit 2

    commit e98d42229e7157a99100f340bf00fdf54a7efd9f
    Author: vagrant <vagrant>
    Date:   Thu Mar 28 07:49:31 2024 +0000

        Garbage Commit 1

Et voila! Your repo has been converted.

Convert SVN authors to Git authors

We could do better, though. So much better. I’ll go over only the absolute simplest of human-level fixups, more to give you an idea of how you’re supposed to work with reposurgeon than anything.

SVN repos save the whoami of their committers, rather than the (name, email) 2-tuples Git prefers. reposurgeon knows this! reposurgeon respects this! And reposurgeon has first class support for this very simple kind of conversion. First, change back to the scratch directory:

    cd /tmp/scratch

Then edit the hello.map file to contain the following:

    # Author map for hello
    vagrant = Vagrant Wanderer <vagrant.wanderer@example.com>

Then simply rerun

    make  # in /tmp/scratch

and

    cd hello-git/
    git log

again to see your handiwork:

    commit 1f44b3b376419381cc49a20de6da23794ba33b70 (HEAD -> unbranched)
    Author: Vagrant Wanderer <vagrant.wanderer@example.com>
    Date:   Thu Mar 28 18:25:01 2024 +0000

        Garbage Commit 2

    commit a382b234d6ecf22ed65bd8412add24b26226f2d3
    Author: Vagrant Wanderer <vagrant.wanderer@example.com>
    Date:   Thu Mar 28 18:25:01 2024 +0000

        Garbage Commit 1
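One committer is easy to type by hand. For a real repo with dozens of them, you could seed the map from svn log -q, whose header lines look like "r2 | vagrant | 2024-03-28 ...". The sketch below does that; authors_from_log and map_stub are names I invented, and the @example.com addresses are placeholders you are expected to fix by hand afterwards.

```shell
# Print each distinct committer found in `svn log -q` output on stdin.
# authors_from_log and map_stub are illustrative helpers, not reposurgeon commands.
authors_from_log() {
    # Header lines look like: r2 | vagrant | 2024-03-28 07:49:32 +0000 (...)
    awk -F' [|] ' '/^r[0-9]+ [|] /{print $2}' | sort -u
}

# Turn usernames on stdin into placeholder author-map entries.
map_stub() {
    while IFS= read -r user; do
        printf '%s = %s <%s@example.com>\n' "$user" "$user" "$user"
    done
}

if command -v svn >/dev/null 2>&1 && [ -d /tmp/hello ]; then
    # Prints to stdout; redirect into hello.map only after reviewing it.
    svn log -q file:///tmp/hello | authors_from_log | map_stub
fi
```

I print to the terminal rather than overwriting hello.map, so nothing you have already edited gets clobbered.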

You may have noticed that the commit hashes have changed from the first time we did this. Does this mean reposurgeon is nondeterministic? I can’t speak for the whole tool, but we can run

    cd /tmp/scratch
    make local-clobber
    make

to delete and rebuild the new Git repo from scratch. Are the commit hashes the same as the first time?
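If comparing 40-character hashes by eye sounds tedious, here is a sketch that records HEAD, rebuilds, and compares. It assumes the /tmp/scratch layout from this post; report_heads and SCRATCH are names I made up. It deliberately does not tell you the answer.

```shell
# Say whether two commit hashes are the same.
# report_heads and SCRATCH are illustrative names.
report_heads() {
    if [ "$1" = "$2" ]; then
        echo "identical: $1"
    else
        echo "different: $1 vs $2"
    fi
}

SCRATCH="${SCRATCH:-/tmp/scratch}"
if [ -d "$SCRATCH/hello-git" ]; then
    # Remember the current HEAD, rebuild from scratch, then look again.
    before=$(git -C "$SCRATCH/hello-git" rev-parse HEAD)
    make -C "$SCRATCH" local-clobber
    make -C "$SCRATCH"
    after=$(git -C "$SCRATCH/hello-git" rev-parse HEAD)
    report_heads "$before" "$after"
fi
```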

Step 2: Draw the rest of the owl

Verily, I have given you the smallest taste of the power reposurgeon offers. No, but seriously, this is a phenomenal tool and both its source code and the ways you learn to interact with it are great examples of old-school Unix hackerdom.

ESR is a terrific writer as well, and I invite you to now go to the official reposurgeon documentation online, or, even better, to the most up-to-date documentation you generated yourself:

    sudo apt install lynx -y  # command line web browser

    cd ~/reposurgeon/
    lynx repository-editing.html

Read away, young wanderer! Happy hacking!