A gentle introduction to ESR's `reposurgeon`

Some of us need our hands held more than others


Greetings, fellow traveller. By this point in your journey you may have run across the enigmatic Eric S. Raymond a couple of times; I assume you have hit upon this page because of a legendary and exact blade he has forged called reposurgeon. However, you, like me, are young, and are not yet wise enough in the ways of SVN to even make a toy conversion to Git – fear not. I will give you just enough to get started.

All commands below were tested on a fresh Debian 10 VM, brought up by Vagrant and VirtualBox:

rm -rf reposurgeon-tutorial/ && mkdir reposurgeon-tutorial/
cd reposurgeon-tutorial/

vagrant init debian/buster64
vagrant up
vagrant ssh

if you want to fully reproduce my environment. Oh, and one last note – we’re going to install things only just before we need them, for pedagogical purposes; in a real environment, you probably want to collect all those apts together into one place.
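
If you would rather front-load the installs, something like the sketch below gathers every apt package this tutorial uses into one place. The `run` helper and the `DRY_RUN` switch are my own inventions for illustration, not anything apt or Vagrant provides; left at the default, the script only prints what it would do.

```shell
#!/bin/sh
# Every apt package this tutorial installs along the way, in one place.
PACKAGES="subversion tree git build-essential asciidoctor lynx"

# DRY_RUN is a made-up knob for this sketch: left at 1 it only prints
# the commands; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run sudo apt update -y
# $PACKAGES is deliberately unquoted so each package becomes its own argument.
run sudo apt install -y $PACKAGES
```

Go is not in that list because we fetch it as a tarball later rather than through apt.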

Let’s make a toy Subversion (SVN) repository

Before you can convert a Subversion (hereafter referred to by its command line program name, svn, capitalized as in “SVN”) repository, you need to find one. Preferably a small one.

This is not trivial in 2024! When you’re trying to learn Git this is a non-issue: GitHub has almost half a billion repos online at the time of writing. SVN, on the other hand, is undergoing the species-typical behavior of most long-tail database-like software, that is to say, it is dying a very slow death.

We’re going to sidestep that problem, by creating our own SVN repo to play around with, in /tmp. Begin by installing svn from wherever you get your software. For instance:

sudo apt update -y
sudo apt install subversion -y

In a lot of ways, SVN actually has a simpler basic mental model than Git does. Git is decentralized; SVN has a good old-fashioned server that is expected to hold the full (often enormous) history of the project, and you, dear end user, are a humble client, checking out only the HEAD of the server. We’ll simulate both of these in silico now:

# Create an SVN repo, "server-side".
svnadmin create /tmp/hello

# Check out the "server" HEAD to a "client-side" folder.
svn co file:///tmp/hello /tmp/hello-work

# Make a few test commits.
cd /tmp/hello-work
echo "Garbage Data 1" > garbage1.txt
svn add garbage1.txt
svn commit -m "Garbage Commit 1"
echo "Garbage Data 2" > garbage2.txt
svn add garbage2.txt
svn commit -m "Garbage Commit 2"

There is no svn push. As soon as you commit, these changes are written into SVN’s internal databases. And – another interesting difference between Git and SVN – if you then run

cd /tmp/hello
find . -name garbage1.txt    # should return nothing

you’ll notice that there is no file called garbage1.txt anywhere in the SVN server-side master repository. So where is it?

sudo apt install tree -y

tree db/

will show you something like

db
├── current
├── format
├── fs-type
├── fsfs.conf
├── min-unpacked-rev
├── rep-cache.db
├── rep-cache.db-journal
├── revprops
│   └── 0
│       ├── 0
│       ├── 1                  # here!
│       └── 2                  # here!
├── revs
│   └── 0
│       ├── 0
│       ├── 1                  # here!
│       └── 2                  # here!
├── transactions
├── txn-current
├── txn-current-lock
├── txn-protorevs
├── uuid
└── write-lock

Aha! So that’s where it is. And

grep -r Garbage

will show us where our commit messages and our garbage data were stored, to confirm our intuitions:

vagrant@buster:/tmp/hello$ grep -r Garbage

db/revprops/0/2:Garbage Commit 2
db/revprops/0/1:Garbage Commit 1
Binary file db/revs/0/2 matches
Binary file db/revs/0/1 matches
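
If you would rather ask Subversion itself than grep its database, `svnlook` (installed alongside `svnadmin`) can read a file straight out of the revision store. Here is a minimal sketch that rebuilds a one-commit repo in a throwaway directory, so it will not touch /tmp/hello; the variable names are mine, and it simply does nothing if Subversion is not installed.

```shell
if command -v svnadmin >/dev/null 2>&1; then
    scratch=$(mktemp -d)
    svnadmin create "$scratch/repo"
    svn co -q "file://$scratch/repo" "$scratch/work"
    (
        cd "$scratch/work" || exit 1
        echo "Garbage Data 1" > garbage1.txt
        svn add -q garbage1.txt
        svn commit -q -m "Garbage Commit 1"
    )
    # No plain file with that name exists on the "server" side...
    found=$(find "$scratch/repo" -name garbage1.txt | wc -l)
    # ...but svnlook can read it back out of the revision store.
    content=$(svnlook cat "$scratch/repo" /garbage1.txt)
    echo "files named garbage1.txt in the repo directory: $found"
    echo "contents according to svnlook: $content"
else
    echo "subversion is not installed; run the apt line from earlier first"
fi
```

Against the tutorial repo the equivalent one-liner would be `svnlook cat /tmp/hello /garbage1.txt`.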

We won’t beat this dead horse much, but it’s instructive for someone who was born and raised in the age of Git to see that yes, SVN really is different from Git in a lot of ways. And this explains a bit about why SVN is still used in places where huge shared files are common, like manufacturing: the “head-only” checkout style keeps downloads as small as they can be, while still preserving some kind of useful history, and the database-over-filesystem approach makes it easier to work with huge binaries as well.

Are these good reasons to stick with SVN in 2024? Of course not. But that’s why we’re here, and a little nuance never hurts.

Let’s get the latest and greatest reposurgeon

Did I raise some eyebrows when I went with Debian 10 “Buster”, instead of the latest release? Good – reposurgeon is still an ongoing project, and we want to make sure we have the latest version.

reposurgeon was rewritten from Python to Go in recent years, so we will first follow the instructions at https://go.dev/doc/install to get Go installed, with a little shell scripting built in:

cd    # a bare cd just returns you to your home directory
wget https://golang.org/dl/go1.22.1.linux-amd64.tar.gz

sudo rm -rf /usr/local/go
sudo tar -C /usr/local -xzf go1.22.1.linux-amd64.tar.gz

export PATH=$PATH:/usr/local/go/bin

go version    # should be 1.22.1
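
One wrinkle: that `export` only lasts for the current shell session, so `go` will vanish again after your next `vagrant ssh`. A rough sketch of making it stick is below. The `add_go_to_path` helper is hypothetical, and I demo it against a temp file rather than your real `~/.profile` so you can see that running it twice adds the line only once.

```shell
# Append the Go bin directory to a shell startup file, but only once.
# add_go_to_path is a hypothetical helper; $1 is the startup file to edit.
add_go_to_path() {
    line='export PATH=$PATH:/usr/local/go/bin'
    # -x matches the whole line, -F treats it as a fixed string, not a regex.
    grep -qxF "$line" "$1" 2>/dev/null || echo "$line" >> "$1"
}

# Demo against a throwaway file; point it at ~/.profile for real use.
demo_profile=$(mktemp)
add_go_to_path "$demo_profile"
add_go_to_path "$demo_profile"    # second call changes nothing
cat "$demo_profile"
```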

The project currently lives at ESR’s GitLab repo, so we will next do

sudo apt install git -y

cd    # a bare cd just returns you to your home directory
git clone https://gitlab.com/esr/reposurgeon.git

Just two more dependencies left:

sudo apt install build-essential asciidoctor -y

cd reposurgeon/
make

Happy with what you see? Go ahead and install it to your $PATH by running

sudo make install

Check it with

reposurgeon version

If it’s 4.38 or above, you’re good to go, my friend.
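
Careful with eyeballing "or above", by the way: compared as decimals, 4.9 looks bigger than 4.38, but as a version it is older. Here is a small sketch of the comparison using GNU `sort -V`. The `version_at_least` helper is my own, and I am deliberately not parsing the output of `reposurgeon version` here, since I do not want to guess at its exact format; paste the number in by hand.

```shell
# version_at_least INSTALLED MINIMUM
# Succeeds when INSTALLED >= MINIMUM under version ordering (so 4.9 < 4.38).
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

for v in 4.9 4.38 4.40 5.0; do
    if version_at_least "$v" 4.38; then
        echo "$v: good to go"
    else
        echo "$v: too old, rebuild from source"
    fi
done
```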

Converting our toy repo with reposurgeon

Now let’s get to playing around with reposurgeon.

mkdir /tmp/scratch
cd /tmp/scratch

What follows here is a very brief guide meant to prepare you for reading the “true” guide, 4. A Guide to Repository Conversion.

Per 4.3, start by running

repotool initmake hello        # in /tmp/scratch

When it asks you what VCS you want to convert from, type svn. When it asks you what VCS you want to convert to, type git.

repotool: what VCS do you want to convert from? svn
repotool: what VCS do you want to convert to? git
repotool: generating Makefile, some variables in it need to be set.
repotool: generating a stub options file.
repotool: generating a stub lift file.
repotool: generating a stub map file.

Change the REMOTE_URL variable in the Makefile to

REMOTE_URL = file:///tmp/hello

(N.B., file:// URLs may not work on earlier versions of reposurgeon. There was a reason we built it from scratch, instead of using Debian 10’s apt version!)
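
If you find yourself regenerating the Makefile a lot, the edit can be scripted. The sketch below assumes the generated Makefile contains a line beginning with `REMOTE_URL`; to stay runnable anywhere it works on a two-line stand-in file whose contents I made up, not on the real thing. It uses GNU `sed -i`, which is what Debian ships.

```shell
# Stand-in for the generated Makefile. The real one has many more lines;
# this sketch only assumes it has a line that starts with REMOTE_URL.
workdir=$(mktemp -d)
printf 'REMOTE_URL = https://svn.example.com/hello\nOTHER_VARIABLE = untouched\n' \
    > "$workdir/Makefile"

# Rewrite just the REMOTE_URL assignment, leaving every other line alone.
sed -i 's|^REMOTE_URL *=.*|REMOTE_URL = file:///tmp/hello|' "$workdir/Makefile"

grep '^REMOTE_URL' "$workdir/Makefile"
```

In /tmp/scratch you would run only the `sed` line, against `Makefile` directly.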

Again run

make

You should now see 2 folders in /tmp/scratch: one called hello-mirror, which is a full mirror of the original SVN repo; and one called hello-git, which contains reposurgeon’s out-of-the-box Git conversion.

If you

cd hello-git/
git log

you should now see 2 commits, that look something like

commit da2435b14afccf0c6f518633efda2592ea01e0a4 (HEAD -> unbranched)
Author: vagrant <vagrant>
Date:   Thu Mar 28 07:49:32 2024 +0000

    Garbage Commit 2

commit e98d42229e7157a99100f340bf00fdf54a7efd9f
Author: vagrant <vagrant>
Date:   Thu Mar 28 07:49:31 2024 +0000

    Garbage Commit 1

Et voilà! Your repo has been converted.

Convert SVN authors to Git authors

We could do better, though. So much better. I’ll go over only the absolute simplest of human-level fixups, more to give you an idea of how you’re supposed to work with reposurgeon than anything.

SVN repos save the whoami of their committers, rather than the (name, email) 2-tuples Git prefers. reposurgeon knows this! reposurgeon respects this! And reposurgeon has first-class support for this very simple kind of conversion. First

cd /tmp/scratch

and then edit the hello.map file to contain the following:

# Author map for hello
vagrant = Vagrant Wanderer <vagrant.wanderer@example.com>
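
Our toy repo has exactly one committer, so typing that line by hand is no hardship. For a real repo you need the full list of SVN usernames for the left-hand side. One rough way to get it, sketched below, is to pull the second column out of `svn log --quiet`. The `list_svn_authors` helper is my own name, the log text is a canned sample so the sketch runs without a repository, and the generated right-hand sides are placeholders for you to fix by hand.

```shell
# list_svn_authors reads `svn log --quiet` output on stdin and prints each
# distinct username once. (Hypothetical helper name.)
list_svn_authors() {
    awk -F ' [|] ' '/^r[0-9]+ / { print $2 }' | sort -u
}

# Canned sample so the sketch runs without a repository; on the VM you would
# pipe in `svn log --quiet file:///tmp/hello` instead.
sample_log='------------------------------------------------------------------------
r2 | vagrant | 2024-03-28 07:49:32 +0000 (Thu, 28 Mar 2024)
------------------------------------------------------------------------
r1 | vagrant | 2024-03-28 07:49:31 +0000 (Thu, 28 Mar 2024)
------------------------------------------------------------------------'

printf '%s\n' "$sample_log" | list_svn_authors | while read -r user; do
    # Placeholder right-hand side: fill in the real name and email by hand.
    echo "$user = $user <$user@example.com>"
done
```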

Then simply rerun

make        # in /tmp/scratch

and

cd hello-git/
git log

again to see your handiwork:

commit 1f44b3b376419381cc49a20de6da23794ba33b70 (HEAD -> unbranched)
Author: Vagrant Wanderer <vagrant.wanderer@example.com>
Date:   Thu Mar 28 18:25:01 2024 +0000

    Garbage Commit 2

commit a382b234d6ecf22ed65bd8412add24b26226f2d3
Author: Vagrant Wanderer <vagrant.wanderer@example.com>
Date:   Thu Mar 28 18:25:01 2024 +0000

    Garbage Commit 1

You may have noticed that the commit hashes have changed from the first time we did this. Does this mean reposurgeon is nondeterministic? I can’t speak for the whole tool, but we can run

cd /tmp/scratch
make local-clobber
make

to delete and rebuild the new Git repo from scratch. Are the commit hashes the same as the first time?
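
A hint for the puzzle, sketched with plain Git and no reposurgeon at all: a commit's hash covers its author and committer fields along with the tree, parents, and message. The `commit_as` helper and the fixed timestamp below are made up for the demo; it creates bare commit objects in a throwaway repo and compares their hashes.

```shell
repo=$(mktemp -d)
git init -q "$repo"
tree=$(git -C "$repo" mktree </dev/null)    # an empty tree is enough here

# commit_as NAME EMAIL: hypothetical helper. Prints the hash of a commit whose
# message, tree, and timestamp are fixed, so only the identity varies.
commit_as() {
    echo "Garbage Commit 1" |
        GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$2" \
        GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="$2" \
        GIT_AUTHOR_DATE="1711612171 +0000" \
        GIT_COMMITTER_DATE="1711612171 +0000" \
        git -C "$repo" commit-tree "$tree"
}

first=$(commit_as vagrant vagrant)
again=$(commit_as vagrant vagrant)
mapped=$(commit_as "Vagrant Wanderer" vagrant.wanderer@example.com)

echo "unmapped:       $first"
echo "unmapped again: $again"
echo "mapped:         $mapped"
```

Identical inputs give identical hashes, and changing only the identity gives a different one. Whether reposurgeon feeds Git identical inputs on every run is the part left for you to check.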

Step 2: Draw the rest of the owl

Verily, I have given you the smallest taste of the power reposurgeon offers. No, but seriously, this is a phenomenal tool, and both its source code and the ways you learn to interact with it are great examples of old-school Unix hackerdom.

ESR is a terrific writer as well, and I invite you to now go to the official reposurgeon documentation online, or, even better, to the most up-to-date documentation you generated yourself:

sudo apt install lynx -y    # command line web browser

cd ~/reposurgeon/
lynx repository-editing.html

Read away, young wanderer! Happy hacking!

