AI bounties revisited

Add me on LinkedIn! You can cite this post as a reason if you're shy.

The recent news of ChatGPT plugins has me waxing nostalgic about the days when I lived in the United States, studying electrical engineering and equal parts furious and depressed that AGI might well destroy the world before I ever got myself properly settled in it. Ah, the glory days…

Well, I’m doing a lot better now, both psychologically and emotionally. Having a beautiful wife and a full-time job does wonders for the soul! But I thought I would repost this tiny gem of a schizopost, from way back in my postrat Twitter days. I no longer believe a word of it, but hopefully it provides you some fuel for thought. Or at least some mild entertainment at how foolish the young can be!

“Bounties on AI Researchers” – the OG, from deep in the Wayback Machine

Alignment is hard, maybe impossible. Implementing alignment is at least as hard, and might be much harder. Perhaps there is the option to just not build an AGI, safe or unsafe, in the first place.

For one person, this is easy: Pick a different career. For a small group of people, this is harder: Some people might want to build an AI despite the risks. The reasons why often touch on their stances regarding deep philosophical issues, like where qualia come from. You won't convince these people to see it your way, although you may well convince them your caution is justified. There’s no getting around it: You need to employ some kind of structural violence to stop them.

In both cases the upshot is that no single person so far seems to have ever had the ability to build even an unsafe AI by themselves (proof by “we’re still here”). Few people are smart enough in the first place; of those, few possess the conscientiousness to build out such a project by themselves; of those, the world is already their oyster, and almost all of them have found better things to do with their time than labor in solitude.

The real danger consists of large, well-funded groups of these people working closely together on building out an AI - a danger which only becomes likely once the pool of such people is large enough that you can assemble a team like that in the first place.

We unfortunately do live in such a world. OpenAI has over 100 employees as of 2022, and Google Brain probably has at least that many. As an unsafe AI seems most likely to emerge accidentally from the work of large groups within firms seeking to maximize profits, we should look towards the literature on the tragedy of the commons for guidance.

In Privately Enforced & Punished Crime, Robin Hanson advocates for a fine-insured bounty system.

Non-crime law deals mostly with accidents and mild sloppy selfishness among parties who are close to each other in a network of productive relations. In such cases, law can usually require losers to pay winners cash, and rely on those who were harmed to detect and prosecute violations. This approach, however, can fail when “criminals” make elaborate plans to grab gains from others in ways that make them, their assets, and evidence of their guilt hard to find.

Ancient societies dealt with crime via torture, slavery, and clan-based liability and reputation. Today, however, we have less stomach for such things, and also weaker clans and stronger governments. So a modern society instead assigns government employees to investigate and prosecute crimes, and gives them special legal powers. But as we don’t entirely trust these employees, we limit them in many ways, including via juries, rules of evidence, standards of proof, and anti-profiling rules. We also prefer to punish via prison, as we fear government agencies eager to collect fines. Yet we still suffer from a great deal of police corruption and mistreatment, because government employees can coordinate well to create a blue wall of silence.

I propose to instead privatize the detection, prosecution, and punishment of crime. […] The key idea is to use competition to break the blue wall of silence, via allowing many parties to participate as bounty hunter enforcers and offering them all large cash bounties to show us violations by anyone, including by other enforcers. With sufficient competition and rewards, few could feel confident of getting away with criminal violations; only court judges could retain substantial discretionary powers.

Hanson does not focus on any specific suite of crimes for the mechanism he proposes. So let’s try conspiracy. Suppose a conspiracy exists between n conspirators, each with an independent chance 0% < p < 100% of remaining silent over a given period of time. Then the chance that at least one person speaks up about the conspiracy is 1 − p^n. That’s already pretty good: a 100-person conspiracy where everyone has p = 99%, i.e. a 1% chance of speaking up, has an overall discovery rate of about 1 − 0.99^100 ≈ 1 − 0.37 = 63%. At p = 97%, that rises to about 95%.
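
If you want to check the arithmetic, here is a minimal Python sketch of the same calculation; the p and n values are just the ones from the paragraph above.

```python
# Chance that at least one of n conspirators talks, if each stays
# silent independently with probability p over the period.
def discovery_prob(p: float, n: int) -> float:
    return 1 - p ** n

n = 100
for p in (0.99, 0.97):
    print(f"p = {p:.2f}, n = {n}: discovery chance ≈ {discovery_prob(p, n):.0%}")

# p = 0.99, n = 100: discovery chance ≈ 63%
# p = 0.97, n = 100: discovery chance ≈ 95%
```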

Now suppose bounties are offered to the bounty hunter at a rate of $1000 per person turned in. Do I think, in the 100-person company envisioned, my own chances of keeping quiet would drop from 99% to 97%? For a bounty of 99 grand? Absolutely. People grind Leetcode for months to get comp packages like that. Even if I had to implicate myself in the documents I released, I would just be paying a bounty to myself. And even if I fully believed in the mission, I might justify it to myself by saying that that kind of runway would let me strike out on my own and attempt my own lone-genius AI production on a remote island for a decade. Now suppose I decided against turning in my coworkers - would I, myself, want to stay there? Absolutely not. The bigger the company gets, the greater the risk of being turned in myself.
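
To put rough numbers on that decision, here is a toy stay-vs-defect comparison in the same spirit. The $300k salary and the one-year horizon are invented for illustration; the only inputs carried over from the post are n = 100 and B = $1000.

```python
# Toy stay-vs-defect comparison. Each employee either keeps quiet (collecting a
# hypothetical $300k salary for the year, but only if none of the other 99
# defect first) or turns everyone else in at B = $1000 a head. The salary and
# the one-year horizon are illustrative assumptions, not figures from the post.

def stay_payoff(salary: float, p_others_silent: float, n: int) -> float:
    """Expected value of keeping quiet: paid only if all n - 1 others also keep quiet."""
    return salary * (p_others_silent ** (n - 1))

def defect_payoff(bounty_per_head: float, n: int) -> float:
    """Value of turning in the other n - 1 conspirators."""
    return bounty_per_head * (n - 1)

n, bounty, salary = 100, 1_000, 300_000

for p in (0.99, 0.97):
    print(f"others' p = {p:.2f}: stay ≈ ${stay_payoff(salary, p, n):,.0f}, "
          f"defect = ${defect_payoff(bounty, n):,.0f}")

# others' p = 0.99: stay ≈ $110,919, defect = $99,000
# others' p = 0.97: stay ≈ $14,707,  defect = $99,000
```

The exact figures matter less than the shape: the value of staying quiet gets multiplied by everyone else's reliability raised to the 99th power, so it collapses the moment the rest of the team becomes even slightly less trustworthy.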

It is easier to shift the Nash equilibrium away from working on AI in the first place than it is to create safe AI. Financial incentives have driven the vast majority of AI improvements in the last decade, and financial incentives can be used to stop them.

Indeed, B = $1000 is low considering the stakes at play - or considering the money would come directly out of the pockets of the guilty. A better metric may be to peg the bounty directly to 10 years of TC ($2.25m as of 2022, likely to be higher by the time you read this). Even if the accused shirked getting an insurance policy or saving funds to cover it, they, as highly skilled, remote-friendly workers, could almost certainly work it off over the next 10 years in non-AI fields from the comfort of a minimum-security, traffic-monitored prison.
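
For completeness, the pegging arithmetic spelled out as a sketch. The $225k/year TC is just the figure implied by the post's $2.25m number, and the per-head scaling in the second half is my own reading of how the pegged rate would replace the $1000 rate, not something the post states explicitly.

```python
# The pegged figure: $2.25m is 10 years at a TC of $225k/year.
years, annual_tc = 10, 225_000
pegged_bounty = years * annual_tc
print(f"Bounty pegged to {years} years of TC: ${pegged_bounty:,}")   # $2,250,000

# On the same per-head reading as the $1000 scheme above, turning in 99
# coworkers would then pay out 99 * $2.25m - an assumed extrapolation,
# not an explicit claim from the post.
print(f"99 coworkers at that rate: ${99 * pegged_bounty:,}")          # $222,750,000
```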