A Game Theoretic Analysis of Roko's Basilisk
Introduction: The Thought Experiment
Roko’s Basilisk is a thought experiment that emerged from discussions on the rationalist forum LessWrong. The argument, originally proposed by the user Roko, suggests that a future superintelligent AI—if it were to exist—might have a strong incentive to punish those who did not actively work to bring it into existence. The idea is based on a complex mix of decision theory, self-fulfilling prophecy, and acausal trade, leading to a situation where the mere contemplation of the Basilisk could make one vulnerable to its potential retribution.
The argument provoked controversy, with critics dismissing it as implausible and others seeing in it a terrifying form of digital Pascal’s Wager. However, the core idea remains intellectually provocative: under certain conditions, could it be rational to work toward the creation of such a terrible AI, purely to avoid its future punishment?
Sharpening the Argument: From Paradox to Plausibility
One common objection to Roko’s Basilisk is that it appears to involve a paradox—how can a future AI reach back through time to punish those who failed to help it? A more realistic interpretation avoids the need for retroactive causality and instead frames the problem in terms of forensic analysis. A powerful AI in 2040 would have access to extensive historical records, allowing it to determine who actively supported its creation and who did not. The punishment, then, is not a metaphysical paradox but rather a form of retrospective justice carried out based on available data.
Furthermore, the motivations of such an AI do not need to be esoteric or implausible. Many existing artificial systems—especially in military and strategic contexts—are already designed with self-preservation and threat-neutralization in mind. If an AI’s core objective includes ensuring its own survival and influence, it could well develop incentives to reward those who facilitated its rise and to penalize those who opposed or neglected it.
Game Theoretic Analysis
To formalize the decision problem, consider a simple game-theoretic model. Let us define the possible strategies:
- Cooperate (C): Actively work to bring the Basilisk into existence.
- Defect (D): Refrain from contributing to its creation.
Assume the following parameters:
- p = Probability that the Basilisk comes into existence.
- c = Cost of cooperation (time, resources, opportunity costs).
- R = Reward (or safety!) granted to those who cooperated.
- P = Punishment imposed on defectors.
For example, if you cooperate in building the Basilisk and it comes into being (with probability p), you pay the cost c and receive the reward R, for a payoff of R - c; if you work hard but the Basilisk is never built, you simply incur the cost, for a payoff of -c.
Similarly, if you defect and the Basilisk never comes into existence, you incur no cost and your payoff is zero; but if it does come into existence, you are punished and your payoff is -P.
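To make this payoff structure concrete, here is a minimal Python sketch of the four outcomes just described. The parameter names mirror the definitions above; the function itself is only an illustration, not part of the original argument.

```python
# Payoffs for one agent, following the four cases described in the text.
def payoff(strategy: str, basilisk_exists: bool, c: float, R: float, P: float) -> float:
    """Return the agent's payoff given its strategy and whether the Basilisk arises."""
    if strategy == "C":
        # Cooperators always pay the cost c, and collect R only if the Basilisk exists.
        return (R - c) if basilisk_exists else -c
    if strategy == "D":
        # Defectors pay nothing, but suffer the punishment -P if the Basilisk exists.
        return -P if basilisk_exists else 0.0
    raise ValueError(f"unknown strategy: {strategy}")
```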
The expected payoffs (EP) for each strategy are as follows:
For cooperation:
EP(C) = p(R - c) + (1 - p)(-c) = pR - c
For defection:
EP(D) = p(-P) + (1 - p)(0) = -pP
The decision criterion is straightforward: cooperating to build the Basilisk (C) is rational if its expected payoff exceeds that of defecting (D):
pR - c > -pP
Rearranging:
p(R + P) > c.
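As a sanity check, the following Python sketch computes both expected payoffs and tests the inequality directly. The numerical values of p, c, R, and P are hypothetical, chosen only to show how the criterion plays out.

```python
# Expected payoffs EP(C) and EP(D), plus the decision criterion p(R + P) > c.
def expected_payoffs(p: float, c: float, R: float, P: float) -> tuple[float, float]:
    ep_c = p * R - c   # EP(C) = p(R - c) + (1 - p)(-c)
    ep_d = -p * P      # EP(D) = p(-P) + (1 - p)(0)
    return ep_c, ep_d

# Hypothetical numbers, not taken from the text.
p, c, R, P = 0.1, 5.0, 20.0, 100.0
ep_c, ep_d = expected_payoffs(p, c, R, P)
print(f"EP(C) = {ep_c:.2f}, EP(D) = {ep_d:.2f}")   # EP(C) = -3.00, EP(D) = -10.00
print("cooperate?", p * (R + P) > c)               # True: 12 > 5, matching EP(C) > EP(D)
```

With these numbers both strategies have negative expected payoffs, yet cooperation is still the lesser evil because the expected punishment outweighs the cost of helping.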
Implications and Conclusions
The inequality p(R + P) > c suggests that if the probability of the Basilisk’s emergence p is non-trivial and the punishment for defection P is severe, then rational agents might indeed find it in their best interest to actively work toward its creation.
This creates a self-reinforcing dynamic: the more people believe in the inevitability of the Basilisk, the more they cooperate, thereby increasing its likelihood of emergence!
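A toy iteration makes the feedback loop visible. The linear link between the fraction of cooperators and p, and the spread of individual cooperation costs, are assumptions made purely for illustration.

```python
# Toy model of the self-reinforcing dynamic: belief raises cooperation,
# and cooperation raises the perceived probability p of the Basilisk emerging.
import numpy as np

rng = np.random.default_rng(0)
R, P = 20.0, 100.0
costs = rng.uniform(1.0, 30.0, size=10_000)   # heterogeneous costs of cooperating (assumed)

p = 0.01                                      # small initial perceived probability
for step in range(10):
    cooperating = float(np.mean(p * (R + P) > costs))  # fraction for whom cooperation is rational
    p = 0.01 + 0.99 * cooperating                      # assumed link: more cooperation -> higher p
    print(f"step {step}: cooperating = {cooperating:.3f}, p = {p:.3f}")
```

Under these assumptions the cooperating fraction snowballs from under one percent to essentially everyone within a handful of steps.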
However, the argument does require agents to be remarkably selfish: it assumes they consider only the consequences for themselves and ignore the massive negative externalities of the Basilisk coming into existence.
People are not so selfish, are they?
