An AI safety plan that might work: an international coalition governing AGI
Artificial general intelligence, or AGI, might be developed in the next few years.
If it is, it will radically change society - hopefully for the better.
However, almost nobody has a reasonable governance plan for getting there. I read over 50 AI safety plans, and wasn't satisfied with any of them.
In this article, I explain why most plans fail, why we're not on track by default - and how to chart a better path.
tl;dr of the better path:
- 2025-2026: responsible AI companies, non-profits and defensive acceleration efforts inspire a race to the top on safety.
- 2026-2028: middle powers (UK, EU, EFTA, Canada, Australia, Japan) build leverage in AI, largely through building datacenters. Additionally, duty-based domestic regulation enforces a minimum standard for safe AI development.
- 2028-2029: the US nationalises AGI projects, forming a coalition with middle powers and countries key to the compute supply chain.
- 2029-2030: other countries join this coalition, and a safety treaty is made with China. This prevents dangerous AGI development and results in broad benefits sharing.
- 2030+: the coalition governs AGI development into a beneficial future.
Most current plans are bad
Most plans focus on the distribution of AGI capabilities on a spectrum from "widely distributed" to "concentrated in a few hands." But there's no real sweet spot on this spectrum. Both extremes are bad, and everything in between is just a mix of these problems:
- If control over AGI is distributed,1 any of these actors could build biological and chemical weapons, among many other threats that are hard to build defences for.2 With the number of unstable bad actors in the world, it's inevitable that someone will misuse it if it's available to them.
- If control over AGI is concentrated, we could slide into a locked-in totalitarian dystopia. There are strong incentives and empirical precedents for this happening.
- In any scenario where we have multiple actors, we end up with race dynamics that encourage cutting corners on safety - exacerbating the above risks, and increasing the chance we lose control to AI systems.
The default outcome is bad
By default, we could be heading towards that locked-in totalitarian dystopia - if humans don't lose control of the technology completely.
The most likely path right now looks like: US companies3 continue to develop better AI systems4. As they get closer to AGI and eat more of the economy, the government is forced to nationalise the projects - the political and national security implications simply being too great to leave in the hands of private companies.
Why is US hegemony concerning? There are really three things propping up US democracy:
- Citizens have economic power and value - so it's worth keeping them happy, healthy and educated.
- Citizens have political and (some) physical power - they can protest and riot.
- Strong democratic institutions enforce checks and balances on power, and keep the system stable.
However, AGI doing all remote jobs (and building robots to do more manual labour) could wipe out the first pillar, or weaken it significantly. And a government with AGI could eliminate most dissent through a combination of surveillance, superpersuasion, and outright force. Lastly, the US is dismantling democratic safeguards at an alarming rate: the majority of both Democrats and Republicans believe there is "a serious threat to the future of our democracy" (although likely for different reasons), and experts say it's in the worst state it's been in for a while and trending downwards.
An AGI-powered totalitarian US government is a very scary prospect. This could also end up durably locking in this state, because unlike a human dictator, an AGI dictator would not naturally die and is much less vulnerable to physical attacks.
A better timeline
Here's a feasible plan for a better future. It doesn't require anyone to act unrealistically against their incentives, but it does require foresight and cooperation.
The years are very rough estimates, but should give a sense of expected urgency and pacing.
2025-2026: Independent safety efforts
Companies like Anthropic race to the top on safety, alongside non-profits and research institutions working on safety. This phase:
- demonstrates that you can maintain your position on the frontier of AI while prioritizing safety; and
- lays a lot of the groundwork for future safety research.5
In addition, startups emerge that focus on defensive accelerationism: building products that accelerate defensive capabilities against potential AGI threats. For example, companies that build new vaccine platforms (biosecurity), AI-enabled security tools (cybersecurity), or tooling for AI safety research.
2026-2028: Leverage building and domestic governance
Stable democratic middle powers (like the UK, EU and EFTA members, Canada, Australia, Japan) start working to build leverage to stay relevant later. The most promising way to do this looks like building datacenters able to process top secret information. Other incentives - like attracting AI talent6, perhaps building trusted institutions (e.g. UK AISI), and maintaining existing alliances with the US - are useful, but likely insufficient on their own.
Alongside this, duty-based domestic regulation7 both sets a minimum bar for safety and maintains an incentive gradient towards ever safer systems. Guidance and enforcement from a competent regulator is crucial here, and there are many concrete interventions that could be implemented. Both of these might be a bit messy in practice, as I expect political lobbying for many things to be intense.8
2028-2029: Nationalisation and forming a democratic coalition
As in the default path above, US frontier AI companies get closer to AGI. It becomes untenable for this to remain in private companies, and the US government nationalises AI development (or at least 'soft nationalises' it, e.g. through managerial, operational or financial control).
The regulator in the previous stage has acquired a lot of competent technical talent that properly understands AI and has internalised the safety mindset. Additionally, this safety culture and the race to the top dynamic have permeated into the AI companies. This all feeds into strong safety foundations in the nationalised project.
At the same time, or soon after this happens, middle power countries reach out with a deal: they use their leverage to gain governance rights over AGI. In practice this will likely focus on access to datacenters that can process top secret information. I expect the US to find this annoying, but ultimately worth it to avoid the risk that China or another authoritarian country gets there first. Countries key to the compute supply chain all get involved, e.g. Taiwan, the Netherlands and South Korea.
Additionally, you want global efforts to avoid a race to the bottom. This buys more time for technical safety work and governance, especially as international governance moves slowly.
2029-2030: Global tiered access
Over time, you expand the coalition of countries involved in this international AGI project. Countries joining get some benefits, in exchange for not developing their own AGI systems and restricting AGI development and misuse by actors in their jurisdiction. This is to share the benefits of AGI, give others a say in AGI governance, and reduce the risk that other countries recklessly try to build their own AGI.
There are both carrots and sticks that encourage membership:
- Carrots: access to the benefits of AGI. More trusted countries (or ones with more leverage) might get more access, with tiers like:
  - Full access to inference (running the AGI), governance rights, and perhaps ownership of some of the weights9
  - Filtered/monitored access to inference - effectively AI labour (a rough sketch of what this could mean in practice follows this list). While many people today talk about the scientific breakthroughs AI could bring, one of the most valuable benefits is actually the "boring" uplift to society from cheap, plentiful and skilled labour. This is because it's much more general purpose, and many problems in both developing and developed nations are challenges that labour could solve. Doctors, engineers, teachers, and scientists are the obvious professions - although overlooked boring-sounding jobs like "product compliance lawyer" or "data analyst" may be just as valuable, if not more (e.g. product compliance lawyers eliminating lead paint in developing nations, or data analysts optimizing malaria net distribution in sub-Saharan Africa).
  - Downstream benefits, like being able to interface with the companies AGI runs, buy the new drugs it discovers/new software it builds, etc.
  - Economic dividends from growth caused by AGI.
- Sticks: diplomatic shunning, sanctions, cyberattacks and in the extreme case military response. This might be in response to independent sovereign AI projects, reckless AGI development, or using AGI against citizens.
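To make the "filtered/monitored access" tier a bit more concrete, here's a minimal sketch of an inference gateway that enforces access tiers, filters request categories and logs everything for audit. This is purely illustrative: the tier names, categories and functions are all invented for the example, not a proposal for a real system.

```python
# Hypothetical illustration of "filtered/monitored access to inference":
# coalition members call the AGI through a gateway that enforces their tier,
# blocks certain request categories, and logs every request for audit.
# All names, tiers and categories here are made up for the sketch.
from dataclasses import dataclass
from enum import Enum, auto

class Tier(Enum):
    FULL = auto()        # full inference plus governance rights
    MONITORED = auto()   # filtered/monitored inference ("AI labour")
    DOWNSTREAM = auto()  # products and services only, no direct inference

BLOCKED_FOR_MONITORED = {"weights_export", "self_improvement", "bio_design"}

@dataclass
class Request:
    member: str
    tier: Tier
    category: str  # e.g. "medical_advice", "data_analysis", "bio_design"
    prompt: str

def run_inference(prompt: str) -> str:
    """Placeholder for the actual call into the AGI."""
    return f"<model output for: {prompt!r}>"

def route(request: Request, audit_log: list[dict]) -> str:
    """Decide whether an inference request is served, and always log it."""
    audit_log.append({"member": request.member, "category": request.category})
    if request.tier is Tier.DOWNSTREAM:
        return "denied: no direct inference at this tier"
    if request.tier is Tier.MONITORED and request.category in BLOCKED_FOR_MONITORED:
        return "denied: category blocked at the monitored tier"
    return run_inference(request.prompt)

if __name__ == "__main__":
    log: list[dict] = []
    print(route(Request("ExampleMember", Tier.MONITORED, "data_analysis",
                        "optimise malaria net distribution"), log))
```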
Additionally, these incentives snowball as the coalition gets larger. Do you really think your one country can catch up and compete with the 30-country-strong coalition, which already includes the top AI powers? Especially with the threat that they'll use their collective economic and military power to stop you if you try...
China is the one country that might be able to lead a competing coalition. But even China might be up for negotiating a deal. China often focuses internally, so it's possible there's a world where their foreign policy goals don't directly conflict with US interests. And at the very least, narrowing it down to two parties makes signing a safety treaty much easier. This treaty could focus on avoiding humanity losing control of AGI completely: a terrible outcome for everyone.
In this phase, the global coalition has bought a lot of headroom - both in monetary and compute resources, and in time - to ensure the necessary safety work can be done. It might also be able to bring about a pause or slowdown in global AI development to buy more time, should that be needed.
Additionally, we're now set up so that AGI is controlled by this broad coalition with stable democratic countries as core members, which is already sharing the benefits of AGI. This makes it much less likely that we slide into a totalitarian dystopia, and much more likely that AGI is used as a global public good.
2030+: AGI handover, and onto ASI?
It begins to get harder and harder to predict what happens here. But if we get to this point, I think we're in a much better position than most worlds we seem to be heading for. The below will blur a bit with the previous sections, but hopefully gives a sense of the trajectory.
As capabilities improve, we're able to entrust more and more useful tasks to AI. This includes a fair bit of the safety work we need to do, such as building the automated alignment researcher. We might also find that AI is able to help us govern it better, for example by helping us design better institutions, or helping us make better decisions.
On the dangerous side, we might find that AI is accelerating AI development through recursive self-improvement. This means we might get to potentially very dangerous systems like superintelligence before the corresponding safety work is in place. Luckily, our robust democratic coalition uses the breathing room it has generated to slow capabilities development and invest more in safety, so this outcome doesn't happen.
Eventually we get to the point where we can ask the AGI (or maybe superintelligence at that point) "what should we do to get the best outcome?". It's largely able to figure that out for us. We actually do ask this question, because our coalition of stable democratic countries has resisted sliding into totalitarianism. We've also done the homework beforehand, of course: making sure that our system is actually aligned, and that we don't fall into traps like gradual loss of control.
Our AI system answers with the answer "42", then says it's just kidding and gives us a real plan for making things go well. We follow the plan, nail all the UN SDGs, and I become a mango farmer somewhere warm and coastal.
Footnotes
1. In practice this is very hard to achieve, and would require mass compute redistribution (which probably means wealth redistribution). This is because running AGI is likely to be very compute-intensive. So even if model weights are public, whoever has more compute will have most of the control. As compute ownership is so concentrated, we're basically back to the same few actors again. ↩
2. AGI could also be used to build defences. However, biological and chemical weapons are attacker-dominant because humans are inherently squishy and vulnerable. There are many other attacker-dominant risk areas too - as an intuition, a "good guy" with a nuclear bomb can't stop a "bad guy" with one. Some areas are less clear: for example, in cybersecurity it's plausible that AGI could make everyone much better at defence, meaning the overall long-term landscape might actually be better than today. However, it's hard to predict where it'll land, and the path to get there might well be dangerous. ↩
3. All the major AI labs (Anthropic, Google DeepMind, OpenAI) are US-based, and the US has the most compute infrastructure. Even most of the subfrontier labs are US-based (xAI, Meta, Microsoft). There are also some Chinese subfrontier labs (Alibaba, DeepSeek, etc.), which seem to be getting more capable with time - although it's unclear how much they are coasting off the US's success. ↩
4. I expect many of these increased capabilities will be misused by individuals, terror groups and corporations to cause small-to-moderate scale harm. There will be some political fallout, but sadly nothing that fundamentally changes the trajectory of AI development. A lot of discussion during this period will miss the bigger picture, and result in squabbles over short-term fixes and lots of identity politics. ↩
5. I focus on loss of control during this period. As for the other risks I worry most about:
- Catastrophic misuse: I've come to believe that this is more tractable than I previously thought, AND there are already very strong incentives to avoid this. I also think as models get more capable of misuse, we might actually shift to more internal deployments and "end products" (e.g. rather than selling API access to a model capable of researching viruses, just use the AI internally and sell some vaccine directly) which massively reduces this risk.
- Totalitarian lock-in / AI-enabled oligarchies: I think it's very hard to put great technical controls in at this time (as a competent government would remove them), and the best governance controls are the ones described in this document. That said, I think theoretical and empirical research into this and other governance mechanisms to prevent totalitarian lock-in is one of the highest value things to work on. Although I think it'd be premature to try to implement them (for non-learning purposes) at this time.
6. By default governments seem incredibly bad at identifying and attracting AI talent. They seem to often overindex on academics, or business people that claim to be techy. I'm sure there are better analyses of this elsewhere so won't go into detail here.
Additionally, talent is likely to be a less durable source of leverage as AGI gets better at doing more jobs.
This makes me think datacenters are where most efforts should be focused. ↩
7. I do at some point need to write a full post on duty-based regulation, but the core idea is that you require companies to follow some statements like "take reasonable steps to ensure your AI systems are safe", and "be transparent, cooperative and forthcoming with regulators". Additionally governments might want to pair this with the ability to create secondary legislation that adapts the duties, as well as require companies to have due regard to guidance when following the duties.
The FCA is an example of a regulator that makes use of duty-based regulation. Some examples of duties they impose (from here):
- "A firm must conduct its business with integrity."
- "A firm must take reasonable care to organise and control its affairs responsibly and effectively, with adequate risk management systems."
- "A firm must deal with its regulators in an open and cooperative way, and must disclose to the FCA appropriately anything relating to the firm of which that regulator would reasonably expect notice.
This is opposed to prescriptive regulation (e.g. "you must do X, Y and Z"), which is often too brittle for a fast-moving field. This risks:
- missing things that are unsafe; and
- banning things that are safe (unnecessarily slowing development in democratic countries relative to others, which is probably net bad).
It's also different to outcome-based regulation (e.g. "your AI systems must not cause catastrophic harm"), which can also be fragile or intervene too late (e.g. only enforceable after a catastrophe, which isn't great). ↩
8. In particular I expect there to be a lot of discussion about:
- job losses
- copyright
- environmental impacts
- economic competitiveness, particularly in the US
- child safety, particularly in the UK
- privacy, particularly in the EU
Many of these issues will become even more politically charged and end up as identity or tribal politics. While some of these issues are important, they may paralyze other urgent and even more important actions.
I also expect a lot of general rent-seeking behaviour to emerge, e.g. lobbying by large corporations to entrench their position as they start to see what's coming. ↩
9. I don't think it's super promising that we can split up a model like this, but I think things in this direction are potentially worth exploring. The general shape of the solution is one that prevents any one member of the pact from defecting and having a capable AGI just to themselves. Kinda like Shamir's secret sharing but for models.
I imagine this might intuitively look like a 'mixture of experts' style model, where different countries own different experts - and collaboration between experts is necessary to run a functioning economy. But this is very speculative, and has big holes like "what if one expert is the AGI training expert"?
Possibly we get this for free if compute is distributed enough, and we expect the coalition having the majority of the compute is enough to maintain control (e.g. analogous to the reasoning behind a 51% attack in crypto). But the US does seem likely to have the majority of the compute for a while. ↩
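To illustrate the secret-sharing intuition, here's a minimal sketch under a simplifying assumption: the weights checkpoint is encrypted under a single symmetric key, and it's that key (not the model itself, which is the much harder open problem above) that gets split with Shamir's k-of-n scheme, so no single coalition member can load the model alone. All function names are hypothetical and this isn't production cryptography.

```python
# Illustrative sketch only: Shamir k-of-n secret sharing applied to the key
# that (hypothetically) encrypts an AGI weights checkpoint, so that no single
# coalition member can decrypt and load the model on their own.
import secrets

PRIME = 2**521 - 1  # a Mersenne prime, comfortably larger than a 256-bit key

def split_key(key: bytes, n_members: int, threshold: int) -> list[tuple[int, int]]:
    """Split a weights-encryption key into n shares; any `threshold` of them recover it."""
    secret = int.from_bytes(key, "big")
    assert secret < PRIME
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    def poly(x: int) -> int:
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, poly(x)) for x in range(1, n_members + 1)]

def reconstruct_key(shares: list[tuple[int, int]], key_len: int = 32) -> bytes:
    """Lagrange-interpolate the shared polynomial at x=0 to recover the key."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret.to_bytes(key_len, "big")

if __name__ == "__main__":
    key = secrets.token_bytes(32)              # stands in for the checkpoint encryption key
    shares = split_key(key, n_members=5, threshold=3)
    assert reconstruct_key(shares[:3]) == key  # any 3 of the 5 members suffice; 2 reveal nothing
```

Even with a scheme like this, the harder part is ensuring the decrypted weights only ever exist inside jointly governed datacenters - which is closer to the majority-of-compute argument above.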