Alignment Is NOT All You Need
AI risk discussions often focus on malfunctions, misuse, and misalignment. But this often misses other key challenges from advanced AI systems:
- Coordination: Race dynamics may encourage unsafe AI deployment, even from ‘safe’ actors.
- Power: First-movers with advanced AI could gain permanent military, economic, and/or political dominance.
- Economics: When AI generates all wealth, humans have no leverage to ensure they are treated well.
These are all huge hurdles, and need solutions before advanced AI arrives.
Preamble: advanced AI
This article assumes we might develop human-level AI in the next few years. If you don’t agree with this assumption, this article probably isn’t for you.[1]
I’ll call this advanced AI to distinguish it from today’s AI systems. I’m imagining it as more competent versions of current AI systems[2] that can do what most remote workers can. This AI would be superhuman across many domains, and human-level at almost all economically relevant domains.
Common AI risk thinking
Risks from advanced AI systems are often categorised into the holy trinity of ‘ways this could all go terribly wrong’:
- Malfunctions: We already see this with existing AI systems making discriminatory decisions in hiring, criminal justice, and other domains. These could lead to global catastrophes in future, for example in military decision-making contexts.
- Misuse: As AI systems become more capable, they could be used by bad actors such as terrorist groups, criminal gangs, and disturbed individuals. AI could enable sophisticated cyberattacks, bioterrorism, or drone warfare.
- Loss of control (or “misalignment”): AI systems not trying to do what we want them to. As Paul Christiano describes in "What Failure Looks Like," this can happen both gradually and quickly as systems become more capable and harder to oversee.
If you’ve been in AI safety circles for a while, you’ll probably have nodded along to the above - isn’t that the obvious way to split up the space? You might also think the corresponding responses are:
- Malfunctions: These will largely resolve themselves as we get more competent AI systems, and we have many existing tools to tackle these risks. We need to be cautious deploying AI systems while they make these mistakes, but they’re unlikely to lead to a global catastrophe outside specific contexts.
- Misuse: We can tackle most of these threats with existing interventions (e.g. how we stop bioterrorists today), and make society more resilient to a lot of the other threats (e.g. improving cybersecurity of critical infrastructure). AI systems can help us with this too. Alignment might also help here, if the most capable models have non-removable safeguards that refuse harmful queries.
- Misalignment: Oh boy. This is tough - people have been hacking away at this for years and we’re not really sure how to crack it.[3] We might need to solve fundamental problems in machine learning, decision theory, and philosophy, and fast.
(The concerns raised in this article are not new, but I haven’t seen them written down succinctly together.)
1. The Coordination Problem
First-mover advantage creates intense pressure to rush to deploy advanced AI. This might mean even if we have a solution to the alignment problem, it doesn’t get implemented properly. And even responsible actors choosing to slow down for safety reasons risk ceding advantage to less careful competitors.
Global coordination might help resolve this (national regulations being insufficient given frontier AI models are being developed in several countries already). But global coordination is usually slow, and agreement is difficult, particularly where there are great benefits to defectors and limited enforcement mechanisms. AI is developing fast, and while compute governance schemes offer some hope for enforcement, there has been little practical action here.
For more on this, see Holden Karnofsky’s piece “Racing through a minefield”.
2. The Power Distribution Problem
Okay. So we’ve solved malfunctions, prevented common misuse, solved the alignment problem and magically got global coordination to only deploy intent-aligned AI systems. All in a day's work, right?
Unfortunately, we’re still not safe.
Think about what advanced AI means: systems that can innovate, research, and work better than humans across most domains. Whoever controls these systems essentially controls the world's productive capacity. This is different from previous technological revolutions - the industrial revolution’s machines amplified human output, but advanced AI might replace humans entirely.[5]
This creates several problems, all pointing towards an AI-enabled oligarchy:
- Military dominance: The first actor with advanced AI could rapidly develop overwhelmingly superior weapons and defensive systems.
- Economic dominance: AI-powered economies could outcompete all others, concentrating wealth and power to an unprecedented degree.
- Political dominance: With intellectual (and likely military and economic) superiority, AI-controlling entities could set global policy.
[Image: The first actors to get advanced AI, 2027 (colorized)]
Traditional regulatory approaches seem insufficient here. How do you enforce regulations against an actor with overwhelming technological superiority?
A first thought might be to make sure everyone gets access to advanced AI (à la Yann LeCun). However, this is hard to enforce in practice, as it still depends on the first actor being nice enough to share it this way. If model weights are released openly like Meta’s Llama models, it’s also unlikely to result in fairness: it just means dominance by whoever has the most compute rather than whoever developed the model. (Not to mention bringing back our common misuse concerns from earlier.)
3. The Economic Transition Problem
Let’s say we’re in a lucky world - where the actor developing AI chooses not to dominate all others. It’s still unclear how we get to a world where humans have any economic power if all the jobs are automated by advanced AI.
The same thing keeps coming up in all my discussions about this…
However, “universal basic income” with no further details isn’t the answer. In particular, most UBI proposals lack discussion of:
- The intelligence curse: Countries where most wealth comes from resources rather than human productivity tend to develop poor institutions and high inequality (the resource curse). What happens when AI makes the whole world like this? Is there any real incentive to continue a UBI scheme when the population offers no value in return? Rudolf Laine’s recent article “Capital, AGI, and human ambition” explores this further, as will an upcoming piece by my colleague Luke Drago.
- International distribution: Even if nations home to AI companies implement UBI, what about other countries? Convincing the US to share huge amounts of wealth with Russia and China seems difficult.
Common counter arguments
Just use AI to solve these problems
Before we have highly capable AI systems, AI may not be good enough to solve these problems. And these problems arise once we have highly capable AI systems.
The market will solve it
If the market is efficient, it’s likely to make things worse. It’ll accelerate the deployment of AI systems to replace humans, as well as the concentration of power in a few actors before governments can react.
Humans always adapt / previous technology has created new jobs
Previous technologies have created some new jobs, and freed people up to work on challenges that previously nobody was working on. But with AI, those new jobs might themselves be taken up by AI, and we may run out of problems to solve: making humans economically irrelevant.[6] This seems a much more challenging constraint to adapt to. Additionally, previous technologies have tended to roll out much more slowly - the industrial revolution spanning about 60 years, rather than perhaps 3 years for transformative AI. There’s no rule that says we’ll make it.
We'll all get income from being artists and poets
AI art is already edging out humans both in competitions and in the market for everyday art. Sure, we might see premium markets for "AI-free" art or "authentic human experiences" - like we see markets for handmade crafts today. But this is likely to be a tiny economic niche. How many people today buy hand-forged tools versus machine-made ones? How many artisanal weavers can make a living today? These markets exist but can't support more than a tiny fraction of the population. (And no, it’s not just that people don’t have enough wealth and AI-created wealth would create demand: try to find a billionaire who buys a ‘hand-made’ phone.)
We’ll all get income from being prompt engineers or AI trainers
This is temporary at best - advanced AI systems will likely be able to write better prompts and train themselves more effectively than humans can. Prompt engineering seems particularly vulnerable: can you imagine something better suited to automating with AI? The whole job is generating text towards some goal where you can test and get feedback on lots of different variations quickly, often by using fairly standard and well-documented techniques.
We’ll all get income from doing manual labour
Robotics research is already advancing rapidly. Being able to spin up millions of robotics engineers (with perfect coordination and expert knowledge) could mean we get advanced robotics shortly after we have advanced AI. Even for ‘manual’ jobs like construction work, success requires significant cognitive skills: planning, adaptation, and complex decision-making. AI could handle these cognitive aspects, reducing specialized jobs to simpler physical tasks that could be done by anyone. This means even if manual jobs remain temporarily, wages would crash as the entire displaced workforce competed for them.
Conclusion
These challenges - coordination, power distribution, and economic transition - exist independently of the alignment problem.[7]
We need to find solutions to these challenges, ideally before we're in crisis mode (and battling an adversary that might have 1000x the intellectual resources of everyone else).
P.S. At BlueDot Impact, we're working on developing a field strategy to address these kinds of problems. If you're interested in helping us, we're hiring an AI Safety Strategist or would be happy to explore other ways to collaborate.
Acknowledgments
Many thanks to Rudolf Laine, Luke Drago, Dewi Erwan, Will Saunter, and Bilal Chughtai for insightful conversations that made many of these ideas much more crisp.
If you enjoyed this article, I think you might enjoy Rudolf’s “By default, capital will matter more than ever after AGI” which explores parts of the power distribution and economic transition problems in more detail.
Footnotes
1. For pieces that explore this assumption see:
- A previous article that briefly explains how scaling up the compute and data we use to train AI systems might get us there.
- Part 1 of Leopold Aschenbrenner’s Situational Awareness series, which explores past and future advances in AI in much more detail than my piece.
- Arjun Ramani and Zhengdong Wang’s excellent summary of arguments for why transformative AI might be difficult to achieve, as a counter to the above two pieces.
Also, for what it’s worth, even many AI safety skeptics hold that we might have human-level AI in the next few years. For example, Yann LeCun thinks humanlike or perhaps superhuman intelligence “may not be decades but it’s several years” away.
2. For example, a model that can do all of the following:
- use standard computer interfaces, similar to Claude’s Computer Use or AI Digest’s AI Agent Demo, possibly trained with lots of reinforcement learning to get good at achieving computer tasks
- call tools to operate faster than a computer interface would allow, similar to Anthropic’s Model Context Protocol integrations
- reason clearly and effectively in a wide range of domains, perhaps using reinforcement learning on reasoning chains, similar to OpenAI’s o3 model
- carry out job tasks end-to-end, trained on demonstration data and feedback from millions of experts, similar to what companies like Outlier are collecting
I think this is a fairly safe assumption, and actually think future AI systems might look a lot weirder than we can imagine right now (because we’ll innovate and develop newer, weirder things). But this is enough for the rest of the article to hold.
3. In reality, there is huge divergence as to how hard people actually think this will be. Some people think it’s near impossible, some think it’s doable but people are working in the wrong places, and others think it’s easy. In general, people who have been thinking about it for a while conclude that it’s pretty difficult. (If you think it’s easy, please do share a working proof/demo of your solution! This would save a lot of people a lot of work.)
4. Part of this is that it’s awkward for actors such as AI companies or governments to write about risks where they are the ‘baddies’. Because they have often managed to set the narrative, these risks might not have been explored as much.
That said, there are some examples of AI companies acknowledging this, such as Sam Altman back in 2022 (although AI companies have published relatively little research on this, and since that interview, in which Sam claimed the board can fire him, the board did try to fire him but he came back two weeks later).
5. Some colleagues swear by the horse analogy from Humans Need Not Apply as giving a good intuition here.
6. Some authors argue that humans might still have a comparative advantage in a world with AI, although I disagree with this - largely for reasoning discussed by ‘Matt’ in the comments of that article.
7. Sorry for the bad news, but this still misses many other advanced AI issues. These include:
- Figuring out human purpose after AI can do everything better than humans.
- Solving moral philosophy. We’ve looked at some of the ethical basics (e.g. assuming people not starving = good). However, if we’re making heavy use of advanced AI in the economy and society, it’ll need to make more nuanced value judgments. This might mean having to figure out a lot of moral philosophy, in not very much time. And if objective moral facts don’t exist this becomes a very sticky problem - whose ethics should we be accepting? Do we have person-affecting views or not? (I think this affects what society should be doing a lot).
- Considering whether advanced AI systems carry any moral weight, and how to treat them if they do (AI welfare). Understanding what makes things have subjective conscious experience is hard, so hard in fact they called it ‘the hard problem of consciousness’ (no, I’m not making this up).
- Preventing agential s-risks, particularly stemming from AI systems with conflicting goals. I won’t get into details here, but the linked article gives a good introduction.
- Figuring out how to co-exist with digital people, if technology enabling this converges with AI systems or AI welfare. I think this is more speculative than a lot of the other problems: it might be that digital people just don’t happen until after advanced AI, or don’t happen at all.
- [Almost certainly many other things that I can’t list off the top of my head right now. If you’ve got to the bottom of this footnote, you’re likely curious enough to go and find them yourself!]