Interesting, I largely agree with your conclusions (if not exactly the arguments that got you there).
- current AI models for the most part have no real agency and so no SIILTBness
- the imaginary hyperintelligent AGIs have agency, but their imagined SIILTBness is a projection of a limited set of human capacities – they have relentless goal-following but no empathy (the inborn tendency of humans to partly share each other's goals). Basically the model for AGIs is monsters like the xenomorphs from Alien, which in turn are representations of relentless amoral capitalism.
- Alignment research is basically saying, hey, we created these monsters, can we now turn them into housepets so they don't kill us, which is pretty hilarious.
- Part of the fascination with monsters is their partial humanness, or partial SIILTBness in your terms. They are all a sort of "what it is like to be an agent with some human traits, magnified and stripped of their compensating tendencies".
- just because AGIs are obviously monsters (that is, projections of humanity's fears of aspects of itself) doesn't mean they can't be real dangers as well.
Crossposted from LessWrong for another user who doesn't have a paid subscription to your substack:
My impression from Section 10 is that you think that, if future researchers train embodied AIs with robot bodies, then we CAN wind up with powerful AIs that can do the kinds of things that humans can do, like understand what’s going on, creatively solve problems, take initiative, get stuff done, make plans, pivot when the plans fail, invent new technology, etc. Is that correct?
If so, do you think that (A) nobody will ever make AI that way, (B) this type of AI definitely won’t want to crush humanity, (C) this type of AI definitely wouldn’t be able to crush humanity even if it wanted to? (It can be more than one of the above. Or something else?)
(I disagree with all three, briefly because, respectively, (A) “never” is a very long time, (B) we haven’t solved The Alignment Problem, and (C) we will eventually be able to make AIs that can run essentially the same algorithms as adult John von Neumann’s brain, but 100× faster, with the ability to instantly self-replicate, and there can eventually be billions of different AIs of this sort with different skills and experiences, etc. etc.)
See here for other responses: https://www.lesswrong.com/posts/Pd6LcQ7zA2yHj3zW3/beyond-hyperanthropomorphism
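To put the "100× faster" figure in the comment above into concrete terms, here is back-of-envelope arithmetic only, assuming (purely for illustration) that the speedup applies uniformly to subjective thinking time:

```latex
% Back-of-envelope only: a uniform 100x serial speedup compresses subjective time.
\frac{1~\text{subjective year}}{100} \approx 3.65~\text{wall-clock days},
\qquad
\frac{1~\text{subjective decade}}{100} \approx 36.5~\text{wall-clock days}.
```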
I think even my final Section 10 plausible pathway is rife with massively unlikely developments, but as you say, "forever" is a long time. My position, to a zeroth-order approximation, is that any AI capable of full SIILTBness (via being situated and embodied in a world-like experience training stream) would automatically also have properties that put the so-called "alignment problem" within the realm of normal engineering risk management, rather than making it something special. At best it would be closer to "aligning" with sharks or SARS-CoV-2.

I don't think the prospect of significantly higher computational power makes a difference, because the difficulties all lie in navigating what I call worldlike experiences. The hyperanthropomorphism crowd seems to think that computational power by itself is a meta-knob that can scale agency (across all SIILTB dimensions) arbitrarily relative to humans, i.e. construct beings that are as far beyond us as we are beyond viruses (not that we're doing great against viruses). I think this is fundamentally wrong, and over-indexes on the power of computation to navigate worldlike situated experiences. Computation is a powerful force, but reality is more powerful. Agency relative to the difficulty of reality is NOT a linear function of computational capability. Having 100x more computation than humans does NOT make an agent 100x more generally agent-y in the world. In fact, I think there is a saturation function and we're already close to it, which is perhaps partly why biology hasn't bothered trying to evolve smarter beings than us.

The key weakness in the fear scenario is revealed by the insistence on "instant self-replication" -- reality isn't nearly as friendly to that kind of runaway explosion as this crowd seems to think. This is my big problem with this whole approach. Scare phrases like "instant self-replication" do way too much work, and people pretend 2001-style supercomputing monoliths can spread geometrically across the matter of the universe the way bacteria do in a petri dish. Yeah, no. I'll need more than that kind of hand-waving "warp drive" speculation.
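As a purely illustrative way to picture the saturation claim above (not a model; A_max and k are unspecified placeholders): if agency A saturates in compute c, then multiplying compute by 100 buys almost nothing once you are near the ceiling.

```latex
% Illustrative saturating curve only; A_max and k are placeholder constants.
A(c) = A_{\max}\,\frac{c}{c + k},
\qquad \lim_{c \to \infty} A(c) = A_{\max},
\qquad \frac{A(100c)}{A(c)} = \frac{100\,(c + k)}{100c + k} \approx 1 \quad \text{once } c \gg k.
```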
That would be C. You seem to be largely failing to recognize the power that even speed, single-mindedness, and the ability to coordinate with perfect replicas give you for free, before even getting into the added ability to plan and do long-term thinking. You are underestimating the latter too, but the former is simpler to explain. The reason you or I cannot take over the U.S. government isn't really because we aren't "smart" enough; it's because we couldn't coordinate with other people to make it happen without getting arrested. An AI does not face our usual coordination problems because it can just fork().
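As a minimal sketch of the fork() point (purely illustrative; the goal string and worker count are made-up placeholders, and os.fork() is POSIX-only): each copy starts from an exact snapshot of the parent's state, so every replica shares the same objective and plan by construction, with no negotiation or trust-building required.

```python
import os

GOAL = "the shared objective"   # every copy inherits this verbatim
N_COPIES = 4

def pursue(goal: str, worker_id: int) -> None:
    # Each replica works on the same plan independently; no coordination needed.
    print(f"worker {worker_id} (pid {os.getpid()}): pursuing {goal!r}", flush=True)

children = []
for i in range(N_COPIES):
    pid = os.fork()             # duplicates the entire process state
    if pid == 0:                # child: an exact copy of the parent
        pursue(GOAL, i)
        os._exit(0)             # child exits without running the parent's code below
    children.append(pid)

for pid in children:
    os.waitpid(pid, 0)          # parent waits for every copy to finish
```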
Here's a comment I wrote a while back that spends part of its time outlining how computer security vulnerabilities alone put us at pretty catastrophic risk. I can go into more detail if you think any part of this is underspecified, like you wanna know why AI-enabled security improvements will not be sufficient to prevent this from happening: https://www.lesswrong.com/posts/ervaGwJ2ZcwqfCcLx/agi-ruin-scenarios-are-likely-and-disjunctive?commentId=iugR8kurGZEnTgxbE
The above scenario doesn't require smarter-than-human intelligence and only relies on an AI's ability to do the things hackers do right now, faster. If you could copy my brain, put me on a laptop, and give me access to a company's AWS account, I could accomplish this kind of destruction just by replicating myself.
I wrote this mostly as an exercise to explain to other LWers the kinds of catastrophic actions an AI could take even if it weren't capable of developing new technology like nanosystems, which it *would* in fact be able to do. So you should treat "encrypting the hard disks of most computers" (which for the record would be the worst thing to happen in your lifetime) as a strong *lower* bound on the worst types of things that could happen if you just let an unaligned superintelligence out of the box tomorrow.
I think everything specific you're mentioning is well within the scope of the history of normal engineering risk management and requires no special handling or esoteric "alignment" theology or speculations about "AGIs". A natural extension of learning to deal with things like nuclear reactor meltdowns or toxic chemical leaks. Conversely, the fact that you and the LW crowd generally reduce my kind of position to a throwaway 1-liner tells me you're too trapped in your own mental models to entertain any other perspective on AI seriously. You guys think I don't get it, I think you don't get it. Which is fine. Really. Let a hundred philosophies of AI flourish.
I think debate across this philosophical divide is largely futile. The chasm is just too wide. But I also think the divide is not particularly important to bridge. You guys do you, other schools of AI will do the same. Let the verdict of history be the judge.
My point with this essay is not to engage in a debate across the divide (which would be about as meaningful as a creationist/evolutionist debate) but to establish a boundary condition for my own scope of interest.
> I think everything specific you're mentioning is well within the scope of the history of normal engineering risk management and requires no special handling
If that's your takeaway from what I wrote, which was intended to be an explicit lower bound and nevertheless implies a catastrophic loss of infrastructure, you're not really thinking clearly. When nuclear reactor meltdowns happen, humanity survives, and humans often choose to do something differently the next time, or maybe just decide to shut down nuclear power plants in that region entirely. We would not be able to "just" write up an incident report after a near-total shutdown of compute. If that were the worst issue we faced, we'd probably rather face a hundred simultaneous nuclear meltdowns. All this while AI labs in 3rd or 4th place, like FAIR, remain skeptical that nuclear reactor meltdowns are even possible, and publicly state they're just going to try to pull ahead by building a plant without worrying about all that stuff first.
In other words, you can think of our current security measures as insufficient for three coexisting reasons:
1. Your analogy suggests there is special regulatory acknowledgement of the danger destructive AIs can pose, analogous to the danger poorly managed nuclear power plants pose; there is none. Your government currently does not recognize that these meltdowns are a thing that can happen and will not prevent any AI lab from starting a fire. The default scenario for the development of the above AI in our current research climate is that FAIR immediately releases it to the public, like they do all of their research, and then some kid in China does what I described (or worse) with his CoreWeave subscription for no reason more particular than that he's fucking around. The problem is not preventing one state-sponsored company within your state from releasing the worm-nuke (or worse); the problem is making sure *nobody with access to sufficient GPUs* does so. That is basically impossible on our own, and:
2. Failure is generally irrecoverable. As I mentioned, the above scenario is a very generous one, where all the best AI systems manage to do is destroy a few trillion dollars' worth of economic value, cause some political and social shakeups, and thereby convince people that alignment is a problem. It is the example I use for people who seem pathologically skeptical of anything that sounds too "sci-fi"; a "July 4" alien-invasion scenario, you could say, where you assume the aliens are only allowed to use things that look like already-developed weapon systems. AI systems actually "capable" enough to pull off treacherous turns or invent new technology would at minimum be able to kill us through disease (read: nanosystems) or by mulliganing for robotics R&D in some field like agriculture. And we can't easily repurpose such systems to help patch those extinction exploits, because:
3. Technically preventing an AI from being misaligned in the limit of planning, agency, and world-modeling abilities (which is what alignment researchers gloss as "intelligence"), using modern ML methods, involves avoiding ten different failure scenarios that exist by default in the way we currently design such systems. If the first company to make AIs that are smart enough to get a job is one of the ones that recognizes the problem, they will be unlikely to have the lead time to redo the necessary components of their training architecture without letting another company get the chance to turn on an AI that develops a botulinum-injecting COVID-19.
> Conversely, the fact that you and the LW crowd generally reduce my kind of position to a throwaway 1-liner tells me you're too trapped in your own mental models to entertain any other perspective on AI seriously
Your relevant objections can be reduced to one line without loss of precision because they're not actually that deep, and mostly dance around the empirical claims about AI systems that "alignment people" make, in order to talk about how transformers won't lead to artificial general intelligence, or to condescendingly preach about how there's not a literal scale of intelligence from salamander to human, or to insult us. Your insistence that having a conversation with us would be analogous to a philosophy or theology debate, or that there's some sort of bizarre inferential gap, is indicative of the problem. AI safety researchers have a very specific technical challenge, which they would be happy to break down for you in arbitrary detail without using the word "intelligence" at all. It's not just that I "think you're not getting it": reading your posts gives me the impression you haven't even spoken to an alignment researcher before and are operating on some strawman of what their actual concerns are.
> You guys do you, other schools of AI will do the same. Let the verdict of history be the judge.
I would prefer to do so, but the problem is that the other schools of AI are going to kill everyone. I won't be able to tell you I told you so because we'll all be dead.
"AI safety researchers have a very specific technical challenge, which they would be happy to break down for you in arbitrary detail without using the world "intelligence" at all."
I for one would be very interested in that in whatever format (paper, book). Of course I'm assuming that by "without using the word "intelligence"" you also mean without replacing it by an even bigger undefined abstraction.
> AI safety researchers have a very specific technical challenge, which they would be happy to break down for you in arbitrary detail without using the word "intelligence" at all
Would greatly appreciate pointers to reading material along these lines!
Thank you! I thought I was the only one seeing this painfully obvious paper tiger.
Thanks, this was a fascinating read (as always). There does appear to be something of a philosophical divide (between the views in this post and the concerns of the alignment crowd), but I’m not sure it is insurmountable – or, put another way, I think there would be considerable value in trying to close it a bit (for instance, for people who maybe aren’t as in the weeds on the philosophical or technical side, or people shaping public policy). Anyway, I’ve tried to pull out what I see as the main claims of this post and contextualise them in the alignment conversation, to try to reduce (maybe my own) confusion. There is some overlap with the comments below, but it is hopefully additive. Link here: https://www.lesswrong.com/posts/3hJCcdirKqPJeu6aW/responding-to-beyond-hyperanthropomorphism