This research note is part of the Mediocre Computing series
We think of the difference between robots and computers primarily in terms of mobility and manipulative ability. Computers think. AI computers “think like humans think they think.”1 Robots think kinda like humans think they think… and they move around and grasp and manipulate things kinda like humans think they do.
If computers are analogues of our brains, robots are analogues of selected subsets of the embodiments of our brains.
Specifically, they are analogues, preferably somewhat skeuomorphic, of our hands and feet. For the former, one aspect of our hands — opposable thumbs — is particularly important. The simplest robotic grippers tend to be mitten-like, having only two “fingers” — but they are opposable.
There is something vaguely disappointing about a wheeled robot, or one that uses a principle besides the opposable thumb (such as say an electromagnet) to achieve manipulative capabilities.
A self-driving car feels, to me at least, less like a portal to a weird future of strange possibilities, and more like an adjacent-possible cul-de-sac to our gasoline-guzzling automotive present. A highly socialized AI embodied within, and limited by, an existing pre-robotic device we understand very well — a car. It is as limited by the imagination of the past as the notion of flying cars.
But are mobility and manipulative ability really the essence of what a robot is, or could be? Do they constitute the truly weird and generative aspect of robotics? The aspect that makes them an exciting technology not just for today or five years from now, but say five hundred years from now?
What is it about robots that is exciting on a 500-year time-scale?
I want to argue that mobility and manipulative ability by themselves are not enough, and are in fact secondary. That we can have a sessile, plant-like robot that satisfies sufficiently sophisticated definitions of robot better than the crude walking-grasping robots to which our defaults are anchored today.
I’ve come to believe that robots ought to be defined primarily in terms of a regime of sensing capabilities, rather than mobility or manipulation capabilities. Sensing capabilities that make them interesting on a 500-year time scale.
But what kind of sensing? Is a thermostat that senses a point temperature a robot? How about a surveillance camera? Or say a camera like the Oak-D, with onboard vision-AI capabilities, which we are evaluating for use in the Yak Rover project?
I do think a camera is an example of the kind of sensing I want to talk about, and a single-point thermostat is not. But not for the reason you might think. I’m not a “vision supremacist” the way many people in robotics today are.
A hint as to where I’m going with this is in the headline question: can robots yearn for phantom limbs? It is a question that gets at the 500-year interestingness of robots.
But I’ll get to that.
But let’s back up to mobility and manipulation, where, thanks to 200 years of mechanical engineering history culminating in what Boston Dynamics robots can do today, we have better-formed intuitions than we do in sensing.
The Kinesthetic Envelope
In my 9/30/21 issue, On Robots, I explored some of the ontological foundations of robotics, and after dismissing the uninspired “smart household appliance” and “naive Hollywood anthropomorphic” visions of robots, argued that the biomorphic bias of robotics is not about egocentric conceit, but about exploring an engineering design space that evolution has explored far better than humans have — one that assumes a wild, undomesticated environment, engaged via highly generalized capabilities.
Interestingly, Asimov — a biochemist by training, who understood very little about the technology of the field he named — kinda got this point much better than many modern too-practical roboticists do.
In On Robots, I landed on this definition in part by generalizing the implicit definition of a robot in Asimov’s stories and essays, and steel-manning his argument for anthropomorphism:
A robot is a sufficiently complex, loosely biomorphic machine with a domain-adapted universal computing capability.
It’s a nebulous category, but it captures what’s interesting about the design direction represented by biology, and its tradeoffs. For example, how to solve physical problems without significantly shaping or specializing the environment to suit the machine. Simple example: a hand can twist-and-turn a wide range of shapes, but in a torque-limited way. A fixed-size spanner can only handle a single size of nut, but can apply a great deal more torque. An environment where the spanner is truly useful needs to have nuts of the right size in it, but almost any environment is one where a hand’s twisting-and-turning ability is useful.
This characterization of robotics immediately sheds light on why wheeled robots are less satisfying than legged ones. It is not about anthropomorphic conceits. Legs are simply far more versatile, and can get you around in more kinds of terrain.
Wheels solve for speed and/or energy efficiency on relatively flat, approximately two-dimensional terrain. Legs, and more generally, limbs, solve for scope of mobility in essentially arbitrary three-dimensional terrains.
If you have four limbs with some manipulative capability in the end-effectors, you can go beyond mobility on connected, roughly 2d surfaces, to mobility in highly disconnected 3d spaces like forests, by enabling motion primitives involved in climbing, swinging, and finely controlled ballistic arcs (monkey jumping, not kangaroo jumping or Evel Knievel jumping).
Conceptual boundaries are unstable here of course. Once you talk of grasping as an element in mobility, you start to blur mobility and manipulation. A monkey (or monkey-bot) climbing a tree is effectively manipulating its body rather than an object in the environment. The mechanics of swinging yourself from a branch, and of swinging a broken bit of branch to hit another monkey, are not very different. You just care about opposite sides of Newton’s third law in the two cases.
You can imagine an environment that is itself mobile, in active or passive ways (such as the swings and ferris wheels of an amusement park) and start to blur the boundaries between the capabilities of the nominal embodiment of the robot, and the situated embodiment of the robot plus everything in its environment it might use to expand what I call its kinesthetic envelope.
Let’s define this. The kinesthetic envelope of a monkey or robot is its full range of mobility and manipulation capabilities in a given environment. It is a relative definition, and a generalization of what roboticists call the configuration space of a robot (you can think of it as the joint configuration space of the robot and its environment, including the boundary-redefining manipulation possibilities of elements of the environment).
A monkey-body has a far bigger kinesthetic envelope in a forest than on a flat parking lot. No wonder monkeys prefer the former. They are more powerful there.
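To make the relative, environment-dependent part of the definition concrete, here is a minimal Python sketch (a toy, not anything from real motion-planning practice) that brute-forces the reachable set of a planar two-link arm, first bare, then holding a rigid stick. The link and stick lengths are arbitrary assumptions.

```python
import numpy as np

def reachable_points(link_lengths, n=61):
    """Brute-force the set of end-effector positions a planar arm can reach.

    The area swept out here is a toy stand-in for the kinesthetic envelope:
    the joint configuration space of the body plus whatever it is holding.
    """
    thetas = np.linspace(-np.pi, np.pi, n)
    pts = []
    for t1 in thetas:
        for t2 in thetas:
            x = link_lengths[0] * np.cos(t1) + link_lengths[1] * np.cos(t1 + t2)
            y = link_lengths[0] * np.sin(t1) + link_lengths[1] * np.sin(t1 + t2)
            pts.append((x, y))
    return np.array(pts)

# Bare arm: two links of length 1.0 and 0.8 (arbitrary units).
bare = reachable_points([1.0, 0.8])

# Same arm rigidly gripping a 0.5-long stick: the environmental element
# effectively extends the second link and enlarges the envelope.
with_stick = reachable_points([1.0, 0.8 + 0.5])

print("max reach, bare arm:  ", np.hypot(bare[:, 0], bare[:, 1]).max())
print("max reach, with stick:", np.hypot(with_stick[:, 0], with_stick[:, 1]).max())
```

Same body, different environment, measurably different envelope, which is why the definition has to be joint over the robot and whatever it can recruit around it.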
Kinesthetics as a Language
The kinesthetic envelope is one aspect of what Richard Dawkins called the extended phenotype of an organism. The only difference is, when you’re talking robots, you’re generally talking outside-in human design rather than selfish genes expressing a phenotype in an environment inside-out.2
In thinking about the kinesthetic envelope, it is important to not get trapped in the limiting dichotomy of a “robot” (or biological organism) versus the “tools” it uses.
This limit is strongly evident in the very impressive recent SayCan demonstration by Google robotics researchers. The SayCan system uses large language models (the kind that are also involved in GPT-3 and DALL-E 2) to get what looks like an off-the-shelf wheeled manipulator robot to respond intelligently to natural language commands.
This is at once deeply inspiring and deeply depressing: we are reductively mapping the kinesthetic abilities of a being with a very different body morphology, not just to our own kinesthetic envelope, but to a linguistic map thereof.
That’s a double reduction.
Can you do better?
For starters, you can throw out the language model as an intermediary representation, and perhaps try to train a robot’s model of its kinesthetic envelope the way martial arts teachers try to teach their students — to see past the nominal function (encoded in the names of objects and body parts, as well as UX conventions). A punch can be a block. The hilt of a sword might be employed as the weapon-end in some uses, like a club.
I suspect Boston Dynamics’ Atlas is programmed this way, by a human movement coach using a pre-verbal supervisory mode (I don’t know for sure, but the behaviors suggest they are using what are called maneuver automata).
This is better than using natural language, but still limiting. After all, Atlas is only anthropomorphic to first order. Clearly, it is more capable than a human body in some subtle ways and less capable in others, and human-driven supervisory training will not get to all parts of its kinesthetic envelope. But at least the first-order similarity, and the lack of a language layer, lead to richer results than in the case of the SayCan robot butler.
Can you do even better? Of course.
The robot in the SayCan video is not at all like a human. It has wheels, one arm, and a weirdly positioned camera. It is clear from the video that it is not moving and manipulating like a human (even though it is thinking and talking in human terms). I suspect some sort of primitive classical planning, within its configuration space, is going on there.
But it is also not clear that it is moving and manipulating within a be-all-you-can-be understanding of its own kinesthetic envelope. It does not know its own body’s language.
Body Language
The central issue here is what is sometimes called functional fixedness. Seeing an object not for what it is, but in terms of a symbolic-functional model of what it can do within a reference domain, circumscribed by constraints inherited from that domain.
Getting past functional fixedness is a powerfully liberating thing.
A cute example involves the myth that monkeys peel bananas from the “wrong” end. Though they sometimes do, it’s basically a myth. But the underlying point isn’t: a ripe banana is easier to peel starting from the “wrong” end. Partly through lazy imitation, and partly due to the functional-fixedness created by language models of the world, we tend to think of the stem of a banana as a sort of pull-tab handle.
But there’s more going on here. Once you see past functional fixedness, you start to see past arbitrary conventions in relating bodies to environments, or bodies to themselves.
While a rock or a stick might serve as a tool in a narrow sense for either a monkey or a robot, the affordances of the environment go well beyond aspects we see in bundled ways as “tools” with “functions.” A monkey using a branch to swing itself up a tree, or a hanging vine to Tarzan-swing to another tree, is not exactly using those elements of the environment as “tools” per se. It is seeing its entire kinesthetic envelope in a way that is reshaped by the presence of those external elements.
In other words, to the right kind of beginner-mind or computer, the entire environment of a body is functionally unfixed. It just is. How you are present in it is limited mainly by your imagination, not definitions.
In humans, language adds another strait-jacket layer to the imagination.
This applies within a body too! Our language strongly constrains how we see parts — in terms of functional names.
For example, I started this essay with a distinction between wheels and legs. But wheels can act like legs too!
The 6-wheel Mars rover rocker-bogie suspension, which I’m replicating in my own rover build, is actually rather leg-like in many ways. The clever differential bar mechanism works something like a hip joint, shifting weight among the wheels to improve traction. On some terrains, the mechanism moves more like a set of legs walking than like a wheeled undercarriage rolling. Humans experience this too — with roller skates.
The broader point here is that the phenomenology of generalized mobility and manipulation, in relation to a specific environment, should inspire us to look past the nominal morphology of a robot to the revealed shape of its kinesthetic envelope.
Ie, not how it moves and manipulates, but how it thinks it is moving and manipulating.
Artificial distinctions between the “body” of a robot and meaningfully manipulable aspects of the “environment” are just linguistically encoded limiting ideas in our heads (or the robots’ heads). What is seen as a “tool” versus “background” is similarly, a function of imagination, not reality. And so are relationships between parts of a body.
It’s not that symbolically encoded (linguistically or otherwise) maps of kinesthetic territories are bad per se; they just hard-code a particular pattern of tradeoffs adapted to particular environments.
For example, self-driving cars, arguably the biggest category of successful modern robots, began with a DARPA grand challenge in “wild” terrain (a race in the desert), and only later tackled built environments, which were seen as a more advanced challenge.
In a certain narrow sense, the built environment — with highly structured and learnable traffic rules, signs, and routine behaviors by other drivers — is more “difficult” to navigate than a wild desert. But once you know the language of traffic, and conventional driving, it is easier. Built environments help you trade part of a difficult robotics problem for a simpler, essentially linguistic problem. Instead of driving on the territory, you drive on the map — if you have one.
One way to think of this is that “smart” environments encode a “standard” kinesthetic envelope in a global way. So the driverless car learns to speak a “civilized” language comprising maneuvers like lane changes, smooth turns, and parallel parking. It is not a “be all you can be” mobility-and-manipulation language for car-shaped robots, but it is a very powerful goal-oriented language in a built environment, and to the extent cars co-evolved with such environments, the tradeoff is worth it.
How would a robot learn what you might think of as its native kinesthetic language in a given environment? The One True Language of mobility and manipulation that maximizes its kinesthetic envelope in that environment? The language that turns it into the robotic equivalent of Bruce Willis in Die Hard, busting through walls and crawling through the ducts and elevator shafts?
One approach might be to analyze motion from first principles and try and get to a very general “physics” based language of mobility and manipulation. This was roughly the old-fashioned robotics approach. There is a classic text by Matthew Mason called The Mechanics of Robotic Manipulation that takes this approach.
Today, you’re more likely to use something like reinforcement learning to discover the language in a brute-force way, similar to how AlphaGo Zero learned a brute-force Go language that proved far superior to the human one. There’s probably merit to both approaches, and good solutions will likely draw on both (unlike classic deep learning domains like language models, vision, and recommendation systems, domains like motion have physics and deep mathematical structure, which hints at potential for first-principles speed-ups of deep-learning approaches).
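As a caricature of the brute-force end of that spectrum, here is a toy random-search loop that “discovers” a torque sequence getting a 1D point mass to a target. It is only a sketch of the search-over-motions idea; real systems use far more structured reinforcement learning, and every number below is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(torques, dt=0.1):
    """Simulate a 1D point mass driven by a fixed torque sequence.

    Returns negative distance to a target as the "reward", the crudest
    possible stand-in for scoring a kinesthetic skill.
    """
    x, v = 0.0, 0.0
    for u in torques:
        v += u * dt
        x += v * dt
    return -abs(x - 1.0)  # target position: x = 1.0

# Brute-force search over 20-step torque sequences: keep whatever scores best.
best, best_reward = None, -np.inf
for _ in range(5000):
    candidate = rng.uniform(-1, 1, size=20)
    reward = rollout(candidate)
    if reward > best_reward:
        best, best_reward = candidate, reward

print("best reward (0 is perfect):", best_reward)
```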
Will we get there with say self-driving cars?
Of course we will.
It’s just not a priority right now. For now, it’s easier to teach cars a human-biased “civilized language” of the built environment.
Eventually, Fast and Furious: Tesla Drift will pit a properly “wilderness trained” driverless car against Vin Diesel, and Hollywood will rig the story so that Vin Diesel wins. Perhaps the car’s character will be voiced by a synthesized Bruce Willis voice, since the actor sadly suffers from aphasia now.
But let’s back out of this fascinating bunnytrail of kinesthetic envelopes and talk about the less familiar world of sensory envelopes.
Sensory Envelopes
In his evolving definition of what he calls “Newbots,” my roboticist friend Jascha Wilcox has been arguing that you can just dump a lot of cheap cameras into a robot design and meet most of your sensing needs that way, instead of adding a complex set of specialized sensors.
As a simple example, a camera pointed at a rotating wheel with a sufficiently high frame rate can determine how fast it is spinning, so you don’t need a specialized tachometer.
A camera with infrared sensitivity can roughly measure temperatures in the ranges of common interest (visible-spectrum light alone is enough for high temperatures, of course). If compute is cheap enough, you can infer material properties like masses, densities, and elasticities by analyzing geometries and kinematics in videos.
Cameras, in short, allow you to replace complex, multi-modal sensing with more raw compute. Assuming you have (or can deep-learn) the knowledge required.
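Here is a minimal sketch of the camera-as-tachometer example, assuming a single bright marker painted on the wheel rim, a camera roughly facing the hub, and a frame rate comfortably above the spin rate. The file name and threshold are made up, and none of this is drawn from an actual Newbots design.

```python
import cv2
import numpy as np

def wheel_rpm_from_video(path):
    """Estimate wheel spin rate by tracking one bright marker on the rim."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    times, angles = [], []
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        _, mask = cv2.threshold(gray, 220, 255, cv2.THRESH_BINARY)
        ys, xs = np.nonzero(mask)
        if len(xs) > 0:
            # Marker angle about the image center, as a proxy for wheel angle.
            cx, cy = gray.shape[1] / 2, gray.shape[0] / 2
            times.append(frame_idx / fps)
            angles.append(np.arctan2(ys.mean() - cy, xs.mean() - cx))
        frame_idx += 1
    cap.release()
    theta = np.unwrap(np.array(angles))
    omega = np.polyfit(times, theta, 1)[0]   # least-squares slope, rad/s
    return abs(omega) * 60 / (2 * np.pi)     # convert to RPM

# print(wheel_rpm_from_video("wheel.mp4"))   # hypothetical clip
```

The only dedicated hardware is the camera; everything else is compute applied to pixels.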
A lot of people have jumped on this insight, which is why we have a kind of growing vision-supremacism in robotics. I think it was triggered by Elon Musk arguing that LIDAR was unnecessary and you could do everything needed for self-driving with ordinary cameras (though of course, that is perhaps moot now, since LIDAR has evolved to the point that the new iPad Pro has it).
But what makes cameras so special among sensors? Why did vision evolve (genetically and culturally) to be the dominant sense for humans? Why does it seem poised to evolve into the dominant sense for robots too?
The general reason that applies to both animal vision and robot vision is that cameras and eyes present a nearly unbeatable combination of density and range. There are nearly 100 million photoreceptor cells in the human eye. Though it’s not a good way to think about it, the eye is apparently about 576 megapixels. The top commercially available camera is apparently comparable at about 400 MP. Even the iPhone 13, with a 12 MP sensor, can produce imagery past the human indifference point.
Whether we’re talking biological or CCD, vision sensors have really high density. They pack a lot of bitstreams into a very small sensor.
Think about it this way: if each pixel were an individual sensor that you tried to read with a GPIO pin on a microcontroller, you’d end up with an entirely ridiculous “brain.” We wouldn’t be able to build such a brain today. It would take more pins than we can put on a chip package. We only have high-density vision today because we can exploit the particular advantages of CCDs.
And this high sensor density comes with extraordinary range. You can see Andromeda with the naked eye! Spiders have evolved to orient by the Moon!
The next-most dense-and-long-range sensors in biology pale in comparison. Hearing has a decent spectral range of roughly 20 Hz to 20 kHz, and we can hear loud sounds from miles away. Dogs basically have a near-field olfactory “vision” with a fairly rich and dense sensory organ.
Taste? A coarse sensor array on tongues with a range of zero (the substance has to be in contact).
Touch? A very big but low-resolution sensor array (with multiple modes like pressure and temperature) spread across the 1.5-2 square meter surface of the human body, and again a range of approximately zero.
Proprioception? Kinda like generalized internal touch, with I suspect similar density/range parameters, but with specialized additions, such as the semi-circular canals in our ears which contribute to the balance and orientation aspect of proprioception.
I am not a biologist, but I’m guessing there’s a whole long tail of specialized “point” sensors in the body that operate via chemical transmission largely below conscious awareness, but we don’t think of these as generalized open-ended “senses” like vision.
For example, there are specialized photoreceptors in the eyes that help regulate our chronobiology in relation to day length. They are “time” organs rather than space organs.
But this long tail is better thought of as closed-loop signaling pathways for specific, relatively immutable fixed functions, with low potential for “unfixing,” especially in relation to the changing environment. Maybe I am wrong, but I doubt the signaling pathways of insulin regulation can be flexibly co-opted for some other function.
So to a reasonable first approximation, mammalian bodies have 5+1=6 open senses (counting proprioception as distinct from touch). Or rather open sensory fields. Each of these open sensory fields has a density over an extent, a range, and a flexible, functionally unfixed character that makes it programmable in relation to the environment.
In a given environment, they can be ranked in terms of general utility, in terms of contribution to the sensory envelope.
Our sensing abilities take the form of 6 open sensory fields that overlap to some extent in a way that allows for sensor-field fusion. The visual-auditory-olfactory fused sensing field is of course the most important for most mammals.
When we pay attention to something, we are closing the loop on one or more of these open sensor fields. Depending on how we relate to that thing, we may or may not be reshaping our body envelope.
Body Envelopes
The sensory envelope relates to the environment in a read-only way, while the kinesthetic envelope is write-only. Put them together and you get a body envelope.
This is why we can think of plants, which have very limited kinesthetic capabilities, as having a crude body envelope, and qualifying as robot-like. Plants have phototropic (orienting towards the sun) and negatively geotropic (away from gravity) kinesthetic envelopes. In very limited ways, within flowers for example, they might have a bit more ability to move and manipulate.
In animals, the mobility and manipulation needs of sensing are outsourced to those capabilities. Our head, if you think about it, is actually a limb much like the other four. Except that the end-effector is a spherical sensor payload pod of eyes/ears/nose rather than a manipulator (unless you count head-butts in fighting and headers in soccer).
This insight only recently sank in for me, when I was trying to design a general modular rover platform and realized I could design a generalized “limb” with swappable end effectors for various combinations of mobility, manipulation, and sensing. It’s not particularly efficient to unify the three kinds of “limbs” into one, but it’s useful and doable in some environments.
In general, it is not particularly useful to make a fundamental distinction between sensory and actuatory modalities for a robot.
We hold an object in our hand and rotate it to examine it with our eyes.
We bring it up to our noses to sniff or lick it.
We shake it to generate a noise we can hear.
We press objects against our skin to sense their relative temperature.
Our body design even blurs the distinction at a structural level. For example, fingertips are more sensitive in terms of touch, a case of the touch sensor field being extra concentrated at the most flexible manipulation locus. That collocation of sensory and motor capabilities is as important in our tactile capability as the opposability of thumbs. It allows for tight, delicate, closed-loop manipulation. This is the literal Fingerspitzengefühl — fingertips feeling — that we talk about metaphorically in Boydian strategy theory.
A dramatic perspective on this idea is the cortical homunculus, or “how the brain sees the body.” If you visualize the body in terms of the density of sensorimotor nerve endings, you get a very weird goblin-like creature. Click through to that Wikipedia article for various visualizations of this homunculus. You’ve always been in goblin-mode, you just didn’t realize it.
The coarse distinction is useful: sensory fields have a read-only relationship with the environment, kinesthetic fields have a write-only relationship with the environment. Together, they have a full read-write relationship, and the most important thing we read and write with this read-write ability is our own body envelopes.
So when you combine a fused multi-sensory field with a kinesthetic field, you get a body envelope — a kind of containing surface that mediates, and is mediated by, a read-write relationship with the environment, closing loops of being that are not closed internally within the minimalist biological body.
Normally, we think of a body envelope in terms of a “skin” that represents a separation between “me” and “the world,” and this is of course a very important and useful approximation in many situations — it is the boundary that we must enforce and defend for basic survival. It is the boundary that contains most of our risk. If you “save your skin” you’ve basically saved yourself.
But more generally, not everything we do with our bodies is usefully thought of as being bounded by the skin.
The body envelope is most usefully thought of as a subsymbolic combined kinesthetic-sensory field map of situated agency.
The cortical homunculus, in other words, changes its shape based on where you are and what you’re doing.
How do we know it’s a map rather than a territory? Because humans can experience phantom-limb syndrome.
Phantom Limbs
Human amputees experience phantom limb syndrome — the ghostly sense of the limb still being there. The prosaic physical reason of course is that though the limb is gone, the nerve endings are still there, and the mind that had wrapped itself around awareness of that limb hasn’t adapted.
There is an unprocessed misregistration between the constructed-and-experienced body and the sensed-and-actuated body. The eyes tell the amputee’s brain that the limb is missing, but this cannot be reconciled with the proprioceptive sense telling the brain that it is still there. The body envelope is in an inconsistent state. You have not one, but two cortical homunculi going at once. It is the embodiment edition of multiple-personality disorder.
The success of treatments like mirror therapy to treat “pain” in limbs that are “not there” tells us a lot about our minds and bodies, but in particular, it tells us that the mind models the body using something like an unconscious map. A map that can be wrong, and corrected.
In mirror therapy for example, you exploit the mirror symmetry of the normal human body to fool the brain into accepting a reflection of one limb as visual feedback of the state of the other limb. The brain then “unclenches” in a way that relieves pain in a limb that isn’t there.
Phantom limb syndrome reveals a lot.
It reveals that, to a first approximation, setting aside subtler questions of proprioceptive qualia, the body envelope is our sense of self.
It reveals that the mind models the body as a connected whole.
It reveals that the body envelope is essentially a computational construct.
It reveals that sensory modalities construct this envelope based on self-sensing of the environment in a way that prefigures the envelope.
It reveals that the body envelope is a dynamic thing, that changes shape and other properties based on movement and manipulation — and amputation.
Other related phenomena tell us other interesting consistent things. For example, the phenomenon of referred pain, wherein trauma to one part of the body is experienced in another part, tells us that the body envelope is kinda programmable to be functionally effective, even if it violates naive consistencies and correspondences.
Various sorts of synesthesia tell us that categories of qualia like “light” and “sound” are not biologically fundamental, but constructed to be effective and adaptive.
Phenomena like blindsight tell us that what we see is more than what we think we see. Neuroplastic rewiring of sensory sensitivities in one part of the body envelope to compensate for a loss in another (aka how Daredevil is a superhero) tells us that the whole shape of being is merely an adaptive fiction, not anything fundamental. Various sorts of gender dysphoria reveal just how challenging it is to deal with the malleable potentialities of the body envelope.
Yet, it is a fiction that undeniably has a certain integrity and overall consistency to it.
The body envelope is not just any old fiction, but an existentially necessary one that must meet certain standards to sustain a sense of being. And sensing seems to play a bigger role in this necessity than kinesthetics do. You can be a quadriplegic and still be a functioning genius physicist like Stephen Hawking was, but if all your senses are cut off you will become deeply mentally disabled. The truly disembodied brain cannot truly survive autonomously.
This is why I think of the embodiment of brains — and “embrainment of bodies”3 — as being central to AI, but that’s a debate for another day.
Let’s take stock of why the body envelope might have the characteristics it does.
Computational Efficiency is obviously one factor. Many computations involved in survival become easier if our normal body envelope is spatially coextensive with the touch sensor-field topology — ie our skin. Exploiting features like the physical symmetry of our body also leads to body envelopes with high integrity.
Situation Awareness is another factor. The body envelope changes shape and sensitivity based on the posture we adopt, and which parts of it we actively attend to. This can have quasi-physical effects. Just as a phased-array antenna can steer the beam of the antenna programmatically, the mind can steer the sensitivity of the body envelope in various directions. This lesson has recently been driven home for me via some lessons in the Alexander Technique I’ve been taking (from Michael Ashcroft’s Expanding Awareness online course, highly recommended), which is all about learning to manage the shape of your bodily awareness field.
Boundary Intelligence is another factor. I have a long twitter thread about this that I might write up at some point, but you can distinguish usefully between boundary and interior intelligence, and think of the former as the intelligence involved in managing information flows across your body envelope — what traditional computation thinks of as “I/O” is a great deal more in robotics and embodied computation.
But perhaps the deepest reason the body envelope is the way it is, is to sort the world properly into what we care about and what we don’t.
The body envelope contains those parts of the universe we care about the most, and do the most to protect and nurture. It is effectively an embodied opinion on the meaning of life.
This leads to interesting social-metaphoric extensions. In English, the “heart” is metaphorically extended to include loved ones (in Hindi, the liver — jigar or kaleja plays the same role).
We can imagine this extending to robots in obvious ways. Robot “minds” might supervene on robot “bodies” in interesting ways that don’t line up neatly to nominal individual body envelopes. Arguably biology already does this with eusocial insects, where “individuals” of some castes have more agency and perhaps experience a “body politic” envelope that extends to include the bodies of “individuals” of lower castes.
Imagine for instance, a circular robot swarm, where the outer ring of individuals is experienced by a single computational entity as a skin-like “body envelope.”
Humans experience this sort of thing in milder ways, through behaviors like dancing, team sports, and sex. But because the communication and shared computation bandwidth between individuals is (for now, though brain-to-brain interfaces are coming) extremely limited and transient, our body envelopes tend to be mostly individual-centric, most of the time.
But this obviously need not be the case for robots. Many logical robots might span many physical robots in non-obvious ways.
There is only one possible conclusion you can draw from all this rich phenomenology. The body envelope is the shape of caring. It is how a being chooses to be present in the world.
So to ask can robots yearn for phantom limbs is to ask can robots care?
Robotic Body Envelopes
Imagine you chop off a robot’s leg with a cleaver. Or a wheel unit. What happens?
In a primitive robot, nothing might happen. If the robot has no sensed and constructed model of the body envelope it might continue to operate as though nothing had changed. The code might continue to send on/off signals to motors that are no longer there. If there are sensors, there may be no way to interpret their silence as a change in the body envelope.
Depending on the programming model, what you’d get is some mix of errors and weird dysfunctional behaviors.
This is not a bad state to achieve, incidentally. Many simple insects appear to have that kind of programming. Their body envelope loops are not properly closed. They continue to display automaton-like behaviors when the function is obviously broken. I am not sure I believe insects can experience phantom limbs.
But more sophisticated organisms react differently. They seem to have an ongoing process of inspecting and updating the body envelope based on the assessed state of the assumed body.
Let’s talk about normal body envelope evolution first, before getting to phantom limbs.
For a robot to maintain a proper body envelope, it of course has to first detect changes in its current body envelope — and remember this can include elements of the extended phenotype, such as a stick it is holding, or a branch it is swinging from.
But mere detection is not enough. The changes have to be synthesized into an updated, coherent, and consistent new body envelope.
Some changes can be sensed or predicted internally, and the body envelope can change in anticipation. For example, if a robot intends to move a limb 30 degrees, it can update the body envelope appropriately. If it has a joint-angle sensor, it can regulate the limb angle to 30 degrees via closed-loop control, and depending on the accuracy and tolerances, update the body envelope to reflect the sensed rather than intended angle. Say a limit cycle between 29.8 and 30.2 degrees due to chatter in the servo motor (just as you can tell when your hand is shaking due to nervousness).
If it has generalized vision, it can inspect the new limb position in a more open-loop way, and make further adjustments and modulate the body envelope into a particular alignment relative to the environment.
This phenomenology can get arbitrarily complex, but as a starting point, we can distinguish among a few “normal” body envelope update regimes of increasing sophistication (sketched in toy code after the list):
Open-loop updates based on intended actions
Proprioceptive updates based on closed-loop specialized sensing
Environment-relative updates based on open-loop generalized self-sensing
Extended-phenotype sensing based on current extended sense of the body, including tools and environmental affordances
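Here is a toy sketch of how those four regimes might be layered in software. The names and structure are invented for illustration; this is not a claim about how any real robot stack represents its body.

```python
from dataclasses import dataclass

@dataclass
class JointState:
    intended_deg: float        # what the controller asked for (Level 1)
    encoder_deg: float = None  # what a dedicated joint sensor reports (Level 2)
    seen_deg: float = None     # what generalized self-vision estimates (Level 3)

class BodyEnvelope:
    """Toy body model: one joint plus an optional held object (Level 4)."""

    def __init__(self):
        self.joint = JointState(intended_deg=0.0)
        self.held_object_length = 0.0   # extended phenotype, e.g. a stick

    def best_estimate(self):
        """Prefer richer evidence when it is available (Level 3 > 2 > 1)."""
        j = self.joint
        for estimate in (j.seen_deg, j.encoder_deg, j.intended_deg):
            if estimate is not None:
                return estimate

    def command(self, angle_deg):
        self.joint.intended_deg = angle_deg          # Level 1: open loop

    def proprioceptive_update(self, encoder_deg):
        self.joint.encoder_deg = encoder_deg         # Level 2: closed loop

    def visual_update(self, seen_deg):
        self.joint.seen_deg = seen_deg               # Level 3: self-sensing

    def grasp(self, length):
        self.held_object_length = length             # Level 4: extended body

env = BodyEnvelope()
env.command(30.0)                # intends 30 degrees
env.proprioceptive_update(29.8)  # encoder reports 29.8 (servo chatter)
env.grasp(0.5)                   # now holding a half-meter stick
print(env.best_estimate(), env.held_object_length)
```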
The more sophisticated the body envelope, the more sophisticated the ways in which you can mess with it.
For example, suppose a wheeled robot has only level 1 sensing of its steering. It can in theory do dead reckoning. The following program should make a robot trace a square and bring it back to its starting point:
FORWARD 10
RIGHT 90 deg
FORWARD 10
RIGHT 90 deg
FORWARD 10
RIGHT 90 deg
FORWARD 10
A Level 2 error occurs if the steering motor is not accurate, which is not that interesting. But you can cause a Level 3 error if you simply pick up the robot while it’s trying to execute this program, rotate it randomly, and put it down again. It has no way of knowing! Its body envelope is not equipped with absolute orientation sensing. Unlike my cat, it does not know when it has been picked up and put down again.
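Here is a toy simulation of that Level 3 failure, assuming a robot doing pure dead reckoning on the square-tracing program above. The injected rotation stands in for being picked up and turned; the believed pose and the true pose silently diverge, because nothing in the body envelope can register the insult.

```python
import numpy as np

def dead_reckon(commands, external_yaw_deg=0.0, inject_after=3):
    """Run the square-tracing program by dead reckoning alone (Level 1)."""
    believed = np.zeros(3)   # x, y, heading (radians) the robot thinks it has
    true = np.zeros(3)       # the pose it actually has
    for i, (op, val) in enumerate(commands):
        if i == inject_after:
            true[2] += np.radians(external_yaw_deg)   # the undetected insult
        if op == "FORWARD":
            for pose in (believed, true):
                pose[0] += val * np.cos(pose[2])
                pose[1] += val * np.sin(pose[2])
        elif op == "RIGHT":
            for pose in (believed, true):
                pose[2] -= np.radians(val)
    return believed[:2], true[:2]

square = [("FORWARD", 10), ("RIGHT", 90)] * 3 + [("FORWARD", 10)]
print(dead_reckon(square))                        # both poses end at the origin
print(dead_reckon(square, external_yaw_deg=45))   # believed: origin; true: elsewhere
```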
Something like this can be done to humans by the way, and research from Nvidia has demonstrated that you can slightly shift the visual field of a VR headset during what are known as saccades (rapid unconscious eye movements during which you are temporarily “blind”) in a way that the brain won’t detect. This means you can get the person to walk in a circle while thinking they are walking in an infinite straight line (or a larger circle). This is one way to solve an important problem in VR — making small physical spaces serve for simulating large or even infinite ones.
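The mechanism is easy to caricature in a few lines, purely as an illustration: inject a sub-threshold yaw offset into the rendered view only while a saccade is in progress, and let the offsets accumulate. The 0.2-degree per-saccade gain below is an assumed figure, not the published threshold.

```python
def redirect_heading(virtual_yaw_deg, saccade_detected, gain_deg=0.2):
    """Nudge the virtual camera's yaw a fraction of a degree during a saccade.

    Each nudge stays below the (assumed) perceptual detection threshold, so
    the walker never consciously registers the rotation.
    """
    if saccade_detected:
        return virtual_yaw_deg + gain_deg
    return virtual_yaw_deg

# Over roughly 900 saccades (a few minutes of walking), the offsets add up to
# a half-turn of physical redirection hidden inside a "straight" virtual walk.
yaw = 0.0
for _ in range(900):
    yaw = redirect_heading(yaw, saccade_detected=True)
print(yaw)   # ~180 degrees of accumulated, unnoticed rotation
```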
Equivalents of biological phenomena like object permanence, saccade interpolation, and frame assumptions can all be exploited in interesting ways to hack robot behaviors too — but the robot has to have a sufficiently sophisticated sense of its own body envelope first! Otherwise you’re just vandalizing a dumb appliance in uninteresting ways.
And at some level of sophistication, you would get the equivalent of phantom limb syndrome.
The basic principle is not hard to grasp. Imagine a robot is holding a stick. Due to an accidental loosening of the grip, the stick is dropped but the robot doesn’t detect this. It still thinks it is holding the stick, and behaves accordingly.
Just substitute a loose limb for a stick and you’ve got primitive phantom limb syndrome. It’s effectively the same sort of issue as my simple earlier example. We’re just messing with body envelope integrity rather than “I think I’m traversing a square.”
But that’s just the beginning. What if the limb had a bunch of touch sensors and ripping it off severed the wires to those sensors?
Robots that Care
What if the brain estimated an energy budget based on actively sending power to the sensors and actuators of the severed limb, and the sensor return path, upon being severed, failed in a way that indicated “high energy use” that’s interpreted as “pain”? What if it could use a camera to detect the amputation and reprogram its boundary?
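One crude way to cash out that speculation in code, with invented names and a made-up power-draw heuristic standing in for “pain”:

```python
class LimbChannel:
    """One sensor/actuator channel on a limb, as the "brain" models it."""

    def __init__(self, name, expected_draw_watts):
        self.name = name
        self.expected_draw_watts = expected_draw_watts
        self.attached = True   # the brain's belief, not ground truth

def integrity_check(limbs, telemetry):
    """Compare the believed body envelope against live telemetry.

    `telemetry` maps channel name -> measured power draw, or None if the
    wire has been severed and the channel has gone silent.
    """
    report = []
    for limb in limbs:
        reading = telemetry.get(limb.name)
        if reading is None and limb.attached:
            # Silence on a channel the brain still believes in: the crudest
            # robotic analogue of a phantom limb, flagged as "pain" so that
            # higher-level behavior attends to it and re-maps the boundary.
            report.append((limb.name, "phantom / pain"))
        elif reading is not None and reading > 2 * limb.expected_draw_watts:
            report.append((limb.name, "overload / pain"))
    return report

limbs = [LimbChannel("left_leg", 5.0), LimbChannel("right_leg", 5.0)]
print(integrity_check(limbs, {"left_leg": 5.2, "right_leg": None}))
# [('right_leg', 'phantom / pain')]
```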
And obviously, incompletely detected discontinuous alterations of the body envelope can cause severe misregistrations and dysphorias in the sensed integrity of self — ie, phantom limb syndrome.
At this point, I think we’re getting to true robots — robots that can care. Because they’re meaningfully dividing the world into parts they care about, and parts they don’t, via a highly sophisticated and malleable body envelope. These are robots that experience media as extensions of self, like humans do. They have a subjectivity in a McLuhan sense. At least in a limited sense, there is something it is like to be such a robot.
Which means we’re talking robots that don’t just maintain idealized notions of body envelope, but have preferences relating to those notions. Preferences which can be revealed through behavior.
A sufficiently complex robot will not just be able to detect when a limb has been chopped off and experience phantom-limb syndrome in relation to it; it will be capable of caring about the lost ideal of bodily integrity, and yearning to get it back.
And to the extent the body envelope that enables that is a fluid, programmable one that can extend into the environment in material and social ways, the robot will be able to care about almost anything, and yearn for almost anything.
And that’s a kind of interestingness in robots that I think will matter even 500 years from now, long after today’s Spots, Atlases, Teslas and Asimos are distant historical memories. A kind of interestingness that would be interesting to give rise to, even if we ourselves don’t make it past the next century.
1. Thanks to Benjamin Bratton for this efficient characterization, at a Berggruen Institute event last week. The added degree of indirection makes all the difference in what one thinks of AI.
2. While there is a tradition of what is called evolutionary robotics, going back to Von Neumann’s universal constructor, it is at a very early stage of development, and there is no such thing as growing a robot from an egg via the interaction of an ontogenic process and an environment. We’ll probably get there by 2122, but let’s stick to 2022 for now.
3. Thanks to Nancy Baker Cahill for highlighting this duplexity, at the same Berggruen event.