Discussion about this post

Dylan Fitzgerald:

>>> Irony becomes necessary when ambiguity is so deeply embedded into the very essence of what you’re trying to talk about that trying to disassemble the ironic thought into constituent unambiguous parts destroys the thought itself. You can only think the thought at all in an ironic way.

I love this. And can't help but note that it rhymes with Le Guin:

>>> The artist deals with what cannot be said in words. The artist whose medium is fiction does this in words. The novelist says in words what cannot be said in words.

(Ursula K. Le Guin, The Left Hand of Darkness)

---

Separately: "I sincerely believe ironic AI will save the world" -- use of "sincerely" here is delicious.

Phil Getts:

TL;DR: Rao is repeating the errors of the post-modernists. He accuses LLMs of being unable to handle ambiguity and context-sensitivity, but it's precisely their ability to do so which enabled neural networks to succeed where symbolic AI failed. LLMs can handle just the right amount of ambiguity, by design. Rao wants too much ambiguity. We see from brains and LLMs that intelligent minds can't really learn or use concepts or beliefs which require "dense irony". Anyone who thinks they're doing so is fooling themselves, and will end up back in the dogmatism that Rao calls "sincerity".

Rao here posits a false dichotomy between "irony" and "sincerity", when both (as Rao defines them) are just obsolete remnants of the days before probability and real-valued measurements, when people thought all you could say of any proposition was "true", "false", or "I don't know". LLMs are the current epitome of the empiricist epistemology that /rescued/ us from that primitive worldview.

Buddhists, the Chinese School of Names, and post-modernists all arrived at this same skeptical view of language or logic because they saw it as the only alternative to the Rationalist metaphysics of antiquity. But we have an alternative now, and already know how it works in detail in neuroscience and in AI.

Post-modernists were vaguely aware there was some distinction between rationalism and empiricism, but they thought the Rationalist / Empiricist divide was Plato / Aristotle. Albertus Magnus and Raphael told them Aristotle was an empiricist, and they thought Aristotle, being the "first empiricist", defined empiricism forever. I'm not making this up. Derrida defended this position at length in an argument with John Searle ("Literary Theory and Its Discontents", New Literary History 25(3), 1994, pp. 637-667). He ridiculed Searle's claim that modern logic superseded Aristotle's, saying Aristotle's logic is the oldest and therefore the most definitive.

The upshot is that post-modernists just pointed out ways that Rationalism fails to explain language, then thought they'd disproved science. That's all there is to post-modern skepticism.

Context-sensitivity and toleration of uncertainty are good. But the way Rao uses the word "dense" reminds me of Barthes' "tissue", Deleuze's "rhizome", Geertz's "thick description", and the New Historicist sense of "dense" as "simultaneously maintaining two contradictory propositions." And when Rao says dense irony "destabilizes meaning", that pomo catchphrase is the smoking gun. He isn't talking about reasonable uncertainty and the use of probabilities; he's using the lingo of radical indeterminism. He might not be doing it consciously, but he is definitely echoing post-modernists, and repeating their buzzwords will be read as supporting their philosophy.

Rao's definition of sincerity has some etymological validity. It derives from Latin /sincerus/, meaning "whole, clean, pure, or unadulterated". And purity may be the defining value of ideological fanatics: purity of thought, of deed, of race.

But that etymology has been obsolete for a thousand years, and Rao's definition of sincerity as being intolerant of ambiguity or uncertainty strikes me as a straw man built to make his radical skepticism seem reasonable.

Radical skepticism is the opposite of radical certainty, but that doesn't make it the only alternative. A better alternative is understanding that a word doesn't point to an eternal, context-free transcendental form. Symbolic AI /is/ vulnerable to Rao's critique, and that's why it failed. In symbolic AI, a word is unchanging and indivisible. Attempts to define the context as the graph the word is embedded in, as in Saussure, have never worked well. But in neural networks, a word is a dynamic attractor or function. In a listener, it activates the most likely candidates from a larger set of learned representational structures or functions associated with that word, according to their predicted relevance in the current context.

This is just what the multi-head attention mechanism does in a transformer-architecture LLM (Vaswani et al. 2017, "Attention Is All You Need", arXiv:1706.03762). It's also how the rat olfactory bulb recognizes scents (Skarda & Freeman 1987, "How brains make chaos in order to make sense of the world", Behavioral and Brain Sciences 10(2): 161-195).
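
To make that concrete, here's a minimal single-head sketch of scaled dot-product attention in NumPy. Everything here is a toy: random weights, tiny dimensions, and none of the multiple heads, causal masking, or learned per-layer structure of a real transformer. But the core move, weighting candidate meanings by their contextual relevance, is all in these few lines:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Each query re-reads the sequence: it blends the values V,
    weighted by how relevant each key is to it in this context."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # relevance of every token to every other
    weights = softmax(scores, axis=-1)  # a probability distribution over candidates
    return weights @ V                  # context-weighted blend of candidate meanings

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                  # 5 tokens, 8-dim embeddings (toy sizes)
X = rng.normal(size=(seq_len, d_model))  # token embeddings for one sentence
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape)                         # (5, 8): each token, re-read in context
```

The same word (the same row of X) comes out differently depending on what surrounds it, which is exactly the "dynamic attractor" picture above: context selects among candidate readings.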

In both cases, the selection of the likeliest candidates begins, rather than ends, with an unstable state in which any meaning is possible. The context provides one or more nudges into a lower-energy state, from which fewer meanings are accessible.

The rat olfactory bulb begins in a chaotic state that passes near the attractors for all of the scents the rat knows. A learned scent nudges the bulb off that chaotic attractor and onto a stable oscillation, via a single Hopf bifurcation.
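
For anyone who wants the dynamics made concrete, the Hopf normal form is the simplest caricature of that switch. Below is a toy Euler simulation of its radial equation, dr/dt = mu*r - r^3 (the angular part just spins at a constant rate). The stimulus's job is to push the parameter mu past zero, which flips a decaying rest state into a stable oscillation of amplitude sqrt(mu). To be clear, this is only an analogue: in Skarda & Freeman's model the baseline state is chaotic, not a quiet fixed point.

```python
# Radial part of the Hopf normal form: dr/dt = mu*r - r^3.
# mu <= 0: activity decays to rest. mu > 0: stable oscillation at r = sqrt(mu).
def simulate(mu, r0=0.1, dt=0.01, steps=5000):
    r = r0
    for _ in range(steps):
        r += dt * (mu * r - r**3)   # Euler integration step
    return r

print(round(simulate(mu=-0.5), 3))  # ~0.0  : no learned scent, bulb settles to rest
print(round(simulate(mu=+0.5), 3))  # ~0.707: scent present, stable oscillation (sqrt(0.5))
```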

The olfactory bulb makes just one decision. A transformer decoder producing output token n+1 makes one decision per layer. The attention step (reading from the KV cache) adds contextual information (energy) to the word embedding (a destabilization), and then the feed-forward network minimizes that energy, using the added information to reduce the number of alternative readings (a stabilization step). The network can only learn if the stabilization consistently outpaces the destabilization.
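
Schematically, one decoder layer is a pair of residual steps; the destabilize/stabilize reading is my gloss on them, not standard terminology. Here's a stripped-down sketch (a pre-LayerNorm block with no learned projections, heads, or masking, and random toy weights):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8                                 # toy embedding width

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def attn(x):
    # simplified self-attention: one head, no projection matrices
    return softmax(x @ x.T / np.sqrt(d)) @ x

W1 = rng.normal(size=(d, 4 * d)) * 0.1
W2 = rng.normal(size=(4 * d, d)) * 0.1

def ffn(x):
    return np.maximum(x @ W1, 0.0) @ W2   # two-layer MLP with ReLU

def decoder_layer(x):
    x = x + attn(layer_norm(x))   # attention: inject contextual information (destabilize)
    x = x + ffn(layer_norm(x))    # feed-forward: settle on fewer readings (stabilize)
    return x

x = rng.normal(size=(5, d))      # embeddings for 5 tokens
print(decoder_layer(x).shape)    # (5, 8); a real model stacks dozens of these layers
```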

You don't end up in the destabilized state. In both rat brain and LLM, the training algorithm guarantees that the network only learns concepts that can usually be stabilized. All neural network training algorithms, from the perceptron to the transformer to the rat brain, work by energy minimization. Learning algorithms map memories and concepts to low-energy neuron activation patterns, meaning that the activation levels of two nodes connected by a link in the pattern are correlated with the learned weights between them.

At retrieval time, the network begins in a high-energy (semi-random) state. The retrieval process uses some context, such as sense data, memory, or preceding text, to nudge the activation states of some nodes one way or another. Some form of energy minimization then nudges all the activation states in a "downhill" (lower energy) direction. Eventually, the network rolls down to the bottom of some pit in the energy landscape, and (hopefully) finds a memory there that's both a correct instance of the concept retrieved, and contextually appropriate.
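
The classic Hopfield network shows this whole retrieval story in a few lines, and it's the cleanest existence proof. Hebbian learning carves each memory into an energy minimum; retrieval starts from a corrupted, higher-energy cue, and asynchronous updates roll downhill to the nearest stored pattern. All sizes and the corruption level here are toy values:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 64
memories = rng.choice([-1, 1], size=(3, n))   # three stored +/-1 patterns

# Hebbian learning: units that fire together get positive weights,
# which makes each stored pattern a low-energy state.
W = sum(np.outer(p, p) for p in memories) / n
np.fill_diagonal(W, 0)

def energy(s):
    return -0.5 * s @ W @ s

def retrieve(cue, sweeps=5):
    s = cue.copy()
    for _ in range(sweeps):
        for i in rng.permutation(n):          # asynchronous updates: energy never increases
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

cue = memories[0].copy()                      # context = a corrupted version of memory 0
flip = rng.choice(n, size=12, replace=False)
cue[flip] *= -1                               # the "high-energy, semi-random" starting state

out = retrieve(cue)
print("energy: cue =", round(energy(cue), 1), "-> retrieved =", round(energy(out), 1))
print("matches memory 0:", np.array_equal(out, memories[0]))  # True for mild corruption
```

The network can't sustain the corrupted, high-energy cue; the dynamics force it downhill into some learned pattern.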

Rao hits on this when he writes, "The way transformers (and to a lesser extent, diffusion models) work, output cannot do any kind of dense layering of meaning. You will end up in a non-ironic place simply by virtue of how the mathematics works." But this applies to /all/ statistical learning algorithms, even ones that don't use energy minimization. By definition, they can only learn statistically valid concepts, not unstable or indeterminate ones. For this not to apply to humans, we wouldn't just have to use radically different learning algorithms than LLMs (and we don't); our brains would have to be so fundamentally flawed that we'd be better off just trusting the LLMs.

Rao describes "dense irony" as holding incompatible thoughts, which is precisely what a high-energy state is. That's a /starting state/ for all known neural memory algorithms. A thought that can only be thought in an ironic way cannot be thought. Neither you, nor an LLM, can remember or sustain a high-energy state. Even if you could, by some act of will, keep your mind in a high-energy state, it wouldn't be thinking. It would just be emptiness. You would be preventing your brain from operating. For all I know it might feel really nice, giving you something like an average of all your life's feelings and experiences. It might even be addictive. But it would be meaningless. If you practiced this every day for years, somehow holding the same high-energy pattern in your mind at length, all that would happen is that your brain would adjust connections between neurons to make it a low-energy state, easy to enter; but it wouldn't give you some special post-modern insight into the world. You would be training yourself to enter an empty state that says nothing about your situation in the real world.

My experience is that trying to hold such "ironic" states in mind is usually disastrous. Most people who try either withdraw from the world or become ideological fanatics. Individual post-modernist scholars such as Foucault may remain truly "ironic" and skeptical, but most don't. Derrida, who wrote an entire book arguing that books can't say anything, was himself absurdly certain of the truth of his words. Witness his arrogance in his dispute with Searle, on a subject Searle was an expert in and Derrida knew nothing about.

And their political disciples are /certainly/ certain of their beliefs. Post-modernism is never used in politics to withhold judgement or entertain compromise. That's what Enlightenment liberals do. It's only used to shout down arguments made by political opponents--usually Enlightenment liberals.

That's because that's what it was made to do. "Dense irony" is just another term for doublethink. Post-modernism wasn't invented by the French. It was invented by the Nazis, to attack the core principles of the Enlightenment, and that is what it will always inevitably do. The Nazis used it to legitimize Will to Power, strife, dictatorship, "authenticity", anecdotal "lived experience", "spirit", and racial identity over statistics, measurements, compromise, individualism, and the mechanistic operations of liberal institutions.

(Continued in a reply to this comment.)
