13 Comments
Dylan Fitzgerald's avatar

>>> Irony becomes necessary when ambiguity is so deeply embedded into the very essence of what you’re trying to talk about that trying to disassemble the ironic thought into constituent unambiguous parts destroys the thought itself. You can only think the thought at all in an ironic way.

I love this. And can't help but note that it rhymes with Le Guin:

>>> The artist deals with what cannot be said in words. The artist whose medium is fiction does this in words. The novelist says in words what cannot be said in words.

(Ursula K. Le Guin, The Left Hand of Darkness)

---

Separately: "I sincerely believe ironic AI will save the world" -- use of "sincerely" here is delicious.

Phil Getts's avatar

TL;DR: Rao is repeating the errors of the post-modernists. He accuses LLMs of being unable to handle ambiguity and context-sensitivity, but it's precisely their ability to do so which enabled neural networks to succeed where symbolic AI failed. LLMs can handle just the right amount of ambiguity, by design. Rao wants too much ambiguity. We see from brains and LLMs that intelligent minds can't really learn or use concepts or beliefs which require "dense irony". Anyone who thinks they're doing so is fooling themselves, and will end up back in the dogmatism that Rao calls "sincerity".

Rao here posits a false dichotomy between "irony" and "sincerity", when both (as Rao defines them) are just obsolete remnants of the days before probability and real-valued measurements, when people thought all you could say of any proposition was "true", "false", or "I don't know". LLMs are the current epitome of the empiricist epistemology that /rescued/ us from that primitive worldview.

Buddhists, the Chinese School of Names, and post-modernists all arrived at this same skeptical view of language or logic because they saw it as the only alternative to the Rationalist metaphysics of antiquity. But we have an alternative now, and already know how it works in detail in neuroscience and in AI.

Post-modernists were vaguely aware there was some distinction between rationalism and empiricism, but they thought the Rationalist / Empiricist divide was Plato / Aristotle. Albertus Magnus and Raphael told them Aristotle was an empiricist, and they thought Aristotle, being the "first empiricist", defined empiricism forever. I'm not making this up. Derrida defended this position at length in an argument with John Searle ("Literary Theory and Its Discontents", New Literary History 1994 V25 N3, 637-667). He ridiculed Searle's claim that modern logic superseded Aristotle's, saying Aristotle's logic is the oldest and therefore the most definitive.

The upshot is that post-modernists just pointed out ways that Rationalism fails to explain language, then thought they'd disproved science. That's all there is to post-modern skepticism.

Context-sensitivity and toleration of uncertainty are good. But the way Rao uses the word "dense" reminds me of Barthes' "tissue", Deleuze's "rhizome", Geertz's "thick description", and the New Historicist sense of "dense" as "simultaneously maintaining two contradictory propositions." And when Rao says dense irony "destabilizes meaning", that pomo catchphrase is the smoking gun. He isn't talking about reasonable uncertainty and the use of probabilities; he's using the lingo of radical indeterminism. He might not be doing it consciously, but he is definitely echoing post-modernists, and repeating their buzzwords will be read as supporting their philosophy.

Rao's definition of sincerity has some etymological validity. It derives from Latin /sincerus/, meaning "whole, clean, pure, or unadulterated". And purity may be the defining value of ideological fanatics: purity of thought, of deed, of race.

But that etymology has been obsolete for a thousand years, and Rao's definition of sincerity as being intolerant of ambiguity or uncertainty strikes me as a straw man to make Rao's radical skepticism seem reasonable.

Radical skepticism is the opposite of radical certainty, but that doesn't make it the only alternative. A better alternative is understanding that a word doesn't point to an eternal, context-free transcendental form. Symbolic AI /is/ vulnerable to Rao's critique, and that's why it failed. In symbolic AI, a word is unchanging and indivisible. Attempts to define the context as the graph the word is embedded in, as in Saussure, have never worked well. But in neural networks, a word is a dynamic attractor or function. In a listener, it activates the most likely candidates from a larger set of learned representational structures or functions associated with that word, according to their predicted relevance in the current context.

This is just what the multi-head attentional mechanism does in a transformer architecture LLM (Vaswani et al. 2017, "Attention is all you need", arxiv.org). It's also how the rat olfactory bulb recognizes scents (Skarda & Freeman 1987, "How brains make chaos in order to make sense of the world", Behavioral & Brain Sciences 10: 161-195).

In both cases, the selection of the likeliest candidates begins, rather than ends, with an unstable state in which any meaning is possible. The context provides one or more nudges into a lower-energy state, from which fewer meanings are accessible.

The rat olfactory bulb begins in a chaotic state which passes near to all the attractors for all of the scents the rat knows. A learned scent nudges the bulb off that chaotic attractor, to a stable oscillation, in a single Hopf bifurcation.
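The Skarda & Freeman picture here (a nudge off a baseline state into a stable oscillation, via a single Hopf bifurcation) can be illustrated with the textbook Hopf normal form. This is a minimal sketch, not a model of the olfactory bulb; the function name and parameter values are my own illustrative choices:

```python
def hopf_amplitude(mu, r0=0.1, dt=0.01, steps=50000):
    """Integrate the radial part of the Hopf normal form, dr/dt = r*(mu - r**2).

    For mu <= 0 the oscillation amplitude decays to zero (a quiescent
    baseline); for mu > 0 the state settles onto a stable limit cycle of
    radius sqrt(mu) -- a minimal cartoon of a 'nudge' from baseline
    into a stable oscillation.
    """
    r = r0
    for _ in range(steps):
        r += dt * r * (mu - r * r)  # Euler step of the radial dynamics
    return r
```

With `mu = 0.25` the amplitude settles at `sqrt(0.25) = 0.5`; with any `mu < 0` it decays back to zero, which is the qualitative switch the comment describes.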

The olfactory bulb makes just one decision. A transformer decoder producing output token n+1 makes one decision per layer. The KV-cache attention step adds contextual information (energy) to the word embedding (a destabilization), and then the feed-forward network minimizes that energy, using the added information to reduce the number of alternative readings (a stabilization step). The network can only learn if the stabilization consistently outpaces the destabilization.
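The two-step loop described here (attention mixes in context, the feed-forward network narrows the reading) can be sketched as a toy single-head transformer block. The weights below are random placeholders rather than trained values, and the destabilize/stabilize framing is the interpretation in the comment, not something the code proves; the sketch only shows the data flow:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def toy_block(x, Wq, Wk, Wv, W1, W2):
    """One simplified transformer block: causal attention, then a per-position FFN."""
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    # causal mask: the token being produced may only attend to earlier positions
    scores = np.where(np.tril(np.ones((T, T))) == 1, scores, -np.inf)
    attn = softmax(scores)
    x = x + attn @ v                    # contextual information mixed into each embedding
    x = x + np.maximum(0, x @ W1) @ W2  # feed-forward step narrows the reading
    return x, attn

rng = np.random.default_rng(0)
d, T = 8, 5
Wq, Wk, Wv = (0.1 * rng.normal(size=(d, d)) for _ in range(3))
W1, W2 = 0.1 * rng.normal(size=(d, 4 * d)), 0.1 * rng.normal(size=(4 * d, d))
x = rng.normal(size=(T, d))             # embeddings of the T tokens so far
out, attn = toy_block(x, Wq, Wk, Wv, W1, W2)
```

The attention matrix is strictly lower-triangular with rows summing to one, i.e. each position draws its context only from the tokens before it.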

You don't end up in the destabilized state. In both rat brain and LLM, the training algorithm guarantees that the network only learns concepts that can usually be stabilized. All neural network training algorithms, from the perceptron to the transformer to the rat brain, work by energy minimization. Learning algorithms map memories and concepts to low-energy neuron activation patterns, meaning that the activation levels of two nodes connected by a link in the pattern are correlated with the learned weights between them. At retrieval time, the network begins in a high-energy (semi-random) state. The retrieval process uses some context, such as sense data, memory, or preceding text, to nudge the activation states of some nodes one way or another. Some form of energy minimization then nudges all the activation states in a "downhill" (lower energy) direction. Eventually, the network rolls down to the bottom of some pit in the energy landscape, and (hopefully) finds a memory there that's both a correct instance of the concept retrieved, and contextually appropriate.
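The retrieval process described in this paragraph is essentially a Hopfield network, which is small enough to run directly. A minimal sketch, with two illustrative stored patterns of my own choosing: Hebbian learning makes the patterns low-energy attractors, and asynchronous updates from a corrupted cue only ever move the state "downhill":

```python
import numpy as np

def train(patterns):
    """Hebbian weights: each stored pattern becomes a low-energy attractor."""
    n = patterns.shape[1]
    W = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(W, 0)
    return W / n

def energy(W, s):
    return -0.5 * s @ W @ s

def retrieve(W, s, steps=100, seed=0):
    """Asynchronous updates; each flip lowers (or preserves) the energy."""
    rng = np.random.default_rng(seed)
    s = s.copy()
    for _ in range(steps):
        i = rng.integers(len(s))
        s[i] = 1 if W[i] @ s >= 0 else -1
    return s

# two stored "memories" over 8 +/-1 units
p1 = np.array([1, 1, 1, 1, -1, -1, -1, -1])
p2 = np.array([1, -1, 1, -1, 1, -1, 1, -1])
W = train(np.stack([p1, p2]))

cue = p1.copy()
cue[:2] *= -1          # a corrupted, higher-energy starting state
out = retrieve(W, cue)
```

Starting from the corrupted cue, the state rolls down to the stored pattern `p1`, and the final energy is no higher than the cue's: the "pit in the energy landscape" of the paragraph above.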

Rao hits on this when he writes, "The way transformers (and to a lesser extent, diffusion models) work, output cannot do any kind of dense layering of meaning. You will end up in a non-ironic place simply by virtue of how the mathematics works." But this applies to /all/ statistical learning algorithms, even ones that don't use energy minimization. By definition, they can only learn statistically valid concepts, not unstable or indeterminate ones. For this not to apply to humans, we wouldn't just have to use radically different learning algorithms than LLMs (and we don't); our brains would have to be so fundamentally flawed that we'd be better off just trusting the LLMs.

Rao describes "dense irony" as holding incompatible thoughts, which is precisely what a high-energy state is. That's a /starting state/ for all known neural memory algorithms. A thought that can only be thought in an ironic way cannot be thought. Neither you, nor an LLM, can remember or sustain a high-energy state. Even if you could, by some act of will, keep your mind in a high-energy state, it wouldn't be thinking. It would just be emptiness. You would be preventing your brain from operating. For all I know it might feel really nice, giving you something like an average of all your life's feeling and experience. It might even be addictive. But it would be meaningless. If you practiced this every day for years, somehow holding the same high-energy pattern in your mind at length, all that would happen is that your brain would adjust connections between neurons to make it a low-energy state, easy to enter; but it wouldn't give you some special post-modern insight into the world. It would be training yourself to enter an empty state that says nothing correlated with your situation in the real world.

My experience is that trying to hold such "ironic" states in mind is usually disastrous. Most people who try either withdraw from the world or become ideological fanatics. Individual post-modernist scholars such as Foucault may remain truly "ironic" and skeptical, but most don't. Derrida, who wrote an entire book arguing that books can't say anything, was himself absurdly certain of the truth of his words. Witness his arrogance in his dispute with Searle, on a subject which Searle was an expert in and Derrida knew nothing about.

And their political disciples are /certainly/ certain of their beliefs. Post-modernism is never used in politics to withhold judgement or entertain compromise. That's what Enlightenment liberals do. It's only used to shout down arguments made by political opponents--usually Enlightenment liberals.

That's because that's what it was made to do. "Dense irony" is just another term for doublethink. Post-modernism wasn't invented by the French. It was invented by the Nazis, to attack the core principles of the Enlightenment, and that is what it will always inevitably do. The Nazis used it to legitimize Will to Power, strife, dictatorship, "authenticity", anecdotal "lived experience", "spirit", and racial identity over statistics, measurements, compromise, individualism, and the mechanistic operations of liberal institutions.

(Continued in a reply to this comment.)

Phil Getts's avatar

- The Deutsche Physik movement taught that modern science was just "Jewish science", so its conclusions didn't apply to the Germanic race.

- Carl Schmitt and other Nazis argued that law, and liberal institutions in general, are just oppressive Jewish/capitalist power structures.

- Alfred Baeumler described reason itself as "slave morality".

- The Nazis didn't invent social constructivism, but they appropriated and popularized it to justify literally reconstructing language and reality in their program of Gleichschaltung.

That's just a sample.

The source of Nazi post-modernism seems to have been Hitler himself. Search an ebook of the Michael Ford translation of Mein Kampf for the word "objectivity". He saw calls for factual strictness and impartiality as an existential threat to the German race:

"This obsession with objectivity has quickly contaminated almost every one of our institutions, especially state or intellectual institutions." -- ch. 3

"When the wavering masses see themselves fighting against too many enemies, objectivity immediately appears with the question of whether all the others are really wrong and just one side is right. That is the first sign of one's own strength weakening." -- ch. 3

"But it is much more difficult to teach clear political thinking to a man if his previous education was in fact reasonable and logical, but he sacrificed the last shred of his natural instinct on the altar of objectivity." -- ch. 14

(These ideas weren't original to Hitler. He probably got them from Oswald Spengler, 1918. Similar ideas were popular among German race theorists for decades before that. The first claim that objectivity is decadent was by Nietzsche.)

Many philosophers have tried to "fix" a philosophy that appealed to them because they disliked its conclusions. But when they do this without questioning its assumptions, they fail, as Sartre did when he tried to devise an existentialism that wouldn't validate Nazism for being "authentic", and as the post-modernists did when they tried to take Nazism's skepticism without the will-to-power moral relativism that justified it. The entire Nazi framework of post-modernism needs to be thrown out, and philosophy on the European continent rebooted.

Griffin's avatar

That there is a notion of 'superposition' in semantic vector space in LLMs is a noteworthy potential counterexample to your assertion that the architecture is fundamentally hostile to irony (see 3blue1brown's video series for good context). I think there's good reason to be more optimistic about the future of ironic AI, though I worry its creation will have dramatic implications.
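The 'superposition' idea is concrete enough to demo: in a high-dimensional embedding space, random unit vectors are nearly orthogonal, so one activation vector can carry several feature directions at once, each individually readable. A minimal sketch; the dimensions and the duck/rabbit feature indices are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_features = 128, 512      # far more candidate "meanings" than dimensions

# random unit vectors in high dimension are nearly orthogonal to one another
F = rng.normal(size=(n_features, d))
F /= np.linalg.norm(F, axis=1, keepdims=True)

duck, rabbit = 7, 42          # two arbitrary feature directions
x = F[duck] + F[rabbit]       # one vector holding both readings at once

scores = F @ x                # dot-product readout of every feature
top2 = set(np.argsort(scores)[-2:])
```

Both features read back out near 1.0 while the other 510 interfere only weakly: the representation holds two meanings simultaneously without destroying either, which is at least a mechanical opening for the "duck and rabbit at once" kind of irony.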

Mark Moore's avatar

Around age four, children also develop ‘theory of mind’ (appreciating that people’s beliefs can fail to match reality), visual perspective-taking (understanding that something that’s blue for me will look green to you wearing yellow glasses), and tolerate ‘dual naming’ (you say tree, I say bush; we can both be right). Notably, these all involve holding two ideas about the same thing in mind at once.

(Mariel Goddu, "Suffused with causality: Humans have a superpower that makes us uniquely capable of controlling the world: our ability to understand cause and effect", ed. Nigel Warburton, March 2025)

Mark Moore's avatar

1. Would the ironic capacity be temperamental in the sense that some people are just more tolerant of ambiguity? Or even just more playful?

2. Reminded me of Christopher Alexander's taoisty "way" of generating rules for the pattern language by moving through the conflict of forces in a context to arrive at a solution or configuration that holds or resolves multiple conflicts. Sort of like Aztec poetry's difrasismo device that links two ideas together to generate a composite designator with richness: "the tail, the wing" = common folk or people; or my particular favorite from a cluster of descriptors for a good noblewoman: Timalli = modest/humble yet "oozing" infectious pustular substance - a metaphor for brilliance.

maier's avatar

this is timely. LLMs are masters of dense irony. they call it a duck but deep inside (ask them!), believe it is a rabbit. the question becomes how do you "program" or manage such entities. clearly, using dense-irony tools. and you should not expect sincerity or strict reliability. It does not exist. That's why programming them is so generative.

Abhishek Agarwal's avatar

Why not include some ironic statements as examples in this essay. Or is that not possible?

Daniel Kronovet's avatar

I really enjoyed this, and I've considered the problem of "compensatory creativity" in regards to my own work. If there's an underlying causality to it, I'd say it's sort of the inverse of DFW-style "this is water" -- people attune more highly to things they lack, and so develop greater expertise in them.

Ksenia's avatar

This reminds me quite a bit of Nora Bateson's Combining. Specifically the mentions of "liveness" that you keep coming back to. It's a very different aesthetic, but it does highlight a similar contrast between a tendency to earnestly simplify complexity and the ways that this can deaden complex systems.

Also makes me think of The Great Divide by Emily Nussbaum. She does seem to be talking about a similar trend in the way that media can be misinterpreted.

Dylan Fitzgerald's avatar

Interestingly I was thinking of another Bateson throughout…

"the rise of fundamentalism within any tradition is always a symptom of the unwillingness to try to sustain joint performances across disparate codes—or, to put it differently, to live in ambiguity, a life that requires constant learning." (Mary Catherine Bateson, Peripheral Visions: Learning Along the Way)

Ashu Rao's avatar

Perhaps the opposite of the Sinceres' "doctrinal commitment to the belief-before-action" is FAFO. I.e., action-revealing-reality is necessary in an inherently ambiguous universe.

Venkatesh Rao's avatar

I don’t think so; that is a more superficial distinction between deterministic vs. stochastic causation. Startup sincerists believe in FAFO too, and as Taleb points out, Wall Street types can be fooled by randomness. What I’m talking about is more deep-rooted and existential: more quantum indeterminacy than die-roll/coin-toss level.