vgr: The Twitter Years (2007-22)
I made my twitter into a very nice online book
I just finished my most complex vibe-coding project yet: Converting my twitter archive into a book comprising 101 of my best threads plus a chapter with 396 of my best single tweets.
The online version is free and live at venkateshrao.com/twitter-book. Here’s a brief excerpt from the newly written Preface chapter, to give you a taste:
This book is an attempt to capture the essence of my 15 years as an active twitter user (I’m going to use the lowercase spelling except when referring to named subcultures within twitter), under the handle @vgr, in a form that does not entirely murder the spirit of the live experience of being there, enmeshed in hundreds of live-wire conversations unfolding over years, through an era when the platform was the place the narrative of our world unfolded. In the chapters that follow, you’ll find a compendium of a few hundred of my best single tweets (Chapter 1), and 101 of my best threads (Chapters 2-102). That’s a small fraction of the 150k+ tweets I posted through the years this book covers, but hopefully it’s an interesting distillation. I’m still on there, though I mostly only browse the feed. I no longer post actively except for the rare boost of stuff I, or friends, are up to elsewhere.
Through 2007-22, twitter was neither the biggest social media site, nor the most representative. But it was the most consequential place, not just on the internet, but arguably the planet. Entire political regimes rose and fell, careers were made and destroyed, vast cultural movements arose and died down. Other platforms may have featured more aggregate activity, and accumulated orders of magnitude more social dark matter, but twitter was where events broke into the main currents of history. The suburbs of reddit may have accumulated deep intelligence, but twitter was where some were promoted to historic consequentiality. Facebook ads and groups may have shaped elections, but twitter was where we collectively decided what it all meant. YouTube might have been where endless warrens of conspiratorial imaginaries were constructed, but twitter was where we determined which ones were going to shape the almighty Discourse. 4chan might have produced many world-changing memes, but twitter was where that world-changing actually played out.
But twitter was more than a distribution zone for culture manufactured elsewhere. Increasingly, it became the site of cultural production. As a blogger who initially signed up to promote my posts on Ribbonfarm, I initially thought of it as a successor to RSS. Dumb pipes, just stochastic rather than deterministic. But it quickly became clear that was an absurdly bad mental model. Twitter was the tail that would come to wag the rest of the social-media dog. You can read my very early 2007-era understanding of twitter in this old blog post, The Twitter Zone and Virtual Geography. Now, nearly 20 years later, that mental model feels, not wrong per se (it was sophisticated for its time), but charmingly naive. What we thought was a low-stakes global office watercooler turned out to be the site of future epistemic world wars, in which the fates of civilizations would be decided.
Read the whole Preface here.
The book weighs in at 119,000 words (the future print version will be approximately 350-400 pages). This represents about 0.26% of my twitter data. I have plans to transform the rest of my archive into some sort of MCPified queryable oracle thingie on IPFS (probably merged with ribbonfarm archives), but right now, the site is already set up to be very LLM-friendly. Give the link to your LLM and you’ll be able to chat with it about the contents.
This is a production beta. Please post comments with typos and other issues here. I will be doing further clean-up gradually, though this is already pretty clean.
This online version is pretty snazzy. You can hover over the link emoji to the right of any included tweet or chapter title to copy it for sharing. I deliberately chose not to include likes/retweets data, in part because it made the presentation look cluttered, and in part because the data is obsolete anyway since this archive is from late 2022 when I stopped posting. But mainly because I think this new form factor allows the focus to be on the actual content of what I was posting rather than the stale social proof indicators attached to it.
Print and ebook versions are next (make sure you’re subscribed to this email list to be alerted when those versions are available to buy).
Conversational Context
The hardest problem was deciding what to do about conversational context. Ultimately I decided not to include anyone else’s tweets, but instead link them in footnotes. Not just for copyright reasons, but because the information presentation problem suddenly gets very complex. Yes, this butchers the nonlinear conversational nature of twitter at its best, but on balance I figured this butchered serialization was the right way to do this.
The full twitter experience is not really serializable into a book-like artifact, and I decided not to try. Maybe if enough users from the twitter years do what I’ve done (download their archives and host them in an LLM-friendly public-commons form), someone could do a larger project recreating a kind of time-capsule theme park version of at least pockets of old twitter, frozen around 2022 November before Musk took over. As far as I can tell, X is not going to be friendly to or supportive of such a project, not just because it is politically hostile to old twitter, but because that historical data is now a competitive advantage for training Grok.
But you have rights to your slice of it too, so you should do something like this if you think your archives are valuable for completing the larger picture of what old twitter was like.
I think there are a few projects like this already underway. Somebody reached out at some point asking me to put my archives on their site, but I’m wary of that re-aggregation approach. I think it is best if we all individually put our archival digital selves online like this, and make them public. That way, we don’t trade one aggregation play for another.
Archival Selves
More broadly, this project was the opening battle in what I consider a longer campaign to craft an “Act 1” archival self, based on my online activity, 2007-2024, up to when I retired ribbonfarm. Call it vgrAct1.ghost or something (is .ghost a tld? It should be). A conceit perhaps, but also fun. It’s going to include all of the ribbonfarm content (the boss level), my book Tempo, this newsletter’s content until the Contraptions rebrand, plus perhaps also Quora content and other random stuff I have scattered around.
It feels weirdly liberating to archive even a small slice of a past self this way. Also practically useful, since my memory has gone from exceptional to shitty in the last few years, and I’d like a version of myself with better memory to talk to as I get older. The online version of this book with its planned oracularized features is going to be more than just a bookified mirror of my twitter account. It’s going to be a personalized prosthetic memory for me.
Production Backstory
Though the final outcome hopefully looks like a well-designed conventional book, the production process was anything but.
It took a serious amount of wrangling with specialized scripts to extract and process my best threads (fortunately I’d made an index thread of threads towards the end which helped) and surface the best single tweets. There was also a lot of grimy data cleaning to do, handling images and links and so on. Plus broken threads to patch, quote-continuations, etc.
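To give a flavor of the thread-extraction step, here is a minimal sketch of stitching self-replies back into threads. It is not the actual pipeline code. The record fields (`id`, `text`, `reply_to_id`, `reply_to_user`) are hypothetical simplified names, not the real archive schema, and the sketch assumes tweet ids increase over time.

```python
from collections import defaultdict


def build_threads(tweets, author_id):
    """Group an author's self-replies into threads, each ordered root-first.

    `tweets` is a list of dicts with hypothetical keys:
    id, text, reply_to_id, reply_to_user.
    """
    by_id = {t["id"]: t for t in tweets}
    children = defaultdict(list)
    roots = []
    for t in tweets:
        parent = t.get("reply_to_id")
        # A tweet continues a thread only if it replies to one of the
        # author's own tweets that is actually present in the archive.
        # Anything else (including a tweet whose parent is missing)
        # starts a new thread.
        if parent in by_id and t.get("reply_to_user") == author_id:
            children[parent].append(t)
        else:
            roots.append(t)

    threads = []
    for root in roots:
        thread = [root]
        current = root
        # Follow the earliest self-reply at each step; side branches
        # are ignored in this sketch.
        while children[current["id"]]:
            current = min(children[current["id"]], key=lambda t: int(t["id"]))
            thread.append(current)
        threads.append(thread)
    return threads
```

A single tweet comes out as a thread of length one, so the same pass can feed both the "best threads" and "best single tweets" selections. Broken threads show up as extra roots, which is where the manual patching comes in.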
Some things I had to give up on. I used to run a lot of really fun polls, but that data seems incomplete in my archives (the questions are there but not the voting data). I also gave up on the video-heavy threads I did around my robotics tinkering and a few other kinds of book-unfriendly content.
There were two natural phases: A one-time extract/normalize/clean process to produce well-staged raw material and then an iterative build process that slowly constructed the book the way I wanted. If you want to do something like what I did, I suggest something similar.
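The two-phase split can be sketched roughly as below. This is an illustrative outline, not the code from my repo: the function names `stage` and `build`, the staged JSON layout, and the record fields are all assumptions made up for the example.

```python
import json
from pathlib import Path


def stage(raw_tweets, staged_path):
    """Phase 1 (run once): normalize raw records and write staged JSON."""
    cleaned = []
    for t in raw_tweets:
        # Collapse stray whitespace; drop records that end up empty.
        text = " ".join(t["text"].split())
        if text:
            cleaned.append({"id": t["id"], "text": text})
    Path(staged_path).write_text(json.dumps(cleaned, indent=2), encoding="utf-8")
    return len(cleaned)


def build(staged_path, title):
    """Phase 2 (rerun freely): render staged data into a plain-text chapter."""
    tweets = json.loads(Path(staged_path).read_text(encoding="utf-8"))
    lines = [title, ""]
    lines += [t["text"] for t in tweets]
    return "\n".join(lines)
```

The point of the split is that the slow, messy cleaning happens once, and every later design tweak only reruns the cheap `build` step against the staged file.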
It’s the sort of technical work I’m good at managing but would hate to have to do myself.
I started out in ChatGPT, having it generate code in the web chat interface, then cutting and pasting the Python scripts into my code editor and running the build processes myself in the shell. This was janky and error-prone, and at some point the chat got so long that ChatGPT started choking and getting deeply confused.
This week, I finally migrated the project over to Claude Code and the difference was night and day. I was able to finish the project smoothly over just 4-5 hours (and about $35 worth of tokens — less than I’d have paid a programmer for a single hour of time).
You can look at my code on GitHub if you want. It’s not really reusable, since it’s a pretty bespoke pipeline built to suit my archive and twitter style (lots of threading etc), but it might serve as a good reference design for doing your own. My suspicion is that twitter was open and messy enough that no one-size-fits-all pipeline for bookifying people’s archives will be possible. But maybe someone can figure out a good 80-20 type solution that is a good starting point for almost everybody.
This project was only one of several I got going this week. I was dragging my feet over jumping onto the Claude Code bandwagon because I knew I’d go hypomanic with it once I did. But I finally jumped in, and yes, I did go hypomanic as almost everyone who tries it seems to. I’ll post about my other projects in the coming weeks. This archival-self category of projects is fun, but the real fun is scaffolding my Act 2 self, to support my current and future projects, as well as figuring out things to do that weren’t possible at all before AI. Complex configurancies are cooking.
Fire ze slop cannons!


