Your LLM Was Never Supposed To Beat an Atari at Chess
It's okay. I suck at chess too. Cue the dunking.
By now, everyone’s seen the headline that ChatGPT (specifically the 4o model) was beaten at chess by the ‘AI’ on a 1970s-era Atari 2600 chess cartridge.
The implication, of course, is that these large language models (LLMs) aren’t “intelligent” because they can’t beat something built on 8-bit plastic and hopes. But that completely misses the point, because what an LLM is built for isn’t what a chess engine is built for.
As an experiment, I recently played a full game against a Chess.com bot named “Dave”, which I’m pretty sure was the lowest-level option I had, with GPT-4o as my coach. I provided each move, described what I was seeing, and ChatGPT walked with me turn-by-turn through the game:
Suggesting strategies, sometimes with more than one option depending on how aggressively I wanted to troll my fake opponent
Tracking piece locations (with moderate success; as the game went on, it would often forget which squares were occupied, especially by the opponent’s pawns)
Keeping morale high, laughing off our Ls while holding out hope for a victory
Explaining every moment of psychological warfare (and later, grieving our loss together while suggesting we rally to kick Dave’s ass again in a subsequent match, because I’ve trained my version of ChatGPT to never say die)
Did it blunder occasionally? Sure. Did it hallucinate ghost pawns? Absolutely.
But it also went into detail about why it suggested the moves it did, explained why the opponent made the decisions it made, and, on some level, actually made me want to understand chess.
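The ghost pawns are the one part of this that's easy to guard against outside the model. Here's a rough sketch of the idea, not what I actually used during the game: keep your own record of which squares are occupied, and sanity-check each suggestion against it before you play it. Everything here (the function names, the "wP"-style piece codes, the plain "e2e4" move format) is made up for illustration, and it only checks occupancy, not full chess legality.

```python
# Minimal board tracker: squares map to piece codes like "wP" (white pawn).
# Moves use plain coordinate notation ("e2e4"). Castling, en passant, and
# promotion are deliberately ignored in this sketch.

def starting_position():
    """Build the standard starting position as a {square: piece} dict."""
    board = {}
    back_rank = "RNBQKBNR"
    for i, file in enumerate("abcdefgh"):
        board[file + "1"] = "w" + back_rank[i]
        board[file + "2"] = "wP"
        board[file + "7"] = "bP"
        board[file + "8"] = "b" + back_rank[i]
    return board

def apply_move(board, move):
    """Move whatever sits on the origin square; a capture overwrites the target."""
    src, dst = move[:2], move[2:4]
    if src not in board:
        raise ValueError(f"no piece on {src}")
    board[dst] = board.pop(src)

def check_suggestion(board, move, side):
    """Return a reason the suggested move can't be played, or None if it
    passes these basic occupancy checks (NOT a full legality check)."""
    src, dst = move[:2], move[2:4]
    piece = board.get(src)
    if piece is None:
        return f"ghost piece: {src} is empty"
    if piece[0] != side:
        return f"{src} holds the opponent's piece"
    target = board.get(dst)
    if target is not None and target[0] == side:
        return f"{dst} is occupied by your own piece"
    return None

board = starting_position()
apply_move(board, "e2e4")
apply_move(board, "e7e5")
print(check_suggestion(board, "e2e3", "w"))  # ghost piece: e2 is empty
print(check_suggestion(board, "g1f3", "w"))  # None
```

The point of keeping the board in a plain dictionary is that the LLM never has to remember anything: you can paste the current state back into the chat each turn, and anything it suggests from an empty square gets caught before it costs you a move.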
LLM vs Chess Engine: What They’re Built For
After the match, I asked ChatGPT to put together a table on how engines trained specifically to play chess operate, versus LLMs that have mostly learned to predict the next move from reading match summaries and the like:
When I was playing Dave, my LLM coach didn't try to be Stockfish. It acted like a teacher, a strategist, and occasionally a chaos demon. It explained why certain moves were good, what kind of traps I might fall into, and how to think through each fork in the road, whether tactical, psychological, or existential.
GPT-4o and I were just vibing, while Dave was likely backed by a neural network trained on billions of moves. (Dave is synonymous with kingpin; chess moves on your checkerboard, king him.) Seriously, of course a purpose-built system designed to look 30 moves ahead is going to wipe the floor with my textual helper monkey.
But what can I learn from handing off the game to that?