Posted by Alumni from Substack
May 20, 2026
If you look at the architecture of the modern AI boom, it is heavily bifurcated by modality. In the visual domain, diffusion models dominate. From Midjourney to Stable Diffusion to OpenAI's Sora, the paradigm of starting with pure noise and iteratively denoising it into a high-fidelity image or video has proven to be unreasonably effective.

But in the realm of text, diffusion has historically been an afterthought. Large Language Models (LLMs) like GPT-4, Claude, and LLaMA are staunchly autoregressive (AR). They are sequence predictors: they look at the context, predict the next token, append it to the context, and repeat. It is a strictly left-to-right, causal process. For years, the consensus was simple: autoregression is just the native physics of language.

But this sequential paradigm has glaring pathologies. Because AR models generate blindly from left to right, they cannot easily engage in global planning. If they make a slight logical error early in a sequence,...
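To make the predict, append, repeat loop described above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `BIGRAM_SCORES` is a hypothetical toy table standing in for a model, and `next_token` and `generate` are made-up helper names. A real LLM conditions on the entire context through a neural network rather than on the last token alone, and usually samples instead of always taking the top-scoring token.

```python
from typing import Dict, List

# Hypothetical toy "model": maps the most recent token to scores for the next one.
# A real LLM would compute these scores from the full context.
BIGRAM_SCORES: Dict[str, Dict[str, float]] = {
    "<bos>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"dog": 1.0},
    "cat": {"sat": 0.7, "<eos>": 0.3},
    "dog": {"sat": 1.0},
    "sat": {"<eos>": 1.0},
}


def next_token(context: List[str]) -> str:
    """Greedy decoding: return the highest-scoring next token."""
    scores = BIGRAM_SCORES[context[-1]]
    return max(scores, key=lambda tok: scores[tok])


def generate(prompt: List[str], max_new_tokens: int = 10) -> List[str]:
    """Autoregressive loop: predict, append, repeat, strictly left to right."""
    context = list(prompt)  # copy so the caller's prompt is not mutated
    for _ in range(max_new_tokens):
        token = next_token(context)
        if token == "<eos>":  # stop at the end-of-sequence marker
            break
        context.append(token)  # appended tokens are never revisited by this loop
    return context


if __name__ == "__main__":
    print(generate(["<bos>"]))
```

Note that the loop only ever extends `context`; nothing in it goes back to change a token that has already been emitted.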