This week I want to start close to home. At LayerLens, we announced the Stratix Cup, a live tournament in which frontier AI models play soccer in a simulated environment. Season 1 brings together 16 models organized into four groups, with each model writing code to control a full team of players. Matches unfold in two halves, and models can adapt their strategy at halftime based on what happened on the field. It is, admittedly, ridiculous in the best possible way: models chasing space, collapsing under pressure, inventing strange formations, and occasionally self-sabotaging in public.

But the playfulness hides an important point. Evaluations need more arenas. Most AI evals still behave like school exams: static, individual, decontextualized. They ask models to answer questions, solve coding problems, summarize documents, or reason through puzzles. These are useful, but they are incomplete. Soccer imposes a different discipline. It tests multi-agent planning, tactical adaptation,...