Posted by Alumni from TechCrunch
May 9, 2025
Google has shipped implicit caching in the Gemini API, automatically enabling a 75% cost saving with the Gemini 2.5 models whenever a request hits a cache. The company also lowered the minimum token count required to hit a cache to 1K on 2.5 Flash and 2K on 2.5 Pro.

Caching, a widely adopted practice in the AI industry, reuses frequently accessed or pre-computed data from models to cut down on compute requirements and cost. For example, caches can store answers to questions users often ask of a model, eliminating the need for the model to regenerate answers to the same request.

Google previously offered prompt caching, but only explicit prompt caching, meaning developers had to define their highest-frequency prompts themselves. While the cost savings were supposed to be guaranteed, explicit prompt caching typically involved a lot of manual work, and some developers weren't pleased with how the implementation behaved on Gemini 2.5 Pro, which they said could cause surprisingly large API bills.
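With implicit caching, hits happen automatically whenever a request shares a sufficiently long prefix with a recent one, so the savings depend on how prompts are structured: put the large, repeated context first and the varying part last. Below is a minimal sketch of that pattern (not from the article), assuming the google-genai Python SDK and a GEMINI_API_KEY environment variable; the file name and the ask helper are hypothetical.

```python
import os

from google import genai

# Assumes: pip install google-genai, and GEMINI_API_KEY set in the environment.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# Hypothetical shared document; on 2.5 Flash the prefix must reach roughly
# 1K tokens before the implicit cache can kick in (2K on 2.5 Pro).
with open("manual.txt", encoding="utf-8") as f:
    shared_context = f.read()

def ask(question: str) -> str:
    # Shared prefix first, varying question last: subsequent requests with
    # the same prefix can be served from the implicit cache at the discount.
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=f"{shared_context}\n\nQuestion: {question}",
    )
    # usage_metadata reports how many input tokens were read from cache.
    print("cached tokens:", response.usage_metadata.cached_content_token_count)
    return response.text

print(ask("What does error code 42 mean?"))
print(ask("How do I factory-reset the device?"))  # same prefix, likely a hit
```

By contrast, the older explicit approach required creating a cached-content resource up front and referencing it in each request, which is the manual bookkeeping the article describes.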