
Have you ever wondered what it would be like if Google knew exactly what you wanted to search for even before you started typing? Well, that’s the future Google is aiming for.
Currently, Google is pushing this innovation onto our devices with small AI models that rival much larger ones in performance.
What’s happening. In a research paper presented at EMNLP 2025, Google researchers show that dividing “intent understanding” into smaller, manageable steps lets small multimodal LLMs (MLLMs) deliver results comparable to much larger systems like Gemini 1.5 Pro. These models run faster, cost less, and, crucially, keep data processing on the device.
The paper, “Small Models, Big Results: Achieving Superior Intent Extraction through Decomposition,” details how Google deduces user intent based on their interactions with apps and websites, such as clicks, scrolling, and screen changes over time.
The future is intent extraction. Presently, most large AI models infer intent from user behavior in the cloud, which is slow, costly, and raises privacy concerns. By dividing the process into two straightforward steps, Google addresses these concerns with on-device models.
Step one: Each interaction is individually summarized. The model records what appeared on the screen, what action the user took, and a preliminary guess of their intent.
Step two: Another model reviews these summaries, focusing solely on factual information. It discards the guesses and writes a concise statement of the user’s overall goal for the session. This targeted approach avoids the pitfalls small models commonly hit when asked to process a long chain of actions all at once.
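The two steps above can be sketched in a few lines of Python. This is only an illustration of the data flow, not the paper's implementation: the names Interaction, StepSummary, summarize_steps, and extract_intent are made up for this example, and the two "models" are toy functions standing in for the small on-device MLLMs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Interaction:
    screen: str  # what appeared on the screen
    action: str  # what the user did


@dataclass
class StepSummary:
    screen: str
    action: str
    guess: str  # preliminary intent guess, kept apart from the facts


def summarize_steps(
    interactions: List[Interaction],
    guess_intent: Callable[[Interaction], str],
) -> List[StepSummary]:
    """Step one: summarize each interaction on its own."""
    return [StepSummary(i.screen, i.action, guess_intent(i)) for i in interactions]


def extract_intent(
    summaries: List[StepSummary],
    write_goal: Callable[[List[str]], str],
) -> str:
    """Step two: pass on only the factual fields; the guesses are dropped."""
    facts = [f"{s.screen}: {s.action}" for s in summaries]
    return write_goal(facts)


# Hypothetical stand-ins for the two small models (assumptions, not real APIs).
def toy_guess(interaction: Interaction) -> str:
    return "maybe the user wants to " + interaction.action


def toy_goal(facts: List[str]) -> str:
    return " -> ".join(facts)


session = [
    Interaction("flight search results", "tap cheapest fare"),
    Interaction("checkout page", "enter passenger name"),
]
summaries = summarize_steps(session, toy_guess)
intent = extract_intent(summaries, toy_goal)
print(intent)
```

The point of the design is visible in extract_intent: the guess field is recorded in step one but never reaches the second model, so speculation cannot leak into the final statement.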
How the researchers measure success. Success is measured with Bi-Fact, an evaluation method that judges an intent statement by the facts it contains. On its main quality measure, the F1 score, small models using the step-by-step strategy consistently outperform other small-model methods.
Gemini 1.5 Flash 8B, a model with only 8 billion parameters, matches the performance of Gemini 1.5 Pro on mobile data. Errors diminish because unfounded guesses are removed before the final intent is written, and the system still runs faster and at lower cost than large cloud-based models.
How it works. Bi-Fact breaks an intent statement down into distinct facts, then identifies which facts are missing and which are fabricated. This shows how and where understanding fails, offering insight into how systems misinterpret meaning or miss crucial information.
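To make the fact-level scoring concrete, here is a minimal sketch of an F1 score computed over sets of facts. It assumes the facts have already been extracted and that two facts match only when their text is identical; the actual Bi-Fact method is more involved than exact string matching, and the function name fact_f1 is invented for this example.

```python
from typing import Dict, Set, Union


def fact_f1(predicted: Set[str], reference: Set[str]) -> Dict[str, Union[float, Set[str]]]:
    """Score a predicted intent against a reference intent, fact by fact."""
    matched = predicted & reference
    # Precision: share of predicted facts that are supported by the reference.
    precision = len(matched) / len(predicted) if predicted else 0.0
    # Recall: share of reference facts that the prediction captured.
    recall = len(matched) / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fabricated": predicted - reference,  # stated but not in the reference
        "missing": reference - predicted,     # in the reference but not stated
    }


predicted = {"book a flight", "destination is Paris", "business class"}
reference = {"book a flight", "destination is Paris", "travel in June"}
scores = fact_f1(predicted, reference)
print(scores)
```

In this toy case two of three facts match on each side, so precision, recall, and F1 all come out to two thirds, with "business class" flagged as fabricated and "travel in June" as missing.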
The research further shows that noisy training data impacts large end-to-end models more significantly than this structured approach. The decomposed system remains robust against the unpredictability of real user behavior.
Why we care. For Google to develop tools that suggest actions or answers before a query is entered, understanding user intent from behavioral patterns across apps, browsers, and screens is essential. This research is a major step towards that vision. Although keywords will remain important, optimizing for clear, logical user paths will take precedence over mere query inputs.
The Google Research blog post. Small models, big results: Achieving superior intent extraction through decomposition
Inspired by this post on Search Engine Land.

