I’ve discovered that Microsoft Advertising is rolling out a new feature that could change how Shopping campaigns appear in Bing search results. These multi-image ads give eCommerce brands a way to showcase their products more vividly, potentially capturing shopper attention before they even click.
What’s new. Now, I can include multiple product images in a single Shopping ad, allowing shoppers to preview various angles, styles, or variations directly within the search results. This approach could be a game-changer for advertisers.
The design aims to boost visual engagement and make ads more informative. It lets consumers like me quickly compare options without leaving the results page.
How it works:
I can upload additional images using the optional additional_image_link attribute in the product feed.
There is an option to include up to 10 images, which I can separate by commas.
The images will appear alongside pricing and retailer information in Shopping results.
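As a rough illustration of the feed side, the sketch below builds one feed row with a comma-separated additional_image_link value and enforces the 10-image cap described above. It assumes a tab-separated feed; the other column names, the helper names build_feed_row and to_tsv, and the example URLs are hypothetical, so check Microsoft Merchant Center's current feed specification before relying on it.

```python
import csv
import io

# Cap taken from the rollout description above; confirm against the current spec.
MAX_ADDITIONAL_IMAGES = 10


def build_feed_row(product_id: str, title: str, image_link: str,
                   extra_images: list[str]) -> dict[str, str]:
    """Return one feed row with extra image URLs joined by commas (hypothetical helper)."""
    if len(extra_images) > MAX_ADDITIONAL_IMAGES:
        raise ValueError(
            f"{len(extra_images)} additional images; limit is {MAX_ADDITIONAL_IMAGES}")
    for url in extra_images:
        # A comma inside a URL would be misread as a separator.
        if "," in url:
            raise ValueError(f"URL contains a comma: {url}")
    return {
        "id": product_id,
        "title": title,
        "image_link": image_link,  # primary image
        "additional_image_link": ",".join(extra_images),
    }


def to_tsv(rows: list[dict[str, str]]) -> str:
    """Serialize rows as a tab-separated feed with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical product with two extra angles.
row = build_feed_row(
    "SKU-123",
    "Canvas Tote Bag",
    "https://example.com/img/tote-front.jpg",
    ["https://example.com/img/tote-side.jpg",
     "https://example.com/img/tote-inside.jpg"],
)
print(to_tsv([row]))
```

Validating the count and separators before upload is cheaper than discovering rejected rows in the feed diagnostics later.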
Why we care. From my perspective, multi-image ads have the potential to boost engagement and purchase intent by offering a more comprehensive visual representation of a product. More imagery can highlight features, colors, and design elements that a single image might miss.
Discovery. This feature was initially noticed by digital marketer Arpan Banerjee, who shared it on LinkedIn.
The bottom line. For retailers like you and me, multi-image Shopping ads provide more creative freedom and give shoppers richer context at a glance. This shift has the potential to enhance ad performance and reshape how products are presented in search results.
As an ecommerce enthusiast, I know how crucial it is for our products to be easily understood by AI systems. In today’s visually driven market, designing images that AI can interpret accurately, from OCR-ready labels to visuals aligned with sentiment, is essential.
The power of images and videos to tell complex stories instantly is unparalleled. In our digital store, these visuals are not just content—they are tools that aid in making purchase decisions.
Generative search systems identify objects, embedded text, and style to infer potential use cases. Large language models (LLMs) then surface the assets that best answer a shopper’s questions. In effect, each image becomes structured data that removes buying barriers and increases discoverability in multimodal searches, such as when someone takes a photo or uploads a screenshot.
Visual search as a shopping behavior
Our customers often use visual search for quick decision-making: snapping photos, scanning labels, or comparing products to decide “Will this work for me?” It’s vital that our photos fulfill this need, showing scale, size cues, real colors, and comparisons.
Multimodal search reshaping behaviors
Visual search is on the rise: Google Lens handles 20 billion queries a month, largely from younger users, a clear sign of changing behavior. These behaviors fall into distinct intent categories.
Quick capture and identification
Taking a photo to identify an item (like “What plant is this?”) helps with quick recognition and troubleshooting, accelerating issue resolution and product verification.
Visual comparison
By showing a product and asking systems to “find a dupe” or analyze “room style,” we bypass complex descriptions, promoting faster cross-category shopping and suitability checks.
Information processing
Pointing a camera at an ingredient list or foreign-language text prompts real-time extraction or translation, so shoppers don’t have to retype it or look for instructions elsewhere.
Modification search
Asking for product variations like “this but in blue” allows for specific attribute searches without chasing model numbers, indicating a shift from text-based navigation to visual exploration.
Multimodal AI has made instant recognition, decision support, and creative exploration accessible, reducing friction in ecommerce and information journeys.
You can check a detailed table of multimodal visual search types here.
Prioritizing content and quality for purchase decisions
We must ensure that our product images spotlight the details customers care about, like pockets or stitching. Images convey these details more convincingly than text, helping shoppers answer questions such as whether a particular style suits them.
Original images are crucial; they highlight effort, uniqueness, and skill, making our content more personable and credible.
Making products machine-readable for image vision
For products to be machine-readable, all visual elements need to be easily interpreted by AI. This begins with the design of images and packaging.
Products and packaging as landing pages
Ecommerce packaging should be designed as a digital asset, one that holds up in a world driven by multimodal AI search.
If AI or search engines fail to read packaging, the product might as well be invisible at the peak of consumer interest.
Designing for OCR-friendliness and authenticity
Google Lens and leading LLMs employ optical character recognition (OCR) to extract and index data from physical goods. Therefore, text and visuals on our packaging need to be OCR-friendly.
Use high-contrast color schemes—black text on white backgrounds is ideal. Ensure that critical information is in clean, sans-serif fonts on solid backgrounds without patterns. Treat physical product labeling with the same care as a landing page, much like Cetaphil does.
Avoid these common errors:
Low contrast.
Decorative or script fonts.
Busy patterns.
Curved or creased surfaces.
Glossy materials that disrupt text visibility.
Document OCR fail points and analyze why they occur. Run a grayscale test to ensure text remains legible without color.
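One way to make the grayscale test repeatable is to compute it. The sketch below converts sRGB colors to grayscale with the ITU-R BT.601 luma weights and computes the WCAG 2.x contrast ratio between a label’s text and background colors. Treating WCAG’s 4.5:1 threshold as a proxy for OCR legibility is our assumption, not a published OCR requirement, and the sample colors are hypothetical.

```python
def to_grayscale(rgb: tuple[int, int, int]) -> int:
    """Approximate perceived brightness (0-255) using ITU-R BT.601 luma weights."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """WCAG 2.x relative luminance of an sRGB color."""
    def linearize(v: int) -> float:
        c = v / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """WCAG contrast ratio, from 1:1 (identical) to 21:1 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def check_label(fg: tuple[int, int, int], bg: tuple[int, int, int],
                minimum: float = 4.5) -> dict:
    """Summarize legibility for one text/background pair (threshold is an assumption)."""
    ratio = contrast_ratio(fg, bg)
    return {
        "contrast_ratio": round(ratio, 2),
        # Brightness gap that survives once color is removed.
        "gray_gap": abs(to_grayscale(fg) - to_grayscale(bg)),
        "passes": ratio >= minimum,
    }


print(check_label((0, 0, 0), (255, 255, 255)))        # black on white
print(check_label((170, 170, 170), (255, 255, 255)))  # light gray on white
```

Logging each failing pair alongside the packaging version it came from gives you the documented record of OCR fail points described above.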
Add a QR code to each product for direct access to a webpage with structured, machine-readable HTML information.
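The page behind the QR code is where schema.org markup does the machine-readable work. The sketch below uses Python’s json module to emit a Product JSON-LD block; the helper name product_jsonld, the chosen properties, and all field values are illustrative assumptions, so validate the output against schema.org and your search platform’s structured data guidelines.

```python
import json


def product_jsonld(name: str, sku: str, brand: str, images: list[str],
                   description: str, price: float, currency: str,
                   availability: str = "https://schema.org/InStock") -> str:
    """Return a <script> tag with schema.org Product markup (hypothetical helper)."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "sku": sku,
        "brand": {"@type": "Brand", "name": brand},
        "image": images,  # reuse the multi-angle photos from the feed
        "description": description,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": availability,
        },
    }
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical product values.
print(product_jsonld(
    name="Gentle Cleanser 250 ml",
    sku="GC-250",
    brand="ExampleBrand",
    images=["https://example.com/img/gc-front.jpg",
            "https://example.com/img/gc-back-label.jpg"],
    description="Fragrance-free daily facial cleanser.",
    price=12.5,
    currency="USD",
))
```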
High-resolution, multi-angle product images are optimal, especially for items needing authenticity checks. Genuine photos excel in accuracy and credibility, outperforming AI-generated images.
In an AI-driven context, it’s about more than just your product. AI examines every object in an image and builds contextual associations, which it uses to infer a brand’s market position.
Props, backgrounds, and adjacent items shape our brand’s digital persona. Each visual placement sends signals, whether luxury, sportiness, or utility, that influence how machines perceive the brand.
Guarding these adjacency signals is now intrinsic to brand management. Strategic curation helps AI accurately interpret our brand’s value, setting us up to appear in high-value conversational queries.
Conduct a co-occurrence audit for brand context
We should set up a systematic process for evaluating brand context in multimodal AI search. Gather relevant lifestyle and product photos and feed them into tools such as Google’s AI Mode, ChatGPT search, or similar LLMs, using a prompt like:
“List each object in the image. From these, describe the potential owner.”
This step enriches our understanding of the machine’s narrative, helping us adjust any disconnects, like misaligned perception due to unintended signals. From there, we craft specific guidelines for props, contextual elements, and visual do’s and don’ts for our creative teams to safeguard brand narrative.
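Once the model has listed the objects in a batch of images, the tallying step is easy to automate. The sketch below assumes you have already collected each image’s object list (the data shown is hypothetical) and that your guidelines are expressed as sets of approved and off-brand objects; the function name audit_cooccurrence is also hypothetical. It counts how many images each object appears in, flags images containing off-brand items, and lists objects the guidelines don’t cover yet.

```python
from collections import Counter


def audit_cooccurrence(image_objects: dict[str, list[str]],
                       approved: set[str],
                       off_brand: set[str]) -> tuple[Counter, dict[str, list[str]], Counter]:
    """Tally objects per image and flag off-brand or unreviewed ones."""
    counts: Counter = Counter()      # number of images each object appears in
    unreviewed: Counter = Counter()  # objects not yet covered by the guidelines
    flagged: dict[str, list[str]] = {}
    for image_id, objects in image_objects.items():
        unique = {obj.strip().lower() for obj in objects}  # count once per image
        counts.update(unique)
        hits = sorted(unique & off_brand)
        if hits:
            flagged[image_id] = hits
        unreviewed.update(unique - approved - off_brand)
    return counts, flagged, unreviewed


# Hypothetical model output for three brand images.
image_objects = {
    "hero_01.jpg": ["Yoga mat", "Water bottle", "Plant"],
    "hero_02.jpg": ["Yoga mat", "Energy drink", "Smartphone"],
    "social_07.jpg": ["Yoga mat", "Plant", "Candle"],
}
approved = {"yoga mat", "water bottle", "plant", "candle"}
off_brand = {"energy drink"}

counts, flagged, unreviewed = audit_cooccurrence(image_objects, approved, off_brand)
print("Most frequent objects:", counts.most_common(3))
print("Off-brand hits:", flagged)
print("Needs a decision:", dict(unreviewed))
```

Objects that land in the "needs a decision" bucket are natural candidates for new do’s and don’ts in the creative guidelines.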
Refining this alignment ensures that machines perceive our brand consistently with our strategic goals, bolstering our presence in new-gen search settings.
Brand control across the visual layers
Using the brand control quadrant, we efficiently manage brand visibility through machine interpretation, focusing on four key layers—some we own outright, others we can influence.
Known brand
Here, we have visuals like official logos and branded imagery, which are typically controlled and recognized by both our audience and AI.
Visual strategy:
Create a visual knowledge database.
Regularly evaluate adjacent objects in brand visuals.
Develop an “Object Bible” to avoid narrative misalignment, ensuring lifestyle cues uphold our brand image.
Latent brand
These include “wild” images like user photos and social posts that can lead to unexpected inferences about our brand’s standing.
Audit these occurrences to prevent unintended associations.
Shadow brand
This covers old brand assets and materials that remain publicly accessible, often unintentionally, and still shape how AI interprets us.
Audit all public archives for outdated visuals; remove or update them.
Ensure that current branded visuals reflect our strategies.
AI-narrated brand
AI synthesizes narratives by blending visual and emotional cues with text, which could introduce competitor tones or mismatched perceptions.
Visual strategy:
Use AI tools like Google Cloud Vision to verify tonal alignment.
Adjust mismatched assets to ensure narrative cohesion.
Sentiment alignment: balancing visual tone and emotional context
Beyond supplying information, images capture emotion and attention within moments, shaping customer perceptions.
In AI-driven search, this emotional resonance can become a direct signal: LLMs may assess each image’s affective quality, sentiment, and contextual tone to match content with the user’s emotional state and intent.
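We need to deliberately design and inspect our imagery’s emotional tone. Microsoft has retired the emotion-recognition attributes of its Azure Face service, so use an image-analysis tool that still offers them, such as Google Cloud Vision, whose face detection returns likelihood ratings for joy, sorrow, anger, and surprise, to: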
Score emotions in images broadly.
Assess facial expressions for emotion likelihoods so imagery can be targeted precisely, such as conveying calm for a yoga line or confidence for business wear.
Align image emotion with marketing targets. Ensure the imagery arouses the right emotions and resonates with our audience.
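To make "align image emotion with marketing targets" measurable, you can convert likelihood ratings into numbers and compare them with a target profile per product line. The sketch assumes ratings were already exported on the five-level scale Google Cloud Vision’s face detection reports (VERY_UNLIKELY through VERY_LIKELY, plus UNKNOWN). Vision has no "calm" or "confidence" category, so the yoga target below approximates calm as mild joy with no anger, sorrow, or surprise; that mapping, the numeric scores, the tolerance, and the sample ratings are all hypothetical.

```python
from typing import Optional

# Numeric mapping is an illustrative assumption, not part of the Vision API.
LIKELIHOOD_SCORE: dict[str, Optional[float]] = {
    "UNKNOWN": None,
    "VERY_UNLIKELY": 0.0,
    "UNLIKELY": 0.25,
    "POSSIBLE": 0.5,
    "LIKELY": 0.75,
    "VERY_LIKELY": 1.0,
}


def emotion_scores(ratings: dict[str, str]) -> dict[str, float]:
    """Convert likelihood labels to scores, dropping UNKNOWN or unrecognized labels."""
    scores = {}
    for emotion, label in ratings.items():
        value = LIKELIHOOD_SCORE.get(label)
        if value is not None:
            scores[emotion] = value
    return scores


def misaligned(ratings: dict[str, str], target: dict[str, float],
               tolerance: float = 0.25) -> dict[str, float]:
    """Return emotions whose score differs from the target by more than `tolerance`."""
    scores = emotion_scores(ratings)
    gaps = {}
    for emotion, wanted in target.items():
        actual = scores.get(emotion)
        if actual is None:
            continue  # no usable rating for this emotion; review manually
        if abs(actual - wanted) > tolerance:
            gaps[emotion] = round(actual - wanted, 2)
    return gaps


# Hypothetical target profile for a yoga line.
yoga_target = {"joy": 0.5, "anger": 0.0, "sorrow": 0.0, "surprise": 0.0}

calm_ok = {"joy": "POSSIBLE", "anger": "UNLIKELY",
           "sorrow": "VERY_UNLIKELY", "surprise": "UNKNOWN"}
tense = {"joy": "VERY_UNLIKELY", "anger": "LIKELY",
         "sorrow": "UNLIKELY", "surprise": "VERY_UNLIKELY"}

print(misaligned(calm_ok, yoga_target))  # {}
print(misaligned(tense, yoga_target))    # {'joy': -0.5, 'anger': 0.75}
```

Positive gaps mean an emotion reads stronger than intended, negative gaps weaker, which tells the creative team which direction to adjust.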
Start by establishing the emotional baseline of your current imagery, then test new assets against it with AI tools to confirm they stay consistent.
Matching your brand narrative with AI perception
We must focus on authenticity in product photos, design every asset for machine readability, and manage visual context and sentiment carefully.
Treat packaging and online visuals as digital assets; conduct regular audits for object proximity, emotional tone, and clear identification.
AI will craft a narrative for our brand with or without guidance, so it’s essential to ensure every visual aligns with the intended story.
As I explore the latest updates to ChatGPT, I’m excited to share that it now incorporates more images into its answers, bringing a fresh, multimodal approach to search. This enhancement makes images just as vital as text for exploring brands and products.
OpenAI has unveiled this visual upgrade, which pulls images from the web to enrich answers about a variety of topics, such as people, places, and products. It’s a fascinating development that shifts ChatGPT from providing simple text responses to offering a more interactive search experience.
How it works. With this update, ChatGPT becomes more than just a text generator. It now offers a search experience similar to what I’m used to from traditional search engines:
Images will appear when they add clarity to the information.
These images, sourced from the web, align with the most relevant text.
If I’m curious about an image, clicking on it expands it to its original size and shows the source.
Where it’s live. The update is rolling out globally, and I’ve seen it gradually becoming available across all ChatGPT plans:
I’ve used it on web, iOS, and Android platforms.
It’s important to note that it only works with responses generated by GPT-5.1.
Why we care. I realize that search is evolving to be more multimodal, integrating text, images, videos, and audio. Beyond ensuring that my brand is part of AI-driven replies, it’s crucial to consider how our visuals show up when ChatGPT responds to queries.
I understand that today’s consumers are constantly bombarded online.
I mean, I too find myself scrolling YouTube Shorts, tracking TikTok influencers, navigating Gmail promotions, and wondering whether that viral Facebook video is real or AI-generated, all before I even have lunch!
The path from intent to conversion used to be straightforward, but now, in this attention-driven economy, making purchase decisions has become a complex affair.
Yet, many advertisers haven’t adapted to this reality. They still focus solely on search-based intent, missing out on entire audiences who don’t make it to the search bar.
Google’s Demand Gen campaigns are my secret weapon here, allowing me to escape this trap by fostering discovery and condensing the sales funnel.
Success isn’t complicated, but it requires mastering three elements: engaging creative content, strategic audience outreach, and rigorous testing methods.
The Demand Gen Opportunity
I see Demand Gen as the perfect blend of Google’s visual placements, such as YouTube, Gmail, and Discover, with refined audience targeting and creative optimization.
Think of it as social advertising uniquely adapted for Google’s ecosystem. These campaigns tap into users’ browsing habits rather than their search activities, making them ideal for raising brand awareness.
Consumer behavior has undeniably shifted toward visual discovery, and shoppers now need more touchpoints before they commit to a purchase.
YouTube, after all, is a largely visual platform and is now the second-most-used social media platform with a whopping 2.6 billion users worldwide.
In this new landscape, the purchase funnel is not only noisier but also more complex.
Unfortunately, many marketers still treat Demand Gen like search, expecting instant conversions—a mindset that misses the point.
To me, Demand Gen is about breaking consumption patterns, igniting interest, and nurturing intent over time.
Marketers who can shift their mindset will see their performance compound, growing stronger with each impression.
This is my go-to guide for nailing Demand Gen campaigns right from the start.
Element 1: Creative That Commands Attention
Thanks to modern tools, creating high-quality assets no longer requires expensive agencies.
And this matters—a lot. Visual content is a major conversion driver.
YouTube viewers are twice as likely to purchase something they’ve seen in a video and four times more likely to seek new products on the platform.
If advertisers don’t master visual storytelling, they’ll miss speaking the language of today’s consumers.
The Four-Part Framework for Demand Gen Creative
Crafting successful creative assets doesn’t have to be a guessing game. The best assets adhere to a four-part framework:
Grab attention immediately: Capture interest within the first three seconds to stop that scroll.
Build brand recognition: Maintain a consistent visual identity across all placements to fortify brand recall.
Create emotional resonance: Make the viewer feel something meaningful.
Provide clear direction: Guide viewers on what to do after watching.
Testing Creative Approaches
I believe testing is pivotal in refining creative content. Experiment with various types like educational, product-focused, and testimonial formats.
Educational content is great for awareness at the funnel’s top, while testimonials enhance consideration mid-funnel and product-focused creatives encourage conversion at its base.
Finding what resonates with your audience is key, and so is optimizing for each platform: what works on YouTube may not work on Gmail.
Element 2: An Audience Strategy That Matches Intent
I always think of audience strategy as an extension of creative development. Every audience is unique and should be addressed differently at various funnel stages.
Before spending a dime, I make sure to identify who my audience is and the actions I want them to take.
To do this, I start with the classic reporter’s questions:
Who is your target audience?
What are you trying to convey?
Where do they find their information?
Why would they care about your message?
Once audiences are defined, I align messages to their respective stages, aiming to guide them smoothly through the journey.
My goal is to nudge them to the next step without rushing them into a conversion.
Element 3: Testing and Optimization
Once my Demand Gen ads are set up, it’s time to move into testing and optimization.
Variables abound in these campaigns; hence, I meticulously test one element at a time for clarity and precision.
The goal isn’t to find one winning solution but to optimize continuously. Trends change, and what works today may need tweaking in a few months.
Establishing Testing Parameters
I typically classify my testing into three main categories:
Creative: Discover which creative elements resonate more. This could include content types, hooks, or video styles.
Placement: Determine which approaches work where by testing on Gmail, Discover, and YouTube.
Audience: Compare performances across differing audiences, such as custom vs. lookalike or remarketing vs. prospecting.
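When two variants differ in only one element, a quick significance check helps avoid acting on noise. The sketch below runs a standard two-proportion z-test on conversions per click; it is a generic statistical check, not a Google Ads feature, and the numbers are hypothetical. For live campaigns, Google Ads’ built-in experiments tooling may be the better fit.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("No variation in outcomes; the test is undefined.")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: variant B changes only the opening hook.
z, p = two_proportion_z_test(conv_a=120, n_a=4000, conv_b=165, n_b=4000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference is unlikely to be chance alone; with low conversion volumes, waiting for more data is usually safer than declaring a winner early.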
As I continue testing, performance trends inform future creative, messaging, and placement choices.
Consistently successful approaches allow scaling through budget increases for particular placements or audiences.
Set Realistic Time Horizons
Initial Demand Gen outcomes don’t reflect longer-term impact. Brand awareness takes time to build.
I advise allowing a 60- to 90-day period for campaigns to stabilize and gain traction.
Why Demand Gen Campaigns Fail
Demand Gen campaigns rarely fail because of the format itself. More often, they are mismeasured and abandoned too early.
This leads many away from Demand Gen entirely.
Here’s how I steer clear of prevalent missteps:
Unrealistic Expectations
Many start Demand Gen campaigns expecting similar returns to those of direct search campaigns.
Once those high expectations aren’t met, campaigns get abandoned.
The remedy is setting realistic expectations from the start.
Demand Gen builds brands and fills sales funnels, providing compound results if given the room to operate.
Measurement Myopia
This often accompanies unrealistic expectations. Relying solely on last-click attribution undervalues Demand Gen’s impact.
I suggest considering these alternatives:
Use platform comparables: Google Ads metrics that count conversions the way social platforms do, including view-through conversions.
Observation mode: Incorporate Demand Gen audiences into search campaigns to track if brand searches rise.
Holistic brand metrics: Evaluate if brand growth is happening across channels, indicative of brand awareness.
If only last-click returns are considered, you undervalue your efforts.
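To see why last-click undervalues upper-funnel touchpoints, compare how credit shifts under two simple attribution rules. The sketch below gives each converting path one unit of credit, first entirely to the last touch, then split evenly across all touches (a linear model). The paths and channel names are hypothetical, and linear attribution is only one alternative; Google Ads’ data-driven attribution works differently.

```python
from collections import defaultdict


def attribute(paths: list[list[str]], model: str) -> dict[str, float]:
    """Distribute one unit of credit per converting path across its channels."""
    if model not in ("last_click", "linear"):
        raise ValueError(f"unknown model: {model}")
    credit: dict[str, float] = defaultdict(float)
    for path in paths:
        if not path:
            continue
        if model == "last_click":
            credit[path[-1]] += 1.0  # all credit to the final touch
        else:
            share = 1.0 / len(path)  # equal credit to every touch
            for channel in path:
                credit[channel] += share
    return dict(credit)


# Hypothetical converting paths, earliest touch first.
paths = [
    ["demand_gen", "brand_search"],
    ["demand_gen", "demand_gen", "brand_search"],
    ["generic_search"],
    ["demand_gen", "generic_search"],
]

for model in ("last_click", "linear"):
    credit = attribute(paths, model)
    print(model, {ch: round(v, 2) for ch, v in sorted(credit.items())})
```

In this toy data, last-click gives Demand Gen no credit at all, while the linear view assigns it about 1.67 of the 4 conversions.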
Unrealistic Timelines
Don’t halt campaigns within 30 days if results disappoint, and avoid hasty changes.
I stay committed to a 60- to 90-day evaluation period while managing stakeholder expectations about timing.
Master Discovery to Win the Future
Competition for attention is at its peak, and paid media is shifting toward visuals and discovery.
Brands sticking to search will face growth challenges.
Success in this terrain relies on three pillars:
Engaging creative.
Thoughtful audience targeting.
Consistent testing.
Together, they foster performance and grow brand awareness.
The competitive edge will favor those mastering discovery today.
You don’t need a large budget to start. Commitment to these principles and patience with results are enough.
Demand Gen campaigns can embed your brand in your audience’s daily online life.