Josh Blyskal from Profound came to Tech SEO Connect with something we don’t see enough of in this space: actual data at scale. His team analyzed over 250 million AI responses and 3 billion citations across eight major answer engines to understand what actually drives visibility in AI search.
The findings challenge some assumptions and confirm others. Traditional SEO metrics matter—but they only predict about 4-7% of citation behavior. So what’s driving the other 93-96%? That’s what Blyskal spent his presentation unpacking.
The New Reality: Answer Engines Arbitrate the Relationship
Blyskal opened with a framing that should be obvious but often isn’t stated clearly: answer engines now sit between users and websites. They arbitrate whether your brand, product, or service deserves to make it into the final answer.
The scale is staggering. ChatGPT is the fifth most-viewed website in the world. Google AI Overviews has 2 billion monthly active users. “Search is undergoing a fundamental shift,” Blyskal said. No argument there.
So what drives this new relationship? His team analyzed 1,311 pages to measure how traditional SEO metrics correlate with AEO performance. Referring domains, total backlinks, authority score—there’s a relationship, but the R-squared values were brutal: 0.066, 0.044, and 0.009. In plain terms, an R-squared of 0.066 means referring domains explain only about 6.6% of the variance in citations. These metrics predict only 4-7% of citation behavior.
“Traditional SEO metrics are essential, but they’re not going to solely drive your AEO visibility gains,” he concluded. The residual analysis was scattershot: massive over- and under-performance that SEO metrics couldn’t explain.
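If you want to run this kind of correlation check against your own pages, it’s straightforward to compute. Here’s a minimal sketch using scipy; the referring-domain and citation numbers are synthetic stand-ins, not Profound’s data.

```python
import numpy as np
from scipy.stats import linregress

# Synthetic stand-ins for one SEO metric vs. citations; NOT Profound's numbers.
rng = np.random.default_rng(42)
referring_domains = rng.lognormal(mean=4.0, sigma=1.0, size=500)
citations = 0.02 * referring_domains + rng.lognormal(mean=2.0, sigma=1.0, size=500)

result = linregress(referring_domains, citations)
r_squared = result.rvalue ** 2
print(f"R^2 = {r_squared:.3f}")  # an R^2 of 0.066 would mean ~6.6% of variance explained

# The residuals are the over- and under-performers the metric can't account for.
residuals = citations - (result.intercept + result.slope * referring_domains)
```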
What Answer Engines Actually See
Before deciding whether to cite you, answer engines see only a few pieces of data: URL, title tag, meta description, and a content snippet. That’s your billboard. Blyskal broke down what the data shows about each.
URLs
Natural language URLs—5-7 words that describe what the page is about—earned 11.4% more citations than comparable pages with random CMS-generated strings. Answer engines want to know: does this URL suggest I’ll find the answer here?
Even more interesting: semantically aligning your URL to the query format (not just the topic, but framing it as a question) can provide up to a 5% citation gain. This held across tens of thousands of pages.
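To make both URL findings concrete, here’s a small sketch that turns a target query into a natural-language, question-format slug. The stop-word list and word cap are my assumptions, not rules from the talk.

```python
import re

STOP_WORDS = {"a", "an", "the", "of", "to", "in"}  # assumption: trim filler, keep meaning

def question_slug(query: str, max_words: int = 7) -> str:
    """Turn a target query into a 5-7 word, question-format URL slug."""
    words = [w for w in re.findall(r"[a-z0-9]+", query.lower()) if w not in STOP_WORDS]
    return "-".join(words[:max_words])

print(question_slug("What is the best corporate card?"))
# -> what-is-best-corporate-card
```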
Titles
Semantic title alignment showed an even stronger effect—up to 6% citation gain for well-aligned titles. The relationship was tight and consistent across the dataset.
Meta Descriptions
This one was counterintuitive: spoil your content in your meta description. Blyskal shared a case study from the corporate card space where the top-cited page literally used the meta description as an answer snippet: “The best corporate card is A, B, C, and D.” That page had nearly 2x the citation rate of competitors.
The meta description is now a marketing vehicle for answer engines. Give them the answer upfront.
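One way to operationalize this is to generate the meta description from the answer itself rather than from a teaser. A minimal sketch; the 160-character cap is a common SEO convention, not a number from the talk.

```python
def spoiler_meta_description(stem: str, answers: list[str], max_len: int = 160) -> str:
    """Lead with the answer itself, the way the top-cited corporate card page did."""
    if len(answers) > 1:
        listing = ", ".join(answers[:-1]) + f", and {answers[-1]}"
    else:
        listing = answers[0]
    return f"{stem} {listing}."[:max_len]

print(spoiler_meta_description(
    "The best corporate cards are", ["Card A", "Card B", "Card C", "Card D"]
))
# -> The best corporate cards are Card A, Card B, Card C, and Card D.
```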
Semantic Chunks: Answer Engines Don’t Summarize
A critical technical point: answer engines do not have a summarization feature in the RAG pipeline. They can only extract a certain amount of text at a time. If your answer is spread across five paragraphs, you’ll miss the opportunity to influence the response.
“You have to condense, densify that specific answer,” Blyskal said. Make sure it fits within the snippet size the engine can extract.
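A practical audit here is to check whether each answer block on a page fits inside a single extractable snippet. The 300-word budget below is my assumption for illustration; Blyskal didn’t give an exact extraction size.

```python
def flag_diffuse_answers(blocks: list[str], word_budget: int = 300) -> list[str]:
    """Flag answer blocks too long to fit in one extracted snippet (assumed budget)."""
    return [b[:60] + "..." for b in blocks if len(b.split()) > word_budget]

# Usage: split a page into heading-delimited blocks, then densify the flagged ones.
blocks = [
    "The best corporate card for startups is X, because ...",  # dense, self-contained
    " ".join(["filler"] * 400),                                 # answer spread too thin
]
for preview in flag_diffuse_answers(blocks):
    print("Condense:", preview)
```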
The Recency Bias Is Real
This stat should change how you think about content calendars: 50% of top-cited content is less than 13 weeks old. 75% is less than 26 weeks old.
Publishing cadence is now a competitive advantage in AEO. The reason comes back to query fanout (more on that below): answer engines inject freshness requirements into their searches even when the user doesn’t ask for it. “Best hot yoga spot in San Diego 2025.” “Best corporate credit card December 2025.” The freshness signal is baked into the retrieval process.
Content That Wins: Opinion Beats Listicles
Here’s a shift: listicles, which were the number-one hack six months ago, are now the second most-cited content type. Blog and opinion content has taken the top spot.
Together, listicles and blog content account for 61% of all net citations. But the trend is toward what Blyskal called “hyper-opinionated content.”
Why? Answer engines are lazy. When asked a question (outside of deep research mode), they need to go to the web in three seconds, find a framework of analysis, and present it as their own opinion. Blog content that says “I tried 50 of them, here’s how to think about it” provides that framework. Listicles provide the specific recommendations. The engine grabs from both and synthesizes.
Blyskal shared a case study with Ramp, the fintech company. By writing opinionated content with explicit frameworks of analysis, they 7x’d their citation rate within a month.
UGC Is Now a Search Surface
Reddit is the number-one cited source in AI search, aggregated across all models. First in Perplexity, second in AI Overviews, second in ChatGPT, first in AI Mode.
Blyskal’s interpretation: Reddit is an analog for “what do real people think.” When ChatGPT increases or decreases Reddit citations, it’s adjusting how much it values human opinion in answers. Your brand’s visibility is now impacted by what happens in Reddit threads and Quora discussions.
YouTube is the second most-cited source overall. First in Google AI Overviews, second in Perplexity, third in AI Mode. The key factor: platforms that can see transcripts cite YouTube heavily. ChatGPT can’t see transcripts, so YouTube citations there are minimal (138th).
Interesting Perplexity behavior: it cites YouTube videos with far fewer views than other platforms do. 37% of Perplexity’s YouTube citations have fewer than 1,000 views. It’s doing semantic matching, not popularity matching.
Query Fanout: The Real Targeting Layer
Six months ago, Blyskal reported 19% overlap between ChatGPT and Google at the domain level. With millions of query fanouts now analyzed, that number is 39%. Still not strong alignment, but higher than previously understood.
And Bing? “Bing isn’t driving anything,” he said. About 9% of citations. “Bing is not part of the story anymore.”
One key finding: LLM attention is spread more equitably through the SERP than human attention. Lower rankings still carry real value: if you’re in position 8 or 10, you can still win citations, because answer engines don’t have the same position bias humans do.
But you do need to be indexed. If you’re not in Google, you won’t be in ChatGPT’s SERP API results. No-index pages create headwinds for citation.
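That makes noindex auditing worth automating. Here’s a quick sketch that checks a URL for the two standard noindex signals, the X-Robots-Tag header and the robots meta tag; it uses requests and a crude regex, so a production audit should parse the HTML properly.

```python
import re
import requests

def is_noindexed(url: str) -> bool:
    """True if the page sends a noindex signal via HTTP header or robots meta tag."""
    resp = requests.get(url, timeout=10)
    if "noindex" in resp.headers.get("X-Robots-Tag", "").lower():
        return True
    # Crude check for <meta name="robots" content="... noindex ...">
    meta = re.search(r'<meta[^>]+name=["\']robots["\'][^>]*>', resp.text, re.IGNORECASE)
    return bool(meta and "noindex" in meta.group(0).lower())

print(is_noindexed("https://example.com/some-page"))  # hypothetical URL
```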
Blyskal shared a practical example of query fanout value: a food industry client wanted to rank for “where to buy pumpkin or cinnamon ingredients suited to fall cocktails.” The query fanout revealed the actual retrieval target was “gourmet spice shop”—not pumpkin, not fall cocktails. Without fanout data, they would have optimized for the wrong terms entirely.
Commerce: The Move from Scraping to Product Feeds
Blyskal made a prediction about commerce in AI search: we’re moving from scraping to product feeds. The current method—ChatGPT trying to scrape Macy’s site to see what belts are in stock—will look antiquated within a year.
The future is direct data exchange. ChatGPT already has an Uber connector—when you ask for a ride, it doesn’t scrape Uber’s frontend. It connects to the backend API. That model will extend to retail.
For now, the actionable advice: think like a Fortune 500 even if you’re not one. Think about your products as JSON. Field completeness matters—if you’re missing the dimensions field, you’ll miss long-tail queries like “compact suitcase for travel.” The engine won’t know your product qualifies.
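In that spirit, here’s a minimal sketch of a product record as JSON with a field-completeness check. The required-field list is my assumption about what long-tail queries might need, not a published feed spec.

```python
import json

# Assumed field list for completeness checks; not an official feed schema.
REQUIRED_FIELDS = ["name", "price", "dimensions", "weight", "material", "availability"]

product = {
    "name": "Carry-on suitcase",
    "price": 129.00,
    "dimensions": {"height_cm": 55, "width_cm": 35, "depth_cm": 20},  # enables "compact" matches
    "availability": "in_stock",
}

missing = [field for field in REQUIRED_FIELDS if field not in product]
print(json.dumps(product, indent=2))
print("Missing fields (long-tail queries you can't match):", missing)
```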
One striking data point: FAQs showed up 848% more often in top-cited product pages than bottom-cited pages. Video appeared 103% more often. Product ratings were 36% higher. These are the factors that correlate with product page citations.
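The talk reported the correlation, not a prescribed markup. If you want those FAQs in a machine-readable form, though, schema.org’s FAQPage structured data is the standard pattern; a minimal, illustrative example:

```python
import json

# Standard schema.org FAQPage shape; the question and answer text are illustrative.
faq_jsonld = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "Does this suitcase fit airline carry-on limits?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "Yes. At 55 x 35 x 20 cm it fits most major carriers' limits.",
            },
        }
    ],
}

# Embed in the product page as <script type="application/ld+json">.
print(json.dumps(faq_jsonld, indent=2))
```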
My Takeaways
Blyskal’s talk was the most data-rich of the conference. When someone analyzes 250 million responses, you pay attention to the patterns they find.
What I’m implementing:
- SEO metrics are necessary but not sufficient. They predict only 4-7% of citation behavior. The other 93-96% comes from content factors, freshness, and platform-specific dynamics.
- Optimize URLs and titles for semantic alignment. Natural language URLs earn 11.4% more citations. Semantic alignment to query format adds another 5-6%. These are meaningful gains from basic optimizations.
- Spoil content in meta descriptions. Give the answer upfront. The meta description is now a marketing vehicle for answer engines.
- Freshness is a competitive advantage. 75% of top-cited content is less than six months old. Publishing cadence matters in AEO.
- Write opinionated content with frameworks. Answer engines are lazy—they need frameworks of analysis they can present as their own opinion. Opinion content now beats listicles for citations.
- Reddit and YouTube are search surfaces now. Your social team, PR team, and comms team need to be looped in. What happens on these platforms directly impacts AI visibility.
- Query fanout reveals the real targets. The terms users type aren’t the terms answer engines search for. Fanout data shows you where to actually intersect your content.
Blyskal closed with a call to action: “This is the moment when we can raise our hands to the CMOs and CEOs and say, we know this stuff. We’re ready.” The data is there. The opportunity is there. Time to use it.