“What is a technical SEO conference without a topic about internal linking?” Serge Bezborodov asked. Fair point. But his talk wasn’t the usual internal linking advice—it was about solving the problem that makes internal linking nearly impossible on large sites: scale.
Bezborodov, who’s been obsessing over internal linking for a decade, spent months hitting walls trying to build a solution that actually works for sites with tens of thousands or millions of pages. He came to Tech SEO Connect with the answer—and he’s giving it away as open source.
The Problem with Text Embeddings at Scale
You’ve probably seen the articles: crawl your site with Screaming Frog, use AI text embeddings, write a Python script, and you’ll have perfect internal linking. It sounds simple.
And it works—for blogs with enough content. But Bezborodov’s context is different: big websites with thousands or millions of product pages, where unique content is rare and you’re thinking in terms of listing pages and product detail pages.
The math problem is brutal. To find the best internal linking matches, the standard approach compares every page to every other page. For 10,000 pages, that's 100 million pairwise comparisons; for 100,000 pages, it's 10 billion, which is computationally absurd.
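A quick back-of-the-envelope loop makes the quadratic blowup concrete:

```python
# All-vs-all matching compares every page with every other page,
# so the comparison count grows with the square of the page count.
for pages in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{pages:>9,} pages -> {pages * pages:>21,} comparisons")
```

At a million pages you're at a trillion comparisons, which is why the naive script that works for a blog falls over on a catalog site.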
But there’s a worse problem: many product pages have almost no content for embeddings to work with. Bezborodov showed an O-ring product page—just a rubber seal with minimal description. One site he worked with had 400 million O-ring SKUs. What can text embeddings possibly extract from that?
Another example: a boat propeller page where the actual product title and description totaled 12 words, but the full page body was 1,200 words of boilerplate. “How can our poor text embeddings, with 1% of needed content, understand and calculate it?” he asked.
The Breakthrough: Category-Based Matching
Bezborodov’s solution is almost embarrassingly simple—so simple that he wondered why he hadn’t found it earlier: use your existing category structure.
Instead of comparing every page to every other page across the entire site, compare pages within the same category, then move up to the parent category. Your catalog is already divided semantically—use that.
By selecting how many levels up the category tree you go, you can fine-tune whether your internal linking is narrow (tightly related products) or wider (broader category relevance). It’s adjustable based on semantics and the specific site.
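To make the idea concrete, here's a minimal sketch of category-bounded matching—not the tool's actual code. It assumes each page carries a breadcrumb-style category tuple and a unit-normalized embedding; the `level` parameter is the "how far up the tree" knob, and 0.6 is the starting similarity threshold Bezborodov suggests.

```python
import numpy as np
from collections import defaultdict

def category_matches(pages, level=0, threshold=0.6):
    """Compare embeddings only within a shared category prefix,
    instead of across the whole site.

    pages: dicts with "url", "category" (a tuple of breadcrumb parts,
    e.g. ("tools", "seals", "o-rings")), and "embedding" (unit-norm).
    level: how many trailing breadcrumb parts to drop — 0 matches
    within the exact category, higher values widen to parent categories.
    """
    groups = defaultdict(list)
    for page in pages:
        key = page["category"][:-level] if level else page["category"]
        groups[key].append(page)

    matches = []
    for group in groups.values():
        vecs = np.array([p["embedding"] for p in group])
        sims = vecs @ vecs.T  # cosine similarity for unit-norm vectors
        for i in range(len(group)):
            for j in range(len(group)):
                if i != j and sims[i, j] >= threshold:
                    matches.append((group[i]["url"], group[j]["url"], float(sims[i, j])))
    return matches
```

The cost is now quadratic only within each category bucket, which is what turns a half-day job into a fast one—and raising `level` is the "narrow vs. wide" dial described above.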
The speed improvement is dramatic. What used to take half a day for millions of pages now runs quickly, because you’re only calculating within bounded sets, not across the entire site.
Implementation Principles
Bezborodov emphasized that internal linking is where most projects die—not in the analysis, but in getting it deployed. His rules for actually getting internal linking implemented:
Don’t touch what already works. Your dev team already doesn’t like you because they think you’ll break things. Build a layer on top of existing internal linking instead of replacing it. Legacy code works; leave it alone.
It’s not a one-time job. Deploy, reassess crawl budget, check indexation and rankings, then fine-tune. “It won’t work from the first touch.” Plan for iteration.
Think in donors and acceptors. Donors are pages with crawl budget and authority. Acceptors are pages you want to improve. You’re passing link equity from good pages to underlinked pages.
Start with acceptors. These should be indexable, exclude pagination and parameters, and be business-value pages. You might have 50,000 underlinked pages, but narrow down to a manageable set through categories and filters.
Use logs for donor selection. The best donors are pages that actually get crawled. If you have log data, use crawl budget as your filter.
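A rough illustration of log-driven donor selection, assuming combined-format access logs. The parsing is deliberately naive (it matches on a user-agent substring and skips reverse-DNS bot verification), so treat it as a sketch rather than a production log pipeline:

```python
from collections import Counter

def pick_donors(log_lines, min_hits=5, bot_token="Googlebot"):
    """Pick donor pages by crawl budget: URLs that search-engine
    bots actually fetch, ranked by hit count."""
    hits = Counter()
    for line in log_lines:
        if bot_token not in line:
            continue
        quoted = line.split('"')
        if len(quoted) < 2:
            continue
        request = quoted[1].split()  # e.g. ['GET', '/page', 'HTTP/1.1']
        if len(request) >= 2:
            hits[request[1]] += 1
    return [url for url, count in hits.most_common() if count >= min_hits]
```

The `min_hits` cutoff is the crawl-budget filter: pages bots rarely visit make poor donors no matter how relevant they look.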
Breadcrumbs define matching depth. How deep you go in the category tree determines how semantically tight your linking is. Start around 0.6 cosine similarity; 0.7-0.8 is usually too restrictive.
Generate anchors from the target page. Use H1 or title from the target page, remove templates, and you have your anchor text. For more sophistication, blend in Google Search Console keyword data to add semantic richness—especially useful for those content-thin product pages.
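A small sketch of that anchor-generation step. The template patterns here are hypothetical examples (every site's boilerplate differs), and the Search Console queries are assumed to be passed in as a pre-fetched list:

```python
import re

# Hypothetical boilerplate patterns a title template might add —
# adjust these to the site's own templates.
TEMPLATE_PATTERNS = [r"\s*\|\s*Example Shop$", r"^Buy\s+", r"\s*-\s*Free Shipping$"]

def build_anchor(title, gsc_keywords=None):
    """Derive anchor text from the target page's title, stripping
    template boilerplate; optionally blend in the page's top Search
    Console query to add semantics a content-thin page lacks."""
    anchor = title
    for pattern in TEMPLATE_PATTERNS:
        anchor = re.sub(pattern, "", anchor, flags=re.IGNORECASE)
    anchor = anchor.strip()
    if gsc_keywords:
        top_query = gsc_keywords[0]
        if top_query.lower() not in anchor.lower():
            anchor = f"{anchor} {top_query}"
    return anchor
```

For an O-ring page with a 12-word description, the blended query is often the only semantic signal the anchor can carry.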
Start Small, Test Everything
Bezborodov’s practical advice: don’t try to roll out internal linking for your entire site at once. Start small enough that you can review the output in a spreadsheet.
“When you have a million rows, you will not pay enough attention,” he said. You need to be able to assess what the algorithm is doing, catch problems, and adjust parameters. Deploy, test, iterate.
The Open Source Tool
Bezborodov released a free tool on GitHub that implements everything he described. It’s called JetOctopus Internal Linker (inspired by JR Oakes’ Python script from last year’s conference), and it’s designed to work with Screaming Frog exports.
The workflow: crawl with Screaming Frog using OpenAI text embeddings (or Google Gemini), extract breadcrumbs, export three files (donors, acceptors, existing internal links), and feed them to the application. It won’t duplicate existing links, handles anchor generation including Search Console keyword blending, and can process up to a million URLs (though he noted that ate over 100GB of RAM).
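The "feed it three files and don't duplicate existing links" step might look something like this sketch. The column names (`url`, `source`, `target`) are assumptions for illustration, not the tool's real export schema:

```python
import csv
import io

def plan_links(donors_csv, acceptors_csv, existing_csv, max_per_donor=3):
    """Join donor and acceptor exports, skipping donor->acceptor
    pairs that already exist on the site."""
    donors = [row["url"] for row in csv.DictReader(io.StringIO(donors_csv))]
    acceptors = [row["url"] for row in csv.DictReader(io.StringIO(acceptors_csv))]
    existing = {(row["source"], row["target"])
                for row in csv.DictReader(io.StringIO(existing_csv))}

    plan = []
    for donor in donors:
        placed = 0
        for acceptor in acceptors:
            if donor == acceptor or (donor, acceptor) in existing:
                continue  # never duplicate a link that already exists
            plan.append((donor, acceptor))
            placed += 1
            if placed >= max_per_donor:
                break
    return plan
```

Capping links per donor is what keeps the output reviewable in a spreadsheet rather than exploding into the million-row lists he warns about.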
His suggestion for learning the 20-30 parameters: paste the GitHub link into ChatGPT and ask it to explain. “Nobody will read the manual.”
A Call for Open Source SEO Tools
Bezborodov closed with something that resonated: almost everything we use to build websites is open source—PHP, Python, JavaScript, frameworks. But very little in the SEO tool space is open source.
His ask wasn’t for people to contribute code. It was simpler: use the tool, implement internal linking, share feedback on what worked and what didn’t. His hope is that by next year’s conference, someone else presents an improved version that becomes a standard tool in the SEO stack.
My Takeaways
Internal linking is one of those topics that’s easy to analyze and hard to implement. Bezborodov gave us both a conceptual framework and a free tool to actually do it at scale.
What I’m taking away:
- Category structure solves the scale problem. Stop trying to compare every page to every page. Your site taxonomy already provides semantic boundaries—use them to make the problem tractable.
- Text embeddings have limits. They work great for content-rich pages. For product pages with 12 words of actual content buried in 1,200 words of boilerplate, you need a different approach.
- Build layers, don’t replace. The fastest path to implementation is leaving existing internal linking alone and adding a new layer on top. Dev teams will thank you.
- Anchor text can add semantic value. For content-thin pages, using Search Console keyword data in anchor text is a way to inject semantic information that the page itself doesn’t have.
- Start small and iterate. Internal linking won’t work on the first try. Keep the initial scope reviewable, deploy, measure, adjust.
The tool is free on GitHub. For those of us working with large e-commerce or catalog sites, it’s worth testing. And if you do, share your results—that’s how open source gets better.