Semantic Interlinking Without OpenAI: How to Connect Content With a 130MB Local Model
How we replaced manual link checking with a fully automated system that runs on CPU, requires no API keys, and costs $0 even on a million pages.
The problem: manual site interlinking is so last century
We have a glossary generated by an AI agent; each term gets a "Related Topics" section: a list of related terms at the end of the article. Logically, these terms should be clickable links to the corresponding glossary pages.
But there's a catch: the agent writing the content doesn't know which terms already exist in the database and which don't. It generates related topics "out of thin air," based on web searches and general knowledge.
What happens in practice:
- The agent suggests a link to "A/B Testing," but we have "Split Testing" in the database.
- Suggests "Machine Learning," but no one has added it to the queue yet.
- Misses obvious connections to terms already in the CMS.
Result: readers see related terms on the page, but none of them are links. We lose engagement and internal SEO weight, and the page is simply less convenient to navigate.
Research: 5 approaches from simple to complex
We analyzed different approaches to automatic interlinking:
- Strict matching – filtering related topics by exact match with existing terms. Simple, but loses semantics ("A/B Testing" ≠ "Split Testing").
- Pre-prompt injection – passing the agent a list of existing terms in the prompt. Works, but limits creativity and context size.
- Embedding-based semantic search – using vector representations to find similar terms. Exactly what we need, but where do we get embeddings?
- Graph-based linking – building connection graphs and finding clusters. Overkill for our task.
- API solutions – OpenAI Embeddings, Cohere, Anthropic. They work but require keys and money.
We settled on option 3, but with a critical condition: everything must work locally, without external APIs, and super fast.
The solution: FastEmbed + 130MB local model
Fortunately, there's FastEmbed, a library for generating embeddings that:
- Runs completely locally on CPU.
- Automatically downloads models from HuggingFace on first launch.
- Uses lightweight models (ours weighs 130MB).
- Requires no keys, no sign-ups, no GPU.
- Easily integrates as a Python application module.
We chose the BAAI/bge-small-en-v1.5 model; it offers a good balance of quality and size. The first download takes about 30 seconds, after which the model is cached. Conveniently, FastEmbed handles all of this under the hood: you don't supply the model files yourself; it downloads and loads them on its own.
How the pipeline works:
```
Agent generates HTML with Related Topics
        ↓
Parse HTML, extract topic list
        ↓
Generate embeddings for each topic (FastEmbed)
        ↓
Compare with the embeddings of existing terms in the CMS
        ↓
If cosine similarity ≥ 0.90, replace text with a link
        ↓
Save enriched HTML
```
The code that does the magic
Here's a simplified version of our SemanticRelinkingManager:
```python
from fastembed import TextEmbedding
import numpy as np
from bs4 import BeautifulSoup


class SemanticRelinkingManager:
    def __init__(self, cms_writer, threshold=0.90):
        self.cms = cms_writer
        self.threshold = threshold
        # 130MB model, downloads automatically from HuggingFace
        self.model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
        self.refresh_knowledge_base()

    def refresh_knowledge_base(self):
        """Generate embeddings for all existing terms."""
        terms = self.cms.get_all_terms()  # [{slug, title}, ...]
        # Title-only strategy - sufficient for a glossary
        titles = [t['title'] for t in terms]
        self.embeddings = np.array(list(self.model.embed(titles)))
        self.slugs = [t['slug'] for t in terms]
        self.titles = titles

    def linkify_html(self, html_content):
        """Finds Related Topics and turns them into links."""
        soup = BeautifulSoup(html_content, 'html.parser')
        related_p = soup.find('p', class_='related')
        if not related_p:
            return html_content

        # Extract topics: "Related Topics: A, B, C" → ["A", "B", "C"]
        text = related_p.get_text()
        _, topics_str = text.split("Related Topics:", 1)
        topics = [t.strip() for t in topics_str.split(',')]

        # Generate embeddings for the agent's topics
        topic_embeddings = list(self.model.embed(topics))

        # For each topic, find the best match in the database
        rendered = []
        for i, topic in enumerate(topics):
            # Embeddings are L2-normalized, so a dot product is cosine similarity
            scores = np.dot(self.embeddings, topic_embeddings[i])
            best_idx = int(np.argmax(scores))
            if scores[best_idx] >= self.threshold:
                # Match found - create a link
                slug = self.slugs[best_idx]
                rendered.append(f'<a href="/glossary/{slug}">{topic}</a>')
            else:
                # No match - keep the plain, non-clickable text
                rendered.append(topic)

        # Replace the paragraph contents in the DOM
        new_p = BeautifulSoup(
            f'<p class="related">Related Topics: {", ".join(rendered)}</p>',
            'html.parser')
        related_p.replace_with(new_p.p)
        return str(soup)
```

Key points:
- Title-only matching – for a glossary, comparing only term names is sufficient, not the entire content.
- Threshold 0.90 – determined experimentally: lower gives too many false positives, higher misses valid connections.
- Incremental updates – each new term is immediately added to the knowledge base and available for linking subsequent terms.
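The matching step at the heart of this is plain linear algebra: with L2-normalized embeddings, cosine similarity is just a dot product, so scoring one topic against the whole knowledge base is a single matrix-vector product. A toy sketch with 3-dimensional vectors (real bge-small embeddings have 384 dimensions):

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def best_match(kb_embeddings, topic_vec, threshold=0.90):
    """Return the index of the best-scoring KB term, or None if below threshold."""
    scores = kb_embeddings @ topic_vec   # cosine similarity, vectors pre-normalized
    best_idx = int(np.argmax(scores))
    return best_idx if scores[best_idx] >= threshold else None

# Toy knowledge base: 3 "terms" as unit vectors
kb = np.array([normalize(np.array([1.0, 0.1, 0.0])),
               normalize(np.array([0.0, 1.0, 0.0])),
               normalize(np.array([0.0, 0.0, 1.0]))])

close = normalize(np.array([1.0, 0.2, 0.0]))  # very near term 0
far = normalize(np.array([1.0, 1.0, 1.0]))    # not close to anything

print(best_match(kb, close))  # → 0
print(best_match(kb, far))    # → None
```

When nothing clears the threshold the function returns None and the topic stays as plain text, which is exactly the fallback behavior the pipeline relies on.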
Why this works in production
No dependency on external APIs
The model is downloaded once on first launch and lives in ~/.cache/fastembed. After that, you need no internet connection, no tokens, no rate limits, and no billing.
Works on CPU without lag
Generating an embedding for one term takes milliseconds. Even a batch of 100 terms processes in seconds. We run this in Docker on a regular VPS without a GPU.
Automatic fallback
If no similar term is found (score < 0.90), the text remains as is, just non-clickable. Nothing breaks; the content stays readable.
Scales to thousands of terms
Model size is fixed; search time depends linearly on the number of terms. Even with 10,000 terms, it's seconds, not minutes.
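To put a rough number on the "linear but fast" claim, here's a synthetic benchmark (random vectors standing in for real embeddings, not our production data): scoring one query against a 10,000-term knowledge base is one matrix-vector product.

```python
import time
import numpy as np

rng = np.random.default_rng(0)

# Fake knowledge base: 10,000 terms as 384-dim unit vectors (bge-small size)
kb = rng.normal(size=(10_000, 384)).astype(np.float32)
kb /= np.linalg.norm(kb, axis=1, keepdims=True)

query = kb[42]  # a query identical to term #42

start = time.perf_counter()
scores = kb @ query               # one matrix-vector product scores everything
best = int(np.argmax(scores))
elapsed_ms = (time.perf_counter() - start) * 1000

print(best)  # → 42 (the query matches itself with similarity 1.0)
print(f"{elapsed_ms:.2f} ms")  # typically a few milliseconds at most on a CPU
```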
Features and limitations
Doesn't work perfectly with:
- Abbreviations and acronyms ("API" vs. "Application Programming Interface").
- Very short terms (less than 3 characters, not enough context for embedding).
- Terms with the same meaning but different spelling ("e-commerce" vs. "ecommerce").
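One cheap mitigation for the spelling-variant cases (our own workaround, not something FastEmbed does for you) is to normalize surface forms before embedding, so "e-commerce" and "ecommerce" collapse to the same string:

```python
import re

def normalize_term(term: str) -> str:
    """Lowercase, strip hyphens and punctuation, collapse whitespace."""
    term = term.lower()
    term = re.sub(r"[-_/]", "", term)     # "e-commerce" → "ecommerce"
    term = re.sub(r"[^\w\s]", "", term)   # drop remaining punctuation
    return re.sub(r"\s+", " ", term).strip()

print(normalize_term("E-Commerce"))  # → "ecommerce"
print(normalize_term("ecommerce"))   # → "ecommerce"
```

Running both sides of the comparison through the same normalizer before calling embed() turns these pairs into exact matches before semantics even enters the picture.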
Requires:
- ~130MB disk space for the model.
- ~100MB RAM at runtime.
- Initial model download from the internet (or cached volume in Docker).
When this approach is ideal
✅ Perfect for:
- Glossaries, dictionaries, and knowledge bases with clear terms.
- Documentation requiring semantic interlinking.
- Any content with "Related Topics" or "See Also" sections.
- Systems running 24/7 unattended (not afraid of rate limits).
- Possibly a future module for all CMS systems: automatic interlinking.
❌ Not suitable for:
- Real-time search (latency ~100-500 ms per query).
- Tasks requiring 100% accuracy (there are always edge cases).
- Content in languages not covered by the chosen model (our selection works well with English).
Bottom line: autonomous interlinking for $0 per year
We replaced manual link checking with an automated system that:
- Requires no API keys.
- Runs on CPU.
- Costs $0 regardless of content volume.
- Doesn't break without internet.
- Scales to thousands of pages.
Minimum starter kit:
```
pip install fastembed numpy beautifulsoup4
```
That's it: you now have working semantic interlinking inside your application, with no sign-ups, keys, or paid tokens.
