Build a hybrid RAG router: live web search + vector store

Most production RAG systems need two retrieval sources: a vector store for your own documents (PDFs, docs, code) and live web search for current events, recent releases, and anything outside your corpus. A vector store answers "what does our refund policy say?" but has no idea what shipped in Python 3.13 last week. This guide builds a router that sends each query to the right source.

The pattern is a thin classifier in front of two retrievers:

query → [router] → Superhighway /search  → LLM
                 → vector store         →

1. The routing decision

The router's only job is deciding where a query goes. The rule of thumb: if answering correctly depends on information that changes over time, route to live web search; if it depends on your proprietary corpus, route to the vector store.

| Query type | Route to | Example |
| --- | --- | --- |
| Current events, news, recent releases | Superhighway /search | "What's new in Python 3.13?" |
| Your own documents, policies, code | Vector store | "What does our refund policy say?" |
| Deep synthesis from multiple web pages | Superhighway /research | "Summarize the state of AI agent frameworks" |
| Questions that could be either | Both, then merge | "How does the new OpenAI API compare to ours?" |
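
As a rough illustration, the rule of thumb can be written as a function of two yes/no questions about the query. This is only a sketch: the `Route` enum and `decide_route` are hypothetical names, the /research route is left out, and falling back to the vector store when neither condition holds is an assumption.

```python
from enum import Enum


class Route(str, Enum):
    WEB = "web"        # live web search
    VECTOR = "vector"  # your own corpus
    BOTH = "both"      # query both sources, then merge


def decide_route(time_sensitive: bool, about_own_corpus: bool) -> Route:
    """Map the two rule-of-thumb questions onto a route."""
    if time_sensitive and about_own_corpus:
        return Route.BOTH
    if time_sensitive:
        return Route.WEB
    # Assumption: default to the vector store when nothing signals recency.
    return Route.VECTOR
```

The routers in the following sections are different ways of estimating those two booleans from the query text.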

2. Simple keyword router

The cheapest classifier is a list of words that signal recency. It needs no LLM call and adds negligible latency, which makes it a good default that catches many "live web" queries:

import os
import re

import requests

SUPERHIGHWAY_KEY = os.environ["SUPERHIGHWAY_API_KEY"]

# Keywords that signal a "live web" query
WEB_SIGNALS = {
    "latest", "recent", "new", "current", "today", "2024", "2025", "2026",
    "news", "update", "updates", "release", "releases", "released",
    "announced", "just", "now",
}

def route_query(query: str) -> str:
    # Match whole words only, so "renew" or "know" don't trigger "new" or "now"
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    if words & WEB_SIGNALS:
        return "web"
    return "vector"

def web_search(query: str, count: int = 5) -> list[dict]:
    r = requests.get(
        "https://superhighway.walls.sh/search",
        params={"q": query, "count": count},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=15,
    )
    r.raise_for_status()
    return r.json()["results"]

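def retrieve(query: str, vector_store) -> list[str]:
    route = route_query(query)
    if route == "web":
        results = web_search(query)
        return [f"{r['title']}: {r['description']}" for r in results]
    # Your vector store retrieval here. This assumes a LangChain-style store
    # whose documents expose .page_content, so both branches return strings.
    docs = vector_store.similarity_search(query, k=5)
    return [doc.page_content for doc in docs]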
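
One way to find out where a keyword router goes wrong is to run it over a small hand-labeled sample. The sketch below is illustrative: `find_misroutes`, the `LABELED` sample, and `naive_router` are all hypothetical, and the stand-in router deliberately uses plain substring matching to show a typical failure ("renew" contains "new").

```python
from typing import Callable


def find_misroutes(
    router: Callable[[str], str],
    labeled: list[tuple[str, str]],
) -> list[tuple[str, str, str]]:
    """Return (query, expected, actual) for every query the router gets wrong."""
    misses = []
    for query, expected in labeled:
        actual = router(query)
        if actual != expected:
            misses.append((query, expected, actual))
    return misses


# Hypothetical labeled sample; replace it with queries from your own logs.
LABELED = [
    ("What's new in Python 3.13?", "web"),
    ("What does our refund policy say?", "vector"),
    ("Who is the current CEO of OpenAI?", "web"),
    ("How do I renew my subscription?", "vector"),
]


def naive_router(query: str) -> str:
    """Stand-in router using substring matching, for demonstration only."""
    q = query.lower()
    return "web" if any(s in q for s in ("new", "current", "latest")) else "vector"


for query, expected, actual in find_misroutes(naive_router, LABELED):
    print(f"{query!r}: expected {expected}, got {actual}")
```

Any callable that maps a query string to a route label can be passed as `router`, so the same sample can be reused to compare routers.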

3. LLM-based router (more accurate)

Keyword matching misses queries that need fresh data without an obvious trigger word ("Who is the current CEO of OpenAI?"). For those, ask a small, fast model to classify the query:

import anthropic

client = anthropic.Anthropic()

def llm_route(query: str) -> str:
    """Ask the LLM whether this query needs live web data."""
    response = client.messages.create(
        model="claude-haiku-4-5",  # fast, cheap classifier
        max_tokens=10,
        messages=[{
            "role": "user",
            "content": f"""Does this question require live, up-to-date web information (current events, recent releases, news)?
Answer only 'web' or 'static'.

Question: {query}"""
        }]
    )
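    # Map the model's answer onto the same labels route_query returns
    answer = response.content[0].text.strip().lower()
    return "web" if answer.startswith("web") else "vector"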

claude-haiku-4-5 is fast and cheap enough to run on every query; GPT-4o-mini or a similar small model works too. If the same questions recur, cache the result keyed on the query so you pay for each classification only once.
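
A minimal sketch of that caching idea, assuming an in-process dictionary is enough (a shared cache such as Redis would be needed across workers). `make_cached_router` is a hypothetical helper; it wraps any classifier that takes a query string and returns a label.

```python
from typing import Callable


def make_cached_router(classify: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a classifier so each distinct (normalized) query is classified once."""
    cache: dict[str, str] = {}

    def cached(query: str) -> str:
        # Normalize case and whitespace so trivial variants share one entry.
        key = " ".join(query.lower().split())
        if key not in cache:
            cache[key] = classify(query)
        return cache[key]

    return cached
```

Usage would look like `cached_route = make_cached_router(llm_route)`, after which `cached_route(query)` only calls the model for queries it has not seen. The cache here never expires, which is an assumption worth revisiting if the right route for a query can change over time.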

4. Hybrid retrieval (both sources, then merge)

For queries where the route is genuinely ambiguous, don't guess — pull from both sources and let the LLM weigh what's relevant against the question:

def hybrid_retrieve(query: str, vector_store) -> str:
    """Retrieve from both sources and return combined context."""
    # Run sequentially for simplicity; the two calls are independent,
    # so they can also run in parallel
    web_results = web_search(query, count=3)
    vector_results = vector_store.similarity_search(query, k=3)

    web_context = "\n".join([
        f"[Web] {r['title']}: {r['description']}" for r in web_results
    ])
    vector_context = "\n".join([doc.page_content for doc in vector_results])

    return f"## Live Web Results\n{web_context}\n\n## Your Documents\n{vector_context}"

Pass the string returned by hybrid_retrieve(query, store) to your LLM as context. The model can then weigh recent web results against your documents, and labeling each block (the [Web] prefix and the "## Live Web Results" and "## Your Documents" headings) helps it cite the right source.
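
Because the two lookups are independent and mostly wait on I/O, they can run concurrently. The sketch below uses a thread pool from the standard library; `retrieve_in_parallel` is a hypothetical helper, and the two retriever callables are assumed to take a query and return a list of strings.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def retrieve_in_parallel(
    query: str,
    search_web: Callable[[str], list[str]],
    search_docs: Callable[[str], list[str]],
) -> tuple[list[str], list[str]]:
    """Run both retrievers concurrently and return (web_results, doc_results)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        web_future = pool.submit(search_web, query)
        docs_future = pool.submit(search_docs, query)
        # .result() blocks until the call finishes and re-raises its exception.
        return web_future.result(), docs_future.result()
```

With this in place, the total wait is roughly that of the slower retriever rather than the sum of both.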

5. Full pipeline with LangChain

Here's the router wired into a LangChain LCEL chain. route_and_retrieve is the only custom piece — everything else is a standard retrieval-augmented prompt:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

def route_and_retrieve(inputs: dict) -> str:
    # Assumes vector_store is an already-initialized LangChain vector store
    query = inputs["question"]
    route = route_query(query)  # or llm_route(query)

    if route == "web":
        results = web_search(query)
        return "\n".join(f"{r['title']}: {r['description']}" for r in results)
    docs = vector_store.similarity_search(query, k=5)
    return "\n".join(doc.page_content for doc in docs)

prompt = ChatPromptTemplate.from_template("""Answer the question using the context below.

Context:
{context}

Question: {question}""")

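chain = (
    # assign() keeps the incoming {"question": ...} dict and adds a "context" key,
    # so the prompt receives the question as a string rather than the whole dict
    RunnablePassthrough.assign(context=RunnableLambda(route_and_retrieve))
    | prompt
    | llm
    | StrOutputParser()
)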

answer = chain.invoke({"question": "What are the latest updates to LangChain?"})
print(answer)

6. Which approach to use

| Approach | Best when | Trade-off |
| --- | --- | --- |
| Keyword router | Simple, fast, no LLM cost | Can miss edge cases |
| LLM classifier | Accurate, handles nuance | Adds one LLM call per query |
| Hybrid (both) | Unknown query type, high stakes | Higher cost, more context |
| Web-only (no vector store) | No proprietary docs | See the RAG guide |

Start with the keyword router, add the LLM classifier when you see misroutes, and reach for hybrid retrieval only on the queries that matter most.
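
One possible way to combine the first two approaches is a cascade: run the free keyword check first and call the LLM only when it finds no signal. This is a sketch, not a prescribed design; `cascade_route` and `RECENCY_WORDS` are hypothetical names, and `llm_classify` stands for any function that returns "web" for live-web queries.

```python
import re
from typing import Callable

# Hypothetical signal list; tune it against your own traffic.
RECENCY_WORDS = {"latest", "recent", "news", "today", "announced"}


def cascade_route(query: str, llm_classify: Callable[[str], str]) -> str:
    """Try the keyword check first; call the LLM only when it finds nothing."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    if words & RECENCY_WORDS:
        return "web"
    # Treat anything other than an explicit "web" answer as a vector-store query.
    return "web" if llm_classify(query).strip().lower() == "web" else "vector"
```

Queries with an obvious recency word never reach the model, so the per-query LLM cost applies only to the ambiguous remainder.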

Get your API key at /pricing (free tier: 1,000 calls/month). For web-only RAG without a vector store, see the web-search RAG guide.