Add live web search to a LiteLLM agent

LiteLLM is the open-source gateway that puts one OpenAI-compatible interface in front of 100+ models — GPT-4, Claude, Gemini, Llama, Mistral, and the rest. Whether you call litellm.completion() from the Python SDK or run it as a proxy server for a whole team, the function-calling and tool plumbing is identical to OpenAI's. That's the superpower here: wire Superhighway's web search in once and it works against every backend LiteLLM routes to. Two paths below — function calling with a free key, or the MCP client to load all five tools.

Two paths

Path | Best for | What you need
Function calling + REST | One tool (web search), explicit control, any model backend | A free API key from /pricing + a model provider key
MCP client | All five tools auto-loaded with less code; optional x402 wallet | npx + a free key (or a funded Base wallet)

Get a free API key

Create a free-tier key at superhighway.walls.sh — no credit card. Pass it as a bearer token on every request. Set it alongside whichever model provider key LiteLLM needs:

export SUPERHIGHWAY_API_KEY=sk_...        # from /pricing
export ANTHROPIC_API_KEY=sk-ant-...       # or OPENAI_API_KEY, GEMINI_API_KEY, ...
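
A quick startup check can save a confusing authentication error later. The sketch below is a hypothetical helper (missing_env is not part of LiteLLM or Superhighway) that reports which of the expected variables are unset or empty; the provider variable to check is whichever one your chosen backend needs.

```python
import os

def missing_env(names, env=None):
    """Return the variable names from `names` that are unset or empty."""
    env = os.environ if env is None else env  # allow a plain dict for testing
    return [name for name in names if not env.get(name)]
```

For example, missing_env(["SUPERHIGHWAY_API_KEY", "ANTHROPIC_API_KEY"]) returns an empty list when both exports above are in place, and the names of the missing ones otherwise.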

Path 1 — function calling (REST API + free key)

pip install litellm requests

Define the tool once in OpenAI's tool schema. LiteLLM passes it through unchanged to whatever model you name in completion(), so the same definition drives Claude, GPT-4, Gemini, or a local Llama.

import json
import os
import requests
import litellm

web_search_tool = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the live web for current information. "
                       "Returns ranked results with title, URL, and description.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query"},
                "limit": {"type": "integer", "description": "Max results (default 5)"},
            },
            "required": ["query"],
        },
    },
}

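def web_search(query: str, limit: int = 5) -> str:
    """Call Superhighway's /search endpoint and format the results as plain text."""
    resp = requests.get(
        "https://superhighway.walls.sh/search",
        params={"q": query, "limit": limit},
        headers={"Authorization": f"Bearer {os.environ['SUPERHIGHWAY_API_KEY']}"},
        timeout=30,  # requests has no default timeout; don't let the agent hang
    )
    resp.raise_for_status()  # surface 4xx/5xx errors instead of parsing an error body
    results = resp.json().get("results", [])
    # One block per result (title, URL, description), separated by blank lines
    return "\n\n".join(
        f"{r['title']}\n{r['url']}\n{r.get('description', '')}" for r in results
    )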

Now run the standard two-call tool loop: ask the model with tools=[...], run any tool calls it returns, append the results as tool messages, and call again so it can answer. Swap the model string and nothing else changes.

messages = [{"role": "user", "content": "What are the latest AI agent frameworks?"}]

# Works against ANY backend — just change the model string:
#   "anthropic/claude-sonnet-4-5"  ·  "openai/gpt-4o"  ·  "gemini/gemini-2.5-pro"
MODEL = "anthropic/claude-sonnet-4-5"

response = litellm.completion(
    model=MODEL,
    messages=messages,
    tools=[web_search_tool],
    tool_choice="auto",
)

msg = response.choices[0].message
if msg.tool_calls:
    messages.append(msg)  # the assistant turn that requested the tool
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = web_search(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": result,
        })
    # Second call: the model reads the results and writes the answer
    final = litellm.completion(model=MODEL, messages=messages, tools=[web_search_tool])
    print(final.choices[0].message.content)
else:
    print(msg.content)

That's the whole pattern. Because LiteLLM normalizes every provider's tool-calling format to OpenAI's, the exact same code runs against Anthropic, OpenAI, Google, or an Ollama model; only the MODEL string changes. LiteLLM reads each provider's API key from the environment automatically (ANTHROPIC_API_KEY, OPENAI_API_KEY, and so on).
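
The two-call version above handles a single round of tool use. If the model asks for another search after reading the first results, the second reply carries tool calls rather than an answer. A minimal sketch of a bounded loop follows; run_tool_loop, the handlers mapping, and max_rounds are hypothetical names introduced here, and complete stands in for any callable with litellm.completion's keyword interface and OpenAI-style response.

```python
import json

def run_tool_loop(complete, messages, tools, handlers, max_rounds=5):
    """Call the model until it answers without requesting a tool.

    complete: callable(messages=..., tools=...) returning an OpenAI-style response
    handlers: dict mapping tool name -> Python function that returns a string
    """
    for _ in range(max_rounds):
        response = complete(messages=messages, tools=tools)
        msg = response.choices[0].message
        if not msg.tool_calls:
            return msg.content  # plain answer, we are done
        messages.append(msg)  # the assistant turn that requested the tools
        for call in msg.tool_calls:
            handler = handlers.get(call.function.name)
            if handler is None:
                result = f"Unknown tool: {call.function.name}"
            else:
                try:
                    result = handler(**json.loads(call.function.arguments))
                except Exception as exc:
                    # Report the failure to the model instead of crashing the loop
                    result = f"Tool error: {exc}"
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": result}
            )
    raise RuntimeError(f"No final answer after {max_rounds} rounds")
```

With the names from Path 1, a call might look like run_tool_loop(functools.partial(litellm.completion, model=MODEL), messages, [web_search_tool], {"web_search": web_search}). The round cap is a design choice to keep a confused model from searching, and spending, indefinitely.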

Path 2 — MCP client (all five tools, less code)

Recent LiteLLM versions ship an experimental MCP helper module (litellm.experimental_mcp_client). Given a session opened to a stdio MCP server with the mcp package, it loads all of that server's tools for you, with no per-tool schema to write. Point the session at npx -y superhighway-mcp and you get web_search, news_search, image_search, scrape_page, and research in one shot. The model picks the right tool per turn.

pip install litellm mcp

import json
import os
import litellm
from litellm.experimental_mcp_client import load_mcp_tools
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_params = StdioServerParameters(
    command="npx",
    args=["-y", "superhighway-mcp"],
    env={"SUPERHIGHWAY_API_KEY": os.environ["SUPERHIGHWAY_API_KEY"]},
)

async def run(prompt: str):
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Loads all five tools as OpenAI-format tool schemas
            tools = await load_mcp_tools(session=session, format="openai")

            messages = [{"role": "user", "content": prompt}]
            response = litellm.completion(
                model="anthropic/claude-sonnet-4-5",
                messages=messages,
                tools=tools,
                tool_choice="auto",
            )
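            msg = response.choices[0].message
            if not msg.tool_calls:
                # No tool requested: the first reply is already the answer
                print(msg.content)
                return
            messages.append(msg)  # the assistant turn that requested the tools

            for call in msg.tool_calls:
                # Execute the tool back through the MCP session
                result = await session.call_tool(
                    call.function.name,
                    json.loads(call.function.arguments),
                )
                # A result can hold zero or several content parts; keep the text ones
                text = "\n".join(
                    part.text for part in result.content if getattr(part, "text", None)
                )
                messages.append({
                    "role": "tool",
                    "tool_call_id": call.id,
                    "content": text,
                })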

            final = litellm.completion(
                model="anthropic/claude-sonnet-4-5",
                messages=messages,
                tools=tools,
            )
            print(final.choices[0].message.content)

import asyncio
asyncio.run(run("What are the latest developments in AI agent frameworks?"))

One load_mcp_tools call replaces five hand-written schemas, and the same swap-the-model-string portability applies — the MCP tools are normalized to OpenAI format, so any backend LiteLLM routes to can call them.
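
If an agent should only see some of the five tools, the loaded list can be filtered before it is passed to completion(). A minimal sketch, assuming each loaded entry follows the same OpenAI tool shape as the hand-written schema in Path 1 (a dict with a name under "function"); select_tools is a hypothetical helper, not a LiteLLM API.

```python
def select_tools(tools, allowed):
    """Keep only the OpenAI-format tool entries whose function name is in `allowed`."""
    allowed = set(allowed)
    return [tool for tool in tools if tool["function"]["name"] in allowed]
```

For instance, tools = select_tools(tools, {"web_search", "news_search"}) would hide the scraping and research tools from the model while leaving the rest of the loop unchanged.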

x402 pay-per-call. To let an autonomous agent pay per call with no API key, swap the server env for a funded Base wallet:

    env={"AGENT_PRIVATE_KEY": os.environ["AGENT_PRIVATE_KEY"], "X402_NETWORK": "base"},

Each call then settles automatically in USDC on Base — $0.001 a search — via the x402 protocol. No signup, no key, no human in the loop.
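
One way to support both modes from the same script is to build the server env from whatever credentials are present. This is a sketch only: build_server_env is a hypothetical helper, and preferring the API key over the wallet when both are set is an assumption made here, not documented Superhighway behavior.

```python
import os

def build_server_env(env=None):
    """Pick the MCP server environment: API key if present, else an x402 wallet."""
    env = os.environ if env is None else env  # allow a plain dict for testing
    if env.get("SUPERHIGHWAY_API_KEY"):
        return {"SUPERHIGHWAY_API_KEY": env["SUPERHIGHWAY_API_KEY"]}
    if env.get("AGENT_PRIVATE_KEY"):
        # Wallet mode: variable names as shown in the x402 snippet above
        return {"AGENT_PRIVATE_KEY": env["AGENT_PRIVATE_KEY"], "X402_NETWORK": "base"}
    raise RuntimeError("Set SUPERHIGHWAY_API_KEY or AGENT_PRIVATE_KEY")
```

The returned dict can be passed as env=build_server_env() when constructing StdioServerParameters.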

Serve it to every app via the LiteLLM proxy

Run LiteLLM as a proxy server and you can register Superhighway's MCP server centrally, so every downstream app that talks to the proxy inherits the tools — no per-app wiring. Add it to config.yaml:

mcp_servers:
  superhighway:
    transport: "stdio"   # the proxy launches the server as a local subprocess
    command: npx
    args: ["-y", "superhighway-mcp"]
    env:
      SUPERHIGHWAY_API_KEY: os.environ/SUPERHIGHWAY_API_KEY

Start the proxy with litellm --config config.yaml. Clients hitting the proxy's OpenAI-compatible endpoint can now discover and call the Superhighway tools, with all the usual LiteLLM benefits layered on — cost tracking, rate limits, load balancing, and logging across every model and every team.

Which tools are available

Tool / endpoint | What it returns | Price per call
web_search / /search | Ranked web results: title, URL, description | $0.001
news_search / /news | Recent news with published dates | $0.001
image_search / /images | Image URLs, thumbnails, source pages | $0.001
scrape_page / /scrape | Any URL → clean markdown text | $0.002
research / /research | Search + read top pages in one call | $0.005
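
To budget an agent's usage, the per-call prices in the table can be turned into a quick estimate. A minimal sketch, assuming the listed prices apply uniformly to every call; PRICES and estimate_cost are names made up for this example.

```python
from decimal import Decimal

# Per-call prices in USD, copied from the table above
PRICES = {
    "web_search": Decimal("0.001"),
    "news_search": Decimal("0.001"),
    "image_search": Decimal("0.001"),
    "scrape_page": Decimal("0.002"),
    "research": Decimal("0.005"),
}

def estimate_cost(call_counts):
    """Return the total USD cost for a mapping of tool name -> number of calls."""
    return sum(
        (PRICES[name] * count for name, count in call_counts.items()),
        Decimal("0"),
    )
```

For example, 1,000 web searches, 200 page scrapes, and 10 research calls come to 1.00 + 0.40 + 0.05 = $1.45. Decimal is used so that thousandths of a dollar add up exactly rather than drifting with float rounding.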

Full API reference: /openapi.json. Get a free API key or learn the x402 wallet flow.