Add live web search to Groq with function calling

Groq runs open models such as Llama 3.3 on custom hardware that returns completions, including tool calls, in hundreds of milliseconds. That speed compounds in agent loops: when a model makes several sequential web searches before it can answer, every round-trip is short, so total latency stays low. Groq's API is OpenAI-compatible, so function calling uses client.chat.completions.create() with the familiar tools array. This guide wires Superhighway's live web search into a Groq app through the groq Python SDK.

Why Groq for function calling

Function-calling agents spend much of their wall-clock time waiting on the model: first to decide which tool to call, then to read the results back between steps. Because Groq returns each of those decisions in hundreds of milliseconds, a multi-step research loop (search, read the results, search again, then answer) completes quickly even when the tool calls run in sequence. That makes Groq a strong fit for agents that chain many tool calls to ground an answer in live data.

1. Install

You need the Groq SDK, plus requests to call Superhighway. Superhighway is a plain REST API, so there is no extra SDK to install:

pip install groq requests

2. Basic function calling

Declare a web_search tool in the standard function schema, then send a chat request with tools available and tool_choice="auto" so the model decides whether to call it. The method is client.chat.completions.create() — identical to the OpenAI SDK.

import json

import requests
from groq import Groq

client = Groq(api_key="YOUR_GROQ_API_KEY")
SUPERHIGHWAY_KEY = "YOUR_SUPERHIGHWAY_API_KEY"

tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the live web for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "What's the latest news about AI?"}],
    tools=tools,
    tool_choice="auto"
)

When the model wants live data, response.choices[0].message.tool_calls is populated. Each call carries tc.function.name and tc.function.arguments — the arguments come back as a JSON string, so you parse them with json.loads.
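
As a minimal sketch of that parsing step, the snippet below reads each tool call off a message and decodes its arguments. The SimpleNamespace objects only stand in for the SDK's response objects so the example runs without an API key, and parse_tool_calls is a hypothetical helper name. Falling back to an empty dict on malformed JSON is one possible policy, not the only one.

```python
import json
from types import SimpleNamespace


def parse_tool_calls(message):
    """Return a (call_id, name, args) tuple for each tool call on a chat message."""
    parsed = []
    for tc in message.tool_calls or []:
        try:
            # Arguments arrive as a JSON string, not a dict.
            args = json.loads(tc.function.arguments)
        except json.JSONDecodeError:
            # Assumption: treat malformed arguments as empty rather than crashing.
            args = {}
        parsed.append((tc.id, tc.function.name, args))
    return parsed


# Stand-in for response.choices[0].message (not a real SDK object).
fake_message = SimpleNamespace(
    content=None,
    tool_calls=[
        SimpleNamespace(
            id="call_1",
            function=SimpleNamespace(
                name="web_search",
                arguments='{"query": "latest AI news"}',
            ),
        )
    ],
)

for call_id, name, args in parse_tool_calls(fake_message):
    print(call_id, name, args["query"])
```

With a real response, pass response.choices[0].message in place of fake_message.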

3. Dispatch the tool to Superhighway

Write a handler that routes each tool call to the matching Superhighway endpoint and returns a compact text summary the model can read back. The same function covers web search, news, and page scraping:

def call_tool(name, args):
    """Route a tool call to the matching Superhighway endpoint and return text for the model."""
    # Map each tool name to its endpoint path and query parameters.
    if name == "web_search":
        path, params = "search", {"q": args["query"]}
    elif name == "news_search":
        path, params = "news", {"q": args["query"]}
    elif name == "scrape_page":
        path, params = "scrape", {"url": args["url"]}
    else:
        # Return text rather than None so the tool message always has string content.
        return f"Unknown tool: {name}"

    r = requests.get(
        f"https://superhighway.walls.sh/{path}",
        params=params,
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,  # avoid hanging the agent loop on a stalled request
    )
    r.raise_for_status()  # surface HTTP errors instead of parsing an error body
    data = r.json()

    # Summarize compactly: top 3 search results, top 5 articles, or 2,000 characters of a page.
    if name == "web_search":
        return "\n".join(f"{x['title']}: {x['url']}\n{x['content']}" for x in data["results"][:3])
    if name == "news_search":
        return "\n".join(f"{a['title']}: {a['url']}" for a in data["articles"][:5])
    return data.get("markdown", "")[:2000]

To use news_search and scrape_page, add their schemas to the tools array alongside web_search (a news_search with a query property, and a scrape_page with a url property). The handler above already dispatches all three by name.
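
The two extra schemas could look like the sketch below. The make_tool helper is a hypothetical convenience, and the description strings are illustrative wording; only the tool names and the query and url properties come from the text above.

```python
def make_tool(name, description, param, param_description):
    """Build a function-tool schema with a single required string parameter."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {
                    param: {"type": "string", "description": param_description}
                },
                "required": [param],
            },
        },
    }


# Descriptions are assumptions; adjust them to steer when the model picks each tool.
extra_tools = [
    make_tool("news_search", "Search recent news articles", "query", "News search query"),
    make_tool("scrape_page", "Fetch a web page and return its content", "url", "URL of the page to fetch"),
]

# Then extend the list from section 2: tools.extend(extra_tools)
print([t["function"]["name"] for t in extra_tools])
```

Keeping every schema in the same shape makes it harder for a tool name in the tools array to drift from the names the handler dispatches on.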

4. Agentic while-loop

A single round-trip works for one search, but the model may want several before it can answer. Wrap the exchange in a while loop that runs until the response has no more tool_calls. Because each pass through the model returns in hundreds of milliseconds on Groq, even a multi-step research run completes quickly:

query = "What's the latest news about AI?"
messages = [{"role": "user", "content": query}]

while True:
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )
    message = response.choices[0].message

    if not message.tool_calls:
        print(message.content)
        break

    messages.append(message)

    for tc in message.tool_calls:
        result = call_tool(tc.function.name, json.loads(tc.function.arguments))
        messages.append({
            "role": "tool",
            "tool_call_id": tc.id,
            "content": result
        })

The loop keeps the model in control: it searches as many times as it needs, appending one "role": "tool" message per call — linked by tool_call_id — and only returns the grounded answer when message.tool_calls is empty.
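
A loop that runs until the model stops calling tools has no upper bound, so one possible variation is to cap the number of model calls. The sketch below is an assumption-laden illustration rather than part of the Groq SDK: run_agent, max_steps, and the stubbed fake_create and fake_tool are hypothetical names, and the stubs replace the real client so the example runs offline.

```python
import json
from types import SimpleNamespace


def run_agent(create_completion, call_tool, messages, max_steps=5):
    """Run the tool loop for at most max_steps model calls; return the final text or None."""
    for _ in range(max_steps):
        message = create_completion(messages).choices[0].message
        if not message.tool_calls:
            return message.content
        messages.append(message)
        for tc in message.tool_calls:
            result = call_tool(tc.function.name, json.loads(tc.function.arguments))
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": result})
    return None  # step budget exhausted without a final answer


def fake_create(messages):
    """Stub model: requests one search, then answers once a tool result is present."""
    if any(isinstance(m, dict) and m.get("role") == "tool" for m in messages):
        msg = SimpleNamespace(content="Answer based on search results.", tool_calls=None)
    else:
        call = SimpleNamespace(
            id="call_1",
            function=SimpleNamespace(name="web_search", arguments='{"query": "groq"}'),
        )
        msg = SimpleNamespace(content=None, tool_calls=[call])
    return SimpleNamespace(choices=[SimpleNamespace(message=msg)])


def fake_tool(name, args):
    """Stub tool handler standing in for call_tool."""
    return f"{name} results for {args['query']}"


history = [{"role": "user", "content": "What is Groq?"}]
answer = run_agent(fake_create, fake_tool, history)
print(answer)
```

To use it against the real API, pass a function that wraps client.chat.completions.create() with your model, tools, and tool_choice, and pass call_tool as the handler.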

5. Model choice

Several Groq-hosted models support function calling. Pick based on the reasoning depth and context length your agent needs:

| Model | Context | Best for |
| --- | --- | --- |
| llama-3.3-70b-versatile | 128k | General reasoning, multi-step |
| llama-3.1-8b-instant | 128k | Fast, simple lookups |
| mixtral-8x7b-32768 | 32k | Mixture-of-experts, diverse tasks |
| gemma2-9b-it | 8k | Efficient, structured output |

For agent loops that chain several searches, llama-3.3-70b-versatile handles the multi-step reasoning well; for single quick lookups, llama-3.1-8b-instant trims latency further.

6. Groq vs OpenAI SDK

If you've used OpenAI's function calling, the Groq version will look familiar. The API is OpenAI-compatible, so the function-calling patterns used in this guide are the same:

| Detail | Groq | OpenAI |
| --- | --- | --- |
| Package | groq | openai |
| Chat method | client.chat.completions.create() | Same |
| Tool schema | {"type": "function", "function": {...}} | Same |
| Tool-call arguments | JSON string → json.loads() | Same |
| Tool result message | "role": "tool" + tool_call_id | Same |
| Model IDs | llama-3.3-70b-versatile, etc. | gpt-4o, etc. |

For the code in this guide, the only differences are the package name, the API key, and the model IDs. Code written for one ports to the other by swapping the client import, the key, and the model string.
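
One way to keep that swap in a single place is a small provider table, sketched below. PROVIDERS and make_client are hypothetical names introduced for illustration; the package names, client classes, and example model IDs are the ones from the comparison above. Each SDK is imported lazily, so only the one you actually use needs to be installed.

```python
import importlib

# Per-provider settings; model IDs are the examples from the comparison table.
PROVIDERS = {
    "groq": {"package": "groq", "client_class": "Groq", "model": "llama-3.3-70b-versatile"},
    "openai": {"package": "openai", "client_class": "OpenAI", "model": "gpt-4o"},
}


def make_client(provider, api_key):
    """Return (client, model) for the named provider, importing its SDK on demand."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    cfg = PROVIDERS[provider]
    module = importlib.import_module(cfg["package"])
    client = getattr(module, cfg["client_class"])(api_key=api_key)
    return client, cfg["model"]


# Usage (requires the matching SDK to be installed):
# client, model = make_client("groq", "YOUR_GROQ_API_KEY")
# response = client.chat.completions.create(model=model, messages=messages, tools=tools)
```

The rest of the agent code (tool schemas, the dispatch handler, the loop) stays unchanged regardless of which entry is selected.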

7. Free tier and pay-per-call

Groq offers a free tier; create an API key at console.groq.com. Superhighway also has a free tier, with no credit card required:

SUPERHIGHWAY_KEY = "YOUR_SUPERHIGHWAY_API_KEY"  # get free at https://superhighway.walls.sh/pricing
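
Rather than pasting keys into source files, you may prefer to read them from environment variables. The sketch below assumes a hypothetical load_keys helper and a hypothetical SUPERHIGHWAY_API_KEY variable name; GROQ_API_KEY is the variable the Groq SDK looks for when no api_key argument is passed.

```python
import os


def load_keys(env=os.environ):
    """Read both API keys from environment variables, failing early if one is missing."""
    keys = {}
    # SUPERHIGHWAY_API_KEY is an assumed name, not one defined by Superhighway.
    for name in ("GROQ_API_KEY", "SUPERHIGHWAY_API_KEY"):
        value = env.get(name)
        if not value:
            raise RuntimeError(f"Set the {name} environment variable")
        keys[name] = value
    return keys


# Usage, once both variables are exported in your shell:
# keys = load_keys()
# SUPERHIGHWAY_KEY = keys["SUPERHIGHWAY_API_KEY"]
```

Failing at startup with a named variable tends to be easier to debug than a 401 response in the middle of an agent run.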

Or skip the key entirely with x402: your agent pays $0.002 per call in USDC on Base — no signup, no API key management. That's ideal for autonomous agents that provision their own access. See the x402 pay-per-call guide for the wallet setup.

Get your Superhighway API key at https://superhighway.walls.sh/pricing (free tier: 1,000 calls/month). For the same pattern on other providers, see the OpenAI function calling guide or the Mistral guide.