Build a lead generation agent


Manual prospecting is the slowest part of B2B sales: searching for companies that fit your ideal customer profile, digging through each website for a contact and a hook, and copying it all into a spreadsheet. This guide builds a Python agent that does the whole loop automatically. It chains /search (find companies matching an ICP), /scrape (pull contact info, team, and product details off each site), and /research (deep company background), then uses an LLM to score fit and emit a CRM-ready record. Output is a CSV of qualified leads you can import into HubSpot or Salesforce.

1. What you'll build

A Python agent that:

Takes an ideal customer profile (ICP): industry, target role, location, keywords
Searches for companies matching that profile using /search
Scrapes each company's website to extract contact info, team, products, and a company description
Uses /research for deeper background on each company that returned scraped content
Uses an LLM to score and structure each lead into a JSON record
Outputs a CSV of qualified leads ready for CRM import

2. Setup

pip install openai requests python-dotenv

Create a .env file with your two keys:

SUPERHIGHWAY_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
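
Before the first request, it can help to confirm that both keys are actually set, since a missing key otherwise surfaces later as an opaque authentication error. A minimal sketch using only the standard library; `require_env` is a hypothetical helper, not part of either SDK:

```python
import os

def require_env(names, env=None):
    """Return the requested variables, raising if any is unset or empty."""
    env = os.environ if env is None else env
    missing = [n for n in names if not env.get(n)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {n: env[n] for n in names}
```

Calling `require_env(["SUPERHIGHWAY_API_KEY", "OPENAI_API_KEY"])` once the .env file has been loaded (for example with python-dotenv's `load_dotenv()`) fails fast with the name of the missing key.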

3. Search for companies matching your ICP

Start by turning your ICP into a search query. /search returns a list of candidate companies — the building block for the whole pipeline.

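import json
import os

import requests
from dotenv import load_dotenv

load_dotenv()  # read SUPERHIGHWAY_API_KEY and OPENAI_API_KEY from .env

SUPERHIGHWAY_KEY = os.getenv("SUPERHIGHWAY_API_KEY")
BASE = "https://superhighway.walls.sh"

def search_leads(
    industry: str,
    role: str = "founder OR CEO OR CTO",
    location: str = "",
    keywords: str = ""
) -> list[dict]:
    """Find companies in a target industry.

    `role` is not part of the search query. It is accepted so the ICP can be
    passed through unchanged, and it is used later when scoring.
    """
    # Join only the non-empty parts so the query has no stray spaces.
    query = " ".join(p for p in (industry, "company", keywords, location) if p)

    r = requests.get(
        f"{BASE}/search",
        params={"q": query, "limit": 10},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,
    )
    r.raise_for_status()  # surface auth or quota errors instead of parsing an error body
    return r.json().get("results", [])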
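
Search results often point at a deep link (a blog post or pricing page) rather than the site root, and the scraping step in the next section appends paths such as /about to whatever URL it receives. One option is to reduce each result URL to its root first. A minimal sketch using only the standard library; `site_root` is a hypothetical helper, not something the Superhighway API provides:

```python
from urllib.parse import urlsplit

def site_root(url: str) -> str:
    """Reduce a URL to scheme + host so page paths can be appended safely."""
    # Assume https when the result has no scheme at all.
    parts = urlsplit(url if "://" in url else f"https://{url}")
    return f"{parts.scheme}://{parts.netloc}"

print(site_root("https://example.com/blog/launch?utm=x"))  # https://example.com
```

Passing `site_root(result["url"])` to the scraper keeps /about and /contact anchored to the homepage instead of a deep link.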

4. Scrape company website for contact and team info

/scrape returns each page as clean Markdown — no nav, scripts, or cookie banners. We sweep the common pages where contact details and team bios live, and truncate each so the prompt stays small.

def scrape_company(url: str) -> dict:
    """Scrape the homepage plus the /about, /team, and /contact pages."""
    pages = {}

    for path in ["", "/about", "/team", "/contact"]:
        target = url.rstrip("/") + path
        r = requests.get(
            f"{BASE}/scrape",
            params={"url": target},
            headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
            timeout=30,
        )
        if not r.ok:
            continue  # many sites lack one of these pages; skip it
        data = r.json()
        if data.get("markdown"):
            # Keep the first 2,000 characters so the prompt stays small.
            pages[path or "home"] = data["markdown"][:2000]

    return pages
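
Four page requests per company across up to ten companies makes an occasional transient failure likely. A minimal retry sketch with exponential backoff follows; `with_retries` is a hypothetical helper, and which exceptions are worth retrying is an assumption to adapt to the errors you actually see:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.5, retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a listed exception, wait base_delay * 2**i and try again."""
    for i in range(attempts):
        try:
            return fn()
        except retry_on:
            if i == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** i)
```

Wrapping the `requests.get` call in a lambda and passing `retry_on=(requests.ConnectionError, requests.Timeout)` retries network hiccups without masking other errors.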

5. Deep research on promising leads

/research pulls multi-source background (funding, team size, product positioning) that rarely lives on the company's own site. This is the context that makes outreach personal. The pipeline in step 7 calls it for every company whose scrape returned content, so each candidate costs one extra request.

def research_company(company_name: str) -> str:
    """Get deep background on a company."""
    r = requests.get(
        f"{BASE}/research",
        params={
            "q": f"{company_name} company funding team product",
            "pages": 4
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=60,
    )
    if not r.ok:
        return ""  # research is optional context; score on the scrape alone
    data = r.json()
    # Prefer the synthesized summary; fall back to raw Markdown.
    return (data.get("synthesis") or data.get("markdown") or "")[:3000]
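
If you would rather spend /research requests only on candidates that look like a fit, a cheap keyword check on the scraped pages can gate the call. This is a rough heuristic sketch, not part of the Superhighway API; `looks_promising` and its `min_hits` threshold are hypothetical:

```python
def looks_promising(pages: dict, keywords: str, min_hits: int = 2) -> bool:
    """True if enough distinct ICP keywords appear in the scraped text."""
    terms = {t for t in keywords.lower().split() if len(t) > 2}
    if not terms:
        return True  # nothing to filter on, so let every candidate through
    text = " ".join(pages.values()).lower()
    # Plain substring match: cheap, but "api" also matches "capital".
    hits = sum(1 for t in terms if t in text)
    return hits >= min(min_hits, len(terms))
```

The pipeline could then call `research_company` only when `looks_promising(pages, icp.get("keywords", ""))` is true and pass an empty research string otherwise.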

6. Score and structure the lead with an LLM

Now the LLM reads the scraped pages plus the research and emits a structured lead record — fit score, contact, key person, pain points, and a suggested next action — all judged against your ICP. A JSON schema keeps the output CRM-ready.

from openai import OpenAI

llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

LEAD_SCHEMA = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "website": {"type": "string"},
        "description": {"type": "string"},
        "industry": {"type": "string"},
        "estimated_size": {"type": "string", "enum": ["1-10", "11-50", "51-200", "201-1000", "1000+", "unknown"]},
        "contact_email": {"type": "string"},
        "key_person": {"type": "string"},
        "key_person_role": {"type": "string"},
        "fit_score": {"type": "integer", "minimum": 1, "maximum": 10},
        "fit_reason": {"type": "string"},
        "pain_points": {"type": "array", "items": {"type": "string"}},
        "next_action": {"type": "string"}
    },
    "required": ["company_name", "website", "description", "fit_score", "fit_reason"]
}

def score_lead(url: str, pages: dict, research: str, icp: dict) -> dict | None:
    """Use an LLM to score and structure a lead based on ICP fit."""
    # Include every scraped page; /contact is usually where the email lives.
    pages_text = "\n\n".join(
        f"### {page_name}\n{content}"
        for page_name, content in pages.items()
    )

    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": f"""You are a B2B sales researcher. Evaluate this company as a potential lead for our ideal customer profile (ICP):

ICP:
- Industry: {icp.get("industry", "any")}
- Target role: {icp.get("role", "any")}
- Keywords: {icp.get("keywords", "any")}

Score fit from 1-10. Extract contact info if present. Identify pain points relevant to our ICP.

Respond with a single JSON object that matches this JSON schema:
{json.dumps(LEAD_SCHEMA)}"""
            },
            {
                "role": "user",
                "content": f"Company URL: {url}\n\nWebsite content:\n{pages_text}\n\nResearch:\n{research}"
            }
        ],
        # JSON mode requires the word "JSON" to appear somewhere in the messages.
        response_format={"type": "json_object"}
    )

    try:
        lead = json.loads(response.choices[0].message.content)
        lead["website"] = url
        return lead
    except (json.JSONDecodeError, TypeError):
        # Unparseable, empty, or non-object response.
        return None

7. The full pipeline

Wire the steps together: search for candidates, scrape and research each one, score it, and keep only leads at or above your fit threshold.

import csv
from datetime import datetime

def generate_leads(
    icp: dict,
    min_fit_score: int = 6,
    max_leads: int = 20
) -> list[dict]:
    """
    icp: {
        "industry": "SaaS startups",
        "role": "CTO OR VP Engineering",
        "location": "San Francisco",
        "keywords": "developer tools"
    }
    """
    print(f"Searching for leads: {icp}")

    # Step 1: Find candidate companies
    results = search_leads(
        icp["industry"],
        icp.get("role", ""),
        icp.get("location", ""),
        icp.get("keywords", "")
    )
    print(f"Found {len(results)} candidates")

    leads = []
    # search_leads requests 10 results, so max_leads only matters below 10.
    for result in results[:max_leads]:
        url = result.get("url")
        if not url:
            continue
        company_name = result.get("title", url)
        print(f"  Processing: {company_name}")

        # Step 2: Scrape company pages
        pages = scrape_company(url)
        if not pages:
            # Nothing to score against; avoid asking the LLM to guess.
            print("    Skipped (no data)")
            continue

        # Step 3: Research every company that returned scraped content
        research = research_company(company_name)

        # Step 4: Score and structure
        lead = score_lead(url, pages, research, icp)
        if lead and lead.get("fit_score", 0) >= min_fit_score:
            leads.append(lead)
            print(f"    Qualified (score: {lead['fit_score']}/10)")
        else:
            print("    Skipped (low fit or unparseable response)")

    return leads

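def save_leads(leads: list[dict], filename: str = "leads.csv"):
    if not leads:
        print("No qualified leads found.")
        return

    # Use the schema's field list so every row has the same columns,
    # even when the first lead is missing optional fields.
    fieldnames = list(LEAD_SCHEMA["properties"])

    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for lead in leads:
            row = dict(lead)
            # Flatten the pain_points list into a single cell.
            if isinstance(row.get("pain_points"), list):
                row["pain_points"] = "; ".join(map(str, row["pain_points"]))
            writer.writerow(row)
    print(f"\nSaved {len(leads)} qualified leads to {filename}")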

if __name__ == "__main__":
    ICP = {
        "industry": "SaaS developer tools",
        "role": "CTO OR VP Engineering",
        "location": "San Francisco Bay Area",
        "keywords": "API platform infrastructure"
    }

    leads = generate_leads(ICP, min_fit_score=6, max_leads=15)
    save_leads(leads)

    # Print summary
    for lead in leads:
        print(f"\n{lead.get('company_name', 'Unknown')} — Score: {lead.get('fit_score')}/10")
        print(f"  {lead.get('description', '')[:100]}")
        print(f"  Contact: {lead.get('contact_email', 'not found')}")
        print(f"  Next action: {lead.get('next_action', '')}")

8. Extending the agent

Multi-query sweep — run multiple ICP search queries and deduplicate by domain to widen the funnel.
News trigger — use /news to find companies that recently raised funding or launched a product; those are high-intent triggers for outreach.
LinkedIn enrichment — add a /search step to find the LinkedIn profile of the key person.
Scheduling — run it weekly on GitHub Actions to build a growing lead pipeline.
CRM sync — emit JSON formatted for HubSpot or Salesforce import instead of CSV.
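
For the multi-query sweep, deduplication by domain could look like the sketch below. It assumes each search result is a dict with a `url` key, as the pipeline above already does; `domain_of` and `dedupe_by_domain` are hypothetical helpers:

```python
from urllib.parse import urlsplit

def domain_of(url: str) -> str:
    """Lowercased host without a leading 'www.'."""
    host = urlsplit(url if "://" in url else f"https://{url}").netloc.lower()
    return host[4:] if host.startswith("www.") else host

def dedupe_by_domain(results: list[dict]) -> list[dict]:
    """Keep the first result seen for each domain, preserving order."""
    seen, unique = set(), []
    for r in results:
        d = domain_of(r.get("url", ""))
        if d and d not in seen:
            seen.add(d)
            unique.append(r)
    return unique
```

Concatenating the result lists from several `search_leads` calls and passing them through `dedupe_by_domain` keeps each company from being scraped and scored twice.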

9. Responsible use

This agent is for researching publicly available company information — the same information visible in any web browser. Use it to research companies, not individuals, and always comply with the terms of service of the sites you scrape.

10. Getting your API key

Grab a free Superhighway key at /pricing (1,000 calls/month, no credit card). For an agent that provisions its own access, skip the key entirely with x402: it pays $0.002 per call in USDC on Base — no signup, no key management. See the x402 pay-per-call guide for the wallet setup.
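
To see how far the free tier goes, you can estimate the calls per run from the pipeline in this guide: one /search, then at most four /scrape requests and one /research request per company. The sketch below assumes every request counts as one billable call, which you should confirm against the pricing page:

```python
def calls_per_run(companies: int, pages_per_company: int = 4, research_per_company: int = 1) -> int:
    """One /search call plus the per-company scrape and research calls."""
    return 1 + companies * (pages_per_company + research_per_company)

calls = calls_per_run(10)        # this guide requests 10 results per search
print(calls)                     # 51
print(1000 // calls)             # 19 full runs on the free tier
print(f"${calls * 0.002:.3f}")   # $0.102 per run via x402
```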

From here, the competitor analysis guide goes deeper on profiling companies, and the news briefing guide shows how to wire /news triggers into an outreach workflow.